
CAPSens_design

This repository describes methods and provides examples for modeling receptor:peptide complexes and performing subsequent computational design and model refinement.

Read the paper: Jefferson RE et al., 2022

Requirements:

The commands in the demo are written for the Bash command line and can be used on any Linux operating system (tested with Ubuntu 22.04). The repository can be downloaded directly from GitHub:

git clone https://github.com/barth-lab/CAPSens_design.git

Aliases (environment variables used by the demo commands):

export ROS=<path_to_rosetta>
export CAPSen_scripts=<path_to_scripts>

Many of the demo scripts are written for the SLURM batch system (sbatch commands) on a high-performance computing cluster, so you may need to modify them to work with your cluster. Allocated run times for production simulations are set in the SBATCH header of each run script; adjust them for your particular hardware.

Receptor:Peptide Complex Modeling

Modeling of receptor:peptide complexes follows 3 overarching steps:

  1. Receptor scaffold hybridization
  2. Flexible peptide docking and diversification
  3. Loop modeling and structure relaxation

How to set up each step depends on your specific receptor:peptide complex; general guidelines are discussed below. The demo provides inputs for the CXCR4:CXCL12 receptor:peptide pair used to generate the CAPSens designs. See the demo subdirectories for example inputs, and the sections below for specific instructions on setting up and running the simulations. The scripts folder contains useful tools for running the protocol, and the CAPSen_models folder contains the final models generated by the protocol and by molecular dynamics simulations.

1. Receptor scaffold hybridization

Homologous template selection

Because active-state structures of membrane receptors are not always available, this method uses a homology modeling approach to generate possible scaffolds. We use HHpred to identify homologous template structures, which should share at least 20% sequence identity with the target. Alignments may need to be adjusted manually to fix local mismatches, especially around intra-receptor disulfide bonds. The native sequence can then be threaded onto the homologous template using Step 1 of the RosettaCM protocol, which takes an alignment in Grishin format and runs Rosetta's partial_thread binary.
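Writing the Grishin alignment by hand is error-prone, so a small helper can help. The function below is a hypothetical sketch (write_grishin is not one of the repository scripts) that emits a pairwise alignment in the Grishin layout used in the RosettaCM tutorial: a "## target template" header, a "#" line, a "scores_from_program: 0" line, one "<start offset> <aligned sequence>" row per sequence, and a closing "--". The identifiers and sequences in the usage line are placeholders. As an assumption to check against the RosettaCM documentation, the template identifier usually has to match the template PDB passed to partial_thread.

```bash
# Hypothetical helper: write a two-sequence alignment in Grishin format.
# Usage: write_grishin <target_id> <template_id> <aligned_target_seq> <aligned_template_seq>
write_grishin() {
    local target_id=$1 template_id=$2 target_seq=$3 template_seq=$4
    # Aligned sequences (gaps written as "-") must have the same length
    if [ "${#target_seq}" -ne "${#template_seq}" ]; then
        echo "error: aligned sequences differ in length" >&2
        return 1
    fi
    # Header, comment line, score line, then "<0-based start offset> <sequence>" rows
    printf '## %s %s\n#\nscores_from_program: 0\n0 %s\n0 %s\n--\n' \
        "$target_id" "$template_id" "$target_seq" "$template_seq"
}

# Placeholder IDs and sequences, for illustration only
write_grishin CXCR4 4xt1_A ACDEFG-HIK ACDE-GWHIK > CXCR4_4xt1.grishin
```

Checking the lengths up front catches the most common manual-editing mistake, a gap added to one sequence but not the other.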

Hybridization

For hybridization, choose the structural elements to recombine with care. Even when one homologous template is the best overall match to your target complex, there may be locally divergent segments (>10 residues) that are expected to make key contacts with the peptide ligand, where another template has higher sequence identity or where the native inactive structure may better model the complex. The aim is to incorporate as many active-state features from the template as possible while avoiding extensive de novo reconstruction of the transmembrane core caused by poor sequence-structure alignment. It is good practice to build all combinations of the candidate structural elements to cover as much conformational space as possible, because the structure of the receptor scaffold can strongly affect the subsequent modeling steps.
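With n candidate donor segments there are 2^n - 1 non-empty combinations to build. The hypothetical function below (not part of the repository) lists them with a bitmask so that each combination can be turned into its own composition file for compose.py; the segment names are placeholders, and how each combination maps onto the composition_file format is left to you.

```bash
# Hypothetical helper: list every non-empty combination of donor segments.
# Usage: enumerate_combinations <segment> [<segment> ...]
enumerate_combinations() {
    local n=$# total mask i elem combo
    total=$(( 1 << n ))
    for (( mask = 1; mask < total; mask++ )); do
        combo=""
        i=0
        for elem in "$@"; do
            # Include segment i when bit i of the mask is set
            if (( (mask >> i) & 1 )); then
                combo="${combo:+$combo+}$elem"
            fi
            i=$(( i + 1 ))
        done
        echo "$combo"
    done
}

enumerate_combinations TM2 ECL2
```

For two segments this prints TM2, ECL2, and TM2+ECL2; the unmodified host template is the remaining (empty) combination and is not listed.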

In the case of CXCR4, the active-state template US28 (PDB ID: 4XT1) was used, with 29% sequence identity. In the absence of other homologous active-state structures, the inactive-state structure of CXCR4 (PDB ID: 4RWS) was used to model local regions that differed significantly between the two templates: the extracellular head of transmembrane helix (TM) 2 (residues 87-101) and ECL2 (residues 174-192). ECL2 of native CXCR4 forms a long beta-hairpin, which is shorter and not well resolved in US28. The tip of TM2 differs because of a proline kink in the native sequence that is not present in US28.

The compose.py script assembles hybrid scaffolds from donor and host PDB files. The example takes the extracellular head of TM2 and ECL2 of CXCR4 (PDB ID: 4RWS) and places them into the homology template based on US28 (PDB ID: 4XT1).

cd demo/1_receptor-peptide_modeling/1_scaffold_hybridization/
$CAPSen_scripts/compose.py -d input/donor.pdb -f input/composition_file input/host.pdb >CXCR4.TM2-ECL2_4xt1_template.pdb

Peptide threading

If the active-state template is in complex with a peptide, that peptide can serve as a template for threading your peptide ligand of interest. Alternatively, another peptide-bound template may work better. If a large enough pool of apo peptide structures is available, you may also find a consensus peptide conformation to use. The threaded structure is only an initial conformation; all of its degrees of freedom are sampled in the following peptide docking step.

In the case of CXCR4, the US28 active-state structure was solved in complex with the chemokine CX3CL1. CXCL12 was threaded onto the bound chemokine with K1 of CXCL12 aligned to H2 of CX3CL1, to match the partial positive charge of the imidazole ring. Q1 of CX3CL1 is cyclized to pyroglutamate, which gives a neutral N-terminus. Alternatively, K1 of CXCL12 was aligned to H3 of CX3CL1, but docking from this initial position yielded models with weak interface energies and few contacts to key binding residues.

2. Flexible peptide docking and diversification

The peptide is translated and rotated across the binding pocket to sample all plausible binding modes. From these initial positions, the peptide ligand is docked with full flexibility while receptor side chains at the interface are repacked. Optionally, experimentally informed constraints can be applied in this step to favor putative interfacial contacts. The top-scoring decoys are then filtered and diversified to select many different peptide poses by position, shape, and orientation; 200-220 unique peptide conformations are taken as inputs for the next step.

The constraint sets used for CXCR4:CXCL12 modeling are in demo/1_receptor-peptide_modeling/2_peptide_docking/input and can be passed as a command line argument to the FlexPepDocking run script (flex_ar.cst.slurm).

The pep_grid.py script generates inputs for flexible peptide docking by translating and rotating the peptide across the receptor binding pocket. You will probably want to adjust parameters for your particular system.

cd demo/1_receptor-peptide_modeling/2_peptide_docking/prep
$CAPSen_scripts/pep_grid.py CXCR4-CXCL12_TM2.ECL2_hybrid.pdb
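To make the translation part of this grid concrete, here is a hypothetical sketch (not the pep_grid.py implementation, whose grid spacing and rotations are not described here) that shifts every ATOM/HETATM record of one chain by a vector, using the fixed PDB columns for chain (22) and x, y, z (31-54). Rotations are omitted, and the peptide chain ID B in the usage line is an assumption.

```bash
# Hypothetical helper: translate one chain of a PDB file by (dx, dy, dz) in Å.
# Usage: translate_chain <chain> <dx> <dy> <dz> <file.pdb> > shifted.pdb
translate_chain() {
    local chain=$1 dx=$2 dy=$3 dz=$4 pdb=$5
    awk -v c="$chain" -v dx="$dx" -v dy="$dy" -v dz="$dz" '
        /^(ATOM  |HETATM)/ && substr($0, 22, 1) == c {
            # Columns 31-38, 39-46, 47-54 hold x, y, z as %8.3f fields
            x = substr($0, 31, 8) + dx
            y = substr($0, 39, 8) + dy
            z = substr($0, 47, 8) + dz
            printf "%s%8.3f%8.3f%8.3f%s\n", substr($0, 1, 30), x, y, z, substr($0, 55)
            next
        }
        { print }   # all other records (receptor chain, TER, END) pass through unchanged
    ' "$pdb"
}

translate_chain B 2.0 0.0 -1.0 CXCR4-CXCL12_TM2.ECL2_hybrid.pdb > shifted.pdb
```

Rewriting only the coordinate columns keeps atom names, residue numbers, and B-factors intact, so the output can go straight into prepacking.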

Prepack the inputs before running flexible peptide docking.

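# Prepack all 117 grid positions (000.pdb ... 116.pdb) as parallel background jobs
for n in {0..116}; do
    printf -v sn "%03d" "$n"
    $ROS/source/bin/FlexPepDocking.linuxgccrelease -in:file:spanfile ../input/CXCR4_SDF1.span \
        -database $ROS/database/ -s $sn.pdb -flexpep_prepack -ex1 -ex2aro \
        -nstruct 1 -suffix "_pp" -no_nstruct_label > log_$sn 2> err_$sn &
done
wait  # block until every prepack job has written its *_pp.pdb file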

Example pre-packed inputs are available in demo/1_receptor-peptide_modeling/2_peptide_docking/input/grid.

With the prepacked inputs, you are ready to start a production run of peptide docking. We recommend generating ~10,000 decoys in total. The following generates 100 decoys for each of the 117 starting peptide positions, giving 11,700 decoys in total.

cd demo/1_receptor-peptide_modeling/2_peptide_docking
sbatch $CAPSen_scripts/flex_ar.cst.slurm <constraint set>

Alternatively, you can use the flex_ar.slurm script for unconstrained docking simulations.

The pepstat.sh script generates all the geometric statistics used to diversify the peptide poses. Note that pepstat.sh uses geometry from the biophysics repo. Because the script calculates metrics from raw PDB files, you will need a large temporary scratch space to extract the PDBs from the FlexPepDocking silent output.

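cd output_flex.<constraint set>
# Merge the per-input silent files (">" so that a rerun does not duplicate decoys)
cat out_{000..116}.silent > out.silent
mkdir -p pdbs
cd pdbs
$ROS/source/bin/extract_pdbs.linuxgccrelease -database $ROS/database/ -in:file:silent ../out.silent
cd ..
# Compute the geometric statistics used for diversification
$CAPSen_scripts/pepstat.sh pdbs > pepstat.sc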

The diversify_cycle.py script filters the resulting peptide poses by the combined FlexPepDocking interface and peptide scores (the sum of the Rosetta score terms I_sc + pep_sc), keeps the top 20 %, and diversifies the poses by position, rotation, and shape.

$CAPSen_scripts/diversify_cycle.py pepstat.sc > tags
mkdir -p ../../3_loop_relax/input/diverse
# Copy each selected pose to a zero-padded input name for loop modeling
i=0; for t in $(grep -v "^#" tags); do
    cp pdbs/$t.pdb ../../3_loop_relax/input/diverse/diverse_$(printf "%03d" $i).pdb
    i=$((i+1))
done
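The score filter that precedes diversification can be illustrated in isolation. The sketch below is hypothetical: it does not reproduce diversify_cycle.py (in particular the position, rotation, and shape diversification), and it assumes a whitespace-delimited, Rosetta-style score file whose first line is a header naming the I_sc, pep_sc, and description columns. It prints the tags of the best fraction of decoys ranked by I_sc + pep_sc, where lower is better.

```bash
# Hypothetical helper: tags of the top fraction of decoys by I_sc + pep_sc.
# Usage: top_fraction_by_isc_pepsc <scorefile> [fraction, default 0.2]
top_fraction_by_isc_pepsc() {
    local scorefile=$1 fraction=${2:-0.2}
    awk '
        NR == 1 {
            # Locate the needed columns by name in the header line
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("I_sc" in col) || !("pep_sc" in col) || !("description" in col)) {
                print "missing I_sc, pep_sc or description column" > "/dev/stderr"
                exit 1
            }
            isc = col["I_sc"]; pep = col["pep_sc"]; tag = col["description"]
            next
        }
        # Emit "<I_sc + pep_sc> <tag>" for every decoy line
        NF >= tag { print $isc + $pep, $tag }
    ' "$scorefile" |
        sort -k1,1n |
        awk -v f="$fraction" '
            { tags[NR] = $2 }
            END {
                # Keep at least one decoy when the file is not empty
                n = int(NR * f)
                if (n < 1 && NR > 0) n = 1
                for (i = 1; i <= n; i++) print tags[i]
            }'
}

top_fraction_by_isc_pepsc pepstat.sc 0.2 > top20.tags
```

Looking columns up by name rather than by position keeps the filter working if extra score terms are added to the file.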

3. Loop modeling and structure relaxation

Missing receptor loop residues are rebuilt de novo around each diversified peptide pose, and the receptor:peptide complexes are relaxed to simulate induced-fit effects. Inter-TM constraints derived from sequence conservation are applied to restrain the receptor structure. Any additional experimentally informed constraints used in peptide docking (step 2) can be applied again to preserve putative interfacial contacts during the full-structure relax; to do so, append them to your constraint files and adjust the flags file as needed. We recommend generating ~20,000 decoys for a production run of loop modeling and full complex relax. The following generates 100 decoys for each of 211 diverse inputs, giving 21,100 decoys in total. The script uses CCD loop closure, but other loop modeling methods may be available to you.

cd demo/1_receptor-peptide_modeling/3_loop_relax
sbatch $CAPSen_scripts/loop_ar.flags.slurm
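Appending a constraint amounts to adding one line per restraint in Rosetta's constraint file syntax. The helper below is a hypothetical sketch that appends an AtomPair constraint with a FLAT_HARMONIC function (parameters x0, sd, tol: no penalty within x0 ± tol, harmonic with width sd beyond it). The file name, atom names, and residue numbers in the usage line are placeholders; use the numbering convention already present in your constraint files.

```bash
# Hypothetical helper: append an AtomPair FLAT_HARMONIC constraint line.
# Usage: append_atompair_cst <cstfile> <atom1> <res1> <atom2> <res2> <x0> <sd> <tol>
append_atompair_cst() {
    local cstfile=$1 atom1=$2 res1=$3 atom2=$4 res2=$5 x0=$6 sd=$7 tol=$8
    printf 'AtomPair %s %s %s %s FLAT_HARMONIC %s %s %s\n' \
        "$atom1" "$res1" "$atom2" "$res2" "$x0" "$sd" "$tol" >> "$cstfile"
}

# Placeholder residues: restrain a side-chain contact to 3.0 ± 1.0 Å
append_atompair_cst my_constraints.cst NZ 305 OD1 97 3.0 0.5 1.0
```

A flat-bottomed function is a reasonable default for contacts inferred from mutagenesis, since it only penalizes poses that drift clearly out of range.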

The top-scoring 10 % of decoys (~2000 structures) are clustered by structural similarity. Representative models from clusters of sufficient size (>30 members for CXCR4:CXCL12 modeling) should be analyzed for total score, buried surface area, interface score, peptide score, and key contacts with residues known to be important for receptor activation. You can rescore complexes for interface and peptide score by passing the -flexpep_score_only flag to FlexPepDocking. Clusters were filtered for satisfaction of the experimentally informed interfacial constraints (<10 REU constraint violation penalty), and final clusters of models were selected by the combined interface + peptide score. For the CXCR4:CXCL12 complex models, because the first two N-terminal residues of CXCL12 have been shown to be essential for activity, clusters without any contact between K1 or P2 and key receptor residues known to be important for activity were not considered.
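The cluster-size cutoff can be applied with a few lines of awk once cluster assignments are available. The clustering tool is not specified here, so this sketch assumes a hypothetical two-column file of "<decoy tag> <cluster id>" lines (lines starting with # are ignored) and prints each cluster with more than the minimum number of members, largest first.

```bash
# Hypothetical helper: clusters with more than min_size members.
# Usage: large_clusters <assignments_file> [min_size, default 30]
large_clusters() {
    local assignments=$1 min_size=${2:-30}
    awk -v min="$min_size" '
        !/^#/ && NF >= 2 { size[$2]++ }   # count members per cluster id
        END { for (c in size) if (size[c] > min) print c, size[c] }
    ' "$assignments" | sort -k2,2nr -k1,1
}

large_clusters cluster_assignments.txt 30
```

The surviving cluster IDs can then be used to pull representative models for the score, buried surface area, and contact analysis described above.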

Computational Design

Computational design by conformational selection

Designable sites are identified on both the peptide and receptor sides of the different binding interfaces featured in the initial set of models. Novel combinations of amino acids and conformations are searched concurrently to improve receptor:peptide association and signaling response. The in silico mutagenesis allows all possible residue substitutions at designable sites. All residues with heteroatoms within 5.0 Å of any designable residue are repacked, and their backbone and side chains are minimized. Typically, 200 independent trajectories are sufficient to reach convergence in the top 10 % of models. Top-scoring combinations of mutations should be selected for interface energy improvement and active-state stabilization.

The demo inputs allow design at three positions in CXCR4. To run the demo simulation:

cd demo/2_design
sbatch run.MP.design_mm.array EnsembleState1_Cdyn-V3Y_pikaa.resfile
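Designable sites are passed to Rosetta through a resfile. The hypothetical generator below writes one with NATRO as the default (unlisted residues keep their native rotamers), ALLAA for sites open to all substitutions, and PIKAA for sites restricted to a given set of amino acids, using "<PDB number> <chain> <command>" lines. The residue numbers are placeholders, not the positions used in the demo, and the 5.0 Å repacking shell is assumed to be handled by the design protocol rather than by this file.

```bash
# Hypothetical helper: write a design resfile.
# Each argument is "<resnum>:<chain>" (ALLAA) or "<resnum>:<chain>:<AAs>" (PIKAA).
write_design_resfile() {
    echo "NATRO"   # default for unlisted residues: keep the native rotamer
    echo "start"
    local site resnum chain aas
    for site in "$@"; do
        IFS=: read -r resnum chain aas <<< "$site"
        if [ -n "$aas" ]; then
            printf '%s %s PIKAA %s\n' "$resnum" "$chain" "$aas"
        else
            printf '%s %s ALLAA\n' "$resnum" "$chain"
        fi
    done
}

# Placeholder sites: position 101 fully designable, 202 limited to Y/F/W
write_design_resfile 101:A 202:A:YFW > my_design.resfile
```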

Computational design maintaining conformational dynamism

Here, binding contact networks are selected that enhance receptor:peptide interactions across several CXCR4:CXCL12 models, thereby favoring multiple conformations of the complex and maintaining high conformational entropy. This is achieved by designing a library of point mutations at receptor positions in contact with the peptide that are compatible with multiple conformations of the complex. Specifically, we computationally build a small library of point mutants of receptor residues that contact the peptide ligand (any heteroatom within 4 Å of the peptide in all selected models) and whose substitutions do not cause significant clashes (>5 REU) in a fraction of the selected ensemble of models. Due to the dynamic nature of receptor:peptide interactions, these point mutants may have significant effects on activity that are difficult to predict from single-state design.
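The ensemble criterion can be expressed as a simple filter. This sketch assumes a hypothetical table of "<mutation> <model> <energy change in REU>" lines, one per mutant per ensemble model, and keeps mutations whose energy change stays at or below the clash threshold (5 REU by default) in at least a chosen fraction of the models. The 0.5 default fraction and the mutation names are placeholders, since the fraction used for CXCR4:CXCL12 is not stated here.

```bash
# Hypothetical helper: point mutants compatible with enough ensemble models.
# Usage: ensemble_compatible_mutants <table> [max_clash_REU, default 5] [min_fraction, default 0.5]
ensemble_compatible_mutants() {
    local table=$1 max_clash=${2:-5} min_frac=${3:-0.5}
    awk -v max="$max_clash" -v f="$min_frac" '
        !/^#/ && NF >= 3 {
            total[$1]++                          # models scored for this mutation
            if ($3 + 0 <= max + 0) ok[$1]++      # models without a significant clash
        }
        END {
            for (m in total)
                if (ok[m] / total[m] >= f) printf "%s %.2f\n", m, ok[m] / total[m]
        }
    ' "$table" | sort
}

ensemble_compatible_mutants mutant_scores.txt 5 0.5
```

Each output line gives the mutation and the fraction of ensemble models that tolerate it.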

Model Refinement

The models that best reflect experimentally validated shifts in activity are re-docked in mutational contexts to explore receptor:peptide interactions that may not have been fully sampled in the modeling of the native WT complex. As potency is largely connected to affinity between the receptor and peptide ligand, the models that best supported changes in potency by predicted interface score can be selected as starting inputs for refinement.

In the case of CXCR4:CXCL12, a single constraint was used to anchor a key electrostatic interaction in the depth of the binding pocket. The demo redocks the V3Y variant peptide onto Cdyn. To fully explore the possible landscape in the mutant context, one should prep inputs with pep_grid.py and prepack before a production run.

cd demo/3_model_refinement/prep
$CAPSen_scripts/pep_grid.py CXCR4-SDF1.4xt1_TM2-ECL2.4rws_L15I-H87N-S152A-V280Y.pdb
for n in {0..116}; do
    printf -v sn "%03d" $n
    $ROS/source/bin/FlexPepDocking.linuxgccrelease -in:file:spanfile ../input/CXCR4_SDF1.span -database $ROS/database/ -s $sn.pdb -flexpep_prepack -ex1 -ex2aro -nstruct 1 -suffix "_pp" -no_nstruct_label > log_$sn 2> err_$sn &
done
wait  # the *_pp.pdb files exist only after every background prepack job has finished
mkdir -p ../input/grid
cp *_pp.pdb ../input/grid
cd ..
sbatch refine_ar.cst.slurm

Output decoys can be clustered and selected as described above, then repacked into other experimentally validated mutational contexts to again screen for the models that best support the observed signaling effects.