<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [Side Chain Conformations and Dunbrack Energies](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.01-Side-Chain-Conformations-and-Dunbrack-Energies.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Protein Design with a Resfile and FastRelax](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design-with-a-resfile-and-relax.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Packing-design-and-regional-relax.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# RosettaAntibodyDesign
## Notes
This tutorial will walk you through how to use `RosettaAntibodyDesign` in PyRosetta.  You should also go through the parallel distribution workshop, as you will most likely need to create many decoys for some of these design tasks. Note that we are using the XML interface to the code here for simplicity (and, truth be told, because I am converting a C++ workshop).  The code-level interface is as robust as the XML, but requires more knowledge to use.  You are welcome to play around with it: all functions have descriptions, and all options can be changed through code.

Grab a coffee, take a breath, and let's learn how to design some antibodies!

## Citation


[Rosetta Antibody Design (RAbD): A General Framework for Computational Antibody Design, PLOS Computational Biology, 4/27/2018](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006112)

Jared Adolf-Bryfogle, Oleks Kalyuzhniy, Michael Kubitz, Brian D. Weitzner, Xiaozhen Hu, Yumiko Adachi, William R. Schief, Roland L. Dunbrack Jr.

# Overview

__RosettaAntibodyDesign (RAbD)__ is a generalized framework for the design of antibodies, in which a user can easily tailor the run to their project needs.  __The algorithm is meant to sample the diverse sequence, structure, and binding space of an antibody-antigen complex.__  An app is available, and all components can be used within RosettaScripts for easy scripting of antibody design and incorporation into other Rosetta protocols.

The framework is based on rigorous bioinformatic analysis and is rooted in our [recent clustering](https://www.ncbi.nlm.nih.gov/pubmed/21035459) of antibody CDR regions.  It uses the __North/Dunbrack CDR definition__ as outlined in the North/Dunbrack clustering paper. A new clustering paper will be out in the next year, and this new analysis will be incorporated into RAbD. 

The supplemental methods section of the published paper has all details of the RosettaAntibodyDesign method.  This manual serves to get you started running RAbD for typical use cases. 


# Algorithm
  
Broadly, the RAbD protocol consists of alternating outer and inner Monte Carlo cycles. Each outer cycle consists of randomly choosing a CDR (L1, L2, etc.) from those CDRs set to design, randomly choosing a cluster and then a structure from that cluster from the database according to the input instructions, and optionally grafting that CDR's structure onto the antibody framework in place of the existing CDR (__GraftDesign__). The program then performs N rounds of the inner cycle, consisting of sequence design (__SeqDesign__) using cluster-based sequence profiles and structural constraints, energy minimization, and optional docking. Each inner cycle structurally optimizes the backbone and repacks side chains of the CDR chosen in the outer cycle as well as optional neighbors in order to optimize interactions of the CDR with the antigen and other CDRs. 

__Backbone dihedral angle (CircularHarmonic) constraints__ derived from the cluster data are applied to each CDR to limit deleterious structural perturbations. Amino acid changes are typically sampled from __profiles derived for each CDR cluster in PyIgClassify__. Conservative amino acid substitutions (according to the BLOSUM62 substitution matrix) may be performed when too few sequences are available to produce a profile (e.g., for H3). After each inner cycle is completed, the new sequence and structure are accepted according to the Metropolis Monte Carlo criterion. After N rounds within the inner cycle, the program returns to the outer cycle, at which point the energy of the resulting design is compared to the previous design in the outer cycle. The new design is accepted or rejected according to the Monte Carlo criterion.
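The nested cycle structure and the acceptance rule described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not RAbD's implementation: `toy_energy`, `propose`, the temperature `kT`, the step sizes, and the round counts are all hypothetical stand-ins, and the "design" is just a list of numbers rather than an antibody.

```python
import math
import random

def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def toy_energy(design):
    """Hypothetical stand-in for a Rosetta score: lower is better."""
    return sum(x * x for x in design)

def propose(design, index, rng, step):
    """Perturb one position; stands in for a graft or a design/minimize step."""
    trial = list(design)
    trial[index] += rng.uniform(-step, step)
    return trial

def nested_monte_carlo(start, outer_rounds=5, inner_rounds=3, kT=1.0, seed=0):
    rng = random.Random(seed)
    current = list(start)
    best = list(start)
    for _ in range(outer_rounds):
        # Outer cycle: pick one "CDR" (here, one position) at random and
        # make a large change, analogous to grafting a new structure.
        index = rng.randrange(len(current))
        outer_trial = propose(current, index, rng, step=2.0)
        # Inner cycle: N rounds of smaller moves on the chosen position,
        # each accepted or rejected by the Metropolis criterion.
        for _ in range(inner_rounds):
            inner_trial = propose(outer_trial, index, rng, step=0.5)
            delta = toy_energy(inner_trial) - toy_energy(outer_trial)
            if metropolis_accept(delta, kT, rng):
                outer_trial = inner_trial
        # Back in the outer cycle: compare with the previous outer design.
        delta = toy_energy(outer_trial) - toy_energy(current)
        if metropolis_accept(delta, kT, rng):
            current = outer_trial
        # Track the lowest-energy design seen at the end of any outer round.
        if toy_energy(current) < toy_energy(best):
            best = list(current)
    return best

best = nested_monte_carlo([3.0, -2.0, 1.0])
print(toy_energy(best))
```

The point of the two levels is that a large, disruptive move (the graft) is given several rounds of local optimization before it is judged against the previous design, so it is not rejected just because the unoptimized graft scores poorly.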

If optimizing the antibody-antigen orientation during the design (dock), SiteConstraints are automatically used to keep the CDRs (paratope) facing the antigen surface.  These are termed __ParatopeSiteConstraints__.   Optionally, one can enable constraints that keep the paratope of the antibody around a target epitope (antigen binding site).  These are called __ParatopeEpitopeSiteConstraints__, as the constraints are between the paratope and the epitope. The epitope is automatically determined as the interface residues around the paratope on input into the program; however, any residue(s) can be set as the epitope to limit unwanted movement and sampling of the antibody.  See the examples and options below. 

More detail on the algorithm can be found in the published paper. 


# General Setup and Inputs

1.  Antibody Design Database

	This app requires the Rosetta Antibody Design Database.  A database of antibodies from the original North Clustering paper is included in Rosetta and is used as the default.  An updated database (which is currently updated bi-yearly) can be downloaded here: <http://dunbrack2.fccc.edu/PyIgClassify/>.  

	For C++, it should be placed in `Rosetta/main/database/sampling/antibodies/`. For PyRosetta, use the command-line option `antibody_database` and set it to the full path of the downloaded database within the `init()` function, as you have done in the past. It is recommended to use this up-to-date database for production runs. For this tutorial, we will use the database within Rosetta. 


2.  Starting Structure

	The protocol begins with the three-dimensional structure of an antibody-antigen complex. Designs should start with an antibody bound to a target antigen (however optimizing just the antibody without the complex is also possible).  Camelid antibodies are fully supported.  This structure may be an experimental structure of an existing antibody in complex with its antigen, a predicted structure of an existing antibody docked computationally to its antigen, or even the best scoring result of low-resolution docking a large number of unrelated antibodies to a desired epitope on the structure of a target antigen as a prelude to de novo design.

	The program CAN computationally design an antibody to anywhere on the target protein, but it is recommended to place the antibody at the target epitope.  It is beyond the scope of this program to determine potential epitopes for binding, however servers and programs exist to predict these. Automatic SiteConstraints can be used to further limit the design to target regions.


3.  Model Numbering and Light Chain identification

	The input PDB file must be renumbered to the AHo Scheme and the light chain gene must be identified.  This can be done through the [PyIgClassify Server](http://dunbrack2.fccc.edu/pyigclassify/). 

	On input into the program, Rosetta assigns our CDR clusters using the same methodology as PyIgClassify. The RosettaAntibodyDesign protocol is then driven by a set of command-line options and a set of design instructions provided as an input file that controls which CDR(s) are designed and how. Details and example command lines and instruction files are provided below.

	The gene of the light chain should always be set on the command line using the option `-light_chain`; the valid values are lambda and kappa.  PyIgClassify will identify the gene of the light chain.

	For this tutorial, the starting antibody is renumbered for you. 

4.  Notes for Tutorial Shortening

	Always set the option `-outer_cycle_rounds` to 5 in order to run these examples quickly.  The default is 25.  We include this in our common options file that is read in by Rosetta at the start. We will only be outputting a single structure, but typical use of the protocol is with the default setting of `-outer_cycle_rounds` and an `nstruct` of at least 1000, with 5000-10000 recommended for jobs that are doing a lot of grafting.  For de novo design runs, one would want to go even higher. Note that the docking stage increases runtime significantly as well. 

	The total number of rounds is outer_cycle_rounds * nstruct.  

	
5.  General Notes

			setenv PATH ${PATH}:${HOME}/rosetta_workshop/rosetta/main/source/tools

	We will be using JSON output of the scorefile, as this is much easier to work with in python and pandas.
	We use the option `-scorefile_format json`

	All of our common options for the tutorial are in the common file that you will copy to your working directory. 
	Rosetta/PyRosetta will look for this file in your working directory or your home folder in the directory `$HOME/.rosetta/flags`.
	See this page for more info on using rosetta with custom config files: <https://www.rosettacommons.org/docs/latest/rosetta_basics/running-rosetta-with-options#common-options-and-default-user-configuration>

	Pre-generated output for every tutorial is in `expected_outputs/rabd` (the directory the cells below read from), and each tutorial notes its approximate run time on a single (Core i7) processor.
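As a rough illustration of how the options discussed in this section fit together, the sketch below assembles them into a single option string and computes the total number of design rounds. The helper names `build_rabd_flags` and `total_design_rounds` are hypothetical, and the flag spellings are copied from the text above rather than checked against a particular Rosetta build. In this tutorial the options actually come from the common flags file, so treat this as a sketch only.

```python
from pathlib import Path
from typing import Optional

def build_rabd_flags(light_chain: str,
                     outer_cycle_rounds: int = 25,
                     antibody_database: Optional[Path] = None) -> str:
    """Assemble an option string from the flags discussed above."""
    if light_chain not in ("kappa", "lambda"):
        raise ValueError("light_chain must be 'kappa' or 'lambda'")
    flags = [
        f"-light_chain {light_chain}",
        f"-outer_cycle_rounds {outer_cycle_rounds}",
        "-scorefile_format json",
    ]
    if antibody_database is not None:
        # Flag name taken from the text above; verify it for your build.
        flags.append(f"-antibody_database {antibody_database}")
    return " ".join(flags)

def total_design_rounds(outer_cycle_rounds: int, nstruct: int) -> int:
    """Total rounds = outer_cycle_rounds * nstruct, as stated above."""
    return outer_cycle_rounds * nstruct

flags = build_rabd_flags("kappa", outer_cycle_rounds=5)
# In a PyRosetta session this string would be passed to init().
print(flags)
print(total_design_rounds(5, 1))       # tutorial settings
print(total_design_rounds(25, 1000))   # default rounds with 1000 decoys
```

Comparing the last two numbers makes it clear why the shortened tutorial settings are only suitable for learning the workflow and not for production runs.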



In [None]:
# Notebook setup
!pip install pyrosettacolabsetup
import pyrosettacolabsetup
pyrosettacolabsetup.setup()
print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

**Make sure you are in the directory with the pdb files:**

`cd google_drive/My\ Drive/student-notebooks/`

In [29]:
from typing import *
import pandas
from pathlib import Path
import json
import re

#Functions we will be using. I like to collect any extra functions at the top of my notebook.
def load_json_scorefile(file_path: Path, sort_by: str="dG_separated") -> pandas.DataFrame:
        """
        Read scorefile lines as a dataframe, sorted by `sort_by` (ascending) with NaNs correctly replaced.
        """

        decoys = []
        # Each non-empty line of the scorefile is one JSON object (one decoy).
        with open(file_path, 'r') as scorefile:
                for line in scorefile:
                        if not line.strip():
                                continue
                        # A bare `nan` is rejected by the json module; `NaN` is accepted.
                        decoys.append(json.loads(line.replace("nan", "NaN")))
        local_df = pandas.DataFrame.from_dict(decoys)
        local_df = local_df.infer_objects()

        # Lowest value first, so the best-scoring decoy is at the top.
        local_df = local_df.sort_values(sort_by, ascending=True)

        return local_df

def drop_cluster_columns(local_df: pandas.DataFrame, keep_cdrs: List[str]=None) -> pandas.DataFrame:
        """
        Drop cluster columns that RAbD outputs to make it easier to work with the dataframe.
        """
        to_drop = []
        for column in local_df.columns:
            if re.search("cdr_cluster", column):
                skip=False
                if (keep_cdrs):
                    for cdr in keep_cdrs:
                        if re.search(cdr, column):
                            skip=True
                            break
                if not skip:
                    to_drop.append(column)
        return local_df.drop(columns=to_drop)
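The `nan` to `NaN` replacement in `load_json_scorefile` is needed because Python's `json` module accepts the token `NaN` but not a bare `nan`. The sketch below shows that behaviour with the standard library only, and also one caveat: a plain string replace touches every occurrence of the letters "nan", including inside a decoy name. The regular expression is one possible, more conservative alternative; `parse_score_line` and the decoy name `nanobody_0001` are hypothetical and used only for illustration.

```python
import json
import math
import re

# Replace `nan` only where it appears as a value directly after a colon,
# not where the letters occur inside a quoted string.
_BARE_NAN = re.compile(r'(:\s*)nan\b')

def parse_score_line(line: str) -> dict:
    """Parse one JSON scorefile line, converting bare nan values to NaN."""
    return json.loads(_BARE_NAN.sub(r'\1NaN', line))

line = '{"decoy": "nanobody_0001", "dG_separated": nan, "total_score": -12.5}'

# 1. The raw line is rejected by the json module.
try:
    json.loads(line)
    raw_ok = True
except json.JSONDecodeError:
    raw_ok = False

# 2. A plain replace parses, but also rewrites the decoy name.
naive = json.loads(line.replace("nan", "NaN"))

# 3. The targeted replace leaves the name alone.
record = parse_score_line(line)

print(raw_ok, naive["decoy"], record["decoy"], math.isnan(record["dG_separated"]))
```

For the scorefiles in this tutorial the decoy names do not contain "nan", so the simple replace used above is sufficient. The targeted version only handles values that follow a colon, so NaN values inside JSON arrays would need a different pattern.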

## Imports

In [6]:
#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *

#Core Includes
from rosetta.protocols.rosetta_scripts import *
from rosetta.protocols.antibody import *
from rosetta.protocols.antibody.design import *
from rosetta.utility import *

## Initialization
Since we are sharing the working directory with all other notebooks, instead of using the common-configuration we spoke about in the introduction, we will be using the flags file located in the inputs directory.

In [4]:
init('-no_fconfig @inputs/rabd/common')

PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.33+release.1e60c63beb532fd475f0f704d68d462b8af2a977 2019-08-09T15:19:57] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Rosetta version: PyRosetta4.Release.python36.mac r230 2019.33+release.1e60c63beb5 1e60c63beb532fd475f0f704d68d462b8af2a977 http://www.pyrosetta.org 2019-08-09T15:19:57
core.init: command: PyRosetta -no_fconfig @inputs/rabd/common -database /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=742267458 seed_offset=0 real_seed=742267458
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=742267458 RG_type=mt19937


In [5]:
#Import a pose
pose = pose_from_pdb("inputs/rabd/my_ab.pdb")
original_pose = pose.clone()

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 980 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.947685 seconds.
core.import_pose.import_pose: File 'inputs/rabd/my_ab.pdb' automatically determined to be of type PDB
core.io.pdb.pdb_reader: Parsing 993 .pdb records with unknown format to search for Rosetta-specific comments.
core.conformation.Conformation: Found disulfide between residues 771 845
core.conformation.Conformation: current variant for 771 CYS
core.conformation.Conformation: current variant for 845 CYS
core.conformation.Conformation: current variant for 771 CYD
core.conformation.Conformation: current variant for 845 CYD
core.conformation.Conformation: Found disulfide between residues 891 956
core.conformation.Conformation: current variant for 891 CYS
core.conformation.Conformation: current variant for 

# Tutorial


## Tutorial A: General Design
	
In many of these examples, we will use the XML interface to PyRosetta for simplicity, with the AntibodyDesignMover, which is the actual C++ application wrapped as a mover. <https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/antibodies/AntibodyDesignMover>

Let's copy the files we need first:

		cp ../inputs/rabd/color_cdrs.pml .
		cp ../inputs/rabd/rabd.xml .

You are starting design on a new antibody that is not bound to the antigen in the crystal. This is difficult and risky, but we review how one could go about this anyway.  We start by selecting a framework.  Here, we use the trastuzumab framework, as it expresses well, is thermodynamically stable with a Tm of 69.5 degrees, and has repeatedly been shown to tolerate CDRs of different sequence and structure.  Note that the energy of the complex is high, as we are starting from a manual placement of the antibody relative to the antigen.  If we relax the structure too much, we will fall into an energy well that is hard to escape without significant sampling. 

We are using an arbitrary protein at an arbitrary site for design.  The PDB of our target is 1qaw.  1qaw is an oligomer of the TRP RNA-Binding Attenuation Protein from Bacillus stearothermophilus.  It is usually a monomer/dimer, but at its multimeric interface is a tryptophan residue by itself.  

It's a beautiful protein, with a cool mechanism.  We will attempt to build an antibody that binds two subunits to stabilize the dimeric state of the complex in the absence of TRP. Note that de novo design currently takes a large amount of processing power. Each tutorial below is more complex than the one before it.  The examples in this tutorial are short runs that show HOW it can be done; more `outer_cycle_rounds` and a larger `nstruct` would produce far better models than the ones you will see here, because we need to sample the relative orientation of the antibody-antigen complex through docking, the CDR clusters and lengths, the internal backbone degrees of freedom of the CDRs, and the sequence of the CDRs and possibly the framework.  As you can tell, the sampling problem alone is difficult. However, this will give you a basis for using RAbD on your own. 


### A1. Sequence Design
	
Using the application is as simple as setting the `-seq_design_cdrs` option.
This simply designs the selected CDRs (L1 and L3 of the light chain in the example below) during flexible-backbone design, using CDR profiles if they exist for those clusters.  If no profile exists for a cluster (as is the case for H3 at the moment), we use conservative design by default.  Note that InterfaceAnalyzer is run on each output decoy in the RAbD mover. You can also set `light_chain` on the command line if you are only working on a single PDB through the Rosetta run. 


	<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" light_chain="kappa"/>
            
This will take about a minute (50 seconds on my laptop).  Output structures and scores are in `expected_outputs/rabd` if you wish to copy them over; these include 4 more structures.

In [5]:
rabd = XmlObjects.static_get_mover('<AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" light_chain="kappa"/>')
rabd.apply(pose)

protocols.rosetta_scripts.RosettaScriptsParser: Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Parsed script:
<ROSETTASCRIPTS>
	<MOVERS>
		<AntibodyDesignMover light_chain="kappa" name="RAbD" seq_design_cdrs="L1,L3"/>
	</MOVERS>
	<PROTOCOLS/>
</ROSETTASCRIPTS>
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6

Now, for the sake of learning: how would we do this in code instead of XML? We just need to use setters. 

In [8]:
pose = original_pose.clone()
ab_info = AntibodyInfo(pose) #We don't need to supply scheme and definition since we do so in the flags file.
rabd2 = AntibodyDesignMover(ab_info)

cdrs = vector1_protocols_antibody_CDRNameEnum()
cdrs.append(l1)
cdrs.append(l3)

rabd2.set_seq_design_cdrs(cdrs)
rabd2.set_light_chain("kappa")
rabd2.apply(pose)

basic.io.database: Database file opened: sampling/antibodies/cluster_center_dihedrals.txt
protocols.antibody.AntibodyNumberingParser: Antibody numbering scheme definitions read successfully
protocols.antibody.AntibodyNumberingParser: Antibody CDR definition read successfully
antibody.AntibodyInfo: Successfully finished the CDR definition
antibody.AntibodyInfo: AC Detecting Regular CDR H3 Stem Type
antibody.AntibodyInfo: SRWGGDGFYAMDYW
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: KINKED
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: Kink: 1 Extended: 0
antibody.AntibodyInfo: Setting up CDR Cluster for H1
protocols.antibody.cluster.CDRClusterMatcher: Length: 13 Omega: TTTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H2
protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT
antibody.AntibodyInfo: Settin

Score the designed pose and the input pose using the InterfaceAnalyzerMover

In [12]:
from rosetta.protocols.analysis import InterfaceAnalyzerMover

iam = InterfaceAnalyzerMover("LH_ABCDEFGIJKZ")
iam.set_pack_separated(True)
iam.apply(pose)
iam.apply(original_pose)

dg_term = "dG_separated"
print("dG Diff:", pose.scores[dg_term] - original_pose.scores[dg_term])


protocols.analysis.InterfaceAnalyzerMover: Using explicit constructor
protocols.analysis.InterfaceAnalyzerMover: Using interface constructor
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
protocols.analysis.InterfaceAnalyzerMover: Interface set residues total: 100
protocols.analysis.InterfaceAnalyzerMover: NULL scorefunction. Initialize from cmd line.
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPol

Has the energy gone down after our sequence design?  The `dG_separated` is calculated by scoring the complex, separating the antigen from the antibody, repacking side-chains at the interface, and then taking the difference in score - i.e. the dG. 
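The arithmetic behind this comparison is small enough to write out. The sketch below assumes the sign convention "complex score minus separated score", so that a more negative value means more favorable binding. Treat that convention as an assumption and confirm it against the InterfaceAnalyzer documentation. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dg_separated(complex_score: float, separated_score: float) -> float:
    """Interface energy: score of the bound complex minus the score of the
    separated, repacked partners (assumed sign convention)."""
    return complex_score - separated_score

def dg_change(designed_dg: float, original_dg: float) -> float:
    """Negative result: the design improved (lowered) the interface energy."""
    return designed_dg - original_dg

# Hypothetical scores in Rosetta energy units.
original_dg = dg_separated(complex_score=2432.0, separated_score=650.0)
designed_dg = dg_separated(complex_score=2350.0, separated_score=650.0)

print(original_dg, designed_dg, dg_change(designed_dg, original_dg))
```

With this convention, the `dG Diff` printed by the cell above is negative when sequence design has improved the interface.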
    
Let's take a look at scores from a previous run of 5 antibodies.  The scorefiles are in JSON format, so it is easy to turn them into pandas DataFrames and do some cool stuff.  We'll do this often as the runtimes increase for our protocol, but all the scores in them can also be accessed using the `pose.scores` attribute (which is PyRosetta-specific functionality).
	
Are any of these better than our input pose?

In [30]:
df = load_json_scorefile("expected_outputs/rabd/tutA1_score.sc")
df = drop_cluster_columns(df, keep_cdrs=["L1", "L3"])
df

Unnamed: 0,decoy,atom_pair_constraint,cdr_cluster_DIS_L1,cdr_cluster_DIS_L3,cdr_cluster_LEN_L1,cdr_cluster_LEN_L3,complex_normalized,dG_cross,dG_cross/dSASAx100,dG_separated,...,ref,sc_value,side1_normalized,side1_score,side2_normalized,side2_score,total_score,yhh_planarity,cdr_cluster_ID_L1,cdr_cluster_ID_L3
0,tutA1_my_ab_0001,0.0,10.244154,14.169669,11.0,9.0,2.494409,0.0,0.0,1784.54248,...,374.81054,0.491761,14.690318,837.348145,20.825792,874.683289,2432.049006,0.134353,L1-11-1,L3-9-cis7-1
3,tutA1_my_ab_0004,0.0,9.604385,10.078285,11.0,9.0,2.459319,0.0,0.0,1820.213013,...,374.09441,0.596517,14.805477,858.717651,20.48403,860.329285,2397.836108,0.420196,L1-11-1,L3-9-cis7-1
1,tutA1_my_ab_0002,0.0,9.939879,14.834209,11.0,9.0,2.494796,0.0,0.0,1849.303833,...,372.58994,0.468819,17.130129,907.89679,20.831068,874.904846,2432.425954,0.061565,L1-11-1,L3-9-cis7-1
4,tutA1_my_ab_0005,0.0,13.124744,14.362237,11.0,9.0,2.667474,0.0,0.0,1934.344604,...,374.83686,0.415013,17.054668,920.952087,22.980696,965.189209,2600.787151,0.044588,L1-11-1,L3-9-cis7-1
2,tutA1_my_ab_0003,0.0,9.693962,12.150679,11.0,9.0,3.530113,0.0,0.0,2854.750488,...,376.96998,0.464472,24.664927,1381.235962,33.255428,1396.727905,3441.859648,0.075642,L1-11-1,L3-9-cis7-1


### A2. Graft Design

Now we will enable graft design AND sequence design on the L1 and L3 loops.  With an nstruct (n decoys) of 5 and `-outer_cycle_rounds` set to 5, we are doing 25 design trials in total, i.e. 25 actual grafts. 


    <AntibodyDesignMover name="RAbD" seq_design_cdrs="L1,L3" graft_design_cdrs="L1,L3"/>


This will take about 2-3 times as long as sequence design, as grafting a non-breaking loop takes time.  It took 738 seconds on my laptop to generate 5 decoys. Here, you will generate 1 in about 150 seconds.  Output structures and scores are in `../expected_outputs/rabd`.
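Since the mover is configured entirely through XML attributes, it can be convenient to generate the tag from Python instead of editing strings by hand. The sketch below uses only the standard library and only the attributes that appear in this tutorial (`name`, `seq_design_cdrs`, `graft_design_cdrs`, `light_chain`, `random_start`). The helper `rabd_mover_xml` is hypothetical and is not part of PyRosetta.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence

def rabd_mover_xml(seq_design_cdrs: Sequence[str] = (),
                   graft_design_cdrs: Sequence[str] = (),
                   light_chain: Optional[str] = None,
                   random_start: bool = False,
                   name: str = "RAbD") -> str:
    """Return an <AntibodyDesignMover .../> tag as a string."""
    attrs = {"name": name}
    if seq_design_cdrs:
        attrs["seq_design_cdrs"] = ",".join(seq_design_cdrs)
    if graft_design_cdrs:
        attrs["graft_design_cdrs"] = ",".join(graft_design_cdrs)
    if light_chain is not None:
        attrs["light_chain"] = light_chain
    if random_start:
        attrs["random_start"] = "1"
    element = ET.Element("AntibodyDesignMover", attrs)
    # ElementTree escapes attribute values and closes the tag for us.
    return ET.tostring(element, encoding="unicode")

graft_xml = rabd_mover_xml(seq_design_cdrs=["L1", "L3"],
                           graft_design_cdrs=["L1", "L3"],
                           light_chain="kappa")
print(graft_xml)

# In a PyRosetta session the string would be used as in A1:
#   rabd = XmlObjects.static_get_mover(graft_xml)
#   rabd.apply(pose)
```

Generating the tag this way guarantees that it is well-formed XML, which avoids a common source of parser errors when the tag is typed by hand.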

	



Typically, we require a much higher `-outer_cycle_rounds` and number of decoys to see anything significant.  Did this improve energies in your single antibody?  How about our pre-generated ones? Load and take a look at the scorefile as a pandas DataFrame as we did above (`expected_outputs/rabd/tutA2_score.sc`). 


In [32]:
### BEGIN SOLUTION

df_a2 = load_json_scorefile("expected_outputs/rabd/tutA2_score.sc")
df_a2 = drop_cluster_columns(df_a2, keep_cdrs=["L1", "L3"])
df_a2

### END SOLUTION

Unnamed: 0,decoy,atom_pair_constraint,cdr_cluster_DIS_L1,cdr_cluster_DIS_L3,cdr_cluster_LEN_L1,cdr_cluster_LEN_L3,complex_normalized,dG_cross,dG_cross/dSASAx100,dG_separated,...,ref,sc_value,side1_normalized,side1_score,side2_normalized,side2_score,total_score,yhh_planarity,cdr_cluster_ID_L1,cdr_cluster_ID_L3
4,tutA2_my_ab_0005,0.0,11.192533,31.677544,11.0,7.0,2.450019,0.0,0.0,1751.548462,...,364.65868,0.56496,15.000569,825.031311,21.47385,837.480164,2383.868655,0.107882,L1-11-1,L3-7-1
2,tutA2_my_ab_0003,0.0,20.217281,24.178181,15.0,11.0,2.760204,0.0,0.0,1843.231445,...,375.17336,0.537087,13.518681,851.67688,21.286449,979.176697,2707.75979,0.221997,L1-15-1,L3-11-1
0,tutA2_my_ab_0001,0.0,8.599037,27.562218,11.0,10.0,3.348572,0.0,0.0,2484.706299,...,374.05882,0.486206,22.62126,1198.926758,29.544832,1270.427734,3268.206329,0.361406,L1-11-1,"L3-10-cis7,8-1"
1,tutA2_my_ab_0002,0.0,29.971132,45.571491,17.0,9.0,3.352178,0.0,0.0,2587.653564,...,373.00912,0.620726,18.605259,1265.157593,28.719591,1292.381592,3288.486156,0.126786,L1-17-1,L3-9-1
3,tutA2_my_ab_0004,0.0,12.153269,17.002886,10.0,8.0,5.869139,0.0,0.0,5082.230469,...,376.47676,0.571551,42.399551,2501.573486,62.364841,2494.59375,5710.672501,0.100791,L1-10-2,L3-8-1


Let's merge these dataframes, sort by dG_separated, and see if any of our graft-design models did better.

In [33]:
df_tut_a12 = pandas.concat([df, df_a2], ignore_index=True).sort_values("dG_separated", ascending=True)
df_tut_a12

Unnamed: 0,decoy,atom_pair_constraint,cdr_cluster_DIS_L1,cdr_cluster_DIS_L3,cdr_cluster_LEN_L1,cdr_cluster_LEN_L3,complex_normalized,dG_cross,dG_cross/dSASAx100,dG_separated,...,ref,sc_value,side1_normalized,side1_score,side2_normalized,side2_score,total_score,yhh_planarity,cdr_cluster_ID_L1,cdr_cluster_ID_L3
5,tutA2_my_ab_0005,0.0,11.192533,31.677544,11.0,7.0,2.450019,0.0,0.0,1751.548462,...,364.65868,0.56496,15.000569,825.031311,21.47385,837.480164,2383.868655,0.107882,L1-11-1,L3-7-1
0,tutA1_my_ab_0001,0.0,10.244154,14.169669,11.0,9.0,2.494409,0.0,0.0,1784.54248,...,374.81054,0.491761,14.690318,837.348145,20.825792,874.683289,2432.049006,0.134353,L1-11-1,L3-9-cis7-1
1,tutA1_my_ab_0004,0.0,9.604385,10.078285,11.0,9.0,2.459319,0.0,0.0,1820.213013,...,374.09441,0.596517,14.805477,858.717651,20.48403,860.329285,2397.836108,0.420196,L1-11-1,L3-9-cis7-1
6,tutA2_my_ab_0003,0.0,20.217281,24.178181,15.0,11.0,2.760204,0.0,0.0,1843.231445,...,375.17336,0.537087,13.518681,851.67688,21.286449,979.176697,2707.75979,0.221997,L1-15-1,L3-11-1
2,tutA1_my_ab_0002,0.0,9.939879,14.834209,11.0,9.0,2.494796,0.0,0.0,1849.303833,...,372.58994,0.468819,17.130129,907.89679,20.831068,874.904846,2432.425954,0.061565,L1-11-1,L3-9-cis7-1
3,tutA1_my_ab_0005,0.0,13.124744,14.362237,11.0,9.0,2.667474,0.0,0.0,1934.344604,...,374.83686,0.415013,17.054668,920.952087,22.980696,965.189209,2600.787151,0.044588,L1-11-1,L3-9-cis7-1
7,tutA2_my_ab_0001,0.0,8.599037,27.562218,11.0,10.0,3.348572,0.0,0.0,2484.706299,...,374.05882,0.486206,22.62126,1198.926758,29.544832,1270.427734,3268.206329,0.361406,L1-11-1,"L3-10-cis7,8-1"
8,tutA2_my_ab_0002,0.0,29.971132,45.571491,17.0,9.0,3.352178,0.0,0.0,2587.653564,...,373.00912,0.620726,18.605259,1265.157593,28.719591,1292.381592,3288.486156,0.126786,L1-17-1,L3-9-1
4,tutA1_my_ab_0003,0.0,9.693962,12.150679,11.0,9.0,3.530113,0.0,0.0,2854.750488,...,376.96998,0.464472,24.664927,1381.235962,33.255428,1396.727905,3441.859648,0.075642,L1-11-1,L3-9-cis7-1
9,tutA2_my_ab_0004,0.0,12.153269,17.002886,10.0,8.0,5.869139,0.0,0.0,5082.230469,...,376.47676,0.571551,42.399551,2501.573486,62.364841,2494.59375,5710.672501,0.100791,L1-10-2,L3-8-1


Take a look at the lowest (dG) scoring pose in pymol - do you see any difference in L1 and L3 loops there?  Do they make better contact than what we had before?

	Let's take a look in PyMOL.   

			pymol inputs/rabd/my_ab.pdb inputs/rabd/tutA2_* 
			@color_cdrs.pml
			center full_epitope

 How different are the L1 and L3 loops? Have any changed length?

 Let's take a look at the clusters in our dataframe.  Have they changed from the native?

In [None]:
print("L1", original_pose.scores["cdr_cluster_ID_L1"])
print("L3", original_pose.scores["cdr_cluster_ID_L3"])
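To answer the cluster question for the pre-generated decoys without scanning the table by eye, a small comparison like the one below can help. The decoy cluster IDs are copied from the A2 score table above. The reference clusters are those reported for every A1 sequence-design decoy, used here as a stand-in for the native, which is an assumption: check them against the values printed by the cell above. The helper `changed_cdrs` is hypothetical.

```python
# Cluster IDs shared by all A1 decoys, assumed to match the starting antibody.
reference = {"L1": "L1-11-1", "L3": "L3-9-cis7-1"}

# Cluster IDs of the A2 graft-design decoys, from the table above.
a2_decoys = {
    "tutA2_my_ab_0001": {"L1": "L1-11-1", "L3": "L3-10-cis7,8-1"},
    "tutA2_my_ab_0002": {"L1": "L1-17-1", "L3": "L3-9-1"},
    "tutA2_my_ab_0003": {"L1": "L1-15-1", "L3": "L3-11-1"},
    "tutA2_my_ab_0004": {"L1": "L1-10-2", "L3": "L3-8-1"},
    "tutA2_my_ab_0005": {"L1": "L1-11-1", "L3": "L3-7-1"},
}

def changed_cdrs(reference_clusters: dict, decoy_clusters: dict) -> list:
    """Return the CDRs whose cluster ID differs from the reference."""
    return [cdr for cdr in reference_clusters
            if decoy_clusters.get(cdr) != reference_clusters[cdr]]

changes = {name: changed_cdrs(reference, clusters)
           for name, clusters in a2_decoys.items()}
for name, cdrs in sorted(changes.items()):
    print(name, cdrs)
```

Under that assumption, every A2 decoy ends up with a different L3 cluster, while two of the five keep the original L1 cluster.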

### A3.  Basic De-novo run

Here, we want to do a de novo run (without docking), starting with random CDRs grafted in instead of whatever the antibody has to start with (this applies only to the CDRs that are actually undergoing graft design).  This is useful, as we start the design with very high energy and work our way down.  Note that since this is an entirely new interface for our model protein, the interface is already at a very high energy, so a random start is less necessary here, but it is worth knowing how to do it (139 seconds on my laptop). Do this below as you have done in other tutorials, either through code or XML.


	<AntibodyDesignMover name="RAbD" graft_design_cdrs="L1,L3" seq_design_cdrs="L1,L3" 
			                                                                  random_start="1"/>  

In [None]:
### BEGIN SOLUTION
pose = original_pose.clone()
rabd = XmlObjects.static_get_mover('<AntibodyDesignMover name="RAbD" graft_design_cdrs="L1,L3" seq_design_cdrs="L1,L3" random_start="1"/>')
rabd.apply(pose)

## OR (REUSE code from above)
pose = original_pose.clone()
rabd2.set_seq_design_cdrs(cdrs)
rabd2.set_graft_design_cdrs(cdrs)
rabd2.set_random_start(True)
rabd2.set_light_chain("kappa")
rabd2.apply(pose)

### END SOLUTION

Would starting from a random CDR help anywhere?  Perhaps if you want an entirely new cluster or length to break a patent or remove some off target effects?  We will use it below to start de novo design with docking.

### A4. RosettaScripts RAbD Framework Components

This tutorial will give you some experience with your own XML antibody design protocol using the RosettaAntibodyDesign components.  We will take the light chain CDRs from a malaria antibody and graft them into our antibody.  In this tutorial we are interested in stabilizing the grafted CDRs in relation to the whole antibody, instead of interface design to an antigen.  

We will graft the CDRs in, minimize the structure with CDR dihedral constraints so as not to perturb the CDRs too much, and then design the framework around the CDRs while designing the CDRs and neighbors.  The result should be mutations that better accommodate our new CDRs.  This can be useful for humanizing potential antibodies or for framework switching, where we want the binding properties of certain CDRs, but the stability or immunological profile of a different framework.  
	

#### 1. Copy the Files

        cp ../input_files/ab_design_components.xml .
		cp ../input_files/malaria_cdrs.pdb .
	
		
Take a look at the XML.  We are using the _AntibodyCDRGrafter_ to do the grafting of our CDRs.  We then add constraints using a _CDRDihedralConstraintMover_ for each CDR.  Next, we do a round of pack/min/pack using the _RestrictToCDRsAndNeighborsOperation_ and the CDRResidueSelector.  This task operation controls what we pack and design.  It first limits packing and design to only the CDRs and their neighbors.  By specifying the `design_framework=1` option, we allow the neighboring framework residues to design, while the CDRs and antigen neighbors will only repack.  If we wanted to disable antigen repacking, we would pass the _DisableAntibodyRegionOperation_ task operation.  Using this, we can specify any antibody region as `antibody_region`, `cdr_region`, or `antigen_region`, and we can disable just design or both packing and design.  
		

These task operations allow us to chisel exactly what we want to design in an antibody, sans a residue-specific resfile (though we could combine these with one of them!).  All of these tools are available in-code.  If you've done the design workshop, you will know how to use them here. Check out `rosetta.protocols.antibody.task_operations` for a list of them.
		
Finally, we use the new SimpleMetric system to obtain our final sequence of the CDRs to compare to our native antibody as well as pymol selections of our CDRs - which you have been introduced to in the previous tutorial. 

If you want a challenge, try to set these up in-code without RosettaScripts.  It can be tricky, which is why I made PyRosetta finally work optionally with RosettaScripts.  It's good to know how to use both.  

- <https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/SimpleMetrics/SimpleMetrics>

- More Documentation is available here:

 - <https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts>

 - <https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/TaskOperations/TaskOperations-RosettaScripts#antibody-and-cdr-specific-operations>

 - <https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/Movers-RosettaScripts#antibody-modeling-and-design-movers>
	
#### 2. Run the protocol or copy the output (357 seconds).  

	rosetta_scripts.linuxgccrelease -parser:protocol ab_design_components.xml \
			    -input_ab_scheme AHO_Scheme -cdr_definition North \
			    -s my_ab.pdb -out:prefix tutA4_ \
			    -nstruct 5 -score:weights ref2015_cart -check_cdr_chainbreaks false


#### 3. Look at the score file as you have before.  Are the sequences different between what we started with?  How about the interaction energies?
