<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [Side Chain Conformations and Dunbrack Energies](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.01-Side-Chain-Conformations-and-Dunbrack-Energies.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Protein Design with a Resfile and FastRelax](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design-with-a-resfile-and-relax.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Packing-design-and-regional-relax.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# RosettaAntibody Framework
Keywords: CDRResidueSelector

## Overview
In this workshop we will learn how to use the RosettaAntibody framework. Unfortunately, the full RosettaAntibody modeling code is not available in PyRosetta because it is built around an application. To use it, you will need either the ROSIE server or the Rosetta application.

For a full overview of the RosettaAntibody modeling application, see this paper: 
https://www.ncbi.nlm.nih.gov/pubmed/28125104

SnugDock and the H3 modeling component of RosettaAntibody are available here as movers.

In [None]:
# Notebook setup
!pip install pyrosettacolabsetup
import pyrosettacolabsetup
pyrosettacolabsetup.setup()
print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

**Make sure you are in the directory with the pdb files:**

`cd google_drive/My\ Drive/student-notebooks/`

## Imports

Let's import the antibody namespace so we can start using it. Take a look at the different modules that are part of the antibody module.

Note that we can also do `from rosetta.protocols.antibody import *` in order to make accessing the enums much easier.  For the purpose of this workshop, we will use `antibody` to traverse the contents.  This makes it easier for you to use tab completion for exploration.

In [5]:
#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *

#Core Includes
from rosetta.core.select import residue_selector as selections

from rosetta.protocols import antibody


## Initialization

Here, we will initialize a typical run of Rosetta. We could use the `-input_ab_scheme` option with `AHo_Scheme`, but we will learn to instead pass this to our main antibody framework code. 

In [3]:
init('-use_input_sc -ignore_unrecognized_res \
     -ignore_zero_occupancy false -load_PDB_components false -no_fconfig')

PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.33+release.1e60c63beb532fd475f0f704d68d462b8af2a977 2019-08-09T15:19:57] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Rosetta version: PyRosetta4.Release.python36.mac r230 2019.33+release.1e60c63beb5 1e60c63beb532fd475f0f704d68d462b8af2a977 http://www.pyrosetta.org 2019-08-09T15:19:57
core.init: command: PyRosetta -use_input_sc -ignore_unrecognized_res -ignore_zero_occupancy false -load_PDB_components false -no_fconfig -database /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=-732183650 seed_offset=0 real_seed=-732183650
basic.random.init_random_generator: RandomGenerator:init: Normal m

## Import and copy pose

Let's load an antibody. This is the same antibody we used to learn packing and design. :)

In [4]:
#Import a pose
pose = pose_from_pdb("inputs/2r0l_1_1.pdb")
original_pose = pose.clone()

core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 980 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 0.952982 seconds.
core.import_pose.import_pose: File 'inputs/2r0l_1_1.pdb' automatically determined to be of type PDB
core.conformation.Conformation: Found disulfide between residues 23 88
core.conformation.Conformation: current variant for 23 CYS
core.conformation.Conformation: current variant for 88 CYS
core.conformation.Conformation: current variant for 23 CYD
core.conformation.Conformation: current variant for 88 CYD
core.conformation.Conformation: Found disulfide between residues 130 204
core.conformation.Conformation: current variant for 130 CYS
core.conformation.Conformation: current variant for 204 CYS
core.conformation.Conformation: current variant for 130 CYD
core.conformation.Conformation: current va

## AntibodyInfo

The main tool that we will use is the `AntibodyInfo` object.  This allows you to get a TON of information about the antibody to use in various custom protocols.  

Note that this antibody has already been renumbered using the PyIgClassify server.

Since we are not defining the numbering scheme and CDR definition during init, we will need to pass an enum for each to the AntibodyInfo object.

In [8]:
ab_info = antibody.AntibodyInfo(pose, antibody.AHO_Scheme, antibody.North)

basic.io.database: Database file opened: sampling/antibodies/cluster_center_dihedrals.txt
protocols.antibody.AntibodyNumberingParser: Antibody numbering scheme definitions read successfully
protocols.antibody.AntibodyNumberingParser: Antibody CDR definition read successfully
antibody.AntibodyInfo: Successfully finished the CDR definition
antibody.AntibodyInfo: AC Detecting Regular CDR H3 Stem Type
antibody.AntibodyInfo: ARFWWRSFDYW
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: KINKED
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: Kink: 1 Extended: 0
antibody.AntibodyInfo: Setting up CDR Cluster for H1
protocols.antibody.cluster.CDRClusterMatcher: Length: 13 Omega: TTTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H2
protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT
antibody.AntibodyInfo: Setting u

Let's take a look at what AntibodyInfo prints.

In [9]:
print(ab_info)

////////////////////////////////////////////////////////////////////////////////
///                          Rosetta Antibody Info                           ///
///                                                                          ///
///             Antibody Type:  Regular Antibody
///             Light Chain Type:  unknown
/// Predict H3 Cterminus Base:  KINKED
///                                                                          
/// H1 info: 
///            length:  13
///          sequence:  AASGFTISNSGIH
///     north_cluster:  H1-13-1
///         loop_info:  LOOP start: 131  stop: 143  cut: 137  size: 13  skip rate: 0  extended?: False

/// H2 info: 
///            length:  10
///          sequence:  WIYPTGGATD
///     north_cluster:  H2-10-1
///         loop_info:  LOOP start: 158  stop: 167  cut: 163  size: 10  skip rate: 0  extended?: False

/// H3 info: 
///            length:  10
///          sequence:  ARFWWRSFDY
///     north_cluster:  H3-10-1
///         l

**Isn't that AWESOME!!**  I think so.  But I wrote a lot of that code!  

Anyway, as you can see, you can get a fair bit of information out of the AntibodyInfo object. In fact, most antibody-related code either takes an AntibodyInfo object or constructs one from the numbering scheme, CDR definition, and pose passed to it. You will see this as we go.

Note the north_cluster here.  This is useful in some modeling tasks, but becomes much more relevant during antibody design.  More information on what we mean by north_cluster can be found in this paper, if you want to read ahead a bit. https://www.ncbi.nlm.nih.gov/pubmed/21035459
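
As a small illustration of how the `north_cluster` names relate to the rest of the summary, here is a sketch in plain Python (no PyRosetta needed). It assumes the names follow the `CDR-length-identifier` pattern visible in the printed output; `parse_cluster_name` is a hypothetical helper written for this example, not part of the antibody framework. The sequences and cluster names are copied from the summary above.

```python
def parse_cluster_name(name):
    """Split a cluster name such as 'H1-13-1' into (cdr, length, identifier).

    Assumes the 'CDR-length-identifier' pattern seen in the printed summary.
    Splitting at most twice keeps any hyphens inside the identifier intact.
    """
    cdr, length, identifier = name.split("-", 2)
    return cdr, int(length), identifier


# Cluster names and CDR sequences copied from the AntibodyInfo summary above.
summary = {
    "H1-13-1": "AASGFTISNSGIH",
    "H2-10-1": "WIYPTGGATD",
    "H3-10-1": "ARFWWRSFDY",
}

for name, sequence in summary.items():
    cdr, length, identifier = parse_cluster_name(name)
    # The length encoded in the name should match the CDR sequence length.
    print(cdr, length, identifier, length == len(sequence))
```

For all three heavy-chain CDRs shown, the length in the cluster name agrees with the length of the printed sequence.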

## Basic AntibodyInfo Access
Now, let's use the AntibodyInfo class to get a bit of useful information out of our antibody.

In [14]:
print("h1", ab_info.get_CDR_start(antibody.h1, pose))
print("h2", ab_info.get_CDR_end(antibody.h2, pose))

h1 131
h2 167


Now let's use these enums a bit more. They go in order from 1 to 8, with 7 and 8 being the CDR4 loops, also known as DE loops. We won't worry about them just yet.

In [19]:
for i in range(1, 7):
    print(i, ab_info.get_CDR_name(antibody.CDRNameEnum(i)))
    
for cdr in ['L1', 'l1', 'L2', 'l2', 'L3', 'H1', 'H2', 'H3']:
    print(cdr, str(ab_info.get_CDR_name_enum(cdr)))
          
print(str(antibody.h3))
print(int(antibody.h3))

1 H1
2 H2
3 H3
4 L1
5 L2
6 L3
L1 CDRNameEnum.l1
l1 CDRNameEnum.l1
L2 CDRNameEnum.l2
l2 CDRNameEnum.l2
L3 CDRNameEnum.l3
H1 CDRNameEnum.h1
H2 CDRNameEnum.h2
H3 CDRNameEnum.h3
CDRNameEnum.h3
3


Does this make enums a bit less confusing? They are named integers. The last two print statements show the CDR name enum itself and then the integer behind it. The cool thing here is that we can loop through all six CDRs with a range from 1 to 6 and Rosetta will understand it.

Note that we convert the integer into a `CDRNameEnum` before passing it to the function. If we are only using the CDR name enums as keys in a dictionary or indexes into a list, we don't need this conversion. It is there simply so the C++ code works properly.
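
If the "named integer" idea is still fuzzy, the following sketch reproduces it with Python's standard `enum.IntEnum`. The `CDRName` class here is a hypothetical stand-in that only mirrors the ordering printed above (H1 to L3 as 1 to 6); it is not the actual `antibody.CDRNameEnum` binding, and the lengths are the ones reported in this notebook's output.

```python
from enum import IntEnum


# Hypothetical stand-in for antibody.CDRNameEnum, for illustration only.
# The values follow the order printed above: H1..H3 are 1..3, L1..L3 are 4..6.
class CDRName(IntEnum):
    h1 = 1
    h2 = 2
    h3 = 3
    l1 = 4
    l2 = 5
    l3 = 6


# An integer converts to the named member, and the member back to its integer.
print(CDRName(3), int(CDRName.h3))

# Looping over a plain integer range reaches every CDR in order.
names = [CDRName(i).name.upper() for i in range(1, 7)]
print(names)

# Enum members work as dictionary keys (lengths taken from the output above).
lengths = {CDRName.h1: 13, CDRName.h2: 10, CDRName.h3: 10, CDRName.l1: 11}
print(lengths[CDRName(4)])
```

The explicit `CDRName(i)` conversion plays the same role as `antibody.CDRNameEnum(i)` in the cell above: it turns a bare integer into the typed value the receiving code expects.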

### AntibodyEnumManager
So we have seen that some of this can be done directly within AntibodyInfo itself. Cool. But what if we need something more advanced? Let's use the class that actually does all this conversion.


In [23]:
enum_manager = antibody.AntibodyEnumManager()
print(enum_manager.numbering_scheme_enum_to_string(antibody.AHO_Scheme))
print(enum_manager.cdr_definition_enum_to_string(antibody.North))
print(enum_manager.cdr_name_string_to_enum("H1"))
print(enum_manager.antibody_region_enum_to_string(antibody.framework_region))

AHO_Scheme
North
CDRNameEnum.h1
framework_region


Use the functions `get_region_of_residue` and `get_CDRNameEnum_of_residue`, together with the enum manager, to traverse the antibody and print the relevant region of every residue in the pose.

In [27]:
### BEGIN SOLUTION

for i in range(1, pose.size()+1):
    region = ab_info.get_region_of_residue(pose, i)
    if (region == antibody.cdr_region):
        print(i, enum_manager.cdr_name_enum_to_string(ab_info.get_CDRNameEnum_of_residue(pose, i)))
    else:
        print(i, enum_manager.antibody_region_enum_to_string(region))
              
### END SOLUTION

1 framework_region
2 framework_region
3 framework_region
4 framework_region
5 framework_region
6 framework_region
7 framework_region
8 framework_region
9 framework_region
10 framework_region
11 framework_region
12 framework_region
13 framework_region
14 framework_region
15 framework_region
16 framework_region
17 framework_region
18 framework_region
19 framework_region
20 framework_region
21 framework_region
22 framework_region
23 framework_region
24 L1
25 L1
26 L1
27 L1
28 L1
29 L1
30 L1
31 L1
32 L1
33 L1
34 L1
35 framework_region
36 framework_region
37 framework_region
38 framework_region
39 framework_region
40 framework_region
41 framework_region
42 framework_region
43 framework_region
44 framework_region
45 framework_region
46 framework_region
47 framework_region
48 framework_region
49 L2
50 L2
51 L2
52 L2
53 L2
54 L2
55 L2
56 L2
57 framework_region
58 framework_region
59 framework_region
60 framework_region
61 framework_region
62 framework_region
63 framework_region
64 framework_re
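
A per-residue listing like this gets long quickly. One way to condense it, sketched here in plain Python, is to collapse consecutive residues with the same label into ranges. `collapse_regions` is a hypothetical helper for this example, and the labels are copied from the first 56 residues of the output above rather than queried from `ab_info`.

```python
from itertools import groupby


def collapse_regions(labels):
    """Collapse per-residue labels into (label, first_residue, last_residue) runs.

    `labels[0]` is taken to be pose residue 1, matching Rosetta's 1-based numbering.
    """
    runs = []
    position = 1
    for label, group in groupby(labels):
        count = len(list(group))
        runs.append((label, position, position + count - 1))
        position += count
    return runs


# Labels copied from residues 1-56 of the output above.
labels = (["framework_region"] * 23 + ["L1"] * 11
          + ["framework_region"] * 14 + ["L2"] * 8)

for label, first, last in collapse_regions(labels):
    print(f"{first}-{last}: {label}")
```

In the notebook itself, the same list could be built inside the loop above by appending each printed label instead of printing it.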

### CDR Clusters

Use either the PyRosetta docs on AntibodyInfo or tab completion in the interactive notebook to get the length and cluster of L1 from AntibodyInfo.

In [33]:
### BEGIN SOLUTION

print(ab_info.get_CDR_length(antibody.l1))
print(ab_info.get_CDR_cluster(antibody.l1).cluster())

### END SOLUTION

11
CDRClusterEnum.L1_11_1


The CDRCluster object has a lot of information about a particular cluster. Let's use it to get the normalized distance in degrees of the L1 cluster.

In [35]:
L1_cluster = ab_info.get_CDR_cluster(antibody.l1)
print(L1_cluster.normalized_distance_in_degrees())

7.137242784087944


Anything below 35 or 40 degrees is very close to the cluster center.  This is a structure with a very well-defined L1-11-1 loop - one of the most common L1 lengths and clusters.
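
The rule of thumb above can be written down as a small check. This is only a sketch: the two cutoffs are the 35 and 40 degree values quoted in the text, while the function name and the "close" / "borderline" / "far" labels are hypothetical choices made for this example.

```python
# Cutoffs quoted in the text above, in degrees.
STRICT_CUTOFF = 35.0
LOOSE_CUTOFF = 40.0


def describe_distance(distance_degrees):
    """Label a normalized cluster distance using the rule of thumb above."""
    if distance_degrees < STRICT_CUTOFF:
        return "close"
    if distance_degrees < LOOSE_CUTOFF:
        return "borderline"
    return "far"


# Value printed for the L1 cluster in the cell above.
print(describe_distance(7.137242784087944))
```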

### Numbering Scheme Translation
It may not seem like much, but numbering scheme translation is very difficult to do without mistakes. Rosetta now provides this ability in a highly tested, easy-to-use implementation, which makes it much easier to follow antibody structural papers. Let's take a look.
