<!--NOTEBOOK_HEADER-->
*This notebook contains material from [PyRosetta](https://RosettaCommons.github.io/PyRosetta);
content is available [on Github](https://github.com/RosettaCommons/PyRosetta.notebooks.git).*

<!--NAVIGATION-->
< [Side Chain Conformations and Dunbrack Energies](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.01-Side-Chain-Conformations-and-Dunbrack-Energies.ipynb) | [Contents](toc.ipynb) | [Index](index.ipynb) | [Protein Design with a Resfile and FastRelax](http://nbviewer.jupyter.org/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.03-Design-with-a-resfile-and-relax.ipynb) ><p><a href="https://colab.research.google.com/github/RosettaCommons/PyRosetta.notebooks/blob/master/notebooks/06.02-Packing-design-and-regional-relax.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open in Google Colaboratory"></a>

# RosettaCarbohydrates: Modeling and Design
Keywords: carbohydrate, glycan, glucose, mannose, sugar, design, prediction

## Overview
Here, you will learn how to model glycans and design optimal glycosylation positions in a protein.

We will be using the RosettaCarbohydrate framework to build and model glycans.  The `GlycanModeler`, which is our main method for modeling glycans, will be published in 2020.  We will be using some custom glycan options to load PDBs.
First, one needs the `-include_sugars` option, which tells Rosetta to load sugars and adds the `sugar_bb` energy term to the default scorefunction.  This score term acts like `rama` for the sugar dihedrals that connect each sugar residue.

		-include_sugars


When loading structures from the PDB that include glycans, we use the options below. They include an option to write out structures in PDB format instead of the Rosetta format (which is actually better).  Again, these are included in the config/flags files you will be using.

		-maintain_links
		-auto_detect_glycan_connections
		-alternate_3_letter_codes pdb_sugar
		-write_glycan_pdb_codes


More information on working with glycans can be found at this page: [Working With Glycans](https://www.rosettacommons.org/docs/wiki/application_documentation/carbohydrates/WorkingWithGlycans)
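
As a small sketch, the options listed above can also be collected into a single string and passed to `pyrosetta.init()` directly rather than through a flags file. The flag names are the ones given above; the helper name `build_init_string` is hypothetical, and the call to `init` only runs if PyRosetta is installed.

```python
# Flags copied from the option lists above.
GLYCAN_FLAGS = [
    "-include_sugars",
    "-maintain_links",
    "-auto_detect_glycan_connections",
    "-alternate_3_letter_codes pdb_sugar",
    "-write_glycan_pdb_codes",
]


def build_init_string(flags):
    """Join individual command-line flags into one init string."""
    return " ".join(flags)


init_string = build_init_string(GLYCAN_FLAGS)
print(init_string)

try:
    import pyrosetta
except ImportError:
    pyrosetta = None  # PyRosetta is not installed; only the string is built.

if pyrosetta is not None:
    pyrosetta.init(init_string)
```

Passing a flags file with `@path/to/file`, as done later in this notebook, is equivalent and keeps the options in one reusable place.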

## Algorithm
  
The `GlycanModeler` essentially builds glycans from the root (the first residue of the tree) out to the tips of the branches, in a way that simulates a tree growing.  It uses the notion of a 'layer', defined as the number of residues between a given residue and the glycan root (with the glycan root being layer 0).  During modeling, all glycan residues other than the ones being optimized are 'virtualized'.  In Rosetta, the term 'virtual' means that these residues are present but not scored.  (Note that it is now possible to turn any residue virtual and back to real using two movers: `ConvertRealToVirtualMover` and `ConvertVirtualToRealMover`.)
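
The layer idea can be illustrated with a toy tree in plain Python. This is only a sketch of the definition given above (layer = number of residues between a residue and the root), not how Rosetta stores glycan trees; the dictionary layout and the function name `glycan_layers` are assumptions made for the example.

```python
from collections import deque


def glycan_layers(children, root):
    """Return {residue: layer}, where the root is layer 0."""
    layers = {root: 0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            # A child is always one layer further out than its parent.
            layers[child] = layers[parent] + 1
            queue.append(child)
    return layers


# Hypothetical branched glycan: residue 1 is the root, residue 3 branches.
toy_tree = {1: [2], 2: [3], 3: [4, 5], 4: [6]}
layers = glycan_layers(toy_tree, root=1)
print(layers)
```

Building outward layer by layer then amounts to optimizing residues in order of increasing layer number while the residues further out stay virtual.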

Within the modeling application, sampling of glycan DOFs is done through the `GlycanSampler`.  The sampler attempts to cover the large number of DOFs available to a glycan tree.  The `GlycanSampler` is a `WeightedRandomSampler`: a container of highly specific sampling strategies, each weighted by a particular probability.  At each apply, the mover selects one of these strategies according to its assigned probability. This is the same way the SnugDock algorithm for antibody modeling works.
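
The weighted selection described above can be sketched in a few lines of plain Python. This is a conceptual illustration only: the class name `WeightedStrategyPicker`, the strategy names, and the weights are all hypothetical and are not taken from the Rosetta source.

```python
import random
from collections import Counter


class WeightedStrategyPicker:
    """Pick one named sampling strategy per call, in proportion to its weight."""

    def __init__(self, seed=None):
        self._names = []
        self._weights = []
        self._rng = random.Random(seed)

    def add(self, name, weight):
        if weight <= 0:
            raise ValueError("weight must be positive")
        self._names.append(name)
        self._weights.append(weight)

    def pick(self):
        # random.choices normalizes the weights internally.
        return self._rng.choices(self._names, weights=self._weights, k=1)[0]


# Hypothetical weights, chosen only for illustration.
picker = WeightedStrategyPicker(seed=0)
picker.add("conformer", 0.4)
picker.add("sugar_bb", 0.3)
picker.add("random_torsion", 0.2)
picker.add("minimize", 0.05)
picker.add("pack", 0.05)

counts = Counter(picker.pick() for _ in range(10000))
print(counts.most_common())
```

Over many calls, each strategy is chosen roughly in proportion to its weight, so cheap, broadly useful moves can dominate while expensive ones run only occasionally.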

Sampling is always scaled with the number of glycan residues that you are modeling, so run time increases proportionally as well.
If you are modeling a huge viral particle with lots of glycans, you can use quench mode, which optimizes each glycan individually.
Typically for these cases, multiple rounds of glycan modeling are desired.


### GlycanSampler Major components

Some of these components were covered in the previous tutorial.

1. __Glycan Conformers__

	These conformers have been generated through an in-depth bioinformatic analysis of the PDB using adaptive kernel density estimates and are unique for each linkage type, including glycan residues connected to ASN residues.  A conformer is a specific conformation of all of the backbone dihedrals of a particular glycan linkage. They are essentially glycan 'fragments' for a particular type of linkage.


2. __SugarBB Sampling__ 

	This sampling is done by turning the `sugar_bb` energy term into a set of probabilities using the -log(e) function.  This allows us to sample on the QM-derived torsional potentials during modeling.


3. __Random Sampling and Shear Moves__

	We sample random torsions at +/- 15, +/- 45, and +/- 90 degrees, with decreasing probability at a 4:2:1 ratio of small, medium, and large moves.
	Shear sampling sets torsions for two residues at once in order to reduce downstream effects and allow 'flipping' of the glycan torsions.


4. __Minimization__
	
	We minimize sugar residues by randomly selecting a residue from the set being modeled, together with all residues further out in the tree that are not virtualized. This reduces computational time that would otherwise restrict the total number of glycan residues we could model at once.
    

5. __Packing__

	Of the residues set to optimize, we choose a random residue and pack that residue and all residues further out in the tree that are not virtualized. We pack the sugar residues (OH and constituents) and any neighboring protein sidechains. TaskOperations may be set to allow design of protein residues during this step.  We do packing this way to once again reduce total computational time.
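
Two of the components above, SugarBB sampling and random torsion sampling, can be sketched in plain Python. The move sizes and the 4:2:1 ratio come from the list above; everything else is an assumption made for illustration: perturbations are drawn uniformly within each window, energies are treated as `E = -ln(p)` so that probabilities are proportional to `exp(-E)`, and the function names are hypothetical.

```python
import math
import random

MOVE_SIZES = {"small": 15.0, "medium": 45.0, "large": 90.0}  # degrees
MOVE_WEIGHTS = {"small": 4, "medium": 2, "large": 1}         # 4:2:1 ratio


def wrap_angle(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def perturb_torsion(angle, rng):
    """Choose a move size with 4:2:1 odds, then perturb within +/- that size."""
    names = list(MOVE_SIZES)
    size = rng.choices(names, weights=[MOVE_WEIGHTS[n] for n in names], k=1)[0]
    # Assumption: uniform draw inside the window; the text does not specify.
    delta = rng.uniform(-MOVE_SIZES[size], MOVE_SIZES[size])
    return size, wrap_angle(angle + delta)


def energies_to_probabilities(energies):
    """Invert E = -ln(p): weights are exp(-E), normalized to sum to 1."""
    lowest = min(energies)  # Shift by the minimum for numerical stability.
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(1)
size, new_phi = perturb_torsion(-75.0, rng)
print(size, round(new_phi, 2))
print(energies_to_probabilities([0.0, math.log(2.0), 5.0]))
```

Low-energy torsion bins receive most of the probability mass, so drawing torsions from these probabilities biases sampling toward favorable linkage geometries without forbidding the rest.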



In [None]:
# Notebook setup
import sys  # needed for the sys.modules check below

if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.setup()
    print("Notebook is set for PyRosetta use in Colab.  Have fun!")

**Make sure you are in the directory with the pdb files:**

`cd google_drive/My\ Drive/student-notebooks/`

# General Setup and Inputs

You will be using a few different inputs.  We will be designing in glycosylation spots in order to block antibody binding at a highly curved epitope, and we will be loading a human structure from the PDB that has internal glycans.   


## Notes for Tutorial Shortening


Typically, the value of `-glycan_sampler_rounds` is set to 25 (which is usually enough) and nstruct is about 5-10k per input structure. You may increase `glycan_sampler_rounds` to 100 and then decrease output to 1-2.5k nstruct in order to have the same level of sampling, which will result in very good models as well.  Since this is de novo modeling of glycans, more nstruct is almost always better. For some tutorials, we may decrease this value below our optimal value in order to shorten the length of the tutorial.


## General Notes

We will use a flags file for all common options in this tutorial.  Note that instead of passing this file on init, you can put it into your working directory or a particular place in your home directory and rename it `common`.
    
See this page for more info on using rosetta with custom config files: <https://www.rosettacommons.org/docs/latest/rosetta_basics/running-rosetta-with-options#common-options-and-default-user-configuration>

Each tutorial has pre-generated output in `output_files`, along with its approximate time to finish on a single (Core i7) processor.


In [1]:
#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *
init('@inputs/glycans/common_glycans')

PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.39+release.93456a567a8125cafdf7f8cb44400bc20b570d81 2019-09-26T14:24:44] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Reading fconfig.../Users/jadolfbr/.rosetta/flags/common
core.init: 
core.init: 
core.init: Rosetta version: PyRosetta4.Release.python36.mac r233 2019.39+release.93456a567a8 93456a567a8125cafdf7f8cb44400bc20b570d81 http://www.pyrosetta.org 2019-09-26T14:24:44
core.init: command: PyRosetta @inputs/glycans/common_glycans -database /Users/jadolfbr/Library/Python/3.6/lib/python/site-packages/pyrosetta-2019.39+release.93456a567a8-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=-1174404344 seed_offset=0 

# Tutorial

Glycan modeling is done through the RosettaScripts (RS) interface.  Each tutorial has you copy a base XML and add or modify specific components to achieve a goal.  ALL of these movers are available as components in PyRosetta; however, setup is much more difficult and time-consuming.  So for now, we will rely on the RS interface.
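
For readers who want to stay inside the notebook, one possible route is to parse a RosettaScripts XML string from PyRosetta with `XmlObjects.create_from_string`. The sketch below is not one of the tutorial XML files: the script layout, the `Index` selector, and the helper names `build_sequon_script` and `apply_with_pyrosetta` are assumptions for illustration, and the mover attributes are limited to those shown later in this tutorial. The PyRosetta part assumes `init` has already been called with the glycan flags.

```python
import xml.etree.ElementTree as ET


def build_sequon_script(start, end):
    """Return a minimal, illustrative RosettaScripts XML string."""
    return f"""<ROSETTASCRIPTS>
  <RESIDUE_SELECTORS>
    <Index name="select" resnums="{start}-{end}"/>
  </RESIDUE_SELECTORS>
  <MOVERS>
    <CreateGlycanSequonMover name="motif_creator" residue_selector="select"/>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="motif_creator"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>"""


def apply_with_pyrosetta(script, pdb_path, mover_name="motif_creator"):
    """Parse the script and apply one named mover (requires PyRosetta)."""
    import pyrosetta
    from pyrosetta.rosetta.protocols.rosetta_scripts import XmlObjects

    mover = XmlObjects.create_from_string(script).get_mover(mover_name)
    pose = pyrosetta.pose_from_pdb(pdb_path)
    mover.apply(pose)
    return pose


script = build_sequon_script("143A", "145A")
root = ET.fromstring(script)  # Raises ParseError if the XML is malformed.
print(root.tag)
```

With PyRosetta initialized, `apply_with_pyrosetta(script, "2j88_antigen.pdb")` would return the modified pose; the command-line runs below remain the reference workflow.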

## Tutorial A: Epitope Blocking, De-novo Glycan Modeling

Here, we will start with the antigen known as Bee Hyaluronidase, from PDB ID 2J88.  The PDB file has an antibody bound to it at a highly immunogenic site. We would like to block this site so that we can begin to use this enzyme for therapy, as Hyaluronidase can be effective in breaking down sugars in the extracellular matrix, allowing certain larger drugs to reach regions of interest.  The antibody is renumbered into the AHo numbering scheme that we use in the RAbD tutorial, and it has been relaxed with constraints into the Rosetta energy function.

We will be designing in at least one optimal glycan at the most immunogenic site.
Note that a protocol called SugarCoat is in development that will scan regions of interest for potential ideal glycosylation sites; however, one can certainly do this manually, as we do below.

1. Designing in a Glycosylation Site: 

	`CreateGlycanSequonMover` and `CreateSequenceMotifMover`

	A sugar glycosylation site is known as a `Sequon`.  The glycan sequon is made up of three protein residues which are recognized by the GlycosylTransferase enzyme during translation in the ER.  This enzyme adds the root of the nascent glycan onto a protein.  In this case, we use the sequon for ASN glycosylation.  The sequon is as follows: `N[^P][S/T]`.  The `[^P]` notation means that any residue other than P can be there.  The `[S/T]` notation means that either S or T is recognized.  This notation can be used to directly create motifs in proteins using the `CreateSequenceMotifMover` and the associated `SequenceMotifTaskOperation`. Documentation for these is available here:

	- <https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/xsd/mover_CreateSequenceMotifMover_type>
	- <https://www.rosettacommons.org/docs/wiki/scripting_documentation/RosettaScripts/xsd/to_SequenceMotifTaskOperation_type>

	The `CreateGlycanSequonMover` can also be used for glycosylation of amino acids other than ASN.

	1. Design using a typical sequon


				mkdir work_dir
				cd work_dir
				cp ../input_files/common .
				cp ../input_files/tutA11.xml .
				cp ../input_files/2j88_complex.pdb .
				cp ../input_files/2j88_antigen.pdb .

				<CreateGlycanSequonMover name="motif_creator" residue_selector="select"/>


		Before we begin, take a look at the complex.  Where can we introduce a glycan to block binding?
		Where do you think the optimal glycan position would be for this particular antibody?  Take a look at the xml.  Is this the position we are targeting?  Typically, we may want to allow some backbone movement in our sequon.  The full glycan scanning protocol can be found in an input file, simple_glycan_scanner_manual.xml, where we relax the motif residues with constraints, add the sequon, and then relax again, comparing the energy between them to get the full energetic contributions of the sequon on the structure.  In order to reduce the run time in these tutorials, we will be removing this going forward.


		Go ahead and run the xml (about 15 seconds)

				rosetta_scripts.linuxgccrelease -s 2j88_antigen.pdb -native 2j88_antigen.pdb \
				    -parser:protocol tutA11.xml -parser:script_vars start=143A end=145A \
				    -out:prefix tutA11_


		Take a look at the scorefile.  Why do we have all these extra values here?  These are the SimpleMetrics, and they have replaced filters for calculating useful values in Rosetta.  In the xml, we define a few SimpleMetrics.  We run one set before we actually create the sequon and another set afterwards.  In the XML, you can see that we use a prefix in the `RunSimpleMetrics` mover to denote any metrics run after the sequon creation.  Take a look at the protocol section and then at the `RunSimpleMetrics` movers we have defined.  What is the prefix that is used post-sequon creation?  Ok, now go back to the score file: what values have we output?  Did we successfully design in our motif?


	2. Design using the `N[^P][T]` motif


		This motif has been shown to have higher occupancy of the glycosylation site with glycans in the resulting protein.  Glycosylation is not 100% in some cases at some positions for (currently) unknown reasons, but this paper [] is a bioinformatic analysis that concludes that this motif has a higher occupancy.  If we were creating a drug, we could use chromatography during protein isolation to choose peaks which include our glycan.
		Here, we use the `[-]` notation so that the second position is not actually designed.  We will use what is in the native protein there.

				cp ../input_files/tutA12.xml .

				<CreateSequenceMotifMover name="create_sequon" residue_selector="p1" motif="N[-]T"/>

				rosetta_scripts.linuxgccrelease -s 2j88_antigen.pdb -native 2j88_antigen.pdb \
				    -parser:protocol tutA12.xml -parser:script_vars start=143A end=145A \
				    -out:prefix tutA12_

		Was the sequon successfully designed?  Take a look at the scorefile.  Is the sequence that was designed different from the one in the previous tutorial? (Compare `sequence` to `post-sequon_sequence`.) What is the energy difference from the native protein?  Use the SimpleMetric output and look for the value that has native_delta in the name.  Did we change the SASA?
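
To check a designed sequence by eye, the `N[^P][S/T]` sequon notation used in this tutorial maps directly onto a Python regular expression. The sketch below is independent of Rosetta; the function name `find_sequons` and the example sequence are hypothetical, and positions are reported with 1-based sequence numbering rather than PDB numbering.

```python
import re


def find_sequons(sequence, strict_threonine=False):
    """Return (1-based position, triplet) for each N-glycosylation sequon.

    strict_threonine=True restricts the third position to T, i.e. N[^P]T.
    """
    third = "T" if strict_threonine else "ST"
    # A lookahead is used so that overlapping sequons are all reported.
    pattern = re.compile(rf"(?=(N[^P][{third}]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical sequence: NPS is skipped because proline is not allowed.
example = "ANGTNPSANNTS"
print(find_sequons(example))
print(find_sequons(example, strict_threonine=True))
```

Running the same check on the `sequence` and `post-sequon_sequence` values from the scorefile is a quick way to confirm that the motif was introduced at the intended position.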

