First you need to link your Google Drive to the notebook in order to access the files needed for this module.

Run the cell below and follow the instructions to mount the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Installing Biopython

At the beginning of each module, we will install **Biopython**. Biopython is a large open-source application programming interface (API) used both in bioinformatics software development and in everyday scripts for common bioinformatics tasks. It contains several packages that you will need to import to run the analyses required for this project.

REF:
* Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., & de Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics (Oxford, England), 25(11), 1422–1423. https://doi.org/10.1093/bioinformatics/btp163


In [None]:
!pip install biopython

# Examining the 3D structure of your protein


In this section, you will search the Protein Data Bank (PDB) for the 3D structure of the HPRT protein, using the PDB ID 1BZY as your search query. For this, you will use a package called ProDy.

## Install and import the necessary packages:


**ProDy** is a free and open-source Python package for protein structural dynamics analysis. Running `from prody import *` makes the functions of ProDy's proteins module, such as `fetchPDB` and `parsePDB`, available directly in the notebook.

**Py3Dmol** is a widget used to embed an interactive viewer in a notebook.

REF:
* http://prody.csb.pitt.edu/
* https://pypi.org/project/py3Dmol/
* Rego, N., & Koes, D. (2015). 3Dmol.js: molecular visualization with WebGL. Bioinformatics, 31(8), 1322–1324. https://doi.org/10.1093/bioinformatics/btu829




In [None]:
# Install both packages first, then import them
!pip install prody
!pip install py3Dmol
from prody import *
import py3Dmol

STEP #1: Download the PDB file of your protein. You will use it later on for visualization purposes.

In [None]:
# Download the PDB file by writing the 4-letter PDB code (in quotes) inside the parentheses and running the cell
fetchPDB('####')

STEP #2: Obtain information about the crystal structure of your protein by parsing the data in the PDB file.

The `header` returned by `parsePDB` is a Python dictionary. You can look up any of the keys below in it (for example, `header['title']`) to get more information.

* 'A'
* 'related_entries'
* 'sheet'
* 'classification'
* 'reference'
* 'title'
* 'sheet_range'
* 'polymers'
* 'resolution'
* 'space_group'
* 'helix_range'
* 'chemicals'
* 'experiment'
* 'helix'
* 'version'
* 'authors'
* 'identifier'
* 'deposition_date'
* 'biomoltrans'
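Because `header` is an ordinary Python dictionary, you can also look up several of the keys listed above in one go. The helper below is a minimal sketch; the name `show_header_fields` is our own (it is not part of ProDy), and keys that a particular PDB entry lacks are reported as missing instead of raising a `KeyError`.

```python
def show_header_fields(header, keys):
    """Print the requested header fields and return them as {key: value} (None if absent)."""
    # dict.get avoids a KeyError when an entry has no record for a key (e.g. no 'sheet')
    fields = {key: header.get(key) for key in keys}
    for key, value in fields.items():
        shown = value if value is not None else '(not present in this entry)'
        print(f"{key}: {shown}")
    return fields
```

Once the parsing cell below has created `header`, you could call, for example, `show_header_fields(header, ['title', 'experiment', 'deposition_date'])`.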

In [None]:
# Let's parse the PDB file for information

# Write the 4-letter code within the parentheses and set 'header' to True
# header=True makes parsePDB return the header information (a dictionary) along with the atoms

atoms, header = parsePDB('####', header=####)

In [None]:
# Obtain the reference of the crystal structure, the article it came from
print(header['reference'])

# Obtain which experiment produced the crystal structure by writing 'experiment' in the brackets
print(header['####'])

# Obtain the resolution of the crystal structure
# YOUR CODE HERE

STEP #3: Read the abstract of the article you identified in the previous cells. For this, you will search another NCBI database, PubMed, and retrieve the abstract of the article.

In [None]:
from Bio import Entrez # Here we need Entrez once more
# Searching for the article abstract in the database Pubmed
Entrez.email = 'YOUR EMAIL HERE'

# Fill in db, id (the PubMed ID, or PMID, of the article), rettype = 'abstract' and retmode = 'text'
# db is the database to search; here we are using the PubMed database ('pubmed')
query6 = Entrez.efetch(db='####', id = '####', rettype = '####', retmode = '####')

# Reading the query and printing it
print(query6.read())

# Closing the query
query6.close()
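`Entrez.efetch` sends your parameters to NCBI's E-utilities `efetch` service over HTTP. To see roughly what that request contains, the sketch below builds (but does not send) a comparable URL using only the standard library. `build_efetch_url` is a hypothetical helper for illustration; Biopython adds further parameters of its own (such as `tool`), so its actual request will not be identical.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for efetch
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype, retmode, email):
    """Build, without sending, an efetch request URL for the given records."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # several IDs are sent comma-separated
        "rettype": rettype,
        "retmode": retmode,
        "email": email,  # NCBI asks callers to identify themselves
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder PMID and email address
print(build_efetch_url("pubmed", ["12345"], "abstract", "text", "you@example.org"))
```

Each argument corresponds to a keyword you pass to `Entrez.efetch` in the cell above.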

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.


1. What is the resolution of this crystal structure and what does “resolution”
mean for a crystal structure?

Answer here

2. Go to https://www.rcsb.org/ and open the entry for this protein. What organism was HPRT obtained from for the crystal structure?

Answer here

## Examining Secondary Structure
Now let's look at the protein in 3D!

In [None]:
# Initialize the viewer with the appropriate PDB structure

# Write the 4-letter code of your protein inside the parentheses: 'pdb:####'
view = py3Dmol.view(query = 'pdb:####')

# Here we set the background color as white
view.setBackgroundColor('white')

# Here we set the visualization style for the protein
view.setStyle({'cartoon': {'color':'skyblue'}})

# Here we set the visualization style for the ligand
view.addStyle({'resi':167},{'stick':{'colorscheme':'grayCarbon'}})

view.zoomTo()
view.render()

### Available colors to choose from
* https://www.w3schools.com/cssref/css_colors.asp
* Use the name in lowercase letters.


## You can also set specific color schemes
Try a few to see what they look like!
* ssPyMol - PyMOL secondary-structure colorscheme
* ssJmol - Jmol secondary-structure colorscheme
* Jmol - Jmol primary colorscheme
* default - default colorscheme
* amino - amino acid colorscheme
* shapely - shapely protein colorscheme
* nucleic - nucleic acid colorscheme
* chain - standard chain colorscheme
* gradient - lets you provide a gradient for the colorscheme: either a $3Dmol.Gradient object or the name of a built-in gradient (rwb, roygb, sinebow)

Code:
```python
view.setStyle({'cartoon':{'colorscheme':'amino'}})
```

## Make your own representation of the protein! 

* Change the color and style of the protein and ligand by replicating the code from previous cells.
* Write and execute your code below.
* Take a screenshot of a good view of your protein for your final report
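The style calls above all follow one pattern: a selection dictionary (such as `{'resi': 167}`) followed by a style dictionary (such as `{'stick': {'colorscheme': 'grayCarbon'}}`). If you want to try many combinations, small helpers that build these dictionaries can keep your cell tidy. This is only a sketch; `stick_styles` and `cartoon_style` are our own names, not part of py3Dmol.

```python
def stick_styles(residues, colorscheme):
    """Return (selection, style) pairs that draw each residue number as sticks."""
    return [({'resi': resi}, {'stick': {'colorscheme': colorscheme}}) for resi in residues]


def cartoon_style(color=None, colorscheme=None):
    """Return a cartoon style using a single color name and/or a named colorscheme."""
    options = {}
    if color is not None:
        options['color'] = color
    if colorscheme is not None:
        options['colorscheme'] = colorscheme
    return {'cartoon': options}
```

For example, `view.setStyle(cartoon_style(colorscheme='ssJmol'))` followed by `for selection, style in stick_styles([167], 'grayCarbon'): view.addStyle(selection, style)` reproduces the earlier cell with a secondary-structure colorscheme.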


In [None]:
# YOUR CODE HERE!

## Compare the secondary structure to the predicted secondary structure provided by PSIPRED

PSIPRED (PSI-BLAST-based secondary structure PREDiction) is a method for predicting a protein's secondary structure from its amino acid sequence. It uses artificial neural networks that take a PSI-BLAST sequence profile as their input.

REF:
* https://en.wikipedia.org/wiki/PSIPRED



### Load the PSIPRED file with the predicted structure by running the cell below.

Note that the 'Pred' line indicates whether each amino acid (AA) is predicted to be in a coil (C), strand (E), or alpha helix (H).
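To compare the prediction with the crystal structure, it helps to turn the per-residue letters of the 'Pred' line into segments such as "helix: residues 3-7". The sketch below does this with a made-up 'Pred' string (not the real PSIPRED output for this protein). It assumes the string starts at residue 1; pass `start=` if your numbering differs, since residue numbers in a crystal structure do not always begin at 1.

```python
from itertools import groupby

# State letters used in the PSIPRED 'Pred' line
SS_NAMES = {'C': 'coil', 'E': 'strand', 'H': 'helix'}


def pred_segments(pred, start=1):
    """Split a 'Pred' string into (state, first_residue, last_residue) runs of identical letters."""
    segments = []
    position = start
    for state, run in groupby(pred):
        length = sum(1 for _ in run)
        segments.append((state, position, position + length - 1))
        position += length
    return segments


# Made-up 'Pred' string for illustration only
for state, first, last in pred_segments('CCHHHHHCCEEEEC'):
    print(f"{SS_NAMES.get(state, state)}: residues {first}-{last}")
```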

In [None]:
# Load PSIPRED file with the predicted structure

from IPython.display import IFrame
IFrame('https://drive.google.com/file/d/1tIgBuLMpwWQKpxjsnnuxF2ij6-JlFgis/preview', width=600, height=300)

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

1. Carefully examine the secondary structure in the crystal structure and
record any positions where the PSIPRED predictions were incorrect.

Answer here

2. PSIPRED states its predictions are ~80% correct. Do you agree this is a
good estimate of the accuracy? Explain.

Answer here
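PSIPRED's accuracy figure is usually quoted as three-state accuracy (Q3): the fraction of residues whose predicted state (H, E or C) matches the observed one. The sketch below computes that fraction and lists the residues that disagree. It assumes you have already written the observed helices and strands as inclusive (first, last) residue ranges, for example from the RCSB entry or the `helix_range`/`sheet_range` header fields converted by hand; the helper names and the example data are hypothetical.

```python
def ranges_to_ss(length, helices, strands, start=1):
    """Build an H/E/C string from inclusive (first, last) residue ranges; all other residues are coil."""
    states = ['C'] * length
    for state, ranges in (('H', helices), ('E', strands)):
        for first, last in ranges:
            for resnum in range(first, last + 1):
                states[resnum - start] = state  # residue number -> string index
    return ''.join(states)


def compare_ss(predicted, observed, start=1):
    """Return (fraction of residues in agreement, residue numbers where the states differ)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must cover the same residues")
    mismatches = [start + i for i, (p, o) in enumerate(zip(predicted, observed)) if p != o]
    agreement = 1 - len(mismatches) / len(predicted)
    return agreement, mismatches


# Made-up example: 14 residues with one observed helix and one observed strand
observed = ranges_to_ss(14, helices=[(3, 6)], strands=[(10, 13)])
agreement, mismatches = compare_ss('CCHHHHHCCEEEEC', observed)
print(f"Agreement: {agreement:.0%}; residues that differ: {mismatches}")
```

A single protein gives only one sample of this fraction, which is worth keeping in mind when judging PSIPRED's overall figure.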

# Modeling the mutation

Your patient has a mutation in the gene that codes for this protein. In this section, you will examine how the mutation affects the protein's function by locating the mutated residue in the structure and performing a mutation back to the original amino acid residue.
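Point mutations are often written in one-letter shorthand such as 'D193N' (original residue, position, new residue), while the PyMOL script below expects a three-letter code in capitals. The sketch below converts between the two, assuming a single substitution written in that shorthand. 'D193N' is purely illustrative and not necessarily your patient's mutation; also check that the position matches the residue numbering used in the PDB file.

```python
# One-letter to three-letter codes for the 20 standard amino acids
THREE_LETTER = {
    'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP', 'C': 'CYS',
    'Q': 'GLN', 'E': 'GLU', 'G': 'GLY', 'H': 'HIS', 'I': 'ILE',
    'L': 'LEU', 'K': 'LYS', 'M': 'MET', 'F': 'PHE', 'P': 'PRO',
    'S': 'SER', 'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL',
}


def parse_mutation(code):
    """Split shorthand like 'D193N' into (original, residue_number, replacement) three-letter codes."""
    original, number, replacement = code[0].upper(), code[1:-1], code[-1].upper()
    if original not in THREE_LETTER or replacement not in THREE_LETTER or not number.isdigit():
        raise ValueError(f"not a recognised point-mutation code: {code!r}")
    return THREE_LETTER[original], int(number), THREE_LETTER[replacement]


print(parse_mutation('D193N'))  # ('ASP', 193, 'ASN')
```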

## Install and import the necessary packages:


**PyMOL** is an open-source molecular visualization system. Like **Py3Dmol**, it allows 3D visualization of proteins. In this module, we will use it to perform the amino acid mutagenesis.

REF:
* https://pymol.org/2/
* https://colab.research.google.com/github/pb3lab/ibm3202/blob/master/tutorials/lab02_molviz.ipynb#scrollTo=0M8qd0sTyb9p
* http://3dmol.csb.pitt.edu/doc/types.html#AtomSelectionSpec

In [None]:
# The code below installs PyMOL (-y answers "yes" to the installation prompt)
!apt-get install -y pymol

STEP #1: Select the residue you are going to mutate and set it to stick representation.


In [None]:
# Initialize the viewer with the appropriate PDB structure
# Write the 4-letter code of your protein inside the parentheses: 'pdb:####'
view = py3Dmol.view(query = 'pdb:####')

# Here we set the background color
view.setBackgroundColor('####')

# Here we set the visualization style for the protein
view.setStyle({'cartoon': {'color':'####'}})

# Here we set the visualization style for the ligands (Magnesium, phosphoric acid, pyrophosphate)
view.addStyle({'resi':900},{'stick':{'colorscheme':'####'}})
view.addStyle({'resi':300},{'stick':{'colorscheme':'####'}})
view.addStyle({'resi':400},{'stick':{'colorscheme':'####'}})

# Here we set the visualization style for the Aspartate residue
# Write '193' after 'resi'

view.addStyle({'resi':###},{'stick':{'colorscheme':'####'}})

# Label the residue! 
view.addResLabels({'resi':['###']})

view.zoomTo()
view.render()

STEP #2: Perform the mutation! 

In [None]:
# Write and execute the command file that mutates the residue

# Open the file you will write the new structure to

# Replace NAME_OF_FILE with a file name of your choice (no spaces), keeping the .pml extension
com_file = open('NAME_OF_FILE.pml', 'w')

# Write the file with the following commands:
com_file.write('''

# Load the PDB file. Write the PDB code (in lowercase, matching the file downloaded by fetchPDB) in the #### space provided
load ####.pdb.gz 

# Specify the program (wizard) to run within PyMOL. Write 'mutagenesis' in the parentheses.
cmd.wizard("####")
cmd.do("refresh_wizard")

# Specify the residue to mutate and the residue to mutate it to
residue_to_change = "##/" # Which residue are you changing? (PyMOL selection, e.g. "A/193/" for residue 193 of chain A)
residue_mutated = "###" # What residue are you changing it to? (write the 3-letter amino acid code in CAPS)

# Optional: list the methods available in the mutagenesis wizard
for i in dir(cmd.get_wizard()): print(i)
cmd.get_wizard().set_mode(residue_mutated)
cmd.get_wizard().do_select(residue_to_change)

# Select the rotamer (side-chain conformation) to use; frame 1 is the first rotamer offered
cmd.frame(1)

# Apply the mutation
cmd.get_wizard().apply()

# Save the new structure as a new file (no spaces in the name)
save NAME_OF_NEW_FILE.pdb
''')

# Close the command file
com_file.close()
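A typo inside the .pml text only shows up when PyMOL runs, so you may prefer to generate the script from Python variables. The sketch below reuses exactly the PyMOL commands from the cell above; `mutagenesis_script` is a hypothetical helper, the file names are placeholders, and the selection string assumes PyMOL's `chain/residue/` form (for example `A/193/`).

```python
def mutagenesis_script(pdb_file, selection, new_residue, output_file, rotamer=1):
    """Return the text of a PyMOL .pml script that applies one point mutation."""
    for name in (pdb_file, output_file):
        if ' ' in name:
            raise ValueError(f"file names must not contain spaces: {name!r}")
    commands = [
        f'load {pdb_file}',
        'cmd.wizard("mutagenesis")',
        'cmd.do("refresh_wizard")',
        f'cmd.get_wizard().set_mode("{new_residue.upper()}")',  # three-letter code, e.g. ASN
        f'cmd.get_wizard().do_select("{selection}")',           # residue to mutate, e.g. A/193/
        f'cmd.frame({rotamer})',                                 # rotamer to keep
        'cmd.get_wizard().apply()',
        f'save {output_file}',
    ]
    return '\n'.join(commands) + '\n'


# Placeholder file names and residues; replace them with your own
print(mutagenesis_script('input.pdb.gz', 'A/193/', 'ASN', 'mutant.pdb'))
```

To use it, write the returned string to a `.pml` file and run that file with `!pymol -c`, as in the next cell.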

In [None]:
# Execute the command file created in the previous cell (-c runs PyMOL without its graphical interface)
!pymol -c NAME_OF_FILE.pml

STEP #3: Visualize the protein again

In [None]:
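# Initialize an empty viewer; the mutated structure is added from your file below

view = py3Dmol.view()

# Read ('r') the PDB file you saved in the previous step (use the same name as in the save command)
with open('NAME_OF_NEW_FILE.pdb', 'r') as pdb_file:
    view.addModel(pdb_file.read(), 'pdb')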

# Here we set the background color
view.setBackgroundColor('###')

# Here we set the visualization style for the protein
view.setStyle({'cartoon': {'color':'###'}})

# Here we set the visualization style for the ligands (Magnesium, phosphoric acid, pyrophosphate)
view.addStyle({'resi':900},{'stick':{'colorscheme':'####'}})
view.addStyle({'resi':300},{'stick':{'colorscheme':'####'}})
view.addStyle({'resi':400},{'stick':{'colorscheme':'####'}})

# Here we set the visualization style for the Aspartate residue
# Write '193' after 'resi'

view.addStyle({'resi':###},{'stick':{'colorscheme':'###'}})

# Label the residue! 
view.addResLabels({'resi':['###']})


view.zoomTo()
view.render()
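Question 1 below can also be answered with numbers rather than by eye. The sketch below reads atom coordinates from the standard fixed-column PDB text format and reports the shortest distance, in angstroms, between two groups of atoms. The helper names are our own, and it assumes an uncompressed PDB file such as the one saved by PyMOL.

```python
import math


def read_atoms(pdb_text):
    """Parse ATOM/HETATM records into dicts using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6].strip() in ('ATOM', 'HETATM'):
            atoms.append({
                'name': line[12:16].strip(),   # atom name, columns 13-16
                'resn': line[17:20].strip(),   # residue name, columns 18-20
                'chain': line[21],             # chain ID, column 22
                'resi': int(line[22:26]),      # residue number, columns 23-26
                'xyz': (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def min_distance(atoms, group_a, group_b):
    """Shortest distance between any atom matching group_a and any atom matching group_b."""
    coords_a = [atom['xyz'] for atom in atoms if group_a(atom)]
    coords_b = [atom['xyz'] for atom in atoms if group_b(atom)]
    if not coords_a or not coords_b:
        raise ValueError("one of the selections matched no atoms")
    return min(math.dist(p, q) for p in coords_a for q in coords_b)
```

For example, with `atoms = read_atoms(open('NAME_OF_NEW_FILE.pdb').read())`, the call `min_distance(atoms, lambda a: a['chain'] == 'A' and a['resi'] == 193, lambda a: a['resn'] == 'LIG')` gives the closest approach between residue 193 of chain A and a ligand, where 'LIG' is a placeholder for the ligand's three-letter code (see the `chemicals` header field). The original download is gzip-compressed, so read it with `gzip.open(path, 'rt')` instead of `open`; measuring both files gives a before/after comparison.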

## Take a screenshot of a good view/angle of the active site.

## Answer the following questions:
Input your answer in the cell below each question and press SHIFT+ENTER.

1. Compare the active site before and after the mutation. After visual inspection, do you think the mutated amino acid is closer to or farther from the ligand?

Answer here

2. Write your hypothesis explaining how the mutation affects the protein function.

Answer here