# Python & PDB Tutorial: A Complete Introduction for Beginners

The Protein Data Bank (PDB) stores information about the 3D shapes of proteins, nucleic acids, and complex assemblies. This data bank can be found online at https://www.rcsb.org/. In this tutorial, we will go through how to visualize proteins from the PDB and how to locate mutations on their 3D structures.

## 1.0 Visualizing Proteins using Py3Dmol

Py3Dmol downloads PDB structures using the compressed binary MMTF file format from https://mmtf.rcsb.org. For more information, read the documentation at https://pypi.org/project/py3Dmol/.

In [1]:
import py3Dmol

Here, we visualize the human protein hemoglobin from the PDB.

In [2]:
hemoglobin = py3Dmol.view(query='pdb:5WOG') # 5WOG is the PDB ID
hemoglobin.setStyle({'cartoon': {'color': 'spectrum'}}) # draw the protein as a cartoon, colored as a spectrum
hemoglobin.setStyle({'hetflag': True}, {'stick':{'radius': 0.3, 'singleBond': False}}) # draw the non-protein (hetero) atoms as sticks
hemoglobin.zoomTo() # this command makes sure the output zooms in on the protein
hemoglobin.show() # this command shows us the protein

This protein looks a little messy! To make it easier, we can visualize the different subunits of hemoglobin in different colors.

**We can use different commands to help visualize different structural motifs**

> The setStyle command sets which protein chains to highlight, and in what colors and shapes.

> The addLabel command adds labels to the visualization.

Here, we select chains A and B (the alpha subunits of hemoglobin), draw them in a cartoon representation, color them yellow, and add a label. We then do the same for chains C and D (the beta subunits of hemoglobin), coloring them blue.


Here's the command: 

In [3]:
hemoglobin.setStyle({'chain':['A','B']},{'cartoon': {'color': 'yellow'}}) # alpha subunits of hemoglobin
hemoglobin.addLabel('alpha subunits', {'fontColor':'yellow', 'backgroundColor':'lightgray'}, {'chain': ['A','B']}) # adding a label
hemoglobin.setStyle({'chain':['C','D']},{'cartoon': {'color': 'blue'}}) # beta subunits of hemoglobin
hemoglobin.addLabel('beta subunits', {'fontColor':'blue', 'backgroundColor':'lightgray'}, {'chain': ['C','D']}) # adding a label

hemoglobin.show()
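
The two setStyle/addLabel pairs above follow the same pattern, so they can be folded into a loop. The sketch below is one possible way to do that; the helper name `color_subunits` and the dictionary `HEMOGLOBIN_SUBUNITS` are hypothetical names introduced here for illustration, and the function only relies on the `setStyle` and `addLabel` calls already used in this tutorial.

```python
def color_subunits(view, subunits):
    """Color and label each group of chains on a py3Dmol view.

    `subunits` maps a label to a (chains, color) pair.
    `color_subunits` is a hypothetical helper, not part of py3Dmol.
    """
    for label, (chains, color) in subunits.items():
        selection = {'chain': chains}
        # same cartoon style as in the cell above
        view.setStyle(selection, {'cartoon': {'color': color}})
        # same label options as in the cell above
        view.addLabel(label, {'fontColor': color, 'backgroundColor': 'lightgray'}, selection)
    return view


# chain assignments copied from the cell above
HEMOGLOBIN_SUBUNITS = {
    'alpha subunits': (['A', 'B'], 'yellow'),
    'beta subunits': (['C', 'D'], 'blue'),
}
```

In the notebook, this would be called as `color_subunits(hemoglobin, HEMOGLOBIN_SUBUNITS).show()`. Keeping the chain groups in one dictionary makes it easier to reuse the same code for another structure, such as the heterodimer in the exercise.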

## 2.0 Visualizing Residues and Ligands

We can also visualize the different residues and ligands interacting with the protein. Here, we will look at the waters interacting with the protein.

**We can use different commands to help visualize different ligands or non-protein entities, like water**

> The setStyle command selects the atoms in the crystal structure that are named "HOH" (the waters) and are within interacting distance of the protein.

The water molecules will appear as red spheres (only the heavy atoms, the oxygens, are shown).

Here's the command: 

In [11]:
hemoglobin.setStyle({'resn': 'HOH'}, {'sphere':{'radius':0.5}}) # show the water oxygens as small spheres
hemoglobin.show()

In [7]:
hemoglobin.setStyle({'resn': 'HEM'}, {'sphere':{'radius':2.0}}) # show the heme groups as large spheres
hemoglobin.show()

Now, turn the waters off by giving them an empty style.

In [13]:
hemoglobin.setStyle({'resn': 'HOH'}, {})
hemoglobin.show()

## Exercise

**Try on your own**

1. Visualize all the heme residues that interact with hemoglobin. HINT: the residue name for heme is "HEM" and you may want to use the "stick" representation. 

2. Visualize the protein with the PDB ID "1JM7", label the two different proteins, and show the zinc ions that interact with this complex.

    HINT #1: This PDB entry is a co-crystal structure of BRCA1 (RING domain, chain A) and BARD1 (chain B); therefore, you are visualizing a heterodimer, a macromolecular complex composed of two different polypeptide chains. Mutations in BRCA1 are commonly associated with breast cancer. Additionally, visualize the water residues that interact with this heterodimer.

    HINT #2: The zinc ions are named "ZN", and you may want to increase the sphere radius from 0.5 to 1.5.

## 3.0 Visualizing specific amino acids

Now let's visualize some specific amino acids on the hemoglobin protein. Here, we are looking at the 6th amino acid in the beta chains, which is a glutamic acid. Take a second to think about how changing this amino acid might affect the structure and function of the hemoglobin protein.

In [16]:
# show residues 2 and 146 of chains C and D as sticks with green carbons
hemoglobin.setStyle({'chain': 'C', 'resi': '2'},{'stick': {'colorscheme': 'greenCarbon'}})
hemoglobin.setStyle({'chain': 'D', 'resi': '2'},{'stick': {'colorscheme': 'greenCarbon'}})
hemoglobin.setStyle({'chain': 'C', 'resi': '146'},{'stick': {'colorscheme': 'greenCarbon'}})
hemoglobin.setStyle({'chain': 'D', 'resi': '146'},{'stick': {'colorscheme': 'greenCarbon'}})
hemoglobin.show()

In [14]:
# show the 6th residue of chains C and D as sticks with red carbons
hemoglobin.setStyle({'chain': 'C', 'resi': '6'},{'stick': {'colorscheme': 'redCarbon'}})
hemoglobin.setStyle({'chain': 'D', 'resi': '6'},{'stick': {'colorscheme': 'redCarbon'}})
hemoglobin.show()
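
The cells above repeat one setStyle call per chain and residue. As a rough sketch, the same selections can be generated with a loop; `highlight_residues` is a hypothetical helper written for this tutorial, not a py3Dmol function, and it only uses the `chain`/`resi` selection and `stick` style shown above.

```python
def highlight_residues(view, chains, positions, colorscheme='greenCarbon'):
    """Draw the given residue positions as sticks on each listed chain.

    Hypothetical helper: it issues one setStyle call per (chain, position)
    pair, exactly like the hand-written cells above.
    """
    for chain in chains:
        for position in positions:
            selection = {'chain': chain, 'resi': str(position)}
            view.setStyle(selection, {'stick': {'colorscheme': colorscheme}})
    return view
```

For example, `highlight_residues(hemoglobin, ['C', 'D'], [6], 'redCarbon').show()` would reproduce the red highlight of the 6th residue on both beta chains.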

In people affected by sickle cell anemia, this glutamic acid is changed to a valine. Molecules of sickle cell hemoglobin stick to one another, forming rigid rods. These rods cause a person's red blood cells to take on a deformed, sickle-like shape. These blood cells do not carry oxygen well, and they also tend to clog capillaries. So, when a person affected by sickle cell anemia exerts themselves even slightly, they often experience terrible pain, and might even suffer a heart attack or stroke, all because of a single substitution!

## 4.0 Mutations in Proteins

Saturation mutagenesis is a technique that substitutes different amino acids at each position of the protein.

We can link saturation mutagenesis of proteins to the observed changes in their structure and function using another approach, called Multiplexed Assays of Variant Effect (MAVE). This data can be used in clinical applications to study and develop cures for diseases caused by the mutations that impact protein structure and function. 

The MAVE database (MaveDB) is a repository of MAVE data. It can be accessed at https://www.mavedb.org.

In this exercise, you will look at 5 different mutations in BRCA1 (a tumor suppressor protein) from a MAVE dataset and explore their effects on protein function.

First, we will start by importing the data as a Pandas dataframe from a .csv (comma separated values) file. 

In [11]:
import pandas as pd # import the pandas package, which is a common data analysis package
MAVEdata = pd.read_csv('BRCA1_MAVE_practicedata.csv') # import the csv data as a pandas dataframe
MAVEdata = MAVEdata[['accession','hgvs_nt','hgvs_pro','score']] # only view the important columns
MAVEdata # visualize the dataframe

|   | accession | hgvs_nt | hgvs_pro | score |
|---|---|---|---|---|
| 0 | urn:mavedb:00000003-a-1#1370 | 77G>T | Cys26Phe | -4.134405 |
| 1 | urn:mavedb:00000003-a-1#1622 | 33A>T | Gln11His | 0.796523 |
| 2 | urn:mavedb:00000003-a-1#9137 | 159G>T | Gln53His | -2.211282 |
| 3 | urn:mavedb:00000003-a-1#2102 | 136T>G | Cys46Gly | -1.6248 |
| 4 | urn:mavedb:00000003-a-1#1264 | 266T>G | Ile89Ser | -1.066312 |


In the table above, the "accession" number is the MAVE ID number. 

- The "hgvs_nt" column describes the **nucleotide substitution** for this mutation. For example, in the 0th row, the guanine in the 77th position of the BRCA1 coding sequence is replaced by a thymine.

- The "hgvs_pro" column translates this to the **amino acid level**. Here, the codon that is originally encoding a cysteine is now changed to a phenylalanine. This occurs in the 26th position of the protein. 

Find more information at https://www.mavedb.org/docs/mavedb/ if you are interested.
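
To make the relationship between the two columns concrete, here is a minimal sketch that splits the "hgvs_nt" and "hgvs_pro" strings into their parts and checks that the nucleotide position falls in the expected codon. The function names and regular expressions are hypothetical and only cover the simple substitution forms shown in the table (optionally with a `c.` or `p.` prefix). It also assumes that nucleotide positions are 1-based and counted from the first base of the coding sequence.

```python
import re

# Patterns for the simple substitutions in the table above (assumed forms only)
HGVS_NT = re.compile(r'^(?:c\.)?(\d+)([ACGT])>([ACGT])$')
HGVS_PRO = re.compile(r'^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$')


def parse_hgvs_nt(text):
    """Return (position, reference base, new base), e.g. '77G>T' -> (77, 'G', 'T')."""
    match = HGVS_NT.match(text)
    if match is None:
        raise ValueError(f'unsupported nucleotide variant: {text!r}')
    position, ref, alt = match.groups()
    return int(position), ref, alt


def parse_hgvs_pro(text):
    """Return (reference residue, position, new residue), e.g. 'Cys26Phe' -> ('Cys', 26, 'Phe')."""
    match = HGVS_PRO.match(text)
    if match is None:
        raise ValueError(f'unsupported protein variant: {text!r}')
    ref, position, alt = match.groups()
    return ref, int(position), alt


def codon_number(nt_position):
    """Codon containing a 1-based coding-sequence position (three bases per codon)."""
    return (nt_position + 2) // 3


# (hgvs_nt, hgvs_pro) pairs copied from the table above
VARIANTS = [('77G>T', 'Cys26Phe'), ('33A>T', 'Gln11His'), ('159G>T', 'Gln53His'),
            ('136T>G', 'Cys46Gly'), ('266T>G', 'Ile89Ser')]

for nt, pro in VARIANTS:
    nt_position, _, _ = parse_hgvs_nt(nt)
    _, aa_position, _ = parse_hgvs_pro(pro)
    # the codon computed from the nucleotide position should match the protein position
    assert codon_number(nt_position) == aa_position
```

For the 0th row, nucleotide 77 lies in codon 26, which agrees with the Cys26Phe entry in the "hgvs_pro" column.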

## Exercise

**Try on your own**

3. Visualize the amino acids in the positions described in the MAVE data. 

    HINT #1: Use the same heterodimer structure from the previous exercise, and adapt the approach used above to visualize the 6th amino acid of the hemoglobin beta chains.

    HINT #2: You must add 1 to all amino acid positions in the MAVE table. So, for the 0th row, Cys26 is actually Cys27 in the PDB structure.
 
 
4. All of these mutations cause a loss of function (LOF) in the BRCA1 RING domain. The mutation in the 0th row causes the largest LOF (it has the lowest score, -4.13), while the mutation in the 1st row causes the smallest LOF. After visualizing all five mutations, hypothesize why the other four have a lower impact than the 0th-row mutant.

**Acknowledgements**: 
This tutorial was developed by Kriti Shukla and Brunk Lab at the University of North Carolina at Chapel Hill. Parts of the tutorial were adapted from the MMTF-2018 Workshop & Hackathon, hosted by Dr. Peter W. Rose at the University of California, San Diego.  