# Install Python Requirements

The following cells install and import all Python packages required by this notebook.

In [2]:
%pip install -r requirements.txt

Collecting boto3
  Downloading boto3-1.28.9-py3-none-any.whl (135 kB)
Collecting botocore<1.32.0,>=1.31.9
  Downloading botocore-1.31.9-py3-none-any.whl (11.0 MB)
Collecting jmespath<2.0.0,>=0.7.1
  Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.7.0,>=0.6.0
  Using cached s3transfer-0.6.1-py3-none-any.whl (79 kB)
Collecting urllib3<1.27,>=1.25.4
  Using cached urllib3-1.26.16-py2.py3-none-any.whl (143 kB)
Installing collected packages: urllib3, jmespath, botocore, s3transfer, boto3
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.4
    Uninstalling urllib3-2.0.4:
      Successfully uninstalled u

In [3]:
import pandas as pd
from biopandas.pdb import PandasPdb
import plotly.express as px
import os
import boto3

## Downloading the AlphaFold Results

The following cells download the `results.tar.gz` archive from S3 and extract it so that the rest of the notebook can process its contents.

In [5]:
cfn = boto3.client("cloudformation")
response = cfn.describe_stacks(StackName="alphafold-workshop")

ClientError: An error occurred (ExpiredToken) when calling the AssumeRole operation: The security token included in the request is expired
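
The saved output above stops at an expired-token error, so the download and extraction steps themselves are not shown. A minimal sketch of those two steps might look like the following. The `bucket` and `key` arguments and both helper names are hypothetical placeholders, not values taken from the workshop stack, and the sketch assumes valid AWS credentials are available when `download_results` is called.

```python
import tarfile
from pathlib import Path


def download_results(bucket, key, dest="results.tar.gz"):
    """Download the archive from S3 (bucket and key are placeholders)."""
    # Imported lazily so that extract_results works without boto3 or AWS access.
    import boto3

    boto3.client("s3").download_file(bucket, key, dest)
    return dest


def extract_results(archive_path, out_dir="results"):
    """Unpack a .tar.gz archive into out_dir and return the extracted file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out)
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return sorted(names)
```

Whether the real archive already contains a top-level `results/` directory is not known from this notebook, so `out_dir` may need adjusting for `results/ranked_0.pdb` to end up where the next cell expects it.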

In [7]:
pdb = PandasPdb().read_pdb('results/ranked_0.pdb')

In [8]:
# Print the PandasPdb class documentation; expand the cell output to view it.
help(pdb)

Help on PandasPdb in module biopandas.pdb.pandas_pdb object:

class PandasPdb(builtins.object)
 |  Object for working with Protein Databank structure files.
 |  
 |  Attributes
 |  ----------
 |  df : dict
 |      Dictionary storing pandas DataFrames for PDB record sections.
 |      The dictionary keys are {'ATOM', 'HETATM', 'ANISOU', 'OTHERS'}
 |      where 'OTHERS' contains all entries that are not parsed as
 |      'ATOM', 'HETATM', or 'ANISOU'.
 |  
 |  pdb_text : str
 |      PDB file contents in raw text format.
 |  
 |  pdb_path : str
 |      Location of the PDB file that was read in via `read_pdb`
 |      or URL of the page where the PDB content was fetched from
 |      if `fetch_pdb` was called.
 |  
 |  header : str
 |      PDB file description.
 |  
 |  code : str
 |      PDB code
 |  
 |  Methods defined here:
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  amino3to1(self, record='ATOM', residue_col='residue_name', fill

The `pdb.df` attribute gives us a dictionary storing pandas DataFrames for PDB record sections.
The dictionary keys are `{'ATOM', 'HETATM', 'ANISOU', 'OTHERS'}` where `'OTHERS'` contains all entries that are not parsed as `'ATOM'`, `'HETATM'`, or `'ANISOU'`.
Let's check them out.

In [9]:
for section, df in pdb.df.items():
    print("\n\n")
    print(section)
    display(df.head())




ATOM


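| | record_name | atom_number | blank_1 | atom_name | alt_loc | residue_name | blank_2 | chain_id | residue_number | insertion | ... | x_coord | y_coord | z_coord | occupancy | b_factor | blank_4 | segment_id | element_symbol | charge | line_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ATOM | 1 | | N | | GLY | | A | 1 | | ... | -21.262 | 5.329 | -1.973 | 1.0 | 72.0 | | | N | | 1 |
| 1 | ATOM | 2 | | CA | | GLY | | A | 1 | | ... | -20.528 | 5.92 | -0.866 | 1.0 | 72.0 | | | C | | 2 |
| 2 | ATOM | 3 | | C | | GLY | | A | 1 | | ... | -19.28 | 6.661 | -1.305 | 1.0 | 72.0 | | | C | | 3 |
| 3 | ATOM | 4 | | O | | GLY | | A | 1 | | ... | -18.855 | 6.545 | -2.456 | 1.0 | 72.0 | | | O | | 4 |
| 4 | ATOM | 5 | | N | | SER | | A | 2 | | ... | -19.006 | 7.81 | -0.74 | 1.0 | 94.07 | | | N | | 5 |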





HETATM


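| | record_name | atom_number | blank_1 | atom_name | alt_loc | residue_name | blank_2 | chain_id | residue_number | insertion | ... | x_coord | y_coord | z_coord | occupancy | b_factor | blank_4 | segment_id | element_symbol | charge | line_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|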





ANISOU


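| | record_name | atom_number | blank_1 | atom_name | alt_loc | residue_name | blank_2 | chain_id | residue_number | insertion | ... | U(1,1) | U(2,2) | U(3,3) | U(1,2) | U(1,3) | U(2,3) | blank_4 | element_symbol | charge | line_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|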





OTHERS


| | record_name | entry | line_idx |
|---|---|---|---|
| 0 | MODEL | 1 | 0 |
| 1 | TER | 788 ASP A 100 | 788 |
| 2 | ENDMDL | | 789 |
| 3 | END | | 790 |
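
The column names in the `'ATOM'` table mirror the fixed-width fields of an `ATOM` line in the PDB file format. As a rough illustration of where those values come from, here is a small pure-Python sketch that slices one line into a few of the same fields. It is not how biopandas is implemented; `parse_atom_line` and `EXAMPLE_LINE` are names made up for this example, and the example line is reconstructed from the first row of the table above.

```python
# An ATOM line rebuilt from row 0 of the 'ATOM' table (illustrative only).
EXAMPLE_LINE = "ATOM      1  N   GLY A   1     -21.262   5.329  -1.973  1.00 72.00           N"


def parse_atom_line(line):
    """Slice a PDB ATOM line into named fields using the fixed column positions."""
    return {
        "record_name": line[0:6].strip(),
        "atom_number": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "alt_loc": line[16:17].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21:22].strip(),
        "residue_number": int(line[22:26]),
        "insertion": line[26:27].strip(),
        "x_coord": float(line[30:38]),
        "y_coord": float(line[38:46]),
        "z_coord": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element_symbol": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


record = parse_atom_line(EXAMPLE_LINE)
print(record["residue_name"], record["x_coord"], record["b_factor"])
```

Empty strings in the result correspond to the blank cells in the table, such as `alt_loc` and `insertion`.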


## 🔥 Visualize Elements

The `'ATOM'` DataFrame contains our protein structure information: one row per atom, with its coordinates, residue, and element.
Let's plot the structure in 3D with Plotly and color each atom by its element.

In [10]:
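def plot_3d_structure(color, title):
    """Return a 3D scatter plot of the ATOM records, colored by the given column."""
    # One marker per atom, positioned by its x/y/z coordinates.
    fig = px.scatter_3d(
        pdb.df["ATOM"],
        x='x_coord',
        y='y_coord',
        z='z_coord',
        color=color
    )

    # Smaller markers keep dense regions of the structure readable.
    fig.update_traces(marker=dict(size=4))
    fig.update_layout(
        template="ggplot2",
        height=800,
        title=title
    )

    return fig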


In [11]:
fig = plot_3d_structure(color='element_symbol', title='3D Structure and Elements')
fig.show()

## 💧 Visualize Residue Number

We can also color the structure by `residue_number`, which shows how the chain runs from the first residue to the last.


In [12]:
fig = plot_3d_structure(color='residue_number', title='3D Structure and Residue Number')
fig.show()

## 📛 Visualize Residue Name

We can also color by `residue_name` to see where each amino acid type sits in the structure.

In [13]:
fig = plot_3d_structure(color='residue_name', title='3D Structure and Residue Name')
fig.show()

## 🅱️ Visualize B Factor

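We can color by `b_factor`. In AlphaFold output files, this column stores the per-residue pLDDT confidence score (0 to 100) rather than a crystallographic B-factor.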


In [14]:
fig = plot_3d_structure(color='b_factor', title='3D Structure and B Factor')
fig.show()
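
A continuous color scale can make it hard to tell at a glance which regions are reliable. One option, sketched below under the assumption that `b_factor` holds pLDDT scores, is to bin the values into the confidence bands commonly used for AlphaFold models. The helper names `plddt_band` and `summarize_bands` and the exact band labels are choices made for this example, not part of the original notebook.

```python
from collections import Counter


def plddt_band(score):
    """Map a pLDDT score (0 to 100) to a coarse confidence label."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_bands(scores):
    """Count how many scores fall into each confidence band."""
    return Counter(plddt_band(s) for s in scores)


# b_factor values taken from the first five rows of the 'ATOM' table.
print(summarize_bands([72.0, 72.0, 72.0, 72.0, 94.07]))
```

In the notebook, something like `pdb.df["ATOM"]["confidence"] = pdb.df["ATOM"]["b_factor"].map(plddt_band)` would add a categorical column that `plot_3d_structure(color="confidence", title=...)` could then color by.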

## ⚛️ Visualize Atom Name

Or we can color by `atom_name`.

In [15]:
fig = plot_3d_structure(color='atom_name', title='3D Structure and Atom Name')
fig.show()