
# [gget alphafold](https://github.com/pachterlab/gget) demonstration
Predict the 3D structure of a protein from its amino acid sequence using a simplified version of [DeepMind](https://www.deepmind.com/)’s [AlphaFold2](https://github.com/deepmind/alphafold), as originally released and benchmarked in [AlphaFold Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb). To speed up prediction, switch to a GPU runtime: "Runtime" -> "Change runtime type" -> "Hardware accelerator" -> "GPU". See also [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb).

Written by Laura Luebbert.
___
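Because prediction speed depends on the GPU runtime mentioned in the introduction, it can help to confirm that a GPU is actually attached before running anything heavy. The sketch below is a heuristic only: it assumes an NVIDIA GPU runtime exposes the `nvidia-smi` tool on `PATH` (typical on Colab GPU runtimes, but not guaranteed elsewhere), and `gpu_visible` is a hypothetical helper, not part of gget.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Heuristic check for an NVIDIA GPU (hypothetical helper, not part of gget).

    Assumes the runtime ships `nvidia-smi`; returns False if the tool is
    missing or reports no GPUs.
    """
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    # `nvidia-smi -L` prints one "GPU <index>: ..." line per detected device
    result = subprocess.run([smi, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime detected:", gpu_visible())
```

If this prints `False` on Colab, the runtime type is most likely still set to CPU.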

Install and import gget:

In [1]:
!pip install -q -U gget 
import gget


In [2]:
# For pretty plots
%config InlineBackend.figure_format='retina'

Use Miniconda to install OpenMM. (To install OpenMM v7.5.1 on your local machine instead, run `conda install -c conda-forge openmm=7.5.1` from the command line.)

In [3]:
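import sys

# Remove any existing conda installation
%shell rm -rf /opt/conda
# Download the Miniconda installer, install it in batch mode (-b) to /opt/conda (-p), then delete the installer
%shell wget -q -P /tmp \
  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
    && rm /tmp/Miniconda3-latest-Linux-x86_64.sh
# Put the conda binaries first on PATH
PATH=%env PATH
%env PATH=/opt/conda/bin:{PATH}
# Install Python 3.7 and OpenMM 7.5.1 from conda-forge
%shell conda install -qy -c conda-forge python=3.7 openmm=7.5.1
# Make the conda packages (including OpenMM) importable from the notebook's Python
sys.path.append('/opt/conda/lib/python3.7/site-packages')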

PREFIX=/opt/conda
Unpacking payload ...
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - _openmp_mutex==4.5=1_gnu
    - brotlipy==0.7.0=py39h27cfd23_1003
    - ca-certificates==2022.3.29=h06a4308_1
    - certifi==2021.10.8=py39h06a4308_2
    - cffi==1.15.0=py39hd667e15_1
    - charset-normalizer==2.0.4=pyhd3eb1b0_0
    - colorama==0.4.4=pyhd3eb1b0_0
    - conda-content-trust==0.1.1=pyhd3eb1b0_0
    - conda-package-handling==1.8.1=py39h7f8727e_0
    - conda==4.12.0=py39h06a4308_0
    - cryptography==36.0.0=py39h9ce1e76_0
    - idna==3.3=pyhd3eb1b0_0
    - ld_impl_linux-64==2.35.1=h7274673_9
    - libffi==3.3=he6710b0_2
    - libgcc-ng==9.3.0=h5101ec6_17
    - libgomp==9.3.0=h5101ec6_17
    - libstdcxx-ng==9.3.0=hd4cf53a_17
    - ncurses==6.3=h7f8727e_2
    - openssl==1.1.1n=h7f8727e_0
    - pip==21.2.4=py39h06a4308_0
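Once the Miniconda cell has finished, a quick way to confirm that OpenMM is importable from the notebook is to look it up with `importlib.util.find_spec`. This is a minimal sketch, assuming (to our understanding) that OpenMM 7.5.x installs under the `simtk.openmm` namespace while releases from 7.6 onward use a top-level `openmm` package, so both names are checked. `openmm_module_name` is a hypothetical helper.

```python
import importlib.util


def openmm_module_name():
    """Return the importable OpenMM module name, or None if it is missing.

    Hypothetical helper: checks the newer top-level `openmm` package first,
    then the older `simtk.openmm` namespace used by OpenMM 7.5.x.
    """
    for name in ("openmm", "simtk.openmm"):
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ModuleNotFoundError:
            # find_spec raises when the parent package (e.g. `simtk`) is absent
            continue
    return None


print("OpenMM module:", openmm_module_name())
```

A result of `None` suggests the `sys.path.append` step did not take effect or the conda install failed.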

Install third-party dependencies and download the AlphaFold model parameters using `gget setup`:
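Since `gget setup` downloads the model parameters, it may be worth checking free disk space first. The sketch below uses `shutil.disk_usage`; the 10 GiB threshold is an assumption chosen for illustration, not a documented gget requirement, and `free_gib` is a hypothetical helper.

```python
import shutil


def free_gib(path: str = ".") -> float:
    """Return free disk space at `path` in GiB (hypothetical helper)."""
    return shutil.disk_usage(path).free / 2**30


# Assumed threshold for illustration only; not taken from the gget docs
MIN_FREE_GIB = 10.0

available = free_gib()
if available < MIN_FREE_GIB:
    print(f"Warning: only {available:.1f} GiB free; the parameter download may fail.")
else:
    print(f"{available:.1f} GiB free.")
```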

In [4]:
gget.setup("alphafold")


Predict a protein structure:

In [5]:
# Show gget alphafold manual
help(gget.alphafold)

Help on function alphafold in module gget.gget_alphafold:

alphafold(sequence, out='2022_08_23-2350_gget_alphafold_prediction', relax=False, plot=True, show_sidechains=True)
    Predicts the structure of a protein using a slightly simplified version of AlphaFold v2.1.0 (https://doi.org/10.1038/s41586-021-03819-2)
    published in the AlphaFold Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb).
    
    Args:
      - sequence          Amino acid sequence (str), a list of sequences, or path to a FASTA file.
      - out               Path to folder to save prediction results in (str).
                          Default: "./[date_time]_gget_alphafold_prediction"
      - relax             True/False whether to AMBER relax the best model (default: False).
      - plot              True/False whether to provide a graphical overview of the prediction (default: True).
      - show_sidechains   True/False whether to show side chains i
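The help text above states that `sequence` can also be a path to a FASTA file. A minimal sketch of writing such a file with the standard library follows; `write_fasta` is a hypothetical helper, the 60-residue line width is a common FASTA convention rather than a gget requirement, and the sequence is an arbitrary toy example.

```python
import textwrap
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to a FASTA file and return its Path.

    Hypothetical helper; sequences are wrapped to `width` residues per line.
    """
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(textwrap.wrap(seq, width))
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


# Arbitrary toy sequence (hypothetical, for illustration only)
toy_records = [("toy_protein", "ACDEFGHIKLMNPQRSTVWY" * 4)]
fasta_path = write_fasta(toy_records, "toy_sequence.fasta")
print(fasta_path.read_text())

# The file path could then be passed as `sequence`, e.g. gget.alphafold(str(fasta_path))
```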

Predict the structure of CASP14 target [T1024](https://predictioncenter.org/casp14/target.cgi?id=8&view=all):
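Because the MSA search can take many minutes, a quick sanity check on the input can save a wasted run. This sketch assumes that only the 20 standard one-letter amino acid codes should be submitted, mirroring the input check in the AlphaFold Colab notebook; how gget itself treats other characters is not shown here. `invalid_residues` is a hypothetical helper.

```python
# The 20 standard amino acids (one-letter codes)
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(seq: str) -> set:
    """Return the characters in `seq` that are not standard amino acids (hypothetical helper)."""
    return set(seq.upper()) - STANDARD_AA


t1024 = "MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH"
bad = invalid_residues(t1024)
print(f"length={len(t1024)}, invalid characters: {sorted(bad) or 'none'}")
```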

In [None]:
gget.alphafold("MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH")

Using the single-chain model.


Jackhmmer search:  50%|█████     | 74/147 [elapsed: 12:47 remaining: 13:38]

Predict the 3D structure of an engineered fluorescent nicotine sensor ([PDB 7S7U](https://www.rcsb.org/structure/7S7U)):

In [None]:
gget.alphafold(
    "MHHHHHHGAQPARSANDTVVVGSINFTEGIIVANMVAEMIEAHTDLKVVRKLNLGGENVNFEAIKRGGANNGIDIYVEYTGHGLVDILGFPEPNVYITADKQKNGIKANFKIRHNMEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSVLSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGGTGGSMSKGEELFTGVVPILVELDGGVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFPPPSSTDPEGAYETVKKEYKRKWNIVWLKPLGFNNTYTLTVKDELAKQYNLKTFSDLAKISDKLILGATMFFLEGPDGYPGLQKLYNFKFKHTKSMDMGIRYTAIDNNEVQVIDAWATDGLLVSHKLKILEDDKAFFPPYYAAPIIRQDVLDKHPELKDVLNKLANQISLEEMQKLNYKVDGEGQDPAKVAKEFLKEKGLILQVD",
    show_sidechains=False
)

Predict the structure of the nicotine sensor 7S7U again, this time passing it as a multimer (a list of three chains) instead of a single sequence. This takes significantly longer because a separate MSA must be built for each chain:

In [None]:
gget.alphafold(
    [
        "MHHHHHHGAQPARSANDTVVVGSINFTEGIIVANMVAEMIEAHTDLKVVRKLNLGGENVNFEAIKRGGANNGIDIYVEYTGHGLVDILGFPEP",
        "NVYITADKQKNGIKANFKIRHNMEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSVLSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGGTGGSMSKGEELFTGVVPILVELDGGVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFPP",
        "PSSTDPEGAYETVKKEYKRKWNIVWLKPLGFNNTYTLTVKDELAKQYNLKTFSDLAKISDKLILGATMFFLEGPDGYPGLQKLYNFKFKHTKSMDMGIRYTAIDNNEVQVIDAWATDGLLVSHKLKILEDDKAFFPPYYAAPIIRQDVLDKHPELKDVLNKLANQISLEEMQKLNYKVDGEGQDPAKVAKEFLKEKGLILQVD"
    ],
    show_sidechains=False
)
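The three chains above appear to be consecutive slices of the single 7S7U sequence used earlier. A small helper can produce such a list from cut positions; this is a sketch in which `split_chains` and the cut positions are hypothetical, shown on a toy sequence.

```python
def split_chains(sequence, cuts):
    """Split `sequence` at 0-based positions in `cuts` into a list of chains.

    Hypothetical helper: each cut position becomes the first residue of the
    next chain, so joining the chains gives back the original sequence.
    """
    cuts = sorted(cuts)
    if len(set(cuts)) != len(cuts) or any(c <= 0 or c >= len(sequence) for c in cuts):
        raise ValueError("cuts must be unique and lie strictly inside the sequence")
    bounds = [0, *cuts, len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


# Toy example with hypothetical cut positions
chains = split_chains("MKVLAAGIVGLLLA", [4, 9])
print(chains)
assert "".join(chains) == "MKVLAAGIVGLLLA"  # no residues lost or duplicated

# The resulting list could be passed like the one above, e.g.
# gget.alphafold(chains, show_sidechains=False)
```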

Download the result folders created by gget alphafold as a single zip file:

In [None]:
from google.colab import files

# Zip all folders into one file
!zip -r gget_alphafold_predictions.zip *_gget_alphafold_prediction

# Download zipped file
files.download('/content/gget_alphafold_predictions.zip')
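Outside Colab, the `zip` command and `google.colab.files` may not be available. The sketch below builds the same archive with Python's `zipfile` module. It relies on the folder-name suffix `_gget_alphafold_prediction` seen in the default `out` value of the help output; `zip_predictions` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def zip_predictions(root=".", pattern="*_gget_alphafold_prediction",
                    archive="gget_alphafold_predictions.zip"):
    """Zip every folder under `root` whose name matches `pattern`.

    Hypothetical helper: paths inside the archive are stored relative to
    `root`. Returns the list of folders that were added.
    """
    root = Path(root)
    folders = sorted(p for p in root.glob(pattern) if p.is_dir())
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            for item in sorted(folder.rglob("*")):
                zf.write(item, arcname=item.relative_to(root).as_posix())
    return folders


zipped = zip_predictions()
print(f"Zipped {len(zipped)} prediction folder(s).")
```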