# RoseTTAFold Inference Pipeline for Protein Folding

This notebook is a guide to using RoseTTAFold for protein folding on NVIDIA GPUs.

## Introduction

RoseTTAFold takes an amino acid sequence and returns a [PDB](https://en.wikipedia.org/wiki/Protein_Data_Bank) file containing atoms and their 3D coordinates. The input sequence alone is often insufficient for an accurate structure prediction. To improve accuracy, RoseTTAFold can take additional inputs in the form of [MSA](https://en.wikipedia.org/wiki/Multiple_sequence_alignment), [secondary structures](https://proteopedia.org/wiki/index.php/Structural_templates#Secondary_structure_elements), and [templates](https://proteopedia.org/wiki/index.php/Structural_Templates). These additional inputs are derived from the input sequence and prepared automatically through database lookups. Please make sure you have downloaded the required databases as described in the Quick Start Guide.



Let's import the helper functions that launch the prediction pipeline, clean up after it, and display the result:

In [1]:
from pipeline_utils import execute_pipeline, cleanup, display_pdb

We can provide the input in two ways:
1. As a path to a FASTA file containing a sequence.
2. As a string containing the sequence itself.

We have provided an example FASTA file in `example/input.fa`. To run the pipeline, pass that path to the `execute_pipeline` function. The example sequence is:
`MAAPTPADKSMMAAVPEWTITNLKRVCNAGNTSCTWTFGVDTHLATATSCTYVVKANANASQASGGPVTCGPYTITSSWSGQFGPNNGFTTFAVTDFSKKLIVWPAYTDVQVQAGKVVSPNQSYAPANLPLEHHHHHH`
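
To illustrate the two input forms described above, here is a minimal sketch of how a FASTA path or a raw sequence string could be normalized into one validated sequence. The helpers `read_fasta` and `load_sequence` are hypothetical and are not part of `pipeline_utils`; the check against the 20 standard one-letter residue codes is also an assumption, since the pipeline may accept other symbols.

```python
import os

# The 20 standard amino acids in one-letter code (assumed validation alphabet).
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # only the first record is used
            header = line[1:].strip()
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks).upper()


def load_sequence(path_or_sequence):
    """Hypothetical helper: accept a FASTA file path or a raw sequence string."""
    if os.path.isfile(path_or_sequence):
        with open(path_or_sequence) as handle:
            _, sequence = read_fasta(handle.read())
    else:
        # Treat the argument as the sequence itself; drop whitespace and newlines.
        sequence = "".join(path_or_sequence.split()).upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return sequence
```

Validating the sequence before launching the pipeline can save time, because the database searches are the slow part of a run and a malformed input would otherwise only fail later.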

## Running the pipeline

The pipeline has the following steps:
1. Run HHblits - the script looks for matching MSAs in the `UniRef30` database.
2. Run PSIPRED - the script looks for matching secondary structures.
3. Run hhsearch - the script looks for matching templates.
4. Run iterative structure refinement with the SE(3)-Transformer.

In [2]:
execute_pipeline("example/input.fa")

Running inference on 1078 Tsp1, Trichoderma virens, 138 residues|

Running HHblits - looking for the MSAs
Running PSIPRED - looking for the Secondary Structures
Running hhsearch - looking for the Templates
Running end-to-end prediction


Using backend: pytorch


Done.
Output saved as /results/output.e2e.pdb


If the script finished successfully, the output PDB file should be present in the `/results` directory.
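
As a quick sanity check on the output file, the sketch below counts `ATOM` records and distinct residues using the fixed-column layout of the PDB format (chain ID in column 22, residue number in columns 23 to 26). The function names `summarize_pdb` and `check_prediction` are hypothetical and not part of `pipeline_utils`.

```python
import os


def summarize_pdb(pdb_text):
    """Count ATOM records and distinct (chain, residue number) pairs."""
    atoms = 0
    residues = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, and other record types
        atoms += 1
        chain_id = line[21]          # column 22
        res_seq = int(line[22:26])   # columns 23-26
        residues.add((chain_id, res_seq))
    return {"atoms": atoms, "residues": len(residues)}


def check_prediction(pdb_path):
    """Hypothetical helper: fail early if the output is missing or has no atoms."""
    if not os.path.isfile(pdb_path):
        raise FileNotFoundError(f"no prediction found at {pdb_path}")
    with open(pdb_path) as handle:
        summary = summarize_pdb(handle.read())
    if summary["atoms"] == 0:
        raise ValueError(f"{pdb_path} contains no ATOM records")
    return summary
```

Calling `check_prediction("/results/output.e2e.pdb")` after the run above would return the atom and residue counts. One would expect the residue count to match the 138 residues reported in the log, although that depends on how the pipeline writes its output.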

## Result visualization

Install visualization packages:

In [3]:
!jupyter labextension install jupyterlab_3dmol



Building jupyterlab assets (build:prod:minimize)


Load the output file and display it using `py3Dmol`. Note that the rendered view is interactive and can be manipulated with the mouse.

In [4]:
display_pdb("/results/output.e2e.pdb")
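
The implementation of `display_pdb` is not shown in this notebook. For readers who want to customize the rendering, the following is a rough sketch of how a similar view could be built directly with `py3Dmol`. The function name `show_structure`, the default size, and the cartoon style with a spectrum color scheme are assumptions, not the helper's actual settings.

```python
def show_structure(pdb_path, width=800, height=600):
    """Hypothetical stand-in for display_pdb, built directly on py3Dmol."""
    with open(pdb_path) as handle:
        pdb_text = handle.read()

    # Imported lazily so a missing file is reported before the viewer
    # package is needed.
    import py3Dmol

    view = py3Dmol.view(width=width, height=height)
    view.addModel(pdb_text, "pdb")                     # load the structure
    view.setStyle({"cartoon": {"color": "spectrum"}})  # color along the chain
    view.zoomTo()                                      # fit structure in view
    return view
```

In a notebook cell, `show_structure("/results/output.e2e.pdb").show()` would render the interactive viewer, assuming `py3Dmol` is installed in the kernel environment.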

## Clean up

Make sure to remove temporary files before running the pipeline again:

In [5]:
cleanup()