<a href="https://colab.research.google.com/github/kiharalab/DeepMainMast/blob/main/DeepMainMast_Single_chain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DeepMainMast (Single Chain)

<img src="DeepMainMast_Logo.png" height=300 width=300/>

**License**: GPL v3 for academic use. (For commercial use, please contact us for different licensing)

**Contact**: Daisuke Kihara (dkihara@purdue.edu)

**Cite this work**: "Integrated Protocol of Protein Structure Modeling for cryo-EM with Deep Learning and Structure Prediction, Genki Terashi, Xiao Wang, Devashish Prasad, and Daisuke Kihara, In submission (2023)"

# Run DeepMainMast pipeline (8 steps)

1. <u>Install dependencies</u>: Install libraries and download the models/scripts
2. <u>Input files</u>: Upload the cryo-EM map file (.map or .mrc), the sequence file (.fasta), and optionally an AlphaFold model file (.pdb)
3. <u>Configure parameters</u>: EM map contour level, time limit in seconds, and number of final models
4. <u>Execute DeepMainMast</u>: Execute the DeepMainMast pipeline sequentially with the specified configuration and input.
5. <u>Visualize pre-Rosetta</u>: Visualize the predicted models before the application of RosettaCM
6. <u>Execute Rosetta</u>: Execute Rosetta on the DeepMainMast predictions to get the final protein structure.
7. <u>Scoring and Ranking</u>: Score the predicted protein structures and rank them according to the scores.
8. <u>Visualization</u>: Visualize the final ranked structures.

Some steps in the pipeline take a long time to compute, especially the Rosetta step. You can set a time limit, in seconds, for some of these compute-intensive steps in the input text boxes.<br>

Please make sure the notebook is connected to a **GPU** runtime; DeepMainMast needs GPU support to run.<br>
Click the **"Connect"** button at the top right, and the notebook will automatically connect to a GPU machine.
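
To confirm that a GPU is actually attached before starting, a quick check along the following lines may help. This is a minimal sketch, not part of the DeepMainMast code: the helper name `gpu_available` is hypothetical, and it assumes the NVIDIA `nvidia-smi` tool is on the `PATH` whenever a GPU runtime is active.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi is present and lists at least one GPU."""
    # Assumption: a GPU runtime exposes the nvidia-smi command-line tool.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the GPUs visible to the driver
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_available():
    print("GPU detected")
else:
    print("No GPU detected; connect the notebook to a GPU runtime first")
```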

### DeepMainMast can also be found at

Code Ocean Capsule:
- [Single Chain](https://codeocean.com/capsule/0866386/tree)
- [Multi Chain](https://codeocean.com/capsule/9358532/tree)

Colab Notebooks:
- [Single Chain](https://colab.research.google.com/github/kiharalab/DeepMainMast/blob/main/DeepMainMast_Single_chain.ipynb) <-- current notebook
- [Multi Chain](https://colab.research.google.com/github/kiharalab/DeepMainMast/blob/main/DeepMainMast_Multi_chain.ipynb)

Github:
- [DeepMainMast Repo](https://github.com/kiharalab/DeepMainMast)


In [None]:
#@title 1. Install dependencies <a name="Dependency"></a>
#@markdown This cell downloads the GitHub repo and some large binary files, installs the Python dependencies with pip, and then downloads and compiles some libraries with GCC.<br><br>
#@markdown This cell may take about 5 mins to run

print("Downloading the github repo and large binaries ... ")
# Download git directory with source code
!git clone --quiet https://github.com/kiharalab/DeepMainMast.git &>> /content/output

# Download some big files from kiharalab website
!wget -q https://kiharalab.org/emsuites/deepmainmast/dmmbin/partial_thread.static.linuxgccrelease &>> /content/output
!wget -q https://kiharalab.org/emsuites/deepmainmast/dmmbin/rosetta_scripts.static.linuxgccrelease &>> /content/output

!mv /content/partial_thread.static.linuxgccrelease /content/DeepMainMast/dmmsinglechain/results 
!mv /content/rosetta_scripts.static.linuxgccrelease /content/DeepMainMast/dmmsinglechain/results 
print("Done downloading the github repo and large binaries\n")

print("Installing python dependencies ... ")
# Install python dependencies using pip 
!pip install -r /content/DeepMainMast/dmmsinglechain/requirements.txt &>> /content/output
!pip install nglview BioTEMPy py3Dmol==2.0.0.post2 tensorboard==2.11 tensorflow-cpu==2.6.0 &>> /content/output
print("Done installing python dependencies\n")

# Install fftw library
print("Installing the fftw library\n")
!apt-get -qq install -y libfftw3-3 libfftw3-dev libfftw3-doc &>> /content/output
print("Done installing the fftw library\n")

print("Installing gcc dependencies ... ")
# Build some libraries from scratch using gcc
%cd "/content/DeepMainMast/dmmsinglechain/src/DAQscore_Unet_src"
!rm -f *.o
!make &>> /content/output
!mv "/content/DeepMainMast/dmmsinglechain/src/DAQscore_Unet_src/DAQscore_Unet" "/content/DeepMainMast/dmmsinglechain/bin/DAQscore_Unet"

%cd "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAF2_src"
!rm -f *.o
!make &>> /content/output
!mv "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAF2_src/MainmastC_UnetAF2" "/content/DeepMainMast/dmmsinglechain/bin/MainmastC_UnetAF2"

%cd "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAssembleMtx_src"
!rm -f *.o
!make &>> /content/output
!mv "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAssembleMtx_src/MainmastC_UnetAssembleMtx" "/content/DeepMainMast/dmmsinglechain/bin/MainmastC_UnetAssembleMtx"

%cd "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetProbPoints_src"
!rm -f *.o
!make &>> /content/output
!mv "/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetProbPoints_src/MainmastC_Unet_node" "/content/DeepMainMast/dmmsinglechain/bin/MainmastC_Unet_node"

%cd "/content/DeepMainMast/dmmsinglechain/src/VESPER_Power_colab_src"
!rm -f *.o
!make &>> /content/output
!mv "/content/DeepMainMast/dmmsinglechain/src/VESPER_Power_colab_src/VESPER_Power" "/content/DeepMainMast/dmmsinglechain/bin/VESPER_Power"
print("Done installing gcc dependencies\n")

from google.colab import output
output.enable_custom_widget_manager()

Downloading the github repo and large binaries ... 
Done downloading the github repo and large binaries

Installing python dependencies ... 
Done installing python dependencies

Installing the fftw library

Done installing the fftw library

Installing gcc dependencies ... 
/content/DeepMainMast/dmmsinglechain/src/DAQscore_Unet_src
/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAF2_src
/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetAssembleMtx_src
/content/DeepMainMast/dmmsinglechain/src/MAINMAST_UnetProbPoints_src
/content/DeepMainMast/dmmsinglechain/src/VESPER_Power_colab_src
Done installing gcc dependencies
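
Because the build output is redirected to /content/output, a failed `make` can go unnoticed. The sketch below checks that the five executables moved into the `bin` folder by the cell above are present. The helper name `missing_binaries` is hypothetical; the file names and the `bin` path are taken from the `mv` commands in the install cell.

```python
import os

# Executables that the install cell moves into the bin folder.
EXPECTED_BINARIES = [
    "DAQscore_Unet",
    "MainmastC_UnetAF2",
    "MainmastC_UnetAssembleMtx",
    "MainmastC_Unet_node",
    "VESPER_Power",
]


def missing_binaries(bin_dir, names=EXPECTED_BINARIES):
    """Return the names in `names` that are not regular files in `bin_dir`."""
    return [n for n in names if not os.path.isfile(os.path.join(bin_dir, n))]


missing = missing_binaries("/content/DeepMainMast/dmmsinglechain/bin")
if missing:
    print("Missing binaries:", ", ".join(missing))
    print("Check /content/output for compiler errors")
else:
    print("All compiled binaries are in place")
```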



In [None]:
#@title 2. Input Files

#@markdown You need to upload a cryo-EM map file (.mrc or .map), a sequence file (.fasta), and optionally an AlphaFold model file (.pdb).

#@markdown Tick the checkbox below to use the default input files as an example. <br> Otherwise, untick the checkbox and run the subsequent cells to upload your files.

#@markdown Lastly, also provide a job name for running this job with the specified input.

use_author_example = True #@param {type:"boolean"}
job_name = "myjob" #@param {type:"string"}

if use_author_example:
  job_name = "6sper"
  use_AF_Model = True

In [None]:
#@title 2.1 Input EM map file

#@markdown <br> **Supported file formats: .map and .mrc**

from google.colab import files
import os
import os.path
import re
import hashlib
import random
import string

input_map = ""

if not use_author_example:
  upload_dir = f"/content/DeepMainMast/dmmsinglechain/data/{job_name}"
  # Create the job folder (and any missing parents) if it does not exist yet
  os.makedirs(upload_dir, exist_ok=True)
  os.chdir(upload_dir)
  map_input = files.upload()
  for fn in map_input.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(map_input[fn])))
    input_map = os.path.abspath(fn)
    print("Map saved in %s"%upload_dir)
  rename_path = upload_dir + "/input.map"
  !mv "$input_map" "$rename_path"
else:
  print("you have chosen to use author's example, you can not upload map files any more.")

you have chosen to use author's example, you can not upload map files any more.
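
Before running the pipeline on your own map, it can be useful to confirm that the uploaded file looks like an MRC/CCP4 map. The sketch below relies on the MRC2014 layout, in which the 1024-byte header carries the identifier `MAP ` at byte offset 208. The helper name `looks_like_mrc` is hypothetical, some older map files omit the identifier, and the example path assumes the default job name `myjob`.

```python
import os

MRC_HEADER_SIZE = 1024  # size in bytes of the fixed MRC2014 header
MAP_ID_OFFSET = 208     # byte offset of the "MAP " identifier in the header


def looks_like_mrc(path):
    """Return True if the file has a full header carrying the 'MAP ' identifier."""
    if not os.path.isfile(path) or os.path.getsize(path) < MRC_HEADER_SIZE:
        return False
    with open(path, "rb") as fh:
        header = fh.read(MRC_HEADER_SIZE)
    return header[MAP_ID_OFFSET:MAP_ID_OFFSET + 4] == b"MAP "


# Example path, assuming job_name = "myjob" and the rename to input.map above.
example_map = "/content/DeepMainMast/dmmsinglechain/data/myjob/input.map"
print("Looks like an MRC map:", looks_like_mrc(example_map))
```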


In [None]:
#@title 2.2 Input Seq.fasta file <a name="Map"></a>

#@markdown <br> **Supported file format: .fasta**

from google.colab import files
import os
import os.path
import re
import hashlib
import string

seq_file = ""

if not use_author_example:
  upload_dir = f"/content/DeepMainMast/dmmsinglechain/data/{job_name}"
  # Create the job folder (and any missing parents) if it does not exist yet
  os.makedirs(upload_dir, exist_ok=True)
  os.chdir(upload_dir)
  map_input = files.upload()
  for fn in map_input.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(map_input[fn])))
    seq_file = os.path.abspath(fn)
    print("Sequence file saved in %s"%upload_dir)
  !mv $seq_file $upload_dir"/seq.fasta"
else:
  print("you have chosen to use author's example, you can not upload sequence files any more.")

you have chosen to use author's example, you can not upload sequence files any more.
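
A malformed sequence file is easy to catch before the long-running steps start. The following is a minimal sketch of a FASTA reader; the helper name `read_fasta` is hypothetical, and the expectation that a single-chain job contains exactly one record is an assumption rather than a documented requirement. The example path assumes the default job name `myjob`.

```python
import os


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Example path, assuming job_name = "myjob" and the rename to seq.fasta above.
fasta_path = "/content/DeepMainMast/dmmsinglechain/data/myjob/seq.fasta"
if os.path.isfile(fasta_path):
    records = read_fasta(fasta_path)
    print(f"{len(records)} record(s) found")
    if len(records) != 1:
        # Assumption: a single-chain job is expected to hold one sequence.
        print("Expected exactly one sequence for a single-chain job")
```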


In [None]:
#@title 2.3 Input Alphafold model [Optional]

#@markdown Tick the checkbox below and run the cell to upload your AlphaFold .pdb file. Otherwise, untick the checkbox and run this cell so that the DeepMainMast pipeline proceeds without an AlphaFold model. <br> 

use_AF_Model = True #@param {type:"boolean"}

#@markdown <br> **Supported file format: .pdb**

af_model_file = ""

if use_AF_Model and not use_author_example:
  upload_dir = f"/content/DeepMainMast/dmmsinglechain/data/{job_name}/"
  # Create the job folder (and any missing parents) if it does not exist yet
  os.makedirs(upload_dir, exist_ok=True)
  os.chdir(upload_dir)
  map_input = files.upload()
  for fn in map_input.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(map_input[fn])))
    af_model_file = os.path.abspath(fn)

    print("AlphaFold model saved in %s"%upload_dir)
  !mv $af_model_file $upload_dir"/af2_model.pdb"
elif not use_AF_Model:
  print("You have chosen not to upload .pdb files for alphafold.")
else:
  print("You have chosen to use author's example, you can not upload .pdb files any more.")

if use_AF_Model == True:
  use_AF_Model = 1

You have chosen to use author's example, you can not upload .pdb files any more.


In [None]:
#@title 3 Configure parameters
#@markdown <br> 1. Specify the recommended contour level for your map in the contour_level variable.
#@markdown For example, if the recommended contour level is 2.5, enter 2.5 in the input field below.
contour_level = 0.02 #@param {type:"number"}

#@markdown <br> 2. Some steps in DeepMainMast take a long time to compute. Use the num_seconds variable to set the time limit, in seconds, per iteration of these compute-heavy steps. If num_seconds is too low, DeepMainMast might not work correctly. **We recommend starting with num_seconds=90 for a new map. If you still get an error, or DeepMainMast does not generate result files, it probably needs more time to process the map: increase num_seconds and run the notebook again.**
num_seconds = 40 #@param {type:"number"}

#@markdown <br> 3. Specify the number of final models (.pdb files) to generate in the /results/ranked folder. The outputs of 4 algorithms (DM only, AF only, all, Vesper) are given as input to the Rosetta step, which generates several models for each algorithm. Increasing num_models might make the top-ranked .pdb file in the ranked folder (i.e. rank1.pdb) more accurate, and decreasing it might do the opposite. However, Rosetta is the most compute-heavy step, and more final models means more computation time.
num_models = 1 #@param {type:"number"}
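
The three parameters can be sanity-checked before the long-running steps start. The sketch below is illustrative only: the helper name `check_parameters` is hypothetical, and the threshold of 90 seconds simply mirrors the recommended starting value mentioned above rather than a hard limit in DeepMainMast.

```python
import math


def check_parameters(contour_level, num_seconds, num_models, recommended_seconds=90):
    """Return a list of notices; an empty list means nothing to flag."""
    notices = []
    if not math.isfinite(contour_level):
        notices.append("contour_level must be a finite number")
    if num_seconds <= 0:
        notices.append("num_seconds must be positive")
    elif num_seconds < recommended_seconds:
        notices.append(
            f"num_seconds={num_seconds} is below the suggested starting "
            f"value of {recommended_seconds}"
        )
    if num_models < 1 or num_models != int(num_models):
        notices.append("num_models must be a positive whole number")
    return notices


# Values shown here are the defaults from the form fields above.
for notice in check_parameters(contour_level=0.02, num_seconds=40, num_models=1):
    print("Notice:", notice)
```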


In [None]:
#@title 4. Execute DeepMainMast (Single Chain)

%cd "/content/DeepMainMast/dmmsinglechain/"

!chmod +x *
!chmod -R 777 /content/DeepMainMast/

print("Running Single chain DeepMainMast ...")
!./run_dmm_colab.sh $job_name $num_seconds $contour_level $use_AF_Model &>> /content/output
print("Done running Single chain DeepMainMast\n")

/content/DeepMainMast/dmmsinglechain
Running Single chain DeepMainMast ...
Done running Single chain DeepMainMast
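
All shell output in this notebook is appended to /content/output, so errors do not appear in the cell itself. If a step produces no result files, the last lines of that log are usually the first place to look. A minimal sketch follows; the helper name `tail` is hypothetical.

```python
import os
from collections import deque


def tail(path, n=20):
    """Return the last `n` lines of a text file, or [] if the file is absent."""
    if not os.path.isfile(path):
        return []
    with open(path, errors="replace") as fh:
        # deque with maxlen keeps only the final n lines while streaming
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


for line in tail("/content/output", n=20):
    print(line)
```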



In [None]:
#@title 5. Visualize pre rosetta

from google.colab import output
output.enable_custom_widget_manager()

#@markdown <b>Issue with Rosetta</b> <br>
#@markdown RosettaCM often distorts a structure model, which tends to happen when the map resolution is low. For this reason, we also provide the models from before the application of RosettaCM in the /content/DeepMainMast/dmmsinglechain/results/\<your job name\>/pre_rosetta_ranked folder.

rank = "1" #@param {type:"string"}

pdb_file = f"/content/DeepMainMast/dmmsinglechain/results/{job_name}/pre_rosetta_ranked/rank{rank}.pdb"

from Bio.PDB import PDBParser
from Bio.PDB.Polypeptide import PPBuilder
import nglview as nv

parser = PDBParser()
structure = parser.get_structure('my_protein', pdb_file)

# create an NGLViewer widget and add the structure to it
view = nv.show_biopython(structure)

# display the widget
view



NGLWidget()
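
Alongside the 3D view, a rough numeric check is to count how many residues the pre-Rosetta model contains and compare that with the length of the input sequence. The sketch below reads the fixed-column PDB format directly (atom name in columns 13-16, chain in column 22, residue number in columns 23-26). The helper name `count_modeled_residues` is hypothetical, and the example path assumes the default job name `myjob` and rank `1`.

```python
import os


def count_modeled_residues(pdb_path):
    """Count distinct residues that have a CA atom in ATOM records."""
    residues = set()
    with open(pdb_path) as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                # Key: chain ID, residue number, insertion code
                residues.add((line[21:22], line[22:26], line[26:27]))
    return len(residues)


# Example path, assuming job_name = "myjob" and rank = "1".
model = "/content/DeepMainMast/dmmsinglechain/results/myjob/pre_rosetta_ranked/rank1.pdb"
if os.path.isfile(model):
    print("Residues with a CA atom:", count_modeled_residues(model))
```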

In [None]:
#@title 6. Execute Rosetta

print("Running Rosetta ...")
!./run_rosetta_colab.sh $job_name $num_models &>> /content/output
print("Done running Rosetta\n")

In [None]:
#@title 7. Scoring and Ranking

print("Running scoring and ranking ...")
!./run_rank_colab.sh $job_name $num_models &>> /content/output
print("Done running scoring and ranking\n")


Running scoring and ranking ...
Done running scoring and ranking



## Note

Once the scoring and ranking step is complete, you can download the final predicted structures (.pdb files) for your input map from the /content/DeepMainMast/dmmsinglechain/results/\<your_job_name\>/ranked/ folder. Depending on the number of models set in the configuration, DeepMainMast generates several files and ranks them according to DAQ and DOT scores (rank0 being the most accurate, rank1 being second best and so on).
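
To see which ranked files were actually produced, the folder can be listed in rank order. This is a minimal sketch: the helper name `list_ranked_models` is hypothetical, it assumes the files follow the `rank<N>.pdb` naming used by the visualization cells, and the example path assumes the default job name `myjob`.

```python
import glob
import os
import re


def list_ranked_models(ranked_dir):
    """Return (rank, path) pairs for files named rank<N>.pdb, sorted by N."""
    found = []
    for path in glob.glob(os.path.join(ranked_dir, "rank*.pdb")):
        match = re.fullmatch(r"rank(\d+)\.pdb", os.path.basename(path))
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found)


# Example path, assuming job_name = "myjob".
ranked_dir = "/content/DeepMainMast/dmmsinglechain/results/myjob/ranked"
for rank, path in list_ranked_models(ranked_dir):
    print(rank, path)
```

In Colab, each listed path can then be passed to `files.download` from `google.colab` to save the model locally.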

In [None]:
#@title 8. Visualization

import py3Dmol
import glob
import matplotlib.pyplot as plt

#@markdown You can visualize the predicted .pdb files from the /content/DeepMainMast/dmmsinglechain/results/\<your_job_name\>/ranked/ folder.

rank = "1" #@param {type:"string"}
# pdb_filename = "S_singletgt_0001.pdb" #@param ["S_singletgt_0001.pdb", "S_singletgt_0002.pdb", "S_singletgt_0003.pdb", "S_singletgt_0004.pdb", "S_singletgt_0005.pdb"]
show_sidechains = True #@param {type:"boolean"}
show_mainchains = True #@param {type:"boolean"}

pdb_file = f"/content/DeepMainMast/dmmsinglechain/results/{job_name}/ranked/rank{rank}.pdb"
# pdb_file = f"/content/rank1.pdb"
def show_pdb(show_sidechains=False, show_mainchains=False):
  view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js')
  # Read the model inside a context manager so the file handle is closed
  with open(pdb_file, 'r') as fh:
    view.addModel(fh.read(), 'pdb')
  view.setStyle({'cartoon': {'color':'spectrum'}})
  if show_sidechains:
    # Side-chain atoms: everything except the backbone C, O and N
    BB = ['C','O','N']
    view.addStyle({'and':[{'resn':["GLY","PRO"],'invert':True},{'atom':BB,'invert':True}]},
                        {'stick':{'colorscheme':"WhiteCarbon",'radius':0.3}})
    view.addStyle({'and':[{'resn':"GLY"},{'atom':'CA'}]},
                        {'sphere':{'colorscheme':"WhiteCarbon",'radius':0.3}})
    view.addStyle({'and':[{'resn':"PRO"},{'atom':['C','O'],'invert':True}]},
                        {'stick':{'colorscheme':"WhiteCarbon",'radius':0.3}})
  if show_mainchains:
    # Backbone atoms drawn as sticks
    BB = ['C','O','N','CA']
    view.addStyle({'atom':BB},{'stick':{'colorscheme':"WhiteCarbon",'radius':0.3}})
  view.zoomTo()
  return view

show_pdb(show_sidechains, show_mainchains).show()