<a href="https://colab.research.google.com/github/phenix-project/Colabs/blob/main/alphafold2/AlphaFoldWithDensityMap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### <center> <b> <font color='black'>  AlphaFold with a density map </font></b> </center>

<font color='green'>This notebook integrates Phenix model rebuilding with AlphaFold to improve AlphaFold modeling.  You upload a sequence and a density map (ccp4/mrc format) and it carries out cycles of AlphaFold modeling, rebuilding with the density map, and AlphaFold modeling with the rebuilt model as a template. In each cycle you get a new AlphaFold model and a rebuilt model.

To understand how this all works see the Phenix tutorial video ["AlphaFold changes everything"](https://youtu.be/9IExeA_A8Xs) and the [BioRxiv preprint](https://www.biorxiv.org/content/10.1101/2022.01.07.475350v2) on using AlphaFold with a density map.

You can run a demo of any one of 25 structures by selecting one in the second cell.  You then only need to select the demo and enter the Phenix download password.

This notebook is derived from [ColabFold](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) and the DeepMind [AlphaFold2 Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb).
</font>

-----------------
<b> <font color='black'> <center>Instructions for a simple run:</center>
</font></b> 

1. Run the first cell to install condacolab and reboot the virtual machine. You need to do this <b><i>before</i></b> using <b><i>Run all</i></b> in step 3.

2.  Select the "Basic inputs" cell and type a sequence, resolution, jobname, and Phenix download password into its form. You can also edit the Options in the next cell if you want.

3. Start your run by going up to the <b><i>Runtime</i></b> pulldown menu and selecting <b><i>Run all</i></b>

4. Scroll down the page and follow what is going on.  If necessary, upload your map file when the Upload button appears below the "Setting up input files" form. If you use Google drive
for your input and output files you will be asked for permission.

5. See the helpful hints at the bottom of the page for more details and advanced notes.


</font>
-----------------
<b> <font color='black'> <center>Please cite the ColabFold and AlphaFold2 papers if you use this notebook:</center>
</font></b> 

- <font color='green'>[Mirdita, M., Ovchinnikov, S., Steinegger, M. (2021). ColabFold - Making protein folding accessible to all. *bioRxiv*, 2021.08.15.456425](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v2)</font> 

- <font color='green'> [Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)](https://www.nature.com/articles/s41586-021-03819-2)
</font>
-----------------


In [None]:
#@title 1. Hit the triangle <b>Run</b> button to the left to install condacolab and reboot the virtual machine.  

#@markdown  You can edit the forms below while it is rebooting.

#@markdown After about 30 seconds you will see 3 messages about a crash (caused by the reboot).
#@markdown Close the last one and you are ready to go with <b><i>Runtime</i></b> / <b><i>Run all</i></b>

# https://github.com/conda-incubator/condacolab

# Get the helper python files
import os
os.chdir("/content/")
file_name = 'phenix_colab_utils.py'
if os.path.isfile(file_name):
  os.remove(file_name)
os.environ['file_name'] = file_name
result = os.system("wget -qnc https://raw.githubusercontent.com/phenix-project/Colabs/main/alphafold2/$file_name")

print("About to install condacolab...ignore crash messages")
import phenix_colab_utils as cu
cu.get_helper_files()  # get all the other helper files
cu.clear_python_caches()
cu.install_condacolab()
!touch STEP_1
print("Ready with condacolab installed...close the crash message")

In [None]:
#@title 2. Basic inputs (Required)
#@markdown Select this cell, then enter sequence of chain to predict (at least 20 residues), resolution, name of this job, and Phenix download password 

import os
os.chdir("/content")  # the STEP_1 marker and the helper files live here

# The helper module is only available after step 1, so check before importing
if not os.path.isfile("STEP_1"):
  raise AssertionError("Please run step 1 first")

from phenix_colab_utils import exit, get_map_name, get_demo_info, set_up_demo

sequence = 'TNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAP GQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPL QSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPK' #@param {type:"string"}
resolution = 3.44 #@param {type:"number"}
jobname = '7lx5' #@param {type:"string"}
phenix_download_password='' #@param {type:"string"}
query_sequence = sequence
password = phenix_download_password

#@markdown You can run a demo if you want. Select the structure and just put in the download password. 

demo_to_run = "None (Demos with boxed maps, approx timings are for Colab; Pro is 2x faster, Pro+ 3x)" #@param ['None (Demos with boxed maps, approx timings are for Colab; Pro is 2x faster, Pro+ 3x)','7mjs (EMDB 23883, 3.03 A, 132 residues, 2 hours)',  '7kzz (EMDB 23093, 3.42 A, 281 residues, 3 hours)',  '7m9c (EMDB 23723, 4.2 A, 257 residues, 3 hours)',  '7lx5 (EMDB 23566, 3.44 A, 196 residues, 4 hours)',  '7brm (EMDB 30160, 3.6 A, 257 residues, 5 hours)',  '7m7b (EMDB 23709, 2.95 A, 209 residues, 6 hours)',  '7c2k (EMDB 30275, 2.93 A, 927 residues, 10 hours, requires Colab Pro)',  '7mlz (EMDB 23914, 3.71 A, 196 residues)',  '7ev9 (EMDB 31325, 2.6 A, 382 residues)',  '7lc6 (EMDB 23269, 3.7 A, 557 residues)',  '7lci (EMDB 23274, 2.9 A, 393 residues)',  '7bxt (EMDB 30237, 4.2 A, 103 residues)',  '7eda (EMDB 31062, 2.78 A, 334 residues)',  '7ku7 (EMDB 23035, 3.4 A, 269 residues)',  '7l1k (EMDB 23110, 3.16 A, 149 residues)',  '7l6u (EMDB 23208, 3.3 A, 311 residues)',  '7ls5 (EMDB 23502, 2.74 A, 243 residues)',  '7lsx (EMDB 23508, 3.61 A, 245 residues)',  '7lv9 (EMDB 23530, 4.5 A, 97 residues)',  '7lvr (EMDB 23541, 2.9 A, 441 residues)',  '7mby (EMDB 23750, 2.44 A, 339 residues)',  '7me0 (EMDB 23786, 2.48 A, 347 residues)',  '7msw (EMDB 23970, 3.76 A, 635 residues)',  '7n8i (EMDB 24237, 3 A, 106 residues)',  '7rb9 (EMDB 24400, 3.76 A, 372 residues)'] {type:"string"}
# The default choice is "None" followed by a note, so test the prefix
if not demo_to_run.startswith("None"):
  jobname, query_sequence,resolution = set_up_demo(demo_to_run)
  is_demo = True
else:
  is_demo = False

# Check for required inputs
if not password:
  exit("Please supply a Phenix download password")
if not query_sequence:
  exit("Please supply a query sequence")
if not resolution:
  exit("Please supply a resolution")
if not jobname:
  exit("Please supply a job name")

# Save all parameters in a dictionary
params = {}
for p in ['resolution','jobname', 'password', 'query_sequence']:
  params[p] = locals().get(p,None)
! touch STEP_2
! rm -f STEP_3


In [None]:
#@title 3. Options (Run without changes for a simple job)

import os
from phenix_colab_utils import exit
if not os.path.isfile("STEP_2"):
  exit("Please run step 2 first...")

#@markdown Check if you want your outputs saved to the directory <b>ColabOutputs</b> on Google drive
save_outputs_in_google_drive = True #@param {type:"boolean" }

#@markdown If your maps and models are already uploaded, fill in the name of the directory containing just these files here
#@markdown (usually <b>ColabInputs</b>). Omit path prefixes such as /content/ or MyDrive/. Leave blank to upload directly.
input_directory = "ColabInputs" #@param {type:"string"}
if is_demo and input_directory != "ColabInputs":
  exit("For a demo the input_directory must be ColabInputs")

#@markdown Choose what templates to include (those from the PDB are based on sequence 
#@markdown similarity):
include_templates_from_pdb = False #@param {type:"boolean" }
maximum_templates_from_pdb = 20 #@param {type:"integer"}
upload_manual_templates = False #@param {type:"boolean" }

#@markdown Specify whether any uploaded templates have the correct sequence 
#@markdown (if not checked, they are used only as suggestions for rebuilding and not as AlphaFold templates)
uploaded_templates_have_exact_sequence = True #@param {type:"boolean" }
uploaded_templates_are_map_to_model = (not uploaded_templates_have_exact_sequence)


#@markdown Cycles to run:
maximum_cycles = 10 #@param {type:"integer"}

#@markdown Version of Phenix to use:
phenix_version ='dev-4517' #@param {type:"string"}
version = phenix_version  # rename variable

#@markdown Specify if you want to run a series of jobs by uploading a file with one jobname, resolution, and sequence per line
upload_file_with_jobname_resolution_sequence_lines = False #@param {type:"boolean"}
if is_demo and upload_file_with_jobname_resolution_sequence_lines:
  exit("For a demo upload_file_with_jobname_resolution_sequence_lines must be False")

#@markdown Specify how to use multiple sequence alignment information
msa_use = 'Use MSA throughout' #@param ["Use MSA throughout", "Use MSA in first cycle","Skip all MSA"]
# Set actual parameters

#@markdown Specify the maximum number of randomizations to carry out:
random_seed_iterations = 50 #@param {type:"integer"}
random_seed = 581867 #@param {type:"integer"}

#@markdown Turn on debugging
debug = False #@param {type:"boolean"}

#@markdown Carry on from where you left off (requires using Google drive)
carry_on = False #@param {type: "boolean"}

# We are going to get these from uploaded file...
if upload_file_with_jobname_resolution_sequence_lines:
  params['jobname'] = None
  params['resolution'] = None
  params['sequence'] = None

if msa_use == "Use MSA throughout":
  skip_all_msa = False 
  skip_all_msa_after_first_cycle = False
elif msa_use == "Use MSA in first cycle":
  skip_all_msa = False 
  skip_all_msa_after_first_cycle = True
else:
  skip_all_msa = True 
  skip_all_msa_after_first_cycle = True

upload_maps = True  # This version expects a map
use_msa = (not skip_all_msa)

minimum_random_seed_iterations = int(max(1,random_seed_iterations//20))
data_dir = "/content"
content_dir = "/content"
# Save parameters
for p in ['content_dir','data_dir','save_outputs_in_google_drive',
    'input_directory','working_directory',
    'include_templates_from_pdb','maximum_templates_from_pdb',
    'upload_manual_templates','uploaded_templates_are_map_to_model',
    'maximum_cycles','version',
    'upload_file_with_jobname_resolution_sequence_lines',
    'use_msa','skip_all_msa_after_first_cycle',
    'upload_maps','debug','carry_on','random_seed',
    'random_seed_iterations','minimum_random_seed_iterations']:
  params[p] = locals().get(p,None)
!touch STEP_3

In [None]:
#@title 4. Setting up input files...
#@markdown You will be asked for permission to use your Google drive if needed.

#@markdown The upload button will appear below this cell if needed

import os
if not os.path.isfile("STEP_3"):
  from phenix_colab_utils import exit
  exit("Please run steps 2-3 again before rerunning step 4...")

# Set up the inputs using the helper python files
from phenix_alphafold_utils import set_up_input_files
params = set_up_input_files(params, convert_to_params = False)
! touch STEP_4
! rm -f STEP_2 STEP_3


In [None]:
#@title 5. Installing Phenix, AlphaFold and utilities...
#@markdown This step takes 8 minutes

import os
from phenix_colab_utils import exit

if not os.path.isfile("STEP_4"):
  exit("Please run steps 1-4 first...")

import phenix_colab_utils as cu

# Get tensorflow import before installation
if not locals().get('tf'):
  tf = cu.import_tensorflow()

# Install selected software
cu.install_software(
  bioconda = True,
  phenix = True,
  phenix_version = params.get('version'),
  phenix_password = params.get('password'),
  alphafold = True,
  pdb_to_cif = True,
)
!touch STEP_5

In [None]:
#@title 6. Creating AlphaFold models

import os

from phenix_colab_utils import exit

if not os.path.isfile("STEP_4"):
  exit("Please run steps 2-4 again before rerunning this step...")

if not os.path.isfile("STEP_5"):
  exit("Please run step 5 first...")

! rm -f STEP_2 STEP_3 STEP_4

# Convert params from dict to alphafold_with_density_map params
from phenix_alphafold_utils import get_alphafold_with_density_map_params
params = get_alphafold_with_density_map_params(params)

from run_alphafold_with_density_map import run_jobs

# Working directory
os.chdir(params.content_dir)
run_jobs(params)



In [None]:
#@title Utilities (skipped unless checked)

# Put whatever utilities you want here. They will be run if checked
clear_caches = False #@param {type:"boolean" }
if clear_caches:
  from phenix_colab_utils import clear_python_caches
  clear_python_caches(modules = ['run_alphafold_with_density_map3','run_job','rebuild_model','install_phenix','run_fix_paths','runsh','mk_mock_template','mk_template','hh_process_seq','run_job','get_template_hit_list','run_alphafold_with_density_map','get_template_hit_list','get_cif_file_list','alphafold_utils','get_msa','get_templates_from_drive','phenix_alphafold_utils','phenix_colab_utils','clear_python_caches'])
  clear_python_caches()


crash_deliberately_and_restart = False #@param {type:"boolean" }
if crash_deliberately_and_restart:
  print("Crashing by using all memory.  Results in restart, losing everything")
  [1]*10**10

upload_helper_files = False #@param {type:"boolean" }
def get_helper_files():
  import os
  for file_name in ['phenix_colab_utils.py',
      'alphafold_utils.py','run_alphafold_with_density_map.py','phenix_alphafold_utils.py']:
    if os.path.isfile(file_name):
      os.remove(file_name)
    os.environ['file_name'] = file_name
    result = os.system("wget -qnc https://raw.githubusercontent.com/phenix-project/Colabs/main/alphafold2/$file_name")
if upload_helper_files:
  get_helper_files()

remove_everything_and_restart = False #@param {type:"boolean" }
if remove_everything_and_restart:
  !kill -9 -1

auto_reload = False #@param {type:"boolean" }
if auto_reload:
  %load_ext autoreload
  %autoreload 2

**Helpful hints**

**Password**
* Your Phenix download password is the password you get from <a href="https://phenix-online.org/download" target="_blank">phenix-online.org/download</a> and that you (or someone from your institution) used to download Phenix. It is updated weekly, so you may need to request a new one fairly often.

**Saving your results**

* You might want to download your results as they appear.  Go to the Folder icon on the left, click on the 3 dots to the right of your file and select "Download".

* If you specify a Google drive input_directory (maybe "ColabInputs"), then your output files will be saved as they are created in a directory called ColabOutputs in your Google drive.

**Carrying on after a timeout or crash**

* If you save your results in your Google drive folder <b>ColabOutputs</b> by specifying a Google drive input_directory, you can continue on after a crash.  You set up the inputs just as you did on the initial run, but check the <b>carry_on</b> box. You then (usually) go through the whole process again (reboot the virtual machine, then <b>Run all</b>).  The notebook will look in your <b>ColabOutputs</b> directory for the files that it is going to create...if it finds them there it will use them instead of creating them again.  If you are lucky you may be able to restart without rebooting...you can try by just selecting <b>Run all</b> again and if it runs you are ok.
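
The reuse-if-present behaviour described above can be pictured with a minimal sketch. This is not the notebook's own code: `get_or_create` and `fake_prediction` are hypothetical names, and temporary folders stand in for <b>ColabOutputs</b> and the working directory.

```python
import os
import shutil
import tempfile
from typing import Callable

def get_or_create(file_name: str, output_dir: str, work_dir: str,
                  create: Callable[[str], None], carry_on: bool = True) -> str:
    """Return the path to file_name in work_dir, reusing a saved copy if present."""
    saved = os.path.join(output_dir, file_name)
    target = os.path.join(work_dir, file_name)
    if carry_on and os.path.isfile(saved):
        shutil.copyfile(saved, target)   # reuse the earlier result
    else:
        create(target)                   # expensive step, e.g. a prediction
        os.makedirs(output_dir, exist_ok=True)
        shutil.copyfile(target, saved)   # save so that a later run can carry on
    return target

# Demonstration: the second call finds the saved file and skips the work.
created = []

def fake_prediction(path: str) -> None:
    created.append(path)
    with open(path, "w") as handle:
        handle.write("MODEL")

with tempfile.TemporaryDirectory() as outputs, tempfile.TemporaryDirectory() as work:
    get_or_create("joba_ALPHAFOLD_1.pdb", outputs, work, fake_prediction)
    os.remove(os.path.join(work, "joba_ALPHAFOLD_1.pdb"))  # simulate a restart
    get_or_create("joba_ALPHAFOLD_1.pdb", outputs, work, fake_prediction)

print(len(created))  # number of times the expensive step actually ran
```

The point of the design is that the saved copy, not the virtual machine, is the source of truth, so a rebooted session can skip work that already finished.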

**Sequence format**

* Your sequence should contain only the 1-letter code of one protein chain. It can contain spaces if you want.
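
If you want to check a sequence before pasting it into the form, a small helper along these lines may be useful. It is only a sketch: `clean_sequence` is a hypothetical name, and restricting the input to the 20 standard amino acids is an assumption (the notebook itself may accept other letters). The 20-residue minimum comes from the Basic inputs form.

```python
# Hypothetical helper, not part of the notebook.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard amino acids only
MIN_LENGTH = 20  # the Basic inputs form asks for at least 20 residues

def clean_sequence(raw: str) -> str:
    """Remove whitespace, upper-case, and validate a 1-letter-code sequence."""
    seq = "".join(raw.split()).upper()
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"Unexpected characters in sequence: {''.join(bad)}")
    if len(seq) < MIN_LENGTH:
        raise ValueError(f"Sequence has {len(seq)} residues; need at least {MIN_LENGTH}")
    return seq

print(clean_sequence("TNLCPFGEVF NATRFASVYA WNRKRISNCV"))
```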

**File names and jobname must match**
* Your AlphaFold predictions will be named yyyy_ALPHAFOLD_x.pdb
and your rebuilt models yyyy_REBUILT_x.pdb, where yyyy is your jobname and x is the cycle number.

* All model file names must start with 4 characters, optionally followed by "_" and more characters, and must end in ".pdb" or ".cif".  Valid file names are abcd.pdb, abcd.cif, abcd_other.pdb.  Invalid names are abc.pdb, abcde.cif.

* Your jobname must match the beginnings of your map file names and model file names.  If your jobname is joba then your map file name must look like: joba_xxx.mrc or joba_yyy.ccp4.  Your model file name must look like: joba_mymodel.pdb or joba.cif.  This correspondence is used to match map and model files with jobnames.
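
The naming rules above can be checked before uploading with a short sketch like the following. The function names are hypothetical, and the assumption that the 4 leading characters are letters or digits is mine; the rules above do not say which characters are allowed.

```python
import re

# Assumption: the 4 leading characters are letters or digits.
MODEL_NAME = re.compile(r"^[A-Za-z0-9]{4}(_[^/]*)?\.(pdb|cif)$")

def is_valid_model_name(file_name: str) -> bool:
    """True for names like abcd.pdb, abcd.cif, abcd_other.pdb."""
    return MODEL_NAME.match(file_name) is not None

def matches_jobname(file_name: str, jobname: str) -> bool:
    """True if the file stem is the jobname, or the jobname followed by '_'."""
    stem = file_name.rsplit(".", 1)[0]
    return stem == jobname or stem.startswith(jobname + "_")

for name in ["abcd.pdb", "abcd.cif", "abcd_other.pdb", "abc.pdb", "abcde.cif"]:
    print(name, is_valid_model_name(name))
print(matches_jobname("joba_xxx.mrc", "joba"), matches_jobname("jobb.cif", "joba"))
```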

**Options for uploading your map file**

* (A) Upload when the Upload button appears at the bottom of the cell after you hit Runtime / Run all in step 3
* (B) Upload in advance to a unique folder in your Google Drive and specify this directory in the entry form.
* (C) as in B but upload to a unique new folder in /content/.  Note that C requires using the command-line tool at the bottom left of the page to create a new directory like MyFiles, uploading with the upload button near the top left of the page, and moving the uploaded file from /content/my_file.mrc to /content/MyFiles/my_file.mrc.
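
For option C, the folder creation and file move can also be done from a code cell instead of the command-line tool. The sketch below is one way to do it with the Python standard library; `move_into_folder` is a hypothetical helper, and the demonstration uses a temporary directory in place of /content/.

```python
import os
import shutil
import tempfile

def move_into_folder(file_path: str, folder: str) -> str:
    """Create folder if needed and move file_path into it; return the new path."""
    os.makedirs(folder, exist_ok=True)
    destination = os.path.join(folder, os.path.basename(file_path))
    shutil.move(file_path, destination)
    return destination

# In Colab the call would look like:
#   move_into_folder("/content/my_file.mrc", "/content/MyFiles")
with tempfile.TemporaryDirectory() as content:
    uploaded = os.path.join(content, "my_file.mrc")
    open(uploaded, "wb").close()  # empty stand-in for an uploaded map
    moved = move_into_folder(uploaded, os.path.join(content, "MyFiles"))
    moved_ok = os.path.isfile(moved) and not os.path.exists(uploaded)
print(moved_ok)
```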

**Uploading a file with all your file information**

* To upload a file with a jobname, resolution, and sequence on each line, check ***upload_file_with_jobname_resolution_sequence_lines*** in the Options cell before you start the run.

* If you upload a file with multiple sequences, each line of the file should contain exactly one jobname, one resolution, and one sequence, separated by spaces, like this:

7n8i_24237 2.3 VIWMTQSPSSLSASVGDRVTITCQASQDIRFYLNWYQQKPGKAPKLLISDASNMETGVPSRFSGS

7lvr_23541 3 MRECISIHVGQAGVQIGNACWELYCLEHGIQPDGQMPSDKTIGGGDDSFNTFFSETG
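
To check such a file before uploading it, you could parse it with a sketch like this one. `Job` and `parse_job_lines` are hypothetical names, not the notebook's own parser, and tolerating blank lines and spaces inside the sequence is an assumption.

```python
from typing import List, NamedTuple

class Job(NamedTuple):
    jobname: str
    resolution: float
    sequence: str

def parse_job_lines(text: str) -> List[Job]:
    """Parse lines of the form '<jobname> <resolution> <sequence>'."""
    jobs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines between entries are ignored
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(
                f"Line {line_number}: expected jobname, resolution, and sequence")
        jobname, resolution, sequence = parts
        # float() raises ValueError if the resolution is not a number
        jobs.append(Job(jobname, float(resolution), "".join(sequence.split()).upper()))
    return jobs

for job in parse_job_lines("7n8i_24237 2.3 VIWMTQSPSSLSASVGDRV\n7lvr_23541 3 MRECISIHVGQAGVQIG\n"):
    print(job.jobname, job.resolution, len(job.sequence))
```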

**Randomized tries on first cycle**

* You can specify how many AlphaFold models to try and build at the start (50 may be a good number unless you have a big structure). Models are scored by pLDDT and the highest-scoring one is kept.  If the models created so far all have similar pLDDT, the randomization step is stopped early and the best one found is used.
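
The selection logic can be pictured with the sketch below. It is an illustration under stated assumptions, not the Phenix implementation: `pick_best_seed` and `score_model` are hypothetical, trying consecutive seeds is an assumption, and so is the stopping rule (all scores so far within `tolerance` of each other). Only the formula for the minimum number of iterations is taken from the Options cell.

```python
from typing import Callable, Tuple

def pick_best_seed(score_model: Callable[[int], float],
                   first_seed: int,
                   max_iterations: int,
                   tolerance: float = 1.0) -> Tuple[int, float, int]:
    """Return (best_seed, best_score, iterations_run).

    score_model(seed) stands in for "run AlphaFold with this seed and
    return the mean pLDDT of the resulting model".
    """
    # Same formula the Options cell uses for minimum_random_seed_iterations
    min_iterations = max(1, max_iterations // 20)
    scores = []
    best_seed, best_score = first_seed, float("-inf")
    for i in range(max_iterations):
        seed = first_seed + i          # assumption: consecutive seeds
        score = score_model(seed)
        scores.append(score)
        if score > best_score:
            best_seed, best_score = seed, score
        # Assumed stopping rule: at least two models, and all scores similar
        enough = len(scores) >= max(2, min_iterations)
        if enough and max(scores) - min(scores) <= tolerance:
            break
    return best_seed, best_score, len(scores)

# Toy scoring function in place of a real prediction
print(pick_best_seed(lambda seed: 80.0, first_seed=581867, max_iterations=50))
```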

**Try turning off MSAs after first cycle**

* You can encourage AlphaFold to use your rebuilt templates by choosing "Use MSA in first cycle" for <b>msa_use</b> in the Options cell (this sets skip_all_msa_after_first_cycle). AlphaFold then uses only your template information and its own intrinsic structural information for all cycles except the first.

* Reproducibility: The tensorflow and AlphaFold2 code will give different results depending on the GPU that is used. You can see what GPU you have by opening a cell with the '+Code' button, typing '! nvidia-smi', and running that cell. The GPU type will be listed (like Tesla V100-SXM2). You usually get a more powerful GPU with Colab Pro or Pro+ than with the free version.
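
If you would rather record the GPU type from Python (for example to note it alongside your results), the following sketch calls the same nvidia-smi tool. `gpu_summary` is a hypothetical helper; it returns None when nvidia-smi is not available, for example on a CPU-only runtime.

```python
import shutil
import subprocess
from typing import Optional

def gpu_summary() -> Optional[str]:
    """Return 'name, total memory' for each visible GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return result.stdout.strip()

print(gpu_summary())
```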

**Running cells in this Colab notebook**
* You can step through this notebook one part at a time
by hitting the ***Run*** buttons to the left one at a time. 

* The cell that is active is indicated by a ***Run*** button that has turned into a black circle with a moving black arc

* When execution is done, the ***Run*** button will go back 
to its original white triangle inside a black circle

* You can stop execution of the active cell by hitting its ***Run*** button. It will turn red to indicate it has stopped.

* You can rerun any cell any time that nothing is running.  That means you can go all the way through, then go back to the first cell and enter another sequence and redo the procedure.

* If something goes wrong, the Colab Notebook will print out
an error message.  Usually this will be something telling you
how to change your inputs.  You enter your new inputs and
hit the ***Run*** button again to carry on.

**Possible problems**

* The automatic download may not always work. Normally the file download starts when the .zip files are created, but the actual download happens when all the AlphaFold models are completed. You can click on the folder icon to the left of the window and download your jobname.zip file manually.  Open and close the file browser to show recently-added files.

* Your Colab connection may time out if you go away and
leave it, or if you run for a long time (more than an hour).
If your connection times out you lose everything that
is not yet downloaded. So you might want to download as you go or specify a Google drive input directory.

* The zip file or files will not be automatically downloaded until the very end of the job. 

* Google Colab assigns different types of GPUs with varying amounts of memory. Some might not have enough memory to predict the structure for a long sequence.  


**Result zip file contents**

1. AlphaFold prediction for each cycle
2. Rebuilt model for each cycle
3. PAE matrix (.jsn) for each cycle
4. PAE and pLDDT figures (.png) for each cycle
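
After downloading a result zip file, you may want to see which models it holds for each cycle. The sketch below groups members by cycle, relying on the yyyy_ALPHAFOLD_x.pdb and yyyy_REBUILT_x.pdb naming described in the file-name hints. `models_by_cycle` is a hypothetical helper, and the demonstration uses an in-memory zip with made-up member names.

```python
import io
import re
import zipfile
from collections import defaultdict
from typing import Dict

MODEL = re.compile(r"_(ALPHAFOLD|REBUILT)_(\d+)\.pdb$")

def models_by_cycle(zip_file) -> Dict[int, Dict[str, str]]:
    """Map cycle number -> {"ALPHAFOLD": name, "REBUILT": name} for a zip."""
    cycles = defaultdict(dict)
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            match = MODEL.search(name)
            if match:
                kind, cycle = match.group(1), int(match.group(2))
                cycles[cycle][kind] = name
    return dict(sorted(cycles.items()))

# Demonstration on an in-memory zip; a real call would pass "jobname.zip"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in ["joba_ALPHAFOLD_1.pdb", "joba_REBUILT_1.pdb", "joba_ALPHAFOLD_2.pdb"]:
        archive.writestr(member, "")
buffer.seek(0)
demo = models_by_cycle(buffer)
print(demo)
```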

**Colab limitations**
* While Colab is free, it is designed for interactive work, and memory and GPU usage are limited. It will time out after a few hours and it may check that you are not a robot at random times.  On a time-out you may lose your work. You can increase your allowed time with Colab Pro or Pro+.

* AlphaFold can crash if it requires too much memory. On a crash you may lose all your work that is not yet downloaded. You can have more memory accessible if you have Colab Pro or Pro+. If you are familiar with Colab scripts you can try this [hack](https://towardsdatascience.com/double-your-google-colab-ram-in-10-seconds-using-these-10-characters-efa636e646ff) with the <b>crash_deliberately_and_restart</b> check-off in the Utilities section to increase your memory allowance.


**Description of the plots**

*   **Number of sequences per position** - For best performance, look for at least 30 sequences per position, and ideally 100.
*   **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better.
*   **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.
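
If you want the confidence values as numbers rather than a plot, they can usually be read from the model file. The sketch below assumes the prediction keeps AlphaFold's convention of storing pLDDT in the B-factor field of each ATOM record; whether every file written by this notebook does so is an assumption. The function names and the cutoff of 70 are illustrative choices.

```python
from typing import Dict, List

def plddt_per_residue(pdb_text: str) -> List[float]:
    """Read one value per residue from the B-factor field of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # PDB format: atom name in columns 13-16, B-factor in columns 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values

def summarize_plddt(values: List[float], cutoff: float = 70.0) -> Dict[str, float]:
    """Mean pLDDT and the fraction of residues below cutoff."""
    if not values:
        raise ValueError("No CA atoms found")
    low = sum(1 for value in values if value < cutoff)
    return {"mean": sum(values) / len(values), "fraction_low": low / len(values)}
```

A typical use would be `summarize_plddt(plddt_per_residue(open("joba_ALPHAFOLD_1.pdb").read()))`, where the file name is only an example following the naming pattern for a jobname of joba.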

**Updates**

- <b> <font color='green'>2022-01-25 Includes integrated rebuilding and AlphaFold2 modeling</font></b>


**Acknowledgments**

- <b> <font color='green'>This notebook is based on the very nice notebook from ColabFold ([Mirdita et al., *bioRxiv*, 2021](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1), https://github.com/sokrypton/ColabFold)</font></b> 

- <b><font color='green'>ColabFold is based on AlphaFold2 [(Jumper et al. 2021)](https://www.nature.com/articles/s41586-021-03819-2)
</font></b>