diff --git a/tutorials/AlphaFold2/AlphaFold2.ipynb b/tutorials/AlphaFold2/AlphaFold2.ipynb new file mode 100644 index 0000000..8610db0 --- /dev/null +++ b/tutorials/AlphaFold2/AlphaFold2.ipynb @@ -0,0 +1,2174 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Notebooks Hub AlphaFold2 Tutorial Using Jupyter Widgets\n", + "This notebook is adapted from the original [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) to serve as an example for Notebooks Hub using interactive Jupyter widgets." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Here are some helpful links to learn more about using AlphaFold and its impact in computational biology. \n", + "  \\- AlphaFold Protein Structure Database [FAQ](https://alphafold.ebi.ac.uk/faq) \n", + "  \\- neurosnap.ai's guides [Part 1](https://neurosnap.ai/blog/post/641a34a1148354cbab382afe) [Part 2](https://neurosnap.ai/blog/post/64222437a55063d26e9c069e) [Part 3](https://neurosnap.ai/blog/post/6422432aa55063d26e9c06a1) \n", + "  \\- Jumper et al (2021). Highly accurate protein structure prediction with AlphaFold. [doi: 10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2) \n", + "  \\- Mirdita et al (2022). ColabFold: Making protein folding accessible to all. [doi: 10.1038/s41592-022-01488-1](https://doi.org/10.1038/s41592-022-01488-1) \n", + "  \\- Bertoline et al (2023). Before and after AlphaFold2: An overview of protein structure prediction. [doi: 10.3389/fbinf.2023.1120370](https://doi.org/10.3389/fbinf.2023.1120370) \n", + "  \\- Fang et al (2023). A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. 
[doi: 10.1038/s42256-023-00721-6](https://doi.org/10.1038/s42256-023-00721-6) \n", + "\n", + "To learn more about Notebooks Hub or Jupyter Widgets, check out their documentation [here](https://polusai.github.io/notebooks-hub/) and [here](https://ipywidgets.readthedocs.io/en/8.1.2/index.html), respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Setup\n", + "Import the required packages to get started. These are needed to run the AlphaFold predictions later on." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "import re\n", + "import hashlib\n", + "import random\n", + "import shutil\n", + "\n", + "from sys import version_info\n", + "python_version = f\"{version_info.major}.{version_info.minor}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following packages are used to build interactive widgets for a better user experience. More information on building interactive widgets can be found in the [Jupyter Widgets documentation](https://ipywidgets.readthedocs.io/en/8.1.2/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from ipywidgets import interact, interactive, fixed, interact_manual, FileUpload, GridBox, Layout, VBox\n", + "import ipywidgets as widgets\n", + "from IPython.display import display, HTML\n", + "from ipyfilechooser import FileChooser" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Generate interactive widgets for query session inputs\n", + "An interactive widget will help the user select the inputs required to run the analysis. 
Examples of additional styles and formats can be found in the [documentation's](https://ipywidgets.readthedocs.io/en/8.1.2/index.html) [widget list](https://ipywidgets.readthedocs.io/en/8.1.2/examples/Widget%20List.html). \n", + "1. The core parameters to define before running AlphaFold predictions include the **query sequence(s)**, **Amber relaxed model(s)**, and **template(s)**, as well as a **jobname** to keep track of each query session. \n", + "2. Set up individual widget types for each prediction parameter. These are widget *children* that will be grouped into a *family* in the next step. For visual convenience, a border will be added to outline each family container." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.HTML` widget type will display text that can be used as headers or descriptions. These strings can be formatted using HTML tags (e.g., `<b>` for bold text or `<br>` for a line break). The `grid_area` attribute will help with widget placement inside the family container." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "html_input = widgets.HTML(description=\"Input Protein Sequences:\",\n", + " value=\"\",\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_input'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Values can first be stored in variables (or, as shown below, in concatenated f-strings for large blocks of text) and then passed to a widget's attributes, such as `value`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "qtips = (f\"Helpful Tips
\"\n", + " f\"Query: Use `:` to specify inter-protein chain breaks for modeling complexes (supports homo- and hetero-oligomers). \"\n", + " f\"For example, `PI...SK:PI...SK` for a homodimer.
\"\n", + " f\"Template Mode: Select the desired template to run predictions against.
\"\n", + " f\"Amber Relaxes: Specify how many of the top-ranked structures to relax using Amber.
\"\n", + " f\"Amber is a suite of programs that apply AMBER force fields to molecular dynamics simulations of biomolecules. \"\n", + " f\"Amber relaxation refines amino acid side-chain positions and is usually needed by users who require accurate side-chain positions.\"\n", + " )\n", + "\n", + "html_qtips = widgets.HTML(description=\"\",\n", + " value=qtips,\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_qtips'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.Text` widget type provides a box that allows the user to input text strings, while `.Textarea` provides an adjustable box. Here, the adjustable box is used for the **query sequence** so that the user can view a long or complex query in its entirety. If specified, the `placeholder` attribute is visible in the text box when it is empty. An initial value can be assigned to the `value` attribute to pre-populate the box." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "text_queryseq = widgets.Textarea(value='PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK',\n", + " placeholder='Input Amino Acid Sequence for Protein of Interest',\n", + " description='Query Sequence:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='700px', grid_area='text_queryseq'))\n", + "\n", + "text_jobname = widgets.Text(value='test',\n", + " placeholder='Type Jobname',\n", + " description='Jobname:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='text_jobname'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The next two widgets need to offer pre-determined options for user selection. 
These options are defined for each parameter through the corresponding widget's `options` attribute, and the default selection is specified with the `value` attribute. \n", + "\n", + "The `.RadioButtons` widget type will list the possible options with buttons for a single selection." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_num_relax = widgets.RadioButtons(options=[('0',0),('1',1),('5',5)],\n", + " value=0,\n", + " description='Number of Amber Relaxes:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_num_relax'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.ToggleButtons` widget will display buttons that allow the user to select one of the given options. This widget type is especially helpful because its `tooltips` attribute shows a description of each option when the mouse pointer hovers over it." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_template_mode = widgets.ToggleButtons(options=['none','pdb100','custom'],\n", + " description='Template Mode:',\n", + " disabled=False,\n", + " button_style='',\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_template_mode'),\n", + " tooltips=['no template information is used',\n", + " 'detect templates in pdb100',\n", + " 'upload and search own templates (PDB or mmCIF format, see notes) to bias AlphaFold\\'s predictions'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family. Jupyter's `Box` widgets use the CSS [flexbox spec](https://css-tricks.com/snippets/css/a-guide-to-flexbox/) to arrange individual widgets within a container. 
The family below uses the `GridBox` container, which can be customized according to the CSS [Grid layouts](https://css-tricks.com/snippets/css/complete-guide-grid). \n", + "1. Using the `GridBox` container, define its children from the code above.\n", + "2. Lay out the children as desired within the container. Each child's `Layout.grid_area` attribute needs a matching label inside the container's `Layout.grid_template_areas` attribute.\n", + "3. `display()` will display the interactive widget family." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8577dc777ce547329f08335363ab1bfa", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Input Protein Sequences:', layout=Layout(grid_area='html_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_query = GridBox(children=[html_input, html_qtips, text_queryseq, text_jobname, buttons_num_relax, buttons_template_mode],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='1255px',\n", + " grid_template_rows='auto auto auto auto auto',\n", + " grid_template_columns='300px 450px 500px',\n", + " grid_template_areas='''\n", + " \"html_input html_input html_qtips\"\n", + " \"text_queryseq text_queryseq html_qtips\"\n", + " \"text_jobname . 
html_qtips\"\n", + " \"buttons_template_mode buttons_template_mode html_qtips\"\n", + " \"buttons_num_relax buttons_num_relax html_qtips\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_query)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Access widget outputs to generate a new directory for saving prediction results and queries" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code defines helper functions that tag each jobname with its query sequence, so that runs sharing a jobname but using different sequences are saved separately. `update_jobname()` removes whitespace and non-word characters from the jobname, and `add_hash()` appends an underscore followed by the first five hexadecimal characters of the SHA-1 hash of the query sequence (e.g., `test_a5e17`). Repeated runs of the same sequence receive an additional integer suffix (e.g., `_0`, `_1`, ...) in the next step." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# append an underscore and the first 5 hex characters of the SHA-1 hash of y to x\n", + "def add_hash(x,y):\n", + " return x+\"_\"+hashlib.sha1(y.encode()).hexdigest()[:5]\n", + "\n", + "# clean the jobname, then tag it with a hash of the global query_sequence (defined in the next cell)\n", + "def update_jobname(jobname):\n", + " basejobname = \"\".join(jobname.split())\n", + " basejobname = re.sub(r'\\W+', '', basejobname)\n", + " jobname_new = add_hash(basejobname, query_sequence)\n", + " \n", + " return jobname_new" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "User-defined and selected values can be read from each child widget's `.value` attribute and saved as variables. This is shown in the top portion of the following code block. 
\n", + "  • The rest of the code block then checks the working directory for a folder with the same jobname. If none exists, it creates one; otherwise, it appends an integer suffix (e.g., `_0`, `_1`, ...). 
\n", + "  • ***Note:*** *these code chunks must be in the same cell; otherwise, the iterative numbering does not work as intended (i.e., `_0` is appended repeatedly instead of the number increasing).*" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "jobname: test_a5e17_3\n", + "sequence: PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK\n", + "length: 59\n", + "relax: 0\n", + "template: none\n" + ] + } + ], + "source": [ + "# save outputs as accessible variables\n", + "jobname = text_jobname.value\n", + "query_sequence = text_queryseq.value\n", + "num_relax = buttons_num_relax.value\n", + "template_mode = buttons_template_mode.value\n", + "\n", + "use_amber = num_relax > 0\n", + "\n", + "# remove whitespace, compute the length (excluding ':' chain breaks), and update jobname\n", + "query_sequence = \"\".join(query_sequence.split())\n", + "length = len(query_sequence.replace(\":\",\"\"))\n", + "jobname = update_jobname(jobname)\n", + "\n", + "# check if directory with jobname exists\n", + "def check(folder):\n", + " if os.path.exists(folder):\n", + " return False\n", + " else:\n", + " return True\n", + " \n", + "if not check(jobname):\n", + " n = 0\n", + " while not check(f\"{jobname}_{n}\"): n += 1\n", + " jobname = f\"{jobname}_{n}\"\n", + "\n", + "# make directory to save results\n", + "os.makedirs(jobname, exist_ok=True)\n", + "\n", + "# save a copy of the query sequence in the newly generated folder\n", + "queries_path = os.path.join(jobname, f\"{jobname}.csv\")\n", + "with open(queries_path, \"w\") as text_file:\n", + " text_file.write(f\"id,sequence\\n{jobname},{query_sequence}\")\n", + "\n", + "# for verification purposes, print the session's information.\n", + "print(f\"jobname: {jobname}\" \"\\n\"\n", + " f\"sequence: {query_sequence}\" \"\\n\"\n", + " f\"length: {length}\" \"\\n\"\n", + " f\"relax: {num_relax}\" \"\\n\"\n", + " f\"template: 
{template_mode}\")" + ]
}, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate file upload widgets to select custom templates for predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Local File Upload:** The `FileUpload` widget allows the user to select local computer files for upload to the current working directory on the server. AlphaFold allows for multiple custom templates, so `multiple=True` is set. The `accept` attribute can restrict the selectable file types (e.g., `accept='.pdb,.cif'`); an empty string, as used here, accepts all files. \n", + "  • ***Note:*** *This will replace pre-existing files in the current directory with the same name. Please rename if necessary.* \n", + "  • ***Note:*** *The counter shown increases even when the same file is re-selected. The cell containing `display(upload)` must be rerun to reset the counter.* " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "upload = FileUpload(accept='', multiple=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Server File Upload:** The `FileChooser` widget allows the user to select a single file that is already present on the server. The files listed can be limited to one or more extensions by setting `fc.filter_pattern`. For AlphaFold, templates should be in PDB or PDBx/mmCIF format. \n", + "  • **Note:** ipyfilechooser is a separate package that works in conjunction with ipywidgets. 
\n", + "  • If a file from the server is selected, `os.rename` will move it into the `template` folder inside the current query session's directory (i.e., `<jobname>/template/<filename>`). " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "fc = FileChooser(\n", + " os.getcwd(),\n", + " filename='',\n", + " title='Select custom template(s)
Note: must follow four-letter PDB naming with lowercase letters',\n", + " show_hidden=False,\n", + " select_default=False,\n", + " show_only_dirs=False\n", + " )\n", + "\n", + "fc.filter_pattern = ['*.pdb', '*.cif', '*.pdbx', '*.txt'] " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following statements set variables for downstream analysis and display the upload widgets only if `template_mode` was set to *custom*. A template folder is also created inside the current jobname's directory to store template files. \n", + "  • ***Note:*** *To cancel a file selection from the server, rerun the previous cell.*" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# set variables and display file uploaders if template mode is set to custom\n", + "if template_mode == \"pdb100\":\n", + " use_templates = True\n", + " custom_template_path = None\n", + "elif template_mode == \"custom\":\n", + " custom_template_path = os.path.join(jobname, \"template\")\n", + " os.makedirs(custom_template_path, exist_ok=True)\n", + " use_templates = True\n", + " display(fc)\n", + " display(upload)\n", + "else:\n", + " custom_template_path = None\n", + " use_templates = False" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File selected from the server: None\n", + "File(s) selected from the local host for upload: None\n" + ] + } + ], + "source": [ + "# move the server file into the session's template folder\n", + "# (fc.selected is a single full path string, or None if nothing was chosen)\n", + "if template_mode == \"custom\" and fc.selected is not None:\n", + "    os.rename(fc.selected, os.path.join(custom_template_path, os.path.basename(fc.selected)))\n", + "\n", + "# list the names of all local files selected for custom template use\n", + "# (ipywidgets 8: upload.value is a tuple of dicts with 'name' and 'content' keys)\n", + "fps = [item[\"name\"] for item in upload.value] if upload.value else \"None\"\n", + "print(f\"File selected from the server: {fc.selected}\")\n", + "print(f\"File(s) selected from the local host for upload: {fps}\")\n", + "\n", + "# write local files to the server and move them into the session's template folder\n", + "if template_mode == \"custom\" and upload.value:\n", + "    for item in upload.value:\n", + "        with open(item[\"name\"], 'wb') as output_file:\n", + "            output_file.write(item[\"content\"])\n", + "        os.rename(item[\"name\"], os.path.join(custom_template_path, item[\"name\"]))\n", + "        print(f\">> {item['name']} successfully uploaded\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget for multiple sequence alignment options" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "AlphaFold2 was trained using multiple sequence alignments (MSAs), paired residues, and experimentally determined protein structures from the [RCSB Protein Data Bank (PDB)](https://www.rcsb.org/). ColabFold, on which this notebook is based, generates MSAs with MMseqs2 [(Many-against-Many sequence searching)](https://mmseqs.com/latest/userguide.pdf), which searches and clusters huge sequence sets from databases comprising UniRef [(UniProt Reference Clusters)](https://www.uniprot.org/help/uniref) and ColabFold's own [environmental database](https://colabfold.mmseqs.com/), referred to as _env_ in the widget options. MSA pairing can also be controlled to improve prediction accuracy for protein complexes. A new family of widgets is created below for these options." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Use the `.HTML` widget type to create a descriptive header."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "html_msaopts = widgets.HTML(description=\"Multiple Sequence Alignment Options (custom MSA upload, single sequence, pairing mode)\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_msaopts'),)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.Select` widget type will display a box listing all options, one per row, from which a single option can be selected." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "select_msa_mode = widgets.Select(options=['mmseqs2_uniref_env', 'mmseqs2_uniref', 'single_sequence', 'custom'],\n", + " value='mmseqs2_uniref_env',\n", + " description='MSA mode:',\n", + " rows=5,\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='select_msa_mode'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.ToggleButtons` type will display the options, with a helpful description shown on hover." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "buttons_pair_mode = widgets.ToggleButtons(options=['unpaired_paired', 'paired', 'unpaired'],\n", + " description='Pair Mode:',\n", + " disabled=False,\n", + " button_style='',\n", + " layout=Layout(width='auto', grid_area='buttons_pair_mode'),\n", + " tooltips=['pair sequences from same species + unpaired MSA',\n", + " 'only use paired sequences',\n", + " 'separate MSA for each chain'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family.\n", + "1. Using the `GridBox` container, define its children from the code above.\n", + "2. `display()` will display the interactive widget family."
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7baf411722a34bba9adffdd71e629436", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Multiple Sequence Alignment Options (custom MSA upload, singl…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_msa = GridBox(children=[html_msaopts, select_msa_mode, buttons_pair_mode],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='605px',\n", + " grid_template_rows='auto auto',\n", + " grid_template_columns='300px 300px',\n", + " grid_template_areas='''\n", + " \"html_msaopts html_msaopts\"\n", + " \"select_msa_mode buttons_pair_mode\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_msa)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Next, save the widget selections as accessible variables." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "msa_mode = select_msa_mode.value\n", + "pair_mode = buttons_pair_mode.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "#### Custom MSA file (.a3m formatted)\n", + "##### DISCLAIMER: HAVE NOT TESTED FUNCTIONALITY OF USING CUSTOM A3M FILE AFTER SUCCESSFUL UPLOAD\n", + "Custom MSA allows users to provide their own alignment files for multiple sequence alignment. Any kind of alignment tool can be used to generate the MSA, including the [HHblits Toolkit server](https://toolkit.tuebingen.mpg.de/tools/hhblits)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "cellView": "form", + "id": "C2_sh2uAonJH", + "tags": [] + }, + "outputs": [], + "source": [ + "# create an additional file uploader to use for custom MSA\n", + "upload_msa = FileUpload(accept='.a3m', multiple=False)\n", + "\n", + "# decide which a3m to use\n", + "if \"mmseqs2\" in msa_mode:\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.a3m\")\n", + "\n", + "elif msa_mode == \"custom\":\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.custom.a3m\")\n", + " if not os.path.isfile(a3m_file):\n", + " print(\"The first FASTA entry of the A3M file must be the query sequence without gaps.\")\n", + " display(upload_msa)\n", + "else:\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.single_sequence.a3m\")\n", + " with open(a3m_file, \"w\") as text_file:\n", + " text_file.write(\">1\\n%s\" % query_sequence)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code cell saves the selected local file to the server and creates a renamed copy for the program to access." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "# upload local file to session's folder on server\n", + "if upload_msa.value:\n", + "    up_msa = upload_msa.value\n", + "    fpmsa = []\n", + "    # ipywidgets 8: .value is a tuple of dicts with 'name' and 'content' keys\n", + "    for fd in up_msa:\n", + "        fn = fd['name']\n", + "        fpmsa.append(f\"{fn}\")\n", + "        with open(fn, 'wb') as output_file:\n", + "            content = fd['content']\n", + "            output_file.write(content)\n", + "        os.rename(fn,os.path.join(jobname,fn))\n", + "        print(f\"{fn} successfully uploaded.
Don't forget to cite your custom MSA generation method!\")\n", + "\n", + "if upload_msa.value:\n", + " orig_msa = f\"{jobname}/{fpmsa[0]}\"\n", + " custom_msa = shutil.copy2(orig_msa,f\"{jobname}/strip_{fpmsa[0]}\") # copy file as backup or for preservation purposes\n", + "\n", + " header = 0\n", + " import fileinput\n", + " for line in fileinput.FileInput(custom_msa,inplace=True):\n", + " if line.startswith(\">\"):\n", + " header = header + 1\n", + " if not line.rstrip():\n", + " continue\n", + " if line.startswith(\">\") == False and header == 1:\n", + " query_sequence = line.rstrip()\n", + " print(line, end='')\n", + " \n", + " os.rename(custom_msa, a3m_file)\n", + " queries_path=a3m_file\n", + " print(f\"Moving {custom_msa} to {a3m_file} for use by AlphaFold.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Advanced Settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Create the widget header and a tips box using the `.HTML` widget type." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "html_advset = widgets.HTML(description=\"Advanced Settings:\",\n", + " value=\"\",\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_advset'))\n", + "\n", + "advtips = (f\"Helpful Tips
\"\n", + " f\"Model Type: Choose the model for the prediction type (i.e., monomer or multimer). For monomer predictions, choose alphafold2_ptm. \"\n", + " f\"Auto lets the notebook decide: it uses alphafold2_ptm for single chains and alphafold2_multimer_v3 for complex prediction.
\"\n", + " f\"Number of Recycles: Sets how many times the model feeds its own prediction back in as input to refine it. \"\n", + " f\"The default is 3; 6 or more may give a more accurate prediction at the cost of longer runtimes.
\"\n", + " f\"Recycle Early Stop Tolerance: Auto tolerance will be 0.0, unless using alphafold2_multimer_v3.
\"\n", + " f\"alphafold2_multimer_v3: For complex predictions using this model, `auto` will result in recycles = 20 and tolerance = 0.5.\"\n", + " )\n", + "\n", + "html_advtips = widgets.HTML(description=\"\",\n", + " value=advtips,\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_advtips'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.Dropdown` widget type will display a single selection list in dropdown format. Create dropdown widgets to select the AlphaFold **model type**, **number of recycles**, and **recycle early stop tolerance** values." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "drop_model_type = widgets.Dropdown(options=['auto','alphafold2_ptm','alphafold2_multimer_v1','alphafold2_multimer_v2','alphafold2_multimer_v3'],\n", + " value='auto',\n", + " description='Model Type:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='drop_model_type'))\n", + "\n", + "drop_num_recycles = widgets.Dropdown(options=[('auto','auto'),('0',0),('1',1),('3',3),('6',6),('12',12),('24',24),('48',48)],\n", + " value='auto',\n", + " description='Number of Recycles:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='drop_num_recycles'))\n", + "\n", + "drop_tol = widgets.Dropdown(options=[('auto','auto'),('0.0',0.0),('0.5',0.5),('1.0',1.0)],\n", + " value='auto',\n", + " description='Recycle Early Stop Tolerance:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='drop_tol'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Use `.ToggleButtons` to select the **pairing strategy**, with a description displayed when hovering over each button." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_pairing_strategy = widgets.ToggleButtons(options=['greedy','complete'],\n", + " description='Pairing Strategy:',\n", + " disabled=False,\n", + " button_style='',\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_pairing_strategy'),\n", + " tooltips=['pair any taxonomically matching subsets',\n", + " 'all sequences have to match in one line'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family, once again using `GridBox` and `display()`." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cca6239d6f5c4c3cbeb3a9b35dc0fc3a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Advanced Settings:', layout=Layout(grid_area='html_advset…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_advset = GridBox(children=[html_advset, html_advtips, drop_model_type, drop_num_recycles, drop_tol, buttons_pairing_strategy],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='1305px',\n", + " grid_template_rows='auto auto auto auto auto',\n", + " grid_template_columns='350px 200px 750px',\n", + " grid_template_areas='''\n", + " \"html_advset html_advset html_advtips\"\n", + " \"drop_model_type . html_advtips\"\n", + " \"drop_num_recycles . html_advtips\"\n", + " \"drop_tol . 
html_advtips\"\n", + " \"buttons_pairing_strategy buttons_pairing_strategy html_advtips\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_advset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Lastly, save the widget selections as accessible variables." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_type = drop_model_type.value\n", + "pairing_strategy = buttons_pairing_strategy.value\n", + "\n", + "if drop_model_type.value != 'alphafold2_multimer_v3':\n", + " if drop_num_recycles.value == 'auto':\n", + " num_recycles = 3\n", + " else:\n", + " num_recycles = drop_num_recycles.value\n", + " \n", + " if drop_tol.value == 'auto':\n", + " recycle_early_stop_tolerance = 0.0\n", + " else:\n", + " recycle_early_stop_tolerance = drop_tol.value\n", + "\n", + "elif drop_model_type.value == 'alphafold2_multimer_v3':\n", + " if drop_num_recycles.value == 'auto':\n", + " num_recycles = 20\n", + " else:\n", + " num_recycles = drop_num_recycles.value\n", + "\n", + " if drop_tol.value == 'auto':\n", + " recycle_early_stop_tolerance = 0.5\n", + " else:\n", + " recycle_early_stop_tolerance = drop_tol.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget to define sample settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "`.HTML` widgets can be used again to add headers as well as additional text to provide setting tips." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "user_expressions": [] + }, + "outputs": [], + "source": [ + "html_sampset = widgets.HTML(description=\"Sample Settings:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'})\n", + "\n", + "msatips = (f\"Helpful Tips
\"\n", + " f\"- Decrease Max MSA to increase uncertainty.
\"\n", + " f\"- Enable dropouts and increase # seeds to sample predictions from uncertainty of the model.\"\n", + " )\n", + "\n", + "html_msatips = widgets.HTML(description=\"\",\n", + " value=msatips,\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_msatips'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "A `.SelectionSlider` widget will be used alongside the standard `.Dropdown` type used previously. The selection slider offers a range of custom values without conforming to a uniform increment." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "drop_max_msa = widgets.Dropdown(options=['auto','512:1024','256:512','64:128','32:64', '16:32'],\n", + " value='auto',\n", + " description='Max MSA:',\n", + " disabled=False)\n", + "\n", + "slider_num_seeds = widgets.SelectionSlider(options=[('1',1),('2',2),('4',4),('8',8),('16',16)],\n", + " value=1,\n", + " description='# seeds:',\n", + " disabled=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Here, a `.Checkbox` widget type that can be selected or unselected is introduced." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "cb_dropout = widgets.Checkbox(value=False, description='Use Dropout', disabled=False, indent=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family, once again using `GridBox` and `display()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "65d0fc31364e4abba96f20175e67a29e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Sample Settings:', style=DescriptionStyle(description_wid…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Give each child a grid_area that matches its name in grid_template_areas\n", + "html_sampset.layout.grid_area = 'html_sampset'\n", + "drop_max_msa.layout.grid_area = 'drop_max_msa'\n", + "slider_num_seeds.layout.grid_area = 'slider_num_seeds'\n", + "cb_dropout.layout.grid_area = 'cb_dropout'\n", + "\n", + "controls_sampset = GridBox(children=[html_sampset, html_msatips, drop_max_msa, slider_num_seeds, cb_dropout],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='805px',\n", + " grid_template_rows='auto auto auto auto',\n", + " grid_template_columns='400px 400px',\n", + " grid_template_areas='''\n", + " \"html_sampset html_msatips\"\n", + " \"drop_max_msa html_msatips\"\n", + " \"slider_num_seeds html_msatips\"\n", + " \"cb_dropout .\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_sampset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Save the widget selections as accessible variables. The second half of the code cell also normalizes the earlier settings: `auto` becomes `None`, and the recycle settings are cast to `int` and `float`."
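The handling of `auto` is spread over two cells: the recycle/tolerance cell earlier and the cell below. As a minimal sketch, the same rules can be gathered into one pure function. `resolve_settings` and `parse_max_msa` are hypothetical helpers, not ColabFold APIs. The defaults (3 recycles and tolerance 0.0, or 20 recycles and tolerance 0.5 for `alphafold2_multimer_v3`) are copied from this notebook. Reading `Max MSA` as `max_seq:max_extra_seq` is our assumption about how ColabFold interprets that string.

```python
from typing import Optional, Tuple, Union

# "auto" defaults copied from this notebook: (num_recycles, recycle_early_stop_tolerance)
AUTO_DEFAULTS = {"alphafold2_multimer_v3": (20, 0.5)}
FALLBACK_DEFAULTS = (3, 0.0)  # every other model type, including "auto"


def resolve_settings(
    model_type: str,
    num_recycles: Union[str, int],
    tolerance: Union[str, float],
    max_msa: str,
) -> Tuple[int, float, Optional[str]]:
    """Hypothetical helper: turn widget values (possibly "auto") into concrete run() settings."""
    default_recycles, default_tol = AUTO_DEFAULTS.get(model_type, FALLBACK_DEFAULTS)
    recycles = default_recycles if num_recycles == "auto" else int(num_recycles)
    tol = default_tol if tolerance == "auto" else float(tolerance)
    msa = None if max_msa == "auto" else max_msa  # run() receives the raw "a:b" string or None
    return recycles, tol, msa


def parse_max_msa(max_msa: str) -> Tuple[int, int]:
    """Split "max_seq:max_extra_seq" (assumed meaning of the Max MSA choice) into two ints."""
    max_seq, max_extra_seq = (int(part) for part in max_msa.split(":"))
    return max_seq, max_extra_seq


print(resolve_settings("auto", "auto", "auto", "auto"))  # (3, 0.0, None)
print(resolve_settings("alphafold2_multimer_v3", "auto", "auto", "256:512"))  # (20, 0.5, '256:512')
print(parse_max_msa("512:1024"))  # (512, 1024)
```

The notebook resolves `auto` before `set_model_type()` runs. As a result, a model type left on `auto` receives the non-multimer defaults even when the query turns out to be a complex.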
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "max_msa = drop_max_msa.value\n", + "num_seeds = slider_num_seeds.value\n", + "use_dropout = cb_dropout.value\n", + "\n", + "num_recycles = None if num_recycles == \"auto\" else int(num_recycles)\n", + "recycle_early_stop_tolerance = None if recycle_early_stop_tolerance == \"auto\" else float(recycle_early_stop_tolerance)\n", + "if max_msa == \"auto\": max_msa = None" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget to toggle save settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.IntText` widget lets the user enter any integer in its text box." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "text_save_dpi = widgets.IntText(value=200,\n", + " description='dpi:',\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='text_save_dpi'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Set up individual `.HTML` widgets to display text and `.Checkbox` widgets for the toggles."
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "html_saveset = widgets.HTML(description=\"Save Settings:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_saveset'))\n", + "\n", + "html_savedpi = widgets.HTML(description=\"Set dpi for image resolution:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_savedpi'))\n", + "\n", + "cb_savefull = widgets.Checkbox(value=False, description='Save All', disabled=False, layout=Layout(width='auto', grid_area='cb_savefull'))\n", + "\n", + "cb_saverecyc = widgets.Checkbox(value=False, description='Save Recycles', disabled=False, layout=Layout(width='auto', grid_area='cb_saverecyc'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Set up the widget family, once again using `GridBox` and `display()`. This time the first column is partly left empty (`.`) so that the checkboxes line up under the dpi field. Each child lands in its intended cell because its `grid_area`, defined above for every widget, matches a name in `grid_template_areas`."
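Because placement depends on names matching in two places, a typo in either place is easy to miss. Children without a `grid_area` are auto-placed by the browser. A name that appears only in the template leaves that area empty. The sketch below is a hypothetical, stdlib-only check (`check_grid_areas` is not an ipywidgets API). It compares the area names in a `grid_template_areas` string with the children's `grid_area` values so that mismatches show up before the widget is displayed.

```python
import re
from typing import Dict, Iterable, Set


def template_area_names(template: str) -> Set[str]:
    """Collect the area names used in a CSS grid-template-areas string ('.' marks an empty cell)."""
    names: Set[str] = set()
    for row in re.findall(r'"([^"]*)"', template):  # each quoted string is one grid row
        names.update(token for token in row.split() if token != ".")
    return names


def check_grid_areas(template: str, child_areas: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical validator: compare template names against the children's grid_area values."""
    used = template_area_names(template)
    assigned = {area for area in child_areas if area}  # ignore children without a grid_area
    return {
        "unclaimed_in_template": used - assigned,  # named in the template, but no child uses it
        "missing_from_template": assigned - used,  # a child's area that the template never names
    }


save_template = '''
"html_saveset ."
"html_savedpi text_save_dpi"
". cb_savefull"
". cb_saverecyc"
'''
children = ["html_saveset", "html_savedpi", "text_save_dpi", "cb_savefull", "cb_saverecyc"]
print(check_grid_areas(save_template, children))
# {'unclaimed_in_template': set(), 'missing_from_template': set()}
```

In the notebook, the second argument could presumably be built as `[w.layout.grid_area for w in grid.children]` for a given `GridBox` named `grid`.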
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d23d5928e4b34ba9a7093ab87e156f7d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Save Settings:', layout=Layout(grid_area='html_saveset', …" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_saveset = GridBox(children=[html_saveset, html_savedpi, text_save_dpi, cb_savefull, cb_saverecyc],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='455px',\n", + " grid_template_rows='auto auto auto auto',\n", + " grid_template_columns='150px 300px',\n", + " grid_template_areas='''\n", + " \"html_saveset .\"\n", + " \"html_savedpi text_save_dpi\"\n", + " \". cb_savefull\"\n", + " \". cb_saverecyc\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_saveset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Save the widget selections as accessible variables. " + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "save_all = cb_savefull.value\n", + "save_recycles = cb_saverecyc.value\n", + "dpi = text_save_dpi.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Prepare prediction run" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Use a simple `.Checkbox` widget to allow the user to toggle whether or not images should be displayed during the prediction run." 
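The plotting callbacks test this choice with a plain `if display_images:`. The stored variable must therefore hold the checkbox's `.value` rather than the widget itself. Python treats ordinary objects as true unless their class defines `__bool__` or `__len__`. As far as we know, `ipywidgets.Checkbox` defines neither, so storing the widget would leave plotting enabled even when the box is unchecked. The sketch below uses a hypothetical `FakeCheckbox` stand-in so that it runs without ipywidgets.

```python
class FakeCheckbox:
    """Hypothetical stand-in for ipywidgets.Checkbox: it only carries a .value attribute."""

    def __init__(self, value: bool) -> None:
        self.value = value


unchecked = FakeCheckbox(value=False)

print(bool(unchecked))        # True: the object itself is truthy
print(bool(unchecked.value))  # False: the selection the user actually made

display_images = unchecked.value  # store the boolean, not the widget
if display_images:
    print("plots would be shown")
else:
    print("plots are skipped")  # this branch runs
```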
+ ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6a2fdce0120147198404fc82a7ac1be7", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Checkbox(value=True, description='Display Images', layout=Layout(border='solid 1.5px'))" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "cb_displayimg = widgets.Checkbox(value=True, description='Display Images', disabled=False, indent=True, layout=Layout(border='solid 1.5px'))\n", + "display(cb_displayimg)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Assign the selection as an accessible variable." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [], + "source": [ + "display_images = cb_displayimg.value  # the checkbox state (True/False), not the widget object" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following package imports are necessary to finally run the structural predictions."
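The import cell below fails at the first missing package, so it can help to check all of them up front. Here is a minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The module list is our assumption, taken from the top-level packages imported in the next cell.

```python
import importlib.util
from typing import Iterable, List

# Top-level packages imported by the prediction cells (assumed list; adjust to your environment)
REQUIRED_MODULES = ["colabfold", "Bio", "matplotlib", "numpy"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Install these before running the predictions:", ", ".join(missing))
else:
    print("All required packages are importable.")
```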
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import sys\n", + "import warnings\n", + "from pathlib import Path\n", + "\n", + "# Ignore FutureWarnings from here on, including those raised while importing the packages below\n", + "warnings.simplefilter(action='ignore', category=FutureWarning)\n", + "from Bio import BiopythonDeprecationWarning\n", + "#warnings.simplefilter(action='ignore', category=BiopythonDeprecationWarning)\n", + "\n", + "from colabfold.download import download_alphafold_params, default_data_dir\n", + "from colabfold.utils import setup_logging\n", + "from colabfold.batch import get_queries, run, set_model_type\n", + "from colabfold.plot import plot_msa_v2\n", + "from colabfold.colabfold import plot_protein\n", + "#from colabfold.cf import plot_protein  # alternative if the module was renamed to colabfold.cf\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Add system paths to the new dependencies that were installed previously."
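`sys.path` only tells Python where to look for importable modules. Command-line programs such as `kalign` or `hhsearch` are located through the `PATH` environment variable when they are launched by name as subprocesses. The minimal sketch below provides two idempotent helpers that keep the two cases apart. The helper names are hypothetical and not part of ColabFold, and the example directories are the ones used in the next cell.

```python
import os
import sys


def prepend_import_path(path: str) -> None:
    """Make a directory of Python packages importable (no-op if it is already listed)."""
    if path not in sys.path:
        sys.path.insert(0, path)


def prepend_exec_path(bin_dir: str) -> None:
    """Make executables in bin_dir discoverable by subprocess/shutil.which (no-op if present)."""
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if bin_dir not in entries:
        os.environ["PATH"] = os.pathsep.join([bin_dir] + entries)


# Python package directory -> sys.path; directory of command-line tools -> PATH
prepend_import_path("/opt/conda/pkgs/pdbfixer-1.8.1-pyh6c4a22f_0/site-packages/")
prepend_exec_path("/opt/conda/pkgs/kalign2-2.04-h031d066_5/bin")
```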
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "# pdbfixer 1.8.1 (Python package: extend the import path)\n", + "if use_amber and f\"/opt/conda/pkgs/pdbfixer-1.8.1-pyh6c4a22f_0/site-packages/\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/pdbfixer-1.8.1-pyh6c4a22f_0/site-packages/\")\n", + "\n", + "# openmm 7.7.0 (Python package: extend the import path)\n", + "if use_amber and f\"/opt/conda/pkgs/openmm-7.7.0-py39h15fbce5_1/lib/python3.9/site-packages\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/openmm-7.7.0-py39h15fbce5_1/lib/python3.9/site-packages\")\n", + "\n", + "# kalign2 2.0.4 (command-line tool: extend the executable search PATH; sys.path only affects imports)\n", + "kalign_bin = \"/opt/conda/pkgs/kalign2-2.04-h031d066_5/bin\"\n", + "if use_templates and kalign_bin not in os.environ[\"PATH\"].split(os.pathsep):\n", + " os.environ[\"PATH\"] = kalign_bin + os.pathsep + os.environ[\"PATH\"]\n", + "\n", + "# hhsuite 3.3.0 (command-line tools such as hhsearch: extend the executable search PATH)\n", + "hhsuite_bin = \"/opt/conda/pkgs/hhsuite-3.3.0-py39pl5321he10ea66_9/bin\"\n", + "if use_templates and hhsuite_bin not in os.environ[\"PATH\"].split(os.pathsep):\n", + " os.environ[\"PATH\"] = hhsuite_bin + os.pathsep + os.environ[\"PATH\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Additionally, the following cell defines a few necessary functions for ColabFold/AlphaFold."
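Two of those functions are hooks that `run()` calls back into. Judging by their signatures, `input_features_callback` receives the input features (used here to plot the MSA), and `prediction_callback` receives each predicted model, with `mode` unpacked as `(model_name, relaxed)`. The toy driver below is a hypothetical stand-in for `run()` that performs no folding. It only illustrates the hook pattern and shows why checking `relaxed` limits plotting to unrelaxed models. The callback is simplified to its `mode` argument and deliberately named `toy_prediction_callback`, so running this sketch cannot overwrite the notebook's real callback.

```python
from typing import Callable, List, Tuple

Mode = Tuple[str, bool]  # (model_name, relaxed), mirroring how prediction_callback unpacks mode


def toy_run(model_names: List[str], num_relax: int,
            callback: Callable[[Mode], None]) -> None:
    """Hypothetical stand-in for run(): report each unrelaxed model, then the relaxed top models."""
    for name in model_names:
        callback((name, False))
    for name in model_names[:num_relax]:
        callback((name, True))


plotted: List[str] = []


def toy_prediction_callback(mode: Mode) -> None:
    model_name, relaxed = mode
    if not relaxed:  # same filter as the notebook: only unrelaxed models are plotted
        plotted.append(model_name)


toy_run(["model_1", "model_2", "model_3"], num_relax=1, callback=toy_prediction_callback)
print(plotted)  # ['model_1', 'model_2', 'model_3']
```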
+ ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def input_features_callback(input_features):\n", + " if display_images:\n", + " plot_msa_v2(input_features)\n", + " plt.show()\n", + " plt.close()\n", + "\n", + "def prediction_callback(protein_obj, length,\n", + " prediction_result, input_features, mode):\n", + " model_name, relaxed = mode\n", + " if not relaxed:\n", + " if display_images:\n", + " fig = plot_protein(protein_obj, Ls=length, dpi=150)\n", + " plt.show()\n", + " plt.close()\n", + "\n", + "result_dir = jobname\n", + "log_filename = os.path.join(jobname,\"log.txt\")\n", + "if not os.path.isfile(log_filename) or 'logging_setup' not in globals():\n", + " setup_logging(Path(log_filename))\n", + " logging_setup = True\n", + "\n", + "queries, is_complex = get_queries(queries_path)\n", + "model_type = set_model_type(is_complex, model_type)\n", + "\n", + "if \"multimer\" in model_type and max_msa is not None:\n", + " use_cluster_profile = False\n", + "else:\n", + " use_cluster_profile = True" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Run AlphaFold2 predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Note:** If Amber relaxation is enabled, the run may stop during pLDDT reranking with the error below. A workaround discussed in [this AlphaFold issue](https://github.com/google-deepmind/alphafold/issues/112) is to disable relaxation (`run_relax=false` in the AlphaFold pipeline; in the `run()` call below, setting `num_relax=0` should have the same effect). \n", + "> ValueError: Minimization failed after 100 attempts."
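One way to avoid losing a session to this error is to retry automatically with relaxation turned off. The sketch below is an assumption-laden pattern, not ColabFold behavior. `run_with_relax_fallback` is hypothetical, it recognizes the error only by its message text, and it assumes that passing `num_relax=0` to `run()` skips Amber relaxation. It is demonstrated with a stand-in `fake_run` instead of the real `run()`.

```python
from typing import Any, Callable, Dict


def run_with_relax_fallback(run_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Call run_fn(**kwargs); if Amber minimization fails, retry once with num_relax=0."""
    try:
        return run_fn(**kwargs)
    except ValueError as err:
        # Re-raise anything that is not the minimization failure, or if relaxation was already off
        if "Minimization failed" not in str(err) or kwargs.get("num_relax", 0) == 0:
            raise
        print(f"Relaxation failed ({err}); retrying with num_relax=0")
        return run_fn(**{**kwargs, "num_relax": 0})


def fake_run(num_relax: int = 0, **_: Any) -> Dict[str, int]:
    """Stand-in for colabfold.batch.run that fails whenever relaxation is requested."""
    if num_relax > 0:
        raise ValueError("Minimization failed after 100 attempts.")
    return {"num_relax_used": num_relax}


result = run_with_relax_fallback(fake_run, num_relax=1, num_models=5)
print(result)  # {'num_relax_used': 0}
```

In the notebook, you would pass `run` along with the same keyword arguments as the call below. Keep in mind that a retry repeats the whole prediction, which took roughly an hour on CPU in the logged example.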
+ ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "cellView": "form", + "collapsed": true, + "id": "mbaIO9pWjaN0", + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 20:28:49,371 Unable to initialize backend 'cuda': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'\n", + "2023-12-05 20:28:49,374 Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'\n", + "2023-12-05 20:28:49,379 Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.\n", + "2023-12-05 20:28:49,383 No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n", + "2023-12-05 20:28:49,383 WARNING: no GPU detected, will be using CPU\n", + "2023-12-05 20:28:59,023 Found 4 citations for tools or databases\n", + "2023-12-05 20:28:59,024 Query 1/1: test_a5e17_3 (length 59)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00]\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 20:29:02,320 Setting max_seq=512, max_extra_seq=5120\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2023-12-05 20:29:15.291663: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 43410312 exceeds 10% of free system memory.\n", + "2023-12-05 20:29:15.297120: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 42604372 exceeds 10% of free system memory.\n", + "2023-12-05 20:29:15.920536: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 42604372 exceeds 10% of free system memory.\n", + "2023-12-05 20:29:16.028863: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 42604372 exceeds 10% of free system memory.\n", + "2023-12-05 20:29:16.135384: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 42604372 exceeds 10% of free system memory.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 20:32:33,551 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=96.6 pTM=0.754\n", + "2023-12-05 20:35:27,454 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=96.5 pTM=0.758 tol=0.233\n", + "2023-12-05 20:38:21,705 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=96.4 pTM=0.757 tol=0.0374\n", + "2023-12-05 20:41:15,995 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=96.1 pTM=0.756 tol=0.0339\n", + "2023-12-05 20:41:15,997 alphafold2_ptm_model_1_seed_000 took 719.7s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 20:44:09,965 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=96.9 pTM=0.76\n", + "2023-12-05 20:47:03,429 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=96.9 pTM=0.765 tol=0.284\n", + "2023-12-05 20:49:57,554 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=96.9 pTM=0.766 tol=0.124\n", + "2023-12-05 20:52:50,860 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=96.8 pTM=0.767 tol=0.0565\n", + "2023-12-05 20:52:50,862 alphafold2_ptm_model_2_seed_000 took 694.4s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 20:55:44,737 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=97.2 pTM=0.775\n", + "2023-12-05 20:58:38,388 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=97.4 pTM=0.783 tol=0.273\n", + "2023-12-05 21:01:32,093 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=97.4 pTM=0.782 tol=0.116\n", + "2023-12-05 21:04:25,697 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=97.4 pTM=0.784 tol=0.0477\n", + "2023-12-05 21:04:25,699 alphafold2_ptm_model_3_seed_000 took 694.7s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 21:07:19,449 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=97.4 pTM=0.774\n", + "2023-12-05 21:10:12,890 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=97.4 pTM=0.781 tol=0.302\n", + "2023-12-05 21:13:06,725 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=97.2 pTM=0.777 tol=0.0837\n", + "2023-12-05 21:16:00,285 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=96.9 pTM=0.777 tol=0.033\n", + "2023-12-05 21:16:00,287 alphafold2_ptm_model_4_seed_000 took 694.5s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 21:18:54,305 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=97.4 pTM=0.784\n", + "2023-12-05 21:21:47,797 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=96.9 pTM=0.784 tol=0.248\n", + "2023-12-05 21:24:41,418 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=96.3 pTM=0.776 tol=0.188\n", + "2023-12-05 21:27:35,677 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=96.3 pTM=0.778 tol=0.0931\n", + "2023-12-05 21:27:35,679 alphafold2_ptm_model_5_seed_000 took 695.2s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-12-05 21:27:35,815 reranking models by 'plddt' metric\n", + "2023-12-05 21:27:35,816 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=97.4 pTM=0.784\n", + "2023-12-05 21:27:35,828 rank_002_alphafold2_ptm_model_4_seed_000 pLDDT=96.9 pTM=0.777\n", + "2023-12-05 21:27:35,837 rank_003_alphafold2_ptm_model_2_seed_000 pLDDT=96.8 pTM=0.767\n", + "2023-12-05 21:27:35,846 rank_004_alphafold2_ptm_model_5_seed_000 pLDDT=96.3 pTM=0.778\n", + "2023-12-05 21:27:35,852 rank_005_alphafold2_ptm_model_1_seed_000 pLDDT=96.1 pTM=0.756\n", + "2023-12-05 21:27:38,005 Done\n" + ] + } + ], + "source": [ + "download_alphafold_params(model_type, Path(\".\"))\n", + "results = run(\n", + " queries=queries,\n", + " result_dir=result_dir,\n", + " use_templates=use_templates,\n", + " custom_template_path=custom_template_path,\n", + " num_relax=num_relax,\n", + " msa_mode=msa_mode,\n", + " model_type=model_type,\n", + " num_models=5,\n", + " num_recycles=num_recycles,\n", + " recycle_early_stop_tolerance=recycle_early_stop_tolerance,\n", + " num_seeds=num_seeds,\n", + " use_dropout=use_dropout,\n", + " model_order=[1,2,3,4,5],\n", + " is_complex=is_complex,\n", + " data_dir=Path(\".\"),\n", + " keep_existing_results=False,\n", + " rank_by=\"auto\",\n", + " pair_mode=pair_mode,\n", + " pairing_strategy=pairing_strategy,\n", + " stop_at_score=float(100),\n", + " prediction_callback=prediction_callback,\n", + " dpi=dpi,\n", + " zip_results=False,\n", + " save_all=save_all,\n", + " max_msa=max_msa,\n", + " use_cluster_profile=use_cluster_profile,\n", + " input_features_callback=input_features_callback,\n", + " save_recycles=save_recycles,\n", + " user_agent=\"colabfold/google-colab-main\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Once predictions are successfully generated, the results can be 
saved into the current directory as a zip file." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'/home/jovyan/work/AF2 preinstall test/test_a5e17_3.result.zip'" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results = f\"{jobname}.result\"\n", + "\n", + "if not check(f\"{results}.zip\"):\n", + " n = 0\n", + " while not check(f\"{results}_{n}.zip\"): n += 1\n", + " results = f\"{results}_{n}\"\n", + " \n", + "shutil.make_archive(results, 'zip', jobname)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Result zip file contents** \n", + "1. PDB-formatted structures (unrelaxed, plus relaxed if `use_amber` is enabled), sorted by average pLDDT, or by pTM score for complexes.\n", + "2. Plots of the model quality.\n", + "3. Plots of the MSA coverage.\n", + "4. Parameter log file.\n", + "5. A3M formatted input MSA.\n", + "6. A `predicted_aligned_error_v1.json` using [AlphaFold-DB's format](https://alphafold.ebi.ac.uk/faq#faq-7) and a `scores.json` for each model which contains an array (list of lists) for PAE, a list with the average pLDDT and the pTMscore.\n", + "7. BibTeX file with citations for all used tools and databases." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget to display 3D structure" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Import additional packages to enable visualization in 3D. 
\n", + "**Note:** *The next cell imports from `colabfold.colabfold`. If your installation renamed `colabfold.colabfold.py` to `colabfold.cf.py` inside the package (and updated the import in `colabfold.batch.py` to match), import from `colabfold.cf` in the next cell instead.*" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "import py3Dmol\n", + "import glob\n", + "import matplotlib.pyplot as plt\n", + "from colabfold.colabfold import plot_plddt_legend\n", + "from colabfold.colabfold import pymol_color_list, alphabet_list" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Generate an interactive widget to select which ranked prediction to display, which color scheme to use, and whether to show sidechains and/or mainchains. This family reuses the `.HTML`, `.Dropdown`, and `.Checkbox` widget types from earlier and adds a `.Select` list box for the color scheme. \n", + "\n", + "***Note:*** *the 3D image does not update in real time; after changing a selection, rerun the cell that contains the following calls:* \n", + "    `show_pdb(rank_num, show_sidechains, show_mainchains, color).show()` \n", + "    `if color == \"pLDDT\": plot_plddt_legend().show()`" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "html_disp3d = widgets.HTML(description=\"Display 3D Structure:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_disp3d'))\n", + "\n", + "drop_rank_num = widgets.Dropdown(options=[('1',1),('2',2),('3',3),('4',4),('5',5)],\n", + " value=1,\n", + " description='rank_num:',\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='drop_rank_num'))\n", + "\n", + "select_color = widgets.Select(options=['chain','pLDDT','rainbow'],\n", + " value='pLDDT',\n", + " description='Color:',\n", + " rows=3,\n", + " disabled=False,\n", + " 
layout=Layout(width='auto', grid_area='select_color'))\n", + "\n", + "cb_sidechains = widgets.Checkbox(value=False,\n", + " description='show_sidechains',\n", + " disabled=False,\n", + " indent=True,\n", + " layout=Layout(width='auto', grid_area='cb_sidechains'))\n", + "\n", + "cb_mainchains = widgets.Checkbox(value=False,\n", + " description='show_mainchains',\n", + " disabled=False,\n", + " indent=True,\n", + " layout=Layout(width='auto', grid_area='cb_mainchains'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Display the widget family." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "d944ce870f6841fbb48637bf676c96a1", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Display 3D Structure:', layout=Layout(grid_area='html_dis…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_disp3d = GridBox(children=[html_disp3d, drop_rank_num, select_color, cb_sidechains, cb_mainchains],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " grid_template_rows='auto auto auto',\n", + " grid_template_columns='20% 20% 20%',\n", + " grid_template_areas='''\n", + " \"html_disp3d html_disp3d .\"\n", + " \"drop_rank_num select_color cb_sidechains\"\n", + " \". select_color cb_mainchains\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_disp3d)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code cell defines `show_pdb()`, which renders the selected predicted structure with the chosen display settings."
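`show_pdb()` looks up `pdb_file[rank_num - 1]`, which assumes the sorted file list holds exactly one PDB per rank. When relaxed and unrelaxed files are both present, positional indexing can pick the wrong model. The more defensive sketch below assumes that ColabFold's file names contain a zero-padded `rank_001`-style tag (as in the ranking log above) preceded by `relaxed` or `unrelaxed`. `pdb_for_rank` is a hypothetical helper.

```python
import re
from typing import Dict, List, Optional

# Assumed ColabFold naming: <jobname>_<relaxed|unrelaxed>_rank_<NNN>_<model>.pdb
RANK_TAG = re.compile(r"(?:^|_)(relaxed|unrelaxed)_rank_(\d+)_")


def pdb_for_rank(pdb_files: List[str], rank_num: int, prefer_relaxed: bool = True) -> Optional[str]:
    """Return the PDB path for rank_num, preferring the relaxed model when both exist."""
    by_kind: Dict[str, str] = {}
    for path in pdb_files:
        match = RANK_TAG.search(path)
        if match and int(match.group(2)) == rank_num:
            by_kind[match.group(1)] = path  # key is "relaxed" or "unrelaxed"
    order = ["relaxed", "unrelaxed"] if prefer_relaxed else ["unrelaxed", "relaxed"]
    for kind in order:
        if kind in by_kind:
            return by_kind[kind]
    return None  # no file carries this rank


files = sorted([
    "job/job_unrelaxed_rank_002_alphafold2_ptm_model_4_seed_000.pdb",
    "job/job_relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb",
    "job/job_unrelaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb",
])
print(files[2 - 1])            # positional lookup for rank 2 lands on a rank_001 file
print(pdb_for_rank(files, 2))  # job/job_unrelaxed_rank_002_alphafold2_ptm_model_4_seed_000.pdb
```

Inside `show_pdb()`, `pdb_for_rank(pdb_file, rank_num)` could replace `pdb_file[rank_num - 1]`. It returns `None` instead of silently showing a different model when a rank is missing.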
+ ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "cellView": "form", + "id": "KK7X9T44pWb7" + }, + "outputs": [], + "source": [ + "def show_pdb(rank_num=1, show_sidechains=False, show_mainchains=False, color=\"pLDDT\"):\n", + " model_name = f\"rank_{rank_num}\"\n", + " view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)\n", + " #view.addModel(open(pdb_file[0],'r').read(),'pdb')\n", + " view.addModel(open(pdb_file[rank_num -1],'r').read(),'pdb')\n", + "\n", + " if color == \"pLDDT\":\n", + " view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':50,'max':90}}})\n", + " elif color == \"rainbow\":\n", + " view.setStyle({'cartoon': {'color':'spectrum'}})\n", + " elif color == \"chain\":\n", + " chains = len(queries[0][1]) + 1 if is_complex else 1\n", + " for n,chain,color in zip(range(chains),alphabet_list,pymol_color_list):\n", + " view.setStyle({'chain':chain},{'cartoon': {'color':color}})\n", + "\n", + " if show_sidechains:\n", + " BB = ['C','O','N']\n", + " view.addStyle({'and':[{'resn':[\"GLY\",\"PRO\"],'invert':True},{'atom':BB,'invert':True}]},\n", + " {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " view.addStyle({'and':[{'resn':\"GLY\"},{'atom':'CA'}]},\n", + " {'sphere':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " view.addStyle({'and':[{'resn':\"PRO\"},{'atom':['C','O'],'invert':True}]},\n", + " {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " if show_mainchains:\n", + " BB = ['C','O','N','CA']\n", + " view.addStyle({'atom':BB},{'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + "\n", + " view.zoomTo()\n", + " return view" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Assign widget selections as accessible variables and show 3D protein structure." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "application/3dmoljs_load.v0": "
\n

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
\n jupyter labextension install jupyterlab_3dmol

\n
\n", + "text/html": [ + "
\n", + "

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
\n", + " jupyter labextension install jupyterlab_3dmol

\n", + "
\n", + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "rank_num = drop_rank_num.value\n", + "color = select_color.value\n", + "show_sidechains = cb_sidechains.value\n", + "show_mainchains = cb_mainchains.value\n", + "\n", + "jobname_prefix = \".custom\" if msa_mode == \"custom\" else \"\"\n", + "pdb_file = sorted(glob.glob(f\"./{jobname}\"+\"/*.pdb\"))\n", + "\n", + "# show result\n", + "show_pdb(rank_num, show_sidechains, show_mainchains, color).show()\n", + "if color == \"pLDDT\":\n", + " plot_plddt_legend().show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "##### Plots" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Combine the PAE, MSA coverage, and pLDDT plots generated during the prediction into a single panel." + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "cellView": "form", + "id": "11l8k--10q0C" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "

Plots for test_a5e17_3

\n", + " \n", + " \n", + " \n", + "
\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import base64\n", + "from html import escape\n", + "\n", + "# see: https://stackoverflow.com/a/53688522\n", + "def image_to_data_url(filename):\n", + " ext = filename.split('.')[-1]\n", + " prefix = f'data:image/{ext};base64,'\n", + " with open(filename, 'rb') as f:\n", + " img = f.read()\n", + " return prefix + base64.b64encode(img).decode('utf-8')\n", + "\n", + "pae = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_pae.png\"))\n", + "cov = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_coverage.png\"))\n", + "plddt = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_plddt.png\"))\n", + "display(HTML(f\"\"\"\n", + "\n", + "
\n", + "

Plots for {escape(jobname)}

\n", + " \n", + " \n", + " \n", + "
\n", + "\"\"\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G4yBrceuFbf3", + "tags": [], + "user_expressions": [] + }, + "source": [ + "
\n", + "\n", + "**ColabFold v1.5.2-patch: AlphaFold2 using MMseqs2** \n", + "Easy-to-use protein structure and complex prediction using [AlphaFold2](https://www.nature.com/articles/s41586-021-03819-2) and [Alphafold2-multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). Sequence alignments/templates are generated through [MMseqs2](mmseqs.com) and [HHsearch](https://github.com/soedinglab/hh-suite). For more details, see [bottom](#Instructions) of the notebook, checkout the [ColabFold GitHub](https://github.com/sokrypton/ColabFold) and read the authors' manuscript: [Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all.\n", + "*Nature Methods*, 2022](https://www.nature.com/articles/s41592-022-01488-1).\n", + "Old versions: [v1.4](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.4.0/AlphaFold2.ipynb), [v1.5.1](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.1/AlphaFold2.ipynb)\n", + "\n", + "**LICENSE** \n", + "The source code of ColabFold is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE). Additionally, this notebook uses the AlphaFold2 source code and its parameters licensed under [Apache 2.0](https://raw.githubusercontent.com/deepmind/alphafold/main/LICENSE) and [CC BY 4.0](https://creativecommons.org/licenses/by-sa/4.0/) respectively. Read more about the AlphaFold license [here](https://github.com/deepmind/alphafold). \n", + "\n", + "**PDB100** \n", + "As of 23/06/08, ColabFold has transitioned from using the PDB70 to a 100% clustered PDB, the PDB100. The construction methodology of PDB100 differs from that of PDB70. \n", + "The PDB70 was constructed by running each PDB70 representative sequence through [HHblits](https://github.com/soedinglab/hh-suite) against the [Uniclust30](https://uniclust.mmseqs.com/). 
\n", + "On the other hand, the PDB100 is built by searching each PDB100 representative structure with [Foldseek](https://github.com/steineggerlab/foldseek) against the [AlphaFold Database](https://alphafold.ebi.ac.uk). \n", + "*To maintain compatibility with older notebook versions and local installations, the generated files and API responses will continue to be named \"PDB70\", even though we're now using the PDB100.* \n", + "\n", + "**USING CUSTOM TEMPLATES** \n", + "\\- Custom templates must follow the four-letter PDB naming convention, using lowercase letters. \n", + "\\- Templates in mmCIF format must contain `_entity_poly_seq`. An error is thrown if this field is not present. The field `_pdbx_audit_revision_history.revision_date` is automatically generated if it is not present. \n", + "\\- Templates in PDB format are automatically converted to the mmCIF format. `_entity_poly_seq` and `_pdbx_audit_revision_history.revision_date` are automatically generated. \n", + "\\- If you encounter problems, please report them to this [issue](https://github.com/sokrypton/ColabFold/issues/177).\n", + "\n", + "**COMPARISON TO THE FULL ALPHAFOLD2 AND ALPHAFOLD2 COLAB** \n", + "This notebook replaces the homology detection and MSA pairing of AlphaFold2 with MMseqs2. 
For a comparison against the [AlphaFold2 Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb) and the full [AlphaFold2](https://github.com/deepmind/alphafold) system, read the ColabFold [paper](https://www.nature.com/articles/s41592-022-01488-1).\n", + "\n", + "**BUGS** \n", + "If you encounter any bugs in the original notebook, please report the issue to https://github.com/sokrypton/ColabFold/issues\n", + "\n", + "**LIMITATIONS** \n", + "*The ColabFold authors recommend additionally using the full [AlphaFold2 pipeline](https://github.com/deepmind/alphafold).* \n", + "\\- **Computing resources:** The MMseqs2 API used by the original [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) can handle ~20-50k requests per day. \n", + "\\- **MSAs:** MMseqs2 is very precise and sensitive but might find fewer hits than HHblits/HMMER searches against BFD or MGnify.\n", + "\n", + "**DESCRIPTION OF PLOTS** \n", + "\\- **Number of sequences per position** - For best performance, there should be at least 30 sequences per position, ideally 100. \n", + "\\- **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better. \n", + "\\- **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better. \n", + "\n", + "**COLABFOLD ACKNOWLEDGEMENTS** \n", + "\\- We thank the AlphaFold team for developing an excellent model and open-sourcing the software. \n", + "\\- [KOBIC](https://kobic.re.kr) and [Söding Lab](https://www.mpinat.mpg.de/soeding) for providing the computational resources for the MMseqs2 MSA server. \n", + "\\- Richard Evans for helping to benchmark ColabFold's AlphaFold-multimer support. 
\n", + "\\- [David Koes](https://github.com/dkoes) for his awesome [py3Dmol](https://3dmol.csb.pitt.edu/) plugin, without whom these notebooks would be quite boring! \n", + "\\- Do-Yoon Kim for creating the ColabFold logo. \n", + "\\- A colab by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger))." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "af2-env", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/AlphaFold2/ignore/AlphaFold2_CPU.ipynb b/tutorials/AlphaFold2/ignore/AlphaFold2_CPU.ipynb new file mode 100644 index 0000000..529d4b0 --- /dev/null +++ b/tutorials/AlphaFold2/ignore/AlphaFold2_CPU.ipynb @@ -0,0 +1,2247 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "## Notebooks Hub AlphaFold2 Tutorial Using Jupyter Widgets\n", + "This is an adapted notebook of the original [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) to serve as an example for Notebooks Hub using interactive Jupyter widgets. \n", + "**Note: This particular notebook runs on CPU; a GPU version is in progress.**" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Here are some helpful links to learn more about using AlphaFold and its impact in computational biology. 
\n", + "  \\- AlphaFold Protein Structure Database [FAQ](https://alphafold.ebi.ac.uk/faq) \n", + "  \\- neurosnap.ai's guides [Part 1](https://neurosnap.ai/blog/post/641a34a1148354cbab382afe) [Part 2](https://neurosnap.ai/blog/post/64222437a55063d26e9c069e) [Part 3](https://neurosnap.ai/blog/post/6422432aa55063d26e9c06a1) \n", + "  \\- Jumper et al. (2021). Highly accurate protein structure prediction with AlphaFold. [doi: 10.1038/s41586-021-03819-2](https://doi.org/10.1038/s41586-021-03819-2) \n", + "  \\- Mirdita et al. (2022). ColabFold: Making protein folding accessible to all. [doi: 10.1038/s41592-022-01488-1](https://doi.org/10.1038/s41592-022-01488-1) \n", + "  \\- Bertoline et al. (2023). Before and after AlphaFold2: An overview of protein structure prediction. [doi: 10.3389/fbinf.2023.1120370](https://doi.org/10.3389/fbinf.2023.1120370) \n", + "  \\- Fang et al. (2023). A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. [doi: 10.1038/s42256-023-00721-6](https://doi.org/10.1038/s42256-023-00721-6) \n", + "\n", + "To learn more about Notebooks Hub or Jupyter Widgets, check out their documentation [here](https://polusai.github.io/notebooks-hub/) and [here](https://ipywidgets.readthedocs.io/en/8.1.2/index.html), respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Setup\n", + "To get started, import the packages needed to run the AlphaFold predictions later on."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "import re\n", + "import hashlib\n", + "import random\n", + "import shutil\n", + "\n", + "from sys import version_info\n", + "python_version = f\"{version_info.major}.{version_info.minor}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following packages are used to build interactive widgets for a better user experience. More information on building interactive widgets can be found in the [Jupyter Widgets documentation](https://ipywidgets.readthedocs.io/en/8.1.2/index.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from ipywidgets import interact, interactive, fixed, interact_manual, FileUpload, GridBox, Layout, VBox\n", + "import ipywidgets as widgets\n", + "from IPython.display import display, HTML\n", + "from ipyfilechooser import FileChooser" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Generate interactive widgets for query session inputs\n", + "An interactive widget will help the user select the inputs required to run the analysis. Examples of additional styles and formats can be found in the [documentation's](https://ipywidgets.readthedocs.io/en/8.1.2/index.html) [widget list](https://ipywidgets.readthedocs.io/en/8.1.2/examples/Widget%20List.html). \n", + "1. The core parameters to define before running AlphaFold predictions include the **query sequence(s)**, **Amber relaxed model(s)**, and **template(s)**, as well as a **jobname** to keep track of each query session. \n", + "2. Set up individual widget types for each prediction parameter. These are widget *children* that will be grouped into a *family* in the next step. 
For visual convenience, a border will be added to outline each family container." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.HTML` widget type displays text that can be used as headers or descriptions. These strings can be formatted using HTML tags. The `grid_area` attribute will help with widget placement inside the family container." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "html_input = widgets.HTML(description=\"Input Protein Sequences:\",\n", + " value=\"\",\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_input'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Variables can be assigned values (or, as shown below, concatenated f-strings for large blocks of text), which can then be used to set widget values more easily." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "qtips = (f\"Helpful Tips
\"\n", + " f\"Query: Use `:` to specify inter-protein chainbreaks for modeling complexes (supports homo- and hetero-oligomers). \"\n", + " f\"For example `PI...SK:PI...SK` for a homodimer.
\"\n", + " f\"Template Mode: Select the desired template to run predictions against.
\"\n", + " f\"Amber Relaxes: Specify how many of the top-ranked structures to relax using Amber.
\"\n", + " f\"Amber is a suite of programs for molecular dynamics simulations of biomolecules using the AMBER force fields. \"\n", + " f\"Amber relaxation refines amino acid side-chain positions and is usually only needed by users who require accurate side-chain positions.\"\n", + " )\n", + "\n", + "html_qtips = widgets.HTML(description=\"\",\n", + " value=qtips,\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_qtips'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.Text` widget type provides a single-line box for text input, while `.Textarea` provides an adjustable, multi-line box. Here, `.Textarea` is used for the **query sequence** so that long or complex queries can be viewed in their entirety. If specified, the `placeholder` text is shown while the box is empty. An initial value can be assigned to the `value` attribute to pre-populate the box." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "text_queryseq = widgets.Textarea(value='PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK',\n", + " placeholder='Input Amino Acid Sequence for Protein of Interest',\n", + " description='Query Sequence:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='700px', grid_area='text_queryseq'))\n", + "\n", + "text_jobname = widgets.Text(value='test',\n", + " placeholder='Type Jobname',\n", + " description='Jobname:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='text_jobname'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The next two widgets need to offer pre-determined options for user selection. 
For each widget, these options are defined with the `options` attribute, and the default selection with the `value` attribute. \n", + "\n", + "The `.RadioButtons` widget type lists the options as radio buttons, of which only one can be selected." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_num_relax = widgets.RadioButtons(options=[('0',0),('1',1),('5',5)],\n", + " value=0,\n", + " description='Number of Amber Relaxes:',\n", + " disabled=False,\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_num_relax'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.ToggleButtons` widget displays the options as buttons, of which the user can select one. This widget type is especially helpful because the `tooltips` attribute shows a description of each option when the mouse pointer hovers over it." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_template_mode = widgets.ToggleButtons(options=['none','pdb100','custom'],\n", + " description='Template Mode:',\n", + " disabled=False,\n", + " button_style='',\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_template_mode'),\n", + " tooltips=['no template information is used',\n", + " 'detect templates in pdb100',\n", + " 'upload and search own templates (PDB or mmCIF format, see notes) to bias AlphaFold\\'s predictions'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family. Jupyter's `Box` widgets use the CSS [flexbox spec](https://css-tricks.com/snippets/css/a-guide-to-flexbox/) to arrange individual widgets within a container. 
The family below uses the `GridBox` container, which can be customized according to the CSS [Grid layouts](https://css-tricks.com/snippets/css/complete-guide-grid). \n", + "1. Using the `GridBox` container, define its children from the code above.\n", + "2. Lay out the children within the container. Each child's `Layout.grid_area` value must match a label in the container's `Layout.grid_template_areas` attribute.\n", + "3. `display()` will display the interactive widget family." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9cd2ade9f4bd4a91aa565d6426431699", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Input Protein Sequences:', layout=Layout(grid_area='html_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_query = GridBox(children=[html_input, html_qtips, text_queryseq, text_jobname, buttons_num_relax, buttons_template_mode],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='1255px',\n", + " grid_template_rows='auto auto auto auto auto',\n", + " grid_template_columns='300px 450px 500px',\n", + " grid_template_areas='''\n", + " \"html_input html_input html_qtips\"\n", + " \"text_queryseq text_queryseq html_qtips\"\n", + " \"text_jobname . 
html_qtips\"\n", + " \"buttons_template_mode buttons_template_mode html_qtips\"\n", + " \"buttons_num_relax buttons_num_relax html_qtips\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_query)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Access widget outputs to generate a new directory for saving prediction results and queries" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code defines functions that make jobnames more distinctive, to minimize the risk of files being overwritten when the same jobname is reused. `update_jobname` removes whitespace and non-word characters from the jobname, and `add_hash` appends an underscore followed by the first five characters of the SHA-1 hash of the query sequence (e.g., `test_a5e17`). Because `update_jobname` reads the global `query_sequence`, it must be called after that variable is set. An integer suffix for repeated runs (e.g., `_0`, `_1`, ...) is added in the next code cell." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def add_hash(x,y):\n", + " return x+\"_\"+hashlib.sha1(y.encode()).hexdigest()[:5]\n", + "\n", + "def update_jobname(jobname):\n", + " basejobname = \"\".join(jobname.split())\n", + " basejobname = re.sub(r'\\W+', '', basejobname)\n", + " jobname_new = add_hash(basejobname, query_sequence)\n", + " \n", + " return jobname_new" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The values entered or selected in each widget can be read from the child widget's `.value` attribute and saved as variables, as shown at the top of the following code block. \n", + "  • The rest of the code block checks the working directory for a folder with the same jobname. 
If none exists, it creates one; otherwise, it creates one with an integer suffix appended (e.g., `_0`, `_1`, ...). 
\n", + "  • ***Note:*** *these code chunks must be in the same cell; otherwise, the iterative numbering does not work as intended (i.e., it appends `_0` again instead of increasing the number).*" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "jobname: test_a5e17_0\n", + "sequence: PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK\n", + "length: 59\n", + "relax: 0\n", + "template: none\n" + ] + } + ], + "source": [ + "# save outputs as accessible variables\n", + "jobname = text_jobname.value\n", + "query_sequence = text_queryseq.value\n", + "num_relax = buttons_num_relax.value\n", + "template_mode = buttons_template_mode.value\n", + "\n", + "use_amber = num_relax > 0\n", + "\n", + "# remove whitespaces, then count residues (excluding ':' chain breaks) and update jobname\n", + "query_sequence = \"\".join(query_sequence.split())\n", + "length = len(query_sequence.replace(\":\",\"\"))\n", + "jobname = update_jobname(jobname)\n", + "\n", + "# check if directory with jobname exists\n", + "def check(folder):\n", + " if os.path.exists(folder):\n", + " return False\n", + " else:\n", + " return True\n", + " \n", + "if not check(jobname):\n", + " n = 0\n", + " while not check(f\"{jobname}_{n}\"): n += 1\n", + " jobname = f\"{jobname}_{n}\"\n", + "\n", + "# make directory to save results\n", + "os.makedirs(jobname, exist_ok=True)\n", + "\n", + "# save a copy of the query sequence in the newly generated folder\n", + "queries_path = os.path.join(jobname, f\"{jobname}.csv\")\n", + "with open(queries_path, \"w\") as text_file:\n", + " text_file.write(f\"id,sequence\\n{jobname},{query_sequence}\")\n", + "\n", + "# for verification purposes, return the session's information.\n", + "print(f\"jobname: {jobname}\" \"\\n\"\n", + " f\"sequence: {query_sequence}\" \"\\n\"\n", + " f\"length: {length}\" \"\\n\"\n", + " f\"relax: {num_relax}\" \"\\n\"\n", + " f\"template: {template_mode}\")" + ] + 
}, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate file upload widgets to select custom templates for predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Local File Upload:** The `FileUpload` widget allows the user to select local files for upload to the current working directory on the server. AlphaFold allows for multiple custom templates, so `multiple=True` was set. Accepted file types can be restricted with the `accept` attribute (e.g., `accept='.pdb,.cif'`); the empty string used here accepts any file. \n", + "  • ***Note:*** *This will replace pre-existing files in the current directory with the same name. Please rename if necessary.* \n", + "  • ***Note:*** *The counter shown will increase even when the same file is re-selected. The cell containing `display(upload)` must be rerun to reset the counter.* " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "upload = FileUpload(accept='', multiple=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Server File Upload:** The `FileChooser` widget allows the user to select a single file that is already present on the server. The files listed can be restricted by setting `fc.filter_pattern` to one or more glob patterns (e.g., `'*.pdb'`). For AlphaFold, templates should be in PDB or PDBx/mmCIF format. \n", + "  • **Note:** ipyfilechooser is a separate package that works in conjunction with ipywidgets. 
\n", + "  • If a file from the server was selected, `os.rename` will move the file into the `template` folder inside the current query session's jobname directory." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "fc = FileChooser(\n", + " os.getcwd(),\n", + " filename='',\n", + " title='Select custom template(s)
Note: must follow four-letter PDB naming with lowercase letters',\n", + " show_hidden=False,\n", + " select_default=False,\n", + " show_only_dirs=False\n", + " )\n", + "\n", + "fc.filter_pattern = ['*.pdb', '*.cif', '*.pdbx', '*.txt'] " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following statements set variables for downstream analysis and display the upload widgets only if `template_mode` was set to *custom*. In that case, a `template` folder is also created inside the current jobname's directory to store template files. \n", + "  • ***Note:*** *In order to cancel file selection from the server, the previous cell must be rerun.*" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# set variables and display file uploaders if template mode is set to custom\n", + "if template_mode == \"pdb100\":\n", + " use_templates = True\n", + " custom_template_path = None\n", + "elif template_mode == \"custom\":\n", + " custom_template_path = os.path.join(jobname, \"template\")\n", + " os.makedirs(custom_template_path, exist_ok=True)\n", + " use_templates = True\n", + " display(fc)\n", + " display(upload)\n", + "else:\n", + " custom_template_path = None\n", + " use_templates = False" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "File selected from the server: None\n", + "File(s) selected from the local host for upload: None\n" + ] + } + ], + "source": [ + "# move the file selected on the server into the session's template folder\n", + "if template_mode == \"custom\":\n", + " if fc.selected is not None: # fc.selected is a single path string\n", + " src = fc.selected\n", + " os.rename(src, os.path.join(custom_template_path, os.path.basename(src)))\n", + "\n", + "# return filenames of all files selected for custom template use.\n", + "fps = \"None\"\n", + "if 
upload.value: fps = [f\"{fp}\" for fp in upload.value]\n", + "print(f\"File selected from the 
server: {fc.selected}\")\n", + "print(f\"File(s) selected from the local host for upload: {fps}\")\n", + "\n", + "# upload local files to server and place inside session's template folder\n", + "if upload.value:\n", + " fps = []\n", + " for fp in upload.value:\n", + " fps.append(f\"{fp}\")\n", + " with open(fp, 'wb') as output_file:\n", + " content = upload.value[fp]['content']\n", + " output_file.write(content)\n", + " os.rename(fp,os.path.join(custom_template_path,fp))\n", + " print(f\">> {fp} successfully uploaded\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Install dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Copy the settings into upper-case variables, which the installation code checks to decide which dependencies to install." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "USE_AMBER = use_amber\n", + "USE_TEMPLATES = use_templates\n", + "PYTHON_VERSION = python_version" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code was minimally adapted from the original notebook, and what it installs depends on the parameters selected above. It checks the current working directory for marker files (e.g., `COLABFOLD_READY`) indicating which modules are already installed, creates symbolic links from the installed `colabfold` and `alphafold` packages into the current working directory, patches `alphafold/model/modules.py` for newer JAX versions, and, when templates or Amber relaxation are requested, installs Mambaforge followed by HH-suite and/or OpenMM with PDBFixer."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "cellView": "form", + "id": "AzIKiDiCaHAn", + "tags": [] + }, + "outputs": [], + "source": [ + "if not os.path.isfile(\"COLABFOLD_READY\"):\n", + " print(\"installing colabfold...\")\n", + " os.system(\"ln -s /opt/modules/my/conda-envs/alphafold-test/lib/python3.9/site-packages/colabfold colabfold\")\n", + " os.system(\"ln -s /opt/modules/my/conda-envs/alphafold-test/lib/python3.9/site-packages/alphafold alphafold\")\n", + " # patch for jax > 0.3.25\n", + " os.system(\"sed -i 's/weights = jax.nn.softmax(logits)/logits=jnp.clip(logits,-1e8,1e8);weights=jax.nn.softmax(logits)/g' alphafold/model/modules.py\")\n", + " os.system(\"touch COLABFOLD_READY\")\n", + " print(\"colabfold ready!\")\n", + "\n", + "if USE_AMBER or USE_TEMPLATES:\n", + " if not os.path.isfile(\"CONDA_READY\"):\n", + " print(\"installing conda...\")\n", + " os.system(\"wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh\")\n", + " os.system(\"bash Mambaforge-Linux-x86_64.sh -bfp /usr/local\")\n", + " os.system(\"mamba config --set auto_update_conda false\")\n", + " os.system(\"touch CONDA_READY\")\n", + " print(\"conda ready!\")\n", + "\n", + "if USE_TEMPLATES and not os.path.isfile(\"HH_READY\") and USE_AMBER and not os.path.isfile(\"AMBER_READY\"):\n", + " print(\"installing hhsuite and amber...\")\n", + " os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer\")\n", + " os.system(\"touch HH_READY\")\n", + " os.system(\"touch AMBER_READY\")\n", + " print(\"hhsuite & amber ready!\")\n", + " \n", + "else:\n", + " if USE_TEMPLATES and not os.path.isfile(\"HH_READY\"):\n", + " print(\"installing hhsuite...\")\n", + " os.system(f\"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python='{PYTHON_VERSION}'\")\n", + " os.system(\"touch HH_READY\")\n", + " print(\"hhsuite 
ready!\")\n", + " if USE_AMBER and not os.path.isfile(\"AMBER_READY\"):\n", + " print(\"installing amber...\")\n", + " os.system(f\"mamba install -y -c conda-forge openmm=7.7.0 python='{PYTHON_VERSION}' pdbfixer\")\n", + " os.system(\"touch AMBER_READY\")\n", + " print(\"amber ready!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget for multiple sequence alignment options" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "AlphaFold2 was trained on experimentally determined protein structures from the [RCSB Protein Data Bank (PDB)](https://www.rcsb.org/) and takes multiple sequence alignments (MSAs) as input. ColabFold generates these MSAs with MMseqs2 [(Many-against-Many searching)](https://mmseqs.com/latest/userguide.pdf), which searches and clusters huge sequence sets from databases comprising UniRef [(UniProt Reference Clusters)](https://www.uniprot.org/help/uniref) and ColabFold's own [environmental database](https://colabfold.mmseqs.com/), referred to as _env_ in the widget options. MSA pairing can also be controlled to improve prediction accuracy for protein complexes. A new family of widgets is created below for these options." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Use the `.HTML` widget type to create a descriptive header."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "html_msaopts = widgets.HTML(description=\"Multiple Sequence Alignment Options (custom MSA upload, single sequence, pairing mode)\",\n", + " value=\"\",\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_msaopts'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.Select` widget type displays a list box with one option per row, from which a single option can be selected." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "select_msa_mode = widgets.Select(options=['mmseqs2_uniref_env', 'mmseqs2_uniref', 'single_sequence', 'custom'],\n", + " value='mmseqs2_uniref_env',\n", + " description='MSA mode:',\n", + " rows=5,\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='select_msa_mode'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.ToggleButtons` widget type displays the options as buttons, each with a tooltip description shown on hover." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "buttons_pair_mode = widgets.ToggleButtons(options=['unpaired_paired', 'paired', 'unpaired'],\n", + " description='Pair Mode:',\n", + " disabled=False,\n", + " button_style='',\n", + " layout=Layout(width='auto', grid_area='buttons_pair_mode'),\n", + " tooltips=['pair sequences from same species + unpaired MSA',\n", + " 'only use paired sequences',\n", + " 'separate MSA for each chain'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family.\n", + "1. Using the `GridBox` container, define its children from the code above.\n", + "2. `display()` will display the interactive widget family."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7030ed90c3f44ae482d2e4faf9114729", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Multiple Sequence Alignment Options (custom MSA upload, singl…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_msa = GridBox(children=[html_msaopts, select_msa_mode, buttons_pair_mode],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='605px',\n", + " grid_template_rows='auto auto',\n", + " grid_template_columns='300px 300px',\n", + " grid_template_areas='''\n", + " \"html_msaopts html_msaopts\"\n", + " \"select_msa_mode buttons_pair_mode\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_msa)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Next, save the widget selections as accessible variables." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "msa_mode = select_msa_mode.value\n", + "pair_mode = buttons_pair_mode.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "#### Custom MSA file (.a3m formatted)\n", + "##### DISCLAIMER: HAVE NOT TESTED FUNCTIONALITY OF USING CUSTOM A3M FILE AFTER SUCCESSFUL UPLOAD\n", + "The custom MSA mode lets users upload their own alignment file (A3M format) instead of generating one with MMseqs2. Any alignment tool can be used to generate the MSA, including the [HHblits Toolkit server](https://toolkit.tuebingen.mpg.de/tools/hhblits)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "cellView": "form", + "id": "C2_sh2uAonJH", + "tags": [] + }, + "outputs": [], + "source": [ + "# create additional file uploaders to use for custom MSA\n", + "upload_msa = FileUpload(accept='.a3m', multiple=False)\n", + "\n", + "# decide which a3m to use\n", + "if \"mmseqs2\" in msa_mode:\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.a3m\")\n", + "\n", + "elif msa_mode == \"custom\":\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.custom.a3m\")\n", + " if not os.path.isfile(a3m_file):\n", + " print(\"The first FASTA entry of the A3M file must be the query sequence without gaps.\")\n", + " display(upload_msa)\n", + "else:\n", + " a3m_file = os.path.join(jobname,f\"{jobname}.single_sequence.a3m\")\n", + " with open(a3m_file, \"w\") as text_file:\n", + " text_file.write(\">1\\n%s\" % query_sequence)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code cell will save the selected local file to the server and create a renamed copy for the program to access." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "# upload local file to session's folder on server\n", + "if upload_msa.value:\n", + " up_msa = upload_msa.value\n", + " fpmsa = []\n", + " for fn, fd in up_msa.items():\n", + " fpmsa.append(f\"{fn}\")\n", + " with open(fn, 'wb') as output_file:\n", + " content = fd['content']\n", + " output_file.write(content)\n", + " os.rename(fn,os.path.join(jobname,fn))\n", + " print(f\"{fn} successfully uploaded. 
Don't forget to cite your custom MSA generation method!\")\n", + "\n", + "if upload_msa.value:\n", + " orig_msa = f\"{jobname}/{fpmsa[0]}\"\n", + " custom_msa = shutil.copy2(orig_msa,f\"{jobname}/strip_{fpmsa[0]}\") # copy file as backup or for preservation purposes\n", + "\n", + " header = 0\n", + " import fileinput\n", + " for line in fileinput.FileInput(custom_msa,inplace=True):\n", + " if line.startswith(\">\"):\n", + " header = header + 1\n", + " if not line.rstrip():\n", + " continue\n", + " if line.startswith(\">\") == False and header == 1:\n", + " query_sequence = line.rstrip()\n", + " print(line, end='')\n", + " \n", + " os.rename(custom_msa, a3m_file)\n", + " queries_path=a3m_file\n", + " print(f\"Moving {custom_msa} to {a3m_file} for use by AlphaFold.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Advanced Settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Create the widget header and a tips box using the `.HTML` widget type." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "html_advset = widgets.HTML(description=\"Advanced Settings:\",\n", + " value=\"\",\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_advset'))\n", + "\n", + "advtips = (f\"Helpful Tips
 \"\n", + " f\"Model Type: Choose the model for the prediction (monomer or multimer). For monomer predictions, choose alphafold2_ptm. \"\n", + " f\"Auto lets ColabFold decide: it uses alphafold2_ptm for single chains and alphafold2_multimer_v3 for complexes.
 \"\n", + " f\"Number of Recycles: Feeds the model's own predictions back through the network for additional refinement passes. \"\n", + " f\"The default is 3; 6 or more may improve accuracy for difficult targets at the cost of longer runtimes.
\"\n", + " f\"Recycle Early Stop Tolerance: Auto tolerance will be 0.0, unless using alphafold2_multimer_v3.
 \"\n", + " f\"Alphafold2_multimer_v3: When this model is selected explicitly, `auto` sets recycles = 20 and tolerance = 0.5.\"\n", + " )\n", "\n", "html_advtips = widgets.HTML(description=\"\",\n", " value=advtips,\n", " style={'description_width': 'initial'},\n", " layout=Layout(width='auto', grid_area='html_advtips'))" ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "The `.Dropdown` widget type displays a single-selection list as a dropdown menu. Create dropdown widgets to select the AlphaFold **model type**, **number of recycles**, and **recycle early stop tolerance**." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "drop_model_type = widgets.Dropdown(options=['auto','alphafold2_ptm','alphafold2_multimer_v1','alphafold2_multimer_v2','alphafold2_multimer_v3'],\n", " value='auto',\n", " description='Model Type:',\n", " disabled=False,\n", " style={'description_width': '170px'},\n", " layout=Layout(width='auto', grid_area='drop_model_type'))\n", "\n", "drop_num_recycles = widgets.Dropdown(options=[('auto','auto'),('0',0),('1',1),('3',3),('6',6),('12',12),('24',24),('48',48)],\n", " value='auto',\n", " description='Number of Recycles:',\n", " disabled=False,\n", " style={'description_width': '170px'},\n", " layout=Layout(width='auto', grid_area='drop_num_recycles'))\n", "\n", "drop_tol = widgets.Dropdown(options=[('auto','auto'),('0.0',0.0),('0.5',0.5),('1.0',1.0)],\n", " value='auto',\n", " description='Recycle Early Stop Tolerance:',\n", " disabled=False,\n", " style={'description_width': '170px'},\n", " layout=Layout(width='auto', grid_area='drop_tol'))" ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "Use `.ToggleButtons` to create toggle buttons to select **pairing strategy** and have descriptions display upon hovering over 
each button." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "buttons_pairing_strategy = widgets.ToggleButtons(options=['greedy','complete'],\n", + " description='Pairing Strategy:',\n", + " disabled=False,\n", + " button_style='',\n", + " style={'description_width': '170px'},\n", + " layout=Layout(width='auto', grid_area='buttons_pairing_strategy'),\n", + " tooltips=['pair any taxonomically matching subsets',\n", + " ' all sequences have to match in one line'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family, once again using `GridBox` and `display()`." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9253f05178e24e71aac5d7ce4e9a6024", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Advanced Settings:', layout=Layout(grid_area='html_advset…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_advset = GridBox(children=[html_advset, html_advtips, drop_model_type, drop_num_recycles, drop_tol, buttons_pairing_strategy],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='1305px',\n", + " grid_template_rows='auto auto auto auto auto',\n", + " grid_template_columns='350px 200px 750px',\n", + " grid_template_areas='''\n", + " \"html_advset html_advset html_advtips\"\n", + " \"drop_model_type . html_advtips\"\n", + " \"drop_num_recycles . html_advtips\"\n", + " \"drop_tol . 
html_advtips\"\n", + " \"buttons_pairing_strategy buttons_pairing_strategy html_advtips\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_advset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Lastly, save the widget selections as accessible variables." + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_type = drop_model_type.value\n", + "pairing_strategy = buttons_pairing_strategy.value\n", + "\n", + "if drop_model_type.value != 'alphafold2_multimer_v3':\n", + " if drop_num_recycles.value == 'auto':\n", + " num_recycles = 3\n", + " else:\n", + " num_recycles = drop_num_recycles.value\n", + " \n", + " if drop_tol.value == 'auto':\n", + " recycle_early_stop_tolerance = 0.0\n", + " else:\n", + " recycle_early_stop_tolerance = drop_tol.value\n", + "\n", + "elif drop_model_type.value == 'alphafold2_multimer_v3':\n", + " if drop_num_recycles.value == 'auto':\n", + " num_recycles = 20\n", + " else:\n", + " num_recycles = drop_num_recycles.value\n", + "\n", + " if drop_tol.value == 'auto':\n", + " recycle_early_stop_tolerance = 0.5\n", + " else:\n", + " recycle_early_stop_tolerance = drop_tol.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget to define sample settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "`.HTML` widgets can be used again to add headers as well as additional text to provide setting tips." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "user_expressions": [] + }, + "outputs": [], + "source": [ + "html_sampset = widgets.HTML(description=\"Sample Settings:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'})\n", + "\n", + "msatips = (f\"Helpful Tips
\"\n", + " f\"- Decrease Max MSA to increase uncertainty.
\"\n", + " f\"- Enable dropouts and increase # seeds to sample predictions from uncertainty of the model.\"\n", + " )\n", + "\n", + "html_msatips = widgets.HTML(description=\"\",\n", + " value=msatips,\n", + " style={'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_msatips'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "A `.SelectionSlider` widget will be used alongside the standard `.Dropdown` type used previously. The selection slider offers a range of custom values without conforming to a uniform increment." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "drop_max_msa = widgets.Dropdown(options=['auto','512:1024','256:512','64:128','32:64', '16:32'],\n", + " value='auto',\n", + " description='Max MSA:',\n", + " disabled=False)\n", + "\n", + "slider_num_seeds = widgets.SelectionSlider(options=[('1',1),('2',2),('4',4),('8',8),('16',16)],\n", + " value=1,\n", + " description='# seeds:',\n", + " disabled=False)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Here, a `.Checkbox` widget type that can be selected or unselected is introduced." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "cb_dropout = widgets.Checkbox(value=False, description='Use Dropout', disabled=False, indent=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Now, set up the widget family, once again using `GridBox` and `display()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ff6a53f08157477fac47dfc1cee7761d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Sample Settings:', style=DescriptionStyle(description_wid…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_sampset = GridBox(children=[html_sampset, html_msatips, drop_max_msa, slider_num_seeds, cb_dropout],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='805px',\n", + " grid_template_rows='auto auto auto auto auto',\n", + " grid_template_columns='400px 400px',\n", + " grid_template_areas='''\n", + " \"h_sampset html_msatips\"\n", + " \"drop_max_msa html_msatips\"\n", + " \"slider_num_seeds html_msatips\"\n", + " \"cb_dropout .\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_sampset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Save the widget selections as accessible variables. These will also be used to assign other values as depicted in the bottom half of the code cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "max_msa = drop_max_msa.value\n", + "num_seeds = slider_num_seeds.value\n", + "use_dropout = cb_dropout.value\n", + "\n", + "num_recycles = None if num_recycles == \"auto\" else int(num_recycles)\n", + "recycle_early_stop_tolerance = None if recycle_early_stop_tolerance == \"auto\" else float(recycle_early_stop_tolerance)\n", + "if max_msa == \"auto\": max_msa = None" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Generate interactive widget to toggle save settings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The `.IntText` widget can be used to allow the user to specify any integer inside its given text box." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "text_save_dpi = widgets.IntText(value=200,\n", + " description='dpi:',\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='text_save_dpi'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "Set up individual `.HTML` widgets to display text and `.Checkboxes` for toggles." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "html_saveset = widgets.HTML(description=\"Save Settings:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_saveset'))\n", + "\n", + "html_savedpi = widgets.HTML(description=\"Set dpi for image resolution:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_savedpi'))\n", + "\n", + "cb_savefull = widgets.Checkbox(value=False,description='Save All', disabled=False, layout=Layout(width='auto', grid_area='cb_savefull'))\n", + "\n", + "cb_saverecyc = widgets.Checkbox(value=False, description='Save Recycles', disabled=False, layout=Layout(width='auto', grid_area='cb_saverecyc'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Set up the widget family, once again using `GridBox` and `display()`. This time columns are also utilized. For proper layout assignment into the family container, `grid_area` was defined for each widget child." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2fb08726f7d047f48aca368ea456cc4b", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Save Settings:', layout=Layout(grid_area='html_saveset', …" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_saveset = GridBox(children=[html_saveset, html_savedpi, text_save_dpi, cb_savefull, cb_saverecyc],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " width='455px',\n", + " grid_template_rows='auto auto auto auto',\n", + " grid_template_columns='150px 300px',\n", + " grid_template_areas='''\n", + " \"html_saveset .\"\n", + " \"html_savedpi text_save_dpi\"\n", + " \". cb_savefull\"\n", + " \". cb_saverecyc\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_saveset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Save the widget selections as accessible variables. " + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "save_all = cb_savefull.value\n", + "save_recycles = cb_saverecyc.value\n", + "dpi = text_save_dpi.value" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "### Prepare prediction run" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Use a simple `.Checkbox` widget to allow the user to toggle whether or not images should be displayed during the prediction run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "755dbc2dad2e452a859217e852a90d3c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Checkbox(value=True, description='Display Images', layout=Layout(border='solid 1.5px'))" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "cb_displayimg = widgets.Checkbox(value=True, description='Display Images', disabled=False, indent=True, layout=Layout(border='solid 1.5px'))\n", + "display(cb_displayimg)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Assign the selection as an accessible variable." + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "display_images = cb_displayimg" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following package imports are necessary to finally run the structural predictions. 
 \n", "**Note:** *`colabfold.colabfold.py` was renamed to `colabfold.cf.py` directly inside the package contents, and `colabfold.colabfold` was renamed to `colabfold.cf` in the next cell and inside `colabfold.batch.py`.*" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import warnings\n", "warnings.simplefilter(action='ignore', category=FutureWarning)\n", "from Bio import BiopythonDeprecationWarning\n", "#warnings.simplefilter(action='ignore', category=BiopythonDeprecationWarning)\n", "from pathlib import Path\n", "from colabfold.download import download_alphafold_params, default_data_dir\n", "from colabfold.utils import setup_logging\n", "from colabfold.batch import get_queries, run, set_model_type\n", "from colabfold.plot import plot_msa_v2\n", "\n", "from colabfold.cf import plot_protein\n", "import matplotlib.pyplot as plt\n", "\n", "import os\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "Add system paths to the new dependencies that were installed previously."
+ ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "# pdbfixer 1.8.1\n", + "if use_amber and f\"/opt/conda/pkgs/pdbfixer-1.8.1-pyh6c4a22f_0/site-packages/\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/pdbfixer-1.8.1-pyh6c4a22f_0/site-packages/\")\n", + "\n", + "# openmm 7.7.0\n", + "if use_amber and f\"/opt/conda/pkgs/openmm-7.7.0-py39h15fbce5_1/lib/python3.9/site-packages\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/openmm-7.7.0-py39h15fbce5_1/lib/python3.9/site-packages\")\n", + "\n", + "# kalign2 2.0.4\n", + "if use_templates and f\"/opt/conda/pkgs/kalign2-2.04-h031d066_5/bin\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/kalign2-2.04-h031d066_5/bin\")\n", + "\n", + "# hhsuite 3.3.0\n", + "if use_templates and f\"/opt/conda/pkgs/hhsuite-3.3.0-py39pl5321he10ea66_9/bin\" not in sys.path:\n", + " sys.path.insert(0, f\"/opt/conda/pkgs/hhsuite-3.3.0-py39pl5321he10ea66_9/bin\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Additionally, the following cell defines a few necessary functions for ColabFold/AlphaFold." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "def input_features_callback(input_features):\n", + " if display_images:\n", + " plot_msa_v2(input_features)\n", + " plt.show()\n", + " plt.close()\n", + "\n", + "def prediction_callback(protein_obj, length,\n", + " prediction_result, input_features, mode):\n", + " model_name, relaxed = mode\n", + " if not relaxed:\n", + " if display_images:\n", + " fig = plot_protein(protein_obj, Ls=length, dpi=150)\n", + " plt.show()\n", + " plt.close()\n", + "\n", + "result_dir = jobname\n", + "log_filename = os.path.join(jobname,\"log.txt\")\n", + "if not os.path.isfile(log_filename) or 'logging_setup' not in globals():\n", + " setup_logging(Path(log_filename))\n", + " logging_setup = True\n", + "\n", + "queries, is_complex = get_queries(queries_path)\n", + "model_type = set_model_type(is_complex, model_type)\n", + "\n", + "if \"multimer\" in model_type and max_msa is not None:\n", + " use_cluster_profile = False\n", + "else:\n", + " use_cluster_profile = True" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "### Run AlphaFold2 predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "**Note:** IF USING AMBER RELAXATION: User may receive the following error during pLDDT reranking and may be unable to continue forward. Adding `run_relax=false` somewhere inside may help with this issue [see here](https://github.com/google-deepmind/alphafold/issues/112). \n", + "> ValueError: Minimization failed after 100 attempts." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "cellView": "form", + "collapsed": true, + "id": "mbaIO9pWjaN0", + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 15:30:50,210 Unable to initialize backend 'cuda': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'\n", + "2023-10-12 15:30:50,228 Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'\n", + "2023-10-12 15:30:50,229 Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.\n", + "2023-10-12 15:30:50,229 No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)\n", + "2023-10-12 15:30:50,229 WARNING: no GPU detected, will be using CPU\n", + "2023-10-12 15:30:58,115 Found 4 citations for tools or databases\n", + "2023-10-12 15:30:58,116 Query 1/1: test_a5e17_0 (length 59)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "COMPLETE: 100%|██████████| 150/150 [elapsed: 00:02 remaining: 00:00]\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 15:31:01,304 Setting max_seq=512, max_extra_seq=5120\n", + "2023-10-12 15:33:18,608 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=96.6 pTM=0.755\n", + "2023-10-12 15:35:13,837 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=96.6 pTM=0.758 tol=0.297\n", + "2023-10-12 15:37:08,659 alphafold2_ptm_model_1_seed_000 recycle=2 pLDDT=96.4 pTM=0.757 tol=0.0433\n", + "2023-10-12 15:39:03,605 alphafold2_ptm_model_1_seed_000 recycle=3 pLDDT=96.2 pTM=0.757 tol=0.0395\n", + "2023-10-12 15:39:03,607 alphafold2_ptm_model_1_seed_000 took 476.0s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 15:40:58,848 alphafold2_ptm_model_2_seed_000 recycle=0 pLDDT=96.9 pTM=0.76\n", + "2023-10-12 15:42:53,596 alphafold2_ptm_model_2_seed_000 recycle=1 pLDDT=97 pTM=0.765 tol=0.366\n", + "2023-10-12 15:44:47,713 alphafold2_ptm_model_2_seed_000 recycle=2 pLDDT=96.9 pTM=0.766 tol=0.124\n", + "2023-10-12 15:46:42,150 alphafold2_ptm_model_2_seed_000 recycle=3 pLDDT=96.8 pTM=0.767 tol=0.0802\n", + "2023-10-12 15:46:42,152 alphafold2_ptm_model_2_seed_000 took 458.4s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 15:48:38,692 alphafold2_ptm_model_3_seed_000 recycle=0 pLDDT=97.2 pTM=0.775\n", + "2023-10-12 15:50:36,467 alphafold2_ptm_model_3_seed_000 recycle=1 pLDDT=97.4 pTM=0.782 tol=0.293\n", + "2023-10-12 15:52:34,415 alphafold2_ptm_model_3_seed_000 recycle=2 pLDDT=97.4 pTM=0.782 tol=0.116\n", + "2023-10-12 15:54:31,265 alphafold2_ptm_model_3_seed_000 recycle=3 pLDDT=97.4 pTM=0.784 tol=0.055\n", + "2023-10-12 15:54:31,266 alphafold2_ptm_model_3_seed_000 took 469.0s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 15:56:25,472 alphafold2_ptm_model_4_seed_000 recycle=0 pLDDT=97.4 pTM=0.775\n", + "2023-10-12 15:58:19,697 alphafold2_ptm_model_4_seed_000 recycle=1 pLDDT=97.4 pTM=0.782 tol=0.29\n", + "2023-10-12 16:00:13,941 alphafold2_ptm_model_4_seed_000 recycle=2 pLDDT=97.2 pTM=0.778 tol=0.069\n", + "2023-10-12 16:02:08,141 alphafold2_ptm_model_4_seed_000 recycle=3 pLDDT=97.1 pTM=0.779 tol=0.0458\n", + "2023-10-12 16:02:08,143 alphafold2_ptm_model_4_seed_000 took 456.8s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 16:04:01,285 alphafold2_ptm_model_5_seed_000 recycle=0 pLDDT=97.4 pTM=0.784\n", + "2023-10-12 16:05:55,664 alphafold2_ptm_model_5_seed_000 recycle=1 pLDDT=97 pTM=0.784 tol=0.234\n", + "2023-10-12 16:07:50,563 alphafold2_ptm_model_5_seed_000 recycle=2 pLDDT=96.4 pTM=0.777 tol=0.166\n", + "2023-10-12 16:09:45,743 alphafold2_ptm_model_5_seed_000 recycle=3 pLDDT=96.2 pTM=0.777 tol=0.128\n", + "2023-10-12 16:09:45,744 alphafold2_ptm_model_5_seed_000 took 457.5s (3 recycles)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2023-10-12 16:09:45,856 reranking models by 'plddt' metric\n", + "2023-10-12 16:09:45,857 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=97.4 pTM=0.784\n", + "2023-10-12 16:09:45,864 rank_002_alphafold2_ptm_model_4_seed_000 pLDDT=97.1 pTM=0.779\n", + "2023-10-12 16:09:45,871 rank_003_alphafold2_ptm_model_2_seed_000 pLDDT=96.8 pTM=0.767\n", + "2023-10-12 16:09:45,879 rank_004_alphafold2_ptm_model_5_seed_000 pLDDT=96.2 pTM=0.777\n", + "2023-10-12 16:09:45,886 rank_005_alphafold2_ptm_model_1_seed_000 pLDDT=96.2 pTM=0.757\n", + "2023-10-12 16:09:47,878 Done\n" + ] + } + ], + "source": [ + "download_alphafold_params(model_type, Path(\".\"))\n", + "results = run(\n", + " queries=queries,\n", + " result_dir=result_dir,\n", + " use_templates=use_templates,\n", + " custom_template_path=custom_template_path,\n", + " num_relax=num_relax,\n", + " msa_mode=msa_mode,\n", + " model_type=model_type,\n", + " num_models=5,\n", + " num_recycles=num_recycles,\n", + " recycle_early_stop_tolerance=recycle_early_stop_tolerance,\n", + " num_seeds=num_seeds,\n", + " use_dropout=use_dropout,\n", + " model_order=[1,2,3,4,5],\n", + " is_complex=is_complex,\n", + " data_dir=Path(\".\"),\n", + " keep_existing_results=False,\n", + " rank_by=\"auto\",\n", + " pair_mode=pair_mode,\n", + " pairing_strategy=pairing_strategy,\n", + " stop_at_score=float(100),\n", + " prediction_callback=prediction_callback,\n", + " dpi=dpi,\n", + " zip_results=False,\n", + " save_all=save_all,\n", + " max_msa=max_msa,\n", + " use_cluster_profile=use_cluster_profile,\n", + " input_features_callback=input_features_callback,\n", + " save_recycles=save_recycles,\n", + " user_agent=\"colabfold/google-colab-main\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Once predictions are successfully generated, the results can be 
saved into the current directory as a zip file." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/home/jovyan/work/AlphaFold2/test_a5e17_0.result.zip'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = f\"{jobname}.result\"\n", "\n", "if not check(f\"{results}.zip\"):\n", " n = 0\n", " while not check(f\"{results}_{n}.zip\"): n += 1\n", " results = f\"{results}_{n}\"\n", " \n", "shutil.make_archive(results, 'zip', jobname)" ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "**Result zip file contents** \n", "1. PDB-formatted structures, sorted by average pLDDT (complexes are sorted by pTM score). Unrelaxed structures are always included; relaxed structures are added if `use_amber` is enabled.\n", "2. Plots of the model quality.\n", "3. Plots of the MSA coverage.\n", "4. Parameter log file.\n", "5. A3M-formatted input MSA.\n", "6. A `predicted_aligned_error_v1.json` in [AlphaFold-DB's format](https://alphafold.ebi.ac.uk/faq#faq-7) and, for each model, a `scores.json` containing an array (list of lists) for the PAE, a list with the average pLDDT, and the pTM score.\n", "7. A BibTeX file with citations for all tools and databases used." ] }, { "cell_type": "markdown", "metadata": { "tags": [], "user_expressions": [] }, "source": [ "### Generate interactive widget to display 3D structure" ] }, { "cell_type": "markdown", "metadata": { "tags": [], "user_expressions": [] }, "source": [ "Import additional packages to enable visualization in 3D. 
\n", + "**Note:** *`colabfold.colabfold.py` was renamed to `colabfold.cf.py` directly inside the package contents, and `colabfold.colabfold` was renamed to `colabfold.cf` in the next cell and inside `colabfold.batch.py`.*" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "import py3Dmol\n", + "import glob\n", + "import matplotlib.pyplot as plt\n", + "from colabfold.cf import plot_plddt_legend\n", + "from colabfold.cf import pymol_color_list, alphabet_list" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Generate an interactive widget to select which ranked prediction to display, which specific color scheme, and whether to show sidechains and/or mainchains. The same widget types used earlier (`.HTML`, `.Dropdown`, `.Select`, `.Checkbox`) will be used for this family. \n", + "\n", + "***Note:*** *this will not update the 3D image in real time, so each time a different selection is selected, the cell containing the following functions will need to be rerun.* \n", + "    `show_pdb(rank_num, show_sidechains, show_mainchains, color).show()` \n", + "    `if color == \"pLDDT\": plot_plddt_legend().show()`" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "html_disp3d = widgets.HTML(description=\"Display 3D Structure:\",\n", + " value=\"\",\n", + " style= {'description_width': 'initial'},\n", + " layout=Layout(width='auto', grid_area='html_disp3d'))\n", + "\n", + "drop_rank_num = widgets.Dropdown(options=[('1',1),('2',2),('3',3),('4',4),('5',5)],\n", + " value=1,\n", + " description='rank_num:',\n", + " disabled=False,\n", + " layout=Layout(width='auto', grid_area='drop_rank_num'))\n", + "\n", + "select_color = widgets.Select(options=['chain','pLDDT','rainbow'],\n", + " value='pLDDT',\n", + " description='Color:',\n", + " rows=3,\n", + " disabled=False,\n", + " 
layout=Layout(width='auto', grid_area='select_color'))\n", + "\n", + "cb_sidechains = widgets.Checkbox(value=False,\n", + " description='show_sidechains',\n", + " disabled=False,\n", + " indent=True,\n", + " layout=Layout(width='auto', grid_area='cb_sidechains'))\n", + "\n", + "cb_mainchains = widgets.Checkbox(value=False,\n", + " description='show_mainchains',\n", + " disabled=False,\n", + " indent=True,\n", + " layout=Layout(width='auto', grid_area='cb_mainchains'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Display widget family." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5b5008b9917b4c95a243e972bfc43c48", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "GridBox(children=(HTML(value='', description='Display 3D Structure:', layout=Layout(grid_area='html_dis…" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "controls_disp3d = GridBox(children=[html_disp3d, drop_rank_num, select_color, cb_sidechains, cb_mainchains],\n", + " layout=Layout(\n", + " border='solid 1.5px',\n", + " grid_template_rows='auto auto',\n", + " grid_template_columns='20% 20% 20%',\n", + " grid_template_areas='''\n", + " \"html_disp3d html_disp3d .\"\n", + " \"drop_rank_num select_color cb_sidechains\"\n", + " \". select_color cb_mainchains\"\n", + " ''')\n", + " )\n", + "\n", + "display(controls_disp3d)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "The following code cell helps apply the proper display settings for the predicted protein structure selected in the widget." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "cellView": "form", + "id": "KK7X9T44pWb7" + }, + "outputs": [], + "source": [ + "# pdb_file (model paths ordered by rank) is defined in the next cell; rank_num is 1-based\n", + "def show_pdb(rank_num=1, show_sidechains=False, show_mainchains=False, color=\"pLDDT\"):\n", + " view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js',)\n", + " view.addModel(open(pdb_file[rank_num -1],'r').read(),'pdb')\n", + "\n", + " if color == \"pLDDT\":\n", + " view.setStyle({'cartoon': {'colorscheme': {'prop':'b','gradient': 'roygb','min':50,'max':90}}})\n", + " elif color == \"rainbow\":\n", + " view.setStyle({'cartoon': {'color':'spectrum'}})\n", + " elif color == \"chain\":\n", + " chains = len(queries[0][1]) + 1 if is_complex else 1\n", + " # use chain_color so the loop does not overwrite the color argument\n", + " for _, chain, chain_color in zip(range(chains),alphabet_list,pymol_color_list):\n", + " view.setStyle({'chain':chain},{'cartoon': {'color':chain_color}})\n", + "\n", + " if show_sidechains:\n", + " BB = ['C','O','N']\n", + " view.addStyle({'and':[{'resn':[\"GLY\",\"PRO\"],'invert':True},{'atom':BB,'invert':True}]},\n", + " {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " view.addStyle({'and':[{'resn':\"GLY\"},{'atom':'CA'}]},\n", + " {'sphere':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " view.addStyle({'and':[{'resn':\"PRO\"},{'atom':['C','O'],'invert':True}]},\n", + " {'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + " if show_mainchains:\n", + " BB = ['C','O','N','CA']\n", + " view.addStyle({'atom':BB},{'stick':{'colorscheme':f\"WhiteCarbon\",'radius':0.3}})\n", + "\n", + " view.zoomTo()\n", + " return view" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Assign widget selections as accessible variables and show 3D protein structure." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "application/3dmoljs_load.v0": "
\n

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
\n jupyter labextension install jupyterlab_3dmol

\n
\n", + "text/html": [ + "
\n", + "


\n", + "
\n", + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "rank_num = drop_rank_num.value\n", + "color = select_color.value\n", + "show_sidechains = cb_sidechains.value\n", + "show_mainchains = cb_mainchains.value\n", + "\n", + "jobname_prefix = \".custom\" if msa_mode == \"custom\" else \"\"\n", + "# keep only the unrelaxed models so that pdb_file[rank_num - 1] is the rank_num-th model,\n", + "# even when use_amber writes relaxed copies into the same folder\n", + "pdb_file = sorted(glob.glob(os.path.join(jobname, \"*_unrelaxed_rank_*.pdb\")))\n", + "\n", + "# show result\n", + "show_pdb(rank_num, show_sidechains, show_mainchains, color).show()\n", + "if color == \"pLDDT\":\n", + " plot_plddt_legend().show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [], + "user_expressions": [] + }, + "source": [ + "##### Plots" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Combine the individual plots generated by the prediction (PAE, MSA coverage, and pLDDT) into a single display." + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "cellView": "form", + "id": "11l8k--10q0C" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "

Plots for test_a5e17_0

\n", + " \n", + " \n", + " \n", + "
\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import base64\n", + "from html import escape\n", + "\n", + "# see: https://stackoverflow.com/a/53688522\n", + "def image_to_data_url(filename):\n", + " ext = filename.split('.')[-1]\n", + " prefix = f'data:image/{ext};base64,'\n", + " with open(filename, 'rb') as f:\n", + " img = f.read()\n", + " return prefix + base64.b64encode(img).decode('utf-8')\n", + "\n", + "pae = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_pae.png\"))\n", + "cov = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_coverage.png\"))\n", + "plddt = image_to_data_url(os.path.join(jobname,f\"{jobname}{jobname_prefix}_plddt.png\"))\n", + "display(HTML(f\"\"\"\n", + "\n", + "
\n", + "

Plots for {escape(jobname)}

\n", + " \n", + " \n", + " \n", + "
\n", + "\"\"\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G4yBrceuFbf3", + "tags": [], + "user_expressions": [] + }, + "source": [ + "
\n", + "\n", + "**ColabFold v1.5.2-patch: AlphaFold2 using MMseqs2** \n", + "Easy-to-use protein structure and complex prediction using [AlphaFold2](https://www.nature.com/articles/s41586-021-03819-2) and [AlphaFold2-multimer](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1). Sequence alignments/templates are generated through [MMseqs2](https://mmseqs.com) and [HHsearch](https://github.com/soedinglab/hh-suite). For more details, check out the [ColabFold GitHub](https://github.com/sokrypton/ColabFold) and read the authors' manuscript: [Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all.\n", + "*Nature Methods*, 2022](https://www.nature.com/articles/s41592-022-01488-1).\n", + "Old versions: [v1.4](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.4.0/AlphaFold2.ipynb), [v1.5.1](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.1/AlphaFold2.ipynb)\n", + "\n", + "**LICENSE** \n", + "The source code of ColabFold is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE). Additionally, this notebook uses the AlphaFold2 source code and its parameters, licensed under [Apache 2.0](https://raw.githubusercontent.com/deepmind/alphafold/main/LICENSE) and [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) respectively. Read more about the AlphaFold license [here](https://github.com/deepmind/alphafold). \n", + "\n", + "**PDB100** \n", + "As of 2023-06-08, ColabFold has transitioned from using the PDB70 to a 100% clustered PDB, the PDB100. The construction methodology of PDB100 differs from that of PDB70. \n", + "The PDB70 was constructed by running each PDB70 representative sequence through [HHblits](https://github.com/soedinglab/hh-suite) against the [Uniclust30](https://uniclust.mmseqs.com/). 
\n", + "On the other hand, the PDB100 is built by searching each PDB100 representative structure with [Foldseek](https://github.com/steineggerlab/foldseek) against the [AlphaFold Database](https://alphafold.ebi.ac.uk). \n", + "*To maintain compatibility with older notebook versions and local installations, the generated files and API responses will continue to be named \"PDB70\", even though we're now using the PDB100.* \n", + "\n", + "**USING CUSTOM TEMPLATES** \n", + "\\- Custom templates must follow the four-letter PDB naming convention, using lowercase letters. \n", + "\\- Templates in mmCIF format must contain `_entity_poly_seq`. An error is thrown if this field is not present. The field `_pdbx_audit_revision_history.revision_date` is automatically generated if it is not present. \n", + "\\- Templates in PDB format are automatically converted to the mmCIF format. `_entity_poly_seq` and `_pdbx_audit_revision_history.revision_date` are automatically generated. \n", + "\\- If you encounter problems, please report them to this [issue](https://github.com/sokrypton/ColabFold/issues/177).\n", + "\n", + "**COMPARISON TO THE FULL ALPHAFOLD2 AND ALPHAFOLD2 COLAB** \n", + "This notebook replaces the homology detection and MSA pairing of AlphaFold2 with MMseqs2. 
For a comparison against the [AlphaFold2 Colab](https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb) and the full [AlphaFold2](https://github.com/deepmind/alphafold) system read our [paper](https://www.nature.com/articles/s41592-022-01488-1).\n", + "\n", + "**BUGS** \n", + "If you encounter any bugs in the original notebook, please report the issue to https://github.com/sokrypton/ColabFold/issues\n", + "\n", + "**LIMITATIONS** \n", + "*The ColabFold's authors recommend to additionally use the full [AlphaFold2 pipeline](https://github.com/deepmind/alphafold).* \n", + "\\- **Computing resources:** The original [ColabFold AlphaFold2 notebook](https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb) MMseqs2 API can handle ~20-50k requests per day. \n", + "\\- **MSAs:** MMseqs2 is very precise and sensitive but might find less hits compared to HHblits/HMMer searched against BFD or MGnify.\n", + "\n", + "**DESCRIPTION OF PLOTS** \n", + "\\- **Number of sequences per position** - We want to see at least 30 sequences per position, for best performance, ideally 100 sequences. \n", + "\\- **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better. \n", + "\\- **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better. \n", + "\n", + "**COLABFOLD ACKNOWLEDGEMENTS** \n", + "\\- We thank the AlphaFold team for developing an excellent model and open sourcing the software. \n", + "\\- [KOBIC](https://kobic.re.kr) and [Söding Lab](https://www.mpinat.mpg.de/soeding) for providing the computational resources for the MMseqs2 MSA server. \n", + "\\- Richard Evans for helping to benchmark the ColabFold's Alphafold-multimer support. 
\n", + "\\- [David Koes](https://github.com/dkoes) for his awesome [py3Dmol](https://3dmol.csb.pitt.edu/) plugin, without whom these notebooks would be quite boring! \n", + "\\- Do-Yoon Kim for creating the ColabFold logo. \n", + "\\- A colab by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger))." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "alphafold-test", + "language": "python", + "name": "alphafold-test" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/tutorials/AlphaFold2/ignore/environment.yml b/tutorials/AlphaFold2/ignore/environment.yml new file mode 100644 index 0000000..693f74a --- /dev/null +++ b/tutorials/AlphaFold2/ignore/environment.yml @@ -0,0 +1,220 @@ +name: /opt/modules/my/conda-envs/alphafold-test +channels: + - conda-forge + - bioconda + - defaults +dependencies: + - _libgcc_mutex=0.1=conda_forge + - _openmp_mutex=4.5=2_gnu + - asttokens=2.4.0=pyhd8ed1ab_0 + - backcall=0.2.0=pyh9f0ad1d_0 + - backports=1.0=pyhd8ed1ab_3 + - backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0 + - bzip2=1.0.8=h7f98852_4 + - ca-certificates=2023.7.22=hbcca054_0 + - comm=0.1.4=pyhd8ed1ab_0 + - cudatoolkit=11.8.0=h4ba93d1_12 + - debugpy=1.8.0=py39h3d6467e_1 + - decorator=5.1.1=pyhd8ed1ab_0 + - exceptiongroup=1.1.3=pyhd8ed1ab_0 + - icu=73.2=h59595ed_0 + - importlib_metadata=6.8.0=hd8ed1ab_0 + - ipykernel=6.25.2=pyh2140261_0 + - ipython=8.16.1=pyh0d859eb_0 + - jedi=0.19.1=pyhd8ed1ab_0 + - jupyter_client=8.3.1=pyhd8ed1ab_0 + - jupyter_core=5.3.2=py39hf3d152e_0 + - ld_impl_linux-64=2.40=h41732ed_0 + - libexpat=2.5.0=hcb278e6_1 + - 
libffi=3.4.2=h7f98852_5 + - libgcc-ng=13.2.0=h807b86a_2 + - libgomp=13.2.0=h807b86a_2 + - libnsl=2.0.0=hd590300_1 + - libsodium=1.0.18=h36c2ea0_1 + - libsqlite=3.43.0=h2797004_0 + - libstdcxx-ng=13.2.0=h7e041cc_2 + - libuuid=2.38.1=h0b41bf4_0 + - libuv=1.46.0=hd590300_0 + - libzlib=1.2.13=hd590300_5 + - matplotlib-inline=0.1.6=pyhd8ed1ab_0 + - ncurses=6.4=hcb278e6_0 + - nodejs=20.8.0=hb753e55_0 + - openssl=3.1.3=hd590300_0 + - packaging=23.2=pyhd8ed1ab_0 + - parso=0.8.3=pyhd8ed1ab_0 + - pexpect=4.8.0=pyh1a96a4e_2 + - pickleshare=0.7.5=py_1003 + - pip=23.2.1=pyhd8ed1ab_0 + - platformdirs=3.11.0=pyhd8ed1ab_0 + - prompt-toolkit=3.0.39=pyha770c72_0 + - prompt_toolkit=3.0.39=hd8ed1ab_0 + - psutil=5.9.5=py39hd1e30aa_1 + - ptyprocess=0.7.0=pyhd3deb0d_0 + - pure_eval=0.2.2=pyhd8ed1ab_0 + - pygments=2.16.1=pyhd8ed1ab_0 + - python=3.9.18=h0755675_0_cpython + - python-dateutil=2.8.2=pyhd8ed1ab_0 + - python_abi=3.9=4_cp39 + - readline=8.2=h8228510_1 + - setuptools=68.2.2=pyhd8ed1ab_0 + - six=1.16.0=pyh6c4a22f_0 + - stack_data=0.6.2=pyhd8ed1ab_0 + - tk=8.6.13=h2797004_0 + - tornado=6.3.3=py39hd1e30aa_1 + - traitlets=5.11.2=pyhd8ed1ab_0 + - typing-extensions=4.8.0=hd8ed1ab_0 + - typing_extensions=4.8.0=pyha770c72_0 + - tzdata=2023c=h71feb2d_0 + - wcwidth=0.2.8=pyhd8ed1ab_0 + - wheel=0.41.2=pyhd8ed1ab_0 + - xz=5.2.6=h166bdaf_0 + - zeromq=4.3.4=h9c3ff4c_1 + - zipp=3.17.0=pyhd8ed1ab_0 + - zlib=1.2.13=hd590300_5 + - pip: + - absl-py==1.4.0 + - alphafold-colabfold==2.3.5 + - anyio==4.0.0 + - appdirs==1.4.4 + - argon2-cffi==23.1.0 + - argon2-cffi-bindings==21.2.0 + - arrow==1.3.0 + - astunparse==1.6.3 + - async-lru==2.0.4 + - attrs==23.1.0 + - babel==2.13.0 + - beautifulsoup4==4.12.2 + - biopython==1.81 + - bleach==6.0.0 + - bokeh==3.2.2 + - branca==0.6.0 + - cachetools==5.3.1 + - certifi==2023.7.22 + - cffi==1.16.0 + - charset-normalizer==3.3.0 + - chex==0.1.83 + - colabfold==1.5.2 + - contextlib2==21.6.0 + - contourpy==1.1.1 + - cycler==0.12.0 + - defusedxml==0.7.1 + - 
dm-haiku==0.0.9 + - dm-tree==0.1.8 + - docker==6.1.3 + - executing==2.0.0 + - fastjsonschema==2.18.1 + - flatbuffers==23.5.26 + - fonttools==4.43.0 + - fqdn==1.5.1 + - gast==0.5.4 + - google-auth==2.23.2 + - google-auth-oauthlib==1.0.0 + - google-pasta==0.2.0 + - grpcio==1.59.0 + - h5py==3.9.0 + - idna==3.4 + - immutabledict==3.0.0 + - importlib-metadata==4.13.0 + - importlib-resources==6.1.0 + - ipyfilechooser==0.6.0 + - ipyleaflet==0.17.4 + - ipython-genutils==0.2.0 + - ipywidgets==7.7.0 + - isoduration==20.11.0 + - jax==0.4.13 + - jaxlib==0.4.13 + - jinja2==3.1.2 + - jmp==0.0.4 + - json5==0.9.14 + - jsonpointer==2.4 + - jsonschema==4.19.1 + - jsonschema-specifications==2023.7.1 + - jupyter-bokeh==3.0.5 + - jupyter-events==0.7.0 + - jupyter-lsp==2.2.0 + - jupyter-server==2.7.3 + - jupyter-server-terminals==0.4.4 + - jupyterlab==4.0.6 + - jupyterlab-pygments==0.2.2 + - jupyterlab-server==2.25.0 + - jupyterlab-widgets==3.0.9 + - keras==2.14.0 + - kiwisolver==1.4.5 + - libclang==16.0.6 + - markdown==3.4.4 + - markupsafe==2.1.3 + - matplotlib==3.8.0 + - mistune==3.0.2 + - ml-collections==0.1.1 + - ml-dtypes==0.2.0 + - nbclient==0.8.0 + - nbconvert==7.9.2 + - nbformat==5.9.2 + - nest-asyncio==1.5.8 + - notebook==7.0.4 + - notebook-shim==0.2.3 + - numpy==1.26.0 + - nvidia-cublas-cu11==11.11.3.6 + - nvidia-cuda-cupti-cu11==11.8.87 + - nvidia-cuda-nvcc-cu11==11.8.89 + - nvidia-cuda-nvrtc-cu11==11.8.89 + - nvidia-cuda-runtime-cu11==11.8.89 + - nvidia-cudnn-cu11==8.9.4.25 + - nvidia-cufft-cu11==10.9.0.58 + - nvidia-cusolver-cu11==11.4.1.48 + - nvidia-cusparse-cu11==11.7.5.86 + - oauthlib==3.2.2 + - opt-einsum==3.3.0 + - overrides==7.4.0 + - pandas==1.5.3 + - pandocfilters==1.5.0 + - pillow==10.0.1 + - prometheus-client==0.17.1 + - protobuf==4.24.3 + - py3dmol==2.0.4 + - pyasn1==0.5.0 + - pyasn1-modules==0.3.0 + - pycparser==2.21 + - pyparsing==3.1.1 + - python-json-logger==2.0.7 + - pytz==2023.3.post1 + - pyyaml==6.0.1 + - pyzmq==24.0.1 + - referencing==0.30.2 + - 
requests==2.31.0 + - requests-oauthlib==1.3.1 + - rfc3339-validator==0.1.4 + - rfc3986-validator==0.1.1 + - rpds-py==0.10.4 + - rsa==4.9 + - scipy==1.11.3 + - sde==1.1.9 + - send2trash==1.8.2 + - sniffio==1.3.0 + - soupsieve==2.5 + - stack-data==0.6.3 + - tabulate==0.9.0 + - tensorboard==2.14.1 + - tensorboard-data-server==0.7.1 + - tensorflow-cpu==2.14.0 + - tensorflow-estimator==2.14.0 + - tensorflow-io-gcs-filesystem==0.34.0 + - termcolor==2.3.0 + - terminado==0.17.1 + - tinycss2==1.2.1 + - tomli==2.0.1 + - toolz==0.12.0 + - tqdm==4.66.1 + - traittypes==0.2.1 + - types-python-dateutil==2.8.19.14 + - uri-template==1.3.0 + - urllib3==2.0.6 + - webcolors==1.13 + - webencodings==0.5.1 + - websocket-client==1.6.3 + - werkzeug==3.0.0 + - widgetsnbextension==3.6.6 + - wrapt==1.14.1 + - xyzservices==2023.7.0 + - y-py==0.6.2 +prefix: /opt/modules/my/conda-envs/alphafold-test diff --git a/tutorials/AlphaFold2/ignore/jupyter_labextensions.txt b/tutorials/AlphaFold2/ignore/jupyter_labextensions.txt new file mode 100644 index 0000000..3dfa861 --- /dev/null +++ b/tutorials/AlphaFold2/ignore/jupyter_labextensions.txt @@ -0,0 +1,40 @@ +JupyterLab v4.0.6 +/opt/modules/my/conda-envs/alphafold-test/share/jupyter/labextensions + jupyterlab-datawidgets v7.1.2 enabled OK + jupyter-vue v1.8.0 enabled OK + jupyterlab_pygments v0.2.2 enabled X (python, jupyterlab_pygments) + jupyter-matplotlib v0.11.3 enabled OK + jupyter-vuetify v1.8.4 enabled X + bqplot v0.5.37 enabled X (python, bqplot) + ipyvolume v0.6.1 enabled OK + jupyter-threejs v2.4.0 enabled OK (python, pythreejs) + jupyter-leaflet v0.17.4 enabled OK + jupyterlab-plotly v5.12.0 enabled X + jupyter-webrtc v0.6.0 enabled OK + @beakerx/beakerx-tabledisplay v2.3.13 enabled X + @pyviz/jupyterlab_pyviz v2.3.2 enabled X (python, pyviz_comms) + @jupyter-notebook/lab-extension v7.0.4 enabled OK + @bokeh/jupyter_bokeh v3.0.5 enabled X (python, jupyter_bokeh) + @voila-dashboards/jupyterlab-preview v2.1.6 enabled X (python, voila) + 
@jupyter-widgets/jupyterlab-manager v3.1.3 enabled X (python, jupyterlab_widgets) + + + The following extensions are outdated: + jupyterlab_pygments + jupyter-vuetify + bqplot + jupyterlab-plotly + @beakerx/beakerx-tabledisplay + @pyviz/jupyterlab_pyviz + @bokeh/jupyter_bokeh + @voila-dashboards/jupyterlab-preview + @jupyter-widgets/jupyterlab-manager + + Consider checking if an update is available for these packages. + +Other labextensions (built into JupyterLab) + app dir: /opt/modules/my/conda-envs/alphafold-test/share/jupyter/lab + + +The following source extensions are overshadowed by older prebuilt extensions: + @jupyter-widgets/jupyterlab-manager \ No newline at end of file diff --git a/tutorials/AlphaFold2/requirements.txt b/tutorials/AlphaFold2/requirements.txt new file mode 100644 index 0000000..98f6f75 --- /dev/null +++ b/tutorials/AlphaFold2/requirements.txt @@ -0,0 +1,24 @@ +### list of packages installed into new environment + +### conda install +git + +### pip install the following +ipywidgets==7.7.1 +ipyfilechooser +ipykernel +alphafold-colabfold +colabfold @ git+https://github.com/sokrypton/ColabFold@36afbad707ea7401982e24c1cddd05e03c55c001 + +### ensure jax and jaxlib get installed using the following line +### these packages MUST be version 0.4.13 +pip install --upgrade jax==0.4.13 jaxlib==0.4.13+cuda12.cudnn89 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + +### conda install -c conda-forge +openmm==7.7.0 +pdbfixer + +### conda install -c bioconda +kalign2 +hhsuite +
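The requirements notes insist that `jax` and `jaxlib` be exactly 0.4.13, and the CUDA `jaxlib` wheel they point to carries a local version label (`+cuda12.cudnn89`). A quick way to confirm the pins after installing into the kernel is sketched below. It is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the `PINNED` dict and the helper names `base_version` and `find_pin_mismatches` are hypothetical and not part of ColabFold or this repository.

```python
from importlib import metadata

# Hypothetical helper data: pins copied from requirements.txt
# ("these packages MUST be version 0.4.13").
PINNED = {"jax": "0.4.13", "jaxlib": "0.4.13"}


def base_version(version):
    """Drop a PEP 440 local label, e.g. "0.4.13+cuda12.cudnn89" -> "0.4.13"."""
    return version.split("+", 1)[0]


def find_pin_mismatches(pins):
    """Map each unsatisfied pin to its installed version (None if not installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = None  # package is missing entirely
            continue
        # Compare without the local label so CUDA builds still match the pin
        if base_version(installed) != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_pin_mismatches(PINNED)
    for name, found in problems.items():
        print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
    if not problems:
        print("jax and jaxlib match the 0.4.13 pin")
```

Run inside the `alphafold-test` kernel, an empty result means both pins are satisfied. The check looks at version strings only; it does not confirm that JAX can actually see a GPU.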