# ColabFold v1.5.5: AlphaFold2 w/ MMseqs2 BATCH

<img src="https://raw.githubusercontent.com/sokrypton/ColabFold/main/.github/ColabFold_Marv_Logo_Small.png" height="256" align="right" style="height:256px">

A modified, easy-to-use notebook for AlphaFold2 protein structure [(Jumper et al. 2021)](https://www.nature.com/articles/s41586-021-03819-2) and complex [(Evans et al. 2021)](https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1) prediction using multiple sequence alignments generated through MMseqs2. For details, refer to our manuscript:

[Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: Making protein folding accessible to all.
*Nature Methods*, 2022](https://www.nature.com/articles/s41592-022-01488-1)

**Usage**

`input_dir`: directory in Google Drive containing only FASTA files or MSAs. MSAs must be in A3M format and have an `.a3m` extension. MMseqs2 is not called for MSA inputs.

`result_dir`: results are written to this directory in Google Drive.
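
As a rough illustration of the input rules above, the sketch below sorts the files in a directory into sequence inputs, precomputed MSAs, and anything else. The helper name `classify_inputs` and the set of FASTA extensions are assumptions made for this example; ColabFold itself may accept additional formats.

```python
from pathlib import Path

# Assumed extensions for this sketch; not an exhaustive list of what ColabFold accepts.
FASTA_EXTS = {".fasta", ".fa", ".faa"}
MSA_EXTS = {".a3m"}

def classify_inputs(input_dir):
    """Split the files in input_dir into FASTA inputs, A3M inputs, and everything else."""
    fasta, msa, other = [], [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # sub-directories are ignored in this sketch
        suffix = path.suffix.lower()
        if suffix in FASTA_EXTS:
            fasta.append(path.name)
        elif suffix in MSA_EXTS:
            msa.append(path.name)
        else:
            other.append(path.name)
    return fasta, msa, other
```

Files that end up in the third list are worth removing before a run, since `input_dir` is expected to contain only FASTA files or MSAs.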

Old versions: [v1.4](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.4.0/batch/AlphaFold2_batch.ipynb), [v1.5.1](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.1/batch/AlphaFold2_batch.ipynb), [v1.5.2](https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.2/batch/AlphaFold2_batch.ipynb), [v1.5.3-patch](https://colab.research.google.com/github/sokrypton/ColabFold/blob/56c72044c7d51a311ca99b953a71e552fdc042e1/batch/AlphaFold2_batch.ipynb)

<strong>For more details, see the <a href="#Instructions">bottom</a> of the notebook and check out the [ColabFold GitHub](https://github.com/sokrypton/ColabFold).</strong>

This notebook was adapted to automatically run and visualize a binding specificity prediction experiment.

-----------

### News
- <b><font color='green'>2023/07/31: The ColabFold MSA server is back to normal. It was using older DB (UniRef30 2202/PDB70 220313) from 27th ~8:30 AM CEST to 31st ~11:10 AM CEST.</font></b>
- <b><font color='green'>2023/06/12: New databases! UniRef30 updated to 2023_02 and PDB to 230517. We now use PDB100 instead of PDB70 (see notes in the [main](https://colabfold.com) notebook).</font></b>
- <b><font color='green'>2023/06/12: We introduced a new default pairing strategy: Previously, for multimer predictions with more than 2 chains, we only pair if all sequences taxonomically match ("complete" pairing). The new default "greedy" strategy pairs any taxonomically matching subsets.</font></b>

In [1]:
import os
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime
import pickle
import shutil
from typing import Dict, List, Tuple
import warnings
import tqdm
warnings.filterwarnings('ignore')


# Define nanobody sequences with metadata
nanobody_data = {
    'nbGFP_6xzf': {
        'sequence': 'QVQLVESGGALVQPGGSLRLSCAASGFPVNRYSMRWYRQADTNNDGWIEGDELKEREWVAGMSSAGDRSSYEDSVKGRFTISRDDARNTVYLQMNSLKPEDTAVYYCNVNVGFEYWGQGTQVTVSS',
        'known_target': 'GFP',
        'pdb_id': '6xzf'
    },
    'nbmCherry_8ilx': {
        'sequence': 'QVQLVQSGGGLVQAGGSLRLSCAASGRTFSDIAVGWFRQTPGKEREFVAAISWSGLIINYGDSVEDRFTISRDNAKSAVYLQMNSLKPEDTAVYYCAARIGMNYYYAREIEYPYWGQGTQVTVSK',
        'known_target': 'mCherry',
        'pdb_id': '8ilx'
    },
    'nbSARS_7f5h': {
        'sequence': 'QVQLQESGGGLVQAGGSLRLSCAASGSDFSSSTMGWYRQAPGKQREFVAISSEGSTSYAGSVKGRFTISRDNAKNTVYLQMNSLEPEDTAVYYCNVVDRWYDYWGQGTQVTVSA',
        'known_target': 'SARS-Cov2-rbc',
        'pdb_id': '7f5h'
    },
    'nblys_1mel': {
        'sequence': 'DVQLQASGGGSVQAGGSLRLSCAASGYTIGPYCMGWFRQAPGKEREGVAAINMGGGITYYADSVKGRFTISQDNAKNTVYLLMNSLEPEDTAIYYCAADSTIYASYYECGHGLSTGGYGYDSWGQGTQVTVSS',
        'known_target': 'Lysozyme',
        'pdb_id': '1mel'
    },
    'nbALB_8y9t': {
        'sequence': 'EVQLQESGGGLVQPGGSLRLSCAASGFTFSRYWMFWVRQAPGKGLEWISDINSGGTYTRYADSVKGRFTISRDNAKNTLYLQMNSLRAEDTAVYYCATNSGDGKRYCSGGYCYRSRGQGTLVTVSS',
        'known_target': 'Albumin',
        'pdb_id': '8y9t'
    },
    'nbNAT_8zoy': {
        'sequence': 'EVQLVESGGGLVQAGGSLRLSCAASGFPVTNFEMYWYRQAPGKEREWVAAIYSTGITEYADSVKGRFTISRDNSKNTVYLQMNSLKPEDTAVYYCNVKDNGAWRQNYDYWGQGTQVTVSS',
        'known_target': 'NAT',
        'pdb_id': '8zoy'
    },
}

# Define antigen sequences with metadata
antigen_data = {
    'GFP': {
        'sequence': 'VSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK',
        'uniprot': 'P42212'
    },
    'mCherry': {
        'sequence': 'GMVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQLPGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYK',
        'uniprot': '-'
    },
    'SARS-Cov2-rbc': {
        'sequence': 'AGSPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTGTLEVLFQ',
        'uniprot': '-'
    },
    'Lysozyme': {
        'sequence': 'KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL',
        'uniprot': '-'
    },
    'Albumin': {
        'sequence': 'RGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEVDVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLPKLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTKVHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPADLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKCCAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVSTPTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTESLVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL',
        'uniprot': '-'
    },
    'NAT': {
        'sequence': 'APRDGDAQPRETWGKKIDFLLSVVGFAVDLANVWRFPYLCYKNGGGAFLIPYTLFLIIAGMPLFYMELALGQYNREGAATVWKICPFFKGVGYAVILIALYVGFYYNVIIAWSLYYLFSSFTLNLPWTDCGHTWNSPNCTDPKLLNGSVLGNHTKYSKYKFTPAAEFYERGVLHLHESSGIHDIGLPQWQLLLCLMVVVIVLYFSLWKGVKTSGKVVWITATLPYFVLFVLLVHGVTLPGASNGINAYLHIDFYRLKEATVWIDAATQIFFSLGAGFGVLIAFASYNKFDNNCYRDALLTSSINCITSFVSGFAIFSILGYMAHEHKVNIEDVATEGAGLVFILYPEAISTLSGSTFWAVVFFVMLLALGLDSSMGGMEAVITGLADDFQVLKRHRKLFTFGVTFSTFLLALFCITKGGIYVLTLLDTFAAGTSILFAVLMEAIGVSWFYGVDRFSNDIQQMMGFRPGLYWRLCWKFVSPAFLLFVVVVSIINFKPLTYDDYIFPPWANWVGWGIALSSMVLVPIYVIYKFLSTQGSLWERLAYGITPENEHHLVAQRDIRQFQLQHWLAI',
        'uniprot': '-'
    }
}
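
Before pairing these sequences, it can help to confirm that each one contains only the 20 standard amino-acid letters, since a stray space or placeholder character would otherwise end up in the FASTA files. A minimal sketch, using a hypothetical helper `find_invalid_residues` and toy entries rather than the sequences above:

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

def find_invalid_residues(entries):
    """Return {name: sorted invalid characters} for entries with non-standard letters."""
    problems = {}
    for name, info in entries.items():
        bad = sorted(set(info["sequence"].upper()) - VALID_AA)
        if bad:
            problems[name] = bad
    return problems

# Toy entries for illustration only (not the notebook's sequences).
example = {
    "ok_seq": {"sequence": "QVQLVESGGG"},
    "bad_seq": {"sequence": "QVQLXES GG"},
}
print(find_invalid_residues(example))
```

In this notebook the same call could be applied to `nanobody_data` and `antigen_data`, which share the `{'sequence': ...}` layout assumed here.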


In [2]:
#@title Input protein sequence, then hit `Runtime` -> `Run all`

input_dir = '/content/input_fasta' #@param {type:"string"}
result_dir = '/content/result' #@param {type:"string"}

# number of models to use
#@markdown ---
#@markdown ### Advanced settings
msa_mode = "MMseqs2 (UniRef+Environmental)" #@param ["MMseqs2 (UniRef+Environmental)", "MMseqs2 (UniRef only)","single_sequence","custom"]
num_models = 5 #@param [1,2,3,4,5] {type:"raw"}
num_recycles = 48 #@param [1,3,6,12,24,48] {type:"raw"}
stop_at_score = 100 #@param {type:"string"}
#@markdown - early stop computing models once score > threshold (avg. plddt for "structures" and ptmscore for "complexes")
use_custom_msa = False
num_relax = 5 #@param [0, 1, 5] {type:"raw"}
use_amber = num_relax > 0
relax_max_iterations = 200 #@param [0,200,2000] {type:"raw"}
use_templates = False #@param {type:"boolean"}
do_not_overwrite_results = True #@param {type:"boolean"}
zip_results = False #@param {type:"boolean"}


In [3]:
#@title Input protein sequence(s), then hit `Runtime` -> `Run all`
from google.colab import files
import os
import re
import hashlib
import random

from sys import version_info
python_version = f"{version_info.major}.{version_info.minor}"

def add_hash(x,y):
  return x+"_"+hashlib.sha1(y.encode()).hexdigest()[:5]

query_sequence = 'PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK'
jobname = 'initialtest'
# number of top-ranked structures to relax with Amber (0 disables relaxation)
num_relax = 0
template_mode = "none"
use_amber = num_relax > 0

# remove whitespaces
query_sequence = "".join(query_sequence.split())

basejobname = "".join(jobname.split())
basejobname = re.sub(r'\W+', '', basejobname)
jobname = add_hash(basejobname, query_sequence)

# check if directory with jobname exists
def check(folder):
  # True if the folder does not exist yet, i.e. the name is free to use
  return not os.path.exists(folder)
if not check(jobname):
  n = 0
  while not check(f"{jobname}_{n}"): n += 1
  jobname = f"{jobname}_{n}"

# make directory to save results
os.makedirs(jobname, exist_ok=True)

# save queries
queries_path = os.path.join(jobname, f"{jobname}.csv")
with open(queries_path, "w") as text_file:
  text_file.write(f"id,sequence\n{jobname},{query_sequence}")

if template_mode == "pdb100":
  use_templates = True
  custom_template_path = None
elif template_mode == "custom":
  custom_template_path = os.path.join(jobname, "template")
  os.makedirs(custom_template_path, exist_ok=True)
  uploaded = files.upload()
  use_templates = True
  for fn in uploaded.keys():
    os.rename(fn,os.path.join(custom_template_path,fn))
else:
  custom_template_path = None
  use_templates = False

print("jobname",jobname)
print("sequence",query_sequence)
print("length",len(query_sequence.replace(":","")))

jobname initialtest_a5e17
sequence PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
length 59
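
The job name printed above is the sanitized base name plus the first five hex characters of the SHA-1 digest of the query sequence. The sketch below restates that step in isolation; `make_jobname` is a name introduced here for illustration, while the logic mirrors `add_hash` and the `re.sub` call in the cell.

```python
import hashlib
import re

def make_jobname(raw_name, sequence):
    """Strip whitespace and non-word characters, then append a 5-character SHA-1 suffix."""
    base = re.sub(r"\W+", "", "".join(raw_name.split()))
    seq = "".join(sequence.split())
    suffix = hashlib.sha1(seq.encode()).hexdigest()[:5]
    return f"{base}_{suffix}"

name = make_jobname("initial test", "PIAQIHILEG RSDEQ")
print(name)
```

Because the suffix depends only on the sequence, rerunning with the same name and sequence gives the same job name, which is why the cell appends `_0`, `_1`, and so on when the folder already exists.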


In [4]:
#@title Install dependencies
%%time
import os
USE_AMBER = use_amber
USE_TEMPLATES = use_templates
PYTHON_VERSION = python_version

if not os.path.isfile("COLABFOLD_READY"):
  print("installing colabfold...")
  os.system("pip install -q --no-warn-conflicts 'colabfold[alphafold-minus-jax] @ git+https://github.com/sokrypton/ColabFold'")
  if os.environ.get('TPU_NAME', False) != False:
    os.system("pip uninstall -y jax jaxlib")
    os.system("pip install --no-warn-conflicts --upgrade dm-haiku==0.0.10 'jax[cuda12_pip]'==0.3.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html")
  os.system("ln -s /usr/local/lib/python3.*/dist-packages/colabfold colabfold")
  os.system("ln -s /usr/local/lib/python3.*/dist-packages/alphafold alphafold")
  os.system("touch COLABFOLD_READY")

if USE_AMBER or USE_TEMPLATES:
  if not os.path.isfile("CONDA_READY"):
    print("installing conda...")
    os.system("wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh")
    os.system("bash Miniforge3-Linux-x86_64.sh -bfp /usr/local")
    os.system("mamba config --set auto_update_conda false")
    os.system("touch CONDA_READY")

if USE_TEMPLATES and not os.path.isfile("HH_READY") and USE_AMBER and not os.path.isfile("AMBER_READY"):
  print("installing hhsuite and amber...")
  os.system(f"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 openmm=8.2.0 python='{PYTHON_VERSION}' pdbfixer")
  os.system("touch HH_READY")
  os.system("touch AMBER_READY")
else:
  if USE_TEMPLATES and not os.path.isfile("HH_READY"):
    print("installing hhsuite...")
    os.system(f"mamba install -y -c conda-forge -c bioconda kalign2=2.04 hhsuite=3.3.0 python='{PYTHON_VERSION}'")
    os.system("touch HH_READY")
  if USE_AMBER and not os.path.isfile("AMBER_READY"):
    print("installing amber...")
    os.system(f"mamba install -y -c conda-forge openmm=8.2.0 python='{PYTHON_VERSION}' pdbfixer")
    os.system("touch AMBER_READY")

installing colabfold...
CPU times: user 497 ms, sys: 59.1 ms, total: 556 ms
Wall time: 34 s
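
The install cell relies on empty marker files (`COLABFOLD_READY`, `CONDA_READY`, and so on) so that rerunning it skips steps that were already performed. A minimal sketch of that pattern, with a hypothetical `run_once` helper and a stand-in action instead of a real install:

```python
from pathlib import Path
import tempfile

def run_once(marker, action):
    """Run action() only if the marker file is absent, then create the marker."""
    marker = Path(marker)
    if marker.exists():
        return False  # step was already done in an earlier run
    action()
    marker.touch()
    return True

calls = []
workdir = Path(tempfile.mkdtemp())
first = run_once(workdir / "DEMO_READY", lambda: calls.append("install"))
second = run_once(workdir / "DEMO_READY", lambda: calls.append("install"))
print(first, second, calls)
```

One difference from the cell above: there the return codes of `os.system` are not checked, so a marker is written even when the install command failed, whereas this sketch only writes the marker if `action()` returns without raising.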


In [5]:
!pip install -q condacolab
import condacolab
condacolab.install()
!conda install -c conda-forge pdbfixer -y
!pip show pdbfixer

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:06
🔁 Restarting kernel...
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | done
Solving environment: - \ | done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - pdbfixer


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2025.4.26  |       hbd8a1cb_0         149 KB  conda-forge
    certifi-2025.4.26          |     pyhd8ed1ab_0         154 KB  conda-forge
    conda-24.11.3              |  py311h38be061_0         1.1 MB  conda-forge
    cudatoolkit-11.8.0         |      h4ba93d1_13

In [6]:
from colabfold.download import download_alphafold_params
download_alphafold_params(model_type="alphafold2_multimer_v3")

Downloading alphafold2_multimer_v3 weights to /root/.cache/colabfold: 100%|██████████| 3.82G/3.82G [02:58<00:00, 23.0MB/s]


In [None]:
#@title Configuration Parameters
#@markdown ### Screening Parameters
num_variations = 1 #@param {type:"integer"}
#@markdown Note: Using single run since model order doesn't provide real variation

#@markdown ### ColabFold Parameters
input_dir = '/content/input_fasta' #@param {type:"string"}
result_dir = '/content/results' #@param {type:"string"}

msa_mode = "MMseqs2 (UniRef+Environmental)" #@param ["MMseqs2 (UniRef+Environmental)", "MMseqs2 (UniRef only)","single_sequence","custom"]
num_models = 5 #@param [1,2,3,4,5] {type:"raw"}
num_recycles = 48 #@param [1,3,6,12,24,48] {type:"raw"}
stop_at_score = 80 #@param {type:"number"}
use_templates = False #@param {type:"boolean"}
num_relax = 5 #@param [0, 1, 5] {type:"raw"}
use_amber = num_relax > 0

#@markdown ### Analysis Parameters
confidence_metric = "combined" #@param ["ipTM", "pTM", "combined"]
binding_threshold = 0.7 #@param {type:"number"}

#@title Helper Functions

import os
import re
import json
import numpy as np
import io
import contextlib
from pathlib import Path

def extract_metrics_from_stdout(log_output):
    """Extract ipTM, pTM, pLDDT from ColabFold's printed output"""
    match = re.search(r"rank_001_.*?pLDDT=([\d.]+)\s+pTM=([\d.]+)\s+ipTM=([\d.]+)", log_output)
    if match:
        plddt, ptm, iptm = map(float, match.groups())
        return {
            "mean_plddt": plddt,
            "max_ptm": ptm,
            "max_iptm": iptm
        }
    print("No score line found in stdout.")
    return {}

def get_binding_score(metrics, method='ipTM'):
    """Calculate binding score"""
    if method == 'ipTM':
        return metrics.get('max_iptm', 0)
    elif method == 'pTM':
        return metrics.get('max_ptm', 0)
    elif method == 'combined':
        return (0.6 * metrics.get('max_iptm', 0) +
                0.3 * metrics.get('max_ptm', 0) +
                0.1 * metrics.get('mean_plddt', 0) / 100)
    return 0

#@title Generate FASTA Files

os.makedirs(input_dir, exist_ok=True)
os.makedirs(result_dir, exist_ok=True)

# Create all combinations
jobs = []
for nb_name, nb_info in nanobody_data.items():
    for ag_name, ag_info in antigen_data.items():
        job_id = f"{nb_name}_{ag_name}"
        fasta_path = Path(input_dir) / f"{job_id}.fasta"

        # Write FASTA
        with open(fasta_path, 'w') as f:
            f.write(f">{job_id}\n")
            f.write(f"{nb_info['sequence']}:{ag_info['sequence']}\n")

        jobs.append({
            'id': job_id,
            'nanobody': nb_name,
            'antigen': ag_name,
            'fasta': str(fasta_path)
        })

print(f"Created {len(jobs)} FASTA files")

#@title Run ColabFold Batch

from colabfold.batch import get_queries, run
from colabfold.download import default_data_dir
from colabfold.utils import setup_logging

setup_logging(Path(result_dir) / "colabfold_global_log.txt")

# Run jobs
results = []

for job in jobs:
    print(f"\nProcessing {job['id']}...")
    job_dir = Path(result_dir) / job['id']
    job_dir.mkdir(exist_ok=True)

    try:
        queries, is_complex = get_queries(job['fasta'])

        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            run(
                queries=queries,
                result_dir=str(job_dir),
                use_templates=use_templates,
                num_relax=num_relax,
                msa_mode=msa_mode,
                model_type="alphafold2_multimer_v3",
                num_models=num_models,
                num_recycles=num_recycles,
                is_complex=is_complex,
                data_dir=default_data_dir,
                rank_by="auto",
                stop_at_score=stop_at_score,
                zip_results=False
            )
        log_output = buffer.getvalue()

        # Parse from stdout
        metrics = extract_metrics_from_stdout(log_output)
        print("Parsed metrics:", metrics)

        score = get_binding_score(metrics, confidence_metric)
        print("Binding score:", score)

        results.append({
            'nanobody': job['nanobody'],
            'antigen': job['antigen'],
            'score': score,
            **metrics
        })

    except Exception as e:
        print(f"Error running job {job['id']}: {e}")
        results.append({
            'nanobody': job['nanobody'],
            'antigen': job['antigen'],
            'score': 0
        })


Created 36 FASTA files

Processing nbGFP_6xzf_GFP...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 93.1, 'max_ptm': 0.892, 'max_iptm': 0.872}
Binding score: 0.8838999999999999

Processing nbGFP_6xzf_mCherry...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 84.6, 'max_ptm': 0.636, 'max_iptm': 0.208}
Binding score: 0.4002

Processing nbGFP_6xzf_SARS-Cov2-rbc...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 79.5, 'max_ptm': 0.602, 'max_iptm': 0.182}
Binding score: 0.36929999999999996

Processing nbGFP_6xzf_Lysozyme...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 85.4, 'max_ptm': 0.54, 'max_iptm': 0.202}
Binding score: 0.36860000000000004

Processing nbGFP_6xzf_Albumin...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 86.0, 'max_ptm': 0.761, 'max_iptm': 0.167}
Binding score: 0.4145

Processing nbGFP_6xzf_NAT...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:03 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 83.0, 'max_ptm': 0.769, 'max_iptm': 0.218}
Binding score: 0.4445

Processing nbmCherry_8ilx_GFP...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 93.4, 'max_ptm': 0.883, 'max_iptm': 0.812}
Binding score: 0.8455

Processing nbmCherry_8ilx_mCherry...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 90.9, 'max_ptm': 0.81, 'max_iptm': 0.663}
Binding score: 0.7317

Processing nbmCherry_8ilx_SARS-Cov2-rbc...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 83.6, 'max_ptm': 0.625, 'max_iptm': 0.21}
Binding score: 0.3971

Processing nbmCherry_8ilx_Lysozyme...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 89.0, 'max_ptm': 0.584, 'max_iptm': 0.231}
Binding score: 0.4028

Processing nbmCherry_8ilx_Albumin...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 87.6, 'max_ptm': 0.769, 'max_iptm': 0.161}
Binding score: 0.4149

Processing nbmCherry_8ilx_NAT...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:03 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 83.3, 'max_ptm': 0.774, 'max_iptm': 0.247}
Binding score: 0.46369999999999995

Processing nbSARS_7f5h_GFP...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
SUBMIT:   0%|          | 0/300 [elapsed: 00:00 remaining: ?]

Parsed metrics: {'mean_plddt': 86.1, 'max_ptm': 0.653, 'max_iptm': 0.16}
Binding score: 0.378

Processing nbSARS_7f5h_mCherry...


COMPLETE: 100%|██████████| 300/300 [elapsed: 00:02 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
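
As a check on the scores logged above, the "combined" metric is 0.6 * ipTM + 0.3 * pTM + 0.1 * pLDDT / 100. The sketch below parses a score line with the same regular expression as `extract_metrics_from_stdout` and recomputes the value for the first pair. The sample log line is reconstructed for illustration from the parsed metrics; its exact wording in ColabFold's output is an assumption.

```python
import re

SCORE_RE = re.compile(r"rank_001_.*?pLDDT=([\d.]+)\s+pTM=([\d.]+)\s+ipTM=([\d.]+)")

def combined_score(plddt, ptm, iptm):
    """Weighted score used when confidence_metric == 'combined' (pLDDT rescaled to 0-1)."""
    return 0.6 * iptm + 0.3 * ptm + 0.1 * plddt / 100

# Hypothetical log line carrying the metrics reported for nbGFP_6xzf_GFP.
line = "rank_001_example_model pLDDT=93.1 pTM=0.892 ipTM=0.872"
plddt, ptm, iptm = map(float, SCORE_RE.search(line).groups())
score = combined_score(plddt, ptm, iptm)
print(round(score, 4))  # 0.8839
```

Because pTM and pLDDT contribute even when ipTM is low, the non-cognate pairs logged above still score roughly 0.37 to 0.46, so the combined value is best read relative to that baseline rather than against zero.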


In [None]:
#@title Create Visualization
# Preliminary plot

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("results", results)
# Create matrix
df = pd.DataFrame(results)
nanobody_order = list(nanobody_data.keys())
antigen_order = list(antigen_data.keys())
matrix = df.pivot(index='nanobody', columns='antigen', values='score').reindex(index=nanobody_order, columns=antigen_order)


# Plot
plt.figure(figsize=(10, 8))
ax = sns.heatmap(
    matrix,
    annot=True,
    fmt='.3f',
    cmap=sns.color_palette("viridis", as_cmap=True),
    vmin=0,
    vmax=1,
    square=True,
    linewidths=0.5,
    cbar_kws={'label': f'Binding Score ({confidence_metric})'}
)
# Highlight pairs scoring above binding_threshold (set in the configuration cell)
for i, nb in enumerate(matrix.index):
    for j, ag in enumerate(matrix.columns):
        if matrix.loc[nb, ag] > binding_threshold:
            ax.add_patch(plt.Rectangle((j, i), 1, 1, fill=False, edgecolor='red', lw=3))

plt.title('Nanobody-Antigen Binding Matrix', fontsize=14, pad=15)
plt.tight_layout()
plt.savefig(Path(result_dir) / 'binding_matrix.png', dpi=300)
plt.show()

# Summary statistics
print("\nTop Binding Pairs:")
df_sorted = df.sort_values('score', ascending=False)
for _, row in df_sorted.head(5).iterrows():
    print(f"  {row['nanobody']} + {row['antigen']}: {row['score']:.3f}")

# Save results
df.to_csv(Path(result_dir) / 'results.csv', index=False)
matrix.to_csv(Path(result_dir) / 'matrix.csv')

print(f"\nResults saved to {result_dir}")
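
Since each nanobody has a `known_target`, one way to summarize specificity is to ask whether the known target is the top-scoring antigen in its row of the matrix. The sketch below does this for the two nanobodies whose six scores are fully logged in the run output (values rounded to four decimals); `top_hit_matches` is a helper name introduced for this example.

```python
# Combined scores copied from the run log (two complete rows only).
scores = {
    "nbGFP_6xzf": {"GFP": 0.8839, "mCherry": 0.4002, "SARS-Cov2-rbc": 0.3693,
                   "Lysozyme": 0.3686, "Albumin": 0.4145, "NAT": 0.4445},
    "nbmCherry_8ilx": {"GFP": 0.8455, "mCherry": 0.7317, "SARS-Cov2-rbc": 0.3971,
                       "Lysozyme": 0.4028, "Albumin": 0.4149, "NAT": 0.4637},
}
known_target = {"nbGFP_6xzf": "GFP", "nbmCherry_8ilx": "mCherry"}

def top_hit_matches(row, target):
    """Return (best_antigen, flag) where flag says whether the best hit is the known target."""
    best = max(row, key=row.get)
    return best, best == target

for nb, row in scores.items():
    best, ok = top_hit_matches(row, known_target[nb])
    print(f"{nb}: top hit {best}, matches known target: {ok}")
```

In these two rows the mCherry nanobody scores higher against GFP (0.846) than against mCherry (0.732), which suggests that a single score threshold is not by itself evidence of specificity.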

# Instructions <a name="Instructions"></a>
**Quick start**
1. Upload your FASTA files (one query per file) to a folder in your Google Drive.
2. Define the path to the folder containing the FASTA files (`input_dir`) and an output directory (`result_dir`).
3. Press "Runtime" -> "Run all".

**Result zip file contents**

At the end of the job, all results are packed into `jobname.result.zip` and uploaded to your `result_dir` in Google Drive (when `zip_results` is enabled). Each zip contains one protein.

1. PDB formatted structures sorted by avg. pLDDT (unrelaxed, and relaxed if `use_amber` is enabled).
2. Plots of the model quality.
3. Plots of the MSA coverage.
4. Parameter log file.
5. A3M formatted input MSA.
6. BibTeX file with citations for all used tools and databases.


**Troubleshooting**
* Check that the runtime type is set to GPU at "Runtime" -> "Change runtime type".
* Try to restart the session "Runtime" -> "Factory reset runtime".
* Check your input sequence.

**Known issues**
* Google Colab assigns different types of GPUs with varying amount of memory. Some might not have enough memory to predict the structure for a long sequence.
* Your browser can block the pop-up for downloading the result file. You can choose the `save_to_google_drive` option to upload to Google Drive instead or manually download the result file: Click on the little folder icon to the left, navigate to file: `jobname.result.zip`, right-click and select "Download" (see [screenshot](https://pbs.twimg.com/media/E6wRW2lWUAEOuoe?format=jpg&name=small)).

**Limitations**
* Computing resources: Our MMseqs2 API can handle ~20-50k requests per day.
* MSAs: MMseqs2 is very precise and sensitive but might find fewer hits compared to HHblits/HMMer searched against BFD or MGnify.
* We recommend additionally using the full [AlphaFold2 pipeline](https://github.com/deepmind/alphafold).

**Description of the plots**
*   **Number of sequences per position** - We want to see at least 30 sequences per position, for best performance, ideally 100 sequences.
*   **Predicted lDDT per position** - model confidence (out of 100) at each position. The higher the better.
*   **Predicted Alignment Error** - For homooligomers, this could be a useful metric to assess how confident the model is about the interface. The lower the better.
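
For a two-chain complex, the interface-relevant part of the Predicted Alignment Error matrix is the pair of off-diagonal blocks (residues of one chain against residues of the other). A minimal sketch, assuming the PAE is available as a square list of lists ordered chain A then chain B; the function name and the toy matrix are made up for illustration:

```python
def mean_interchain_pae(pae, len_a):
    """Average PAE over the two off-diagonal blocks of a two-chain complex.

    pae   -- square matrix (list of lists), residues ordered chain A then chain B
    len_a -- number of residues in chain A
    """
    n = len(pae)
    values = []
    for i in range(n):
        for j in range(n):
            # keep only pairs whose residues lie in different chains
            if (i < len_a) != (j < len_a):
                values.append(pae[i][j])
    return sum(values) / len(values)

# Toy 4x4 matrix: chain A = residues 0-1, chain B = residues 2-3.
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 18.0, 24.0],
    [21.0, 19.0, 0.5, 1.0],
    [23.0, 17.0, 1.0, 0.5],
]
print(mean_interchain_pae(toy_pae, len_a=2))
```

Lower values suggest the model is more confident about how the two chains are placed relative to each other.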

**Bugs**
- If you encounter any bugs, please report the issue to https://github.com/sokrypton/ColabFold/issues

**License**

The source code of ColabFold is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE). Additionally, this notebook uses AlphaFold2 source code and its parameters licensed under [Apache 2.0](https://raw.githubusercontent.com/deepmind/alphafold/main/LICENSE) and [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) respectively. Read more about the AlphaFold license [here](https://github.com/deepmind/alphafold).

**Acknowledgments**
- We thank the AlphaFold team for developing an excellent model and open sourcing the software.

- Do-Yoon Kim for creating the ColabFold logo.

- A colab by Sergey Ovchinnikov ([@sokrypton](https://twitter.com/sokrypton)), Milot Mirdita ([@milot_mirdita](https://twitter.com/milot_mirdita)) and Martin Steinegger ([@thesteinegger](https://twitter.com/thesteinegger)).

- Slightly modified to accommodate a binding specificity evaluation experiment.