
# --- Boltz2: Protein Structure Prediction Pipeline ---

![Python](https://img.shields.io/badge/Python-3.10-blue?logo=python)
![CUDA](https://img.shields.io/badge/CUDA-Enabled-green?logo=nvidia)
![Boltz2](https://img.shields.io/badge/Model-Boltz2-purple)
![Platform](https://img.shields.io/badge/Platform-Colab%20|%20Linux-lightgrey?logo=googlecolab)
![License](https://img.shields.io/badge/License-MIT-orange)
![Status](https://img.shields.io/badge/Status-Active-success)
![Build](https://img.shields.io/badge/Build-Stable-brightgreen)
![Contributions](https://img.shields.io/badge/Contributions-Welcome-blue)
<br>

---

## Boltz2: Deep Learning Pipeline for Protein Structure Prediction

Boltz2 is an **open-source deep learning tool** for predicting **3D protein structures** from amino acid sequences.  
It uses **diffusion-based neural networks** to generate protein models and supports both **single-chain proteins (monomers)** and **multi-chain complexes**.

---

### Pipeline Overview
1. **Input**: Provide a protein sequence (and optional ligands).  
2. **YAML Generation**: The sequence is formatted into a YAML config.  
3. **MSA Search**: Boltz2 retrieves multiple sequence alignments (MSAs) from an online MSA server (the `--use_msa_server` option).  
4. **Structure Prediction**: The neural network predicts 3D coordinates using diffusion and recycling steps.  
5. **Output**: Results include 3D models (CIF/PDB), per-residue confidence scores (**pLDDT**), and predicted aligned error (**PAE**) heatmaps.  
6. **Visualization**: The notebook displays the predicted structure and confidence plots.  
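
Steps 2 and 3 above can be sketched as follows. This is a minimal illustration, not the notebook's actual `param_gen.py`: the helper names `build_boltz_yaml` and `predict_command` are hypothetical, and the YAML layout and the `boltz predict ... --use_msa_server` call simply mirror the test input used in the "Download CCD Dataset and Test Boltz2" cell of this notebook.

```python
def build_boltz_yaml(sequence, ligand_ccd=None):
    """Return a Boltz input YAML string for one protein chain (A)
    and, optionally, one ligand (chain B) given by its CCD code.

    Hypothetical helper: the layout copies this notebook's test.yaml.
    """
    # Normalise the sequence: drop whitespace and upper-case it.
    seq = "".join(sequence.split()).upper()
    if not seq.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")

    lines = [
        "version: 1",
        "sequences:",
        "    - protein:",
        "        id: [A]",
        f"        sequence: {seq}",
    ]
    if ligand_ccd:
        lines += [
            "    - ligand:",
            "        id: [B]",
            f"        ccd: {ligand_ccd}",
        ]
    return "\n".join(lines) + "\n"


def predict_command(yaml_path):
    """Build (but do not run) the prediction command used in this notebook."""
    return ["boltz", "predict", str(yaml_path), "--use_msa_server"]


if __name__ == "__main__":
    print(build_boltz_yaml("mvtpe", ligand_ccd="SAH"))
    print(predict_command("test.yaml"))
```

The command is returned as a list so it can be passed directly to `subprocess.run(..., check=True)`, as the code cells in this notebook do.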

---

**Note:** This notebook automates the full Boltz2 workflow, from setup to visualization, with **color-coded status** and **interactive outputs**.  

---

## Credits & Authorship

- **Notebook Developer:** Atharva Tilewale  
- **Affiliation:** Gujarat Biotechnology University | Bioinformatics & Computational Biology  
- **GitHub Repository:** [Boltz-Notebook](https://github.com/AtharvaTilewale/Boltz-Notebook)  
- **Contact:** [LinkedIn](https://www.linkedin.com/in/atharvatilewale) | [GitHub](https://github.com/AtharvaTilewale)  

**Acknowledgements:**  
- **Boltz2 framework:** [Original Boltz repository](https://github.com/jwohlwend/boltz) by J. Wohlwend and collaborators.  
- **Dependencies:** PyTorch, Biopython, NumPy, Matplotlib, py3Dmol, PyYAML.  
- Special thanks to the **open-source community** for providing tools that make structural bioinformatics more accessible.  

---

## References

- Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V. R., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., & Barzilay, R. (2025).  
  **Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.** *bioRxiv.*  
    [![bioRxiv Boltz2](https://img.shields.io/badge/bioRxiv-Boltz2-red)](https://doi.org/10.1101/2025.06.14.659707)

- Wohlwend, J., Corso, G., Passaro, S., Getz, N., Reveiz, M., Leidal, K., Swiderski, W., Atkinson, L., Portnoi, T., Chinn, I., Silterra, J., Jaakkola, T., & Barzilay, R. (2024).  
  **Boltz-1: Democratizing Biomolecular Interaction Modeling.** *bioRxiv.*  
    [![bioRxiv Boltz1](https://img.shields.io/badge/bioRxiv-Boltz1-orange)](https://doi.org/10.1101/2024.11.19.624167)

- Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022).  
  **ColabFold: Making protein folding accessible to all.** *Nature Methods.*  
    [![ColabFold](https://img.shields.io/badge/ColabFold-Reference-yellow)](https://doi.org/10.1038/s41592-022-01488-1)

---

## Cite
If you use this notebook, please **cite the following repository**:

[![GitHub Repo](https://img.shields.io/badge/GitHub-Boltz--Notebook-181717?logo=github)](https://github.com/AtharvaTilewale/Boltz-Notebook)

In [None]:
# @title Install Dependencies and Boltz2 with CUDA support
import sys
import subprocess
import threading
import time
import os
import shutil

os.chdir("/content/")

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

repo_dirs = ["Boltz-Notebook"]

steps = [
    {
        "loader": f"{Color.CYAN}Cloning Notebook Modules...{Color.RESET}",
        "done":   f"[{Color.GREEN}✔{Color.RESET}] Notebook modules cloned successfully.",
        "fail":   f"[{Color.RED}✘{Color.RESET}] Boltz-Notebook clone failed.",
        "cmd": ["git", "clone", "https://github.com/AtharvaTilewale/Boltz-Notebook.git"]
    },
]

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    # Clear the spinner line and flush so the next message starts cleanly
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Step 1: Remove repo if it exists
for repo in repo_dirs:
    if os.path.isdir(repo):
        print(f"{Color.YELLOW}[i] Repository already exists. Removing '{repo}'...{Color.RESET}")
        try:
            shutil.rmtree(repo)
            print(f"[{Color.GREEN}✔{Color.RESET}] Existing repository '{repo}' removed.")
        except Exception as e:
            print(f"[{Color.RED}✘{Color.RESET}] Failed to remove '{repo}': {e}")
            raise

all_success = True

# Main steps
for step in steps:
    stop_event = threading.Event()
    t = threading.Thread(target=loader, args=(step["loader"], stop_event))
    t.start()
    try:
        subprocess.run(step["cmd"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        stop_event.set()
        t.join()
        print(step["done"])
    except Exception as e:
        stop_event.set()
        t.join()
        print(f"{step['fail']} {e}")
        all_success = False
        break

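# Run the setup script only if the repository was cloned successfully
if all_success:
    %run /content/Boltz-Notebook/dist/setup.py
    print(f"{Color.GREEN}All steps completed successfully.{Color.RESET}")
else:
    print(f"[{Color.RED}✘{Color.RESET}] Setup skipped because the clone step failed.")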


In [None]:

# @title Download CCD Dataset and Test Boltz2
import sys
import threading
import time
import os

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Step 1: Create data directory
os.makedirs("/content/boltz_data", exist_ok=True)
os.chdir("/content/boltz_data/")

# Step 2: Write a minimal test input (5-residue peptide plus the SAH ligand)
yaml_content = """\
version: 1
sequences:
    - protein:
        id: [A]
        sequence: MVTPE
    - ligand:
        id: [B]
        ccd: SAH
"""
with open("/content/boltz_data/test.yaml", "w") as f:
    f.write(yaml_content)

# Step 3: Run boltz predict (silent)
step_msg = f"{Color.YELLOW}Downloading CCD Dataset...{Color.RESET}"
stop_event = threading.Event()
t = threading.Thread(target=loader, args=(step_msg, stop_event))
t.start()
try:
    import subprocess
    subprocess.run(
        ["boltz", "predict", "test.yaml", "--use_msa_server"],
        cwd="/content/boltz_data",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True
    )
    stop_event.set()
    t.join()
    print(f"[{Color.GREEN}✔{Color.RESET}] CCD Dataset Downloaded and validated.")
except Exception as e:
    stop_event.set()
    t.join()
    print(f"[{Color.RED}✘{Color.RESET}] CCD Dataset Download or validation failed: {e}")


In [None]:
# @title Generate Parameters
%run /content/boltz_data/dist/param_gen.py

In [None]:
# @title Run Boltz2 Engine
%run /content/boltz_data/dist/Boltz_Run.py

In [None]:
# @title Analyse Results
%run /content/boltz_data/dist/analysis.py

In [None]:
# @title Copy Results to Drive
import shutil, os
from google.colab import drive
from Bio.PDB import MMCIFParser, PDBIO

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    BLUE = "\033[94m"
    MAGENTA = "\033[95m"
    RESET = "\033[0m"

# Mount Google Drive
drive.mount('/content/drive')

# Paths (job_name is not defined in this cell; it must already exist from an earlier cell)
drive_output_dir = f"/content/drive/MyDrive/Boltz2_Results/{job_name}"
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Remove old folder in Drive if exists
if os.path.exists(drive_output_dir):
    print(f"Removing existing folder {drive_output_dir}")
    shutil.rmtree(drive_output_dir)
    print("Old Drive folder removed.")

# Copy local output folder to Drive
shutil.copytree(local_output_path, drive_output_dir)
print(f"{Color.GREEN}All results copied to Google Drive: {drive_output_dir}{Color.RESET}")
# # Copy PDB file separately (optional, just in case)
# drive_pdb_file = os.path.join(drive_output_dir, os.path.basename(pdb_file))
# shutil.copy(pdb_file, drive_pdb_file)

In [None]:
# @title Download Results (.zip)
from google.colab import files
from Bio.PDB import MMCIFParser, PDBIO
import shutil
import os

# Local output folder you want to download
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# # Parse CIF and save as PDB
# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Path for the zip file
zip_file = f"/content/{job_name}.zip"

# Remove previous zip if exists
if os.path.exists(zip_file):
    os.remove(zip_file)

# Create zip of the entire folder
shutil.make_archive(base_name=f"/content/{job_name}", format='zip', root_dir=local_output_path)

# Download the zip file
files.download(zip_file)

# Success message
print(f"{Color.GREEN}Download successful! All results from '{job_name}' are saved in '{zip_file}'{Color.RESET}")
