
# --- Boltz2: Protein Structure Prediction Pipeline ---

![Python](https://img.shields.io/badge/Python-3.10-blue?logo=python)
![CUDA](https://img.shields.io/badge/CUDA-Enabled-green?logo=nvidia)
![Boltz2](https://img.shields.io/badge/Model-Boltz2-purple)
![Platform](https://img.shields.io/badge/Platform-Colab%20|%20Linux-lightgrey?logo=googlecolab)
![License](https://img.shields.io/badge/License-MIT-orange)
![Status](https://img.shields.io/badge/Status-Active-success)
![Build](https://img.shields.io/badge/Build-Stable-brightgreen)
![Contributions](https://img.shields.io/badge/Contributions-Welcome-blue)
<br>

---

## Boltz2: Deep Learning Pipeline for Protein Structure Prediction

Boltz2 is **open-source, deep-learning-based software** that predicts **3D protein structures** from amino acid sequences.  
It uses **neural networks** and **diffusion models** to generate structural models of both **monomers** and **multi-chain complexes**.

---

### Pipeline Overview
1. **Input**: Provide a protein sequence (and optional ligands).  
2. **YAML Generation**: The sequence is formatted into a YAML config.  
3. **MSA Search**: Boltz2 retrieves a multiple sequence alignment (MSA) from an online MSA server (`--use_msa_server`).  
4. **Structure Prediction**: The neural network predicts 3D coordinates using diffusion and recycling steps.  
5. **Output**: Results include a 3D model (CIF), per-residue confidence scores (**pLDDT**), and a predicted aligned error matrix (**PAE**).  
6. **Visualization**: The notebook displays the predicted structure and confidence plots.  
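
Step 2 of the overview can be sketched in a few lines. This is only an illustration: `build_boltz_yaml` is a hypothetical helper, not part of Boltz2 or of this notebook, and it assumes a single protein chain with at most one CCD ligand. The field names (`version`, `sequences`, `protein`, `ligand`, `id`, `sequence`, `ccd`) are the ones used in the test YAML later in the notebook.

```python
def build_boltz_yaml(sequence, chain_id="A", ligand_ccd=None, ligand_id="B"):
    """Return a Boltz-style YAML config as text (hypothetical helper)."""
    # Strip whitespace and uppercase the sequence, as the notebook's form does.
    seq = "".join(sequence.split()).upper()
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: [{chain_id}]",
        f"      sequence: {seq}",
    ]
    if ligand_ccd:
        lines += [
            "  - ligand:",
            f"      id: [{ligand_id}]",
            f"      ccd: {ligand_ccd.upper()}",
        ]
    return "\n".join(lines) + "\n"


config_text = build_boltz_yaml("mvtpegnv slvdesll", ligand_ccd="sah")
print(config_text)
```

In the notebook itself the config is produced by the interactive form and written with PyYAML; the string-based version here only shows the shape of the file.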

---

**Note:** This notebook automates the full Boltz2 workflow, from setup to visualization, with **color-coded status** and **interactive outputs**.  

---

## Credits & Authorship

- **Notebook Developer:** Atharva Tilewale  
- **Affiliation:** Gujarat Biotechnology University | Bioinformatics & Computational Biology  
- **GitHub Repository:** [Boltz-Notebook](https://github.com/AtharvaTilewale/Boltz-Notebook)  
- **Contact:** [LinkedIn](https://www.linkedin.com/in/atharvatilewale) | [GitHub](https://github.com/AtharvaTilewale)  

**Acknowledgements:**  
- **Boltz2 framework**: [Original Boltz repository](https://github.com/jwohlwend/boltz) by J. Wohlwend and collaborators.  
- **Dependencies:** PyTorch, Biopython, NumPy, Matplotlib, Py3Dmol, PyYAML.  
- Special thanks to the **open-source community** for providing tools that make structural bioinformatics more accessible.  

---

## References

- Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Somnath, V. R., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., & Barzilay, R. (2025).  
  **Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction.** *bioRxiv.*  
    [![bioRxiv Boltz2](https://img.shields.io/badge/bioRxiv-Boltz2-red)](https://doi.org/10.1101/2025.06.14.659707)

- Wohlwend, J., Corso, G., Passaro, S., Getz, N., Reveiz, M., Leidal, K., Swiderski, W., Atkinson, L., Portnoi, T., Chinn, I., Silterra, J., Jaakkola, T., & Barzilay, R. (2024).  
  **Boltz-1: Democratizing Biomolecular Interaction Modeling.** *bioRxiv.*  
    [![bioRxiv Boltz1](https://img.shields.io/badge/bioRxiv-Boltz1-orange)](https://doi.org/10.1101/2024.11.19.624167)

- Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022).  
  **ColabFold: Making protein folding accessible to all.** *Nature Methods.*  
    [![ColabFold](https://img.shields.io/badge/ColabFold-Reference-yellow)](https://doi.org/10.1038/s41592-022-01488-1)

---

## Cite
If you use this notebook, please **cite the following repository**:

[![GitHub Repo](https://img.shields.io/badge/GitHub-Boltz--Notebook-181717?logo=github)](https://github.com/AtharvaTilewale/Boltz-Notebook)

In [None]:
# @title Install Dependencies and Boltz2 with CUDA support
import sys
import subprocess
import threading
import time
import os
import shutil

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

repo_dir = "boltz"

steps = [
    {
        "loader": f"{Color.CYAN}Cloning repository...{Color.RESET}",
        "done": f"[{Color.GREEN}✔{Color.RESET}] Repository cloned successfully.",
        "fail": f"[{Color.RED}✘{Color.RESET}] Repository clone failed.",
        "cmd": ["git", "clone", "https://github.com/jwohlwend/boltz.git"]
    },
    {
        "loader": f"{Color.RESET}Installing dependencies...{Color.RESET}",
        "done": f"[{Color.GREEN}✔{Color.RESET}] Dependencies installed successfully.",
        "fail": f"[{Color.RED}✘{Color.RESET}] Dependency installation failed.",
        "cmd": [sys.executable, "-m", "pip", "install", "-e", "boltz[cuda]", "biopython", "numpy", "matplotlib", "pyyaml", "py3Dmol", "git+https://github.com/AtharvaTilewale/Boltz-Notebook.git", "--quiet"]
    },
    {
        "loader": f"{Color.CYAN}Validating installation...{Color.RESET}",
        "done": f"[{Color.GREEN}✔{Color.RESET}] Validation complete.",
        "fail": f"[{Color.RED}✘{Color.RESET}] Validation failed.",
        "cmd": [sys.executable, "-c", "import torch; print('Torch CUDA available:', torch.cuda.is_available()); print('CUDA device count:', torch.cuda.device_count())"]
    }
]

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Step 1: Remove repo if it exists
if os.path.isdir(repo_dir):
    print(f"{Color.YELLOW}[i] Repository already exists. Removing '{repo_dir}'...{Color.RESET}")
    try:
        shutil.rmtree(repo_dir)
        print(f"[{Color.GREEN}✔{Color.RESET}] Existing repository removed.")
    except Exception as e:
        print(f"[{Color.RED}✘{Color.RESET}] Failed to remove existing repository: {e}")
        raise

all_success = True

# Main steps
for step in steps:
    stop_event = threading.Event()
    t = threading.Thread(target=loader, args=(step["loader"], stop_event))
    t.start()
    try:
        subprocess.run(step["cmd"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        stop_event.set()
        t.join()
        print(step["done"])
    except Exception as e:
        stop_event.set()
        t.join()
        print(f"{step['fail']} {e}")
        all_success = False
        break

if all_success:
    print(f"{Color.GREEN}All steps completed successfully.{Color.RESET}")
    from logger import log_event
    log_event("Done")
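
The install cell sends each command's stdout and stderr to `DEVNULL`, so a failed step reports only the exit status. The sketch below shows one possible alternative, assuming it is acceptable to buffer the output in memory: `run_step` is a hypothetical helper (not part of the notebook) that captures output and returns the last lines of stderr on failure.

```python
import subprocess
import sys


def run_step(cmd, tail_lines=5):
    """Run cmd quietly and return (ok, message). Hypothetical helper."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return True, proc.stdout.strip()
    except subprocess.CalledProcessError as err:
        # Keep only the end of stderr, where the actual error usually is.
        tail = "\n".join(err.stderr.strip().splitlines()[-tail_lines:])
        return False, tail
    except FileNotFoundError as err:
        # Raised when the executable itself does not exist.
        return False, str(err)


ok, message = run_step([sys.executable, "-c", "print('hello')"])
bad_ok, bad_message = run_step([sys.executable, "-c", "import sys; sys.exit('boom')"])
print(ok, message)
print(bad_ok, bad_message)
```

With a helper like this, the validation step's CUDA report would also be available in `message` instead of being discarded.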


In [None]:

# @title Download CCD Dataset and Test Boltz2
import sys
import threading
import time
import os

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    RESET = "\033[0m"

def loader(msg, stop_event):
    symbols = ["-", "\\", "|", "/"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Step 1: Create data directory
os.makedirs("/content/boltz_data", exist_ok=True)
os.chdir("/content/boltz_data/")

# Step 2: Write YAML file
yaml_content = f"""\
version: 1
sequences:
    - protein:
        id: [A]
        sequence: MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQF
    - ligand:
        id: [B]
        ccd: SAH   # fetch ligand from CCD
    - ligand:
        id: [C]
        smiles: 'N[C@@H](Cc1ccc(O)cc1)C(=O)O'
"""
with open("/content/boltz_data/test.yaml", "w") as f:
    f.write(yaml_content)

# Step 3: Run boltz predict (silent)
step_msg = f"{Color.YELLOW}Downloading CCD Dataset...{Color.RESET}"
stop_event = threading.Event()
t = threading.Thread(target=loader, args=(step_msg, stop_event))
t.start()
try:
    import subprocess
    subprocess.run(
        ["boltz", "predict", "test.yaml", "--use_msa_server"],
        cwd="/content/boltz_data",
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True
    )
    stop_event.set()
    t.join()
    print(f"[{Color.GREEN}✔{Color.RESET}] CCD Dataset Downloaded and validated.")
except Exception as e:
    stop_event.set()
    t.join()
    print(f"[{Color.RED}✘{Color.RESET}] CCD Dataset Download or validation failed: {e}")
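
The test run assumes the `boltz` command is on the `PATH` after installation. A minimal sketch of a pre-flight check, using only the standard library; `cli_available` is a hypothetical helper name.

```python
import shutil
import sys


def cli_available(name="boltz"):
    """Return True if an executable called `name` can be found."""
    return shutil.which(name) is not None


# The running interpreter is used here only as a command that certainly exists.
print(cli_available(sys.executable))
print(cli_available("no-such-command-for-this-demo"))
```

Calling `cli_available()` before `boltz predict` would separate a missing installation from a failed prediction.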


In [None]:
# @title Generate Parameters (YAML file)
# Colab HTML UI -> Python YAML saver
from IPython.display import HTML, display
import yaml
from google.colab import output
import os
import re

os.chdir("/content/boltz_data/")

# --- START: Custom YAML Formatting (No changes needed here) ---
class IdList(list): pass

def represent_id_list(dumper, data):
    return dumper.represent_sequence('tag:yaml.org,2002:seq', data, flow_style=True)

def str_presenter(dumper, data):
    if len(data) > 70 or '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

class MyDumper(yaml.SafeDumper):
    pass

class QuotedString(str): pass

def quoted_str_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style="'")

MyDumper.add_representer(QuotedString, quoted_str_presenter)

MyDumper.add_representer(IdList, represent_id_list)
MyDumper.add_representer(str, str_presenter)
# --- END: Custom YAML Formatting ---

def _save_params(data):
    if not isinstance(data, dict) or 'sequences' not in data:
        return {'status': 'error', 'message': 'Invalid data structure'}
    sequences_fixed = []
    for entry in data['sequences']:
        if 'protein' in entry:
            # Convert IDs to uppercase and remove spaces
            ids = IdList([i.upper().replace(' ', '') for i in entry['protein'].get('id', [])])
            # Convert sequence to uppercase and remove spaces
            seq = re.sub(r'\s+', '', entry['protein'].get('sequence', '').upper())
            protein_dict = {'id': ids, 'sequence': seq}
            sequences_fixed.append({'protein': protein_dict})
        elif 'ligand' in entry:
            # Convert IDs to uppercase and remove spaces
            ids = IdList([i.upper().replace(' ', '') for i in entry['ligand'].get('id', [])])
            ligand_dict = {'id': ids}
            if 'ccd' in entry['ligand']:
                # Convert CCD to uppercase and remove spaces
                ligand_dict['ccd'] = entry['ligand']['ccd'].upper().replace(' ', '')
            if 'smiles' in entry['ligand']:
                # Preserve case (SMILES can be case-sensitive!), remove spaces
                smiles_val = entry['ligand']['smiles'].replace(' ', '')
                # Wrap in single quotes explicitly
                ligand_dict['smiles'] = QuotedString(smiles_val)
            sequences_fixed.append({'ligand': ligand_dict})
    filename = "params.yaml"
    try:
        with open(filename, 'w') as f:
            yaml.dump(
                {'version': 1, 'sequences': sequences_fixed},
                f, Dumper=MyDumper, sort_keys=False, default_flow_style=False, indent=2
            )
        return {'status': 'ok', 'filename': filename}
    except Exception as e:
        return {'status': 'error', 'message': str(e)}

output.register_callback('save_params', _save_params)

# HTML + JS with a Revamped UI
html = r"""
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.2/css/all.min.css">
<style>
    /* --- 1. THEME & GLOBAL STYLES --- */
    :root {
        --font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
        --primary-color: #3b82f6; /* Blue 500 */
        --primary-hover: #2563eb; /* Blue 600 */
        --danger-color: #ef4444;  /* Red 500 */
        --danger-hover: #dc2626;  /* Red 600 */
        --secondary-color: #6b7280; /* Gray 500 */
        --secondary-hover: #4b5563; /* Gray 600 */
        --bg-light: #f9fafb;      /* Gray 50 */
        --border-color: #d1d5db;  /* Gray 300 */
        --text-dark: #1f2937;     /* Gray 800 */
        --text-light: #4b5563;    /* Gray 600 */
        --radius: 8px;
        --shadow: 0 4px 6px -1px rgba(0,0,0,0.1), 0 2px 4px -2px rgba(0,0,0,0.1);
    }
    @keyframes fadeIn { from { opacity: 0; transform: translateY(-10px); } to { opacity: 1; transform: translateY(0); } }

    /* --- 2. LAYOUT & TYPOGRAPHY --- */
    .container { font-family: var(--font-family); color: var(--text-dark); background: #fff; padding: 24px; }
    .block {
        border: 1px solid var(--border-color);
        padding: 20px;
        margin: 16px 0;
        border-radius: var(--radius);
        background: #fff;
        box-shadow: var(--shadow);
        animation: fadeIn 0.4s ease-out;
        border-top: 4px solid var(--primary-color);
    }
    .block[data-type="ligand"] { border-top-color: #a855f7; } /* Purple 500 for Ligand */

    .block-header { display:flex; justify-content:space-between; align-items:center; margin-bottom:16px; }
    .title { font-weight: 600; font-size: 1.1em; color: var(--text-dark); display:flex; align-items:center; gap: 8px; }
    .row { display:flex; gap:12px; align-items:center; margin-bottom:12px; }
    .row label { width: 100px; color: var(--text-light); font-size: 0.9em; }

    /* --- 3. FORMS & BUTTONS --- */
    input[type="text"], textarea, select {
        flex: 1; padding: 10px; border: 1px solid var(--border-color); border-radius: 6px;
        font-size: 14px; color: var(--text-dark); background: var(--bg-light);
        transition: border-color 0.2s, box-shadow 0.2s;
    }
    input[type="text"]:focus, textarea:focus, select:focus {
        outline: none; border-color: var(--primary-color);
        box-shadow: 0 0 0 2px rgba(59, 130, 246, 0.4);
    }
    .btn {
        display: inline-flex; align-items: center; gap: 6px;
        border: none; padding: 8px 16px; border-radius: 6px; cursor: pointer;
        font-size: 14px; font-weight: 500;
        transition: background-color 0.2s, transform 0.1s;
    }
    .btn:active { transform: scale(0.98); }
    .btn.primary { background: var(--primary-color); color: #fff; }
    .btn.primary:hover { background: var(--primary-hover); }
    .btn.secondary { background: var(--secondary-color); color: #fff; }
    .btn.secondary:hover { background: var(--secondary-hover); }

    .remove-btn {
        background: transparent;
        color: var(--secondary-color);
        border: none;
        width: 32px;
        height: 32px;
        border-radius: 50%;
        cursor: pointer;
        display: inline-flex;
        align-items: center;
        justify-content: center;
        font-size: 1em;
        transition: background-color 0.2s, color 0.2s;
    }
    .remove-btn:hover {
        background-color: #fee2e2;
        color: var(--danger-color);
    }

    /* --- 4. CONTROLS & STATUS --- */
    .controls { margin-top: 24px; display:flex; gap:10px; flex-wrap:wrap; align-items:center; }
    .status-message {
        display: flex; align-items: center; gap: 8px;
        padding: 8px 12px; border-radius: 6px; font-size: 0.9em;
        animation: fadeIn 0.3s;
    }
    .status-message.success { background-color: #dcfce7; color: #166534; } /* Green */
    .status-message.error   { background-color: #fee2e2; color: #991b1b; } /* Red */
    .status-message.warning { background-color: #fef3c7; color: #92400e; } /* Amber */
</style>

<div class="container">
  <div id="sequences_container"></div>

  <div class="controls">
    <button class="btn primary" onclick="addProtein()"><i class="fa-solid fa-dna"></i> Add Protein</button>
    <button class="btn primary" style="background-color:#a855f7;" onclick="addLigand()"><i class="fa-solid fa-puzzle-piece"></i> Add Ligand</button>
    <button class="btn secondary" onclick="clearAll()"><i class="fa-solid fa-broom"></i> Clear Added</button>
    <button class="btn secondary" id="saveBtn" onclick="save()"><i id="saveIcon" class="fa-solid fa-save"></i> <span id="saveBtnText">Save YAML</span></button>
    <div id="status"></div>
  </div>

  <template id="first_protein_template">
    <div class="block seq-block first-protein" data-type="protein">
      <div class="block-header"><div class="title"><i class="fa-solid fa-dna"></i>Protein (Primary)</div></div>
      <div class="row"><label>IDs (comma):</label><input class="p-ids" type="text" placeholder="A,B" oninput="formatIDs(this)"/></div>
      <div class="row"><label>Sequence:</label><textarea class="p-seq" rows="5" style="text-transform: uppercase;"></textarea></div>
    </div>
  </template>

  <template id="protein_template">
    <div class="block seq-block" data-type="protein">
      <div class="block-header">
        <div class="title"><i class="fa-solid fa-dna"></i>Protein</div>
        <button class="remove-btn" onclick="removeBlock(this)" title="Remove block"><i class="fa-solid fa-trash-can"></i></button>
      </div>
      <div class="row"><label>IDs (comma):</label><input class="p-ids" type="text" placeholder="C,D" oninput="formatIDs(this)"/></div>
      <div class="row"><label>Sequence:</label><textarea class="p-seq" rows="5" style="text-transform: uppercase;"></textarea></div>
    </div>
  </template>

  <template id="ligand_template">
    <div class="block seq-block" data-type="ligand">
      <div class="block-header">
        <div class="title"><i class="fa-solid fa-puzzle-piece"></i>Ligand</div>
        <button class="remove-btn" onclick="removeBlock(this)" title="Remove block"><i class="fa-solid fa-trash-can"></i></button>
      </div>
      <div class="row"><label>IDs (comma):</label><input class="l-ids" type="text" placeholder="E,F" oninput="formatIDs(this)"/></div>
      <div class="row"><label>Type:</label>
        <select class="l-type" onchange="onLigandTypeChange(this)">
          <option value="ccd">CCD</option><option value="smiles">SMILES</option>
        </select>
      </div>
      <div class="row lig-value-row"><label>Value:</label><input class="l-value" style="text-transform: uppercase;" type="text" placeholder="e.g., SAH" /></div>
    </div>
  </template>
</div>

<script>
  const container = document.getElementById('sequences_container');

  function forceUpperCase(inputElement) {
    // Preserves cursor position while forcing uppercase
    const start = inputElement.selectionStart;
    const end = inputElement.selectionEnd;
    inputElement.value = inputElement.value.toUpperCase();
    inputElement.setSelectionRange(start, end);
  }

  function formatIDs(inputElement) {
    const originalValue = inputElement.value;
    const formattedValue = originalValue.replace(/[\s,]/g, '').split('').join(',');
    inputElement.value = formattedValue.toUpperCase();
  }

  function addBlock(templateId) {
      const tpl = document.getElementById(templateId);
      const node = tpl.content.cloneNode(true);
      container.appendChild(node);
  }
  function addProtein(first=false) { addBlock(first ? 'first_protein_template' : 'protein_template'); }
  function addLigand() { addBlock('ligand_template'); }

  function removeBlock(btn) {
      const block = btn.closest('.seq-block');
      if (block) block.remove();
  }

  function clearAll() {
      container.querySelectorAll('.seq-block:not(.first-protein)').forEach(el => el.remove());
      const first = container.querySelector('.first-protein');
      if (first) {
          first.querySelectorAll('input, textarea').forEach(el => el.value = '');
      }
      document.getElementById('status').innerHTML = '';
  }

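  function onLigandTypeChange(select) {
      const valueInput = select.closest('.seq-block').querySelector('.l-value');
      const isCcd = select.value === 'ccd';
      valueInput.placeholder = isCcd ? 'e.g., SAH' : 'e.g., CCO... (SMILES)';
      // SMILES is case-sensitive, so only display CCD codes in uppercase.
      valueInput.style.textTransform = isCcd ? 'uppercase' : 'none';
  }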

  function setStatus(message, type) {
      const statusEl = document.getElementById('status');
      const icon = {
          success: 'fa-check-circle',
          error: 'fa-circle-xmark',
          warning: 'fa-triangle-exclamation'
      }[type] || 'fa-circle-info';

      statusEl.innerHTML = `<div class="status-message ${type}"><i class="fa-solid ${icon}"></i> ${message}</div>`;
  }

  async function save() {
      const saveBtn = document.getElementById('saveBtn');
      const saveIcon = document.getElementById('saveIcon');
      const saveBtnText = document.getElementById('saveBtnText');

      setStatus('Validating...', 'warning');
      const sequences = [];
      const blocks = document.querySelectorAll('.seq-block');
      const allIDs = new Set(); // To track all unique IDs
      let valid = true;

      for (const [idx, b] of Array.from(blocks).entries()) {
          const type = b.dataset.type;
          let currentIds = [];

          // --- Basic Validation and Data Gathering ---
          if (type === 'protein') {
              currentIds = (b.querySelector('.p-ids').value || '').split(',').map(s => s.trim()).filter(Boolean);
              const seq = b.querySelector('.p-seq').value.trim();
              if (currentIds.length === 0 || !seq) {
                  valid = false;
                  setStatus(`<strong>Error:</strong> Protein block ${idx + 1} requires both IDs and a Sequence.`, 'error');
              } else {
                  sequences.push({ protein: { id: currentIds, sequence: seq } });
              }
          } else if (type === 'ligand') {
              currentIds = (b.querySelector('.l-ids').value || '').split(',').map(s => s.trim()).filter(Boolean);
              const ltype = b.querySelector('.l-type').value;
              const lvalue = b.querySelector('.l-value').value.trim();
              if (currentIds.length === 0 || !lvalue) {
                  valid = false;
                  setStatus(`<strong>Error:</strong> Ligand block ${idx + 1} requires both IDs and a Value.`, 'error');
              } else {
                  const entry = { id: currentIds };
                  if (ltype === 'ccd') entry.ccd = lvalue; else entry.smiles = lvalue;
                  sequences.push({ ligand: entry });
              }
          }

          if (!valid) break;

          // --- Uniqueness Validation ---
          for (const id of currentIds) {
              if (allIDs.has(id)) {
                  valid = false;
                  setStatus(`<strong>Error:</strong> Duplicate ID '<strong>${id}</strong>' found in block ${idx + 1}. IDs must be unique.`, 'error');
                  break;
              }
              allIDs.add(id);
          }
          if (!valid) break;
      }

      if (!valid) return;

      // --- Proceed to Save ---
      saveBtn.disabled = true;
      saveBtnText.innerText = 'Saving...';
      saveIcon.className = 'fa-solid fa-spinner fa-spin';

      try {
          const result = await google.colab.kernel.invokeFunction('save_params', [{ sequences }], {});

          if (result && result.status === 'ok') {
              setStatus(`Saved as <strong>params.yaml</strong>`, 'success');
          } else {
              setStatus(`<strong>Error:</strong> ${result?.message || 'Unknown error occurred.'}`, 'error');
          }
      } catch (err) {
          setStatus(`<strong>Save failed:</strong> ${err.toString()}`, 'error');
      } finally {
          saveBtn.disabled = false;
          saveBtnText.innerText = 'Save YAML';
          saveIcon.className = 'fa-solid fa-save';
      }
  }

  // Initialize the UI
  addProtein(true);
</script>
"""

display(HTML(html))
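
The form checks in the browser that chain IDs are unique before it calls `save_params`; the Python callback does not repeat that check. A minimal sketch of an equivalent check on the Python side, assuming the same payload shape as the form sends (`find_duplicate_ids` is a hypothetical helper, not part of the notebook):

```python
def find_duplicate_ids(sequences):
    """Return chain IDs that occur more than once across all entries."""
    seen, duplicates = set(), []
    for entry in sequences:
        for kind in ("protein", "ligand"):
            for chain_id in entry.get(kind, {}).get("id", []):
                # Normalize the same way _save_params does: uppercase, no spaces.
                key = chain_id.upper().replace(" ", "")
                if key in seen and key not in duplicates:
                    duplicates.append(key)
                seen.add(key)
    return duplicates


payload = [
    {"protein": {"id": ["A", "B"], "sequence": "MVTPEGNV"}},
    {"ligand": {"id": ["a"], "ccd": "SAH"}},
]
duplicates = find_duplicate_ids(payload)
print(duplicates)
```

Because IDs are normalized before comparison, `a` and `A` count as the same chain.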

In [None]:
# @title Boltz2 Engine
import sys
import threading
import time
import os
import re
import shutil
import numpy as np
import matplotlib.pyplot as plt
from Bio.PDB import MMCIFParser, PDBIO
import py3Dmol
import subprocess

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    BLUE = "\033[94m"
    MAGENTA = "\033[95m"
    RESET = "\033[0m"

def loader(msg, stop_event):
    symbols = ["⠋","⠙","⠹","⠸","⠼","⠴","⠦","⠧","⠇","⠏"]
    i = 0
    while not stop_event.is_set():
        sys.stdout.write(f"\r[{symbols[i % len(symbols)]}] {msg}   ")
        sys.stdout.flush()
        time.sleep(0.1)
        i += 1
    sys.stdout.write("\r" + " " * (len(msg) + 10) + "\r")
    sys.stdout.flush()

# Set up parameters
os.chdir("/content/boltz_data/")
job_name = "Insulin" # @param {type:"string"}
recycling_steps = 3   # @param {type:"integer"}
diffusion_samples = 1  # @param {type:"integer"}
use_potentials = True  # @param {type:"boolean"}
override = True  # @param {type:"boolean"}

output_path = f"/content/boltz_data/{job_name}"
if os.path.exists(output_path):
    shutil.rmtree(output_path)

# # Ensure the original params.yaml exists before moving
# if os.path.exists('/content/boltz_data/params.yaml'):
#     !mv /content/boltz_data/params.yaml '/content/boltz_data/{job_name}_pre.yaml'

source_file = '/content/boltz_data/params.yaml'
param_file = f'/content/boltz_data/{job_name}.yaml'

if os.path.exists(source_file):
    # Fold "sequence: |-" block scalars onto one line (in Python, so paths with spaces work).
    with open(source_file) as src:
        yaml_text = re.sub(r"(sequence:) \|-\n\s*", r"\1 ", src.read())
    with open(param_file, "w") as dst:
        dst.write(yaml_text)
else:
    if not os.path.exists(param_file):
        raise FileNotFoundError(f"Cannot proceed: The parameter file '{param_file}' does not exist and the source '{source_file}' was not found to create it.")
    else:
        print(f"Source '{source_file}' not found. Using existing formatted file: '{param_file}'")

# Prepare command
cmd = [
    "boltz", "predict", param_file,
    "--use_msa_server",
    "--out_dir", job_name,
    "--recycling_steps", str(recycling_steps),
    "--diffusion_samples", str(diffusion_samples),
]

if use_potentials: cmd.append("--use_potentials")
if override: cmd.append("--override")

# Run boltz predict with loader and silent output
step_msg = f"{Color.RESET}Running Boltz2 prediction...{Color.RESET}"
stop_event = threading.Event()
t = threading.Thread(target=loader, args=(step_msg, stop_event))
t.start()
try:
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    stop_event.set()
    t.join()
    print(f"[{Color.GREEN}✔{Color.RESET}] Boltz2 run finished successfully!")
except Exception as e:
    stop_event.set()
    t.join()
    print(f"[{Color.RED}✘{Color.RESET}] Boltz2 run failed: {e}")

# Visualization of the prediction results
def visualize_boltz_results(job_name, model_id=0, b_min=50, b_max=90):
    """
    Visualize Boltz2 results with:
      - 3D protein structure colored by pLDDT
      - pLDDT confidence plot
      - PAE heatmap
    """
    base_path = f"/content/boltz_data/{job_name}/boltz_results_{job_name}/predictions/{job_name}"
    cif_file   = f"{base_path}/{job_name}_model_{model_id}.cif"
    plddt_file = f"{base_path}/plddt_{job_name}_model_{model_id}.npz"
    pae_file   = f"{base_path}/pae_{job_name}_model_{model_id}.npz"
    # pdb_file   = f"/content/boltz_data/{job_name}/{job_name}_model_{model_id}.pdb"

    # # --- Convert CIF to PDB ---
    # print(f"[{Color.BLUE}i{Color.RESET}] Converting CIF to PDB for visualization...")
    # parser = MMCIFParser(QUIET=True)
    # structure = parser.get_structure("protein", cif_file)
    # io = PDBIO()
    # io.set_structure(structure)
    # io.save(pdb_file)

    # --- Load pLDDT scores ---
    plddt = np.load(plddt_file)["plddt"]

    # --- Load CIF into py3Dmol viewer ---
    with open(cif_file, "r") as f:
        cif_data = f.read()

    print(f"\n{Color.CYAN}{'='*50}{Color.RESET}")
    print(f"{Color.MAGENTA}3D Protein Structure Colored by pLDDT{Color.RESET}")
    print(f"{Color.RESET}Red = low confidence, Blue = high confidence.{Color.RESET}")
    print(f"{Color.CYAN}{'='*50}{Color.RESET}\n")

    viewer = py3Dmol.view(width=600, height=500)
    viewer.addModel(cif_data, "cif")

    # Color cartoon by B-factor (pLDDT stored in b)
    viewer.setStyle({'cartoon': {'colorscheme': {'prop':'b',
                                                 'gradient': 'roygb',
                                                 'min': b_min,
                                                 'max': b_max}}})

    # Sticks for ligands/ions
    viewer.addStyle({'hetflag': True}, {'stick': {'colorscheme': 'default'}})

    viewer.zoomTo()
    viewer.show()

    # --- pLDDT Plot ---
    print(f"\n{Color.CYAN}{'='*50}{Color.RESET}")
    print(f"{Color.MAGENTA}Predicted Local Distance Difference Test (pLDDT){Color.RESET}")
    print(f"{Color.RESET}Confidence score per residue: Higher = more reliable structure.{Color.RESET}")
    print(f"{Color.CYAN}{'='*50}{Color.RESET}\n")

    # pLDDT scores were already loaded above; reuse them for the plot.
    plt.figure(figsize=(10,4))
    plt.plot(plddt, label="pLDDT", color="blue")
    plt.xlabel("Residue index")
    plt.ylabel("pLDDT score")
    plt.title(f"Model {model_id} | Confidence per residue")
    plt.legend()
    plt.tight_layout(pad=3.0)
    plt.show()

    # --- PAE Heatmap ---
    print(f"\n{Color.CYAN}{'='*50}{Color.RESET}")
    print(f"{Color.MAGENTA}Predicted Aligned Error (PAE) Heatmap{Color.RESET}")
    print(f"{Color.RESET}Shows expected positional error between residue pairs.\nLower values = more reliable alignment.{Color.RESET}")
    print(f"{Color.CYAN}{'='*50}{Color.RESET}\n")

    pae = np.load(pae_file)["pae"]
    plt.figure(figsize=(6,5))
    plt.imshow(pae, cmap="viridis", origin="lower")
    plt.colorbar(label="Predicted Aligned Error (Å)")
    plt.title(f"Model {model_id} | PAE Heatmap")
    plt.xlabel("Residue index")
    plt.ylabel("Residue index")
    plt.tight_layout(pad=3.0)
    plt.show()

# Run the visualization only if the first model's CIF file exists.
cif_to_check = f"/content/boltz_data/{job_name}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
if os.path.exists(cif_to_check):
    visualize_boltz_results(job_name=job_name, model_id=0)
else:
    print(f"[{Color.RED}✘{Color.RESET}] Output file not found, skipping visualization. Searched for: {cif_to_check}")
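
The visualization code builds the same nested output path in several places. The sketch below collects that layout in one function, using exactly the directory and file names that appear in the cell above. `boltz_output_paths` is a hypothetical helper, and the default root is the notebook's Colab data directory.

```python
from pathlib import PurePosixPath


def boltz_output_paths(job_name, model_id=0, root="/content/boltz_data"):
    """Return the result files the visualization cell reads (hypothetical helper)."""
    pred_dir = (
        PurePosixPath(root) / job_name / f"boltz_results_{job_name}"
        / "predictions" / job_name
    )
    stem = f"{job_name}_model_{model_id}"
    return {
        "cif": pred_dir / f"{stem}.cif",
        "plddt": pred_dir / f"plddt_{stem}.npz",
        "pae": pred_dir / f"pae_{stem}.npz",
    }


paths = boltz_output_paths("Insulin")
for kind, path in paths.items():
    print(f"{kind:>5}: {path}")
```

Keeping the layout in one place means the existence check and the loader cannot drift apart.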

In [None]:
# @title Copy Results to Drive
import shutil, os
from google.colab import drive
from Bio.PDB import MMCIFParser, PDBIO

# ANSI color codes for colored output
class Color:
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    BLUE = "\033[94m"
    MAGENTA = "\033[95m"
    RESET = "\033[0m"

# Mount Google Drive
drive.mount('/content/drive')

# Paths
drive_output_dir = f"/content/drive/MyDrive/Boltz2_Results/{job_name}"
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Remove old folder in Drive if exists
if os.path.exists(drive_output_dir):
    print(f"Removing existing folder {drive_output_dir}")
    shutil.rmtree(drive_output_dir)
    print("Old Drive folder removed.")

# Copy local output folder to Drive
shutil.copytree(local_output_path, drive_output_dir)
print(f"{Color.GREEN}All results copied to Google Drive: {drive_output_dir}{Color.RESET}")
# # Copy PDB file separately (optional, just in case)
# drive_pdb_file = os.path.join(drive_output_dir, os.path.basename(pdb_file))
# shutil.copy(pdb_file, drive_pdb_file)
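
The cell above deletes the Drive folder and then copies the results again. An alternative, shown here only as a sketch, is to merge into the existing folder with `shutil.copytree(..., dirs_exist_ok=True)` (Python 3.8+). `sync_results` is a hypothetical helper, and the example uses temporary directories rather than Google Drive.

```python
import shutil
import tempfile
from pathlib import Path


def sync_results(src_dir, dst_dir):
    """Copy src_dir into dst_dir, overwriting files that already exist."""
    # dirs_exist_ok lets copytree write into a folder that is already there.
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return sorted(p.name for p in Path(dst_dir).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Insulin"
    dst = Path(tmp) / "drive" / "Insulin"
    src.mkdir()
    (src / "model_0.cif").write_text("data_model\n")
    sync_results(src, dst)            # first copy creates the folder
    (src / "notes.txt").write_text("rerun\n")
    copied = sync_results(src, dst)   # second copy merges into it
    print(copied)
```

Merging keeps files from earlier runs that no longer exist locally, so the delete-then-copy approach in the cell remains the better choice when the Drive folder must mirror the latest run exactly.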

In [None]:
# @title Download Results (.zip)
from google.colab import files
from Bio.PDB import MMCIFParser, PDBIO
import shutil
import os

# Local output folder you want to download
local_output_path = f"/content/boltz_data/{job_name}"

# # Convert CIF to PDB
# cif_file = f"{local_output_path}/boltz_results_{job_name}/predictions/{job_name}/{job_name}_model_0.cif"
# pdb_file = f"{local_output_path}/{job_name}.pdb"

# # Parse CIF and save as PDB
# parser = MMCIFParser(QUIET=True)
# structure = parser.get_structure("prot", cif_file)
# io = PDBIO()
# io.set_structure(structure)
# io.save(pdb_file)

# Path for the zip file
zip_file = f"/content/{job_name}.zip"

# Remove previous zip if exists
if os.path.exists(zip_file):
    os.remove(zip_file)

# Create zip of the entire folder
shutil.make_archive(base_name=f"/content/{job_name}", format='zip', root_dir=local_output_path)

# Download the zip file
files.download(zip_file)

# Success message
print(f"{Color.GREEN}Download successful! All results from '{job_name}' are saved in '{zip_file}'{Color.RESET}")
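
`shutil.make_archive` appends the `.zip` extension itself and returns the path of the archive it wrote, so the file name does not have to be built separately. A minimal sketch using temporary directories; `zip_folder` and the file names are illustrative assumptions.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder, archive_base):
    """Zip the contents of `folder` and return the archive path."""
    # Paths inside the archive are relative to root_dir.
    return shutil.make_archive(
        base_name=str(archive_base), format="zip", root_dir=str(folder)
    )


with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "Insulin"
    (results / "predictions").mkdir(parents=True)
    (results / "predictions" / "model_0.cif").write_text("data_model\n")
    archive = zip_folder(results, Path(tmp) / "Insulin_results")
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries and list only the files.
        members = [n for n in zf.namelist() if not n.endswith("/")]
    print(archive, members)
```

Listing the archive members before downloading is a cheap way to confirm that the prediction files were actually included.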
