<a href="https://colab.research.google.com/github/JKourelis/Colab_Boltz-2/blob/main/Boltz_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://raw.githubusercontent.com/jwohlwend/boltz/main/docs/boltz2_title.png" height="200" align="right" style="height:240px">

## Boltz-2: Democratizing Biomolecular Interaction Modeling

Easy-to-use protein structure and binding affinity prediction using [Boltz-2](https://doi.org/10.1101/2025.06.14.659707). Boltz-2 is a biomolecular foundation model that jointly models complex structures and binding affinities. Its structure predictions approach [AlphaFold3](https://www.nature.com/articles/s41586-024-07487-w) accuracy, and its affinity predictions approach the accuracy of physics-based free energy perturbation (FEP) methods while running about 1000x faster.

**Key Features:**
- **Structure Prediction**: Protein, DNA, RNA, and ligand complexes with accuracy approaching AlphaFold3
- **Binding Affinity**: First deep learning model to approach free energy perturbation (FEP) accuracy for drug discovery
- **Open Source**: MIT license for academic and commercial use
- **Fast**: About 1000x faster than physics-based affinity methods such as FEP

**Usage Options:**
1. **Manual Input**: Enter sequences directly in the configuration boxes below
2. **FASTA Upload**: Upload FASTA files for batch processing

[Wohlwend J, Corso G, Passaro S, et al. Boltz-1: Democratizing Biomolecular Interaction Modeling. *bioRxiv*, 2024](https://doi.org/10.1101/2024.11.19.624167)

[Passaro S, Corso G, Wohlwend J, et al. Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction. *bioRxiv*, 2025](https://doi.org/10.1101/2025.06.14.659707)

In [None]:
#@title Cell 1: Install Boltz-2 and Dependencies
%%time
import subprocess
import sys
import os

def run_cmd(cmd, desc):
    print(f"[{desc}]")
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        # pip reports the actual error at the end of stderr, so show the tail
        print(f"FAILED: {result.stderr[-500:]}")
        return False
    print("OK")
    return True

def install_boltz():
    # Complete cleanup; continue even if the uninstall step fails
    run_cmd(
        f"{sys.executable} -m pip uninstall torch torchvision torchaudio pytorch-lightning torchmetrics boltz -y",
        "Removing existing installations"
    )

    # Install PyTorch
    if not run_cmd(
        f"{sys.executable} -m pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121",
        "Installing compatible PyTorch+torchvision"
    ):
        return False

    # Install lightning stack
    if not run_cmd(
        f"{sys.executable} -m pip install pytorch-lightning==2.4.0 torchmetrics==1.4.0",
        "Installing lightning stack"
    ):
        return False

    # Install boltz
    if not run_cmd(
        f"{sys.executable} -m pip install boltz",
        "Installing boltz"
    ):
        return False

    # Test installation
    print("[Testing boltz]")
    test_result = subprocess.run(["boltz", "--help"], capture_output=True, text=True, timeout=30)

    if test_result.returncode == 0:
        print("SUCCESS")
        with open("/content/BOLTZ_READY", "w") as f:
            f.write("Ready")
        return True
    else:
        print("FAILED:")
        print(test_result.stderr)
        return False

# Execute installation
if install_boltz():
    print("\nBoltz-2 installation complete")
else:
    print("\nInstallation failed")

[Removing existing installations]
OK
[Installing compatible PyTorch+torchvision]
OK
[Installing lightning stack]
OK
[Installing boltz]
OK
[Testing boltz]
SUCCESS

Boltz-2 installation complete
CPU times: user 778 ms, sys: 139 ms, total: 917 ms
Wall time: 5min 12s
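
Because the install cell pins specific versions, it can help to confirm what actually ended up in the environment, since installing `boltz` afterwards may pull in different versions. The following is a minimal sketch and not part of the original notebook: `check_versions` and the `expected` mapping are hypothetical names, and the pins are copied from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

def check_versions(expected):
    """Return {package: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Ignore local build tags such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

# Pins taken from the install cell above (hypothetical check, not Boltz API).
expected = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "pytorch-lightning": "2.4.0",
    "torchmetrics": "1.4.0",
}
print(check_versions(expected) or "All pinned versions match")
```

An empty result means every pin survived; any entry shows the wanted version next to the one that is installed (or `None` if the package is missing).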


In [None]:
#@title Cell 2: Choose Input Method
input_method = "FASTA Upload" #@param ["Manual Input", "FASTA Upload"]
#@markdown **Manual Input**: Enter sequences directly in the next cell
#@markdown **FASTA Upload**: Upload FASTA files for batch processing

# Initialize global variables
import hashlib
import os

# Global settings that will be used throughout
global_settings = {
    'input_method': input_method,
    'sequences': [],
    'batch_jobs': [],
    'drive': None,
    'gdrive_folder_name': "Boltz2_Predictions",
    'final_jobname': None
}

def add_hash(x, y):
    return x + "_" + hashlib.sha1(y.encode()).hexdigest()[:5]

print(f"✅ Selected input method: {input_method}")
if input_method == "Manual Input":
    print("📝 Next: Configure your sequences in the Manual Input cell")
else:
    print("📁 Next: Upload your FASTA files in the FASTA Upload cell")

✅ Selected input method: FASTA Upload
📁 Next: Upload your FASTA files in the FASTA Upload cell
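
The `add_hash` helper above makes job names reproducible: the suffix depends only on the sequence text, so the same input yields the same job name. A small sketch of that behavior follows; the job name and toy sequences are made up for illustration.

```python
import hashlib

def add_hash(x, y):
    # Same helper as in the cell above: name + first 5 hex chars of SHA-1.
    return x + "_" + hashlib.sha1(y.encode()).hexdigest()[:5]

name_a = add_hash("my_job", "MKTAYIAKQR")
name_b = add_hash("my_job", "MKTAYIAKQR")   # identical input, identical name
name_c = add_hash("my_job", "MKTAYIAKQRQ")  # changed input, new suffix expected

print(name_a, name_b, name_c)
print("Reproducible:", name_a == name_b)
```

Five hex characters give about a million possible suffixes, which is enough to tell a handful of runs apart but not a guarantee of uniqueness across very large batches.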


In [None]:
#@title Cell 3: Manual Input Configuration (Skip if using FASTA Upload)
#@markdown Only run this cell if you selected "Manual Input" above

# Job configuration
jobname = 'Calvin' #@param {type:"string"}
#@markdown - Job name for output files

# Google Drive setup
setup_google_drive = True #@param {type:"boolean"}
#@markdown - Setup Google Drive for automatic result upload
gdrive_folder_name = "Boltz2_Predictions" #@param {type:"string"}
#@markdown - Google Drive folder name

# Sequence inputs
seq1_name = 'A' #@param {type:"string"}
seq1_type = "protein" #@param ["protein", "dna", "rna", "smiles", "ccd"]
seq1_content = 'MTMSPSTRAIPFASVSVFVHVICRLDAPSPTAANREREVREALLKLGVTLACAEHAADLI VSVSENETAVGRTEASERVAVTRACVQECIDRLKKLLANVKIVGPKAHCSGGKEVTCIGR SEGTAANRTNTGGEGVQELTHHNGITNRPSVQIEAVRMEPGSENCSVSHLGDIKVVVVET LTDVPYLAEFDPSECGTTACWFGGLEDVSDVPYMSPSEEPVRIATSVNQDVVGPCALSSE VLQPTRSECGTTTPLSPCSPLGEIRSSSNFEQVDSRLSPTPILAGLTDGASRTGGNCTSE VVERSGGGSLLPCTNVAAVTAAGDLVATPSLSVTAGTVGTENEDRVVRKRKRSSGATASR PRECKRALDKGSTKRKRSGSTNKARNAVDASEPAAEDPLQPLQHEIVYVDDDSLDAPPSP PPDKQSKQPRKRKVKPYVSSSSRRGRRYPHRIFVSCHDEEMERVLNDVVCHIGASAVDYL FGRVEKPTHFVTEPDSELTPAVLLARALGIPVLTTQWLQDTITQGKFPEDLENYEHPVFG KRSAPTQGPSTGTDGTTTLSQPRTALATNLSTNSATRCFESDTGCYYNPFLFGVTVFFVS SPHSPLTEFFAEIVRVLGGTVTRLPTCRKLSVIVNLSYGAISAPEEETFKKGEQTSARVG SKRFRISSALPQVDELVLESQALSFTYPNASLDISNGRKSGMSRVDLNTVLCKDLQNVLK TCQECRSDTVPVVSVEWLIHCIMQGRVVETSPYTVPTLTDDTTTILKTCVASGLLDSAEA LLQCVVGKS' #@param {type:"string"}
seq1_copies = 1 #@param {type:"integer"}

seq2_name = 'B' #@param {type:"string"}
seq2_type = "protein" #@param ["protein", "dna", "rna", "smiles", "ccd"]
seq2_content = 'MTTELNSFRVRKSEGAEGEEEDPALTELVFKKLRQTFLCPICHRPLQENPTALDVCGHVF CHSCIVNAIEKSSPSVKDPWEEDERQLTENSHGDQWSSPQKRNSRGRGNINDVRTSPVAK SGRGRSTKRLRLGQSCPICSVPAQISDLISVSLVSNLVSDIMKHPLLSAALVSPKGNDDN DLVKAGHIEEEEALAPSAEVSQVLSTLSGVSLTGNAGVQTVTLQSNNAPPVAEEATGEPE KKSEHGIHRQAIGSLPYSPSATPMGTSPLSVTSHVSVNTTNPHSTVTLSDHNFPVSVRSR SGSPEVVGGVHQTMPEPHGRSKTRGVGCSSPSEVCRPLTSNGDAENSDVLRNFEELSSSC SESGSSQDLQRHSEPCHRTNAVGSGTAGGGALTPVATAVSTSVQSSASTRHFSKKPQEDV GVIDNVGGDALTPVATAVSTSVQSSASTRHFSQKPQEDVGVIDNVGGDALTPVATAVSTS VQSNTLPKDAPTVREVGEGDGKDSSSDSSLSSGDSSLSSSSSSSFFGESFNFRVHAQRSE IPTESALIFETKGTANTHENQVHNGGVAVTQNLSSCDPDICSGRELLNHGKNIAGGSGAR ERSTSTISTSQLSEHKDDVFRLPHTFSVDASVENSTKTGEVLVSAAQMFGARVLDERMGS VDSRLRIYDPKPQKVIRHVFRLPSCQGSGAPAETQLCWKERHYAAASITCSYCLIMPTEG RLASISDDGSCISDCGGVPSARDSNLTSMTPTTACALVSGALVTDFRWIVESVAARCLLP ALQYSKRPSWSRHESVSTCGGSGASVDNQTPGRWAAEGHAFMLLPDSVLKLLLQQQTSGS ISMRGSSRELSATVTGGYDFCSWRRLILLSGGVLLRFPEECVRQLLLDAMSMSYNDVMIG RNGCDHDERMTAVQNAMHMANGAGRSVFNVECCPERCTASTFLIRNVIILRDSVSTGKDA PSSGSQLLFKRRLERILQSFSVLLSLVQSREVTPVRDVSPSQVFVGSKPFVTFGDVTERY SAPHVMLRSTKWLLRTFSGRLQDSSCESCVGSDQ' #@param {type:"string"}
seq2_copies = 1 #@param {type:"integer"}

seq3_name = 'C' #@param {type:"string"}
seq3_type = "protein" #@param ["protein", "dna", "rna", "smiles", "ccd"]
seq3_content = '' #@param {type:"string"}
seq3_copies = 1 #@param {type:"integer"}

# Check if this cell should run
if 'global_settings' not in globals():
    print("⚠️  Please run the 'Choose Input Method' cell first")
elif global_settings['input_method'] != "Manual Input":
    print("⏭️  Skipping manual input (FASTA Upload selected)")
else:
    # Setup Google Drive if requested
    drive = None
    if setup_google_drive:
        try:
            from pydrive2.drive import GoogleDrive
            from pydrive2.auth import GoogleAuth
            from google.colab import auth
            from oauth2client.client import GoogleCredentials
            from google.colab import files

            print("Setting up Google Drive...")
            auth.authenticate_user()
            gauth = GoogleAuth()
            gauth.credentials = GoogleCredentials.get_application_default()
            drive = GoogleDrive(gauth)
            print("✅ Google Drive connected successfully!")
        except Exception as e:
            print(f"❌ Google Drive setup failed: {e}")
            drive = None

    # Process sequences
    sequences = []
    all_sequences = [
        (seq1_name, seq1_type, seq1_content, seq1_copies),
        (seq2_name, seq2_type, seq2_content, seq2_copies),
        (seq3_name, seq3_type, seq3_content, seq3_copies)
    ]

    for name, seq_type, content, copies in all_sequences:
        if content.strip():  # Only process non-empty sequences
            chain_ids = []
            for i in range(copies):
                if copies == 1:
                    chain_ids.append(name)
                else:
                    chain_ids.append(f"{name}{i+1}")

            sequences.append({
                'name': name,
                'type': seq_type,
                # Pasted sequences often contain spaces or line breaks; remove them all
                'content': "".join(content.split()),
                'copies': copies,
                'chain_ids': chain_ids
            })

    # Generate jobname hash
    if sequences:
        sequence_string = "".join([seq['content'] for seq in sequences])
        final_jobname = add_hash(jobname.replace(' ', '_'), sequence_string)

        # Update global settings
        global_settings.update({
            'sequences': sequences,
            'drive': drive,
            'gdrive_folder_name': gdrive_folder_name,
            'final_jobname': final_jobname
        })

        print("✅ Manual sequences configured:")
        print(f"📁 Job name: {final_jobname}")
        for seq in sequences:
            print(f"  {seq['name']}: {seq['type']}, {seq['copies']} copies, chains: {seq['chain_ids']}")
            print(f"    Content: {seq['content'][:50]}{'...' if len(seq['content']) > 50 else ''}")
    else:
        print("❌ No sequences provided")

Setting up Google Drive...
✅ Google Drive connected successfully!
✅ Manual sequences configured:
📁 Job name: Calvin_2720d
  A: protein, 1 copies, chains: ['A']
    Content: MTMSPSTRAIPFASVSVFVHVICRLDAPSPTAANREREVREALLKLGVTL...
  B: protein, 1 copies, chains: ['B']
    Content: MTTELNSFRVRKSEGAEGEEEDPALTELVFKKLRQTFLCPICHRPLQENP...
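
The sequence dictionaries built above eventually have to be turned into a Boltz input file. The sketch below shows one way that mapping could look, assuming the YAML schema described in the Boltz documentation (`version`, `sequences`, and `protein`/`dna`/`rna`/`ligand` entries). The function name `to_boltz_entries` is hypothetical, the notebook's own conversion code is not shown in this section, and JSON is used for printing only to avoid an extra dependency.

```python
import json

def to_boltz_entries(sequences):
    """Convert Cell 3 style sequence dicts into a Boltz-style input dict."""
    entries = []
    for seq in sequences:
        # Remove spaces and line breaks left over from copy and paste.
        content = "".join(seq["content"].split())
        if seq["type"] in ("protein", "dna", "rna"):
            entries.append({seq["type"]: {"id": seq["chain_ids"], "sequence": content}})
        else:
            # "smiles" and "ccd" both describe a ligand entry.
            entries.append({"ligand": {"id": seq["chain_ids"], seq["type"]: content}})
    return {"version": 1, "sequences": entries}

# Toy input: a two-copy protein and an acetic acid ligand (illustrative only).
example = [
    {"type": "protein", "content": "MKTAY IAKQR", "chain_ids": ["A1", "A2"]},
    {"type": "smiles", "content": "CC(=O)O", "chain_ids": ["L"]},
]
print(json.dumps(to_boltz_entries(example), indent=2))
```

Listing several chain IDs under one entry is how identical copies are expressed in this schema, which matches the `chain_ids` list produced for `copies > 1`.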


In [None]:
#@title Cell 4: FASTA Upload Configuration (Skip if using Manual Input)
#@markdown Only run this cell if you selected "FASTA Upload" above

# Google Drive setup for FASTA upload
setup_google_drive = True #@param {type:"boolean"}
#@markdown - Setup Google Drive for automatic result upload
gdrive_folder_name = "Boltz2_Predictions" #@param {type:"string"}
#@markdown - Google Drive folder name

upload_fasta1 = True #@param {type:"boolean"}
#@markdown - Upload FASTA file 1
upload_fasta2 = True #@param {type:"boolean"}
#@markdown - Upload FASTA file 2 (for combinations)
predict_combinations = True #@param {type:"boolean"}
#@markdown - Predict all combinations between FASTA1 and FASTA2 sequences

# Check if this cell should run
if 'global_settings' not in globals():
    print("⚠️  Please run the 'Choose Input Method' cell first")
elif global_settings['input_method'] != "FASTA Upload":
    print("⏭️  Skipping FASTA upload (Manual Input selected)")
else:
    import re
    from google.colab import files

    # Setup Google Drive if requested
    drive = None
    if setup_google_drive:
        try:
            from pydrive2.drive import GoogleDrive
            from pydrive2.auth import GoogleAuth
            from google.colab import auth
            from oauth2client.client import GoogleCredentials

            print("Setting up Google Drive...")
            auth.authenticate_user()
            gauth = GoogleAuth()
            gauth.credentials = GoogleCredentials.get_application_default()
            drive = GoogleDrive(gauth)
            print("✅ Google Drive connected successfully!")
        except Exception as e:
            print(f"❌ Google Drive setup failed: {e}")
            drive = None

    # FASTA processing functions
    def parse_fasta(file_content, file_name):
        """Parse FASTA file content and return list of sequences."""
        sequences = []
        current_seq = ""
        current_id = ""

        for line in file_content.split('\n'):
            line = line.strip()
            if line.startswith('>'):
                if current_seq and current_id:
                    clean_id = re.sub(r'[^\w\-_]', '_', current_id)
                    sequences.append({
                        'id': clean_id,
                        'original_id': current_id,
                        'sequence': current_seq,
                        'source_file': file_name
                    })
                current_id = line[1:]  # Remove '>'
                current_seq = ""
            else:
                current_seq += line

        # Add last sequence
        if current_seq and current_id:
            clean_id = re.sub(r'[^\w\-_]', '_', current_id)
            sequences.append({
                'id': clean_id,
                'original_id': current_id,
                'sequence': current_seq,
                'source_file': file_name
            })

        return sequences

    # Process FASTA uploads
    fasta1_sequences = []
    fasta2_sequences = []

    if upload_fasta1:
        print("Upload FASTA file 1:")
        uploaded_fasta1 = files.upload()
        for filename, content in uploaded_fasta1.items():
            file_content = content.decode('utf-8')
            # Extend rather than overwrite so several uploaded files are all kept
            parsed = parse_fasta(file_content, filename)
            fasta1_sequences.extend(parsed)
            print(f"FASTA1 loaded: {len(parsed)} sequences from {filename}")

    if upload_fasta2:
        print("Upload FASTA file 2:")
        uploaded_fasta2 = files.upload()
        for filename, content in uploaded_fasta2.items():
            file_content = content.decode('utf-8')
            # Extend rather than overwrite so several uploaded files are all kept
            parsed = parse_fasta(file_content, filename)
            fasta2_sequences.extend(parsed)
            print(f"FASTA2 loaded: {len(parsed)} sequences from {filename}")

    # Generate batch jobs
    batch_jobs = []
    if predict_combinations and fasta1_sequences and fasta2_sequences:
        print(f"Generating {len(fasta1_sequences)} x {len(fasta2_sequences)} = {len(fasta1_sequences) * len(fasta2_sequences)} combinations")

        for seq1 in fasta1_sequences:
            for seq2 in fasta2_sequences:
                job_name = f"{seq1['id']}_{seq2['id']}"
                batch_jobs.append({
                    'name': job_name,
                    'sequences': [
                        {'id': 'A', 'type': 'protein', 'content': seq1['sequence']},
                        {'id': 'B', 'type': 'protein', 'content': seq2['sequence']}
                    ]
                })

    elif fasta1_sequences:
        # Also reached when combinations were requested but FASTA2 is empty,
        # so the run does not silently end up with zero jobs
        print(f"Processing {len(fasta1_sequences)} individual sequences from FASTA1")
        for seq in fasta1_sequences:
            job_name = seq['id']
            batch_jobs.append({
                'name': job_name,
                'sequences': [
                    {'id': 'A', 'type': 'protein', 'content': seq['sequence']}
                ]
            })

    # Update global settings
    global_settings.update({
        'batch_jobs': batch_jobs,
        'drive': drive,
        'gdrive_folder_name': gdrive_folder_name
    })

    print(f"✅ FASTA upload configured: {len(batch_jobs)} jobs to process")
    for i, job in enumerate(batch_jobs[:5]):  # Show first 5
        print(f"  Job {i+1}: {job['name']}")
    if len(batch_jobs) > 5:
        print(f"  ... and {len(batch_jobs) - 5} more jobs")

Setting up Google Drive...
✅ Google Drive connected successfully!
Upload FASTA file 1:


Saving I7.fasta to I7.fasta
FASTA1 loaded: 1 sequences from I7.fasta
Upload FASTA file 2:


Saving EFFECTORS_0001-0200.fasta to EFFECTORS_0001-0200.fasta
FASTA2 loaded: 120 sequences from EFFECTORS_0001-0200.fasta
Generating 1 x 120 = 120 combinations
✅ FASTA upload configured: 120 jobs to process
  Job 1: I7_fol_MALH01000055_000095
  Job 2: I7_fol_MALH01000230_000084
  Job 3: I7_fol_MALH01001161_000005
  Job 4: I7_fol_MALH01000039_000052
  Job 5: I7_fol_MALH01000518_000025
  ... and 115 more jobs
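
The all-against-all pairing above is a Cartesian product of the two FASTA files. The sketch below reproduces that step with `itertools.product` on toy data and adds a check for duplicate job names, which can occur when different headers sanitize to the same ID. The headers, sequences, and the `clean_id` helper name are illustrative assumptions.

```python
import re
from itertools import product

def clean_id(header):
    """Replace anything other than word characters and '-' with '_'."""
    return re.sub(r"[^\w\-]", "_", header)

# Hypothetical toy inputs standing in for two parsed FASTA files.
fasta1 = [{"id": clean_id("rec 1"), "sequence": "MKTAYIAK"}]
fasta2 = [
    {"id": clean_id("eff|1"), "sequence": "MSTNPKPQ"},
    {"id": clean_id("eff|2"), "sequence": "MAALSGGG"},
]

# One job per (FASTA1, FASTA2) pair: chain A from file 1, chain B from file 2.
jobs = [
    {
        "name": f"{a['id']}_{b['id']}",
        "sequences": [
            {"id": "A", "type": "protein", "content": a["sequence"]},
            {"id": "B", "type": "protein", "content": b["sequence"]},
        ],
    }
    for a, b in product(fasta1, fasta2)
]

# Sanitized IDs can collide, so check that job names stay unique.
names = [job["name"] for job in jobs]
duplicates = sorted({n for n in names if names.count(n) > 1})
print(names, "duplicates:", duplicates)
```

The number of jobs grows as the product of the two file sizes, so it is worth checking the printed count before starting a long run.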


In [None]:
#@title Cell 5: MSA Configuration
msa_mode = "mmseqs2_uniref_env" #@param ["mmseqs2_uniref_env", "mmseqs2_uniref","single_sequence","custom"]
#@markdown - MSA generation method. mmseqs2 modes use the ColabFold server

msa_pairing_strategy = "greedy" #@param ["greedy", "complete"]
#@markdown - `greedy` = pair any taxonomically matching subsets, `complete` = all sequences must match

# Check if global_settings exists
if 'global_settings' not in globals():
    print("⚠️  Please run the 'Choose Input Method' cell first")
else:
    # Configure MSA settings based on mode
    if "mmseqs2" in msa_mode:
        use_msa_server = True
        msa_server_url = "https://api.colabfold.com"
    else:
        use_msa_server = False
        msa_server_url = None

    # Handle custom MSA upload if selected
    if msa_mode == "custom":
        print("Upload your custom MSA file (A3M format):")
        from google.colab import files
        custom_msa_dict = files.upload()
        if custom_msa_dict:
            custom_msa_file = list(custom_msa_dict.keys())[0]
            print(f"Custom MSA uploaded: {custom_msa_file}")
        else:
            print("No custom MSA uploaded, switching to single_sequence mode")
            msa_mode = "single_sequence"
            use_msa_server = False

    # Store MSA settings in global_settings
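    global_settings.update({
        'msa_mode': msa_mode,
        'msa_pairing_strategy': msa_pairing_strategy,
        'use_msa_server': use_msa_server,
        'msa_server_url': msa_server_url,
        # Keep the uploaded A3M file name so later cells can find it
        'custom_msa_file': custom_msa_file if msa_mode == "custom" else None
    })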

    print(f"✅ MSA configuration set:")
    print(f"  Mode: {msa_mode}")
    print(f"  Pairing strategy: {msa_pairing_strategy}")
    print(f"  Use MSA server: {use_msa_server}")

✅ MSA configuration set:
  Mode: mmseqs2_uniref_env
  Pairing strategy: greedy
  Use MSA server: True
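
The MSA settings stored above correspond to command-line options of `boltz predict`. The sketch below shows how they might be translated. The flag names `--use_msa_server`, `--msa_server_url`, and `--msa_pairing_strategy` are taken from the Boltz CLI documentation and should be confirmed with `boltz predict --help` for the installed version; `msa_cli_args` is a hypothetical helper, not the notebook's own code.

```python
def msa_cli_args(settings):
    """Translate stored MSA settings into a list of CLI arguments."""
    args = []
    if settings.get("use_msa_server"):
        args.append("--use_msa_server")
        if settings.get("msa_server_url"):
            args += ["--msa_server_url", settings["msa_server_url"]]
        # The pairing strategy only matters when the server builds the MSA.
        args += ["--msa_pairing_strategy", settings.get("msa_pairing_strategy", "greedy")]
    return args

server_settings = {
    "use_msa_server": True,
    "msa_server_url": "https://api.colabfold.com",
    "msa_pairing_strategy": "greedy",
}
print(msa_cli_args(server_settings))
print(msa_cli_args({"use_msa_server": False}))  # no MSA flags at all
```

Returning a list rather than a formatted string keeps the arguments safe to pass to `subprocess.run` without shell quoting.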


In [None]:
#@title Cell 6: Advanced Prediction Settings
# Structure Prediction Settings
recycling_steps = 6 #@param {type:"integer"}
#@markdown - **Iterative refinement passes**: Each cycle refines the structure using updated predictions. Higher values improve local geometry and confidence scores. **Time**: ~linear scaling (3 steps = 3x base time). **VRAM**: +20-30% per additional step for intermediate states.

sampling_steps = 200 #@param {type:"integer"}
#@markdown - **Diffusion denoising iterations**: Controls how many steps the diffusion model takes to generate structures from noise. More steps = smoother, higher quality structures. **Time**: Linear scaling (50 steps = 4x faster than 200). **VRAM**: +10-15% for intermediate diffusion states.

diffusion_samples = 5 #@param {type:"integer"}
#@markdown - **Independent structure predictions**: Number of different structures generated per input. More samples increase diversity and reliability of results. **Time**: Linear scaling (5 samples = 5x base time). **VRAM**: Depends on max_parallel_samples setting.

max_parallel_samples = 5 #@param {type:"integer"}
#@markdown - **GPU memory management**: How many diffusion samples are processed simultaneously. Critical for large complexes - each parallel sample requires full model memory allocation. **Time**: Minimal impact on total time. **VRAM**: ~Linear scaling (2 parallel = ~2x memory, 5 parallel = ~5x memory).

step_scale = 1.638 #@param {type:"number"}
#@markdown - **Sampling temperature**: Step size of the diffusion sampler, which acts like a sampling temperature. According to the Boltz documentation, lower values increase diversity among samples (recommended range 1 to 2); 1.638 is the value used here. **Time**: No impact. **VRAM**: No impact.

# Affinity Prediction Settings
predict_affinity = False #@param {type:"boolean"}
#@markdown - **Binding strength prediction**: Runs the additional affinity model to predict binding strength. Boltz-2 reports the value as log10(IC50) alongside a binder probability rather than a direct Kd/Ki, and it is intended for protein-small molecule complexes. **Time**: +50-100% total time. **VRAM**: +40-60% for affinity model loading.

affinity_mw_correction = False #@param {type:"boolean"}
#@markdown - **Molecular weight adjustment**: Applies size-based corrections to affinity predictions. Only affects affinity calculation, not structure. **Time**: Minimal impact. **VRAM**: No impact.

sampling_steps_affinity = 200 #@param {type:"integer"}
#@markdown - **Affinity model diffusion steps**: Controls quality of affinity predictions. Similar to sampling_steps but for the affinity model. **Time**: Linear scaling within affinity prediction. **VRAM**: +5-10% for affinity diffusion states.

diffusion_samples_affinity = 5 #@param {type:"integer"}
#@markdown - **Affinity prediction ensemble size**: Number of independent affinity predictions to average for the final binding strength. More samples = more reliable affinity estimates. **Time**: Linear scaling for affinity portion. **VRAM**: Minimal additional impact.

# Output and Optimization Settings
output_format = "mmcif" #@param ["mmcif", "pdb"]
#@markdown - **Structure file format**: mmCIF supports more metadata and modern features, PDB is more widely compatible. Both contain same structural information. **Time**: No impact. **VRAM**: No impact.

write_full_pae = False #@param {type:"boolean"}
#@markdown - **Save Predicted Aligned Error matrix**: Confidence scores between all residue pairs. Essential for assessing interface quality and domain reliability. **Time**: +5-10% for matrix computation and I/O. **VRAM**: +10-20% for large complexes during matrix storage.

write_full_pde = False #@param {type:"boolean"}
#@markdown - **Save Predicted Distance Error matrix**: Distance confidence predictions between residue pairs. Useful for validation and uncertainty quantification. **Time**: +5-10% for matrix computation and I/O. **VRAM**: +10-20% for large complexes during matrix storage.

use_potentials = True #@param {type:"boolean"}
#@markdown - **Inference-time potentials**: Steers sampling with physics-inspired potentials to improve local geometry and reduce clashes. Can noticeably improve physical plausibility, especially at interfaces. **Time**: +30-50% total time. **VRAM**: +15-25% for the potential calculations.

# Check if global_settings exists
if 'global_settings' not in globals():
    print("⚠️  Please run the 'Choose Input Method' cell first")
else:
    # Store advanced settings
    advanced_settings = {
        'recycling_steps': recycling_steps,
        'sampling_steps': sampling_steps,
        'diffusion_samples': diffusion_samples,
        'max_parallel_samples': max_parallel_samples,
        'step_scale': step_scale,
        'predict_affinity': predict_affinity,
        'affinity_mw_correction': affinity_mw_correction,
        'sampling_steps_affinity': sampling_steps_affinity,
        'diffusion_samples_affinity': diffusion_samples_affinity,
        'output_format': output_format,
        'write_full_pae': write_full_pae,
        'write_full_pde': write_full_pde,
        'use_potentials': use_potentials,
        'max_msa_seqs': 8192,
        'subsample_msa': False,
        'num_subsampled_msa': 1024
    }

    global_settings.update(advanced_settings)

    print("✅ Advanced settings configured:")
    print(f"  Recycling steps: {recycling_steps}")
    print(f"  Sampling steps: {sampling_steps}")
    print(f"  Diffusion samples: {diffusion_samples}")
    print(f"  Predict affinity: {predict_affinity}")
    print(f"  Output format: {output_format}")
    print(f"  Use potentials: {use_potentials}")

✅ Advanced settings configured:
  Recycling steps: 6
  Sampling steps: 200
  Diffusion samples: 5
  Predict affinity: False
  Output format: mmcif
  Use potentials: True
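
To make the effect of these settings concrete, the sketch below assembles a `boltz predict` command line from a settings dictionary. Flag names mirror the setting names and follow the Boltz CLI documentation as far as is known here, so verify them with `boltz predict --help`. The helper `sampling_cli_args`, the input path `input.yaml`, and the output directory `results` are hypothetical.

```python
def sampling_cli_args(settings):
    """Build CLI arguments for the sampling and output settings."""
    value_flags = [
        "recycling_steps", "sampling_steps", "diffusion_samples",
        "max_parallel_samples", "step_scale", "output_format",
    ]
    switch_flags = ["write_full_pae", "write_full_pde", "use_potentials"]

    args = []
    for key in value_flags:
        if key in settings:
            args += [f"--{key}", str(settings[key])]
    for key in switch_flags:
        if settings.get(key):  # switches are added only when enabled
            args.append(f"--{key}")
    return args

demo_settings = {
    "recycling_steps": 6,
    "sampling_steps": 200,
    "diffusion_samples": 5,
    "output_format": "mmcif",
    "write_full_pae": False,
    "use_potentials": True,
}
cmd = ["boltz", "predict", "input.yaml", "--out_dir", "results"]
cmd += sampling_cli_args(demo_settings)
print(" ".join(cmd))
```

Boolean options are modeled as bare switches, so a setting that is `False` simply leaves the flag out.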


In [None]:
#@title Cell 6.1: Residue Modifications Instructions (Optional)
#@markdown Specify residue modifications for amino acid, DNA, or RNA sequences. Each row should define one modification, with values separated by colons (:). The format is:
#@markdown
#@markdown `SEQ_ID : RESIDUE_INDEX : CCD_CODE`
#@markdown
#@markdown * **SEQ_ID** → The chain ID of the sequence as defined in **Input Sequences**.
#@markdown * **RESIDUE_INDEX** → The residue position to modify. Use **1** for the first residue.
#@markdown * **CCD_CODE** → The **Chemical Component Dictionary (CCD) code** of the modification.
#@markdown
#@markdown **Example Input:**
#@markdown ```
#@markdown A:102:MLY
#@markdown B:1:5MC
#@markdown C:26:PSU
#@markdown ```
#@markdown
#@markdown **Notes:**
#@markdown * Chain IDs (**SEQ_ID**) must match those in **Input Sequences**.
#@markdown * Residue indices start at **1**, not **0**.
#@markdown * Use valid **CCD codes** for modifications. For guidance on which CCD code matches your modification, see: https://pmc.ncbi.nlm.nih.gov/articles/PMC11394121/

residue_modifications = '' #@param {type:"string"}
#@markdown - Enter residue modifications (one per line, format: SEQ_ID:RESIDUE_INDEX:CCD_CODE)

# Process residue modifications
modifications_list = []
if residue_modifications.strip():
    for line in residue_modifications.strip().split('\n'):
        if line.strip():
            parts = line.strip().split(':')
            if len(parts) == 3:
                chain_id, res_idx, ccd_code = parts
                modifications_list.append({
                    'chain_id': chain_id.strip(),
                    'position': int(res_idx.strip()),
                    'ccd': ccd_code.strip()
                })
            else:
                print(f"Invalid modification format: {line}")

print(f"Residue modifications configured: {len(modifications_list)} modifications")
for mod in modifications_list:
    print(f"  Chain {mod['chain_id']}, position {mod['position']}: {mod['ccd']}")

if 'global_settings' in globals() and modifications_list:
    global_settings['modifications_list'] = modifications_list

Residue modifications configured: 0 modifications
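
Once parsed, each modification has to be attached to the chain it belongs to. The sketch below shows one way to do that, assuming the Boltz YAML schema in which a polymer entry carries a `modifications` list of `position`/`ccd` pairs. `attach_modifications` is a hypothetical helper and the toy chain is made up; the range check is an added safeguard, not something the cell above performs.

```python
def attach_modifications(entries, modifications_list):
    """Add a 'modifications' list to each polymer entry whose chain is targeted."""
    for entry in entries:
        for body in entry.values():
            if "sequence" not in body:
                continue  # ligand entries have no residue positions
            ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
            mods = []
            for mod in modifications_list:
                if mod["chain_id"] not in ids:
                    continue
                # Positions are 1-based and must fall inside the sequence.
                if not 1 <= mod["position"] <= len(body["sequence"]):
                    raise ValueError(
                        f"Position {mod['position']} is outside chain {mod['chain_id']}"
                    )
                mods.append({"position": mod["position"], "ccd": mod["ccd"]})
            if mods:
                body["modifications"] = mods
    return entries

entries = [{"protein": {"id": "A", "sequence": "MKTAYIAKQR"}}]
mods = [{"chain_id": "A", "position": 2, "ccd": "MLY"}]
print(attach_modifications(entries, mods))
```

Validating positions early turns an out-of-range index into a clear error message instead of a failure deep inside the prediction run.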


In [None]:
#@title Cell 6.2: Pocket Restraints Instructions (Optional)
#@markdown Specify inter-chain pocket restraints to guide Boltz-2 when folding complexes. A restraint ties one binder chain to residues on other chains that it should contact, which influences the folding process.
#@markdown **Binder Chain** is the chain ID of the binder, while **Contact Residues** lists the residues that interact with it.
#@markdown Each row should define one contact residue as two values separated by a colon (:). The format is:
#@markdown
#@markdown `CONTACT_CHAIN:CONTACT_RES`
#@markdown
#@markdown * **CONTACT_CHAIN** → The chain containing the interacting residue.
#@markdown * **CONTACT_RES** → The position of the residue on **CONTACT_CHAIN**.
#@markdown
#@markdown **Example Input:**
#@markdown ```
#@markdown A:66
#@markdown A:78
#@markdown B:13
#@markdown ```
#@markdown
#@markdown **Notes:**
#@markdown * Chain names must match those in **Input Sequences**.
#@markdown * Residue numbering starts at 1.
#@markdown * The model currently only supports a single binder chain per pocket restraint, but multiple contact residues can be specified across different chains.
#@markdown * The chain name of the binder should only be specified if pocket restraints are being used.

binder_chain = '' #@param {type:"string"}
#@markdown - Specify the chain acting as the binder. See above instructions for more details.
contact_residues = '' #@param {type:"string"}
#@markdown - Specify residues interacting with the binder chain. See above instructions for more details.

# Process pocket restraints
pocket_contacts = []
if contact_residues.strip() and binder_chain.strip():
    for line in contact_residues.strip().split('\n'):
        if line.strip():
            parts = line.strip().split(':')
            if len(parts) == 2:
                contact_chain, contact_res = parts
                pocket_contacts.append({
                    'chain_id': contact_chain.strip(),
                    'residue': int(contact_res.strip())
                })
            else:
                print(f"Invalid contact format: {line}")

if binder_chain.strip():
    print(f"Pocket restraints configured:")
    print(f"  Binder chain: {binder_chain.strip()}")
    print(f"  Contact residues: {len(pocket_contacts)} contacts")
    for contact in pocket_contacts:
        print(f"    Chain {contact['chain_id']}, residue {contact['residue']}")
else:
    print("No pocket restraints configured")

if 'global_settings' in globals() and binder_chain.strip():
    global_settings['binder_chain'] = binder_chain.strip()
    global_settings['pocket_contacts'] = pocket_contacts

No pocket restraints configured
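
The binder chain and contact list collected above map naturally onto a pocket constraint. The sketch below assumes the `constraints` / `pocket` / `binder` / `contacts` layout described in the Boltz input documentation; `pocket_constraint` is a hypothetical helper, and the chain IDs are illustrative.

```python
def pocket_constraint(binder_chain, pocket_contacts):
    """Build a Boltz-style pocket constraint from the parsed contact list."""
    if not pocket_contacts:
        raise ValueError("A pocket restraint needs at least one contact residue")
    return {
        "pocket": {
            "binder": binder_chain,
            # Each contact is a [chain_id, residue_index] pair (1-based index).
            "contacts": [[c["chain_id"], c["residue"]] for c in pocket_contacts],
        }
    }

contacts = [
    {"chain_id": "A", "residue": 66},
    {"chain_id": "A", "residue": 78},
    {"chain_id": "B", "residue": 13},
]
constraints = [pocket_constraint("C", contacts)]
print(constraints)
```

Rejecting an empty contact list avoids writing a restraint that names a binder but constrains nothing, which the cell above would otherwise store.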


In [None]:
#@title Cell 6.3: Covalent Restraints Instructions (Optional)
#@markdown Specify covalent bonds between atoms to guide Boltz-2 in complex folding. These restraints define fixed interactions between atoms in different sequences, ensuring structural constraints are maintained.
#@markdown Each row should define one covalent restraint, with values separated by colons (:). The format is:
#@markdown
#@markdown `CHAIN_ID1:RES_ID1:ATOM_NAME1:CHAIN_ID2:RES_ID2:ATOM_NAME2`
#@markdown
#@markdown * **CHAIN_ID1** → The chain containing the first atom.
#@markdown * **RES_ID1** → Residue index on **CHAIN_ID1**.
#@markdown * **ATOM_NAME1** → Atom name in **RES_ID1**.
#@markdown * **CHAIN_ID2** → The chain containing the second atom.
#@markdown * **RES_ID2** → Residue index on **CHAIN_ID2**.
#@markdown * **ATOM_NAME2** → Atom name in **RES_ID2**.
#@markdown
#@markdown **Example Input:**
#@markdown ```
#@markdown A:6:CA:B:26:CB
#@markdown C:1:N1:A:45:OG
#@markdown ```
#@markdown
#@markdown **Notes:**
#@markdown * Chain names must match those in **Input Sequences**.
#@markdown * Residue numbering starts at 1.
#@markdown * Atom names must match standardized PDB/CIF naming conventions.
#@markdown * Only canonical residues and CCD ligands are supported.
#@markdown * Covalent restraints ensure atoms remain bonded during folding but do not enforce bond angles or torsions.

covalent_restraints = '' #@param {type:"string"}
#@markdown - Specify covalent bonds between atoms. See above instructions for more details.

# Process covalent restraints
covalent_bonds = []
if covalent_restraints.strip():
    for line in covalent_restraints.strip().split('\n'):
        if line.strip():
            parts = line.strip().split(':')
            if len(parts) == 6:
                chain1, res1, atom1, chain2, res2, atom2 = parts
                covalent_bonds.append({
                    'atom1': [chain1.strip(), int(res1.strip()), atom1.strip()],
                    'atom2': [chain2.strip(), int(res2.strip()), atom2.strip()]
                })
            else:
                print(f"Invalid covalent restraint format: {line}")

print(f"Covalent restraints configured: {len(covalent_bonds)} bonds")
for bond in covalent_bonds:
    print(f"  {bond['atom1'][0]}:{bond['atom1'][1]}:{bond['atom1'][2]} - {bond['atom2'][0]}:{bond['atom2'][1]}:{bond['atom2'][2]}")

if 'global_settings' in globals() and covalent_bonds:
    global_settings['covalent_bonds'] = covalent_bonds

Covalent restraints configured: 0 bonds
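
A single covalent restraint line can be parsed and validated in isolation before it is added to the job. The sketch below follows the six-field format described above and wraps the result in a `bond` entry, assuming the `atom1`/`atom2` layout from the Boltz input documentation. `parse_bond` is a hypothetical helper; unlike the cell above, it raises an error on malformed input instead of printing a message.

```python
def parse_bond(line):
    """Parse 'CHAIN1:RES1:ATOM1:CHAIN2:RES2:ATOM2' into a bond constraint."""
    parts = [part.strip() for part in line.split(":")]
    if len(parts) != 6 or not all(parts):
        raise ValueError(f"Expected 6 colon-separated fields, got: {line!r}")
    chain1, res1, atom1, chain2, res2, atom2 = parts
    if not (res1.isdigit() and res2.isdigit()):
        raise ValueError(f"Residue indices must be positive integers: {line!r}")
    return {
        "bond": {
            "atom1": [chain1, int(res1), atom1],
            "atom2": [chain2, int(res2), atom2],
        }
    }

for text in ["A:6:CA:B:26:CB", "C:1:N1:A:45:OG"]:
    print(parse_bond(text))
```

Checking the residue fields before calling `int` gives a readable error for inputs such as `A:x:CA:B:26:CB`.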


In [None]:
#@title Run Boltz-2 Prediction
%%time
import subprocess
import os
import zipfile
import shutil
import time
from datetime import datetime, timedelta
from tqdm import tqdm

# Check if global_settings exists and is properly configured
if 'global_settings' not in globals():
    print("❌ Error: Please run the previous configuration cells first")
elif not global_settings.get('sequences') and not global_settings.get('batch_jobs'):
    print("❌ Error: No sequences or batch jobs configured")
    print("Please run either Manual Input or FASTA Upload configuration")
else:
    # GPU verification
    print("🔍 Checking GPU availability...")
    try:
        import torch
        if torch.cuda.is_available():
            print(f"✅ GPU: {torch.cuda.get_device_name(0)} ({torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB)")
        else:
            print("⚠️  WARNING: No GPU detected - predictions will be very slow")
    except ImportError:
        print("❌ PyTorch not available")

    # Helper functions
    def find_or_create_folder(drive, folder_name, parent_id='root'):
        """Find existing folder or create new one in Google Drive."""
        if not drive:
            return None

        try:
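            # Escape backslashes and single quotes so the name is safe inside the Drive query string
            safe_name = folder_name.replace("\\", "\\\\").replace("'", "\\'")
            file_list = drive.ListFile({'q': f"title='{safe_name}' and '{parent_id}' in parents and mimeType='application/vnd.google-apps.folder' and trashed=false"}).GetList()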

            if file_list:
                print(f"✅ Found existing folder: {folder_name}")
                return file_list[0]['id']
            else:
                folder = drive.CreateFile({
                    'title': folder_name,
                    'mimeType': 'application/vnd.google-apps.folder',
                    'parents': [{'id': parent_id}]
                })
                folder.Upload()
                print(f"✅ Created new folder: {folder_name}")
                return folder['id']
        except Exception as e:
            print(f"❌ Error with folder '{folder_name}': {e}")
            return None

    def upload_to_gdrive(drive, file_path, folder_id, job_name):
        """Upload file to Google Drive folder."""
        if not drive or not os.path.exists(file_path):
            return None

        try:
            uploaded_file = drive.CreateFile({
                'title': os.path.basename(file_path),
                'parents': [{'id': folder_id}]
            })
            uploaded_file.SetContentFile(file_path)
            uploaded_file.Upload()

            file_url = f"https://drive.google.com/file/d/{uploaded_file['id']}/view"
            print(f"✅ Uploaded {job_name} to Google Drive: {file_url}")
            return file_url
        except Exception as e:
            print(f"❌ Upload failed for {job_name}: {e}")
            return None

    def create_results_zip(job_dir, output_filename):
        """Create a zip file with only the prediction outputs."""
        with zipfile.ZipFile(output_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
            # Find actual results directory
            results_dirs = [d for d in os.listdir(job_dir) if d.startswith('boltz_results_')]
            if results_dirs:
                predictions_dir = os.path.join(job_dir, results_dirs[0])
                if os.path.exists(predictions_dir):
                    # Add all files from results directory directly to zip root
                    for root, dirs, files in os.walk(predictions_dir):
                        for file in files:
                            file_path = os.path.join(root, file)
                            # Use relative path from predictions_dir as the archive path
                            arc_path = os.path.relpath(file_path, predictions_dir)
                            zipf.write(file_path, arc_path)

    def run_single_prediction(job_name, sequences_data, settings, is_batch=False, job_num=0, total_jobs=0):
        """Run a single Boltz-2 prediction."""
        job_start_time = time.time()

        print(f"\n{'='*60}")
        if is_batch:
            print(f"🚀 Job {job_num}/{total_jobs}: {job_name}")
        else:
            print(f"🚀 Running prediction: {job_name}")
        print(f"{'='*60}")

        # Setup job directory
        job_dir = job_name
        os.makedirs(job_dir, exist_ok=True)

        # Create input FASTA file
        input_file = os.path.join(job_dir, f"{job_name}.fasta")
        with open(input_file, "w") as f:
            for seq in sequences_data:
                f.write(f">{seq['id']}|{seq['type']}\n{seq['content']}\n")

        # Build the Boltz command; the MSA server flags are appended below unless disabled
        cmd_parts = [
            "boltz", "predict", input_file,
            "--out_dir", job_dir,
            "--recycling_steps", str(settings.get('recycling_steps', 3)),
            "--sampling_steps", str(settings.get('sampling_steps', 200)),
            "--diffusion_samples", str(settings.get('diffusion_samples', 5)),
            "--max_parallel_samples", str(settings.get('max_parallel_samples', 5)),
            "--step_scale", str(settings.get('step_scale', 1.638)),
            "--output_format", settings.get('output_format', 'mmcif'),
            "--max_msa_seqs", str(settings.get('max_msa_seqs', 8192)),
            "--override"
        ]

        # Use the MSA server unless explicitly disabled in the settings
        if settings.get('use_msa_server', True):  # Default to True
            cmd_parts.extend([
                "--use_msa_server",
                "--msa_server_url", settings.get('msa_server_url', 'https://api.colabfold.com'),
                "--msa_pairing_strategy", settings.get('msa_pairing_strategy', 'greedy')
            ])

        # Add optional flags
        if settings.get('use_potentials', True):
            cmd_parts.append("--use_potentials")
        if settings.get('write_full_pae', True):
            cmd_parts.append("--write_full_pae")
        if settings.get('write_full_pde', True):
            cmd_parts.append("--write_full_pde")
        if settings.get('predict_affinity', False):
            cmd_parts.extend([
                "--predict_affinity",
                "--sampling_steps_affinity", str(settings.get('sampling_steps_affinity', 200)),
                "--diffusion_samples_affinity", str(settings.get('diffusion_samples_affinity', 5))
            ])
            if settings.get('affinity_mw_correction', False):
                cmd_parts.append("--affinity_mw_correction")

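        # Quote each argument so job names with spaces or shell metacharacters survive shell=True
        import shlex
        cmd = shlex.join(cmd_parts)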
        print(f"📋 Command: {cmd}")

        # Run prediction
        try:
            result = subprocess.run(
                cmd,
                shell=True,
                capture_output=True,
                text=True,
                timeout=7200
            )

            print(f"Return code: {result.returncode}")

            if result.stdout:
                print("STDOUT:")
                print(result.stdout[-1000:])

            if result.stderr:
                print("STDERR:")
                print(result.stderr[-500:])

            if result.returncode == 0:
                # Check for failed examples in stdout
                failed_examples = 0
                if result.stdout and "Number of failed examples:" in result.stdout:
                    import re
                    match = re.search(r"Number of failed examples:\s*(\d+)", result.stdout)
                    if match:
                        failed_examples = int(match.group(1))

                if failed_examples > 0:
                    print(f"❌ {failed_examples} prediction(s) failed - possibly out of memory; see STDOUT/STDERR above")
                    return False

                # More comprehensive file detection
                results_dirs = [d for d in os.listdir(job_dir) if d.startswith('boltz_results_')]
                if results_dirs:
                    predictions_dir = os.path.join(job_dir, results_dirs[0])
                    print(f"📁 Found results directory: {predictions_dir}")

                    # Search ALL subdirectories for structure files
                    all_structure_files = []
                    structure_extensions = ['.cif', '.pdb', '.mmcif', '.ent', '.cif.gz', '.pdb.gz']

                    for root, dirs, files in os.walk(predictions_dir):
                        for file in files:
                            if any(file.endswith(ext) for ext in structure_extensions):
                                full_path = os.path.join(root, file)
                                rel_path = os.path.relpath(full_path, predictions_dir)
                                all_structure_files.append(rel_path)
                                print(f"🧬 Found structure file: {rel_path}")

                    # Also list ALL files for debugging
                    print(f"📋 All files in results directory:")
                    for root, dirs, files in os.walk(predictions_dir):
                        level = root.replace(predictions_dir, '').count(os.sep)
                        indent = '  ' * level
                        print(f"{indent}{os.path.basename(root)}/")
                        subindent = '  ' * (level + 1)
                        for file in files[:10]:  # Show first 10 files per directory
                            print(f"{subindent}{file}")
                        if len(files) > 10:
                            print(f"{subindent}... and {len(files) - 10} more files")

                    if not all_structure_files:
                        print(f"❌ No structure files found with extensions: {structure_extensions}")
                        return False
                    else:
                        print(f"✅ Found {len(all_structure_files)} structure file(s)")

                    # Create results zip
                    zip_filename = f"{job_name}_results.zip"
                    create_results_zip(job_dir, zip_filename)
                    print(f"📦 Created results archive: {zip_filename}")

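                    # Upload to Google Drive if configured
                    upload_url = None
                    if global_settings.get('drive') and gdrive_folder_id:
                        upload_url = upload_to_gdrive(global_settings['drive'], zip_filename, gdrive_folder_id, job_name)
                        if upload_url:
                            uploaded_files.append({'job': job_name, 'url': upload_url})

                    # Cleanup: delete the local archive only after a successful upload,
                    # otherwise the results would be lost
                    try:
                        shutil.rmtree(job_dir)
                        if upload_url and os.path.exists(zip_filename):
                            os.remove(zip_filename)
                        elif os.path.exists(zip_filename):
                            print(f"💾 Kept local archive: {zip_filename}")
                    except Exception as e:
                        print(f"⚠️  Cleanup warning: {e}")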

                    job_duration = time.time() - job_start_time
                    print(f"✅ Job completed in {job_duration:.1f}s")
                    return True
                else:
                    print(f"❌ No results directory found for {job_name}")
                    # Debug: show what directories DO exist
                    existing_dirs = [d for d in os.listdir(job_dir) if os.path.isdir(os.path.join(job_dir, d))]
                    print(f"📁 Existing directories in {job_dir}: {existing_dirs}")
                    return False
            else:
                print(f"❌ Prediction failed for {job_name}")
                return False

        except subprocess.TimeoutExpired:
            print(f"⏰ Prediction timed out for {job_name}")
            return False
        except Exception as e:
            print(f"💥 Error running prediction for {job_name}: {e}")
            return False

    # Setup Google Drive folder
    gdrive_folder_id = None
    uploaded_files = []

    if global_settings.get('drive'):
        gdrive_folder_id = find_or_create_folder(global_settings['drive'], global_settings['gdrive_folder_name'])

    # Run predictions
    start_time = datetime.now()

    if global_settings.get('batch_jobs'):
        # Batch processing
        batch_jobs = global_settings['batch_jobs']
        print(f"\n🚀 Starting batch processing of {len(batch_jobs)} jobs...")
        successful_jobs = 0

        for i, job in enumerate(batch_jobs, 1):
            success = run_single_prediction(
                job['name'],
                job['sequences'],
                global_settings,
                is_batch=True,
                job_num=i,
                total_jobs=len(batch_jobs)
            )

            if success:
                successful_jobs += 1

        print(f"\n🎉 Batch processing completed!")
        print(f"✅ Successful: {successful_jobs}/{len(batch_jobs)} jobs")

    elif global_settings.get('sequences'):
        # Single job processing
        print(f"\n🚀 Starting single prediction...")

        # Convert sequences to expected format
        single_job_sequences = []
        for seq in global_settings['sequences']:
            for chain_id in seq['chain_ids']:
                single_job_sequences.append({
                    'id': chain_id,
                    'type': seq['type'],
                    'content': seq['content']
                })

        if single_job_sequences:
            success = run_single_prediction(
                global_settings['final_jobname'],
                single_job_sequences,
                global_settings,
                is_batch=False
            )
            if success:
                print(f"✅ Single prediction completed successfully!")
            else:
                print(f"❌ Single prediction failed")

    # Summary
    end_time = datetime.now()
    duration = end_time - start_time

    print(f"\n{'='*60}")
    print(f"🏁 PREDICTION SUMMARY")
    print(f"{'='*60}")
    print(f"⏱️  Total duration: {duration}")
    print(f"📁 Google Drive folder: {global_settings.get('gdrive_folder_name', 'not configured')}")

    if uploaded_files:
        print(f"☁️  Files uploaded to Google Drive: {len(uploaded_files)}")
        for file_info in uploaded_files:
            print(f"   • {file_info['job']}: {file_info['url']}")
    else:
        print("❌ No files were uploaded to Google Drive")

    print(f"{'='*60}")

🔍 Checking GPU availability...
✅ GPU: NVIDIA A100-SXM4-40GB (42.5 GB)
✅ Found existing folder: Boltz2_Predictions

🚀 Starting batch processing of 142 jobs...

🚀 Job 1/142: I7_fol_MALH01000068_000006
📋 Command: boltz predict I7_fol_MALH01000068_000006/I7_fol_MALH01000068_000006.fasta --out_dir I7_fol_MALH01000068_000006 --recycling_steps 6 --sampling_steps 200 --diffusion_samples 5 --max_parallel_samples 5 --step_scale 1.638 --output_format mmcif --max_msa_seqs 8192 --override --use_msa_server --msa_server_url https://api.colabfold.com --msa_pairing_strategy greedy --use_potentials
Return code: 0
STDOUT:
Downloading and extracting the CCD data to /root/.boltz/mols. This may take a bit of time. You may change the cache directory with the --cache flag.
Downloading the Boltz-2 weights to /root/.boltz/boltz2_conf.ckpt. You may change the cache directory with the --cache flag.
Downloading the Boltz-2 affinity weights to /root/.boltz/boltz2_aff.ckpt. You may change the cache directory with th
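
The command assembly and the failure check in the cell above can be separated from the subprocess call so that they are testable without a GPU. This is a minimal sketch under stated assumptions: the helper names `build_boltz_command` and `count_failed_examples` are hypothetical, only a subset of the flags already used in the cell is included, and the example paths are illustrative.

```python
import re
import shlex


def build_boltz_command(input_file, out_dir, settings):
    """Assemble the argument list for `boltz predict` (hypothetical helper).

    Only flags that already appear in the prediction cell are used here.
    """
    cmd = [
        "boltz", "predict", input_file,
        "--out_dir", out_dir,
        "--recycling_steps", str(settings.get("recycling_steps", 3)),
        "--sampling_steps", str(settings.get("sampling_steps", 200)),
        "--diffusion_samples", str(settings.get("diffusion_samples", 5)),
        "--override",
    ]
    if settings.get("use_msa_server", True):
        cmd.append("--use_msa_server")
    return cmd


def count_failed_examples(stdout):
    """Return N from a 'Number of failed examples: N' line, or 0 if absent."""
    match = re.search(r"Number of failed examples:\s*(\d+)", stdout or "")
    return int(match.group(1)) if match else 0


cmd = build_boltz_command("my job/my job.fasta", "my job", {"recycling_steps": 6})
print(shlex.join(cmd))  # quoted form, safe to paste into a shell
print(count_failed_examples("Number of failed examples: 2"))
```

Passing the list directly to `subprocess.run(cmd, capture_output=True, text=True)` avoids the shell entirely, which sidesteps quoting problems when a job name derived from a FASTA header contains spaces.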

In [None]:
#@title Debug GPU and Boltz Setup
import torch
import os
import subprocess
import glob

print("=== GPU DIAGNOSIS ===")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")

if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

    # Test GPU actually works
    try:
        test_tensor = torch.randn(1000, 1000).cuda()
        result = torch.mm(test_tensor, test_tensor)
        print("✅ GPU tensor operations working")
        torch.cuda.empty_cache()
    except Exception as e:
        print(f"❌ GPU test failed: {e}")
else:
    print("❌ CUDA not available!")

print("\n=== ENVIRONMENT CHECK ===")
print(f"CUDA_VISIBLE_DEVICES: {os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set')}")
print(f"Working directory: {os.getcwd()}")

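print("\n=== BOLTZ VERSION INFO ===")
# Note: the boltz CLI may not define a --version option, in which case this call only
# prints a usage error; the PYTHON PACKAGES CHECK below reports the installed version.
try: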
    result = subprocess.run(["boltz", "--version"], capture_output=True, text=True, timeout=10)
    print(f"Boltz version stdout: {result.stdout}")
    print(f"Boltz version stderr: {result.stderr}")
except Exception as e:
    print(f"Error getting Boltz version: {e}")

print("\n=== BOLTZ HELP CHECK ===")
try:
    result = subprocess.run(["boltz", "--help"], capture_output=True, text=True, timeout=10)
    print("Boltz help (first 500 chars):")
    print(result.stdout[:500])
except Exception as e:
    print(f"Error getting Boltz help: {e}")

print("\n=== EXISTING DIRECTORIES CHECK ===")
current_dirs = [d for d in os.listdir('.') if os.path.isdir(d)]
boltz_dirs = [d for d in current_dirs if 'boltz_results_' in d or any(f.startswith('boltz_') for f in os.listdir(d) if os.path.isdir(os.path.join(d, f)))]
print(f"Current directories: {current_dirs[:10]}...")
print(f"Boltz-related directories: {boltz_dirs}")

if boltz_dirs:
    for bdir in boltz_dirs[:3]:  # Check first 3
        print(f"\n--- Examining {bdir} ---")
        for root, dirs, files in os.walk(bdir):
            level = root.replace(bdir, '').count(os.sep)
            indent = ' ' * 2 * level
            print(f"{indent}{os.path.basename(root)}/")
            subindent = ' ' * 2 * (level + 1)
            for file in files[:5]:  # Show first 5 files
                print(f"{subindent}{file}")
            if len(files) > 5:
                print(f"{subindent}... and {len(files) - 5} more files")
            if level > 3:  # Limit depth: prune subdirectories instead of aborting the whole walk
                dirs[:] = []

print("\n=== MEMORY CHECK ===")
try:
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=10)
    print("nvidia-smi output:")
    print(result.stdout)
except Exception as e:
    print(f"Error running nvidia-smi: {e}")

print("\n=== PYTHON PACKAGES CHECK ===")
from importlib import metadata  # pkg_resources is deprecated in recent setuptools
packages_of_interest = ['torch', 'boltz', 'pytorch-lightning', 'torchmetrics']
for package in packages_of_interest:
    try:
        version = metadata.version(package)
        print(f"{package}: {version}")
    except metadata.PackageNotFoundError:
        print(f"{package}: Not found")

print("\n=== CACHE DIRECTORY CHECK ===")
cache_dirs = [
    os.path.expanduser("~/.boltz"),
    "/root/.boltz",
    "/tmp/.boltz",
    "./.boltz"
]
for cache_dir in cache_dirs:
    if os.path.exists(cache_dir):
        print(f"Found cache dir: {cache_dir}")
        try:
            cache_files = os.listdir(cache_dir)
            print(f"  Files: {cache_files}")
        except OSError:
            print(f"  Cannot list files in {cache_dir}")
    else:
        print(f"No cache dir: {cache_dir}")

print("\n=== MANUAL BOLTZ TEST ===")
print("Running a minimal Boltz test...")
test_fasta_content = """>test|protein
MKLLVLSLSLVLVVVSSQE
"""

test_file = "test_minimal.fasta"
with open(test_file, "w") as f:
    f.write(test_fasta_content)

print(f"Created test file: {test_file}")

# Run minimal Boltz command (protein inputs need an MSA, so query the MSA server as the prediction cell does)
test_cmd = f"boltz predict {test_file} --out_dir test_minimal_out --recycling_steps 1 --sampling_steps 10 --diffusion_samples 1 --use_msa_server"
print(f"Test command: {test_cmd}")

try:
    result = subprocess.run(test_cmd, shell=True, capture_output=True, text=True, timeout=120)
    print(f"Test return code: {result.returncode}")
    print(f"Test stdout (last 500 chars): {result.stdout[-500:]}")
    print(f"Test stderr (last 300 chars): {result.stderr[-300:]}")

    # Check what was created
    if os.path.exists("test_minimal_out"):
        print("\nTest output directory contents:")
        for root, dirs, files in os.walk("test_minimal_out"):
            level = root.replace("test_minimal_out", "").count(os.sep)
            indent = ' ' * 2 * level
            print(f"{indent}{os.path.basename(root)}/")
            subindent = ' ' * 2 * (level + 1)
            for file in files:
                print(f"{subindent}{file}")
    else:
        print("No test output directory created")

except subprocess.TimeoutExpired:
    print("Test command timed out")
except Exception as e:
    print(f"Test command failed: {e}")

# Cleanup
import shutil
try:
    if os.path.exists(test_file):
        os.remove(test_file)
    shutil.rmtree("test_minimal_out", ignore_errors=True)
except OSError as e:
    print(f"Cleanup warning: {e}")

print("\n=== DIAGNOSIS COMPLETE ===")

=== GPU DIAGNOSIS ===
PyTorch version: 2.7.1+cu126
CUDA available: True
CUDA version: 12.6
GPU count: 1
GPU name: NVIDIA A100-SXM4-40GB
GPU memory: 42.5 GB
✅ GPU tensor operations working

=== ENVIRONMENT CHECK ===
CUDA_VISIBLE_DEVICES: Not set
Working directory: /content

=== BOLTZ VERSION INFO ===
Boltz version stdout: 
Boltz version stderr: Usage: boltz [OPTIONS] COMMAND [ARGS]...
Try 'boltz --help' for help.

Error: No such option: --version


=== BOLTZ HELP CHECK ===
Boltz help (first 500 chars):
Usage: boltz [OPTIONS] COMMAND [ARGS]...

  Boltz.

Options:
  --help  Show this message and exit.

Commands:
  predict  Run predictions with Boltz.


=== EXISTING DIRECTORIES CHECK ===
Current directories: ['.config', 'sample_data']...
Boltz-related directories: []

=== MEMORY CHECK ===
nvidia-smi output:
Thu Jun 26 10:33:18 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15