<a href="https://colab.research.google.com/github/Fahad8389/RFantibody-Colab/blob/main/RFantibody_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Google Colab Setup

**GPU Required:** Before running, enable GPU runtime:
1. Go to **Runtime → Change runtime type**
2. Select **T4 GPU** (or better)
3. Click **Save**
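To confirm that the runtime switch took effect before starting the long-running cells, you can check for a GPU from Python. This is a minimal sketch: `gpu_available` is a hypothetical helper (not part of RFantibody), and it assumes the NVIDIA utility `nvidia-smi` is on the PATH, as it is on Colab GPU runtimes.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi reports at least one GPU (hypothetical helper)."""
    # No driver utility on the PATH usually means a CPU-only runtime
    if shutil.which("nvidia-smi") is None:
        return False
    # `nvidia-smi -L` prints one "GPU <n>: <name> (UUID: ...)" line per device
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU detected" if gpu_available() else "No GPU found: change the runtime type first")
```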

In [None]:
#@title 1. Clone RFantibody Repository
#@markdown This will clone the RFantibody repository from GitHub.

import os

# Clone the repository
!git clone https://github.com/RosettaCommons/RFantibody.git
os.chdir("/content/RFantibody")

print("✓ Repository cloned")
print("Current directory:", os.getcwd())

In [None]:
#@title 2. Download Model Weights
#@markdown Downloads the required model weights (~1.5 GB). This may take a few minutes.

# Download all weights
!bash include/download_weights.sh

# Copy to expected locations
!sudo mkdir -p /home/weights
!sudo cp /content/RFantibody/weights/* /home/weights/
!sudo cp /home/weights/RF2_ab.pt /home/weights/RFab_overall_best.pt

print("✓ Weights downloaded")
!ls /home/weights/
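An interrupted download can leave weight files missing or empty, which only surfaces later as a load error. The sketch below, a hypothetical `missing_weights` helper, checks a folder for required files. The list holds only the two file names that appear in the cell above, so treat it as an assumption and extend it with whatever `download_weights.sh` fetches in your checkout.

```python
from pathlib import Path

# Assumption: only these names appear in the cell above; add any others
# that download_weights.sh fetches in your checkout.
REQUIRED_WEIGHTS = ("RF2_ab.pt", "RFab_overall_best.pt")


def missing_weights(weights_dir, required=REQUIRED_WEIGHTS):
    """Return the required weight files that are absent or empty in weights_dir."""
    missing = []
    for name in required:
        path = Path(weights_dir) / name
        # A zero-byte file usually means an interrupted download
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_weights("/home/weights")
    print("All weights present" if not gaps else f"Missing or empty: {gaps}")
```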

In [None]:
#@title 3. Setup Environment (Official UV Method)
#@markdown Installs dependencies using UV package manager. This may take several minutes.

import os

os.chdir("/content/RFantibody")

# Install uv package manager
!curl -LsSf https://astral.sh/uv/install.sh | sh

# Add uv to PATH for this session
os.environ['PATH'] = f"/root/.local/bin:{os.environ['PATH']}"

# Sync environment (downloads Python 3.10 + all dependencies automatically)
!uv sync

# Clone and build USalign if missing
if not os.path.exists("/content/RFantibody/include/USalign"):
    !git clone https://github.com/pylelab/USalign.git /content/RFantibody/include/USalign

os.chdir("/content/RFantibody/include/USalign")
!make
os.chdir("/content/RFantibody")

# Verify installation
!uv run python -c "import torch; print(f'PyTorch: {torch.__version__}')"
!uv run python -c "import rfantibody; print('✓ RFantibody imported successfully')"

print("\n✓ Setup complete!")

# RFantibody: Structure-Based *De Novo* Antibody Design

| Step | Model | Purpose |
|------|-------|--------|
| 1 | RFdiffusion (Ab) | Generate antibody-target docks |
| 2 | ProteinMPNN | Assign sequences to CDR loops |
| 3 | RF2 (Ab) | Predict structure and validate |

### Pipeline Flow
```
Target + Framework → RFdiffusion → ProteinMPNN → RF2
```
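Each stage reads the previous stage's output folder, so the pipeline amounts to three commands run in order. The hypothetical helper below (not part of RFantibody) assembles the same command lines that cells 7 to 9 run, with flags copied from those cells, as argument lists that could be handed to `subprocess.run`.

```python
def build_pipeline_commands(target_pdb, framework_pdb, loops, hotspots,
                            num_designs=5, out_root="/content/outputs"):
    """Return argv lists for the three RFantibody stages, in run order (hypothetical)."""
    # Strip the surrounding brackets, as the notebook does before calling the CLI
    loops = loops.strip("[]")
    hotspots = hotspots.strip("[]")
    return [
        # 1. RFdiffusion: dock framework backbones onto the target
        ["uv", "run", "rfdiffusion", "-t", target_pdb, "-f", framework_pdb,
         "-o", f"{out_root}/ab_des", "-n", str(num_designs),
         "-l", loops, "-h", hotspots],
        # 2. ProteinMPNN: design sequences for the RFdiffusion backbones
        ["uv", "run", "proteinmpnn", "-i", out_root, "-o", f"{out_root}/mpnn", "-n", "1"],
        # 3. RF2: predict and validate the ProteinMPNN designs (-r 3 as in cell 9)
        ["uv", "run", "rf2", "-i", f"{out_root}/mpnn", "-o", f"{out_root}/rf2", "-r", "3"],
    ]
```

For example, looping over the result with `subprocess.run(cmd, check=True, cwd="/content/RFantibody")` would run the stages back to back and raise at the first non-zero exit. Passing a list rather than a shell string avoids quoting problems with the comma-separated loop and hotspot arguments.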

---
## Input Configuration

Upload your PDB files and set parameters below.

In [None]:
#@title 4. Create Input/Output Directories
#@markdown Creates folders to store your input files and results.

import os

os.makedirs("/content/inputs", exist_ok=True)
os.makedirs("/content/outputs", exist_ok=True)

In [None]:
#@title 5. Set Input Files
#@markdown ### Upload Instructions:
#@markdown 1. Click the folder icon on the left sidebar
#@markdown 2. Navigate to `/content/inputs/`
#@markdown 3. Right-click → Upload → select your file
#@markdown ---

MODE = "nanobody" #@param ["nanobody", "antibody"]
TARGET_PDB = "/content/inputs/target_MAGEB1.pdb" #@param {type:"string"}
TARGET_CHAIN = "B" #@param {type:"string"}
FRAMEWORK_PDB = "/content/inputs/your_framework.pdb" #@param {type:"string"}

#@markdown > **Note:** For nanobody mode, the default framework is used automatically.

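import os

if not os.path.exists(TARGET_PDB):
    raise FileNotFoundError(f"❌ File not found: {TARGET_PDB}\n   Upload your file first, then re-run this cell.")

# Rename the target chain to T. The chain ID sits in column 22 (index 21) of
# ATOM/HETATM/TER records, so edit only that column; a global text replace can
# miss residues numbered >= 1000 or hit unrelated fields.
renamed_lines = []
n_renamed = 0
with open(TARGET_PDB, 'r') as f:
    for line in f:
        if line.startswith(('ATOM', 'HETATM', 'TER')) and len(line) > 21 and line[21] == TARGET_CHAIN:
            line = line[:21] + 'T' + line[22:]
            n_renamed += 1
        renamed_lines.append(line)
if n_renamed == 0:
    print(f"⚠ No records found for chain {TARGET_CHAIN}; check TARGET_CHAIN.")

TARGET_PDB_RENAMED = "/content/inputs/target_T.pdb"
with open(TARGET_PDB_RENAMED, 'w') as f:
    f.writelines(renamed_lines)

if MODE == "nanobody":
    FRAMEWORK_PDB = "/content/RFantibody/scripts/examples/example_inputs/h-NbBCII10.pdb"

print(f"✓ Mode: {MODE}")
print(f"✓ Chain {TARGET_CHAIN} renamed to T")
print(f"✓ Target: {TARGET_PDB_RENAMED}")
print(f"✓ Framework: {FRAMEWORK_PDB}")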
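If you are unsure which letter to put in `TARGET_CHAIN`, it helps to list the chains in the uploaded file first. The sketch below is a hypothetical helper (not part of RFantibody) that reads the chain identifier from its fixed PDB column rather than searching for text patterns. It only counts `ATOM` records, so ligands and waters in `HETATM` records are ignored.

```python
def chain_summary(pdb_path):
    """Map each chain ID found in ATOM records to its number of distinct residues."""
    residues = {}  # dicts keep insertion order, so chains appear in file order
    with open(pdb_path) as fh:
        for line in fh:
            # Skip HETATM (ligands, water), headers, TER and truncated lines
            if not line.startswith("ATOM") or len(line) < 27:
                continue
            chain_id = line[21]                     # column 22 in the PDB format
            res_key = (line[22:26], line[26])       # residue number + insertion code
            residues.setdefault(chain_id, set()).add(res_key)
    return {chain: len(keys) for chain, keys in residues.items()}
```

For example, `chain_summary(TARGET_PDB)` returns a `{chain ID: residue count}` dict, which usually makes the target chain easy to spot.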

In [None]:
#@title 6. Design Parameters
#@markdown ---
#@markdown ### Hotspot Residues (Epitope)
#@markdown These are the residues on your target where you want the antibody to bind.
#@markdown
#@markdown **How to find them:**
#@markdown 1. Open your target PDB in ChimeraX or PyMOL
#@markdown 2. Identify the residues at your desired binding site
#@markdown 3. Note the residue numbers (e.g., 195, 197, 256)
#@markdown
#@markdown **Format:** `[TResNum,TResNum,...]`. Use the `T` prefix, since step 5 renamed your chain to T.
#@markdown
#@markdown **Example:** `[T195,T197,T256]`
HOTSPOT_RES = "[T195,T197,T254,T255,T256,T277,T279]" #@param {type:"string"}

#@markdown ---
#@markdown ### CDR Loops to Design
#@markdown These are the antibody loops that will be designed to bind your target.
#@markdown
#@markdown | Mode | Loops | Description |
#@markdown |------|-------|-------------|
#@markdown | Nanobody | `[H1:7,H2:6,H3:5-13]` | 3 heavy chain loops only |
#@markdown | Antibody | `[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]` | 6 loops |
#@markdown
#@markdown **Format:** `[LoopName:length]` or `[LoopName:min-max]` for variable length
DESIGN_LOOPS = "[H1:7,H2:6,H3:5-13]" #@param ["[H1:7,H2:6,H3:5-13]", "[L1:8-13,L2:7,L3:9-11,H1:7,H2:6,H3:5-13]"] {allow-input: true}

#@markdown ---
#@markdown ### Number of Designs
#@markdown How many different antibody designs to generate.
#@markdown
#@markdown **Tip:** Start with 5-10 for testing and increase up to 50 for production runs.
NUM_DESIGNS = 5 #@param {type:"slider", min:1, max:50, step:1}

print(f"✓ Hotspot residues: {HOTSPOT_RES}")
print(f"✓ Design loops: {DESIGN_LOOPS}")
print(f"✓ Number of designs: {NUM_DESIGNS}")
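A typo in these two strings only surfaces once RFdiffusion is already running. The hypothetical helpers below (not part of RFantibody) check them up front against the formats documented in the form: `T<number>` hotspots, and `Loop:length` or `Loop:min-max` loop specs. The loop-name pattern assumes only H1 to H3 and L1 to L3, as in the table; the helpers do not check that the residues exist in the target.

```python
import re

# Formats documented in the form above (assumption: loop names H1-H3, L1-L3 only)
HOTSPOT_RE = re.compile(r"T(\d+)")
LOOP_RE = re.compile(r"([HL][123]):(\d+)(?:-(\d+))?")


def _split(text):
    """Drop surrounding brackets and whitespace, then split on commas."""
    return [item.strip() for item in text.strip().strip("[]").split(",") if item.strip()]


def parse_hotspots(text):
    """'[T195,T197]' -> [195, 197]; raises ValueError on malformed entries."""
    residues = []
    for item in _split(text):
        match = HOTSPOT_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"Bad hotspot {item!r}: expected T<number>, e.g. T195")
        residues.append(int(match.group(1)))
    return residues


def parse_loops(text):
    """'[H1:7,H3:5-13]' -> {'H1': (7, 7), 'H3': (5, 13)} as (min, max) lengths."""
    loops = {}
    for item in _split(text):
        match = LOOP_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"Bad loop spec {item!r}: expected e.g. H1:7 or H3:5-13")
        name, low = match.group(1), int(match.group(2))
        high = int(match.group(3)) if match.group(3) else low
        if low > high:
            raise ValueError(f"Loop {name}: min length {low} exceeds max {high}")
        loops[name] = (low, high)
    return loops
```

Calling `parse_hotspots(HOTSPOT_RES)` and `parse_loops(DESIGN_LOOPS)` once the form values are set stops a malformed entry before any GPU time is spent.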

---
## Section 1: RFdiffusion

In [None]:
#@title 7. Run RFdiffusion
#@markdown ---
#@markdown ### What this does:
#@markdown Generates antibody/nanobody backbones docked to your target protein.
#@markdown
#@markdown **This step may take 5-15 minutes** depending on the number of designs.
#@markdown ---

import os

os.chdir("/content/RFantibody")
os.makedirs("/content/outputs", exist_ok=True)

print("Running RFdiffusion...\n")

# Remove brackets from parameters for CLI
loops = DESIGN_LOOPS.strip('[]')
hotspots = HOTSPOT_RES.strip('[]')

!uv run rfdiffusion -t {TARGET_PDB_RENAMED} -f {FRAMEWORK_PDB} -o /content/outputs/ab_des -n {NUM_DESIGNS} -l "{loops}" -h "{hotspots}"

print("\n✓ RFdiffusion Complete!")
print("Generated files:")
for f in sorted(os.listdir("/content/outputs")):
    if f.endswith('.pdb'):
        print(f"  📄 {f}")
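A docked backbone is only useful if it actually sits on the epitope. One rough way to screen RFdiffusion outputs before spending ProteinMPNN and RF2 time on them is to count how many hotspot residues have a Cα atom within some cutoff of any antibody Cα atom. The sketch below is a hypothetical helper, not an RFantibody feature. It assumes the docked PDBs keep the target as chain `T` (as renamed in step 5) and the antibody as chains `H`/`L` (the IDs step 10 reads), and the 10 Å default cutoff is an arbitrary choice.

```python
import math


def ca_atoms(pdb_path):
    """Yield (chain_id, residue_number, (x, y, z)) for each CA atom in ATOM records."""
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
                yield line[21], int(line[22:26]), xyz


def hotspots_contacted(pdb_path, hotspots, binder_chains=("H", "L"), cutoff=10.0):
    """Return hotspot residue numbers on chain T with a binder CA within `cutoff` Å."""
    atoms = list(ca_atoms(pdb_path))
    binder = [xyz for chain, _, xyz in atoms if chain in binder_chains]
    # Insertion codes are ignored on the target side; hotspots are plain numbers
    target = {resseq: xyz for chain, resseq, xyz in atoms if chain == "T"}
    return [r for r in hotspots
            if r in target and any(math.dist(target[r], b) <= cutoff for b in binder)]
```

Looping this over the `.pdb` files listed above, with the hotspot numbers passed without their `T` prefix, gives a quick per-design count of engaged hotspots.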

---
## Section 2: ProteinMPNN

In [None]:
#@title 8. Run ProteinMPNN
#@markdown ---
#@markdown ### What this does:
#@markdown Designs amino acid sequences for the backbones generated by RFdiffusion.
#@markdown
#@markdown This turns the 3D backbone structures into protein sequences
#@markdown that can be synthesized in the lab.
#@markdown
#@markdown ---

import os

os.chdir("/content/RFantibody")
os.makedirs("/content/outputs/mpnn", exist_ok=True)

print("Running ProteinMPNN...\n")

!uv run proteinmpnn -i /content/outputs -o /content/outputs/mpnn -n 1

print("\n✓ ProteinMPNN Complete!")
print("Generated files:")
for f in sorted(os.listdir("/content/outputs/mpnn")):
    if f.endswith('.pdb'):
        print(f"  📄 {f}")

# Show sequences. Biopython must be installed in the notebook kernel (not the
# uv environment), because the imports below run in the kernel itself.
!pip install -q biopython
from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1

parser = PDBParser(QUIET=True)
print("\n=== Designed Sequences ===\n")
for f in sorted(os.listdir("/content/outputs/mpnn")):
    if f.endswith('.pdb'):
        print(f"--- {f} ---")
        structure = parser.get_structure("ab", f"/content/outputs/mpnn/{f}")
        for chain in structure[0]:
            # Standard residues only (hetero flag ' '), as one-letter codes
            seq = ''.join(seq1(res.resname) for res in chain if res.id[0] == ' ')
            print(f"  Chain {chain.id} ({len(seq)} residues): {seq}")
        print()
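Because the point of this step is sequences that can be synthesized, it is convenient to also save them as FASTA, which most sequence tools accept. A minimal sketch with a hypothetical `write_fasta` helper (standard library only); building the `{header: sequence}` dict of one-letter sequences, for example with Biopython's `seq1` as step 10 does, is left to the caller. The 60-residue line width is a common convention, not a requirement.

```python
def write_fasta(records, path, width=60):
    """Write {header: sequence} pairs to `path` as FASTA, wrapping at `width`."""
    with open(path, "w") as fh:
        for header, seq in records.items():
            fh.write(f">{header}\n")
            # Split the sequence into lines of at most `width` residues
            for start in range(0, len(seq), width):
                fh.write(seq[start:start + width] + "\n")
```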

---
## Section 3: RF2

In [None]:
#@title 9. RF2 Validation
#@markdown ---
#@markdown ### What this does:
#@markdown Validates the designed structures using RoseTTAFold2.
#@markdown
#@markdown **Filtering criteria:**
#@markdown - **pAE < 10**: predicted aligned error, a confidence score (lower is more confident)
#@markdown - **RMSD < 2 Å**: structural similarity to the designed backbone
#@markdown
#@markdown Only designs that pass both filters will appear in the output.
#@markdown If no files appear, try generating more designs in step 7.
#@markdown
#@markdown ---

import os

os.chdir("/content/RFantibody")
os.makedirs("/content/outputs/rf2", exist_ok=True)

print("Running RF2 Validation...\n")

!uv run rf2 -i /content/outputs/mpnn -o /content/outputs/rf2 -r 3

print("\n✓ RF2 Validation Complete!")
print("Generated files:")
rf2_files = [f for f in os.listdir("/content/outputs/rf2") if f.endswith('.pdb')]
if rf2_files:
    for f in sorted(rf2_files):
        print(f"  ✅ {f}")
    print(f"\n{len(rf2_files)} design(s) passed validation!")
else:
    print("  ⚠️ No files passed filtering (pAE < 10, RMSD < 2 Å)")
    print("  💡 Try increasing NUM_DESIGNS and re-running from step 7")
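The two criteria amount to a per-design threshold test. If you want to re-apply them with different cutoffs, or rank the designs that pass, a sketch like the one below works once each design's scores are in a dict. The key names `pae` and `rmsd`, and the helpers `passes_filters` and `rank_designs`, are hypothetical; where RFantibody records these scores is not covered here.

```python
def passes_filters(scores, pae_max=10.0, rmsd_max=2.0):
    """True if a design meets both cutoffs (pAE < pae_max, RMSD < rmsd_max in Å)."""
    return scores["pae"] < pae_max and scores["rmsd"] < rmsd_max


def rank_designs(all_scores, **cutoffs):
    """Return names of passing designs, lowest (most confident) pAE first."""
    passing = {name: s for name, s in all_scores.items() if passes_filters(s, **cutoffs)}
    return sorted(passing, key=lambda name: passing[name]["pae"])
```

Strict `<` comparisons mirror the criteria as written above, so a design with an RMSD of exactly 2.0 Å does not pass.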

---

**Sequence Extraction with CDR Annotation**

In [None]:
#@title 10. Extract & Annotate Sequences
#@markdown ---
#@markdown ### What this does:
#@markdown Extracts amino acid sequences from validated designs and annotates CDR regions.
#@markdown
#@markdown **Output includes:**
#@markdown - Full VH/VHH sequences (and VL for antibodies)
#@markdown - CDR1, CDR2, CDR3 regions (Chothia numbering)
#@markdown - Framework regions (FR1-FR4)
#@markdown - Excel file with all annotations
#@markdown
#@markdown ---
#@markdown ### Configuration
#@markdown > **Note:** This should match the MODE you selected in Step 5.

EXTRACTION_MODE = "nanobody" #@param ["nanobody", "antibody"]

#@markdown ---

!pip install -q abnumber biopython pandas openpyxl

import os
import glob
import pandas as pd
from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1
from abnumber import Chain

RF2_OUTPUT_DIR = "/content/outputs/rf2"

def extract_sequence_from_pdb(pdb_file, chain_id):
    """Return the one-letter sequence of `chain_id`, or None if the chain is absent."""
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("protein", pdb_file)
    for model in structure:
        for chain in model:
            if chain.id == chain_id:
                # Keep standard residues only (hetero flag ' ' excludes waters/ligands)
                residues = [res for res in chain if res.id[0] == ' ']
                return ''.join(seq1(res.resname) for res in residues)
    return None

def annotate_sequence(sequence, chain_type='H'):
    """Split a V-domain sequence into FR/CDR regions (Chothia) using AbNumber."""
    try:
        # AbNumber numbers the sequence with ANARCI and detects the chain type
        # itself; its region properties use the correct boundaries for heavy
        # and light chains (fixed heavy-chain ranges would be wrong for VL).
        chain = Chain(sequence, scheme='chothia')
        expected = {'H'} if chain_type == 'H' else {'K', 'L'}
        if chain.chain_type not in expected:
            print(f"    Warning: expected chain type {chain_type}, AbNumber found {chain.chain_type}")
        regions = {
            'FR1': chain.fr1_seq, 'CDR1': chain.cdr1_seq,
            'FR2': chain.fr2_seq, 'CDR2': chain.cdr2_seq,
            'FR3': chain.fr3_seq, 'CDR3': chain.cdr3_seq,
            'FR4': chain.fr4_seq,
        }
        return regions, True
    except Exception as e:
        print(f"    Warning: AbNumber failed - {e}")
        return None, False

def validate_length(sequence, chain_type='H'):
    length = len(sequence)
    if chain_type == 'H':
        return f"{length} residues ✓" if 100 <= length <= 140 else f"{length} residues ⚠ (expected 100-140)"
    else:
        return f"{length} residues ✓" if 100 <= length <= 130 else f"{length} residues ⚠ (expected 100-130)"

pdb_files = sorted(glob.glob(os.path.join(RF2_OUTPUT_DIR, "*.pdb")))

if not pdb_files:
    print(f"❌ No PDB files found in {RF2_OUTPUT_DIR}")
    print("   Make sure you've run RF2 validation first!")
else:
    print(f"Found {len(pdb_files)} RF2 output files\n")

all_sequences = []

for pdb_file in pdb_files:
    design_name = os.path.basename(pdb_file).replace('.pdb', '')
    print("═" * 60)
    print(f"DESIGN: {design_name}")
    print("═" * 60)
    design_data = {'Design': design_name}

    # The heavy chain (VH or VHH) is always read; the light chain only in antibody mode
    chains_to_read = [('H', 'VH', 'HEAVY CHAIN (VH/VHH)')]
    if EXTRACTION_MODE == "antibody":
        chains_to_read.append(('L', 'VL', 'LIGHT CHAIN (VL)'))

    for chain_id, prefix, label in chains_to_read:
        seq = extract_sequence_from_pdb(pdb_file, chain_id)
        if not seq:
            continue
        print(f"\n{label}: {validate_length(seq, chain_id)}")
        print(f"\n>{design_name}_{prefix}")  # FASTA header uses a single '>'
        print(seq)
        design_data[f'{prefix}_Full'] = seq
        design_data[f'{prefix}_Length'] = len(seq)
        regions, success = annotate_sequence(seq, chain_id)
        if success:
            print("\nREGION BREAKDOWN (Chothia):")
            names = list(regions)
            for i, name in enumerate(names):
                branch = "└─" if i == len(names) - 1 else "├─"
                print(f"{branch} {name + ':':<5} {regions[name]}")
            print("\nCDRs ONLY (for analysis tools):")
            for n in (1, 2, 3):
                print(f"  CDR-{chain_id}{n}: {regions[f'CDR{n}']}")
            # Store every region as e.g. VH_FR1, VH_CDR1, ... for the Excel export
            for name, region_seq in regions.items():
                design_data[f'{prefix}_{name}'] = region_seq
    all_sequences.append(design_data)
    print()

if all_sequences:
    sequences_df = pd.DataFrame(all_sequences)
    output_file = "/content/outputs/sequences_annotated.xlsx"
    sequences_df.to_excel(output_file, index=False)
    print("═" * 60)
    print("SUMMARY")
    print("═" * 60)
    print(f"✓ Extracted {len(all_sequences)} designs")
    print(f"✓ Saved to: {output_file}")
    print("\nColumns in Excel:")
    for col in sequences_df.columns:
        print(f"  • {col}")
    print("\n📋 Variable 'sequences_df' ready for analysis cells")
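Two quick checks on the annotated designs are whether any of them converged on the same VH sequence and how the CDR-H3 lengths are distributed. A sketch over the `all_sequences` list built above, using a hypothetical `summarize_designs` helper; rows where AbNumber failed have no `VH_CDR3` entry and are left out of the histogram.

```python
from collections import Counter


def summarize_designs(records):
    """Return (CDR-H3 length histogram, groups of designs sharing a VH sequence)."""
    # Count CDR-H3 lengths, skipping rows without an annotation
    cdr3_lengths = Counter(len(r["VH_CDR3"]) for r in records if r.get("VH_CDR3"))
    # Group design names by identical full VH sequence
    by_sequence = {}
    for r in records:
        if r.get("VH_Full"):
            by_sequence.setdefault(r["VH_Full"], []).append(r["Design"])
    duplicates = [names for names in by_sequence.values() if len(names) > 1]
    return dict(sorted(cdr3_lengths.items())), duplicates
```

For example, `hist, dups = summarize_designs(all_sequences)` returns a `{length: count}` dict and a list of design-name groups that share a VH sequence.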

---
## Section 4: Results

In [None]:
#@title 11. Download Results
#@markdown ---
#@markdown ### Choose what to download:
#@markdown | Option | Contents |
#@markdown |--------|----------|
#@markdown | RF2 | Final validated antibodies |
#@markdown | RF2+MPNN | RF2-validated designs + ProteinMPNN sequence designs |
#@markdown | All | RFdiffusion + ProteinMPNN + RF2 |
#@markdown ---

DOWNLOAD_OPTION = "RF2" #@param ["RF2", "RF2+MPNN", "All"]

import shutil
import os
from google.colab import files

DOWNLOAD_DIR = "/content/download_temp"
# Start from an empty staging folder so leftovers from an earlier run are not re-zipped
shutil.rmtree(DOWNLOAD_DIR, ignore_errors=True)
os.makedirs(DOWNLOAD_DIR)

def copy_pdbs(src_dir, label):
    """Copy *.pdb from src_dir into its own subfolder (so stages cannot overwrite
    each other's files); return the number of files copied."""
    if not os.path.isdir(src_dir):
        print(f"⚠ {src_dir} not found, skipping {label}")
        return 0
    dest = os.path.join(DOWNLOAD_DIR, label)
    os.makedirs(dest, exist_ok=True)
    n = 0
    for f in os.listdir(src_dir):
        if f.endswith('.pdb'):
            shutil.copy(os.path.join(src_dir, f), dest)
            n += 1
    print(f"✓ Added {n} {label} files")
    return n

count = copy_pdbs("/content/outputs/rf2", "RF2")
if DOWNLOAD_OPTION in ["RF2+MPNN", "All"]:
    count += copy_pdbs("/content/outputs/mpnn", "ProteinMPNN")
if DOWNLOAD_OPTION == "All":
    count += copy_pdbs("/content/outputs", "RFdiffusion")

if os.path.exists("/content/outputs/sequences_annotated.xlsx"):
    shutil.copy("/content/outputs/sequences_annotated.xlsx", DOWNLOAD_DIR)
    print("✓ Added sequences_annotated.xlsx")

shutil.make_archive('/content/rfantibody_results', 'zip', DOWNLOAD_DIR)
files.download('/content/rfantibody_results.zip')

print(f"\n📦 Total: {count} PDB files downloaded")
shutil.rmtree(DOWNLOAD_DIR)
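If the results will be shared or archived, a small manifest makes it easy to confirm later that nothing was corrupted in transit. A minimal sketch using a hypothetical `write_manifest` helper (standard library only) that records the size and SHA-256 of every file in a folder. To use it here, call it on the staging folder (`/content/download_temp`) before the `shutil.make_archive` line, since the cell deletes that folder at the end.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large result sets need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="MANIFEST.csv"):
    """Write relative path, size and SHA-256 for every file under `folder`."""
    folder = Path(folder)
    # Collect rows first so the manifest itself is never hashed
    rows = [(str(p.relative_to(folder)), p.stat().st_size, sha256_of(p))
            for p in sorted(folder.rglob("*")) if p.is_file() and p.name != manifest_name]
    with open(folder / manifest_name, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "bytes", "sha256"])
        writer.writerows(rows)
    return len(rows)
```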