# 🧬 PepDesign on Google Colab

**Run real AI-powered peptide design with ProteinMPNN & RFdiffusion**

---

## ⚙️ Setup Instructions

1. **Enable GPU**: `Runtime → Change runtime type → GPU (T4)`
2. **Run all cells**: `Runtime → Run all`, or run each cell with `Shift+Enter`
3. **Wait ~5 minutes**: installation plus a small design run takes a few minutes (enabling RFdiffusion in Cell 3 adds ~10-15 minutes)
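Before running anything heavy, it can help to confirm the GPU is actually attached. A minimal sketch, assuming that a GPU runtime exposes the `nvidia-smi` binary on `PATH` (true on standard Colab GPU runtimes, but treat it as an assumption); `gpu_available` is a hypothetical helper, not part of PepDesign:

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if an NVIDIA GPU is visible to this runtime.

    Assumption: GPU runtimes ship `nvidia-smi`; CPU-only runtimes either
    lack the binary or it exits with a non-zero status.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one "GPU 0: ..." line per visible device.
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_available():
    print("✅ GPU detected")
else:
    print("⚠️ No GPU found: change the runtime type before running the cells below")
```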

---

## 📦 Cell 1: Install PepDesign

```python
print("Installing PepDesign...")

# Clone the repository into /content/pepdesign
!git clone https://github.com/duyjimmypham/pepdesign.git /content/pepdesign
%cd /content/pepdesign

# Install dependencies
!pip install -q pydantic biopython pandas pandarallel pdbfixer openmm

print("\n✅ PepDesign installed successfully")
```
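To confirm the install worked without importing PepDesign itself, you can check that each dependency resolves on the import path. This sketch uses only the standard library's `importlib.util.find_spec`; the module list mirrors the `pip install` line above, with `biopython` mapped to its import name `Bio`. `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Import names for the pip packages installed above
# (the `biopython` distribution is imported as `Bio`).
REQUIRED_MODULES = ["pydantic", "Bio", "pandas", "pandarallel", "pdbfixer", "openmm"]


def missing_modules(names):
    """Return the subset of top-level module `names` not found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("✅ All dependencies importable" if not missing else f"❌ Missing: {missing}")
```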

## 🔬 Cell 2: Install ProteinMPNN

```python
print("Installing ProteinMPNN...")

# Clone ProteinMPNN
!git clone https://github.com/dauparas/ProteinMPNN.git /content/ProteinMPNN
%cd /content/ProteinMPNN

# Download model weights (~50MB)
!mkdir -p ca_model_weights
!wget -q https://github.com/dauparas/ProteinMPNN/raw/main/ca_model_weights/v_48_020.pt -O ca_model_weights/v_48_020.pt

print("\n✅ ProteinMPNN ready")
!ls -lh ca_model_weights/v_48_020.pt
```
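A failed `wget` can leave behind a file that is not a checkpoint at all (for example an HTML error page or a Git LFS pointer), which only surfaces later as a confusing load error. The check below is a heuristic sketch: `check_weights` is a hypothetical helper, and the 1 MB minimum size is an assumed threshold, not the documented size of the weights file:

```python
from pathlib import Path

WEIGHTS = Path("/content/ProteinMPNN/ca_model_weights/v_48_020.pt")


def check_weights(path, min_bytes=1_000_000):
    """Heuristic sanity check for a downloaded checkpoint file.

    Flags a missing file, an HTML page saved by wget, a Git LFS pointer,
    or a file below `min_bytes` (an assumed threshold).
    """
    path = Path(path)
    if not path.is_file():
        return f"missing: {path}"
    with path.open("rb") as fh:
        head = fh.read(64)  # only the first bytes are needed to spot text files
    if head.lstrip().startswith(b"<"):
        return "looks like an HTML page, not a checkpoint"
    if head.startswith(b"version https://git-lfs"):
        return "looks like a Git LFS pointer, not a checkpoint"
    size = path.stat().st_size
    if size < min_bytes:
        return f"suspiciously small: {size} bytes"
    return "ok"


print(check_weights(WEIGHTS))
```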

## 🧬 Cell 3: Install RFdiffusion (Optional - for Real Backbone Generation)

**Note**: This takes ~10-15 minutes. If skipped, the pipeline uses 'stub' mode for backbones (toy macrocycles).

```python
# Install RFdiffusion (uncomment to enable)
# print("Installing RFdiffusion (this takes ~15 mins)...")
# !git clone https://github.com/RosettaCommons/RFdiffusion.git /content/RFdiffusion
# !pip install -q dgl==1.1.2+cu118 -f https://data.dgl.ai/wheels/cu118/repo.html
# !pip install -q torch==2.0.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
# !pip install -q e3nn
# %cd /content/RFdiffusion
# !cd env/SE3Transformer && pip install -q -r requirements.txt && python setup.py install
# !pip install -q -e .
# # Each `!` line runs in its own shell, so download with `wget -P models` instead of `cd models`
# !mkdir -p models
# !wget -q -P models http://files.ipd.uw.edu/pub/RFdiffusion/6f5902ac237024bdd0c176cb93063dc4/Base_ckpt.pt
# !wget -q -P models http://files.ipd.uw.edu/pub/RFdiffusion/e29311f6f1bf1af907f9ef9f44b8328b/Complex_base_ckpt.pt
# print("✅ RFdiffusion installed")
```
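If you toggle Cell 3 on and off, a small helper can pick the backbone generator for Cell 5 based on whether the RFdiffusion base checkpoint was downloaded. This is a sketch assuming the checkpoint lands at `/content/RFdiffusion/models/Base_ckpt.pt`, as in Cell 3; `pick_generator` is hypothetical, and `"rfdiffusion"` and `"stub"` are the two `generator_type` values mentioned in Cell 5:

```python
from pathlib import Path

# Assumed checkpoint directory, matching the download location in Cell 3.
RFDIFFUSION_MODELS = Path("/content/RFdiffusion/models")


def pick_generator(models_dir=RFDIFFUSION_MODELS):
    """Return "rfdiffusion" if the base checkpoint exists, otherwise "stub"."""
    return "rfdiffusion" if (Path(models_dir) / "Base_ckpt.pt").is_file() else "stub"


GENERATOR_TYPE = pick_generator()
print(f"Backbone generator: {GENERATOR_TYPE}")
```

You could then pass `generator_type=GENERATOR_TYPE` to `BackboneConfig` in Cell 5 instead of hard-coding the string.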

## 🎯 Cell 4: Create Test Target (or Upload Your Own)

```python
import os

# Option A: write a toy two-residue fragment (chain A, residues 1-2) for a quick smoke test
test_pdb = """ATOM      1  N   ALA A   1      10.000  10.000  10.000  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.000  10.000  10.000  1.00  0.00           C
ATOM      3  C   ALA A   1      11.500  11.000  10.000  1.00  0.00           C
ATOM      4  O   ALA A   1      11.000  12.000  10.000  1.00  0.00           O
ATOM      5  N   GLY A   2      12.500  11.000  10.000  1.00  0.00           N
ATOM      6  CA  GLY A   2      13.000  12.000  10.000  1.00  0.00           C
ATOM      7  C   GLY A   2      13.500  13.000  10.000  1.00  0.00           C
ATOM      8  O   GLY A   2      13.000  14.000  10.000  1.00  0.00           O
END
"""

with open('/content/test_target.pdb', 'w') as f:
    f.write(test_pdb)

TARGET_PDB = '/content/test_target.pdb'
print(f"✅ Using test target: {TARGET_PDB}")

# Option B: upload your own PDB (uncomment to use)
# from google.colab import files
# uploaded = files.upload()  # saves into the current working directory
# TARGET_PDB = os.path.abspath(list(uploaded.keys())[0])
# print(f"✅ Uploaded: {TARGET_PDB}")
```
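`binding_site_residues` in Cell 5 must refer to residue numbers that exist in your target. This sketch lists them per chain by reading the fixed-column PDB `ATOM` records directly (chain ID in column 22, residue number in columns 23-26), so it needs no extra libraries; `residues_by_chain` is a hypothetical helper and ignores insertion codes and `HETATM` records:

```python
from collections import defaultdict


def residues_by_chain(pdb_path):
    """Map chain ID -> sorted residue numbers, read from ATOM records only.

    Uses fixed PDB columns: chain ID at column 22 (index 21) and residue
    number in columns 23-26 (indices 22-25). HETATM lines are skipped.
    """
    chains = defaultdict(set)
    with open(pdb_path) as fh:
        for line in fh:
            if line.startswith("ATOM"):
                chains[line[21]].add(int(line[22:26]))
    return {chain: sorted(nums) for chain, nums in chains.items()}
```

For the test target, `residues_by_chain(TARGET_PDB)` returns `{'A': [1, 2]}`, matching `binding_site_residues=[1, 2]` in Cell 5.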

## 🚀 Cell 5: Run PepDesign Pipeline

**This cell uses REAL ProteinMPNN for sequence design!**

```python
import sys
sys.path.append('/content/pepdesign')

from pepdesign.pipeline import PepDesignPipeline
from pepdesign.config import (
    PipelineConfig, GlobalConfig, TargetConfig,
    BackboneConfig, DesignConfig, ScoringConfig
)

# Configure pipeline
config = PipelineConfig(
    global_settings=GlobalConfig(
        output_dir="/content/output",
        seed=42
    ),
    target=TargetConfig(
        pdb_path=TARGET_PDB,
        mode="de_novo",
        target_chain="A",
        binding_site_residues=[1, 2]  # Adjust for your target
    ),
    backbone=BackboneConfig(
        generator_type="stub",  # Change to "rfdiffusion" if installed above
        num_backbones=3,
        peptide_length=10
    ),
    design=DesignConfig(
        designer_type="protein_mpnn",  # 🔥 REAL ProteinMPNN!
        num_sequences_per_backbone=5
    ),
    scoring=ScoringConfig(
        charge_min=-2.0,
        charge_max=2.0
    )
)

# Run pipeline
print("="*60)
print("Running PepDesign with REAL ProteinMPNN...")
print("="*60)

pipeline = PepDesignPipeline(config)
pipeline.run()

print("\n" + "="*60)
print("✅ PIPELINE COMPLETE!")
print("="*60)
```

## 📊 Cell 6: View Results

```python
import pandas as pd

# Load ranked sequences
df = pd.read_csv('/content/output/ranking/ranked.csv')

print("\n🏆 Top 10 Designed Peptide Sequences:\n")
print(df[['design_id', 'peptide_seq', 'charge', 'hydrophobicity', 'composite_score']].head(10))

print(f"\n📁 Total sequences designed: {len(df)}")
print(f"✅ Best sequence: {df.iloc[0]['peptide_seq']}")
print(f"   Score: {df.iloc[0]['composite_score']:.3f}")
```
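To carry the top designs into other tools (structure prediction, synthesis orders), a FASTA export is often handier than the CSV. A minimal sketch, assuming `ranked.csv` is sorted best-first (as the `df.iloc[0]` lookup above implies); `write_fasta` and the output filename in the usage note are hypothetical:

```python
from itertools import islice


def write_fasta(records, path, top_n=10):
    """Write the first `top_n` (design_id, sequence) pairs to a FASTA file.

    `records` is any iterable of 2-tuples; returns the number of entries written.
    """
    written = 0
    with open(path, "w") as fh:
        for design_id, seq in islice(records, top_n):
            fh.write(f">{design_id}\n{seq}\n")
            written += 1
    return written
```

For example: `write_fasta(df[['design_id', 'peptide_seq']].itertuples(index=False), '/content/output/top10.fasta')`.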

## 📥 Cell 7: Download Results

```python
# Zip all results; cd first so archive paths start at ranking/, designs/, ...
!cd /content/output && zip -r -q /content/pepdesign_results.zip .

# Download (Colab starts a browser download)
from google.colab import files
files.download('/content/pepdesign_results.zip')

print("✅ Download started!")
print("\nContents:")
print("  - ranking/ranked.csv - Top sequences")
print("  - designs/sequences.csv - All sequences")
print("  - backbones/*.pdb - Generated backbones")
print("  - report.html - Interactive visualization")
```
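If you prefer to build the archive from Python instead of the `zip` shell command, `shutil.make_archive` with `root_dir` stores paths relative to the output folder, so entries appear as `ranking/ranked.csv` and so on. A sketch; `zip_results` is a hypothetical wrapper:

```python
import shutil
import zipfile


def zip_results(output_dir, archive_base):
    """Zip `output_dir` with archive paths relative to it.

    `archive_base` is the archive path without ".zip"; keep it outside
    `output_dir` so the archive does not try to include itself.
    Returns (archive_path, list_of_entry_names).
    """
    archive = shutil.make_archive(archive_base, "zip", root_dir=output_dir)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    return archive, names
```

For example, `zip_results('/content/output', '/content/pepdesign_results')` writes `/content/pepdesign_results.zip`, which you can then pass to `files.download`.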

---

## 🎉 Success!

You've just run **real AI-powered peptide design** using ProteinMPNN!

### What just happened?

1. ✅ Generated peptide backbones (toy "stub" macrocycles unless you enabled RFdiffusion in Cell 3)
2. ✅ Used the **ProteinMPNN neural network** to design sequences
3. ✅ Scored sequences by physicochemical properties (charge, hydrophobicity)
4. ✅ Ranked results
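As an illustration of step 3, here is a deliberately simplified net-charge check against the `charge_min`/`charge_max` bounds set in Cell 5. This is an assumption for illustration, not PepDesign's scoring code: it counts Lys/Arg as +1 and Asp/Glu as -1 and ignores His, the termini, and pKa effects, and whether PepDesign uses the bounds as a hard filter or a penalty is not stated here:

```python
# Simplified net-charge model (illustrative assumption, not PepDesign's formula):
# K and R count +1, D and E count -1; His, termini and pKa effects are ignored.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Approximate net charge of a one-letter peptide sequence."""
    return sum(RESIDUE_CHARGE.get(aa, 0) for aa in seq.upper())


def within_charge_window(seq, charge_min=-2.0, charge_max=2.0):
    """True if the approximate charge lies in [charge_min, charge_max] (Cell 5 defaults)."""
    return charge_min <= net_charge(seq) <= charge_max
```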

### Next Steps:

- 📤 Upload your own target PDB in Cell 4
- 🎯 Adjust the binding site residues
- 📈 Increase `num_backbones` and `num_sequences_per_backbone`
- 🔬 Add AlphaFold3 for structure prediction (requires model params)

### Resources:

- 📖 Full docs: https://github.com/duyjimmypham/pepdesign
- 💬 Questions? Open an issue on GitHub

---