# FK-RFdiffusion: Unconditional Protein Design Tutorial

This notebook demonstrates how to design standalone proteins with specific properties using Feynman-Kac guided RFdiffusion.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ErikHartman/FK-RFdiffusion/blob/main/examples/unconditional_design.ipynb)

**What you'll learn:**
- Set up the environment in Colab
- Design proteins with specific secondary structure
- Design proteins with sequence properties (charge, hydrophobicity)
- Visualize and analyze results

## 1. Environment Setup

First, let's install all required dependencies. This will take ~5-10 minutes.

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install PyTorch with CUDA support
!pip install -q torch==2.4.0+cu121 torchvision==0.19.0+cu121 torchaudio==2.4.0+cu121 \
  --extra-index-url https://download.pytorch.org/whl/cu121

In [None]:
# Install DGL with CUDA support
!pip install -q --no-cache-dir "dgl==2.1.0+cu121" \
  -f https://data.dgl.ai/wheels/cu121/repo.html

In [None]:
# Install other dependencies
# Quote the torchdata specifier so the shell does not treat ">" and "<" as redirections
!pip install -q hydra-core==1.3.2 omegaconf==2.3.0 tqdm biopython pandas "torchdata>=0.7,<0.8"
!pip install -q pydssp
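
Because the installs above run with `-q`, a failed or skipped package is easy to miss. The following is a minimal sketch, using only the standard library, that reports which of the packages named in the install cells are present. The `report_versions` helper is hypothetical and not part of FK-RFdiffusion.

```python
from importlib import metadata


def report_versions(packages):
    """Return {package name: installed version, or None if it is missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names are taken from the install cells above.
installed = report_versions(
    ["torch", "dgl", "hydra-core", "omegaconf", "biopython",
     "pandas", "torchdata", "pydssp"]
)
for name, version in installed.items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Any line that prints `NOT INSTALLED` points at the install cell to re-run before continuing.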

## 2. Clone FK-RFdiffusion and Dependencies

In [None]:
# Clone the repository with submodules
!git clone --recursive https://github.com/ErikHartman/FK-RFdiffusion.git
%cd FK-RFdiffusion

In [None]:
# Install RFdiffusion
%cd externals/RFdiffusion
!pip install -q -e . --no-deps

# Install SE(3) Transformer
%cd env/SE3Transformer
!pip install -q -r requirements.txt
!pip install -q .
# Return to the FK-RFdiffusion root (four levels up from env/SE3Transformer)
%cd ../../../..

In [None]:
# Install ProteinMPNN (for sequence design in reward functions)
%cd externals/ProteinMPNN
!pip install -q -e .
%cd ../..

## 3. Download RFdiffusion Model Weights

In [None]:
# Download the Complex_base checkpoint, which this tutorial uses for unconditional design
!mkdir -p externals/RFdiffusion/models
!wget -q http://files.ipd.uw.edu/pub/RFdiffusion/e29311f6f1bf1af907f9ef9f44b8328b/Complex_base_ckpt.pt \
  -O externals/RFdiffusion/models/Complex_base_ckpt.pt

print("✓ Model checkpoint downloaded")
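
Since `wget -q` prints nothing on failure and `-O` can leave an empty file behind, it may be worth confirming that the checkpoint actually arrived. This is a small sketch under assumptions: the `checkpoint_looks_valid` helper is hypothetical, and the 1 MB threshold is an arbitrary lower bound chosen only to catch empty or truncated downloads.

```python
from pathlib import Path


def checkpoint_looks_valid(path, min_bytes=1_000_000):
    """Return True if the file exists and is at least min_bytes in size."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


# Path matches the download cell above.
ckpt = "externals/RFdiffusion/models/Complex_base_ckpt.pt"
if checkpoint_looks_valid(ckpt):
    print(f"Checkpoint found: {ckpt}")
else:
    print(f"Checkpoint missing or truncated, re-run the download: {ckpt}")
```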

## 4. Design Proteins with Secondary Structure Guidance

Let's design proteins enriched in specific secondary structures.

In [None]:
import sys
sys.path.insert(0, '.')

from fk_rfdiffusion.run_inference_guided import run_feynman_kac_design

### Design alpha-helical proteins

In [None]:
run_feynman_kac_design(
    contigs=["50"],                       # 50-residue protein
    reward_function="alpha_helix_ss",     # Maximize alpha helix content
    num_designs=5,                        # Generate 5 designs
    n_particles=15,                       # Use 15 parallel particles
    resampling_frequency=5,               # Resample every 5 steps
    guidance_start_timestep=20,           # Start guiding early for structure
    output_prefix="./outputs/alpha_helix",
    checkpoint="base"
)
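
The `n_particles` and `resampling_frequency` arguments describe a particle-based scheme: several trajectories run in parallel and are periodically resampled according to their reward. As a rough illustration only, and not the repository's implementation, one resampling step could look like the sketch below. The exponential weighting, the `guidance_scale` parameter, and the `resample_particles` helper are all assumptions made for this example.

```python
import math
import random


def resample_particles(rewards, guidance_scale=1.0, rng=None):
    """Draw particle indices with probability proportional to exp(scale * reward)."""
    rng = rng or random.Random()
    # Subtract the best reward before exponentiating for numerical stability.
    top = max(rewards)
    weights = [math.exp(guidance_scale * (r - top)) for r in rewards]
    # Sample with replacement: high-reward particles tend to be duplicated,
    # low-reward particles tend to be dropped.
    return rng.choices(range(len(rewards)), weights=weights, k=len(rewards))


# Toy rewards for four particles (made-up numbers, for illustration only).
rewards = [0.10, 0.80, 0.35, 0.60]
survivors = resample_particles(rewards, guidance_scale=5.0, rng=random.Random(0))
print("Particle indices carried forward:", survivors)
```

In this picture, more particles give the resampler more candidates to choose from, while resampling less often lets each trajectory evolve longer between selections.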

### Design beta-sheet proteins

In [None]:
run_feynman_kac_design(
    contigs=["60"],                       # 60-residue protein
    reward_function="beta_sheet_ss",      # Maximize beta sheet content
    num_designs=5,
    n_particles=15,
    resampling_frequency=5,
    guidance_start_timestep=20,
    output_prefix="./outputs/beta_sheet"
)

### Design loop-rich proteins

In [None]:
run_feynman_kac_design(
    contigs=["50"],
    reward_function="loop_ss",            # Maximize loop/coil content
    num_designs=5,
    n_particles=15,
    resampling_frequency=5,
    guidance_start_timestep=20,
    output_prefix="./outputs/loops"
)

## 5. Design Proteins with Sequence Properties

Now let's design proteins with specific sequence characteristics.

### Design hydrophobic proteins

In [None]:
run_feynman_kac_design(
    contigs=["50"],
    reward_function="sequence_hydrophobic",  # Maximize hydrophobic residues
    num_designs=5,
    n_particles=15,
    n_sequences=3,                           # Generate 3 sequences per structure
    aggregation_mode="mean",                 # Average their rewards
    resampling_frequency=5,
    guidance_start_timestep=30,
    output_prefix="./outputs/hydrophobic"
)

### Design positively charged proteins

In [None]:
run_feynman_kac_design(
    contigs=["50"],
    reward_function="sequence_charged_positive",  # Maximize positive charge (K, R)
    num_designs=5,
    n_particles=15,
    n_sequences=3,
    resampling_frequency=5,
    guidance_start_timestep=30,
    output_prefix="./outputs/positive_charge"
)

### Design negatively charged proteins

In [None]:
run_feynman_kac_design(
    contigs=["50"],
    reward_function="sequence_charged_negative",  # Maximize negative charge (D, E)
    num_designs=5,
    n_particles=15,
    n_sequences=3,
    resampling_frequency=5,
    guidance_start_timestep=30,
    output_prefix="./outputs/negative_charge"
)
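
To make the three sequence rewards above more concrete, the sketch below computes comparable properties for a plain amino-acid string and averages them over several sequences, in the spirit of `n_sequences=3` with `aggregation_mode="mean"`. It is not the repository's reward code: the hydrophobic residue set and the helper names are assumptions, while the charged sets (K, R and D, E) follow the comments in the cells above.

```python
HYDROPHOBIC = set("AVILMFWY")  # assumed set; the tutorial does not list one
POSITIVE = set("KR")           # from the comment on sequence_charged_positive
NEGATIVE = set("DE")           # from the comment on sequence_charged_negative


def sequence_properties(seq):
    """Return residue-class fractions and a simple net charge for one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    n = len(seq)
    positive = sum(aa in POSITIVE for aa in seq)
    negative = sum(aa in NEGATIVE for aa in seq)
    return {
        "hydrophobic_fraction": sum(aa in HYDROPHOBIC for aa in seq) / n,
        "positive_fraction": positive / n,
        "negative_fraction": negative / n,
        "net_charge": positive - negative,
    }


def mean_property(sequences, key):
    """Average one property over several sequences designed for one structure."""
    return sum(sequence_properties(s)[key] for s in sequences) / len(sequences)


# Made-up example sequences, for illustration only.
candidates = ["MKKLLEAVIK", "MRRKKGSAVL", "MDEELLKAIF"]
print("Mean hydrophobic fraction:", mean_property(candidates, "hydrophobic_fraction"))
```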

## 6. Variable-Length Design

Pass a length range as the contig to explore different protein sizes across runs.

In [None]:
run_feynman_kac_design(
    contigs=["40-80"],                    # Variable length: 40-80 residues
    reward_function="alpha_helix_ss",
    num_designs=1,
    n_runs=10,                            # 10 runs with different lengths
    n_particles=15,
    resampling_frequency=5,
    guidance_start_timestep=20,
    output_prefix="./outputs/var_length_helix"
)
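
The comment on `n_runs` above says each run gets a different length. Assuming a range contig such as `"40-80"` means one length is drawn per run, the selection can be pictured as below. Uniform, inclusive sampling and the `sample_length` helper are assumptions for illustration, and the sketch handles only a single-segment contig.

```python
import random


def sample_length(contig, rng=None):
    """Pick a length from a contig written as "N" or "LOW-HIGH" (inclusive)."""
    rng = rng or random.Random()
    low, _, high = contig.partition("-")
    low = int(low)
    high = int(high) if high else low
    if high < low:
        raise ValueError(f"invalid range: {contig}")
    return rng.randint(low, high)


rng = random.Random(0)
lengths = [sample_length("40-80", rng) for _ in range(10)]
print("Sampled lengths for 10 runs:", lengths)
```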

## 7. Visualize Results

Let's visualize the generated proteins using py3Dmol.

In [None]:
!pip install -q py3Dmol

In [None]:
import py3Dmol
import glob

# Visualize alpha helix designs
pdb_files = sorted(glob.glob("outputs/alpha_helix*.pdb"))[:3]

for i, pdb_file in enumerate(pdb_files):
    print(f"\n=== Alpha Helix Design {i+1} ===")
    
    with open(pdb_file, 'r') as f:
        pdb_data = f.read()
    
    view = py3Dmol.view(width=400, height=300)
    view.addModel(pdb_data, 'pdb')
    view.setStyle({'cartoon': {'color': 'spectrum'}})
    view.zoomTo()
    view.show()

In [None]:
# Visualize beta sheet designs
pdb_files = sorted(glob.glob("outputs/beta_sheet*.pdb"))[:3]

for i, pdb_file in enumerate(pdb_files):
    print(f"\n=== Beta Sheet Design {i+1} ===")
    
    with open(pdb_file, 'r') as f:
        pdb_data = f.read()
    
    view = py3Dmol.view(width=400, height=300)
    view.addModel(pdb_data, 'pdb')
    view.setStyle({'cartoon': {'color': 'cyan'}})
    view.zoomTo()
    view.show()
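
Before or after rendering, a quick residue count per file can confirm that each design has the expected length, which is especially useful for the variable-length runs. The sketch below counts CA atoms in `ATOM` records using the fixed-column PDB layout. The `count_residues` helper is hypothetical, and it assumes single-model files with one CA per residue.

```python
import glob


def count_residues(pdb_text):
    """Count residues by counting CA atoms in ATOM records (columns 13-16)."""
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    )


# Report the length of every design written so far.
for pdb_file in sorted(glob.glob("outputs/*.pdb")):
    with open(pdb_file, "r") as f:
        print(f"{pdb_file}: {count_residues(f.read())} residues")
```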

## 8. Analyze Secondary Structure Content

Let's quantify the secondary structure in our designs using DSSP.

In [None]:
import glob

import pandas as pd
import pydssp

def analyze_secondary_structure(pdb_file):
    """Return helix, sheet, and loop percentages for one PDB file."""
    # pydssp works on backbone coordinates (N, CA, C, O) parsed from the PDB text
    with open(pdb_file, 'r') as f:
        coord = pydssp.read_pdbtext(f.read())

    # out_type='c3' gives one label per residue:
    # 'H' (helix), 'E' (strand), '-' (loop); pydssp reports only these three states
    ss_labels = pydssp.assign(coord, out_type='c3')

    # Count secondary structure types
    ss_counts = {'H': 0, 'E': 0, 'L': 0}  # Helix, Sheet, Loop

    for ss in ss_labels:
        if ss == 'H':
            ss_counts['H'] += 1
        elif ss == 'E':
            ss_counts['E'] += 1
        else:
            ss_counts['L'] += 1

    total = sum(ss_counts.values())
    return {
        'helix_%': 100 * ss_counts['H'] / total,
        'sheet_%': 100 * ss_counts['E'] / total,
        'loop_%': 100 * ss_counts['L'] / total
    }

# Analyze alpha helix designs
results = []
for pdb_file in sorted(glob.glob("outputs/alpha_helix*.pdb")):
    ss_content = analyze_secondary_structure(pdb_file)
    results.append({'file': pdb_file, **ss_content})

df = pd.DataFrame(results)
if df.empty:
    print("No alpha helix designs found in outputs/")
else:
    print("\nAlpha Helix Designs:")
    print(df.to_string(index=False))
    print(f"\nAverage helix content: {df['helix_%'].mean():.1f}%")

## 9. Export Designs

Download all your designed proteins.

In [None]:
# Zip all outputs
!zip -r unconditional_designs.zip outputs/

# Download in Colab
from google.colab import files
files.download('unconditional_designs.zip')
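
Outside Colab, `google.colab` is not importable and the `zip` command may be unavailable. As an alternative sketch using only the standard library, the outputs folder can be archived with `shutil.make_archive`. The `zip_outputs` helper is hypothetical, and the default names simply mirror the cell above.

```python
import shutil
from pathlib import Path


def zip_outputs(src_dir="outputs", archive_base="unconditional_designs"):
    """Zip src_dir (keeping its folder name inside the archive) and return the path."""
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(src.parent), base_dir=src.name
    )


# Only archive when designs have actually been written.
if Path("outputs").is_dir():
    print("Wrote", zip_outputs())
```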

## Next Steps

- **Combine properties**: Create custom reward functions that optimize for multiple properties
- **Optimize parameters**: Experiment with different guidance settings
- **Validate designs**: Run MD simulations or experimental validation
- **Explore more**: Try different potential modes (`sum`, `max`, `immediate`) to see how they affect results