# BEAM Stage 2: CG Pipeline Demo

## From CG Trajectories to AA Enhanced Sampling Preparation

This notebook demonstrates the complete Stage 2 workflow:
1. Load and preprocess CG trajectory
2. Train TICA to learn slow collective variables
3. Visualize results
4. Prepare outputs for AA enhanced sampling

**Note:** This demo loads a pre-generated NumPy array instead of reading trajectory files. For real usage with DCD files, see the API documentation.

In [None]:
import sys
sys.path.insert(0, '..')

import numpy as np
import matplotlib.pyplot as plt

from beam import (
    train_cg_tica,
    suggest_tica_params,
    plot_tica_projection,
    plot_free_energy_landscape,
    plot_timescales
)

## Step 1: Load CG Trajectory

In real usage, you would use:
```python
from beam import load_and_preprocess_cg
cg_features = load_and_preprocess_cg(
    'cg_traj.dcd',
    'topology.pdb', 
    'reference.pdb'
)
```
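
The internals of `load_and_preprocess_cg` are not shown in this notebook. As a rough sketch of what aligning to a reference structure and flattening to `n_backbone_atoms * 3` features can look like, the example below uses a standard Kabsch superposition in NumPy. The function names `kabsch_align` and `frames_to_features` are hypothetical and are not part of the BEAM API.

```python
import numpy as np

def kabsch_align(frame, reference):
    """Rigidly superimpose one (n_atoms, 3) frame onto a reference structure."""
    frame_center = frame.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = frame - frame_center
    q = reference - ref_center
    # SVD of the 3x3 covariance matrix between the two centered point sets
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + ref_center

def frames_to_features(frames, reference):
    """Align every frame and flatten it to a row of length n_atoms * 3."""
    aligned = np.stack([kabsch_align(f, reference) for f in frames])
    return aligned.reshape(len(frames), -1)
```

Removing overall rotation and translation first matters because TICA would otherwise pick up rigid-body tumbling as a slow mode.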

For this demo, we use pre-generated data:

In [None]:
# Load demo CG trajectory (already preprocessed)
cg_features = np.load('../data/demo_cg_traj.npy')

print("CG trajectory loaded:")
print(f"  Shape: {cg_features.shape}")
print(f"  Frames: {cg_features.shape[0]}")
print(f"  Features: {cg_features.shape[1]}")
print("  (Features = n_backbone_atoms * 3 for xyz coordinates)")

## Step 2: Parameter Suggestion (Optional)

BEAM can suggest TICA parameters (lag time and number of dimensions).

**Current status:** Returns reasonable defaults  
**Future (Fellowship):** Automatic selection via VAMP-2 cross-validation

In [None]:
# Get parameter suggestions
params = suggest_tica_params(cg_features, method='tica')

print("\nSuggested TICA parameters:")
print(f"  Lag time: {params['lagtime']} frames")
print(f"  Dimensions: {params['dim']}")
print(f"\nNote: {params['note']}")

print("\nFuture features to be implemented:")
for feature in params['future_features']:
    print(f"  • {feature}")

## Step 3: Train TICA Model

Train TICA to learn slow collective variables from CG data.
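
How `train_cg_tica` is implemented is not shown here. To make the idea concrete, the following is a minimal NumPy sketch of textbook TICA: estimate the instantaneous and time-lagged covariance matrices, then solve the generalized eigenproblem by whitening. The name `tica_sketch` is hypothetical, and BEAM's estimator may differ in its details (for example in regularization or scaling).

```python
import numpy as np

def tica_sketch(features, lagtime, dim):
    """Return (eigenvalues, eigenvectors, projection) for a minimal TICA."""
    x = features - features.mean(axis=0)
    x0, xt = x[:-lagtime], x[lagtime:]
    n = len(x0)
    # Symmetrized instantaneous and time-lagged covariance estimates
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)
    ct = (x0.T @ xt + xt.T @ x0) / (2 * n)
    # Whiten with C0, dropping near-singular directions
    s, u = np.linalg.eigh(c0)
    keep = s > 1e-10 * s.max()
    whiten = u[:, keep] / np.sqrt(s[keep])
    # Eigen-decompose the whitened lagged covariance;
    # the largest eigenvalue corresponds to the slowest mode
    evals, evecs = np.linalg.eigh(whiten.T @ ct @ whiten)
    order = np.argsort(evals)[::-1][:dim]
    eigenvalues = evals[order]
    eigenvectors = whiten @ evecs[:, order]
    return eigenvalues, eigenvectors, x @ eigenvectors
```

Each eigenvalue is the autocorrelation of its component at the chosen lag time, so values close to 1 indicate slow motions.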

In [None]:
# Train TICA with the suggested parameters
lagtime = params['lagtime']
dim = params['dim']
model_path = '../data/cg_tica_model.pkl'

tica_model, cg_cv = train_cg_tica(
    cg_features,
    lagtime=lagtime,
    dim=dim,
    save_path=model_path
)

print(f"\nCG CV shape: {cg_cv.shape}")
print(f"TICA model saved to: {model_path}")

## Step 4: Visualize Results

### 4.1 TICA Projection

In [None]:
plot_tica_projection(
    cg_cv,
    title="CG Trajectory in TICA Space",
    xlabel="CG tIC 1",
    ylabel="CG tIC 2"
)

### 4.2 Free Energy Landscape
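
A free energy landscape of this kind is commonly estimated by histogramming the trajectory in CV space and taking the negative logarithm of the bin probabilities. The sketch below shows that calculation in units of kT. It is an assumption that `plot_free_energy_landscape` works this way, and `free_energy_2d` is a hypothetical helper, not a BEAM function.

```python
import numpy as np

def free_energy_2d(cv, bins=40):
    """Histogram the first two CVs and return F = -ln(P) in units of kT."""
    counts, xedges, yedges = np.histogram2d(cv[:, 0], cv[:, 1], bins=bins)
    prob = counts / counts.sum()
    free_energy = np.full(prob.shape, np.inf)  # unvisited bins stay at +inf
    visited = prob > 0
    free_energy[visited] = -np.log(prob[visited])
    free_energy -= free_energy[visited].min()  # most populated bin sits at zero
    return free_energy, xedges, yedges
```

Bins the trajectory never visits have no finite estimate, so regions of high free energy are only as reliable as the sampling behind them.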

In [None]:
plot_free_energy_landscape(
    cg_cv,
    bins=40,
    title="CG Free Energy Landscape",
    vmax=10
)

### 4.3 Implied Timescales
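
Implied timescales follow from the TICA eigenvalues through the standard relation `t_i = -lagtime * dt / ln|lambda_i|`. The sketch below applies that formula to an array of eigenvalues. The helper name `implied_timescales` is hypothetical, and how `plot_timescales` obtains the eigenvalues from `tica_model` is not shown in this notebook.

```python
import numpy as np

def implied_timescales(eigenvalues, lagtime, dt=1.0):
    """t_i = -lagtime * dt / ln|lambda_i| for each TICA eigenvalue."""
    lam = np.abs(np.asarray(eigenvalues, dtype=float))
    timescales = np.full(lam.shape, np.nan)
    valid = (lam > 0) & (lam < 1)  # the formula is undefined outside (0, 1)
    timescales[valid] = -lagtime * dt / np.log(lam[valid])
    return timescales
```

The result is in the same time unit as `dt`, so `dt` should be the physical time between saved frames.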

In [None]:
plot_timescales(
    tica_model,
    lagtime=lagtime,  # same lag time the model was trained with
    dt=1.0,
    n_components=5,
    title="CG TICA Implied Timescales"
)

## Step 5: Prepare for AA Enhanced Sampling

The trained TICA model is now ready to guide AA enhanced sampling.

### For REAP:

Use the `prepare_reap_interface()` function:

```python
from beam import prepare_reap_interface

prepare_reap_interface(
    aa_dcd_path='aa_trajectory.dcd',
    topology_pdb='topology.pdb',
    reference_pdb='reference.pdb',
    cg_tica_pkl_path='cg_tica_model.pkl',
    output_npy_path='aa_cv_for_reap.npy'
)
```

This will:
1. Load AA trajectory
2. Align and preprocess (same as CG)
3. Transform with CG TICA model
4. Save as .npy file ready for REAP
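
Steps 3 and 4 amount to a linear projection followed by a file write. As a rough illustration, the sketch below assumes the TICA model can be reduced to a feature mean and an eigenvector matrix. Those two inputs and the name `project_and_save` are assumptions made for this example, and the actual `prepare_reap_interface` may handle the model differently.

```python
import numpy as np

def project_and_save(aa_features, mean, eigenvectors, output_npy_path):
    """Apply a linear CV model to AA features and write the result to .npy."""
    if aa_features.shape[1] != eigenvectors.shape[0]:
        raise ValueError(
            "AA features must match the feature count used for CG training"
        )
    # Center with the training mean, then project onto the learned CVs
    aa_cv = (aa_features - mean) @ eigenvectors
    np.save(output_npy_path, aa_cv)
    return aa_cv
```

The shape check reflects why the AA trajectory has to be preprocessed the same way as the CG one: the model only accepts the feature layout it was trained on.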

### For other methods (REUS, Metadynamics, etc.):

Extract CV weights from the model:

```python
# Get first CV (slowest mode)
cv1_weights = tica_model.eigenvectors[:, 0]

# Generate colvars configuration for NAMD/GROMACS
# (to be implemented in future versions)
```
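
Before writing a colvars configuration, it can help to see which backbone atoms dominate a CV. The sketch below collapses a weight vector of length `n_backbone_atoms * 3` into one magnitude per atom. It assumes the features are ordered atom by atom (x, y, z for atom 0, then atom 1, and so on), which this notebook does not confirm, and `per_atom_weights` is a hypothetical helper.

```python
import numpy as np

def per_atom_weights(cv_weights):
    """Collapse a length n_atoms * 3 weight vector to one magnitude per atom."""
    w = np.asarray(cv_weights, dtype=float)
    if w.size % 3 != 0:
        raise ValueError("expected n_atoms * 3 weights")
    # Assumes features are ordered atom by atom: x0, y0, z0, x1, y1, z1, ...
    return np.linalg.norm(w.reshape(-1, 3), axis=1)
```

For example, `np.argsort(per_atom_weights(cv1_weights))[::-1][:5]` would list the five atoms with the largest contribution to the first CV.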

## Summary

We have completed Stage 2 of the BEAM pipeline:

✅ Loaded and preprocessed CG trajectory  
✅ Trained TICA to learn slow collective variables  
✅ Visualized CV space and free energy landscape  
✅ Saved TICA model for AA enhanced sampling  

### Next Steps:

1. **Run AA enhanced sampling** using the learned CVs (e.g., REAP, REUS)
2. **Proceed to Stage 3**: Analyze AA trajectories (see `demo_aa_analysis.ipynb`)
3. **Compare CG vs AA**: Visualize and quantify differences

### Files Generated:

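- `cg_tica_model.pkl` - Trained TICA model (saved to `../data/`)
- `cg_cv_trajectory.npy` - CG trajectory in CV space (optional; this notebook does not write it, so save `cg_cv` with `np.save` if you need it)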

These files are used in Stage 3 analysis!