# RMSX Demonstration

This notebook provides an end-to-end example of how to use **RMSX** for analyzing
molecular dynamics (MD) simulations. We'll show how to:

1. Set up the environment and install dependencies
2. Prepare input files (structure and trajectory)
3. Run RMSX on a single chain
4. Run RMSX on multiple chains simultaneously
5. Generate FlipBook snapshots to visualize your protein movements
6. Display and interpret the results

> **Note:** Adjust file paths to match your local environment.

## 1. Environment Setup

**Please ensure you have installed R and ChimeraX before proceeding.**

We'll use R to plot the heatmaps and the flanking RMSD and RMSF plots. Apple Silicon (M1) Macs can run Intel builds of R, but that won't suffice here: if the following does not compile, make sure you are using the R build that matches your machine's architecture.

Once both are installed, run each of the following cells in order to generate RMSX and the associated plots.
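Because an Intel build of R on an Apple Silicon Mac is the usual culprit, a quick architecture comparison can save some debugging. The sketch below is not part of RMSX: `r_arch` and `arch_matches` are hypothetical helpers, and it assumes `Rscript` is on your PATH or that the `RSCRIPT` environment variable points to it.

```python
import os
import platform
import subprocess


def r_arch(rscript=None):
    """Return R's architecture (e.g. 'aarch64' or 'x86_64'), or None if Rscript can't be run."""
    rscript = rscript or os.environ.get("RSCRIPT", "Rscript")
    try:
        res = subprocess.run([rscript, "-e", "cat(R.version$arch)"],
                             capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return res.stdout.strip() or None


def arch_matches(python_arch, r_arch_name):
    """Compare architecture names, treating arm64/aarch64 and AMD64/x86_64 as equivalent."""
    aliases = {"arm64": "aarch64", "amd64": "x86_64"}

    def norm(name):
        return aliases.get(name.lower(), name.lower())

    return norm(python_arch) == norm(r_arch_name)


py_arch = platform.machine()
r = r_arch()
if r is None:
    print("Could not run Rscript; install R or set RSCRIPT first.")
elif arch_matches(py_arch, r):
    print(f"OK: Python ({py_arch}) and R ({r}) architectures match.")
else:
    print(f"Mismatch: Python is {py_arch} but R is {r}; install the R build for your machine.")
```

A mismatch can also mean Python itself is running under Rosetta, so check both installs.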


Quick Start:

In [None]:
# Optional sanity check (uncomment to run): shows which Python and rmsx install are active
# import sys, pathlib
# import rmsx, rmsx.rmsx as core
#
# print("python:", sys.executable)
# print("rmsx module:", rmsx.__file__)
# print("has run_rmsx?", hasattr(core, "run_rmsx"))

In [None]:
# One-click setup
import sys, subprocess, pathlib, os

REPO_URL = "https://github.com/AntunesLab/rmsx.git"
REPO_DIR = pathlib.Path.cwd() / "rmsx"       # clone next to the notebook
PKG_DIR  = REPO_DIR / "rmsx"                 # the Python package folder (has __init__.py)

# 1) Clone if missing
if not REPO_DIR.exists():
    print("Cloning RMSX…")
    subprocess.check_call(["git", "clone", REPO_URL, str(REPO_DIR)])

# 2) Install into this notebook’s environment (editable so changes show up)
subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", str(REPO_DIR)])

# 3) Optional: check Rscript (Windows users: set the full path below if needed)
RSCRIPT = os.environ.get("RSCRIPT", "Rscript")
try:
    out = subprocess.run([RSCRIPT, "-e", "cat(R.version.string)"], capture_output=True, text=True, check=False)
    print(out.stdout or out.stderr or "Rscript OK")
except FileNotFoundError:
    print("⚠️ Rscript not found. Plots will be skipped until R is installed or RSCRIPT is set.")


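*Windows users: if R isn’t on PATH, uncomment the line below, point RSCRIPT at your Rscript.exe (adjust the version number), and re-run the setup cell above to confirm R is found.*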

In [None]:
import os
#os.environ["RSCRIPT"] = r"C:\Program Files\R\R-4.4.1\bin\Rscript.exe"
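If you'd rather not hard-code the version number in the commented line above, something like the hypothetical `find_rscript` helper below could locate the executable. It is a sketch, not an RMSX feature: it assumes the default installer layout (`C:\Program Files\R\R-x.y.z\bin\Rscript.exe`, as in the example path) and picks the newest version it finds.

```python
import glob
import os
import re
import shutil


def version_key(path):
    """Turn a path containing 'R-4.4.1' into (4, 4, 1) so versions sort numerically."""
    m = re.search(r"R-(\d+(?:\.\d+)*)", path)
    return tuple(int(part) for part in m.group(1).split(".")) if m else ()


def find_rscript(search_root=r"C:\Program Files\R"):
    """Return a usable Rscript path, or None if none is found."""
    env = os.environ.get("RSCRIPT")
    if env and os.path.isfile(env):          # 1) respect an explicit RSCRIPT setting
        return env
    on_path = shutil.which("Rscript")         # 2) anything already on PATH
    if on_path:
        return on_path
    # 3) fall back to the default Windows install folder, newest version first
    candidates = glob.glob(os.path.join(search_root, "R-*", "bin", "Rscript.exe"))
    return max(candidates, key=version_key) if candidates else None


found = find_rscript()
if found:
    os.environ["RSCRIPT"] = found
print("Rscript:", found)
```

Sorting by a numeric tuple rather than by string keeps `R-4.10.0` ahead of `R-4.9.1`.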

## 2. Loading Test Files

Please *don't* add your own files here; see the next section.

In [None]:
# !python -c "import rmsx, sys; print('rmsx file:', rmsx.__file__); print('sys.path[0]:', sys.path[0])"


In [None]:
from pathlib import Path
import os
import rmsx as core
from rmsx import run_rmsx, run_rmsx_flipbook, all_chain_rmsx, run_shift_flipbook

pkg_dir = Path(core.__file__).resolve().parent
# Look for test_files in common spots
candidates = [
    pkg_dir.parent / "test_files",
    pkg_dir.parent.parent / "test_files",
    Path.cwd() / "test_files",
    Path.cwd() / "rmsx" / "test_files",
]
test_dir = next((p for p in candidates if p.exists()), None)
if not test_dir:
    raise FileNotFoundError("Couldn't find test_files. Make sure you cloned the repo.")

pdb_file   = (test_dir / "1UBQ.pdb").as_posix()
dcd_file   = (test_dir / "mon_sys.dcd").as_posix()
output_dir = (test_dir / "example_uqb").as_posix()


pdb_file_multi   = (test_dir / "Protease/protease_backbone.pdb").as_posix()
traj_file_multi  = (test_dir / "Protease/short_protease_backbone.dcd").as_posix()
output_dir_multi = (test_dir / "multi_protease").as_posix()




## 3. Single-Chain RMSX Analysis

The `run_rmsx` function computes RMSX for a specified chain (e.g., chain A or a single-chain
protein) and automatically generates:

* A **heatmap** of RMSX values vs. residue and time slice
* An **RMSD** plot for overall structural changes
* An **RMSF** plot over the entire simulation

We'll also set `palette="mako"`; any other color scheme supported by R's `viridis` package (e.g., `"viridis"`, `"magma"`, or `"turbo"`) can be used instead.

If the analysis ran successfully, RMSX will produce plots in the specified output directory.
Typically, you'll see a **CSV** with the RMSX values per residue/time-slice, plus **PNG** plots of:

1. RMSX heatmap
2. RMSD time series
3. RMSF bar chart
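To make the heatmap concrete: each cell can be thought of as an RMSF value computed only over the frames in one time slice. The NumPy sketch below illustrates that idea on toy, already-aligned coordinates with one atom per residue. `rmsx_matrix` is a hypothetical helper, not RMSX's implementation, which handles alignment, atom selection, and trajectory reading itself.

```python
import numpy as np


def rmsx_matrix(coords, num_slices=None, slice_size=None):
    """Per-slice RMSF, shape (n_residues, n_slices), from pre-aligned coords (n_frames, n_residues, 3)."""
    if slice_size is not None:
        # Fixed-size slices; leftover frames at the end are dropped in this sketch
        n_full = coords.shape[0] // slice_size
        chunks = [coords[i * slice_size:(i + 1) * slice_size] for i in range(n_full)]
    elif num_slices is not None:
        # num_slices nearly equal chunks
        chunks = np.array_split(coords, num_slices, axis=0)
    else:
        raise ValueError("give num_slices or slice_size")
    cols = []
    for chunk in chunks:
        mean = chunk.mean(axis=0)                   # average position of each residue in this slice
        sq = ((chunk - mean) ** 2).sum(axis=2)      # squared distance from that average, per frame
        cols.append(np.sqrt(sq.mean(axis=0)))       # RMSF of each residue within the slice
    return np.stack(cols, axis=1)


# Toy data: 90 frames, 5 residues; residue 2 becomes mobile halfway through
rng = np.random.default_rng(0)
coords = rng.normal(scale=0.1, size=(90, 5, 3))
coords[45:, 2] += rng.normal(scale=1.0, size=(45, 3))
heat = rmsx_matrix(coords, num_slices=9)
print(heat.shape)         # (5, 9): residues x slices
print(heat[2].round(2))   # small in early slices, much larger in the later ones
```

Swapping `num_slices=9` for `slice_size=30` gives three 30-frame slices instead; how RMSX itself treats leftover frames may differ from this sketch.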

In [None]:
# Feel free to explore your own files here, or use the test files provided in the repo.

# pdb_file = "example/path/to/your.pdb"
# dcd_file = "example/path/to/your.dcd"  # or any other trajectory format
# output_dir = "example/path/to/your/output_dir"  # where to save results

# Make sure you update chain_sele below as well, or set it to None to see your options.

run_rmsx(
    topology_file=pdb_file,        # PDB or topology file
    trajectory_file=dcd_file,      # Trajectory file
    output_dir=output_dir,         # Location for RMSX outputs
    num_slices=9,                  # Divide trajectory into 9 slices
    slice_size=None,               # (Alternatively, specify slice_size in frames)
    rscript_executable=os.environ.get("RSCRIPT", "Rscript"),  # Rscript path (see RSCRIPT above)
    verbose=False,                 # Set True for detailed logs
    interpolate=False,             # Disable heatmap interpolation
    triple=True,                   # Generate RMSX, RMSD, and RMSF plots
    overwrite=True,                # Overwrite existing folder
    palette="mako",                # Color palette
    chain_sele="7",                # Target chain ID
    start_frame=0,                 # First frame to analyze
    end_frame=None                 # Last frame (None = all frames)
)
print("Done. Outputs in:", output_dir)


## 4. Multi-Chain RMSX Analysis

For a protein with multiple chains (e.g., an HIV protease dimer), you can use
`all_chain_rmsx` or `run_rmsx_flipbook` to compute RMSX for each chain separately.
RMSX can also map the RMSX values back onto a *combined* PDB file for a
multi-chain FlipBook.

If you only want an RMSX plot for one of your chains, use `run_rmsx()`.

`all_chain_rmsx()` works regardless of the number of chains.


In [None]:

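# Works for any number of chains; swap in pdb_file_multi / traj_file_multi / output_dir_multi
# to run it on the protease dimer instead of the single-chain test system.
all_chain_rmsx(
    topology_file=pdb_file,
    trajectory_file=dcd_file,
    output_dir=output_dir,
    num_slices=12,
    slice_size=None,
    rscript_executable=os.environ.get("RSCRIPT", "Rscript"),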
    verbose=False,
    interpolate=False,
    triple=True,
    overwrite=True,
    palette="turbo",
    start_frame=0,
    end_frame=None,
    sync_color_scale=True   # Synchronize color bar across chains
)

After completion, each chain will have its own RMSX CSV and plots in subfolders of
the `output_dir` you passed. An additional “combined” folder usually appears, containing
snapshots that unify all chains into a single PDB per time slice.
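The `sync_color_scale=True` option in the call above puts every chain's heatmap on the same color bar. The idea fits in a few lines; `color_limits` is a hypothetical helper with made-up chain data, not how RMSX implements the option.

```python
import numpy as np


def color_limits(per_chain, synced=True):
    """Return {chain: (vmin, vmax)} color limits for each chain's RMSX matrix."""
    if synced:
        # One shared range: the same color means the same RMSX value in every chain
        lo = min(float(m.min()) for m in per_chain.values())
        hi = max(float(m.max()) for m in per_chain.values())
        return {chain: (lo, hi) for chain in per_chain}
    # Independent ranges: each heatmap spans its full palette, but colors aren't comparable
    return {chain: (float(m.min()), float(m.max())) for chain, m in per_chain.items()}


chains = {"A": np.array([[0.5, 1.0], [0.8, 2.0]]),
          "B": np.array([[0.4, 3.5], [0.6, 0.9]])}
print(color_limits(chains))                # {'A': (0.4, 3.5), 'B': (0.4, 3.5)}
print(color_limits(chains, synced=False))  # {'A': (0.5, 2.0), 'B': (0.4, 3.5)}
```

With synced limits, a chain that stays fairly rigid keeps "cool" colors instead of being stretched across the whole palette.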

## 5. Generating a FlipBook

**FlipBook** maps the RMSX values (stored in the B-factor column) back onto 3D structures and
arranges snapshots side-by-side. This lets you visually inspect how each region expands,
contracts, or shifts over time—much like flipping through an animation.

You can generate a FlipBook either by:

* Directly calling `run_rmsx_flipbook` (which performs RMSX **and** flipbook generation in one step), or
* Running `run_rmsx` first, then using `run_flipbook` to create the images from existing RMSX data.
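Storing values in the B-factor column is what lets any molecular viewer color the structure by RMSX. As a rough illustration (not RMSX's own writer), the hypothetical `set_bfactors` helper below overwrites the temperature-factor field, columns 61-66 of fixed-width PDB `ATOM`/`HETATM` records, with a per-residue value.

```python
def set_bfactors(pdb_lines, values, chain=None):
    """Return PDB lines with the B-factor field (columns 61-66) set from values[resid].

    values: dict mapping residue number -> value (e.g. one RMSX slice); values must fit '%6.2f'.
    chain:  optional chain ID; if given, only records from that chain are changed.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            stripped = line.rstrip("\r\n")
            ending = line[len(stripped):]              # keep the original line ending
            body = stripped.ljust(66)                  # pad records that stop before the B-factor
            resid = int(body[22:26])                   # residue number, columns 23-26
            if (chain is None or body[21] == chain) and resid in values:
                body = body[:60] + f"{values[resid]:6.2f}" + body[66:]
                line = body + ending
        out.append(line)
    return out


# A single fixed-width ATOM record for residue 1 of chain A (original B-factor 9.67)
atom = ("ATOM  " + f"{1:5d}" + "  N   MET A" + f"{1:4d}" + "    "
        + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}" + " " * 10 + " N\n")
print(set_bfactors([atom], {1: 1.5})[0][60:66])   # "  1.50"
```

For a real file you would pass `open(path).readlines()` and one slice's values keyed by residue number; insertion codes and hybrid-36 residue numbers are not handled in this sketch.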

In [None]:
# Show the multi-chain topology path used by the cells below
pdb_file_multi


In [None]:
# You can define your own files here:
# pdb_file_multi = "path/to/your/multi_chain.pdb"
# traj_file_multi = "path/to/your/multi_chain_trajectory.dcd"
# output_dir_multi = "path/to/your/output_dir"

run_rmsx_flipbook(
    topology_file=pdb_file_multi,
    trajectory_file=traj_file_multi,
    output_dir=output_dir_multi,
    num_slices=9,
    slice_size=None,
    rscript_executable=os.environ.get("RSCRIPT", "Rscript"),
    verbose=False,
    interpolate=False,
    triple=True,
    overwrite=True,
    palette="turbo",
    spacingFactor="0.8",   # Adjust space between models in the flipbook
    start_frame=0,
    end_frame=None
)

Below you can try out a trajectory map implementation; please credit the original method's creators:

Matej Kožić, Branimir Bertoša, Trajectory maps: molecular dynamics visualization and analysis, NAR Genomics and Bioinformatics, Volume 6, Issue 1, March 2024, lqad114, https://doi.org/10.1093/nargab/lqad114
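Roughly, a trajectory map is a residue-by-time heatmap of how far each residue has shifted from its position in a reference (first) frame. The NumPy sketch below computes that quantity for pre-aligned toy coordinates with one representative point per residue. `shift_map` is hypothetical and is not taken from `run_shift_flipbook`; see the paper for the exact definition.

```python
import numpy as np


def shift_map(coords, ref_frame=0):
    """Distance of each residue from its position in ref_frame.

    coords: (n_frames, n_residues, 3), already superposed on the reference.
    Returns (n_residues, n_frames): rows are residues, columns are frames.
    """
    disp = coords - coords[ref_frame]          # broadcast the reference frame over all frames
    return np.linalg.norm(disp, axis=2).T


rng = np.random.default_rng(1)
coords = np.cumsum(rng.normal(scale=0.05, size=(50, 4, 3)), axis=0)  # slow random drift
tmap = shift_map(coords)
print(tmap.shape)   # (4, 50)
print(tmap[:, 0])   # [0. 0. 0. 0.]: no shift at the reference frame
```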

In [None]:
# Open the output folder (macOS `open`; use `explorer` on Windows or `xdg-open` on Linux)
!open "{output_dir_multi}"

In [None]:
# You can define your own files here:
# pdb_file_multi = "path/to/your/multi_chain.pdb"
# traj_file_multi = "path/to/your/multi_chain_trajectory.dcd"
# output_dir_multi = "path/to/your/output_dir"

run_shift_flipbook(
    topology_file=pdb_file_multi,
    trajectory_file=traj_file_multi,
    output_dir=output_dir_multi,
    num_slices=9,
    slice_size=None,
    rscript_executable=os.environ.get("RSCRIPT", "Rscript"),
    verbose=False,
    interpolate=False,
    triple=True,
    overwrite=True,
    palette="viridis",
    spacingFactor="0.9",   # Adjust space between models in the flipbook
    start_frame=0,
    end_frame=None
)

When the command completes, the script automatically launches ChimeraX in the background (if
installed) to generate high-resolution PNG snapshots. In ChimeraX, you can reposition the structure or adjust the rendering, then re-save the image with the `save` command (check the ChimeraX log for the exact command that was used).
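If you want to re-render outside the notebook, one option is a small ChimeraX command file. This is an assumption-laden sketch, not the script RMSX generates (the ChimeraX log shows the commands actually used). It relies on the ChimeraX commands `open`, `tile` (whose `spacingFactor` option the `spacingFactor` argument above appears to mirror), `color bfactor`, and `save`; `write_rerender_cxc` is a hypothetical helper.

```python
from pathlib import Path


def write_rerender_cxc(pdb_paths, png_path, cxc_path="rerender.cxc",
                       spacing_factor=0.8, width=2000, palette=None):
    """Write a ChimeraX command file: open slice PDBs, tile them, color by B-factor, save a PNG."""
    # Absolute paths so the file works regardless of ChimeraX's working directory
    lines = [f'open "{Path(p).resolve().as_posix()}"' for p in pdb_paths]
    lines.append(f"tile spacingFactor {spacing_factor}")   # side-by-side layout of the slices
    color_cmd = "color bfactor"                           # B-factor column holds the RMSX values
    if palette:                                           # optional ChimeraX palette name
        color_cmd += f" palette {palette}"
    lines.append(color_cmd)
    lines.append(f'save "{Path(png_path).resolve().as_posix()}" width {width} supersample 3')
    Path(cxc_path).write_text("\n".join(lines) + "\n")
    return Path(cxc_path)
```

Open the resulting file from the ChimeraX command line with `open rerender.cxc`; after adjusting the view by hand, re-run only the final `save` line.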

### 5.1 Displaying the FlipBook Image

Below is an example of how to display the resulting PNG in this notebook, though the exact
file path depends on your setup.

In [None]:
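import glob
import os
from IPython.display import Image, display

# FlipBook PNGs are expected in the "combined" subfolder; adjust if yours land elsewhere
imgs = sorted(glob.glob(os.path.join(output_dir_multi, "combined", "*.png")))
if imgs:
    display(Image(filename=imgs[-1]))  # show the last image in sorted order
else:
    print("No flipbook images found in", output_dir_multi)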

## 6. Visualizing Results and Interpretation

With `triple=True` (as in the calls above), RMSX outputs three main plots:

- **RMSX Heatmap**: Residue index vs. time slice, colored by the RMSF computed within each slice (the RMSX value).
- **RMSD Plot**: Overall deviation over time.
- **RMSF Plot**: Average fluctuations per residue over the entire trajectory.

Additionally, FlipBook produces side-by-side snapshots of the protein colored by RMSX values
at each time window.

**Key questions to ask** while interpreting RMSX:
1. Which regions (residues) show the highest flexibility overall?
2. When do these flexibility spikes occur (early, mid, or late in the simulation)?
3. Does RMSX reveal multiple distinct intervals of high fluctuation?
4. How do these fluctuations align with known functional domains or events (e.g., ligand binding,
   domain motions)?
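The first two questions can also be answered numerically. The sketch below assumes you have loaded an RMSX table into a NumPy array with residues as rows and time slices as columns; check the layout of the CSV your run produced before adapting it. `summarize_rmsx` and the toy values are hypothetical.

```python
import numpy as np


def summarize_rmsx(rmsx, residues, top_n=3):
    """Report the most flexible residues overall and the slice where each one peaks."""
    mean_per_res = rmsx.mean(axis=1)                  # overall flexibility per residue
    order = np.argsort(mean_per_res)[::-1][:top_n]    # indices of the top_n residues
    return [(residues[i], float(mean_per_res[i]), int(rmsx[i].argmax())) for i in order]


residues = [10, 11, 12, 13]
rmsx = np.array([[0.4, 0.5, 0.4],
                 [0.6, 2.5, 0.7],
                 [1.0, 1.1, 1.9],
                 [0.3, 0.3, 0.3]])
for resid, mean_val, peak in summarize_rmsx(rmsx, residues, top_n=2):
    print(f"residue {resid}: mean {mean_val:.2f}, peaks in slice {peak}")
# residue 12: mean 1.33, peaks in slice 2
# residue 11: mean 1.27, peaks in slice 1
```

Residue 11 has the single largest value but ranks second on average, which is why it helps to ask both *where* and *when*: a brief spike can be diluted in a whole-trajectory RMSF.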

## 7. (Optional) Advanced Discussion

RMSX can be used alongside other analysis methods like:

- **Dynamic Cross-Correlation Maps (DCCM)** to see if flexible regions move in a correlated way.
- **Principal Component Analysis (PCA)** to reduce dimensionality and identify principal motions.
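For reference, the DCCM entry for residues i and j is the normalized covariance of their displacements, `C_ij = <Δr_i · Δr_j> / sqrt(<|Δr_i|^2> <|Δr_j|^2>)`, where `Δr` is the deviation from the mean position and `<...>` averages over frames. A compact NumPy version, assuming pre-aligned coordinates with one atom per residue (this is not the repository's `dccm_example.py`):

```python
import numpy as np


def dccm(coords):
    """Dynamic cross-correlation matrix from pre-aligned coords of shape (n_frames, n_residues, 3)."""
    dr = coords - coords.mean(axis=0)                      # displacement from each residue's mean position
    cov = np.einsum("fik,fjk->ij", dr, dr) / len(coords)   # <dr_i . dr_j> averaged over frames
    scale = np.sqrt(np.diag(cov))                          # sqrt(<|dr_i|^2>) for each residue
    return cov / np.outer(scale, scale)                    # normalized to [-1, 1]


# Toy check: residue 1 copies residue 0, residue 2 mirrors it, residue 3 moves independently
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 1, 3))
coords = np.concatenate([base, base, -base, rng.normal(size=(200, 1, 3))], axis=1)
C = dccm(coords)
print(C.round(2))   # C[0, 1] is 1, C[0, 2] is -1, row 3 off-diagonals are near 0
```

A residue that never moves would make the denominator zero; production code should guard against that.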

For scripts demonstrating DCCM or PCA, see `dccm_example.py` or `pca_script.py` in this repository.

**Potential Future Directions** include integrating RMSX results with principal component trajectories,
or analyzing multiple replicate simulations to see if high-flexibility intervals are reproducible.

---
### **Conclusions**

In this notebook, we showed how to:
1. Install and configure **RMSX**  
2. Run RMSX on single- and multi-chain proteins  
3. Generate FlipBook snapshots to visualize fluctuations over time  
4. Interpret the RMSX plots to discover **when** and **where** the largest changes happen  

We hope this helps you get started with RMSX for your own MD simulations. For additional details,
please check out our [GitHub documentation](https://github.com/AntunesLab/rmsx) or raise an issue
if you have questions.

### **Acknowledgements**
We would like to thank the following researchers for their helpful comments and suggestions on this tutorial:
- Jaila Lewis - University of Houston
- Mason Kretiv - Texas A&M University
- Daniel Giraldo -