# Constrained Search tutorial

This tutorial walks through locating the 40S small ribosomal subunit with a constrained orientation search, starting from previously identified 60S large ribosomal subunits.
We step through generating the reference template maps, running the template matching program, optimizing the template, and finally running the constrained orientation search.
Input data for this tutorial as well as the intermediary result files can be found at [this Zenodo dataset 10.5281/zenodo.15368246](https://zenodo.org/records/15368246).
We will be using the micrograph with filename `xenon_131_000_0.0_DWS.mrc`, but this process can be done for a whole cohort of images.

### Tutorial Requirements

The following Python libraries are required:

* Leopard-EM v1.0 or above
* matplotlib
* TODO

In [None]:
# Run this code cell to install required packages
# !pip install leopard-em matplotlib
# TODO: test this and verify which packages are needed

In [None]:
import matplotlib.pyplot as plt
import mmdf
import mrcfile
import numpy as np

from leopard_em.pydantic_models.managers import MatchTemplateManager

## 1. Download and pre-process required data

The following cells will go through, download and pre-process all the necessary data to process in this tutorial.
This will also create a directory structure to save the micrographs, models, maps, and configuration files.

We also include a few visualizations to see what data we are working with.

In [None]:
import os

import requests


def download_zenodo_file(url: str, out_dir: str) -> str:
    """Helper function to download a file hosted on Zenodo from a URL to given dir."""
    output_filename = url.split("/")[-1]

    response = requests.get(url, stream=True)
    response.raise_for_status()  # Check for request errors

    with open(f"{out_dir}/{output_filename}", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

    return output_filename


file_downloads = [
    ("mgraphs", "https://zenodo.org/records/15368246/files/xenon_131_000_0.0_DWS.mrc"),
    ("models", "https://zenodo.org/records/15368246/files/60S_aligned.pdb"),
    ("models", "https://zenodo.org/records/15368246/files/6q8y_aligned.pdb"),
    (
        "models",
        "https://zenodo.org/records/15368246/files/6q8y_SSU_no_head_aligned.pdb",
    ),
    (
        "configs",
        "https://zenodo.org/records/15368246/files/match_template_config_crop.yaml",
    ),
]

# Loop through files list and download each
for out_dir, file_url in file_downloads:
    # Create the directory if it doesn't exist
    os.makedirs(out_dir, exist_ok=True)

    # Skip if the file already exists
    fname = file_url.split("/")[-1]
    if os.path.exists(f"{out_dir}/{fname}"):
        print(f"File {fname} already exists in {out_dir}. Skipping download.")
        continue

    # Download the file
    filename = download_zenodo_file(file_url, out_dir)
    print(f"Downloaded {filename} to {out_dir}")
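
Large downloads are occasionally truncated. As an optional sanity check, the sketch below computes an MD5 digest for a local file so it can be compared by hand against a checksum listed on the Zenodo record page, if one is shown there. The helper name `md5_of_file` is hypothetical, and no expected checksum values are given here because they must be copied from the record itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 8192) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until an empty bytes object signals end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage once the files are downloaded:
# print(md5_of_file("mgraphs/xenon_131_000_0.0_DWS.mrc"))
```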

### Plot of the micrograph

Here, we use the `mrcfile` package to read a .mrc file into a numpy array.
Then, we visualize the micrograph with `matplotlib`.

In [None]:
# Read image into numpy array called 'data'
# The context manager closes the file once the data has been copied
with mrcfile.open("mgraphs/xenon_131_000_0.0_DWS.mrc", mode="r") as mrc:
    data = mrc.data.copy()

# Plot the greyscale image
plt.figure(figsize=(10, 10))
plt.imshow(data, cmap="gray")
plt.axis("off")
plt.show()
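
Micrographs can contain a few extreme pixel values that wash out the contrast of a default `imshow` rendering. A minimal sketch of one common remedy, clipping the display range to chosen percentiles, is shown below on synthetic data. The helper name `display_limits` and the 1st/99th percentile choice are illustrative assumptions and are not part of Leopard-EM.

```python
import numpy as np


def display_limits(image, low=1.0, high=99.0):
    """Return (vmin, vmax) at the given percentiles of the pixel values."""
    vmin, vmax = np.percentile(image, [low, high])
    return float(vmin), float(vmax)


# Synthetic stand-in for a micrograph: Gaussian noise plus one hot pixel
rng = np.random.default_rng(0)
fake_image = rng.normal(loc=0.0, scale=1.0, size=(256, 256))
fake_image[0, 0] = 1000.0

vmin, vmax = display_limits(fake_image)
print(vmin, vmax)

# With the real micrograph, these limits could be passed to matplotlib:
# plt.imshow(data, cmap="gray", vmin=vmin, vmax=vmax)
```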

### Pre-process PDB files

We've downloaded a PDB model of the 80S ribosome in the non-rotated state (`6q8y_aligned.pdb`).
We also have two PDB files which correspond to the 40S (`6q8y_SSU_no_head_aligned.pdb`) and 60S (`60S_aligned.pdb`) ribosomal subunits; these were generated externally from the full non-rotated ribosome model using the ChimeraX program.
For the 40S subunit, the head domain has also been removed to leave only the body.
Both the 40S and 60S models have been pre-aligned with respect to the 80S model using the matchmaker function in ChimeraX, so the relative positions and orientations of the models match.

#### Center PDB models

The reference PDB file (the 60S model, passed as `pdb_ref` in the call below) is shifted such that its average atomic position is located at $(0, 0, 0)$.
The same shift is applied to the 80S and 40S models so they remain aligned throughout, and all transformed PDB files are written back to disk with `_aligned_zero` appended to the filename (for example, `60S_aligned_aligned_zero.pdb`).

In [None]:
def center_pdb_files(pdb_ref: str, pdb_A: str, pdb_B: str) -> None:
    """Transform reference PDB file to average atomic position of (0, 0, 0).

    The same transformation is applied to the other PDB files, A and B, and all files
    are saved with a new '_aligned_zero' suffix.

    Parameters
    ----------
    pdb_ref : str
        Path to reference PDB file to center.
    pdb_A : str
        Additional PDB file to also transform based on reference centering.
    pdb_B : str
        Additional PDB file to also transform based on reference centering.
    """
    # Load PDB models into DataFrame objects
    df_ref = mmdf.read(pdb_ref)
    df_A = mmdf.read(pdb_A)
    df_B = mmdf.read(pdb_B)

    # Extract atom coordinates from reference PDB. Shape of (n_atoms, 3)
    coords = df_ref[["x", "y", "z"]].values
    center = np.mean(coords, axis=0)

    print(f"Center of reference PDB: {center}")

    # Now apply the centering transformation to PDB files
    shift_vector = -center
    df_ref[["x", "y", "z"]] += shift_vector
    df_A[["x", "y", "z"]] += shift_vector
    df_B[["x", "y", "z"]] += shift_vector

    # Save the transformed PDB files with a new name
    mmdf.write(pdb_ref.replace(".pdb", "_aligned_zero.pdb"), df_ref)
    mmdf.write(pdb_A.replace(".pdb", "_aligned_zero.pdb"), df_A)
    mmdf.write(pdb_B.replace(".pdb", "_aligned_zero.pdb"), df_B)


# Center the PDB files
center_pdb_files(
    pdb_ref="models/60S_aligned.pdb",
    pdb_A="models/6q8y_aligned.pdb",
    pdb_B="models/6q8y_SSU_no_head_aligned.pdb",
)
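
The centering step is a single rigid translation, so the offset between models is unchanged. The sketch below illustrates this with small made-up coordinate arrays standing in for the `x`, `y`, `z` DataFrame columns; all coordinates are hypothetical.

```python
import numpy as np

# Hypothetical coordinates (in Angstroms) for a reference model and a second model
ref_coords = np.array([[10.0, 0.0, 0.0], [14.0, 2.0, 0.0], [12.0, 4.0, 6.0]])
other_coords = np.array([[20.0, 1.0, 1.0], [22.0, 3.0, 5.0]])

# Shift that moves the mean reference position to the origin
shift_vector = -ref_coords.mean(axis=0)

ref_centered = ref_coords + shift_vector
other_shifted = other_coords + shift_vector

# The reference model is now centered at (0, 0, 0)
print(ref_centered.mean(axis=0))

# The offset between the two models is unchanged by the shared shift
offset_before = other_coords.mean(axis=0) - ref_coords.mean(axis=0)
offset_after = other_shifted.mean(axis=0) - ref_centered.mean(axis=0)
print(np.allclose(offset_before, offset_after))
```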

## 2. Initial match template with 60S model

Since we want to constrain the search space for a 40S small subunit (SSU) using the 60S large subunit (LSU), we first need to run a full-orientation match template search with the LSU model.
We will go through the steps of configuring and running the match template program in Python.
Further details about the match template program are available [in the Leopard-EM documentation](TODO-link).

### Generating 3D maps from models

The template matching program requires simulated 3D maps to generate projections from, and below we use the [ttsim3d](https://github.com/teamtomo/ttsim3d) Python package to generate these maps.
For a different dataset or structure, these simulation settings will likely need to be changed.

In [None]:
# Making a directory to save 3D map files
os.makedirs("maps", exist_ok=True)

In [None]:
from ttsim3d.models import Simulator, SimulatorConfig

# Instantiate the configuration object
sim_conf = SimulatorConfig(
    voltage=300.0,  # in kV
    apply_dose_weighting=True,
    dose_start=0.0,  # in e-/A^2
    dose_end=50.0,  # in e-/A^2
    dose_filter_modify_signal="rel_diff",
    upsampling=-1,  # auto
    mtf_reference="falcon4EC_300kv",
)

# Instantiate the simulator
sim = Simulator(
    pdb_filepath="models/60S_aligned_aligned_zero.pdb",
    pixel_spacing=0.95,  # Angstroms
    volume_shape=(512, 512, 512),
    center_atoms=False,
    remove_hydrogens=True,
    b_factor_scaling=0.5,  # Multiply model b-factors by 1/2
    additional_b_factor=0,
    simulator_config=sim_conf,
)

# Run the simulation and write the output to a file
# We will read this file into memory later
mrc_filepath = "maps/60S_map_px0.95_bscale0.5.mrc"
sim.export_to_mrc(mrc_filepath)
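
When adapting these settings to another structure, one quick check is that the simulated box is comfortably larger than the particle. The sketch below works through that arithmetic for the pixel spacing and volume shape used above. The 300 Å particle extent is a hypothetical placeholder, not a measured value for the 60S subunit.

```python
pixel_spacing = 0.95  # Angstroms per voxel, as in the Simulator call above
volume_shape = (512, 512, 512)  # voxels, as in the Simulator call above

# Physical edge length of the cubic volume
box_edge_angstrom = volume_shape[0] * pixel_spacing
print(f"Box edge: {box_edge_angstrom:.1f} A")

# Hypothetical maximum particle extent; replace with a value for your model
particle_extent_angstrom = 300.0
padding_fraction = 1.0 - particle_extent_angstrom / box_edge_angstrom
print(f"Fraction of the box edge left as padding: {padding_fraction:.2f}")
```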

### Initial test template matching run

Below, we run the match template program on a small image patch, which gives us a few peaks to use for optimizing the template simulation before proceeding.
First, we crop out a central 1k by 1k patch from our 4k by 4k image and save it as a new MRC file.

In [None]:
# Re-read the micrograph; the context manager closes the file after copying
with mrcfile.open("mgraphs/xenon_131_000_0.0_DWS.mrc", mode="r") as mrc:
    data = mrc.data.copy()

# Crop out a central (1024, 1024) region of the image
data_cropped = data[
    data.shape[0] // 2 - 512 : data.shape[0] // 2 + 512,
    data.shape[1] // 2 - 512 : data.shape[1] // 2 + 512,
]

# Save the cropped image to a new MRC file
# NOTE: This is not updating any of the header information
output_filename = "mgraphs/xenon_131_000_0.0_DWS_cropped_4.mrc"
with mrcfile.new(output_filename, overwrite=True) as mrc:
    mrc.set_data(data_cropped)
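
The same slicing generalizes to any patch size. Below is a small sketch of a reusable helper, exercised on a dummy array; the name `center_crop` is hypothetical and is not a Leopard-EM function.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Return a central (size, size) patch of a 2D array."""
    height, width = image.shape
    if size > height or size > width:
        raise ValueError("Crop size exceeds image dimensions")
    top = height // 2 - size // 2
    left = width // 2 - size // 2
    return image[top : top + size, left : left + size]


# Dummy 4k-by-4k array standing in for the micrograph
dummy = np.arange(4096 * 4096, dtype=np.float32).reshape(4096, 4096)
patch = center_crop(dummy, 1024)
print(patch.shape)
```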

Below, we set up and run a full-orientation match template search based on the downloaded [configuration file](./configs/match_template_config_crop.yaml).
The programs section of the Leopard-EM documentation contains detailed explanations for each of these fields, so we proceed directly to running match template.

**Note: This config assumes you have 4 GPUs on your system! You may need to change the `gpu_ids` field depending on your system!**
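
One way to adapt the configuration is to load the YAML into a dictionary, overwrite every `gpu_ids` entry, and write it back. The sketch below shows only the dictionary step on a made-up config, because the layout of the downloaded YAML file is not reproduced here. The nested keys in `example_config` and the helper name `set_gpu_ids` are hypothetical.

```python
def set_gpu_ids(config, gpu_ids):
    """Recursively replace the value of every 'gpu_ids' key in a nested config."""
    if isinstance(config, dict):
        return {
            key: list(gpu_ids) if key == "gpu_ids" else set_gpu_ids(value, gpu_ids)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [set_gpu_ids(item, gpu_ids) for item in config]
    return config


# Hypothetical config fragment; the real file may nest 'gpu_ids' differently
example_config = {"computational_config": {"gpu_ids": [0, 1, 2, 3], "num_cpus": 4}}

# Build a copy of the config that uses only the first GPU
single_gpu_config = set_gpu_ids(example_config, [0])
print(single_gpu_config)
```

With PyYAML installed, the dictionary could come from `yaml.safe_load` and be written back with `yaml.safe_dump` before calling `MatchTemplateManager.from_yaml` on the edited file.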

In [None]:
# Make directory to save program results
os.makedirs("results", exist_ok=True)

In [None]:
YAML_CONFIG_PATH = "configs/match_template_config_crop.yaml"
ORIENTATION_BATCH_SIZE = 8  # Adjust depending on GPU memory

mt_manager = MatchTemplateManager.from_yaml(YAML_CONFIG_PATH)
mt_manager.run_match_template(ORIENTATION_BATCH_SIZE)
df = mt_manager.results_to_dataframe(locate_peaks_kwargs={"false_positives": 1.0})
df.to_csv("results/results_match_template_crop.csv")
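
The `false_positives` argument controls how strict peak picking is. A common convention in 2D template matching sets the z-score cutoff so that pure Gaussian noise would produce, on average, that many peaks over the whole search space. The sketch below illustrates that convention with the standard library only. Whether Leopard-EM computes its cutoff in exactly this way is an assumption, and the orientation and defocus counts are hypothetical.

```python
from statistics import NormalDist


def zscore_cutoff(n_pixels, n_orientations, n_defocus, false_positives=1.0):
    """Z-score above which Gaussian noise yields the expected false-positive count."""
    n_tests = n_pixels * n_orientations * n_defocus
    tail_probability = false_positives / n_tests
    # Use the lower tail and flip the sign to avoid precision loss near 1.0
    return -NormalDist().inv_cdf(tail_probability)


cutoff = zscore_cutoff(
    n_pixels=1024 * 1024,  # cropped image size used above
    n_orientations=1_500_000,  # hypothetical orientation count
    n_defocus=13,  # hypothetical number of defocus planes
)
print(f"Cutoff z-score: {cutoff:.2f}")
```

Under this convention, a larger search space or a smaller allowed false-positive count raises the cutoff.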