# __ML/MM Toolkit: Feature overview__

## __Installation__

The toolkit is installed from the [GitHub repository](https://github.com/t-0hmura/mlmm_toolkit). First install a PyTorch build that matches the CUDA version of your GPU, then install the toolkit from source. Using UMA models requires logging in to Hugging Face. The commands below are for CUDA 12.9 (the latest release at the time of writing).

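```
# create and activate a dedicated environment
conda create --name mlmm python=3.11
conda activate mlmm
# PyTorch nightly build for CUDA 12.9
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
conda install notebook ipywidgets
# install the toolkit from source
pip install git+https://github.com/t-0hmura/mlmm_toolkit.git
# log in to Hugging Face (required for UMA models)
hf auth login
```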



```python
# verify PyTorch installation
import torch
print(f'Version: {torch.__version__}. Using GPU: {torch.cuda.is_available()}, NUM_GPU: {torch.cuda.device_count()}')
```

## __Usage__
The toolkit can be used in three ways:
- via an ASE wrapper
- via a pysisyphus interface (direct API or `.yaml` files)
- via a base-level API

Additionally, specific tasks can be performed via CLI tools. 

### __Workflow__
**Precatalytic structures.** First, we must define chemically which enzymatic reaction we are modelling and choose a suitable reaction coordinate. This is crucial because we will need to analyze the trajectory to obtain precatalytic structures. To determine the reaction kinetics, we first determine the free energy barrier via:

$
\begin{equation}
\Delta G^* = \Delta G_\textnormal{precat}^* + \Delta G_\textnormal{EA}^* \equiv - RT\ln{\frac{n}{N}} - RT \ln\left\{\frac{1}{n}\sum_{i=1}^{n} \exp\left(\frac{-\Delta E_i^*}{RT}\right) \right\}\;.
\end{equation}
$

We focus only on the exponential average term (the precat term is obtained directly from MD sampling). This requires obtaining ${\Delta E_i^*}$ for each of the $n$ precatalytic structures: each structure is optimized to a minimum, from which a scan along the reaction coordinate $r \equiv r_\textnormal{CH} - r_\textnormal{OH}$ is performed. The product and TS structures are then obtained from the potential energy scan.

**Note.** In practice, however, computational limitations mean that the $n$ actually used is much lower than the total number of precatalytic structures. This is where ML/MM can come in handy.
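As a rough illustration of how the two terms combine, the sketch below evaluates the precat term $-RT\ln(n/N)$ and the exponential average in its usual form, $-RT\ln\left[\frac{1}{n}\sum_i \exp(-\Delta E_i^*/RT)\right]$. The function names, the temperature, the frame counts and the barrier values are all made up for the example, and energies are assumed to be in kcal/mol.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def precat_term(n, n_total, temperature=300.0):
    """-RT ln(n/N): cost of reaching a precatalytic frame (n of N frames)."""
    return -R_KCAL_PER_MOL_K * temperature * math.log(n / n_total)


def exponential_average(barriers, temperature=300.0):
    """-RT ln[(1/n) sum_i exp(-dE_i/RT)], using a log-sum-exp shift for stability."""
    rt = R_KCAL_PER_MOL_K * temperature
    exponents = [-e / rt for e in barriers]
    shift = max(exponents)  # the largest exponent belongs to the lowest barrier
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -rt * (shift + math.log(mean_exp))


# Hypothetical numbers, for illustration only
barriers = [14.2, 16.8, 15.1, 19.3]  # dE_i in kcal/mol
dg_ea = exponential_average(barriers)
dg_total = precat_term(n=400, n_total=10000) + dg_ea
print(f"dG_EA = {dg_ea:.2f} kcal/mol, dG = {dg_total:.2f} kcal/mol")
```

The exponential average is dominated by the lowest barriers, so it always lies between the smallest $\Delta E_i^*$ and the arithmetic mean. This is why the number of sampled structures matters.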


**Computational procedure.** The ML/MM workflow is very similar to that of QM/MM, and we will try to replicate it for later comparison. We start from a set of precatalytic structures extracted from an MD trajectory, for which we have the topology (`.prmtop`), coordinate (`.inpcrd`) and structure (`.pdb`) files. The steps are as follows:

1. **Define the ML region.** Done manually with residue numbers and atom numbers. 
    - CLI `def_ml_region`
    - Visualize the ML region in the protein/ligand complex. CLI `xyz_geom2pdb`. 
    - Preprocess the structure for visualization. CLI `add_elem_info`. 
    - Define the active region: the atoms that are active in the ML/MM simulation, surrounded by a shell of frozen atoms.


2. **Initial optimization.** Relax the MD frame. The relaxed system should become more 'reactant-like' (oriented towards the cofactor). Ensure this is the case before proceeding. L-BFGS is the optimizer used in all optimization steps.

3. **Scan.** ML/MM scan starting from the optimized pre-reactant. The scan uses harmonic constraints to drive the reaction forward, and the system is relaxed during the scan.
    - The scan provides approximations to the 'true' reactant, TS and product.

4. **Further optimizations.** After the scan, the reactant, TS and product are located with further optimizations. The TS search uses the dimer method.

5. **Result analysis.**
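To make the scan step more concrete, the sketch below shows one common way a harmonic bias on $r = r_\textnormal{CH} - r_\textnormal{OH}$ can be written, together with its analytic gradient. It is an illustration under stated assumptions, not the toolkit's implementation: the functional form $\frac{1}{2}k(r - r_0)^2$, the force constant, the target values and the three-atom toy geometry are all hypothetical.

```python
import numpy as np


def reaction_coordinate(pos, i_c, i_h, i_o):
    """r = r_CH - r_OH for Cartesian positions of shape (n_atoms, 3)."""
    return np.linalg.norm(pos[i_h] - pos[i_c]) - np.linalg.norm(pos[i_h] - pos[i_o])


def restraint(pos, i_c, i_h, i_o, r0, k):
    """Harmonic bias 0.5*k*(r - r0)**2 and its gradient with respect to pos."""
    v_ch = pos[i_h] - pos[i_c]
    v_oh = pos[i_h] - pos[i_o]
    d_ch = np.linalg.norm(v_ch)
    d_oh = np.linalg.norm(v_oh)
    u_ch = v_ch / d_ch  # unit vector C -> H
    u_oh = v_oh / d_oh  # unit vector O -> H
    r = d_ch - d_oh
    dr = np.zeros_like(pos)  # derivative of r with respect to every atom
    dr[i_c] = -u_ch
    dr[i_o] = u_oh
    dr[i_h] = u_ch - u_oh
    return 0.5 * k * (r - r0) ** 2, k * (r - r0) * dr


# Toy geometry (angstrom): C, H, O on a line; values are made up
pos = np.array([[0.0, 0.0, 0.0],   # C
                [1.1, 0.0, 0.0],   # H
                [2.7, 0.0, 0.0]])  # O
r_start = reaction_coordinate(pos, 0, 1, 2)
for r0 in np.linspace(r_start, 0.5, 5):
    energy, grad = restraint(pos, 0, 1, 2, r0=r0, k=100.0)
    # In a real scan the geometry would be re-optimized here on the
    # ML/MM energy plus the bias before moving to the next target.
    print(f"target r0 = {r0:+.2f}  bias = {energy:.3f}")
```

In a relaxed scan, the bias gradient is added to the ML/MM gradient during each optimization, and the unbiased energy of each relaxed geometry gives the potential energy profile.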

### __ML region definition__

Use the CLI helper or a visualization tool to select residues close to the substrate.

**Note.** Ensure that atom order, atom names, residue IDs and residue names are identical to those in the full `complex.pdb`.
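This requirement can be checked before running anything. The sketch below is a hypothetical helper, not part of the toolkit: it reads the standard fixed-width PDB columns and tests whether every atom of the region file appears in the complex with the same atom name, residue name, chain ID and residue number, and in the same relative order. The `pdb_line` function only builds toy records for the demonstration.

```python
def atom_keys(pdb_text):
    """(atom name, residue name, chain ID, residue number) per ATOM/HETATM record."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            keys.append((line[12:16].strip(), line[17:20].strip(),
                         line[21], int(line[22:26])))
    return keys


def is_consistent_subset(region_text, complex_text):
    """True if the region atoms occur in the complex with identical labels and order."""
    remaining = iter(atom_keys(complex_text))
    # 'key in remaining' advances the iterator, so relative order is enforced too
    return all(key in remaining for key in atom_keys(region_text))


def pdb_line(serial, name, resname, chain, resid, x, y, z):
    """Build a toy fixed-width ATOM record (demo helper only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resid:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


complex_pdb = "\n".join([
    pdb_line(1, "N", "HIS", "A", 57, 1.0, 0.0, 0.0),
    pdb_line(2, "NE2", "HIS", "A", 57, 2.0, 0.0, 0.0),
    pdb_line(3, "OG", "SER", "A", 102, 8.0, 0.0, 0.0),
])
region_ok = "\n".join(complex_pdb.splitlines()[1:])
region_renamed = region_ok.replace("HIS", "HID")

print(is_consistent_subset(region_ok, complex_pdb))
print(is_consistent_subset(region_renamed, complex_pdb))
```

In practice the two strings would be read from `ml_region.pdb` and `complex.pdb`.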

- CLI helper: `def_ml_region --real complex.pdb --center ligand.pdb --out ml_region.pdb --radius_ml 2.6 --radius_het 3.6 --include_H2O false --exclude_backbone true`
    - `-r` / `--real complex.pdb`: input
    - `-c` / `--center ligand.pdb`: "center" of the selection
    - `-o` / `--out ml_region.pdb`: output
    - Other options shown in the command: `--radius_ml`, `--radius_het`, `--include_H2O`, `--exclude_backbone`
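For intuition about what a distance-based selection around a center structure does, here is a minimal sketch. It is not the implementation of `def_ml_region`, whose exact criteria are not described here: the function names and the single `radius` parameter are hypothetical, and the rule used (keep every residue with at least one atom within `radius` of any center atom) is an assumption. Selected lines are copied verbatim and in their original order, so atom names and residue IDs stay identical to the input.

```python
import math


def parse_atoms(pdb_text):
    """Return (residue key, xyz, original line) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            res_key = (line[17:20].strip(), line[21], int(line[22:26]))
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_key, xyz, line))
    return atoms


def select_region(complex_text, center_text, radius):
    """Keep whole residues that have any atom within `radius` of a center atom."""
    center_xyz = [xyz for _, xyz, _ in parse_atoms(center_text)]
    atoms = parse_atoms(complex_text)
    near = {res for res, xyz, _ in atoms
            if any(math.dist(xyz, c) <= radius for c in center_xyz)}
    return [line for res, _, line in atoms if res in near]


def pdb_line(serial, name, resname, chain, resid, x, y, z):
    """Build a toy fixed-width ATOM record (demo helper only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resid:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Toy complex: one residue close to the ligand, one far away, plus the ligand
complex_pdb = "\n".join([
    pdb_line(1, "NE2", "HIS", "A", 57, 2.0, 0.0, 0.0),
    pdb_line(2, "CA", "HIS", "A", 57, 5.0, 0.0, 0.0),
    pdb_line(3, "OG", "SER", "A", 102, 8.0, 0.0, 0.0),
    pdb_line(4, "C1", "LIG", "A", 300, 0.0, 0.0, 0.0),
])
ligand_pdb = pdb_line(1, "C1", "LIG", "A", 300, 0.0, 0.0, 0.0)

region = select_region(complex_pdb, ligand_pdb, radius=2.6)
print("\n".join(region))
```

Selecting whole residues rather than individual atoms avoids cutting a side chain at an arbitrary bond. The ligand is selected as well because it lies at zero distance from itself.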



