# Understanding MD

**Welcome**

Welcome to the MD section of the EncoderMap tutorial. All EncoderMap tutorials are provided as Jupyter notebooks that you can run locally, on BinderHub, or on Google Colab.


Run this notebook on Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_MD/01_Understanding_MD.ipynb)

The EncoderMap documentation is available at:

https://ag-peter.github.io/encodermap

**Goals:**

In this tutorial you will learn:
- [What CVs are.](#primer)
- [How EncoderMap's new `SingleTraj` class loads MD data.](#singletraj)
- [How a `SingleTraj` can be associated with CVs.](#load_CVs)

### For Google colab only:

If you're on Google Colab, please uncomment these lines to install EncoderMap.

In [None]:
# !wget https://gist.githubusercontent.com/kevinsawade/deda578a3c6f26640ae905a3557e4ed1/raw/b7403a37710cb881839186da96d4d117e50abf36/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh

## Primer

The recent iteration of EncoderMap added features that help you answer the analysis questions you want to ask of your MD data.

In contrast to older versions of EncoderMap, with which you could train a machine learning model and use it for dimensionality reduction, the new EncoderMap adds functionality for:

- Data organization.
- Data validation.
- Feature engineering.
- Model serving.

These features help you work with your MD data. Let's look at them by analysing an MD dataset. First, we need to import EncoderMap.

In [None]:
import encodermap as em
%load_ext autoreload
%autoreload 2

## EncoderMap pipeline

### Download data

EncoderMap comes with some out-of-the-box datasets. These are hosted on KonDATA, a data repository curated by the University of Konstanz. You can fetch them with the `load_project` function. Here, we look at a multidomain protein consisting of two [Ubiquitin proteins](https://www.rcsb.org/structure/1UBQ) joined into one linear chain. The dataset consists of 12 trajectories of 5001 frames each.

In [None]:
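# Download the linear dimer dataset from KonDATA into a TrajEnsemble.
trajs = em.load_project("linear_dimers")
# Delete the CVs that ship with the project so that only the MD data remains.
trajs.del_CVs()
print(trajs)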

The `TrajEnsemble` is EncoderMap's new container for organizing MD data.
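To make the idea of a container more concrete, here is a toy sketch, assuming each trajectory is nothing more than a NumPy array of shape `(n_frames, n_atoms, 3)`. `ToyEnsemble`, `locate`, and `frame` are hypothetical names chosen for illustration; this is not how EncoderMap implements `TrajEnsemble`. It only shows one piece of bookkeeping such a container has to handle: mapping a global frame index onto a trajectory number and a frame within that trajectory, here for 12 trajectories of 5001 frames each (random coordinates, 5 atoms).

```python
import numpy as np


class ToyEnsemble:
    """Hypothetical stand-in for a trajectory container (not EncoderMap's TrajEnsemble).

    Each trajectory is an array of shape (n_frames, n_atoms, 3).
    """

    def __init__(self, trajectories):
        self.trajs = list(trajectories)
        # Per-trajectory frame counts, used to translate a global frame index.
        self.frame_counts = [t.shape[0] for t in self.trajs]

    @property
    def n_trajs(self):
        return len(self.trajs)

    @property
    def n_frames(self):
        return sum(self.frame_counts)

    def locate(self, global_index):
        """Map a global frame index to (trajectory number, frame number)."""
        if not 0 <= global_index < self.n_frames:
            raise IndexError(global_index)
        for traj_num, count in enumerate(self.frame_counts):
            if global_index < count:
                return traj_num, global_index
            global_index -= count

    def frame(self, global_index):
        """Return the coordinates of one frame, shape (n_atoms, 3)."""
        traj_num, frame_num = self.locate(global_index)
        return self.trajs[traj_num][frame_num]


# Toy data: 12 random "trajectories", 5001 frames each, 5 atoms.
rng = np.random.default_rng(0)
ensemble = ToyEnsemble(rng.normal(size=(5001, 5, 3)) for _ in range(12))
print(ensemble.n_trajs, ensemble.n_frames)  # 12 60012
print(ensemble.locate(5001))  # (1, 0)
```

Keeping only the per-trajectory frame counts is enough to go back and forth between a global index and `(trajectory, frame)` pairs without concatenating all coordinates into one array.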

### Featurize data

### Train

### Evaluate

In [None]:
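import mdtraj as md
import encodermap as em
from pathlib import Path

# Resolve the data directory relative to the EncoderMap repository,
# as in the SingleTraj example below, instead of a hard-coded home path.
data_dir = Path(em.__file__).parent.parent / "tests/data/linear_dimers"
traj = md.load(str(data_dir / "01.xtc"), top=str(data_dir / "01.pdb"))

# Time between two consecutive saved frames (MDTraj reports time in ps).
traj.time[1] - traj.time[0]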

In [None]:
# Time stamp of the last frame (in ps).
traj.time[-1]
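The two cells above read the spacing between saved frames and the time stamp of the last frame from MDTraj's `traj.time` array, which is given in picoseconds. Assuming frames are saved at a constant interval and the first frame sits at t = 0, the last time stamp equals (n_frames - 1) × Δt. The sketch below checks this on a synthetic time axis; the 5001 frames come from the dataset description, while the 10 ps interval is an assumed value, not one read from the data.

```python
import numpy as np

n_frames = 5001  # frames per trajectory, as stated for the linear_dimers dataset
dt_ps = 10.0     # assumed saving interval in ps (hypothetical, not read from the data)

# Synthetic time axis, analogous to MDTraj's `traj.time` (in ps).
time = np.arange(n_frames) * dt_ps

timestep = time[1] - time[0]
total_time = time[-1]
print(timestep, total_time)     # 10.0 50000.0
print(total_time / 1000, "ns")  # 50.0 ns

# With a constant interval starting at t = 0, the last time stamp is (n_frames - 1) * dt.
assert np.isclose(total_time, (n_frames - 1) * timestep)
```

For real trajectories that do not start at t = 0, `traj.time[-1] - traj.time[0]` is the safer measure of the covered time.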

In [None]:
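import encodermap as em
import numpy as np
from pathlib import Path

# Load a trajectory and its topology from EncoderMap's test data.
traj1 = em.SingleTraj(
    Path(em.__file__).parent.parent / "tests/data/1am7_corrected.xtc",
    Path(em.__file__).parent.parent / "tests/data/1am7_protein.pdb",
)
# Attach the z coordinates of all atoms, shape (n_frames, n_atoms), as a CV named 'z_coordinate'.
traj1.load_CV(traj1.xyz[..., -1], 'z_coordinate')

# Iterating yields single frames; check that each frame's CV matches its z coordinates.
for i, frame in enumerate(traj1):
    print(np.array_equal(frame.z_coordinate[0], frame.xyz[0, :, -1]))
    if i == 3:  # only check the first four frames
        break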
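The check above relies on the array shapes lining up: `traj1.xyz` has shape `(n_frames, n_atoms, 3)`, so `traj1.xyz[..., -1]` holds one z value per atom per frame, and each frame produced by iterating over the trajectory keeps a leading frame axis of length 1, which is why both sides are indexed with `[0]`. The NumPy-only sketch below reproduces that bookkeeping on random toy coordinates (10 frames and 4 atoms, chosen arbitrarily) without EncoderMap or MDTraj; it does not show how `load_CV` stores the data internally.

```python
import numpy as np

# Toy coordinates with the layout MDTraj uses for `traj.xyz`: (n_frames, n_atoms, 3).
rng = np.random.default_rng(42)
xyz = rng.normal(size=(10, 4, 3))

# `xyz[..., -1]` keeps only the last (z) component: one value per atom per frame.
z_coordinate = xyz[..., -1]
print(z_coordinate.shape)  # (10, 4)

# A per-frame CV must have the same leading (frame) dimension as the trajectory.
assert z_coordinate.shape[0] == xyz.shape[0]

# Slicing out a single frame keeps a length-1 frame axis, hence the [0] indexing.
for i in range(4):
    frame_xyz = xyz[i:i + 1]          # shape (1, 4, 3)
    frame_cv = z_coordinate[i:i + 1]  # shape (1, 4)
    print(np.array_equal(frame_cv[0], frame_xyz[0, :, -1]))  # True
```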