# Advanced Usage: Asp 7

Run this notebook on Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_starter/02_Advanced_Usage-Asp7_Example.ipynb)

Find the documentation of EncoderMap:

https://ag-peter.github.io/encodermap

**Goals:**

In this tutorial you will learn:
- [What's different when data lies in a periodic space.](#periodic_variables)
- [How to visualize and observe training progression using `Tensorboard`.](#tensorboard)
- [How to use EncoderMap's InteractivePlotting session.](#interactive_plotting)

**For Google colab only:**

If you're on Google colab, please uncomment these lines and install EncoderMap.

In [1]:
# !wget https://gist.githubusercontent.com/kevinsawade/deda578a3c6f26640ae905a3557e4ed1/raw/b7403a37710cb881839186da96d4d117e50abf36/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh

If you're on Google colab, you also want to download the data we will use in this notebook.

In [2]:
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/notebooks_starter/asp7.csv
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/notebooks_starter/asp7.pdb
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/notebooks_starter/asp7.xtc

## Primer

### Imports and load data

In this tutorial we will use example data from a molecular dynamics simulation and learn more about advanced usage of EncoderMap. Encoder map can create low-dimensional maps of the vast conformational spaces of molecules. This allows easy identification of the most common molecular conformations and helps to understand the relations between these conformations. In this example, we will use data from a simulation of a simple peptide: hepta-aspartic-acid.

First we need to import some libraries:

In [3]:
import encodermap as em
import numpy as np
import plotly.express as px
import plotly.io as pio
from math import pi
try:
    from google.colab import data_table, output
    data_table.enable_dataframe_formatter()
    output.enable_custom_widget_manager()
    renderer = "colab"
except ModuleNotFoundError:
    renderer = "plotly_mimetype+notebook"
pio.renderers.default = renderer


import os; os.environ['ENCODERMAP_ENABLE_GPU'] = 'True'

before importing encodermap.




Next, we need to load the input data. Different kinds of variables can be used to describe molecular conformations: e.g. Cartesian coordinates, distances, angles, dihedrals... In principle EncoderMap can deal with any of these inputs, however, some are better suited than others. The molecular conformation does not change when the molecule is translated or rotated. The chosen input variables should reflect that and be translationally and rotationally invariant. 

In this example we use the backbone dihedral angles phi and psi as input as they are translationally and rotationally invariant and describe the backbone of a protein/peptide very well.

The "asp7.csv" file contains one column for each dihedral and one row for each frame of the trajectory. Additionally, the last column contains a cluster_id from a gromos clustering which we can later use for comparison. We can load this data using `np.loadtxt()`:

In [4]:
csv_path = "asp7.csv"
data = np.loadtxt(csv_path, skiprows=1, delimiter=",")
dihedrals = data[:, :-1]
cluster_ids = data[:, -1]

We can view the molecular dynamics simulation right here in this jupyter notebook using the `nglview` package. This cell loads the `asp7.xtc` trajectory and `asp7.pdb` topology file and displays them as a ball and stick representation.

If you don't have access to these files, you can replace the line

```python
traj = md.load('asp7.xtc', top='asp7.pdb')
```

with

```python
traj = md.load_pdb('https://files.rcsb.org/view/1YUF.pdb')
```

to load a small molecular conformation ensemble from the protein database.

**Hint:**

Sometimes the view can be not centered. Use the 'center' button in the gui to center the structure.

In [5]:
import plotly.io as pio
pio.templates.default = "plotly_white"
traj = em.load('asp7.pdb')
em.plot.plot_ball_and_stick(traj, highlight="dihedrals")


You requested a `em.loading.features.Feature` to calculate features in a periodic box, using the minimum image convention, but the trajectory you provided does not have unitcell information. If this feature will later be supplied with trajectories with unitcell information, an Exception will be raised, to make sure distances/angles are calculated correctly.



In [6]:
import nglview as nv
import mdtraj as md
traj = md.load('asp7.xtc', top='asp7.pdb')
traj.center_coordinates()
view = nv.show_mdtraj(traj, gui=True)
view.clear_representations()
view.add_representation('ball+stick')
view

NGLWidget(max_frame=10000)

Tab(children=(Box(children=(Box(children=(Box(children=(Label(value='step'), IntSlider(value=1, min=-100)), la…

<a id='periodic_variables'></a>

### Periodic variables

Periodic variables pose a problem, when we implement a distance metric between two values in a periodic space. When the input space is not-periodic, the euclidean distacen between two points ($p$ and $q$) is given as:

\begin{equation}
d(p, q) = \sqrt{\left(  p-q \right)^2}
\end{equation}

This equation does not apply when p and q are in a periodic space. Take angle values as an example. Let us assume $p$ and $q$ lie in a periodic space of $(-180^\circ, 180^\circ]$ ($-180^\circ$ is not included, $180^\circ$ is included) and have the values $p=-100^\circ$ and $q=150^\circ$. Plugging that into formula, we get:

\begin{align}
d(p, q) &= \sqrt{\left(  -100-150 \right)^2}\\
&= \sqrt{\left( -250 \right)^2}\\
&=250
\end{align}

However, the distance between these two points is not $250^\circ$, but $110^\circ$.

In [7]:
import plotly.graph_objects as go
one = go.Scatterpolar(
    r=np.full((100, ), 1),
    theta=np.linspace(-100, 150, 100),
    name="250 deg distance",
    hovertemplate="250 deg distance",
)
two = go.Scatterpolar(
    r=np.full((100, ), 1),
    theta=np.linspace(0, 110, 100) - 210,
    hovertemplate="110 deg distance",
)
fig = go.Figure(
    data=[one, two],
    layout={
        "polar": {
            "radialaxis": {
                "showticklabels": False,
                "showgrid": False,
                "range": [0.5, 1.5],
            },
            "angularaxis": {
                "tickmode": "array",
                "tickvals": [0, 45, 90, 135, 180, 225, 270, 315],
                "ticktext": [0, 45, 90, 135, 180, -135, -90, -45],
            },
        },
        "showlegend": False,
    },
)
fig.show()

The distance in periodic spaces can be corrected using this formula:

\begin{equation}
d_{360}(p, q) = min\left( d(p, q), 360 - d(p, q) \right)
\end{equation}

Furthermore, during training the the angle values $\theta$ are converted into value pairs $\left( sin(\theta), cos(\theta) \right)$ to represent this.

### Parameter selection

Similarly to the previous example, we need to set some parameters. In contrast to the Cube example we now have periodic input data. The dihedral angles are in radians with a 2pi periodicity. We also set some further parameters but don't bother for now. 

In [8]:
parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/asp7")
parameters.n_steps = 100
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi
parameters.l2_reg_constant = 10.0
parameters.summary_step = 1
parameters.tensorboard = True

em.plot.distance_histogram_interactive(
    dihedrals[::10], 
    parameters.periodicity, 
    initial_guess=parameters.dist_sig_parameters,
    bins=50,
)

VBox(children=(HBox(children=(FloatSlider(value=1.0, description='lowd sigma', max=5.0, min=0.1), FloatSlider(…

Next, we can run the dimensionality reduction:

In [9]:
e_map = em.EncoderMap(parameters, dihedrals)

Output files are saved to runs/asp7/run0 as defined in 'main_path' in the parameters.


Saved a text-summary of the model and an image in runs/asp7/run0, as specified in 'main_path' in the parameters.


The new tensorflow 2 version of EncoderMap allows you to also view the output of the latent space during the training. Switch that feature on with `e_map.add_images_to_tensorboard()`.

In [10]:
e_map.add_images_to_tensorboard()

Logging images with (10000, 12)-shaped data every 1 epochs to Tensorboard at runs/asp7/run0


In [11]:
history = e_map.train()

  0%|                                                             | 0/100 [00:00<?, ?it/s]

  0%|                                        | 0/100 [00:00<?, ?it/s, Loss after step ?=?]

  0%|                                   | 0/100 [00:02<?, ?it/s, Loss after step 1=5.6e+3]

  1%|▎                          | 1/100 [00:02<04:56,  3.00s/it, Loss after step 1=5.6e+3]

  1%|▎                         | 1/100 [00:03<04:56,  3.00s/it, Loss after step 2=5.52e+3]

  2%|▌                         | 2/100 [00:03<02:11,  1.34s/it, Loss after step 2=5.52e+3]

  2%|▌                         | 2/100 [00:03<02:11,  1.34s/it, Loss after step 3=5.45e+3]

  3%|▊                         | 3/100 [00:03<01:18,  1.24it/s, Loss after step 3=5.45e+3]

  3%|▊                         | 3/100 [00:03<01:18,  1.24it/s, Loss after step 4=5.38e+3]

  4%|█                         | 4/100 [00:03<00:53,  1.80it/s, Loss after step 4=5.38e+3]

  4%|█                          | 4/100 [00:03<00:53,  1.80it/s, Loss after step 5=5.3e+3]

  5%|█▎                         | 5/100 [00:03<00:39,  2.39it/s, Loss after step 5=5.3e+3]

  5%|█▎                        | 5/100 [00:03<00:39,  2.39it/s, Loss after step 6=5.24e+3]

  6%|█▌                        | 6/100 [00:03<00:31,  2.99it/s, Loss after step 6=5.24e+3]

  6%|█▌                        | 6/100 [00:04<00:31,  2.99it/s, Loss after step 7=5.18e+3]

  7%|█▊                        | 7/100 [00:04<00:25,  3.63it/s, Loss after step 7=5.18e+3]

  7%|█▊                        | 7/100 [00:04<00:25,  3.63it/s, Loss after step 8=5.13e+3]

  8%|██                        | 8/100 [00:04<00:21,  4.20it/s, Loss after step 8=5.13e+3]

  8%|██                        | 8/100 [00:04<00:21,  4.20it/s, Loss after step 9=5.06e+3]

  9%|██▎                       | 9/100 [00:04<00:19,  4.67it/s, Loss after step 9=5.06e+3]

  9%|██▌                         | 9/100 [00:04<00:19,  4.67it/s, Loss after step 10=5e+3]

 10%|██▋                        | 10/100 [00:04<00:17,  5.11it/s, Loss after step 10=5e+3]

 10%|██▍                     | 10/100 [00:04<00:17,  5.11it/s, Loss after step 11=4.95e+3]

 11%|██▋                     | 11/100 [00:04<00:16,  5.43it/s, Loss after step 11=4.95e+3]

 11%|██▋                     | 11/100 [00:04<00:16,  5.43it/s, Loss after step 12=4.89e+3]

 12%|██▉                     | 12/100 [00:04<00:20,  4.34it/s, Loss after step 12=4.89e+3]

 12%|██▉                     | 12/100 [00:05<00:20,  4.34it/s, Loss after step 13=4.83e+3]

 13%|███                     | 13/100 [00:05<00:18,  4.74it/s, Loss after step 13=4.83e+3]

 13%|███                     | 13/100 [00:05<00:18,  4.74it/s, Loss after step 14=4.78e+3]

 14%|███▎                    | 14/100 [00:05<00:17,  4.99it/s, Loss after step 14=4.78e+3]

 14%|███▎                    | 14/100 [00:05<00:17,  4.99it/s, Loss after step 15=4.72e+3]

 15%|███▌                    | 15/100 [00:05<00:15,  5.32it/s, Loss after step 15=4.72e+3]

 15%|███▌                    | 15/100 [00:05<00:15,  5.32it/s, Loss after step 16=4.66e+3]

 16%|███▊                    | 16/100 [00:05<00:15,  5.58it/s, Loss after step 16=4.66e+3]

 16%|███▊                    | 16/100 [00:05<00:15,  5.58it/s, Loss after step 17=4.61e+3]

 17%|████                    | 17/100 [00:05<00:14,  5.86it/s, Loss after step 17=4.61e+3]

 17%|████                    | 17/100 [00:05<00:14,  5.86it/s, Loss after step 18=4.56e+3]

 18%|████▎                   | 18/100 [00:05<00:13,  6.03it/s, Loss after step 18=4.56e+3]

 18%|████▌                    | 18/100 [00:06<00:13,  6.03it/s, Loss after step 19=4.5e+3]

 19%|████▊                    | 19/100 [00:06<00:13,  6.12it/s, Loss after step 19=4.5e+3]

 19%|████▌                   | 19/100 [00:06<00:13,  6.12it/s, Loss after step 20=4.45e+3]

 20%|████▊                   | 20/100 [00:06<00:12,  6.24it/s, Loss after step 20=4.45e+3]

 20%|█████                    | 20/100 [00:06<00:12,  6.24it/s, Loss after step 21=4.4e+3]

 21%|█████▎                   | 21/100 [00:06<00:12,  6.30it/s, Loss after step 21=4.4e+3]

 21%|█████                   | 21/100 [00:06<00:12,  6.30it/s, Loss after step 22=4.35e+3]

 22%|█████▎                  | 22/100 [00:06<00:12,  6.36it/s, Loss after step 22=4.35e+3]

 22%|█████▌                   | 22/100 [00:06<00:12,  6.36it/s, Loss after step 23=4.3e+3]

 23%|█████▊                   | 23/100 [00:06<00:12,  6.37it/s, Loss after step 23=4.3e+3]

 23%|█████▌                  | 23/100 [00:06<00:12,  6.37it/s, Loss after step 24=4.25e+3]

 24%|█████▊                  | 24/100 [00:06<00:11,  6.38it/s, Loss after step 24=4.25e+3]

 24%|█████▊                  | 24/100 [00:07<00:11,  6.38it/s, Loss after step 25=4.19e+3]

 25%|██████                  | 25/100 [00:07<00:11,  6.42it/s, Loss after step 25=4.19e+3]

 25%|██████                  | 25/100 [00:07<00:11,  6.42it/s, Loss after step 26=4.15e+3]

 26%|██████▏                 | 26/100 [00:07<00:11,  6.38it/s, Loss after step 26=4.15e+3]

 26%|██████▌                  | 26/100 [00:07<00:11,  6.38it/s, Loss after step 27=4.1e+3]

 27%|██████▊                  | 27/100 [00:07<00:11,  6.36it/s, Loss after step 27=4.1e+3]

 27%|██████▍                 | 27/100 [00:07<00:11,  6.36it/s, Loss after step 28=4.05e+3]

 28%|██████▋                 | 28/100 [00:07<00:11,  6.42it/s, Loss after step 28=4.05e+3]

 28%|███████▌                   | 28/100 [00:07<00:11,  6.42it/s, Loss after step 29=4e+3]

 29%|███████▊                   | 29/100 [00:07<00:10,  6.46it/s, Loss after step 29=4e+3]

 29%|██████▉                 | 29/100 [00:07<00:10,  6.46it/s, Loss after step 30=3.96e+3]

 30%|███████▏                | 30/100 [00:07<00:10,  6.48it/s, Loss after step 30=3.96e+3]

 30%|███████▏                | 30/100 [00:07<00:10,  6.48it/s, Loss after step 31=3.91e+3]

 31%|███████▍                | 31/100 [00:07<00:10,  6.48it/s, Loss after step 31=3.91e+3]

 31%|███████▍                | 31/100 [00:08<00:10,  6.48it/s, Loss after step 32=3.87e+3]

 32%|███████▋                | 32/100 [00:08<00:10,  6.48it/s, Loss after step 32=3.87e+3]

 32%|███████▋                | 32/100 [00:08<00:10,  6.48it/s, Loss after step 33=3.82e+3]

 33%|███████▉                | 33/100 [00:08<00:10,  6.56it/s, Loss after step 33=3.82e+3]

 33%|███████▉                | 33/100 [00:08<00:10,  6.56it/s, Loss after step 34=3.78e+3]

 34%|████████▏               | 34/100 [00:08<00:10,  6.60it/s, Loss after step 34=3.78e+3]

 34%|████████▏               | 34/100 [00:08<00:10,  6.60it/s, Loss after step 35=3.74e+3]

 35%|████████▍               | 35/100 [00:08<00:10,  6.45it/s, Loss after step 35=3.74e+3]

 35%|████████▍               | 35/100 [00:08<00:10,  6.45it/s, Loss after step 36=3.69e+3]

 36%|████████▋               | 36/100 [00:08<00:10,  6.14it/s, Loss after step 36=3.69e+3]

 36%|████████▋               | 36/100 [00:08<00:10,  6.14it/s, Loss after step 37=3.65e+3]

 37%|████████▉               | 37/100 [00:08<00:10,  6.18it/s, Loss after step 37=3.65e+3]

 37%|████████▉               | 37/100 [00:09<00:10,  6.18it/s, Loss after step 38=3.61e+3]

 38%|█████████               | 38/100 [00:09<00:09,  6.28it/s, Loss after step 38=3.61e+3]

 38%|█████████               | 38/100 [00:09<00:09,  6.28it/s, Loss after step 39=3.57e+3]

 39%|█████████▎              | 39/100 [00:09<00:09,  6.39it/s, Loss after step 39=3.57e+3]

 39%|█████████▎              | 39/100 [00:09<00:09,  6.39it/s, Loss after step 40=3.52e+3]

 40%|█████████▌              | 40/100 [00:09<00:09,  6.43it/s, Loss after step 40=3.52e+3]

 40%|█████████▌              | 40/100 [00:09<00:09,  6.43it/s, Loss after step 41=3.49e+3]

 41%|█████████▊              | 41/100 [00:09<00:09,  6.50it/s, Loss after step 41=3.49e+3]

 41%|█████████▊              | 41/100 [00:09<00:09,  6.50it/s, Loss after step 42=3.45e+3]

 42%|██████████              | 42/100 [00:09<00:08,  6.57it/s, Loss after step 42=3.45e+3]

 42%|██████████              | 42/100 [00:09<00:08,  6.57it/s, Loss after step 43=3.41e+3]

 43%|██████████▎             | 43/100 [00:09<00:08,  6.62it/s, Loss after step 43=3.41e+3]

 43%|██████████▎             | 43/100 [00:09<00:08,  6.62it/s, Loss after step 44=3.37e+3]

 44%|██████████▌             | 44/100 [00:09<00:08,  6.71it/s, Loss after step 44=3.37e+3]

 44%|██████████▌             | 44/100 [00:10<00:08,  6.71it/s, Loss after step 45=3.33e+3]

 45%|██████████▊             | 45/100 [00:10<00:08,  6.76it/s, Loss after step 45=3.33e+3]

 45%|███████████▎             | 45/100 [00:10<00:08,  6.76it/s, Loss after step 46=3.3e+3]

 46%|███████████▌             | 46/100 [00:10<00:08,  6.69it/s, Loss after step 46=3.3e+3]

 46%|███████████             | 46/100 [00:10<00:08,  6.69it/s, Loss after step 47=3.26e+3]

 47%|███████████▎            | 47/100 [00:10<00:07,  6.69it/s, Loss after step 47=3.26e+3]

 47%|███████████▎            | 47/100 [00:10<00:07,  6.69it/s, Loss after step 48=3.23e+3]

 48%|███████████▌            | 48/100 [00:10<00:07,  6.76it/s, Loss after step 48=3.23e+3]

 48%|███████████▌            | 48/100 [00:10<00:07,  6.76it/s, Loss after step 49=3.19e+3]

 49%|███████████▊            | 49/100 [00:10<00:07,  6.79it/s, Loss after step 49=3.19e+3]

 49%|███████████▊            | 49/100 [00:10<00:07,  6.79it/s, Loss after step 50=3.15e+3]

 50%|████████████            | 50/100 [00:10<00:07,  6.75it/s, Loss after step 50=3.15e+3]

 50%|████████████            | 50/100 [00:11<00:07,  6.75it/s, Loss after step 51=3.12e+3]

 51%|████████████▏           | 51/100 [00:11<00:07,  6.76it/s, Loss after step 51=3.12e+3]

 51%|████████████▏           | 51/100 [00:11<00:07,  6.76it/s, Loss after step 52=3.08e+3]

 52%|████████████▍           | 52/100 [00:11<00:07,  6.76it/s, Loss after step 52=3.08e+3]

 52%|████████████▍           | 52/100 [00:11<00:07,  6.76it/s, Loss after step 53=3.05e+3]

 53%|████████████▋           | 53/100 [00:11<00:06,  6.76it/s, Loss after step 53=3.05e+3]

 53%|████████████▋           | 53/100 [00:11<00:06,  6.76it/s, Loss after step 54=3.02e+3]

 54%|████████████▉           | 54/100 [00:11<00:06,  6.80it/s, Loss after step 54=3.02e+3]

 54%|████████████▉           | 54/100 [00:11<00:06,  6.80it/s, Loss after step 55=2.99e+3]

 55%|█████████████▏          | 55/100 [00:11<00:06,  6.84it/s, Loss after step 55=2.99e+3]

 55%|█████████████▏          | 55/100 [00:11<00:06,  6.84it/s, Loss after step 56=2.95e+3]

 56%|█████████████▍          | 56/100 [00:11<00:06,  6.72it/s, Loss after step 56=2.95e+3]

 56%|█████████████▍          | 56/100 [00:11<00:06,  6.72it/s, Loss after step 57=2.92e+3]

 57%|█████████████▋          | 57/100 [00:11<00:06,  6.75it/s, Loss after step 57=2.92e+3]

 57%|█████████████▋          | 57/100 [00:12<00:06,  6.75it/s, Loss after step 58=2.89e+3]

 58%|█████████████▉          | 58/100 [00:12<00:06,  6.74it/s, Loss after step 58=2.89e+3]

 58%|█████████████▉          | 58/100 [00:12<00:06,  6.74it/s, Loss after step 59=2.86e+3]

 59%|██████████████▏         | 59/100 [00:12<00:06,  6.74it/s, Loss after step 59=2.86e+3]

 59%|██████████████▏         | 59/100 [00:12<00:06,  6.74it/s, Loss after step 60=2.83e+3]

 60%|██████████████▍         | 60/100 [00:12<00:05,  6.81it/s, Loss after step 60=2.83e+3]

 60%|███████████████          | 60/100 [00:12<00:05,  6.81it/s, Loss after step 61=2.8e+3]

 61%|███████████████▎         | 61/100 [00:12<00:05,  6.84it/s, Loss after step 61=2.8e+3]

 61%|██████████████▋         | 61/100 [00:12<00:05,  6.84it/s, Loss after step 62=2.77e+3]

 62%|██████████████▉         | 62/100 [00:12<00:05,  6.83it/s, Loss after step 62=2.77e+3]

 62%|██████████████▉         | 62/100 [00:12<00:05,  6.83it/s, Loss after step 63=2.74e+3]

 63%|███████████████         | 63/100 [00:12<00:05,  6.86it/s, Loss after step 63=2.74e+3]

 63%|███████████████         | 63/100 [00:12<00:05,  6.86it/s, Loss after step 64=2.71e+3]

 64%|███████████████▎        | 64/100 [00:12<00:05,  6.89it/s, Loss after step 64=2.71e+3]

 64%|███████████████▎        | 64/100 [00:13<00:05,  6.89it/s, Loss after step 65=2.68e+3]

 65%|███████████████▌        | 65/100 [00:13<00:05,  6.83it/s, Loss after step 65=2.68e+3]

 65%|███████████████▌        | 65/100 [00:13<00:05,  6.83it/s, Loss after step 66=2.65e+3]

 66%|███████████████▊        | 66/100 [00:13<00:04,  6.84it/s, Loss after step 66=2.65e+3]

 66%|███████████████▊        | 66/100 [00:13<00:04,  6.84it/s, Loss after step 67=2.63e+3]

 67%|████████████████        | 67/100 [00:13<00:04,  6.68it/s, Loss after step 67=2.63e+3]

 67%|████████████████▊        | 67/100 [00:13<00:04,  6.68it/s, Loss after step 68=2.6e+3]

 68%|█████████████████        | 68/100 [00:13<00:04,  6.47it/s, Loss after step 68=2.6e+3]

 68%|████████████████▎       | 68/100 [00:13<00:04,  6.47it/s, Loss after step 69=2.57e+3]

 69%|████████████████▌       | 69/100 [00:13<00:06,  4.57it/s, Loss after step 69=2.57e+3]

 69%|████████████████▌       | 69/100 [00:14<00:06,  4.57it/s, Loss after step 70=2.55e+3]

 70%|████████████████▊       | 70/100 [00:14<00:06,  4.87it/s, Loss after step 70=2.55e+3]

 70%|████████████████▊       | 70/100 [00:14<00:06,  4.87it/s, Loss after step 71=2.52e+3]

 71%|█████████████████       | 71/100 [00:14<00:05,  5.18it/s, Loss after step 71=2.52e+3]

 71%|█████████████████▊       | 71/100 [00:14<00:05,  5.18it/s, Loss after step 72=2.5e+3]

 72%|██████████████████       | 72/100 [00:14<00:05,  5.41it/s, Loss after step 72=2.5e+3]

 72%|█████████████████▎      | 72/100 [00:14<00:05,  5.41it/s, Loss after step 73=2.47e+3]

 73%|█████████████████▌      | 73/100 [00:14<00:04,  5.54it/s, Loss after step 73=2.47e+3]

 73%|█████████████████▌      | 73/100 [00:14<00:04,  5.54it/s, Loss after step 74=2.45e+3]

 74%|█████████████████▊      | 74/100 [00:14<00:04,  5.75it/s, Loss after step 74=2.45e+3]

 74%|█████████████████▊      | 74/100 [00:14<00:04,  5.75it/s, Loss after step 75=2.42e+3]

 75%|██████████████████      | 75/100 [00:14<00:04,  5.89it/s, Loss after step 75=2.42e+3]

 75%|██████████████████▊      | 75/100 [00:15<00:04,  5.89it/s, Loss after step 76=2.4e+3]

 76%|███████████████████      | 76/100 [00:15<00:03,  6.08it/s, Loss after step 76=2.4e+3]

 76%|██████████████████▏     | 76/100 [00:15<00:03,  6.08it/s, Loss after step 77=2.38e+3]

 77%|██████████████████▍     | 77/100 [00:15<00:03,  6.21it/s, Loss after step 77=2.38e+3]

 77%|██████████████████▍     | 77/100 [00:15<00:03,  6.21it/s, Loss after step 78=2.36e+3]

 78%|██████████████████▋     | 78/100 [00:15<00:03,  6.19it/s, Loss after step 78=2.36e+3]

 78%|██████████████████▋     | 78/100 [00:15<00:03,  6.19it/s, Loss after step 79=2.33e+3]

 79%|██████████████████▉     | 79/100 [00:15<00:03,  5.94it/s, Loss after step 79=2.33e+3]

 79%|██████████████████▉     | 79/100 [00:15<00:03,  5.94it/s, Loss after step 80=2.31e+3]

 80%|███████████████████▏    | 80/100 [00:15<00:03,  5.93it/s, Loss after step 80=2.31e+3]

 80%|███████████████████▏    | 80/100 [00:15<00:03,  5.93it/s, Loss after step 81=2.29e+3]

 81%|███████████████████▍    | 81/100 [00:15<00:03,  5.84it/s, Loss after step 81=2.29e+3]

 81%|███████████████████▍    | 81/100 [00:16<00:03,  5.84it/s, Loss after step 82=2.26e+3]

 82%|███████████████████▋    | 82/100 [00:16<00:03,  5.85it/s, Loss after step 82=2.26e+3]

 82%|███████████████████▋    | 82/100 [00:16<00:03,  5.85it/s, Loss after step 83=2.25e+3]

 83%|███████████████████▉    | 83/100 [00:16<00:02,  5.97it/s, Loss after step 83=2.25e+3]

 83%|███████████████████▉    | 83/100 [00:16<00:02,  5.97it/s, Loss after step 84=2.22e+3]

 84%|████████████████████▏   | 84/100 [00:16<00:02,  5.94it/s, Loss after step 84=2.22e+3]

 84%|█████████████████████    | 84/100 [00:16<00:02,  5.94it/s, Loss after step 85=2.2e+3]

 85%|█████████████████████▎   | 85/100 [00:16<00:02,  5.79it/s, Loss after step 85=2.2e+3]

 85%|████████████████████▍   | 85/100 [00:16<00:02,  5.79it/s, Loss after step 86=2.18e+3]

 86%|████████████████████▋   | 86/100 [00:16<00:02,  5.67it/s, Loss after step 86=2.18e+3]

 86%|████████████████████▋   | 86/100 [00:16<00:02,  5.67it/s, Loss after step 87=2.16e+3]

 87%|████████████████████▉   | 87/100 [00:16<00:02,  5.71it/s, Loss after step 87=2.16e+3]

 87%|████████████████████▉   | 87/100 [00:17<00:02,  5.71it/s, Loss after step 88=2.14e+3]

 88%|█████████████████████   | 88/100 [00:17<00:02,  5.82it/s, Loss after step 88=2.14e+3]

 88%|█████████████████████   | 88/100 [00:17<00:02,  5.82it/s, Loss after step 89=2.12e+3]

 89%|█████████████████████▎  | 89/100 [00:17<00:01,  5.76it/s, Loss after step 89=2.12e+3]

 89%|██████████████████████▎  | 89/100 [00:17<00:01,  5.76it/s, Loss after step 90=2.1e+3]

 90%|██████████████████████▌  | 90/100 [00:17<00:01,  5.66it/s, Loss after step 90=2.1e+3]

 90%|█████████████████████▌  | 90/100 [00:17<00:01,  5.66it/s, Loss after step 91=2.08e+3]

 91%|█████████████████████▊  | 91/100 [00:17<00:02,  4.36it/s, Loss after step 91=2.08e+3]

 91%|█████████████████████▊  | 91/100 [00:18<00:02,  4.36it/s, Loss after step 92=2.07e+3]

 92%|██████████████████████  | 92/100 [00:18<00:01,  4.48it/s, Loss after step 92=2.07e+3]

 92%|██████████████████████  | 92/100 [00:18<00:01,  4.48it/s, Loss after step 93=2.05e+3]

 93%|██████████████████████▎ | 93/100 [00:18<00:01,  4.76it/s, Loss after step 93=2.05e+3]

 93%|██████████████████████▎ | 93/100 [00:18<00:01,  4.76it/s, Loss after step 94=2.03e+3]

 94%|██████████████████████▌ | 94/100 [00:18<00:01,  5.04it/s, Loss after step 94=2.03e+3]

 94%|██████████████████████▌ | 94/100 [00:18<00:01,  5.04it/s, Loss after step 95=2.01e+3]

 95%|██████████████████████▊ | 95/100 [00:18<00:00,  5.31it/s, Loss after step 95=2.01e+3]

 95%|██████████████████████▊ | 95/100 [00:18<00:00,  5.31it/s, Loss after step 96=1.99e+3]

 96%|███████████████████████ | 96/100 [00:18<00:00,  5.49it/s, Loss after step 96=1.99e+3]

 96%|███████████████████████ | 96/100 [00:18<00:00,  5.49it/s, Loss after step 97=1.97e+3]

 97%|███████████████████████▎| 97/100 [00:18<00:00,  5.61it/s, Loss after step 97=1.97e+3]

 97%|███████████████████████▎| 97/100 [00:19<00:00,  5.61it/s, Loss after step 98=1.96e+3]

 98%|███████████████████████▌| 98/100 [00:19<00:00,  5.56it/s, Loss after step 98=1.96e+3]

 98%|███████████████████████▌| 98/100 [00:19<00:00,  5.56it/s, Loss after step 99=1.94e+3]

 99%|███████████████████████▊| 99/100 [00:19<00:00,  5.54it/s, Loss after step 99=1.94e+3]

 99%|██████████████████████▊| 99/100 [00:19<00:00,  5.54it/s, Loss after step 100=1.92e+3]

100%|██████████████████████| 100/100 [00:19<00:00,  5.68it/s, Loss after step 100=1.92e+3]

100%|██████████████████████| 100/100 [00:19<00:00,  5.11it/s, Loss after step 100=1.92e+3]




Saving the model to runs/asp7/run0/saved_model_2024-12-29T13:06:42+01:00.keras. Use `em.EncoderMap.from_checkpoint('runs/asp7/run0')` to load the most recent model, or `em.EncoderMap.from_checkpoint('runs/asp7/run0/saved_model_2024-12-29T13:06:42+01:00.keras')` to load the model with specific weights..
This model has a subclassed encoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run0/saved_model_2024-12-29T13:06:42+01:00_encoder.keras')` to load only this model.
This model has a subclassed decoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run0/saved_model_2024-12-29T13:06:42+01:00_decoder.keras')` to load only this model.


project all dihedrals to the low-dimensional space...

In [12]:
low_d_projection = e_map.encode(dihedrals)

 and plot the result:

In [13]:
import pandas as pd

# define max clusters
max_clusters = 5

# remove unwanted clusters
colors = cluster_ids.copy()
colors[colors > max_clusters] = 0
colors = colors.astype(int).astype(str)

# plot
px.scatter(
    data_frame=pd.DataFrame(
        {
            "x": low_d_projection[:, 0],
            "y": low_d_projection[:, 1],
            "color": colors,
        }
    ),
    x="x",
    y="y",
    color="color",
    opacity=0.5,
    color_discrete_map={
        "0": "rgba(100, 100, 100, 0.2)",
    },
    labels={
        "x": "x in a.u.",
        "y": "y in a.u.",
        "color": "cluster",
    },
    width=500,
    height=500,
)

In the above map points from different clusters (different colors) should be well separated. However, if you didn't change the parameters, they are probably not. Some of our parameter settings appear to be unsuitable. Let's see how we can find out what goes wrong.

The `history` element returned by `e_map.train()` is an instance of `tf.keras.callbacks.History`, which contains the loss during the training steps:

In [14]:
loss = np.asarray(history.history["loss"])

px.line(
    x=np.arange(len(loss)),
    y=loss,
    labels={
        "x": "training step",
        "y": "loss",
    },
    width=500,
    height=500,
)

<a id='tensorboard'></a>

## Visualize Learning with TensorBoard

### Running tensorboard on Google colab

To use tensorboard in google colabs notebooks, you neet to first load the tensorboard extension

```python
%load_ext tensorboard
```

And then activate it with:

```python
%tensorboard --logdir .
```

The next code cell contains these commands. Uncomment them and then continue.

### Running tensorboard locally

TensorBoard is a visualization tool from the machine learning library TensorFlow which is used by the EncoderMap package. During the dimensionality reduction step, when the neural network autoencoder is trained, several readings are saved in a TensorBoard format. All output files are saved to the path defined in `parameters.main_path`. Navigate to this location in a shell and start TensorBoard. Change the paramter Tensorboard to `True` to make Encodermap log to Tensorboard.

In case you run this tutorial in the provided Docker container you can open a new console inside the container by typing the following command in a new system shell.
```shell
docker exec -it emap bash
```
Navigate to the location where all the runs are saved. e.g.:
```shell
cd notebooks_easy/runs/asp7/
```
Start TensorBoard in this directory with:
```shell
tensorboard --logdir .
```

You should now be able to open TensorBoard in your webbrowser on port 6006.  
`0.0.0.0:6006` or `127.0.0.1:6006`

In the SCALARS tab of TensorBoard you should see among other values the overall cost and different contributions to the cost. The two most important contributions are `auto_cost` and `distance_cost`. `auto_cost` indicates differences between the inputs and outputs of the autoencoder. `distance_cost` is the part of the cost function which compares pairwise distances in the input space and the low-dimensional (latent) space.

**Fixing Reloading issues**
Using Tensorboard we often encountered some issues while training multiple models and writing mutliple runs to Tensorboard's logdir. Reloading the data and event refreshing the web page did not display the data of the current run. We needed to kill tensorboard and restart it in order to see the new data. This issue was fixed by setting `reload_multifile` `True`.

```bash
tensorboard --logdir . --reload_multifile True
```


In your case, probably the overall cost as well as the auto_cost and the distance_cost are still decreasing after all training iterations. This tells us that we can simply improve the result by increasing the number of training steps. The following cell contains the same code as above. Set a larger number of straining steps to improve the result (e.g. 3000).

**When you're on Goole Colab, you can load the Tensorboard extension with:**

In [15]:
# %load_ext tensorboard
# %tensorboard --logdir .

In [16]:
# Define parameters
parameters = em.Parameters(
    main_path=em.misc.run_path("runs/asp7"),
    n_steps=100,
    dist_sig_parameters=(4.5, 12, 6, 1, 2, 6),
    periodicity=2*pi,
    l2_reg_constant=10,
    summary_step=1,
    tensorboard=True
)

# Instantiate the EncoderMap class
e_map = em.EncoderMap(parameters, dihedrals)


# this function returns rgba() values, that plotly.express.scatter understands
def colors_from_cluster_ids(cluster_ids, max_clusters=10):
    import plotly as plt
    colors = np.full(shape=(len(cluster_ids), ), fill_value="rgba(125, 125, 125, 0.1)")
    # colors = np.full(shape=(len(cluster_ids), 4), fill_value=(.5, .5, .5, .1))
    for i in range(2, max_clusters + 2):
        where = np.where(cluster_ids == i)
        color = plt.colors.DEFAULT_PLOTLY_COLORS[i - 2]
        color = color.replace(")", ", 0.3)").replace("rgb", "rgba")
        colors[where] = color
    return colors

# Logging images to Tensorboard can greatly reduce performance.
# So they need to be specifically turned on
# with the .add_images_to_tensorboard() method
e_map.add_images_to_tensorboard(
    data=dihedrals,
    image_step=2,
    plotly_scatter_kws={
        'size_max': 1,
        'color': colors_from_cluster_ids(cluster_ids, 5),
    },
    backend="plotly",
    save_to_disk=True,
)

history = e_map.train()

Output files are saved to runs/asp7/run1 as defined in 'main_path' in the parameters.


Saved a text-summary of the model and an image in runs/asp7/run1, as specified in 'main_path' in the parameters.
Logging images with (10001, 12)-shaped data every 2 epochs to Tensorboard at runs/asp7/run1


  0%|                                                             | 0/100 [00:00<?, ?it/s]

  0%|                                        | 0/100 [00:00<?, ?it/s, Loss after step ?=?]

  0%|                                  | 0/100 [00:03<?, ?it/s, Loss after step 1=5.61e+3]

  1%|▎                         | 1/100 [00:03<05:18,  3.22s/it, Loss after step 1=5.61e+3]

  1%|▎                         | 1/100 [00:04<05:18,  3.22s/it, Loss after step 2=5.52e+3]

  2%|▌                         | 2/100 [00:04<03:17,  2.02s/it, Loss after step 2=5.52e+3]

  2%|▌                         | 2/100 [00:04<03:17,  2.02s/it, Loss after step 3=5.44e+3]

  3%|▊                         | 3/100 [00:05<03:15,  2.02s/it, Loss after step 4=5.36e+3]

  4%|█                         | 4/100 [00:05<01:31,  1.04it/s, Loss after step 4=5.36e+3]

  4%|█                         | 4/100 [00:05<01:31,  1.04it/s, Loss after step 5=5.29e+3]

  5%|█▎                        | 5/100 [00:05<01:30,  1.04it/s, Loss after step 6=5.22e+3]

  6%|█▌                        | 6/100 [00:05<01:02,  1.50it/s, Loss after step 6=5.22e+3]

  6%|█▌                        | 6/100 [00:05<01:02,  1.50it/s, Loss after step 7=5.17e+3]

  7%|█▉                         | 7/100 [00:06<01:02,  1.50it/s, Loss after step 8=5.1e+3]

  8%|██▏                        | 8/100 [00:06<00:49,  1.85it/s, Loss after step 8=5.1e+3]

  8%|██                        | 8/100 [00:06<00:49,  1.85it/s, Loss after step 9=5.04e+3]

  9%|██▎                      | 9/100 [00:07<00:49,  1.85it/s, Loss after step 10=4.99e+3]

 10%|██▍                     | 10/100 [00:07<00:42,  2.12it/s, Loss after step 10=4.99e+3]

 10%|██▍                     | 10/100 [00:07<00:42,  2.12it/s, Loss after step 11=4.93e+3]

 11%|██▋                     | 11/100 [00:07<00:42,  2.12it/s, Loss after step 12=4.87e+3]

 12%|██▉                     | 12/100 [00:07<00:37,  2.32it/s, Loss after step 12=4.87e+3]

 12%|██▉                     | 12/100 [00:07<00:37,  2.32it/s, Loss after step 13=4.81e+3]

 13%|███                     | 13/100 [00:08<00:37,  2.32it/s, Loss after step 14=4.76e+3]

 14%|███▎                    | 14/100 [00:08<00:35,  2.45it/s, Loss after step 14=4.76e+3]

 14%|███▌                     | 14/100 [00:08<00:35,  2.45it/s, Loss after step 15=4.7e+3]

 15%|███▌                    | 15/100 [00:09<00:34,  2.45it/s, Loss after step 16=4.65e+3]

 16%|███▊                    | 16/100 [00:09<00:32,  2.58it/s, Loss after step 16=4.65e+3]

 16%|████                     | 16/100 [00:09<00:32,  2.58it/s, Loss after step 17=4.6e+3]

 17%|████                    | 17/100 [00:09<00:32,  2.58it/s, Loss after step 18=4.54e+3]

 18%|████▎                   | 18/100 [00:09<00:30,  2.69it/s, Loss after step 18=4.54e+3]

 18%|████▎                   | 18/100 [00:09<00:30,  2.69it/s, Loss after step 19=4.49e+3]

 19%|████▌                   | 19/100 [00:10<00:30,  2.69it/s, Loss after step 20=4.44e+3]

 20%|████▊                   | 20/100 [00:10<00:28,  2.77it/s, Loss after step 20=4.44e+3]

 20%|████▊                   | 20/100 [00:10<00:28,  2.77it/s, Loss after step 21=4.39e+3]

 21%|█████                   | 21/100 [00:11<00:28,  2.77it/s, Loss after step 22=4.33e+3]

 22%|█████▎                  | 22/100 [00:11<00:27,  2.81it/s, Loss after step 22=4.33e+3]

 22%|█████▎                  | 22/100 [00:11<00:27,  2.81it/s, Loss after step 23=4.28e+3]

 23%|█████▌                  | 23/100 [00:12<00:27,  2.81it/s, Loss after step 24=4.23e+3]

 24%|█████▊                  | 24/100 [00:12<00:27,  2.80it/s, Loss after step 24=4.23e+3]

 24%|█████▊                  | 24/100 [00:12<00:27,  2.80it/s, Loss after step 25=4.18e+3]

 25%|██████                  | 25/100 [00:12<00:26,  2.80it/s, Loss after step 26=4.14e+3]

 26%|██████▏                 | 26/100 [00:12<00:25,  2.86it/s, Loss after step 26=4.14e+3]

 26%|██████▏                 | 26/100 [00:12<00:25,  2.86it/s, Loss after step 27=4.09e+3]

 27%|██████▍                 | 27/100 [00:13<00:25,  2.86it/s, Loss after step 28=4.04e+3]

 28%|██████▋                 | 28/100 [00:13<00:25,  2.87it/s, Loss after step 28=4.04e+3]

 28%|███████▌                   | 28/100 [00:13<00:25,  2.87it/s, Loss after step 29=4e+3]

 29%|██████▉                 | 29/100 [00:14<00:24,  2.87it/s, Loss after step 30=3.94e+3]

 30%|███████▏                | 30/100 [00:14<00:24,  2.87it/s, Loss after step 30=3.94e+3]

 30%|███████▌                 | 30/100 [00:14<00:24,  2.87it/s, Loss after step 31=3.9e+3]

 31%|███████▍                | 31/100 [00:14<00:24,  2.87it/s, Loss after step 32=3.85e+3]

 32%|███████▋                | 32/100 [00:14<00:23,  2.89it/s, Loss after step 32=3.85e+3]

 32%|███████▋                | 32/100 [00:14<00:23,  2.89it/s, Loss after step 33=3.81e+3]

 33%|███████▉                | 33/100 [00:15<00:23,  2.89it/s, Loss after step 34=3.76e+3]

 34%|████████▏               | 34/100 [00:15<00:22,  2.89it/s, Loss after step 34=3.76e+3]

 34%|████████▏               | 34/100 [00:15<00:22,  2.89it/s, Loss after step 35=3.72e+3]

 35%|████████▍               | 35/100 [00:16<00:22,  2.89it/s, Loss after step 36=3.68e+3]

 36%|████████▋               | 36/100 [00:16<00:22,  2.85it/s, Loss after step 36=3.68e+3]

 36%|████████▋               | 36/100 [00:16<00:22,  2.85it/s, Loss after step 37=3.63e+3]

 37%|████████▉               | 37/100 [00:16<00:22,  2.85it/s, Loss after step 38=3.59e+3]

 38%|█████████               | 38/100 [00:16<00:21,  2.84it/s, Loss after step 38=3.59e+3]

 38%|█████████               | 38/100 [00:16<00:21,  2.84it/s, Loss after step 39=3.55e+3]

 39%|█████████▎              | 39/100 [00:17<00:21,  2.84it/s, Loss after step 40=3.51e+3]

 40%|█████████▌              | 40/100 [00:17<00:20,  2.88it/s, Loss after step 40=3.51e+3]

 40%|█████████▌              | 40/100 [00:17<00:20,  2.88it/s, Loss after step 41=3.47e+3]

 41%|█████████▊              | 41/100 [00:18<00:20,  2.88it/s, Loss after step 42=3.43e+3]

 42%|██████████              | 42/100 [00:18<00:19,  2.90it/s, Loss after step 42=3.43e+3]

 42%|██████████              | 42/100 [00:18<00:19,  2.90it/s, Loss after step 43=3.39e+3]

 43%|██████████▎             | 43/100 [00:18<00:19,  2.90it/s, Loss after step 44=3.35e+3]

 44%|██████████▌             | 44/100 [00:18<00:19,  2.90it/s, Loss after step 44=3.35e+3]

 44%|██████████▌             | 44/100 [00:18<00:19,  2.90it/s, Loss after step 45=3.32e+3]

 45%|██████████▊             | 45/100 [00:19<00:18,  2.90it/s, Loss after step 46=3.28e+3]

 46%|███████████             | 46/100 [00:19<00:18,  2.90it/s, Loss after step 46=3.28e+3]

 46%|███████████             | 46/100 [00:19<00:18,  2.90it/s, Loss after step 47=3.24e+3]

 47%|███████████▎            | 47/100 [00:20<00:18,  2.90it/s, Loss after step 48=3.21e+3]

 48%|███████████▌            | 48/100 [00:20<00:18,  2.88it/s, Loss after step 48=3.21e+3]

 48%|███████████▌            | 48/100 [00:20<00:18,  2.88it/s, Loss after step 49=3.17e+3]

 49%|███████████▊            | 49/100 [00:21<00:17,  2.88it/s, Loss after step 50=3.13e+3]

 50%|████████████            | 50/100 [00:21<00:17,  2.89it/s, Loss after step 50=3.13e+3]

 50%|████████████▌            | 50/100 [00:21<00:17,  2.89it/s, Loss after step 51=3.1e+3]

 51%|████████████▏           | 51/100 [00:21<00:16,  2.89it/s, Loss after step 52=3.07e+3]

 52%|████████████▍           | 52/100 [00:21<00:17,  2.80it/s, Loss after step 52=3.07e+3]

 52%|████████████▍           | 52/100 [00:21<00:17,  2.80it/s, Loss after step 53=3.03e+3]

 53%|██████████████▎            | 53/100 [00:22<00:16,  2.80it/s, Loss after step 54=3e+3]

 54%|██████████████▌            | 54/100 [00:22<00:15,  2.89it/s, Loss after step 54=3e+3]

 54%|████████████▉           | 54/100 [00:22<00:15,  2.89it/s, Loss after step 55=2.97e+3]

 55%|█████████████▏          | 55/100 [00:23<00:15,  2.89it/s, Loss after step 56=2.94e+3]

 56%|█████████████▍          | 56/100 [00:23<00:15,  2.91it/s, Loss after step 56=2.94e+3]

 56%|██████████████           | 56/100 [00:23<00:15,  2.91it/s, Loss after step 57=2.9e+3]

 57%|█████████████▋          | 57/100 [00:23<00:14,  2.91it/s, Loss after step 58=2.87e+3]

 58%|█████████████▉          | 58/100 [00:23<00:14,  2.88it/s, Loss after step 58=2.87e+3]

 58%|█████████████▉          | 58/100 [00:23<00:14,  2.88it/s, Loss after step 59=2.84e+3]

 59%|██████████████▏         | 59/100 [00:24<00:14,  2.88it/s, Loss after step 60=2.81e+3]

 60%|██████████████▍         | 60/100 [00:24<00:13,  2.90it/s, Loss after step 60=2.81e+3]

 60%|██████████████▍         | 60/100 [00:24<00:13,  2.90it/s, Loss after step 61=2.78e+3]

 61%|██████████████▋         | 61/100 [00:25<00:13,  2.90it/s, Loss after step 62=2.75e+3]

 62%|██████████████▉         | 62/100 [00:25<00:13,  2.92it/s, Loss after step 62=2.75e+3]

 62%|██████████████▉         | 62/100 [00:25<00:13,  2.92it/s, Loss after step 63=2.72e+3]

 63%|███████████████▊         | 63/100 [00:25<00:12,  2.92it/s, Loss after step 64=2.7e+3]

 64%|████████████████         | 64/100 [00:25<00:12,  2.86it/s, Loss after step 64=2.7e+3]

 64%|███████████████▎        | 64/100 [00:25<00:12,  2.86it/s, Loss after step 65=2.67e+3]

 65%|███████████████▌        | 65/100 [00:26<00:12,  2.86it/s, Loss after step 66=2.64e+3]

 66%|███████████████▊        | 66/100 [00:26<00:11,  2.90it/s, Loss after step 66=2.64e+3]

 66%|███████████████▊        | 66/100 [00:26<00:11,  2.90it/s, Loss after step 67=2.61e+3]

 67%|████████████████        | 67/100 [00:27<00:11,  2.90it/s, Loss after step 68=2.58e+3]

 68%|████████████████▎       | 68/100 [00:27<00:11,  2.78it/s, Loss after step 68=2.58e+3]

 68%|████████████████▎       | 68/100 [00:27<00:11,  2.78it/s, Loss after step 69=2.56e+3]

 69%|████████████████▌       | 69/100 [00:28<00:11,  2.78it/s, Loss after step 70=2.53e+3]

 70%|████████████████▊       | 70/100 [00:28<00:11,  2.68it/s, Loss after step 70=2.53e+3]

 70%|█████████████████▌       | 70/100 [00:28<00:11,  2.68it/s, Loss after step 71=2.5e+3]

 71%|█████████████████       | 71/100 [00:29<00:10,  2.68it/s, Loss after step 72=2.48e+3]

 72%|█████████████████▎      | 72/100 [00:29<00:11,  2.36it/s, Loss after step 72=2.48e+3]

 72%|█████████████████▎      | 72/100 [00:29<00:11,  2.36it/s, Loss after step 73=2.45e+3]

 73%|█████████████████▌      | 73/100 [00:29<00:11,  2.36it/s, Loss after step 74=2.43e+3]

 74%|█████████████████▊      | 74/100 [00:29<00:10,  2.50it/s, Loss after step 74=2.43e+3]

 74%|█████████████████▊      | 74/100 [00:29<00:10,  2.50it/s, Loss after step 75=2.41e+3]

 75%|██████████████████      | 75/100 [00:30<00:09,  2.50it/s, Loss after step 76=2.38e+3]

 76%|██████████████████▏     | 76/100 [00:30<00:09,  2.60it/s, Loss after step 76=2.38e+3]

 76%|██████████████████▏     | 76/100 [00:30<00:09,  2.60it/s, Loss after step 77=2.36e+3]

 77%|██████████████████▍     | 77/100 [00:31<00:08,  2.60it/s, Loss after step 78=2.33e+3]

 78%|██████████████████▋     | 78/100 [00:31<00:08,  2.70it/s, Loss after step 78=2.33e+3]

 78%|██████████████████▋     | 78/100 [00:31<00:08,  2.70it/s, Loss after step 79=2.31e+3]

 79%|██████████████████▉     | 79/100 [00:32<00:07,  2.70it/s, Loss after step 80=2.29e+3]

 80%|███████████████████▏    | 80/100 [00:32<00:07,  2.69it/s, Loss after step 80=2.29e+3]

 80%|███████████████████▏    | 80/100 [00:32<00:07,  2.69it/s, Loss after step 81=2.26e+3]

 81%|███████████████████▍    | 81/100 [00:32<00:07,  2.69it/s, Loss after step 82=2.25e+3]

 82%|███████████████████▋    | 82/100 [00:32<00:06,  2.76it/s, Loss after step 82=2.25e+3]

 82%|███████████████████▋    | 82/100 [00:32<00:06,  2.76it/s, Loss after step 83=2.23e+3]

 83%|████████████████████▊    | 83/100 [00:33<00:06,  2.76it/s, Loss after step 84=2.2e+3]

 84%|█████████████████████    | 84/100 [00:33<00:05,  2.70it/s, Loss after step 84=2.2e+3]

 84%|████████████████████▏   | 84/100 [00:33<00:05,  2.70it/s, Loss after step 85=2.18e+3]

 85%|████████████████████▍   | 85/100 [00:34<00:05,  2.70it/s, Loss after step 86=2.16e+3]

 86%|████████████████████▋   | 86/100 [00:34<00:05,  2.60it/s, Loss after step 86=2.16e+3]

 86%|████████████████████▋   | 86/100 [00:34<00:05,  2.60it/s, Loss after step 87=2.14e+3]

 87%|████████████████████▉   | 87/100 [00:35<00:04,  2.60it/s, Loss after step 88=2.13e+3]

 88%|█████████████████████   | 88/100 [00:35<00:04,  2.57it/s, Loss after step 88=2.13e+3]

 88%|██████████████████████   | 88/100 [00:35<00:04,  2.57it/s, Loss after step 89=2.1e+3]

 89%|█████████████████████▎  | 89/100 [00:36<00:04,  2.57it/s, Loss after step 90=2.09e+3]

 90%|█████████████████████▌  | 90/100 [00:36<00:04,  2.48it/s, Loss after step 90=2.09e+3]

 90%|█████████████████████▌  | 90/100 [00:36<00:04,  2.48it/s, Loss after step 91=2.06e+3]

 91%|█████████████████████▊  | 91/100 [00:36<00:03,  2.48it/s, Loss after step 92=2.05e+3]

 92%|██████████████████████  | 92/100 [00:36<00:03,  2.44it/s, Loss after step 92=2.05e+3]

 92%|██████████████████████  | 92/100 [00:36<00:03,  2.44it/s, Loss after step 93=2.03e+3]

 93%|██████████████████████▎ | 93/100 [00:37<00:02,  2.44it/s, Loss after step 94=2.01e+3]

 94%|██████████████████████▌ | 94/100 [00:37<00:02,  2.22it/s, Loss after step 94=2.01e+3]

 94%|██████████████████████▌ | 94/100 [00:37<00:02,  2.22it/s, Loss after step 95=1.99e+3]

 95%|██████████████████████▊ | 95/100 [00:38<00:02,  2.22it/s, Loss after step 96=1.97e+3]

 96%|███████████████████████ | 96/100 [00:38<00:01,  2.30it/s, Loss after step 96=1.97e+3]

 96%|███████████████████████ | 96/100 [00:38<00:01,  2.30it/s, Loss after step 97=1.96e+3]

 97%|███████████████████████▎| 97/100 [00:39<00:01,  2.30it/s, Loss after step 98=1.94e+3]

 98%|███████████████████████▌| 98/100 [00:39<00:00,  2.25it/s, Loss after step 98=1.94e+3]

 98%|███████████████████████▌| 98/100 [00:39<00:00,  2.25it/s, Loss after step 99=1.93e+3]

 99%|██████████████████████▊| 99/100 [00:40<00:00,  2.25it/s, Loss after step 100=1.91e+3]

100%|██████████████████████| 100/100 [00:40<00:00,  2.24it/s, Loss after step 100=1.91e+3]

100%|██████████████████████| 100/100 [00:40<00:00,  2.46it/s, Loss after step 100=1.91e+3]

Saving the model to runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00.keras. Use `em.EncoderMap.from_checkpoint('runs/asp7/run1')` to load the most recent model, or `em.EncoderMap.from_checkpoint('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00.keras')` to load the model with specific weights..
This model has a subclassed encoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00_encoder.keras')` to load only this model.
This model has a subclassed decoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00_decoder.keras')` to load only this model.





The molecule conformations form different clusters (different colors) should be separated a bit better now. In TensorBoard you should see the cost curves for this new run. When the cost curve becomes more or less flat towards the end, longer training does not make sense.

The resulting low-dimensional projection is probably still not very detailed and clusters are probably not well separated. Currently we use a regularization constant `parameters.l2_reg_constant = 10.0`. The regularization constant influences the 
complexity of the network and the map. A high regularization constant will result in a smooth map with little details. A small regularization constant will result in a rougher more detailed map.

Go back to the previous cell and decrease the regularization constant (e.g. `parameters.l2_reg_constant = 0.001`). Play with different settings to improve the separation of the clusters in the map. Have a look at TensorBoard to see how the cost changes for different parameters.

In [17]:
lowd = e_map.encode(dihedrals)

fig = px.scatter(
    x=lowd[:, 0],
    y=lowd[:, 1],
    color=colors_from_cluster_ids(cluster_ids, 5),
    height=500,
    width=500,
    size_max=0.1,
    opacity=0.4,
    labels={
        "x": "x in a.u.",
        "y": "y in a.u.",
    },
)
fig.update_layout(showlegend=False)
fig.show()

**Here is what you can see in Tensorboard:**

<img src="Tensorboard_Cost.png" width="800">

<img src="Tensorboard_Histograms.png" width="800">

<img src="Tensorboard_Parameters.png" width="800">

<img src="Tensorboard_Images.png" width="800">

### Save and Load
Once you are satisfied with your EncoderMap, you might want to save the result. The good news is: Encoder map automatically saves checkpoints during the training process in `parameters.main_path`. The frequency of writing checkpoints can be defined with `patameters.checkpoint_step`. Also, your selected parameters are saved in a file called `parameters.json`. Navigate to the driectory of your last run and open this `parameters.json` file in some text editor. You should find all the parameters that we have set so far. You also find some parameters which were not set by us specifically and where EncoderMap used its default values.

Let's start by looking at the parameters from the last run and printing them in a nicely formatted table with the `.parameters` attribute.

In [18]:
loaded_parameters = em.Parameters.from_file('runs/asp7/run0/parameters.json')
print(loaded_parameters.parameters)

Seems like the parameter file was moved to another directory. Parameter file is updated ...
    Parameter                 | Value                    | Description                                         
    --------------------------+--------------------------+---------------------------------------------------  
    n_neurons                 | [128, 128, 2]            | List containing number of neurons for each layer    
                              |                          | up to the bottleneck layer. For example [128, 128,  
                              |                          | 2] stands for an autoencoder with the following     
                              |                          | architecture {i, 128, 128, 2, 128, 128, i} where i  
                              |                          | is the number of dimensions of the input data.      
                              |                          | These are Input/Output Layers that are not          
            

Before we can reload our trained network we need to save it manually, because the checkpoint step was set to 5000 and we did only write a checkpoint at 0 (random initial weights). We call `e_map.save()` to do so.

In [19]:
e_map.save()

Saving the model to runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00.keras. Use `em.EncoderMap.from_checkpoint('runs/asp7/run1')` to load the most recent model, or `em.EncoderMap.from_checkpoint('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00.keras')` to load the model with specific weights..
This model has a subclassed encoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00_encoder.keras')` to load only this model.
This model has a subclassed decoder, which can be loaded independently. Use `tf.keras.load_model('runs/asp7/run1/saved_model_2024-12-29T13:07:23+01:00_decoder.keras')` to load only this model.


PosixPath('runs/asp7/run1')

And now we reload it.

In [20]:
# get the most recent run directory
from pathlib import Path
import re

latest_run_dir = Path("runs/asp7").glob("run*")
latest_run_dir = sorted(latest_run_dir, key=lambda x: int(re.findall(r"\d+", str(x))[0]))[0]
loaded_e_map = em.EncoderMap.from_checkpoint(latest_run_dir)

Seems like the parameter file was moved to another directory. Parameter file is updated ...
Output files are saved to runs/asp7/run0 as defined in 'main_path' in the parameters.


Saved a text-summary of the model and an image in runs/asp7/run0, as specified in 'main_path' in the parameters.


Now we are finished with loading and we can for example use the loaded EncoderMap object to project data to the low_dimensional space and plot the result:

In [21]:
import pandas as pd

# define max clusters
max_clusters = 5

# remove unwanted clusters
colors = cluster_ids.copy()
colors[colors > max_clusters] = 0
colors = colors.astype(int).astype(str)

# plot
px.scatter(
    data_frame=pd.DataFrame(
        {
            "x": lowd[:, 0],
            "y": lowd[:, 1],
            "color": colors,
        }
    ),
    x="x",
    y="y",
    color="color",
    opacity=0.5,
    color_discrete_map={
        "0": "rgba(100, 100, 100, 0.2)",
    },
    labels={
        "x": "x in a.u.",
        "y": "y in a.u.",
        "color": "cluster",
    },
    width=500,
    height=500,
)

<a id='interactive_plotting'></a>

## Generate Molecular Conformations
Already in the cube example, you have seen that with EncoderMap it is not only possible to project points to the low-dimensional space. Also, a projection of low-dimensional points into the high-dimensional space is possible. 

Here, we will use a tool form the EncoderMap library to interactively select a path in the low-dimensional map called. We will project points along this path into the high-dimensional dihedral space, and use these dihedrals to reconstruct molecular conformations. This can be very useful to explore the landscape an to see what changes occur in the molecular conformation going from one cluster to another.

The next cell instantiates the `InteractivePlotting` class of EncoderMap. Inside the main plotting area, you can click on points and their corresponding molecular conformation is displayed in the right window. The `Trace` plot contains the high-dimensional data (in this case the dihedrals) that this point was projected from. Picking up the `Lasso` tool from the toolbar, you can draw a lasso selection around some points. Pressing `Cluster` afterwards will display 10 structures from all of the structures you selected. You can adjust this number with the `Size` slider.

More interesting is the `Path` tool which can be used, when the density is displayed. With this tool you can generate molecular conformations from a path in the latent space. You don't need to pick up a tool from the toolbar to draw a path. Just switch to density with the `Density` button. After you have drawn your path, click `Generate` to generate the molecular conformations from the low-dimensional points that you just drew.

In either case, hitting `Save` will sasve your cluster or path into the training directory of the EncoderMap class (where alsi Tensorboard stuff is put).

Give the `InteractivePlotting` a try. We would like to hear your feedback at GitHub.

In [22]:
sess = em.InteractivePlotting(
    e_map,
    trajs="asp7.xtc",
    lowd_data=lowd,
    highd_data=dihedrals,
    top='asp7.pdb',
    ball_and_stick=True,
)

GridspecLayout(children=(HTML(value='<h2>EncoderMap Dashboard for kevin in runs/asp7/run1</h2>', layout=Layout…

As backbone dihedrals contain no information about the side-chains, only the backbone of the molecule can be reconstructed. 
In case the generated conformations change very abruptly it might be sensible to increase the regularization constant to obtain a smoother representation. If the generated conformations along a path are not changing at all, the regularization is probably to strong and prevents the network form generating different conformations.

## Conclusion

In this tutorial we applied EncoderMap to a molecular system. You have learned how to monitor the EncoderMap training procedure with TensorBoard, how to restore previously saved EncoderMaps and how to generate Molecular conformations using the InteractivePlotting session.