# Asp7 Example - Advanced Usage

In this tutorial, we use example data from a molecular dynamics simulation to learn more about advanced usage of EncoderMap. EncoderMap can create low-dimensional maps of the vast conformational spaces of molecules. This allows easy identification of the most common molecular conformations and helps to understand the relations between these conformations. In this example, we will use data from a simulation of a simple peptide: hepta-aspartic acid.

First we need to import some libraries:

In [None]:
import encodermap as em
import matplotlib.pyplot as plt
import numpy as np
from math import pi
%config Completer.use_jedi=False

Next, we need to load the input data. Different kinds of variables can be used to describe molecular conformations, e.g. Cartesian coordinates, distances, angles, or dihedrals. In principle, EncoderMap can deal with any of these inputs; however, some are better suited than others. The molecular conformation does not change when the molecule is translated or rotated. The chosen input variables should reflect that and be translationally and rotationally invariant.

In this example we use the backbone dihedral angles phi and psi as input as they are translationally and rotationally invariant and describe the backbone of a protein/peptide very well.
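
To make the invariance argument concrete, the following sketch computes a dihedral angle from four points with NumPy and checks that it is unchanged after the points are rotated and translated. The `dihedral` helper and the random test points are illustrative assumptions and not part of EncoderMap.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (in radians) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 that are perpendicular to the central bond b1
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)


# Four arbitrary points standing in for four consecutive backbone atoms
rng = np.random.default_rng(0)
points = rng.normal(size=(4, 3))

# Rotate around the z-axis and translate the whole "molecule"
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta), np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]])
shift = np.array([1.0, -2.0, 3.0])
moved = points @ rotation.T + shift

angle_before = dihedral(*points)
angle_after = dihedral(*moved)
print(angle_before, angle_after)  # both values agree
```

The Cartesian coordinates in `points` and `moved` differ, while the dihedral angle is the same. This is the property that makes dihedrals a convenient input.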

The "asp7.csv" file contains one column for each dihedral and one row for each frame of the trajectory. Additionally, the last column contains a cluster_id from a GROMOS clustering, which we can use later for comparison. We can load this data using numpy.loadtxt:

In [None]:
csv_path = "asp7.csv"
data = np.loadtxt(csv_path, skiprows=1, delimiter=",")
dihedrals = data[:, :-1]
cluster_ids = data[:, -1]
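
If you want to check the slicing logic without the real file, the following sketch applies the same `numpy.loadtxt` call to a tiny in-memory CSV. The column names and values are made up for illustration; the real "asp7.csv" has one column per dihedral.

```python
import io

import numpy as np

# Hypothetical miniature file: two dihedral columns plus a cluster_id column
fake_csv = io.StringIO(
    "phi1,psi1,cluster_id\n"
    "-1.2,2.9,1\n"
    "1.0,-0.5,2\n"
    "-2.4,0.3,1\n"
)

demo_data = np.loadtxt(fake_csv, skiprows=1, delimiter=",")
demo_dihedrals = demo_data[:, :-1]    # all columns except the last one
demo_cluster_ids = demo_data[:, -1]   # last column, loaded as floats

print(demo_dihedrals.shape)
print(demo_cluster_ids)
```

Note that `numpy.loadtxt` returns floats by default, so the cluster ids are stored as `1.0`, `2.0`, and so on. Comparisons such as `cluster_ids == 1` still work as expected.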

Similarly to the previous example, we need to set some parameters. In contrast to the Cube example, we now have periodic input data: the dihedral angles are in radians with a periodicity of 2π. We also set some further parameters, but you don't need to worry about them for now.

In [None]:
parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/asp7")
parameters.n_steps = 10
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi
parameters.l2_reg_constant = 10.0
parameters.summary_step = max(1, parameters.n_steps/100)

%matplotlib notebook
em.plot.distance_histogram(dihedrals[::10], 
                           parameters.periodicity, 
                           parameters.dist_sig_parameters,
                           bins=50)
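
The periodicity matters whenever distances between conformations are computed: two angles close to +π and -π are almost identical, although their numerical difference is large. The sketch below shows one common way to compute distances on a periodic axis. The `periodic_distance` helper is a hypothetical illustration and not necessarily how EncoderMap implements this internally.

```python
import numpy as np


def periodic_distance(x, y, periodicity=2 * np.pi):
    """Euclidean distance between x and y, taking the shortest way around
    the periodic boundary in every dimension."""
    diff = np.abs(np.asarray(x) - np.asarray(y)) % periodicity
    diff = np.minimum(diff, periodicity - diff)
    return np.linalg.norm(diff, axis=-1)


# Two conformations whose dihedrals sit on opposite sides of the boundary
conf_a = np.array([3.1, -3.1])
conf_b = np.array([-3.1, 3.1])

naive = np.linalg.norm(conf_a - conf_b)
periodic = periodic_distance(conf_a, conf_b)
print(naive, periodic)  # the periodic distance is much smaller
```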

Next we can run the dimensionality reduction:

In [None]:
e_map = em.EncoderMap(parameters, dihedrals)
e_map.train()

Then we project all dihedrals to the low-dimensional space...

In [None]:
low_d_projection = e_map.encode(dihedrals)

and plot the result:

In [None]:
%matplotlib notebook
fig, axe = plt.subplots()
axe.plot(low_d_projection[:, 0], low_d_projection[:, 1], linestyle="", marker=".",
         markersize=5, color="0.7", alpha=0.1)
for i in range(9):
    mask = cluster_ids == i + 1
    axe.plot(low_d_projection[:, 0][mask], low_d_projection[:, 1][mask], label=str(i),
             linestyle="", marker=".", markersize=5, alpha=0.3)
legend = axe.legend()
for lh in legend.legendHandles:
    lh._legmarker.set_alpha(1)

In the above map, points from different clusters (different colors) should be well separated. However, if you didn't change the parameters, they probably are not. Some of our parameter settings appear to be unsuitable. Let's find out what goes wrong.

### Visualize Learning with TensorBoard

TensorBoard is a visualization tool from the machine learning library TensorFlow which is used by the EncoderMap package. During the dimensionality reduction step, when the neural network autoencoder is trained, several readings are saved in a TensorBoard format. All output files are saved to the path defined in `parameters.main_path`. Navigate to this location in a shell and start TensorBoard. 

If you run this tutorial in the provided Docker container, you can open a new console inside the container by typing the following command in a new system shell:
```shell
docker exec -it emap bash
```
Navigate to the location where all the runs are saved, e.g.:
```shell
cd notebooks/runs/asp7/
```
Start TensorBoard in this directory with:
```shell
tensorboard --logdir .
```

You should now be able to open TensorBoard in your web browser on port 6006, e.g. at `0.0.0.0:6006` or `127.0.0.1:6006`.

In the SCALARS tab of TensorBoard, you should see, among other values, the overall cost and the different contributions to the cost. The two most important contributions are `auto_cost` and `distance_cost`. `auto_cost` indicates differences between the inputs and outputs of the autoencoder. `distance_cost` is the part of the cost function which compares pairwise distances in the input space and the low-dimensional (latent) space.
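
To get a feeling for what these two contributions measure, here is a deliberately simplified sketch in NumPy. The functions `toy_auto_cost` and `toy_distance_cost` are hypothetical: they ignore periodicity and compare raw pairwise distances, whereas EncoderMap first transforms the distances with the sigmoid functions configured via `dist_sig_parameters`.

```python
import numpy as np


def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def toy_auto_cost(inputs, outputs):
    """Mean absolute difference between autoencoder inputs and outputs."""
    return np.mean(np.abs(inputs - outputs))


def toy_distance_cost(inputs, latent):
    """Mean squared mismatch between pairwise distances in both spaces."""
    return np.mean((pairwise_distances(inputs) - pairwise_distances(latent)) ** 2)


# Three points that lie in a plane, so a 2D map can preserve all distances
inputs = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

good_latent = inputs[:, :2]            # keeps every pairwise distance
collapsed_latent = np.zeros((3, 2))    # maps all points onto one spot
imperfect_outputs = inputs + 0.1       # reconstruction that is slightly off

cost_good = toy_distance_cost(inputs, good_latent)
cost_collapsed = toy_distance_cost(inputs, collapsed_latent)
cost_auto = toy_auto_cost(inputs, imperfect_outputs)
print(cost_good, cost_collapsed, cost_auto)
```

A map that preserves the pairwise distances gives a distance cost of zero, while a collapsed map is penalized. The autoencoder cost only looks at how well the outputs reproduce the inputs.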

In your case, the overall cost as well as `auto_cost` and `distance_cost` are probably still decreasing after all training iterations. This tells us that we can improve the result simply by increasing the number of training steps. The following cell contains the same code as above. Set a larger number of training steps (e.g. 3000) to improve the result.

In [None]:
parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/asp7")
parameters.n_steps = 10  # increase this value to have more training iterations
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi
parameters.l2_reg_constant = 10.0
parameters.summary_step = max(1, parameters.n_steps/100)

e_map = em.EncoderMap(parameters, dihedrals)
e_map.train()

low_d_projection = e_map.encode(dihedrals)

%matplotlib notebook
fig, axe = plt.subplots()
axe.plot(low_d_projection[:, 0], low_d_projection[:, 1], linestyle="", marker=".",
         markersize=5, color="0.7", alpha=0.1)
for i in range(9):
    mask = cluster_ids == i + 1
    axe.plot(low_d_projection[:, 0][mask], low_d_projection[:, 1][mask], label=str(i),
             linestyle="", marker=".", markersize=5, alpha=0.3)
legend = axe.legend()
for lh in legend.legendHandles:
    lh._legmarker.set_alpha(1)

The molecular conformations from different clusters (different colors) should be separated a bit better now. In TensorBoard, you should see the cost curves for this new run. When the cost curve becomes more or less flat towards the end, longer training is unlikely to improve the result.
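
Judging flatness by eye in TensorBoard is usually sufficient. If you prefer a number, the following sketch compares the mean cost of the last training steps with the mean of the steps just before. The `has_plateaued` helper, its window size, and its tolerance are assumptions for illustration, and the cost curves are synthetic.

```python
import numpy as np


def has_plateaued(costs, window=100, rel_tol=0.01):
    """Return True if the mean cost of the last `window` steps differs by
    at most `rel_tol` (relative) from the mean of the window before it."""
    costs = np.asarray(costs, dtype=float)
    if len(costs) < 2 * window:
        return False  # not enough data to decide
    previous = costs[-2 * window:-window].mean()
    recent = costs[-window:].mean()
    return bool(abs(previous - recent) <= rel_tol * abs(previous))


steps = np.arange(3000)
still_decreasing = np.exp(-steps / 3000)   # keeps dropping until the end
converged = 1.0 + np.exp(-steps / 100)     # flat long before the end

print(has_plateaued(still_decreasing))
print(has_plateaued(converged))
```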

The resulting low-dimensional projection is probably still not very detailed, and the clusters are probably not well separated. Currently, we use a regularization constant of `parameters.l2_reg_constant = 10.0`. The regularization constant influences the complexity of the network and the map. A high regularization constant will result in a smooth map with little detail. A small regularization constant will result in a rougher, more detailed map.
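
The sketch below illustrates the general idea of an L2 regularization term: the sum of the squared network weights, scaled by the regularization constant, is added to the cost. This is the textbook form; the exact expression used inside EncoderMap (for example an additional constant factor) may differ, and the weight matrices here are made up.

```python
import numpy as np


def l2_penalty(weight_matrices, l2_reg_constant):
    """Textbook L2 penalty: constant times the sum of all squared weights."""
    return l2_reg_constant * sum(np.sum(w ** 2) for w in weight_matrices)


# Hypothetical weights of a tiny two-layer network
weights = [np.ones((2, 3)), 2.0 * np.ones((3, 1))]

strong = l2_penalty(weights, l2_reg_constant=10.0)
weak = l2_penalty(weights, l2_reg_constant=0.001)
print(strong, weak)
```

With a large constant, the penalty dominates the cost and the training favors small weights, which leads to a smooth map. With a small constant, the network is free to use larger weights and can resolve more detail.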

Go back to the previous cell and decrease the regularization constant (e.g. `parameters.l2_reg_constant = 0.001`). Play with different settings to improve the separation of the clusters in the map. Have a look at TensorBoard to see how the cost changes for different parameters.

### Save and Load
Once you are satisfied with your EncoderMap, you might want to save the result. The good news is: EncoderMap automatically saves checkpoints during the training process in `parameters.main_path`. The frequency of writing checkpoints can be defined with `parameters.checkpoint_step`. Also, your selected parameters are saved in a file called `parameters.json`. Navigate to the directory of your last run and open this `parameters.json` file in a text editor. You should find all the parameters that we have set so far. You will also find some parameters which we did not set explicitly and for which EncoderMap used its default values.

Let's start by loading the parameters from some previous run:

In [None]:
run_id = 0  # specify which run you want to load
loaded_parameters = em.Parameters.load("runs/asp7/run{}/parameters.json".format(run_id))

Next, we create an EncoderMap object. However, in this case we don't want to create a new neural network with random weights as we did before; instead, we want to load an already trained network. Therefore, we have to specify the checkpoint path that we want to load. Whenever we create an EncoderMap object without giving training data, we also need to specify the number of neurons in the input layer with `n_inputs` (this necessity will hopefully be removed in a future version).
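
For the Asp7 data, the number of inputs equals the number of backbone dihedrals. The short calculation below shows where the value 12 comes from, assuming a peptide with uncapped termini, where the first residue has no phi angle and the last residue has no psi angle.

```python
n_residues = 7               # hepta-aspartic acid

n_phi = n_residues - 1       # the first residue has no phi angle
n_psi = n_residues - 1       # the last residue has no psi angle
n_inputs = n_phi + n_psi

print(n_inputs)  # 12
```

If the training data is still available, the same number can be read from the data directly with `dihedrals.shape[1]`.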

In [None]:
step = 3000  # specify which checkpoint you want to load
checkpoint_path = "runs/asp7/run{}/checkpoints/step{}.ckpt".format(run_id, step)

loaded_e_map = em.EncoderMap(loaded_parameters, checkpoint_path=checkpoint_path, n_inputs=12)
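
If you are not sure which steps have been saved, you can extract them from the file names in the checkpoint directory. The sketch below assumes that the files follow the `step{}.ckpt` naming used above and that each checkpoint consists of several files with additional suffixes, as is usual for TensorFlow checkpoints. The file list and the `available_steps` helper are made up for illustration.

```python
import re


def available_steps(file_names):
    """Return the sorted, unique step numbers found in checkpoint file names."""
    steps = set()
    for name in file_names:
        match = re.match(r"step(\d+)\.ckpt", name)
        if match:
            steps.add(int(match.group(1)))
    return sorted(steps)


# Hypothetical directory content; replace it with a real directory listing
example_files = [
    "checkpoint",
    "step1000.ckpt.index",
    "step1000.ckpt.meta",
    "step3000.ckpt.index",
    "step3000.ckpt.data-00000-of-00001",
]

found_steps = available_steps(example_files)
latest_path = "runs/asp7/run{}/checkpoints/step{}.ckpt".format(0, found_steps[-1])
print(found_steps)
print(latest_path)
```

In practice, the list of names could come from `os.listdir` applied to the `checkpoints` directory of the run you want to load.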

Now we are finished with loading, and we can, for example, use the loaded EncoderMap object to project data to the low-dimensional space and plot the result:

In [None]:
low_d_projection = loaded_e_map.encode(dihedrals)

# Plotting:
%matplotlib notebook
fig, axe = plt.subplots()
axe.plot(low_d_projection[:, 0], low_d_projection[:, 1], linestyle="", marker=".",
         markersize=5, color="0.7", alpha=0.1)
for i in range(9):
    mask = cluster_ids == i + 1
    axe.plot(low_d_projection[:, 0][mask], low_d_projection[:, 1][mask], label=str(i),
             linestyle="", marker=".", markersize=5, alpha=0.3)
legend = axe.legend()
for lh in legend.legendHandles:
    lh._legmarker.set_alpha(1)

### Generate Molecular Conformations
Already in the cube example, you have seen that EncoderMap can not only project points to the low-dimensional space. It can also project low-dimensional points into the high-dimensional space.

Here, we will use a tool from the EncoderMap library to interactively select a path in the low-dimensional map. We will project points along this path into the high-dimensional dihedral space and use these dihedrals to reconstruct molecular conformations. This can be very useful to explore the landscape and to see what changes in the molecular conformation when going from one cluster to another.

The following cell contains the same code which we have previously used to plot the low-dimensional projection. Additionally, we call `PathGenerateDihedrals` to attach the path selection tool to the plot.

To start a path, click somewhere in the plot. Add more waypoints by clicking. You can delete the last waypoint with the delete key. Once you are satisfied with your path selection press enter to finish your selection.

The points on the path you selected are then fed into the decoder part of the autoencoder and the resulting dihedrals are used to construct molecular conformations. The generated dihedrals as well as the constructed conformations are stored in the main_path.
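
Conceptually, the selected waypoints are turned into a series of points along the path before they are decoded. The following sketch shows one way to place evenly spaced points on a path through 2D waypoints with NumPy. The `points_along_path` helper is a hypothetical illustration and not the code used by `PathGenerateDihedrals`.

```python
import numpy as np


def points_along_path(waypoints, n_points):
    """Place n_points evenly (by arc length) on the polyline through waypoints."""
    waypoints = np.asarray(waypoints, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    cumulative = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    targets = np.linspace(0.0, cumulative[-1], n_points)
    x = np.interp(targets, cumulative, waypoints[:, 0])
    y = np.interp(targets, cumulative, waypoints[:, 1])
    return np.column_stack([x, y])


# Three clicked waypoints forming an L-shaped path in the map
clicked = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
path_points = points_along_path(clicked, n_points=5)
print(path_points)
```

Each row of `path_points` is a position in the low-dimensional map that could be passed through the decoder to obtain one set of dihedrals.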

After you have selected a path in the map and pressed enter to finish the selection, navigate to the main_path to see if the files were written.

In [None]:
# Same plotting commands as we have used before:
%matplotlib notebook
fig, axe = plt.subplots()
axe.plot(low_d_projection[:, 0], low_d_projection[:, 1], linestyle="", marker=".",
         markersize=5, color="0.7", alpha=0.1)
for i in range(9):
    mask = cluster_ids == i + 1
    axe.plot(low_d_projection[:, 0][mask], low_d_projection[:, 1][mask], label=str(i),
             linestyle="", marker=".", markersize=5, alpha=0.3)
legend = axe.legend()
for lh in legend.legendHandles:
    lh._legmarker.set_alpha(1)

# Here we attach the PathGenerateDihedrals tool to the plot:
pdb_path = "asp7.pdb"
generator = em.plot.PathGenerateDihedrals(axe, loaded_e_map, pdb_path)

You can use your favorite molecular viewer or the code in the following cell to have a look at the generated molecular conformations. All you need to do is adjust the path to the pdb file in the following cell:

In [None]:
import nglview
import MDAnalysis as md

uni = md.Universe("runs/asp7/run2/generated_paths/2019-11-19_10-06-49/generated.pdb") 

view = nglview.show_mdanalysis(uni)

view.clear_representations()
view.add_licorice(selection="backbone")

view

As backbone dihedrals contain no information about the side-chains, only the backbone of the molecule can be reconstructed. 
If the generated conformations change very abruptly, it might be sensible to increase the regularization constant to obtain a smoother representation. If the generated conformations along a path do not change at all, the regularization is probably too strong and prevents the network from generating different conformations.

### Conclusion

In this tutorial, we applied EncoderMap to a molecular system. You have learned how to monitor the EncoderMap training procedure with TensorBoard, how to restore previously saved EncoderMaps, and how to generate molecular conformations using the path selection tool.