# Intermediate EncoderMap: Different Topologies

**Welcome**

Welcome to the intermediate part of the EncoderMap tutorials. The notebooks in this section explain the concepts behind EncoderMap in more depth. They also assume more programming experience, because the explanations of the code are shorter.

## What are different topologies

By *'topology'* we mean the connectivity of a protein. Think of it as an unweighted graph: the nodes are the atoms and the edges are the bonds. Two topologies are identical if their graphs are identical, so exchanging or removing a single amino acid yields a different topology. Biologically, however, you often think in terms of protein families: you are interested in the general behavior of the proteins and don't care whether one specific residue is an aspartic acid or a glutamic acid. In the language of EncoderMap, the Asp and Glu versions of a protein have different feature spaces. To still train EncoderMap on simulations of both proteins, we can use sparse tensors in the training process.
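The graph picture can be made concrete with a minimal sketch. It assumes only `networkx` and uses hand-written heavy-atom side chains of Asp and Glu (not data from the tutorial's trajectories) to show that the two connectivity graphs differ:

```python
import networkx as nx


def sidechain_graph(bonds):
    """Build an unweighted graph from (atom, atom) bond pairs."""
    g = nx.Graph()
    g.add_edges_from(bonds)
    return g


# Heavy-atom side chains, starting at the alpha carbon (illustrative only).
asp = sidechain_graph(
    [("CA", "CB"), ("CB", "CG"), ("CG", "OD1"), ("CG", "OD2")]
)
glu = sidechain_graph(
    [("CA", "CB"), ("CB", "CG"), ("CG", "CD"), ("CD", "OE1"), ("CD", "OE2")]
)

# is_isomorphic compares connectivity only; node names are ignored.
same_topology = nx.is_isomorphic(asp, glu)
print(asp.number_of_nodes(), glu.number_of_nodes(), same_topology)
```

Glu carries one additional carbon in its side chain, so the graphs are not isomorphic and the two residues count as different topologies.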

Run this notebook on Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_intermediate/02_training_with_different_topologies.ipynb)

Find the documentation of EncoderMap:

https://ag-peter.github.io/encodermap

**Goals**

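In this tutorial you will learn:

- what different topologies are and why they lead to different feature spaces,
- how to load trajectories with different topologies and repair missing bonds with custom amino acids,
- how to train an `AngleDihedralCartesianEncoderMap` on such an ensemble.
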
In [None]:
# !pip install "git+https://github.com/AG-Peter/encodermap.git@main"
# !pip install -r https://raw.githubusercontent.com/AG-Peter/encodermap/main/tests/test_requirements.md

## Imports

In [None]:
import encodermap as em
import numpy as np
import networkx as nx
import plotly.graph_objects as go
import plotly.express as px

%load_ext autoreload
%autoreload 2
%config Completer.use_jedi = False

Fix the TensorFlow seed for reproducibility.

In [None]:
import tensorflow as tf
tf.random.set_seed(1)

## Load the trajectories

We use EncoderMap's `TrajEnsemble` class to load the trajectories and do the feature alignment.

In [None]:
traj_files = ["glu7.xtc", "asp7.xtc"]
top_files = ["glu7.pdb", "asp7.pdb"]

trajs = em.load(traj_files, top_files)

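There's an issue with the proteins in these trajectories: the bond between the C-terminal hydroxy oxygen and its hydrogen is missing, so the `GLU7-HO` atom is left without a connection to the rest of the molecule. Both the ball-and-stick plot and the bond graph below show this.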

In [None]:
em.plot.plot_ball_and_stick(trajs[0], highlight="bonds")

In [None]:
G = trajs[0].top.to_bondgraph()

# Generate positions for the nodes
pos = nx.spring_layout(G)

# Create a Plotly figure
fig = go.Figure()

# Add edges to the figure
for u, v, data in G.edges(data=True):
    x0, y0 = pos[u]
    x1, y1 = pos[v]
    fig.add_trace(go.Scatter(x=[x0, x1], y=[y0, y1], mode='lines', line=dict(width=5, color='gray')))

# Add nodes to the figure
for node in G.nodes():
    x, y = pos[node]
    fig.add_trace(go.Scatter(x=[x], y=[y], mode='markers', marker=dict(size=10), hovertemplate="%{customdata}", customdata=[str(node)]))

# Show the figure
fig.update_layout({"width": 800, "height": 800, "showlegend": False})
fig.show()
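An unbonded atom can also be located programmatically instead of by eye. The following sketch assumes a toy bond graph with made-up node names (it does not use the graph `G` from the cell above) and relies on `networkx.isolates`, which yields nodes without any edges:

```python
import networkx as nx

# Toy bond graph of a C-terminus; node names are illustrative only.
G_toy = nx.Graph()
G_toy.add_edges_from(
    [
        ("GLU7-CA", "GLU7-C"),
        ("GLU7-C", "GLU7-O"),
        ("GLU7-C", "GLU7-OT"),
    ]
)
# The hydrogen exists in the topology but has no bond.
G_toy.add_node("GLU7-HO")

isolated = list(nx.isolates(G_toy))
n_fragments = nx.number_connected_components(G_toy)
print(isolated, n_fragments)
```

A fully bonded molecule should give an empty list of isolated nodes and a single connected component.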

This can be fixed in EncoderMap by defining custom amino acids that list every bond the residue may contain:

In [None]:
custom_aas = {
    "ASP": (
        "A",
        {
            "optional_bonds": [
                ("N", "H1"),
                ("N", "H2"),
                ("N", "H"),
                ("N", "CA"),
                ("CA", "CB"),
                ("CB", "CG"),
                ("CG", "OD1"),
                ("CG", "OD2"),
                ("OD2", "HD2"),
                ("CA", "C"),
                ("C", "O"),
                ("C", "OT"),
                ("O", "HO"),
                ("C", "+N"),
            ],
        },
    ),
    "GLU": (
        "E",
        {
            "optional_bonds": [
                ("N", "H1"),
                ("N", "H2"),
                ("N", "H"),
                ("N", "CA"),
                ("CA", "CB"),
                ("CB", "CG"),
                ("CG", "CD"),
                ("CD", "OE1"),
                ("CD", "OE2"),
                ("OE2", "HE2"),
                ("CA", "C"),
                ("C", "O"),
                ("C", "OT"),
                ("O", "HO"),
                ("C", "+N"),
            ],
        },
    ),
}


trajs.load_custom_topology(custom_aas)
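Before loading a bond list like the ones above, it can help to check that it describes a single connected residue. This is a minimal sketch, assuming that names prefixed with `+` refer to atoms of the neighboring residue and can be skipped for an intra-residue check; the variable names are hypothetical:

```python
import networkx as nx

# Bond list copied from the GLU entry of custom_aas.
glu_bonds = [
    ("N", "H1"), ("N", "H2"), ("N", "H"), ("N", "CA"),
    ("CA", "CB"), ("CB", "CG"), ("CG", "CD"),
    ("CD", "OE1"), ("CD", "OE2"), ("OE2", "HE2"),
    ("CA", "C"), ("C", "O"), ("C", "OT"), ("O", "HO"),
    ("C", "+N"),
]

# Assumption: a leading "+" or "-" marks an atom of a neighboring residue.
intra_bonds = [
    (a, b)
    for a, b in glu_bonds
    if not (a.startswith(("+", "-")) or b.startswith(("+", "-")))
]

residue = nx.Graph()
residue.add_edges_from(intra_bonds)

connected = nx.is_connected(residue)
print(residue.number_of_nodes(), connected, "HO" in residue)
```

With the `("O", "HO")` entry present, the hydroxy hydrogen is part of the connected residue graph.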

In [None]:
G = trajs[0].top.to_bondgraph()

# Generate positions for the nodes
pos = nx.spring_layout(G)

# Create a Plotly figure
fig = go.Figure()

# Add edges to the figure
for u, v, data in G.edges(data=True):
    x0, y0 = pos[u]
    x1, y1 = pos[v]
    fig.add_trace(go.Scatter(x=[x0, x1], y=[y0, y1], mode='lines', line=dict(width=5, color='gray')))

# Add nodes to the figure
for node in G.nodes():
    x, y = pos[node]
    fig.add_trace(go.Scatter(x=[x], y=[y], mode='markers', marker=dict(size=10), hovertemplate="%{customdata}", customdata=[str(node)]))

# Show the figure
fig.update_layout({"width": 800, "height": 800, "showlegend": False})
fig.show()

Load the CVs with the `ensemble=True` option, which aligns the features of the different topologies.

In [None]:
trajs.load_CVs("all", ensemble=True)

In [None]:
trajs
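Conceptually, aligning features of different topologies means that some features are undefined for some frames. The sketch below is an illustration with invented numbers and plain NumPy, not EncoderMap's internal implementation: Asp has two side-chain dihedrals (chi1, chi2), Glu has three (chi1, chi2, chi3), and the missing entry is padded with NaN before being converted to a sparse coordinate form.

```python
import numpy as np

# Hypothetical side-chain dihedrals in radians, 3 frames per peptide.
asp_chi = np.array([[1.0, -2.1], [0.9, -2.0], [1.1, -2.2]])  # chi1, chi2
glu_chi = np.array(
    [[1.2, 3.0, -0.5], [1.1, 2.9, -0.4], [1.3, 3.1, -0.6]]
)  # chi1, chi2, chi3

n_features = max(asp_chi.shape[1], glu_chi.shape[1])


def pad_with_nan(a, width):
    """Right-pad a (frames, features) array with NaN columns."""
    out = np.full((a.shape[0], width), np.nan)
    out[:, : a.shape[1]] = a
    return out


# One common feature space for both topologies.
aligned = np.vstack(
    [pad_with_nan(asp_chi, n_features), pad_with_nan(glu_chi, n_features)]
)

# Sparse (coordinate) form: keep only the defined entries.
defined = ~np.isnan(aligned)
indices = np.argwhere(defined)
values = aligned[defined]
print(aligned.shape, indices.shape, values.shape)
```

The `indices`, `values` and `aligned.shape` triple has the same layout that `tf.sparse.SparseTensor(indices, values, dense_shape)` expects, which is one way undefined features can be kept out of the training.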

## Create the AngleDihedralCartesianEncoderMap

The `AngleDihedralCartesianEncoderMap` tries to learn all of the geometric features of a protein. The angles (backbone angles, backbone dihedrals, sidechain dihedrals) are passed through a neural network autoencoder, while the distances between the backbone atoms are used to create cartesian coordinates from the learned angles. The generated cartesians and the input (true) cartesians are used to construct pairwise C$_\alpha$ distances, which are then also weighted using Sketch-map's sigmoid function. The `cartesian_cost_scale_soft_start` parameter gradually increases the contribution of this cost function to the overall model loss.

In [None]:
p = em.ADCParameters(use_backbone_angles=True,
                     distance_cost_scale=1,
                     auto_cost_scale=0.1,
                     cartesian_cost_scale_soft_start=(50, 80),
                     n_neurons = [500, 250, 125, 2],
                     activation_functions = ['', 'tanh', 'tanh', 'tanh', ''],
                     use_sidechains=True,
                     summary_step=1,
                     tensorboard=True,
                     periodicity=2*np.pi,
                     n_steps=100,
                     checkpoint_step=1000,
                     dist_sig_parameters = (4.5, 12, 6, 1, 2, 6),
                     main_path=em.misc.run_path('runs/asp7_glu7_asp8'),
                     model_api='functional',
                    )
emap = em.AngleDihedralCartesianEncoderMap(trajs, p)
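The sigmoid weighting controlled by `dist_sig_parameters` can be sketched in NumPy. This assumes the usual Sketch-map form $1 - (1 + (2^{a/b} - 1)(r/\sigma)^a)^{-b/a}$ and the parameter order `(sig_h, a_h, b_h, sig_l, a_l, b_l)`; the function name is hypothetical and this is not EncoderMap's own implementation:

```python
import numpy as np


def sketchmap_sigmoid(r, sigma, a, b):
    """Sketch-map sigmoid: 0 at r=0, 0.5 at r=sigma, approaches 1 for large r."""
    r = np.asarray(r, dtype=float)
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


# Assumed order: high-dimensional (sig, a, b), then low-dimensional (sig, a, b).
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)

high = sketchmap_sigmoid([0.0, sig_h, 100.0], sig_h, a_h, b_h)
low = sketchmap_sigmoid([0.0, sig_l, 100.0], sig_l, a_l, b_l)
print(high, low)
```

By construction the function returns 0.5 at `r == sigma`, so `sigma` sets the distance scale the cost function is most sensitive to, while `a` and `b` shape how steeply short and long distances saturate.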

Train the model.

In [None]:
history = emap.train()
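The effect of `cartesian_cost_scale_soft_start=(50, 80)` can be pictured as a schedule over training steps. The sketch below assumes a linear ramp from zero at the first step of the tuple to the full scale at the second; the function and the `final_scale` argument are hypothetical, and the exact ramp inside EncoderMap may differ:

```python
def soft_start_scale(step, start, end, final_scale=1.0):
    """Scale of the cartesian cost at a given training step (assumed linear ramp)."""
    if step < start:
        return 0.0
    if step >= end:
        return final_scale
    return final_scale * (step - start) / (end - start)


# With n_steps=100, the cost is off until step 50 and fully on from step 80.
schedule = [soft_start_scale(s, 50, 80) for s in (0, 50, 65, 80, 100)]
print(schedule)
```

Delaying this cost lets the autoencoder first settle on the angular features before the cartesian reconstruction starts to shape the map.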

## Plot the result

In the resulting projection, the areas occupied by Asp7 and Glu7 are separated. Longer training would be beneficial here.

In [None]:
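# 1 for frames from asp7, 0 for frames from glu7
ids = (trajs.name_arr == "asp7").astype(int)

# Encode once and split the low-dimensional projection by trajectory
lowd = emap.encode()
glu7_lowd = lowd[ids == 0]
asp7_lowd = lowd[ids == 1]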

fig = go.Figure(
    data=[
        go.Scatter(x=asp7_lowd[:, 0], y=asp7_lowd[:, 1], name="Asp7", mode="markers"),
        go.Scatter(x=glu7_lowd[:, 0], y=glu7_lowd[:, 1], name="Glu7", mode="markers"),
    ],
)
fig.update_layout({"height": 800, "width": 800})
fig.show()

## Create a new trajectory

Using the `InteractivePlotting` class, we can generate new molecular conformations with the decoder part of the neural network. If you're running an interactive notebook, you can use the notebook or qt5 backend and experiment with `InteractivePlotting`.

In [None]:
sess = em.InteractivePlotting(emap)

For static notebooks, we load the points along the path and generate new molecular conformations from them.

In [None]:
# sess.view
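Points along a path through the 2D latent space can also be produced without the interactive session. The following is a minimal NumPy sketch that samples evenly spaced points along a polyline; the function name and the example vertices are hypothetical, and passing the points to the decoder is left out because it depends on the trained model:

```python
import numpy as np


def points_along_path(vertices, n_points):
    """Return n_points evenly spaced (by arc length) along a 2D polyline."""
    vertices = np.asarray(vertices, dtype=float)
    # Length of each segment and cumulative arc length at each vertex.
    seg = np.linalg.norm(np.diff(vertices, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    # Interpolate x and y separately over the arc length.
    t = np.linspace(0.0, cum[-1], n_points)
    x = np.interp(t, cum, vertices[:, 0])
    y = np.interp(t, cum, vertices[:, 1])
    return np.column_stack([x, y])


# Example path with made-up latent-space coordinates.
path = points_along_path([(0, 0), (1, 0), (1, 1)], 5)
print(path)
```

Each row of `path` is one latent-space coordinate from which a conformation could be decoded.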