# VISUALIZING A TRAINED NEURAL NETWORK

## 1. Neural network analysis

In this notebook, we will visualize the latent space of a trained neural network, and use it to generate protein interpolations. Let's start by importing the `os` module and the `MolearnAnalysis` class.

In [None]:
import os
from analysis import MolearnAnalysis

These are the paths to files containing the neural network parameters, the training set, and the test set.

In [None]:
networkfile = f'data{os.sep}conv1d-physics-path_B.pth'
training_set_file = f'data{os.sep}MurD_closed_open_strided.pdb'
test_set_file = f'data{os.sep}MurD_closed_apo_strided.pdb'
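Before constructing the analysis object, it can help to confirm that these files are actually present. The following is a minimal sketch using only the standard library; the helper name `missing_files` is hypothetical and not part of molearn.

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# os.path.join inserts the platform separator, equivalent to f'data{os.sep}...'
example_paths = [
    os.path.join('data', 'conv1d-physics-path_B.pth'),
    os.path.join('data', 'MurD_closed_open_strided.pdb'),
    os.path.join('data', 'MurD_closed_apo_strided.pdb'),
]

# Report anything that is missing before attempting to load it
for path in missing_files(example_paths):
    print(f'File not found: {path}')
```

If nothing is printed, all three files were found relative to the current working directory.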

The `MolearnAnalysis` class provides a series of methods to characterize the latent space of a neural network, assess its performance, and generate new protein conformations. Its constructor requires two parameters: a file containing the neural network parameters and a multiPDB file containing the coordinates of the examples the neural network was trained with. Optional parameters enable the user to define the neural network architecture (if other than the default) and the atoms selected for training (backbone and beta carbon by default).

In [None]:
MA = MolearnAnalysis(networkfile, training_set_file)

Optionally, we can load a test set. The test set read by the network is returned to the user in two forms: a normalised PyTorch tensor (suitable for feeding to the neural network) and a NumPy array (for displaying). The test set is also stored in the `MolearnAnalysis` object. If multiple test sets are loaded, only the most recently loaded one is stored.

In [None]:
test_set = MA.load_test(test_set_file)

The following methods yield the RMSD and DOPE score of the training and test sets. This step can be slightly slow, so you can skip it if you are in a rush.

In [None]:
err_train = MA.get_error()  # RMSD of training set
err_test = MA.get_error(test_set[0])  # RMSD of test set
dope_train, dope_train_decoded = MA.get_dope()  # DOPE score of training set before and after decoding
dope_test, dope_test_decoded = MA.get_dope(test_set[0])  # DOPE score of test set before and after decoding
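To make the RMSD metric concrete, here is a minimal NumPy sketch of the root-mean-square deviation between an input structure and its decoded counterpart. It is an illustration only: the function `rmsd` is hypothetical, it performs no superposition, and it is not necessarily how `get_error` computes its values internally.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """RMSD between two (n_atoms, 3) coordinate arrays, without superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError('coordinate arrays must have the same shape')
    # Squared displacement per atom, averaged over atoms, then square-rooted
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Toy data: every atom of the "decoded" structure is shifted by 1 unit along x
original = np.zeros((4, 3))
decoded = original + np.array([1.0, 0.0, 0.0])
print(rmsd(original, decoded))
```

A low RMSD between a structure and its decoded version indicates that the network reconstructs that conformation faithfully.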

The following methods perform a grid search of the latent space to assess the L2 norm in latent and 3D space, and the local DOPE score (see the Jupyter notebook `molearn_analysis.ipynb`). These act as heuristics of neural network precision. The output of these commands is also stored internally in the `MolearnAnalysis` object. Here, we will generate 50x50 grids around the training set (plus/minus 10%). If one of these methods is called again, the precalculated results are returned, unless a different number of samples is passed as a parameter; in that case, the grid search is executed again and the new results are stored.

<div class="alert alert-block alert-warning">
<b>Warning:</b> depending on the sampling granularity, this operation can take a long time! Calculating the DOPE score of a 50x50 grid can take ~20 minutes.
If you are in a rush, you can skip these steps, or comment out one of the two lines below. The rest of this notebook will still run, though you will not be able to visualize coloured landscapes in the GUI presented in the next section.
</div>

In [None]:
landscape_err_latent, landscape_err_3d, xaxis, yaxis = MA.scan_error(samples=50)
#landscape_dope, xaxis, yaxis = MA.scan_dope(samples=50)
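The idea of a grid "around the training set (plus/minus 10%)" can be sketched as follows. This is an assumption-laden illustration: the helper `make_grid_axes` is hypothetical, the latent coordinates are random stand-ins, and the 10% is interpreted here as a margin equal to 10% of the span along each axis, which may differ from the library's own definition.

```python
import numpy as np

def make_grid_axes(latent_points, samples=50, padding=0.1):
    """Return evenly spaced x and y axes spanning the points plus a margin."""
    pts = np.asarray(latent_points, dtype=float)
    lo = pts.min(axis=0)
    hi = pts.max(axis=0)
    margin = padding * (hi - lo)  # assumption: 10% of the span on each side
    xaxis = np.linspace(lo[0] - margin[0], hi[0] + margin[0], samples)
    yaxis = np.linspace(lo[1] - margin[1], hi[1] + margin[1], samples)
    return xaxis, yaxis

# Stand-in for the 2D latent coordinates of an encoded training set
rng = np.random.default_rng(0)
fake_latent = rng.uniform(0.0, 1.0, size=(200, 2))

xs, ys = make_grid_axes(fake_latent, samples=50)
grid = np.stack(np.meshgrid(xs, ys), axis=-1)  # one (x, y) pair per grid node
print(grid.shape)
```

Each node of such a grid would then be decoded and scored, which is why the cost grows quadratically with `samples`.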

An interesting plot is one where the latent space is coloured according to how closely each point resembles a target structure. In this example, we pick the first conformation of the test set and compare a grid of the latent space to it in terms of RMSD.

In [None]:
landscape_target_rmsd, xaxis, yaxis = MA.scan_error_from_target(test_set[0][0], samples=50)

***

## 2. Neural network visualisation

Now that we have loaded some information in our `MolearnAnalysis` object, it's time to explore its contents! To this end, we will use a `MolearnGUI` object, creating an interactive interface!

In [None]:
from analysis import MolearnGUI

In [None]:
MG = MolearnGUI(MA)

The interface is divided into three areas: a control panel (left), a 2D latent space representation (center) and a 3D protein view (right, initially empty). Here are instructions on how to use it:

* The **checkboxes** in the interface enable displaying training and test set projections in the latent space area. If the test set was not loaded in the `MolearnAnalysis` object, the box will be grayed out.

* The **drop-down menu and scroller** enable colouring the latent space surface in different ways. The colouring styles available depend on which grid sampling methods were called in the `MolearnAnalysis` object before the GUI was started.

* The **2D surface is clickable**. Clicking multiple times enables defining a path. The coordinates of clicked points appear in the **editable text box** to the bottom left as (x1, y1, x2, y2, ...).

* When clicking the latent space, a **3D protein structure** will appear on the right. The representation will feature a number of conformations equal to the number of points clicked on the latent space, to which extra **sampling points** are added. By default, 10 extra points are added in each interval. This value can be edited in the menu on the left.

* The interpolation visible on the right can be saved into a multiPDB file via the button labelled "**Save PDB**" to the bottom of the menu on the left.
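The path-and-sampling behaviour described in the list above can be sketched with NumPy. This is a hedged illustration, not the GUI's actual code: the function `interpolate_path` is hypothetical, and straight-line interpolation between clicked points is an assumption.

```python
import numpy as np

def interpolate_path(flat_coords, extra_points=10):
    """Linearly interpolate between waypoints given as x1, y1, x2, y2, ..."""
    waypoints = np.asarray(flat_coords, dtype=float).reshape(-1, 2)
    if len(waypoints) < 2:
        return waypoints
    segments = []
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        # Start point plus extra_points interior points; the end point is
        # contributed by the next segment (or appended after the loop)
        t = np.linspace(0.0, 1.0, extra_points + 1, endpoint=False)[:, None]
        segments.append(start + t * (end - start))
    segments.append(waypoints[-1:])
    return np.vstack(segments)

# Three clicked points, written in the same flat format as the text box
path = interpolate_path([0.0, 0.0, 1.0, 0.0, 1.0, 1.0], extra_points=10)
print(path.shape)
```

With three clicked points and 10 extra points per interval, the path contains 3 + 2 × 10 = 23 latent coordinates, each of which would correspond to one decoded conformation.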


You have reached the end of this notebook! If you are interested in knowing what is happening inside the `MolearnAnalysis` object, see the Jupyter notebook `molearn_analysis.ipynb`.

***