# Extracting and visualising a free energy simulation

This notebook provides a step-by-step guide to extract and visualise a free energy simulation trajectory from a ``simulation.nc`` file using [openfe-analysis](https://github.com/OpenFreeEnergy/openfe_analysis), [MDAnalysis](https://github.com/MDAnalysis/mdanalysis) and [mdtraj](https://github.com/mdtraj/mdtraj). By the end, you should understand how to:

1. Extract the trajectory of a ``replica`` or ``single lambda state`` from a ``simulation.nc`` file
2. For a given hybrid topology trajectory, extract the relevant atom positions for the end states using `MDAnalysis`
3. Write out the trajectories using `MDAnalysis`
4. Centre the ligand in the simulation box using `mdtraj`

This visualisation workflow can be used with any OpenFE protocol, though the end-state extraction method may differ: only the relative hybrid topology protocol uses the temperature factor method described here.

## Downloading the example data

First, download some example trajectory data. This may take a few minutes due to the size of the simulation file. Please skip this section if you have already done this!

In [1]:
! wget https://zenodo.org/records/15375081/files/simulation.nc -nc
! wget https://zenodo.org/records/15375081/files/hybrid_system.pdb -nc

File ‘simulation.nc’ already there; not retrieving.

File ‘hybrid_system.pdb’ already there; not retrieving.
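If `wget` is not available (for example on Windows), the downloads can also be done from Python. The sketch below uses only the standard library (`urllib.request.urlretrieve`) and, like `wget -nc`, skips files that already exist. The helper name `download_if_missing` and the `ZENODO_FILES` dictionary are illustrative only, not part of OpenFE.

```python
import os
import urllib.request

# Files used in this notebook (URLs taken from the wget cell above)
ZENODO_FILES = {
    "simulation.nc": "https://zenodo.org/records/15375081/files/simulation.nc",
    "hybrid_system.pdb": "https://zenodo.org/records/15375081/files/hybrid_system.pdb",
}


def download_if_missing(url: str, path: str) -> bool:
    """Download ``url`` to ``path`` unless ``path`` already exists.

    Mimics ``wget -nc`` (no-clobber): returns True if a download happened,
    False if the file was already present and was left untouched.
    """
    if os.path.exists(path):
        print(f"File '{path}' already there; not retrieving.")
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Running `for name, url in ZENODO_FILES.items(): download_if_missing(url, name)` in a cell should fetch both files into the current directory.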



## Extracting the trajectory with `MDAnalysis`

The `openfe-analysis` package provides an `MDAnalysis` reader, `FEReader`, to help extract the trajectory data from the `simulation.nc` file. As the file contains multiple replicas simulated at different lambda states, we must choose which of these to load as a single trajectory. There are two options for constructing the trajectory:
- `state_id`: constructs a trajectory that follows the single Hamiltonian lambda state with the specified index.
- `replica_id`: constructs a trajectory that follows the single replica with the specified index (which may visit different lambda states over the course of the simulation).
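To make the difference between the two options concrete, here is a conceptual numpy sketch. It assumes a hypothetical `states` array of shape `(n_iterations, n_replicas)` where `states[i, r]` is the lambda state index occupied by replica `r` at iteration `i`, as in a Hamiltonian replica exchange simulation with as many replicas as states. This is not how `FEReader` is implemented; it only illustrates which frames each option follows.

```python
import numpy as np


def replica_frames(states: np.ndarray, replica_id: int) -> np.ndarray:
    """Lambda state visited by one replica at each iteration.

    Following a replica means reading the same replica column every
    iteration, even though the lambda state it samples may change.
    """
    return states[:, replica_id]


def state_frames(states: np.ndarray, state_id: int) -> np.ndarray:
    """Replica that occupied ``state_id`` at each iteration.

    Following a state means hopping between replicas so that every frame
    comes from the same lambda state.
    """
    n_states = states.shape[1]
    # normalise negative indices, e.g. -1 -> last lambda state
    state_id = range(n_states)[state_id]
    # each row is a permutation of the states, so exactly one replica matches
    return np.argmax(states == state_id, axis=1)
```

For example, with `states = np.array([[0, 1, 2], [1, 0, 2], [2, 0, 1]])`, replica 0 visits states `[0, 1, 2]`, while state 0 is found in replicas `[0, 1, 1]`.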

This example uses a trajectory from a relative binding free energy calculation. We will load the trajectory at `lambda=0`, the end state corresponding to ligand A, and visualise it with `nglview`.

In [26]:
import MDAnalysis as mda
import mdtraj as md
from openfe_analysis import FEReader
import nglview as nv
import numpy as np

u_0 = mda.Universe("hybrid_system.pdb", "simulation.nc", format=FEReader, state_id=0)

v = nv.show_mdanalysis(u_0)
# uncomment `v` to display the interactive nglview widget
# v



<center>
<div style='width: 800px'>
    
![system](images/hybrid_system.png)

</div>
</center>

<div class="alert alert-block alert-info"> <b>Note:</b> The OpenFE relative binding free energy protocol does not save water positions by default; this can be changed via the <a href="https://docs.openfree.energy/en/latest/reference/api/openmm_protocol_settings.html#openfe.protocols.openmm_utils.omm_settings.MultiStateOutputSettings.output_indices">output_indices</a> protocol setting. </div>


To view the final state at `lambda=1`, we can use negative indexing (`state_id=-1`), which avoids needing to know the total number of lambda states.

In [27]:
u_1 = mda.Universe("hybrid_system.pdb", "simulation.nc", format=FEReader, state_id=-1)

v = nv.show_mdanalysis(u_1)
v.center()
# v



<center>
<div style='width: 800px'>
    
![system_state_1](images/hybrid_system_state_1.png)

</div>
</center>

## Extracting the end state positions with `MDAnalysis`

### Relative hybrid topology protocol

The trajectory data stored in the `simulation.nc` file contains the positions of the end-state ligands in their hybrid topology format. This means that only atoms unique to an end state have individual positions, while the conserved core atoms share a single set of positions. As you might have noticed in the visualisation above, this can complicate the analysis and visualisation of protein-ligand interactions. However, we can identify the atoms belonging to each end state, and the shared core atoms, using the temperature factors in the topology file:

- `0.0`: The non-alchemical atoms (protein, solvent, etc.)
- `0.25`: The unique atoms of state A
- `0.5`: The conserved core atoms present in both end states
- `0.75`: The unique atoms of state B
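The temperature factor codes above can be turned into a boolean mask with numpy. The sketch below assumes the factors are available as a plain array (for an `MDAnalysis` universe, `u.atoms.tempfactors`); `end_state_mask` is an illustrative helper, not part of `openfe-analysis`. Exact comparison works here because 0.25, 0.5 and 0.75 are exactly representable floats; if your values have been rescaled, compare with a tolerance instead.

```python
import numpy as np

# Temperature factor codes listed above
NON_ALCHEMICAL = 0.0
UNIQUE_A = 0.25
CORE = 0.5
UNIQUE_B = 0.75


def end_state_mask(tempfactors: np.ndarray, end_state: str) -> np.ndarray:
    """Boolean mask of the atoms present in end state "A" or "B".

    An end state consists of the non-alchemical atoms, the shared core
    atoms and the atoms unique to that state. Atom order is preserved.
    """
    unique = {"A": UNIQUE_A, "B": UNIQUE_B}[end_state]
    return np.isin(tempfactors, (NON_ALCHEMICAL, CORE, unique))
```

With a universe loaded as above, `u_0.atoms[end_state_mask(u_0.atoms.tempfactors, "B")]` would select the state B atoms.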

With this information, we can easily extract the atom positions relevant to `state A` for `lambda=0`:

In [28]:
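# get atoms for state A: non-alchemical (0.0), core (0.5) and unique to state A (0.25)
tempfactor = 0.25

# a boolean mask keeps the atoms in their original order, so the ligand atoms stay contiguous
state = u_0.atoms[np.isin(u_0.atoms.tempfactors, (0.0, 0.5, tempfactor))]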

v = nv.show_mdanalysis(state)

# v

<center>
<div style='width: 800px'>
    
![statea_lig](images/statea_ligand.png)

</div>
</center>

### Separated Topologies

This protocol represents both end-state ligands explicitly, each with its own atom positions. To extract the coordinates of one end-state ligand, select it by `chainid`. In the `solvent` leg, `chainid` `A` and `B` correspond to ligands `A` and `B`, respectively. In the `complex` leg, due to the way the systems are constructed, the ligands have `chainid` `B` and `E`, which again correspond to end states `A` and `B`, respectively.
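The chain assignments described above can be captured in a small lookup table that builds the `MDAnalysis` selection string for either leg. The helper `septop_ligand_selection` is hypothetical, and the default residue name `UNK` is taken from the example topology; other systems may use a different residue name.

```python
# chainid of each end-state ligand in the SepTop topologies, per leg (from the text above)
SEPTOP_LIGAND_CHAINS = {
    ("solvent", "A"): "A",
    ("solvent", "B"): "B",
    ("complex", "A"): "B",
    ("complex", "B"): "E",
}


def septop_ligand_selection(leg: str, end_state: str, resname: str = "UNK") -> str:
    """Build an MDAnalysis selection string for one end-state ligand.

    ``leg`` is "solvent" or "complex"; ``end_state`` is "A" or "B".
    """
    chain = SEPTOP_LIGAND_CHAINS[(leg, end_state)]
    return f"resname {resname} and chainid {chain}"
```

For the solvent-leg example, `septop.select_atoms(septop_ligand_selection("solvent", "A"))` builds the same selection string as the state A cell in this section.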

In [29]:
# load the example septop solvent leg topology file
septop = mda.Universe("topologies/alchemical_system_septop.pdb")
v = nv.show_mdanalysis(septop)
v.center()
# make the ions visible
v.add_representation("ball+stick", selection="chainid D")
# v

<center>
<div style='width: 800px'>
    
![septop_full](images/septop_full.png)

</div>
</center>

In [33]:
# select end-state A benzene only
state_a = septop.select_atoms("resname UNK and chainid A")
v = nv.show_mdanalysis(state_a)
v.center()
# v

<center>
<div style='width: 800px'>
    
![septop_a](images/septop_state_a.png)

</div>
</center>

## Saving the trajectory to file with `MDAnalysis`

We can now use `MDAnalysis` to save the trajectory of the `state A` atoms to a common file format. Note that we also need to write out a new topology file containing only these atoms, which can be used to load the trajectory:

In [58]:
# write a new PDB topology file for the state A atoms only
state.write("state_a_topology.pdb")
# write the state A atoms of every frame to an XTC trajectory;
# `state` belongs to u_0, so its positions update as we iterate over u_0.trajectory
with mda.Writer("out.xtc", n_atoms=state.n_atoms) as w:
    for ts in u_0.trajectory:
        w.write(state)



## Centering the Ligand with `mdtraj`

You may have noticed in the views above that the ligand seems to have drifted away from the protein. This is a visualisation artifact caused by periodic boundary conditions and the way `OpenMM` ensures that all particle positions are written into a single periodic box, which can place the ligand in a different periodic image from the protein. We can fix this using `mdtraj` and the [image_molecules](https://mdtraj.org/1.9.3/api/generated/mdtraj.Trajectory.html?highlight=image_molecules#mdtraj.Trajectory.image_molecules) method:

In [30]:
traj = md.load_xtc("out.xtc", top="state_a_topology.pdb")
traj = traj.image_molecules()

v = nv.show_mdtraj(traj)

v.center()
# v

<center>
<div style='width: 800px'>
    
![center_system](images/centered_system.png)

</div>
</center>
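For intuition about what the imaging step does, the simplified numpy sketch below shifts a ligand by whole box lengths so that its centre lies as close as possible to the protein's centre. It assumes an orthorhombic box, a ligand that is already whole, and coordinates in the same units as the box; the function `image_near` is illustrative and is not mdtraj's implementation, which also makes molecules whole and supports triclinic boxes.

```python
import numpy as np


def image_near(ligand_xyz: np.ndarray, anchor_xyz: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Translate a ligand by whole box vectors so its centre is nearest the anchor's centre.

    ligand_xyz, anchor_xyz: (n_atoms, 3) coordinates.
    box: (3,) edge lengths of an orthorhombic periodic box.
    The ligand is moved as a rigid unit, so its internal geometry is unchanged.
    """
    shift = anchor_xyz.mean(axis=0) - ligand_xyz.mean(axis=0)
    # number of whole box lengths separating the two centres along each axis
    n_boxes = np.round(shift / box)
    return ligand_xyz + n_boxes * box
```

Because only whole box vectors are added, the shifted coordinates describe the same periodic system; the ligand may simply end up outside the primary box, which is fine for visualisation.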