# Day 1, Practical 2
## Handling simulation trajectory data

MDAnalysis is able to read a wide variety of simulation coordinate formats. A full list of these can be seen in the [coordinates documentation](https://docs.mdanalysis.org/stable/documentation_pages/coordinates/init.html#supported-coordinate-formats). Of these, many are trajectory formats, which hold time-dependent information from simulations such as coordinates, dimensions, velocities, and forces.

Here we demonstrate how one can use MDAnalysis to read, explore, and write trajectory data.

### Package imports

In [2]:
import MDAnalysis as mda
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Reading a trajectory

Loading a trajectory is done in the same way as loading any type of coordinates (as shown in session 1). All you have to do is create a `Universe` object by passing it a topology and a trajectory (in this case, a PSF file and a DCD trajectory respectively).

In [44]:
# First let's load a PSF and DCD from the MDAnalysis test data
from MDAnalysis.tests.datafiles import PSF, DCD

# Now let's load the PSF topology and DCD trajectory into a Universe
u = mda.Universe(PSF, DCD)

Trajectory functionality is centered around the `Universe.trajectory` object.

In [45]:
u.trajectory

<DCDReader /home/bioc1523/software/anaconda/install/envs/all/lib/python3.7/site-packages/MDAnalysisTests/data/adk_dims.dcd with 98 frames of 3341 atoms>

This `trajectory` object has a length in frames, and its times are reported in **picoseconds** (see the [MDAnalysis base units](https://docs.mdanalysis.org/2.0.0-dev0/documentation_pages/units.html#id4) for more information).

The `trajectory` object has many useful attributes, such as the number of frames `n_frames`, the time between frames `dt`, and the total trajectory time `totaltime`.

In [46]:
# print the number of frames
u.trajectory.n_frames

98

In [47]:
# You can also get the number of frames by calling `len` on the trajectory object
len(u.trajectory)

98

In [48]:
# We can get the time between frames with `dt`
u.trajectory.dt

0.9999999119200186

In [49]:
# And the total simulation time from `totaltime`
u.trajectory.totaltime

96.9999914562418
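These attributes are related: `totaltime` is the time spanned from the first to the last frame, `(n_frames - 1) * dt`, rather than `n_frames * dt`, because the first frame is counted as zero elapsed time. The short check below reproduces that arithmetic with the values printed above; `expected_totaltime` is a hypothetical helper for illustration, not an MDAnalysis function.

```python
def expected_totaltime(n_frames, dt):
    """Time elapsed between the first and last frame, assuming a constant dt."""
    # n_frames frames span only n_frames - 1 intervals of length dt
    return (n_frames - 1) * dt


# Values printed above for the PSF/DCD Universe
print(expected_totaltime(98, 0.9999999119200186))
```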

In [50]:
# The trajectory object has many other attributes not shown here,
# for example you can get the file format through `format`
u.trajectory.format

'DCD'

### Exercise 1

From MDAnalysis.tests.datafiles get the GRO topology and XTC trajectory. Work out the total number of frames, and the total simulation length of the trajectory.

In [51]:
# Exercise 1 solution
from MDAnalysis.tests.datafiles import GRO, XTC
u_new = mda.Universe(GRO, XTC)
print('number of frames: ', u_new.trajectory.n_frames)
print('total time: ', u_new.trajectory.totaltime)

number of frames:  10
total time:  900.0000686645508


In [None]:
# Exercise 1

It's also possible to load / concatenate multiple trajectories together in one go using MDAnalysis' [ChainReader](https://docs.mdanalysis.org/2.0.0-dev0/documentation_pages/coordinates/chain.html?highlight=chainreader#chainreader-mdanalysis-coordinates-chain).

This can be done simply by passing several trajectories to the Universe when creating the object.

In [53]:
# Let's assume we wanted to load several DCD files
u_multi = mda.Universe(PSF, DCD, DCD)

# Now we have twice the number of frames (98 frames per DCD trajectory)
print(u_multi.trajectory.n_frames)

196


In [54]:
# We can also just load a list of trajectories
traj_list = [DCD, DCD, DCD]

u_multi = mda.Universe(PSF, traj_list)

# Now we have concatenated the trajectory 3 times
print(u_multi.trajectory.n_frames)

294
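Conceptually, a concatenated trajectory has to translate each frame index into a pair of (which file, which frame within that file). The sketch below shows that bookkeeping in plain Python using cumulative frame counts and `bisect`; `global_to_local` is a hypothetical helper for illustration, not part of the MDAnalysis API, and ChainReader's actual implementation may differ in its details.

```python
import bisect
from itertools import accumulate


def global_to_local(frame, frames_per_file):
    """Map a frame index of the concatenated trajectory to (file index, local frame)."""
    n_total = sum(frames_per_file)
    if frame < 0:
        frame += n_total  # Python-style negative indexing
    if not 0 <= frame < n_total:
        raise IndexError(f"frame {frame} out of range for {n_total} frames")
    # Index of the first frame of each file, e.g. [0, 98, 196] for three 98-frame files
    starts = [0] + list(accumulate(frames_per_file))[:-1]
    # The owning file is the last one whose start index is <= frame
    i = bisect.bisect_right(starts, frame) - 1
    return i, frame - starts[i]


print(global_to_local(100, [98, 98, 98]))  # (1, 2): third frame of the second DCD
print(global_to_local(-1, [98, 98, 98]))   # (2, 97): last frame of the third DCD
```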


## 2. The Timestep object

One of the key components of trajectories is the *Timestep* object `ts`. This is the object that holds the trajectory information **specific to the current frame**.

This information mainly includes:
* The frame number and time
* Unit cell dimensions as `[A, B, C, alpha, beta, gamma]` (or `None` if not available)
* The positions (also forces and/or velocities if available)


In [66]:
# Here we load an AMBER topology and trajectory with coordinate, velocity and force information
from MDAnalysis.tests.datafiles import PRM_NCBOX, TRJ_NCBOX
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX)

In [83]:
# Timestep is available via `ts`
u.trajectory.ts

< Timestep 0 with unit cell dimensions [28.818764 28.278753 27.726164 90.       90.       90.      ] >

In [84]:
# Getting the current frame from `ts`
u.trajectory.ts.frame

0

In [85]:
# Getting the current time from `ts`
u.trajectory.ts.time

1.0

In [86]:
# Getting the dimensions from `ts`
u.trajectory.ts.dimensions

array([28.818764, 28.278753, 27.726164, 90.      , 90.      , 90.      ],
      dtype=float32)

In [87]:
# `ts` also holds positions
u.trajectory.ts.positions

array([[15.249873 , 12.578178 , 15.191731 ],
       [14.925511 , 13.58888  , 14.944009 ],
       [15.285703 , 14.3409605, 15.645962 ],
       ...,
       [ 3.9575078, 14.525827 , 16.14651  ],
       [ 4.6214457, 14.670319 , 15.472312 ],
       [ 4.3571763, 14.856422 , 16.950998 ]], dtype=float32)

In [88]:
# `ts` has a convenient `has_forces` attribute to let you know that forces are available
print(u.trajectory.ts.has_forces)
print(u.trajectory.ts.forces)

True
[[  35.912895     7.5411134  -62.774    ]
 [ -90.35279    163.55951     27.480358 ]
 [  18.25486    -50.773525    18.73405  ]
 ...
 [-211.5503    -127.57705     14.248226 ]
 [  65.75678     44.437996   -59.866737 ]
 [  65.71979     47.376717    45.049736 ]]


### Exercise 2

Check if the Timestep has velocities and print them out.

In [94]:
# Exercise 2 - solution
if u.trajectory.ts.has_velocities:
    print(u.trajectory.ts.velocities)

[[-10.8446045   -3.3365366   -6.420965  ]
 [  0.03857359   0.51404047  -5.497332  ]
 [ 17.19698      1.9232591  -15.253312  ]
 ...
 [  3.362785     3.7809153    3.04019   ]
 [  8.890362   -26.293007     0.64297414]
 [  4.1646843   23.021692    -4.7289677 ]]


In [None]:
# Exercise 2

Whilst the *Timestep* information is available, one would not normally access it by calling `Universe.trajectory.ts` directly.

Instead, the information contained in the *Timestep* is passed along to other parts of the `Universe`.

For example, as shown in session 1, position, velocity, and force data is directly accessible via AtomGroups.

In [101]:
# create an atomgroup from all the atoms
ag = u.atoms
print("AtomGroup positions:\n", ag.positions)
print("ts positions:\n", u.trajectory.ts.positions)

AtomGroup positions:
 [[15.249873  12.578178  15.191731 ]
 [14.925511  13.58888   14.944009 ]
 [15.285703  14.3409605 15.645962 ]
 ...
 [ 3.9575078 14.525827  16.14651  ]
 [ 4.6214457 14.670319  15.472312 ]
 [ 4.3571763 14.856422  16.950998 ]]
ts positions:
 [[15.249873  12.578178  15.191731 ]
 [14.925511  13.58888   14.944009 ]
 [15.285703  14.3409605 15.645962 ]
 ...
 [ 3.9575078 14.525827  16.14651  ]
 [ 4.6214457 14.670319  15.472312 ]
 [ 4.3571763 14.856422  16.950998 ]]


Similarly, the `dimensions` can be accessed directly from `Universe`.

In [102]:
print("Universe dimensions: ", u.dimensions)
print("ts dimensions: ", u.trajectory.ts.dimensions)

Universe dimensions:  [28.818764 28.278753 27.726164 90.       90.       90.      ]
ts dimensions:  [28.818764 28.278753 27.726164 90.       90.       90.      ]



## 3. Traversing through a trajectory

Up until this point, we have primarily been inspecting only a single frame of the `trajectory` object. By default when creating a `Universe`, the *Timestep* is loaded with the information from the first (zero-th) frame in the trajectory.

Here we look at how we can traverse through the trajectory and access the data from different frames.

We can consider the `trajectory` object to be an iterator that loads trajectory data from a source (in most cases, the input trajectory file) and feeds the relevant data to the *Timestep* object.

The following operations can be done to access the trajectory:
* Random access via trajectory indexing
* Iterating over all frames
* Slicing to iterate over a sub-section of the trajectory

**Note:** As is standard in Python, `trajectory` access is done via **0-based indices**. So the first frame is `0`, and the final frame is `n_frames - 1`.
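Negative indices follow the usual Python convention too, counting back from the end of the trajectory. The snippet below uses a plain list of frame indices as a stand-in for `u.trajectory`, so it runs without any trajectory file; the frame count of 10 matches the trajectory loaded in the next section.

```python
n_frames = 10                   # frame count of the PRM_NCBOX/TRJ_NCBOX trajectory used below
frames = list(range(n_frames))  # stand-in for the frame indices of a trajectory

print(frames[0])   # 0: the first frame
print(frames[-1])  # 9: the last frame, i.e. n_frames - 1
print(frames[-2])  # 8: the second-to-last frame
```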

### 3.1 Trajectory indexing

It is possible to randomly access any frame along a trajectory by passing the index of the frame to the trajectory.

In [131]:
# First let's create a new universe using PRM_NCBOX and TRJ_NCBOX
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX)

# As we can see, on creation the Universe is set to the first frame of the trajectory
print('current frame: ', u.trajectory.frame)
print('total number of frames: ', u.trajectory.n_frames)

current frame:  0
total number of frames:  10


In [132]:
# Let's create an atomgroup for the first two atoms in the Universe
# and check their current position at frame 0
first_two_atoms = u.atoms[:2]
print(first_two_atoms.positions)

[[15.249873 12.578178 15.191731]
 [14.925511 13.58888  14.944009]]


In [133]:
# Now let's move to the 7th frame
u.trajectory[6]

< Timestep 6 with unit cell dimensions [27.78876 27.26805 26.73521 90.      90.      90.     ] >

In [134]:
# As we can see, the frame number has now updated accordingly
print('current frame: ', u.trajectory.frame)

current frame:  6


In [135]:
# The AtomGroup also automatically updates with the new Timestep data
print(first_two_atoms.positions)

[[15.79226   12.6949625 15.421089 ]
 [15.945778  13.682353  14.985668 ]]


**Note:** It is particularly important to remember that AtomGroups are not static objects. Whilst the atoms they represent do not change (see **Section 6** on `UpdatingAtomGroup` for when this is not the case), their positions (and forces or velocities, if available) will change as you traverse through the trajectory.


It is also worth noting that, unless the trajectory is held in memory (see **Section 4**), any changes to per-frame data held by the `Timestep` (such as positions) are temporary.

For example, if you were to overwrite the positions of an AtomGroup for a given frame, then seek to another frame and come back to the original frame, the positions would be reloaded from the trajectory file:

In [136]:
# Let's start from frame 0 and overwrite the positions of `first_two_atoms`
u.trajectory[0]

# `first_two_atoms` positions beforehand
print('frame 0 positions: ', first_two_atoms.positions)

# `first_two_atoms` after being zeroed
first_two_atoms.positions = 0
print('zeroed positions: ', first_two_atoms.positions)

frame 0 positions:  [[15.249873 12.578178 15.191731]
 [14.925511 13.58888  14.944009]]
zeroed positions:  [[0. 0. 0.]
 [0. 0. 0.]]


In [140]:
# Now let's go to the second-to-last frame
u.trajectory[-2]
first_two_atoms.positions

array([[14.799454, 15.214347, 14.714555],
       [15.001984, 15.870884, 13.868363]], dtype=float32)

In [141]:
# And now we come back to frame 0
u.trajectory[0]

# positions are no longer zeroed
first_two_atoms.positions

array([[15.249873, 12.578178, 15.191731],
       [14.925511, 13.58888 , 14.944009]], dtype=float32)
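One way to picture this behaviour is a reader that keeps a single frame buffer and refills it from the file on every seek, so any edit to the buffer is lost as soon as another frame is loaded. The toy model below contrasts that with a reader holding every frame in one array, which is the idea behind the in-memory trajectories of Section 4. `ToyDiskReader` and `ToyMemoryReader` are hypothetical classes written for illustration; they are not MDAnalysis's implementation.

```python
import numpy as np

# Hypothetical stand-in for a trajectory file: 3 frames x 2 atoms x xyz
FILE_DATA = np.arange(18, dtype=np.float32).reshape(3, 2, 3)


class ToyDiskReader:
    """Keeps a single frame buffer and refills it from the 'file' on every seek."""

    def __init__(self, data):
        self._data = data
        self.positions = np.empty_like(data[0])
        self[0]  # start on the first frame

    def __getitem__(self, frame):
        # Copy the requested frame into the buffer, discarding any edits
        self.positions[:] = self._data[frame]
        return self


class ToyMemoryReader:
    """Holds every frame; `positions` is a view into the stored array."""

    def __init__(self, data):
        self._data = data.copy()  # private copy of the whole trajectory
        self[0]  # start on the first frame

    def __getitem__(self, frame):
        # A view, so edits are written straight into the stored trajectory
        self.positions = self._data[frame]
        return self


for reader in (ToyDiskReader(FILE_DATA), ToyMemoryReader(FILE_DATA)):
    reader.positions[:] = 0  # zero frame 0
    reader[2]                # seek to another frame
    reader[0]                # and come back
    print(type(reader).__name__, "frame 0 still zeroed:", not reader.positions.any())
```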

### 3.2 Iterating through the trajectory

Iterating through a trajectory is the most common way to traverse a trajectory.

For example one could access every frame in the trajectory and store the current time using the following:

In [119]:
# Create a list for the times
times = []

for ts in u.trajectory:
    times.append(u.trajectory.time)
    
print(times)

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]


### 3.3 Trajectory slicing

Rather than iterating through the entire trajectory, it is possible to slice the trajectory using a `[start:stop:step]` pattern.

In [120]:
# Let's slice starting at the second frame, stopping before the
# second-to-last frame (the stop index is exclusive), taking every other frame

times = []

for ts in u.trajectory[1:-2:2]:
    times.append(u.trajectory.time)
    
print(times)

[2.0, 4.0, 6.0, 8.0]
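Because the stop index of a slice is exclusive, `[1:-2:2]` on this 10-frame trajectory visits frames 1, 3, 5 and 7 and never reaches frame 8. Applying the same slice to a plain `range` (a stand-in for the trajectory) makes the selected frame indices explicit; since this trajectory starts at 1.0 ps with 1.0 ps between frames, the times printed above are these indices plus 1.0.

```python
n_frames = 10
frames = range(n_frames)  # stand-in for the trajectory's frame indices

selected = list(frames[1:-2:2])
print(selected)                     # [1, 3, 5, 7]

# Frame i of this trajectory has a time of i + 1.0 ps
print([i + 1.0 for i in selected])  # [2.0, 4.0, 6.0, 8.0]
```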


It is also possible to use fancy indexing (passing a list of frame indices) to access frames in any given order.

In [121]:
# indices of frames to access
indices = [9, 5, 4, 8, 1]

times = []

for ts in u.trajectory[indices]:
    times.append(u.trajectory.time)
    
print(times)

[10.0, 6.0, 5.0, 9.0, 2.0]


### Exercise 3

Create a reversed list of the trajectory times.

In [122]:
# Exercise 3 -- solution
times = []

for ts in u.trajectory[::-1]:
    times.append(u.trajectory.time)
    
print(times)

[10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]


In [None]:
# Exercise 3

## 4. Transferring to memory

By default MDAnalysis uses an iterative I/O model. That is to say that it loads trajectory data from disk as necessary. This is done so that very large trajectories can still be read by MDAnalysis without filling up the machine's RAM.

In some cases it can be useful to instead hold the trajectory data in memory:
1. It is much faster to iterate through an in-memory trajectory
2. Changes to the trajectory data are not lost when you change frames
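Before loading a trajectory into memory it can be worth estimating how much RAM the coordinates alone will need. Assuming positions are stored as 32-bit floats (as the `float32` arrays shown earlier are), each frame needs `n_atoms * 3 * 4` bytes; velocities or forces, if held as well, would add to this. `coordinate_memory_bytes` is an illustrative helper, not an MDAnalysis function.

```python
def coordinate_memory_bytes(n_atoms, n_frames, bytes_per_value=4):
    """Rough memory needed to hold xyz positions for every frame."""
    # 3 coordinates per atom, bytes_per_value bytes each (4 for float32)
    return n_atoms * 3 * bytes_per_value * n_frames


# The PSF/DCD Universe from Section 1: 3341 atoms, 98 frames
n_bytes = coordinate_memory_bytes(3341, 98)
print(f"{n_bytes / 1024**2:.1f} MiB")
```

By the same arithmetic, a hypothetical system of one million atoms with 10,000 frames would need about 112 GiB for positions alone, which is why reading frame by frame from disk is the default.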


You can create an in-memory trajectory directly on `Universe` creation by setting the `in_memory` keyword.

In [149]:
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX, in_memory=True)

In [150]:
# Note how the trajectory is now stored in 'MEMORY' format
# instead of the ['NCDF', 'NC'] input format
u.trajectory.format

'MEMORY'

It's also possible to transfer the trajectory to memory after creating the `Universe` using `transfer_to_memory`.

In [151]:
# Alternatively, load from file first and then transfer the trajectory to memory
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX)
print('starting format: ', u.trajectory.format)

u.transfer_to_memory()
print('new format: ', u.trajectory.format)

starting format:  ['NCDF', 'NC']
new format:  MEMORY


When in-memory, changes to trajectory data like `positions` do not get overwritten if you change frames.

In [152]:
print('current frame: ', u.trajectory.frame)
print('current positions: ', u.atoms.positions)

# zero the positions
u.atoms.positions = 0
print('zeroed positions: ', u.atoms.positions)

current frame:  0
current positions:  [[15.249873  12.578178  15.191731 ]
 [14.925511  13.58888   14.944009 ]
 [15.285703  14.3409605 15.645962 ]
 ...
 [ 3.9575078 14.525827  16.14651  ]
 [ 4.6214457 14.670319  15.472312 ]
 [ 4.3571763 14.856422  16.950998 ]]
zeroed positions:  [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 ...
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [154]:
# Seek to a new frame
u.trajectory[-1]
print('frame -1 positions: ', u.atoms.positions)

# Seek back to frame 0
u.trajectory[0]

# positions remain zero for frame 0
print('frame 0 positions: ', u.atoms.positions)

frame -1 positions:  [[14.392319  16.360231  14.511796 ]
 [13.606518  17.076824  14.27282  ]
 [14.025402  18.076998  14.383686 ]
 ...
 [ 5.8899493 16.821993   7.454993 ]
 [ 6.58759   17.260208   6.967657 ]
 [ 5.7498684 15.999697   6.9854836]]
frame 0 positions:  [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 ...
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


### Exercise 4

Evaluate how much faster you can retrieve trajectory times from an in-memory trajectory compared to reading from file.

*Hint:* use %timeit

In [155]:
# Exercise 4 -- solution
u_memory = mda.Universe(PRM_NCBOX, TRJ_NCBOX, in_memory=True)
u_disk = mda.Universe(PRM_NCBOX, TRJ_NCBOX)

In [156]:
%timeit [ts.time for ts in u_memory.trajectory]

145 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [157]:
%timeit [ts.time for ts in u_disk.trajectory]

1.02 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [None]:
# Exercise 4

## 5. Visualizing a trajectory

As we did in the previous session, it is possible to use NGLView to traverse through the trajectory visually.

In [158]:
import nglview as nv
from MDAnalysis.tests.datafiles import PSF, DCD
adk = mda.Universe(PSF, DCD)
nv.show_mdanalysis(adk)



NGLWidget(max_frame=97)

## 6. Updating AtomGroups

In **Section 3** we discussed how AtomGroups have a static set of atoms but have positions (forces, velocities, etc.) that update as you traverse through a trajectory.

There is a special type of `AtomGroup`, called the `UpdatingAtomGroup`, whose selection of atoms is re-evaluated as you traverse through the trajectory. To create one, pass `updating=True` to `select_atoms`.

To illustrate this, we look at an ACE residue in a box of water and select the water atoms within 5 Å of the ACE residue. As we traverse through the trajectory, the number of atoms in the `UpdatingAtomGroup` changes, whilst the number in the static `AtomGroup` does not.

In [169]:
# ACE residue in a box of water
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX)

# Create a normal (static) atom group
ag = u.select_atoms('resname WAT and around 5 resname ACE')
# Create an updating atom group from the same selection
updating_ag = u.select_atoms('resname WAT and around 5 resname ACE', updating=True)

In [171]:
ag_atoms = []
updating_ag_atoms = []

for ts in u.trajectory:
    ag_atoms.append(len(ag.atoms))
    updating_ag_atoms.append(len(updating_ag.atoms))

# The number of atoms in the static AtomGroup remains the same
print(ag_atoms)

# The number of atoms in the UpdatingAtomGroup changes
print(updating_ag_atoms)

[87, 87, 87, 87, 87, 87, 87, 87, 87, 87]
[87, 81, 78, 84, 76, 79, 83, 90, 83, 72]


**Note:** Whilst `UpdatingAtomGroup`s are sometimes very useful, they are also much more computationally expensive to use, as the selection (in this case a distance search around the ACE residue) has to be re-evaluated at every frame of the trajectory. We recommend using them sparingly.
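To see where that cost comes from, the sketch below writes an `around`-style selection as a brute-force NumPy distance search and repeats it for every frame, as an updating group must, while a static group evaluates it once. The data are random and hypothetical, and `around` here is an illustrative function, not MDAnalysis's selection code: the real selection uses more efficient neighbour searches and, by default, periodic boundaries, neither of which this sketch does.

```python
import numpy as np


def around(positions, reference, candidates, cutoff):
    """Return the `candidates` indices within `cutoff` of any `reference` atom.

    Brute-force sketch: no periodic boundaries, no neighbour-search tricks.
    """
    # Pairwise difference vectors, shape (n_candidates, n_reference, 3)
    diff = positions[candidates][:, None, :] - positions[reference][None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep a candidate if it is close enough to at least one reference atom
    return candidates[(dist <= cutoff).any(axis=1)]


# Hypothetical data: 5 random frames of 200 atoms in a 20 Å box
rng = np.random.default_rng(42)
trajectory = rng.uniform(0.0, 20.0, size=(5, 200, 3))
reference = np.arange(5)        # stands in for the ACE atoms
candidates = np.arange(5, 200)  # stands in for the water atoms

# Static group: the selection is evaluated once, on the first frame
static = around(trajectory[0], reference, candidates, 5.0)

for i, frame in enumerate(trajectory):
    # Updating group: the distance search is repeated for every frame
    updating = around(frame, reference, candidates, 5.0)
    print(f"frame {i}: static {len(static)}, updating {len(updating)}")
```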

## 7. Writing trajectory files

Aside from reading trajectories, MDAnalysis is also able to write them out. A list of formats which can be written to is shown in the [supported formats documentation](https://docs.mdanalysis.org/dev/documentation_pages/coordinates/init.html#id23).

The most common way to write out files is by using the `Writer` class as a context manager.

Below we show an example of reading in an AMBER NetCDF trajectory (the ACE residue in a box of water used in previous sections), and writing it out to a DCD trajectory.

The basic `Writer` usage pattern is:
1. Create a context manager for the Writer using `with`
  * Requires the number of atoms that will be written to file
  * Can optionally take `format` to specify the output file format; otherwise the format is inferred from the output file extension
2. Iterate through the trajectory frames you wish to write
3. Pass the `AtomGroup` to write out at each frame to the `write()` method
4. Close the output trajectory file (done automatically via the context manager)

There are other ways to write out coordinates, these can be seen in the [user guide entry on trajectory writing](https://userguide.mdanalysis.org/stable/reading_and_writing.html#frames-and-trajectories).

In [172]:
u = mda.Universe(PRM_NCBOX, TRJ_NCBOX)

# Writer requires passing the number of atoms `n_atoms`
# as a keyword argument. This is usually the number of
# atoms in the AtomGroup to write; here it will be all
# the atoms in the universe.
ag = u.atoms

# The format argument is not necessary, as it is taken from
# the extension of the output file, but can sometimes be useful
with mda.Writer('test.dcd', n_atoms=ag.n_atoms, format='DCD') as W:
    # Iterate through the trajectory and write at every step
    for ts in u.trajectory:
        W.write(ag)

## 8. Case study 1: FRET distances

## 9. Case study 2: ???

# ~ here be dragons ~

### Working with AtomGroups: FRET distances

Experimental FRET labels: distances

<div>
<img src="figures/fret_distances_adk.png" alt="FRET distances" width="250"/>
</div>


* I52 - K145
* A55 - V169
* A127 - A194

Calculate the C$_\beta$ distances as proxies for the spin-label distances.

Sampling large conformational changes is challenging with standard equilibrium MD. Therefore we used an enhanced sampling method ("dynamic importance sampling", DIMS) to generate transitions between closed and open apo AdK [2, 3], in addition to "brute force" equilibrium MD (on PSC Anton).


In [None]:
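# Assumption: use the DIMS closed-to-open AdK transition shipped with the
# MDAnalysis test data (the PSF/DCD pair from Section 1)
from MDAnalysis.tests.datafiles import PSF, DCD
closed_to_open = mda.Universe(PSF, DCD)

beta = closed_to_open.select_atoms("name CB")

# Separate selection strings keep donor i paired with acceptor i
donors = beta.select_atoms("resname ILE and resid 52",
                           "resname ALA and resid 55",
                           "resname ALA and resid 127")
acceptors = beta.select_atoms("resname LYS and resid 145",
                              "resname VAL and resid 169",
                              "resname ALA and resid 194")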

Indexing the trajectory sets the active frame to that index.

In [None]:
closed_to_open.trajectory[0]
print(f"Frame: {closed_to_open.trajectory.frame}")
print(f"Time: {closed_to_open.trajectory.time}")

In [None]:
closed_to_open.trajectory[-1]
print(f"Frame: {closed_to_open.trajectory.frame}")
print(f"Time: {closed_to_open.trajectory.time}")

Setting the frame updates dynamic data such as positions. Note, however, that an array already returned by `positions` is a copy and does not update when the frame changes.

In [None]:
closed_to_open.trajectory[0]
print(closed_to_open.trajectory.frame)
donor_positions = donors.positions
donor_positions

In [None]:
closed_to_open.trajectory[-1]
print(closed_to_open.trajectory.frame)
donor_positions

Rather, it's the AtomGroup that updates.

In [None]:
donors.positions
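This is because `AtomGroup.positions` returns a new array built from the current frame, not a live view of it. Plain NumPy integer-array indexing, which picks a subset of rows out of a larger array, shows the same behaviour; in the stand-alone sketch below `frame_buffer` is a hypothetical stand-in for the current frame's coordinates.

```python
import numpy as np

# Hypothetical per-frame buffer: 5 atoms x 3 coordinates, holding "frame 0"
frame_buffer = np.zeros((5, 3), dtype=np.float32)
donor_indices = np.array([0, 2, 4])

# Integer-array ("fancy") indexing returns a new array, not a view
snapshot = frame_buffer[donor_indices]

# "Load" a new frame by overwriting the buffer in place
frame_buffer[:] = 1.0

print(snapshot.any())                     # False: still the frame-0 values
print(frame_buffer[donor_indices].all())  # True: re-indexing sees the new frame
```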

The more common way to traverse through a trajectory (e.g. for analysis) is to iterate through it.

In [None]:
for ts in closed_to_open.trajectory:
    print(f"Frame: {ts.frame}, time: {ts.time}")

You can also easily slice the trajectory.

In [None]:
for ts in closed_to_open.trajectory[2:92:8]:
    print(f"Frame: {ts.frame}, time: {ts.time}")

Let's apply this to the FRET analysis we did earlier. First, for convenience, let's codify the analysis into a function. The arguments (`donors`, `acceptors`) are `AtomGroup`s so that we can work with the updated positions arrays for each frame.

In [None]:
def calculate_fret_distances(donors, acceptors):
    return np.linalg.norm(donors.positions - acceptors.positions, axis=1)
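`np.linalg.norm(..., axis=1)` takes the Euclidean norm of each row of the difference array, so the i-th donor is paired with the i-th acceptor and the two AtomGroups must list their atoms in matching order. The check below uses made-up coordinates instead of `AtomGroup`s, so it only exercises that arithmetic; `pairwise_row_distances` is a hypothetical name chosen so it does not shadow the function above.

```python
import numpy as np


def pairwise_row_distances(donor_positions, acceptor_positions):
    # Same arithmetic as calculate_fret_distances, applied to plain (n, 3) arrays
    return np.linalg.norm(donor_positions - acceptor_positions, axis=1)


# Made-up coordinates (Å) for two donor/acceptor pairs
donor_xyz = np.array([[0.0, 0.0, 0.0],
                      [1.0, 1.0, 1.0]])
acceptor_xyz = np.array([[3.0, 4.0, 0.0],
                         [1.0, 1.0, 11.0]])

print(pairwise_row_distances(donor_xyz, acceptor_xyz).tolist())  # [5.0, 10.0]
```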

In [None]:
distances = []
times = []
for ts in closed_to_open.trajectory:
    d = calculate_fret_distances(donors, acceptors)
    distances.append(d)
    times.append(ts.time)
print(distances[:3])

In [None]:
import matplotlib.pyplot as plt

plt.plot(times, distances)
plt.legend(("I52-K145", "A55-V169", "A127-A194"))
plt.xlabel("Time (ps)")
plt.ylabel(r"Distance (Å)");

### Working with UpdatingAtomGroups: solvent shells

In [None]:
from MDAnalysisData import datasets
ifabp_data = datasets.fetch_ifabp_water()
ifabp = mda.Universe(ifabp_data.topology, ifabp_data.trajectory)

In [None]:
solvshell_static = ifabp.select_atoms("resname TIP3 and around 5.0 protein")
solvshell_static

In [None]:
ifabp.trajectory[-1]
solvshell_static

In [None]:
solvshell_updating = ifabp.select_atoms("resname TIP3 and around 5.0 protein", updating=True)
solvshell_updating

In [None]:
ifabp.trajectory[0]
solvshell_updating

In [None]:
times = []
n_waters = []
for ts in ifabp.trajectory:
    times.append(ts.time)
    n_waters.append(len(solvshell_updating.residues))
print(n_waters[:3])

# Why are the times negative?

In [None]:
plt.plot(times, n_waters)
plt.xlabel("Time (ps)")
plt.ylabel(r"# waters within 5 $\AA$ of protein");