In [2]:
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD, GRO, XTC
from MDAnalysis.analysis import rms
import numpy as np
import pandas as pd

## Data structures in MDAnalysis

- A molecular system consists of particles. A particle is represented as an `Atom` object
- `Atom`s are grouped into `AtomGroup`s
- A `Universe` contains all the particles in a molecular system in an `AtomGroup` accessible at the `.atoms` attribute, and combines it with a trajectory at `.trajectory`

A fundamental concept in MDAnalysis is that at any one time, only **one time frame of the trajectory is being accessed**.

## Loading a structure or trajectory

Working with MDAnalysis typically starts with loading data into a **Universe**, the central data structure in MDAnalysis. 

A topology file is **always needed**; it can be **followed by one or more trajectory files**.

In [3]:
# using f-string formatted variable
data_dir = '/DURF_datasets/triad_molecule'
top_file = f'{data_dir}/triad_forcefield_ground.prmtop'
traj_file = f'{data_dir}/for_tutorial.dcd'
u = mda.Universe(top_file, traj_file)
len(u.trajectory)

10

# Working with atoms

## Select atoms and store them in `AtomGroup`s

### Access the particles of the `Universe` with the `atoms` attribute

In [3]:
u.atoms

<AtomGroup with 207 atoms>

### Select by slicing

In [4]:
first_sixty = u.atoms[:60]
first_sixty.atoms

<AtomGroup with 60 atoms>

### Select by filtered lists

In [5]:
carbons = u.atoms[[atom.index for atom in u.atoms if atom.element=='C']]
carbons  # This AtomGroup can be sliced again to get an even smaller AtomGroup

<AtomGroup with 132 atoms>

## `AtomGroup`: the most important class in MDAnalysis

Syntax: `(AtomGroup).(attribute or method)`
- names
- masses
- elements (symbol)
- residues (the residues the atoms belong to, without duplicates)
    - resnames (residue names, listed atom-wise)
- segments
- positions
- center_of_mass()
- center_of_geometry()
- total_mass() / total_charge()
- topology geometries (the group must contain **only the atoms involved in the geometry**)
    - bond (exactly 2 atoms in the group)
    - angle (exactly 3 atoms in the group)
    - dihedral angle (exactly 4 atoms in the group)

In [6]:
print(carbons.names)
print(carbons.center_of_mass())

['C1' 'C2' 'C3' 'C4' 'C5' 'C6' 'C7' 'C8' 'C9' 'C10' 'C11' 'C12' 'C13'
 'C14' 'C15' 'C16' 'C17' 'C18' 'C19' 'C20' 'C21' 'C22' 'C23' 'C24' 'C25'
 'C26' 'C27' 'C28' 'C29' 'C30' 'C31' 'C32' 'C33' 'C34' 'C35' 'C36' 'C37'
 'C38' 'C39' 'C40' 'C41' 'C42' 'C43' 'C44' 'C45' 'C46' 'C47' 'C48' 'C49'
 'C50' 'C51' 'C52' 'C53' 'C54' 'C55' 'C56' 'C57' 'C58' 'C59' 'C60' 'C61'
 'C62' 'C63' 'C64' 'C65' 'C66' 'C67' 'C68' 'C69' 'C70' 'C71' 'C72' 'C73'
 'C74' 'C75' 'C76' 'C77' 'C78' 'C79' 'C80' 'C81' 'C82' 'C83' 'C84' 'C85'
 'C86' 'C87' 'C88' 'C89' 'C90' 'C91' 'C92' 'C93' 'C94' 'C95' 'C96' 'C97'
 'C98' 'C99' 'C100' 'C101' 'C102' 'C103' 'C104' 'C105' 'C106' 'C107'
 'C108' 'C109' 'C110' 'C111' 'C112' 'C113' 'C114' 'C115' 'C116' 'C117'
 'C118' 'C119' 'C120' 'C121' 'C122' 'C123' 'C124' 'C125' 'C126' 'C127'
 'C128' 'C129' 'C130' 'C131' 'C132']
[49.5958473  45.47731301 51.33252196]
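
To make the difference between `center_of_geometry()` and `center_of_mass()` concrete, here is a minimal NumPy sketch of the two quantities. The three-atom coordinates and masses are made up for illustration and are not taken from the triad dataset; the sketch shows only the underlying arithmetic, not how MDAnalysis implements it.

```python
import numpy as np

# Hypothetical toy system: three atoms with made-up positions and masses
positions = np.array([[0.0, 0.0, 0.0],
                      [2.0, 0.0, 0.0],
                      [0.0, 4.0, 0.0]])
masses = np.array([12.0, 1.0, 1.0])

# Center of geometry: unweighted mean of the coordinates
center_of_geometry = positions.mean(axis=0)

# Center of mass: mean of the coordinates weighted by atomic mass
center_of_mass = (masses[:, None] * positions).sum(axis=0) / masses.sum()

print(center_of_geometry)
print(center_of_mass)
```

The heavy atom at the origin pulls the center of mass much closer to the origin than the center of geometry.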


In [7]:
# Example for topology geometries
angle = carbons[:3].angle.value()
dihedral = carbons[:4].dihedral.value()
print(angle)
print(dihedral)

48.80607524639622
22.219644295983453
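
For reference, the angle reported for a three-atom group is the angle at the middle atom. The sketch below recomputes such an angle with plain NumPy; `angle_deg` is a hypothetical helper written for this tutorial, not an MDAnalysis function, and the coordinates are made up.

```python
import numpy as np

def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b as the apex."""
    ba = a - b
    bc = c - b
    cos_theta = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip to guard against rounding errors slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical coordinates forming a right angle at the second point
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
right_angle = angle_deg(a, b, c)
print(right_angle)
```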


# Working with trajectories

## The number of frames in a trajectory

In [8]:
len(u.trajectory)

10

## Iterate over frames to get certain information from the whole dataset

In [9]:
# Get the center of mass for all the 10 frames in the sample dataset
com = []
for ts in u.trajectory:
    center = u.atoms.center_of_mass()
    com.append(center)
np.array(com)

array([[49.34392914, 45.98646084, 51.13193348],
       [52.50568547, 48.05364714, 49.66329707],
       [45.05244407, 48.09455926, 49.13975922],
       [46.95114827, 47.39075919, 47.51692155],
       [50.70829029, 45.26321585, 46.47711484],
       [51.98144021, 48.73396415, 45.97342606],
       [47.76344956, 48.84835129, 44.15456936],
       [50.38872946, 50.27402928, 44.72650511],
       [51.89509917, 50.79428328, 49.61875013],
       [52.24712501, 47.19119227, 46.60381208]])

After the iteration finishes, the frame number is reset to 0:

In [10]:
u.trajectory.frame

0

## Directly give the frame number of interest

In [11]:
# Jump to the fifth frame (frame indices start at 0)
u.trajectory[4]
u.trajectory.frame  # The frame stays at 4 until it is changed again

4

## Use the `next()` method to move to the next frame

In [12]:
print(u.atoms.center_of_mass())
u.trajectory.next()

[50.70829029 45.26321585 46.47711484]


< Timestep 5 with unit cell dimensions [97.384636 97.384636 97.384636 90.       90.       90.      ] >

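## Dynamic selection

By default, a selection is evaluated once, when `select_atoms()` is called. Pass `updating=True` to `select_atoms()` to get a group that is re-evaluated every time the frame changes.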

# Save back to disk

## Single frame
Use the `write()` method to save any `AtomGroup` to disk: `(AtomGroup).write()`.

The output format is inferred from the file extension.

In [13]:
#carbons.write('xxx.pdb')

## Trajectories

1. Open a trajectory `Writer`, specifying the output file name and **how many atoms each frame will contain**
1. **Iterate** through the trajectory and write the coordinates frame by frame with `Writer.write()`

[with statement 01](https://blog.csdn.net/u012609509/article/details/72911564)

[with statement 02](https://www.cnblogs.com/pythonbao/p/11211347.html)

In [14]:
#with mda.Writer('xxx.pdb', carbons.n_atoms) as w:
    #for ts in u.trajectory:
        #w.write(carbons)

# Analysis
Most analyses in MDAnalysis follow a common scheme:

1. Initialise the analysis with a `Universe` and other required parameters
1. Run the analysis with `.run()`. Optional arguments are the `start` frame index, the `stop` frame index, the `step` size, and a `verbose` flag. By default, the analysis runs over the whole trajectory.
1. Often, a standalone function is also available that operates on **single frames**

## Example: RMSD

### `rmsd()`: RMSD between two numpy arrays of coordinates

In [15]:
# Compute the RMSD between the first frame and the last frame.
# Only one frame is loaded at a time, so move to the first frame first
u.trajectory[0]
coor_1 = u.atoms.positions
# Move to the last frame
u.trajectory[-1]
coor_2 = u.atoms.positions
rms.rmsd(coor_1, coor_2)

22.20022899998312
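
Called with its default arguments, as above, `rms.rmsd()` does not center or superimpose the two structures, so the result also includes any overall translation and rotation between the frames. Under that assumption the calculation reduces to the sketch below; `rmsd_no_fit` is a hypothetical helper and the coordinates are made up.

```python
import numpy as np

def rmsd_no_fit(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays, without any fitting."""
    diff = a - b
    # Mean of the squared per-atom distances, then the square root
    return np.sqrt((diff ** 2).sum(axis=1).mean())

# Hypothetical two-atom example: one atom moves by 5 units, the other stays put
ref = np.zeros((2, 3))
moved = np.array([[3.0, 4.0, 0.0],
                  [0.0, 0.0, 0.0]])
value = rmsd_no_fit(ref, moved)
print(value)
```

This is one reason the value above can be much larger than the fitted values produced by the `RMSD` class in the next section.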

### `RMSD` class: RMSD of every frame in a trajectory against one reference

In [16]:
# Reset the frame
u.trajectory[0]
# Set up the analysis, using the 5th frame (index 4) as the reference
rmsd_analysis = rms.RMSD(u, ref_frame=4)
rmsd_analysis.run()
# Columns of the result: frame index, time (ps), RMSD (Å)
pd.DataFrame(rmsd_analysis.rmsd)

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.0 | 0.0 | 6.378625 |
| 1 | 1.0 | 0.048888 | 4.721885 |
| 2 | 2.0 | 0.097776 | 5.273952 |
| 3 | 3.0 | 0.146665 | 8.722807 |
| 4 | 4.0 | 0.195553 | 7.014939e-07 |
| 5 | 5.0 | 0.244441 | 3.727947 |
| 6 | 6.0 | 0.293329 | 4.691046 |
| 7 | 7.0 | 0.342217 | 3.504767 |
| 8 | 8.0 | 0.391106 | 4.19442 |
| 9 | 9.0 | 0.439994 | 5.229974 |
