# <center> Workshop - Intro to MDAnalysis Part 2</center>


# Distance calculations in MDAnalysis

Atom coordinates are stored in the `.positions` attribute of an `AtomGroup`.

The positions are returned as a NumPy array, which we can then readily manipulate.




In [1]:
# First we import MDAnalysis
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import GRO, TRR

u = mda.Universe(GRO, TRR)

ag = u.select_atoms('protein')

pos = ag.positions

print(f'AtomGroup has length {len(ag)} and positions is shape {pos.shape}')

pos



AtomGroup has length 3341 and positions is shape (3341, 3)


array([[52.017067, 43.56005 , 31.554958],
       [51.18792 , 44.112053, 31.722015],
       [51.550823, 42.827724, 31.038803],
       ...,
       [51.15081 , 41.05192 , 22.588148],
       [50.535004, 42.01    , 22.073002],
       [50.529068, 40.31308 , 23.381912]], dtype=float32)
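Because `positions` is an ordinary NumPy array, the usual array operations apply to it directly. The sketch below uses made-up coordinates in place of `ag.positions` (the values are purely illustrative) to compute the extent of a group along each axis and to shift the group so that its mean position sits at the origin.

```python
import numpy as np

# Made-up coordinates standing in for ag.positions: shape (n_atoms, 3), float32.
pos = np.array([[1.0, 2.0, 3.0],
                [4.0, 6.0, 3.0],
                [1.0, 2.0, 7.0]], dtype=np.float32)

# Extent along each axis: largest minus smallest x, y and z value.
extent = pos.max(axis=0) - pos.min(axis=0)

# Shift the coordinates so that their unweighted mean is at the origin.
centered = pos - pos.mean(axis=0)

print(extent)
print(centered.mean(axis=0))
```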

Some built-in functions based on position data:

- `center_of_mass()`

- `center_of_geometry()`


In [2]:
print(ag.center_of_mass())

print(ag.center_of_geometry())

[60.24875121 51.62883133 28.34134841]
[60.36406581 51.71502188 28.40504896]
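The two results differ slightly because one is weighted by mass and the other is not. As a minimal sketch of the idea, using made-up coordinates and masses rather than the ones from the Universe (in MDAnalysis they would come from `ag.positions` and `ag.masses`):

```python
import numpy as np

# Made-up coordinates and masses, for illustration only.
pos = np.array([[0.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [0.0, 4.0, 0.0]])
masses = np.array([12.0, 1.0, 1.0])

# Center of geometry: plain average of the coordinates.
cog = pos.mean(axis=0)

# Center of mass: average of the coordinates weighted by each atom's mass.
com = np.average(pos, axis=0, weights=masses)

print(cog)
print(com)
```

The heavy atom at the origin pulls the center of mass towards itself, while the center of geometry treats all three atoms equally.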


## The `lib.distances` module

Particle positions are given as NumPy arrays, so most work can be done using NumPy (and NumPy-derived) libraries.

The MDAnalysis `lib.distances` module comes in handy when

- we have periodic boundary conditions (plain NumPy distance calculations do not account for these)

- domain-specific algorithms can be used


In [3]:
from MDAnalysis.lib import distances

distances

<module 'MDAnalysis.lib.distances' from '/home/richard/miniconda3/envs/mda/lib/python3.11/site-packages/MDAnalysis/lib/distances.py'>

### `distance_array`

`distance_array` calculates **all** pairwise distances between two arrays of coordinates.

In [4]:
ag1 = u.atoms[:10]
ag2 = u.atoms[10:30]


da = distances.distance_array(ag1.positions, 
                              ag2.positions,
                              box=u.dimensions)

print(f'Our input atomgroups had sizes {len(ag1)} and {len(ag2)}, the output had shape: {da.shape}')
print()

print(da)

Our input atomgroups had sizes 10 and 20, the output had shape: (10, 20)

[[4.75977987 4.08566621 4.55240204 5.05365216 5.81029095 5.60025122
  4.39684724 2.46426988 2.90528184 3.51211077 3.63822605 4.83737734
  5.0373312  5.4420485  5.47324644 6.48970676 5.10346582 5.30263882
  4.14488932 6.28256501]
 [5.01070018 4.48283351 5.21601535 5.55752695 6.18997295 6.19046375
  4.81600563 2.82888325 2.84233926 4.04800014 4.36943418 5.25682233
  5.25253926 5.90326943 6.09164298 6.89807422 5.4536302  5.45679162
  4.52705133 6.66421694]
 [4.89507874 4.52671222 4.53194417 4.70638093 5.50270622 5.1351855
  3.91204319 3.35714888 3.8491653  4.35576489 4.32928123 5.73773436
  5.99530274 6.34881541 6.28382658 7.41287194 6.0266646  6.27758574
  5.03665197 7.13546103]
 [5.52922002 4.62538362 5.05698502 5.74221794 6.57764005 6.19140689
  5.15532241 2.63168772 3.13787895 3.41055243 3.47894768 4.64791415
  4.93157535 4.97535289 4.9244004  6.04014169 4.46780995 4.77116166
  3.43541877 5.50468309]
 [3.5502045

The output of `distance_array` is a matrix in which element `[i, j]` is the distance between position `i` in the first coordinate array and position `j` in the second coordinate array.

For example, to look at the distance between the 4th atom of the first AtomGroup and the 6th atom of the second:

In [5]:
print(f'The distance between {ag1[3]} and {ag2[5]} is {da[3, 5]} A')

The distance between <Atom 4: H3 of type H of resname MET, resid 1 and segid SYSTEM> and <Atom 16: HE2 of type H of resname MET, resid 1 and segid SYSTEM> is 6.19140688959118 A
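The matrix layout described above can be reproduced with NumPy broadcasting when no periodic box is involved. This is only a sketch on made-up coordinates, with a hypothetical helper name `pairwise_distances`; it is not how MDAnalysis implements `distance_array`.

```python
import numpy as np

def pairwise_distances(a, b):
    """All distances between rows of a (n, 3) and rows of b (m, 3), no box."""
    # a[:, None, :] has shape (n, 1, 3) and b[None, :, :] has shape (1, m, 3),
    # so their difference broadcasts to shape (n, m, 3).
    delta = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(delta, axis=-1)

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 3.0, 4.0], [1.0, 0.0, 2.0], [4.0, 4.0, 0.0]])

d = pairwise_distances(a, b)
print(d.shape)   # (2, 3)
print(d[0, 0])   # distance between a[0] and b[0]
```

Row `i` of the result always refers to the first array and column `j` to the second, which is the same convention used when indexing `da[3, 5]`.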


### `self_distance_array`

For calculating distances between all combinations of coordinates within one group.

Takes a **single array** of n coordinates and calculates the distance for every unique pair.
This yields ½ n(n-1) distances, returned as a flat array.


In [6]:
sda = distances.self_distance_array(ag1.positions, box=None)

print(f'Our input AtomGroup had size {len(ag1)} and the output has shape {sda.shape}')


Our input AtomGroup had size 10 and the output has shape (45,)
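The 45 values correspond to ½ × 10 × 9 unique pairs. The sketch below reproduces that count in plain NumPy and rebuilds a full symmetric matrix from a flat array. It assumes the flat array is ordered as the upper triangle read row by row, (0, 1), (0, 2), ..., (8, 9), which is the layout the MDAnalysis documentation describes; check this against your installed version before relying on it. The coordinates are random stand-ins.

```python
import numpy as np

n = 10

# Index pairs (i, j) with i < j, in row-major order: (0, 1), (0, 2), ..., (8, 9).
i_idx, j_idx = np.triu_indices(n, k=1)
n_pairs = len(i_idx)
print(n_pairs)  # equals n * (n - 1) // 2

# Random stand-in coordinates and their flat array of unique pair distances.
coords = np.random.default_rng(0).random((n, 3))
condensed = np.linalg.norm(coords[i_idx] - coords[j_idx], axis=1)

# Rebuild the full symmetric (n, n) matrix from the flat array.
full = np.zeros((n, n))
full[i_idx, j_idx] = condensed
full[j_idx, i_idx] = condensed
```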


### `calc_bonds`

For calculating a series of distances between pairs of coordinates.

Takes 2 arrays of coordinates **of equal length**, and returns the distances between coordinate pairs in each row.


In [7]:
coords1 = u.atoms[:10].positions
coords2 = u.atoms[10:20].positions
dist = distances.calc_bonds(coords1, 
                            coords2, 
                            box=None)

print(f'The inputs had length {len(coords1)} and {len(coords2)} and the output has shape {dist.shape}')
print()
print(f'The distance between the first coordinate in each array is: {dist[0]}')

The inputs had length 10 and 10 and the output has shape (10,)

The distance between the first coordinate in each array is: 4.759779866326316
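Note that this value matches `da[0, 0]` from the `distance_array` output earlier: `calc_bonds` returns only the row-by-row pairs, which are the diagonal of the full pairwise matrix. A small NumPy sketch on made-up coordinates (no periodic box) shows the relationship:

```python
import numpy as np

a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 0.0]])
b = np.array([[3.0, 4.0, 0.0], [1.0, 1.0, 3.0], [2.0, 0.0, 0.0]])

# Row-wise distances: one value per pair (a[i], b[i]).
rowwise = np.linalg.norm(a - b, axis=1)

# Full pairwise matrix: every a[i] against every b[j].
full = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

print(rowwise)
print(np.diag(full))  # same values as rowwise
```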


### `calc_angles` & `calc_dihedrals`

Calculate the angle or the dihedral angle between 3 or 4 arrays of coordinates, respectively.
Both take coordinate arrays of **identical length** and return values in radians.

For angles, the middle array is the apex of the angle.

For dihedrals, the angle is formed between the plane of the first three coordinates and the plane of the last three coordinates.


In [8]:
import numpy as np
coords1 = u.atoms[:10].positions
coords2 = u.atoms[10:20].positions
coords3 = u.atoms[20:30].positions

angles = distances.calc_angles(
            coords1, coords2, coords3,
            box=None, result=None)

print(np.rad2deg(angles))


coords4 = u.atoms[30:40].positions

dihedrals = distances.calc_dihedrals(coords1, coords2, coords3, coords4)

[ 46.07622285  78.77579992  67.70916382  37.08371044  27.22331407
  21.63933669  28.11400767 161.52825929 138.09917802 167.75476369]
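The geometric definitions above can be written out in plain NumPy. The helpers `angle` and `dihedral` below are hypothetical names for a sketch without periodic boundaries, not the MDAnalysis implementation, and the sign convention of the dihedral here is an assumption that may differ from the one `calc_dihedrals` uses.

```python
import numpy as np

def angle(a, b, c):
    """Angle in radians at apex b, for each row of a, b, c (shape (n, 3))."""
    ba = a - b
    bc = c - b
    cos_theta = np.sum(ba * bc, axis=1) / (
        np.linalg.norm(ba, axis=1) * np.linalg.norm(bc, axis=1))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def dihedral(a, b, c, d):
    """Signed angle in radians between planes (a, b, c) and (b, c, d)."""
    b1, b2, b3 = b - a, c - b, d - c
    n1 = np.cross(b1, b2)   # normal to the plane through a, b, c
    n2 = np.cross(b2, b3)   # normal to the plane through b, c, d
    b2_hat = b2 / np.linalg.norm(b2, axis=1, keepdims=True)
    x = np.sum(n1 * n2, axis=1)
    y = np.sum(np.cross(n1, n2) * b2_hat, axis=1)
    return np.arctan2(y, x)

# Three test geometries sharing a, b, c; only the fourth point changes.
a = np.tile([1.0, 0.0, 0.0], (3, 1))
b = np.zeros((3, 3))
c = np.tile([0.0, 1.0, 0.0], (3, 1))
d = np.array([[1.0, 1.0, 0.0],    # same side as a, in plane (cis)
              [-1.0, 1.0, 0.0],   # opposite side, in plane (trans)
              [0.0, 1.0, 1.0]])   # perpendicular to the a, b, c plane

angles_deg = np.rad2deg(angle(a, b, c))
dihedrals_deg = np.rad2deg(dihedral(a, b, c, d))
print(angles_deg)
print(np.abs(dihedrals_deg))
```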


# Minimum image convention

To account for periodic boundary conditions in distance calculations,
pass the box information as `box=ag.dimensions` to any distance or angle function.

The calculation then applies the minimum image convention:
each measured distance is the smallest one among all periodic images of the two coordinates.

In [9]:
print(f'The box size of our Universe is {u.dimensions}')

The box size of our Universe is [80.017006 80.017006 80.017006 60.       60.       90.      ]


In [10]:
protein = u.select_atoms('protein')

print(f'The maximum distance without periodic boundaries is {distances.self_distance_array(protein.positions).max()}')

print(f'The maximum distance with periodic boundaries is {distances.self_distance_array(protein.positions, box=u.dimensions).max()}')

The maximum distance without periodic boundaries is 89.17199054119575
The maximum distance with periodic boundaries is 51.016247884634865
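To see why the periodic value is smaller, here is a minimal NumPy sketch of the minimum image convention. It assumes a rectangular (orthorhombic) box, which is simpler than the box in this Universe (its angles are 60, 60 and 90 degrees); `lib.distances` handles the general case. The function name `minimum_image_distance` and the coordinates are made up for illustration.

```python
import numpy as np

def minimum_image_distance(a, b, box_lengths):
    """Distance between a and b in a rectangular periodic box."""
    delta = a - b
    # Shift each component by a whole number of box lengths so that it
    # ends up within half a box length of zero.
    delta -= box_lengths * np.round(delta / box_lengths)
    return np.linalg.norm(delta, axis=-1)

box_lengths = np.array([10.0, 10.0, 10.0])
a = np.array([1.0, 1.0, 1.0])
b = np.array([9.0, 1.0, 1.0])

naive = np.linalg.norm(a - b)
wrapped = minimum_image_distance(a, b, box_lengths)
print(naive)    # 8.0, ignoring periodicity
print(wrapped)  # 2.0, measured through the box face
```

In a rectangular box no minimum image distance can exceed half the box diagonal, which is why switching on the box can only shorten distances, never lengthen them.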


# `capped_distance` and `self_capped_distance`

These functions only find distances up to a maximum cutoff. Each returns:
- an array of index pairs
- an array of the corresponding distances

This is much more efficient when dealing with large (>50,000 atoms) systems.

For example, the start of a hydrogen bond analysis might look like:

In [11]:
hydrogens = u.select_atoms('resname SOL and type H')
acceptor = u.select_atoms('protein and type N O')

print(f'We have {len(hydrogens)} hydrogens and {len(acceptor)} acceptors')

ix, dist = distances.capped_distance(hydrogens.positions,
                                     acceptor.positions,
                                     min_cutoff=1.0,
                                     max_cutoff=4.0,
                                     box=u.dimensions)

print(f'We found {len(ix)} contacts less than 4.0 A')
print()
print(f'The first three are {ix[:3]} at distances {dist[:3]}')

We have 22168 hydrogens and 609 acceptors
We found 2602 contacts less than 4.0 A

The first three are [[  1 133]
 [ 36 472]
 [ 37 472]] at distances [3.89182471 1.91282448 3.38900544]
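Each row of `ix` holds an index into the first array followed by an index into the second, and `dist` holds the matching distance. The brute-force NumPy sketch below mimics that return format on made-up coordinates without a box. The helper `capped_pairs` is hypothetical, its treatment of values exactly at the cutoffs is an assumption, and it has none of the speed advantage of the real function, which avoids computing every pair.

```python
import numpy as np

def capped_pairs(a, b, max_cutoff, min_cutoff=None):
    """Brute-force sketch: index pairs and distances within the cutoffs."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    mask = d <= max_cutoff
    if min_cutoff is not None:
        mask &= d > min_cutoff
    # Each row of pairs is [index into a, index into b]; d[mask] is in the same order.
    pairs = np.argwhere(mask)
    return pairs, d[mask]

a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 2.0], [10.0, 3.0, 0.0], [50.0, 0.0, 0.0]])

pairs, pair_dist = capped_pairs(a, b, max_cutoff=4.0, min_cutoff=1.0)
print(pairs)
print(pair_dist)
```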


In the timings below, `capped_distance` is roughly 13x faster than the brute-force `distance_array` (about 23 ms versus 297 ms).

In [12]:
%timeit distances.distance_array(hydrogens.positions, acceptor.positions, box=u.dimensions)

297 ms ± 817 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [13]:
%timeit distances.capped_distance(hydrogens.positions, acceptor.positions, min_cutoff=1.0, max_cutoff=4.0, box=u.dimensions)

22.6 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


# A summary of Lecture 2

Calculating pairwise distances:
- calc_bonds
- distance_array
- self_distance_array

Faster, sparse pairwise distances:
- capped_distance
- self_capped_distance

Calculating angles:
- calc_angles
- calc_dihedrals

Use `box=u.dimensions` to take minimum image convention into account (if you want to!).

## A summary of Lecture 1

Most simulation analysis will involve extracting position data from certain atoms.

- A `Universe` contains all information about a simulation system

- An `AtomGroup` contains information about a group of atoms

- We can use `Universe.select_atoms()` to create an `AtomGroup` containing specific atoms from a `Universe`

- Positions of atoms in an AtomGroup are accessed through `AtomGroup.positions`