## Using lib.distances to identify hydrogen bonds

This notebook walks through how to use various functions in `MDAnalysis.lib.distances` to identify hydrogen bonding between certain residues and the water solvent.

A hydrogen bond (in the context of this analysis) will be defined as an interaction between three atoms:
- An acceptor, which is attracting the hydrogen
- A hydrogen, which is being pulled into the acceptor
- A donor, which is bonded to the hydrogen and being dragged along for the ride.

We will use two geometric criteria:
- a hydrogen-acceptor distance within 3.0 Å
- an acceptor-hydrogen-donor angle of 120 degrees or more.
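
Written out as formulas, the two criteria look roughly as follows. The symbols $\mathbf{r}_A$, $\mathbf{r}_H$ and $\mathbf{r}_D$ (acceptor, hydrogen and donor positions) are introduced here for illustration only, and treating the boundary values as inclusive is an assumption:

$$
d_{HA} = \lVert \mathbf{r}_A - \mathbf{r}_H \rVert \le 3.0\ \text{Å}
$$

$$
\theta_{AHD} = \arccos\left( \frac{(\mathbf{r}_A - \mathbf{r}_H) \cdot (\mathbf{r}_D - \mathbf{r}_H)}{\lVert \mathbf{r}_A - \mathbf{r}_H \rVert \, \lVert \mathbf{r}_D - \mathbf{r}_H \rVert} \right) \ge 120^\circ
$$

The hydrogen sits at the vertex of the angle, so both vectors point away from $\mathbf{r}_H$. In a periodic system the difference vectors are taken under the minimum image convention.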

This notebook is quite dense, so feel free to go slowly and make sure that you understand each step.
For each step, try to understand what goes into each function and what comes out of it.
Feel free to explore and change the inputs to see how this affects the results.

We start with the usual imports and loading our system (here TPR and TRR).

In [None]:
import MDAnalysis as mda
from MDAnalysisTests.datafiles import TPR, TRR

import numpy as np

In [None]:
u = mda.Universe(TPR, TRR)

We then select our groups of interest.
First, we grab all the oxygen atoms on ASP or GLU residues.

In [None]:
sidegroups = u.select_atoms('resname ASP GLU')

acceptors = sidegroups.select_atoms('element O')

Then we select the hydrogen atoms from the SOL (solvent) molecules.

In [None]:
solvent = u.select_atoms('resname SOL')

hydrogens = solvent.select_atoms('element H')

## Distance criteria

We first want to identify hydrogens and acceptors that are within our distance criterion of 3.0 angstrom.
A naive approach is to calculate a distance array between all acceptors and all hydrogens:

In [None]:
%%time

da = mda.lib.distances.distance_array(acceptors.positions, hydrogens.positions, box=u.dimensions)

**Hint** `np.where` is a handy function for returning the *indices* of where a condition is True.  Here we use it to extract the row and column numbers of where an entry in a distance matrix is less than 3.0.

In [None]:
acc_idx, hyd_idx = np.where(da < 3.0)
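
To see what `np.where` returns for a 2D condition, here is a small sketch on a made-up distance matrix. The array `toy_da` and its values are hypothetical and only stand in for the real `da`:

```python
import numpy as np

# Hypothetical 2 x 3 distance matrix in angstrom
# (rows: acceptors, columns: hydrogens)
toy_da = np.array([[2.5, 4.0, 1.9],
                   [3.6, 2.9, 5.1]])

# For a 2D condition, np.where returns one index array per axis
rows, cols = np.where(toy_da < 3.0)

# Each (row, column) pair points at one entry below the cutoff
for r, c in zip(rows, cols):
    print(f"acceptor {r} - hydrogen {c}: {toy_da[r, c]:.1f}")
```

The two arrays always have the same length, and reading them side by side gives one acceptor-hydrogen pair per position.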

## Using capped distance

This is a great example of a case where we're not interested in all distances, only in those up to a given cutoff. Using `capped_distance` is much quicker here!

**Reminder**: The output of `capped_distance` is no longer a matrix, but an array of indices and the distance values at those indices.  This can be thought of as a sparse matrix.

Try experimenting with the cutoff distance to see how the time required varies.

In [None]:
%%time 

idx, dists = mda.lib.distances.capped_distance(acceptors.positions, hydrogens.positions, max_cutoff=3.0,
                                            box=u.dimensions)

The `idx` array is an `(n, 2)` array of indices. To grab the first and second columns, we can transpose the array (`.T`) and assign each row of the result to a variable: `acc_idx` for the *indices* of the acceptors and `hyd_idx` for the *indices* of the hydrogen atoms.

In [None]:
acc_idx, hyd_idx = idx.T
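
The sparse layout can be mimicked with plain NumPy on a toy matrix. This is only a sketch of the output *shape*: `capped_distance` does not build the dense matrix first, and the order of the pairs it returns may differ. All names prefixed with `toy_` are hypothetical:

```python
import numpy as np

# Hypothetical dense distance matrix (rows: acceptors, columns: hydrogens)
toy_da = np.array([[2.5, 4.0, 1.9],
                   [3.6, 2.9, 5.1]])

# Dense -> sparse: keep only the (row, column) pairs under the cutoff
toy_idx = np.argwhere(toy_da < 3.0)               # shape (n, 2)
toy_dists = toy_da[toy_idx[:, 0], toy_idx[:, 1]]  # shape (n,)

# Transposing gives shape (2, n), which unpacks into two index arrays
toy_acc_idx, toy_hyd_idx = toy_idx.T

print(toy_idx.shape, toy_acc_idx, toy_hyd_idx, toy_dists)
```

Entry `k` of `toy_dists` belongs to the pair `(toy_acc_idx[k], toy_hyd_idx[k])`, which is the same correspondence that holds between `dists` and `idx`.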

Remember that we can slice AtomGroups with numpy arrays. We can therefore use these index arrays to slice our original AtomGroups, which gives two new AtomGroups holding the acceptor and the hydrogen of each candidate pair, in matching order.

In [None]:
# select potential hydrogen bonds to check angles
potential_hbond_acceptors = acceptors[acc_idx]
potential_hbond_hydrogens = hydrogens[hyd_idx]
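
Indexing with an integer array keeps repeated indices, so an atom that takes part in several candidate pairs appears several times. The sketch below shows this with NumPy arrays of made-up atom labels standing in for the AtomGroups; the labels and index values are hypothetical:

```python
import numpy as np

# Stand-ins for two AtomGroups (hypothetical labels, not real atoms)
toy_acceptors = np.array(["OD1", "OD2", "OE1"])
toy_hydrogens = np.array(["HW1_a", "HW2_a", "HW1_b", "HW2_b"])

# Pair indices as a distance search might return them:
# acceptor 0 is close to two different hydrogens
toy_acc_idx = np.array([0, 0, 2])
toy_hyd_idx = np.array([1, 3, 0])

# Integer-array indexing repeats entries, so both results have one item per pair
paired_acceptors = toy_acceptors[toy_acc_idx]
paired_hydrogens = toy_hydrogens[toy_hyd_idx]

for acc, hyd in zip(paired_acceptors, paired_hydrogens):
    print(acc, "<-", hyd)
```

Because both results have one entry per pair, position `k` in one array always matches position `k` in the other, which is what the angle calculation later relies on.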

Getting the **donors** for each hydrogen bond is slightly trickier.
Each hydrogen has exactly one covalent bond, so we can simply loop over the hydrogen atoms and grab the first (and only) bonded atom of each.

**Reminder** `sum()` over `MDAnalysis.Atom` objects will produce an `AtomGroup`!

In [None]:
potential_hbond_donors = sum(h.bonded_atoms[0] for h in potential_hbond_hydrogens)

## Checking the angle criteria

Now that we've identified hydrogens and acceptors which are close enough for a hydrogen bond, we can check our angular criterion.
The acceptor-hydrogen-donor angle must be 120 degrees or more!

**Reminder**: The input to `calc_angles` must be in the correct order, with the second array of positions being the vertex of the angle.  Results are returned in radians!

By first checking the distance criteria and filtering down our input, we greatly reduce the number of angles we must calculate.
This is important as angle calculations are computationally more expensive than distance calculations.

In [None]:
angles = np.rad2deg(
    mda.lib.distances.calc_angles(potential_hbond_acceptors.positions,
                                  potential_hbond_hydrogens.positions,
                                  potential_hbond_donors.positions, box=u.dimensions)
)
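
To make the ordering rule concrete, here is a plain-NumPy sketch of a vertex angle calculation. It is not how `calc_angles` is implemented and it ignores the periodic box; the function `angles_at_vertex` and the toy coordinates are hypothetical:

```python
import numpy as np

def angles_at_vertex(a, b, c):
    """Angle a-b-c in radians for each row, with b as the vertex (no periodic box)."""
    ba = a - b
    bc = c - b
    # Row-wise dot product divided by the product of the vector lengths
    cos_theta = np.einsum("ij,ij->i", ba, bc) / (
        np.linalg.norm(ba, axis=1) * np.linalg.norm(bc, axis=1)
    )
    # Clipping guards against rounding pushing values just outside [-1, 1]
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Two toy triplets: a straight line and a right angle at the hydrogen
acc = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
hyd = np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
don = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

toy_angles = np.rad2deg(angles_at_vertex(acc, hyd, don))    # hydrogen at the vertex
wrong_angles = np.rad2deg(angles_at_vertex(hyd, acc, don))  # acceptor at the vertex

print(toy_angles)
print(wrong_angles)
```

With the hydrogen in the middle the straight triplet gives 180 degrees, while putting the acceptor in the middle gives 0 degrees for the same three atoms, so a swapped argument order would silently reject a perfectly linear hydrogen bond.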

Again we can use `np.where` to get the *indices* of where a condition is True, here where an angle is 120 degrees or more.

In [None]:
angle_idx = np.where(angles >= 120.0)

Finally, we can slice our candidate AtomGroups with `angle_idx` to get three AtomGroups, each representing a different component in a hydrogen bond.

In [None]:
hbond_acceptors = potential_hbond_acceptors[angle_idx]
hbond_hydrogens = potential_hbond_hydrogens[angle_idx]
hbond_donors = potential_hbond_donors[angle_idx]

In [None]:
hbond_donors

## Extension work

The `lib.distances` module is used heavily throughout MDAnalysis.

For further exercises:

- Currently bonds are guessed based upon distances between atoms.  Which functions could you use to find all pairs of atoms that are close enough to be bonded?

- A radial distribution function can be calculated using a histogram of distances.  Using `np.histogram` (to make a histogram), how could you calculate the distribution of distances between two AtomGroups?

### The Analysis class way

Because hydrogen bond analysis is so common, it already exists as an Analysis class.
I'm sorry about making you write it all over again,
but perhaps it teaches another lesson about checking for existing solutions before writing your own.

In [None]:
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis

In [None]:
hbonds = HydrogenBondAnalysis(u,
                             acceptors_sel='resname ASP GLU and element O',
                             hydrogens_sel='resname SOL and element H')

We can then run analysis for the first 5 frames of the trajectory.

**Reminder** By default all frames will be analysed. Defining `start`, `stop` and `step` in `run()` controls how the trajectory is sliced.

In [None]:
hbonds.run(stop=5)
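
The frame selection can be pictured with ordinary Python slicing. The sketch below assumes that `start`, `stop` and `step` behave like a Python slice over the frame numbers; the helper `frames_analysed` and the trajectory length of 10 are hypothetical:

```python
n_frames = 10  # hypothetical trajectory length

def frames_analysed(start=None, stop=None, step=None):
    """Frame numbers picked by a Python-style slice over the trajectory."""
    return list(range(n_frames))[start:stop:step]

print(frames_analysed(stop=5))           # first five frames
print(frames_analysed(start=2, step=3))  # every third frame from frame 2
```

Under that assumption, `stop=5` covers frames 0 to 4: the stop value itself is excluded, as in any Python slice.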

Results are then available through the `.results` attribute.
The format of this is slightly confusing. From the documentation:

"
Hydrogen bond data are returned in a numpy.ndarray on a “one line, one observation” basis and can be accessed via HydrogenBondAnalysis.results.hbonds:

```
results = [
    [
        <frame>,
        <donor index (0-based)>,
        <hydrogen index (0-based)>,
        <acceptor index (0-based)>,
        <distance>,
        <angle>
    ],
    ...
]
```
"

In [None]:
hbonds.results.hbonds[0]

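The second to fourth values are atom indices. They are displayed as floats because every value in the array shares one dtype, but they are really integers and can be used to look up the atoms involved: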

In [None]:
u.atoms[3561]

In [None]:
u.atoms[3563]

In [None]:
u.atoms[2492]
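
Instead of copying the indices by hand, a row can be unpacked and cast in one step. The sketch below uses a made-up row in the documented layout: the three atom indices are the ones looked up above (their order in the row is assumed), while the distance and angle values are invented for illustration:

```python
import numpy as np

# Hypothetical row: frame, donor, hydrogen, acceptor, distance, angle
row = np.array([0.0, 3561.0, 3563.0, 2492.0, 2.7, 165.0])

# The first four columns are whole numbers stored as floats, so cast them
frame, donor_ix, hydrogen_ix, acceptor_ix = row[:4].astype(int)
distance, angle = row[4:]

print(frame, donor_ix, hydrogen_ix, acceptor_ix, distance, angle)
```

The integer values can then be used directly for indexing, for example `u.atoms[donor_ix]`.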