In [1]:
import mdtraj as md
import numpy as np
import math

# Hydrogen bond analysis with GROMACS

This is a simple example of how to automate the calculation of hydrogen bonds using GROMACS. To automate the process, we feed GROMACS a text file listing the two groups we want to analyze.

First let us specify what we want to analyze. 

In [30]:
# path to the trajectory file
traj_path = '../../4_prod_f305/trimmed_470.xtc'

# path to the tpr (run input) file
tpr_path = '../../4_prod_f305/run.tpr'

# we also need to know the location of the index file
ndx_path = '../../index.ndx'


## Index file 

Let us look at the index file to see what current group options we have:

In [50]:
groups = !grep '\[' {ndx_path}

print(groups)

['[ System ]', '[ Other ]', '[ ucer2 ]', '[ chol ]', '[ ffa24 ]', '[ tip3p ]', '[ lipids ]']


Let us strip the brackets from these entries so we are left with a plain list of group names we can iterate over.

In [51]:
#strip out the square brackets and space
for i,group in enumerate(groups):
    group = group.strip('[]')
    group = group.strip()
    groups[i] = group
    
print(groups)

['System', 'Other', 'ucer2', 'chol', 'ffa24', 'tip3p', 'lipids']


GROMACS expects the selections as numerical indices (starting at 0), not strings, so System is 0, Other is 1, and so on. To store this information, we create a dictionary for each group that holds its name and index.

For the calculation, we focus only on the individual species (i.e., ucer2, chol, ffa24, and tip3p), not the aggregated groups (System, Other, and lipids). We leave the aggregated groups out when building the list of dictionary entries, which makes it easy to loop over all pairs of species.

* Note: besides the fact that their hydrogen bond counts can be obtained from the individual species calculations, we ignore the aggregated groups because the two groups in a calculation must either overlap fully (e.g., ucer2 with ucer2) or not at all (e.g., ucer2 with ffa24).
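
The name-to-index bookkeeping described above can also be done in plain Python, without shelling out. The following is a minimal sketch, assuming group headers in the index file appear as `[ name ]` on their own line. The function name `species_groups` and its `exclude` parameter are hypothetical helpers introduced for illustration; they are not part of GROMACS or of this notebook.

```python
def species_groups(ndx_text, exclude=('System', 'Other', 'lipids')):
    """Return [{'name': ..., 'index': ...}] for every non-excluded group.

    The index is the position of the group header in the file, counted
    from 0 across all groups (excluded groups still consume an index).
    """
    names = []
    for line in ndx_text.splitlines():
        line = line.strip()
        # assumption: a group header is a line of the form "[ name ]"
        if line.startswith('[') and line.endswith(']'):
            names.append(line[1:-1].strip())
    return [{'name': name, 'index': i}
            for i, name in enumerate(names)
            if name not in exclude]


# made-up index file contents, for illustration only
sample = "[ System ]\n1 2 3\n[ Other ]\n1 2 3\n[ ucer2 ]\n1 2\n[ chol ]\n3\n"
result = species_groups(sample)
print(result)
```

Because the index is taken before filtering, the remaining species keep the numbers that GROMACS assigns to them.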


In [52]:
# aggregated groups that we leave out of the pairwise analysis
aggregated = {'System', 'Other', 'lipids'}

stripped_group = []
for i, name in enumerate(groups):
    # keep the original position i, since that is the index GROMACS expects
    if name not in aggregated:
        stripped_group.append({'name': name, 'index': i})

In [54]:
for group in stripped_group:
    print(group['name'], group['index'])

ucer2 2
chol 3
ffa24 4
tip3p 5


## Automating the calculation

With the groups and indices determined, we loop over all pairs. The output of each calculation is saved to a file whose name contains the group names, e.g., `hbond_ucer2_ffa24.xvg`. The indices are written to a temporary file named `selection.txt` that we redirect into the standard input of the gmx program.

Note that we save the output filenames to a list so that we can easily parse the results later on.
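
As an alternative to the temporary file, the pair loop can be written with `itertools.combinations_with_replacement`, and the selection text can be handed to the program through standard input. This is only a sketch: `build_hbond_jobs` is a hypothetical helper, the executable name `gmx` is assumed to be on the `PATH`, the file names in the demo are placeholders, and the flags are the same ones used in this notebook.

```python
from itertools import combinations_with_replacement


def build_hbond_jobs(groups, traj_path, tpr_path, ndx_path, gmx='gmx'):
    """Return one (argv, stdin_text, output_file) tuple per group pair."""
    jobs = []
    # yields (i, j) with j >= i, in the same order as the nested loops
    for g_i, g_j in combinations_with_replacement(groups, 2):
        output_file = f"hbond_{g_i['name']}_{g_j['name']}.xvg"
        argv = [gmx, 'hbond', '-f', traj_path, '-s', tpr_path,
                '-n', ndx_path, '-num', output_file]
        # the two group indices, one per line, as gmx hbond prompts for them
        stdin_text = f"{g_i['index']}\n{g_j['index']}\n"
        jobs.append((argv, stdin_text, output_file))
    return jobs


# placeholder groups and paths, for illustration only
demo_groups = [{'name': 'ucer2', 'index': 2}, {'name': 'chol', 'index': 3}]
jobs = build_hbond_jobs(demo_groups, 'traj.xtc', 'run.tpr', 'index.ndx')
for argv, stdin_text, output_file in jobs:
    print(' '.join(argv), '<-', repr(stdin_text))
    # To actually run a job (requires GROMACS and `import subprocess`):
    # subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.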

In [46]:
filenames = []
for i in range(0, len(stripped_group)):
    for j in range(i, len(stripped_group)):
        index_i = stripped_group[i]['index']
        index_j = stripped_group[j]['index']
        name_i = stripped_group[i]['name']
        name_j = stripped_group[j]['name']
        print(index_i, index_j, '##', name_i, name_j)

        
        # write the two group indices that gmx hbond will read from stdin
        with open('selection.txt', 'w') as text_file:
            text_file.write(f'{index_i}\n{index_j}\n')

        output_file = f'hbond_{name_i}_{name_j}.xvg'
        filenames.append(output_file)
        msg = f'/usr/local/gromacs/bin/gmx hbond -f {traj_path} -s {tpr_path} -n {ndx_path} -num {output_file} < selection.txt'
        
        print(msg)
        !{msg}

2 2 ## ucer2 ucer2
/usr/local/gromacs/bin/gmx hbond -f ../../4_prod_f305/trimmed_470.xtc -s ../../4_prod_f305/run.tpr -n ../../index.ndx -num hbond_ucer2_ucer2.xvg < selection.txt >log.txt
                      :-) GROMACS - gmx hbond, 2022.2 (-:

Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /Users/cri/Dropbox/Mac (3)/Documents/Projects/CER_reverse_mapped_v3_allext/analysis_scripts/hydrogen_bonding
Command line:
  gmx hbond -f ../../4_prod_f305/trimmed_470.xtc -s ../../4_prod_f305/run.tpr -n ../../index.ndx -num hbond_ucer2_ucer2.xvg

Reading file ../../4_prod_f305/run.tpr, VERSION 2020.6 (single precision)
Note: file tpx version 119, software tpx version 127
Group     0 (         System) has 271956 elements
Group     1 (          Other) has 271956 elements
Group     2 (          ucer2) has 103200 elements
Group     3 (           chol) has 29600 elements
Group     4 (          ffa24) has 59200 elements
Group     5 (          tip3p) has 79956 e

## Parsing the data

The files we generated are easy to read back in. They were written to the execution directory, so `full_path` is simply set to the filename.
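
The cell below skips a fixed number of header lines. A more tolerant approach, sketched here under the assumption that header and metadata lines in an `.xvg` file start with `#` or `@`, is to filter those lines out before converting the rest to numbers. `parse_xvg` is a hypothetical helper, and the sample text is made up purely for illustration.

```python
def parse_xvg(text):
    """Return a list of rows, each a list of floats, from .xvg text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # assumption: '#' marks comments and '@' marks plot directives
        if not line or line.startswith(('#', '@')):
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# made-up sample, not real simulation output
sample = """# comment line
@ title "Hydrogen Bonds"
0 10 40
1 12 44
"""
rows = parse_xvg(sample)
hbonds = [row[1] for row in rows]
pairs = [row[2] for row in rows]
mean_hbonds = sum(hbonds) / len(hbonds)
mean_pairs = sum(pairs) / len(pairs)
print(mean_hbonds, mean_pairs, mean_hbonds / mean_pairs)
```

With NumPy available, the rows can be wrapped as `np.array(rows)` to slice columns directly.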

In [55]:
from os.path import exists

hbond_data = []
hbond_dict = []
for file in filenames:
    full_path = file
    breakdown = file.split('_')
    breakdown2 = breakdown[-1].split('.')

    
    header = ['time', 'hbonds', 'pairs']
    
    file_exists = exists(full_path)
    
    if file_exists:
        # skip the first 25 lines, which hold the .xvg header
        B = np.genfromtxt(full_path, names=header, dtype=None, skip_header=25)

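        # filenames look like 'hbond_<i>_<j>.xvg', so breakdown[0] is the
        # 'hbond' prefix, breakdown[1] is group i, and breakdown2[0] is group j
        data_dict = { "i": breakdown[1],
                      "j": breakdown2[0],
                      "hbonds" : [B[:]['hbonds']] }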
        hbond_dict.append(data_dict)

        hbond_data.append(B)
        # columns: file, mean hbonds, std hbonds, mean pairs, mean hbonds / mean pairs
        print(file, '\t' , np.mean(B[:]['hbonds']), np.std(B[:]['hbonds']),  np.mean(B[:]['pairs']), 
              np.mean(B[:]['hbonds'])/ np.mean(B[:]['pairs']))

hbond_ucer2_ucer2.xvg 	 761.4084507042254 17.527505518822657 4109.179577464789 0.1852945183704982
hbond_ucer2_chol.xvg 	 204.11971830985917 12.342519199998872 413.90492957746477 0.49315604556397763
hbond_ucer2_ffa24.xvg 	 544.4894366197183 19.428359581308733 889.8521126760563 0.6118875584643753
hbond_ucer2_tip3p.xvg 	 695.8732394366198 19.41661810203144 1330.5669014084508 0.5229900418385682
hbond_chol_chol.xvg 	 0.2007042253521127 0.4177398069614321 0.7992957746478874 0.2511013215859031
hbond_chol_ffa24.xvg 	 97.28169014084507 6.836486822411156 189.21478873239437 0.5141336509295271
hbond_chol_tip3p.xvg 	 95.89084507042253 7.059010872432736 200.1232394366197 0.4791589689451922
hbond_ffa24_ffa24.xvg 	 238.75 8.56648604146365 1046.5633802816901 0.2281276074610395
hbond_ffa24_tip3p.xvg 	 588.2147887323944 14.402158715261798 1001.330985915493 0.5874329237845403
hbond_tip3p_tip3p.xvg 	 45475.81338028169 86.00108722607321 89533.7852112676 0.5079179135895471
