<a href="https://colab.research.google.com/github/dgoppenheimer/Molecular-Dynamics/blob/main/gromacs_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installation of GROMACS

### Download GROMACS

These instructions were largely adapted from [Installing Software on Google Colab for IBM3202 tutorials](https://colab.research.google.com/github/pb3lab/ibm3202/blob/master/tutorials/lab00_software.ipynb). I updated the GROMACS version, and to do that I needed to upgrade `cmake`.

<mark>Installation of this software takes about 40 min.</mark> Therefore, we will save the compiled software on Google Drive to save time later.
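Because the build is slow, it can be worth checking whether a compiled copy already exists on Drive before starting. The sketch below assumes Drive is already mounted at `/content/drive` and that the build is saved under `IBM3202/gromacs-2021`, the location used later in this notebook. `has_saved_build` is a hypothetical helper name, not part of GROMACS or Colab.

```python
from pathlib import Path

# Location used later in this notebook for the saved build (assumes Drive is mounted).
SAVED_PREFIX = Path("/content/drive/MyDrive/IBM3202/gromacs-2021")

def has_saved_build(prefix):
    """Return True if both bin/GMXRC and bin/gmx exist under prefix."""
    bin_dir = Path(prefix) / "bin"
    return (bin_dir / "GMXRC").is_file() and (bin_dir / "gmx").is_file()

if has_saved_build(SAVED_PREFIX):
    print("Found a saved GROMACS build; the compile steps can be skipped.")
else:
    print("No saved build found; continue with the installation below.")
```

This only checks that the two files are present; it does not verify that the binary actually runs.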



<mark>**VERY IMPORTANT FIRST STEP:**</mark> Go to the Menu &#8594; *Runtime* &#8594; *Change Runtime Type* and choose GPU!

**Note:** a page reload will be required. This is okay.

First, let's confirm that we are in the correct directory.

In [None]:
!pwd

/content


In [None]:
#Download GROMACS 2021.5
!wget https://ftp.gromacs.org/gromacs/gromacs-2021.5.tar.gz

--2022-03-22 20:23:10--  https://ftp.gromacs.org/gromacs/gromacs-2021.5.tar.gz
Resolving ftp.gromacs.org (ftp.gromacs.org)... 130.237.11.165, 2001:6b0:1:1191:216:3eff:fec7:6e30
Connecting to ftp.gromacs.org (ftp.gromacs.org)|130.237.11.165|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38023772 (36M) [application/x-gzip]
Saving to: ‘gromacs-2021.5.tar.gz’


2022-03-22 20:23:13 (19.4 MB/s) - ‘gromacs-2021.5.tar.gz’ saved [38023772/38023772]



### Install GROMACS

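We will install the software into `/content/gromacs-2021`. Files under `/content` are deleted when the runtime is recycled, so we will later copy the installation to Google Drive, where it persists after we quit this notebook.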

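Use the `%%bash` cell magic to run several `bash` commands in the same code cell. Each `%%bash` cell runs in its own subshell, so a `cd` inside it does not carry over to later cells; use `%cd` when the directory change has to persist.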

In [None]:
%%bash
# extracting the software
tar xfz gromacs-2021.5.tar.gz
echo "GROMACS extraction completed"

GROMACS extraction completed


tar (child): gromacs-2021.5.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now


In [None]:
%%bash
# create and enter the build directory
cd gromacs-2021.5
mkdir build
cd build

bash: line 1: cd: gromacs-2021.5: No such file or directory


In [None]:
# check the cmake version
!cmake --version

cmake version 3.12.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).


In [None]:
!apt remove cmake

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  cmake-data libarchive13 liblzo2-2 librhash0 libuv1
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  cmake
0 upgraded, 0 newly installed, 1 to remove and 39 not upgraded.
After this operation, 17.5 MB disk space will be freed.
(Reading database ... 155335 files and directories currently installed.)
Removing cmake (3.10.2-1ubuntu2.18.04.2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
W: Operation was interrupted before it could finish


In [None]:
!pip install cmake --upgrade

[31mERROR: Operation cancelled by user[0m


In [None]:
!cmake --version

cmake version 3.12.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).


In [None]:
!pwd

/content


In [None]:
%cd gromacs-2021.5/build/

[Errno 2] No such file or directory: 'gromacs-2021.5/build/'
/content


In [None]:
!cmake --version

cmake version 3.12.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).


In [None]:
!apt remove cmake

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package 'cmake' is not installed, so not removed
The following packages were automatically installed and are no longer required:
  cmake-data libarchive13 liblzo2-2 librhash0 libuv1
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.


In [None]:
!pip install cmake --upgrade

Collecting cmake
  Downloading cmake-3.22.3-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.5 MB)
[K     |████████████████████████████████| 22.5 MB 6.2 MB/s 
[31mERROR: Operation cancelled by user[0m
[?25h

In [None]:
!cmake --version

cmake version 3.12.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).


In [None]:
#@title make
%%bash
# had to change -DGMX_GPU=on to -DGMX_GPU=CUDA.
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA -DCMAKE_INSTALL_PREFIX=/content/gromacs-2021

CMake Error: The source directory "/" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.


In [None]:
%%bash
make
# ~20 min?

make: *** No targets specified and no makefile found.  Stop.


In [None]:
%%bash
make check
# 31 min

make: *** No rule to make target 'check'.  Stop.


In [None]:
%%bash
make install

make: *** No rule to make target 'install'.  Stop.


We now check that the installation was successful by sourcing `GMXRC`, which adds GROMACS to the `PATH`, and then running `gmx -h`.

In [None]:
##Checking that GROMACS was successfully installed
%%bash
source /content/gromacs-2021/bin/GMXRC
gmx -h

bash: line 1: /content/gromacs-2021/bin/GMXRC: No such file or directory
bash: line 2: gmx: command not found


<mark>SWEET!</mark>

In [None]:
!pwd

/content


In [None]:
%cd ../../drive/MyDrive

[Errno 2] No such file or directory: '../../drive/MyDrive'
/content


In [None]:
# Copy the compiled GROMACS to your Google Drive
# We create (or reuse) the IBM3202 folder to hold compiled programs
import shutil
from pathlib import Path

IBM3202 = Path("/content/drive/MyDrive/IBM3202/")
if IBM3202.exists():
  print("IBM3202 already exists")
else:
  IBM3202.mkdir()
  print("IBM3202 did not exist and was successfully created")
# Then copy the compiled GROMACS into this folder
shutil.copytree('/content/gromacs-2021', IBM3202/'gromacs-2021')
#!cp -d -r /content/gromacs-2021 "$IBM3202"/gromacs-2021
print("GROMACS successfully backed up!")

IBM3202 already exists


FileNotFoundError: ignored

## Important Code to Run

**Connect to a runtime**  
The code cells below need to be run each time you return to Colab.  
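The cells below read from `/content/drive/MyDrive`, which exists only after Google Drive has been mounted in the runtime. This notebook does not show that step, so here is a minimal sketch of it. It assumes the default mount point `/content/drive`; `mount_drive` is a hypothetical wrapper name, while `drive.mount` is the call provided by the `google.colab` package.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        print("Not running in Colab; skipping the Drive mount.")
        return False
    drive.mount(mount_point)  # prompts for authorization the first time
    return True

mounted = mount_drive()
```

Outside Colab the function simply reports that it skipped the mount, so the cell is harmless to run elsewhere.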


In [None]:
# This gets you into the correct directory
%cd /content/drive/MyDrive/

/content/drive/MyDrive


In [None]:
# Give permissions to run gmx
!chmod 755 -R /content/drive/MyDrive/IBM3202/gromacs-2021

In [None]:
# Import stuff for graphing
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from pathlib import Path

## Using GROMACS

In [None]:
# Checking that our GROMACS works
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx -h

/content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC: line 13: /content/gromacs-2021/bin/GMXRC.bash: No such file or directory
bash: line 2: gmx: command not found


The copied `GMXRC` still points to the original install location, so we need to change the path. On line 13, change 

```bash
. /content/gromacs-2021/bin/GMXRC.bash
```
to 
```bash
. /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC.bash
```


In [None]:
# Checking that our GROMACS works
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx -h

bash: line 2: gmx: command not found


The path in `GMXRC.bash` needs the same fix. Change line 53 to `GMXPREFIX=/content/drive/MyDrive/IBM3202/gromacs-2021`.
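Instead of editing the two files by hand, the same change could be scripted. The sketch below replaces every occurrence of the build-time prefix with the Drive prefix, which is broader than touching only lines 13 and 53, so treat it as an assumption that no other use of the old path should be kept. `relocate_file` is a hypothetical helper name.

```python
from pathlib import Path

OLD_PREFIX = "/content/gromacs-2021"                         # prefix used at build time
NEW_PREFIX = "/content/drive/MyDrive/IBM3202/gromacs-2021"   # location of the copy on Drive

def relocate_file(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Rewrite old -> new inside a text file; return True if the file changed."""
    path = Path(path)
    original = path.read_text()
    updated = original.replace(old, new)
    if updated != original:
        path.write_text(updated)
    return updated != original

# Apply to the two scripts mentioned above, if they are present.
bin_dir = Path(NEW_PREFIX) / "bin"
for name in ("GMXRC", "GMXRC.bash"):
    script = bin_dir / name
    if script.is_file():
        changed = relocate_file(script)
        print(name, "updated" if changed else "unchanged")
```

Because the new prefix does not contain the old one as a substring, running the cell a second time leaves the files unchanged.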

In [None]:
# Checking that our GROMACS works
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx -h

bash: line 2: /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx: Permission denied


Ouch! 

In [None]:
!chmod 755 -R /content/drive/MyDrive/IBM3202/gromacs-2021

In [None]:
# Checking that our GROMACS works
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx -h

<mark>SWEET! Finally!</mark>

Okay, it looks like the saved GROMACS binary is working.

For this test, we will follow the excellent tutorial by Justin Lemkul, [Lysozyme in water](http://www.mdtutorials.com/gmx/lysozyme/index.html), but we will use a different protein. Here we will use the human prion protein (PDB ID: 1QLZ).

In [None]:
# Download the coordinate file itself; the structure page URL
# (https://www.rcsb.org/structure/1QLZ) returns HTML, not a PDB file.
!wget -O 1qlz.pdb https://files.rcsb.org/download/1QLZ.pdb



## Preparing Structure Files

The 1QLZ structure was solved by NMR, so 20 models were deposited in a single `.pdb` file. We need only one of them for the simulation, so we will extract the first model. To do that we will use the *stream editor*, `sed`, because `grep` works line by line and we want to collect a block of lines<a name="cite_ref-1"></a>[<sup>[1]</sup>](#cite_note-1).

The `.pdb` file has the following format:

```pdb
MODEL        1                                                                  
ATOM      1  N   LEU A 125       4.329 -12.012   2.376  1.00  0.00           N  
ATOM      2  CA  LEU A 125       5.029 -10.769   2.674  1.00  0.00           C  
...
ENDMDL                                                                          
MODEL        2                                                                  
ATOM      1  N   LEU A 125       5.962 -12.281  -0.586  1.00  0.00           N  
ATOM      2  CA  LEU A 125       6.228 -10.948  -0.052  1.00  0.00           C  
...
```

Note that each model is preceded by a line that designates the model number (`MODEL 1`, `MODEL 2`, and so on) and ends with the line `ENDMDL`. We can use these as starting and stopping patterns for each model that we want to extract from this file.

*Regex*, short for *Regular Expressions*<a name="cite_ref-2"></a>[<sup>[2]</sup>](#cite_note-2), is a pattern language for searching and replacing text (it is not suited to parsing `html` files; see [Stackoverflow](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)). In our case we want to extract the text between the patterns `MODEL 1` and `ENDMDL` (including both pattern lines) into a new file. I adapted the code below from [Stackoverflow](https://stackoverflow.com/questions/4857424/extract-lines-between-2-tokens-in-a-text-file-using-bash); to use `sed` on macOS I had to add the `-E` option. The `-E` option may or may not be necessary on Colab.

```bash
# Here is the sed command
sed -E -n '/^MODEL +1 /,/^ENDMDL/w 1qLz-model1.pdb' 1qlz.pdb
```

#### Explanation of command

- `sed -E` use extended regular expressions  
- `-n` do not echo every line to output  
- `'/START/,/STOP/'` the range of lines to act on; it starts at a line that begins with `MODEL 1` and ends at the next line that begins with `ENDMDL`  
- `^` is Regex for the beginning of a line  
- `<space>+` matches one or more spaces  
- `1<space>` matches the number 1 followed by a space (otherwise models 10 through 19 would match too)  
- `w 1qLz-model1.pdb` write the matching lines to the file `1qLz-model1.pdb`  
- `1qlz.pdb` is the input file
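The same extraction can be sketched in Python, which may be easier to reuse inside a notebook cell. This is an illustration, not a replacement for the `sed` command: `extract_model` is a hypothetical helper, it uses `\b` where the `sed` pattern uses a trailing space, and it stops after the first `ENDMDL` instead of scanning the rest of the file. The sample text is a shortened version of the format shown earlier.

```python
import re

def extract_model(pdb_text, model_number=1):
    """Return the lines from 'MODEL <n>' through the next 'ENDMDL', inclusive."""
    # \b keeps 'MODEL 1' from also matching 'MODEL 10' ... 'MODEL 19'.
    start = re.compile(rf"^MODEL +{model_number}\b")
    kept, inside = [], False
    for line in pdb_text.splitlines():
        if not inside and start.match(line):
            inside = True
        if inside:
            kept.append(line)
            if line.startswith("ENDMDL"):
                break
    return "\n".join(kept) + ("\n" if kept else "")

sample = """MODEL        1
ATOM      1  N   LEU A 125       4.329 -12.012   2.376  1.00  0.00           N
ENDMDL
MODEL        2
ATOM      1  N   LEU A 125       5.962 -12.281  -0.586  1.00  0.00           N
ENDMDL
"""

model1 = extract_model(sample)
print(model1)
```

To apply it to the real file, read `1qlz.pdb` into a string, pass it to `extract_model`, and write the result to `1qLz-model1.pdb`.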

---
<a name="cite_note-1"></a>1. It is possible to extract blocks of text using `grep` but it is not as easy as using `sed`.[&#8617;](#cite_ref-1)

<a name="cite_note-2"></a>2. The [Python Regex Cheat Sheet](https://www.geeksforgeeks.org/python-regex-cheat-sheet/) has a list of many common regular expressions and is a useful reference.[&#8617;](#cite_ref-2)



#### Questions

Look at your single-model file. How many amino acids are in the protein? (Hint: you can use `grep` to quickly determine this).

## Preparing the Simulation

For this test, I will follow along with the [Molecular Modeling Practical](http://md.chem.rug.nl/~mdcourse/molmod2012/md.html) and with the [Lysozyme in water](http://www.mdtutorials.com/gmx/lysozyme/01_pdb2gmx.html) tutorial.

Note that the protein structure file we are using has no missing loops or other problems.

### Structure Conversion And Topology

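The Molecular Modeling Practical uses the GROMOS 45a3 force field with the SPC water model. In the run below we instead use the OPLS-AA force field with the SPC/E water model.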

To make things easier, rename the `1qLz-model1.pdb` file to `protein.pdb`.



In [None]:
%cd /content/drive/MyDrive/

/content/drive/MyDrive


In [None]:
!pwd

/content/drive/MyDrive


In [None]:
%mv 1qLz-model1.pdb protein.pdb

Usually `pdb2gmx` asks us to select the force field and the water model interactively, but on Colab we have to specify them on the command line with the `-ff` and `-water` options.


```bash
gmx pdb2gmx -f protein.pdb -o protein.gro -p protein.top -ignh -water spce -ff oplsaa
```

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx pdb2gmx -f protein.pdb -o protein.gro -p protein.top -ignh -water spce -ff oplsaa

Using the Oplsaa force field in directory oplsaa.ff

going to rename oplsaa.ff/aminoacids.r2b
Reading protein.pdb...
Read '', 877 atoms

Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.

There are 1 chains and 0 blocks of water and 104 residues with 877 atoms

  chain  #res #atoms

  1 'A'   104    877  

All occupancies are one

Reading residue database... (Oplsaa)

Processing chain 1 'A' (877 atoms, 104 residues)

Identified residue LEU125 as a starting terminus.

Identified residue ARG228 as a ending terminus.
Start terminus LEU-125: NH3+
End terminus ARG-228: COO-

Checking for duplicate atoms....

Generating any missing hydrogen atoms and/or adding termini.

Now there are 104 residues with 1694 atoms

Making bonds...

Number of bonds was 1715, now 1715

Generating angles, dihedrals and pairs...

Making cmap torsions...

There are 4470 dihedrals,  381 impropers, 3053 angles
          4393 pairs,     1715 bonds and     0 virtual sites

Total ma

                     :-) GROMACS - gmx pdb2gmx, 2021.5 (-:

                            GROMACS is written by:
     Andrey Alekseenko              Emile Apol              Rossen Apostolov     
         Paul Bauer           Herman J.C. Berendsen           Par Bjelkmar       
       Christian Blau           Viacheslav Bolnykh             Kevin Boyd        
     Aldert van Buuren           Rudi van Drunen             Anton Feenstra      
    Gilles Gouaillardet             Alan Gray               Gerrit Groenhof      
       Anca Hamuraru            Vincent Hindriksen          M. Eric Irrgang      
      Aleksei Iupinov           Christoph Junghans             Joe Jordan        
    Dimitrios Karkoulis            Peter Kasson                Jiri Kraus        
      Carsten Kutzner              Per Larsson              Justin A. Lemkul     
       Viveca Lindahl            Magnus Lundborg             Erik Marklund       
        Pascal Merz             Pieter Meulenhoff            Teemu Mu

#### Questions

Write down the number of atoms before and after the conversion and explain the difference.

List the atoms, atom types, and charges from a tyrosine residue as given in the topology file.

### Energy Minimization

Use the `minim.mdp` file from [here](http://md.chem.rug.nl/~mdcourse/molmod2012/minim.mdp). Transfer it to your `/content/drive/MyDrive` directory.

NOTE: this `.mdp` file causes a fatal error with GROMACS 2021.5, the version installed here.

```bash
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f minim.mdp -c protein.gro -p protein.top -o protein-EM-vacuum.tpr
```



In [None]:
%%bash
# this had a fatal error
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f minim.mdp -c protein.gro -p protein.top -o protein-EM-vacuum.tpr

### Solvation

The commands below are from Justin Lemkul's [Lysozyme in water](http://www.mdtutorials.com/gmx/lysozyme/03_solvate.html) tutorial.

```bash
gmx editconf -f protein.gro -o protein_newbox.gro -c -d 1.0 -bt cubic
```
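As a rough check on what `-d 1.0 -bt cubic` does: for a cubic box, the edge length should be the system diameter plus the solute-to-box distance on both sides. The sketch below applies that arithmetic to a diameter of 5.100 nm, the value `editconf` reports for this protein. The function name is hypothetical, and the formula is my reading of the `-d` option rather than something stated in the tutorial.

```python
def cubic_box_edge(diameter_nm, distance_nm):
    """Edge of a cubic box that leaves distance_nm on each side of the solute."""
    return diameter_nm + 2.0 * distance_nm

edge = cubic_box_edge(5.100, 1.0)   # diameter from the editconf output, -d 1.0
volume = edge ** 3
print(f"edge = {edge:.3f} nm, volume = {volume:.2f} nm^3")
```

This gives an edge of 7.100 nm, which agrees with the `new box vectors` line in the output. The volume comes out near 357.9 nm^3, slightly below the reported 357.96 nm^3, presumably because the printed diameter is rounded.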


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx editconf -f protein.gro -o protein_newbox.gro -c -d 1.0 -bt cubic

Note that major changes are planned in future for editconf, to improve usability and utility.
Read 1694 atoms
Volume: 63.8311 nm^3, corresponds to roughly 28700 electrons
No velocities found
    system size :  4.966  3.770  3.410 (nm)
    diameter    :  5.100               (nm)
    center      : -0.395  0.122 -0.045 (nm)
    box vectors :  4.966  3.770  3.410 (nm)
    box angles  :  90.00  90.00  90.00 (degrees)
    box volume  :  63.83               (nm^3)
    shift       :  3.945  3.428  3.595 (nm)
new center      :  3.550  3.550  3.550 (nm)
new box vectors :  7.100  7.100  7.100 (nm)
new box angles  :  90.00  90.00  90.00 (degrees)
new box volume  : 357.96               (nm^3)



In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx solvate -cp protein_newbox.gro -cs spc216.gro -o protein_solv.gro -p protein.top


         based on residue and atom names, since they could not be
         definitively assigned from the information in your input
         files. These guessed numbers might deviate from the mass
         and radius of the atom type. Please check the output
         files if necessary.

NOTE: From version 5.0 gmx solvate uses the Van der Waals radii
from the source below. This means the results may be different
compared to previous GROMACS versions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
A. Bondi
van der Waals Volumes and Radii
J. Phys. Chem. 68 (1964) pp. 441-451
-------- -------- --- Thank You --- -------- --------

Adding line for 11130 solvent molecules with resname (SOL) to topology file (protein.top)



### Adding Ions

Use the `.mdp` file from [here](http://www.mdtutorials.com/gmx/lysozyme/Files/ions.mdp).

```bash
gmx grompp -f ions.mdp -c protein_solv.gro -p protein.top -o ions.tpr
```

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f ions.mdp -c protein_solv.gro -p protein.top -o ions.tpr

Setting the LD random seed to -1627914545

Generated 330891 of the 330891 non-bonded parameter combinations

Generated 330891 of the 330891 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'Protein_chain_A'

Excluding 2 bonded neighbours molecule type 'SOL'
Analysing residue names:
There are:   104    Protein residues
There are: 11130      Water residues
Analysing Protein...

This run will generate roughly 3 Mb of data



In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx genion -s ions.tpr -o protein_solv_ions.gro -p protein.top -pname NA -nname CL -neutral

Will try to add 3 NA ions and 0 CL ions.
Select a continuous group of solvent molecules



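`gmx genion` asks interactively which group of solvent molecules to replace, so the cell above gets no further than the prompt. On Colab we have to supply the answer on standard input; see [Simulating User Interaction In Gromacs in Bash](https://stackoverflow.com/questions/45885541/simulating-user-interaction-in-gromacs-in-bash). Below we pass group 13 (`SOL`) with a here-document.

Note that `echo 13 | gmx genion ...` should work too.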
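The same idea can be sketched from Python with `subprocess.run`, which accepts the text to send on standard input. `run_with_answers` is a hypothetical helper. The demonstration uses a small Python child process as a stand-in for an interactive tool, so the cell runs even without GROMACS. The commented `gmx genion` call assumes `gmx` is on the `PATH`, which is only true after `GMXRC` has been sourced in the same environment.

```python
import subprocess
import sys

def run_with_answers(command, answers):
    """Run command, feeding each answer on its own line to standard input."""
    stdin_text = "\n".join(str(a) for a in answers) + "\n"
    result = subprocess.run(command, input=stdin_text, text=True,
                            capture_output=True, check=True)
    return result.stdout

# Stand-in for an interactive tool: a child process that echoes what it was sent.
echo_child = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
reply = run_with_answers(echo_child, [13])
print(reply)

# With gmx on the PATH (an assumption), the genion call would look like:
# run_with_answers(
#     ["gmx", "genion", "-s", "ions.tpr", "-o", "protein_solv_ions.gro",
#      "-p", "protein.top", "-pname", "NA", "-nname", "CL", "-neutral"],
#     [13],
# )
```

Passing a list of answers makes it easy to handle tools that ask more than one question, such as the `10 0` selection given to `gmx energy` later on.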

In [None]:
# try this
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx genion -s ions.tpr -o protein_solv_ions.gro -p protein.top -pname NA -nname CL -neutral <<EOF
13
EOF

Will try to add 3 NA ions and 0 CL ions.
Select a continuous group of solvent molecules
Selected 13: 'SOL'

Processing topology
Replacing 3 solute molecules in topology file (protein.top)  by 3 NA and 0 CL ions.



### Energy Minimization

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f minim.mdp -c protein_solv_ions.gro -p protein.top -o em.tpr

Setting the LD random seed to -537658913

Generated 330891 of the 330891 non-bonded parameter combinations

Generated 330891 of the 330891 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'Protein_chain_A'

Excluding 2 bonded neighbours molecule type 'SOL'

Excluding 1 bonded neighbours molecule type 'NA'
Analysing residue names:
There are:   104    Protein residues
There are: 11127      Water residues
There are:     3        Ion residues
Analysing Protein...
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 60x60x60, spacing 0.118 0.118 0.118

Estimate for the relative computational load of the PME mesh part: 0.26

This run will generate roughly 3 Mb of data



In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx mdrun -v -deffnm em


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx energy -f em.edr -o potential.xvg <<EOF
10 0
EOF


Statistics over 855 steps [ 0.0000 through 854.0000 ps ], 1 data sets
All statistics are over 677 points (frames)

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Potential                   -583233      11000    28087.3     -74421  (kJ/mol)



```py
import numpy as np
import matplotlib.pyplot as plt

x, y = [], []

with open("data.xvg") as f:
    for line in f:
        # skip xvg comment (#) and formatting (@) lines
        if line.startswith(("#", "@")):
            continue
        cols = line.split()

        if len(cols) == 2:
            x.append(float(cols[0]))
            y.append(float(cols[1]))


fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Plot title...")
ax1.set_xlabel('your x label..')
ax1.set_ylabel('your y label...')
ax1.plot(x, y, c='r', label='the data')
leg = ax1.legend()
plt.show()
```

Alternatively, let NumPy skip both kinds of comment lines:

```py
x, y = np.loadtxt("file.xvg", comments=["#", "@"], unpack=True)
plt.plot(x, y)
```

In [None]:
%cd /content/drive/MyDrive/

/content/drive/MyDrive


In [None]:
# import stuff
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import plotly.express as px
from pathlib import Path

In [None]:
# Remove the comments from the xvg file and write a new file
!grep -v '^[#@]' potential.xvg > potential.csv

In [None]:
# create the graph
df = pd.read_csv('potential.csv', sep=r'\s+',
        header=None, names=["Time (ps)", "kJ/mol"]) # the file has no header row
fig = px.line(df, x="Time (ps)", y="kJ/mol")
fig.update_layout(width=700, title_text="GROMACS Energies")
fig.show()

Let's see if I can do it another way.

```py
data = np.loadtxt('potential.xvg',comments=['#', '@'])
```

I want to try to import the data without having to `grep` it into a new file.

In [None]:
# create the graph
import numpy as np
import pandas as pd

a = np.loadtxt('potential.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame = pd.DataFrame(a, columns=header)

fig = px.line(frame, x="x", y="y")
fig.update_xaxes(title_text="Time (ps)") # label x-axis
fig.update_yaxes(title_text="kJ/mol") # label y-axis
fig.update_layout(width=700, title_text="GROMACS Energies")
fig.show()

## Equilibration

Upload the `nvt.mdp` file to Colab.

In [None]:
!chmod 755 -R /content/drive/MyDrive/IBM3202/gromacs-2021

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p protein.top -o nvt.tpr

Setting the LD random seed to -403963915

Generated 330891 of the 330891 non-bonded parameter combinations

Generated 330891 of the 330891 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'Protein_chain_A'

turning H bonds into constraints...

Excluding 2 bonded neighbours molecule type 'SOL'

turning H bonds into constraints...

Excluding 1 bonded neighbours molecule type 'NA'

turning H bonds into constraints...

Setting gen_seed to -13296898

Velocities were taken from a Maxwell distribution at 300 K
Analysing residue names:
There are:   104    Protein residues
There are: 11127      Water residues
There are:     3        Ion residues
Analysing Protein...
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...

Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K

Calculated rlist for 1x1 atom pair-list as 1.035 nm, buffer size 0.035 nm

Set rlist, assuming 4x4 atom pair-list, to 1.000 nm, buffer size 0.000 nm

N


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx mdrun -deffnm nvt

# this took 1 min to run instead of 1 hr

                      :-) GROMACS - gmx mdrun, 2021.5 (-:


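Extract the temperature from the energy file so that we can plot it. The input `16 0` selects term 16 (Temperature, as the output below confirms), and `0` ends the selection.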

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx energy -f nvt.edr -o temperature.xvg  <<EOF
16 0
EOF


Statistics over 50001 steps [ 0.0000 through 100.0000 ps ], 1 data sets
All statistics are over 501 points

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Temperature                  300.08       0.19    2.90712   0.674016  (K)


                      :-) GROMACS - gmx energy, 2021.5 (-:


Let's plot the temperature vs time. First we'll look at the file.

In [None]:
# glance at the file
!head -20 temperature.xvg

# This file was created Thu Mar 31 16:37:17 2022
# Created by:
#                      :-) GROMACS - gmx energy, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx energy -f nvt.edr -o temperature.xvg
# gmx energy is part of G R O M A C S:
#
# GROup of MAchos and Cynical Suckers
#
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@    yaxis  label "(K)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on


In [None]:
# plot the graph
b = np.loadtxt('temperature.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame2 = pd.DataFrame(b, columns=header)

fig2 = px.line(frame2, x="x", y="y")
fig2.update_xaxes(title_text="Time (ps)") # label x-axis
fig2.update_yaxes(title_text="Temperature (K)") # label y-axis
fig2.update_layout(width=700, title_text="Temperature<br>NVT Equilibration")
fig2.show()
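
The `gmx energy` output above reports an Average and an RMSD for the temperature. The sketch below recomputes both from a list of values. It assumes that the reported RMSD is the root-mean-square deviation about the average, which is my reading of the output rather than something stated in this notebook. `mean_and_rmsd` is a made-up helper name, and the temperatures are illustrative numbers, not simulation data.

```python
import math

def mean_and_rmsd(values):
    """Average and root-mean-square deviation about that average."""
    n = len(values)
    avg = sum(values) / n
    rmsd = math.sqrt(sum((v - avg) ** 2 for v in values) / n)
    return avg, rmsd

# Illustrative temperatures in K (made-up numbers, not from the simulation)
temps = [298.0, 302.0, 300.0, 301.0, 299.0]
avg, rmsd = mean_and_rmsd(temps)
print(f"average = {avg:.2f} K, RMSD = {rmsd:.3f} K")
```

With the data loaded above, `mean_and_rmsd(frame2["y"].tolist())` should land near the values printed by `gmx energy` if that assumption holds.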

## Equilibration Part 2

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p protein.top -o npt.tpr

Setting the LD random seed to -1691751028

Generated 330891 of the 330891 non-bonded parameter combinations

Generated 330891 of the 330891 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'Protein_chain_A'

turning H bonds into constraints...

Excluding 2 bonded neighbours molecule type 'SOL'

turning H bonds into constraints...

Excluding 1 bonded neighbours molecule type 'NA'

turning H bonds into constraints...

The center of mass of the position restraint coord's is  3.566  3.563  3.553

The center of mass of the position restraint coord's is  3.566  3.563  3.553
Analysing residue names:
There are:   104    Protein residues
There are: 11127      Water residues
There are:     3        Ion residues
Analysing Protein...
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...

Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K

Calculated rlist for 1x1 atom pair-list as 1.035 nm, buffer size 0.035 nm

Set rlist,

                      :-) GROMACS - gmx grompp, 2021.5 (-:


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx mdrun -deffnm npt

                      :-) GROMACS - gmx mdrun, 2021.5 (-:


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx energy -f npt.edr -o pressure.xvg <<EOF
18 0
EOF


Statistics over 50001 steps [ 0.0000 through 100.0000 ps ], 1 data sets
All statistics are over 501 points

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Pressure                   0.898809        4.4    164.802   -1.57823  (bar)


                      :-) GROMACS - gmx energy, 2021.5 (-:


In [None]:
!head -20 pressure.xvg

In [None]:
# plot the graph
c = np.loadtxt('pressure.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame3 = pd.DataFrame(c, columns=header)

fig3 = px.line(frame3, x="x", y="y")
fig3.update_xaxes(title_text="Time (ps)") # label x-axis
fig3.update_yaxes(title_text="bar") # label y-axis
fig3.update_layout(width=700, title_text="GROMACS Energies")
fig3.show()

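The moving-average idea comes from [this Stack Overflow question](https://stackoverflow.com/questions/55512643/set-up-multiple-subplots-with-moving-averages-using-cufflinks-and-plotly-offline). The snippet was cut off after the docstring, so the function body below is a reconstruction, not the original code.

```py
import cufflinks as cf
import pandas as pd

df = cf.datagen.lines().iloc[:,0:4]
df.columns = ['StockA', 'StockB', 'StockC', 'StockD']

# Function for moving averages
def movingAvg(df, win, keepSource):
    """Add moving averages for all columns in a dataframe.

    Arguments: 
    df -- pandas dataframe
    win -- length of movingAvg estimation window
    keepSource -- True or False for keep or drop source data in output dataframe
    """
    # Reconstructed body (the copied snippet was truncated here)
    ma = df.rolling(window=win).mean().add_suffix(f'_MA{win}')
    return pd.concat([df, ma], axis=1) if keepSource else ma
```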

In [None]:
# Remove the comments from the xvg file and write a new file
!grep -v -e '\#' -e '\@' pressure.xvg > pressure.csv

The moving-average code below is adapted from [this GeeksforGeeks article](https://www.geeksforgeeks.org/how-to-calculate-moving-average-in-a-pandas-dataframe/).

In [None]:
# moving average
import plotly.express as px
import plotly.graph_objects as go

c = np.loadtxt('pressure.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame3 = pd.DataFrame(c, columns=header)

# copy the time ('x') and pressure ('y') columns so the moving
# average can be plotted against time rather than the row index
frame3b = frame3[['x', 'y']].copy()

# calculating simple moving average
# using .rolling(window).mean() ,
# with window size = 10 points
# (501 points over 100 ps is 0.2 ps per point, so 10 points span about 2 ps)
frame3b['ma10'] = frame3b['y'].rolling(10).mean()


fig3 = px.line(frame3, x="x", y="y")
fig3.update_traces(line=dict(color = 'dodgerblue'), 
                    name="pressure")
fig3b = px.line(frame3b, x="x", y="ma10")
fig3b.update_traces(line=dict(color = 'firebrick'), 
                    name="10-point average")
fig3c = go.Figure(data=fig3.data + fig3b.data)
fig3c.update_layout(width=700, title_text="Pressure <br>NPT Equilibration")
fig3c.update_xaxes(title_text="Time (ps)") # label x-axis
fig3c.update_yaxes(title_text="bar") # label y-axis
fig3c.update_traces(showlegend=True)
fig3c.show()
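
A rolling window is counted in data points, not in picoseconds. The `gmx energy` output above reports 501 points over 100 ps, so the points are 0.2 ps apart and a true 10 ps window needs 50 of them. The sketch below does that conversion and shows what a trailing mean computes. `window_points` and `rolling_mean` are hypothetical helpers written for this example, and the `times` list is built from the reported spacing rather than read from `pressure.xvg`.

```python
def window_points(times, window_ps):
    """Number of samples spanning roughly window_ps for evenly spaced times."""
    spacing = times[1] - times[0]
    return max(1, round(window_ps / spacing))

def rolling_mean(values, n):
    """Trailing mean over n samples; None until n samples are available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= n:
            running -= values[i - n]  # drop the sample that left the window
        out.append(running / n if i >= n - 1 else None)
    return out

times = [i * 0.2 for i in range(501)]  # 0 to 100 ps in 0.2 ps steps
n = window_points(times, 10.0)         # samples in a 10 ps window
print(n)
print(rolling_mean([1.0, 2.0, 3.0, 4.0], 2))
```

The resulting `n` can be passed straight to pandas, for example `frame3['y'].rolling(n).mean()`.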

Let's look at density.

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx energy -f npt.edr -o density.xvg <<EOF
24 0
EOF


Statistics over 50001 steps [ 0.0000 through 100.0000 ps ], 1 data sets
All statistics are over 501 points

Energy                      Average   Err.Est.       RMSD  Tot-Drift
-------------------------------------------------------------------------------
Density                      1015.4        0.3    2.89136  -0.160947  (kg/m^3)


                      :-) GROMACS - gmx energy, 2021.5 (-:


In [None]:
!head -20 density.xvg

# This file was created Thu Mar 31 19:37:36 2022
# Created by:
#                      :-) GROMACS - gmx energy, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx energy -f npt.edr -o density.xvg
# gmx energy is part of G R O M A C S:
#
# GROwing Monsters And Cloning Shrimps
#
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@    yaxis  label "(kg/m^3)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on


In [None]:
# let's plot this
# moving average
import plotly.express as px
import plotly.graph_objects as go

c = np.loadtxt('density.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame4 = pd.DataFrame(c, columns=header)

# copy the time ('x') and density ('y') columns so the moving
# average can be plotted against time rather than the row index
frame4b = frame4[['x', 'y']].copy()

# calculating simple moving average
# using .rolling(window).mean() ,
# with window size = 10 points
# (501 points over 100 ps is 0.2 ps per point, so 10 points span about 2 ps)
frame4b['ma10'] = frame4b['y'].rolling(10).mean()


fig4 = px.line(frame4, x="x", y="y")
fig4.update_traces(line=dict(color = 'dodgerblue'), 
                    name="density")

fig4b = px.line(frame4b, x="x", y="ma10")
fig4b.update_traces(line=dict(color = 'firebrick'), 
                    name="10-point running average")
fig4c = go.Figure(data=fig4.data + fig4b.data)
fig4c.update_layout(width=700, title_text="Density<br>NPT Equilibration")
fig4c.update_xaxes(title_text="Time (ps)") # label x-axis
fig4c.update_yaxes(title_text="kg/m<sup>3</sup>") # label y-axis
fig4c.update_traces(showlegend=True)
fig4c.show()

## Production MD Simulation

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p protein.top -o md_0_1.tpr

Setting the LD random seed to 1056636799

Generated 330891 of the 330891 non-bonded parameter combinations

Generated 330891 of the 330891 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'Protein_chain_A'

turning H bonds into constraints...

Excluding 2 bonded neighbours molecule type 'SOL'

turning H bonds into constraints...

Excluding 1 bonded neighbours molecule type 'NA'

turning H bonds into constraints...
Analysing residue names:
There are:   104    Protein residues
There are: 11127      Water residues
There are:     3        Ion residues
Analysing Protein...
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...

Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K

Calculated rlist for 1x1 atom pair-list as 1.036 nm, buffer size 0.036 nm

Set rlist, assuming 4x4 atom pair-list, to 1.000 nm, buffer size 0.000 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup

Reading Coordina

                      :-) GROMACS - gmx grompp, 2021.5 (-:


In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx mdrun -deffnm md_0_1 -nb gpu

                      :-) GROMACS - gmx mdrun, 2021.5 (-:


## Analysis

The 1000 ps production run took 11 min.
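
That timing can be turned into the ns/day figure usually quoted for MD performance. This is a back-of-the-envelope sketch: `ns_per_day` is a helper name made up here, and the 11 min is the wall-clock time noted above, which may include startup overhead. `gmx mdrun` also reports its own performance figure at the end of its log file.

```python
def ns_per_day(simulated_ps, wall_minutes):
    """Simulation throughput in ns/day from simulated time and wall-clock time."""
    simulated_ns = simulated_ps / 1000.0
    days = wall_minutes / (60.0 * 24.0)
    return simulated_ns / days

rate = ns_per_day(1000, 11)  # 1000 ps in 11 min, as noted above
print(f"{rate:.1f} ns/day")
```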

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol -center <<EOF
1
0
EOF

Note that major changes are planned in future for trjconv, to improve usability and utility.
Select group for centering
Selected 1: 'Protein'
Select group for output
Selected 0: 'System'


                     :-) GROMACS - gmx trjconv, 2021.5 (-:


### RMSD

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns <<EOF
4
4
EOF

Selected 4: 'Backbone'
Selected 4: 'Backbone'


                       :-) GROMACS - gmx rms, 2021.5 (-:


In [None]:
!head -20 rmsd.xvg

# This file was created Thu Mar 31 20:44:15 2022
# Created by:
#                       :-) GROMACS - gmx rms, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx rms -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rmsd.xvg -tu ns
# gmx rms is part of G R O M A C S:
#
# Great Red Oystrich Makes All Chemists Sane
#
@    title "RMSD"
@    xaxis  label "Time (ns)"
@    yaxis  label "RMSD (nm)"
@TYPE xy
@ subtitle "Backbone after lsq fit to Backbone"
   0.0000000    0.0004943
   0.0100000    0.0785261


In [None]:
%cd /content/drive/MyDrive/

/content/drive/MyDrive


In [None]:
# RMSD
# Note: rmsd_xtal.xvg is written by the `gmx rms -s em.tpr` cell further down;
# run that cell before this one.
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import plotly.express as px
from pathlib import Path
c = np.loadtxt('rmsd.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame5a = pd.DataFrame(c, columns=header)

d = np.loadtxt('rmsd_xtal.xvg',comments=['#', '@']) # strip out the comments
header = ["x", "y"] # add headers to columns
frame5b = pd.DataFrame(d, columns=header)

fig5a = px.line(frame5a, x="x", y="y")
fig5a.update_traces(line=dict(color = 'dodgerblue'), 
                    name="RMSD")

fig5b = px.line(frame5b, x="x", y="y")
fig5b.update_traces(line=dict(color = 'firebrick'), 
                    name="crystal RMSD")
fig5 = go.Figure(data=fig5a.data + fig5b.data)
fig5.update_layout(width=700, title_text="RMSD")
fig5.update_xaxes(title_text="Time (ns)") # label x-axis (gmx rms was run with -tu ns)
fig5.update_yaxes(title_text="RMSD (nm)") # label y-axis
fig5.update_traces(showlegend=True)
fig5.show()

#### Questions

See [PHY542: MD analysis with VMD tutorial](https://becksteinlab.physics.asu.edu/pages/courses/2017/PHY542/practicals/md/dynamics/rmsd_fitting.html)  

Questions:

- What is the maximum RMSD?
- How does the RMSD change when you include
  - all atoms?
  - all heavy atoms (i.e., do not use hydrogens)?
- Does your RMSD result depend on the previous superposition step?
- Is the result consistent with the previous result of your RMSD analysis of the static structures?
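
As a starting point for the first question, the maximum RMSD and the time at which it occurs can be read directly from the `(time, RMSD)` pairs. This is only a sketch: `max_rmsd` is a made-up helper, and the two rows are the ones printed by `head -20 rmsd.xvg` above, so the real answer needs the full file.

```python
def max_rmsd(rows):
    """Return the (time, rmsd) pair with the largest RMSD value."""
    return max(rows, key=lambda row: row[1])

# First two rows of rmsd.xvg as printed above: time (ns), backbone RMSD (nm)
rows = [
    (0.0000000, 0.0004943),
    (0.0100000, 0.0785261),
]
t, r = max_rmsd(rows)
print(f"max RMSD = {r} nm at t = {t} ns")
```

With the DataFrame from the plotting cell, the equivalent lookup is `frame5a.loc[frame5a["y"].idxmax()]`.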

In [None]:
!chmod 755 -R /content/drive/MyDrive/IBM3202/gromacs-2021

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx rms -s em.tpr -f md_0_1_noPBC.xtc -o rmsd_xtal.xvg -tu ns <<EOF
4
4
EOF

Selected 4: 'Backbone'
Selected 4: 'Backbone'


                       :-) GROMACS - gmx rms, 2021.5 (-:


### Radius of Gyration

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx gyrate -s md_0_1.tpr -f md_0_1_noPBC.xtc -o gyrate.xvg <<EOF
1
EOF

Selected 1: 'Protein'


                      :-) GROMACS - gmx gyrate, 2021.5 (-:


In [None]:
#@title Radius of Gyration

e = np.loadtxt('gyrate.xvg',comments=['#', '@']) # strip out the comments
header = ["t", "rg", "x", "y", "z"] # add headers to columns
frame6 = pd.DataFrame(e, columns=header)

fig6 = px.line(frame6, x="t", y="rg")
fig6.update_traces(line=dict(color = 'dodgerblue'), 
                    name="radius of gyration"
                   )

fig6.update_layout(width=700, title_text="Radius of Gyration")
fig6.update_xaxes(title_text="Time (ps)")
fig6.update_yaxes(range=[1.40, 1.60])
fig6.update_yaxes(title_text="R<sub>g</sub> (nm)") # label y-axis
fig6.update_traces(showlegend=True)
fig6.show()

In [None]:
!head -40 gyrate.xvg

# This file was created Fri Apr  1 12:22:26 2022
# Created by:
#                      :-) GROMACS - gmx gyrate, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx gyrate -s md_0_1.tpr -f md_0_1_noPBC.xtc -o gyrate.xvg
# gmx gyrate is part of G R O M A C S:
#
# GRowing Old MAkes el Chrono Sweat
#
@    title "Radius of gyration (total and around axes)"
@    xaxis  label "Time (ps)"
@    yaxis  label "Rg (nm)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "Rg"
@ s1 legend "Rg\sX\N"
@ s2 legend "Rg\sY\N"
@ s3 legend "Rg\sZ\N"
         0     1.47486     1.00816     1.26001     1.32152
        10     1.49046     1.00053      1.2713     1.35117
        20      1.4855    0.995395     1.28199     1.33384
        30      1.5044     1.00574     1.29379   
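
The five columns can be cross-checked against each other. If the last three columns are mass-weighted radii of gyration about the x, y and z axes, as the file title "total and around axes" suggests, then their squares should sum to twice the square of the total Rg. The sketch below tests that on the complete rows printed above. `rg_from_axes` is a name invented for this example, and the interpretation of the columns is an assumption.

```python
import math

def rg_from_axes(rg_x, rg_y, rg_z):
    """Total Rg implied by radii of gyration about the x, y and z axes."""
    return math.sqrt((rg_x**2 + rg_y**2 + rg_z**2) / 2.0)

# (time in ps, Rg, RgX, RgY, RgZ) copied from the gyrate.xvg rows printed above
rows = [
    (0, 1.47486, 1.00816, 1.26001, 1.32152),
    (10, 1.49046, 1.00053, 1.2713, 1.35117),
    (20, 1.4855, 0.995395, 1.28199, 1.33384),
]
for t, rg, rx, ry, rz in rows:
    print(t, rg, round(rg_from_axes(rx, ry, rz), 5))
```

Agreement between the second and third printed values on each line supports reading the columns that way.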

### Ramachandran

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx rama -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rama.xvg

In [None]:
%%bash
source /content/drive/MyDrive/IBM3202/gromacs-2021/bin/GMXRC
gmx chi -s md_0_1.tpr -f md_0_1_noPBC.xtc -rama -o chi.xvg

                       :-) GROMACS - gmx chi, 2021.5 (-:


In [None]:
!head -150 /content/drive/MyDrive/ramaPhiPsiVAL85.xvg

# This file was created Sat Apr  2 16:43:22 2022
# Created by:
#                       :-) GROMACS - gmx chi, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx chi -s md_0_1.tpr -f md_0_1_noPBC.xtc -rama -o chi.xvg
# gmx chi is part of G R O M A C S:
#
# Giant Rising Ordinary Mutants for A Clerical Setup
#
@    title "Ramachandran Plot"
@    xaxis  label "\xf\f{} (deg)"
@    yaxis  label "\xy\f{} (deg)"
@TYPE xy
@ with g0
@ world xmin -180
@ world ymin -180
@ world xmax 180
@ world ymax 180
@ xaxis tick on
@ xaxis tick major 90
@ xaxis tick minor 30
@ xaxis ticklabel prec 0
@ yaxis tick on
@ yaxis tick major 90
@ yaxis tick minor 30
@ yaxis ticklabel prec 0
@    s0 type xy
@    s0 symbol 2
@    s0 symbol size 0.410000
@    s0 symbol fill 1
@    s0 symbol color 1
@    s0 symbol linewidth 1
@    s0 symbol linestyle 1
@    s0 symbol cen

In [None]:
!head -40 rama.xvg

# This file was created Sat Apr  2 16:17:13 2022
# Created by:
#                       :-) GROMACS - gmx rama, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx rama -s md_0_1.tpr -f md_0_1_noPBC.xtc -b 1 -e 1000 -o rama.xvg
# gmx rama is part of G R O M A C S:
#
# Guyana Rwanda Oman Macau Angola Cameroon Senegal
#
@    title "Ramachandran Plot"
@    xaxis  label "Phi"
@    yaxis  label "Psi"
@TYPE xy
@    with g0
@    s0 linestyle 0
@    s0 color 1
@ view 0.2, 0.2, 0.8, 0.8
@ world xmin -180
@ world ymin -180
@ world xmax 180
@ world ymax 180
@    xaxis  tick on
@    xaxis  tick major 60
@    xaxis  tick minor 30
@    yaxis  tick on
@    yaxis  tick major 60
@    yaxis  tick minor 30
@ s0 symbol 2
@ s0 symbol size 0.4
@ s0 symbol fill 1
125.589  -57.6255  GLY-126
-112.993  31.1285  GLY-127
-136.399  139.094  TYR-128
-107.063  154.71

In [None]:
!head -150 rama.xvg

# This file was created Fri Apr  1 18:10:31 2022
# Created by:
#                       :-) GROMACS - gmx rama, 2021.5 (-:
# 
# Executable:   /content/drive/MyDrive/IBM3202/gromacs-2021/bin/gmx
# Data prefix:  /content/drive/MyDrive/IBM3202/gromacs-2021
# Working dir:  /content/drive/MyDrive
# Command line:
#   gmx rama -s md_0_1.tpr -f md_0_1_noPBC.xtc -o rama.xvg
# gmx rama is part of G R O M A C S:
#
# Gravel Rubs Often Many Awfully Cauterized Sores
#
@    title "Ramachandran Plot"
@    xaxis  label "Phi"
@    yaxis  label "Psi"
@TYPE xy
@    with g0
@    s0 linestyle 0
@    s0 color 1
@ view 0.2, 0.2, 0.8, 0.8
@ world xmin -180
@ world ymin -180
@ world xmax 180
@ world ymax 180
@    xaxis  tick on
@    xaxis  tick major 60
@    xaxis  tick minor 30
@    yaxis  tick on
@    yaxis  tick major 60
@    yaxis  tick minor 30
@ s0 symbol 2
@ s0 symbol size 0.4
@ s0 symbol fill 1
132.867  -50.2001  GLY-126
-92.3485  -44.8956  GLY-127
-69.2231  135.079  TYR-128
-144.455  172.195  MET-129
-9

In [None]:
!grep -v -e '\#' -e '\@' rama.xvg > rama.txt

In [None]:
!grep -v -e '\#' -e '\@' chi.xvg > chi.txt

In [None]:
h = pd.read_csv('chi.txt', header=None, sep=r'\s\s+', engine='python')
print(h)

       0      1      2      3      4      5
0      1  0.101  0.808  0.101  0.808  0.000
1      2  0.740  0.982  0.884  0.740  0.982
2      3  0.815  0.980  0.953  0.815  0.980
3      4  0.830  0.970  0.830  0.926  0.970
4      5  0.907  0.981  0.907  0.957  0.981
..   ...    ...    ...    ...    ...    ...
99   100  0.933  0.987  0.953  0.933  0.987
100  101  0.940  0.986  0.940  0.968  0.986
101  102  0.832  0.989  0.952  0.832  0.989
102  103  0.655  0.978  0.900  0.655  0.978
103  104  0.017  0.976  0.808  0.017  0.976

[104 rows x 6 columns]


In [None]:
!head -5 rama.txt

132.867  -50.2001  GLY-126
-92.3485  -44.8956  GLY-127
-69.2231  135.079  TYR-128
-144.455  172.195  MET-129
-92.6489  148.354  LEU-130


In [None]:


e = pd.read_csv('rama.txt', header=None, sep=r'\s\s+', engine='python')
# e = pd.read_csv('rama.xvg',comments=['#', '@'])
# e = np.loadtxt('rama.xvg',  
#                comments=['#', '@'], # strip out the comments
#                dtype='object') 

# e.columns=["phi", "psi", "aa"]
print(e)


# header = ["phi", "psi", "aa"] # add headers to columns
# frame7 = pd.DataFrame(e, columns=header)
# e.columns = ["phi", "psi", "aa"]
# e.head()

# frame7 = pd.DataFrame(e.values, columns=header)
# frame7
# fig7 = px.scatter(frame7, x="phi", y="psi",
#                  hover_name="aa" 
#                  )

# names = [_ for _ in 'abcdef']
# df = pd.DataFrame(A, index=names, columns=names)

# fig5b = px.line(frame5b, x="x", y="y")
# fig5b.update_traces(line=dict(color = 'firebrick'), 
#                     name="crystal RMSD")
# fig5 = go.Figure(data=fig5a.data + fig5b.data)
# fig5.update_layout(width=700, title_text="RMSD")
# fig7.update_xaxes(title_text="Phi") # label x-axis
# fig7.update_yaxes(title_text="Psi") # label y-axis
# fig7.update_traces(showlegend=True)
# fig7.show()

              0          1        2
0      132.8670  -50.20010  GLY-126
1      -92.3485  -44.89560  GLY-127
2      -69.2231  135.07900  TYR-128
3     -144.4550  172.19500  MET-129
4      -92.6489  148.35400  LEU-130
...         ...        ...      ...
10297  -63.9506  -25.20160  GLN-223
10298  -80.0791  -35.53770  ALA-224
10299  -61.7054  -30.42300  TYR-225
10300  -75.6890    2.99482  TYR-226
10301  -96.1143   35.34090  GLN-227

[10302 rows x 3 columns]


All of the Ramachandran data for every time frame is in a single file, with no demarcation between time points. I can use `sed` to add a new column.

I can use `sed` to identify the chunks that represent each time point. Start with `GLY-126` and end with `GLN-227`.
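As an alternative to `sed`, here is a minimal pandas sketch of the same idea. It assumes the rows are already in a dataframe with columns named `phi`, `psi`, and `aa` (the names used in the commented-out code above), and that every time point starts with the same residue label. The helper name `add_frame_column` and the toy data are made up for illustration.

```python
import pandas as pd


def add_frame_column(df, label_col="aa"):
    """Number the time points by counting how often the first residue label recurs."""
    first_label = df[label_col].iloc[0]
    out = df.copy()
    # Every occurrence of the first label starts a new chunk; cumsum numbers the chunks.
    out["frame"] = (out[label_col] == first_label).cumsum() - 1
    return out


# Toy data: two time points of a two-residue protein.
demo = pd.DataFrame({
    "phi": [132.867, -92.3485, 125.589, -112.993],
    "psi": [-50.2001, -44.8956, -57.6255, 31.1285],
    "aa": ["GLY-126", "GLY-127", "GLY-126", "GLY-127"],
})
framed = add_frame_column(demo)
print(framed)
```

This relies on each label (residue name plus number) appearing exactly once per time point.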

UPDATE:

I can use `gmx chi` to gather the `phi/psi` angles for each amino acid as a time series in a separate file. 

- I can move them all to their own directory.
  - Or define the directory in the `-o` flag.
- Concatenate them as I create a dataframe.
- At some point I need to extract the amino acid number (from the filename) and put it into the file as a new column.
- I also need to fix the amino acid numbers, as they are offset by 126. The original protein starts at aa 126, but the `chi` command renumbered them as it created the filenames.
- Then plot all of them on the same plot as a time series.

This is better than trying to parse the `rama.xvg` file.
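The filename step in the list above could look something like the sketch below. It assumes filenames of the form `ramaPhiPsiALA100.xvg` (the prefix that is stripped later in this notebook). The function name `parse_residue` and the `offset` parameter are hypothetical, and the right offset value has to be checked against the original structure.

```python
import re


def parse_residue(filename, prefix="ramaPhiPsi", offset=0):
    """Return (residue_name, residue_number + offset) from a per-residue filename."""
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    if stem.startswith(prefix):
        stem = stem[len(prefix):]              # drop the gmx chi prefix, if present
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", stem)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    name, number = match.groups()
    return name, int(number) + offset


print(parse_residue("ramaPhiPsiALA100.xvg"))
# The offset here is only an example value, not a verified correction.
print(parse_residue("ARG96.csv", offset=125))
```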

Let's give it a whirl.

```bash
files=(*.xvg)
```

`grep "string" "${files[@]}"`  
will expand to:  
`grep "string" "1.txt" "2.txt" "3.txt"`


from [this site](https://unix.stackexchange.com/questions/550964/grep-over-multiple-files-redirecting-to-a-different-filename-each-time)

```bash
for f in *-QTR*.tsv
do 
  grep 8-K < "$f" > "${f:0:4}"Q"${f:8:1}".txt
done
```

- the first four characters of the filename -- the year
- the letter Q
- the 9th character of the filename -- the quarter

In my case, I want to add part of the filename (the amino acid name and number) to a column in the table.

From [this site](https://stackoverflow.com/questions/41857659/python-pandas-add-filename-column-csv)

```py
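import glob
import os
import pandas as pd

globbed_files = glob.glob("*.csv")  # list of all csv files
data = []  # collect one dataframe per file

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)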
```

from [this site](https://stackoverflow.com/questions/51845613/adding-columns-to-dataframe-based-on-file-name-in-python)

```py
import pandas as pd 

#load data files
data1 = pd.read_csv('C:/file1_USA_Car_1d.txt')
data2 = pd.read_csv('C:/file2_USA_Car_2d.txt')
data3 = pd.read_csv('C:/file3_USA_Car_1m.txt')
data4 = pd.read_csv('C:/file3_USA_Car_6m.txt')
data5 = pd.read_csv('C:/file3_USA_Car_1Y.txt')

df = pd.DataFrame()

print(df)

df = data1

# ---

import glob
import pandas as pd

df_list = []
for file in glob.glob('C:/file1_*_*_*.txt'):
    # Tweak this to work for your actual filepaths, if needed.
    country, typ, dur = file.split('.')[0].split('_')[1:]  
    df = (pd.read_csv(file)
            .assign(Country=country, Type=typ, duration=dur))
    df_list.append(df)

df = pd.concat(df_list)
```

Probably best to use `mv` to batch rename the files.

from [this site](


```bash
rename -n 's/<search for>/<replace with>/' <target files>
```

`-n` perform a dry run and show what output would look like  
`s/` perform a substitution  
`<search for>` what you want to replace--can use regex  
`<replace with>` self explanatory  
`/'` need a closing slash and command needs to be wrapped in single or double quotes.  
`<target files>` files to rename--can use wildcards.


from [this site](https://stackoverflow.com/questions/32042019/ubuntu-bulk-file-rename)

```bash
rename -n "s/ramaPhiPsi//" ramaPhiPsi*
```

<mark>This works!</mark>

Now we can use Pandas to put a column in each file that contains the filename, which we will use in plotly on hover.

From [this site](https://stackoverflow.com/questions/42756696/read-multiple-csv-files-and-add-filename-as-new-column-in-pandas) we see how to add filenames to the `.xvg` files, then split off the `.xvg` extension leaving only the amino acid name and number.

**Need to move files into their own directory first.**

```bash
%mkdir rama
%mv ramaPhiPsi*.xvg rama/
```

```py
files = glob.glob('samples_for_so/*.csv')
print (files)
#['samples_for_so\\a.csv', 'samples_for_so\\b.csv', 'samples_for_so\\c.csv']


df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
```


In [None]:
!rename -n "s/ramaPhiPsi//" ramaPhiPsi*

In [None]:
# make a new directory
%mkdir rama

In [None]:
# move all the rama files into the new directory
%mv ramaPhiPsi*.xvg rama/

In [None]:
# move into the new directory
%cd rama

/content/drive/MyDrive/rama


In [None]:
# rename the files
!rename "s/ramaPhiPsi//" ramaPhiPsi*

In [None]:
%%bash
for f in *.xvg
do
  base=${f%%.xvg}
  grep -v -e '\#' -e '\@' "$f" > "${base%%.*}".csv
done

# This worked perfectly
# ready for adding columns with filename and then splitting

In [None]:
# get rid of the .xvg files
%rm *.xvg

from [this site](https://tldp.org/LDP/abs/html/parameter-substitution.html)

>`${var%%Pattern}` Remove from `$var` the longest part of `$Pattern` that matches the back end of `$var`.

Wow. Is this not the most esoteric bash stuff?
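For comparison, here is a rough Python equivalent of the two trimming steps in the loop above. This is only a sketch. The function names are made up for illustration, and the loop itself only ever saw names like `ALA100.xvg`.

```python
from pathlib import Path


def strip_last_extension(name):
    """Drop only the final extension, e.g. 'ALA100.xvg' -> 'ALA100'."""
    return Path(name).stem


def strip_from_first_dot(name):
    """Drop everything from the first dot onward, like ${name%%.*} in bash."""
    return name.split(".", 1)[0]


print(strip_last_extension("ALA100.xvg"))
print(strip_from_first_dot("ALA100.backup.xvg"))
```

The two functions only differ when a name contains more than one dot.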

In [None]:
!head -10 ALA100.csv

  -68.7619    -18.6306
  -72.5399    -34.9786
  -71.3601    -42.1641
   -81.947    -19.4706
   -70.269    -26.1112
  -90.7979    -34.7991
  -94.7337    -29.3207
  -78.0507    -15.2262
  -62.4729    -51.2244
   -88.934    -48.1266


In [None]:
!head -50 ALA100.xvg

In [None]:
%rm ramaX1X2*.xvg

#### Adding filename column and splitting csv files

from [this site](https://stackoverflow.com/questions/42756696/read-multiple-csv-files-and-add-filename-as-new-column-in-pandas)

```py
import pandas as pd
import glob, os


files = glob.glob('samples_for_so/*.csv')
print (files)
#['samples_for_so\\a.csv', 'samples_for_so\\b.csv', 'samples_for_so\\c.csv']


df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
print (df)
   a  b  c  d    New
0  0  1  2  5  a.csv
1  1  5  8  3  a.csv
0  0  9  6  5  b.csv
1  1  6  4  2  b.csv
0  0  7  1  7  c.csv
1  1  3  2  6  c.csv
```

This did not work well for my files. I think I need to add headers to my files at some point.

But the splitting will probably work.
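A likely reason it did not work: by default `pd.read_csv` treats the first line of a file as the header, and these files have no header line. A small sketch of the difference, using two data lines copied from `ALA100.csv` and read from a string instead of a file (the column names `phi` and `psi` are my choice):

```python
import io

import pandas as pd

raw = "  -68.7619    -18.6306\n  -72.5399    -34.9786\n"

# Default behaviour: the first data row is consumed as column names.
default = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# header=None keeps every row as data; names= supplies the column labels.
fixed = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["phi", "psi"])

print(len(default), len(fixed))
print(fixed)
```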

Splitting:

```py
files = glob.glob('samples_for_so/*.csv')
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0]) 
       for fp in files])
print (df)
   a  b  c  d New
0  0  1  2  5   a
1  1  5  8  3   a
2  0  9  6  5   b
3  1  6  4  2   b
4  0  7  1  7   c
5  1  3  2  6   c
```



In [None]:
# adding filename column
import pandas as pd
import glob, os

files = glob.glob('*.csv')
# print (files)


Adding filenames as a column

from [this site](https://stackoverflow.com/questions/41857659/python-pandas-add-filename-column-csv)

```py
import glob
import os

import pandas as pd

globbed_files = glob.glob("*.csv") #creates a list of all csv files

data = [] # one dataframe per file
for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^') # or other separator
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
```

In [None]:
import glob
import os  # needed for os.path.basename below

import pandas as pd

globbed_files = glob.glob("*.csv")
# print(globbed_files)


data = []
for csv in globbed_files:
    frame = pd.read_csv(csv, sep=r'\s+') # or other separator
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

#### Adding headers

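Try reading the file the same way as earlier (`header=None` with a whitespace separator), then assign the column names.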

In [None]:
import os
import pandas as pd
import glob

globbed_files = glob.glob("ALA100.csv")

# print(globbed_files)

for csv in globbed_files:
    testframe = pd.read_csv(csv, header=None, sep=r'\s\s+', engine='python')
    testframe.columns=["phi", "psi"]
    testframe['filename'] = os.path.basename(csv)
    data.append(testframe)

print(testframe)

# worked like a charm!





         phi      psi    filename
0   -68.7619 -18.6306  ALA100.csv
1   -72.5399 -34.9786  ALA100.csv
2   -71.3601 -42.1641  ALA100.csv
3   -81.9470 -19.4706  ALA100.csv
4   -70.2690 -26.1112  ALA100.csv
..       ...      ...         ...
96  -96.7204 -22.3246  ALA100.csv
97  -69.9809 -41.4373  ALA100.csv
98  -70.1587 -29.2849  ALA100.csv
99  -98.4556 -28.9369  ALA100.csv
100 -80.0791 -35.5377  ALA100.csv

[101 rows x 3 columns]


#### Splitting filename

In [None]:
# making a new dataframe from splitting the column

testframe2 = testframe['filename'].str.split('.', expand = True)

print(testframe2)

          0    1
0    ALA100  csv
1    ALA100  csv
2    ALA100  csv
3    ALA100  csv
4    ALA100  csv
..      ...  ...
96   ALA100  csv
97   ALA100  csv
98   ALA100  csv
99   ALA100  csv
100  ALA100  csv

[101 rows x 2 columns]


In [None]:
# adding the separate amino acid column from the new data frame
testframe["aa"]= testframe2[0]
print(testframe)


         phi      psi      aa
0   -68.7619 -18.6306  ALA100
1   -72.5399 -34.9786  ALA100
2   -71.3601 -42.1641  ALA100
3   -81.9470 -19.4706  ALA100
4   -70.2690 -26.1112  ALA100
..       ...      ...     ...
96  -96.7204 -22.3246  ALA100
97  -69.9809 -41.4373  ALA100
98  -70.1587 -29.2849  ALA100
99  -98.4556 -28.9369  ALA100
100 -80.0791 -35.5377  ALA100

[101 rows x 3 columns]


So far, so good.

Next, see [this site](https://jonathansoma.com/lede/foundations-2017/classes/working-with-many-files/class/).

In [None]:
# Add new column to the DataFrame
testframe['time'] = range(0, 1010, 10)  # range starting at 0, ending at 1000, with a step size of 10

print(testframe)

         phi      psi      aa  time
0   -68.7619 -18.6306  ALA100     0
1   -72.5399 -34.9786  ALA100    10
2   -71.3601 -42.1641  ALA100    20
3   -81.9470 -19.4706  ALA100    30
4   -70.2690 -26.1112  ALA100    40
..       ...      ...     ...   ...
96  -96.7204 -22.3246  ALA100   960
97  -69.9809 -41.4373  ALA100   970
98  -70.1587 -29.2849  ALA100   980
99  -98.4556 -28.9369  ALA100   990
100 -80.0791 -35.5377  ALA100  1000

[101 rows x 4 columns]
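The hard-coded `range(0, 1010, 10)` only fits a file with exactly 101 rows. A small sketch that derives the time column from the number of rows instead, assuming evenly spaced frames. The helper name `add_time_column` and its `dt` and `t0` parameters are hypothetical.

```python
import pandas as pd


def add_time_column(frame, dt=10, t0=0):
    """Add a 'time' column: t0, t0 + dt, t0 + 2*dt, ... with one value per row."""
    out = frame.copy()
    out["time"] = [t0 + i * dt for i in range(len(out))]
    return out


# Toy data using the first three rows of ALA100.csv.
demo = pd.DataFrame({"phi": [-68.7619, -72.5399, -71.3601],
                     "psi": [-18.6306, -34.9786, -42.1641]})
timed = add_time_column(demo)
print(timed)
```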


### Creating the Ramachandran Plot

In [None]:
#@title Ramachandran Plot
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

# Create dataframes for each file
dfgenaLLowed1 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed1.csv')
dfgenaLLowed2 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed2.csv')
dfgenaLLowed3 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed3.csv')
dfgenaLLowed4 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed4.csv')
dfgenaLLowed5 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed5.csv')
dfgenaLLowed6 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-aLLowed6.csv')

dfgenfavored1 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-favored1.csv')
dfgenfavored2 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-favored2.csv')
dfgenfavored3 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-favored3.csv')
dfgenfavored4 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-favored4.csv')
dfgenfavored5 = pd.read_csv('/content/drive/MyDrive/rama8000/generaL-favored5.csv')

# ===== create figures =====
# x and y are the column names
figgenaLLowed1 = px.line(dfgenaLLowed1, x="phi", y="psi",
                 hover_name="number"
                ) 
# add line color
figgenaLLowed1.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenaLLowed2 = px.line(dfgenaLLowed2, x="phi", y="psi",
                 hover_name="number" 
                 ) 
figgenaLLowed2.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenaLLowed3 = px.line(dfgenaLLowed3, x="phi", y="psi",
                 hover_name="number" 
                 ) 
figgenaLLowed3.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenaLLowed4 = px.line(dfgenaLLowed4, x="phi", y="psi",
                 hover_name="number" 
                 ) 
figgenaLLowed4.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenaLLowed5 = px.line(dfgenaLLowed5, x="phi", y="psi",
                 hover_name="number" 
                 )                
figgenaLLowed5.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenaLLowed6 = px.line(dfgenaLLowed6, x="phi", y="psi",
                 hover_name="number") 
figgenaLLowed6.update_traces(line=dict(
    color = 'deepskyblue',
    width=1))

figgenfavored1 = px.line(dfgenfavored1, x="phi", y="psi",
                 hover_name="number") 
figgenfavored1.update_traces(line=dict(
    color = 'deepskyblue',
    width=2))

figgenfavored2 = px.line(dfgenfavored2, x="phi", y="psi",
                 hover_name="number")
figgenfavored2.update_traces(line=dict(
    color = 'deepskyblue',
    width=2)) 

figgenfavored3 = px.line(dfgenfavored3, x="phi", y="psi",
                 hover_name="number") 
figgenfavored3.update_traces(line=dict(
    color = 'deepskyblue',
    width=2))

figgenfavored4 = px.line(dfgenfavored4, x="phi", y="psi",
                 hover_name="number") 
figgenfavored4.update_traces(line=dict(
    color = 'deepskyblue',
    width=2))

figgenfavored5 = px.line(dfgenfavored5, x="phi", y="psi",
                 hover_name="number") 
figgenfavored5.update_traces(line=dict(
    color = 'deepskyblue',
    width=2))


# figtestrama = px.scatter(testframe, x="phi", y="psi", animation_frame="time", animation_group="aa",
          #  hover_name="aa", range_x=[-180,180], range_y=[-180,180], color_discrete_sequence=['white'])

# ==========================================
#      Create a multi-aa plot 
# ==========================================

figtestrama2 = px.scatter(bigframe, x="phi", y="psi", animation_frame="time", animation_group="aa",
           hover_name="aa", range_x=[-180,180], range_y=[-180,180], color_discrete_sequence=['white'])

# ==========================================
#      Add the plot for the allowed regions
# ==========================================
figtestrama2.add_trace(figgenaLLowed1.data[0])
figtestrama2.add_trace(figgenaLLowed2.data[0])
figtestrama2.add_trace(figgenaLLowed3.data[0])
figtestrama2.add_trace(figgenaLLowed4.data[0])
figtestrama2.add_trace(figgenaLLowed5.data[0])
figtestrama2.add_trace(figgenaLLowed6.data[0])
figtestrama2.add_trace(figgenfavored1.data[0])
figtestrama2.add_trace(figgenfavored2.data[0])
figtestrama2.add_trace(figgenfavored3.data[0])
figtestrama2.add_trace(figgenfavored4.data[0])
figtestrama2.add_trace(figgenfavored5.data[0])


figtestrama2.update_layout(width=700, 
                         height=700, 
                         title_text="General",
                         # unicode for greek characters
                         xaxis=dict(title=u"\u03A6"),
                         yaxis=dict(title=u"\u03A8"),
                         plot_bgcolor="black"
                         )
figtestrama2.update_traces(showlegend=False)


# update the axes
figtestrama2.update_xaxes(showline=True,
                  zeroline=True,
                  showgrid=False,
                  zerolinewidth=1,
                  zerolinecolor='grey'
)

figtestrama2.update_yaxes(showline=True,
                   zeroline=True,
                   showgrid=False, 
                   zerolinewidth=1,
                   zerolinecolor='grey'
)


# show the graph
figtestrama2.show()

# save it as html to share it on website
figtestrama2.write_html("rama-arg.html")

<mark>**Success!**</mark>

Wow. This took a while. But the result is what I wanted. Each amino acid is in its own file, so I can add just the glycines to the glycine plot, etc. I could probably tweak a few more things, but I'll stop here for now. I will likely add more amino acids to this plot and set up separate plots for the glycines and the ILE-VALs.
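Once several residues are combined into one dataframe, picking out a subset for a dedicated plot could be done with a filter on the `aa` column. This is a sketch only, assuming labels of the form `GLY126` (residue name followed by its number). The helper name `select_residues` and the toy data are made up.

```python
import pandas as pd


def select_residues(frame, names, column="aa"):
    """Keep rows whose residue name (the letters before the number) is in names."""
    residue_name = frame[column].str.extract(r"^([A-Za-z]+)", expand=False)
    return frame[residue_name.isin(list(names))]


# Toy data: one row per residue, angles are placeholders.
demo = pd.DataFrame({
    "phi": [-60.0, -120.0, -70.0, 80.0],
    "psi": [-45.0, 130.0, 140.0, 10.0],
    "aa": ["GLY126", "ARG104", "ILE12", "GLY127"],
})
glycines = select_residues(demo, ["GLY"])
ile_val = select_residues(demo, ["ILE", "VAL"])
print(glycines)
```

The filtered dataframe can then be passed to `px.scatter` in place of the full one.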

See [How to Save Plotly Animations: The Ultimate Guide](https://holypython.com/how-to-save-plotly-animations-the-ultimate-guide/) for how to save this animation on my MkDocs site.

>In Github case you can simply start a public repository, upload your file to it, commit the file and then share its link in the `iframe`.

```html
<iframe width="900" height="800" frameborder="0" scrolling="no" src="https://holypython.github.io/holypython2/covid_cases.html"></iframe>
```

### Building Another Ramachandran Plot

Let's try to add a few more amino acids to the General Ramachandran plot.

In [None]:
# move into the directory with the files
%cd rama

/content/drive/MyDrive/rama


#### Prepare the files

In [None]:
import os
import pandas as pd
import glob

globbed_files = glob.glob("AR*.csv")
# print(globbed_files)
# this worked

In [None]:
# don't need this cell
# the cell below worked
list_of_dfs = [pd.read_csv(filename, 
                           header=None, 
                           sep='\s\s+', 
                           engine='python') 
for filename in globbed_files]
print(list_of_dfs)
# this worked

The most useful information was from this question/answer on Stackoverflow: [Python Pandas add Filename Column CSV](https://stackoverflow.com/questions/41857659/python-pandas-add-filename-column-csv).

In [None]:
globbed_files = glob.glob("AR*.csv") # creates a list of all matching csv files

data = [] # pd.concat takes a list of dataframes as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv, header=None, sep=r'\s\s+', engine='python')
    frame.columns = ["phi", "psi"]
    frame['time'] = range(0, 1010, 10) # 101 time points, 0 to 1000 in steps of 10
    # keep only the part of the filename before the extension, e.g. "ARG104"
    frame["aa"] = os.path.basename(csv).split('.')[0]
    data.append(frame)

# print(frame)

# Add a time column to the DataFrame
# frame['time'] = (range(0, 1010, 10)) # range starting at 0 ending at 1000 with a stepsize of 10.)

# making a new dataframe from splitting the column
# frame2 = frame['filename'].str.split('.', expand = True)

# adding the separate amino acid column from the new data frame
# frame["aa"]= frame2[0]
# delete the filename column
# del frame["filename"]
# print(frame)

bigframe = pd.concat(data, ignore_index=True) # don't want pandas to try to align row indexes
print(bigframe)
# bigframe.to_csv("Pandas_output2.csv")
# print(bigframe)
# print(frame)
# This worked to make a large .csv file

          phi       psi  time      aa
0   -136.1290  179.4160     0  ARG104
1   -124.4370  149.5180    10  ARG104
2   -143.4290  159.3080    20  ARG104
3   -114.2420  110.7290    30  ARG104
4   -121.3340  130.1790    40  ARG104
..        ...       ...   ...     ...
803  -49.4732  -45.9122   960   ARG96
804  -68.9671  -30.8921   970   ARG96
805  -65.6781  -17.0645   980   ARG96
806  -85.1430  -29.5437   990   ARG96
807  -62.3135  -53.4847  1000   ARG96

[808 rows x 4 columns]


Add the `bigframe` to the animated time series chart, above.