<small><i>This notebook was put together by [Wesley Beckner](http://wesleybeckner.github.io/).</i></small>

<a id='top'></a>

# Simulation workflow

* generate system (topology and coordinate files) 
    * For a given research project, there are often repetitive tasks associated with creating your systems. 
    * There are ways to automate this process, but we will save that for later this quarter
    * **This is a very, very important step in the simulation workflow, please stay tuned**
* energy minimization (address overlapping particles)
* temperature and pressure equilibration
* production run
    * NPT vs NVT
        * Cp, Cv
        * virial (can't have frozen atoms)
        * calculations that depend on pressure fluctuations
* scaling analysis for subsequent runs
    * **Jim will lose his kettle bells if you do not do this for all of your unique systems**

# Shell commands

* [printf and echo](#print)
* [tail and head](#tail)
* [sed](#sed)
* [awk and grep](#awk)
* [if, while, break, wait](#if)
* [source](#source)
* [mail](#mail)
* [for loops](#for) (for your homework)

# Examples with emphasis on software and directory structures

* [input file](#input)
* [master file](#master)
* [directory file](#directory)
* [subprocess files](#subprocess)

#### Aside:

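It is best to do all your scripting in a syntax-highlighting editor of your choice. I use vim and will be a forever fan. To enable syntax highlighting in vim (linux or mac), go to your terminal and type

```
echo "syntax on" >> ~/.vimrc
```

While you're at it, the following setting highlights every match of your most recent search, which helps when you try out the search patterns in the sed section:

```
echo "set hlsearch" >> ~/.vimrc
```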
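Each `echo ... >> ~/.vimrc` line appends a new copy every time it runs, so running it twice leaves duplicate settings. Below is a minimal sketch of an idempotent version, assuming a hypothetical helper named `append_once` (not a standard command). It runs against a scratch file so your real `~/.vimrc` is untouched.

```bash
#!/bin/bash
# append_once is a hypothetical helper: add a line to a file only if that
# exact line is not already there.
append_once() {
  local line=$1 file=$2
  touch "$file"                                   # make sure the file exists
  # -q: quiet, -x: match the whole line, -F: treat the pattern as plain text
  grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}

rc=$(mktemp)                      # scratch file; use ~/.vimrc for real
append_once "syntax on" "$rc"
append_once "syntax on" "$rc"     # second call changes nothing
append_once "set hlsearch" "$rc"
cat "$rc"
```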

[more breakout time](#hands)

# Homework
 
1. vary some input **parameter** other than temperature to give three distinct simulations 
2. run the scripts with a new automated analysis that you expect to be affected by your **parameter**
3. **form a hypothesis** on how this analysis will be different for the three runs and justify (1 paragraph)
4. **create plots** for your analysis using python and state whether your hypothesis was falsified or not
    * note: I **do not** *want you to manually adjust* any of the output files from gromacs. You can either adjust these with a bash script (see my example with heat capacity), set the `-xvg none` option in `gmx energy`, or tell pandas how many header lines to skip in your file. The easiest way to read the data into python in any of these cases is [pandas.read_csv](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html).
    * note: I **do not** *want you to manually submit* the commands to complete these simulations. Write a wrapper for the scripts I've provided to iterate through your parameters. We will go over for loops at the end of this lesson to help with this.
5. write a **further work** paragraph on how you would improve these scripts given more time (could be on either what I've provided or what you've written)
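On the header-line note in item 4: a GROMACS .xvg header consists of lines starting with `#` (comments) or `@` (xmgrace plotting directives), so shell tools alone can strip it. A minimal sketch follows; the short .xvg file is made up for illustration and a real one has many more lines.

```bash
#!/bin/bash
# Made-up miniature .xvg file: '#' and '@' header lines, then two data columns.
xvg=$(mktemp)
cat > "$xvg" <<'EOF'
# This file was made up for illustration
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@ s0 legend "Temperature"
    0.000000  349.000000
    1.000000  351.000000
    2.000000  350.000000
EOF

# Keep only the data lines: drop every line whose first character is # or @.
dat=$(mktemp)
grep -v '^[#@]' "$xvg" > "$dat"

# awk can skip the header itself and average column 2 in one pass.
mean_T=$(awk '!/^[#@]/ {sum += $2; n++} END {print sum/n}' "$xvg")
echo "data lines: $(wc -l < "$dat"), mean temperature: ${mean_T} K"
```

The cleaned file has no header left to skip, so pandas can read it as whitespace-separated columns.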

### Deliverables

1. your plots (publishable quality)
2. your scripts (wrapper and analysis)
3. your written paragraphs (hypothesis and further work)
 

<a id='print'></a>

## printf and echo

#### What's the difference, anyway?

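The differences are minor:

1. printf behaves the same way across systems, while echo does not (for example, whether echo expands backslash escapes such as `\t` depends on the shell). For this reason some people prefer to use printf.
2. echo adds a newline at the end of its output; printf prints a newline only if you put `\n` in the format string.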
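A short side-by-side sketch of both points, plus printf's C-style format specifiers (the atom names and x coordinates come from the conf.gro output shown in the tail and head section):

```bash
#!/bin/bash
echo "this is a test"          # echo supplies the trailing newline
printf "this is a test\n"      # printf needs the \n spelled out

# printf reuses its format string for each extra group of arguments, and
# width/precision specifiers line up columns: %-4s = left-justified, 4 wide;
# %7.3f = 7 wide with 3 decimals.
printf "%-4s %7.3f\n" OW 1.855 HW1 1.863

# printf interprets escapes like \t the same way everywhere.
printf "Temp\t%s\n" 350
```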

```
printf "this is a test"
```

output:

```
this is a test
```

Also, **print is awk's output statement**, so you can use it inside a pipeline, as in the cell below:

```
nmol=`tail -2 conf.gro | head -1 | awk '{print $1}' | sed "s/[[:alpha:].-]/ /g" | awk '{print $1}'`
echo $nmol
```

output:

```
884
```


<a id='tail'></a>

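What we've just done is grab the residue number of the last atom in our gro file and print it to the screen. Every water molecule here is one residue, numbered from 1, so that residue number is the number of molecules.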
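To see each stage of that pipeline, here is a sketch that runs the same commands one step at a time on the last two lines of conf.gro (copied from the tail output in the next section) and keeps every intermediate result. One caveat: this trick counts molecules only when each molecule is one residue numbered consecutively from 1, and the .gro residue-number field is only five characters wide.

```bash
#!/bin/bash
# The last two lines of conf.gro: the final atom line and the box line.
gro_tail='  884SOL    HW2 2652   1.790   0.226   2.056 -0.0484  0.7734  2.3419
   3.00081   3.00081   3.00081'

step1=$(echo "$gro_tail" | head -n 1)               # keep the atom line, drop the box line
step2=$(echo "$step1" | awk '{print $1}')           # first field: residue number + name
step3=$(echo "$step2" | sed "s/[[:alpha:].-]/ /g")  # letters, dots, hyphens -> spaces
nmol=$(echo "$step3" | awk '{print $1}')            # first field again: just the number

printf 'step2: %s\nstep3: [%s]\nnmol:  %s\n' "$step2" "$step3" "$nmol"
```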

Speaking of which, that brings us to...

## tail and head

[back to top](#top)

(and also **sed** and **awk**, but we'll get to those in a second...)

tail and head do what they sound like: tail prints the last lines of a file and head prints the first lines.

```
tail -4 conf.gro
head -4 conf.gro
```

output:

```
  884SOL     OW 2650   1.855   0.165   2.093  0.6897  0.5791  0.7848
  884SOL    HW1 2651   1.863   0.096   2.027 -0.9398  0.6805  0.4391
  884SOL    HW2 2652   1.790   0.226   2.056 -0.0484  0.7734  2.3419
   3.00081   3.00081   3.00081
water in water
 2652
    1SOL     OW    1   1.915   1.754   0.496  0.3976  0.1489 -0.0020
    1SOL    HW1    2   1.946   1.831   0.544  0.3310  0.4931 -0.5088
```


What is the default number of lines printed by head and tail?
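Beyond the defaults, a few forms come up constantly with .gro and .xvg files. `-n N` is the POSIX spelling of the `-4` shorthand used above, and `tail -n +N` starts at line N, which skips a fixed-length header. A sketch on a trimmed copy of the conf.gro lines shown above (trimmed, so its atom count of 2652 no longer matches the atom lines):

```bash
#!/bin/bash
gro=$(mktemp)
cat > "$gro" <<'EOF'
water in water
 2652
    1SOL     OW    1   1.915   1.754   0.496  0.3976  0.1489 -0.0020
    1SOL    HW1    2   1.946   1.831   0.544  0.3310  0.4931 -0.5088
   3.00081   3.00081   3.00081
EOF

head -n 2 "$gro"        # first 2 lines: title and atom count
tail -n 1 "$gro"        # last line: the box vectors
tail -n +3 "$gro"       # line 3 to the end, i.e. skip the 2-line header

natoms=$(sed -n '2p' "$gro" | awk '{print $1}')   # sed -n '2p' prints only line 2
box_x=$(tail -n 1 "$gro" | awk '{print $1}')
echo "atoms: ${natoms}, box x: ${box_x} nm"
```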

<a id='sed'></a>
## sed

[back to top](#top)

So, sed (short for stream editor) is super jedi.

It works just like the substitute command in vim, `:[range]s/pattern/replacement/flags`, where `%` as the range means every line. Open vim now and see what the following line does!

```
:%s/[[:alpha:].-]/ /g
```

Let's spend some time breaking this down...
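Piece by piece, the pattern `[[:alpha:].-]` is a bracket expression that matches one character: `[:alpha:]` is any letter, `.` is a literal period inside the brackets, and `-` is a literal hyphen because it comes last. The `g` flag replaces every match on a line instead of only the first. Unlike vim, sed applies the command to every line by default, so no `%` is needed. A sketch that shows the effect of `g` on a few fields from conf.gro:

```bash
#!/bin/bash
for field in 884SOL HW2 -0.0484 ; do
  # printf '%s\n' is used instead of echo so "-0.0484" can never look like an option.
  with_g=$(printf '%s\n' "$field" | sed "s/[[:alpha:].-]/ /g")   # replace every match
  without_g=$(printf '%s\n' "$field" | sed "s/[[:alpha:].-]/ /") # replace the first match only
  printf '%-8s with g: [%s]  without g: [%s]\n' "$field" "$with_g" "$without_g"
done
```

The last field shows why the nmol pipeline applies this sed only to the residue column: on a coordinate it would split the number apart.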

We could spend a lot of time on sed; for now I'm just going to list some of the sed commands you're going to see in your homework:

```
    sed -i "51s/.*/ref_t                        = ${Temp}/" npt.mdp
    sed -i "62s/.*/gen_temp             = ${Temp}/" npt.mdp
```
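In those commands, `51s` limits the substitution to line 51, `.*` matches the whole line so it is replaced entirely, and the double quotes let `${Temp}` expand. A sketch comparing that line-number address with a pattern address (`^gen_temp`), which keeps working if lines are added to or removed from the file. The three-line .mdp file is made up for illustration. It writes to a new file and moves it into place instead of using `-i`, because on macOS (BSD sed) `-i` needs an explicit suffix argument such as `sed -i '' ...`.

```bash
#!/bin/bash
mdp=$(mktemp)
printf '%s\n' "integrator = md" "ref_t = 300" "gen_temp = 300" > "$mdp"
Temp=400

# 1) Line-number address, as in the homework: replace all of line 2.
sed "2s/.*/ref_t = ${Temp}/" "$mdp" > "$mdp.new" && mv "$mdp.new" "$mdp"

# 2) Pattern address: replace the line that starts with gen_temp, wherever it is.
sed "s/^gen_temp.*/gen_temp = ${Temp}/" "$mdp" > "$mdp.new" && mv "$mdp.new" "$mdp"

cat "$mdp"
```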

<a id='awk'></a>

## awk and grep

[back to top](#top)

awk and grep were the first command-line tools I ever learned. They will be your bread and butter (mmmm butter)

awk (traditionally AWK) is a programming language designed for processing text-based data: it splits each line into fields, `$1`, `$2`, ..., that you can print or compute with.

grep, on the other hand, is a simple filtering tool: it prints only the lines that match a pattern.

Let's go back to our first example:

```
nmol=`tail -2 conf.gro | head -1 | awk '{print $2}' | sed "s/[[:alpha:].-]/ /g" | awk '{print $1}'`
echo $nmol
```

output:

```
2
```


Can you adjust the print statement to figure out which field awk's `print` is grabbing? Open conf.gro in vim to help.

You will also see awk used in our file nodeseaker.sh:

```
nodes=$(nodestate pfaendtner | grep nodes)
#tee /dev/tty shows the matching lines on screen while wc -l counts them
lines=$(nodestate pfaendtner | grep nodes | tee /dev/tty | wc -l)
#if lines = 1 there's only one type of node. Find the type
if [ "$lines" == 1 ] ; then
  type=$(echo $nodes | awk '{print $3}')
  #... (rest of nodeseaker.sh)
fi
```
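Both tools on a few atom lines copied from the conf.gro output shown earlier: grep counts the oxygen lines, and awk filters on a field value and averages a column. One caveat: .gro columns are fixed-width, so in very large systems numbers can fill their column and run into the neighboring field, which breaks whitespace splitting; `cut -c` is the safer choice then.

```bash
#!/bin/bash
atoms='    1SOL     OW    1   1.915   1.754   0.496  0.3976  0.1489 -0.0020
    1SOL    HW1    2   1.946   1.831   0.544  0.3310  0.4931 -0.5088
  884SOL     OW 2650   1.855   0.165   2.093  0.6897  0.5791  0.7848
  884SOL    HW1 2651   1.863   0.096   2.027 -0.9398  0.6805  0.4391
  884SOL    HW2 2652   1.790   0.226   2.056 -0.0484  0.7734  2.3419'

# grep -c counts matching lines instead of printing them.
n_oxygen=$(echo "$atoms" | grep -c " OW ")

# awk: a condition before { } selects lines; $3 = atom number, $4 = x coordinate.
echo "$atoms" | awk '$2 == "OW" {print $3, $4}'

# Accumulate in the main block, report once in END.
mean_x=$(echo "$atoms" | awk '$2 == "OW" {s += $4; n++} END {printf "%.3f", s/n}')
echo "oxygens: ${n_oxygen}, mean x of oxygens: ${mean_x} nm"
```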

<a id='if'></a>

## if, while, break, wait

[back to top](#top)

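Let's discuss the following snippet from spMD.sh. It checks once a minute for the minimization output (confout.gro) and uses `break` to leave the loop once that file exists: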

```
while true ; do
  if [ ! -f $PROJROOT/mini/confout.gro ] ; then
    echo "waiting for minimization to complete"
    sleep 60
  else
    echo "minimization complete... moving forward"
    sleep 1
    break
  fi
done
```
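Below is a self-contained sketch of the same poll-then-`break` pattern. It adds a cap on the number of checks, since the loop above waits forever if minimization crashes and never writes confout.gro, and it also demonstrates `wait`. The helper name `wait_for_file` is hypothetical, and the background `( sleep 2 ; touch ... )` job is a made-up stand-in for a queued GROMACS job.

```bash
#!/bin/bash
# wait_for_file (hypothetical helper): poll every $2 seconds, at most $3 times.
wait_for_file() {
  local target=$1 interval=$2 max_checks=$3
  local checks=0 found=no
  while true ; do
    if [ -f "$target" ] ; then
      found=yes
      break                        # leave the while loop immediately
    fi
    checks=$((checks + 1))
    if [ "$checks" -ge "$max_checks" ] ; then
      break                        # give up instead of waiting forever
    fi
    sleep "$interval"
  done
  [ "$found" = yes ]               # exit status: 0 only if the file appeared
}

workdir=$(mktemp -d)
( sleep 2 ; touch "$workdir/confout.gro" ) &   # stand-in for a running job
job_pid=$!

if wait_for_file "$workdir/confout.gro" 1 10 ; then
  echo "minimization complete... moving forward"
else
  echo "gave up waiting for minimization" >&2
fi

# wait blocks until a background job started by THIS shell exits.
wait "$job_pid"
echo "background job finished with exit status $?"
```

`wait` only knows about child processes of the current shell. A job submitted with `qsub` runs under the scheduler, not as a child of your script, which is why spMD.sh polls for an output file instead.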

<a id='source'></a>

## source

[back to top](#top)

When you source a file, its commands run in your current shell, so any variables it sets remain available after it has finished. Running a script normally (e.g. `./test.sh`) starts a new child process instead, and any variables set inside the script are lost when that process exits.

```
echo "Jim=9000" > test.sh
source test.sh
echo $Jim
```

output:

```
9000
```


```
echo "Jim=9000" > test.sh
chmod +x test.sh    # make test.sh executable so ./test.sh can run it
./test.sh
echo $Jim
```

output: an empty line

Note how the second cell prints only an empty line: `./test.sh` runs in a child process, so `Jim` is never set in the shell that runs `echo $Jim`.

![](vegeta.jpg)
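The same comparison side by side, plus the reverse direction: a child process sees a parent's variable only if it has been `export`ed. Sourcing is also why master.sh uses `source ${name}.inp`: variables such as `Temp` and `MD` from the input file must end up in master.sh's own shell. A minimal sketch:

```bash
#!/bin/bash
tmp=$(mktemp -d)
echo "Jim=9000" > "$tmp/test.sh"
unset Jim Temp

bash "$tmp/test.sh"          # child process: Jim disappears with it
after_run=${Jim:-unset}      # ${var:-unset} substitutes "unset" if var is empty

source "$tmp/test.sh"        # current shell: Jim persists
after_source=${Jim:-unset}

# Parent -> child: plain variables are not inherited, exported ones are.
Temp=350
child_plain=$(bash -c 'echo "${Temp:-unset}"')
export Temp
child_exported=$(bash -c 'echo "${Temp:-unset}"')

echo "after bash: ${after_run}, after source: ${after_source}"
echo "child before export: ${child_plain}, after export: ${child_exported}"
```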

<a id='mail'></a>

## mail

[back to top](#top)

mail is one of those commands that will make you feel as if you've performed wizardry. How excellent it will be to receive an email from hyak at 3am announcing that your jobs have successfully completed! yeah!

Send yourself an email using the template below (swap in your own address):

```
name=myspecialrun
echo "job complete for ${name}" | mail -s "message from hyak" wesleybeckner@gmail.com
```
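One gotcha: master.sh uses `set -e`, so if any command fails the script stops before it reaches its final `mail` line, and you never hear about the failure. A common fix is a `trap` on `EXIT`, which runs whenever the script ends. A minimal sketch, where the function name `notify` and the address are placeholders; note that `mail` only delivers if outgoing mail is configured on the machine.

```bash
#!/bin/bash
address=you@example.com      # placeholder: use your own address
name=myspecialrun

notify() {
  local status=$?            # inside an EXIT trap, $? is the status the script exits with
  if [ "$status" -eq 0 ] ; then
    echo "job complete for ${name}" | mail -s "message from hyak" "$address"
  else
    echo "job FAILED for ${name} (exit status ${status})" | mail -s "message from hyak" "$address"
  fi
}

# In a real script, register the trap near the top, before any work starts:
#   trap notify EXIT
#   set -e
#   ... simulation commands ...
```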

<a id='for'></a>

## for

[back to top](#top)

This will probably be useful for your homework. Here is a simple case showing how to iterate through an array (say, the values of a parameter in your input file ;P )

```
Temps=(350 400 450)
for (( i=0 ; i<${#Temps[@]} ; i++ )) ; do
  echo " "
  echo ${Temps[$i]}
done
```

output:

```
 
350
 
400
 
450
```
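A few other ways to write the same loop. `${#Temps[@]}` is the number of elements (3 here) and indices start at 0, so a C-style loop needs `<`, not `<=`, or it runs one extra time with an empty value. The `HMI_CHL.` prefix just mirrors the `LABEL` naming in master.sh.

```bash
#!/bin/bash
Temps=(350 400 450)

# 1) Loop over the values directly; the quotes keep each element intact.
for T in "${Temps[@]}" ; do
  echo "T = ${T}"
done

# 2) Loop over the indices when you also need the position.
for i in "${!Temps[@]}" ; do
  echo "run ${i}: T = ${Temps[$i]}"
done

# 3) C-style loop with a strict < bound, collecting labels in a new array.
labels=()
for (( i=0 ; i<${#Temps[@]} ; i++ )) ; do
  labels+=("HMI_CHL.${Temps[$i]}")
done
printf '%s\n' "${labels[@]}"
```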


# Examples with emphasis on software and directory structures

<a id='input'></a>

## input.inp

[back to top](#top)

```
#!/bin/bash
###*****************************************************************
### INPUT FILE TO GENERATE SYSTEM, ENERGY MINIMIZE, EQUILIBRATE,
### AND EXECUTE PRODUCTION MD RUN AND ANALYSES
###*****************************************************************

### MAIN PARAMETERS 
Temp=350
MD=yes                  #yes or no ; choose yes if no MD files are present
Analysis=yes            #yes or no ; choose yes if wanting analysis on MD
nvt_production=yes      #yes or no ; NVT ensemble
npt_production=no       #yes or no ; NPT ensemble
nvt_length=1000000      #1ns (production NVT MD length)
```
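Because master.sh tests these switches with `[ "${MD}" == "yes" ]`, a typo such as `MD=Yes` silently skips that step. Below is a sketch of a sanity check to run after sourcing an input file. `check_yes_no` is a hypothetical helper, not part of the provided scripts, and the input file is a shortened copy of the one above. `${!var}` is bash indirect expansion: the value of the variable whose *name* is stored in `var`.

```bash
#!/bin/bash
check_yes_no() {
  local var
  for var in "$@" ; do
    case "${!var}" in
      yes|no) ;;                                   # valid value, keep going
      *) echo "error: ${var} must be yes or no, got '${!var}'" >&2 ; return 1 ;;
    esac
  done
}

inp=$(mktemp)
cat > "$inp" <<'EOF'
Temp=350
MD=yes
Analysis=yes
nvt_production=yes
npt_production=no
EOF
source "$inp"
check_yes_no MD Analysis nvt_production npt_production && echo "input switches look fine"

# spAnalysis.sh picks NVT whenever nvt_production=yes, so flag the ambiguous case.
if [ "$nvt_production" == "yes" ] && [ "$npt_production" == "yes" ] ; then
  echo "warning: both ensembles requested; analysis will use NVT" >&2
fi
```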

<a id='master'></a>

## master.sh

[back to top](#top)

```
#!/bin/bash
#run this script with the name of your system (match input file)
#and a name for your log file
#for an HMI CHL salt at 350K this could look like:
#source master.sh HMI_CHL > HMI_CHL.350 &
#source master.sh ${name} > ${name}.${characteristic} &
set -e                                          #exit script on error
name=$1                                         #set variable name to first argument
source ${name}.inp                              #source the input file
LABEL=${name}.${Temp}                           #set a label to organize your files
echo "sourcing ${name}.inp"
echo " "
echo "beginning ${name} runs"
cat ${name}.inp                                 #print input to log file
sed -i "6s/.*/name=${name}/" ${name}.inp        #record the system name on line 6 of the input file
wait
source directory.inp                            #create directory architecture

###*****************************************************************
###Run EQ

  cd $SANDBOX
  if [ "${MD}" == "yes" ] ; then
    source spMD.sh
    wait
  fi

###*****************************************************************
###Run Analysis

  if [ "${Analysis}" == "yes" ] ; then
    source spAnalysis.sh
    wait
  fi

###*****************************************************************
###Notify when the job is done

echo "job complete"
echo "job complete for ${name}" | mail -s "message from hyak" wesleybeckner@gmail.com
```
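The usage line in master.sh's header, `source master.sh HMI_CHL > HMI_CHL.350 &`, sends only standard output to the log; error messages go to standard error and still land on your terminal. A sketch of the difference, where `fake_run` is a hypothetical stand-in for the real run that prints one line to each stream:

```bash
#!/bin/bash
fake_run() {
  echo "beginning HMI_CHL runs"                 # stdout
  echo "fatal error: made-up example" >&2       # stderr
}

log=$(mktemp)

fake_run > "$log"          # stdout only: the error still prints to the terminal
stdout_only=$(cat "$log")

fake_run > "$log" 2>&1     # 2>&1: send stderr wherever stdout now points (the log)
combined=$(cat "$log")

echo "stdout-only log:" ; echo "$stdout_only"
echo "combined log:"    ; echo "$combined"
```

Order matters: `2>&1 > log` would point stderr at the terminal before stdout is redirected. For master.sh the usage line would become `source master.sh HMI_CHL > HMI_CHL.350 2>&1 &`.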

<a id='directory'></a>

## directory.inp

[back to top](#top)

```
#!/bin/bash
###source the goodies we need for GROMACS
MYROOT=/suppscr/pfaendtner/wesley/MOLSIM/SimulationWorkFlow/HW4

###should not need to change these
source /suppscr/pfaendtner/vanouk/scripts/activate_amber14.sh
source /gscratch/pfaendtner/wesley/software/ENVIRONMENT.sh
if [ ! -d $MYROOT/${name} ] ; then
  mkdir $MYROOT/${name}
fi
PROJROOT=$MYROOT/${name}
SANDBOX=$MYROOT
```
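`mkdir -p` does the "create it only if it is missing" check for you, creates any missing parent directories, and does not complain if the directory already exists, so it can replace the `if [ ! -d ... ] ; then mkdir ... ; fi` block. A sketch with a temporary directory standing in for the real `MYROOT`:

```bash
#!/bin/bash
MYROOT=$(mktemp -d)                   # stand-in for the real project path
name=HMI_CHL
mkdir -p "$MYROOT/${name}/mini"       # creates HMI_CHL and HMI_CHL/mini
mkdir -p "$MYROOT/${name}/mini"       # running it again is harmless
PROJROOT=$MYROOT/${name}
ls "$PROJROOT"
```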

<a id='subprocess'></a>

## spMD.sh

[back to top](#top)

```
#!/bin/bash
set -e

###ENERGY MINIMIZE THE STARTING STRUCTURE
cd $PROJROOT
if [ ! -d $PROJROOT/mini ] ; then
  mkdir mini
  echo "made mini directory"
fi
if [ ! -f mini/confout.gro ] && [ ! -f mini/md.log ] ; then
  echo "initiating energy minimization for ${name}"
  cp $SANDBOX/mini.mdp mini/
  cp $SANDBOX/GROMACS.pbs mini/
  cp $SANDBOX/nodeseaker.sh mini/
  cp $SANDBOX/conf.gro mini/
  cp $SANDBOX/topol.top mini/
  cd mini/
  gmx_8c grompp -f mini.mdp
  source nodeseaker.sh
  wait
  qsub GROMACS.pbs
  cd -
fi
while true ; do
  if [ ! -f $PROJROOT/mini/confout.gro ] ; then
    echo "waiting for minimization to complete"
    sleep 60
  else
    echo "minimization complete... moving forward"
    sleep 1
    break
  fi
done
```

## spAnalysis.sh

[back to top](#top)

```
echo "starting spEqAnalysis"
#*****************************************************************
#prep for the right directory depending on NPT or NVT
if [ "$nvt_production" == "yes" ] ; then
  cd ${PROJROOT}/NVT
elif [ "$npt_production" == "yes" ] ; then
  cd ${PROJROOT}/NPT
fi
if [ ! -d ${PROJROOT}/analysis ] ; then
  mkdir ${PROJROOT}/analysis
fi
cp $SANDBOX/GROMACS.pbs .
cp $SANDBOX/topol.top .
#*****************************************************************
#heat capacity   
if [ ! -f heatcapacity.txt ] ; then     #nmol grabs the number of molecules for gmx energy -nmol
  nmol=`tail -2 conf.gro | head -1 | awk '{print $1}' | sed "s/[[:alpha:].-]/ /g" | awk '{print $1}'`
  echo "calculating heat capacity for ${name}" 
  echo 8 6 0 | gmx_8c energy -f ener.edr -driftcorr -fluct_props -nmol ${nmol} >> temp
  grep -i 'Heat capacity' temp | awk '{print $8}' >> heatcapacity.txt
  rm energy.xvg
  rm temp
fi
#*****************************************************************
#temperature
if [ ! -f temperature.xvg ] ; then
  echo "calculating temperature for ${name}" 
  echo 8 0 | gmx_8c energy -f ener.edr -o temperature.xvg
fi
#*****************************************************************
echo "analysis complete"
cd $SANDBOX
```

<a id='hands'></a>

# Breakout

[back to top](#top)

Use the remaining time to read in the temperature.xvg file we've created and plot the data with matplotlib.

Here is some code to get you started:

```
import matplotlib.pyplot as plt
from itertools import cycle

### PREPARE PLOT STYLES
lines = ["-", "--", "-."]
marker = ['.', 'v', '^', '<', '>']    # other options: '8', 's', '+', 'o', '*'
linecycler = cycle(lines)             # next(linecycler) gives the next line style, repeating forever
markercycler = cycle(marker)          # same idea for marker shapes

# These are the "Tableau 20" colors as RGB.
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]

# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.
tableau20 = [(r / 255., g / 255., b / 255.) for r, g, b in tableau20]

plt.rc("figure", facecolor="white")

params = {
    'lines.markersize': 3,
    'axes.labelsize': 8,
    'font.size': 10,
    'legend.fontsize': 10,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'text.usetex': False,
}
plt.rcParams.update(params)
```