# Conformational sampling of Alanine Dipeptide with weighted ensemble Hamiltonian annealing
Authors: Baily Ford<br>
Email:&nbsp;&nbsp; bwf15@pitt.edu

## Setup

Welcome! This notebook assumes that you are working on `jupyter.crc.pitt.edu` (or `hub.crc.pitt.edu`) and have set up your virtual environment using https://github.com/Bailyford/drug-design2025_weha. If not, follow the steps in "[Setting up the virtual environment](#Setting-up-the-virtual-environment)".

To start, make sure it says `drug-design2025_weha` in the top-right corner! If it doesn't, click on `Python 3 (ipykernel)` (or whatever kernel is currently shown) and select `drug-design2025_weha` from the drop-down menu as your preferred kernel. Click Select.

![image](./img/Notebook-kernel.png)

### Setting up the virtual environment

Follow the instructions in `venv_instructions.pdf`. Here are the condensed steps:

1. Start a terminal (it should be an option in the "Launcher" tab).
2. Run `python -m pip install nglview`.
3. Reset the Jupyter server.
4. Run `cd ~`.
5. Run `git clone https://github.com/Bailyford/drug-design2025_weha`.
6. Run `cd drug-design2025_weha`.
7. Run `bash run_bash.sh`, then wait. It might take a short while.
8. Run `bash activate_env.sh`.

## Learning Objectives

- Modify an AMBER parameter file to change force field calculations
- Generate starting structures
- Prepare simulation files for conformational sampling of alanine dipeptide with WEHA
- Submit a Slurm job to the Pitt CRC cluster
- Cluster conformations with k-means NANI
- Use <code>WEDAP</code> to generate Ramachandran plots
- Visualize the results of clustering with <code>matplotlib</code>
- Calculate state populations using structure weights

## System Requirements

- AmberTools24 is necessary to run this simulation. Note that it is already installed as a module on H2P.
- numpy, matplotlib, ipympl, mdtraj, and nglview are used in this Jupyter notebook.
- nglview, matplotlib, ipympl, and numpy are optional and only needed for visualization.

These packages are already installed as part of the virtual environment.

## TABLE OF CONTENTS

[Introduction](#Introduction)

Day 1:

[1. Scaling the Hamiltonian and generating initial states](#1.-Scaling-the-Hamiltonian-and-generating-initial-states)

[2. Prepare WEHA input files](#2.-Prepare-WEHA-input-files)

[3. Running WEHA Simulation](#3.-Running-WEHA-Simulation)

Day 2:

[4. Visualize the Results](#4.-Visualize-the-results)

[5. Analyze the MD Results](#5.-Analyze-the-MD-results)

## Introduction

This tutorial is designed to provide an introduction to conformational sampling of biomolecules with the weighted ensemble Hamiltonian Annealing (WEHA) method of the WESTPA software package. It is designed for alanine dipeptide in AMBER 24. This notebook is designed with the assumption that you are working within a virtual environment on the H2P Cluster at Pitt.

AMBER stands for Assisted Model Building and Energy Refinement. It refers not only to the molecular dynamics programs, but also a set of force fields that describe the potential energy function and parameters of the interactions of biomolecules.

In order to run a Molecular Dynamics simulation in Amber, each molecule's interactions are described by a molecular force field. WEHA aims to yield Boltzmann-weighted ensembles by reducing the energy barrier between conformations induced by torsional and non-bonded interactions. This scaling is gradually reduced with frequent resampling in which trajectories with large weights are duplicated and trajectories with small weights are terminated.
We will be using Amber's GPU-accelerated <code>pmemd.cuda</code> to handle all dynamics propagation.


## 1. Scaling the Hamiltonian and generating initial states

The first step in creating your basis (starting) states is to properly scale the Hamiltonian. This is done by applying an alchemical scaling factor, λ, to the torsional and non-bonded terms. We have to be careful with our scaling, however, as careless scaling can result in the solvent box evaporating. As such, we should avoid scaling any non-bonded interactions that will not aid in conformational sampling. In this case, only solute-solute and solute-solvent interactions are scaled. Thus, we need to figure out how each Lennard-Jones interaction is calculated. Put simply, atoms with similar van der Waals radii and well depths are grouped together and assigned a numeric type index. The prmtop file stores values for the interactions between each pair of type indices. In this case, we want to scale everything except for water-water interactions. We can figure out which index values correspond to each atom group using ParmEd, as shown below.

In [30]:
import parmed
from parmed.tools import printLJTypes
from parmed.amber import AmberParm
parm = AmberParm('common_files/diala.prmtop')
action = printLJTypes(parm)
action.execute()
print('%s' % action)


  ATOM NUMBER   NAME TYPE
---------------------------------------------
ATOM 1          H1   HC  : Type index: 1
ATOM 2          CH3  CT  : Type index: 2
ATOM 3          H2   HC  : Type index: 1
ATOM 4          H3   HC  : Type index: 1
ATOM 5          C    C   : Type index: 3
ATOM 6          O    O   : Type index: 4
ATOM 7          N    N   : Type index: 5
ATOM 8          H    H   : Type index: 6
ATOM 9          CA   XC  : Type index: 2
ATOM 10         HA   H1  : Type index: 7
ATOM 11         CB   CT  : Type index: 2
ATOM 12         HB1  HC  : Type index: 1
ATOM 13         HB2  HC  : Type index: 1
ATOM 14         HB3  HC  : Type index: 1
ATOM 15         C    C   : Type index: 3
ATOM 16         O    O   : Type index: 4
ATOM 17         N    N   : Type index: 5
ATOM 18         H    H   : Type index: 6
ATOM 19         C    CT  : Type index: 2
ATOM 20         H1   H1  : Type index: 7
ATOM 21         H2   H1  : Type index: 7
ATOM 22         H3   H1  : Type index: 7
ATOM 23         O    OW  

We can modify the proper terms using the Python script 'modprmtop.py', which is located in the common_files/ subdirectory. modprmtop.py takes 4-6 arguments depending on the flag chosen. For all flags, the arguments are as follows: the value of λ, the prmtop file flag to modify, the prmtop file being modified, and the new prmtop file, in that order. If the flag is CHARGE, an additional argument is needed for the number of atoms we wish to scale. In this case, this is all non-water charges. If the flag is a Lennard-Jones parameter, a fifth and a sixth argument are needed that specify the last solute LJ type (fifth) and the last solvent LJ type (sixth). The code below contains dummy values (X) for these additional arguments. Using the results you found from ParmEd, fill in these values.

In [35]:
!python common_files/modprmtop.py 0.2 CHARGE common_files/diala.prmtop common_files/mod.prmtop X
!python common_files/modprmtop.py 0.2 DIHEDRAL_FORCE_CONSTANT common_files/mod.prmtop common_files/mod.prmtop
!python common_files/modprmtop.py 0.2 LENNARD_JONES_ACOEF common_files/mod.prmtop common_files/mod.prmtop X X
!python common_files/modprmtop.py 0.2 LENNARD_JONES_BCOEF common_files/mod.prmtop common_files/mod.prmtop X X

!diff common_files/diala.prmtop common_files/mod.prmtop

156,160c156,160
<   2.04636429E+00 -6.67300626E+00  2.04636429E+00  2.04636429E+00  1.08823576E+01
<  -1.03484442E+01 -7.57501011E+00  4.95464337E+00  6.14091510E-01  1.49969529E+00
<  -3.32556975E+00  1.09880469E+00  1.09880469E+00  1.09880469E+00  1.08841798E+01
<  -1.03484442E+01 -7.57501011E+00  4.95464337E+00 -2.71512270E+00  1.77849648E+00
<   1.77849648E+00  1.77849648E+00  0.00000000E+00  1.23755293E+01  1.23755293E+01
---
>   4.09272858E-01 -1.33460125E+00  4.09272858E-01  4.09272858E-01  2.17647152E+00
>  -2.06968884E+00 -1.51500202E+00  9.90928674E-01  1.22818302E-01  2.99939058E-01
>  -6.65113950E-01  2.19760938E-01  2.19760938E-01  2.19760938E-01  2.17683596E+00
>  -2.06968884E+00 -1.51500202E+00  9.90928674E-01 -5.43024540E-01  3.55699296E-01
>   3.55699296E-01  3.55699296E-01  0.00000000E+00  1.23755293E+01  1.23755293E+01
2287,2289c2287,2289
<   2.00000000E+00  2.50000000E+00  0.00000000E+00  0.00000000E+00  0.00000000E+00
<   0.00000000E+00  8.00000000E-01  8.00000000E

To generate suitable initial states, we need to create an ensemble of scaled-down structures. The simplest way to do this is to gradually reduce the torsional and non-bonded interactions in our system, with sufficient equilibration at each step. This is analogous to equilibrating a system at constant pressure while gradually removing restraints. A brief conventional MD simulation is then run, with frames saved frequently, to serve as the starting structures. Because alanine dipeptide is a small molecule, we should not need to impose any restraints. To save time, an equilibrated, reduced-Hamiltonian restart file has been provided in 'common_files/initial.ncrst'. An example of the code used to generate this state is shown below, but **note that it does not need to be executed for the purposes of this tutorial**.

In [None]:
#Example code Do not run!

for x in $(seq 0.95 -0.05 0.20 | awk '{printf "%.2f\n", $1}'); do
    python modprmtop.py $x CHARGE diala.prmtop mod.prmtop 22 &&\
    python modprmtop.py $x DIHEDRAL_FORCE_CONSTANT mod.prmtop mod.prmtop &&\
    python modprmtop.py $x LENNARD_JONES_ACOEF mod.prmtop mod.prmtop 7 10 &&\
    python modprmtop.py $x LENNARD_JONES_BCOEF mod.prmtop mod.prmtop 7 10

    if [[ "$x" == "0.95" ]]; then
        prev_x="1"
    else
        prev_x=$(printf "%.2f" $(echo "$x + 0.05" | bc -l))
    fi

    pmemd.cuda -O -i lambda_equil.in -o lambda_${x}.out -p mod.prmtop \
        -c ${prev_x}.ncrst -r ${x}.ncrst -x ${x}.nc -inf ${x}.info -AllowSmallBox
done
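
The two trailing arguments of the `LENNARD_JONES_*` calls in the loop above (`7 10`) are the last solute LJ type index and the last solvent LJ type index. As a rough illustration of which type pairs that rule selects, the sketch below enumerates all pairs and keeps those that involve at least one solute type, so solvent-solvent pairs are left untouched. The helper name `pairs_to_scale` is hypothetical, the sketch assumes that solvent types are numbered directly after the solute types, and it is not a description of how `modprmtop.py` is implemented.

```python
from itertools import combinations_with_replacement

def pairs_to_scale(last_solute_type, last_solvent_type):
    """Return LJ type pairs (i, j), i <= j, that involve at least one solute type.

    Hypothetical helper: types 1..last_solute_type are treated as solute,
    types last_solute_type+1..last_solvent_type as solvent.
    """
    all_types = range(1, last_solvent_type + 1)
    return [
        (i, j)
        for i, j in combinations_with_replacement(all_types, 2)
        if i <= last_solute_type  # i is the smaller index, so a solute type is involved
    ]

scaled = pairs_to_scale(7, 10)
total_pairs = 10 * 11 // 2  # all unique pairs among 10 types
print(len(scaled), "of", total_pairs, "type pairs would be scaled")
print("solvent-solvent pair (8, 8) scaled:", (8, 8) in scaled)
```

Only the pairs built entirely from solvent types are excluded, which matches the goal of scaling everything except water-water interactions.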
    

### Generating Starting Structures

Now that we have a scaled starting structure, we can use it to generate the rest. We will do so by starting a brief, conventional, scaled MD run from initial.ncrst in the common_files subdirectory using pmemd. We will save the trajectory and write a new restart file every 2 ps. The Langevin thermostat will be used to control the temperature. The random number generator will be initialized with a random seed.

To control all these settings, we will write a simple input file in a text editor. Unix has many text editors available, but we will use the one built into Jupyter Lab.

The following cell will create the file. You may open (and edit) the file by double-clicking it in the file browser to the left. Edit it to your liking (and press Cmd/Ctrl+S to save). You may change how often restart files are saved to adjust the number of starting states. Try picking a value that you believe will best represent a Boltzmann distribution (many entries) while keeping the simulation time manageable. Amber will write a new file every |ntwr| steps, so the final count is equal to nstlim/|ntwr|. Your final count should fall between 50 and 200 starting states. Be sure that ntwr is a negative value. This tells Amber to save a new file rather than overwrite the previous one. The output files will be numbered by the step on which they were written and can be renamed later. To keep the run time manageable, only change the value of ntwr. A set of starting structures is provided in the supplemental_data subdirectory if needed.

In [36]:
%%writefile common_files/get_bstates.in
1 ns unrestrained NPT equilibration using Langevin thermostat and MC barostat
 &cntrl
  irest     = 1,
  ntx       = 5,
  ig        = -1,
  dt        = 0.002,
  nstlim    = 500000,
  nscm      = 500,
  ntr       = 0,
  ntb       = 2,
  ntp       = 1,
  barostat  = 2,
  pres0     = 1.0,
  mcbarint  = 100,
  comp      = 44.6,
  taup      = 1.0,
  ntt       = 3,
  temp0     = 300.0,
  gamma_ln  = 1.0,
  ntf       = 2,
  ntc       = 2,
  cut       = 10.0,
  ntpr      = 1000,
  ntxo      = 2,
  ntwr      = -x,
  ioutfm    = 1,
  ntwx      = 10000,
  iwrap     = 1,
 /



Writing get_bstates.in
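
The relationship between `ntwr` and the number of starting states described above can be checked with a few lines of arithmetic. This is a minimal sketch that takes `nstlim` and `dt` from get_bstates.in and assumes, as stated in the text, that the count is nstlim/|ntwr|; the helper name `n_restart_files` and the candidate `ntwr` values are only illustrative.

```python
def n_restart_files(nstlim, ntwr):
    """Number of restart files written when one is saved every |ntwr| steps."""
    return nstlim // abs(ntwr)

nstlim = 500000  # total MD steps, from get_bstates.in
dt_ps = 0.002    # time step in ps, from get_bstates.in

# Illustrative candidates; pick your own value for ntwr
for ntwr in (-10000, -5000, -2500):
    count = n_restart_files(nstlim, ntwr)
    interval_ps = abs(ntwr) * dt_ps
    print(f"ntwr = {ntwr}: {count} starting states, one every {interval_ps:g} ps")
```

All three candidates fall inside the recommended range of 50 to 200 starting states.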


In [37]:
!module load gcc/10.2.0 openmpi/4.1.1 amber/24 &&\
pmemd -O -i common_files/get_bstates.in -o bstates.out -p common_files/mod.prmtop -c common_files/initial.ncrst -x production.nc -inf base.info -AllowSmallBox


/bin/bash: module: command not found


### Assigning Initial Weights

Now that you have generated an ensemble of structures to start with, it is time to assign each structure a weight proportional to its Boltzmann factor. In statistical mechanics, a Boltzmann distribution (also called a Gibbs distribution) is a probability distribution that gives the probability that a system will be in a certain state as a function of that state's energy and the temperature of the system. The distribution is expressed in the form:

$$ p_i \propto \exp\left(-\frac{\varepsilon_i}{kT}\right) $$

where $p_i$ is the probability of the system being in state $i$, $\exp$ is the exponential function, $\varepsilon_i$ is the energy of that state, and $kT$ is the product of the Boltzmann constant $k$ and the thermodynamic temperature $T$. In our case, the system is the ensemble of alanine dipeptide states. Any state that is more stable (lower energy) will have a higher probability of occurring. Because our solute is small and the number of water molecules and the system temperature are constant, we can assume that any differences in energy between our systems that come from the solvent are negligible. Because the normalized Boltzmann probabilities depend only on energy differences, we can shift all energies by a constant (here, the lowest energy in the set) and work with the differences in potential energy between states.
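
To make the last point concrete, the sketch below computes normalized Boltzmann weights for a small set of made-up energies and repeats the calculation after adding a constant to every energy. The function name `boltzmann_weights` and the energy values are illustrative assumptions, and `kT` uses the rounded value 0.002 kcal/(mol K) for the Boltzmann constant.

```python
import math

def boltzmann_weights(energies, kT):
    """Normalized Boltzmann weights, p_i proportional to exp(-E_i / kT)."""
    e_min = min(energies)
    # Shifting by the minimum avoids overflow and does not change the result
    factors = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

kT = 0.002 * 300                   # about 0.6 kcal/mol at 300 K
energies = [-25.0, -24.4, -23.8]   # made-up energies in kcal/mol
shifted = [e + 1000.0 for e in energies]

w = boltzmann_weights(energies, kT)
w_shifted = boltzmann_weights(shifted, kT)
print([round(x, 3) for x in w])
print([round(x, 3) for x in w_shifted])
```

Both lists agree, and the lowest-energy state carries the largest weight.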

We can get this energy by using the CPPTRAJ toolkit, a software package for analyzing AMBER simulations. 

In [None]:
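%%bash
# Load Amber so that cpptraj is available
module load gcc/10.2.0 openmpi/4.1.1 amber/24

# Rename the restart files to sequential, zero-padded names (01.ncrst, 02.ncrst, ...)
mkdir -p bstates
counter=1
for file in restrt_*; do
    new_name=$(printf "%02d" "$counter")
    mv "$file" "bstates/$new_name.ncrst"
    counter=$((counter + 1))
done

# Compute the solute energy of each starting structure
for x in $(seq -f "%02g" 1 100); do
    cpptraj << EOF
parm common_files/mod.prmtop
trajin bstates/$x.ncrst
autoimage
strip :WAT
energy :1-22 out energy_$x.dat
go
EOF
done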


While WEHA lacks the traditional progress coordinate that weighted ensemble uses, WESTPA requires that we have a progress coordinate that defines how our system progresses. In this case we will use the dihedral angle phi. This will be explained in more detail later, but for now we need to create two files: pcoord.init and bstates.txt. pcoord.init will contain the initial phi angle and energy for each state. bstates.txt will contain the starting weight and location of each state. First, let's make pcoord.init by using MDTraj to calculate the phi angle.

In [39]:
import mdtraj
import numpy as np
import pandas as pd

n_states = 100  # number of starting structures generated above
names = []
energy = []
phi_list = []

for i in range(1, n_states + 1):
    # Zero-padded file names, e.g. 01.ncrst
    names.append(f"{i:02d}.ncrst")

    # Read the cpptraj energy file and store the energy in the list
    energy_data = pd.read_csv(f'energy_{i:02d}.dat', sep=r'\s+')
    energy.append(energy_data.iloc[0, 9])

    # Compute the phi dihedral and convert it from radians to degrees
    traj = mdtraj.load(f'bstates/{i:02d}.ncrst', top='common_files/mod.prmtop')
    _, phi = mdtraj.compute_phi(traj)
    phi_list.append(float(np.squeeze(phi)) * 180 / np.pi)

# Create a DataFrame
df = pd.DataFrame({'file': names, 'phi': phi_list, 'energy': energy})

# Shift the energies so that the lowest-energy state sits at zero
df['energy'] = df['energy'] - df['energy'].min()
# Save the DataFrame as a tab-separated file
df.to_csv('bstates/pcoord.init', sep='\t', index=False, header=True, float_format='%.6f')

# Print the DataFrame
print(df)




Now we create bstates.txt, which holds the normalized Boltzmann weight of each starting state:

In [None]:
import numpy as np
import pandas as pd

# kT in kcal/mol at 300 K (Boltzmann constant rounded to 0.002 kcal/(mol K))
kT = 0.002 * 300

df = pd.read_csv('bstates/pcoord.init', sep=r'\s+')

# Unnormalized Boltzmann factors from the shifted energies
prenormalized_weight = np.exp(-df['energy'].to_numpy(dtype=float) / kT)
print(prenormalized_weight)

# Normalize by the partition sum so that the weights add up to 1
partition = prenormalized_weight.sum()
weight = prenormalized_weight / partition
print(len(weight))

# State labels and reference names (zero-padded, matching the files in bstates/)
bstate = [f"{x:02d}" for x in range(1, len(weight) + 1)]
rd = list(bstate)

df2 = pd.DataFrame({'file': bstate, 'weight': weight, 'rd': rd})
df2.to_csv('bstates/bstates.txt', sep='\t', index=False, header=False)
print(df2)


## 2. Prepare WEHA input files

Because WEHA is built on the WESTPA software package, a few additional files are required to configure, propagate, and resample our trajectories. Most of these files are the same across weighted ensemble simulations and will not be covered in this tutorial. The files we will look at are the WESTPA configuration file, west.cfg; the custom WEHA resampler, resampler.py; the AMBER dynamics input file, production.in; the script that calculates the dihedral angles phi and psi, get_dihedrals.py; and the script that controls each WESTPA iteration, runseg.sh.

### West.cfg

This is the configuration file that controls the parameters of the WEHA simulation. There are three main sections to focus on: drivers, system, and data. The drivers section determines both the root directory for the simulation and the algorithm we will use to handle our resampling. As we are using a custom resampler, `we_driver` should be set to resampler.CustomDriver.

The system section handles the dimensionality of the data, the size of the dataset, its data type, and how we wish to define our binning. As WEHA is effectively a binless, progress-coordinate-less approach to weighted ensemble, we should define our bins so that the entire progress coordinate falls within a single bin. Because our progress coordinate is the one-dimensional dihedral angle 'phi', our bin will span [-180, 180]. If desired, [0, 360] is also acceptable, provided that convention is kept consistent everywhere else.
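
If you do switch conventions, every phi value returned to WESTPA has to be mapped into the same range as the bin. A minimal sketch of the two mappings is shown below; the function names `to_0_360` and `to_pm180` are hypothetical and are not part of the tutorial scripts.

```python
def to_0_360(angle_deg):
    """Map an angle in degrees from [-180, 180) onto [0, 360)."""
    return angle_deg % 360.0

def to_pm180(angle_deg):
    """Map an angle in degrees from [0, 360) onto [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

# Round trip for a few example phi values
for phi in (-150.0, -60.0, 75.0):
    wrapped = to_0_360(phi)
    print(f"{phi:7.1f} -> {wrapped:6.1f} -> {to_pm180(wrapped):7.1f}")
```

Either convention works as long as the bin boundaries and the reported angles agree.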

The most important section is data. Here we have to define all datasets that will be used in the simulation. For WEHA we need to store the two dihedral angles 'phi' and 'psi' as well as the total energy, each of the constituent energies (angle, bond, dihedral, van der Waals, and electrostatic), and the three-dimensional coordinates of each structure.

In [None]:
%%writefile west.cfg
# The master WEST configuration file for a simulation.
# vi: set filetype=yaml :
---
west: 
  drivers:
    module_path: $WEST_SIM_ROOT
    we_driver: resampler.CustomDriver
  system:
    driver: westpa.core.systems.WESTSystem
    system_options:
      # Dimensionality of your progress coordinate
      pcoord_ndim: 1
      # Number of data points per iteration
      pcoord_len: 10
      # Data type for your progress coordinate 
      pcoord_dtype: !!python/name:numpy.float32
      # begin fixed binning
      bins:
        type: RectilinearBinMapper
#        # The edges of the bins 
        boundaries:         
          -  [-180, 180]
      bin_target_counts: 100
  propagation:
    max_total_iterations: 10000
    max_run_wallclock:    100:00:00
    propagator:           executable
    gen_istates:          false
  data:
    west_data_file: west.h5
    datasets:
       - name:        pcoord
         scaleoffset: 4
       - name: energy
         scaleoffset: 4
         dtype: float32     
       - name: dih_energy
         scaleoffset: 4
         dtype: float32
       - name: vdw_energy
         scaleoffset: 4
         dtype: float32
       - name: elec_energy
         scaleoffset: 4
       - name: bond_energy
         scaleoffset: 4
         dtype: float32
       - name: angle_energy
         scaleoffset: 4
         dtype: float32
       - name: coordinates
         scaleoffset: 4
         dtype: float32
       - name: psi
         scaleoffset: 4
         dtype: float32
    data_refs:
#      iteration:     $WEST_SIM_ROOT/traj_segs/iter_{n_iter:06d}.h5
      segment:       $WEST_SIM_ROOT/traj_segs/{segment.n_iter:06d}/{segment.seg_id:06d}
      basis_state:   $WEST_SIM_ROOT/bstates/{basis_state.auxref}
      initial_state: $WEST_SIM_ROOT/istates/{initial_state.iter_created}/{initial_state.state_id}.ncrst
  plugins:
  executable:
    environ:
      PROPAGATION_DEBUG: 1
    datasets:
      - name: energy
        enabled: true
      - name: dih_energy
        enabled: true
      - name: vdw_energy
        enabled: true
      - name: elec_energy
        enabled: true
      - name : bond_energy
        enabled: true
      - name: angle_energy
        enabled: true
      - name: coordinates
        loader: npy_loader
        enabled: true
        filename: $WEST_SIM_ROOT/traj_segs/{segment.n_iter:06d}/{segment.seg_id:06d}/coordinates.npy
        dir: true
      - name: psi
        enabled: true
    propagator:
      executable: $WEST_SIM_ROOT/westpa_scripts/runseg.sh
      stdout:     $WEST_SIM_ROOT/seg_logs/{segment.n_iter:06d}-{segment.seg_id:06d}.log
      stderr:     stdout
      stdin:      null
      cwd:        null
      environ:
        SEG_DEBUG: 1
    get_pcoord:
      executable: $WEST_SIM_ROOT/westpa_scripts/get_pcoord.sh
      stdout:     /dev/null #$WEST_SIM_ROOT/get_pcoord.log
      stderr:     stdout
    gen_istate:
      executable: $WEST_SIM_ROOT/westpa_scripts/gen_istate.sh
      stdout:     /dev/null 
      stderr:     stdout
    post_iteration:
      enabled:    true
      executable: $WEST_SIM_ROOT/westpa_scripts/post_iter.sh
      stderr:     stdout
    pre_iteration:
      enabled:    false
      executable: $WEST_SIM_ROOT/westpa_scripts/pre_iter.sh
      stderr:     stdout
 

### Resampler.py

This file is responsible for handling the custom resampling and is the most important file for the WEHA method. It is quite large, so we will not analyze it in depth, but it is important to note that it handles the adaptive annealing schedule, the reweighting of segments, and the splitting/merging of trajectories. In JupyterHub, open resampler.py. On lines 22-25, change the variables so that they match the number of starting states, the initial lambda value, and the desired effective sample size.
The effective sample size (ESS) measures how evenly the statistical weight is spread across your trajectories: the larger the value, the more trajectories contribute meaningfully to the ensemble. We calculate it with Kish's approximation:

$$ \mathrm{ESS} = \frac{\left(\sum_{i=1}^{N} p_i\right)^2}{\sum_{i=1}^{N} p_i^2} $$
where $p_i$ is the probability (weight) of trajectory $i$ and $N$ is the total number of trajectories. The closer the trajectory weights are to one another, the closer the ESS gets to $N$. Functionally, the target ESS controls the rate of annealing and acts as a general cost-versus-reward setting. A large ESS promotes slower annealing, with more iterations and smaller λ increments. Note that a larger ESS leads to better convergence and consistency, but is ultimately subject to diminishing returns. An ESS of 99 vs 98 is only marginally more consistent yet takes about 2 additional hours to run. Try picking an effective sample size that you believe will work well for the chosen number of starting states. Do note that ESS must be less than N.
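
The formula above is easy to evaluate by hand. The sketch below is a minimal, standalone version for illustration; the function name `kish_ess` and the two weight sets are made up, and this is not the code used inside resampler.py.

```python
def kish_ess(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

n = 100
# Every trajectory carries the same weight
uniform = [1.0 / n] * n
# One dominant trajectory holds half of the total weight (made-up example)
skewed = [0.5] + [0.5 / (n - 1)] * (n - 1)

print(f"uniform weights: ESS = {kish_ess(uniform):.1f}")
print(f"skewed weights:  ESS = {kish_ess(skewed):.1f}")
```

With equal weights the ESS reaches N, while a single dominant trajectory pulls it down to only a few effective samples.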

### Production.in

This is the AMBER input script that handles all propagation. We run at constant temperature and pressure (NPT) with a Monte Carlo barostat and a Langevin thermostat. Each segment runs for 100 ps and saves information every 10 ps. Be sure that ig=RAND is set: RAND is a placeholder that runseg.sh replaces with a seed drawn from WESTPA's own random number generator.

In [None]:
%%writefile common_files/production.in
Production file
&cntrl
 imin=0,
 irest=1,
 ntx=5,
 ntpr=5000,
 ntwx=5000,
 ntwr=5000,
 nstlim=50000,
 dt=0.002,
 ntt=3,
 tempi=300,
 temp0=300,
 gamma_ln=1.0,
 ig=RAND,
 ntp=1,
 ntc=2,
 ntf=2,
 cut=9,
 ntb=2,
 iwrap=1,
 ioutfm=1,
 barostat=2,
 pres0=1.0,
 mcbarint=100,
 comp=44.6,
 taup=1.0
&end
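
As a quick sanity check on these settings, the sketch below fills in the RAND placeholder the same way the `sed` call in runseg.sh does and derives the segment length and number of saved frames. The template is an abbreviated copy of production.in, the seed `12345` is a stand-in for `$WEST_RAND16`, and the helper names `fill_seed` and `read_value` are hypothetical.

```python
import re

template = """Production file
&cntrl
 nstlim=50000,
 dt=0.002,
 ntwx=5000,
 ig=RAND,
&end
"""  # abbreviated copy of production.in, for illustration only

def fill_seed(text, seed):
    """Replace every RAND placeholder with an integer seed."""
    return text.replace("RAND", str(seed))

def read_value(text, key):
    """Read a numeric key=value entry from the namelist text."""
    match = re.search(rf"\b{key}\s*=\s*([-\d.]+)", text)
    return float(match.group(1))

filled = fill_seed(template, 12345)  # 12345 stands in for $WEST_RAND16
length_ps = read_value(filled, "nstlim") * read_value(filled, "dt")
frames = int(read_value(filled, "nstlim") // read_value(filled, "ntwx"))
print(f"seed set: {'ig=12345' in filled}; {length_ps:g} ps per segment, {frames} saved frames")
```

The numbers agree with the description above: 100 ps per segment with a frame saved every 10 ps.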



### Get_Dihedrals.py

This script uses MDTraj to compute the dihedral angles of each structure at the end of each propagation. It should save the angles for each saved frame for both 'phi' and 'psi' in an easily distinguishable way. This can be done either as two separate files, as shown below, or as a single file with two columns.

In [None]:
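%%writefile common_files/get_dihedrals.py
import numpy as np
import mdtraj

# Load the segment trajectory written by pmemd
traj = mdtraj.load('seg.nc', top='mod.prmtop')

# compute_phi/compute_psi return (atom indices, angles in radians)
_, phi = mdtraj.compute_phi(traj)
_, psi = mdtraj.compute_psi(traj)

# Convert to degrees and drop the singleton dihedral axis
phi = np.squeeze(phi) * 180 / np.pi
psi = np.squeeze(psi) * 180 / np.pi

# One value per saved frame, one file per angle
np.savetxt('phi.dat', phi)
np.savetxt('psi.dat', psi)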


### Runseg.sh

This file is located in the westpa_scripts subdirectory and specifies what happens over the course of an iteration. We first want to create a modified prmtop file to use for the upcoming propagation, so we execute those commands first. We then replace RAND in the AMBER input file with a seed generated from WESTPA's internal RNG. We then run AMBER. Afterwards, we compute all the relevant data and send that data to the corresponding datasets defined in the west.cfg file.

In [41]:
%%writefile westpa_scripts/runseg.sh
#!/bin/bash

if [ -n "$SEG_DEBUG" ] ; then
  set -x
  env | sort
fi

cd $WEST_SIM_ROOT
mkdir -pv $WEST_CURRENT_SEG_DATA_REF
cd $WEST_CURRENT_SEG_DATA_REF
tempF=$(tail -n 1 $WEST_SIM_ROOT/common_files/lambda.dat | awk '{print $2}')
python $WEST_SIM_ROOT/common_files/modprmtop.py $tempF CHARGE $WEST_SIM_ROOT/common_files/diala.prmtop ./mod.prmtop 22
python $WEST_SIM_ROOT/common_files/modprmtop.py $tempF DIHEDRAL_FORCE_CONSTANT ./mod.prmtop ./mod.prmtop
python $WEST_SIM_ROOT/common_files/modprmtop.py $tempF LENNARD_JONES_ACOEF ./mod.prmtop ./mod.prmtop 7 10
python $WEST_SIM_ROOT/common_files/modprmtop.py $tempF LENNARD_JONES_BCOEF ./mod.prmtop ./mod.prmtop 7 10
 
#python $WEST_SIM_ROOT/common_files/modify_data_with_lambda.py $tempF $WEST_SIM_ROOT/common_files/p1_nowat.prmtop ./p1_nowat_mod.prmtop


ln -sv $WEST_SIM_ROOT/common_files/diala.prmtop .
ln -sv $WEST_SIM_ROOT/common_files/mod.prmtop .
ln -sv $WEST_SIM_ROOT/common_files/diala.pdb .

#echo $WEST_PARENT_DATA_REF

if [ "$WEST_CURRENT_SEG_INITPOINT_TYPE" = "SEG_INITPOINT_CONTINUES" ]; then
  sed "s/RAND/$WEST_RAND16/g" $WEST_SIM_ROOT/common_files/production.in > production.in
#  sed "s/RAND/$WEST_RAND16/g" $WEST_SIM_ROOT/common_files/md.in > md.in
  ln -sv $WEST_PARENT_DATA_REF/seg.ncrst ./parent.ncrst
elif [ "$WEST_CURRENT_SEG_INITPOINT_TYPE" = "SEG_INITPOINT_NEWTRAJ" ]; then
  sed "s/RAND/$WEST_RAND16/g" $WEST_SIM_ROOT/common_files/production.in > production.in
  #sed "s/RAND/$WEST_RAND16/g" $WEST_SIM_ROOT/common_files/md.in > md.in
  cp $WEST_PARENT_DATA_REF.ncrst ./parent.ncrst
  #ln -sv $WEST_PARENT_DATA_REF/bstate.ncrst ./parent.ncrst
fi

export CUDA_DEVICES=(`echo $CUDA_VISIBLE_DEVICES_ALLOCATED | tr , ' '`)
export CUDA_VISIBLE_DEVICES=${CUDA_DEVICES[$WM_PROCESS_INDEX]}

echo "RUNSEG.SH: CUDA_VISIBLE_DEVICES_ALLOCATED = " $CUDA_VISIBLE_DEVICES_ALLOCATED
echo "RUNSEG.SH: WM_PROCESS_INDEX = " $WM_PROCESS_INDEX
echo "RUNSEG.SH: CUDA_VISIBLE_DEVICES = " $CUDA_VISIBLE_DEVICES


#tempP=$(awk -v "iter=$WEST_CURRENT_ITER" 'NR==iter' $WEST_SIM_ROOT/common_files/temp.dat | awk '{print $2}')


echo $tempF $WEST_SIM_ROOT

$PMEMD  -O -i production.in   -p mod.prmtop -c parent.ncrst \
           -r seg.ncrst -x seg.nc      -o seg.log    -inf seg.nfo -AllowSmallBox

#DIST=$(mktemp)
COMMAND="         parm mod.prmtop\n"
COMMAND="$COMMAND trajin $WEST_CURRENT_SEG_DATA_REF/seg.nc\n"
COMMAND="$COMMAND autoimage \n"
COMMAND="$COMMAND strip :WAT \n"
COMMAND="$COMMAND energy  @1-22  out energy.dat \n"
COMMAND="$COMMAND trajout nowater.nc \n"
COMMAND="$COMMAND go\n"
python $WEST_SIM_ROOT/common_files/get_dihedrals.py
echo -e $COMMAND | $CPPTRAJ
#python $WEST_SIM_ROOT/common_files/shift_energy.py
cat phi.dat > $WEST_PCOORD_RETURN
cat psi.dat > $WEST_PSI_RETURN
cat energy.dat | tail -n +2 | awk '{print $4}' > $WEST_DIH_ENERGY_RETURN
cat energy.dat | tail -n +2 | awk '{print ($5 + $7)}' > $WEST_VDW_ENERGY_RETURN
cat energy.dat | tail -n +2 | awk '{print ($6 + $8)}' > $WEST_ELEC_ENERGY_RETURN
cat energy.dat | tail -n +2 | awk '{print $2}' > $WEST_BOND_ENERGY_RETURN
cat energy.dat | tail -n +2 | awk '{print $3}' > $WEST_ANGLE_ENERGY_RETURN
python $WEST_SIM_ROOT/common_files/get_energy.py > $WEST_ENERGY_RETURN
#cat $RMSD | tail -n +2 | awk {'print $2 , $3'} > $WEST_AUX_RETURN
python $WEST_SIM_ROOT/common_files/get_coordinates.py

Overwriting westpa_scripts/runseg.sh
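
The `awk` calls at the end of runseg.sh pick individual columns out of energy.dat and, for the van der Waals and electrostatic terms, add two columns together. The sketch below mirrors that column bookkeeping in Python for a single data row. The function name `split_energy_line` and the numbers in `row` are made up for illustration; only the column positions are taken from the script.

```python
def split_energy_line(line):
    """Split one data row of energy.dat into the components returned by runseg.sh.

    Column numbers follow the awk calls in runseg.sh (1-based; column 1 is the frame).
    """
    cols = [float(x) for x in line.split()]
    return {
        "bond": cols[1],            # awk $2
        "angle": cols[2],           # awk $3
        "dihedral": cols[3],        # awk $4
        "vdw": cols[4] + cols[6],   # awk $5 + $7
        "elec": cols[5] + cols[7],  # awk $6 + $8
    }

# Made-up numbers, only to show which columns end up where
row = "1 2.0 5.0 9.0 3.0 40.0 -1.0 -70.0 -12.0"
parts = split_energy_line(row)
print(parts)
```

Each value is then written to the matching `$WEST_*_RETURN` file so that WESTPA stores it in the dataset of the same name.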


## 3. Running WEHA Simulation

### Initialization

Before running any weighted ensemble simulation, you should initialize the simulation. This cleans the directory of any old simulation data and prepares the system for new data. Initialization is performed by executing the command below. If you need to restart your simulation for any reason, it is a good idea to reinitialize beforehand. Be warned that any data deleted cannot be recovered, so only initialize if you want a fresh restart and have saved all relevant data. You may ignore the conda and module errors if you execute the command from the block below instead of from the submission script.

In [43]:
!bash init.sh

env.sh: line 6: module: command not found
env.sh: line 8: module: command not found
/home/bwf15/miniconda3/envs/AmberTools23
/home/bwf15/miniconda3/envs/AmberTools23/bin/pmemd.cuda

CondaError: Run 'conda init' before 'conda activate'

/home/bwf15/weha_tutorial
simulation weha_tutorial root is /home/bwf15/weha_tutorial
--bstate-file /home/bwf15/weha_tutorial/bstates/bstates.txt
Updating system with the options from the configuration file
Creating HDF5 file '/home/bwf15/weha_tutorial/west.h5'
0 target state(s) present
Calculating progress coordinate values for basis states.
100 basis state(s) present
Calculating progress coordinate values for start states.
0 start state(s) present
Preparing initial states

        Total bins:            1
        Initial replicas:      100 in 1 bins, total weight = 1
        Total target replicas: 100
        
1-prob: 0.0000e+00
Simulation prepared.
1 of 1 (100.000000%) active bins are populated
per-bin minimum non-zero probability:       1
per-bin maxi

### Running WEHA

We are now ready to run the WEHA simulation. We will be using 1 GPU node with 4 cores on the Pitt CRC cluster. To submit a Slurm job, we need a submission script, an environment file, and node configuration files. These have been provided in the base directory. The job should take a little under a day to run. You can view the progress of your submission by executing the command `squeue -u [your pitt username] -M teach`. For example, `squeue -u bwf15 -M teach` would display any jobs running from the bwf15 account. Ideally you should have data to analyze for the second part of the lab, but if something happens, data will be provided. If you notice an issue, feel free to resubmit the job, or reach out to Baily with the error, and he can help restart the job. To start the simulation, execute the block below. This will submit a job request. For more information on Slurm, please refer to the CRC user manual: https://crc-pages.pitt.edu/user-manual/.

In [None]:
!sbatch run.sh