#### GROMACS for production
(See also: [`GROMACS_for_CHARMM-GUI.ipynb`](https://colab.research.google.com/github/bioinfkaustin/gromacs-on-colab/blob/main/GROMACS_for_CHARMM-GUI.ipynb).)

#### Documentation
Please click ***↳ cells hidden*** below to show the documentation for this notebook.

##### License

> This notebook as a work of software is licensed under the terms of the [AGPL-3.0](https://opensource.org/licenses/AGPL-3.0) or later.

##### About this software

> This notebook runs or extends a **GROMACS production simulation**.
>
> It operates within a **run folder** containing a `grompp.mdp` simulation parameters file alongside its prerequisites (the initial condition `conf.gro`, the index groups `index.ndx`, the topology `topol.top` and `toppar/`, and so on).

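The prerequisites listed above can be checked before starting a run. The following is a minimal sketch, not part of the notebook itself: `check_run_folder` and the two name tuples are hypothetical, and the split into required and optional files is an assumption based on the check that the Input cell performs on `grompp.mdp`, `conf.gro` and `topol.top`.

```python
import os
import tempfile

# Hypothetical helper; the file names follow the list above.
REQUIRED = ("grompp.mdp", "conf.gro", "topol.top")
OPTIONAL = ("index.ndx", "restraint.gro", "toppar")


def check_run_folder(folder):
  """Return (missing_required, present_optional) for a run folder."""
  missing = []
  for name in REQUIRED:
    path = os.path.join(folder, name)
    # Required inputs must exist and be non-empty
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
      missing.append(name)
  present = [name for name in OPTIONAL if os.path.exists(os.path.join(folder, name))]
  return missing, present


# Usage on a throwaway folder that only contains conf.gro
with tempfile.TemporaryDirectory() as folder:
  with open(os.path.join(folder, "conf.gro"), "w") as handle:
    handle.write("placeholder\n")
  print(check_run_folder(folder))
```

An empty first list means that the essential inputs are in place; the second list shows which secondary inputs were found.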
#### Configuration

In [None]:
import os
import re
import shutil

#@markdown Specify the location of the **GROMACS project folder** to simulate. It should contain at least the input files `conf.gro` and `topol.top` (and probably `toppar/`), as well as a simulation parameters file `grompp.mdp` with any secondary inputs it references such as `index.ndx` or `restraint.gro`.
project_folder = "{GoogleDrive}/GROMACS/7FBF_FABPH_vs_octanoic_acid" #@param {type: "string"}
# default: {GoogleDrive}/GROMACS/7FBF_FABPH_vs_octanoic_acid

#@markdown Choose for how long the simulation should run.
simulation_duration_ns = 10 #@param {type: "integer"}
# default: 10

#@markdown Provide a unique filename prefix for this simulation. 
output_prefix = "sim" #@param {type: "string"}
# default: sim

#@markdown If applicable, please see also the advanced settings below. **After filling in this form, run the notebook by clicking *Runtime -> Run all* in the toolbar.**

#@markdown \
#@markdown **Early stopping**
#@markdown 
#@markdown Optionally, a group may be specified for which the RMSD from the initial conformation should be monitored. The run stops if the RMSD exceeds a threshold (in ångströms). Leave the group empty, or set the threshold to 0, to disable this check.
rmsd_group = "" #@param {type: "string"}
rmsd_early_stop_threshold_A = 12.0 #@param {type: "number"}
# default: 12.0


#
# Google Drive
#

if not os.path.isdir("/content/drive/MyDrive") and project_folder.startswith("{GoogleDrive}"):
  from google.colab import drive
  drive.mount("/content/drive")


#
# Validate the input values
#


def google_drive_format(folder):
  if "{GoogleDrive}" in folder:
    if not folder.startswith("{GoogleDrive}"):
      raise ValueError(f"Error: {{GoogleDrive}} is a path prefix, but appears later: {folder}")
    if not os.path.isdir("/content/drive/MyDrive"):
      from google.colab import drive
      drive.mount("/content/drive")
  return folder.format(GoogleDrive="/content/drive/MyDrive")
  #             ^^^ raises KeyError if any {...} placeholder is present except {GoogleDrive}


# project_folder

project_folder = os.path.abspath(google_drive_format(project_folder.strip()))


# simulation_duration_ns

if simulation_duration_ns <= 0:
  raise RuntimeError(f"Error: simulation duration must be more than 0 ns, but got: {simulation_duration_ns} ns")

simulation_duration_ps = 1000 * simulation_duration_ns


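# rmsd_group

# Treat a whitespace-only entry as "no group", which disables early stopping
rmsd_group = rmsd_group.strip()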

# rmsd_early_stop_threshold

if rmsd_early_stop_threshold_A is None:
  rmsd_early_stop_threshold_A = 0.0

rmsd_early_stop_threshold_nm = rmsd_early_stop_threshold_A / 10.


# output_prefix

if not output_prefix:
  raise RuntimeError("Error: an output prefix must be provided")

# fullmatch also rejects a trailing newline, which "$" with re.match would accept
if not re.fullmatch(r"[0-9a-zA-Z]+", output_prefix):
  raise RuntimeError(f"Error: only letters and digits are allowed in the output prefix, but got: {output_prefix}")

if output_prefix in ("pre", "lig"):
  raise RuntimeError(f"Error: reserved output prefix: {output_prefix}")


#
# Make sure that the notebook is in the start folder
#

if "START" not in os.environ or not os.environ["START"]:
  %env START={os.getcwd()}
else:
  %cd {os.environ["START"]}


#
# Use a clean scratch directory for the rest of the run
#

try:
  shutil.rmtree("scratch")
except FileNotFoundError:
  pass
os.makedirs("scratch")
%cd "scratch"

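The path handling and unit conversions in the cell above can be tried in isolation. This is a minimal sketch under stated assumptions: `expand_drive_prefix` is a hypothetical stand-in for `google_drive_format` that skips the Drive mounting step, and the numbers are the form defaults.

```python
def expand_drive_prefix(folder, mount="/content/drive/MyDrive"):
  """Expand a leading {GoogleDrive} placeholder into the mount point."""
  if "{GoogleDrive}" in folder and not folder.startswith("{GoogleDrive}"):
    raise ValueError(f"{{GoogleDrive}} is a path prefix, but appears later: {folder}")
  # str.format raises KeyError for any other named {...} placeholder
  return folder.format(GoogleDrive=mount)


print(expand_drive_prefix("{GoogleDrive}/GROMACS/demo"))
# prints: /content/drive/MyDrive/GROMACS/demo

# Unit conversions applied to the form values
simulation_duration_ps = 1000 * 10   # 10 ns -> 10000 ps
rmsd_threshold_nm = 12.0 / 10.0      # 12 angstroms -> 1.2 nm
print(simulation_duration_ps, rmsd_threshold_nm)
```

A path without any placeholder passes through unchanged, so local folders work as well as Drive folders.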
#### Input

In [None]:
%%bash -s "$project_folder" "$output_prefix"
project_folder="$1"
output_prefix="$2"
#@markdown Extract the system from the project folder.

if [[ ! -z "$(ls -A)" ]]; then
  exit 0  # already extracted
fi

if [[ ! -d "${project_folder}" ]]; then
  echo "Error: folder not found: ${project_folder}" >&2
  exit 1
fi

pushd "${project_folder}"

cp "grompp.mdp"  "conf.gro"  "restraint.gro"  "index.ndx"  "topol.top"  "${START}/scratch/" 2> /dev/null

top_level_dir="$(realpath .)"
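function get_includes_recursively {
  # Print every file reachable through #include lines, nested includes first,
  # as paths relative to ${top_level_dir}
  local f="$1"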
  sed -E "/^#include/!d; s/^#include +['\"]//; s/['\"]$//" "${f}" | while read -r g; do
    d="$(dirname "${g}")"
    b="$(basename "${g}")"
    pushd "${d}" > /dev/null
    get_includes_recursively "${b}"
    echo "$(realpath --relative-to="${top_level_dir}" "${b}")"
    popd > /dev/null
  done
}
get_includes_recursively "topol.top" | while read -r f; do
  cp --parents "${f}" "${START}/scratch/"
done

cp "${output_prefix}".*  "#${output_prefix}.log".*"#"  "${START}/scratch/" 2> /dev/null

popd

if [[ ! -s "grompp.mdp" || ! -s "conf.gro" || ! -s "topol.top" ]]; then
  echo "Error: essential files not found: grompp.mdp, conf.gro, topol.top" >&2
  exit 1
fi

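The recursive `#include` lookup in the cell above may be easier to follow in Python. This is a sketch of the same idea rather than a replacement for the bash function: `collect_includes` is a hypothetical name, each include is resolved relative to the file that names it, and there is no protection against circular includes or missing files.

```python
import os
import re
import tempfile

INCLUDE_RE = re.compile(r'^#include\s+["\'](.+?)["\']\s*$')


def collect_includes(path, top_dir=None):
  """List files pulled in by #include lines, nested includes first.

  Paths are returned relative to the folder of the top-level file.
  """
  path = os.path.abspath(path)
  if top_dir is None:
    top_dir = os.path.dirname(path)
  found = []
  with open(path) as handle:
    for line in handle:
      match = INCLUDE_RE.match(line)
      if not match:
        continue
      # Resolve the include relative to the including file's folder
      target = os.path.normpath(os.path.join(os.path.dirname(path), match.group(1)))
      found.extend(collect_includes(target, top_dir))
      found.append(os.path.relpath(target, top_dir))
  return found


# Usage on a small made-up topology tree
with tempfile.TemporaryDirectory() as top:
  os.mkdir(os.path.join(top, "toppar"))
  with open(os.path.join(top, "topol.top"), "w") as handle:
    handle.write('#include "toppar/forcefield.itp"\n[ system ]\n')
  with open(os.path.join(top, "toppar", "forcefield.itp"), "w") as handle:
    handle.write('#include "ions.itp"\n')
  with open(os.path.join(top, "toppar", "ions.itp"), "w") as handle:
    handle.write("; no further includes\n")
  print(collect_includes(os.path.join(top, "topol.top")))
```

Returning relative paths is what lets the copy step recreate the same folder layout inside the scratch directory.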
#### Installation

In [None]:
#@markdown In the following cells, applications are preferentially restored from a **persistent cache** in your Google Drive. If not found there, they are downloaded from the internet and compiled and/or installed instead.
#@markdown
#@markdown This cell sets up the cache folder.

if os.path.isdir("/content/drive/MyDrive"):
  storage = "/content/drive/MyDrive/gromacs-on-colab"
  os.makedirs(storage, exist_ok=True)
  %env STORAGE={storage}
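else:
  # Deliberately not a folder, so that the cache is skipped. The comment sits on
  # its own line because %env would otherwise store it as part of the value.
  %env STORAGE=/dev/null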

In [None]:
%%bash
#@markdown **GROMACS** is downloaded and compiled from source code. (This takes a while.)

if [[ -d "/usr/local/gromacs" ]]; then
  exit 0  # already installed
fi

gromacs_vers="2023" #@param {type: "string"}
cache_gromacs="${STORAGE}/gromacs-${gromacs_vers}.tar.gz"

if [[ -s "${cache_gromacs}" ]]; then
  tar -xzf "${cache_gromacs}" -C "/usr/local"
else
  wget -q "ftp://ftp.gromacs.org/gromacs/gromacs-${gromacs_vers}.tar.gz"
  if [[ ! -s "gromacs-${gromacs_vers}.tar.gz" ]]; then
    echo "Error: could not download: gromacs-${gromacs_vers}.tar.gz" >&2
    exit 1
  fi
  tar -xzf "gromacs-${gromacs_vers}.tar.gz"
  rm "gromacs-${gromacs_vers}.tar.gz"

  cd "gromacs-${gromacs_vers}"
  mkdir "build"
  cd "build"
  cmake .. -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=CUDA
  make -j $(nproc)
  make install # -> /usr/local/gromacs

  if [[ -d "$(dirname "${cache_gromacs}")" ]]; then
    tar -czf "my_gromacs.tar.gz" -C "/usr/local" "gromacs"
    mv "my_gromacs.tar.gz" "${cache_gromacs}"
  fi
fi

#### Simulation

In [None]:
%%writefile "run.bash"
project_folder="$1"
simulation_duration_ps=$2
rmsd_group="$3"
rmsd_early_stop_threshold_nm=$4
prefix="$5"
#@markdown Create a script to run the production simulation.

source "/usr/local/gromacs/bin/GMXRC.bash"
export GMX_MAXCONSTRWARN=-1

{
  echo "***"
  echo "${project_folder}"
  date "+%F %T"
  echo "---"
  echo "$(nproc) cores, $(free -m | awk 'NR == 2 { print $2 }') MiB"
  nvidia_smi="$(nvidia-smi --query-gpu="name,memory.total" --format="csv,noheader")"
  if (( $? == 0 )); then
    echo "$nvidia_smi"
  fi
  echo "***"
  echo ""
} | tee -a "${prefix}.summary"

# Get the runtime of each individual run
sim_dt=$(awk '$1 == "dt" { print $3 }' "grompp.mdp")
sim_nsteps=$(awk '$1 == "nsteps" { print $3 }' "grompp.mdp")
block_ps=$(perl -e "printf(\"%.0f\n\", ${sim_dt} * ${sim_nsteps})")

# Is this the first ever run?
if [[ -s "${prefix}.xtc" || -s "${prefix}.trr" ]]; then
  skip_because_continuation=true
else

  skip_because_continuation=false
  
  # Construct the `gmx grompp ...` command
  cmd=( gmx grompp -f "grompp.mdp" -o "${prefix}.tpr" -c "conf.gro" -p "topol.top" -maxwarn 999 )
  if [[ -s "restraint.gro" ]]; then
    cmd+=( -r "restraint.gro" )
  fi
  if [[ -s "index.ndx" ]]; then
    cmd+=( -n "index.ndx" )
  fi

  # Run `gmx grompp ...`
  "${cmd[@]}"


  #
  # Run a block of the simulation with `gmx mdrun ...`
  #

  {
    echo "Block: 0 ps to ${block_ps} ps / ${simulation_duration_ps} ps"
    date "+%F %T"
    echo ""
  } | tee -a "${prefix}.summary"
  
  gmx mdrun -v -stepout 1000 -deffnm "${prefix}"

fi

if [[ -s "${prefix}.xtc" ]]; then
  xtcext="xtc"
elif [[ -s "${prefix}.trr" ]]; then
  xtcext="trr"
else
  echo "Error: no trajectory found: ${prefix}.xtc or ${prefix}.trr" >&2
  exit 1
fi

while true; do

  if $skip_because_continuation; then
    skip_because_continuation=false
  else

    # Save useful performance information to the summary file
    {
      grep "^Performance: " -B3 -A1 "${prefix}.log"
      echo ""
    } | tee -a "${prefix}.summary"

    # Back up any files in $project_folder which might be overwritten in the following upload
    pushd "${project_folder}"
    for f in "${prefix}".*; do
      mv "${f}" "_${f}_" 2> /dev/null
    done
    popd

    # Upload the outputs from the previous block
    cp "${prefix}".*  "#${prefix}.log".*"#"  "${project_folder}/"

  fi

  # Get the current total time of the simulation
  read -r step_label frames frame_dt < <(gmx check -f "${prefix}.${xtcext}" 2>&1 | grep "^Step ")
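  # Use perl here, as for block_ps, because the frame spacing may be fractional
  # and bash arithmetic only handles integers
  t=$(perl -e "printf(\"%.0f\n\", ${frame_dt} * (${frames} - 1))")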

  # Has the simulation reached the target time? If so, quit
  if (( $t >= $simulation_duration_ps )); then
    {
      echo "Detected: current simulation time ${t} ps reached target time ${simulation_duration_ps} ps"
      echo ""
    } | tee -a "${prefix}.summary"
    break
  fi

  # Get RMSD of specified group
  if [[ ! -z "${rmsd_group}" && ! -z "${rmsd_early_stop_threshold_nm}" ]] && perl -e "${rmsd_early_stop_threshold_nm} > 0.001 ? exit 0 : exit 1"; then
    gmx rms -s "${prefix}.tpr" -f "${prefix}.${xtcext}" -b $(($t - $block_ps)) < <(echo "C-alpha"; echo "${rmsd_group}")
    rmsd_alarm=$(sed '/^[#@&]/d; /^ *$/d' "rmsd.xvg" | awk "\$2 > ${rmsd_early_stop_threshold_nm}" | wc -l)

    # Does this exceed the threshold RMSD? If so, quit
    if (( $rmsd_alarm > 0 )); then
      {
        echo "Detected: RMSD of ${rmsd_group} exceeded threshold ${rmsd_early_stop_threshold_nm} nm"
        echo ""
      } | tee -a "${prefix}.summary"
      break
    fi
  fi


  #
  # Run another block of the simulation
  #

  {
    echo "Block: ${t} ps to $((${t} + ${block_ps})) ps / ${simulation_duration_ps} ps"
    date "+%F %T"
    echo ""
  } | tee -a "${prefix}.summary"
  
  gmx convert-tpr -s "${prefix}.tpr" -extend $block_ps -o "tprout.tpr"
  mv "tprout.tpr" "${prefix}.tpr"
  
  gmx mdrun -cpi "${prefix}.cpt" -v -stepout 1000 -deffnm "${prefix}"

done

{
  echo "***"
  echo "End"
  date "+%F %T"
  echo "***"
  echo ""
} | tee -a "${prefix}.summary"

gmx trjconv -f "${prefix}.${xtcext}" -s "${prefix}.tpr" -pbc whole -o "whole.xtc" <<< 0

gmx trjconv -f "whole.xtc" -s "${prefix}.tpr" -fit progressive -o "${prefix}_reference.xtc" < <(echo "C-alpha"; echo "0")

cp "${prefix}_reference.xtc" "${project_folder}/"

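The early-stopping test in `run.bash` filters the `rmsd.xvg` file written by `gmx rms` and counts the frames above the threshold. The sketch below reproduces that filter in Python under stated assumptions: `frames_over_threshold` is a hypothetical name, the second column is taken to be the RMSD in nm, and the sample values are made up for illustration.

```python
def frames_over_threshold(xvg_text, threshold_nm):
  """Count data rows whose second column exceeds the threshold."""
  count = 0
  for line in xvg_text.splitlines():
    stripped = line.strip()
    # Skip blank lines and xvg header or separator lines (#, @, &)
    if not stripped or stripped[0] in "#@&":
      continue
    fields = stripped.split()
    if float(fields[1]) > threshold_nm:
      count += 1
  return count


# Illustrative values only, not real simulation output
sample = """# illustrative header
@    title "RMSD"
   0.0    0.0000
 500.0    0.8500
1000.0    1.2500
"""

print(frames_over_threshold(sample, 12.0 / 10.0))
# prints: 1
```

Any count above zero corresponds to the script's `rmsd_alarm > 0` condition, which ends the loop before the next block starts.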
In [None]:
#@markdown Execute the simulation script.
#@markdown 
#@markdown Run a loop of blocks until the **production simulation** is complete. Each block covers `dt` × `nsteps` picoseconds as set in `grompp.mdp` (typically 1 ns), and each loop iteration uploads its partial output to the project folder.
!bash "run.bash" "$project_folder" "$simulation_duration_ps" "$rmsd_group" "$rmsd_early_stop_threshold_nm" "$output_prefix"
!sleep 10

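The block schedule that the loop follows can be worked out ahead of time. This is a rough sketch with hypothetical helper names: `block_length_ps` mirrors how `run.bash` reads `dt` and `nsteps` from `key = value` lines, and `plan_blocks` lists the blocks that would run if early stopping never triggers.

```python
def block_length_ps(mdp_text):
  """Return dt * nsteps in ps, read from whitespace-separated `key = value` lines."""
  values = {}
  for line in mdp_text.splitlines():
    fields = line.split()
    # Same layout the script expects: key, "=", value as separate fields
    if len(fields) >= 3 and fields[0] in ("dt", "nsteps"):
      values[fields[0]] = float(fields[2])
  return round(values["dt"] * values["nsteps"])


def plan_blocks(current_ps, block_ps, target_ps):
  """List (start, end) times of the blocks needed to reach target_ps."""
  if block_ps <= 0:
    raise ValueError("block length must be positive")
  blocks = []
  t = current_ps
  while t < target_ps:
    blocks.append((t, t + block_ps))
    t += block_ps
  return blocks


block_ps = block_length_ps("dt = 0.002\nnsteps = 500000\n")
print(block_ps)
# prints: 1000
print(len(plan_blocks(0, block_ps, 10000)))
# prints: 10
```

Because whole blocks are always run, a continuation that starts between block boundaries overshoots the target slightly; for example `plan_blocks(2500, 1000, 4000)` ends at 4500 ps.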
In [None]:
#@markdown Finally, disconnect the runtime. (This option is ignored if the project folder is not in your Google Drive.)
disconnect = True #@param {type: "boolean"}

if disconnect and project_folder.startswith("/content/drive/MyDrive/"):
  from google.colab import drive, runtime
  drive.flush_and_unmount()
  runtime.unassign()