
CELPP on TSCC Cluster

Chris Churas edited this page Oct 23, 2018 · 23 revisions

This page describes the steps needed to run CELPP on the Triton Shared Computing Cluster (TSCC).

Step 1 Creation of Singularity D3R Image

The easiest way to get CELPP running on TSCC cluster is to create a Singularity image. Instructions for this can be found here

Once the above is done, upload the image file to the TSCC cluster and put it in the $HOME/bin directory. Then run the following (assuming the image file is named d3r.img):

chmod a+x $HOME/bin/d3r.img

Step 2 Create a celpp directory on Oasis filesystem

Log into TSCC and create directories on TSCC Oasis filesystem by running these commands:

mkdir -p $HOME/bin
mkdir -p /oasis/tscc/scratch/$USER/celpp/data
mkdir -p /oasis/tscc/scratch/$USER/celpp/pdb
mkdir -p /oasis/tscc/scratch/$USER/celpp/archive
mkdir -p /oasis/tscc/scratch/$USER/celpp/joblogs

These directories will store the copy of the pdb as well as the outputs of the runs of CELPP.
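Before moving on, it can help to confirm that the directories exist and are writable. The following is a minimal sketch; `check_celpp_dirs` is a hypothetical helper (not part of CELPP) that takes the parent directory under which data, pdb, and archive were created.

```shell
#!/bin/bash
# Sketch: report which expected CELPP directories are missing or not writable.
# check_celpp_dirs is a hypothetical helper name, not part of CELPP.
check_celpp_dirs() {
  local base="$1" missing=0 sub
  for sub in data pdb archive; do
    if [ ! -d "$base/$sub" ] || [ ! -w "$base/$sub" ]; then
      echo "missing or not writable: $base/$sub"
      missing=1
    fi
  done
  # Exit status 0 means every directory is present and writable
  return "$missing"
}
```

Call it with the parent directory used in the mkdir commands above; a non-zero exit status means at least one directory still needs to be created.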

Step 3 Create pdb download script

The following script downloads the pdb using rsync and tars up the data into a single file for each week since the oasis filesystem has problems with lots of little files.

Write the following code to a file named $HOME/bin/pdb-data-updater.sh on the TSCC cluster, replacing each placeholder in <> (for example <PUT YOUR EMAIL ADDRESS HERE>) with a valid value:

#!/bin/bash
CONTACTS="<PUT YOUR EMAIL ADDRESS HERE>"

if [ $# -ne 2 ] ; then
  echo "$0 <tmpdir> <base download directory>"
  exit 1
fi

base_dir=$1
pdb_dir_name="pdb.`date +%s`"
pdb_dir_name_tar="${pdb_dir_name}.tar"
pdb_dir="$base_dir/${pdb_dir_name}"

dest_dir="$2/pdb"

#
# TODO: If $dest_dir/latest_pdb exists and points to a
#       pdb entry, need to copy that tar file to
#       $base_dir, uncompress it and move its contents
#       into $pdb_dir folder so that the rsync will
#       run faster
#

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/structures/divided/pdb/ $pdb_dir
if [ $? != 0 ]; then
        echo -e "RSYNC failed rsync.rcsb.org::ftp_data/structures/divided/pdb/\n\nSincerely,\n$0"
        exit 1
fi

echo "Tarring $pdb_dir"
cd $base_dir
tar -c $pdb_dir_name > $pdb_dir_name_tar

if [ $? != 0 ] ; then
   echo "Error running tar -c $pdb_dir_name > $pdb_dir_name_tar"
   exit 2
fi

echo "Copying $pdb_dir_name_tar to $dest_dir"
cp $pdb_dir_name_tar $dest_dir/.

if [ $? != 0 ] ; then
  echo "Error running cp $pdb_dir_name_tar $dest_dir/."
  exit 3
fi


echo "Updating latest_pdb symbolic link"
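# Remove the previous link only if one exists, then check that rm succeeded
if [ -L "$dest_dir/latest_pdb" ] || [ -e "$dest_dir/latest_pdb" ] ; then
  rm "$dest_dir/latest_pdb"
  if [ $? != 0 ] ; then
    echo "Error running rm $dest_dir/latest_pdb"
    exit 4
  fi
fi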

ln -s $dest_dir/$pdb_dir_name_tar $dest_dir/latest_pdb

if [ $? != 0 ] ; then
  echo "Error running ln -s $dest_dir/$pdb_dir_name_tar $dest_dir/latest_pdb"
  exit 5
fi

exit 0

Make the above script executable by running:

chmod a+x $HOME/bin/pdb-data-updater.sh
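The script above removes latest_pdb and then recreates it, so a job that opens the link between those two steps would not find it. One possible alternative, shown here only as a sketch, is to create the new link under a temporary name and rename it over the old one. `update_latest_link` is a hypothetical helper, and `mv -T` assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch: replace a symlink with a single rename so it is never absent.
# update_latest_link is a hypothetical helper; mv -T assumes GNU coreutils.
update_latest_link() {
  local target="$1" link="$2"
  local tmp_link="${link}.tmp.$$"
  # Create the new link under a temporary name first
  ln -s "$target" "$tmp_link" || return 1
  # Rename it over the old link; -T treats the destination as a plain file
  mv -T "$tmp_link" "$link" || { rm -f "$tmp_link"; return 1; }
}
```

In the download script this would stand in for the rm and ln -s pair, for example `update_latest_link "$dest_dir/$pdb_dir_name_tar" "$dest_dir/latest_pdb"`.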

Step 4 Create pdb job script to run on TSCC

Write the following to $HOME/bin/pdb-data-updater.qsub, replacing the placeholders in <> (such as <USER> and <ACCOUNT>) with valid values:

#!/bin/bash

#PBS -q condo
#PBS -N pdbdownload
#PBS -l nodes=1:ppn=1
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -o /oasis/tscc/scratch/<USER>/celpp/joblogs/pdbdownload.$PBS_JOBID.out
#PBS -w /oasis/tscc/scratch/<USER>/celpp
#PBS -V
#PBS -M <UCSD EMAIL ADDRESS>
#PBS -m abe
#PBS -A <ACCOUNT>

/usr/bin/time -v $HOME/bin/pdb-data-updater.sh /state/partition1/$USER/$PBS_JOBID /oasis/tscc/scratch/$USER/celpp

Make the above script executable by running:

chmod a+x $HOME/bin/pdb-data-updater.qsub

Step 5 Add pdb job script to cron

This will run the pdb download at 8pm every Tuesday. Add the following to cron on the TSCC login node. It's easiest to keep all the cron entries on the same login node:

0 20 * * 2 /opt/torque/bin/qsub /home/$USER/bin/pdb-data-updater.qsub
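The five leading fields of the entry are minute, hour, day of month, month, and day of week, so `0 20 * * 2` means 20:00 on Tuesdays. If you prefer to add entries without opening an editor, the sketch below appends a line to crontab text only when it is not already present. `add_cron_line` is a hypothetical helper that works purely on text from standard input.

```shell
#!/bin/bash
# Sketch: append an entry to crontab text unless an identical line exists.
# add_cron_line is a hypothetical helper; it reads the current crontab on
# stdin and writes the updated crontab to stdout.
add_cron_line() {
  local entry="$1" current
  current="$(cat)"
  if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
    printf '%s\n' "$current"          # already present, leave unchanged
  elif [ -z "$current" ]; then
    printf '%s\n' "$entry"            # empty crontab, entry becomes the only line
  else
    printf '%s\n%s\n' "$current" "$entry"
  fi
}

# Possible usage on the login node (commented out so nothing is installed here):
# crontab -l 2>/dev/null \
#   | add_cron_line '0 20 * * 2 /opt/torque/bin/qsub /home/$USER/bin/pdb-data-updater.qsub' \
#   | crontab -
```

Running the pipeline twice leaves a single copy of the entry, which makes it safe to repeat when setting up a new login node.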

Step 6 Set up OpenEye license file

For OpenEye, the license file should be placed in the $HOME directory and be readable only by the user.

Name this file oe_license.txt and run the following command to restrict visibility:

chmod go-rwx $HOME/oe_license.txt

Also set the environment variable OE_LICENSE to $HOME/oe_license.txt. This can be done automatically by adding this to $HOME/.bash_profile file:

export OE_LICENSE=$HOME/oe_license.txt

Step 7 Create CELPP configuration files

Create the following configuration files in $HOME directory:

box.config -- Click here for what to put in this file

rest.config -- Click here for what to put in this file

smtp.config -- Click here for what to put in this file

These files contain passwords, so be sure to restrict their visibility with the following command:

chmod go-rwx $HOME/box.config $HOME/rest.config $HOME/smtp.config
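To confirm that the chmod took effect, you can check that group and other have no permission bits set. This is a small sketch; `is_private` is a hypothetical helper and `stat -c` assumes GNU coreutils, as found on typical Linux clusters.

```shell
#!/bin/bash
# Sketch: succeed only when group and other have no permissions on a file.
# is_private is a hypothetical helper; stat -c assumes GNU coreutils.
is_private() {
  local mode
  # %a prints the permission bits in octal, for example 600 or 644
  mode="$(stat -c %a "$1")" || return 2
  # Mask with octal 077 to isolate the group and other bits
  (( (8#$mode & 8#077) == 0 ))
}

# Possible usage (commented out because the files may not exist yet):
# for f in "$HOME/box.config" "$HOME/rest.config" "$HOME/smtp.config"; do
#   is_private "$f" || echo "WARNING: $f is readable by others"
# done
```

The same check applies to the license file from Step 6.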

Step 8 Create Challenge generation job script

Add the following to $HOME/bin/genchallenge.qsub file replacing any <> text with valid values:

#!/bin/bash

#PBS -q home
#PBS -N genchallenge
#PBS -l nodes=1:ppn=3
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -o /oasis/tscc/scratch/<USER>/celpp/joblogs/genchall.$PBS_JOBID.out
#PBS -w /oasis/tscc/scratch/<USER>/celpp/testy
#PBS -V
#PBS -M <UCSD EMAIL ADDRESS>
#PBS -m abe
#PBS -A <ACCOUNT>

module load python/1
module load singularity

export SCHRODINGER=/opt/schrodinger
export SCHROD_LICENSE_FILE=<ADD SCHRODINGER LICENSE SERVER>

echo "Current time is `date` ... createchallenge job"

echo "Copying pdb from /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb to $TMPDIR"
cd $TMPDIR

pdb_dir=""
for Y in `seq 1 10` ; do
  /usr/bin/time -p tar -xf /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb 
  ecode=$?
  if [ $ecode == 0 ] ; then
    pdb_dir=`find $TMPDIR -maxdepth 1 -name "pdb*" -type d`
    break
  fi
  echo "`date` : Untar of pdb failed. Sleeping 60 seconds and trying again"
  sleep 60
done

if [ "$pdb_dir" == "" ] ; then
  echo "Error unable to uncompress /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb to $TMPDIR"
  exit 1
fi
echo "PDB dir: $pdb_dir"
cd $pdb_dir
if [ $? != 0 ] ; then
  echo "Error unable to cd to $pdb_dir"
  exit 2
fi
echo "Uncompressing pdb files"

/usr/bin/time -p find . -name "*.gz" -exec gunzip {} \;

export MGL_ROOT=/usr/local/mgltools/
export PATH=$PATH:/opt/UCSF/Chimera64-1.10.2/bin

singularity run --bind /oasis --bind /proc --bind /state $HOME/d3r.img --importsleep 1200 --importretry 144 --blastnfiltertimeout 72000 --stage createchallenge --pdbdb $pdb_dir --compinchi http://ligand-expo.rcsb.org/dictionaries --ftpconfig $HOME/box.config --rdkitpython /opt/miniconda2 --log DEBUG --createweekdir --email <UCSD EMAIL ADDRESS> --smtpconfig $HOME/smtp.config /oasis/tscc/scratch/$USER/celpp/data

ecode=$?

echo "Done time is `date` and exit code is: $ecode"

exit $ecode

Make $HOME/bin/genchallenge.qsub executable by running:

chmod a+x $HOME/bin/genchallenge.qsub
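The job script retries the untar up to 10 times with a 60 second pause, presumably to ride out transient errors when reading from the Oasis filesystem. The same pattern can be written as a reusable function. The sketch below is illustrative only; `retry` is a hypothetical helper and is not part of CELPP or the script above.

```shell
#!/bin/bash
# Sketch: run a command up to N times, sleeping between failed attempts.
# retry is a hypothetical helper: retry <attempts> <delay seconds> <command...>
retry() {
  local attempts="$1" delay="$2" n
  shift 2
  for (( n = 1; n <= attempts; n++ )); do
    "$@" && return 0
    echo "$(date) : attempt $n of $attempts failed: $*" >&2
    # Only sleep when another attempt will follow
    if (( n < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible usage inside the job script (commented out here):
# retry 10 60 tar -xf /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb || exit 1
```

Compared with the inline loop, this version does not sleep after the final failed attempt.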

Step 9 Add Challenge generation job to cron

Add the following to cron on the TSCC login node to run the genchallenge.qsub script created in Step 8 at 9:01pm every Friday. It's easiest to keep all the cron entries on the same login node:

1 21 * * 5 . $HOME/.bash_profile;/opt/torque/bin/qsub $HOME/bin/genchallenge.qsub 

Step 10 Create Evaluation job script

Add the following to $HOME/bin/evaluation.qsub file replacing any <> text with valid values:

#!/bin/bash

#PBS -q home
#PBS -N evaluation
#PBS -l nodes=1:ppn=8
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -o /oasis/tscc/scratch/<USER>/celpp/joblogs/evaluation.$PBS_JOBID.out
#PBS -w /oasis/tscc/scratch/<USER>/celpp/testy
#PBS -V
#PBS -M <UCSD EMAIL ADDRESS>
#PBS -m abe
#PBS -A <ACCOUNT>

module load python/1
module load singularity

export SCHRODINGER=/opt/schrodinger
export SCHROD_LICENSE_FILE=<ADD SCHRODINGER LICENSE SERVER>

echo "Current time is `date` ... evaluation job"

c_epoch=`date +%s`
link_age=`stat -c %Z /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb`
age_of_symlink=`echo "$c_epoch - $link_age" | bc -l`

while [ $age_of_symlink -gt 259200 ] ; do
  echo "Sleeping 600 seconds"
  sleep 600

  c_epoch=`date +%s`
  link_age=`stat -c %Z /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb`
  age_of_symlink=`echo "$c_epoch - $link_age" | bc -l`
done


echo "Copying pdb from /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb to $TMPDIR"
cd $TMPDIR

pdb_dir=""
for Y in `seq 1 10` ; do
  /usr/bin/time -p tar -xf /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb 
  ecode=$?
  if [ $ecode == 0 ] ; then
    pdb_dir=`find $TMPDIR -maxdepth 1 -name "pdb*" -type d`
    break
  fi
  echo "`date` : Untar of pdb failed. Sleeping 60 seconds and trying again"
  sleep 60
done

if [ "$pdb_dir" == "" ] ; then
  echo "Error unable to uncompress /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb to $TMPDIR"
  exit 1
fi
echo "PDB dir: $pdb_dir"
cd $pdb_dir
if [ $? != 0 ] ; then
  echo "Error unable to cd to $pdb_dir"
  exit 2
fi
echo "Uncompressing pdb files"

/usr/bin/time -p find . -name "*.gz" -exec gunzip {} \;

export MGL_ROOT=/usr/local/mgltools/
export PATH=$PATH:/opt/UCSF/Chimera64-1.10.2/bin

singularity run --bind /oasis --bind /proc --bind /state $HOME/d3r.img --stage evaluation,postevaluation --pdbdb $pdb_dir --evaluation evaluate.py --ftpconfig $HOME/box.config --rdkitpython /opt/miniconda2 --log DEBUG --createweekdir --email <UCSD EMAIL ADDRESS> --smtpconfig $HOME/smtp.config --websiteserviceconfig $HOME/rest.config /oasis/tscc/scratch/$USER/celpp/data
ecode=$?

echo "Done time is `date` and exit code is: $ecode"

exit $ecode

Make $HOME/bin/evaluation.qsub executable by running:

chmod a+x $HOME/bin/evaluation.qsub
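The wait loop at the top of the evaluation script compares the change time of the latest_pdb link with the current time and sleeps while the link is more than 259200 seconds (3 days) old, which suggests the weekly pdb download has not yet replaced it. The same check can be done with shell arithmetic instead of bc. This is a sketch only; `link_age_seconds` and `is_stale` are hypothetical helpers, and `stat -c %Z` assumes GNU coreutils.

```shell
#!/bin/bash
# Sketch: compute how long ago a symlink was last changed.
# link_age_seconds and is_stale are hypothetical helpers.
link_age_seconds() {
  local now changed
  now="$(date +%s)"
  # Without -L, GNU stat reports on the link itself, not its target
  changed="$(stat -c %Z "$1")" || return 1
  echo $(( now - changed ))
}

# is_stale <link> <max age in seconds>: succeeds when the link is older
is_stale() {
  local age
  age="$(link_age_seconds "$1")" || return 2
  (( age > $2 ))
}

# Possible usage in the job script (commented out here):
# while is_stale /oasis/tscc/scratch/$USER/celpp/pdb/latest_pdb 259200; do
#   echo "Sleeping 600 seconds"
#   sleep 600
# done
```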

Step 11 Add Evaluation job to cron

Add the following to cron on the TSCC login node to run the evaluation.qsub script created in Step 10 at 1am every Wednesday. It's easiest to keep all the cron entries on the same login node:

0 1 * * 3 /opt/torque/bin/qsub $HOME/bin/evaluation.qsub

Step 12 Create external submission download script

Write the following to $HOME/bin/extsubdownload_datamover.sh:

#!/bin/bash -l

module load python/1
module load singularity

export SCHRODINGER=/opt/schrodinger
export SCHROD_LICENSE_FILE=<ADD SCHRODINGER LICENSE SERVER>


celppdir="/oasis/tscc/scratch/$USER/celpp/data"
cd $celppdir


singularity run --bind /oasis --bind /proc --bind /state $HOME/d3r.img --stage extsubmission  --log DEBUG --email <UCSD EMAIL ADDRESS> --ftpconfig $HOME/box.config --smtpconfig $HOME/smtp.config $celppdir
ecode=$?

echo "Done time is `date` and exit code is: $ecode"

exit $ecode

Make the above script executable by running the following command:

chmod a+x $HOME/bin/extsubdownload_datamover.sh

Step 13 Add external submission download script to datamover cron

Log into the TSCC data mover server. This can be done by logging into a login node then running ssh tscc-dm1 from that machine. Add the following to cron which will run the external submission download at 3:10pm every Tuesday:

10 15 * * 2 $HOME/bin/extsubdownload_datamover.sh >> $HOME/bin/extsubdownload_datamover.sh.log 2>&1

NOTE: For a production setup, change the time to 3:00pm by changing the 10 above to 0.
