# Analysis of _Klebsiella pneumoniae_ isolates from Qatar

We will use `Bactopia` for most of the analysis in this pipeline.

Steps to prepare the tool:
1. Compute Canada (CC) does not allow Anaconda, to avoid conflicts between tools. Fortunately, we can run the tool from a container instead, and `Singularity` can be loaded directly as a module on CC.
2. Pull the Singularity container of Bactopia into our scratch directory. Use a compute node to build it.


In [1]:
module load singularity/3.8

cd /scratch/mdprieto/

singularity pull bactopia_2.1.1.sif https://depot.galaxyproject.org/singularity/bactopia%3A2.1.1--hdfd78af_0
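
Re-running the pull in the same folder normally stops with an error, because `singularity pull` does not overwrite an existing image unless forced. A small guard makes the step safe to repeat. This is only a sketch: `pull_if_missing` is a hypothetical helper name (not part of Bactopia or Singularity), and it just prints the command when `singularity` is not on the `PATH`.

```shell
#!/bin/bash
# Hypothetical helper: pull a Singularity image only if it is not already on disk.
pull_if_missing() {
    local image="$1" url="$2"
    if [ -s "$image" ]; then
        # a non-empty image file is already there, nothing to do
        echo "skip: $image already exists"
    elif command -v singularity >/dev/null 2>&1; then
        singularity pull "$image" "$url"
    else
        # singularity module not loaded: only show what would run
        echo "dry-run: singularity pull $image $url"
    fi
}

# Usage with the image name and URL from the cell above:
# cd /scratch/mdprieto/
# pull_if_missing bactopia_2.1.1.sif \
#     "https://depot.galaxyproject.org/singularity/bactopia%3A2.1.1--hdfd78af_0"
```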


### Clean target files

Bactopia requires a text file with metadata for all `fastq` input files (paths, sequencing type, and filenames).

A short script is included to produce the necessary format.

In [None]:
bactopia_folder="/scratch/mdprieto/bactopia/bin/bactopia/"
kleb_project="/home/mdprieto/git/klebsiella_Qatar_2022/"

# load dependencies needed to run bactopia
module load python/3.8.10 nextflow

In [None]:
# Location of raw data in CC:
#   ~/project_mdprieto/qatar_klebsiella/


cd /project/6056895/mdprieto/qatar_klebsiella/all_isolates
ls *fastq
gzip *fastq
date

# create file of filenames (FOFN) for input to bactopia, save it in project folder
$bactopia_folder/bactopia-prepare.py \
    --fastq_ext '_001.fastq.gz' \
    /project/6056895/mdprieto/qatar_klebsiella/all_isolates \
    > $kleb_project/input/kpneu_qatar_fofn.txt
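
Before launching a long run, it can help to confirm that every read file listed in the FOFN exists. The sketch below assumes a tab-delimited file with one header row and the read paths in columns 3 and 4 (`r1`, `r2`); that layout is an assumption and should be checked against the real `kpneu_qatar_fofn.txt`. `check_fofn` is a hypothetical helper name, not a Bactopia command.

```shell
#!/bin/bash
# Hypothetical helper: list read files named in a FOFN that are missing on disk.
# Assumed layout (verify against your file): tab-delimited, one header row,
# read paths in columns 3 and 4; column 4 may be empty for single-end samples.
check_fofn() {
    local fofn="$1" missing
    missing="$(awk -F'\t' 'NR > 1 { for (i = 3; i <= 4; i++) if ($i != "") print $i }' "$fofn" |
        while IFS= read -r path; do
            # print only the paths that do not exist
            [ -e "$path" ] || printf '%s\n' "$path"
        done)"
    if [ -n "$missing" ]; then
        echo "missing read files:"
        printf '%s\n' "$missing"
        return 1
    fi
    echo "all read files found"
}

# Usage with the file created above:
# check_fofn "$kleb_project/input/kpneu_qatar_fofn.txt"
```

The function returns a non-zero status when anything is missing, so it can gate a job script with `check_fofn file.txt || exit 1`.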

### Create virtual environment to run bactopia

Bactopia has complex requirements across its pipelines. Because Conda environments are not allowed on Compute Canada, it is better to create a Python virtual environment that satisfies all dependencies.

In [3]:
# create a virtual environment to work with bactopia
module load python/3.8.2
cd /home/mdprieto
virtualenv --no-download bactopia_miguel

# activate and set environment packages
source /home/mdprieto/bactopia_miguel/bin/activate
pip install --no-index --upgrade pip
pip3 install requests Bio executor


### Script to run bactopia datasets with nextflow

In [None]:
#!/bin/bash
#SBATCH --account=def-whsiao-ab
#SBATCH --mem-per-cpu=10G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --job-name="test_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/git_bactopia_test.out
#SBATCH --mail-user=mprietog@sfu.ca
#SBATCH --mail-type=END

################################## preparation #########################################

# load python and nextflow
module purge
module load python/3.8.2 nextflow/22.04.3

# PATH to bactopia
bactopia_app='/scratch/mdprieto/bactopia/bin/bactopia'

################################## BACTOPIA  #########################################

# start environment with dependencies
source /home/mdprieto/bactopia_miguel/bin/activate

# run bactopia 
nextflow run $bactopia_app/bactopia-datasets.py

### Script to run full bactopia with nextflow

In [None]:
#!/bin/bash
#SBATCH --account=def-whsiao-ab
#SBATCH --mem-per-cpu=10G #  GB of memory per cpu core
#SBATCH --time=00:40:00
#SBATCH --ntasks=6 # tasks in parallel
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --job-name="test_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/git_bactopia_test.out
#SBATCH --mail-user=mprietog@sfu.ca
#SBATCH --mail-type=END

################################## preparation #########################################

# load python and nextflow
module load python/3.8.2 nextflow/22.04.3

# file of filenames 
kleb_fofn="/home/mdprieto/git/klebsiella_Qatar_2022/input/kpneu_qatar_fofn.txt"

# PATH to bactopia
main_nf='/scratch/mdprieto/bactopia/main.nf'

# create variables and output dir
OUTPUT_DIR="/scratch/mdprieto/temp_results/"
INPUT_DIR="/project/6056895/mdprieto/hilliam_pseudomonas/bronchiectasis_reads"

################################## BACTOPIA  #########################################

# start environment with dependencies
source /home/mdprieto/bactopia_miguel/bin/activate

# run bactopia
nextflow run "$main_nf" datasets

## Trying to run Bactopia from singularity container

In [None]:
## trying to install latest singularity image into scratch directory
cd /home/mdprieto/scratch

# run inside a job
singularity pull oras://ghcr.io/bactopia/bactopia:2.1.1

In [None]:
#!/bin/bash
#SBATCH --account=def-whsiao-ab
#SBATCH --mem-per-cpu=40G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --job-name="test_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/bactopia_test.out
#SBATCH --mail-user=mprietog@sfu.ca
#SBATCH --mail-type=END

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8 python/3.10.2 nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

################################## BACTOPIA  #########################################

# start environment with dependencies
source /home/mdprieto/bactopia_miguel/bin/activate
module load python/3.10.2

# run bactopia container
cd /scratch/mdprieto/
singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets

## 20221102 This one seems to be working

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="test_bactopia_Nov2"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/Nov2_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Staphylococcus aureus" \
    --include_genus \
    --limit 10 \
    --cpu 4  \
    --verbose
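
The cache and temp redirection used in this job can be wrapped in a function so every script sets it up the same way. By default Singularity keeps its cache under `$HOME/.singularity`, which is why we point it at scratch. `setup_singularity_dirs` is a hypothetical helper name; the two environment variables are the ones already used above.

```shell
#!/bin/bash
# Hypothetical helper: point Singularity's cache and temp folders at a base directory.
setup_singularity_dirs() {
    local base="$1"
    # create both folders; stop if the base location is not writable
    mkdir -p "$base/cache" "$base/tmp" || return 1
    export SINGULARITY_CACHEDIR="$base/cache"
    export SINGULARITY_TMPDIR="$base/tmp"
}

# Example call, mirroring the paths used in the job script above:
# setup_singularity_dirs "/scratch/$USER/singularity"
```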

Additional test with Klebsiella and more genomes

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="test_kleb_bactopia_Nov2"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/Nov3_kleb_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

### Nov 4

- Added `-B /localscratch:/temp` to solve an issue while downloading amrfinder-db
    - Apparently worked perfectly
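
Since the bind list now changes between tests, it may be easier to build it from a list of folders than to edit the string by hand. The sketch below keeps only the folders that exist on the current node and accepts `src:dest` pairs such as `/localscratch:/temp`. `build_bind_flags` is a hypothetical helper, not a Singularity feature.

```shell
#!/bin/bash
# Hypothetical helper: build "-B" flags only for directories that exist on this node.
# An argument of the form src:dest binds src at a different path inside the container.
build_bind_flags() {
    local flags="" spec src
    for spec in "$@"; do
        src="${spec%%:*}"      # part before the first ":" (or the whole argument)
        if [ -d "$src" ]; then
            flags="$flags -B $spec"
        fi
    done
    # drop the leading space before printing
    printf '%s\n' "${flags# }"
}

# Example, mirroring the mounts used in this notebook:
# BIND_MOUNT="$(build_bind_flags /home /project /scratch /localscratch /localscratch:/temp)"
```

Skipping missing folders avoids a failed job on a node where, for example, `/localscratch` is not available.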

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="1103_kleb_amrfinder"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Test with S. aureus and 100 genomes
- Worked perfectly once again

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="Nov4_staph_test"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# define new temp folders for singularity

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Staphylococcus aureus" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Omit the `-B /localscratch:/temp` bind mount and see what happens

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="Nov4_kleb"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose
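
The test jobs above differ only in the species, the genome limit, and whether the `/temp` bind is present. One way to avoid copying the whole script for each test is to pass those values as arguments. The sketch below only prints the command it would run; `datasets_cmd` is a hypothetical name, and the flags are the ones already used in this notebook.

```shell
#!/bin/bash
# Hypothetical helper: print the bactopia datasets command for a species and limit.
# Flags are copied from the job scripts in this notebook; nothing is executed here.
datasets_cmd() {
    local species="$1" limit="${2:-100}" cpus="${3:-4}"
    local binds="${BIND_MOUNT:-}"
    # fall back to the basic mounts when BIND_MOUNT is not set
    [ -n "$binds" ] || binds="-B /home -B /project -B /scratch -B /localscratch"
    printf 'singularity exec -e %s bactopia_2.1.1.sif bactopia datasets --species "%s" --include_genus --limit %s --cpu %s --verbose\n' \
        "$binds" "$species" "$limit" "$cpus"
}

datasets_cmd "Klebsiella pneumoniae" 100
datasets_cmd "Staphylococcus aureus" 10
```

The printed line can be pasted into a job script once it looks right.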

### Recommendations to set up `Bactopia` on Compute Canada (Cedar)
__Contribution of Zohaib Anwar__

- Modified requirements for the scheduler
- Clone the modified git repo (dev branch) adapted to run on Compute Canada (CC)
- Create a directory to save Singularity images inside the bactopia directory created on CC

In [None]:
# clone bactopia git repo to scratch folder
cd /scratch/mdprieto
git clone https://github.com/anwarMZ/bactopia.git
cd bactopia/

# switch to the dev branch, where Zohaib adapted it to run on the Cedar cluster
git status
git branch -r
git checkout master
git checkout origin/dev
git log

# make directory for singularity storage
mkdir -p resources/sge_cache

# to run nextflow
module load nextflow
nextflow run main.nf
# modify params.config in bactopia so that the singularity cache points to your scratch:
#   /scratch/mdprieto/bactopia/resources/sge_cache

### Test for bactopia datasets

In [1]:
# load singularity
module load singularity/3.8
module list 

BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# run test
singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Staphylococcus aureus" \
    --include_genus \
    --limit 100 \
    --cpus 1


### Meeting with Zohaib - 20221024

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=12G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="test_bactopia_zohaib"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/sing_bactopia_zohaib.out
#SBATCH --mail-user=mprietog@sfu.ca
#SBATCH --mail-type=END

################################## preparation #########################################

# load singularity
module load singularity/3.8 nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

################################## BACTOPIA  #########################################

# start environment with dependencies
# source /home/mdprieto/bactopia_miguel/bin/activate

# run bactopia container
cd /scratch/mdprieto/
singularity exec $BIND_MOUNT bactopia_2.1.1.sif /bin/bash -c "bactopia datasets --species 'Vibrio parahaemolyticus' --include_genus --limit 10 --cpu 1  --verbose"