## Script to test analyses for the Qatar _Klebsiella pneumoniae_ project

## 20221110 - Determine why `bactopia datasets` works only sometimes

Do not specify a __clean environment__ for singularity
- Does not work, even if you specify `-B /localscratch:/temp` and the location of the singularity cache


In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=20G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=2 # CPU cores per task
#SBATCH --job-name="Nov4_kleb"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch:/temp"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose
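
The job above requests 2 cores (`--cpus-per-task=2`) but passes `--cpu 4` to bactopia. A minimal sketch of one way to keep the two in sync, assuming the job sets `--cpus-per-task`; `job_threads` is a hypothetical helper name, not part of bactopia or SLURM:

```shell
# Hypothetical helper: report how many threads the current job may use.
# SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
# fall back to 1 outside a job or when the option was omitted.
job_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "threads available: $(job_threads)"
```

The result could then replace the hard-coded value, for example `--cpu "$(job_threads)"`.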

Specify a clean environment with `exec -e`, but do not specify a location for the singularity cache
- __*No error*__

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --job-name="test_kleb_bactopia_nov10"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/nov10_kleb_bactopia_e_temp_nocache.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

#### Other options

Specify a clean environment with `exec -e` and the location of the singularity cache, but do not mount scratch as the temp directory
- No error

Run it with only a clean environment (`exec -e`), without specifying where to save the singularity cache or mounting localscratch as temp
- No error

In [None]:
singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

###########################################################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --cpu 4 \
    --verbose

#### Conclusion
The issue seems to be a variable from the host environment that was interfering with the singularity run.
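
To narrow down which variable is responsible, the host environment can be inspected before the run. A minimal sketch, assuming the culprit is one of the usual language or library path variables; the prefix list and the name `list_suspect_env` are guesses for illustration, not a diagnosis of this failure:

```shell
# Hypothetical helper: list host variables that a container started without -e
# would inherit. The prefixes are assumed common culprits only.
list_suspect_env() {
    env | awk -F= '/^(PYTHON|CONDA|PERL5LIB|R_LIBS|LD_LIBRARY_PATH|JAVA_|NXF_)/ {print $1}' | sort
}

list_suspect_env
```

Comparing the output of `env` run through `singularity exec` with and without `-e` would show the same difference from inside the container.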

## 20221117 to 20221120 - Running main bactopia pipeline

Running with `-profile singularity` does not work properly. For test runs, I will use only 4 CPUs and 30 min.
- Using `-profile slurm` does not work either
- It fails with or without a clean environment (`exec -e`)
- The last try corrected how nextflow is loaded; it also does not clean the directory

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=00:45:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity

Trying to run after independently downloading one of the bactopia tools' singularity containers


In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=16 # tasks in parallel / number of cores
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --cleanup_workdir \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity

In [None]:
Modify the number of tasks and cpus per task to use

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=4 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --cleanup_workdir \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity

In [None]:
- Not using -profile singularity, which may be just for running through conda
    - Does not work
- Using -profile singularity alone
    - Not working
- Using -profile singularity,slurm
    - Same error
- Without specifying a cache folder for singularity
    - Not working
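
The combinations above can be generated in one place so that each attempt differs only in the options under test. A sketch that only prints the commands and executes nothing; `print_variants` is a hypothetical helper, and whether each attempt also used `exec -e` is not recorded here, so it is left out:

```shell
# Hypothetical dry-run helper: print one bactopia command per option set listed above.
print_variants() {
    # Single quotes keep the variable names literal in the printed commands.
    base='singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia'
    cache='--singularity_cache $SINGULARITY_CACHEDIR'
    for opts in \
        "$cache" \
        "$cache -profile singularity" \
        "$cache -profile singularity,slurm" \
        "-profile singularity"
    do
        echo "$base $opts"
    done
}

print_variants
```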

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=4 # number of cores per task
#SBATCH --job-name="download_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_download_bactopia_singprof_envslurm.out

################################## preparation #########################################

# load singularity
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# define new temp folders for singularity cache in CIDGOH shared folder
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"

# singularity temp will be in my scratch folder
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH="/opt/software/bin/"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets/ \
    --outdir /scratch/mdprieto/temp_results/bactopia_output \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --max_cpus 2 \
    --verbose \
    -profile singularity

#### Bactopia download

Try `bactopia download` to set up all modules required for analysis

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=20G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=2 # number of cores per task
#SBATCH --job-name="download_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_download_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## download #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia download \
    --verbose \
    --use_defaults \
    --envtype "singularity" \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --singularity_pull_docker_container

#### Bactopia build

Try to run bactopia build to have an environment for analysis

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_build_bactopia_E.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia build --default

## 20221122 - Install singularity images on their own

In [None]:
#!/bin/bash
#SBATCH --account=def-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=03:00:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --job-name="assembly_qc_checkm"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=utility_singbuild.out

###################################     preparation ##############################
module load singularity

# define new temp folders for singularity cache in CIDGOH shared folder
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"

# singularity temp will be in my scratch folder
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

singularity build  /project/6007413/cidgoh_share/singularity_imgs/quay.io-biocontainers-multiqc-1.11--pyhdfd78af_0.img docker://quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0
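
The image file name in the build command follows a regular pattern relative to the `docker://` URI. A small sketch of that mapping, inferred only from the single example above; `image_name_for` is a hypothetical helper, and whether Nextflow applies exactly this rule in every version is an assumption:

```shell
# Hypothetical helper: derive the cache file name used above from a docker:// URI
# by dropping the scheme, replacing "/" and ":" with "-", and appending ".img".
image_name_for() {
    printf '%s.img\n' "$(printf '%s' "${1#docker://}" | sed 's|[/:]|-|g')"
}

image_name_for "docker://quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0"
# prints quay.io-biocontainers-multiqc-1.11--pyhdfd78af_0.img
```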

### 20221122 - Bactopia download troubleshooting

Modifications tried on the download step. `--envtype "singularity"` must be specified.

- Specify `-profile singularity` -> **error command not found**
- Singularity profile and not `exec -e` -> **error executing program**
- Using profile slurm -> **error**
- Using standard environment -> **error not enough permission to set conda environment**
- Append the PATH to singularity and mount location of command -> **error command not found**
- Append all CC PATH to singularity and mount /opt and /cvmfs -> **new error: error due to certificate in quay.io registry**
- All above and profile slurm -> **error once again**
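
Several of the errors above are "command not found", so it may help to confirm which commands are visible before launching. A minimal sketch using the shell built-in `command -v`; `require_cmds` is a hypothetical helper name:

```shell
# Hypothetical helper: report every command from the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_cmds singularity nextflow || echo "load the missing modules before submitting"
```

This checks the host only. Running an equivalent check through `singularity exec` would show whether the container sees the same commands.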

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=3G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel 
#SBATCH --cpus-per-task=4 # cores per task
#SBATCH --job-name="download_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_download_bactopia_bind_nov24.out

################################## preparation #########################################

# load singularity and nextflow
module load singularity
module load nextflow

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home,/project,/scratch,/localscratch,/localscratch:/temp,/opt,/cvmfs"

# define new temp folders for singularity cache in CIDGOH shared folder
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"

# singularity temp will be in my scratch folder
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

################################## download #########################################

# runs only with clean environment `-e` option

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia download \
    --verbose \
    --use_defaults \
    -profile slurm \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --envtype "singularity"

## 20221126 - Main bactopia pipeline troubleshoot

Baseline 

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_nov26.out

################################## preparation #########################################

# load singularity
module load singularity
module load nextflow

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home,/project,/scratch,/localscratch,/localscratch:/temp,/opt,/cvmfs"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 2 \
    --verbose \
    -profile slurm,singularity \
    -resume

### Petit suggestions

First try

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_petit_nov30.out

################################## preparation #########################################

# load singularity
module load singularity
module load nextflow

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

################################## nextflow run #########################################

nextflow run bactopia/bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 2 \
    --verbose \
    -profile slurm,singularity \
    -resume

Alternatives 
1. Specify `-with-singularity` and bactopia container --> error downloading updated singularity containers (for bactopia 2.2.0)

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_petit_nov30.out

################################## preparation #########################################

# load singularity
module load singularity
module load nextflow

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

################################## nextflow run #########################################

nextflow run bactopia/bactopia -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 2 \
    --verbose \
    -profile slurm,singularity \
    -resume


In [None]:
salloc --time=2:0:0 --cpus-per-task=8 --mem 16G --account=rrg-whsiao-ab

In [None]:
# env variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"
export SINGULARITYENV_APPEND_PATH=$PATH
BIND_MOUNT="-B /home,/project,/scratch,/localscratch,/localscratch:/temp,/opt,/cvmfs"

singularity shell $BIND_MOUNT bactopia_2.1.1.sif

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia download \
    --verbose \
    --use_defaults \
    -profile singularity \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --envtype "singularity"

nextflow run bactopia/bactopia -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --samples /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache /project/6007413/cidgoh_share/singularity_imgs \
    --max_cpus 4 \
    -profile slurm,singularity \
    --verbose

nextflow run bactopia/bactopia/download -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --verbose \
    --use_defaults \
    -profile singularity \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --envtype "singularity"

In [None]:
export NXF_SINGULARITY_LIBRARYDIR="/project/6007413/cidgoh_share/singularity_imgs"

nextflow run bactopia/bactopia -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --samples /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache /project/6007413/cidgoh_share/singularity_imgs \
    --max_cpus 4 \
    -profile singularity
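
Before relying on `NXF_SINGULARITY_LIBRARYDIR`, it may be worth confirming that the directory is readable and actually holds images. A sketch under the assumption that images are stored as `.img` or `.sif` files directly in that directory; `count_images` is a hypothetical helper:

```shell
# Hypothetical helper: count image files in a directory intended for NXF_SINGULARITY_LIBRARYDIR.
# Prints the count; returns non-zero if the directory is missing or unreadable.
count_images() {
    dir="$1"
    if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
        return 1
    fi
    find "$dir" -maxdepth 1 -type f \( -name '*.img' -o -name '*.sif' \) | wc -l | tr -d ' '
}

count_images "${NXF_SINGULARITY_LIBRARYDIR:-/nonexistent}" || echo "library directory not readable"
```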

## January 2023 - Corroborating what worked

First I try to run it directly from the container (`singularity exec ... bactopia`), and then with `nextflow run` pointing at the `bactopia/bactopia` repository

In [None]:
salloc --time=2:0:0 --cpus-per-task=8 --mem 16G --account=rrg-whsiao-ab

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=12G #  GB of memory per cpu core
#SBATCH --time=02:00:00
#SBATCH --ntasks=1 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=8 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia_jun_sing"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_jan10.out

################################## preparation #########################################

# load singularity
module load singularity/3.8
module load nextflow/22.04.3

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"
export NXF_SINGULARITY_LIBRARYDIR="/project/6007413/cidgoh_share/singularity_imgs"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home,/project,/scratch,/localscratch,/localscratch:/temp,/opt,/cvmfs"

################################## nextflow run #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --verbose \
    -profile singularity \
    -resume

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=12G #  GB of memory per cpu core
#SBATCH --time=02:00:00
#SBATCH --ntasks=1 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=8 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia_jun_sing"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_jan10.out

################################## preparation #########################################

# load singularity
module load singularity
module load nextflow

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"
export NXF_SINGULARITY_LIBRARYDIR="/project/6007413/cidgoh_share/singularity_imgs"

# export PATH to run singularity to container
export SINGULARITYENV_APPEND_PATH=$PATH

################################## nextflow run #########################################

nextflow run bactopia/bactopia -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --samples /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache /project/6007413/cidgoh_share/singularity_imgs \
    --max_cpus 4 \
    -profile singularity

In [None]:
genomes="/project/60005/mdprieto/qatar_klebsiella_isolates"

# Eagle cluster testing

On the Eagle cluster, jobs do not need an account. I will test my pipeline with a few samples before running on Cedar.

## 20230209 - Datasets download for bactopia

In [None]:
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=8                                       # number of cores per task
#SBATCH --job-name="datasets_kleb_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./%j_feb10.out                                 # write output to temp files

#############################################################################################
module load singularity

singularity exec -e -B /scratch bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --limit 10 \
    --cpu 8  \
    --outdir /scratch/mdprieto/bactopia_datasets

# create file of filenames in eagle
cd /scratch/mdprieto/
singularity exec -B /home,/project,/scratch bactopia_2.1.1.sif bactopia prepare \
    --fastq_ext "_001.fastq.gz" \
    /project/60005/mdprieto/qatar_klebsiella_2022/all_isolates \
    > /project/60005/mdprieto/qatar_klebsiella_2022/all_isolates/kleb_qatar_fofn.txt

# create pilot file of filenames
head -n 15 /project/60005/mdprieto/qatar_klebsiella_2022/all_isolates/kleb_qatar_fofn.txt > \
    /project/60005/mdprieto/qatar_klebsiella_2022/all_isolates/trial_fofn.txt
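
Note that `head -n 15` keeps 15 lines in total. If the FOFN starts with a header row (an assumption worth checking with `head -n 1`), that is 14 samples. A small sketch that takes the number of samples explicitly; `pilot_fofn` is a hypothetical helper and the demo file contents are made up:

```shell
# Hypothetical helper: write the header plus the first N samples of a FOFN to stdout.
# Assumes line 1 of the input file is a header row.
pilot_fofn() {
    head -n "$(( $2 + 1 ))" "$1"
}

# Demo on a made-up file with a header and three samples.
tmp=$(mktemp)
printf 'sample\tr1\tr2\nA\ta1\ta2\nB\tb1\tb2\nC\tc1\tc2\n' > "$tmp"
pilot_fofn "$tmp" 2    # header plus samples A and B
rm -f "$tmp"
```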

## 20230210 - Main pipeline bactopia

In [None]:
# validate file of filenames
DATA_KLEB_EAGLE="/project/60005/mdprieto/qatar_klebsiella_2022/all_isolates/"
export NXF_SINGULARITY_LIBRARYDIR="/project/60005/cidgoh_share/singularity_imgs"

singularity exec -e -B /scratch,/project bactopia_2.1.1.sif bactopia \
    --samples $DATA_KLEB_EAGLE/trial_fofn.txt \
    --check_samples \
    -profile singularity

nextflow run bactopia/bactopia -r v2.1.1 -with-singularity bactopia_2.1.1.sif \
    --samples $DATA_KLEB_EAGLE/trial_fofn.txt \
    --datasets /scratch/mdprieto/bactopia_datasets \
    --outdir /scratch/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $NXF_SINGULARITY_LIBRARYDIR \
    --max_cpus 4 \
    -profile singularity

singularity exec -B /scratch,/project bactopia_2.1.1.sif bactopia \
    --samples $DATA_KLEB_EAGLE/trial_fofn.txt \
    --datasets /scratch/mdprieto/bactopia_datasets \
    --outdir /scratch/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $NXF_SINGULARITY_LIBRARYDIR \
    --max_cpus 4 \
    -profile singularity

In [None]:
#!/bin/bash
# for eagle
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=8                                       # number of cores per task
#SBATCH --job-name="datasets_kleb_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./%j_feb10.out                                 # write output to temp files

#############################################################################################
module load singularity

singularity exec -e -B /scratch bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --limit 10 \
    --cpu 8  \
    --outdir /scratch/mdprieto/bactopia_datasets

In [None]:
#!/bin/bash
#SBATCH --time=00:45:00
#SBATCH --mem-per-cpu=8G
#SBATCH --cpus-per-task=8                                       # number of cores per task
#SBATCH --job-name="datasets_kleb_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./%j_feb10.out                                 # write output to temp files

#############################################################################################

