## Script to test analyses for Qatar _Klebsiella pneumoniae_ project

## 20221110 - Determine why `bactopia datasets` only works sometimes

Do not specify a __clean environment__ for singularity
- Does not work, even when binding `-B /localscratch:/temp` and setting the singularity cache location


In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="Nov4_kleb"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch:/temp"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Specify a clean environment with `exec -e`, but do not set a cache location for singularity
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --job-name="test_kleb_bactopia_nov10"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/nov10_kleb_bactopia_e_temp_nocache.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Specify a clean environment with `exec -e` and the singularity cache location, but do not mount localscratch as the temp directory
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="test_kleb_bactopia_nov10"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/nov10_kleb_bactopia_e_cache_notemp.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch "

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Run with only a clean environment (`exec -e`), without setting the singularity cache location or mounting localscratch as temp
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=02:30:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --job-name="setup_datasets_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_error_log.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch "

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --cpu 4 \
    --verbose

**Conclusion:**
The issue seems to be a host environment variable that was interfering with the singularity run: every run with `exec -e` finished without error, whereas the run without it failed.
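
One way to narrow down which variable is responsible is to list the exported host variables that a `singularity exec` without `-e` would inherit. The sketch below is plain bash; the function name `list_suspect_vars` and the set of variable names it matches are assumptions chosen for illustration, not a confirmed list of culprits.

```shell
#!/bin/bash
# Print the names of exported host variables that a non-clean
# `singularity exec` would pass into the container.
# The pattern list is an illustrative assumption, not a confirmed culprit list.
list_suspect_vars() {
    env \
        | grep -E '^(SINGULARITY[A-Z]*_[A-Za-z0-9_]*|PYTHONPATH|PERL5LIB|R_LIBS|LD_LIBRARY_PATH)=' \
        | cut -d= -f1 \
        | sort
}

list_suspect_vars
```

Unsetting the listed variables one at a time before a run without `-e` may help identify which one breaks `bactopia datasets`.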

## 20221116 - Running bactopia

Running with `-profile singularity` does not work properly. For test runs, I will use only 4 CPUs and 30 min.
- Using `-profile slurm` does not work either
- Running with or without a clean environment (`exec -e`) makes no difference
- The last attempt corrected the loading of nextflow; it also does not clean the directory

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=00:45:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity
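
Because one of the attempts above involved correcting how nextflow was loaded, a small pre-flight check can rule out missing executables before the job spends queue time. The sketch below is a minimal example; `require_tools` is a hypothetical helper name, and these notes do not establish that a missing tool is the actual cause of the failure.

```shell
#!/bin/bash
# Check that every tool named on the command line is on PATH.
# Prints each missing tool to stderr and returns non-zero if any is missing.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In the job script this would run right after the `module load` lines.
if require_tools singularity nextflow; then
    echo "all tools found"
else
    echo "load the missing modules before running bactopia"
fi
```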

Trying to run after independently downloading the singularity container for one of the bactopia tools


In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=16 # tasks in parallel / number of cores
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --cleanup_workdir \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity

Modify the number of tasks and CPUs per task to use

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=8G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8
module load nextflow/22.04.3

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella_Qatar_2022/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --cleanup_workdir \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 4 \
    --verbose \
    -profile singularity
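
The header above requests 4 tasks with 4 cores each, while the command hard-codes `--max_cpus 4`. A minimal sketch for reading the allocation from the SLURM environment instead is shown below. The function names `cpus_for_task` and `total_cores` are hypothetical, and falling back to 1 when a variable is unset is an assumption.

```shell
#!/bin/bash
# Derive CPU counts from the SLURM allocation instead of hard-coding them.
# SLURM exports SLURM_CPUS_PER_TASK only when --cpus-per-task is requested
# and SLURM_NTASKS only when --ntasks is requested, so both fall back to 1.
cpus_for_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

total_cores() {
    local ntasks="${SLURM_NTASKS:-1}"
    local per_task="${SLURM_CPUS_PER_TASK:-1}"
    echo $(( ntasks * per_task ))
}

echo "cores per task: $(cpus_for_task), total cores requested: $(total_cores)"
```

Inside the job script, passing `--max_cpus "$(cpus_for_task)"` would then follow whatever `--cpus-per-task` requests.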

In [15]:
def numbers(one, two=2, three=3, four=4):
    # a parameter without a default cannot follow one with a default,
    # so `three` gets a default value as well
    n = str(one) + str(two) + str(three) + str(four)
    return n

print(numbers('ss', 'aa', 'bb', 'rr'))

ssaabbrr