## Script to test analyses for Qatar _Klebsiella pneumoniae_ project

## 20221110 - Determine why `bactopia datasets` works only sometimes

Run without a __clean environment__ for singularity (plain `exec`, no `-e`)
- Does not work, even if you bind `-B /localscratch:/temp` and specify the cache location for singularity


In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="Nov4_kleb"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=temp_results/%j_nov4.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch:/temp"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Specify a clean environment with `exec -e`, but do not specify a cache location for singularity (localscratch is still mounted as `/temp`)
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --job-name="test_kleb_bactopia_nov10"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/nov10_kleb_bactopia_e_temp_nocache.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch -B /localscratch:/temp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Specify a clean environment with `exec -e` and the cache location for singularity, but do not mount localscratch as the temp directory
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=00:30:00
#SBATCH --ntasks=1 # tasks in parallel
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --job-name="test_kleb_bactopia_nov10"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/nov10_kleb_bactopia_e_cache_notemp.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container; localscratch is bound but not remapped to /temp
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch"

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/{cache,tmp}
export SINGULARITY_CACHEDIR="/scratch/$USER/singularity/cache"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --limit 100 \
    --cpu 4 \
    --verbose

Run it with only a clean environment (`exec -e`), without specifying where to save the cache for singularity or mounting localscratch as temp
- No error

In [None]:
#!/bin/bash
#SBATCH --account=rrg-whsiao-ab
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=02:30:00
#SBATCH --ntasks=4 # tasks in parallel
#SBATCH --job-name="setup_datasets_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_error_log.out

################################## preparation #########################################

# load singularity
module purge
module load singularity/3.8

# mount my filesystem inside container; localscratch is bound but not remapped to /temp
BIND_MOUNT="-B /home -B /project -B /scratch -B /localscratch"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia datasets \
    --species "Klebsiella pneumoniae" \
    --include_genus \
    --cpu 4 \
    --verbose

**Conclusion:**
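The issue seems to be a variable from the host environment that was interfering with the singularity run: every run with `exec -e` succeeded, regardless of whether the cache location was set or localscratch was mounted as `/temp`.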
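
To narrow down which variable is responsible, one option is to dump the environment seen inside the container with and without `-e` and compare the variable names. The helper below is only a sketch: `env_only_in_first` and the dump file names are hypothetical, and it assumes every variable fits on a single `NAME=value` line (multi-line values such as exported shell functions would add noise).

```shell
#!/bin/bash
# Print the names of variables present in the first `env` dump but not in the second.
# Arguments: two text files, each holding the output of `env`.
# The function body runs in a subshell so the temp-file variables stay local.
env_only_in_first() (
    first_names=$(mktemp)
    second_names=$(mktemp)
    # keep only the variable names (text before the first '='), sorted for comm
    cut -d= -f1 "$1" | LC_ALL=C sort -u > "$first_names"
    cut -d= -f1 "$2" | LC_ALL=C sort -u > "$second_names"
    # -23 suppresses lines unique to the second file and lines common to both
    LC_ALL=C comm -23 "$first_names" "$second_names"
    rm -f "$first_names" "$second_names"
)

# Intended use inside a job script (not run here; assumes BIND_MOUNT and the
# image from the cells above):
#   singularity exec    $BIND_MOUNT bactopia_2.1.1.sif env > host_env.txt
#   singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif env > clean_env.txt
#   env_only_in_first host_env.txt clean_env.txt
```

The names it prints are the candidates that leak in from the host; exporting them one at a time into an otherwise clean run would show which one breaks `bactopia datasets`.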