# A utility workflow to submit jobs to CSG nodes

This notebook provides a shortcut for submitting bash scripts to CSG computing nodes.


Suppose we would like to submit the following commands to the cluster:

```
sos run gatk_joint_calling.ipynb call \
    --container-option /mnt/mfs/statgen/containers/gatk4-annovar.sif \
    --vcf-prefix output/minimal_example \
    --samples /mnt/mfs/statgen/data_private/gatk_joint_call_example/20200820_sample_manifest.txt \
    --samples-dir /mnt/mfs/statgen/data_private/gatk_joint_call_example/ \
    --ref-genome /mnt/mfs/statgen/isabelle/REF/refs/Homo_sapiens.GRCh37.75.dna_sm.primary_assembly.fa \
    --cwd output/ \
    --variant_filter 'strict'

sos run gatk_joint_calling.ipynb strict_filter \
    --vcf-prefix output/minimal_example \
    --cwd output/ \
    --variant_filter 'strict'
    
sos run gatk_joint_calling.ipynb basic_filter \
    --vcf-prefix output/minimal_example \
    --cwd output/ \
    --variant_filter 'basic'

sos run gatk_joint_calling.ipynb vcf_qc \
    --vcf-prefix output/minimal_example \
    --cwd output/ \
    --variant_filter 'basic'
```

First, we save the lines above to a text file, e.g. `analysis_commands_20200825.txt`. The workflow step in this notebook then prepends the resource requests to those commands and submits the result as a single job.
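
Because the command file is appended verbatim to the job script, a broken line continuation in it will break the job. In bash, a backslash only continues a line when it is the very last character; a backslash followed by a space does not. The sketch below is a minimal check for that mistake. The function name `find_broken_continuations` and the sample text are illustrative assumptions, not part of this workflow.

```python
import re

# A backslash followed by spaces or tabs at end-of-line does not continue the command.
BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def find_broken_continuations(text):
    """Return 1-based line numbers where a backslash is followed by trailing whitespace."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if BROKEN_CONTINUATION.search(line)
    ]


# Hypothetical command text: the second line has a space after the backslash.
example = (
    "sos run gatk_joint_calling.ipynb call \\\n"
    "    --cwd output/ \\ \n"
    "    --variant_filter 'strict'\n"
)
print(find_broken_continuations(example))
```

To check a real file, read it with `open(path).read()` and pass the text to the function; an empty list means no such lines were found.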

To submit a job, pass the command file with `--cmd_file`. The path can be relative or absolute, for example:

```
sos run submit_csg.ipynb submit_csg \
    --cmd_file command_1027.txt

sos run submit_csg.ipynb submit_csg \
    --cmd_file ~/gatk_joint_calling/command_1027.txt
```


To test the process without generating results, add `--dryrun`. In dry-run mode the assembled job script is printed instead of being submitted:
```
sos run submit_csg.ipynb submit_csg \
    --cmd_file analysis_commands_20200825.txt \
    --dryrun
```
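
To make clear what a dry run prints, here is a minimal standalone sketch of how the job script is assembled: the fixed header from the `submit_csg` step, with the time and memory filled in, followed by the contents of the command file. The function name `build_job_script` is an assumption made for illustration; the workflow itself does this inline with SoS `$[ ]` interpolation.

```python
# Header copied from the submit_csg step, with {time} and {vmem} as placeholders.
HEADER_TEMPLATE = """#!/bin/sh
#$ -l h_rt={time}
#$ -l h_vmem={vmem}G
#$ -N gatk_joint_call
#$ -cwd
#$ -j y
#$ -S /bin/bash
module load Singularity
module load VCFTOOLS/0.1.17
module load Plink/1.9.10
export PATH=$HOME/miniconda3/bin:$PATH
set -e
"""


def build_job_script(commands, time="36:00:00", mem=12):
    """Return the header followed by the user's commands, unchanged."""
    # The workflow requests 6 GB more than `mem` as h_vmem.
    return HEADER_TEMPLATE.format(time=time, vmem=mem + 6) + commands


# Hypothetical one-line command file content.
script = build_job_script("echo hello\n")
print(script)
```

With the defaults, the header requests `h_rt=36:00:00` and `h_vmem=18G`.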

In [None]:
# Job submission on CSG cluster
[submit_csg]
# Path to job file
parameter: cmd_file=path
# Total run time allocated to the script
parameter: time='36:00:00'
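# Memory allocated to the job, in gigabytes; the job script requests this plus 6 GB as h_vmem
parameter: mem=12
# If True, print the assembled job script instead of submitting it with qsub
parameter: dryrun = False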
input: cmd_file
python3: expand = '$[ ]'
    tpl = '''
    #!/bin/sh
    #$ -l h_rt=$[time]
    #$ -l h_vmem=$[mem+6]G
    #$ -N gatk_joint_call
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    module load Singularity
    module load VCFTOOLS/0.1.17
    module load Plink/1.9.10 
    export PATH=$HOME/miniconda3/bin:$PATH
    set -e
    '''
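    # Append the user's commands verbatim to the job header
    with open($[_input:r]) as f:
        script = tpl.lstrip() + f.read()
    # In dry-run mode `cat` echoes the script back; otherwise `qsub` reads it from stdin
    exe = 'cat' if $[dryrun] else 'qsub'
    from subprocess import Popen, PIPE
    import sys
    p = Popen(exe, shell = False, stdin = PIPE, stdout = PIPE, stderr = PIPE, close_fds = True)
    # Print whatever the command wrote to stdout and stderr
    for item in p.communicate(script.encode(sys.getdefaultencoding())):
        output = item.decode(sys.getdefaultencoding()).rstrip()
        if output:
            print(output)
    # Fail this step if the submission command itself failed
    if p.returncode != 0:
        sys.exit(p.returncode)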
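
The step above sends the script to `qsub` (or to `cat` in dry-run mode) on standard input. As a standalone illustration of that pattern, the sketch below does the same with `subprocess.run`, which also raises an error when the command exits with a non-zero status. The helper name `pipe_script` and the stand-in command `ECHO_STDIN` are assumptions for this example; on the cluster the command would be `['qsub']`.

```python
import subprocess
import sys


def pipe_script(script, command):
    """Send `script` to `command` on stdin and return its combined stdout and stderr text."""
    result = subprocess.run(
        command,
        input=script,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in for `cat`, so the example runs without a cluster.
ECHO_STDIN = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

print(pipe_script("#!/bin/sh\necho hello\n", ECHO_STDIN))
```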