Install vsn from: https://vsn-pipelines.readthedocs.io/en/latest/getting-started.html

Put the FASTQ files to process in a directory named `./fastq`.

Generate a config file for the VSN pipeline. `$nwork` is the directory where Nextflow writes its temporary work files.

In [None]:
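# Scratch directory for Nextflow's intermediate (work) files
nwork=${VSC_SCRATCH}/20210929_20210813_hydrop-atac_384_pbmc/
mkdir -p "$nwork"
export NXF_WORK=$nwork

# Load the modules first so that the nextflow command is available
module load graphviz
module load Nextflow

VSN=vib-singlecell-nf/vsn-pipelines/main_atac.nf

# Fetch the develop_atac branch of the pipeline
nextflow pull vib-singlecell-nf/vsn-pipelines -r develop_atac

# Write the resolved config for the chosen profiles to a file
nextflow config $VSN \
    -profile atac_preprocess_bap,vsc \
    > atac_preprocess_bap.config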

### Now make some changes to the config:
* point the metadata entry to the correct `metadata_auto.tsv` file
* fix BWA parameters
    * change the bwa index directory to the right one, in this case hg38
    * change the bwa executor to local to run on current node
    * number of bwa CPUs: it is better to have 2 forks running with 17 threads each than 1 fork with 36 threads, because of I/O overhead
* add whitelists for each sample
* check if bap parameters are correct
* make sure all output dirs etc. exist
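
Part of the checklist above can be done mechanically before launching the run. The following is a minimal sketch, not part of vsn-pipelines: `review_config` is a hypothetical helper that prints the config lines mentioning the items to review and creates the output directory if it is missing. The keyword list is a guess at what the parameter names contain, so adjust it to the actual config.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vsn-pipelines.
# Usage: review_config <config file> <output dir>
review_config() {
    local config=$1 outdir=$2

    # Print (with line numbers) every config line that mentions one of the
    # checklist items. The keywords are assumptions about parameter names.
    grep -n -i -E 'metadata|bwa|whitelist|bap|jaccard|executor|cpus|maxForks' "$config"

    # Create the output directory if it does not exist yet.
    mkdir -p "$outdir"
}

# Example call, guarded so the snippet does nothing when the file is absent.
if [ -f atac_preprocess_bap.config ]; then
    review_config atac_preprocess_bap.config out
fi
```

The printed line numbers make it easy to jump to each entry in an editor; the values themselves still have to be checked by hand.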

## Note about bap jaccard index cutoff values
Determining Jaccard index cutoffs for bap requires a two-pass approach. First, bap is run with a generic cutoff, e.g. 0.01. We then inspect the barcode-pair Jaccard knee plots that bap produces and locate the knee for each sample. Finally, we enter these per-sample Jaccard cutoffs as new filters and re-run the pipeline:
```
minimum_jaccard_index = [
    default: 0.0,
    Broad_mito_1: 0.07,
    Broad_mito_2: 0.07,
    CNAG_1: 0.05,
    CNAG_2: 0.05,
    Sanger_1: 0.005,
    Sanger_2: 0.005,
    Stanford_1: 0.08,
    Stanford_2: 0.06,
    VIB_1: 0.09,
    VIB_2: 0.04,
    pbmc_unsorted_3k: 0.01,
    atac_pbmc_5k_v1: 0.08,
    atac_pbmc_5k_nextgem: 0.08,
    VIB_Hydrop_11: 0.02,
    VIB_Hydrop_12: 0.02,
    VIB_Hydrop_21: 0.02,
    VIB_Hydrop_22: 0.02
]
```
In the future, we will write a patch that allows these cutoffs to be changed without re-running bap's computational steps.
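
When choosing a cutoff between the two passes, it can help to count how many barcode pairs survive each candidate value, alongside reading the knee plot. A minimal sketch, assuming the per-pair Jaccard indices have already been extracted into a plain text file with one value per line. The file name `jaccard_values.txt` and the helper `count_pairs_above` are hypothetical, and bap's actual output layout is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the values in a one-column file that are
# greater than or equal to the given cutoff.
count_pairs_above() {
    local file=$1 cutoff=$2
    awk -v c="$cutoff" '($1 + 0) >= (c + 0) { n++ } END { print n + 0 }' "$file"
}

# Tabulate a few candidate cutoffs (only if the assumed input file exists).
if [ -f jaccard_values.txt ]; then
    for cutoff in 0.005 0.01 0.02 0.05 0.08; do
        printf '%s\t%s\n' "$cutoff" "$(count_pairs_above jaccard_values.txt "$cutoff")"
    done
fi
```

A sharp drop in the count between two neighbouring cutoffs suggests the knee lies in that interval. This supplements the visual inspection of the knee plot rather than replacing it.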

Here is a functional config file tailored to our computing environment:

In [8]:
cat atac_preprocess_bap.config

singularity {
   cacheDir = '/staging/leuven/res_00001/software/vsn_containers/'
   enabled = true
   autoMounts = true
   runOptions = '--cleanenv -H $PWD -B /lustre1,/staging,/data,${VSC_SCRATCH},${VSC_SCRATCH}/tmp:/tmp'
}

manifest {
   name = 'vib-singlecell-nf/vsn-pipelines'
   description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
   homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
   version = '0.25.0'
   mainScript = 'main.nf'
   defaultBranch = 'master'
   nextflowVersion = '!>=20.10.0'
}

params {
   global {
      project_name = '10x_PBMC'
      outdir = 'out'
   }
   misc {
      test {
         enabled = false
      }
   }
   utils {
      container = 'vibsinglecellnf/utils:0.4.0'
      publish {
         compressionLevel = 6
         annotateWithBatchVariableName = false
         mode = 'copy'
      }
   }
   sc {
      file_converter {
         off = 'h5ad'
         tagCellWithSampleId = true
         remove10xGEMWell = false
     

Then, generate a metadata file as described here: https://vsn-pipelines.readthedocs.io/en/latest/scatac-seq.html

In [3]:
cat metadata_auto.tsv

sample_name	technology	fastq_PE1_path	fastq_barcode_path	fastq_PE2_path
Broad_1	biorad	fastq/Broad_PBMC_1_S1_R1_001.fastq.gz		fastq/Broad_PBMC_1_S1_R2_001.fastq.gz
Broad_1	biorad	fastq/PBMC_1_S1_L001_R1_001.fastq.gz		fastq/PBMC_1_S1_L001_R2_001.fastq.gz
Broad_1	biorad	fastq/PBMC_1_S1_L002_R1_001.fastq.gz		fastq/PBMC_1_S1_L002_R2_001.fastq.gz
Broad_2	biorad	fastq/Broad_PBMC_2_S2_R1_001.fastq.gz		fastq/Broad_PBMC_2_S2_R2_001.fastq.gz
Broad_2	biorad	fastq/PBMC_2_S2_L001_R1_001.fastq.gz		fastq/PBMC_2_S2_L001_R2_001.fastq.gz
Broad_2	biorad	fastq/PBMC_2_S2_L002_R1_001.fastq.gz		fastq/PBMC_2_S2_L002_R2_001.fastq.gz
Broad_mito_1	standard_revcomp	fastq/Benchmark_1_S1_L001_R1_001.fastq.gz	fastq/Benchmark_1_S1_L001_R2_001.fastq.gz	fastq/Benchmark_1_S1_L001_R3_001.fastq.gz
Broad_mito_1	standard_revcomp	fastq/Benchmark_1_S1_L002_R1_001.fastq.gz	fastq/Benchmark_1_S1_L002_R2_001.fastq.gz	fastq/Benchmark_1_S1_L002_R3_001.fastq.gz
Broad_mito_1	standard_revcomp	fastq/Benchmark_1_S1_L003_R1_001.fastq.gz	
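
Before starting the run, it may be worth checking that every FASTQ path listed in the metadata file exists. A minimal sketch, assuming the tab-separated layout shown above (a header row, with paths in columns 3 to 5). `missing_fastqs` is a hypothetical helper, not part of vsn-pipelines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every FASTQ path named in the metadata file
# that does not exist on disk.
# Assumes a tab-separated file with a header row and paths in columns 3-5.
missing_fastqs() {
    local metadata=$1
    tail -n +2 "$metadata" | cut -f3-5 | tr '\t' '\n' | while IFS= read -r path; do
        # Skip empty fields (rows without a separate barcode FASTQ).
        [ -z "$path" ] && continue
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Run from the directory that contains ./fastq, since the paths are relative.
if [ -f metadata_auto.tsv ]; then
    missing_fastqs metadata_auto.tsv
fi
```

No output means that all listed files were found.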

Then run the pipeline in a tmux session so that it is not interrupted if the connection drops:

In [None]:
nextflow -C atac_preprocess_bap.config run $VSN -entry atac_preprocess_bap -resume
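
One possible way to wrap the command above in tmux is sketched here. `run_in_tmux` is a hypothetical wrapper and the session name `vsn_atac` is arbitrary. With `DRY_RUN=1` it only prints the tmux invocation, which is useful for checking the command before launching it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: launch a command in a detached tmux session.
# With DRY_RUN=1 it prints the tmux invocation instead of running it.
run_in_tmux() {
    local session=$1; shift
    local cmd="$*"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'tmux new-session -d -s %s %s\n' "$session" "'$cmd'"
    else
        tmux new-session -d -s "$session" "$cmd"
    fi
}

# Preview the command. Drop DRY_RUN=1 to actually start the session,
# then reattach with: tmux attach -t vsn_atac
DRY_RUN=1 run_in_tmux vsn_atac \
    nextflow -C atac_preprocess_bap.config run "$VSN" \
    -entry atac_preprocess_bap -resume
```

Environment prepared earlier (loaded modules, `NXF_WORK`) may not carry over into a session started this way, so it can be simpler to open tmux first and run the setup cells inside it.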