# Running the pipeline
Here, we will run the pipeline. You should now have a `metadata.tsv` file that looks like this (or a `metadata_10x.tsv` file, if you ran the 10x workflow):

```bash
cat metadata.tsv
```

```
sample_name	technology	fastq_PE1_path	fastq_barcode_path	fastq_PE2_path
BIO_ddseq_4	biorad	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/BIO_ddseq_4__R1.LIBDS.fastq.gz	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/BIO_ddseq_4__R1.LIBDS.fastq.gz	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/BIO_ddseq_4__R1.LIBDS.fastq.gz
EPF_hydrop_1	hydrop_2x384	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/EPF_hydrop_1__R1.LIBDS.fastq.gz	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/EPF_hydrop_1__R1.LIBDS.fastq.gz	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/EPF_hydrop_1__R1.LIBDS.fastq.gz
OHS_s3atac_1	s3atac_1	/lustre1/project/stg_00002/lcb/fderop/data/20230411_ATACflow_tutorial/PUMATAC_example_fastq/OHS_s3atac_1__R1.LIBDS.fastq
```

```bash
cat metadata_10x.tsv
```

```
sample_name	technology	fastq_PE1_path	fastq_barcode_path	fastq_PE2_path
ASA__0201f1__20220902_MO-016-b-ATAC	atac_revcomp	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L001_R1_001.fastq.gz	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L001_R2_001.fastq.gz	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L001_R3_001.fastq.gz
ASA__0201f1__20220902_MO-016-b-ATAC	atac_revcomp	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L002_R1_001.fastq.gz	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L002_R2_001.fastq.gz	/lustre1/project/stg_00002/lcb/ngs_runs/NextSeq2000_20221004/Demultiplexed/ASA__0201f1__20220902_MO-016-b-ATAC_S5_L002_R3_001.fastq.gz
ASA__0201f1__2022090
```
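Before launching, it can help to sanity-check the metadata file. The following is a minimal bash sketch, not part of PUMATAC: `check_metadata` is a hypothetical helper that assumes the five tab-separated columns shown above, and it reports rows with the wrong number of fields and FASTQ paths that do not exist. Note that the 10x file has one row per sequencing lane, so the same sample name can legitimately appear several times.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PUMATAC: sanity-check a metadata TSV with the
# five tab-separated columns sample_name, technology, fastq_PE1_path,
# fastq_barcode_path and fastq_PE2_path.
check_metadata() {
    local tsv="$1"
    local n_errors=0 row=0
    local sample tech pe1 bc pe2 extra f
    # Skip the header, then split each data row on tabs
    # (the "|| [ -n ... ]" also handles a last line without a newline)
    while IFS=$'\t' read -r sample tech pe1 bc pe2 extra || [ -n "$sample" ]; do
        row=$((row + 1))
        if [ -z "$pe2" ] || [ -n "$extra" ]; then
            echo "row $row ($sample): expected 5 columns"
            n_errors=$((n_errors + 1))
            continue
        fi
        # Every FASTQ path should point to an existing file
        for f in "$pe1" "$bc" "$pe2"; do
            if [ ! -f "$f" ]; then
                echo "row $row ($sample): missing file $f"
                n_errors=$((n_errors + 1))
            fi
        done
    done < <(tail -n +2 "$tsv")
    # Succeed only if no problem was found
    [ "$n_errors" -eq 0 ]
}
```

For example, `check_metadata metadata.tsv && echo 'metadata OK'`.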

### 1. Generate a .config file
The config file contains a number of variables and paths that the pipeline needs. `$NXF_WORK` is a temporary `work` directory that will contain the intermediate files produced by the pipeline. The final output files will be written to a directory which we arbitrarily name `PUMATAC_tutorial_preprocessing_out`.

```bash
chmod 755 PUMATAC_dependencies/nextflow/nextflow-22.10.7-all
```

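```bash
module load Java/17.0.2
# Export NXF_WORK so that Nextflow picks it up as its work directory
export NXF_WORK=./work
mkdir -p "$NXF_WORK"
# Write the resolved pipeline configuration to a template file
PUMATAC_dependencies/nextflow/nextflow-22.10.7-all config ./PUMATAC/main_atac.nf \
    -profile atac_preprocess_rapid,vsc \
    > atac_preprocess_rapid_template.config
```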

```bash
pygmentize -g atac_preprocess_rapid_template.config
```

```
manifest {
   name = 'vib-singlecell-nf/vsn-pipelines'
   description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
   homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
   version = '0.27.0'
   mainScript = 'main.nf'
   defaultBranch = 'master'
   nextflowVers
```

### 2. Edit the .config file
Now modify `atac_preprocess_rapid_template.config` to fit your use case. I recommend saving the result as a new file named `atac_preprocess_rapid.config`, which is the name used in the rest of this tutorial.

Rename the project and the output directory, where the output files will be written:

```groovy
params {
   global {
      project_name = 'PUMATAC_tutorial'
      outdir = 'PUMATAC_tutorial_preprocessing_out'
   }
}
```

Point the `metadata` variable to the `metadata.tsv` (or `metadata_10x.tsv`) file created earlier:

```groovy
params {
   data {
      atac_preprocess {
         metadata = 'metadata.tsv'
      }
   }
}
```

Add the correct whitelist for each technology listed in `metadata.tsv`. For example, for a technology named `atac`, we add `atac = 'PUMATAC_dependencies/whitelists/737K-cratac-v1.txt.gz'`. Since `atac` is not one of the 5 basic options (`standard`, `hydrop_3x96`, `hydrop_2x384`, `multiome` and `biorad`), the pipeline will treat it as a `standard` barcode correction case, but with a custom whitelist.

`PUMATAC_dependencies` already contains whitelists for 10x and HyDrop. If you use another technique, such as s3-ATAC (technology `s3atac_1` in the example `metadata.tsv`), you can add your own whitelist; its key must match the technology name exactly.
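A custom whitelist can be prepared with standard command-line tools. The sketch below assumes, based on the file names in `PUMATAC_dependencies/whitelists/`, that a whitelist is a gzip-compressed text file with one barcode per line (as for the 10x `737K-*` files) and that a `.REVCOMP` variant simply holds the reverse complement of each barcode; check both assumptions against the provided files. `make_whitelist` and `revcomp_whitelist` are hypothetical helpers, not part of PUMATAC.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of PUMATAC.
# Assumption: a whitelist is a gzipped text file with one barcode per line.

# Compress a plain-text barcode list into a whitelist file
make_whitelist() {
    local barcodes_txt="$1" out_gz="$2"
    # Drop empty lines and uppercase the sequences before compressing
    grep -v '^$' "$barcodes_txt" | tr 'acgtn' 'ACGTN' | gzip -c > "$out_gz"
}

# Write the reverse complement of every barcode in a gzipped whitelist
revcomp_whitelist() {
    local in_gz="$1" out_gz="$2"
    # rev reverses each line; tr swaps A<->T and C<->G (N stays N)
    gzip -dc "$in_gz" | rev | tr 'ACGT' 'TGCA' | gzip -c > "$out_gz"
}
```

For example, `make_whitelist my_barcodes.txt PUMATAC_dependencies/whitelists/s3_atac_1.txt.gz`, where `my_barcodes.txt` is a hypothetical plain-text list of your barcodes.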

```groovy
params {
   tools {
      singlecelltoolkit {
         container = 'vibsinglecellnf/singlecelltoolkit:2022-04-15-16314db'
         barcode_correction {
            max_mismatches = 1
            min_frac_bcs_to_find = 0.5
            whitelist {
               atac = 'PUMATAC_dependencies/whitelists/737K-cratac-v1.txt.gz'
               atac_revcomp = 'PUMATAC_dependencies/whitelists/737K-cratac-v1.REVCOMP.txt.gz'
               multiome = 'PUMATAC_dependencies/whitelists/737K-arc-v1.txt.gz'
               multiome_revcomp = 'PUMATAC_dependencies/whitelists/737K-arc-v1.REVCOMP.txt.gz'
               hydrop_2x384 = 'PUMATAC_dependencies/whitelists/hydrop_384x384.REVCOMP.txt.gz'
               // the key must match the technology name used in metadata.tsv
               s3atac_1 = 'PUMATAC_dependencies/whitelists/s3_atac_1.txt.gz'
            }
         }
      }
   }
}
```
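Because whitelists are looked up by the name in the `technology` column, a small mismatch (for example `s3_atac_1` versus `s3atac_1`) is easy to miss. The sketch below is a hypothetical helper, not part of PUMATAC: it lists the technologies in a metadata file and reports those without a matching `<technology> =` line in the config. It is a plain text search, not a config parser, and it skips `biorad` on the assumption that Bio-Rad barcodes are corrected without a whitelist (the example `whitelist` block has no `biorad` entry, and the run log shows a dedicated `SCTK__EXTRACT_AND_CORRECT_BIORAD_BARCODE` step).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PUMATAC.
# Technologies assumed not to need a whitelist entry (edit as needed).
NO_WHITELIST_TECHS="biorad"

check_whitelist_keys() {
    local tsv="$1" config="$2" missing=0 tech
    # Column 2 holds the technology; skip the header and deduplicate
    for tech in $(tail -n +2 "$tsv" | cut -f2 | sort -u); do
        case " $NO_WHITELIST_TECHS " in
            *" $tech "*) continue ;;
        esac
        # Look for a line such as "   s3atac_1 = '...'" in the config
        if ! grep -Eq "^[[:space:]]*${tech}[[:space:]]*=" "$config"; then
            echo "no whitelist entry for technology: $tech"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `check_whitelist_keys metadata.tsv atac_preprocess_rapid.config`.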

Change `bwa_fasta` from its template value `'/path/to/bwa_index/hg38.fa'` to the `bwa-mem2`-indexed FASTA of your genome of choice:

```groovy
params {
   tools {
      bwamaptools {
         bwa_fasta = 'PUMATAC_dependencies/genomes/hg38_bwamem2/genome.fa'
      }
   }
}
```
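Mapping fails late if the index is incomplete, so a quick check before launching can save time. This sketch assumes the index files that recent `bwa-mem2` releases write next to the FASTA (`.0123`, `.amb`, `.ann`, `.bwt.2bit.64`, `.pac`); adjust the suffix list if your version differs. `check_bwa_mem2_index` is a hypothetical helper, not part of PUMATAC.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PUMATAC.
# Assumed bwa-mem2 index suffixes; adjust for your bwa-mem2 version.
BWA_MEM2_SUFFIXES=".0123 .amb .ann .bwt.2bit.64 .pac"

check_bwa_mem2_index() {
    local fasta="$1" missing=0 suffix
    for suffix in $BWA_MEM2_SUFFIXES; do
        # Each index file sits next to the FASTA, e.g. genome.fa.pac
        if [ ! -f "${fasta}${suffix}" ]; then
            echo "missing index file: ${fasta}${suffix}"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For the example above: `check_bwa_mem2_index PUMATAC_dependencies/genomes/hg38_bwamem2/genome.fa`. If files are missing, the index can be built with `bwa-mem2 index genome.fa`.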

Adjust the `withLabel:compute_resources__bwa_mem` parameters. For example, set `executor` to `local` to run on the current node instead of submitting a job through the PBS system. You can also change the number of CPUs and the memory per sample (i.e. per line in `metadata.tsv`) here, while `maxForks` caps how many of these mapping jobs run in parallel. If the runtime of `bwa-mem2` exceeds the `time` limit defined here (24 hours), the pipeline will stop, so increase this limit if necessary.

```groovy
process {
   withLabel:compute_resources__bwa_mem {
      executor = 'local'
      cpus = 20
      memory = '120 GB'
      time = '24h'
      maxForks = 2
   }
}
```
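With `executor = 'local'`, all mapping jobs share the current node, and since `maxForks` caps how many run at once, the peak demand is roughly `cpus × maxForks` and `memory × maxForks` (here 2 × 20 = 40 CPUs and 2 × 120 GB = 240 GB). A small sketch to compare that against the node, assuming a Linux node where `nproc` and `/proc/meminfo` are available; `fits_on_node` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: can this node host max_forks parallel jobs of
# cpus_per_job CPUs and mem_gb_per_job GB each? (Linux: nproc, /proc/meminfo)
fits_on_node() {
    local cpus_per_job="$1" mem_gb_per_job="$2" max_forks="$3"
    local need_cpus=$((cpus_per_job * max_forks))
    local need_gb=$((mem_gb_per_job * max_forks))
    local have_cpus have_gb
    have_cpus=$(nproc)
    # MemTotal is reported in kB; convert to GB (rounded down)
    have_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    echo "need ${need_cpus} CPUs / ${need_gb} GB; node has ${have_cpus} CPUs / ${have_gb} GB"
    [ "$need_cpus" -le "$have_cpus" ] && [ "$need_gb" -le "$have_gb" ]
}
```

For the settings above, run `fits_on_node 20 120 2`; if it fails, lower `cpus`, `memory` or `maxForks`.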

Edit the Singularity run options so that the bind mounts (`-B`) cover every directory containing files the pipeline needs to read or write:

```groovy
singularity {
   runOptions = '--cleanenv -H $PWD -B /lustre1,/staging,/data,${VSC_SCRATCH},${VSC_SCRATCH}/tmp:/tmp,${HOME}/.nextflow/assets/'
}
```
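Singularity typically refuses to start a container when a bind-mount source does not exist, so it is worth checking the `-B` list before launching. A sketch, assuming the list is one comma-separated argument of `src` or `src:dest` entries as above; `check_bind_sources` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report -B bind sources that do not exist.
# Each comma-separated entry is "src" or "src:dest"; only src is checked.
check_bind_sources() {
    local binds="$1" missing=0 entry src
    local IFS=','
    for entry in $binds; do
        src="${entry%%:*}"
        if [ ! -e "$src" ]; then
            echo "bind source does not exist: $src"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Call it with the same list, letting your shell expand the variables: `check_bind_sources "/lustre1,/staging,/data,${VSC_SCRATCH},${VSC_SCRATCH}/tmp:/tmp,${HOME}/.nextflow/assets/"`.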

Set a location for the cache directory (`cacheDir`). The location is arbitrary; here we use `PUMATAC_dependencies/cache`. This directory is worth knowing about, as it will contain all Singularity container images used by the pipeline. For example, if you want to re-run some of the tools that the pipeline calls without re-running the entire pipeline, you can simply open the relevant Singularity container and call the tool from within it.

```groovy
singularity {
   cacheDir = 'PUMATAC_dependencies/cache'
}
```
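To call a tool by hand, you need the path of its cached image. The helper below is a sketch that assumes Nextflow's usual cache naming, in which a container such as `vibsinglecellnf/singlecelltoolkit:2022-04-15-16314db` is stored as `vibsinglecellnf-singlecelltoolkit-2022-04-15-16314db.img` (any `docker://` prefix dropped, `/` and `:` replaced by `-`). `cached_image` is a hypothetical name; check `ls PUMATAC_dependencies/cache` if your files are named differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a container name to its (assumed) image file in the
# Nextflow Singularity cache: drop any "scheme://", then turn ':' and '/' into '-'.
cached_image() {
    local container="$1" cache_dir="$2" name
    name="${container#*://}"
    name="${name//:/-}"
    name="${name//\//-}"
    echo "${cache_dir}/${name}.img"
}
```

For example, `singularity exec "$(cached_image vibsinglecellnf/singlecelltoolkit:2022-04-15-16314db PUMATAC_dependencies/cache)" <tool> <args>` runs a tool inside the same image the pipeline uses; `<tool> <args>` is a placeholder, and you may need the same `-B` bind mounts as in `runOptions`.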

That should be everything. Here is an example of a functional `.config`. If you run into problems, I suggest comparing it with your own `.config` using a diff tool (e.g. `diff -u` or https://text-compare.com/):

```bash
pygmentize -g atac_preprocess_rapid.config
```

```
manifest {
   name = 'vib-singlecell-nf/vsn-pipelines'
   description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
   homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
   version = '0.27.0'
   mainScript = 'main.nf'
   defaultBranch = 'master'
   nextflowVers
```

### 3. Run the pipeline
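We now call the right version of Nextflow (21.04.3, rather than the 22.10.7 binary used above to generate the template) to run our pipeline `main_atac.nf` with our `atac_preprocess_rapid.config`. Because a run can take hours, I usually do this in a tmux session or as a submitted job.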

```bash
PUMATAC_dependencies/nextflow/nextflow-21.04.3-all -C atac_preprocess_rapid.config run PUMATAC/main_atac.nf -entry atac_preprocess_rapid
```

The output will look like so:

```
N E X T F L O W  ~  version 21.04.3
Launching `PUMATAC/main_atac.nf` [festering_easley] - revision: c18ea24e2f
executor >  local (51)
[03/1df167] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_standard:SCTK__BARCODE_CORRECTION (2)                         [100%] 2 of 2 ✔
[68/016dd5] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_standard:PUBLISH_BC_STATS (2)                                 [100%] 2 of 2 ✔
[17/c1a9fc] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_standard:SCTK__BARCODE_10X_SCATAC_FASTQ (1)                   [100%] 2 of 2 ✔
[-        ] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:SCTK__EXTRACT_HYDROP_ATAC_BARCODE_3x96                                   -
[-        ] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_3x96:SCTK__BARCODE_CORRECTION                          -
[-        ] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_3x96:PUBLISH_BC_STATS                                  -
[-        ] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_3x96:SCTK__BARCODE_10X_SCATAC_FASTQ                    -
[a3/c8a3a3] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:SCTK__EXTRACT_HYDROP_ATAC_BARCODE_2x384 (1)                              [100%] 1 of 1, cached: 1 ✔
[17/7f7075] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_2x384:SCTK__BARCODE_CORRECTION (1)                     [100%] 1 of 1 ✔
[90/d88953] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_2x384:PUBLISH_BC_STATS (1)                             [100%] 1 of 1 ✔
[38/9b45b9] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_hydrop_2x384:SCTK__BARCODE_10X_SCATAC_FASTQ (1)               [100%] 1 of 1 ✔
[e8/f70d41] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_biorad:SCTK__EXTRACT_AND_CORRECT_BIORAD_BARCODE (1)           [100%] 1 of 1, cached: 1 ✔
[73/069f44] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:bc_correct_biorad:PUBLISH_BR_BC_STATS (1)                                [100%] 1 of 1, cached: 1 ✔
[ba/727e0e] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:adapter_trimming:TRIMGALORE__TRIM (3)                                    [100%] 4 of 4, cached: 1 ✔
[53/e25895] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:adapter_trimming:PUBLISH_FASTQS_TRIMLOG_PE1 (4)                          [100%] 4 of 4, cached: 1 ✔
[7e/b3bbae] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:adapter_trimming:PUBLISH_FASTQS_TRIMLOG_PE2 (4)                          [100%] 4 of 4, cached: 1 ✔
[66/e06b36] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BWA_MAPPING_PE:BWA_MEM_PE (4)                                    [100%] 4 of 4, cached: 1 ✔
[c2/40a212] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BWA_MAPPING_PE:PUBLISH_BAM (4)                                   [100%] 4 of 4, cached: 1 ✔
[b0/24e40b] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BWA_MAPPING_PE:PUBLISH_BAM_INDEX (4)                             [100%] 4 of 4, cached: 1 ✔
[e6/7a5bf4] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BWA_MAPPING_PE:MAPPING_SUMMARY (4)                               [100%] 4 of 4, cached: 1 ✔
[99/3a3557] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BWA_MAPPING_PE:PUBLISH_MAPPING_SUMMARY (4)                       [100%] 4 of 4, cached: 1 ✔
[-        ] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:PICARD__MERGE_SAM_FILES_AND_SORT                                 -
[0e/2213fe] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:BAM_TO_FRAGMENTS:BARCARD__CREATE_FRAGMENTS_FROM_BAM (4)          [100%] 4 of 4, cached: 1 ✔
[26/ba7d27] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:DETECT_BARCODE_MULTIPLETS:BARCARD__DETECT_BARCODE_MULTIPLETS (4) [100%] 4 of 4, cached: 1 ✔
[4b/4b05bd] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:DETECT_BARCODE_MULTIPLETS:GENERATE_REPORT (4)                    [100%] 4 of 4, cached: 1 ✔
[5d/9aa30c] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:DETECT_BARCODE_MULTIPLETS:REPORT_TO_HTML (4)                     [100%] 4 of 4, cached: 1 ✔
[4f/0355ac] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:PUBLISH_FRAGMENTS (4)                                            [100%] 4 of 4, cached: 1 ✔
[78/2a227d] process > atac_preprocess_rapid:ATAC_PREPROCESS_RAPID:mapping:PUBLISH_FRAGMENTS_INDEX (4)                                      [100%] 4 of 4, cached: 1 ✔
Completed at: 19-Apr-2023 14:12:58
Duration    : 2h 45m 24s
CPU hours   : 59.7 (21.3% cached)
Succeeded   : 51
Cached      : 17
```
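If a run is interrupted, or you change a setting and launch again, Nextflow's `-resume` option reuses finished tasks from the `work` directory instead of recomputing them (tasks reused this way are reported as `cached` in the log). A sketch of a small launcher for the command above; `run_pumatac` and its `DRY_RUN` switch, which only prints the command, are hypothetical and not part of PUMATAC.

```bash
#!/usr/bin/env bash
# Hypothetical launcher for the run command above; extra arguments
# (e.g. -resume) are appended. Set DRY_RUN=1 to print instead of run.
run_pumatac() {
    local cmd=(
        PUMATAC_dependencies/nextflow/nextflow-21.04.3-all
        -C atac_preprocess_rapid.config
        run PUMATAC/main_atac.nf
        -entry atac_preprocess_rapid
        "$@"
    )
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, run `run_pumatac -resume` inside a tmux session, or `DRY_RUN=1 run_pumatac -resume` to check the command first.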