# Nextflow Bowtie Pipeline Tutorial

## 1. Set Up Nextflow Environment

Create a conda environment and install Nextflow:
- `conda create -n nextflow`
- `conda activate nextflow` (or `source activate nextflow` on older conda versions)
- `conda install -c bioconda nextflow`

Clone the nf-bowtie GitHub repository:
- `git clone https://github.com/czbiohub/nf-bowtie.git`

Install Docker if you want to run Nextflow locally (macOS instructions):
- https://docs.docker.com/docker-for-mac/install/
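To confirm the tools from this section are installed and on your `PATH`, a quick check like the one below can help. This is a minimal bash sketch: `check_tools` is a hypothetical helper, not part of nf-bowtie, and Docker only matters if you plan to run with `-profile local`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report whether each named tool is on PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Tools used in this tutorial (run this after activating the conda environment).
check_tools conda nextflow git docker || echo "Install the missing tools before continuing."
```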

## 2. nf-bowtie Inputs

This pipeline accepts three kinds of reference input, selected with `--reference_type`:

- A single FASTA file (`--reference_type single_file`)
- A folder of FASTA references to align against with Bowtie (`--reference_type folder`)
- Nested folders, one per sample (`--reference_type embedded_folder`). For example:
    - `/mnt/data/sample1/contigs.fasta`
    - `/mnt/data/sample2/contigs.fasta`
    - `/mnt/data/sample3/contigs.fasta`

Reads can be in either `.fastq` or `.fastq.gz` format.
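Before launching, it can help to check the inputs on disk. The bash sketch below uses two hypothetical helpers (not part of nf-bowtie). `check_pairs` flags read files whose R1/R2 mate is missing, assuming the `*R1_001.fastq` / `*R2_001.fastq` naming used in the `--reads` glob of the typical command in section 3. `list_embedded_refs` lists the `.fasta` files exactly one folder below a root, as in the `embedded_folder` layout above. `find` only approximates Nextflow's own glob matching, so treat this as a sanity check.

```bash
#!/usr/bin/env bash

# Hypothetical helper: flag read files whose R1/R2 mate is missing.
# Assumes names ending in R1_001.fastq / R2_001.fastq (optionally .gz).
# Returns 1 if any file is unpaired, 0 otherwise.
check_pairs() {
  local dir="$1" f mate status=0
  while IFS= read -r f; do
    case "$f" in
      *R1_001.fastq*) mate="${f/R1_001.fastq/R2_001.fastq}" ;;
      *R2_001.fastq*) mate="${f/R2_001.fastq/R1_001.fastq}" ;;
      *) continue ;;
    esac
    if [[ ! -e "$mate" ]]; then
      echo "unpaired: $f"
      status=1
    fi
  done < <(find "$dir" -type f \( -name '*R1_001.fastq*' -o -name '*R2_001.fastq*' \))
  return "$status"
}

# Hypothetical helper: list FASTA files exactly one folder below a root,
# matching the embedded_folder layout (<root>/<sample>/<file>.fasta).
list_embedded_refs() {
  find "$1" -mindepth 2 -maxdepth 2 -type f -name '*.fasta' | sort
}
```

For example, `check_pairs <read_folder>` before a run, or `list_embedded_refs /mnt/data` for the nested layout shown above.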

## 3. Typical Bowtie Command Structure

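A typical run looks like this (replace the `<...>` placeholders with your own paths, and keep the quotes so the shell passes the glob to Nextflow unexpanded):

```
nextflow run main.nf -profile czbiohub_aws --skip_fastqc --skip_count false \
    --reference_type single_file \
    --reads '<read_folder>/**{R1,R2}_001.fastq' \
    --reference '<reference_folder>/<sample>.fasta' \
    --outdir '<output_directory>'
```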

* `--skip_fastqc`: skips the FastQC process, which runs quality control on every read file you pass into the alignment pipeline
* `--skip_count false`: counting is off by default; passing `false` turns it on
* `--reference_type`: the kind of reference input (`single_file`, `folder` or `embedded_folder`; see section 2)
* `--reads`: path (or quoted glob) to the reads
* `--reference`: path to the reference
* `--outdir`: output directory; if omitted, output goes to `./results`
* `-profile`: `czbiohub_aws` runs the pipeline on AWS Batch; `local` runs it on your own computer (requires Docker)
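For a local run, the same command can be wrapped in a small bash function so the paths become arguments. This is a minimal sketch assuming the `local` profile described above; `run_local` and the `DRY_RUN` switch are hypothetical conveniences, not nf-bowtie options. The single quotes around the reads glob at the call site matter: without them the shell would brace-expand `{R1,R2}` before Nextflow ever sees it.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around the typical command, using the local profile.
# Usage: run_local <reads_glob> <reference_fasta> [outdir]
# Set DRY_RUN=1 to print the command instead of launching Nextflow.
run_local() {
  local reads="$1" reference="$2" outdir="${3:-./results}"
  local cmd=(
    nextflow run main.nf -profile local
    --skip_fastqc --skip_count false
    --reference_type single_file
    --reads "$reads"
    --reference "$reference"
    --outdir "$outdir"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

From inside the cloned repository, `DRY_RUN=1 run_local '<read_folder>/**{R1,R2}_001.fastq' '<reference_folder>/<sample>.fasta'` previews the command; drop `DRY_RUN=1` to actually launch it. Building the command as an array keeps each argument intact, so the quoted glob reaches Nextflow as a single argument.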

## 4. Run and Edit the Makefile

Makefiles are amazing: they let you save long commands under short target names. Take a look at the Makefile (I suggest `cd`-ing into the nf-bowtie repository you cloned and opening it with `atom .`):

```makefile
demo1:
	nextflow run main.nf -profile czbiohub_aws \
    --skip_fastqc --reference_type single_file \
    --reads 's3://kalani-bucket/FLASH/fastq/**{R1,R2}_001.fastq' \
    --reference \
    's3://kalani-bucket/FLASH/amrFLASH_all_98percent_1433.fasta' \
    --outdir './results_flash_demo1'
```

The command behind the `demo1` target looks complicated, but you can edit it in Atom (or `nano` in the terminal) and then run it by simply typing `make demo1`.
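As targets accumulate (`demo1`, `demo2`, ...), a one-line helper can list them. This is a minimal bash sketch; `list_targets` is hypothetical and only catches simple `name:` rule lines, not every construct Make allows.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the rule names defined in a Makefile, one per line.
# Only matches simple "name:" lines; special targets such as .PHONY and
# "NAME := value" variable assignments are skipped.
list_targets() {
  local makefile="${1:-Makefile}"
  grep -E '^[A-Za-z0-9_-]+:([^=]|$)' "$makefile" | cut -d: -f1
}
```

Run `list_targets` in the repository root. Before running a target for real, `make -n demo1` prints the commands it would execute without running them.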

## 5. Troubleshooting

- Run `cat .nextflow.log` in the directory you launched Nextflow from to see where the run stopped (`nextflow.config` only holds the pipeline configuration)
- AWS Batch console:
    - Look up the submitted jobs to see whether they are running, succeeded, failed or stalled
    - Use a job's ID to find its logs in CloudWatch and see why it failed
- The pipeline info folder in your output directory (`./results` by default) shows which processes succeeded and which failed
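Nextflow writes a `.nextflow.log` file to the directory a run was launched from, and it can get long. The bash sketch below pulls out its ERROR and WARN lines with line numbers. `log_problems` is a hypothetical helper, and it assumes the log level appears as a space-separated word on each line, as in Nextflow's default log layout.

```bash
#!/usr/bin/env bash

# Hypothetical helper: show ERROR/WARN lines (with line numbers) from a Nextflow log.
# Defaults to .nextflow.log in the current directory.
log_problems() {
  local log="${1:-.nextflow.log}"
  if [[ ! -f "$log" ]]; then
    echo "log not found: $log" >&2
    return 1
  fi
  grep -n -E ' (ERROR|WARN) ' "$log" || echo "no ERROR/WARN lines in $log"
}
```

For jobs on AWS Batch, the AWS CLI can do the same lookups as the console: `aws batch describe-jobs --jobs <job_id>` reports a job's status, `statusReason` and log stream name, and `aws logs get-log-events --log-group-name /aws/batch/job --log-stream-name <log_stream_name>` fetches its CloudWatch output. `/aws/batch/job` is the default log group for Batch jobs; your setup may use a different one.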

## 6. Making your own Nextflow Pipeline

- Take a look at Olga's repository on getting started with Nextflow (see the section `Creating your own Nextflow Workflow`): https://github.com/czbiohub/nextflow-tutorial-2019
- nf-core tools, which includes a template for starting new pipelines: https://github.com/nf-core/tools
- I also really like this page of common commands and patterns if you need help with parts of your processes: http://nextflow-io.github.io/patterns/index.html
- Post questions in #eng-support and #nextflow on Slack