# Workflows with Snakemake

Take each of the following pieces of code and turn it into a Snakefile.

## Quality control with `fastqc`

The `fastqc` program generates a report that summarises the quality of next-generation sequencing data. It takes a single input, a FASTQ file, and generates an HTML report. For example, the command:

```{bash}
fastqc SRR765688.fastq
```

will generate a report `SRR765688_fastqc.html`.

### Solution

```
IDS, = glob_wildcards("{id}.fastq")

rule all:
  input:
    expand("{id}_fastqc.html", id=IDS)

rule fastqc:
  input:
    "{d1}.fastq"
  output:
    "{d1}_fastqc.html"
  shell:
    "fastqc {input}"
```
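The roles of `glob_wildcards` and `expand` in the solution above can be illustrated with a plain-Python approximation. This is only a sketch of the idea, not Snakemake's implementation: the helper names `collect_ids` and `expand_targets` are hypothetical, the file list is supplied by hand instead of being read from disk, and `SRR765689.fastq` and `notes.txt` are made-up file names.

```python
import re


def collect_ids(filenames, suffix=".fastq"):
    """Collect the part of each name before `suffix`.

    Roughly what glob_wildcards("{id}.fastq") gathers for the `id` wildcard.
    """
    pattern = re.compile(r"(?P<id>.+)" + re.escape(suffix))
    ids = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            ids.append(match.group("id"))
    return ids


def expand_targets(template, ids):
    """Fill the template once per id, roughly like expand(template, id=ids)."""
    return [template.format(id=sample_id) for sample_id in ids]


# Hypothetical directory listing; only the .fastq files yield an id.
files = ["SRR765688.fastq", "SRR765689.fastq", "notes.txt"]
ids = collect_ids(files)
targets = expand_targets("{id}_fastqc.html", ids)
print(targets)
```

The list of targets is what `rule all` requests; Snakemake then matches each target against the `output` pattern of `rule fastqc` to work out which FASTQ file to process.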

## Conversion of FASTQ to FASTA

The following Python code converts FASTQ to FASTA:

```{python}
from Bio import SeqIO
SeqIO.convert('SRR765688.fastq','fastq','SRR765688.fasta','fasta')
```
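To make clear what the conversion involves, here is a minimal standard-library sketch of the same idea. It is not a replacement for `SeqIO.convert`: it assumes every FASTQ record occupies exactly four lines, it does not wrap sequence lines, and the function name `fastq_to_fasta` and the two example reads are hypothetical.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy the header and sequence of each 4-line FASTQ record as FASTA.

    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header starting with '@'")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, not needed for FASTA
        fastq_handle.readline()  # quality string, dropped in FASTA
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n")
        fasta_handle.write(sequence + "\n")
        count += 1
    return count


# Two made-up reads, held in memory instead of in files.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
fasta_out = io.StringIO()
n_records = fastq_to_fasta(io.StringIO(fastq_text), fasta_out)
print(fasta_out.getvalue())
```

The quality line is simply discarded, which is why the conversion only goes one way.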

### Solution

```
IDS, = glob_wildcards("{id}.fastq")

rule all:
  input:
    expand("{id}.fasta", id=IDS)

rule fastq2fasta:
  input:
    "{d1}.fastq"
  output:
    "{d1}.fasta"
  run:
    # `run` takes Python code directly, not a quoted string
    from Bio import SeqIO
    SeqIO.convert(input[0], 'fastq', output[0], 'fasta')
```

## Running IgBLAST

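The following command runs `igblastn` on a FASTA file and writes the results to `SRR765688.fmt7`:

```{bash}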
igblastn \
    -germline_db_V database/human_igh_v \
    -germline_db_D database/human_igh_d \
    -germline_db_J database/human_igh_j \
    -auxiliary_data optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype Ig -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query SRR765688.fasta \
    -out SRR765688.fmt7
```

### Solution

```
IDS, = glob_wildcards("{id,[A-Z]{3}[0-9]+}.fasta")

rule all:
  input:
    expand("{id}.fmt7",id=IDS)

rule igblast:
  input:
    "{d1}.fasta"
  output:
    "{d1}.fmt7"
  shell:
    "igblastn \
    -germline_db_V database/human_igh_v \
    -germline_db_D database/human_igh_d \
    -germline_db_J database/human_igh_j \
    -auxiliary_data optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype Ig -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query {input} \
    -out {output}"
```
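The `glob_wildcards` pattern in the solution above constrains `id` with the regular expression `[A-Z]{3}[0-9]+`. A plausible reason, which is an assumption rather than something the exercise states, is to stop reference files such as `IMGT_Human_IGHV.fasta` from being picked up as samples. The sketch below checks the constraint against a few candidate names with Python's `re` module; `srr1.fasta` is a made-up name, and the whole-string match is an approximation of how Snakemake applies the constraint to the wildcard.

```python
import re

# The constraint from the wildcard: three capital letters, then digits.
SAMPLE_ID = re.compile(r"[A-Z]{3}[0-9]+")
SUFFIX = ".fasta"

candidates = ["SRR765688.fasta", "IMGT_Human_IGHV.fasta", "srr1.fasta"]

matched = []
for name in candidates:
    stem = name[: -len(SUFFIX)]
    # The constraint has to cover the whole stem, not just a part of it.
    if SAMPLE_ID.fullmatch(stem):
        matched.append(stem)

print(matched)
```

Without the constraint, every `.fasta` file in the directory would produce an `id` and therefore an IgBLAST target.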

## Converting IgBLAST to ChangeO

```{bash}
MakeDb.py igblast -i SRR765688.fmt7 -s SRR765688.fasta -r IMGT_Human_IGH[VDJ].fasta \
    --regions --scores
```

This generates a file `SRR765688_db-pass.tab`.

### Solution

```
IDS, = glob_wildcards("{id}.fmt7")

rule all:
  input:
    expand("{id}_db-pass.tab", id=IDS)

rule makedb:
  input:
    fmt7="{d1}.fmt7",
    fasta="{d1}.fasta"
  output:
    "{d1}_db-pass.tab"
  shell:
    "MakeDb.py igblast -i {input.fmt7} -s {input.fasta} -r IMGT_Human_IGH[VDJ].fasta \
    --regions --scores"
```

## Splitting ChangeO database into functional and nonfunctional

```{bash}
ParseDb.py split -d SRR765688_db-pass.tab -f FUNCTIONAL
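```

This generates two files, `SRR765688_db-pass_FUNCTIONAL-F.tab` and `SRR765688_db-pass_FUNCTIONAL-T.tab`.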

### Solution

```
IDS, = glob_wildcards("{id}_db-pass.tab")

rule all:
  input:
    expand(["{id}_db-pass_FUNCTIONAL-F.tab","{id}_db-pass_FUNCTIONAL-T.tab"], id=IDS)

rule splitfunctional:
  input:
    "{d1}.tab"
  output:
    ["{d1}_FUNCTIONAL-F.tab","{d1}_FUNCTIONAL-T.tab"]
  shell:
    "ParseDb.py split -d {input} -f FUNCTIONAL"
```
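Taken together, the exercises imply a chain of file names for each sample, and it is this naming convention that lets Snakemake link one rule's output to the next rule's input. The sketch below collects those names for one sample ID. The function `stage_files` and its dictionary keys are hypothetical, and reading `FUNCTIONAL-T` as the functional set and `FUNCTIONAL-F` as the nonfunctional set is an assumption based on the section heading.

```python
def stage_files(sample_id):
    """Return the file name used at each stage for one sample ID."""
    return {
        "fastq": f"{sample_id}.fastq",
        "fastqc_report": f"{sample_id}_fastqc.html",
        "fasta": f"{sample_id}.fasta",
        "igblast": f"{sample_id}.fmt7",
        "changeo_db": f"{sample_id}_db-pass.tab",
        # Assumed meaning of the T/F suffixes, see the note above.
        "functional": f"{sample_id}_db-pass_FUNCTIONAL-T.tab",
        "nonfunctional": f"{sample_id}_db-pass_FUNCTIONAL-F.tab",
    }


files_for_sample = stage_files("SRR765688")
for stage, name in files_for_sample.items():
    print(f"{stage}: {name}")
```

Because every name is derived from the same sample ID, a single Snakefile combining the rules would only need the final split files as targets in `rule all`.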