# Workflows with Snakemake

Take each of the following pieces of code and turn it into a rule in a Snakefile.

## Quality control with `fastqc`

The `fastqc` program generates a report that summarises the quality of next-generation sequencing data. It takes a single input, a FASTQ file, and generates an HTML report. For example, the command:

```bash
fastqc SRR765688.fastq
```

will generate a report `SRR765688_fastqc.html`.

### Solution
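
One possible rule, as a minimal sketch: it generalises the command above by replacing the sample ID with a wildcard. The rule name `fastqc` and the wildcard name `sample` are illustrative choices, not fixed by anything above. `fastqc` also writes a `_fastqc.zip` archive next to the HTML report; the sketch leaves it undeclared.

```snakemake
# Sketch: run fastqc on any <sample>.fastq to produce <sample>_fastqc.html.
# "sample" is an assumed wildcard name; any name would do.
rule fastqc:
    input:
        "{sample}.fastq"
    output:
        "{sample}_fastqc.html"
    shell:
        "fastqc {input}"
```

With this rule in a Snakefile, `snakemake SRR765688_fastqc.html` should trigger the same command as the one shown above.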

## Conversion of FASTQ to FASTA

The following Python code uses Biopython to convert a FASTQ file to FASTA:

```python
from Bio import SeqIO

# Arguments: input file, input format, output file, output format
SeqIO.convert('SRR765688.fastq', 'fastq', 'SRR765688.fasta', 'fasta')
```

### Solution
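
Because the conversion is Python code rather than a shell command, one way to wrap it is with a `run:` block, which executes Python directly inside the Snakefile. This is a sketch; the rule name `fastq_to_fasta` and the wildcard name `sample` are assumptions.

```snakemake
# Imports at the top of a Snakefile are available inside run: blocks.
from Bio import SeqIO

# Sketch: convert any <sample>.fastq into <sample>.fasta.
rule fastq_to_fasta:
    input:
        "{sample}.fastq"
    output:
        "{sample}.fasta"
    run:
        # input and output behave like lists of file names here
        SeqIO.convert(input[0], "fastq", output[0], "fasta")
```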

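## Running IgBLAST

The following command aligns the sequences in `SRR765688.fasta` against the human IGH germline V, D and J databases and writes the result to `SRR765688.fmt7`: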

```bash
igblastn \
    -germline_db_V database/human_igh_v \
    -germline_db_D database/human_igh_d \
    -germline_db_J database/human_igh_j \
    -auxiliary_data optional_file/human_gl.aux \
    -domain_system imgt -ig_seqtype Ig -organism human \
    -outfmt '7 std qseq sseq btop' \
    -query SRR765688.fasta \
    -out SRR765688.fmt7
```

### Solution
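
A sketch of one way to express the command as a rule. Only the `-query` and `-out` arguments depend on the sample, so those are the only parts replaced by `{input}` and `{output}`. The rule and wildcard names are assumptions, and the database paths are copied unchanged from the command above.

```snakemake
# Sketch: run igblastn on any <sample>.fasta to produce <sample>.fmt7.
rule igblast:
    input:
        "{sample}.fasta"
    output:
        "{sample}.fmt7"
    shell:
        # Backslash-newline joins the lines inside the Python string,
        # so the shell receives a single-line command.
        """
        igblastn \
            -germline_db_V database/human_igh_v \
            -germline_db_D database/human_igh_d \
            -germline_db_J database/human_igh_j \
            -auxiliary_data optional_file/human_gl.aux \
            -domain_system imgt -ig_seqtype Ig -organism human \
            -outfmt '7 std qseq sseq btop' \
            -query {input} \
            -out {output}
        """
```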

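## Converting IgBLAST to ChangeO

The `MakeDb.py` tool from Change-O converts the IgBLAST output into a tab-delimited database, using the original FASTA file and the IMGT germline reference sequences: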

```bash
MakeDb.py igblast -i SRR765688.fmt7 -s SRR765688.fasta -r IMGT_Human_IGH[VDJ].fasta \
    --regions --scores
```

This generates a file `SRR765688_db-pass.tab`.

### Solution
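
This step takes two sample-specific inputs, so a sketch of the rule can use named inputs to keep them apart. The names `fmt7` and `fasta`, the rule name and the wildcard name are all illustrative assumptions. The germline reference files are left as a literal shell glob, exactly as in the command above, rather than declared as inputs.

```snakemake
# Sketch: build <sample>_db-pass.tab from the IgBLAST output and the FASTA file.
rule makedb:
    input:
        fmt7="{sample}.fmt7",
        fasta="{sample}.fasta"
    output:
        "{sample}_db-pass.tab"
    shell:
        # Adjacent string literals are concatenated into one command.
        "MakeDb.py igblast -i {input.fmt7} -s {input.fasta} "
        "-r IMGT_Human_IGH[VDJ].fasta --regions --scores"
```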

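## Splitting ChangeO database into functional and nonfunctional

The `ParseDb.py split` command writes one output file for each distinct value in the `FUNCTIONAL` column: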

```bash
ParseDb.py split -d SRR765688_db-pass.tab -f FUNCTIONAL
```

### Solution
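
A sketch of a possible rule follows. The output file names are an assumption: they suppose that `ParseDb.py split` appends `_FUNCTIONAL-T` and `_FUNCTIONAL-F` to the input name, which may differ between Change-O versions, so check the files the command actually writes before relying on them. The rule and wildcard names are also illustrative.

```snakemake
# Sketch: split <sample>_db-pass.tab by the FUNCTIONAL column.
rule split_functional:
    input:
        "{sample}_db-pass.tab"
    output:
        # Assumed names; verify against what your Change-O version produces.
        "{sample}_db-pass_FUNCTIONAL-T.tab",
        "{sample}_db-pass_FUNCTIONAL-F.tab"
    shell:
        "ParseDb.py split -d {input} -f FUNCTIONAL"
```

If a sample has no nonfunctional sequences, the second file may never be created and Snakemake would then report a missing output; in that case, declaring only the functional file is one option.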

## Putting it all together

You can now combine all of the rules into a single Snakefile. You will have to modify the IDs command and the top-level rule to get the whole workflow to run.
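
As a rough sketch of what the top of such a Snakefile might look like: it assumes the sample IDs are discovered from the FASTQ files in the working directory with `glob_wildcards`, and that the final targets are the FastQC report and the functional Change-O database. The variable name `IDS`, the wildcard name `id` and the `_FUNCTIONAL-T.tab` suffix are assumptions, not something the exercise prescribes.

```snakemake
# Assumed: one <id>.fastq file per sample in the working directory.
IDS, = glob_wildcards("{id}.fastq")

# The top-level rule comes first and lists the final files wanted for every sample.
rule all:
    input:
        expand("{id}_fastqc.html", id=IDS),
        # Assumed output name of the ParseDb.py split step
        expand("{id}_db-pass_FUNCTIONAL-T.tab", id=IDS)

# ... the individual rules for each step follow here ...
```

Running `snakemake -n` first shows which jobs would be executed without running them, which is a quick way to check that the rules chain together.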