# HERA Bioinformatics training

Hi!

Welcome to the hands-on course part of the HERA bioinformatics training.

Before getting started: underneath this text you'll see a button labelled "*Show code*" with a play button next to it.  
Please click that play button now. It will install and configure everything necessary for this course.

Installation will take about 5 to 6 minutes.

---

In [None]:
#@title
!pip install igv-jupyter --quiet > /dev/null 2>&1
!sed -i -e '1,2d' ~/.bashrc && source ~/.bashrc && bash -c "$(curl -sL https://raw.githubusercontent.com/RIVM-bioinformatics/HERA-Bioinformatics-Training/main/setup.sh)"

## Table of contents

1. [Removing sequencing adapters & quality control](#scrollTo=O2yVf3BatvPS)
  * [Illumina data](#w)
  * [Nanopore data](#e)
2. [Removing primer sequences]()
  * [Illumina data]()
  * [Nanopore data]()
3. [Aligning reads to a reference]()
  * [Illumina data]()
  * [Nanopore data]()
4. [Consensus calling]()

# Removing sequencing adapters & quality control

## Illumina data

### Removing adapters from Illumina Sequencing data
Common tools for this step are Trimmomatic and fastp.

For a 'targeted' analysis, such as the analysis of SARS-CoV-2, it is also possible to use alignments to remove adapters.
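To make the idea concrete: when a read is aligned to the reference, the parts of the read that do not match (such as adapters at the ends) typically end up soft-clipped, which the CIGAR string records as `S` operations. The sketch below trims those soft-clipped ends from a single read. It only illustrates the general idea. The function name `trim_soft_clips` is made up for this example, and we are not claiming that `clipper.py` works exactly like this.

```python
import re


def trim_soft_clips(sequence, qualities, cigar):
    """Return (sequence, qualities) with soft-clipped ends removed.

    Hypothetical helper for illustration only; not taken from clipper.py.
    """
    # Split a CIGAR string such as "3S5M2S" into (length, operation) pairs.
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    # Hard-clipped bases (H) are not stored in the read sequence, so skip them.
    ops = [(n, op) for n, op in ops if op != "H"]

    # A soft clip can only occur at the start and/or the end of the read.
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0

    end = len(sequence) - right
    return sequence[left:end], qualities[left:end]


# 3 clipped bases on the left, 5 aligned bases, 2 clipped bases on the right.
print(trim_soft_clips("AAACCCCCGG", "IIIFFFFFII", "3S5M2S"))  # → ('CCCCC', 'FFFFF')
```

A read that aligns over its full length (for example with CIGAR `10M`) is returned unchanged.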

In [None]:
%%bash
source activate base; conda activate Alignments

## Align the reads to the reference sequence
minimap2 -ax sr source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta example_data/illumina_fastq_1.fastq.gz example_data/illumina_fastq_2.fastq.gz | samtools view -F 256 -F 512 -F 4 -F 2048 -uS | samtools sort -o output_data/alignments/illumina_raw_alignment.bam
samtools index output_data/alignments/illumina_raw_alignment.bam

## Extract the reads from the alignment back to a single fastq file without the adapters
python source/extra/clipper.py --input output_data/alignments/illumina_raw_alignment.bam --output output_data/adapter_removal/illumina_no_adapters.fastq

### Quality control in Illumina data

Quality control can usually be combined with adapter removal when fastp or Trimmomatic is used for adapter removal.
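The fastp call in the next cell uses `--cut_right` with a window size of 5 and a mean quality of 20, plus a minimum length of 100 (`-l 100`). As we read the fastp documentation, `--cut_right` slides a window from the front of the read to the tail and, at the first window whose mean quality is below the threshold, drops that window and everything to its right. The sketch below is a simplified version of that rule for a list of Phred scores. The function names are hypothetical, and fastp's real implementation may differ in edge cases.

```python
def cut_right(qualities, window_size=5, mean_quality=20):
    """Return how many bases to keep under a simplified cut_right-style rule."""
    for start in range(len(qualities) - window_size + 1):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < mean_quality:
            # Drop this window and everything to the right of it.
            return start
    # No window fell below the threshold: keep the whole read.
    return len(qualities)


def passes_length_filter(kept_length, min_length=100):
    """Mimic the idea behind `-l`: discard reads that end up too short."""
    return kept_length >= min_length


# Ten good bases (Q30) followed by ten poor bases (Q10).
example = [30] * 10 + [10] * 10
kept = cut_right(example)
print(kept, passes_length_filter(kept))
```

In this toy read the first window with a mean below 20 starts at position 8, so only 8 bases are kept and the read would then fail the length filter.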

In [None]:
%%bash
source activate base; conda activate Data_cleanup

fastp -i output_data/adapter_removal/illumina_no_adapters.fastq -o output_data/quality_control/illumina_post_qc.fastq \
  -A --cut_right --cut_right_mean_quality 20 --cut_right_window_size 5 -l 100 \
  -h output_data/quality_control/illumina_fastp.html -j output_data/quality_control/illumina_fastp.json

In [None]:
#@markdown <-- click to show quality control report
!sed -i 's/http:\/\//https:\/\//g' ./output_data/quality_control/*.html
from IPython.display import HTML

HTML(filename="/content/output_data/quality_control/illumina_fastp.html")

## Nanopore data

### Removing sequencing adapters from Nanopore Sequencing data

In [None]:
%%bash
source activate base; conda activate Alignments

## Align the reads to the reference sequence
minimap2 -ax map-ont source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta example_data/nanopore_fastq.fastq.gz | samtools view -F 256 -F 512 -F 4 -F 2048 -uS | samtools sort -o output_data/alignments/nanopore_raw_alignment.bam
samtools index output_data/alignments/nanopore_raw_alignment.bam

## Extract the reads from the alignment back to a single fastq file without the adapters
python source/extra/clipper.py --input output_data/alignments/nanopore_raw_alignment.bam --output output_data/adapter_removal/nanopore_no_adapters.fastq

### Quality control in Nanopore data

The best filtering tool depends on your experimental setup and on the data you have available.  
Many nanopore-specific filtering tools exist.  
However, we're only working with FastQ data here, so we don't need anything nanopore-specific for filtering.  
We'll therefore use fastp for consistency.
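Note that the fastp call for Nanopore data uses a mean quality threshold of 7, whereas the Illumina call used 20. A Phred quality score Q corresponds to an error probability of 10^(-Q/10), so the small sketch below shows what these two thresholds mean in terms of expected errors. The function name is just chosen for this example.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability that a base call is wrong."""
    return 10 ** (-q / 10)


for q in (7, 20):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about {p * 100:.1f} errors per 100 bases")
```

So Q20 means roughly 1 wrong base call in 100, while Q7 means roughly 1 in 5.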

In [None]:
%%bash
source activate base; conda activate Data_cleanup

fastp -i output_data/adapter_removal/nanopore_no_adapters.fastq -o output_data/quality_control/nanopore_post_qc.fastq \
  -A --cut_right --cut_right_mean_quality 7 --cut_right_window_size 20 -l 100 \
  -h output_data/quality_control/nanopore_fastp.html -j output_data/quality_control/nanopore_fastp.json

In [None]:
#@markdown <-- click to show quality control report
!sed -i 's/http:\/\//https:\/\//g' ./output_data/quality_control/*.html
from IPython.display import HTML

HTML(filename="/content/output_data/quality_control/nanopore_fastp.html")

# Removing primer sequences
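The primer positions are given to AmpliGone as a BED file (`--primers source/extra/articv3.bed`). In the BED format the first three tab-separated columns are the reference name, a 0-based start position and an end position that is not included in the interval. The sketch below parses such lines in plain Python. The example line and the primer name in it are made up for illustration and are not copied from `articv3.bed`.

```python
def parse_bed_line(line):
    """Parse the first columns of a BED line into a small dictionary."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    return {
        "reference": fields[0],
        "start": start,          # 0-based, included
        "end": end,              # not included
        "name": fields[3] if len(fields) > 3 else None,
        "length": end - start,
    }


# Hypothetical primer entry, for illustration only.
primer = parse_bed_line("NC_045512.2\t30\t54\texample_primer_LEFT")
print(primer["name"], primer["length"])
```

Because the end position is not included, the length of an interval is simply `end - start`.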

## Illumina data

In [None]:
%%bash
source activate base; conda activate Data_cleanup

AmpliGone --input output_data/quality_control/illumina_post_qc.fastq \
  --output output_data/primer_removal/illumina_no_primers.fastq \
  --reference source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta \
  --primers source/extra/articv3.bed \
  --amplicon-type end-to-mid \
  --export-primers output_data/primer_removal/illumina_removed_primers.bed

## Nanopore data

In [None]:
%%bash
source activate base; conda activate Data_cleanup

AmpliGone --input output_data/quality_control/nanopore_post_qc.fastq \
  --output output_data/primer_removal/nanopore_no_primers.fastq \
  --reference source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta \
  --primers source/extra/articv3.bed \
  --amplicon-type end-to-end \
  --export-primers output_data/primer_removal/nanopore_removed_primers.bed

# Aligning reads to a reference
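The alignment cells in this notebook all pipe the minimap2 output through `samtools view -F 256 -F 512 -F 4 -F 2048`. Each `-F` value is a bit in the SAM FLAG field, and a record is removed if any of those bits is set. The sketch below spells out what the four values mean according to the SAM specification and checks a few example flags. The names `SAM_FLAGS`, `EXCLUDE_MASK` and `is_kept` are only chosen for this example.

```python
# Meaning of the FLAG bits that are filtered out, per the SAM specification.
SAM_FLAGS = {
    4: "read unmapped",
    256: "secondary alignment",
    512: "not passing quality controls",
    2048: "supplementary alignment",
}

# Combining the four bits gives a single mask.
EXCLUDE_MASK = sum(SAM_FLAGS)


def is_kept(flag):
    """Return True if none of the excluded bits is set in this FLAG value."""
    return (flag & EXCLUDE_MASK) == 0


for flag in (0, 16, 4, 2064):
    print(flag, "kept" if is_kept(flag) else "removed")
```

What remains after filtering are the mapped, primary alignments that pass quality controls.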

## Illumina data

In [None]:
%%bash
source activate base; conda activate Alignments

## Align the reads to the reference sequence
minimap2 -ax sr source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta output_data/primer_removal/illumina_no_primers.fastq | samtools view -F 256 -F 512 -F 4 -F 2048 -uS | samtools sort -o output_data/alignments/illumina_cleaned_alignment.bam
samtools index output_data/alignments/illumina_cleaned_alignment.bam

In [None]:
#@markdown <-- Click to show alignment results

!source activate base; conda activate Alignments; samtools view -bs 0.01 /content/output_data/alignments/illumina_cleaned_alignment.bam > /content/output_data/alignments/illumina_cleaned_subsampled_for_view.bam
!source activate base; conda activate Alignments; samtools index /content/output_data/alignments/illumina_cleaned_subsampled_for_view.bam
import igv_notebook
igv_notebook.init()
b = igv_notebook.Browser(
    {
        "genome": "ASM985889v3",
        "locus": "NC_045512.2:1-300",
        "tracks": [
          {
            "name": "Illumina read alignment",
            "path": "/content/output_data/alignments/illumina_cleaned_subsampled_for_view.bam",
            "indexPath": "/content/output_data/alignments/illumina_cleaned_subsampled_for_view.bam.bai",
            "type": "alignment",
            "format": "bam",
            "showSoftClips": False,
            "colorBy": "strand"
           },
           {
              "name": "Removed primers",
              "type": "annotation",
              "format": "bed",
              "path": "/content/output_data/primer_removal/illumina_removed_primers.bed",
              "displayMode": "EXPANDED"
           }
        ]
    }
)

## Nanopore data

In [None]:
%%bash
source activate base; conda activate Alignments

## Align the reads to the reference sequence
minimap2 -ax map-ont source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta output_data/primer_removal/nanopore_no_primers.fastq | samtools view -F 256 -F 512 -F 4 -F 2048 -uS | samtools sort -o output_data/alignments/nanopore_cleaned_alignment.bam
samtools index output_data/alignments/nanopore_cleaned_alignment.bam

In [None]:
#@markdown <-- Click to show alignment results

!source activate base; conda activate Alignments; samtools view -bs 0.01 /content/output_data/alignments/nanopore_cleaned_alignment.bam > /content/output_data/alignments/nanopore_cleaned_subsampled_for_view.bam
!source activate base; conda activate Alignments; samtools index /content/output_data/alignments/nanopore_cleaned_subsampled_for_view.bam
import igv_notebook
igv_notebook.init()
b = igv_notebook.Browser(
    {
        "genome": "ASM985889v3",
        "locus": "NC_045512.2:1-300",
        "tracks": [
          {
            "name": "Nanopore read alignment",
            "path": "/content/output_data/alignments/nanopore_cleaned_subsampled_for_view.bam",
            "indexPath": "/content/output_data/alignments/nanopore_cleaned_subsampled_for_view.bam.bai",
            "type": "alignment",
            "format": "bam",
            "showSoftClips": False,
            "colorBy": "strand"
           },
           {
              "name": "Removed primers",
              "type": "annotation",
              "format": "bed",
              "path": "/content/output_data/primer_removal/nanopore_removed_primers.bed",
              "displayMode": "EXPANDED"
           }
        ]
    }
)

# Consensus calling
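A consensus sequence summarises, for each position of the reference, what the aligned reads show at that position. The sketch below shows a deliberately naive version: take the most common base per position and write `N` where too few reads cover the position. This is not the algorithm TrueConsense uses. It is a toy example, and it assumes that `--coverage-level 30` in the call further down acts as a minimum read depth.

```python
from collections import Counter


def simple_consensus(pileup_columns, min_coverage=30):
    """Naive majority-vote consensus over per-position strings of observed bases."""
    consensus = []
    for bases in pileup_columns:
        if len(bases) < min_coverage:
            # Too few reads cover this position to make a call.
            consensus.append("N")
        else:
            # Take the most frequently observed base.
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


# Three positions: well covered, poorly covered, and covered with a mixed signal.
columns = ["A" * 40, "A" * 10, "C" * 25 + "T" * 15]
print(simple_consensus(columns))  # → ANC
```

Real consensus callers also have to deal with insertions, deletions and ambiguous positions, which this toy example ignores.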

## Illumina data

## Nanopore data

In [None]:
%%bash
source activate base; conda activate Consensus_seq

TrueConsense --input output_data/alignments/nanopore_cleaned_alignment.bam \
  --reference /content/source/extra/GCF_009858895_2_ASM985889v3_genomic.fasta \
  --features /content/source/extra/features.gff \
  --coverage-level 30 \
  --samplename nanopore_example \
  --output output_data/nanopore_consensus_sequence.fasta \
  --variants output_data/nanopore_variants.vcf \
  --output-gff output_data/nanopore_corrected_features.gff \
  --depth-of-coverage output_data/nanopore_coverage.tsv