After basecalling, several statistics can be computed, notably basecalling quality, mean read length, number of bases called and read N50.
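To make one of these metrics concrete: the read N50 is the length L such that reads of length ≥ L together contain at least half of all basecalled bases. The sketch below computes it from a sequencing summary file. The function name n50_from_summary is hypothetical, and it assumes a tab-separated file whose header row contains a sequence_length_template column (the per-read length column assumed to be present in Guppy and Dorado summaries).

```shell
#!/bin/bash
# Hypothetical helper: compute the read N50 from a sequencing summary file.
# Assumption: tab-separated file, header row with a "sequence_length_template" column.
n50_from_summary() {
    local summary="$1"
    # 1) Find the read-length column by name in the header, then print one length per read
    awk -F'\t' '
        NR == 1 { for (c = 1; c <= NF; c++) if ($c == "sequence_length_template") col = c; next }
        col { print $col }
    ' "$summary" \
    | sort -n -r \
    | awk '
        # 2) Lengths arrive longest first; accumulate until half of all bases is reached
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2; cum = 0
            for (i = 1; i <= NR; i++) { cum += len[i]; if (cum >= half) { print len[i]; exit } }
        }'
}

# Usage (path is illustrative):
# n50_from_summary /bigvol/omion/01-Basecalling/Dorado/basecalling/Gd45/sequencing_summary.tsv
```

For reads of length 2, 3, 4, 5 and 6 (20 bases in total), the two longest reads already cover 11 ≥ 10 bases, so the N50 is 5.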

Two tools were used for this: PycoQC and NanoStat.
- PycoQC generates the statistics as an interactive HTML report
- NanoStat generates a plain-text table of basecalling statistics

By default, Guppy outputs a sequencing_summary.txt file containing per-read information from which basic quality control (QC) metrics of the basecalling run can be computed (Phred quality scores, number of reads, N50, etc.). Dorado does not produce this file directly; a .tsv sequencing_summary file has to be generated from the basecalled BAM files with dorado summary, using the following command:

# Create Dorado basecalling summary file

In [None]:
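$ for dir in /bigvol/omion/01-Basecalling/Dorado/*/Gd*/; do
    summary_file="${dir}sequencing_summary.tsv"
    rm -f "$summary_file"   # start fresh so re-runs do not append to an old summary
    for bam_file in "${dir}"*.bam; do
        [ -f "$bam_file" ] || continue   # skip if no BAM file matched the glob
        if [ -s "$summary_file" ]; then  # header already written: drop it from later BAM files
            dorado summary "$bam_file" | tail -n +2 >> "$summary_file"
        else
            dorado summary "$bam_file" > "$summary_file"
        fi
    done
done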
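Because >> appends, running a plain append loop more than once, or on a directory holding several BAM files, can leave repeated header rows (and duplicated reads) in sequencing_summary.tsv, assuming each dorado summary call starts its output with a header line. The hypothetical helper count_header_lines below is a quick sanity check: it counts how many lines are identical to the first (header) line, so a cleanly merged file gives 1.

```shell
#!/bin/bash
# Hypothetical helper: count lines identical to the first (header) line of a file.
# A cleanly merged summary should print 1; an empty file prints 0.
count_header_lines() {
    awk 'NR == 1 { header = $0 } $0 == header { n++ } END { print n + 0 }' "$1"
}

# Usage: flag Dorado summaries that contain repeated header rows
# for f in /bigvol/omion/01-Basecalling/Dorado/*/Gd*/sequencing_summary.tsv; do
#     [ "$(count_header_lines "$f")" -gt 1 ] && echo "repeated header: $f"
# done
```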

# PycoQC (Quality Control analysis)

>***Install PycoQC***

In [None]:
$ pip install pycoQC

>***Creation of pycoQC html reports for both Dorado and Guppy sequencing_summary files***


In [None]:
#!/bin/bash

# Create directories to store PycoQC reports
mkdir -p /bigvol/omion/02-QC_analysis/pycoQC/{Dorado,Guppy}/{modbasecalling,basecalling}

# Run PycoQC on every Guppy (.txt) and Dorado (.tsv) sequencing summary file
for i in /bigvol/omion/01-Basecalling/*/*/Gd*/sequencing_summary.{txt,tsv}; do
    if [ -f "$i" ]; then   # skip patterns that matched no file
        directory=$(dirname "$i")
        gd_part=$(basename "$directory")   # sample directory name, e.g. Gd45

        # Extract the basecalling method (Dorado or Guppy) and type (modbasecalling or basecalling)
        basecalling_method=$(echo "$i" | awk -F'/' '{print $5}')
        basecalling_type=$(echo "$i" | awk -F'/' '{print $6}')

        # Create the output directory path
        output_dir="/bigvol/omion/02-QC_analysis/pycoQC/${basecalling_method}/${basecalling_type}"

        # Run pycoQC and save the output in the corresponding directory
        pycoQC -f "$i" -o "${output_dir}/pycoQC_${gd_part}.html"
    fi
done
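The loop above relies on fixed awk field positions to recover the basecalling method and type from each path. The worked example below prints every '/'-separated field of an illustrative path (the Gd45 location is assumed, following the 01-Basecalling/<method>/<type>/<sample>/ layout) to show why $5 and $6 are used: the leading '/' makes $1 an empty string.

```shell
#!/bin/bash
# Illustrative path following the assumed 01-Basecalling/<method>/<type>/<sample>/ layout
path="/bigvol/omion/01-Basecalling/Dorado/basecalling/Gd45/sequencing_summary.tsv"

# Print each '/'-separated field with its awk field number
echo "$path" | awk -F'/' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
```

This prints $1= (empty), $2=bigvol, $3=omion, $4=01-Basecalling, $5=Dorado, $6=basecalling, $7=Gd45 and $8=sequencing_summary.tsv. If the data are moved to a directory of a different depth, these field numbers shift and must be updated.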


>***Visualisation of PycoQC reports***


To visualise a PycoQC report, either right-click the file and choose "Open with Firefox", or open it from a terminal:

In [None]:
$ firefox ./pycoQC_Gd45.html  ## Open the report for sample Gd45 in firefox
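With one report per sample, method and type, it can help to list the reports before opening them. The sketch below uses a hypothetical helper, list_pycoqc_reports, and assumes the output layout produced by the loop above (<root>/<method>/<type>/pycoQC_<sample>.html).

```shell
#!/bin/bash
# Hypothetical helper: list all PycoQC HTML reports under a root directory (sorted).
# Defaults to the report directory used in this notebook.
list_pycoqc_reports() {
    find "${1:-/bigvol/omion/02-QC_analysis/pycoQC}" -type f -name 'pycoQC_*.html' | sort
}

# Usage: open every Dorado report in Firefox (one tab per report)
# list_pycoqc_reports /bigvol/omion/02-QC_analysis/pycoQC/Dorado | xargs firefox
```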

# NanoStat (QC analysis)

>***Install NanoStat***

In [None]:
$ pip install nanostat

In [None]:
#!/bin/bash

# Create base directories
mkdir -p /bigvol/omion/02-QC_analysis/NanoStat/{Dorado,Guppy}/{modbasecalling,basecalling}

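# Run NanoStat on every Guppy (.txt) and Dorado (.tsv) sequencing summary file
for i in /bigvol/omion/01-Basecalling/*/*/Gd*/sequencing_summary.{txt,tsv}; do
    if [ -f "$i" ]; then   # skip patterns that matched no file
        directory=$(dirname "$i")
        gd_part=$(basename "$directory")   # sample directory name, e.g. Gd45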

        # Extract the basecalling method (Dorado or Guppy) and type (modbasecalling or basecalling)
        basecalling_method=$(echo "$i" | awk -F'/' '{print $5}')
        basecalling_type=$(echo "$i" | awk -F'/' '{print $6}')

        # Create the output directory path
        output_dir="/bigvol/omion/02-QC_analysis/NanoStat/${basecalling_method}/${basecalling_type}"

        # Run NanoStat and write its text report (printed to stdout) to the corresponding directory
        NanoStat --summary "$i" > "${output_dir}/${gd_part}.txt"
    fi
done
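To compare samples, a single metric can be pulled out of all NanoStat reports at once. The sketch below is a hypothetical helper, collect_metric; it assumes each NanoStat text report contains a line of the form "Read length N50:    12,345.0" (check one report first, as the exact labels may differ between NanoStat versions) and prints the report path relative to the root directory followed by the value.

```shell
#!/bin/bash
# Hypothetical helper: print "<report path relative to root><TAB><value>" for one
# metric across all NanoStat text reports found under a root directory.
# Assumption: reports contain lines like "Read length N50:    12,345.0".
collect_metric() {
    local root="$1" metric="$2"
    find "$root" -type f -name '*.txt' | sort | while IFS= read -r report; do
        # First matching line; keep the text after ':' and drop spaces, tabs and thousands separators
        value=$(grep -m1 "^${metric}:" "$report" | awk -F':' '{ gsub(/[ \t,]/, "", $2); print $2 }')
        printf '%s\t%s\n' "${report#"$root"/}" "$value"
    done
}

# Usage: one line per method/type/sample
# collect_metric /bigvol/omion/02-QC_analysis/NanoStat "Read length N50"
```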