# Metafly: a whitefly metagenomics project
By Cyrielle Ndougonna \
Supervision: Ezechiel B. Tibiri, Romaric Nanema & Fidèle Tiendrébéogo

Project aims: \
O1: establish the diversity of viruses associated with whiteflies originating from two locations in Côte d'Ivoire (Bonoua and N'Djem) \
O2: catalogue the endosymbiotic bacteria associated with whiteflies originating from the two sites \
O3: characterise whitefly (_Bemisia tabaci_) genotypes circulating in the two areas

This notebook describes the steps in the bioinformatics pipeline used for the analysis of Oxford Nanopore reads generated from whitefly samples collected in Bonoua and N'Djem.
The analysis was executed on the iTrop HPC.

# Getting started

In [None]:
# connect to the remote server
ssh bioinfo-master1.ird.fr -l ndougonna

# check available partitions
sinfo

# launch an interactive session
srun -c12 --pty bash -i

In [None]:
# create project directory in /scratch
mkdir -p /scratch/whitefly_ont_sequencing/from_pod5

# A. Basecalling ONT reads with Dorado

## 1. Locate raw data and create the basecalling working directory

In [None]:
# raw data is located in the following directory; list its contents to confirm
ls /home/cndougonna/whitefly/FAV02519/pod5/

In [None]:
# create the basecalling directory in /scratch
mkdir -p /scratch/whitefly_ont_sequencing/from_pod5/basecalling

## 2. Basecalling

In [None]:
cd /scratch/whitefly_ont_sequencing/from_pod5/basecalling
pwd

In [None]:
# load Dorado
module load dorado/0.8.3
module list

In [None]:
# print Dorado options
dorado basecaller --help

In [None]:
# DO NOT RUN THIS CODE
# list models available for download
dorado download --list
# download appropriate model
dorado download --model dna_r10.4.1_e8.2_400bps_sup@v5.0.0
# run Dorado on each .pod5 file in the input directory, writing one .fastq per file
mkdir -p ./fastq
for FILE in /home/cndougonna/whitefly/FAV02519/pod5/*.pod5; do
    FILENAME=$(basename "$FILE" .pod5)
    dorado basecaller sup --emit-fastq "$FILE" > "./fastq/${FILENAME}.fastq"
done
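
The per-file loop needs an output name for each input file. A minimal sketch of that naming step, assuming standard `basename` behaviour; `fastq_name_for` is a hypothetical helper introduced here for illustration and is not part of Dorado:

```shell
# Hypothetical helper: map a .pod5 path to the name of its .fastq output.
fastq_name_for() {
    # basename drops the directory part and, given a second argument, the suffix
    printf '%s.fastq\n' "$(basename "$1" .pod5)"
}
```

Calling `fastq_name_for` on a path ending in `.pod5` prints the file name with `.fastq` in place of `.pod5` and without the directory.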

In [None]:
# there are 42 .pod5 files in total; with the --recursive flag, dorado finds every .pod5 file in the input directory and its subdirectories
## do not use the --emit-fastq flag, as we want the output in .bam
## --no-trim keeps the barcode sequences in the reads so they can be demultiplexed in a later step
dorado basecaller --recursive sup --no-trim --min-qscore 10 /home/cndougonna/whitefly/FAV02519/pod5/ > calls.bam
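
The number of input files quoted above can be checked before a long basecalling run is launched. A minimal sketch, assuming the raw files all carry the `.pod5` extension; `count_pod5` is a hypothetical helper name:

```shell
# Hypothetical helper: count .pod5 files under a directory, including subdirectories
# (mirroring what --recursive would pick up).
count_pod5() {
    # tr removes the padding some wc implementations add around the number
    find "$1" -type f -name '*.pod5' | wc -l | tr -d ' '
}
```

For this project, `count_pod5 /home/cndougonna/whitefly/FAV02519/pod5/` would be expected to print 42 if every file is in place.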

In [None]:
# generate summary
## this command is only compatible with reads basecalled from .pod5
dorado summary /scratch/whitefly_ont_sequencing/from_pod5/basecalling/calls.bam > summary.tsv
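
The summary table can be condensed into a few headline numbers. The sketch below is one possible approach, not part of the original pipeline. It assumes `summary.tsv` is tab-separated with a header row containing the columns `sequence_length_template` and `mean_qscore_template`; check the header of your own file, since column names may differ between Dorado versions. `summarise_reads` is a hypothetical helper, and the reported quality is a plain arithmetic mean of per-read Q-scores.

```shell
# Hypothetical helper: report read count, total bases and mean per-read Q-score
# from a tab-separated summary file. Column names are an assumption.
summarise_reads() {
    awk -F '\t' '
        NR == 1 {
            # locate the two columns by name rather than by position
            for (i = 1; i <= NF; i++) {
                if ($i == "sequence_length_template") len_col = i
                if ($i == "mean_qscore_template") q_col = i
            }
            if (!len_col || !q_col) {
                print "expected columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        { n++; bases += $len_col; q += $q_col }
        END {
            if (n > 0) printf "reads=%d bases=%d mean_q=%.2f\n", n, bases, q / n
        }
    ' "$1"
}
```

Run it as `summarise_reads summary.tsv`; it exits with a non-zero status if either column is missing.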

In [None]:
# demultiplex reads; specify .fastq as output
## we obtain one .fastq per barcode
mkdir -p /scratch/whitefly_ont_sequencing/from_pod5/basecalling/demultiplexing
dorado demux --threads 8 --emit-fastq --emit-summary --kit-name SQK-NBD114-96 --output-dir /scratch/whitefly_ont_sequencing/from_pod5/basecalling/demultiplexing /scratch/whitefly_ont_sequencing/from_pod5/basecalling/calls.bam
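
After demultiplexing, it is useful to see how many reads landed in each output file. A minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; `count_fastq_reads` is a hypothetical helper and makes no assumption about how the output files are named:

```shell
# Hypothetical helper: print "<file name><TAB><read count>" for every .fastq in a directory.
# Assumes four lines per FASTQ record (no wrapped sequences).
count_fastq_reads() {
    for fq in "$1"/*.fastq; do
        # skip the unexpanded pattern when the directory holds no .fastq files
        [ -e "$fq" ] || continue
        lines=$(wc -l < "$fq")
        printf '%s\t%d\n' "$(basename "$fq")" "$((lines / 4))"
    done
}
```

Run it on the demultiplexing output directory to get one line per file. Barcodes with very few reads are worth checking before the quality control step.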

# B. Quality control with NanoPlot

From here on, the code to run is in pipeline_from_fastq.ipynb, which starts with quality control using NanoPlot.