# Metafly: a whitefly metagenomics project
By Cyrielle Ndougonna \
Supervision: Ezéchiel B. Tibiri & Fidèle Tiendrébéogo

Project aims: \
O1: characterise whitefly (_Bemisia tabaci_) genotypes circulating in the two study areas (Bonoua and N'Djem) \
O2: establish the diversity of viruses associated with whiteflies originating from Bonoua and N'Djem \
O3: catalogue the endosymbiotic bacteria associated with whiteflies originating from Bonoua and N'Djem

This notebook describes the steps of the bioinformatics pipeline used to analyse the whitefly Oxford Nanopore reads.
The analysis was executed on the UJKZ HPC.

Conventions: \
directory names in all caps \
file names in lowercase, with underscores between words

# A. Basecalling ONT reads with Guppy

## 1. Create working directories and import raw data files

In [None]:
# connect to the remote server, then list the available Slurm partitions
ssh cndougonna@102.216.123.67
scontrol show partition

In [None]:
# reserve a compute node (8 CPUs, 32 GB RAM, 3 h) and create a personal working folder in /tmp
srun --ntasks=1 --cpus-per-task=8 --mem=32G --time=03:00:00 --pty bash -i
mkdir -p /tmp/whitefly_ont_sequencing
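The thread counts given to the tools later on (Guppy, NanoPlot) should not exceed the CPUs reserved with `srun`. A minimal sketch, assuming the commands run inside that allocation: Slurm exports `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is set, and the helper `job_threads` (a hypothetical name, not part of Slurm) falls back to a single thread outside a Slurm job.

```shell
# job_threads: hypothetical helper printing the number of CPUs reserved for this job.
# Slurm sets SLURM_CPUS_PER_TASK when the allocation was requested with --cpus-per-task;
# outside a Slurm job the variable is unset and we fall back to a single thread.
job_threads() {
  printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

echo "threads available to this job: $(job_threads)"
```

The value can then be passed straight to a tool, for example `NanoPlot -t "$(job_threads)" ...`.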

In [None]:
# create data and basecalling directories
mkdir -p /tmp/whitefly_ont_sequencing/raw_data
mkdir -p /tmp/whitefly_ont_sequencing/basecalling
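All the folders used in this notebook (raw data, basecalling, QC and assembly) can also be created in a single step. A sketch, where `WORKDIR` is a hypothetical variable introduced here for the project root.

```shell
# project root used throughout the notebook (WORKDIR is a hypothetical variable name)
WORKDIR=/tmp/whitefly_ont_sequencing

# create every working folder; -p also creates missing parents
# and does not fail if a folder already exists
for d in raw_data basecalling qc assembly; do
  mkdir -p "$WORKDIR/$d"
done
ls "$WORKDIR"
```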

In [None]:
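# copy the raw fast5 files into the working directory
# (we are already logged in on the HPC, so a local cp is enough; scp is only needed between machines)
cd /tmp/whitefly_ont_sequencing/raw_data
cp /home/cndougonna/whitefly/*.fast5 .
# fast5 files are binary (HDF5), so list them with their sizes instead of printing them with head
ls -lh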
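Because fast5 files are HDF5 containers, printing them with `head` only shows binary noise. A quick integrity check after copying is to confirm that each file starts with the 8-byte HDF5 signature (`\211HDF\r\n\032\n`). A minimal sketch, assuming the signature sits at byte 0 (no HDF5 user block), which is assumed to hold for these ONT files; `is_hdf5` and `check_fast5_dir` are hypothetical helper names.

```shell
# is_hdf5 FILE: succeed if FILE begins with the 8-byte HDF5 signature 89 48 44 46 0d 0a 1a 0a
# (assumes no HDF5 user block, i.e. the signature is at byte 0)
is_hdf5() {
  local sig
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "894844460d0a1a0a" ]
}

# check_fast5_dir DIR: label every *.fast5 file in DIR as OK or NOT_HDF5, then print the count
check_fast5_dir() {
  local n=0 f
  for f in "$1"/*.fast5; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
    n=$((n + 1))
    if is_hdf5 "$f"; then echo "OK        $f"; else echo "NOT_HDF5  $f"; fi
  done
  echo "fast5 files found: $n"
}

check_fast5_dir /tmp/whitefly_ont_sequencing/raw_data
```

A file flagged `NOT_HDF5` is most likely truncated or corrupted and is worth copying again before basecalling.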

## 2. Basecalling

In [None]:
# print Guppy options
guppy_basecaller --help

In [None]:
# list the available configuration files for the kit used; choose the high-accuracy (hac) or super-accurate (sup) model
guppy_basecaller --print_workflows | grep xxx_ont_kit_used_xxx

In [None]:
# run Guppy: detect and trim adapters and barcodes; reads with a mean qscore below 7 go to a separate fail/ folder
# -i takes the folder containing the fast5 files; the thread count matches the 8 CPUs reserved with srun
cd /tmp/whitefly_ont_sequencing/basecalling
guppy_basecaller -c xxxxxxxxxxxxx.cfg -i /tmp/whitefly_ont_sequencing/raw_data \
                    -s /tmp/whitefly_ont_sequencing/basecalling \
                    --num_callers 1 --cpu_threads_per_caller 8 \
                    --detect_adapter --trim_adapters --detect_barcodes --enable_trim_barcodes --min_qscore 7
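The NanoPlot step below reads a single file, `whitefly_basecalled.fastq`, whereas Guppy writes many FASTQ files to its save path. A minimal sketch of producing that file, assuming Guppy split the reads into `pass/` and `fail/` folders (possibly with per-barcode subfolders such as `pass/barcode01/`) and wrote uncompressed `.fastq` files; `merge_pass_fastq` is a hypothetical helper. Only reads that passed the qscore filter are kept.

```shell
# merge_pass_fastq BASECALL_DIR OUT: concatenate every .fastq under BASECALL_DIR/pass
# (including per-barcode subfolders) into OUT, then report the number of reads
merge_pass_fastq() {
  local basecall_dir=$1 out=$2
  if [ ! -d "$basecall_dir/pass" ]; then
    echo "no pass/ folder in $basecall_dir" >&2
    return 1
  fi
  # -exec ... {} + hands the matching files to cat in batches; reads in fail/ are ignored
  find "$basecall_dir/pass" -type f -name '*.fastq' -exec cat {} + > "$out"
  # a FASTQ record spans 4 lines (header, sequence, '+', qualities)
  echo "reads in $out: $(( $(wc -l < "$out") / 4 ))"
}
```

Usage: `merge_pass_fastq /tmp/whitefly_ont_sequencing/basecalling /tmp/whitefly_ont_sequencing/basecalling/whitefly_basecalled.fastq`. The output file is written outside `pass/`, so rerunning the merge does not pick up its own output.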

# B. Quality control with NanoPlot

## 1. Create the qc working directory

In [None]:
# create qc directory
mkdir -p /tmp/whitefly_ont_sequencing/qc

## 2. Run NanoPlot

In [None]:
# print NanoPlot help menu
NanoPlot --help

In [None]:
# run NanoPlot on the basecalled reads
NanoPlot -t 2 -o /tmp/whitefly_ont_sequencing/qc \
            --fastq /tmp/whitefly_ont_sequencing/basecalling/whitefly_basecalled.fastq
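NanoPlot summarises the read count, total bases, mean read length and read N50 in its report. As a quick cross-check on the same file, these values can be recomputed with awk. Here N50 is the length L such that reads of length L or longer contain at least half of all bases. A minimal sketch, not NanoPlot's own implementation, assuming 4-line FASTQ records (sequences not wrapped over several lines); `fastq_stats` is a hypothetical helper.

```shell
# fastq_stats FASTQ: print read count, total bases, mean read length and read N50.
# Assumes 4-line FASTQ records, so line 2 of each record is the sequence.
fastq_stats() {
  awk 'NR % 4 == 2 { print length($0) }' "$1" | sort -rn | awk '
    { len[NR] = $1; total += $1 }            # lengths arrive longest first
    END {
      if (NR == 0) { print "reads=0"; exit }
      cum = 0
      for (i = 1; i <= NR; i++) {            # walk down until half the bases are covered
        cum += len[i]
        if (cum >= total / 2) { n50 = len[i]; break }
      }
      printf "reads=%d bases=%d mean_len=%.1f N50=%d\n", NR, total, total / NR, n50
    }'
}
```

Usage: `fastq_stats /tmp/whitefly_ont_sequencing/basecalling/whitefly_basecalled.fastq`. For reads of lengths 4, 4, 3, 3, 2, 2, 1 and 1 (20 bases), the cumulative sum first reaches 10 at the third read, so N50 is 3.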


In [None]:
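# examine the QC reports: NanoStats.txt holds the summary statistics, NanoPlot-report.html gathers all plots
ls /tmp/whitefly_ont_sequencing/qc
cat /tmp/whitefly_ont_sequencing/qc/NanoStats.txt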

# C. _De novo_ assembly with Flye

In [None]:
# create assembly directory
mkdir -p /tmp/whitefly_ont_sequencing/assembly