<a id='setup'></a>

## 0. Setup

Import the required packages and set the data directories.

In [6]:
# Import the required packages
import numpy as np
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import qiime2 as q2

data_dir = "data"
database_dir = "database"

<a id='Input'></a>

### 0.1 Input

Inspection of the input dataset and loading of metadata.

In [7]:
! qiime tools peek ../$data_dir/sequences_demux_paired.qza

UUID:        b5fec962-ca06-4df5-b043-3aa289e4d753
Type:        SampleData[PairedEndSequencesWithQuality]
Data format: SingleLanePerSamplePairedEndFastqDirFmt


In [None]:
# Visualize the input data first
! qiime demux summarize \
    --i-data ../$data_dir/sequences_demux_paired.qza \
    --o-visualization ../$data_dir/sequences_demux_paired.qzv

In [None]:
Visualization.load(f'../{data_dir}/sequences_demux_paired.qzv')

**Brief summary of the paired-end sequences with quality scores**
* Lowest sequencing depth: 8000 reads
* Mean of 30012.224086 reads per sample; the median is about the same
* Total number of reads: 50090402
* Median length of both forward and reverse reads is about 230 nt, with most samples (96%) within +/- 10 nt of that length
* Read quality starts to drop below a Phred score of 20 at different positions for forward and reverse reads, hence we will use the _`denoise-paired` command and separately truncate the forward and reverse reads at the position where their quality falls below a Phred score of 20_
* Median quality score of 38 (Phred quality score)
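The Phred-based cutoff in the summary can be made concrete with a small sketch. The per-position median scores below are made-up illustrative values, not taken from this dataset, and the helper names are hypothetical; in practice the positions are read off the interactive quality plot in the `.qzv` file.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a base-call error."""
    return 10 ** (-q / 10)


def first_position_below(median_quality, threshold=20):
    """Return the 0-based index of the first position whose median quality
    is below the threshold, i.e. the number of leading bases to keep.
    Returns the full read length if quality never drops below it."""
    for pos, q in enumerate(median_quality):
        if q < threshold:
            return pos
    return len(median_quality)


# Hypothetical per-position median quality scores (illustration only)
median_q = [38, 38, 37, 36, 30, 24, 19, 15]

print(phred_to_error_prob(20))         # a Phred score of 20 is a 1 in 100 error rate
print(first_position_below(median_q))  # 6 leading bases are kept
```

A Phred score of 20 corresponds to a 1% chance that a base call is wrong, which is why it is a common cutoff for choosing truncation lengths.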

In [None]:
# Parse the metadata TSV file into a DataFrame
metadata_df = pd.read_csv(f'{data_dir}/metadata.tsv', sep='\t', index_col=0)
# Grab 5 random samples
metadata_df.sample(n=5)

<a id='denoising'></a>

## 1. Denoising and generation of ASVs

1. Truncation and denoising of the data.
2. Generation of the feature table
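Truncation shortens both reads, so the truncated forward and reverse reads must still overlap enough to be merged. The sketch below checks this for the truncation lengths used in this notebook (223 and 165). The amplicon length of 253 is a hypothetical placeholder, since the notebook does not state it, and the minimum overlap of 12 is assumed to be the DADA2 default. The function name is also hypothetical.

```python
def merge_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Approximate number of bases shared by the truncated forward and
    reverse reads when both span an amplicon of the given length."""
    return trunc_len_f + trunc_len_r - amplicon_len


MIN_OVERLAP = 12    # assumed DADA2 default minimum overlap for merging
AMPLICON_LEN = 253  # hypothetical amplicon length, replace with the real one

overlap = merge_overlap(223, 165, AMPLICON_LEN)
print(f"overlap: {overlap} nt, sufficient: {overlap >= MIN_OVERLAP}")
```

If the overlap falls below the minimum, most read pairs fail to merge and are lost, which would appear as a sharp drop at the merging step of the denoising statistics.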

In [6]:
! qiime dada2 denoise-paired \
    --i-demultiplexed-seqs ../$data_dir/sequences_demux_paired.qza \
    --p-trunc-len-f 223 \
    --p-trunc-len-r 165 \
    --p-n-threads 3 \
    --o-table ../$data_dir/PJNB_dada2_table_.qza \
    --o-representative-sequences ../$data_dir/PJNB_dada2_rep_set.qza \
    --o-denoising-stats ../$data_dir/PJNB_dada2_stats.qza

Saved FeatureTable[Frequency] to: data//PJNB_dada2_table_.qza
Saved FeatureData[Sequence] to: data//PJNB_dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: data//PJNB_dada2_stats.qza

In [7]:
# Tabulate the denoising statistics
! qiime metadata tabulate \
    --m-input-file ../$data_dir/PJNB_dada2_stats.qza \
    --o-visualization ../$data_dir/PJNB_dada2_stats.qzv

Saved Visualization to: data//PJNB_dada2_stats.qzv

In [8]:
Visualization.load(f'../{data_dir}/PJNB_dada2_stats.qzv')
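When reading the denoising statistics, it helps to express each step as a percentage of the input reads. The sketch below does this for a single sample. The read counts are invented for illustration, the function name is hypothetical, and the step names are assumed to match the count columns of the DADA2 statistics table (`input`, `filtered`, `denoised`, `merged`, `non-chimeric`).

```python
STEPS = ("filtered", "denoised", "merged", "non-chimeric")


def percent_retained(stats_row):
    """Percentage of input reads remaining after each denoising step."""
    n_input = stats_row["input"]
    if n_input == 0:
        # Avoid division by zero for empty samples
        return {step: 0.0 for step in STEPS}
    return {step: 100 * stats_row[step] / n_input for step in STEPS}


# Hypothetical read counts for one sample (illustration only)
row = {"input": 30000, "filtered": 27000, "denoised": 26400,
       "merged": 24000, "non-chimeric": 22500}

for step, pct in percent_retained(row).items():
    print(f"{step:>12}: {pct:.1f}% of input reads")
```

A large loss at one specific step points to its cause: at filtering, truncation that is too lenient on low-quality tails; at merging, truncation that is too aggressive for the reads to overlap.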

In [9]:
# Summarize the feature table
! qiime feature-table summarize \
    --i-table ../$data_dir/PJNB_dada2_table_.qza \
    --m-sample-metadata-file ../$data_dir/metadata.tsv \
    --o-visualization ../$data_dir/PJNB_dada2_table.qzv

Saved Visualization to: data//PJNB_dada2_table.qzv

In [10]:
Visualization.load(f'../{data_dir}/PJNB_dada2_table.qzv')