# Create ASV Frequency Tables and Annotations


## Steps

1. Import the reads downloaded from GenBank into three QIIME 2 artifacts (one per Illumina run).
2. For each run, discard reads that lack the primer sequence at the 5' end and trim the primer from the remaining reads (cutadapt).
3. For each run, denoise the reads, identify the unique sequences (ASVs), and calculate ASV counts per sample (DADA2).
4. Merge the results from the three Illumina runs, producing:
    1. An artifact with the ASV frequency table
    2. An artifact with the sequence of each ASV
5. Build a phylogenetic tree of the ASVs.
6. Download the SILVA database and assign taxonomy to each ASV.
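
Steps 1 to 3 run the same command once per Illumina run. As a minimal sketch (not part of the original notebook), the per-run commands could be generated in a loop instead of being written out three times. The helper names `import_command` and `run_all` are hypothetical; the paths and options are copied from the import cell in section 1.

```python
import subprocess

RUNS = (1, 2, 3)  # one entry per Illumina run


def import_command(run: int) -> list[str]:
    """Build the `qiime tools import` argument list for one run."""
    return [
        "qiime", "tools", "import",
        "--type", "SampleData[SequencesWithQuality]",
        "--input-path", f"maps/manifest{run}.txt",
        "--output-path", f"01-Intermediate/demultiplexed-seqs{run}.qza",
        "--input-format", "SingleEndFastqManifestPhred33V2",
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for run in RUNS:
        cmd = import_command(run)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all(dry_run=True)  # only prints; pass dry_run=False to execute
```

The same pattern would apply to the cutadapt and DADA2 steps, since only the run number changes between the three invocations.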


## 1. Import Reads

The reads for each run are listed in the `manifestX.txt` files, which the commands below read from the `maps` directory. These files were created by the previous notebook [01-DataDownload.ipynb](01-DataDownload.ipynb) when it downloaded and renamed the GenBank files.
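
Before importing, it can help to sanity-check a manifest. The sketch below assumes the layout expected by `SingleEndFastqManifestPhred33V2`: a tab-separated file with the header columns `sample-id` and `absolute-filepath`. The function name `check_manifest` is hypothetical, and the check is simplified (for example, it does not handle comment lines).

```python
import csv
import os


def check_manifest(path: str) -> list[str]:
    """Return a list of problems found in a single-end V2 manifest."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames != ["sample-id", "absolute-filepath"]:
            return [f"unexpected header: {reader.fieldnames}"]
        seen = set()
        for row in reader:
            sample = row["sample-id"] or ""
            # Expand variables such as $PWD before testing the path.
            fastq = os.path.expandvars(row["absolute-filepath"] or "")
            if sample in seen:
                problems.append(f"duplicate sample-id: {sample}")
            seen.add(sample)
            if not os.path.isabs(fastq):
                problems.append(f"path is not absolute: {fastq}")
            elif not os.path.exists(fastq):
                problems.append(f"file not found: {fastq}")
    return problems


# Example usage (assumes the manifest files exist):
# for run in (1, 2, 3):
#     print(run, check_manifest(f"maps/manifest{run}.txt"))
```

An empty list means the manifest passed these basic checks; it does not guarantee that the import itself will succeed.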

In [None]:
!mkdir -p 01-Intermediate

!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path maps/manifest1.txt \
  --output-path 01-Intermediate/demultiplexed-seqs1.qza \
  --input-format SingleEndFastqManifestPhred33V2

!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path maps/manifest2.txt \
  --output-path 01-Intermediate/demultiplexed-seqs2.qza \
  --input-format SingleEndFastqManifestPhred33V2

!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path maps/manifest3.txt \
  --output-path 01-Intermediate/demultiplexed-seqs3.qza \
  --input-format SingleEndFastqManifestPhred33V2

!qiime demux summarize \
  --i-data 01-Intermediate/demultiplexed-seqs1.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/demux_seqs1.qzv

!qiime demux summarize \
  --i-data 01-Intermediate/demultiplexed-seqs2.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/demux_seqs2.qzv

!qiime demux summarize \
  --i-data 01-Intermediate/demultiplexed-seqs3.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/demux_seqs3.qzv

## 2. Remove reads without primer and trim primer
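
The primer `AGRGTTTGATCMTGGCTCAG` contains the IUPAC degenerate codes `R` (A or G) and `M` (A or C), so it stands for four concrete sequences. The sketch below illustrates the idea behind `--p-front` combined with `--p-discard-untrimmed`: find the primer, drop it together with anything before it, and discard reads where it is absent. It is a simplification, not cutadapt's algorithm: it requires an exact match, whereas cutadapt tolerates a configurable error rate. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMER = "AGRGTTTGATCMTGGCTCAG"  # forward primer used in this section


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer with IUPAC codes into a regular expression."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer))


def trim_front(read: str, primer: str = PRIMER):
    """Drop the primer and anything before it; return None if it is absent."""
    match = primer_regex(primer).search(read)
    if match is None:
        return None  # the read would be discarded (--p-discard-untrimmed)
    return read[match.end():]


print(trim_front("AGAGTTTGATCCTGGCTCAGGACGAACG"))  # primer present
print(trim_front("TTTTGGGGCCCCAAAATTTTGGGG"))      # primer absent
```

The log files written by the commands in this section (`trim1.log` and so on) report how many reads actually contained the primer in each run.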

In [None]:
!qiime cutadapt trim-single \
  --i-demultiplexed-sequences 01-Intermediate/demultiplexed-seqs1.qza \
  --p-cores 12 \
  --p-front AGRGTTTGATCMTGGCTCAG \
  --p-discard-untrimmed \
  --o-trimmed-sequences 01-Intermediate/trimmed-seqs1.qza \
  --verbose > 01-Intermediate/trim1.log

!qiime cutadapt trim-single \
  --i-demultiplexed-sequences 01-Intermediate/demultiplexed-seqs2.qza \
  --p-cores 12 \
  --p-front AGRGTTTGATCMTGGCTCAG \
  --p-discard-untrimmed \
  --o-trimmed-sequences 01-Intermediate/trimmed-seqs2.qza \
  --verbose > 01-Intermediate/trim2.log

!qiime cutadapt trim-single \
  --i-demultiplexed-sequences 01-Intermediate/demultiplexed-seqs3.qza \
  --p-cores 12 \
  --p-front AGRGTTTGATCMTGGCTCAG \
  --p-discard-untrimmed \
  --o-trimmed-sequences 01-Intermediate/trimmed-seqs3.qza \
  --verbose > 01-Intermediate/trim3.log

!qiime demux summarize \
  --i-data 01-Intermediate/trimmed-seqs1.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/trimed_seqs1.qzv

!qiime demux summarize \
  --i-data 01-Intermediate/trimmed-seqs2.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/trimed_seqs2.qzv

!qiime demux summarize \
  --i-data 01-Intermediate/trimmed-seqs3.qza \
  --p-n 100000 \
  --verbose \
  --o-visualization 01-Intermediate/trimed_seqs3.qzv

## 3. Denoise and find unique ASVs
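
The `--p-trunc-len 220` option used in this section gives every retained read the same length: reads are cut at position 220, and reads shorter than that are discarded. A toy illustration of that rule in plain Python (the function `truncate_reads` is hypothetical and is not DADA2 itself, which also filters on quality and models sequencing errors):

```python
def truncate_reads(reads, trunc_len=220, trim_left=0):
    """Apply a DADA2-style length filter to a list of sequences.

    Reads shorter than trunc_len are discarded; the rest are cut so that
    only positions trim_left..trunc_len remain.
    """
    kept = []
    for read in reads:
        if len(read) < trunc_len:
            continue  # too short: discarded rather than padded
        kept.append(read[trim_left:trunc_len])
    return kept


reads = ["A" * 250, "C" * 220, "G" * 180]
kept = truncate_reads(reads)
print([len(r) for r in kept])  # the 180-base read is dropped
```

Using the same truncation length for all three runs matters for the merge in section 4: ASVs from different runs can only coincide if they cover the same stretch of the amplicon.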

In [None]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs 01-Intermediate/trimmed-seqs1.qza \
  --p-trim-left 0 \
  --p-trunc-len 220 \
  --p-n-threads 12 \
  --o-representative-sequences 01-Intermediate/rep-seqs-dada1.qza \
  --o-table 01-Intermediate/table-dada1.qza \
  --o-denoising-stats 01-Intermediate/stats-dada1.qza

!qiime metadata tabulate \
  --m-input-file 01-Intermediate/stats-dada1.qza \
  --o-visualization 01-Intermediate/stats-dada1.qzv

!qiime feature-table summarize \
  --i-table 01-Intermediate/table-dada1.qza \
  --o-visualization 01-Intermediate/table-dada1.qzv \
  --m-sample-metadata-file maps/map1.txt

In [None]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs 01-Intermediate/trimmed-seqs2.qza \
  --p-trim-left 0 \
  --p-trunc-len 220 \
  --p-n-threads 12 \
  --o-representative-sequences 01-Intermediate/rep-seqs-dada2.qza \
  --o-table 01-Intermediate/table-dada2.qza \
  --o-denoising-stats 01-Intermediate/stats-dada2.qza

!qiime metadata tabulate \
  --m-input-file 01-Intermediate/stats-dada2.qza \
  --o-visualization 01-Intermediate/stats-dada2.qzv

!qiime feature-table summarize \
  --i-table 01-Intermediate/table-dada2.qza \
  --o-visualization 01-Intermediate/table-dada2.qzv \
  --m-sample-metadata-file maps/map2.txt

In [None]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs 01-Intermediate/trimmed-seqs3.qza \
  --p-trim-left 0 \
  --p-trunc-len 220 \
  --p-n-threads 12 \
  --o-representative-sequences 01-Intermediate/rep-seqs-dada3.qza \
  --o-table 01-Intermediate/table-dada3.qza \
  --o-denoising-stats 01-Intermediate/stats-dada3.qza

!qiime metadata tabulate \
  --m-input-file 01-Intermediate/stats-dada3.qza \
  --o-visualization 01-Intermediate/stats-dada3.qzv

!qiime feature-table summarize \
  --i-table 01-Intermediate/table-dada3.qza \
  --o-visualization 01-Intermediate/table-dada3.qzv \
  --m-sample-metadata-file maps/map3.txt

## 4. Merge denoised data
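
Merging across runs works because q2-dada2, by default, labels each ASV with the MD5 hash of its sequence, so an identical sequence found in two runs gets the same feature ID. The sketch below shows the idea with plain dictionaries. It is not the QIIME 2 implementation; the names `asv_id` and `merge_tables` are hypothetical, and it assumes that no sample appears in more than one run.

```python
import hashlib


def asv_id(sequence: str) -> str:
    """Feature ID derived from the sequence itself (MD5 hex digest)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def merge_tables(*tables):
    """Merge {sample: {asv_id: count}} tables from separate runs."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} occurs in more than one run")
            merged[sample] = dict(counts)
    # Fill in zero counts so every sample covers the same set of ASVs.
    all_asvs = sorted({asv for counts in merged.values() for asv in counts})
    return {s: {a: c.get(a, 0) for a in all_asvs} for s, c in merged.items()}


shared, unique = asv_id("ACGTACGT"), asv_id("TTGGCCAA")
run1 = {"sampleA": {shared: 10}}
run2 = {"sampleB": {shared: 4, unique: 7}}
merged = merge_tables(run1, run2)
print(merged["sampleA"][unique])  # sampleA never contained this ASV
```

The merged table is the union of all ASVs, with a zero wherever a sample lacks an ASV seen in another run.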

In [None]:
!mkdir -p 02-QiimeResults

!qiime feature-table merge \
  --i-tables 01-Intermediate/table-dada1.qza \
  --i-tables 01-Intermediate/table-dada2.qza \
  --i-tables 01-Intermediate/table-dada3.qza \
  --o-merged-table 02-QiimeResults/table-dada.qza

!qiime feature-table merge-seqs \
  --i-data 01-Intermediate/rep-seqs-dada1.qza \
  --i-data 01-Intermediate/rep-seqs-dada2.qza \
  --i-data 01-Intermediate/rep-seqs-dada3.qza \
  --o-merged-data 02-QiimeResults/rep-seqs-dada.qza

!qiime feature-table summarize \
  --i-table 02-QiimeResults/table-dada.qza \
  --o-visualization 02-QiimeResults/table-dada.qzv \
  --m-sample-metadata-file maps/map.txt

## 5. Create tree of ASVs

In [None]:
!qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences 02-QiimeResults/rep-seqs-dada.qza \
  --o-alignment 02-QiimeResults/aligned-rep-seqs.qza \
  --o-masked-alignment 02-QiimeResults/masked-aligned-rep-seqs.qza \
  --o-tree 02-QiimeResults/unrooted-tree.qza \
  --o-rooted-tree 02-QiimeResults/rooted-tree.qza

## 6. Assign Taxonomy to ASVs

### Download SILVA Database

In [None]:
!mkdir -p silva

!wget -O silva/silva-138-99-seqs.qza https://data.qiime2.org/2020.11/common/silva-138-99-seqs.qza
!wget -O silva/silva-138-99-tax.qza https://data.qiime2.org/2020.11/common/silva-138-99-tax.qza

### Assign Taxonomy
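
`classify-consensus-vsearch` searches each ASV against the reference reads and then reports the taxonomy on which a sufficient share of the top hits agree. The sketch below illustrates that consensus step only. It is a simplified approximation, not the plugin's code; the function name `consensus_taxonomy` is hypothetical, and the threshold of 0.51 is assumed to mirror the plugin's default `--p-min-consensus` (check `--help` for the installed version).

```python
from collections import Counter


def consensus_taxonomy(hits, min_consensus=0.51):
    """Return the deepest lineage that at least min_consensus of hits share."""
    if not hits:
        return "Unassigned"
    lineages = [[rank.strip() for rank in hit.split(";")] for hit in hits]
    consensus = []
    for depth in range(min(len(lin) for lin in lineages)):
        # Count whole prefixes so a label only wins together with its parents.
        prefixes = Counter(tuple(lin[: depth + 1]) for lin in lineages)
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break
        consensus = list(prefix)
    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    "d__Bacteria; p__Firmicutes; c__Bacilli",
    "d__Bacteria; p__Firmicutes; c__Clostridia",
    "d__Bacteria; p__Firmicutes; c__Bacilli",
    "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
]
print(consensus_taxonomy(hits))
```

In this example three of four hits agree on the phylum, but only two of four agree on the class, so the assignment stops at phylum level.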

In [None]:
!qiime feature-classifier classify-consensus-vsearch \
  --i-query 02-QiimeResults/rep-seqs-dada.qza \
  --i-reference-reads silva/silva-138-99-seqs.qza \
  --i-reference-taxonomy silva/silva-138-99-tax.qza \
  --p-threads 10 \
  --o-classification 02-QiimeResults/taxonomy-vsearch.qza \
  --verbose

!qiime metadata tabulate \
  --m-input-file 02-QiimeResults/taxonomy-vsearch.qza \
  --o-visualization 02-QiimeResults/taxonomy-vsearch.qzv