# QIIME 2 Pipeline (Version 2019.10)

## Import Sequences and create quality-score plot visualizations

In [1]:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ../WB2019-seqs/ \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path WB2019-seqs.qza

Imported ../WB2019-seqs/ as CasavaOneEightSingleLanePerSampleDirFmt to WB2019-seqs.qza
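
The import above expects every file in `../WB2019-seqs/` to follow the Casava 1.8 naming layout. A minimal sketch of a pre-import check follows, assuming the layout is `SAMPLE_BARCODE_L001_R1_001.fastq.gz`; the regular expression, the helper name `check_casava_names`, and the example file names are illustrative and not part of QIIME 2.

```python
import re

# Assumed Casava 1.8 layout: <sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz
CASAVA_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def check_casava_names(filenames):
    """Return (problems, unpaired).

    problems: file names that do not match the assumed layout.
    unpaired: sample IDs that lack either the R1 or the R2 file.
    """
    reads_by_sample = {}
    problems = []
    for name in filenames:
        match = CASAVA_RE.match(name)
        if match is None:
            problems.append(name)
            continue
        reads_by_sample.setdefault(match.group("sample"), set()).add(match.group("read"))
    # Paired-end import needs both directions for every sample.
    unpaired = sorted(s for s, reads in reads_by_sample.items() if reads != {"1", "2"})
    return problems, unpaired


# Hypothetical file names, for illustration only.
example = [
    "WB01_S1_L001_R1_001.fastq.gz",
    "WB01_S1_L001_R2_001.fastq.gz",
    "WB02_S2_L001_R1_001.fastq.gz",
    "notes.txt",
]
problems, unpaired = check_casava_names(example)
print("Unrecognized names:", problems)
print("Samples missing a read direction:", unpaired)
```

In practice the list could come from `os.listdir("../WB2019-seqs/")`.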


In [2]:
!qiime demux summarize \
--i-data WB2019-seqs.qza \
--o-visualization WB2019-seqs.qzv

#!qiime tools view WB2019-seqs.qzv

Saved Visualization to: WB2019-seqs.qzv


In [20]:
!qiime tools view WB2019-seqs.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

## Quality control using DADA2 and generate OTU table
#### Each dataset went through trimming and truncation optimization to retain as many sequences as possible
#### Noisy bases were trimmed off the beginning of the reads, and most datasets were truncated where the quality score consistently dropped below 35
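
The truncation rule described above can be sketched in a few lines. This is only an illustration: it assumes the per-position median quality scores have been copied out of the `demux summarize` visualization into a list, and the `window` parameter (how many consecutive low positions count as a consistent drop) is a hypothetical choice, not something QIIME 2 or DADA2 defines.

```python
def suggest_trunc_len(median_quality, threshold=35, window=5):
    """Suggest how many bases to keep.

    Returns the index of the first position that starts a run of `window`
    consecutive medians below `threshold`. If no such run exists, the full
    read length is returned.
    """
    run = 0
    for pos, quality in enumerate(median_quality):
        if quality < threshold:
            run += 1
            if run == window:
                return pos - window + 1
        else:
            run = 0
    return len(median_quality)


# Made-up quality profile: a single dip at position 231 is ignored,
# the sustained drop starting at position 233 is not.
medians = [38] * 230 + [36, 34, 36] + [33] * 20
print(suggest_trunc_len(medians))
```

The suggested value would still need to be checked against the interactive quality plot before being passed to `--p-trunc-len-f` or `--p-trunc-len-r`.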

In [3]:
# Try with more stringent parameters:

!qiime dada2 denoise-paired --verbose \
  --i-demultiplexed-seqs WB2019-seqs.qza \
  --o-table WB2019-table \
  --o-representative-sequences WB2019-rep-seqs \
  --o-denoising-stats WB2019-stats \
  --p-n-threads 0 \
  --p-trim-left-f 5 \
  --p-trim-left-r 5 \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 230 
 

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/forward /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/reverse /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/output.tsv.biom /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/track.tsv /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/filt_f /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpk09loyu_/filt_r 230 230 5 5 2.0 2.0 2 consensus 1.0 0 1000000

R version 3.5.1 (2018-07-02) 
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4 
1) Filtering ...............................................................................
2) Learning Error Rates
228492675 total bases in 1015523 reads fr

In [4]:
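# Second run: trim 12 bases from the start of both reads; truncate forward reads at 240 and reverse reads at 230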

!qiime dada2 denoise-paired --verbose \
  --i-demultiplexed-seqs WB2019-seqs.qza \
  --o-table WB2019-table2 \
  --o-representative-sequences WB2019-rep-seqs2 \
  --o-denoising-stats WB2019-stats2 \
  --p-n-threads 0 \
  --p-trim-left-f 12 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 230

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/forward /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/reverse /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/output.tsv.biom /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/track.tsv /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/filt_f /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmpaux0omde/filt_r 240 230 12 12 2.0 2.0 2 consensus 1.0 0 1000000

R version 3.5.1 (2018-07-02) 
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4 
1) Filtering ...............................................................................
2) Learning Error Rates
229670784 total bases in 1007328 reads 

In [10]:
# Blend the two previous parameter sets and change MaxEE to 3

# Changing MaxEE increased the frequency by ~1000 per sample. It is not yet clear whether this is desirable.

!qiime dada2 denoise-paired --verbose \
  --i-demultiplexed-seqs WB2019-seqs.qza \
  --o-table WB2019-table3 \
  --o-representative-sequences WB2019-rep-seqs3 \
  --o-denoising-stats WB2019-stats3 \
  --p-n-threads 0 \
  --p-trim-left-f 12 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 230 \
  --p-max-ee-f 3 \
  --p-max-ee-r 3 

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/forward /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/reverse /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/output.tsv.biom /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/track.tsv /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/filt_f /var/folders/0f/ff5j1xns5jbgs40zs1d96b2w0000gn/T/tmplxgv7gk2/filt_r 230 230 12 12 3.0 3.0 2 consensus 1.0 0 1000000

R version 3.5.1 (2018-07-02) 
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4 
1) Filtering ...............................................................................
2) Learning Error Rates
230745806 total bases in 1058467 reads 

In [None]:
!qiime dada2 denoise-paired --verbose \
  --i-demultiplexed-seqs WB2019-seqs.qza \
  --o-table WB2019-table4 \
  --o-representative-sequences WB2019-rep-seqs4 \
  --o-denoising-stats WB2019-stats4 \
  --p-n-threads 0 \
  --p-trim-left-f 12 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 230
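
When comparing truncation lengths, it can help to confirm that the truncated forward and reverse reads should still overlap enough to merge. The sketch below is built on assumptions that are not stated in this notebook: the amplicon lengths (roughly 253 nt for a 515f-806r V4 amplicon without primers, roughly 292 nt with primers) and a minimum overlap of 12 nt (the default of `mergePairs` in the DADA2 R package). Adjust both to the actual data.

```python
# Truncation lengths (forward, reverse) used in the four denoising runs above.
RUNS = {
    "run1": (230, 230),
    "run2": (240, 230),
    "run3": (230, 230),
    "run4": (230, 230),
}

# ASSUMPTIONS, not taken from this notebook.
ASSUMED_AMPLICON_LENGTHS = (253, 292)
ASSUMED_MIN_OVERLAP = 12


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated reads; a negative value means a gap.

    Truncation positions are counted from the original read start, so
    trimming from the 5' end is not expected to change this number.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def runs_with_too_little_overlap(runs, amplicon_len, min_overlap=ASSUMED_MIN_OVERLAP):
    """Names of runs whose expected overlap is below `min_overlap`."""
    return [
        name
        for name, (trunc_f, trunc_r) in runs.items()
        if expected_overlap(trunc_f, trunc_r, amplicon_len) < min_overlap
    ]


for amplicon_len in ASSUMED_AMPLICON_LENGTHS:
    for name, (trunc_f, trunc_r) in RUNS.items():
        print(name, amplicon_len, expected_overlap(trunc_f, trunc_r, amplicon_len))
    print("Too little overlap:", runs_with_too_little_overlap(RUNS, amplicon_len))
```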

In [3]:
!qiime tools view WB2019-table.qzv

#Number of samples	79
#Number of features	21,560
#Total frequency	3,327,999

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

In [4]:
!qiime tools view WB2019-table2.qzv

#Number of samples	79
#Number of features	21,417
#Total frequency	3,353,221

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

In [5]:
!qiime tools view WB2019-table3.qzv

#Number of samples	79
#Number of features	21,493
#Total frequency	3,647,664

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

In [6]:
!qiime tools view WB2019-table4.qzv

#Number of samples	79
#Number of features	21,545
#Total frequency	3,385,150

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.

In [None]:
# Proceed with option 4 (WB2019-table4 and WB2019-rep-seqs4)
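
To compare the four options side by side, the summary numbers reported above can be tabulated together. This is a minimal sketch that uses only the sample count, feature counts, and total frequencies copied from the visualizations; the per-sample value is a simple mean (total divided by 79) and may differ from the median shown in the `.qzv` files.

```python
N_SAMPLES = 79

# Summary numbers copied from the four feature-table visualizations above.
runs = {
    "table": {"features": 21560, "total_frequency": 3327999},
    "table2": {"features": 21417, "total_frequency": 3353221},
    "table3": {"features": 21493, "total_frequency": 3647664},
    "table4": {"features": 21545, "total_frequency": 3385150},
}


def mean_per_sample(total_frequency, n_samples=N_SAMPLES):
    """Mean number of reads retained per sample."""
    return total_frequency / n_samples


summary = {
    name: round(mean_per_sample(stats["total_frequency"]), 1)
    for name, stats in runs.items()
}

for name, stats in runs.items():
    print(f"{name:7s} features={stats['features']:>6,d} "
          f"total={stats['total_frequency']:>9,d} mean/sample={summary[name]:>9,.1f}")
```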

In [5]:
# Create visualizations for OTU table and representative sequences

!qiime feature-table summarize \
--i-table WB2019-table.qza \
--o-visualization WB2019-table.qzv 

!qiime feature-table tabulate-seqs \
--i-data WB2019-rep-seqs.qza \
--o-visualization WB2019-rep-seqs.qzv

!qiime feature-table summarize \
--i-table WB2019-table2.qza \
--o-visualization WB2019-table2.qzv 

!qiime feature-table tabulate-seqs \
--i-data WB2019-rep-seqs2.qza \
--o-visualization WB2019-rep-seqs2.qzv

!qiime feature-table summarize \
--i-table WB2019-table3.qza \
--o-visualization WB2019-table3.qzv 

!qiime feature-table tabulate-seqs \
--i-data WB2019-rep-seqs3.qza \
--o-visualization WB2019-rep-seqs3.qzv

!qiime feature-table summarize \
--i-table WB2019-table4.qza \
--o-visualization WB2019-table4.qzv 

!qiime feature-table tabulate-seqs \
--i-data WB2019-rep-seqs4.qza \
--o-visualization WB2019-rep-seqs4.qzv

# View visualizations for OTU table and representative sequences 
#!qiime tools view WB2019-table.qzv
#!qiime tools view WB2019-rep-seqs.qzv

Saved Visualization to: WB2019-table2.qzv
Saved Visualization to: WB2019-rep-seqs2.qzv


## Taxonomy assignment

Download the SILVA 138 classifiers for QIIME 2 made by Mike Robeson from Dropbox: https://www.dropbox.com/sh/nz7c5asn6b3hr1j/AAA9zGchu8Ya2Z93g6H7Xk65a?dl=0 (version 0.01; how it differs from other versions is unclear)

### Classify representative sequences

In [7]:
## This step sometimes freezes up

!qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-qiime/Silva-v138-515f-806r-taxonomy-classifier.qza \
  --i-reads  WB2019-rep-seqs4.qza \
  --o-classification  WB2019-taxonomy.qza

Saved FeatureData[Taxonomy] to: WB2019-taxonomy.qza
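
Once the taxonomy is exported to `taxonomy.tsv`, each feature carries a single semicolon-separated lineage string. The sketch below splits such a string into named ranks. It assumes SILVA-style rank prefixes such as `d__` and `p__`; the helper name `parse_taxon` and the example lineage are illustrative, so check the prefixes against the actual file.

```python
# Assumed rank prefixes; "k" is included for classifiers that use kingdom.
RANK_PREFIXES = {
    "d": "domain",
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_taxon(taxon):
    """Split a lineage string into {rank: name}, skipping empty ranks."""
    ranks = {}
    for part in taxon.split(";"):
        part = part.strip()
        if "__" not in part:
            continue  # e.g. "Unassigned" has no rank prefix
        prefix, name = part.split("__", 1)
        rank = RANK_PREFIXES.get(prefix)
        if rank and name:
            ranks[rank] = name
    return ranks


# Hypothetical lineage, for illustration only.
print(parse_taxon("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria"))
print(parse_taxon("Unassigned"))
```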


## Export representative sequences, taxonomy, OTU table, and metadata to create a .biom file to use for data analysis

In [None]:
# Make a new directory to export the files to
!mkdir  WB2019_OTU_table

# Export OTU table, representative sequences, and taxonomy; copy metadata (.txt)
# Inputs are the option 4 artifacts and the taxonomy produced above
!qiime tools export --input-path WB2019-table4.qza --output-path WB2019_OTU_table
!qiime tools export --input-path WB2019-rep-seqs4.qza --output-path WB2019_OTU_table
!qiime tools export --input-path WB2019-taxonomy.qza --output-path WB2019_OTU_table
!cp dai2016-metadata.txt WB2019_OTU_table/

# Check the files in the directory; it should contain dna-sequences.fasta, taxonomy.tsv, feature-table.biom, and the metadata .txt file
!ls WB2019_OTU_table/

# Add information from metadata.txt to the feature-table.biom
!biom add-metadata \
-i WB2019_OTU_table/feature-table.biom \
-o WB2019_OTU_table/feature-table-metaD.biom \
-m WB2019_OTU_table/dai2016-metadata.txt

# Add taxonomy data
!biom add-metadata \
-i WB2019_OTU_table/feature-table-metaD.biom \
-o WB2019_OTU_table/feature-table-metaD-tax.biom \
--observation-metadata-fp WB2019_OTU_table/taxonomy.tsv \
--sc-separated taxonomy \
--observation-header OTUID,taxonomy

# Check your work by creating a summary text file - view summary to make sure information was saved to .biom
!biom summarize-table \
-i WB2019_OTU_table/feature-table-metaD-tax.biom \
-o WB2019_OTU_table/feature-table-metaD-tax-summary.txt

!head -20 WB2019_OTU_table/feature-table-metaD-tax-summary.txt

# Convert the .biom file to JSON format for use with the phyloseq package
!biom convert \
-i WB2019_OTU_table/feature-table-metaD-tax.biom \
-o WB2019_OTU_table/feature-table-metaD-tax_json.biom \
--table-type="OTU table" \
--to-json
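
A quick way to sanity-check the JSON file is to read it with the standard `json` module. The sketch below assumes the BIOM 1.0 JSON layout (`rows`, `columns`, `matrix_type`, `data`) and runs on a tiny made-up table; the feature and sample IDs are hypothetical. For the real file, replace the toy string with `json.load(open("WB2019_OTU_table/feature-table-metaD-tax_json.biom"))`.

```python
import json

# Tiny made-up table in the assumed BIOM 1.0 JSON layout.
toy_biom = json.loads("""
{
  "format": "Biological Observation Matrix 1.0.0",
  "type": "OTU table",
  "rows": [
    {"id": "ASV_1", "metadata": {"taxonomy": ["d__Bacteria", "p__Firmicutes"]}},
    {"id": "ASV_2", "metadata": null}
  ],
  "columns": [
    {"id": "S1", "metadata": null},
    {"id": "S2", "metadata": null}
  ],
  "matrix_type": "sparse",
  "shape": [2, 2],
  "data": [[0, 0, 5], [0, 1, 3], [1, 1, 7]]
}
""")


def sample_totals(table):
    """Total count per sample for a sparse or dense BIOM 1.0 JSON table."""
    sample_ids = [column["id"] for column in table["columns"]]
    totals = dict.fromkeys(sample_ids, 0)
    if table["matrix_type"] == "sparse":
        # Sparse entries are [row_index, column_index, value].
        for _row, col, value in table["data"]:
            totals[sample_ids[col]] += value
    else:
        # Dense data is one list of values per row.
        for row_values in table["data"]:
            for col, value in enumerate(row_values):
                totals[sample_ids[col]] += value
    return totals


def rows_missing_taxonomy(table):
    """IDs of features that carry no taxonomy metadata."""
    return [
        row["id"]
        for row in table["rows"]
        if not (row["metadata"] or {}).get("taxonomy")
    ]


print(sample_totals(toy_biom))
print(rows_missing_taxonomy(toy_biom))
```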