In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline

data_dir = 'poop_data/Taxonomy'

**Choosing a classifier:**

The forward and reverse primers used in this experiment are:

    FWD: GTGYCAGCMGCCGCGGTAA
    REV: GGACTACNVGGGTWTCTAAT
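Both primers contain IUPAC degenerate bases (Y, M, N, V, W), so each primer is really a small pool of sequences. The minimal sketch below counts how many concrete variants each primer encodes, using the standard IUPAC nucleotide codes. `IUPAC` and `degeneracy` are illustrative names and are not part of QIIME 2.

```python
# Standard IUPAC nucleotide codes: each symbol maps to the bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD = "GTGYCAGCMGCCGCGGTAA"
REV = "GGACTACNVGGGTWTCTAAT"


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences encoded by `primer`."""
    total = 1
    for base in primer.upper():
        # Multiply by the number of bases this position can take.
        total *= len(IUPAC[base])
    return total


print(degeneracy(FWD))  # Y (2) x M (2)
print(degeneracy(REV))  # N (4) x V (3) x W (2)
```

This prints 4 for the forward primer and 24 for the reverse primer.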


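Unfortunately, all SILVA classifiers are too large to run with our computing resources, so only Greengenes classifiers are an option. We chose the Greengenes 13_8 (99% OTUs) 515F/806R weighted naive Bayes classifier. A 515F/806R classifier is appropriate for our data because our primers bind at the same positions in the 16S rRNA gene, which we checked with BLAST against the E. coli 16S sequence.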
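The BLAST check above is what we actually used. As a lightweight offline alternative, a degenerate primer can be turned into a regular expression and searched in any reference sequence you supply yourself, such as an E. coli 16S sequence. The reverse primer has to be reverse-complemented because it anneals to the opposite strand. This is a hypothetical helper, not part of QIIME 2, and it only finds exact matches to the degenerate pattern (no mismatches or gaps, unlike BLAST).

```python
import re

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (degenerate codes map to the complementary set).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD = "GTGYCAGCMGCCGCGGTAA"
REV = "GGACTACNVGGGTWTCTAAT"


def primer_regex(primer):
    """Compile a degenerate primer into a regex such as 'GTG[CT]...'."""
    return re.compile("".join(f"[{IUPAC[b]}]" for b in primer.upper()))


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_span(reference, fwd, rev):
    """0-based (start, end) of the amplicon including both primers, or None."""
    ref = reference.upper()
    f = primer_regex(fwd).search(ref)
    if f is None:
        return None
    # The reverse primer site must lie downstream of the forward site.
    r = primer_regex(reverse_complement(rev)).search(ref, f.end())
    if r is None:
        return None
    return f.start(), r.end()
```

Calling `amplicon_span(ecoli_16s, FWD, REV)` with your own sequence string gives reference coordinates that can be compared with the expected 515F/806R positions.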

In [None]:
! wget -nv -O $data_dir/weighted-greengenes-515f-806r-classifier.qza https://data.qiime2.org/2022.8/common/gg-13-8-99-515-806-nb-weighted-classifier.qza

**Assigning taxonomy**

Is Greengenes good enough, or should we use a better reference database? Who can run a larger classifier on their computer?

In [None]:
! qiime feature-classifier classify-sklearn \
    --i-classifier $data_dir/weighted-greengenes-515f-806r-classifier.qza \
    --i-reads poop_data/Denoising/dada2_rep_set.qza \
    --o-classification $data_dir/taxonomy_new.qza
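`classify-sklearn` does not always report a full lineage. As we understand it, the naive Bayes classifier reports the deepest rank whose confidence reaches the `--p-confidence` threshold (0.7 by default), and that value appears in the `Confidence` column. The sketch below is a simplified pure-Python model of that truncation rule, not the plugin's internals. `truncate_lineage` and the confidence values are hypothetical.

```python
def truncate_lineage(ranks, confidences, threshold=0.7):
    """Keep the deepest prefix of `ranks` whose confidence is >= threshold.

    `confidences[i]` is the (assumed non-increasing) confidence that a read
    belongs to the lineage ranks[: i + 1]. Returns (taxon_string, confidence).
    """
    kept, kept_conf = [], None
    for rank, conf in zip(ranks, confidences):
        if conf < threshold:
            break  # stop at the first rank that is not confident enough
        kept.append(rank)
        kept_conf = conf
    if not kept:
        return "Unassigned", None
    return "; ".join(kept), kept_conf


# Hypothetical example: genus-level confidence is too low, so the call stops at family.
ranks = ["k__Bacteria", "p__Firmicutes", "c__Clostridia",
         "o__Clostridiales", "f__Lachnospiraceae", "g__Blautia"]
confs = [1.0, 0.99, 0.98, 0.95, 0.81, 0.52]
print(truncate_lineage(ranks, confs))
```

Lowering the threshold gives deeper but less reliable assignments. Raising it gives shorter but more trustworthy lineages.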

**Visualization**

In [None]:
! qiime metadata tabulate \
    --m-input-file $data_dir/taxonomy_new.qza \
    --o-visualization $data_dir/taxonomy_new.qzv

In [None]:
Visualization.load(f'{data_dir}/taxonomy_new.qzv')

**Filtering out mitochondria and chloroplasts**

In [None]:
! qiime taxa filter-table \
    --i-table poop_data/Denoising/dada2_table.qza \
    --i-taxonomy $data_dir/taxonomy_new.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table $data_dir/table-filtered_new.qza

! qiime taxa filter-seqs \
    --i-sequences poop_data/Denoising/dada2_rep_set.qza \
    --i-taxonomy $data_dir/taxonomy_new.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-sequences $data_dir/rep-seqs-filtered_new.qza

# Removes 10 ASVs with the new taxonomy
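The table and the sequences should be filtered with the same search terms. Otherwise, chloroplast ASVs dropped from the table would still be present in the sequences. The sketch below models `--p-exclude` as case-insensitive substring matching against each taxonomy string. That is our reading of the plugin's default `contains` mode, not its actual code. The feature IDs and taxonomy strings are hypothetical.

```python
def exclude_features(taxonomy, terms):
    """Split feature IDs into (kept, removed) by case-insensitive substring match.

    `taxonomy` maps feature ID -> taxonomy string; `terms` are the exclusion terms.
    """
    lowered = [t.lower() for t in terms]
    kept, removed = [], []
    for feature_id, taxon in taxonomy.items():
        # A feature is removed if any term occurs anywhere in its annotation.
        target = removed if any(t in taxon.lower() for t in lowered) else kept
        target.append(feature_id)
    return kept, removed


# Hypothetical annotations for illustration only.
example_taxonomy = {
    "asv1": "k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Streptophyta",
    "asv2": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria",
    "asv3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
print(exclude_features(example_taxonomy, ["mitochondria", "chloroplast"]))
print(exclude_features(example_taxonomy, ["mitochondria"]))
```

The second call keeps `asv1`, which shows how excluding only `mitochondria` in one command but `mitochondria,chloroplast` in the other leaves the two artifacts out of sync.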

In [None]:
! qiime metadata tabulate \
    --m-input-file $data_dir/rep-seqs-filtered_new.qza \
    --o-visualization $data_dir/rep-seqs-filtered_new.qzv

In [None]:
# Visualization of the filtered sequences. It has no BLAST links: this is not the same table
# that `qiime feature-table tabulate-seqs` produces.
Visualization.load(f'{data_dir}/rep-seqs-filtered_new.qzv')

In [None]:
! qiime feature-table summarize \
    --i-table $data_dir/table-filtered_new.qza \
    --o-visualization $data_dir/table-filtered_new.qzv

In [None]:
Visualization.load(f'{data_dir}/table-filtered_new.qzv')
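A useful sanity check after removing mitochondria and chloroplasts is how much of each sample's sequencing depth was lost. The sketch below uses plain dictionaries of the form `{sample: {feature: count}}` as a stand-in for the feature tables. The helper names and counts are hypothetical.

```python
def per_sample_depth(table):
    """Total count per sample for a {sample: {feature: count}} table."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def fraction_removed(before, after):
    """Fraction of reads lost per sample between two tables (e.g. pre/post filtering)."""
    d_before, d_after = per_sample_depth(before), per_sample_depth(after)
    # Samples with zero depth before filtering are skipped to avoid dividing by zero.
    return {s: 1 - d_after.get(s, 0) / d_before[s] for s in d_before if d_before[s] > 0}


# Hypothetical counts: in S1, asv1 and asv2 were filtered out.
before = {"S1": {"asv1": 10, "asv2": 30, "asv3": 60}, "S2": {"asv1": 0, "asv3": 50}}
after = {"S1": {"asv3": 60}, "S2": {"asv3": 50}}
print(fraction_removed(before, after))
```

Samples that lose a large fraction of reads may drop below a reasonable rarefaction depth later on.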

In [None]:
! qiime taxa barplot \
    --i-table $data_dir/table-filtered_new.qza \
    --i-taxonomy $data_dir/taxonomy_new.qza \
    --m-metadata-file poop_data/metadata.tsv \
    --o-visualization $data_dir/table-filtered_new_barplot.qzv
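At each taxonomic level, the barplot shows ASV counts summed per taxon and divided by the sample's total. The sketch below is a minimal pure-Python version of that collapse-and-normalize step. `collapse_relative`, the `;`-joined labels, and the example data are assumptions for illustration, not the plugin's implementation.

```python
from collections import defaultdict


def collapse_relative(table, taxonomy, level):
    """Collapse {sample: {feature: count}} to relative abundance at a taxonomic level.

    `level` is the number of leading ranks to keep (1 = kingdom, 2 = phylum, ...).
    """
    result = {}
    for sample, counts in table.items():
        totals = defaultdict(int)
        for feature, count in counts.items():
            ranks = [r.strip() for r in taxonomy[feature].split(";")]
            totals[";".join(ranks[:level])] += count  # sum counts per collapsed taxon
        depth = sum(totals.values())
        result[sample] = {taxon: n / depth for taxon, n in totals.items()} if depth else {}
    return result


# Hypothetical data.
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "asv3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
table = {"S1": {"asv1": 20, "asv2": 20, "asv3": 60}}
print(collapse_relative(table, taxonomy, level=2))
```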

In [None]:
Visualization.load(f'{data_dir}/table-filtered_new_barplot.qzv')

**Loading the taxonomy into pandas**

In [None]:
pd.set_option('max_colwidth', 150)

In [None]:
# Note: QIIME 2 artifact files can be loaded as Python objects.
taxa = q2.Artifact.load(f'{data_dir}/taxonomy_new.qza')
# View as a `pandas.DataFrame`. Note: only some artifact types can be transformed into DataFrames.
taxa = taxa.view(pd.DataFrame)

In [None]:
taxa.head()
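The `Taxon` column holds one Greengenes-style string per feature (`k__...; p__...; ...`). To filter or group by a single rank, it helps to split that string into separate fields. The sketch below assumes seven ranks and treats empty placeholders such as `g__` as missing. `RANKS` and `split_taxon` are hypothetical names.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def split_taxon(taxon):
    """Split a 'k__...; p__...' string into a rank -> name dict (missing -> None)."""
    parts = [p.strip() for p in taxon.split(";")]
    out = dict.fromkeys(RANKS)  # every rank starts as None
    for rank, part in zip(RANKS, parts):
        # Drop the 'k__' style prefix; an empty name means the rank is unassigned.
        name = part.split("__", 1)[1] if "__" in part else part
        out[rank] = name or None
    return out


print(split_taxon("k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__; s__"))
```

With the DataFrame above, `pd.DataFrame(taxa['Taxon'].map(split_taxon).tolist(), index=taxa.index)` should expand the strings into one column per rank.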