# Taxonomy full length


### NOTE: THIS FILE COMPARES THE FULL-LENGTH CLASSIFIER VS. THE 515f/806r CLASSIFIER FROM GREENGENES.

**RESULTS:** When comparing the taxonomic classification of some feature IDs (515f/806r vs. full-length classifier), we found that the 515f-806r classifier assigned them to a slightly deeper taxonomic level (SEE FIGURE SCREENSHOTS in project pdf). A more precise classification is beneficial for further analysis. In addition, it has been reported that species-level classification of simulated 16S rRNA gene reads is slightly less accurate for full-length sequences than for the V1–V3 and V4 subregions (the 515f/806r primer pair targets V4). We therefore decided to continue with the 515f-806r classifier from Greengenes.

Example feature ID: `008e3cd88aac471b14f17c6ebd6dcff1`

| Classifier   |      Taxonomic classification      |  Confidence |
|----------|:-------------:|------:|
| full-length |  k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales | 0.9977607741262936 |
| 515f-806r |    k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae; g__Pseudoclavibacter; s__bifida   | 0.8394070235862647   |


The 515f-806r classifier assigned this feature down to species level, whereas the full-length classifier stopped at order. Nevertheless, the result should be interpreted with caution, because the confidence of the 515f-806r classifier is also lower (0.84 vs. 0.998).
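To check this depth difference for more than one feature, the comparison can be scripted. The sketch below assumes that both taxonomies have already been read into Python as dictionaries mapping feature ID to (taxon, confidence); the helpers `assignment_depth`, `deepest_rank` and `compare_depths` are hypothetical and not part of QIIME 2.

```python
# Hypothetical helpers; taxonomy strings are assumed to follow the
# Greengenes style "k__...; p__...; ..." shown in the table above.
from typing import Dict, Tuple

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def assignment_depth(taxon: str) -> int:
    """Count leading ranks that carry a name; an empty rank such as 'g__' stops the count."""
    if taxon.strip() in ("", "Unassigned"):
        return 0
    depth = 0
    for level in taxon.split(";"):
        # 'o__Actinomycetales' -> 'Actinomycetales'; 'g__' -> ''
        name = level.strip().split("__", 1)[-1]
        if not name:
            break
        depth += 1
    return depth


def deepest_rank(taxon: str) -> str:
    """Name of the deepest assigned rank, e.g. 'order'."""
    depth = assignment_depth(taxon)
    return RANKS[depth - 1] if depth else "unassigned"


def compare_depths(
    first: Dict[str, Tuple[str, float]], second: Dict[str, Tuple[str, float]]
) -> Dict[str, Tuple[int, int]]:
    """Features present in both inputs that `second` resolves deeper than `first`,
    returned as {feature_id: (depth_first, depth_second)}."""
    deeper = {}
    for feature_id in sorted(first.keys() & second.keys()):
        d1 = assignment_depth(first[feature_id][0])
        d2 = assignment_depth(second[feature_id][0])
        if d2 > d1:
            deeper[feature_id] = (d1, d2)
    return deeper


# Example built from the feature shown in the table above.
fid = "008e3cd88aac471b14f17c6ebd6dcff1"
full_length = {fid: ("k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales", 0.9977607741262936)}
v4 = {fid: ("k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; "
            "f__Microbacteriaceae; g__Pseudoclavibacter; s__bifida", 0.8394070235862647)}
print(compare_depths(full_length, v4))  # {'008e3cd88aac471b14f17c6ebd6dcff1': (4, 7)}
print(deepest_rank(full_length[fid][0]), deepest_rank(v4[fid][0]))  # order species
```

Since only the depth is compared here, the confidence values stay in the dictionaries so they can be inspected alongside any feature that turns out deeper.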


In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline
data_dir = 'CE'  # directory holding the input artifacts and the outputs of this notebook

In [2]:
# Fetch the pre-trained Naive Bayes classifier trained on full-length Greengenes 13_8 (99% OTU) sequences
! wget -nv -O $data_dir/gg-13-8-99-nb-classifier.qza https://data.qiime2.org/2022.8/common/gg-13-8-99-nb-classifier.qza


2022-12-16 21:33:55 URL:https://s3-us-west-2.amazonaws.com/qiime2-data/2022.8/common/gg-13-8-99-nb-classifier.qza [104512483/104512483] -> "CE/gg-13-8-99-nb-classifier.qza" [1]


In [3]:
! qiime feature-classifier classify-sklearn \
    --i-classifier $data_dir/gg-13-8-99-nb-classifier.qza \
    --i-reads $data_dir/dada2_rep_set.qza \
    --o-classification $data_dir/taxonomy_full_length.qza

Saved FeatureData[Taxonomy] to: CE/taxonomy_full_length.qza
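`classify-sklearn` limits how deep an assignment goes with a confidence threshold (`--p-confidence`, default 0.7), so the reported confidence refers to the deepest rank that was kept. The sketch below is a simplified, hypothetical illustration of that idea, not the plugin's actual code: posterior probabilities of labels sharing a prefix are summed, and the deepest rank whose summed probability still reaches the threshold is kept. The probabilities in the example are made up.

```python
# Simplified, hypothetical illustration of confidence-based truncation.
# This is NOT q2-feature-classifier's implementation; it only mimics the idea
# behind --p-confidence (default 0.7).
from typing import Dict, Optional, Tuple


def truncate_by_confidence(
    posteriors: Dict[str, float], threshold: float = 0.7
) -> Tuple[str, Optional[float]]:
    """posteriors maps full taxonomy strings to probabilities.

    Walk down the ranks; at each rank, sum the probabilities of labels that
    agree with the path chosen so far, take the best prefix, and stop as soon
    as its summed probability falls below the threshold.
    """
    labels = {tuple(r.strip() for r in t.split(";")): p for t, p in posteriors.items()}
    max_depth = max(len(ranks) for ranks in labels)
    chosen: Tuple[str, ...] = ()
    confidence: Optional[float] = None
    for depth in range(1, max_depth + 1):
        sums: Dict[Tuple[str, ...], float] = {}
        for ranks, p in labels.items():
            if len(ranks) >= depth and ranks[: depth - 1] == chosen:
                prefix = ranks[:depth]
                sums[prefix] = sums.get(prefix, 0.0) + p
        if not sums:
            break
        best, best_p = max(sums.items(), key=lambda item: item[1])
        if best_p < threshold:
            break
        chosen, confidence = best, best_p
    if not chosen:
        return "Unassigned", None  # not even the kingdom reached the threshold
    return "; ".join(chosen), confidence


made_up = {
    "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae": 0.40,
    "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae": 0.35,
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Bacillaceae": 0.25,
}
print(truncate_by_confidence(made_up))       # stops at order (summed probability about 0.75)
print(truncate_by_confidence(made_up, 0.3))  # a lower threshold reaches family
```

With the made-up values, no single family reaches 0.7, so the assignment stops at order even though the order-level confidence is high, which is the same pattern as the full-length result in the table above.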

In [4]:
! qiime tools peek $data_dir/taxonomy_full_length.qza

[32mUUID[0m:        d30b0164-b887-402d-943e-23fd4aa3ddbf
[32mType[0m:        FeatureData[Taxonomy]
[32mData format[0m: TSVTaxonomyDirectoryFormat


Visualizations:

In [5]:
! qiime metadata tabulate \
    --m-input-file $data_dir/taxonomy_full_length.qza \
    --o-visualization $data_dir/taxonomy_full_length.qzv

[32mSaved Visualization to: CE/taxonomy_full_length.qzv[0m
[0m

In [6]:
Visualization.load(f'{data_dir}/taxonomy_full_length.qzv')

In [7]:
# Use the feature table artifact that was previously aligned with the metadata (see taxonomy.ipynb)
# Filter the feature table to exclude features classified as mitochondria or chloroplast
! qiime taxa filter-table \
    --i-table $data_dir/dada2_table_aligned.qza \
    --i-taxonomy $data_dir/taxonomy_full_length.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table $data_dir/dada2_table_align_filtered_full_length.qza

Saved FeatureTable[Frequency] to: CE/dada2_table_align_filtered_full_length.qza
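`--p-exclude` takes comma-separated search terms; in the default `contains` mode a feature is dropped when its taxonomy string contains any of the terms. In Greengenes, chloroplasts appear as `c__Chloroplast` and mitochondria as `f__mitochondria`, so substring matching catches both. The sketch below mimics this on plain dictionaries. It assumes case-insensitive matching and keeps samples that end up empty; we have not verified how the plugin handles those cases, and all feature IDs are hypothetical.

```python
# Hypothetical sketch of `qiime taxa filter-table --p-exclude` on plain dicts.
# Assumptions: default 'contains' mode, case-insensitive matching; samples
# that become empty are kept here.
from typing import Dict, Iterable


def is_excluded(taxon: str, terms: Iterable[str]) -> bool:
    """True if any search term occurs anywhere in the taxonomy string."""
    lowered = taxon.lower()
    return any(term.lower() in lowered for term in terms)


def filter_table(
    table: Dict[str, Dict[str, int]], taxonomy: Dict[str, str], exclude: str
) -> Dict[str, Dict[str, int]]:
    """table: sample -> {feature_id: count}; exclude: e.g. 'mitochondria,chloroplast'."""
    terms = [t.strip() for t in exclude.split(",") if t.strip()]
    dropped = {fid for fid, taxon in taxonomy.items() if is_excluded(taxon, terms)}
    return {
        sample: {fid: n for fid, n in counts.items() if fid not in dropped}
        for sample, counts in table.items()
    }


taxonomy = {  # hypothetical feature IDs with Greengenes-style labels
    "f1": "k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Streptophyta",
    "f2": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__mitochondria",
    "f3": "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales",
}
table = {"sample1": {"f1": 5, "f2": 3, "f3": 12}}
print(filter_table(table, taxonomy, "mitochondria,chloroplast"))  # {'sample1': {'f3': 12}}
```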

In [8]:
# Filter the representative sequences to exclude features classified as mitochondria or chloroplast
! qiime taxa filter-seqs \
    --i-sequences $data_dir/dada2_rep_set.qza \
    --i-taxonomy $data_dir/taxonomy_full_length.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-sequences $data_dir/dada2_rep_set_filtered_full_length.qza

Saved FeatureData[Sequence] to: CE/dada2_rep_set_filtered_full_length.qza
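The table and the representative sequences are filtered in separate steps, so it can be worth checking that both still describe the same set of features. The sketch below repeats the exclusion rule for sequences and then compares feature IDs; `filter_seqs` and `check_consistency` are hypothetical helpers working on plain dictionaries, not QIIME 2 functions, and the sequences are made-up fragments.

```python
# Hypothetical sketch: filter representative sequences with the same rule as
# the table, then check that table and sequences describe the same features.
from typing import Dict, Set, Tuple


def filter_seqs(seqs: Dict[str, str], taxonomy: Dict[str, str], exclude: str) -> Dict[str, str]:
    """Drop sequences whose taxonomy contains any excluded term (case-insensitive).
    Every feature is assumed to have a taxonomy entry (KeyError otherwise)."""
    terms = [t.strip().lower() for t in exclude.split(",") if t.strip()]
    return {
        fid: seq for fid, seq in seqs.items()
        if not any(term in taxonomy[fid].lower() for term in terms)
    }


def check_consistency(
    table: Dict[str, Dict[str, int]], seqs: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (features in the table without a sequence, sequences not used in the table)."""
    in_table = {fid for counts in table.values() for fid, n in counts.items() if n > 0}
    return in_table - set(seqs), set(seqs) - in_table


taxonomy = {
    "f1": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "f3": "k__Bacteria; p__Actinobacteria; c__Actinobacteria",
}
seqs = {"f1": "TACGGAGGG", "f3": "TACGTAGGG"}  # made-up, truncated sequences
filtered = filter_seqs(seqs, taxonomy, "mitochondria,chloroplast")
print(filtered)  # {'f3': 'TACGTAGGG'}
print(check_consistency({"sample1": {"f3": 12}}, filtered))  # (set(), set())
```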

In [9]:
# New bar plot built from the filtered feature table: mitochondria and chloroplasts are no longer visible
! qiime taxa barplot \
    --i-table $data_dir/dada2_table_align_filtered_full_length.qza \
    --i-taxonomy $data_dir/taxonomy_full_length.qza \
    --m-metadata-file $data_dir/food-metadata.tsv \
    --o-visualization $data_dir/taxa-bar-plots_full_length_filtered.qzv

Saved Visualization to: CE/taxa-bar-plots_full_length_filtered.qzv
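Each bar in the plot shows, per sample, the relative frequency of taxa collapsed to the selected level (level 1 = kingdom up to level 7 = species for Greengenes). A hypothetical sketch of that computation on plain dictionaries: counts of features whose taxonomy strings agree up to `level` ranks are summed and divided by the sample total. Assignments shallower than `level` are simply kept as they are in this sketch; QIIME 2 may label them differently.

```python
# Hypothetical sketch of the per-sample relative frequencies behind a
# taxa bar plot at a given level (1 = kingdom ... 7 = species).
from collections import defaultdict
from typing import Dict


def relative_frequencies(
    table: Dict[str, Dict[str, int]], taxonomy: Dict[str, str], level: int
) -> Dict[str, Dict[str, float]]:
    """Collapse feature counts to taxonomy prefixes of `level` ranks, then
    divide by each sample's total count."""
    result = {}
    for sample, counts in table.items():
        collapsed: Dict[str, int] = defaultdict(int)
        for fid, n in counts.items():
            ranks = [r.strip() for r in taxonomy[fid].split(";")]
            collapsed["; ".join(ranks[:level])] += n  # shallower labels are kept as-is
        total = sum(collapsed.values())
        result[sample] = {taxon: n / total for taxon, n in collapsed.items()} if total else {}
    return result


taxonomy = {  # hypothetical features
    "f1": "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Microbacteriaceae",
    "f2": "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; f__Micrococcaceae",
    "f3": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Bacillaceae",
}
table = {"sample1": {"f1": 30, "f2": 10, "f3": 60}}
for taxon, freq in relative_frequencies(table, taxonomy, level=4)["sample1"].items():
    print(f"{freq:.2f}  {taxon}")
```

At level 4 the two Actinomycetales features merge into one bar segment, while at level 5 they would appear as separate families.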

In [10]:
Visualization.load(f'{data_dir}/taxa-bar-plots_full_length_filtered.qzv')