# Taxonomy Classification


In [1]:
import os
import pandas as pd
from qiime2 import Visualization
import matplotlib.pyplot as plt
import numpy as np

import qiime2 as q2

%matplotlib inline

In [2]:
data_dir = '../data'

# Create the data directory and the taxonomy subdirectory that later cells write to
os.makedirs(f'{data_dir}/taxonomy', exist_ok=True)

### 1. Training taxonomy classifier

We first tried the SILVA database with a naive Bayes classifier, but both training our own classifier and running the pre-trained SILVA classifier exceeded the available memory. We therefore use the pre-trained Greengenes classifier instead.

In [3]:
#! qiime feature-classifier fit-classifier-naive-bayes \
#     --i-reference-reads $data_dir/silva-138-ssu-nr99-seqs-515f-806r-uniq.qza \
#     --i-reference-taxonomy $data_dir/silva-138-ssu-nr99-tax-515f-806r-derep-uniq.qza \
#     --p-classify--chunk-size 1000 \
#     --o-classifier $data_dir/515f-806r-classifier.qza

In [5]:
! wget -nv -O $data_dir/taxonomy/gg-13-8-99-nb-classifier.qza 'https://data.qiime2.org/2022.8/common/gg-13-8-99-nb-classifier.qza'

2022-12-16 20:27:35 URL:https://s3-us-west-2.amazonaws.com/qiime2-data/2022.8/common/gg-13-8-99-nb-classifier.qza [104512483/104512483] -> "../data/taxonomy/gg-13-8-99-nb-classifier.qza" [1]


### 2. Taxonomy assignment

We assign taxonomy to the representative sequences using the pre-trained classifier downloaded above.

In [6]:
! qiime feature-classifier classify-sklearn \
    --i-classifier $data_dir/taxonomy/gg-13-8-99-nb-classifier.qza \
    --i-reads $data_dir/denoising/dada2_rep_set.qza \
    --o-classification $data_dir/taxonomy/taxonomy.qza

Saved FeatureData[Taxonomy] to: ../data/taxonomy/taxonomy.qza
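Each feature in `taxonomy.qza` carries a semicolon-separated Greengenes-style label with rank prefixes (`k__`, `p__`, ..., `s__`). The following is a minimal sketch of how such a label can be split into ranks in plain Python. The helper names `parse_taxon` and `deepest_rank` and the example label are illustrative assumptions, not part of QIIME 2.

```python
# Rank prefixes used in Greengenes-style taxonomy strings.
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxon(taxon):
    """Split 'k__Bacteria; p__Firmicutes; ...' into a {rank: name} dict."""
    parsed = {}
    for level in taxon.split(";"):
        prefix, sep, name = level.strip().partition("__")
        # Skip malformed levels and empty placeholders such as 'g__'.
        if sep and prefix in RANKS and name:
            parsed[RANKS[prefix]] = name
    return parsed


def deepest_rank(taxon):
    """Return the most specific rank that carries a name, or None."""
    parsed = parse_taxon(taxon)
    for rank in reversed(list(RANKS.values())):
        if rank in parsed:
            return rank
    return None


# Hypothetical label, written in the Greengenes format.
example = ("k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
           "f__Lachnospiraceae; g__; s__")
print(parse_taxon(example))
print(deepest_rank(example))
```

Reporting the deepest named rank per feature is one simple way to see how many sequences were resolved only to family level or above.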

### 3. Taxonomy visualization

Visualize the taxonomy classifications, first as a table and then as bar plots that can be grouped by sample metadata.

In [7]:
! qiime metadata tabulate \
    --m-input-file $data_dir/taxonomy/taxonomy.qza \
    --o-visualization $data_dir/taxonomy/taxonomy.qzv

Saved Visualization to: ../data/taxonomy/taxonomy.qzv

In [8]:
Visualization.load(f'{data_dir}/taxonomy/taxonomy.qzv')

In [9]:
! qiime taxa barplot \
    --i-table $data_dir/denoising/dada2_table.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --o-visualization $data_dir/taxonomy/taxa-bar-plots.qzv

Saved Visualization to: ../data/taxonomy/taxa-bar-plots.qzv

In [10]:
Visualization.load(f'{data_dir}/taxonomy/taxa-bar-plots.qzv')

Mitochondrial and chloroplast sequences are filtered out, since they do not belong to the gut microbiota:

In [11]:
! qiime taxa filter-table \
    --i-table $data_dir/denoising/dada2_table.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-table $data_dir/taxonomy/table-filtered.qza

Saved FeatureTable[Frequency] to: ../data/taxonomy/table-filtered.qza

In [12]:
! qiime taxa filter-seqs \
    --i-sequences $data_dir/denoising/dada2_rep_set.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --p-exclude mitochondria,chloroplast \
    --o-filtered-sequences $data_dir/taxonomy/taxonomy-filtered.qza

Saved FeatureData[Sequence] to: ../data/taxonomy/taxonomy-filtered.qza
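The two filtering commands above pass a comma-separated list of search terms to `--p-exclude`. The sketch below illustrates the matching logic in plain Python, assuming the default mode behaves as a case-insensitive substring match on the taxonomy label. It is an illustration only, not the plugin's implementation, and the feature IDs and the `is_excluded` helper are hypothetical.

```python
def is_excluded(taxon, exclude="mitochondria,chloroplast", delimiter=","):
    """Return True if any search term occurs in the taxonomy label.

    Assumption: matching is a case-insensitive substring test.
    """
    terms = [t.strip().lower() for t in exclude.split(delimiter) if t.strip()]
    label = taxon.lower()
    return any(term in label for term in terms)


# Hypothetical feature IDs with Greengenes-style labels.
taxonomy = {
    "feat1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "feat2": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "feat3": ("k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
              "o__Rickettsiales; f__mitochondria"),
}

kept = [fid for fid, taxon in taxonomy.items() if not is_excluded(taxon)]
print(kept)
```

In Greengenes, chloroplasts are labelled at class level and mitochondria at family level, which is why a substring match is more convenient here than an exact match on the full label.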

We visualize the taxonomy again. The number of detected taxa decreases as a result of the filtering.

In [13]:
! qiime taxa barplot \
    --i-table $data_dir/taxonomy/table-filtered.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --m-metadata-file $data_dir/metadata/sample_metadata.tsv \
    --o-visualization $data_dir/taxonomy/taxa-bar-plots_filtered.qzv

Saved Visualization to: ../data/taxonomy/taxa-bar-plots_filtered.qzv

In [14]:
Visualization.load(f'{data_dir}/taxonomy/taxa-bar-plots_filtered.qzv')

### 4. Taxonomy Analysis

First, separate the samples by whether the patient was allegedly abducted, and analyze the taxonomic classifications separately for abducted and non-abducted patients:

In [20]:
features = q2.Artifact.load(f'{data_dir}/taxonomy/table-filtered.qza')
features_df = features.view(pd.DataFrame)

df_meta = pd.read_csv(f'{data_dir}/metadata/str_metadata.tsv', sep='\t')
locations = df_meta[df_meta['alleged_abduction'] == 'abducted']['sampleid']

features_abducted = features_df.loc[locations]

features_abducted_artifact = q2.Artifact.import_data("FeatureTable[Frequency]", features_abducted)

features_abducted_artifact.save(f'{data_dir}/taxonomy/table-filtered_abducted.qza')

'../data/taxonomy/table-filtered_abducted.qza'

In [24]:
locations = df_meta[df_meta['alleged_abduction'] == 'non_abducted']['sampleid']

features_not_abducted = features_df.loc[locations]

features_not_abducted_artifact = q2.Artifact.import_data("FeatureTable[Frequency]", features_not_abducted)

features_not_abducted_artifact.save(f'{data_dir}/taxonomy/table-filtered_not_abducted.qza')

'../data/taxonomy/table-filtered_not_abducted.qza'
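The two cells above select rows with `features_df.loc[locations]`, which raises a `KeyError` in recent pandas versions if any sample ID from the metadata is missing from the feature table (for example, a sample dropped during denoising). A minimal standard-library sketch of a more defensive grouping step follows. The function name `sample_ids_by_group` and the inline metadata are hypothetical, and only the `sampleid` and `alleged_abduction` column names come from the notebook.

```python
import csv
import io


def sample_ids_by_group(metadata_tsv, column, table_ids):
    """Group sample IDs by a metadata column, keeping only IDs in the table."""
    table_ids = set(table_ids)
    groups = {}
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    for row in reader:
        sample_id = row["sampleid"]
        # Skip samples that are listed in the metadata but absent from the table.
        if sample_id in table_ids:
            groups.setdefault(row[column], []).append(sample_id)
    return groups


# Hypothetical metadata: S3 is listed but has no row in the feature table.
metadata_tsv = (
    "sampleid\talleged_abduction\n"
    "S1\tabducted\n"
    "S2\tnon_abducted\n"
    "S3\tabducted\n"
)
groups = sample_ids_by_group(metadata_tsv, "alleged_abduction", ["S1", "S2"])
print(groups)
```

The resulting ID lists can then be passed to `.loc` without risking a lookup failure on missing samples.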

Taxonomy visualization:

Based on the two visualizations below, we conclude that the taxonomic composition is the same across abducted and non-abducted patients.

In [21]:
## abducted patients
! qiime taxa barplot \
    --i-table $data_dir/taxonomy/table-filtered_abducted.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --o-visualization $data_dir/taxonomy/taxa-bar-plots_filtered_abducted.qzv

Saved Visualization to: ../data/taxonomy/taxa-bar-plots_filtered_abducted.qzv

In [22]:
Visualization.load(f'{data_dir}/taxonomy/taxa-bar-plots_filtered_abducted.qzv')

In [25]:
## not abducted patients
! qiime taxa barplot \
    --i-table $data_dir/taxonomy/table-filtered_not_abducted.qza \
    --i-taxonomy $data_dir/taxonomy/taxonomy.qza \
    --m-metadata-file $data_dir/metadata/str_metadata.tsv \
    --o-visualization $data_dir/taxonomy/taxa-bar-plots_filtered_not_abducted.qzv

Saved Visualization to: ../data/taxonomy/taxa-bar-plots_filtered_not_abducted.qzv

In [27]:
Visualization.load(f'{data_dir}/taxonomy/taxa-bar-plots_filtered_not_abducted.qzv')
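The visual comparison of the two bar plots can be complemented by a simple numeric check. The sketch below averages per-sample relative abundances within each group and computes the Bray-Curtis dissimilarity between the two mean profiles (0 means identical, 1 means no shared taxa). All counts are made-up placeholders, the helper names are assumptions, and the result is a descriptive summary rather than a statistical test.

```python
def relative_abundance(counts):
    """Convert raw counts {taxon: n} into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def mean_profile(samples):
    """Average the relative abundance profiles of several samples."""
    profiles = [relative_abundance(s) for s in samples]
    taxa = set().union(*profiles)
    return {t: sum(p.get(t, 0.0) for p in profiles) / len(profiles) for t in taxa}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical phylum-level counts for one sample per group.
abducted = [{"Firmicutes": 60, "Bacteroidetes": 40}]
non_abducted = [{"Firmicutes": 30, "Bacteroidetes": 20}]

distance = bray_curtis(mean_profile(abducted), mean_profile(non_abducted))
print(distance)
```

With real data, the per-sample counts would come from the filtered feature table after collapsing features to the taxonomic level of interest.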