# Taxonomy classification

# 1. Download and construct reference database 


## 1.1 Download reference database

In [1]:
! qiime rescript get-silva-data \
    --p-version '138' \
    --p-target 'SSURef_NR99' \
    --p-include-species-labels \
    --o-silva-sequences Data/3-silva-138-ssu-nr99-seqs.qza \
    --o-silva-taxonomy Data/3-silva-138-ssu-nr99-tax.qza

Saved FeatureData[RNASequence] to: Data/3-silva-138-ssu-nr99-seqs.qza
Saved FeatureData[Taxonomy] to: Data/3-silva-138-ssu-nr99-tax.qza

## 1.2 Database curation

In [1]:
! qiime rescript cull-seqs \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs.qza \
    --p-num-degenerates 5 \
    --p-homopolymer-length 8 \
    --p-n-jobs 3 \
    --o-clean-sequences Data/3-silva-138-ssu-nr99-seqs-cleaned.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-cleaned.qza

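To make the two thresholds above concrete, here is a minimal pure-Python sketch of the kind of check `cull-seqs` applies. It assumes, following the RESCRIPt help text, that a sequence is removed when it has `num_degenerates` or more ambiguous bases, or a homopolymer run of `homopolymer_length` or more. The function name `passes_cull` and the toy sequences are hypothetical, and this is not RESCRIPt's own implementation.

```python
import re

# IUPAC ambiguity codes (anything other than A, C, G, T/U).
DEGENERATE = set("RYSWKMBDHVN")


def passes_cull(seq, num_degenerates=5, homopolymer_length=8):
    """Return True if seq survives both filters (assumed semantics)."""
    seq = seq.upper()
    n_degenerate = sum(base in DEGENERATE for base in seq)
    if n_degenerate >= num_degenerates:
        return False
    # The same base repeated homopolymer_length times or more in a row.
    run = re.compile(r"([ACGTU])\1{%d,}" % (homopolymer_length - 1))
    return run.search(seq) is None


# Hypothetical toy sequences, one per outcome.
toy = {
    "clean": "ACGTACGTACGTACGT",
    "degenerate": "ACGNNRYNACGT",            # 5 ambiguous bases
    "homopolymer": "ACG" + "A" * 8 + "TCG",  # run of 8 A's
}
kept = [name for name, seq in toy.items() if passes_cull(seq)]
print(kept)  # ['clean']
```
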
In [1]:
! qiime rescript filter-seqs-length-by-taxon \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy Data/3-silva-138-ssu-nr99-tax.qza \
    --p-labels Archaea Bacteria Eukaryota \
    --p-min-lens 900 1200 1400 \
    --o-filtered-seqs Data/3-silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs Data/3-silva-138-ssu-nr99-seqs-discard.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-filt.qza
Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-discard.qza

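The pairing of `--p-labels` with `--p-min-lens` can be read as a per-domain minimum length. The sketch below illustrates that logic under two assumptions: a label matches when it occurs in the taxonomy string, and a sequence matching no label is kept. The helper `keep_by_taxon_length` and the three records are hypothetical and only mirror the values used in the cell above.

```python
def keep_by_taxon_length(seq_len, taxonomy, min_lens):
    """Apply the most stringent minimum length among the matching labels."""
    thresholds = [m for label, m in min_lens.items() if label in taxonomy]
    if not thresholds:
        return True  # no label matched, so no taxon-specific threshold applies
    return seq_len >= max(thresholds)


# Same labels and minimum lengths as in the command above.
MIN_LENS = {"Archaea": 900, "Bacteria": 1200, "Eukaryota": 1400}

# Hypothetical (id, length, taxonomy) records.
records = [
    ("seq1", 950, "d__Archaea; p__Crenarchaeota"),
    ("seq2", 1100, "d__Bacteria; p__Firmicutes"),
    ("seq3", 1500, "d__Eukaryota; p__Chlorophyta"),
]
kept = [sid for sid, n, tax in records if keep_by_taxon_length(n, tax, MIN_LENS)]
print(kept)  # ['seq1', 'seq3']
```

Here `seq2` would land in the discarded set because a 1100 nt bacterial sequence is shorter than the 1200 nt minimum.
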
In [1]:
! qiime rescript dereplicate \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-filt.qza  \
    --i-taxa Data/3-silva-138-ssu-nr99-tax.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --p-threads 3 \
    --o-dereplicated-sequences Data/3-silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --o-dereplicated-taxa Data/3-silva-138-ssu-nr99-tax-derep-uniq.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-derep-uniq.qza
Saved FeatureData[Taxonomy] to: Data/3-silva-138-ssu-nr99-tax-derep-uniq.qza

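As a rough illustration of `--p-mode 'uniq'`, the sketch below collapses records that share both sequence and taxonomy while keeping identical sequences that carry different taxonomies. This reflects my reading of the mode rather than RESCRIPt's code; `dereplicate_uniq` and the toy records are hypothetical.

```python
def dereplicate_uniq(records):
    """Keep the first ID seen for each distinct (sequence, taxonomy) pair."""
    seen = set()
    kept = []
    for seq_id, seq, taxonomy in records:
        key = (seq, taxonomy)
        if key not in seen:
            seen.add(key)
            kept.append(seq_id)
    return kept


# Hypothetical (id, sequence, taxonomy) records.
records = [
    ("a", "ACGT", "d__Bacteria; g__Bacillus"),
    ("b", "ACGT", "d__Bacteria; g__Bacillus"),   # exact duplicate of "a"
    ("c", "ACGT", "d__Bacteria; g__Listeria"),   # same sequence, other taxon
    ("d", "GGCC", "d__Bacteria; g__Bacillus"),
]
print(dereplicate_uniq(records))  # ['a', 'c', 'd']
```
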
## 1.3 PCR-region extraction

### 1.3.1 Bacteria (27f-338r)

In [4]:
! qiime feature-classifier extract-reads \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer AGRGTTHGATYMTGGCTCAG \
    --p-r-primer GCTGCCTCCCGTAGGAGT \
    --p-n-jobs 3 \
    --p-read-orientation 'forward' \
    --o-reads Data/3-silva-138-ssu-nr99-seqs-27f-338r.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-27f-338r.qza

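The primers passed to `extract-reads` contain IUPAC ambiguity codes (R, H, Y, M). The following sketch shows one way to think about the in-silico PCR step: expand each code into a character class, locate the forward primer, locate the reverse complement of the reverse primer downstream, and take the region in between. It assumes exact matching and that primers are excluded from the amplicon, whereas `extract-reads` tolerates mismatches through its identity setting. All function names and the toy template are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    return "".join(IUPAC[base] for base in primer)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_amplicon(template, f_primer, r_primer):
    """Return the region between the primer sites, or None if either is absent."""
    fwd = re.search(primer_regex(f_primer), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(r_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


F_PRIMER = "AGRGTTHGATYMTGGCTCAG"   # 27f, as in the cell above
R_PRIMER = "GCTGCCTCCCGTAGGAGT"     # 338r, as in the cell above

# Hypothetical template: flank + forward site + insert + reverse site + flank.
template = ("TT" + "AGAGTTTGATCCTGGCTCAG" + "ACGTACGTAC"
            + reverse_complement(R_PRIMER) + "GG")
print(extract_amplicon(template, F_PRIMER, R_PRIMER))  # ACGTACGTAC
```
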
In [1]:
! qiime rescript dereplicate \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-27f-338r.qza \
    --i-taxa Data/3-silva-138-ssu-nr99-tax-derep-uniq.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --p-threads 3 \
    --o-dereplicated-sequences Data/3-silva-138-ssu-nr99-seqs-27f-338r-uniq.qza \
    --o-dereplicated-taxa Data/3-silva-138-ssu-nr99-tax-27f-338r-derep-uniq.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-27f-338r-uniq.qza
Saved FeatureData[Taxonomy] to: Data/3-silva-138-ssu-nr99-tax-27f-338r-derep-uniq.qza

### 1.3.2 Archaea (arch349F-arch806R)

In [2]:
! qiime feature-classifier extract-reads \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-derep-uniq.qza \
    --p-f-primer GYGCASCAGKCGMGAAW \
    --p-r-primer GGACTACVSGGGTATCTAAT \
    --p-n-jobs 3 \
    --p-read-orientation 'forward' \
    --o-reads Data/3-silva-138-ssu-nr99-seqs-349f-806r.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-349f-806r.qza

In [3]:
! qiime rescript dereplicate \
    --i-sequences Data/3-silva-138-ssu-nr99-seqs-349f-806r.qza \
    --i-taxa Data/3-silva-138-ssu-nr99-tax-derep-uniq.qza \
    --p-rank-handles 'silva' \
    --p-mode 'uniq' \
    --p-threads 3 \
    --o-dereplicated-sequences Data/3-silva-138-ssu-nr99-seqs-349f-806r-uniq.qza \
    --o-dereplicated-taxa Data/3-silva-138-ssu-nr99-tax-349f-806r-derep-uniq.qza

Saved FeatureData[Sequence] to: Data/3-silva-138-ssu-nr99-seqs-349f-806r-uniq.qza
Saved FeatureData[Taxonomy] to: Data/3-silva-138-ssu-nr99-tax-349f-806r-derep-uniq.qza

# 2. Taxonomy assignment of bacteria 

## 2.1 Taxonomy classifier for region 27f-338r

In [4]:
! qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads Data/3-silva-138-ssu-nr99-seqs-27f-338r-uniq.qza \
    --i-reference-taxonomy Data/3-silva-138-ssu-nr99-tax-27f-338r-derep-uniq.qza \
    --o-classifier Data/3-classifier-27f-308r.qza
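
For intuition about what the fitted artifact contains, here is a toy multinomial naive Bayes over k-mer counts in plain Python. It is a sketch only: the QIIME 2 plugin builds a scikit-learn pipeline, and the k-mer size of 3, the uniform class prior, the function names, and the reference sequences below are illustrative assumptions. The smoothing value 0.001 matches the `--p-classify--alpha` default listed in the plugin help.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit(references, k=3, alpha=0.001):
    """references: iterable of (sequence, taxon). Returns a small model tuple."""
    counts = defaultdict(Counter)
    for seq, taxon in references:
        counts[taxon].update(kmers(seq, k))
    vocab = {km for c in counts.values() for km in c}
    return counts, vocab, alpha, k


def classify(model, seq):
    """Return the taxon with the highest smoothed log-likelihood."""
    counts, vocab, alpha, k = model
    scores = {}
    for taxon, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        scores[taxon] = sum(
            math.log((c[km] + alpha) / total)
            for km in kmers(seq, k)
            if km in vocab  # k-mers never seen in training are ignored
        )
    return max(scores, key=scores.get)


# Hypothetical reference reads and labels.
model = fit([
    ("ACGTACGTACGT", "d__Bacteria"),
    ("GGGCCCGGGCCC", "d__Archaea"),
])
print(classify(model, "ACGTACG"))  # d__Bacteria
print(classify(model, "GGGCCC"))   # d__Archaea
```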

## 2.2 Taxonomy assignment

In [6]:
! qiime feature-classifier classify-sklearn \
    --i-classifier Data/3-classifier-27f-308r.qza \
    --i-reads Data/1-rep-seqs_bac.qza \
    --o-classification Results/3-taxonomy_bac.qza \
    --p-n-jobs 3

# 3. Taxonomy assignment of archaea

## 3.1 Taxonomy classifier for region archaea349f-archaea806r

In [5]:
! qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads Data/3-silva-138-ssu-nr99-seqs-349f-806r-uniq.qza \
    --i-reference-taxonomy Data/3-silva-138-ssu-nr99-tax-349f-806r-derep-uniq.qza \
    --o-classifier Data/3-classifier-349f-806r.qza

Usage: qiime feature-classifier fit-classifier-naive-bayes
           [OPTIONS]

  Create a scikit-learn naive_bayes classifier for reads

Inputs:
  --i-reference-reads ARTIFACT FeatureData[Sequence]
                                                                    [required]
  --i-reference-taxonomy ARTIFACT FeatureData[Taxonomy]
                                                                    [required]
  --i-class-weight ARTIFACT FeatureTable[RelativeFrequency]
                                                                    [optional]
Parameters:
  --p-classify--alpha NUMBER
                                                              [default: 0.001]
  --p-classify--chunk-size INTEGER
                                                              [default: 20000]
  --p-classify--class-prior TEXT

## 3.2 Taxonomy assignment

In [4]:
!qiime feature-classifier classify-sklearn

Usage: qiime feature-classifier classify-sklearn [OPTIONS]

  Classify reads by taxon using a fitted classifier.

Inputs:
  --i-reads ARTIFACT FeatureData[Sequence]
                         The feature data to be classified.         [required]
  --i-classifier ARTIFACT
    TaxonomicClassifier  The taxonomic classifier for classifying the reads.
                                                                    [required]
Parameters:
  --p-reads-per-batch VALUE Int % Range(1, None) | Str % Choices('auto')
                         Number of reads to process in each batch. If "auto",
                         this parameter is autoscaled to min( number of query
                         sequences / n-jobs, 20000).         [default: 'auto']
  --p-n-jobs INTEGER     The maximum number of concurrently worker processes.
                         If -1 all CPUs are u