# QIIME 2 enables comprehensive end-to-end analysis of diverse microbiome data and comparative studies with publicly available data

This is a QIIME 2 Artifact CLI notebook that replicates the analyses in the QIIME 2 protocol.

**environment:** qiime2-2020.2

Use this version of [q2-classo](https://github.com/Leo-Simpson/q2-classo/tree/edbf9d20fed931cbc1cff77f5634bb0dcd783c8a) 

In [1]:
pip install --upgrade c-lasso

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install c-lasso

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install zarr

Note: you may need to restart the kernel to use updated packages.


In [10]:
pip install plotly

Note: you may need to restart the kernel to use updated packages.


In [2]:
cd ..

/data


In [3]:
!python setup.py install

running install
running bdist_egg
running egg_info
writing q2_classo.egg-info/PKG-INFO
writing dependency_links to q2_classo.egg-info/dependency_links.txt
writing entry points to q2_classo.egg-info/entry_points.txt
writing top-level names to q2_classo.egg-info/top_level.txt
reading manifest file 'q2_classo.egg-info/SOURCES.txt'
writing manifest file 'q2_classo.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying q2_classo/_summarize/_visualizer.py -> build/lib/q2_classo/_summarize
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/q2_classo
copying build/lib/q2_classo/_dict.py -> build/bdist.linux-x86_64/egg/q2_classo
copying build/lib/q2_classo/_tree.py -> build/bdist.linux-x86_64/egg/q2_classo
creating build/bdist.linux-x86_64/egg/q2_classo/_summarize
copying build/lib/q2_classo/_summarize/_visualizer.py -> build/bdist.linux-x86_64/egg/q2

In [4]:
!pip install -e .

Obtaining file:///data
Installing collected packages: q2-classo
  Attempting uninstall: q2-classo
    Found existing installation: q2-classo 0.0.0.dev0
    Uninstalling q2-classo-0.0.0.dev0:
      Successfully uninstalled q2-classo-0.0.0.dev0
  Running setup.py develop for q2-classo
Successfully installed q2-classo


In [5]:
!qiime dev refresh-cache

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.


In [6]:
cd example/data/atacama/

/data/example/data/atacama


In [13]:
import plotly as plt
plt.__version__

'5.17.0'

In [16]:
ls

atacama-sample-metadata.tsv  genus_table_clr.qza      training-targets.qza
atacama-table.qza            regress-predictions.qza  wcovariates.qza
ccovariates.qza              regress-xtest.qza        wtaxa.qza
classification.qza           regress-xtraining.qza    xclr.qza
classify-xtest.qza           regresstaxa.qza          xcovariates.qza
classify-xtraining.qza       regresstaxa_1.qzv        xtaxa.qza
filtered-table.qza           table-summary.qzv
genus_table.qza              test-targets.qza


In [16]:
import plotly.express as px
import plotly.io as pio
from plotly import graph_objects, express, offline

# Create a sample plot
fig = px.scatter(x=[1, 2, 3], y=[4, 5, 6])

offline.plot(fig, filename="test.html", auto_open=False, image='svg')


'test.html'

In [17]:
ls

Dockerfile  build/  example/    q2_classo.egg-info/  setup.py    test.html
README.md   dist/   q2_classo/  requirements.txt     test..html  tutorial/


## Filter features

In [17]:
# !qiime feature-table filter-features \
#   --i-table atacama-table.qza \
#   --p-min-samples 53 \
#   --o-filtered-table filtered-table.qza

Saved FeatureTable[Frequency] to: filtered-table.qza

## log-contrast and taxa processing

One option is to collapse the table at genus level (`--p-level 6`). This is the 'easy way', but not really what we want:

In [11]:
!qiime taxa collapse \
  --i-table atacama-table.qza \
  --i-taxonomy classification.qza \
  --p-level 6 \
  --o-collapsed-table genus_table.qza

Saved FeatureTable[Frequency] to: genus_table.qza

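As a rough illustration of what collapsing at level 6 means, the sketch below sums feature counts over all features that share the same first six taxonomic ranks. The toy table, the taxonomy strings, the helper name `collapse_table`, and the `__` padding for short annotations are all assumptions made for this example; it is not the `qiime taxa collapse` implementation.

```python
from collections import defaultdict


def collapse_table(counts, taxonomy, level):
    """Sum counts over features that share the first `level` taxonomic ranks.

    counts:   {feature_id: {sample_id: count}}
    taxonomy: {feature_id: "k__...; p__...; ..."}
    """
    collapsed = defaultdict(lambda: defaultdict(float))
    for feature_id, per_sample in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        # Assumption: annotations shorter than `level` are padded with "__".
        ranks = (ranks + ["__"] * level)[:level]
        key = ";".join(ranks)
        for sample_id, value in per_sample.items():
            collapsed[key][sample_id] += value
    return {key: dict(values) for key, values in collapsed.items()}


# Hypothetical toy data: f1 and f2 belong to the same genus, f3 to another.
counts = {
    "f1": {"s1": 3, "s2": 0},
    "f2": {"s1": 1, "s2": 5},
    "f3": {"s1": 2, "s2": 2},
}
taxonomy = {
    "f1": "k__Bacteria; p__A; c__B; o__C; f__D; g__E; s__x",
    "f2": "k__Bacteria; p__A; c__B; o__C; f__D; g__E; s__y",
    "f3": "k__Bacteria; p__A; c__B; o__C; f__D; g__F",
}

genus_table = collapse_table(counts, taxonomy, level=6)
for genus, per_sample in sorted(genus_table.items()):
    print(genus, per_sample)
```

Collapsing discards everything below the chosen rank, which is why the notebook treats it as the 'easy way' rather than the preferred route.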
In [12]:
!qiime classo transform-features \
     --p-transformation clr \
     --p-coef 0.5 \
     --i-features genus_table.qza \
     --o-x genus_table_clr

Saved FeatureTable[Frequency] to: genus_table_clr.qza

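To make the `clr` transformation concrete, here is a minimal sketch of a centered log-ratio transform on a single sample. It assumes that `--p-coef 0.5` is the value substituted for zero (or negative) counts before taking logarithms; the function name `clr_sample` and the toy counts are hypothetical.

```python
import math


def clr_sample(counts, coef=0.5):
    """Centered log-ratio transform of one sample's feature counts."""
    # Assumption: non-positive entries are replaced by `coef` so log() is defined.
    replaced = [c if c > 0 else coef for c in counts]
    logs = [math.log(c) for c in replaced]
    mean_log = sum(logs) / len(logs)
    # Centering: subtract the mean log so the transformed values sum to zero.
    return [value - mean_log for value in logs]


sample = [10, 0, 5, 85]  # hypothetical counts for four features
transformed = clr_sample(sample)
print(transformed)
print(sum(transformed))  # numerically close to zero
```

For strictly positive counts the result does not depend on sequencing depth: multiplying every count by the same factor leaves the transformed values unchanged.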
In [13]:
!qiime feature-table summarize \
    --i-table filtered-table.qza \
    --o-visualization table-summary.qzv

Saved Visualization to: table-summary.qzv

In [14]:
!qiime classo transform-features \
     --p-transformation clr \
     --p-coef 0.5 \
     --i-features atacama-table.qza \
     --o-x xclr

Saved FeatureTable[Frequency] to: xclr.qza

In [10]:
!qiime classo add-taxa \
    --i-features atacama-table.qza \
    --i-taxa classification.qza \
    --o-x xtaxa \
    --o-aweights wtaxa

Saved FeatureTable[Frequency] to: xtaxa.qza
Saved Weights to: wtaxa.qza

In [23]:
!qiime classo regress --help

Usage: qiime classo regress [OPTIONS]

  The function computes the constrainted_sparse_regression vector with
  respect to the formulation of regression that is asked and with respect to
  the model selection parameters given

Inputs:
  --i-features ARTIFACT   Matrix representing the data of the problem
    FeatureTable[Design]                                            [required]
  --i-c ARTIFACT          Constraint matrix, default is the zero-sum
    ConstraintMatrix                                                [optional]
  --i-weights ARTIFACT    Vector of weights for penalization
    Weights                                                         [optional]
Parameters:
  --m-y-file METADATA
  --m-y-column COLUMN  MetadataColumn[Numeric]
                          Vector representing the output of the problem
                              

## Add covariates

In [17]:
!qiime classo add-covariates \
    --i-features xtaxa.qza \
    --i-weights wtaxa.qza \
    --m-covariates-file atacama-sample-metadata.tsv \
    --p-to-add elevation ph toc ec average-soil-relative-humidity average-soil-temperature \
    --p-w-to-add 1. 0.1 0.1 0.1 1. 1. \
    --o-new-features xcovariates \
    --o-new-c ccovariates \
    --o-new-w wcovariates

Saved FeatureTable[Frequency] to: xcovariates.qza
Saved ConstraintMatrix to: ccovariates.qza
Saved Weights to: wcovariates.qza

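The step above produces three outputs: new features, a new constraint matrix, and new weights. One plausible reading, sketched below, is that the covariates are appended as extra columns, the constraint matrix gets zero entries for those columns so that they stay outside the zero-sum constraint, and the weight vector is extended with the values given to `--p-w-to-add`. This is an assumption about the behaviour, not a description of the q2-classo source; the helper `add_covariates` and all numbers are hypothetical.

```python
def add_covariates(X, C, w, covariates, w_new):
    """Append covariate columns to X and extend C and w accordingly.

    X:          list of rows (samples) of feature values
    C:          list of constraint rows, one entry per feature column
    w:          list of penalization weights, one per feature column
    covariates: list of rows (samples) of covariate values
    w_new:      list of weights, one per covariate
    """
    n_cov = len(w_new)
    new_X = [row + cov for row, cov in zip(X, covariates)]
    # Assumption: covariates are unconstrained, hence zeros in every constraint row.
    new_C = [row + [0.0] * n_cov for row in C]
    new_w = w + w_new
    return new_X, new_C, new_w


# Hypothetical toy problem: two samples, two taxa, one covariate.
X = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0]]          # zero-sum constraint over the two taxa
w = [1.0, 1.0]
covariates = [[0.5], [0.7]]
w_new = [0.1]

new_X, new_C, new_w = add_covariates(X, C, w, covariates, w_new)
print(new_X)
print(new_C)
print(new_w)
```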
## Split table

Split the data into training and testing sets:

In [23]:
!qiime sample-classifier split-table \
    --i-table xcovariates.qza \
    --m-metadata-file atacama-sample-metadata.tsv \
    --m-metadata-column average-soil-temperature \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table regress-xtraining \
    --o-test-table regress-xtest \
    --o-training-targets training-targets.qza \
    --o-test-targets test-targets.qza

Saved FeatureTable[Frequency] to: regress-xtraining.qza
Saved FeatureTable[Frequency] to: regress-xtest.qza
Saved SampleData[TrueTargets] to: training-targets.qza
Saved SampleData[TrueTargets] to: test-targets.qza

In [22]:
!qiime sample-classifier split-table \
    --i-table xcovariates.qza \
    --m-metadata-file atacama-sample-metadata.tsv \
    --m-metadata-column vegetation \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table classify-xtraining \
    --o-test-table classify-xtest \
    --o-training-targets training-targets.qza \
    --o-test-targets test-targets.qza

Saved FeatureTable[Frequency] to: classify-xtraining.qza
Saved FeatureTable[Frequency] to: classify-xtest.qza
Saved SampleData[TrueTargets] to: training-targets.qza
Saved SampleData[TrueTargets] to: test-targets.qza

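The idea behind `--p-test-size 0.2` with a fixed `--p-random-state` can be sketched with the standard library alone: shuffle the sample IDs with a seeded generator and hold out 20% of them. This is only an illustration of a reproducible, unstratified split; it will not reproduce the exact partition chosen by `qiime sample-classifier split-table`, and the helper `split_ids` and the sample IDs are hypothetical.

```python
import math
import random


def split_ids(sample_ids, test_size=0.2, random_state=42):
    """Return (train_ids, test_ids) from a seeded, unstratified shuffle."""
    rng = random.Random(random_state)  # seeded, so the split is repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # Assumption: the test set size is rounded up.
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


ids = [f"sample-{i}" for i in range(10)]  # hypothetical sample IDs
train_ids, test_ids = split_ids(ids)
print(len(train_ids), len(test_ids))
```

Calling `split_ids` again with the same seed returns the same partition, which is what makes the training and test tables comparable across runs.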
## Regression task 

Apply classo to the training set to solve the linear regression problem:

In [24]:
!qiime classo regress  \
    --i-features regress-xtraining.qza \
    --i-c ccovariates.qza \
    --i-weights wcovariates.qza \
    --m-y-file atacama-sample-metadata.tsv \
    --m-y-column average-soil-temperature  \
    --p-concomitant \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 1 \
    --p-no-cv-one-se \
    --o-result regresstaxa

Saved CLASSOProblem to: regresstaxa.qza

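The help text of `qiime classo regress` states that the default constraint is the zero-sum one. A small numerical sketch shows why that constraint suits compositional data: when the coefficients sum to zero, a linear model on log-abundances gives the same value for raw counts and for proportions. The coefficient vectors and counts below are made up for illustration and are not outputs of this notebook.

```python
import math


def log_contrast(beta, composition):
    """Linear predictor sum_j beta_j * log(x_j) on strictly positive data."""
    return sum(b * math.log(x) for b, x in zip(beta, composition))


counts = [12.0, 30.0, 58.0]                     # hypothetical raw counts
total = sum(counts)
proportions = [c / total for c in counts]       # same sample, rescaled

beta_zero_sum = [0.7, -0.2, -0.5]               # coefficients sum to zero
beta_free = [0.7, -0.2, 0.0]                    # coefficients sum to 0.5

# With zero-sum coefficients the rescaling cancels out.
diff_zero_sum = log_contrast(beta_zero_sum, counts) - log_contrast(beta_zero_sum, proportions)
# Without the constraint the prediction shifts by sum(beta) * log(total).
diff_free = log_contrast(beta_free, counts) - log_contrast(beta_free, proportions)

print(diff_zero_sum)
print(diff_free)
```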
## Classification task

In [25]:
!qiime classo classify  \
    --i-features classify-xtraining.qza \
    --i-c ccovariates.qza \
    --i-weights wcovariates.qza \
    --m-y-file atacama-sample-metadata.tsv \
    --m-y-column vegetation  \
    --p-huber \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 42 \
    --p-no-cv-one-se \
    --o-result classifytaxa

^C

Aborted!


## Prediction 

In [26]:
!qiime classo predict \
    --i-features regress-xtest.qza \
    --i-problem regresstaxa.qza \
    --o-predictions regress-predictions.qza

Saved CLASSOProblem to: regress-predictions.qza

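Once predictions for the held-out samples are available, a common sanity check for a regression task is the root-mean-square error against the true targets. The sketch below uses made-up numbers, not values from `regress-predictions.qza`, and the helper `rmse` is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out targets and predictions.
y_true = [18.2, 21.5, 16.9, 24.0]
y_pred = [17.8, 22.1, 18.0, 23.1]
print(rmse(y_true, y_pred))
```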
In [21]:
!qiime classo predict \
    --i-features classify-xtest.qza \
    --i-problem classifytaxa.qza \
    --o-predictions classify-predictions.qza

Saved CLASSOProblem to: classify-predictions.qza


## Visualization

In [7]:
pwd

'/data/example/data/atacama'

In [8]:
!qiime classo summarize \
  --i-problem regresstaxa.qza \
  --i-taxa classification.qza \
  --i-predictions regress-predictions.qza \
  --o-visualization regresstaxa_1.qzv

Saved Visualization to: regresstaxa_1.qzv


In [12]:
!qiime classo summarize \
  --i-problem classifytaxa.qza \
  --i-taxa classification.qza \
  --i-predictions classify-predictions.qza \
  --o-visualization classifytaxa.qzv \
  --verbose


Index.__and__ operating as a set operation is deprecated, in the future this will be a logical operation matching Series.__and__.  Use index.intersection(other) instead



In [24]:
!qiime tools view regresstaxa_1.qzv

Usage: qiime tools view [OPTIONS] VISUALIZATION

Error: Visualization viewing is currently not supported in headless environments. You can view Visualizations (and Artifacts) at https://view.qiime2.org, or move the Visualization to an environment with a display and view it with `qiime tools view`.


In [25]:
!qiime tools view classifytaxa.qzv

Usage: qiime tools view [OPTIONS] VISUALIZATION

Error: Visualization viewing is currently not supported in headless environments. You can view Visualizations (and Artifacts) at https://view.qiime2.org, or move the Visualization to an environment with a display and view it with `qiime tools view`.


Alternatively, one can drag and drop a visualization file (for example regresstaxa_1.qzv) onto https://view.qiime2.org.
That viewer also shows the provenance of the file, i.e. the workflow that produced the QIIME 2 artifact.
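In a headless environment it can also help to remember that `.qza` and `.qzv` files are zip archives, so their contents can be listed without a viewer. The sketch below builds a tiny mock archive and lists the entries under a `provenance/` directory. The mock file names and the exact directory layout are assumptions for illustration; inspect a real file before relying on them.

```python
import os
import tempfile
import zipfile


def list_archive(path, subdir="provenance"):
    """Return the sorted archive members located under `<top-level>/<subdir>/`."""
    with zipfile.ZipFile(path) as archive:
        return sorted(name for name in archive.namelist() if f"/{subdir}/" in name)


# Build a small mock archive so the example runs without QIIME 2.
# Assumption: a single top-level directory holding metadata, provenance and data.
tmp_dir = tempfile.mkdtemp()
mock_path = os.path.join(tmp_dir, "mock.qzv")
with zipfile.ZipFile(mock_path, "w") as archive:
    archive.writestr("1234-uuid/metadata.yaml", "placeholder")
    archive.writestr("1234-uuid/provenance/action/action.yaml", "placeholder")
    archive.writestr("1234-uuid/data/index.html", "<html></html>")

provenance_files = list_archive(mock_path)
print(provenance_files)
```

Pointing `list_archive` at a real visualization, for example `list_archive("regresstaxa_1.qzv")`, would list whatever provenance entries that file actually contains.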

In [26]:
print("hello classo!")

hello classo!
