# QIIME 2 classo with Atacama soil microbiome

**environment:** qiime2-2020.2

Use this version of [q2-classo](https://github.com/Leo-Simpson/q2-classo/tree/edbf9d20fed931cbc1cff77f5634bb0dcd783c8a).

In [None]:
pip install --upgrade c-lasso

In [None]:
pip install c-lasso

In [None]:
pip install zarr

In [None]:
pip install plotly

In [None]:
cd ..

In [None]:
!python setup.py install

In [None]:
!pip install -e .

In [None]:
!qiime dev refresh-cache

In [9]:
cd example/data/atacama/

/data/example/data/atacama


## Filter features

In [17]:
!qiime feature-table filter-features \
  --i-table atacama-table.qza \
  --p-min-samples 53 \
  --o-filtered-table filtered-table.qza

Saved FeatureTable[Frequency] to: filtered-table.qza
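The `--p-min-samples 53` filter keeps only the features observed (nonzero count) in at least 53 samples. Below is a minimal NumPy sketch of that rule on a samples × features count matrix. The helper `filter_min_samples` and the toy matrix are hypothetical and only illustrate the rule.

```python
import numpy as np

def filter_min_samples(counts: np.ndarray, min_samples: int) -> np.ndarray:
    """Boolean mask of features (columns) observed in >= min_samples samples.

    `counts` is a samples x features matrix of read counts.
    """
    # Number of samples in which each feature has a nonzero count
    prevalence = (counts > 0).sum(axis=0)
    return prevalence >= min_samples

# Toy table: 4 samples x 3 features
counts = np.array([
    [5, 0, 1],
    [3, 0, 0],
    [0, 2, 4],
    [7, 0, 9],
])
mask = filter_min_samples(counts, min_samples=3)
filtered = counts[:, mask]
print(mask.tolist())    # [True, False, True]
print(filtered.shape)   # (4, 2)
```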

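## Log-contrast and taxa processing

One option is to collapse the table at genus level. This is the easy way, but not really what we want, because it fixes a single taxonomic level in advance. The rest of this notebook instead keeps every feature, applies a centered log-ratio (CLR) transform, and adds the taxonomy with `classo add-taxa`.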

In [14]:
!qiime taxa collapse \
  --i-table atacama-table.qza \
  --i-taxonomy classification.qza \
  --p-level 6 \
  --o-collapsed-table genus_table.qza

Saved FeatureTable[Frequency] to: genus_table.qza
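`qiime taxa collapse --p-level 6` merges all features whose lineage agrees down to the sixth rank (genus, for kingdom-to-species taxonomy strings) and sums their counts. The pure-Python sketch below shows that idea. The helper `collapse_counts` and the toy lineages are hypothetical. The sketch also ignores edge cases the real plugin handles, such as lineages with fewer ranks than the requested level.

```python
from collections import defaultdict

def collapse_counts(table, taxonomy, level):
    """Sum feature counts per taxon, truncating each lineage to `level` ranks.

    table:    {feature_id: {sample_id: count}}
    taxonomy: {feature_id: "k__...; p__...; ..."}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature, sample_counts in table.items():
        ranks = [r.strip() for r in taxonomy[feature].split(";")]
        taxon = ";".join(ranks[:level])  # lineage truncated to the requested level
        for sample, count in sample_counts.items():
            collapsed[taxon][sample] += count
    # Convert nested defaultdicts to plain dicts
    return {taxon: dict(counts) for taxon, counts in collapsed.items()}

taxonomy = {
    "asv1": "k__Bacteria; p__Actinobacteria; c__C1; o__O1; f__F1; g__G1; s__S1",
    "asv2": "k__Bacteria; p__Actinobacteria; c__C1; o__O1; f__F1; g__G1; s__S2",
    "asv3": "k__Bacteria; p__Firmicutes; c__C2; o__O2; f__F2; g__G2; s__S3",
}
table = {
    "asv1": {"s1": 10, "s2": 0},
    "asv2": {"s1": 5, "s2": 3},
    "asv3": {"s1": 1, "s2": 8},
}
genus = collapse_counts(table, taxonomy, level=6)
for taxon, counts in genus.items():
    print(taxon.split(";")[-1], counts)
# g__G1 {'s1': 15, 's2': 3}
# g__G2 {'s1': 1, 's2': 8}
```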


In [15]:
!qiime classo transform-features \
     --p-transformation clr \
     --p-coef 0.5 \
     --i-features genus_table.qza \
     --o-x genus_table_clr

Saved FeatureTable[Design] to: genus_table_clr.qza
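`--p-transformation clr` applies the centered log-ratio transform. Each sample is log-transformed and then centered: clr(x)_j = log x_j - mean_k log x_k. Zeros must be handled before taking logs. The NumPy sketch below assumes `--p-coef 0.5` acts as a pseudo-count that replaces zero counts; this is our reading, not confirmed plugin behaviour. The sketch also checks two properties that suit CLR features to log-contrast models:

- each row sums to zero;
- rescaling a sample leaves its CLR values unchanged.

```python
import numpy as np

def clr(counts, pseudo_count=0.5):
    """Centered log-ratio transform of a samples x features count matrix.

    Assumption: zero counts are replaced by `pseudo_count` before taking logs
    (our reading of `--p-coef`, not confirmed q2-classo behaviour).
    """
    x = np.asarray(counts).astype(float)  # astype returns a copy
    x[x == 0] = pseudo_count
    log_x = np.log(x)
    # Subtract each sample's mean log-abundance (row-wise centering)
    return log_x - log_x.mean(axis=1, keepdims=True)

counts = np.array([[10, 0, 30], [1, 4, 0]])
z = clr(counts)
print(np.allclose(z.sum(axis=1), 0))  # True: every CLR row sums to zero

# Invariance to per-sample scaling (e.g. sequencing depth); exact only when
# no zeros are replaced, since the pseudo-count itself is not rescaled
dense = np.array([[10, 20, 30], [1, 4, 2]])
print(np.allclose(clr(7 * dense), clr(dense)))  # True
```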


In [16]:
!qiime feature-table summarize \
    --i-table filtered-table.qza \
    --o-visualization table-summary.qzv

Saved Visualization to: table-summary.qzv


In [17]:
!qiime classo transform-features \
     --p-transformation clr \
     --p-coef 0.5 \
     --i-features atacama-table.qza \
     --o-x xclr

Saved FeatureTable[Design] to: xclr.qza


In [21]:
!qiime classo add-taxa \
    --i-features xclr.qza \
    --i-taxa classification.qza \
    --o-x xtaxa \
    --o-aweights wtaxa

Saved FeatureTable[Design] to: xtaxa.qza
Saved Weights to: wtaxa.qza
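`classo add-taxa` combines the CLR features with the taxonomy, so the model can select whole taxa as well as single features. This notebook does not show the plugin's exact construction. The sketch below assumes one common approach from tree-aggregated log-contrast models:

- build a binary incidence matrix `A` (features × taxa), with `A[j, t] = 1` when feature `j` belongs to taxon `t` at some rank;
- use `X @ A` as the new design.

The helper `taxa_incidence` and the toy lineages are hypothetical. The sketch does not attempt the penalty weights written to `wtaxa.qza`.

```python
import numpy as np

def taxa_incidence(lineages):
    """Features x taxa 0/1 matrix: A[j, t] = 1 if feature j lies in taxon t.

    lineages: one list of ranks per feature, e.g. ["k__Bacteria", "p__X", "g__Y"].
    Every lineage prefix (kingdom, kingdom;phylum, ...) becomes one taxon column.
    """
    index = {}
    for ranks in lineages:
        for depth in range(1, len(ranks) + 1):
            # New prefixes get the next free column number
            index.setdefault(";".join(ranks[:depth]), len(index))
    A = np.zeros((len(lineages), len(index)))
    for j, ranks in enumerate(lineages):
        for depth in range(1, len(ranks) + 1):
            A[j, index[";".join(ranks[:depth])]] = 1.0
    return A, list(index)

lineages = [
    ["k__Bacteria", "p__Actinobacteria", "g__G1"],
    ["k__Bacteria", "p__Actinobacteria", "g__G2"],
    ["k__Bacteria", "p__Firmicutes", "g__G3"],
]
A, taxa = taxa_incidence(lineages)
X = np.array([[0.5, -1.0, 0.5], [2.0, -0.5, -1.5]])  # toy CLR features
X_taxa = X @ A  # each taxon column sums its member features
print(A.shape, X_taxa.shape)  # (3, 6) (2, 6)
print(X_taxa[:, taxa.index("k__Bacteria;p__Actinobacteria")].tolist())  # [-0.5, 1.5]
```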


## Add covariates

In [22]:
!qiime classo add-covariates \
    --i-features xtaxa.qza \
    --i-weights wtaxa.qza \
    --m-covariates-file atacama-sample-metadata.tsv \
    --p-to-add elevation ph toc ec average-soil-relative-humidity average-soil-temperature \
    --p-w-to-add 1. 0.1 0.1 0.1 1. 1. \
    --o-new-features xcovariates \
    --o-new-c ccovariates \
    --o-new-w wcovariates

Saved FeatureTable[Design] to: xcovariates.qza
Saved ConstraintMatrix to: ccovariates.qza
Saved Weights to: wcovariates.qza
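`add-covariates` does three things:

- appends the metadata columns listed in `--p-to-add` to the design matrix;
- extends the penalty weights with the values from `--p-w-to-add`;
- returns an updated constraint matrix.

Covariates such as elevation or pH are not compositional. The NumPy sketch below therefore assumes the zero-sum constraint is applied only to the compositional columns, i.e. the covariates get zero columns in `C`. The helper `append_covariates` and all values are hypothetical.

```python
import numpy as np

def append_covariates(X, C, w, covariates, cov_weights):
    """Append covariate columns to a design matrix, its constraints and weights.

    X: n x d features, C: k x d constraint matrix, w: d penalty weights,
    covariates: n x m matrix, cov_weights: m penalty weights.
    Assumption: covariates are excluded from the constraints (zero columns in C).
    """
    m = covariates.shape[1]
    X_new = np.hstack([X, covariates])
    C_new = np.hstack([C, np.zeros((C.shape[0], m))])
    w_new = np.concatenate([w, cov_weights])
    return X_new, C_new, w_new

X = np.array([[0.5, -1.0, 0.5], [2.0, -0.5, -1.5]])  # toy CLR features
C = np.ones((1, 3))                                    # zero-sum constraint on the features
w = np.ones(3)
cov = np.array([[3000.0, 7.1], [1200.0, 8.0]])         # toy values for two covariates
X_new, C_new, w_new = append_covariates(X, C, w, cov, np.array([1.0, 0.1]))
print(C_new.tolist())  # [[1.0, 1.0, 1.0, 0.0, 0.0]]
print(w_new.tolist())  # [1.0, 1.0, 1.0, 1.0, 0.1]

# Any coefficient vector whose feature part sums to zero is feasible,
# whatever the covariate coefficients are
beta = np.array([1.0, -0.5, -0.5, 0.002, 0.3])
print((C_new @ beta).tolist())  # [0.0]
```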


## Split table

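Split the data into training and test sets. The same design matrix is split twice, once for each target column: `average-soil-temperature` for regression and `vegetation` for classification.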

In [25]:
!qiime sample-classifier split-table \
    --i-table xcovariates.qza \
    --m-metadata-file atacama-sample-metadata.tsv \
    --m-metadata-column average-soil-temperature \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table regress-xtraining \
    --o-test-table regress-xtest

# In newer QIIME 2 releases, also add these outputs:
    # --o-training-targets regress-training-targets.qza \
    # --o-test-targets regress-test-targets.qza

Saved FeatureTable[Design] to: regress-xtraining.qza
Saved FeatureTable[Design] to: regress-xtest.qza


In [26]:
!qiime sample-classifier split-table \
    --i-table xcovariates.qza \
    --m-metadata-file atacama-sample-metadata.tsv \
    --m-metadata-column vegetation \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table classify-xtraining \
    --o-test-table classify-xtest

# In newer QIIME 2 releases, also add these outputs:
    # --o-training-targets classify-training-targets.qza \
    # --o-test-targets classify-test-targets.qza

Saved FeatureTable[Design] to: classify-xtraining.qza
Saved FeatureTable[Design] to: classify-xtest.qza
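`--p-test-size 0.2` holds out 20% of the samples for testing, and `--p-random-state 42` fixes the random draw, so re-running a cell reproduces the same split. Below is a NumPy sketch of such a seeded split without stratification (as with `--p-stratify False`). The helper `train_test_indices` is hypothetical. q2-sample-classifier uses its own implementation, so the samples it picks will differ from this sketch.

```python
import numpy as np

def train_test_indices(n_samples, test_size=0.2, random_state=42):
    """Seeded random split of sample indices into (train, test), without stratification."""
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_samples)             # shuffled indices 0..n_samples-1
    n_test = int(np.ceil(test_size * n_samples))  # round the test share up
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])

train, test = train_test_indices(50)
print(len(train), len(test))  # 40 10

# Same seed, same split
again_train, again_test = train_test_indices(50)
print(np.array_equal(train, again_train))  # True
```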


## Regression task

Apply classo to the training set to solve the constrained linear regression problem, with `average-soil-temperature` as the response:

In [27]:
!qiime classo regress  \
    --i-features regress-xtraining.qza \
    --i-c ccovariates.qza \
    --i-weights wcovariates.qza \
    --m-y-file atacama-sample-metadata.tsv \
    --m-y-column average-soil-temperature  \
    --p-concomitant \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 1 \
    --p-no-cv-one-se \
    --o-result regresstaxa

Saved CLASSOProblem to: regresstaxa.qza
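For reference, `--p-concomitant` selects the concomitant (scaled) Lasso, formulation R3 in the c-lasso documentation. This matches the `_R3` suffix of the regression summary. In the problem below:

- X is the training design and C the constraint matrix (`ccovariates.qza`);
- sigma is the noise level, estimated jointly with beta;
- w_j are the penalty weights (`wcovariates.qza`).

Writing the weights into the l1 penalty this way is our reading of how they enter. The last line is a worked step: minimising over sigma for fixed beta.

```latex
\min_{\beta \in \mathbb{R}^d,\ \sigma > 0} \;
  \frac{\lVert y - X\beta \rVert_2^2}{\sigma} + \frac{n}{2}\,\sigma
  + \lambda \sum_{j=1}^{d} w_j \lvert \beta_j \rvert
  \quad \text{subject to} \quad C\beta = 0

% For fixed beta, setting the derivative in sigma to zero,
% -\lVert y - X\beta \rVert_2^2 / \sigma^2 + n/2 = 0, gives
\hat\sigma(\beta) = \sqrt{\tfrac{2}{n}}\;\lVert y - X\beta \rVert_2
```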


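## Classification task

Apply classo to the training set to solve the classification problem, with `vegetation` as the class label: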

In [28]:
!qiime classo classify  \
    --i-features classify-xtraining.qza \
    --i-c ccovariates.qza \
    --i-weights wcovariates.qza \
    --m-y-file atacama-sample-metadata.tsv \
    --m-y-column vegetation  \
    --p-huber \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 42 \
    --p-no-cv-one-se \
    --o-result classifytaxa

Saved CLASSOProblem to: classifytaxa.qza
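In the classification task, `--p-huber` selects c-lasso's Huberized squared hinge loss (formulation C2), matching the `_C2` suffix of the classification summary. The setup, as we read the c-lasso documentation:

- labels are coded y_i in {-1, +1};
- the loss is applied to each margin r = y_i x_i·beta and summed over samples;
- a weighted l1 penalty and the constraint C beta = 0 are added.

Below is a NumPy sketch of the loss itself. The threshold `rho=-1.0` in the example is only an illustrative value.

```python
import numpy as np

def huberized_squared_hinge(r, rho):
    """Huberized squared hinge loss on margins r = y * (x @ beta).

    Squared hinge (1 - r)_+^2 for r >= rho; linear continuation below rho,
    matching both value and slope at rho.
    """
    r = np.asarray(r, dtype=float)
    hinge_at_rho = max(1.0 - rho, 0.0)
    quadratic = np.maximum(1.0 - r, 0.0) ** 2
    linear = hinge_at_rho ** 2 + 2.0 * hinge_at_rho * (rho - r)
    return np.where(r >= rho, quadratic, linear)

margins = np.array([-3.0, -1.0, 0.0, 2.0])
print(huberized_squared_hinge(margins, rho=-1.0).tolist())  # [12.0, 4.0, 1.0, 0.0]
```

Replacing the quadratic by a linear tail below `rho` limits the influence of badly misclassified samples, which is the usual motivation for Huberization.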


## Prediction

Apply each fitted model to its held-out test set:

In [29]:
!qiime classo predict \
    --i-features regress-xtest.qza \
    --i-problem regresstaxa.qza \
    --o-predictions regress-predictions.qza

Saved CLASSOProblem to: regress-predictions.qza


In [30]:
!qiime classo predict \
    --i-features classify-xtest.qza \
    --i-problem classifytaxa.qza \
    --o-predictions classify-predictions.qza

Saved CLASSOProblem to: classify-predictions.qza
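Conceptually, prediction applies a fitted coefficient vector to the held-out design:

- regression uses the linear predictor `X_test @ beta`;
- classification, with ±1-coded labels, uses the sign of that predictor.

The NumPy sketch below uses toy numbers and assumes no intercept. It does not show which fitted solution `classo predict` uses (cross-validation, stability selection or fixed lambda). The evaluation metrics are our own choice.

```python
import numpy as np

# Toy held-out design (2 samples x 3 columns) and fitted coefficients
X_test = np.array([[1.0, -1.0, 2.0], [-2.0, 2.0, 0.0]])
beta = np.array([1.5, -1.5, 0.5])

# Regression: linear predictor and root-mean-square error against the truth
y_test = np.array([3.0, -6.0])
y_hat = X_test @ beta
rmse = float(np.sqrt(np.mean((y_hat - y_test) ** 2)))

# Classification: labels coded -1/+1, predicted by the sign of the score
labels = np.array([1, -1])
pred_labels = np.where(X_test @ beta >= 0, 1, -1)
accuracy = float(np.mean(pred_labels == labels))

print(y_hat.tolist())  # [4.0, -6.0]
print(accuracy)        # 1.0
```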


## Visualization

In [43]:
!qiime classo summarize \
  --i-problem regresstaxa.qza \
  --i-taxa classification.qza \
  --i-predictions regress-predictions.qza \
  --o-visualization regresstaxa_R3.qzv

Saved Visualization to: regresstaxa_R3.qzv


In [44]:
!qiime classo summarize \
  --i-problem classifytaxa.qza \
  --i-taxa classification.qza \
  --i-predictions classify-predictions.qza \
  --o-visualization classifytaxa_C2.qzv

Saved Visualization to: classifytaxa_C2.qzv


Drag and drop the .qzv files onto https://view.qiime2.org to view them. QIIME 2 View also shows the provenance of each artifact, so you can trace the full workflow that produced it.
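QIIME 2 artifacts (`.qza`) and visualizations (`.qzv`) are ZIP archives, and their `provenance/` directory records the actions that produced them. That record is what the provenance view displays. The stdlib sketch below lists those provenance entries without installing anything. It assumes the layout `<uuid>/provenance/...` inside the archive. The helper `provenance_files` and the toy in-memory archive are hypothetical. `qiime tools peek` and the QIIME 2 Python API remain the supported ways to inspect artifacts.

```python
import io
import zipfile

def provenance_files(source):
    """List provenance entries inside a .qza/.qzv archive (path or file-like).

    Assumes QIIME 2's layout: <uuid>/provenance/... inside the ZIP file.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(
            name for name in archive.namelist()
            if "/provenance/" in name and not name.endswith("/")
        )

# Demo on a tiny in-memory archive mimicking that layout (toy UUID and contents)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("1234/metadata.yaml", "uuid: 1234\n")
    archive.writestr("1234/provenance/action/action.yaml", "action: summarize\n")
buffer.seek(0)
demo = provenance_files(buffer)
print(demo)  # ['1234/provenance/action/action.yaml']
```

On a real file, call for example `provenance_files("regresstaxa_R3.qzv")`.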