# QIIME 2 Tutorial: Machine Learning

## Setup

## Predicting categorical data
### Training and testing a classifier

In [None]:
! qiime sample-classifier classify-samples \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-metadata-column body-site \
    --p-estimator RandomForestClassifier \
    --p-random-state 123 \
    --output-dir rf_classifier
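
Before fitting, `classify-samples` holds out part of the samples as a test set (the fraction is set with `--p-test-size`), trains on the rest, and evaluates the model on the held-out samples. The sketch below illustrates a class-stratified split in plain Python. The function name `stratified_split`, the toy labels, and the assumption that the split is stratified by class are ours; this is not q2-sample-classifier's implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=123):
    """Split sample IDs into train/test sets, keeping each class's proportion.

    Hypothetical helper. labels: dict mapping sample ID -> class label
    (e.g. body site). Returns (train_ids, test_ids) as sorted lists.
    """
    rng = random.Random(seed)  # fixed seed, similar in spirit to --p-random-state
    by_class = defaultdict(list)
    for sample_id, label in sorted(labels.items()):
        by_class[label].append(sample_id)

    train, test = [], []
    for label, ids in sorted(by_class.items()):
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_size))  # at least one test sample per class
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return sorted(train), sorted(test)


# Hypothetical toy metadata: 10 samples from two body sites.
labels = {f"s{i}": ("gut" if i < 5 else "tongue") for i in range(10)}
train_ids, test_ids = stratified_split(labels)
print(len(train_ids), len(test_ids))
```

Stratifying keeps every class represented in the test set, which matters when some classes have only a few samples.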

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/predictions.qza \
    --o-visualization rf_classifier/predictions.qzv

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/probabilities.qza \
    --o-visualization rf_classifier/probabilities.qzv
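
`probabilities.qza` holds, for each sample, the model's probability for every class. For a scikit-learn random forest the predicted label is the class with the highest probability, and that probability gives a rough sense of how confident the call is. The helper `predicted_class` and the numbers below are hypothetical.

```python
def predicted_class(prob_row):
    """Return the class with the highest probability for one sample.

    prob_row: dict mapping class label -> predicted probability.
    Labels are scanned in sorted order, so ties go to the alphabetically first label.
    """
    return max(sorted(prob_row), key=lambda label: prob_row[label])


# Hypothetical probability rows for two samples (each row sums to 1).
probabilities = {
    "sample-1": {"gut": 0.80, "left palm": 0.05, "right palm": 0.05, "tongue": 0.10},
    "sample-2": {"gut": 0.10, "left palm": 0.45, "right palm": 0.40, "tongue": 0.05},
}

for sample_id, row in probabilities.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9  # sanity check: probabilities sum to 1
    print(sample_id, predicted_class(row), max(row.values()))
```

A sample like `sample-2`, where two classes have similar probabilities, is one the model finds hard to place.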

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/test_targets.qza \
    --m-input-file rf_classifier/predictions.qza \
    --o-visualization rf_classifier/test_targets_predictions.qzv
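
Passing two `--m-input-file` options makes `metadata tabulate` merge the files on sample ID, so each row shows a sample's true label next to its predicted label. As far as we know, QIIME 2 merges metadata with an inner join, keeping only IDs present in every file. The sketch below mimics that join and computes overall accuracy; the helper names and data are hypothetical.

```python
def join_on_sample_id(truth, predictions):
    """Inner-join two {sample_id: value} mappings into {sample_id: (true, predicted)}."""
    shared = sorted(set(truth) & set(predictions))
    return {sid: (truth[sid], predictions[sid]) for sid in shared}


def accuracy(pairs):
    """Fraction of samples whose prediction matches the true label."""
    if not pairs:
        raise ValueError("no samples to score")
    correct = sum(1 for true, pred in pairs.values() if true == pred)
    return correct / len(pairs)


# Hypothetical values standing in for test_targets.qza and predictions.qza.
test_targets = {"s1": "gut", "s2": "tongue", "s3": "left palm", "s4": "right palm"}
predictions = {"s1": "gut", "s2": "tongue", "s3": "right palm", "s4": "right palm", "s9": "gut"}

merged = join_on_sample_id(test_targets, predictions)
print(accuracy(merged))  # 3 of 4 shared samples match
```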

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/feature_importance.qza \
    --o-visualization rf_classifier/feature_importance.qzv

### Feature selection

In [None]:
! qiime sample-classifier classify-samples \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-metadata-column body-site \
    --p-optimize-feature-selection \
    --p-parameter-tuning \
    --p-estimator RandomForestClassifier \
    --p-n-estimators 20 \
    --p-random-state 123 \
    --output-dir rf_opt_classifier
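
With `--p-optimize-feature-selection`, the classifier performs recursive feature elimination (RFE) with cross-validation: it repeatedly refits the model, drops the least important features, and keeps the feature subset that scored best. The sketch below only reproduces the sequence of subset sizes such a run would evaluate, assuming scikit-learn's rule for a fractional step (each round drops a fixed fraction of the *initial* feature count, at least one feature). The step value 0.05 is an example; we believe it corresponds to the plugin's `--p-step` option, but check `qiime sample-classifier classify-samples --help` to confirm.

```python
def rfe_subset_sizes(n_features, step=0.05, min_features=1):
    """Feature-subset sizes visited by recursive feature elimination (RFE).

    Hypothetical helper following scikit-learn's convention for a fractional
    step: the number of features dropped per round is a fraction of the
    initial feature count (at least one), and elimination stops at min_features.
    Model refitting and scoring are deliberately left out.
    """
    drop = max(1, int(step * n_features))
    sizes = [n_features]
    current = n_features
    while current > min_features:
        current -= min(drop, current - min_features)  # never go below min_features
        sizes.append(current)
    return sizes


sizes = rfe_subset_sizes(100, step=0.05)
print(len(sizes), sizes[:4], sizes[-2:])
```

The number of refits grows with `1 / step`, which is one reason feature selection makes this command noticeably slower than the plain classifier.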

In [None]:
! qiime feature-table filter-features \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --m-metadata-file rf_opt_classifier/feature_importance.qza \
    --o-filtered-table rf_opt_classifier/important_feature_table.qza
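
When `filter-features` receives a metadata file, it keeps only the features whose IDs appear in that file. Assuming the optimized model's `feature_importance.qza` lists only the features retained by feature selection, the result is a table restricted to those features. A minimal sketch of that ID-based filter, on a hypothetical table stored as nested dictionaries:

```python
def filter_features(table, keep_ids):
    """Keep only the features (rows) whose IDs appear in keep_ids.

    Hypothetical helper. table: dict mapping feature ID -> {sample ID: count}.
    """
    keep = set(keep_ids)
    return {fid: counts for fid, counts in table.items() if fid in keep}


# Hypothetical feature table and the feature IDs listed in feature_importance.qza.
table = {
    "feat-A": {"s1": 10, "s2": 0},
    "feat-B": {"s1": 3, "s2": 7},
    "feat-C": {"s1": 0, "s2": 12},
}
important_ids = ["feat-A", "feat-C"]

filtered = filter_features(table, important_ids)
print(sorted(filtered))
```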

In [None]:
! qiime sample-classifier heatmap \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --i-importance rf_opt_classifier/feature_importance.qza \
    --m-sample-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-sample-metadata-column body-site \
    --p-group-samples \
    --p-feature-count 30 \
    --o-filtered-table rf_opt_classifier/important_feature_table_top_30.qza \
    --o-heatmap rf_opt_classifier/important_feature_heatmap.qzv
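
`--p-feature-count 30` limits the heatmap, and the table written by `--o-filtered-table`, to the 30 most important features. The ranking step amounts to sorting features by importance score; the sketch below does this for a hypothetical importance table with `n=3`. Breaking ties by feature ID is our own choice for determinism.

```python
def top_features(importance, n):
    """Return the IDs of the n most important features, highest score first.

    Hypothetical helper. Ties are broken by feature ID so the order is deterministic.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [fid for fid, _ in ranked[:n]]


# Hypothetical importance scores (as tabulated from feature_importance.qza).
importance = {"feat-A": 0.12, "feat-B": 0.30, "feat-C": 0.05, "feat-D": 0.30}
print(top_features(importance, 3))
```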

**Note:** The model we trained here is a toy example containing very few samples from a single study and will probably not be useful for predicting other unknown samples. But if you have samples from one of these body sites, it could be a fun exercise to give it a spin!

## Predicting continuous data

1. Predict `days-since-experiment-start` in the moving pictures dataset
2. Predict `month` in the ECAM dataset

### Moving pictures dataset

In [None]:
! qiime sample-classifier regress-samples \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-metadata-column days-since-experiment-start \
    --p-estimator RandomForestRegressor \
    --output-dir mp_regressor \
    --verbose
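
`regress-samples` works like `classify-samples` but predicts a numeric column, so accuracy is measured as a distance between predicted and true values rather than as a fraction of correct labels. Two common measures are mean squared error (MSE) and the coefficient of determination (R²). The sketch below computes both on hypothetical values; it is not QIIME 2's own reporting code.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical days-since-experiment-start values and model predictions.
true_days = [0, 10, 20, 30]
pred_days = [2, 9, 22, 27]
print(mean_squared_error(true_days, pred_days), r_squared(true_days, pred_days))
```

MSE is in squared units of the target (here, days²), whereas R² is unitless, which makes R² easier to compare across targets such as days and months.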

### ECAM dataset

In [None]:
! qiime sample-classifier regress-samples \
    --i-table data/ecam/ecam_table.qza \
    --m-metadata-file data/ecam/ecam_metadata.tsv \
    --m-metadata-column month \
    --p-estimator RandomForestRegressor \
    --output-dir ecam_regressor \
    --verbose

## Nested cross-validation

In [None]:
! qiime sample-classifier classify-samples-ncv \
    --i-table data/moving_pictures/moving_pictures_table.qza \
    --m-metadata-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-metadata-column body-site \
    --p-estimator RandomForestClassifier \
    --p-random-state 123 \
    --output-dir moving_pictures_ncv
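
In nested cross-validation the samples are split into *k* folds (the number is set with `--p-cv`, as we understand the plugin's options). Each fold is held out once and predicted by a model trained on the remaining folds, so every sample receives exactly one prediction from a model that never saw it; an inner cross-validation loop inside each training set handles model tuning. The real plugin may stratify or shuffle the folds; for brevity the sketch below uses a plain, unshuffled split of the outer loop only, with hypothetical helper names.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k contiguous, near-equal folds."""
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def outer_loop(n_samples, k=5):
    """Yield (train, test) index lists for the outer loop of nested CV.

    In full nested CV, each training set would be split again by an inner
    cross-validation loop to tune the model before predicting the test fold.
    """
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


predicted_once = [0] * 12
for train, test in outer_loop(12, k=5):
    assert not set(train) & set(test)  # a sample is never in both train and test of one fold
    for idx in test:
        predicted_once[idx] += 1
print(predicted_once)
```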

In [None]:
! qiime sample-classifier confusion-matrix \
    --i-predictions moving_pictures_ncv/predictions.qza \
    --i-probabilities moving_pictures_ncv/probabilities.qza \
    --m-truth-file data/moving_pictures/moving_pictures_metadata.tsv \
    --m-truth-column body-site \
    --o-visualization moving_pictures_ncv/ncv_confusion_matrix.qzv
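
A confusion matrix counts how often each true class was assigned to each predicted class: correct predictions fall on the diagonal, and off-diagonal cells show which classes the model mixes up (in the toy data below, one left-palm sample is called right palm). In this sketch rows are true labels and columns are predicted labels; QIIME 2's visualization may orient or normalize the matrix differently. The helper and data are hypothetical.

```python
from collections import Counter


def confusion_matrix(truth, predictions, labels):
    """Count (true label, predicted label) pairs for samples present in both mappings.

    Returns a list of rows: row i is true label labels[i], column j is predicted labels[j].
    """
    pairs = Counter(
        (truth[sid], predictions[sid]) for sid in truth if sid in predictions
    )
    return [[pairs[(t, p)] for p in labels] for t in labels]


labels = ["gut", "left palm", "right palm", "tongue"]
truth = {"s1": "gut", "s2": "left palm", "s3": "left palm", "s4": "right palm", "s5": "tongue"}
predictions = {"s1": "gut", "s2": "left palm", "s3": "right palm", "s4": "right palm", "s5": "tongue"}

for label, row in zip(labels, confusion_matrix(truth, predictions, labels)):
    print(f"{label:>10} {row}")
```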

In [None]:
! qiime sample-classifier regress-samples-ncv \
    --i-table data/ecam/ecam_table.qza \
    --m-metadata-file data/ecam/ecam_metadata.tsv \
    --m-metadata-column month \
    --p-estimator RandomForestRegressor \
    --p-random-state 123 \
    --output-dir ecam_ncv

In [None]:
! qiime sample-classifier scatterplot \
    --i-predictions ecam_ncv/predictions.qza \
    --m-truth-file data/ecam/ecam_metadata.tsv \
    --m-truth-column month \
    --o-visualization ecam_ncv/ecam_scatterp.qzv
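
The scatterplot places each sample's true value against its predicted value. For a perfect model the points lie on the line y = x (slope 1, intercept 0), so fitting a least-squares line is a quick way to summarize systematic bias: a slope below 1 means the model compresses the range, overestimating low values and underestimating high ones. The sketch below uses hypothetical ages in months; we make no claim that the QIIME 2 visualization draws this particular line.

```python
def least_squares_fit(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical true ages (months) and model predictions.
true_month = [0, 6, 12, 18, 24]
pred_month = [2, 7, 11, 16, 20]
slope, intercept = least_squares_fit(true_month, pred_month)
print(round(slope, 3), round(intercept, 3))
```

Here the fitted line predicts about 2.2 months at a true age of 0 and about 20.2 months at 24, the compressed-range pattern described above.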