# QIIME 2 Tutorial: Machine Learning

## Setup

## Predicting categorical data

### Training and testing a classifier

In [None]:
! qiime sample-classifier classify-samples \
    --i-table data/table.qza \
    --m-metadata-file data/sample_metadata.tsv \
    --m-metadata-column body-site \
    --p-estimator RandomForestClassifier \
    --p-random-state 0 \
    --output-dir rf_classifier
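
The command above holds out part of the samples as a test set before training, and `--p-random-state` makes that split reproducible. The plain-Python sketch below only illustrates the idea of a seeded, stratified hold-out split. It is not the plugin's implementation: the `stratified_split` helper, the sample IDs, and the labels are hypothetical, and the 0.2 test fraction is an assumption chosen to mirror what I believe is the plugin default.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Split sample IDs into train/test lists, keeping class proportions similar.

    labels: dict mapping sample ID -> class label (hypothetical data).
    """
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    by_class = defaultdict(list)
    for sample_id, label in sorted(labels.items()):
        by_class[label].append(sample_id)

    train, test = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        # Hold out at least one sample per class.
        n_test = max(1, round(len(ids) * test_size))
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return sorted(train), sorted(test)


# Hypothetical sample IDs and body sites, for illustration only.
sites = ["gut", "tongue", "left palm", "right palm"] * 5
labels = {f"S{i}": site for i, site in enumerate(sites)}

train_ids, test_ids = stratified_split(labels, test_size=0.2, seed=0)
print(len(train_ids), len(test_ids))
```

Re-running with the same `seed` returns the same split, which is why fixing the random state matters when comparing runs.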

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/predictions.qza \
    --o-visualization rf_classifier/predictions.qzv

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/probabilities.qza \
    --o-visualization rf_classifier/probabilities.qzv

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/test_targets.qza \
    --m-input-file rf_classifier/predictions.qza \
    --o-visualization rf_classifier/test_targets_predictions.qzv
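
Tabulating the test targets next to the predictions makes it easy to see where the classifier was right or wrong. As a rough sketch of how such a merged table could be summarized, the snippet below computes overall accuracy and confusion counts from (true, predicted) pairs. The `summarize_predictions` helper and the example pairs are hypothetical and are not results from this tutorial's data.

```python
from collections import Counter


def summarize_predictions(pairs):
    """Return overall accuracy and a Counter of (true, predicted) label pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to summarize")
    confusion = Counter(pairs)
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(pairs), confusion


# Hypothetical (true, predicted) body-site pairs, for illustration only.
pairs = [
    ("gut", "gut"),
    ("gut", "gut"),
    ("tongue", "tongue"),
    ("left palm", "right palm"),
    ("right palm", "right palm"),
]

accuracy, confusion = summarize_predictions(pairs)
print(f"accuracy = {accuracy:.2f}")
for (true, pred), n in sorted(confusion.items()):
    print(f"{true} -> {pred}: {n}")
```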

In [None]:
! qiime metadata tabulate \
    --m-input-file rf_classifier/feature_importance.qza \
    --o-visualization rf_classifier/feature_importance.qzv

### Feature selection

In [None]:
! qiime sample-classifier classify-samples \
    --i-table data/table.qza \
    --m-metadata-file data/sample_metadata.tsv \
    --m-metadata-column body-site \
    --p-optimize-feature-selection \
    --p-parameter-tuning \
    --p-estimator RandomForestClassifier \
    --p-n-estimators 20 \
    --p-random-state 123 \
    --output-dir rf_opt_classifier

In [None]:
! qiime feature-table filter-features \
    --i-table data/table.qza \
    --m-metadata-file rf_opt_classifier/feature_importance.qza \
    --o-filtered-table rf_opt_classifier/important_feature_table.qza

In [None]:
! qiime sample-classifier heatmap \
    --i-table data/table.qza \
    --i-importance rf_opt_classifier/feature_importance.qza \
    --m-sample-metadata-file data/sample_metadata.tsv \
    --m-sample-metadata-column body-site \
    --p-group-samples \
    --p-feature-count 30 \
    --o-filtered-table rf_opt_classifier/important_feature_table_top_30.qza \
    --o-heatmap rf_opt_classifier/important_feature_heatmap.qzv
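
The heatmap command above restricts the table to the 30 most important features via `--p-feature-count 30`. The ranking idea behind that filter can be sketched in a few lines of plain Python. This assumes importances are available as a mapping from feature ID to score; the `top_features` helper and the feature IDs and scores are hypothetical.

```python
def top_features(importances, n=30):
    """Return the n (feature ID, importance) pairs with the highest importance.

    Ties are broken by feature ID so the result is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical feature IDs and importance scores, for illustration only.
importances = {
    "feature-a": 0.40,
    "feature-b": 0.05,
    "feature-c": 0.30,
    "feature-d": 0.25,
}

top = top_features(importances, n=2)
print([feature_id for feature_id, _ in top])
```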

**Note:** The model we trained here is a toy example. It was trained on very few samples from a single study and will probably not be useful for predicting unknown samples from other studies. But if you have samples from one of these body sites, it could be a fun exercise to give it a spin!

## Predicting continuous data

1. Predict on the Moving Pictures dataset used above
2. Predict on the ECAM dataset

In [None]:
! qiime sample-classifier regress-samples \
    --i-table data/table.qza \
    --m-metadata-file data/sample_metadata.tsv \
    --m-metadata-column days-since-experiment-start \
    --output-dir mp_regressor \
    --verbose
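
For a continuous target such as `days-since-experiment-start`, predictions are judged by how far they fall from the true values rather than by a hit rate. The sketch below computes two common summary metrics, mean squared error and the coefficient of determination (R²), from paired true and predicted values. The `regression_summary` helper and the numbers are hypothetical, and the plugin's own report may define its metrics differently.

```python
def regression_summary(true_values, predicted_values):
    """Return (mean squared error, R^2) for paired true/predicted values."""
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predicted_values))
    mean_true = sum(true_values) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    # R^2 is undefined when every true value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return ss_res / n, r2


# Hypothetical true and predicted day counts, for illustration only.
true_days = [0.0, 10.0, 20.0, 30.0]
predicted_days = [2.0, 8.0, 22.0, 28.0]

mse, r2 = regression_summary(true_days, predicted_days)
print(f"MSE = {mse:.1f}, R^2 = {r2:.3f}")
```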