# Tutorial: Machine Learning with QIIME 2

This notebook contains materials accompanying the Functional Genomics Center Zürich course **Next-Generation Sequencing Applied to Metagenomics (BIO638)**. The notebook and corresponding setup script were adapted from the [**Advanced Block Course: Computational Biology**](https://github.com/bokulich-lab/advanced-comp-bio-tutorial.git); all source code is licensed under the Apache License 2.0.

Save your own local copy of this notebook by using `File > Save a copy in Drive`. At some point you may be prompted to trust the notebook. We promise that it is safe 🤞

**Notes (optional):**

The Google Colab notebook environment interprets every command as Python code by default. To run a bash command, we have to prefix it with `!`. Any command you see with a leading `!` is therefore a bash command; to run it in your own terminal, omit the leading `!`. For example, `!wget` in the Colab notebook becomes plain `wget` in your terminal.
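To make the `!` convention concrete, here is a minimal sketch of roughly what happens behind a cell such as `!echo hello`: the command is handed to a shell and its output is shown in the notebook. The use of `subprocess` is an illustration chosen here, not the mechanism Colab itself documents, and `echo hello` is just a placeholder command.

```python
import subprocess

# Approximate plain-Python equivalent of a notebook cell containing `!echo hello`:
# pass the command string to a shell and capture what it prints.
result = subprocess.run(
    "echo hello",          # placeholder command, chosen for illustration
    shell=True,            # let the shell interpret the command string
    capture_output=True,   # collect stdout/stderr instead of printing directly
    text=True,             # decode the output as text rather than bytes
    check=True,            # raise an error if the command exits with a failure
)
print(result.stdout.strip())
```

In day-to-day notebook use the `!` prefix is far more convenient; the sketch only shows that the prefixed line is a shell command rather than Python.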

In this notebook we use the `!` prefix because we run all QIIME 2 commands through [`q2cli`](https://github.com/qiime2/q2cli/), the QIIME 2 command-line interface. However, QIIME 2 also has a Python API and a Galaxy interface. You can learn more about these and other QIIME 2 interfaces at https://qiime2.org/.

In [None]:
feature_table_file = "/content/feature_table.qza"
feature_table_tss_file = "/content/feature_table_tss.qza"
metadata_file = "/content/metadata.tsv"
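The three paths above are used throughout the notebook, so it can be handy to check which of them already exist before running a step that needs them. The following is a minimal sketch; the helper name `missing_inputs` is hypothetical and not part of the course materials, and the files may only appear after later steps have been run.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Same locations as the variables defined in the cell above.
expected = [
    "/content/feature_table.qza",
    "/content/feature_table_tss.qza",
    "/content/metadata.tsv",
]

missing = missing_inputs(expected)
if missing:
    print("Not found yet:", ", ".join(missing))
else:
    print("All input files are present.")
```

A missing file at this point is not necessarily an error; it may simply mean that the step that creates or downloads it has not run yet.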

### Environment setup

QIIME 2 is usually installed by following the [official installation instructions](https://docs.qiime2.org/2024.10/install/). However, because we are using Google Colab and there are some caveats to using conda here, we will have to hack around the installation a little. But no worries, we provide a setup script below which does all this work for us. 😌 Let's start by pulling a local copy of the project repository down from GitHub.

From here, you can run the entire notebook by selecting `Runtime > Run all` from the menu in Google Colab. Some steps are time-consuming and the entire notebook may take 30-60 minutes, so start the full run now; we will inspect the commands and results as we work through them as a class.

🛑 **ACTION** 🛑
<br>
*Run every cell in the notebook using the instructions above.*

In [1]:
! git clone https://github.com/bokulich-lab/rigi-workshop.git workshop-materials

Cloning into 'materials'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 11 (delta 1), reused 7 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (11/11), 12.18 KiB | 4.06 MiB/s, done.
Resolving deltas: 100% (1/1), done.


We will move into the `workshop-materials/` directory created by the clone command.

In [2]:
%cd workshop-materials

/content/materials


Now we are ready to set up our environment. This will take about 10 minutes.
<br>
**Note:** This setup is only relevant for Google Colaboratory and will not work on your local machine. To learn more about MOSHPIT installation please consult our [official tutorial](https://moshpit.readthedocs.io/en/latest/chapters/00_setup.html).

In [32]:
%run setup_moshpit

In [41]:
# Workaround for Google Colab: alias the "mosh" command so that it runs inside the moshpit-dev environment.
# `%alias` is the explicit form of the IPython alias magic.
%alias mosh mamba run -n moshpit-dev -r /usr/local mosh