# Visualizing Songbird feature differentials with Qurro
In this example, we use data from the Red Sea metagenome dataset. These data were obtained from [Songbird's GitHub repository in its `data/redsea` folder](https://github.com/biocore/songbird/tree/master/data/redsea), and are associated with the following paper:

Thompson, L. R., Williams, G. J., Haroon, M. F., Shibl, A., Larsen, P., Shorenstein, J., ... & Stingl, U. (2017). Metagenomic covariation along densely sampled environmental gradients in the Red Sea. _The ISME Journal, 11_(1), 138.

## 2022 note on running Songbird (and also running Qurro)

A lot has changed since we published these tools in 2019 and 2020! Notably, as of writing, the versions of [pandas](https://pandas.pydata.org/) (and, as a result, of QIIME 2) required by Qurro (version 0.8.0 and later) and by Songbird are incompatible:

| Tool  | Required `pandas` version | Required QIIME 2 version |
| ----  | ---- | ---- |
| Qurro | `>= 1` | `>= 2020.11` |
| Songbird | `< 1` | `>= 2019.7, <= 2020.6` |

This means that Qurro and Songbird cannot be installed into the same conda environment. However, you can install them into separate conda environments, and the differentials output by Songbird remain fully compatible with Qurro.

__To get around this issue for the purposes of this tutorial__, we will run Songbird from within one QIIME 2 conda environment (version `2020.6`) and run Qurro from within another QIIME 2 conda environment (version `2022.2`). (Getting Jupyter and conda to play nicely can be a bit of a pain, but the `nb_conda_kernels` package should help make it easier to switch between conda environments within a notebook. That being said, it'll probably be easier to replicate these analyses outside of a Jupyter notebook.)
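If you aren't sure which environment a notebook kernel is currently using, the installed pandas version is a quick hint. The sketch below encodes only the pandas column of the table above and does not check the QIIME 2 version. The helper name `compatible_tool` is hypothetical and is not part of Qurro or Songbird.

```python
def compatible_tool(pandas_version: str) -> str:
    """Map a pandas version string to the tool whose pandas requirement it meets.

    Per the table above: Qurro needs pandas >= 1, Songbird needs pandas < 1.
    Only the major version is inspected; QIIME 2 versions are NOT checked.
    """
    major = int(pandas_version.split(".")[0])
    return "Qurro" if major >= 1 else "Songbird"


try:
    import pandas
except ImportError:
    print("pandas is not installed in this environment")
else:
    # Report which tool's pandas requirement this environment satisfies.
    print(f"pandas {pandas.__version__} matches {compatible_tool(pandas.__version__)}'s requirement")
```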

For the most up-to-date details about how to install and run Songbird, please see [its documentation](https://github.com/biocore/songbird/).

## Requirements
As discussed above, this notebook requires two QIIME 2 conda environments: one containing Songbird, and one containing Qurro (see the table above for the exact versions required).

## 0. Setting up
In this section, we replace the output directory with an empty directory. This lets us run this notebook multiple times without any tools complaining about overwriting existing files.

In [1]:
# Clear the output directory so we can write these files there
!rm -rf output
# Since git doesn't keep track of empty directories, create the output/ directory if it doesn't already exist
# (if it does already exist, -p ensures that an error won't be thrown)
!mkdir -p output
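If you'd rather not rely on shell commands (for example, on a system without `rm` and `mkdir`), the same reset can be sketched in pure Python. `reset_dir` is a hypothetical helper, not part of Qurro or Songbird, and it only roughly mirrors the shell behavior.

```python
import shutil
from pathlib import Path


def reset_dir(path: str) -> Path:
    """Remove `path` if it exists, then recreate it as an empty directory.

    Roughly equivalent to `rm -rf path` followed by `mkdir -p path`.
    """
    p = Path(path)
    if p.is_symlink() or p.is_file():
        p.unlink()                        # remove a stray file or symlink at this path
    elif p.is_dir():
        shutil.rmtree(p)                  # remove the directory and everything in it
    p.mkdir(parents=True, exist_ok=True)  # also creates any missing parent directories
    return p
```

In this notebook, calling `reset_dir("output")` would replace the two shell commands above.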

## 1. Using Songbird and Qurro through QIIME 2 (using two conda environments)

### 1. A. Using Songbird through QIIME 2 (`>= 2019.7, <= 2020.6`)

__This should be run from a QIIME 2 conda environment in which Songbird (but not Qurro) is installed.__

If you just installed Songbird, we recommend running `qiime dev refresh-cache` afterwards so that QIIME 2 can find the Songbird plugin.

In order to use this dataset's BIOM table in QIIME 2, we need to import it as a `FeatureTable[Frequency]` QIIME 2 artifact.

In [2]:
!qiime tools import \
    --input-path input/redsea.biom \
    --output-path output/redsea.biom.qza \
    --type FeatureTable[Frequency]

Imported input/redsea.biom as BIOMV210DirFmt to output/redsea.biom.qza


Now, we can run Songbird through QIIME 2 on our imported BIOM table. This produces three output files, but the main one we care about for Qurro is the `FeatureData[Differential]` artifact (which will be stored in `output/differentials.qza`). This artifact contains **feature differentials**: as Songbird's documentation puts it, these correspond to "...the ordering of the coefficients within a covariate."

Please see [Songbird's documentation](https://github.com/biocore/songbird/) for more information about how it works and how its output files are formatted.
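Why does only the *ordering* of coefficients within a covariate matter? Here is a simplified way to see it. This sketch assumes a plain softmax link and is not Songbird's actual TensorFlow implementation. In a multinomial model, the predicted proportions are the row-wise softmax of `X @ B`. Adding the same constant to every feature's coefficient for one covariate shifts each sample's logits by a per-sample constant, and the softmax cancels that out. So individual differential values are only meaningful relative to one another. All names and values below are made up for illustration.

```python
import numpy as np


def predicted_proportions(X: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Row-wise softmax of X @ B.

    X: (n_samples, n_covariates) design matrix (first column = intercept).
    B: (n_covariates, n_features) coefficients, i.e. "differentials".
    Returns an (n_samples, n_features) matrix whose rows sum to 1.
    """
    logits = X @ B
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])  # intercept + one covariate
B = rng.normal(size=(2, 4))                            # 2 covariates x 4 features

# Shift every feature's coefficient for the covariate (row 1) by the same constant.
B_shifted = B.copy()
B_shifted[1, :] += 3.0

same = np.allclose(predicted_proportions(X, B), predicted_proportions(X, B_shifted))
print(same)  # True
```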

#### Why these hyperparameters?
These hyperparameters (in particular, `epochs` and `differential-prior`) were selected based on experimentation with TensorBoard. See Songbird's [FAQs](https://github.com/biocore/songbird/#faqs) for details on how to use TensorBoard to select these sorts of hyperparameters for your own datasets (this is important, but beyond the scope of this tutorial).
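For intuition about `differential-prior`, assume it sets the standard deviation σ of a zero-mean normal prior on each differential. That is our simplifying assumption, not a description of Songbird's code. Under it, the prior adds a penalty of sum(B²)/(2σ²) to the objective, up to a constant. A smaller value therefore pulls the differentials more strongly toward zero. The function name and coefficient values below are hypothetical.

```python
import numpy as np


def normal_prior_penalty(B: np.ndarray, sigma: float) -> float:
    """Negative log-density, up to an additive constant, of an independent
    zero-mean Normal(0, sigma) prior on every coefficient in B."""
    return float(np.sum(B ** 2) / (2.0 * sigma ** 2))


B = np.array([[0.5, -1.0], [2.0, 0.0]])  # made-up coefficients; sum of squares = 5.25
for sigma in (0.5, 1.0, 2.0):
    # Narrower priors (smaller sigma) penalize the same coefficients more heavily.
    print(sigma, normal_prior_penalty(B, sigma))
# 0.5 10.5
# 1.0 2.625
# 2.0 0.65625
```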

In [3]:
!qiime songbird multinomial \
    --i-table output/redsea.biom.qza \
    --m-metadata-file input/redsea_metadata.txt \
    --p-formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
    --p-epochs 10000 \
    --p-differential-prior 0.5 \
    --o-differentials output/differentials.qza \
    --o-regression-stats output/regression-stats.qza \
    --o-regression-biplot output/regression-biplot.qza

QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Saved FeatureData[Differential] to: output/differentials.qza
Saved SampleData[SongbirdStats] to: output/regression-stats.qza
Saved PCoAResults % Properties('biplot') to: output/regression-biplot.qza
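The `--p-formula` string above is a patsy-style formula. Each term names a numeric column in `input/redsea_metadata.txt`, and the model also includes an intercept. The hand-rolled sketch below builds the corresponding design matrix for purely additive formulas only, with no categorical encoding, interactions, or transformations. It is not Songbird's own formula parser. We'd expect one differential column per design-matrix column, including `Intercept`, but check your own `differentials.qza` to confirm. The helper name and toy metadata values are hypothetical.

```python
import pandas as pd


def additive_design_matrix(formula: str, metadata: pd.DataFrame) -> pd.DataFrame:
    """Build an intercept + covariates design matrix for a '+'-separated formula.

    Simplified sketch: only numeric columns joined by '+' are supported.
    """
    terms = [t.strip() for t in formula.split("+") if t.strip()]
    missing = [t for t in terms if t not in metadata.columns]
    if missing:
        raise KeyError(f"formula terms not in metadata: {missing}")
    design = metadata[terms].astype(float)
    design.insert(0, "Intercept", 1.0)  # constant column for the intercept term
    return design


# Made-up metadata for three hypothetical samples (not the Red Sea values).
toy = pd.DataFrame(
    {"Depth": [10, 50, 100], "Temperature": [28.1, 26.4, 22.0]},
    index=["s1", "s2", "s3"],
)
print(additive_design_matrix("Depth+Temperature", toy).columns.tolist())
# ['Intercept', 'Depth', 'Temperature']
```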


### 1. B. Using Qurro through QIIME 2 (`>= 2020.11`)

__At this point, you should switch to a newer QIIME 2 environment with which Qurro will be compatible.__

Since our "feature rankings" are the (sorted) feature differentials that Songbird just produced, we'll use the `qiime qurro differential-plot` command.
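Conceptually, ranking features by a differential just means sorting one differential column. The sketch below uses hypothetical helpers on a features × differentials pandas DataFrame filled with made-up values. It sorts one column and picks the highest- and lowest-ranked features, which are one common starting point when choosing a log-ratio's numerator and denominator. Qurro itself lets you make these selections interactively in the browser.

```python
from typing import List, Tuple

import pandas as pd


def rank_features(differentials: pd.DataFrame, column: str) -> pd.Series:
    """Return one differential column sorted from lowest to highest value."""
    return differentials[column].sort_values()


def extremes(ranked: pd.Series, k: int) -> Tuple[List[str], List[str]]:
    """Return (k highest-ranked feature IDs, k lowest-ranked feature IDs)."""
    highest = list(ranked.index[::-1][:k])
    lowest = list(ranked.index[:k])
    return highest, lowest


# Made-up differentials for four hypothetical features (not Red Sea results).
toy = pd.DataFrame(
    {"Intercept": [0.2, -0.4, 1.1, 0.0], "Depth": [0.5, -1.2, 2.0, 0.1]},
    index=["F1", "F2", "F3", "F4"],
)
ranked = rank_features(toy, "Depth")
print(extremes(ranked, 1))  # (['F3'], ['F2'])
```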

In [1]:
!qiime qurro differential-plot --help

Usage: [94mqiime qurro differential-plot[0m [OPTIONS]

  Generates an interactive visualization of feature differentials in tandem
  with a visualization of the log-ratios of selected features' sample
  abundances.

[1mInputs[0m:
  [94m[4m--i-ranks[0m ARTIFACT [32mFeatureData[Differential][0m
                       Feature differentials.                       [35m[required][0m
  [94m[4m--i-table[0m ARTIFACT [32mFeatureTable[Frequency][0m
                       A BIOM table describing the abundances of the ranked
                       features in samples. Note that empty samples and
                       features will be removed from the Qurro visualization.
                                                                    [35m[required][0m
[1mParameters[0m:
  [94m[4m--m-sample-metadata-file[0m METADATA...
    (multiple          Sample metadata. In Qurro visualizations, you can use
     arguments will    sample metadata fields to change the x-
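The help text above refers to "log-ratios of selected features' sample abundances." Here is a simplified sketch of that kind of computation. For each sample, take the log of (summed counts of the numerator features / summed counts of the denominator features). Samples where either sum is zero have no defined log-ratio, and here they get `NaN`. Using the natural log is our assumption; a different base would only rescale the values. `log_ratio` and the toy count table are hypothetical.

```python
import numpy as np
import pandas as pd


def log_ratio(table: pd.DataFrame, numerator: list, denominator: list) -> pd.Series:
    """Per-sample natural log of (sum of numerator counts / sum of denominator counts).

    table: samples x features count table.
    Samples whose numerator or denominator sum is 0 get NaN (undefined log-ratio).
    """
    num = table[numerator].sum(axis=1)
    den = table[denominator].sum(axis=1)
    valid = (num > 0) & (den > 0)
    result = pd.Series(np.nan, index=table.index, dtype=float)
    result.loc[valid] = np.log(num[valid] / den[valid])
    return result


# Made-up counts for three hypothetical samples and three features.
table = pd.DataFrame(
    {"F1": [10, 0, 5], "F2": [5, 5, 0], "F3": [1, 2, 3]},
    index=["s1", "s2", "s3"],
)
print(log_ratio(table, ["F1"], ["F2"]))
```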

In [2]:
!qiime qurro differential-plot \
    --i-ranks output/differentials.qza \
    --i-table output/redsea.biom.qza \
    --m-sample-metadata-file input/redsea_metadata.txt \
    --m-feature-metadata-file input/feature_metadata.txt \
    --verbose \
    --o-visualization output/qurro_plot_q2.qzv

Saved Visualization to: output/qurro_plot_q2.qzv

That's it! Now, we've created a QZV file (describing a Qurro visualization) at `output/qurro_plot_q2.qzv`. You can view this visualization in one of the following ways:
  1. Upload the QZV file to [view.qiime2.org](https://view.qiime2.org).
  2. View the QZV file using `qiime tools view`.

## 2. Using Songbird and Qurro as standalone tools (again, using two conda environments)
We don't need to use Songbird and Qurro through QIIME 2; if you want, you can run these tools outside of QIIME 2. Although this means you don't have access to some of QIIME 2's functionality (e.g. provenance tracking, or artifact semantic types), the results you get should be roughly the same. (We say "roughly" because some of the machine learning methods used by Songbird involve randomness.)

As with the QIIME 2 examples above, Songbird and Qurro have conflicting dependencies, so they can't be installed together. We recommend using conda so that you can install Songbird and Qurro into two separate environments (and switch between them as needed).
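Because results can differ slightly between runs, you may want to check how closely two runs agree. For example, you could compare this standalone run with the QIIME 2 run above, after exporting the QIIME 2 differentials with `qiime tools export`. Feature rankings depend on the ordering of differentials, so one option is the Spearman correlation between the same differential column from each run. The sketch below computes it as the Pearson correlation of ranks over the shared features. `ranking_agreement` and the example values are hypothetical.

```python
import pandas as pd


def ranking_agreement(a: pd.Series, b: pd.Series) -> float:
    """Spearman correlation of two differential columns over their shared features.

    Computed as the Pearson correlation of average ranks (tied values share a rank).
    """
    shared = a.index.intersection(b.index)
    return float(a.loc[shared].rank().corr(b.loc[shared].rank()))


# Made-up "Depth" differentials from two hypothetical runs.
run1 = pd.Series([0.5, -1.2, 2.0, 0.1], index=["F1", "F2", "F3", "F4"])
run2 = pd.Series([0.45, -1.3, 2.2, 0.05], index=["F1", "F2", "F3", "F4"])
print(ranking_agreement(run1, run2))
```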

### 2. A. Using Songbird as a standalone tool

__This should be run from a conda environment in which Songbird (but not Qurro) is installed.__

In [1]:
!songbird multinomial \
    --input-biom input/redsea.biom \
    --metadata-file input/redsea_metadata.txt \
    --formula "Depth+Temperature+Salinity+Oxygen+Fluorescence+Nitrate" \
    --epochs 10000 \
    --differential-prior 0.5 \
    --summary-dir output/


2022-07-05 21:04:30.548298: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-07-05 21:04:30.572787: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1999965000 Hz
2022-07-05 21:04:30.573710: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55feb226f620 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-07-05 21:04:30.573769: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version

Instructions for updating:
Use `tf.random.categorical` instead.

Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlo
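Because we passed `--summary-dir output/`, Songbird writes its differentials to `output/differentials.tsv`, the file we pass to Qurro below. That TSV describes one feature per row and one differential field per column, so it can be inspected with pandas. The minimal sketch below assumes the feature IDs are in the first column. `load_differentials` and the inline example data (including the `featureid` header) are hypothetical.

```python
import io

import pandas as pd


def load_differentials(path_or_buffer) -> pd.DataFrame:
    """Read a differentials TSV: rows = features, columns = differential fields.

    Assumes the first column holds feature IDs; all other columns must be numeric.
    """
    df = pd.read_csv(path_or_buffer, sep="\t", index_col=0)
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric differential columns: {non_numeric}")
    return df


# Tiny made-up example in the same shape (not real Red Sea output).
example = io.StringIO("featureid\tIntercept\tDepth\nF1\t0.3\t-0.8\nF2\t-1.1\t0.4\n")
diffs = load_differentials(example)
print(diffs["Depth"].sort_values())
```

On real output, you would call `load_differentials("output/differentials.tsv")` instead.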

### 2. B. Using Qurro as a standalone tool

__This should be run from a conda environment in which Qurro (but not Songbird) is installed.__

When we used Qurro through QIIME 2, we had to specify the `differential-plot` command in order to let the Qurro QIIME 2 plugin know we were working with feature differentials.

Now that we're running Qurro outside of QIIME 2, we don't need to specify this; Qurro can accept either feature differentials or feature loadings as input.

In [1]:
!qurro --help

Usage: qurro [OPTIONS]

  Generates a visualization of feature rankings and log-ratios.

  The resulting visualization contains two plots. The first plot shows how
  features are ranked, and the second plot shows the log-ratio of "selected"
  features' abundances within samples.

  The visualization is interactive, so which features are "selected" to
  construct log-ratios -- as well as various other properties of the
  visualization -- can be changed by the user.

Options:
  -r, --ranks TEXT                Either feature differentials (contained in a
                                  TSV file, where each row describes a feature
                                  and each column describes a differential
                                  field) or a scikit-bio OrdinationResults
                                  file for a biplot (containing feature
                                  loadings). When sorted numerically,
                                  differentials and 

In [2]:
!qurro \
    --ranks output/differentials.tsv \
    --table input/redsea.biom \
    --sample-metadata input/redsea_metadata.txt \
    --feature-metadata input/feature_metadata.txt \
    --output-dir output/qurro_plot_standalone/

Successfully generated a visualization in the folder output/qurro_plot_standalone/.


We just generated a Qurro visualization in the folder `output/qurro_plot_standalone/`. This visualization is analogous to the QZV file we generated above using QIIME 2. You can view it by opening `output/qurro_plot_standalone/index.html` in a modern web browser.
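If you're working in Python (for example, in this notebook), the standard library's `webbrowser` module can open the visualization for you. This is a minimal sketch: `index_uri` and `open_visualization` are hypothetical helpers. Whether a browser actually opens depends on your system; on a remote server, for instance, it usually won't.

```python
import webbrowser
from pathlib import Path


def index_uri(output_dir: str) -> str:
    """Return an absolute file:// URI for the index.html inside a Qurro output folder."""
    return (Path(output_dir) / "index.html").resolve().as_uri()


def open_visualization(output_dir: str) -> bool:
    """Try to open the visualization in the default browser; True if one was launched."""
    return webbrowser.open(index_uri(output_dir))
```

For this tutorial, that would be `open_visualization("output/qurro_plot_standalone")`.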

That's it! If you have any more questions about using Qurro, feel free to contact us (see the Qurro README for contact information).