<h1><b>Artifact API Practice</b></h1>
<h6><i>Anton Hibl, June 10th, 2021</i></h6>

<hr><hr>

#### <u>The Artifact API & Qiime 2 Introduction</u>

The Artifact API is a Python 3 interface to QIIME 2. It provides an interactive computing environment in which the QIIME 2 framework and the Python 3 language work together cohesively.

To start, I will load an `Artifact` containing a feature table and pass it to the `rarefy` method of QIIME 2's `q2-feature-table` plugin, which returns a new artifact. However, the first thing to do in any notebook or program is import the necessary modules, which I have done below:


In [1]:
# Importing necessary modules for general data science
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import sklearn, sys, socket, requests
import altair as alt
import biom, qiime2
from qiime2 import Metadata 
from qiime2.plugins import feature_table, diversity, demux

Now I can get started by loading in my feature table:

In [38]:
''' Utilizing the q2-feature-table plugin & the rarefy method to return a new artifact '''

# Loads in feature table artifact
unrarefied_table = qiime2.Artifact.load("./zipfiles/table.qza")
'''
    Rarefy the table: subsample (without replacement) the frequencies in each sample so that
    every sample's total frequency equals the sampling depth. Samples whose total frequency is
    below the sampling depth are left out of the rarefied table.
'''
rarefy_result = feature_table.methods.rarefy(table=unrarefied_table, sampling_depth=100)
rarefied_table = rarefy_result.rarefied_table
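
To make the rarefaction step more concrete, below is a minimal plain-Python sketch of subsampling without replacement. It assumes a toy table stored as a dictionary mapping sample IDs to `{feature_id: count}` dictionaries; `rarefy_table` is a hypothetical helper for illustration, not the code `q2-feature-table` actually runs, so its random draws will not match the plugin's.

```python
import random
from collections import Counter


def rarefy_table(table, depth, seed=None):
    """Hypothetical sketch: subsample each sample's counts without replacement to `depth`.

    `table` maps sample IDs to {feature_id: count} dicts (a layout assumed for this
    sketch). Samples whose total count is below `depth` are dropped.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        if sum(counts.values()) < depth:
            continue  # too few observations to reach the sampling depth
        # Expand counts into one entry per observation, e.g. {"f1": 2} -> ["f1", "f1"]
        pool = [feature for feature, n in counts.items() for _ in range(int(n))]
        # Draw exactly `depth` entries without replacement and tally them per feature
        rarefied[sample_id] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


toy_table = {
    "sample_A": {"f1": 60, "f2": 30, "f3": 10},
    "sample_B": {"f1": 5, "f2": 3},  # total of 8 is below the depth, so it is dropped
}
print(rarefy_table(toy_table, depth=10, seed=42))
```

Every surviving sample ends up with exactly `depth` observations, and no feature can gain counts, which is the property the string comment in the cell above describes.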

While it is recommended to work with QIIME 2 `Artifacts` directly, it is possible to access the underlying data in one or more compatible views (Python objects/data structures or file formats). For example, below I look at the feature table that has now been rarefied and access it as a `biom.Table` object:

In [41]:
# View the rarefied table as a biom.Table object (features are rows, samples are columns)
biom_table_ex = rarefied_table.view(biom.Table)
# Print the first five features (rows) and samples (columns) of the biom table
print(biom_table_ex.head())

# Constructed from biom file
#OTU ID	BAQ2420.1.1	BAQ2420.1.2	BAQ2420.1.3	BAQ2420.2	BAQ2420.3
409faa5f5353e543bf6d99125c7c0e83	0.0	0.0	0.0	0.0	0.0
a7b877ae6d2f079a15b6b192a4425620	0.0	0.0	0.0	4.0	0.0
a36b38f754f6abd278aeb9dbc7696343	4.0	0.0	0.0	0.0	4.0
1237d5925a7176fced9dda961a86c684	0.0	0.0	2.0	0.0	0.0
ef3fdbe1dcde754d91130cde6a4b4d61	0.0	0.0	0.0	0.0	0.0


It is also possible to view this artifact's data as a `pandas.DataFrame` object, as below:

In [42]:
# View the rarefied table as a pandas DataFrame (samples are rows, features are columns)
panda_table_ex = rarefied_table.view(pd.DataFrame)
# Show the first five rows (samples) of the DataFrame
panda_table_ex.head()


| | 409faa5f5353e543bf6d99125c7c0e83 | a7b877ae6d2f079a15b6b192a4425620 | a36b38f754f6abd278aeb9dbc7696343 | 1237d5925a7176fced9dda961a86c684 | ef3fdbe1dcde754d91130cde6a4b4d61 | 96cbccca68ad868a78bb0604e4a41cf5 | 11f7b172e09b77715d3bb2175a40b409 | f6c10a04d57159c0d64d6bc30c677471 | 1694836fc379411d2b9aa087d68d571b | ffd60d684f32e6fd5b47fe90095f9d34 | ... | 45ee2d4ff023d26d4cd0b410a9918ff8 | 427206321377b8a2a217b4ee92cbed5e | 10aa0f26b210fbe3a30fe61801e9a075 | 86037f00237ac314a58db234198c18b3 | c9e32079e7c3e2a804a14ce56dfdcc05 | dfec303098a97f8d14ac51318ab6fafb | 69c7e02fbef232462ebacf2aea4498b0 | fc38276fdb465adf0d4e28f7980f043d | 53976595447cfdeb7b839b31dd755525 | bbe5779dec14df7e755cb26e51b45ff1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BAQ2420.1.1 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| BAQ2420.1.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| BAQ2420.1.3 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| BAQ2420.2 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 5.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| BAQ2420.3 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 |
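
The two views are oriented differently: the `biom.Table` above lists features as rows, while the `DataFrame` lists samples as rows. Since the table was rarefied to a sampling depth of 100, every row of the `DataFrame` view should sum to 100 (the "..." columns hide most of each row). A small pandas check is sketched below; `is_rarefied_to` is a hypothetical helper, and the mock table only stands in for `panda_table_ex`.

```python
import pandas as pd


def is_rarefied_to(table: pd.DataFrame, depth: int) -> bool:
    """Hypothetical helper: True if every sample (row) of a samples x features table sums to `depth`."""
    per_sample_totals = table.sum(axis=1)
    return bool((per_sample_totals == depth).all())


# Mock stand-in for panda_table_ex: rows are sample IDs, columns are feature IDs
mock_table = pd.DataFrame(
    {"feature_1": [96.0, 50.0], "feature_2": [4.0, 50.0]},
    index=["sample_1", "sample_2"],
)
print(is_rarefied_to(mock_table, 100))                            # True
print(is_rarefied_to(mock_table.drop(columns="feature_2"), 100))  # False
```

In the notebook the same check would be `is_rarefied_to(panda_table_ex, 100)`.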


A powerful feature of QIIME 2 is that you can export different types of views from QIIME artifacts, as illustrated here, operate on the resulting data types, and then import those data back into QIIME. This is useful when an operation is available on the view's data type (e.g., `pandas.DataFrame`) but not through the QIIME API. The drawback is that data provenance is lost in the process, because QIIME cannot track what happens to your data outside of it.

You can import the `pandas.DataFrame` object back into a new QIIME artifact like so:
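
A minimal sketch of that round trip, assuming the rarefied `DataFrame` view from above. The pandas step (keeping only the `n` most abundant features) is an arbitrary operation chosen for illustration, and `keep_top_features` and `to_frequency_artifact` are hypothetical helper names. The re-import uses `qiime2.Artifact.import_data` with the `FeatureTable[Frequency]` semantic type; QIIME 2 is imported inside the function so the pandas part can be tried on its own.

```python
import pandas as pd


def keep_top_features(table: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep the `n` features (columns) with the largest total frequency."""
    totals = table.sum(axis=0)
    top = totals.sort_values(ascending=False).index[:n]
    return table.loc[:, top]


def to_frequency_artifact(table: pd.DataFrame):
    """Hypothetical helper: re-import a samples x features DataFrame as FeatureTable[Frequency].

    Provenance for the returned artifact starts at this import; the pandas
    steps applied before it are not recorded.
    """
    import qiime2  # deferred import so keep_top_features works without QIIME 2

    return qiime2.Artifact.import_data("FeatureTable[Frequency]", table)


# In the notebook (requires QIIME 2):
#   top_table = keep_top_features(panda_table_ex, n=50)
#   top_artifact = to_frequency_artifact(top_table)
mock = pd.DataFrame({"f1": [1, 0], "f2": [5, 5], "f3": [0, 2]}, index=["s1", "s2"])
print(keep_top_features(mock, n=2).columns.tolist())  # ['f2', 'f3']
```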

#### <u>Other Qiime2 Plugins </u>
My `rarefied_table` artifact can now be used with other QIIME 2 plugins and their methods to create visualizations, transformations, and analyses. Here I will compute the <i>Observed Features alpha diversity metric</i> using the `q2-diversity` plugin. The resulting Artifact will be of type `SampleData[AlphaDiversity]`, and we can access a `pd.Series` as a view of this Artifact:

In [26]:
# Compute alpha diversity with the observed features metric; the alpha pipeline returns a Results object
alpha_result_b3t4 = diversity.pipelines.alpha(table=rarefied_table, metric='observed_features')
# Take the alpha_diversity output (a SampleData[AlphaDiversity] artifact) from the Results
alpha_diversity_ex = alpha_result_b3t4.alpha_diversity
# View the data as a pandas Series
alpha_diversity_ex.view(pd.Series)

BAQ2420.1.1    29
BAQ2420.1.2    28
BAQ2420.1.3    23
BAQ2420.2      32
BAQ2420.3      32
BAQ2462.1      29
BAQ2462.2      26
BAQ2462.3      23
BAQ2687.1      26
BAQ2687.2      20
BAQ2687.3      27
BAQ2838.1      20
BAQ2838.2      15
BAQ2838.3      14
BAQ3473.1      30
BAQ3473.2      31
BAQ3473.3      38
BAQ4166.1.1    27
BAQ4166.1.2    30
BAQ4166.1.3    29
BAQ4166.2      34
BAQ4166.3      36
BAQ4697.1      22
BAQ4697.2      26
BAQ4697.3      35
YUN1005.1.1    20
YUN1005.3      19
YUN1242.1      15
YUN1242.3      18
YUN1609.1      17
YUN2029.2      23
YUN3153.2      22
YUN3153.3      21
YUN3259.1.2    18
YUN3259.1.3    13
YUN3259.2      26
YUN3259.3      26
YUN3346.1      29
YUN3346.2       9
YUN3346.3      21
YUN3428.1      26
YUN3428.2      39
YUN3428.3      30
YUN3533.1.1    24
YUN3533.1.2    25
YUN3533.1.3    36
YUN3533.2      23
YUN3533.3      39
YUN3856.1.1    24
YUN3856.1.2    29
YUN3856.1.3    32
YUN3856.2      39
YUN3856.3      41
Name: observed_features, dtype: int64
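
Observed Features is the number of distinct features with a non-zero count in a sample. As a rough cross-check, the sketch below computes it with pandas on a samples x features table laid out like the `DataFrame` view earlier. The mock table and the `observed_features` helper are hypothetical; applied to `rarefied_table.view(pd.DataFrame)` it would be expected to agree with the series above, though that has not been verified here.

```python
import pandas as pd


def observed_features(table: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count features with a non-zero frequency in each sample (rows = samples)."""
    return (table > 0).sum(axis=1).rename("observed_features")


# Mock samples x features table standing in for the rarefied DataFrame view
mock_table = pd.DataFrame(
    {"f1": [3, 0, 1], "f2": [0, 0, 4], "f3": [7, 2, 5]},
    index=["sample_1", "sample_2", "sample_3"],
)
print(observed_features(mock_table))
```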

We can also save our `Artifacts` as `.qza` files as follows:
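
Saving is a single call on the artifact, for example `rarefied_table.save('rarefied_table.qza')`. A `.qza` file is a zip archive whose top-level directory contains a `metadata.yaml` file recording the artifact's UUID, semantic type, and format. The stdlib-only sketch below reads that file without QIIME 2; `read_qza_metadata` is a hypothetical helper, its simple `key: value` parsing is an assumption about the file's layout, and `qiime2.Artifact.peek` (or `qiime tools peek` on the command line) is the supported way to get the same information.

```python
import zipfile


def read_qza_metadata(path):
    """Hypothetical helper: return the key/value pairs in a .qza archive's top-level metadata.yaml.

    Assumes the layout <uuid>/metadata.yaml with simple `key: value` lines.
    `path` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only accept metadata.yaml directly under the top-level UUID directory,
            # not the copies stored deeper inside provenance/
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                break
        else:
            raise ValueError(f"no top-level metadata.yaml found in {path}")
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Usage in the notebook (hypothetical file name; save() is assumed to return the written path):
#   saved_path = rarefied_table.save('rarefied_table.qza')
#   print(read_qza_metadata(saved_path)['type'])
```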

Another powerful feature of QIIME 2 is that you can combine interfaces. For example, you could develop a Python script that automatically processes files to generate results, as we just did, and then analyze those results using the command line interface or QIIME 2 Studio. You could, for instance, continue your analysis and view some results on the command line as follows:
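
A hedged Python sketch of that hand-off: it assembles the command-line counterpart of the `alpha_group_significance` call in the next cell and runs it only if a `qiime` executable is on the `PATH`. The file names are hypothetical placeholders (the alpha diversity artifact would first have to be saved to a `.qza`), and `alpha_group_significance_cmd` and `run_if_installed` are names made up for this sketch.

```python
import shlex
import shutil
import subprocess


def alpha_group_significance_cmd(alpha_qza, metadata_tsv, out_qzv):
    """Hypothetical helper: build `qiime diversity alpha-group-significance` as an argument list."""
    return [
        "qiime", "diversity", "alpha-group-significance",
        "--i-alpha-diversity", alpha_qza,
        "--m-metadata-file", metadata_tsv,
        "--o-visualization", out_qzv,
    ]


def run_if_installed(cmd):
    """Hypothetical helper: run `cmd` only when its executable is on PATH; return None otherwise."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)


# Hypothetical file names; observed_features.qza would come from saving alpha_diversity_ex
cmd = alpha_group_significance_cmd(
    "observed_features.qza", "sample-metadata.tsv", "observed_features_group_sig.qzv"
)
print(shlex.join(cmd))
```

Calling `run_if_installed(cmd)` would then write the `.qzv`, which can be opened with `qiime tools view` or on QIIME 2 View.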

The equivalent analysis as an Artifact API call:

In [30]:
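# Load the sample metadata that defines the groups to compare
metadata = Metadata.load('sample-metadata.tsv')
# Compare alpha diversity across the groups in the categorical metadata columns
group_significance, = diversity.actions.alpha_group_significance(alpha_diversity=alpha_diversity_ex, metadata=metadata)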

<hr>

# Altair Practice

Data in Altair is built around the pandas `DataFrame`. One of the defining characteristics of statistical visualization is that it begins with tidy `DataFrames` (one row per observation, one column per variable). This works well with QIIME 2, because the Artifact API can expose many artifacts as pandas `DataFrame` views.
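
QIIME 2's feature-table `DataFrame` view is wide (one column per feature), whereas a tidy layout has one row per sample-feature observation. A minimal reshaping sketch with `pandas.DataFrame.melt`, assuming a samples x features table like `panda_table_ex`; `to_tidy` and the column names `sample_id`, `feature_id`, and `frequency` are illustrative choices, not QIIME 2 conventions.

```python
import pandas as pd


def to_tidy(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape a samples x features table into one row per non-zero (sample, feature) pair."""
    tidy = (
        wide.rename_axis("sample_id")  # name the index so reset_index yields a named column
        .reset_index()
        .melt(id_vars="sample_id", var_name="feature_id", value_name="frequency")
    )
    # Drop zero counts, which carry no observation in a sparse feature table
    return tidy[tidy["frequency"] > 0].reset_index(drop=True)


# Mock wide table: rows are samples, columns are features
wide = pd.DataFrame({"f1": [1, 0], "f2": [3, 2]}, index=["s1", "s2"])
print(to_tidy(wide))
```

The tidy frame could then feed an encoding such as `alt.Chart(to_tidy(panda_table_ex)).mark_bar().encode(x='sample_id', y='sum(frequency)', color='feature_id')`. Note that Altair raises a `MaxRowsError` for datasets over 5,000 rows by default, which a full tidy feature table can easily exceed.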

In [20]:
# Build a small mock dataset as a pandas DataFrame from a dictionary of lists
data = pd.DataFrame({'a': list('CCCDDDEEE'),              # a: categorical labels built from a string of characters
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})   # b: integer values

# Wrap the data in an Altair Chart and draw it with point marks
chart = alt.Chart(data)  # creates a Chart object; no mark or encoding is specified yet
chart.mark_point()       # with no encodings, every row is drawn at the same position, so the points overlap as one

The problem of all points collapsing into one can be fixed by chaining `.encode()` after `.mark_point()` and specifying which column is mapped to which axis (the point mark can also be swapped for bars or lines, as demonstrated below):

In [21]:
alt.Chart(data).mark_point().encode(
    x='a',
    y='b'
)

With bars:

In [22]:
alt.Chart(data).mark_bar().encode(
    x='a',
    y='b'
)

With lines, plotting the average of `b` (a column of the pandas `DataFrame`) for each value of `a` on the y-axis:

In [23]:
# creates a 'chart_ex' variable to hold the line plot
chart_ex = alt.Chart(data).mark_line().encode(
    x='a',
    y='average(b)'
)

# shows the chart
chart_ex
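
The `average(b)` shorthand asks Altair (through Vega-Lite) to aggregate `b` within each value of `a` before drawing the line. The same numbers can be computed directly with a pandas `groupby`, which is a handy way to check what the chart should show; this reuses the mock data from above.

```python
import pandas as pd

data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})

# Mean of b within each group of a: the values the line chart plots on the y-axis
averages = data.groupby('a')['b'].mean()
print(averages)  # means: C = 13/3, D = 3, E = 19/3
```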

In [50]:
print(chart_ex.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-347f1284ea3247c0f55cb966abbdd2d8"
  },
  "datasets": {
    "data-347f1284ea3247c0f55cb966abbdd2d8": [
      {
        "a": "C",
        "b": 2
      },
      {
        "a": "C",
        "b": 7
      },
      {
        "a": "C",
        "b": 4
      },
      {
        "a": "D",
        "b": 1
      },
      {
        "a": "D",
        "b": 2
      },
      {
        "a": "D",
        "b": 6
      },
      {
        "a": "E",
        "b": 8
      },
      {
        "a": "E",
        "b": 4
      },
      {
        "a": "E",
        "b": 7
      }
    ]
  },
  "encoding": {
    "x": {
      "field": "a",
      "type": "nominal"
    },
    "y": {
      "aggregate": "average",
      "field": "b",
      "type": "quantitative"
    }
  },
  "mark": "line"
}


### Qiime Artifact Workflow Practice
Below I use the `summarize` visualizer from `demux`, a QIIME 2 plugin for working with demultiplexed sequence data:

In [25]:
# Import the summarize visualizer from the demux plugin
from qiime2.plugins.demux.visualizers import summarize


# Loading a demultiplexed data artifact
demux_ex = qiime2.Artifact.load("zipfiles/demux-full.qza")

# Use demux's summarize visualizer to summarize per-sample read counts and sequence quality
summarized_demux_dat, = summarize(demux_ex)


In [26]:
# Display the visualization
summarized_demux_dat
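
The interactive quality plots in this visualization summarize per-position Phred quality scores. As a rough illustration of where those numbers come from, the sketch below decodes Phred+33 quality strings (the encoding used by Casava 1.8-style FASTQ files) and averages them by position. Both helpers are hypothetical, and the visualizer itself works on a random subsample of reads and plots score distributions rather than a single mean per position.

```python
def phred33_scores(quality):
    """Hypothetical helper: decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(symbol) - 33 for symbol in quality]


def mean_quality_by_position(quality_strings):
    """Hypothetical helper: average decoded scores at each read position; reads may differ in length."""
    totals, counts = [], []
    for quality in quality_strings:
        for position, score in enumerate(phred33_scores(quality)):
            if position == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[position] += score
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


print(phred33_scores("I?5#"))                  # [40, 30, 20, 2]
print(mean_quality_by_position(["II#", "5I"]))  # [30.0, 40.0, 2.0]
```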

Now I will use the `alpha` pipeline from QIIME 2's `diversity` plugin to compute alpha diversity metrics from the feature table:

In [76]:
# Import the necessary plug-ins
from qiime2.plugins import diversity
from qiime2.plugins.feature_table.visualizers import heatmap

# Load in my feature table artifact
prealph_demux_ft = qiime2.Artifact.load('zipfiles/table.qza')
# Run the alpha pipeline to compute the observed features metric for each sample
alpha_result_ex, = diversity.pipelines.alpha(table=prealph_demux_ft, metric='observed_features')


Separating these cells means the pipeline does not have to be re-run every time we want to see the visualization; the only data reloaded in the next cell is the metadata, so that it stays up to date with the whole data set:

In [77]:
# Load my samples meta-data
meta_ex = Metadata.load('sample-metadata.tsv')
# Visualize correlations between alpha diversity and the numeric metadata columns
alph_corr_ex, = diversity.visualizers.alpha_correlation(alpha_result_ex, meta_ex)
# Display the visualization
alph_corr_ex
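
`alpha_correlation` tests whether numeric metadata columns are correlated with alpha diversity, using Spearman correlation by default. A minimal pandas sketch of the core calculation for a single column, assuming both inputs are `pd.Series` indexed by sample ID (for example `alpha_result_ex.view(pd.Series)` and a numeric column of `meta_ex.to_dataframe()`). It computes Spearman's rho as the Pearson correlation of the ranks and leaves out the p-value and plot that the visualizer also produces; `spearman_rho` is a hypothetical helper and the metadata values below are made up.

```python
import pandas as pd


def spearman_rho(alpha: pd.Series, column: pd.Series) -> float:
    """Hypothetical helper: Spearman's rho between alpha diversity and one numeric metadata column.

    Samples missing from either series (or holding NaN) are dropped first.
    """
    paired = pd.concat([alpha, column], axis=1, join="inner").dropna()
    ranks = paired.rank()  # average ranks for ties, as in the usual Spearman definition
    return float(ranks.iloc[:, 0].corr(ranks.iloc[:, 1]))  # Pearson correlation of the ranks


alpha = pd.Series([29, 28, 23, 32], index=["s1", "s2", "s3", "s4"])
# Made-up numeric metadata; s5 has no alpha value and is dropped by the inner join
mock_ph = pd.Series([7.1, 6.9, 6.0, 7.5, 8.0], index=["s1", "s2", "s3", "s4", "s5"])
print(spearman_rho(alpha, mock_ph))  # identical rankings, so 1 up to floating-point rounding
```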