In [1]:
import pandas as pd
import plotly
import relatively

plotly.offline.init_notebook_mode(connected=True)

# Reactions and Enzyme Commission

ECs are hierarchical, though the deeper you go into the hierarchy, the less helpful these plots become. They are intended to give a brief overview of what's happening across our samples at a given level.

EC numbers have 4 levels. Because the values at each level are numeric, we have to specify which columns represent the hierarchy and which hold the sample values.

Taking a look at our table:

In [2]:
df = pd.read_table("../data/reactions.txt")
df.head()

Unnamed: 0,class,subclass,subsubclass,serialid,sample1,sample2,sample3,sample4,sample5
0,1,-,-,-,117.0,98.0,207.0,127.0,409.0
1,1,1,-,-,1.0,7.0,6.0,16.0,69.0
2,1,1,1,-,248.0,364.0,358.0,400.0,400.0
3,1,1,1,-,8.0,4.0,12.0,8.0,14.0
4,1,1,1,1,83.0,101.0,210.0,92.0,37.0


Each EC level depends on the previous level to make sense, so we will specify `dependent`:

In [3]:
fig = relatively.abundance_figure(
    "../data/reactions.txt", ["class", "subclass", "subsubclass"],
    ["sample1", "sample2", "sample3", "sample4", "sample5"],
    title="Observed Reactions (EC)",
    dependent=".")
plotly.offline.iplot(fig)
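
To make the idea of dependent levels concrete, here is a minimal sketch of how each level could be joined with its parent levels using `.` as the separator. The helper `dependent_labels` and the toy rows are hypothetical and only illustrate the concept; this is an assumption about the kind of labels `dependent="."` produces, not the library's actual implementation.

```python
import pandas as pd

# Toy rows laid out like reactions.txt; "-" marks an unassigned level.
df = pd.DataFrame({
    "class": ["1", "1", "1"],
    "subclass": ["-", "1", "1"],
    "subsubclass": ["-", "-", "1"],
    "sample1": [117.0, 1.0, 248.0],
})

hierarchy = ["class", "subclass", "subsubclass"]


def dependent_labels(frame, levels, sep="."):
    """Hypothetical helper: join each level with all of its parent levels."""
    out = pd.DataFrame(index=frame.index)
    for i, level in enumerate(levels):
        # Columns up to and including the current level, joined row by row.
        parents = frame[levels[: i + 1]].astype(str)
        out[level] = parents.apply(sep.join, axis=1)
    return out


labels = dependent_labels(df, hierarchy)
print(labels)
```

With labels built this way, a subclass value of `1` under class `1` stays distinct from a subclass value of `1` under any other class, which is why the EC levels need their parents to be meaningful.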

In [4]:
# Sample names can also be numbers.
# Renaming sample5 to 7 (rather than 5) is intentional: it shows that plotly
# leaves no gap on the axis for the missing values 5 and 6.
df.rename(columns={"sample1":1, "sample2":2, "sample3":3, "sample4":4, "sample5":7}, inplace=True)
df.head()

Unnamed: 0,class,subclass,subsubclass,serialid,1,2,3,4,7
0,1,-,-,-,117.0,98.0,207.0,127.0,409.0
1,1,1,-,-,1.0,7.0,6.0,16.0,69.0
2,1,1,1,-,248.0,364.0,358.0,400.0,400.0
3,1,1,1,-,8.0,4.0,12.0,8.0,14.0
4,1,1,1,1,83.0,101.0,210.0,92.0,37.0


In [5]:
hierarchy = ["class", "subclass", "subsubclass"]
dfs = relatively.get_dfs_across_hierarchy(df, hierarchy, [1, 2, 3, 4, 7], dependent=".")
fig = relatively.get_abundance_figure_from_dfs(dfs, hierarchy)
plotly.offline.iplot(fig)

# Taxonomic Composition of Metagenomes

The taxonomic hierarchy can be used to summarize and order samples. In this example, we will reorder our samples based on their Shannon diversity index.

Our input table looks like:

In [6]:
df = pd.read_table("../data/taxonomy.txt")
df.head()

Unnamed: 0,superkingdom,phylum,class,order,sample1,sample2,sample3,sample4,sample5
0,Archaea,Candidatus Bathyarchaeota,,,123.0,120.0,486.0,818.0,3253.0
1,Archaea,Candidatus Lokiarchaeota,,,4.0,1.0,9.0,2.0,5.0
2,Archaea,Candidatus Micrarchaeota,,,9.0,13.0,9.0,19.0,14.0
3,Archaea,Candidatus Nanohaloarchaeota,,Nanohaloarchaea,1.0,2.0,2.0,1.0,0.0
4,Archaea,Candidatus Odinarchaeota,,,2.0,0.0,2.0,3.0,6.0


In the case of taxonomy, each level treated independently still makes biological sense, so we'll leave off `dependent`.

In [7]:
fig = relatively.abundance_figure(
    "../data/taxonomy.txt", ["phylum", "class", "order"],
    title="Taxonomy Assignment Summary",
    height=900)
plotly.offline.iplot(fig)
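
Treating each level independently can be sketched as a per-level group-and-sum followed by scaling every sample to 100%. The helper `relative_abundance` and the toy counts below are hypothetical; this is a rough illustration of the summary such a figure presumably shows, not the code `relatively` runs internally.

```python
import pandas as pd

# Toy counts shaped like taxonomy.txt (values are made up for illustration).
df = pd.DataFrame({
    "phylum": ["A", "A", "B"],
    "class": ["a1", "a2", "b1"],
    "sample1": [10.0, 30.0, 60.0],
    "sample2": [5.0, 5.0, 10.0],
})
samples = ["sample1", "sample2"]


def relative_abundance(frame, level, value_cols):
    """Hypothetical helper: sum counts per label, then scale each sample to 100%."""
    totals = frame.groupby(level)[value_cols].sum()
    # Dividing by the column sums aligns on the sample names.
    return totals / totals.sum(axis=0) * 100


# Each level is summarized on its own, without reference to its parent level.
by_phylum = relative_abundance(df, "phylum", samples)
by_class = relative_abundance(df, "class", samples)
print(by_phylum)
print(by_class)
```

Because every level is grouped on its own column, a class label is counted the same way regardless of which phylum it sits under, which is the behavior described above for taxonomy.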

# Diversity Across Samples

To get diversity indices per sample, a convenience function calculates each requested index on a dataframe:

In [8]:
diversities = relatively.calculate_diversity(
    df, ["phylum", "class", "order"],
    ["sample1", "sample2", "sample3", "sample4", "sample5"],
    ["shannon", "simpson", "invsimpson"])
diversities.head()

Unnamed: 0,shannon,simpson,invsimpson
sample1,3.326696,0.90206,10.210381
sample2,3.409359,0.917576,12.132366
sample3,3.686257,0.940822,16.898295
sample4,2.593147,0.821694,5.608331
sample5,2.710792,0.86728,7.53466
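
For reference, the three indices can be sketched from raw counts with the standard formulas: Shannon as `-sum(p * ln p)`, Simpson as `1 - sum(p^2)`, and inverse Simpson as `1 / sum(p^2)`. The Simpson and inverse Simpson columns in the table above are consistent with these definitions (for sample1, `1 / (1 - 0.90206)` is roughly `10.21`). The natural logarithm, the helper `diversity_indices`, the toy counts, and the descending sort order are assumptions made for illustration.

```python
import math


def diversity_indices(counts):
    """Hypothetical helper: Shannon, Simpson (1 - D), inverse Simpson (1 / D)."""
    total = sum(counts)
    # Zero counts contribute nothing and would break log(), so skip them.
    props = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)  # natural log assumed
    d = sum(p * p for p in props)
    return {"shannon": shannon, "simpson": 1 - d, "invsimpson": 1 / d}


# Made-up per-sample counts: one perfectly even, one dominated by a single taxon.
toy = {
    "even": [25, 25, 25, 25],
    "skewed": [97, 1, 1, 1],
}
results = {name: diversity_indices(counts) for name, counts in toy.items()}

# Order samples from most to least diverse by Shannon index.
order = sorted(results, key=lambda name: results[name]["shannon"], reverse=True)
print(order)
```

A list like `order` could then be used to arrange the sample columns before plotting, which is one way to reorder samples by their Shannon diversity index.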
