<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red'>Mini Jupyter tutorial<br><br>To run a cell, click it and press <kbd>Run</kbd> in the menu bar. This runs any Python code or renders any text in the selected cell, then highlights the next cell down. There are two types of cell: a <i>text cell</i> of type <kbd>Markdown</kbd> or <kbd>Heading</kbd>, and a <i>code cell</i> of type <kbd>Code</kbd>, identifiable by the <span style="font-family: courier; color:black; background-color:white;">In[ ]:</span> to the left of the cell. The cell type is also shown in the dropdown menu to the right of <kbd>Run</kbd> in the menu bar above. Any output produced by the code (text or figures) is displayed directly below that cell. Keep pressing <kbd>Run</kbd> until you reach the end of the notebook, or alternatively click <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart and Run All</kbd>. Should the Jupyter notebook crash for any reason, restart the Jupyter kernel by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart</kbd>, and start again from the top.</font>
        
</div>

# Tutorial 6: Metabolomics Hierarchical Edge Bundle

<p style="text-align: justify">
<br>
This tutorial covers the steps necessary for producing a Hierarchical Edge Bundle.
</p>

<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red' size=4>Note: If visualising using a JavaScript pop-up window, you will need to allow pop-ups in your browser for the domain you're running from (localhost or mybinder.org).</font>
</div> 

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    
<h2 id="1importpackagesmodules" style="text-align: justify">1. Import Packages/Modules</h2>

<p style="text-align: justify">The first code cell of this tutorial imports <a href="https://docs.python.org/3/tutorial/modules.html"><em>packages</em> and <em>modules</em></a> into the Jupyter environment. <em>Packages</em> and <em>modules</em> provide additional functions and tools beyond the in-built Python modules.
<br></p>
<br>
All the code embedded in this notebook is written in Python (<a href="http://www.python.org">python.org</a>) and JavaScript (<a href="https://www.javascript.com/">javascript.com</a>) and is built upon popular open source packages such as NetworkX (<a href="https://networkx.github.io/">networkx.github.io</a>), NumPy (<a href="https://numpy.org/">numpy.org</a>), SciPy (<a href="https://www.scipy.org/">scipy.org</a>), Matplotlib (<a href="https://matplotlib.org/">matplotlib.org</a>), Pandas (<a href="https://pandas.pydata.org/">pandas.pydata.org</a>) and D3 JavaScript (<a href="https://d3js.org/">d3js.org</a>).
    
<em>Note:</em> a tutorial on the Python programming language is beyond the scope of this notebook. To learn how to program in Python with Jupyter Notebook, please refer to: 
<a href="https://mybinder.org/v2/gh/jakevdp/PythonDataScienceHandbook/master?filepath=notebooks%2FIndex.ipynb">Python Data Science Handbook (Jake VanderPlas, 2016)</a>.

In [1]:
import os
    
home = os.getcwd() + "/"

import numpy as np
import pandas as pd
from IPython.display import Javascript, display, IFrame, HTML, display_javascript
from sklearn.preprocessing import StandardScaler
import multivis

print('All packages successfully loaded')

%load_ext autoreload
%autoreload 2

All packages successfully loaded


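
Before running the remaining cells, it can help to confirm that the packages imported above are actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of multivis.

```python
import importlib.util

# Package names taken from the import cell above; 'sklearn' is the import
# name of scikit-learn.
REQUIRED = ["numpy", "pandas", "IPython", "sklearn", "multivis"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available")
```

Any package reported as missing would need to be installed before the import cell can succeed.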


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<h2 style="text-align: justify">2. Load Data and Peak sheets</h2>

<p style="text-align: justify">The code cell below loads the <em>Data</em> and <em>Peak</em> sheets from an Excel file using <code>loadData()</code>. When this is complete, you should see confirmation that Peak (the Peak worksheet) and Data (the Data worksheet) tables have been loaded.<br><br>

This dataset was previously published by <a href="https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/EP087159">Lawler et al. (2018)</a> in <i>Experimental Physiology</i> and has been put into a standardised <a href="https://en.wikipedia.org/wiki/Tidy_data">Tidy Data</a> format.
</p>
    
Further information on the publication, with a link to the data repository, can be found here: Project ID <a href="XXX">XXX</a>. 

Please inspect the <a href="Altitude_Data.xlsx">Altitude_Data.xlsx</a> Excel file before using it in this tutorial to understand its structure. To load a different dataset into the notebook, replace <code>file = 'Altitude_Data.xlsx'</code> with another file in the same <a href="https://en.wikipedia.org/wiki/Tidy_data">Tidy Data</a> format as <a href="Altitude_Data.xlsx">Altitude_Data.xlsx</a>, and then rerun the workflow.

</div></div>

In [2]:
file = 'Altitude_Data.xlsx'

DataTable,PeakTable = multivis.utils.loadData(home + file, DataSheet='Data', PeakSheet='Peak')

Loading sheet: Peak
Loading sheet: Data
TOTAL SAMPLES: 29 TOTAL PEAKS: 32
Done!
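
To illustrate the structure the two sheets are expected to share, the sketch below builds miniature stand-ins for the Data and Peak sheets and checks that every peak name has a matching data column. The values and the helper `check_sheets` are hypothetical; the only detail taken from this notebook is that the Peak table must contain 'Name' and 'Label' columns.

```python
import pandas as pd

# Hypothetical miniature versions of the two sheets (values are made up).
data = pd.DataFrame({
    "Idx": [1, 2, 3],
    "Class": ["Day1", "Day3", "Day14"],
    "M1": [11.9, 6.1, 8.3],
    "M2": [7.9, 336.3, 92.0],
})
peak = pd.DataFrame({
    "Idx": [1, 2],
    "Name": ["M1", "M2"],
    "Label": ["Metabolite A", "Metabolite B"],
})


def check_sheets(data, peak):
    """Return the peak names that have no matching column in the data sheet."""
    if not {"Name", "Label"}.issubset(peak.columns):
        raise ValueError("Peak sheet must contain 'Name' and 'Label' columns")
    return [name for name in peak["Name"] if name not in data.columns]


unmatched = check_sheets(data, peak)
print("Samples:", len(data), "Peaks:", len(peak), "Unmatched:", unmatched)
```

A check of this kind can be useful when substituting your own Excel file for the one used here.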


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Data sheet

Check the imported Data table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(DataTable)</span><br>
</div>

In [3]:
display(DataTable)

Unnamed: 0,Idx,Class,Subject,M1,M2,M3,M4,M5,M6,M7,...,M23,M24,M25,M26,M27,M28,M29,M30,M31,M32
1,1,Day1,ID#8,11869.885789,7928.191086,10435.6156,24274.490844,1652528.0,552759.7,3302.626974,...,8821.86844,45452.021662,842267.4,35812.954717,11908.058199,58304.192636,76229.516925,698.829269,6152166.0,301138.4
2,2,Day3,ID#9,6091.032382,336302.294432,7583.153313,15992.455361,1014382.0,539827.5,2561.39659,...,7070.409466,23748.717737,596931.0,43209.005653,1614.750726,17107.605509,33792.399242,5219.808748,3844697.0,387650.7
3,3,Day14,ID#10,8304.840762,91992.628406,15463.843074,17243.287685,1261087.0,635835.1,2579.520637,...,6189.891266,62171.092488,461790.8,19891.976258,5787.252806,21196.689803,49518.409732,6917.767103,3642466.0,3085592.0
4,4,Day1,ID#4,5679.245738,103604.360972,11726.584477,18997.94632,1730119.0,801818.6,1862.338814,...,14710.768208,29470.036797,633863.3,54690.964915,6119.355667,34562.628959,25722.056666,82857.001884,6507678.0,454499.7
5,5,Day3,ID#3,7927.71988,377994.409342,37252.85594,23070.362098,2172443.0,812261.7,2368.612424,...,8373.694858,70177.184112,576483.6,119391.571698,5831.690401,11314.28983,27222.497589,8780.422655,5452713.0,367584.1
6,6,Day14,ID#3,4074.580486,9234.118904,2883.602638,27252.577493,1694014.0,319840.3,2436.745786,...,17175.921137,43648.210575,366902.6,41311.72709,20909.038093,14625.381528,61184.204465,2386.211484,3958768.0,703689.0
7,7,Day3,ID#2,7120.333141,98285.350473,14214.077514,24997.653286,1421475.0,698111.4,3576.187472,...,11930.411039,33482.365703,769590.7,17285.765486,4002.855161,18389.688133,54971.306524,1266.178495,4396494.0,552230.2
8,8,Day3,ID#4,8103.250066,31683.902081,8928.894882,20545.072813,1655281.0,388682.6,3023.746753,...,9760.323565,24682.407167,695113.4,33854.620792,5749.788807,13098.908274,32455.790304,13672.154888,3771656.0,1186001.0
9,9,Day14,ID#8,8273.693156,382598.960254,4154.97273,22212.782829,1891639.0,891573.2,2310.807344,...,11142.048186,50927.179351,682817.9,169428.675064,5179.87386,14255.030832,63933.849462,2019.496191,3703559.0,496448.3
10,10,Day3,ID#10,6690.535074,98946.80458,5388.740427,14757.880513,1370226.0,664479.7,2577.400534,...,10905.284099,30077.046114,662094.7,44494.246759,5019.971982,9310.215496,32246.148489,455.958833,3207303.0,355998.6


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Peak sheet

Check the imported Peak table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(PeakTable)</span><br>
</div>

In [4]:
display(PeakTable)

Unnamed: 0,Idx,Name,Label,Mode,mz,rt,F,pvalue,pFDR,RSD,Dratio
1,1,M1,Ocatanedioic acid,Negative,173.081494,528.593,1.205805,0.316809,0.290529,10.471229,2.887221
2,2,M2,Glycoursodeoxycholic acid,Negative,448.30488,1158.75,6.674117,0.007382,0.07975,4.706839,24.314863
3,3,M3,Dodecanedioc acid,Negative,229.143848,966.471,3.659888,0.079039,0.191246,19.856452,3.216742
4,4,M4,Succinic Acid,Negative,117.019323,133.4985,0.415028,0.626102,0.341498,5.577582,4.025602
5,5,M5,Citric Acid,Negative,191.019367,109.544,0.741641,0.435266,0.311669,9.170874,2.417115
6,6,M6,Lactic Acid,Negative,89.025122,84.363,4.928634,0.020552,0.111017,10.341807,2.332534
7,7,M7,5-Hydroxytryptophan,Negative,219.076809,217.493,1.254299,0.303983,0.286883,16.876721,1.564374
8,8,M8,Glycocholic acid,Negative,464.299632,979.16,7.62628,0.005556,0.076033,11.902311,7.463177
9,9,M9,L-Tryptophan,Negative,203.082155,337.889,1.335823,0.287394,0.284528,4.289423,3.48065
10,10,M10,Hexadecanedioic acid,Negative,285.20573,1344.61,0.554782,0.543473,0.328267,9.537188,4.942172


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 3. Statistical analysis

Statistical analysis is important for identifying features or samples which may be outliers. 
It is also important to establish whether the data is normally distributed prior to further analysis such as correlation analysis, because this determines the most suitable correlation function to use. For example, the parametric method Pearson's correlation should be used for normally distributed data, whereas the non-parametric method Spearman's correlation is suitable for non-normally distributed data.

Statistical analysis can also provide additional univariate information for further downstream visualisations, such as one-way ANOVA p-values and PCA loadings for each feature, to be displayed in each of the nodes of the hierarchical edge bundle.
</div>

In [5]:
stats = multivis.utils.statistics(PeakTable, DataTable)

stats.help()

Generate a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).
        Initial_Parameters
            ----------
            peaktable : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
            datatable : Pandas dataframe matrix containing scores

        Methods
            -------
            set_params : Set parameters -
                parametric: Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
                log_data: Perform a log on all data prior to statistical analysis (default: False)
                group_column_name: The group column name used in the datatable (default: None)
                control_group_name: The control group name in the datatable, if available (default: None)
                group_alpha_CI: The alpha value for group confidence intervals (default: 0.05)
                median_fold_change_alpha_CI: The alpha value for median fold 

In [6]:
params = dict({'parametric': False
              , 'log_data': True
              , 'group_column_name': None
              , 'control_group_name': None
              , 'group_alpha_CI': 0.05
              , 'median_fold_change_alpha_CI': 0.05
              , 'pca_alpha_CI': 0.05
              , 'total_missing': False
              , 'group_missing': False
              , 'pca_loadings': True
              , 'normality_test': True
              , 'group_normality_test': False
              , 'group_mean_CI': False
              , 'group_median_CI': False
              , 'median_fold_change': False
              , 'kruskal_wallis_test': False
              , 'levene_twoGroup': False
              , 'levene_allGroup': False
              , 'oneway_Anova_test': False
              , 'ttest_oneGroup': False
              , 'ttest_twoGroup': False
              , 'mann_whitney_u_test': False})

stats.set_params(**params)

PeakTableStats = stats.calculate()
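
The link between a normality test and the choice of correlation method can be sketched as follows. This is an illustration using SciPy's Shapiro-Wilk test on two deterministic, made-up features, not a reproduction of what `multivis.utils.statistics` does internally; the helper `is_normal` is hypothetical.

```python
import numpy as np
from scipy import stats as sps

n = 29  # same number of samples as the dataset loaded above

# Deterministic illustrative features (not taken from the dataset):
# one follows normal quantiles, the other has a single extreme value.
normal_like = sps.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
skewed = np.append(np.arange(1.0, n), 1e6)


def is_normal(values, alpha=0.05):
    """Shapiro-Wilk test: True when normality is not rejected at level alpha."""
    statistic, pvalue = sps.shapiro(values)
    return bool(pvalue > alpha)


for name, values in [("normal_like", normal_like), ("skewed", skewed)]:
    method = "pearson" if is_normal(values) else "spearman"
    print(name, "->", method)
```

Failing to reject normality is not proof that a feature is normally distributed, so the outcome is best treated as a guide rather than a rule.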

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Peak sheet with statistical information

Check the imported Peak table with statistical information, by simply calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(PeakTableStats)</span><br>
</div>

In [7]:
display(PeakTableStats)

Unnamed: 0,Idx,Name,Label,Mode,mz,rt,F,pvalue,pFDR,RSD,...,Shapiro_statistic,Shapiro_pvalue,PC1,PC2,PC1_lower,PC1_upper,PC1_sig,PC2_lower,PC2_upper,PC2_sig
0,0,M1,Ocatanedioic acid,Negative,173.081494,528.593,1.205805,0.316809,0.290529,10.471229,...,0.964668,0.4256959,0.172449,-0.075246,-0.004328,0.379405,False,-0.371117,0.141209,False
1,1,M2,Glycoursodeoxycholic acid,Negative,448.30488,1158.75,6.674117,0.007382,0.07975,4.706839,...,0.880608,0.003469536,0.166588,0.247957,-0.072998,0.386042,False,0.04934,0.465034,True
2,2,M3,Dodecanedioc acid,Negative,229.143848,966.471,3.659888,0.079039,0.191246,19.856452,...,0.962037,0.3686097,0.193745,-0.070164,0.029889,0.342714,True,-0.383324,0.138315,False
3,3,M4,Succinic Acid,Negative,117.019323,133.4985,0.415028,0.626102,0.341498,5.577582,...,0.912227,0.01946388,-0.035571,-0.240657,-0.248807,0.249519,False,-0.609621,-0.115223,True
4,4,M5,Citric Acid,Negative,191.019367,109.544,0.741641,0.435266,0.311669,9.170874,...,0.653896,4.894164e-07,-0.076491,-0.160579,-0.523625,0.042642,False,-0.558396,0.00478,False
5,5,M6,Lactic Acid,Negative,89.025122,84.363,4.928634,0.020552,0.111017,10.341807,...,0.983205,0.9110418,0.234512,0.009089,0.134311,0.384394,True,-0.230515,0.239335,False
6,6,M7,5-Hydroxytryptophan,Negative,219.076809,217.493,1.254299,0.303983,0.286883,16.876721,...,0.971145,0.5911415,-0.056805,-0.250191,-0.272329,0.195233,False,-0.478667,-0.116964,True
7,7,M8,Glycocholic acid,Negative,464.299632,979.16,7.62628,0.005556,0.076033,11.902311,...,0.986696,0.966218,0.346193,-0.049402,0.320745,0.404001,True,-0.313089,0.248794,False
8,8,M9,L-Tryptophan,Negative,203.082155,337.889,1.335823,0.287394,0.284528,4.289423,...,0.961391,0.3555563,0.244783,-0.125493,0.055943,0.428122,True,-0.396451,0.205814,False
9,9,M10,Hexadecanedioic acid,Negative,285.20573,1344.61,0.554782,0.543473,0.328267,9.537188,...,0.95258,0.2134427,0.111963,-0.2914,-0.163126,0.395038,False,-0.467644,-0.1539,True
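
In the rows shown above, the `PC1_sig` and `PC2_sig` flags appear to be `True` whenever the interval between the lower and upper bounds excludes zero. The sketch below checks that reading against three of the displayed rows; the rule is inferred from the table and is an assumption, not confirmed multivis behaviour.

```python
import pandas as pd

# PC1 interval bounds copied from the first three rows of the table above.
loadings = pd.DataFrame({
    "Name": ["M1", "M2", "M3"],
    "PC1_lower": [-0.004328, -0.072998, 0.029889],
    "PC1_upper": [0.379405, 0.386042, 0.342714],
})

# Assumed rule: a loading is flagged when its interval excludes zero.
loadings["PC1_sig_check"] = (loadings["PC1_lower"] > 0) | (loadings["PC1_upper"] < 0)
print(loadings)
```

For these three rows the recomputed flags match the `PC1_sig` column displayed above (False, False, True).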


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Clean the data: remove non-normally distributed features prior to further parametric analysis

</div>

In [8]:
PeakTableStats_filtered = PeakTableStats[PeakTableStats.Shapiro_pvalue > 0.05]
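
As a small worked example of this filter, the sketch below applies the same boolean mask to the Shapiro-Wilk p-values of the first six metabolites shown in the table above. The frame `table` is a hand-built stand-in for `PeakTableStats`.

```python
import pandas as pd

# Shapiro-Wilk p-values copied from the first six rows of PeakTableStats above.
table = pd.DataFrame({
    "Name": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "Shapiro_pvalue": [0.4256959, 0.003469536, 0.3686097,
                       0.01946388, 4.894164e-07, 0.9110418],
})

alpha = 0.05
kept = table[table["Shapiro_pvalue"] > alpha]       # normality not rejected
removed = table[table["Shapiro_pvalue"] <= alpha]   # normality rejected

print("Kept:   ", kept["Name"].tolist())
print("Removed:", removed["Name"].tolist())
```

M2, M4 and M5 are dropped, which is consistent with their absence from the node table displayed later in this notebook.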

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 4.  Log transform and scale values

A log transformation is commonly used for biological data, but other transformations, such as square, square root, cube root or reciprocal, may be suitable for other types of data. Scaling is highly dependent on the values found in the dataset. If the dataset contains a number of values which are very large compared to the majority of other values, these may skew the data and consequently bias the results. Scaling methods such as unit variance and Pareto scaling make all the values in the dataset comparable.

Pearson correlation, a parametric method, is performed next, so a log transformation is applied to bring the data closer to a normal distribution. If a non-parametric method such as Spearman's or Kendall's tau were used instead, the log transformation would not be necessary. No scaling is needed after the log transformation in this case: correlation analysis is bivariate and divides the covariance of each pair of variables by the product of their standard deviations, unlike PCA in tutorial 1, which is multivariate and measures the variance across all variables. However, if another similarity metric were used in place of correlation, such as Euclidean distance, then scaling may be a necessary step.

</div>

In [9]:
# Extract and log-transform the metabolite data from the DataTable

peaklist = PeakTableStats_filtered['Name']    # Metabolite names that passed the normality filter
X = DataTable[peaklist].values                # Extract the X matrix from DataTable using peaklist
Xlog = np.log10(X)                            # Log transform (base 10)
#scaler = StandardScaler()                    # Only needed if scaling; StandardScaler is imported in the first cell
#Xscale = scaler.fit_transform(Xlog)          # Scale to unit variance (not necessary if later performing correlation analysis)

X_data = pd.DataFrame(Xlog, columns=peaklist) # Log-transformed data, one column per retained metabolite
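
The claim that scaling is unnecessary before correlation analysis can be checked numerically. The sketch below uses synthetic data (not the tutorial dataset) and plain NumPy implementations of unit variance and Pareto scaling, then compares the Pearson correlation matrices before and after scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic positive intensities: 29 samples x 3 features (illustrative only).
X_demo = rng.lognormal(mean=10.0, sigma=1.0, size=(29, 3))

X_log = np.log10(X_demo)                # log transform (base 10)
centred = X_log - X_log.mean(axis=0)    # mean-centre each feature
sd = X_log.std(axis=0, ddof=1)          # per-feature standard deviation

X_uv = centred / sd                     # unit variance scaling
X_pareto = centred / np.sqrt(sd)        # Pareto scaling

# Pearson correlation is unchanged by either scaling.
r_log = np.corrcoef(X_log, rowvar=False)
r_uv = np.corrcoef(X_uv, rowvar=False)
r_pareto = np.corrcoef(X_pareto, rowvar=False)
print(np.allclose(r_log, r_uv), np.allclose(r_log, r_pareto))
```

Both comparisons hold because each scaling only shifts a feature and divides it by a positive constant, which leaves Pearson's r unchanged. A distance-based metric such as Euclidean distance would not share this property.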

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 5. Correlation analysis

Correlation is a form of similarity. Pearson's correlation, a parametric method, measures the strength of the linear relationship between two variables and is calculated by dividing their covariance (joint variability) by the product of their standard deviations (see Eq1). Other forms of correlation, such as Spearman's rank correlation and Kendall's tau, are non-parametric and measure monotonic relationships. The correlation analysis below supports Pearson, Spearman or Kendall's tau correlation.

\begin{equation*}
r = \frac{\mathrm{Cov}(X,Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}
\end{equation*}
<center>Eq1: Pearson’s correlation coefficient</center>

</div>

In [10]:
correlationType = "pearson"  # Alternatives: "spearman", "kendalltau"

X = X_data[PeakTableStats_filtered['Name']]

ScoreBlocks,PvalueBlocks = multivis.utils.corrAnalysis(X, correlationType)

100%|██████████| 24/24 [00:00<00:00, 310.14it/s]
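
Eq1 can be written out directly in NumPy. The sketch below is a from-scratch illustration, not the implementation used by `multivis.utils.corrAnalysis`: it computes Pearson's r from the covariance and standard deviations, and Spearman's rho as Pearson's r of the ranks (without tie handling). The helper names are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson's r as in Eq1: covariance over the product of standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    return cov / (x.std(ddof=1) * y.std(ddof=1))


def ranks(values):
    """Ranks 1..n for data without ties."""
    return np.argsort(np.argsort(values)) + 1


def spearman(x, y):
    """Spearman's rho as Pearson's r of the ranks (no tie handling)."""
    return pearson(ranks(x), ranks(y))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3  # monotonic but not linear

print("Pearson: ", round(pearson(x, y), 4))
print("Spearman:", round(spearman(x, y), 4))
```

Because `y` increases monotonically with `x` but not linearly, Spearman's rho is 1 while Pearson's r falls below 1, which illustrates the linear versus monotonic distinction described above.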


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 6.  Generate Edges

The similarity scores are filtered and stored in a dataframe of edges, where nodes represent metabolites and edges represent similarity scores (correlation coefficients). The dataframe includes node names, labels, colours, similarity scores and p-values.

</div>

In [11]:
networkEdges = multivis.Edge(peaktable=PeakTableStats_filtered, datatable=ScoreBlocks, pvalues=PvalueBlocks)

networkEdges.help()

Builds nodes and edges and is the base class for the Network class.

        Initial_Parameters
        ----------
        peaktable : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
        datatable : Pandas dataframe matrix containing scores
        pvalues : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None)

        Methods
        -------
        set_params : Set parameters
            filter_type: The value type to filter the data on (default: 'pvalue')
            hard_threshold: Value to filter the data on (default: 0.005)
            withinBlocks: Include scores within blocks if building multi-block network (default: False)
            sign: The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')

        help : Print this help text

        build : Builds the nodes and edges.
        getNodes : Returns a Pandas dataframe of all nodes.
        getEdges : Returns a Pandas da

In [12]:
params = dict({'filter_type': 'pvalue'      #The filter type to use for the similarities matrix (p-value or score)
               , 'hard_threshold': 0.2      #The hard threshold to apply to the similarities matrix
               , 'withinBlocks': False      #Include scores within blocks if building multi-block network
               , 'sign': "both"})           #The sign of the similarities ('pos', 'neg' or 'both')

networkEdges.set_params(**params)

networkEdges.build()

edges = networkEdges.getEdges()
nodes = networkEdges.getNodes()
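
Conceptually, building edges amounts to listing each pair of metabolites once and keeping the pairs that pass the filter. The sketch below shows one way this could be done with NumPy and pandas. The function `build_edge_table`, the made-up matrices and the strict `<` comparison are assumptions for illustration; they are not taken from the multivis source.

```python
import numpy as np
import pandas as pd


def build_edge_table(names, scores, pvalues, threshold=0.05):
    """List each unordered pair once and keep those with p-value below threshold."""
    rows = []
    i_idx, j_idx = np.triu_indices(len(names), k=1)  # upper triangle, no diagonal
    for i, j in zip(i_idx, j_idx):
        if pvalues[i, j] < threshold:
            rows.append({
                "start_name": names[i],
                "end_name": names[j],
                "score": scores[i, j],
                "sign": np.sign(scores[i, j]),
                "pvalue": pvalues[i, j],
            })
    return pd.DataFrame(rows, columns=["start_name", "end_name", "score", "sign", "pvalue"])


# Hypothetical 3 x 3 correlation and p-value matrices (made-up values).
names = ["A", "B", "C"]
scores = np.array([[1.0, 0.6, -0.1],
                   [0.6, 1.0, -0.5],
                   [-0.1, -0.5, 1.0]])
pvalues = np.array([[0.0, 0.01, 0.70],
                    [0.01, 0.0, 0.03],
                    [0.70, 0.03, 0.0]])

demo_edges = build_edge_table(names, scores, pvalues, threshold=0.05)
print(demo_edges)
```

Only the A-B and B-C pairs survive the threshold here. Using the upper triangle avoids listing each pair twice and skips self-correlations on the diagonal.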

In [13]:
edges

Unnamed: 0,start_index,start_name,start_label,end_index,end_name,end_label,score,sign,pvalue
0,0,M1,Ocatanedioic acid,1,M3,Dodecanedioc acid,0.454770,1.0,0.013191
1,0,M1,Ocatanedioic acid,2,M6,Lactic Acid,0.303864,1.0,0.109037
2,0,M1,Ocatanedioic acid,4,M8,Glycocholic acid,0.423531,1.0,0.022055
3,0,M1,Ocatanedioic acid,5,M9,L-Tryptophan,0.443496,1.0,0.015964
4,0,M1,Ocatanedioic acid,6,M10,Hexadecanedioic acid,0.424193,1.0,0.021826
...,...,...,...,...,...,...,...,...,...
107,18,M25,Phenylalanine,20,M27,2-Octenoylcarnitine,-0.257989,-1.0,0.176629
108,19,M26,Adenosine,20,M27,2-Octenoylcarnitine,-0.287670,-1.0,0.130229
109,19,M26,Adenosine,21,M29,Cortisol,-0.337043,-1.0,0.073789
110,20,M27,2-Octenoylcarnitine,21,M29,Cortisol,0.485343,1.0,0.007613


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display node data used in the Hierarchical Edge Bundle

Check the node data simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(nodes)</span><br>
</div>

In [14]:
display(nodes)

Unnamed: 0,Idx,Name,Label,Mode,mz,rt,F,pvalue,pFDR,RSD,...,Shapiro_statistic,Shapiro_pvalue,PC1,PC2,PC1_lower,PC1_upper,PC1_sig,PC2_lower,PC2_upper,PC2_sig
0,0,M1,Ocatanedioic acid,Negative,173.0814936,528.593,1.20580499949928,0.316809429654668,0.290528592310766,10.4712294328065,...,0.9646682739257812,0.4256958961486816,0.1724494829104768,-0.0752463011844634,-0.0043283717340059,0.3794053049147053,False,-0.3711174441402688,0.1412088272901892,False
1,1,M3,Dodecanedioc acid,Negative,229.143848,966.471,3.65988782518355,0.0790391935278154,0.191245625859604,19.8564515336576,...,0.9620373249053956,0.3686096668243408,0.1937452268095154,-0.0701638797406561,0.0298885396811498,0.3427142759930367,True,-0.3833235933266539,0.1383154817894357,False
2,2,M6,Lactic Acid,Negative,89.02512203,84.363,4.92863410270874,0.020552209789679,0.111017251280658,10.3418071336858,...,0.9832051396369934,0.911041796207428,0.2345115676934214,0.0090894068325294,0.1343106198142524,0.3843943973329666,True,-0.2305148711556528,0.2393349461597855,False
3,3,M7,5-Hydroxytryptophan,Negative,219.0768091,217.493,1.25429929524897,0.303982625217717,0.286883151078054,16.8767206661306,...,0.97114497423172,0.5911415219306946,-0.0568054972129033,-0.2501907709749494,-0.2723290076184799,0.195232889670412,False,-0.4786667584272614,-0.1169642076167158,True
4,4,M8,Glycocholic acid,Negative,464.2996324,979.16,7.62627977928109,0.0055561816777377,0.0760327539927116,11.9023111972631,...,0.9866955280303956,0.9662179946899414,0.3461926347479641,-0.04940187034722,0.320745320115649,0.4040010361894586,True,-0.3130886843999339,0.2487937083989633,False
5,5,M9,L-Tryptophan,Negative,203.0821552,337.889,1.33582291749258,0.287394237048956,0.284527643984525,4.28942262139224,...,0.9613911509513856,0.3555563390254974,0.2447834308059566,-0.1254931615269014,0.0559426413757861,0.4281217769634591,True,-0.3964511051946188,0.2058140176241171,False
6,6,M10,Hexadecanedioic acid,Negative,285.2057299,1344.61,0.554781791967447,0.543472507847197,0.328266939980167,9.53718839791506,...,0.9525795578956604,0.213442713022232,0.1119633142916463,-0.291399669928092,-0.1631258963573475,0.395038080312861,False,-0.467644480225538,-0.1538995840090826,True
7,7,M11,Pseudouridine,Negative,243.0612565,85.364,1.6435444299445,0.221582868255957,0.264951112801882,3.20575216891402,...,0.9761365056037904,0.7331565618515015,-0.0057548661063859,-0.2502900981607978,-0.2395304669798411,0.2760523152885438,False,-0.4868124117393693,-0.0765961979288973,True
8,8,M12,Uric acid,Negative,335.0480195,94.622,1.81229025915499,0.19267708311762,0.254613216625571,4.05003744537164,...,0.9615872502326964,0.359477698802948,0.0148663500523906,-0.2732034585846769,-0.2638461034208656,0.2298539503228373,False,-0.4779900382822973,-0.1738419729037371,True
9,9,M14,Hypoxanthine,Positive,137.0463946,100.593,3.31714245291838,0.0618275109477684,0.177911813619459,4.13574593254638,...,0.9563542604446412,0.266516774892807,0.0122516106872489,-0.0196582166980569,-0.190983776711368,0.1655886229441613,False,-0.2436645761932165,0.3478471461207987,False


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display edge data used in the Hierarchical Edge Bundle

Check the edge data simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(edges)</span><br>
</div>

In [15]:
display(edges)

Unnamed: 0,start_index,start_name,start_label,end_index,end_name,end_label,score,sign,pvalue
0,0,M1,Ocatanedioic acid,1,M3,Dodecanedioc acid,0.454770,1.0,0.013191
1,0,M1,Ocatanedioic acid,2,M6,Lactic Acid,0.303864,1.0,0.109037
2,0,M1,Ocatanedioic acid,4,M8,Glycocholic acid,0.423531,1.0,0.022055
3,0,M1,Ocatanedioic acid,5,M9,L-Tryptophan,0.443496,1.0,0.015964
4,0,M1,Ocatanedioic acid,6,M10,Hexadecanedioic acid,0.424193,1.0,0.021826
...,...,...,...,...,...,...,...,...,...
107,18,M25,Phenylalanine,20,M27,2-Octenoylcarnitine,-0.257989,-1.0,0.176629
108,19,M26,Adenosine,20,M27,2-Octenoylcarnitine,-0.287670,-1.0,0.130229
109,19,M26,Adenosine,21,M29,Cortisol,-0.337043,-1.0,0.073789
110,20,M27,2-Octenoylcarnitine,21,M29,Cortisol,0.485343,1.0,0.007613


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 7.  Sort edges

Sort the edges according to visualisation preference, for example by node index, p-value or score.

</div>

In [16]:
#edges.sort_values(['start_index', 'end_index'], inplace=True, ascending=True)
#edges.sort_values('pvalue', inplace=True, ascending=False)
edges.sort_values('score', inplace=True, ascending=False)
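
Sorting by signed score places strong negative correlations last. If ordering by correlation strength regardless of sign is preferred, one option is to sort on the absolute score, as sketched below with four scores copied from the edge table shown earlier. This assumes pandas 1.1 or later, where `sort_values` accepts a `key` argument.

```python
import pandas as pd

# Four scores copied from the edge table shown earlier.
demo = pd.DataFrame({
    "start_name": ["M1", "M26", "M27", "M25"],
    "end_name": ["M3", "M29", "M29", "M27"],
    "score": [0.454770, -0.337043, 0.485343, -0.257989],
})

# key= needs pandas 1.1 or later; a new frame is returned instead of sorting in place.
by_strength = demo.sort_values("score", key=lambda s: s.abs(), ascending=False)
by_strength = by_strength.reset_index(drop=True)
print(by_strength)
```

Returning a new frame rather than using `inplace=True` keeps the original ordering available for comparison.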

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 8.  Plot Hierarchical edge bundle

The edges from the network are then passed into D3 JavaScript to generate a hierarchical edge bundle, which is embedded in HTML for interactive visualisation. The hierarchical edge bundle is implemented as a circular hierarchical tree structure, with nodes on the outside and edges passing through the circle along a bundled curve until they connect to other nodes. The edges represent correlation coefficients and can be coloured by correlation value, sign or p-value using a continuous colour map. Metadata such as groups/classes within the data can also be reflected in the plot, to illustrate which groups/classes are correlated and to what degree.

Note: The visualisation will automatically open in another tab, unless running in Binder (see step 9).

</div>

In [17]:
bundle = multivis.edgeBundle(nodes,edges)

bundle.help()

Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.

        Parameters
        ----------
        nodes : Pandas dataframe containing nodes generated from Edge.
        edges : Pandas dataframe containing edges generated from Edge.
        
        Methods
        -------
        set_params : Set parameters -
            html_file: Name to save the HTML file as (default: 'hEdgeBundle.html')
            innerRadiusOffset: Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
            blockSeparation: Value to set the distance between different segmented blocks (default: 1)
            linkFadeOpacity: The link fade opacity when hovering over/clicking nodes (default: 0.05)
            mouseOver: Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
            fontSize: The font size in pixels set for each node (default: 10)
            backgroundColor: Set the background colour



In [18]:
params = dict({'html_file': 'hEdgeBundle.html'      #HTML file name to save to
               , 'innerRadiusOffset': 120           #The offset from the radius to determine the inner radius of the edge bundle
               , 'blockSeparation': 3               #The degree of separation between blocks
               , 'linkFadeOpacity': 0.01            #The opacity of faded links
               , 'mouseOver': True                  #Setting to 'True' swaps from clicking to hovering over nodes to select them 
               , 'fontSize': 10                     #The font size of each node
               , 'backgroundColor': 'white'         #Set the background colour of the plot
               , 'foregroundColor': 'black'         #Set the foreground colour of the plot      
               , 'node_data': ['Name', 'Label', 'Mode', 'mz', 'rt', 'F', 'pvalue', 'pFDR', 'RSD', 'Dratio', 'PC1', 'PC2', 'PC1_sig', 'PC2_sig']
               , 'nodeColorScale': 'linear'         #The scale to use for colouring the nodes
               , 'node_color_column': 'pvalue'      #If node_color_column contains colour values it overrides the use of node_cmap
               , 'node_cmap': 'brg'                 #Set the colour palette to use for colouring the nodes               
               , 'edge_color_value': 'sign'         #Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue'.
               , 'edgeColorScale': 'linear'         #The scale to use for colouring the edges (if edge_color_value is 'pvalue')
               , 'edge_cmap': 'brg'                 #Set the colour palette to use for colouring the edges
               , 'addArcs': True                    #Setting to 'True' adds arcs around the edge bundle for each block
               , 'arcRadiusOffset': 20              #Sets the arc radius offset from the inner radius
               , 'extendArcAngle': 4                #Sets the angle value to add to each end of the arcs 
               , 'arc_cmap': 'tab20'})              #Sets the CMAP colour palette to use for colouring the arcs

bundle.set_params(**params)

bundle.build()

HTML writen to hEdgeBundle.html
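
To give a feel for the circular arrangement of nodes, the sketch below places node names evenly around a circle using only the standard library. It is a geometric illustration, not the D3 layout used by multivis: the canvas diameter of 960 and the relation radius = diameter / 2 - innerRadiusOffset are assumptions, and `circular_layout` is a hypothetical helper.

```python
import math


def circular_layout(names, diameter=960, inner_radius_offset=120):
    """Place names evenly on a circle and return {name: (x, y)}.

    Assumption: radius = diameter / 2 - inner_radius_offset.
    """
    radius = diameter / 2 - inner_radius_offset
    step = 2 * math.pi / len(names)
    return {
        name: (radius * math.cos(k * step), radius * math.sin(k * step))
        for k, name in enumerate(names)
    }


layout = circular_layout(["M1", "M3", "M6", "M7"])
for name, (x, y) in layout.items():
    print(f"{name}: x={x:.1f}, y={y:.1f}")
```

Under these assumptions a larger offset pulls the nodes towards the centre and leaves more room for labels around the outside.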


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Build Dashboard

A dashboard with panels for the hierarchical edge bundle, node data and sliders is built, allowing for a more robust interface for exploratory analysis of the data. The dashboard is automatically written to HTML and launched upon creation.

</div>

In [19]:
bundle.buildDashboard()

HTML writen to hEdgeBundle_dashboard.html


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 9.  Alternative visualisation options

The visualisations will automatically open when run locally in Jupyter notebook. However, when running in Binder the visualisations will not open in a new tab due to security restrictions within Binder. Therefore, the following options are available: IFrames and JavaScript.

Note 1: Due to security restrictions within Jupyter notebook, resizing a window opened from the notebook with JavaScript or IFrame reloads the visualisation and results in a 403: Forbidden error. In this case, change the dimensions set in the cell and rerun the cell.

Note 2: Depending on the browser you're using, the save button may be disabled when opening the visualisation in an IFrame or JavaScript pop-up. In Chrome, downloads have been disabled to prevent malicious behaviour. If you have trouble downloading, try a different browser or use the visualisation opened automatically in another tab when running Jupyter notebook locally.
</div>

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Plain visualisation
</div>

In [20]:
vis_option = "IFrame"  #"Javascript"
file = params['html_file']

if vis_option.lower() == "javascript":
    display(Javascript('''window.open(\'{}\','spring','width=1000,height=1000')'''.format(file)))
elif vis_option.lower() == "iframe":
    display(IFrame(src=file, width='100%', height='1000px'))

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Visualisation with dashboard
</div>

In [21]:
vis_option = "javascript" #"IFrame"
file = params['html_file'].split(".")[0]+"_dashboard.html"

if vis_option.lower() == "javascript":
    display(Javascript('''window.open(\'{}\','spring','width=1500,height=1000')'''.format(file)))
elif vis_option.lower() == "iframe":
    display(IFrame(src=file, width='100%', height='1000px'))

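
The cell above derives the dashboard file name with `split(".")[0]`, which cuts the name at the first dot and would therefore mishandle a file name containing more than one dot. A possible alternative using `os.path.splitext` is sketched below. The helper `dashboard_filename` is hypothetical, and how multivis itself names the dashboard for such file names is not confirmed here; only the default `hEdgeBundle_dashboard.html` is taken from the output above.

```python
import os


def dashboard_filename(html_file):
    """Insert '_dashboard' before the extension of html_file."""
    root, ext = os.path.splitext(html_file)
    return root + "_dashboard" + ext


print(dashboard_filename("hEdgeBundle.html"))
print(dashboard_filename("results.v2.html"))
```

For the default file name this gives the same result as the cell above, so it could be substituted without changing the behaviour shown in this tutorial.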