<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red'>Mini Jupyter tutorial<br><br>To run each cell, click the cell and press <kbd>Run</kbd> from the menu bar. This will run any Python code or display any text within the selected cell before highlighting the next cell down. There are two types of cell: a <i>text cell</i> of type <kbd>Markdown</kbd> or <kbd>Heading</kbd>, and a <i>code cell</i> of type <kbd>Code</kbd>, identifiable by the <span style="font-family: courier; color:black; background-color:white;">In[ ]:</span> to the left of the cell. The type of cell is also shown in the dropdown menu in the menu bar above, to the right of <kbd>Run</kbd>. Any visual results produced by the code (text/figures) are displayed directly below that cell. Keep pressing <kbd>Run</kbd> until you reach the end of the notebook, or alternatively click <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart and Run All</kbd>. Should the Jupyter notebook crash for any reason, restart the Jupyter Kernel by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart</kbd>, and start again from the top.
        
</div>

# Tutorial 1.6: Metabolomics of athlete performance at high altitude: A multi-block hierarchical edge bundle

<p style="text-align: justify">
<br>
This tutorial covers the necessary steps for producing a Hierarchical Edge Bundle using multi-block data from a study on the impact of high altitude on the performance of athletes.
</p>

<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red', size=4>Note: If visualising using a JavaScript pop-up window, you will need to allow pop-ups in your browser for the domain you're running from (localhost or mybinder.org).
</div> 

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    
<h2 id="1importpackagesmodules" style="text-align: justify">1. Import Packages/Modules</h2>

<p style="text-align: justify">The first code cell of this tutorial imports <a href="https://docs.python.org/3/tutorial/modules.html"><em>packages</em> and <em>modules</em></a> into the Jupyter environment. <em>Packages</em> and <em>modules</em> provide additional functions and tools beyond the in-built Python modules.
<br></p>
<br>
All the code embedded in this notebook is written in Python (<a href="http://www.python.org">python.org</a>) and JavaScript (<a href="https://www.javascript.com/">javascript.com</a>) and is built upon popular open-source packages such as NetworkX (<a href="https://networkx.github.io/">networkx.github.io</a>), NumPy (<a href="https://numpy.org/">numpy.org</a>), SciPy (<a href="https://www.scipy.org/">scipy.org</a>), Matplotlib (<a href="https://matplotlib.org/">matplotlib.org</a>), statsmodels (<a href="https://www.statsmodels.org/">statsmodels.org</a>), Scikit-learn (<a href="https://scikit-learn.org/">scikit-learn.org</a>), scikits.bootstrap (<a href="https://github.com/cgevans/scikits-bootstrap">github.com/cgevans/scikits-bootstrap</a>), Pandas (<a href="https://pandas.pydata.org/">pandas.pydata.org</a>) and D3 JavaScript (<a href="https://d3js.org/">d3js.org</a>).
    
<em>Note:</em> a tutorial on the Python programming language is beyond the scope of this notebook. To learn how to program in Python with Jupyter Notebook, please refer to: 
<a href="https://mybinder.org/v2/gh/jakevdp/PythonDataScienceHandbook/master?filepath=notebooks%2FIndex.ipynb">Python Data Science Handbook (Jake VanderPlas, 2016)</a>.

In [1]:
import os
    
home = os.getcwd() + "/"

import numpy as np
import pandas as pd
from IPython.display import Javascript, display, IFrame
import multivis

print('All packages successfully loaded')

%load_ext autoreload
%autoreload 2

All packages successfully loaded


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

<h2 style="text-align: justify">2. Load Data and Peak Table</h2>

<p style="text-align: justify">The code cell below loads the <em>Data</em> and <em>Peak</em> tables from an Excel file using <code>loadData()</code>. When this is complete, you should see confirmation that Peak (the Peak worksheet) and Data (the Data worksheet) tables have been loaded.<br>

This dataset was previously published by <a href="https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/EP087159">Lawler et al. (2018)</a> in <i>Experimental Physiology</i> and has been converted into a standardised <a href="https://en.wikipedia.org/wiki/Tidy_data">Tidy Data</a> format.
</p> 

Please inspect the <a href="Altitude_Data.xlsx">Altitude_Data.xlsx</a> Excel file before using it in this tutorial to understand its structure. To load a different dataset into the notebook, replace <code>file = 'Altitude_Data.xlsx'</code> with another file in the same <a href="https://en.wikipedia.org/wiki/Tidy_data">Tidy Data</a> format as <a href="Altitude_Data.xlsx">Altitude_Data.xlsx</a>, and then rerun the workflow.

</div></div>

In [2]:
file = 'Altitude_Data.xlsx'

DataTable,PeakTable = multivis.utils.loadData(home + file, DataSheet='Data', PeakSheet='Peak')

Loading table: Peak
Loading table: Data
TOTAL SAMPLES: 29 TOTAL PEAKS: 32
Done!


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Data Table

Check the imported Data table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(DataTable)</span><br>
</div>

In [3]:
display(DataTable)

|  | Idx | Class | SampleID | M1 | M2 | M3 | M4 | M5 | M6 | M7 | ... | M23 | M24 | M25 | M26 | M27 | M28 | M29 | M30 | M31 | M32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Day1 | ID#8 | 11869.885789 | 7928.191086 | 10435.6156 | 24274.490844 | 1652528.0 | 552759.7 | 3302.626974 | ... | 8821.86844 | 45452.021662 | 842267.4 | 35812.954717 | 11908.058199 | 58304.192636 | 76229.516925 | 698.829269 | 6152166.0 | 301138.4 |
| 2 | 2 | Day3 | ID#9 | 6091.032382 | 336302.294432 | 7583.153313 | 15992.455361 | 1014382.0 | 539827.5 | 2561.39659 | ... | 7070.409466 | 23748.717737 | 596931.0 | 43209.005653 | 1614.750726 | 17107.605509 | 33792.399242 | 5219.808748 | 3844697.0 | 387650.7 |
| 3 | 3 | Day14 | ID#10 | 8304.840762 | 91992.628406 | 15463.843074 | 17243.287685 | 1261087.0 | 635835.1 | 2579.520637 | ... | 6189.891266 | 62171.092488 | 461790.8 | 19891.976258 | 5787.252806 | 21196.689803 | 49518.409732 | 6917.767103 | 3642466.0 | 3085592.0 |
| 4 | 4 | Day1 | ID#4 | 5679.245738 | 103604.360972 | 11726.584477 | 18997.94632 | 1730119.0 | 801818.6 | 1862.338814 | ... | 14710.768208 | 29470.036797 | 633863.3 | 54690.964915 | 6119.355667 | 34562.628959 | 25722.056666 | 82857.001884 | 6507678.0 | 454499.7 |
| 5 | 5 | Day3 | ID#3 | 7927.71988 | 377994.409342 | 37252.85594 | 23070.362098 | 2172443.0 | 812261.7 | 2368.612424 | ... | 8373.694858 | 70177.184112 | 576483.6 | 119391.571698 | 5831.690401 | 11314.28983 | 27222.497589 | 8780.422655 | 5452713.0 | 367584.1 |
| 6 | 6 | Day14 | ID#3 | 4074.580486 | 9234.118904 | 2883.602638 | 27252.577493 | 1694014.0 | 319840.3 | 2436.745786 | ... | 17175.921137 | 43648.210575 | 366902.6 | 41311.72709 | 20909.038093 | 14625.381528 | 61184.204465 | 2386.211484 | 3958768.0 | 703689.0 |
| 7 | 7 | Day3 | ID#2 | 7120.333141 | 98285.350473 | 14214.077514 | 24997.653286 | 1421475.0 | 698111.4 | 3576.187472 | ... | 11930.411039 | 33482.365703 | 769590.7 | 17285.765486 | 4002.855161 | 18389.688133 | 54971.306524 | 1266.178495 | 4396494.0 | 552230.2 |
| 8 | 8 | Day3 | ID#4 | 8103.250066 | 31683.902081 | 8928.894882 | 20545.072813 | 1655281.0 | 388682.6 | 3023.746753 | ... | 9760.323565 | 24682.407167 | 695113.4 | 33854.620792 | 5749.788807 | 13098.908274 | 32455.790304 | 13672.154888 | 3771656.0 | 1186001.0 |
| 9 | 9 | Day14 | ID#8 | 8273.693156 | 382598.960254 | 4154.97273 | 22212.782829 | 1891639.0 | 891573.2 | 2310.807344 | ... | 11142.048186 | 50927.179351 | 682817.9 | 169428.675064 | 5179.87386 | 14255.030832 | 63933.849462 | 2019.496191 | 3703559.0 | 496448.3 |
| 10 | 10 | Day3 | ID#10 | 6690.535074 | 98946.80458 | 5388.740427 | 14757.880513 | 1370226.0 | 664479.7 | 2577.400534 | ... | 10905.284099 | 30077.046114 | 662094.7 | 44494.246759 | 5019.971982 | 9310.215496 | 32246.148489 | 455.958833 | 3207303.0 | 355998.6 |


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Peak Table

Check the imported Peak table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(PeakTable)</span><br>
</div>

In [4]:
display(PeakTable)

|  | Idx | Name | Label | Mode | mz | rt | F | pvalue | pFDR | RSD | Dratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | M1 | Ocatanedioic acid | Negative | 173.081494 | 528.593 | 1.205805 | 0.316809 | 0.290529 | 10.471229 | 2.887221 |
| 2 | 2 | M2 | Glycoursodeoxycholic acid | Negative | 448.30488 | 1158.75 | 6.674117 | 0.007382 | 0.07975 | 4.706839 | 24.314863 |
| 3 | 3 | M3 | Dodecanedioc acid | Negative | 229.143848 | 966.471 | 3.659888 | 0.079039 | 0.191246 | 19.856452 | 3.216742 |
| 4 | 4 | M4 | Succinic Acid | Negative | 117.019323 | 133.4985 | 0.415028 | 0.626102 | 0.341498 | 5.577582 | 4.025602 |
| 5 | 5 | M5 | Citric Acid | Negative | 191.019367 | 109.544 | 0.741641 | 0.435266 | 0.311669 | 9.170874 | 2.417115 |
| 6 | 6 | M6 | Lactic Acid | Negative | 89.025122 | 84.363 | 4.928634 | 0.020552 | 0.111017 | 10.341807 | 2.332534 |
| 7 | 7 | M7 | 5-Hydroxytryptophan | Negative | 219.076809 | 217.493 | 1.254299 | 0.303983 | 0.286883 | 16.876721 | 1.564374 |
| 8 | 8 | M8 | Glycocholic acid | Negative | 464.299632 | 979.16 | 7.62628 | 0.005556 | 0.076033 | 11.902311 | 7.463177 |
| 9 | 9 | M9 | L-Tryptophan | Negative | 203.082155 | 337.889 | 1.335823 | 0.287394 | 0.284528 | 4.289423 | 3.48065 |
| 10 | 10 | M10 | Hexadecanedioic acid | Negative | 285.20573 | 1344.61 | 0.554782 | 0.543473 | 0.328267 | 9.537188 | 4.942172 |


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 3. Statistical analysis

Statistical analysis helps to identify any features or samples that may be outliers.
It is also important to determine whether the data is normally distributed before any further analysis, such as correlation analysis, because this determines the most suitable correlation method. For example, the parametric Pearson's correlation is appropriate for normally distributed data, whereas the non-parametric Spearman's rank correlation is suitable for non-normally distributed data.

Statistical analysis can also provide additional per-feature information for downstream visualisations, such as one-way ANOVA p-values and PCA loadings, which can be displayed in the nodes of the hierarchical edge bundle.
</div>
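The choice between a parametric and a non-parametric correlation can be sketched as follows. This is a minimal illustration using SciPy's `shapiro`, `pearsonr` and `spearmanr`, not the multivis implementation; the helper name `choose_correlation` and the rule "use Pearson only if neither variable rejects normality" are assumptions made for this example.

```python
import numpy as np
from scipy import stats


def choose_correlation(x, y, alpha=0.05):
    """Correlate x and y with Pearson's r if neither rejects normality
    (Shapiro-Wilk p > alpha), otherwise with Spearman's rho.

    Hypothetical helper. Returns (method_name, coefficient, p_value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)
    if p_x > alpha and p_y > alpha:
        r, p = stats.pearsonr(x, y)       # parametric
        return "pearson", float(r), float(p)
    rho, p = stats.spearmanr(x, y)        # non-parametric, rank-based
    return "spearman", float(rho), float(p)


rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = a + rng.normal(scale=0.5, size=30)
print(choose_correlation(a, b))               # roughly normal inputs
print(choose_correlation(np.exp(3 * a), b))   # heavily right-skewed x
```

In practice the decision is usually made once for the whole dataset (as done below with the majority of features) rather than pair by pair, so that all edges in the bundle use the same correlation measure.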

In [5]:
stats = multivis.utils.statistics(PeakTable, DataTable)

stats.help()

Generate a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).
        Initial_Parameters
            ----------
            peaktable : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
            datatable : Pandas dataframe matrix containing values for statistical analysis

        Methods
            -------
            set_params : Set parameters -
                parametric: Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
                log_data: Perform a log ('natural', base 2 or base 10) on all data prior to statistical analysis (default: (False, 2))
                scale_data: Scale the data to unit variance (default: False)
                impute_data: Impute any missing values using KNN impute with a set number of nearest neighbours (default: (False, 3))
                group_column_name: The group column name used in the datatable (default: None)
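As a rough sketch of what the `log_data` and `scale_data` options describe, the NumPy function below log-transforms a samples × features matrix and then scales each feature to unit variance. The name `log_and_autoscale`, the mean-centring, and the use of the sample standard deviation (`ddof=1`) are assumptions; multivis may differ in these details, and the KNN imputation step is omitted. The log is undefined for zero or negative values, so such values would need handling first.

```python
import numpy as np


def log_and_autoscale(X, base=2):
    """Log-transform a samples x features matrix, then mean-centre and
    scale each feature (column) to unit variance, ignoring NaNs."""
    X = np.asarray(X, dtype=float)
    Xlog = np.log(X) / np.log(base)           # change of base: log_b(x) = ln(x) / ln(b)
    mean = np.nanmean(Xlog, axis=0)           # per-feature mean
    sd = np.nanstd(Xlog, axis=0, ddof=1)      # per-feature sample standard deviation
    return (Xlog - mean) / sd


X = np.array([[1.0, 10.0],
              [2.0, 100.0],
              [4.0, 1000.0]])
print(log_and_autoscale(X))  # each column becomes approximately [-1, 0, 1]
```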
      

In [6]:
params = dict({'parametric': True
              , 'log_data': (True, 2)
              , 'scale_data': True
              , 'impute_data': (True, 3)
              , 'group_column_name': 'Class'
              , 'control_group_name': 'Day1'
              , 'group_alpha_CI': 0.05
              , 'fold_change_alpha_CI': 0.05
              , 'pca_alpha_CI': 0.05
              , 'total_missing': False
              , 'group_missing': False
              , 'pca_loadings': True
              , 'normality_test': True
              , 'group_normality_test': False
              , 'group_mean_CI': True
              , 'group_median_CI': False
              , 'mean_fold_change': True
              , 'median_fold_change': False
              , 'kruskal_wallis_test': False
              , 'levene_twoGroup': False
              , 'levene_allGroup': False
              , 'oneway_Anova_test': False
              , 'ttest_oneGroup': False
              , 'ttest_twoGroup': False
              , 'mann_whitney_u_test': False})

stats.set_params(**params)

PeakTableStats = stats.calculate()


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Peak Table with statistical information: Shapiro-Wilk p-value, mean concentration, mean fold change and PCA loadings

Check the Peak table with the merged statistical information by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(PeakTableStats)</span><br>
</div>

In [7]:
display(PeakTableStats)

|  | Idx | Name | Label | Mode | mz | rt | F | pvalue | pFDR | RSD | ... | Shapiro_statistic | Shapiro_pvalue | PC1 | PC2 | PC1_lower | PC1_upper | PC1_sig | PC2_lower | PC2_upper | PC2_sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | M1 | Ocatanedioic acid | Negative | 173.081494 | 528.593 | 1.205805 | 0.316809 | 0.290529 | 10.471229 | ... | 0.964668 | 0.4256912 | 0.172449 | -0.075246 | 0.005055 | 0.384042 | True | -0.360783 | 0.137221 | False |
| 1 | 1 | M2 | Glycoursodeoxycholic acid | Negative | 448.30488 | 1158.75 | 6.674117 | 0.007382 | 0.07975 | 4.706839 | ... | 0.880608 | 0.003469454 | 0.166588 | 0.247957 | -0.076497 | 0.379042 | False | 0.040265 | 0.423204 | True |
| 2 | 2 | M3 | Dodecanedioc acid | Negative | 229.143848 | 966.471 | 3.659888 | 0.079039 | 0.191246 | 19.856452 | ... | 0.962037 | 0.3686095 | 0.193745 | -0.070164 | 0.044362 | 0.325822 | True | -0.407339 | 0.137973 | False |
| 3 | 3 | M4 | Succinic Acid | Negative | 117.019323 | 133.4985 | 0.415028 | 0.626102 | 0.341498 | 5.577582 | ... | 0.912227 | 0.0194639 | -0.035571 | -0.240657 | -0.269335 | 0.212274 | False | -0.489396 | -0.092047 | True |
| 4 | 4 | M5 | Citric Acid | Negative | 191.019367 | 109.544 | 0.741641 | 0.435266 | 0.311669 | 9.170874 | ... | 0.653896 | 4.894235e-07 | -0.076491 | -0.160579 | -0.558845 | 0.028232 | False | -0.529497 | -0.009062 | True |
| 5 | 5 | M6 | Lactic Acid | Negative | 89.025122 | 84.363 | 4.928634 | 0.020552 | 0.111017 | 10.341807 | ... | 0.983205 | 0.9110454 | 0.234512 | 0.009089 | 0.112962 | 0.369715 | True | -0.223224 | 0.250871 | False |
| 6 | 6 | M7 | 5-Hydroxytryptophan | Negative | 219.076809 | 217.493 | 1.254299 | 0.303983 | 0.286883 | 16.876721 | ... | 0.971145 | 0.5911328 | -0.056805 | -0.250191 | -0.278891 | 0.196599 | False | -0.420621 | -0.108121 | True |
| 7 | 7 | M8 | Glycocholic acid | Negative | 464.299632 | 979.16 | 7.62628 | 0.005556 | 0.076033 | 11.902311 | ... | 0.986695 | 0.9662158 | 0.346193 | -0.049402 | 0.32358 | 0.390445 | True | -0.317753 | 0.237383 | False |
| 8 | 8 | M9 | L-Tryptophan | Negative | 203.082155 | 337.889 | 1.335823 | 0.287394 | 0.284528 | 4.289423 | ... | 0.961392 | 0.3555631 | 0.244783 | -0.125493 | 0.050191 | 0.407765 | True | -0.396753 | 0.219328 | False |
| 9 | 9 | M10 | Hexadecanedioic acid | Negative | 285.20573 | 1344.61 | 0.554782 | 0.543473 | 0.328267 | 9.537188 | ... | 0.952579 | 0.2134403 | 0.111963 | -0.2914 | -0.135374 | 0.421585 | False | -0.468751 | -0.159268 | True |
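As we understand it, `stats.calculate()` computes the confidence intervals in this table by bootstrap resampling (via scikits.bootstrap, listed above) with 500 resamples. The sketch below shows the simpler *percentile* bootstrap in plain NumPy; scikits.bootstrap's default method is, to our knowledge, the bias-corrected and accelerated (BCa) interval, so its numbers will differ. The `PC1_sig`/`PC2_sig` columns appear to flag loadings whose interval excludes zero; this is consistent with every row shown above but is an inference, not documented behaviour. The function names `bootstrap_ci` and `ci_excludes_zero` are hypothetical.

```python
import numpy as np


def bootstrap_ci(data, statfunction=np.nanmean, n_samples=500, alpha=0.05, seed=None):
    """Percentile bootstrap (1 - alpha) confidence interval for statfunction(data)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx is one resample of the data, drawn with replacement
    idx = rng.integers(0, len(data), size=(n_samples, len(data)))
    boot_stats = np.array([statfunction(data[row]) for row in idx])
    low, high = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high


def ci_excludes_zero(low, high):
    """True when the interval lies entirely above or entirely below zero."""
    return bool(low > 0 or high < 0)


# Illustrative values only
values = np.array([4.07, 3.75, 3.99, 3.89, 4.44, 4.09, 4.03, 4.27, 4.14])
low, high = bootstrap_ci(values, seed=1)
print(f"mean={values.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")

# PC1 loading intervals of M1 and M2 from the table above
print(ci_excludes_zero(0.005055, 0.384042))   # True
print(ci_excludes_zero(-0.076497, 0.379042))  # False
```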


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Determine if the data is normally distributed

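If the majority of features have a Shapiro-Wilk p-value > 0.05, there is no significant evidence against normality and parametric statistical analysis is recommended; if the majority have a p-value < 0.05, normality is rejected for those features and non-parametric analysis methods are recommended. Note that, because <code>log_data</code> was set to <code>True</code> above, these p-values were computed on the log-transformed data.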
</div>
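The majority rule above can be expressed as a small pandas helper. The function name `normality_summary` and the tie-breaking (a tie counts as non-parametric) are our own choices; a p-value exactly equal to `alpha` is counted as non-normal so that every feature is counted exactly once.

```python
import pandas as pd


def normality_summary(pvalues, alpha=0.05):
    """Count features that do / do not reject normality at level alpha
    and return a majority-vote recommendation."""
    p = pd.Series(pvalues, dtype=float)
    n_normal = int((p > alpha).sum())
    n_non_normal = int((p <= alpha).sum())
    recommendation = "parametric" if n_normal > n_non_normal else "non-parametric"
    return n_normal, n_non_normal, recommendation


# Shapiro_pvalue for M1-M10 as displayed in the table above
p_first10 = [0.4256912, 0.003469454, 0.3686095, 0.0194639, 4.894235e-07,
             0.9110454, 0.5911328, 0.9662158, 0.3555631, 0.2134403]
print(normality_summary(p_first10))  # (7, 3, 'parametric')
```

Applied to the full table, `normality_summary(PeakTableStats['Shapiro_pvalue'])` should reproduce the 24/8 split reported below.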

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Number of features normally distributed (Shapiro-Wilk p > 0.05)

In [8]:
display(len(PeakTableStats.query('Shapiro_pvalue > 0.05')))

24

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Number of features non-normally distributed (Shapiro-Wilk p < 0.05)

In [9]:
display(len(PeakTableStats.query('Shapiro_pvalue < 0.05')))

8

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Normality conclusion

Most features (24 of 32) show no significant departure from normality after log transformation, so the data is treated as normally distributed and parametric analysis (e.g. one-way ANOVA, t-test, Pearson's correlation) is recommended.

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 4. Log transform prior to Pearson's correlation analysis

A log transformation is commonly used for biological data, although other transformations such as square, square root, cube root or reciprocal may be more suitable for other types of data. Pearson's correlation, a parametric method, is performed later, so a log transformation is needed to bring the data closer to a normal distribution. If a non-parametric method such as Spearman's rank or Kendall's tau correlation is used instead, a log transformation is not necessary: both methods depend only on the ranks of the values, which a log transformation does not change.

Whether scaling is needed depends on the values in the dataset and on the method used. If some features have values that are very large compared with the majority, they can dominate the analysis and bias the results; unit variance scaling rescales every feature so that all features are comparable. Correlation analysis does not require scaling after log transformation, because a correlation coefficient is already normalised by the standard deviations of the two features being compared. This differs from PCA (Tutorial 1.1), a multivariate method that models the variance across all features at once. However, if another similarity metric such as Euclidean distance were used in place of correlation, scaling may be necessary.

In this example Pearson's correlation is performed later, so the data is log transformed. Any log base can be used: logs in different bases differ only by a constant factor, which does not affect Pearson's correlation.

</div>
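The claims above can be checked numerically with simulated data: a log transformation changes Pearson's r but not Spearman's rho, the log base does not matter for Pearson's r, and unit variance scaling after the log leaves Pearson's r unchanged. The data below are randomly generated, log-normally distributed values, not the altitude dataset; `stats.zscore` stands in for the unit variance scaling step.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated, log-normally distributed intensities for two related features
shared = rng.normal(size=50)
x = np.exp(shared + rng.normal(scale=0.3, size=50))
y = np.exp(shared + rng.normal(scale=0.3, size=50))

r_raw, _ = stats.pearsonr(x, y)                          # Pearson on raw values
r_log10, _ = stats.pearsonr(np.log10(x), np.log10(y))    # Pearson after log10
r_log2, _ = stats.pearsonr(np.log2(x), np.log2(y))       # Pearson after log2
r_scaled, _ = stats.pearsonr(stats.zscore(np.log10(x)),  # Pearson after log10 + unit variance scaling
                             stats.zscore(np.log10(y)))
rho_raw, _ = stats.spearmanr(x, y)                       # Spearman on raw values
rho_log, _ = stats.spearmanr(np.log10(x), np.log10(y))   # Spearman after log10

print(f"Pearson  raw={r_raw:.3f}  log10={r_log10:.3f}  log2={r_log2:.3f}  log10+scaled={r_scaled:.3f}")
print(f"Spearman raw={rho_raw:.3f}  log10={rho_log:.3f}")
```

Expect the raw Pearson value to differ from the log-transformed ones, while the log10, log2 and scaled Pearson values agree, as do the two Spearman values.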

In [10]:
peaklist = PeakTableStats['Name']                   # Set peaklist to the metabolite names in PeakTableStats
X = DataTable[peaklist]                             # Extract X matrix from DataTable using peaklist
Xlog = np.log10(X)                                  # Log transform (base-10)
#Xscale = multivis.utils.scaler(Xlog)               # Scale to unit variance
#Xscale = multivis.utils.imputeData(X, k=3)         # Impute remaining missing values using KNN impute with k=3

# Keep the non-peak (metadata) columns, e.g. Idx, Class and SampleID, and append the log-transformed peaks
metadata = DataTable.loc[:, ~DataTable.columns.isin(PeakTable['Name'])].reset_index(drop=True)
DataTableLogged = pd.merge(metadata, Xlog.reset_index(drop=True), left_index=True, right_index=True)

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 5. Slice up the data into blocks placed into a dictionary indexed by group/class name

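Slice the data by group/class name (here the <code>Class</code> column: Day1, Day3 and Day14) and place each slice in a dictionary indexed by group/class name, so that associations between these blocks can be identified later. As the outputs below show, the features in each block are renamed with a block prefix (A for Day1, B for Day3 and C for Day14).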
</div>
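A hypothetical pandas sketch of this slicing step is shown below; it is not the source of `multivis.utils.groups2blocks`, only an illustration of the idea. `split_into_blocks` and its `prefixes` argument are invented for this example, and the renaming scheme (block letter plus feature position) is inferred from the outputs below. `groupby(..., sort=False)` keeps groups in order of first appearance (Day1, Day3, Day14), matching the A/B/C order; the default sorted order would place Day14 before Day3. The small DataFrame uses illustrative values.

```python
import pandas as pd


def split_into_blocks(data, group_col, feature_cols, prefixes="ABCDEFGHIJ"):
    """Return {group name: DataFrame} with the group column dropped and the
    feature columns renamed <prefix><position>, e.g. M1 -> A1 for the first group."""
    blocks = {}
    groups = data.groupby(group_col, sort=False)  # keep order of first appearance
    for prefix, (group, frame) in zip(prefixes, groups):
        rename = {col: f"{prefix}{i}" for i, col in enumerate(feature_cols, start=1)}
        blocks[group] = (
            frame.drop(columns=group_col)
                 .rename(columns=rename)
                 .reset_index(drop=True)
        )
    return blocks


df = pd.DataFrame({
    "Class": ["Day1", "Day3", "Day14", "Day1", "Day3", "Day14"],
    "SampleID": ["ID#8", "ID#9", "ID#10", "ID#4", "ID#3", "ID#3"],
    "M1": [4.07, 3.78, 3.92, 3.75, 3.90, 3.61],
    "M2": [3.90, 5.53, 4.96, 5.02, 5.58, 3.97],
})
blocks = split_into_blocks(df, "Class", ["M1", "M2"])
for name, block in blocks.items():
    print(name, list(block.columns))
# Day1 ['SampleID', 'A1', 'A2']
# Day3 ['SampleID', 'B1', 'B2']
# Day14 ['SampleID', 'C1', 'C2']
```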

In [11]:
GroupByBlockPeaks, GroupByBlockData = multivis.utils.groups2blocks(PeakTableStats, DataTableLogged, 'Class')

In [12]:
keys = GroupByBlockPeaks.keys()
for key in keys:
    display(GroupByBlockPeaks[key])

|  | Idx | Name | Label | Mode | mz | rt | F | pvalue | pFDR | RSD | ... | Shapiro_statistic | Shapiro_pvalue | PC1 | PC2 | PC1_lower | PC1_upper | PC1_sig | PC2_lower | PC2_upper | PC2_sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | A1 | Ocatanedioic acid | Negative | 173.081494 | 528.593 | 1.205805 | 0.316809 | 0.290529 | 10.471229 | ... | 0.964668 | 0.4256912 | 0.172449 | -0.075246 | 0.005055 | 0.384042 | True | -0.360783 | 0.137221 | False |
| 1 | 1 | A2 | Glycoursodeoxycholic acid | Negative | 448.30488 | 1158.75 | 6.674117 | 0.007382 | 0.07975 | 4.706839 | ... | 0.880608 | 0.003469454 | 0.166588 | 0.247957 | -0.076497 | 0.379042 | False | 0.040265 | 0.423204 | True |
| 2 | 2 | A3 | Dodecanedioc acid | Negative | 229.143848 | 966.471 | 3.659888 | 0.079039 | 0.191246 | 19.856452 | ... | 0.962037 | 0.3686095 | 0.193745 | -0.070164 | 0.044362 | 0.325822 | True | -0.407339 | 0.137973 | False |
| 3 | 3 | A4 | Succinic Acid | Negative | 117.019323 | 133.4985 | 0.415028 | 0.626102 | 0.341498 | 5.577582 | ... | 0.912227 | 0.0194639 | -0.035571 | -0.240657 | -0.269335 | 0.212274 | False | -0.489396 | -0.092047 | True |
| 4 | 4 | A5 | Citric Acid | Negative | 191.019367 | 109.544 | 0.741641 | 0.435266 | 0.311669 | 9.170874 | ... | 0.653896 | 4.894235e-07 | -0.076491 | -0.160579 | -0.558845 | 0.028232 | False | -0.529497 | -0.009062 | True |
| 5 | 5 | A6 | Lactic Acid | Negative | 89.025122 | 84.363 | 4.928634 | 0.020552 | 0.111017 | 10.341807 | ... | 0.983205 | 0.9110454 | 0.234512 | 0.009089 | 0.112962 | 0.369715 | True | -0.223224 | 0.250871 | False |
| 6 | 6 | A7 | 5-Hydroxytryptophan | Negative | 219.076809 | 217.493 | 1.254299 | 0.303983 | 0.286883 | 16.876721 | ... | 0.971145 | 0.5911328 | -0.056805 | -0.250191 | -0.278891 | 0.196599 | False | -0.420621 | -0.108121 | True |
| 7 | 7 | A8 | Glycocholic acid | Negative | 464.299632 | 979.16 | 7.62628 | 0.005556 | 0.076033 | 11.902311 | ... | 0.986695 | 0.9662158 | 0.346193 | -0.049402 | 0.32358 | 0.390445 | True | -0.317753 | 0.237383 | False |
| 8 | 8 | A9 | L-Tryptophan | Negative | 203.082155 | 337.889 | 1.335823 | 0.287394 | 0.284528 | 4.289423 | ... | 0.961392 | 0.3555631 | 0.244783 | -0.125493 | 0.050191 | 0.407765 | True | -0.396753 | 0.219328 | False |
| 9 | 9 | A10 | Hexadecanedioic acid | Negative | 285.20573 | 1344.61 | 0.554782 | 0.543473 | 0.328267 | 9.537188 | ... | 0.952579 | 0.2134403 | 0.111963 | -0.2914 | -0.135374 | 0.421585 | False | -0.468751 | -0.159268 | True |


|  | Idx | Name | Label | Mode | mz | rt | F | pvalue | pFDR | RSD | ... | Shapiro_statistic | Shapiro_pvalue | PC1 | PC2 | PC1_lower | PC1_upper | PC1_sig | PC2_lower | PC2_upper | PC2_sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | B1 | Ocatanedioic acid | Negative | 173.081494 | 528.593 | 1.205805 | 0.316809 | 0.290529 | 10.471229 | ... | 0.964668 | 0.4256912 | 0.172449 | -0.075246 | 0.005055 | 0.384042 | True | -0.360783 | 0.137221 | False |
| 1 | 1 | B2 | Glycoursodeoxycholic acid | Negative | 448.30488 | 1158.75 | 6.674117 | 0.007382 | 0.07975 | 4.706839 | ... | 0.880608 | 0.003469454 | 0.166588 | 0.247957 | -0.076497 | 0.379042 | False | 0.040265 | 0.423204 | True |
| 2 | 2 | B3 | Dodecanedioc acid | Negative | 229.143848 | 966.471 | 3.659888 | 0.079039 | 0.191246 | 19.856452 | ... | 0.962037 | 0.3686095 | 0.193745 | -0.070164 | 0.044362 | 0.325822 | True | -0.407339 | 0.137973 | False |
| 3 | 3 | B4 | Succinic Acid | Negative | 117.019323 | 133.4985 | 0.415028 | 0.626102 | 0.341498 | 5.577582 | ... | 0.912227 | 0.0194639 | -0.035571 | -0.240657 | -0.269335 | 0.212274 | False | -0.489396 | -0.092047 | True |
| 4 | 4 | B5 | Citric Acid | Negative | 191.019367 | 109.544 | 0.741641 | 0.435266 | 0.311669 | 9.170874 | ... | 0.653896 | 4.894235e-07 | -0.076491 | -0.160579 | -0.558845 | 0.028232 | False | -0.529497 | -0.009062 | True |
| 5 | 5 | B6 | Lactic Acid | Negative | 89.025122 | 84.363 | 4.928634 | 0.020552 | 0.111017 | 10.341807 | ... | 0.983205 | 0.9110454 | 0.234512 | 0.009089 | 0.112962 | 0.369715 | True | -0.223224 | 0.250871 | False |
| 6 | 6 | B7 | 5-Hydroxytryptophan | Negative | 219.076809 | 217.493 | 1.254299 | 0.303983 | 0.286883 | 16.876721 | ... | 0.971145 | 0.5911328 | -0.056805 | -0.250191 | -0.278891 | 0.196599 | False | -0.420621 | -0.108121 | True |
| 7 | 7 | B8 | Glycocholic acid | Negative | 464.299632 | 979.16 | 7.62628 | 0.005556 | 0.076033 | 11.902311 | ... | 0.986695 | 0.9662158 | 0.346193 | -0.049402 | 0.32358 | 0.390445 | True | -0.317753 | 0.237383 | False |
| 8 | 8 | B9 | L-Tryptophan | Negative | 203.082155 | 337.889 | 1.335823 | 0.287394 | 0.284528 | 4.289423 | ... | 0.961392 | 0.3555631 | 0.244783 | -0.125493 | 0.050191 | 0.407765 | True | -0.396753 | 0.219328 | False |
| 9 | 9 | B10 | Hexadecanedioic acid | Negative | 285.20573 | 1344.61 | 0.554782 | 0.543473 | 0.328267 | 9.537188 | ... | 0.952579 | 0.2134403 | 0.111963 | -0.2914 | -0.135374 | 0.421585 | False | -0.468751 | -0.159268 | True |


|  | Idx | Name | Label | Mode | mz | rt | F | pvalue | pFDR | RSD | ... | Shapiro_statistic | Shapiro_pvalue | PC1 | PC2 | PC1_lower | PC1_upper | PC1_sig | PC2_lower | PC2_upper | PC2_sig |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | C1 | Ocatanedioic acid | Negative | 173.081494 | 528.593 | 1.205805 | 0.316809 | 0.290529 | 10.471229 | ... | 0.964668 | 0.4256912 | 0.172449 | -0.075246 | 0.005055 | 0.384042 | True | -0.360783 | 0.137221 | False |
| 1 | 1 | C2 | Glycoursodeoxycholic acid | Negative | 448.30488 | 1158.75 | 6.674117 | 0.007382 | 0.07975 | 4.706839 | ... | 0.880608 | 0.003469454 | 0.166588 | 0.247957 | -0.076497 | 0.379042 | False | 0.040265 | 0.423204 | True |
| 2 | 2 | C3 | Dodecanedioc acid | Negative | 229.143848 | 966.471 | 3.659888 | 0.079039 | 0.191246 | 19.856452 | ... | 0.962037 | 0.3686095 | 0.193745 | -0.070164 | 0.044362 | 0.325822 | True | -0.407339 | 0.137973 | False |
| 3 | 3 | C4 | Succinic Acid | Negative | 117.019323 | 133.4985 | 0.415028 | 0.626102 | 0.341498 | 5.577582 | ... | 0.912227 | 0.0194639 | -0.035571 | -0.240657 | -0.269335 | 0.212274 | False | -0.489396 | -0.092047 | True |
| 4 | 4 | C5 | Citric Acid | Negative | 191.019367 | 109.544 | 0.741641 | 0.435266 | 0.311669 | 9.170874 | ... | 0.653896 | 4.894235e-07 | -0.076491 | -0.160579 | -0.558845 | 0.028232 | False | -0.529497 | -0.009062 | True |
| 5 | 5 | C6 | Lactic Acid | Negative | 89.025122 | 84.363 | 4.928634 | 0.020552 | 0.111017 | 10.341807 | ... | 0.983205 | 0.9110454 | 0.234512 | 0.009089 | 0.112962 | 0.369715 | True | -0.223224 | 0.250871 | False |
| 6 | 6 | C7 | 5-Hydroxytryptophan | Negative | 219.076809 | 217.493 | 1.254299 | 0.303983 | 0.286883 | 16.876721 | ... | 0.971145 | 0.5911328 | -0.056805 | -0.250191 | -0.278891 | 0.196599 | False | -0.420621 | -0.108121 | True |
| 7 | 7 | C8 | Glycocholic acid | Negative | 464.299632 | 979.16 | 7.62628 | 0.005556 | 0.076033 | 11.902311 | ... | 0.986695 | 0.9662158 | 0.346193 | -0.049402 | 0.32358 | 0.390445 | True | -0.317753 | 0.237383 | False |
| 8 | 8 | C9 | L-Tryptophan | Negative | 203.082155 | 337.889 | 1.335823 | 0.287394 | 0.284528 | 4.289423 | ... | 0.961392 | 0.3555631 | 0.244783 | -0.125493 | 0.050191 | 0.407765 | True | -0.396753 | 0.219328 | False |
| 9 | 9 | C10 | Hexadecanedioic acid | Negative | 285.20573 | 1344.61 | 0.554782 | 0.543473 | 0.328267 | 9.537188 | ... | 0.952579 | 0.2134403 | 0.111963 | -0.2914 | -0.135374 | 0.421585 | False | -0.468751 | -0.159268 | True |


In [13]:
keys = GroupByBlockData.keys()
for key in keys:
    display(GroupByBlockData[key])

|  | Idx | SampleID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | ... | A23 | A24 | A25 | A26 | A27 | A28 | A29 | A30 | A31 | A32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ID#8 | 4.074447 | 3.899174 | 4.018518 | 4.38515 | 6.218149 | 5.742536 | 3.51886 | 5.178125 | ... | 3.945561 | 4.657553 | 5.92545 | 4.55404 | 4.075841 | 4.7657 | 4.882123 | 2.844371 | 6.789028 | 5.478766 |
| 1 | 4 | ID#4 | 3.754291 | 5.015378 | 4.069172 | 4.278707 | 6.238076 | 5.904076 | 3.270059 | 4.983532 | ... | 4.167635 | 4.469381 | 5.801996 | 4.737916 | 3.786706 | 4.538607 | 4.410306 | 4.918329 | 6.813426 | 5.657534 |
| 2 | 12 | ID#10 | 3.991098 | 4.605677 | 4.047178 | 4.770038 | 6.264298 | 5.782631 | 3.334607 | 4.480804 | ... | 4.168362 | 4.555995 | 5.747527 | 4.503722 | 4.455922 | 4.20513 | 4.608247 | 2.422338 | 6.440977 | 5.831221 |
| 3 | 17 | ID#2 | 3.893013 | 5.083472 | 3.959181 | 4.390972 | 6.274372 | 5.765614 | 3.543611 | 4.6511 | ... | 4.143037 | 4.492955 | 5.942019 | 4.15432 | 3.714024 | 4.359722 | 5.08729 | 2.577588 | 6.354538 | 5.707375 |
| 4 | 18 | ID#6 | 4.440662 | 5.295463 | 4.147908 | 4.519121 | 6.097693 | 6.020969 | 3.5203 | 6.056335 | ... | 3.924835 | 4.525718 | 6.137299 | 5.022716 | 3.262095 | 3.245169 | 4.542649 | 4.152877 | 6.576629 | 5.941119 |
| 5 | 19 | ID#3 | 4.087825 | 5.495268 | 3.941231 | 4.366286 | 6.300329 | 5.790063 | 3.170798 | 5.160773 | ... | 3.677518 | 4.652911 | 5.964083 | 4.782201 | 3.690908 | 3.927815 | 4.213695 | 4.347532 | 6.683485 | 5.622045 |
| 6 | 21 | ID#5 | 4.034273 | 4.104711 | 4.106464 | 4.426831 | 6.366795 | 5.713619 | 3.304789 | 5.871772 | ... | 4.113137 | 4.780275 | 5.981228 | 4.854689 | 3.735655 | 4.350694 | 4.51181 | 4.827475 | 6.603826 | 6.239878 |
| 7 | 22 | ID#1 | 4.267015 | 5.27977 | 4.150893 | 4.522742 | 6.399732 | 5.748116 | 3.369556 | 4.972958 | ... | 3.990573 | 4.639565 | 5.891246 | 4.984276 | 3.704876 | 3.235513 | 4.580642 | 2.907541 | 6.496506 | 5.671377 |
| 8 | 26 | ID#9 | 4.138 | 3.396817 | 4.478716 | 4.37469 | 6.444006 | 5.672045 | 3.469946 | 4.460727 | ... | 3.958609 | 4.42993 | 5.725425 | 4.321265 | 4.375328 | 4.255632 | 4.959859 | 3.032133 | 6.601236 | 5.837137 |


|  | Idx | SampleID | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | ... | B23 | B24 | B25 | B26 | B27 | B28 | B29 | B30 | B31 | B32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | ID#9 | 3.784691 | 5.52673 | 3.87985 | 4.203915 | 6.006202 | 5.732255 | 3.408477 | 5.171992 | ... | 3.849445 | 4.37564 | 5.775924 | 4.635574 | 3.208105 | 4.233189 | 4.528819 | 3.717655 | 6.584862 | 5.588441 |
| 1 | 5 | ID#3 | 3.899148 | 5.577485 | 4.57116 | 4.363054 | 6.336948 | 5.909696 | 3.374494 | 5.523801 | ... | 3.922917 | 4.846196 | 5.760787 | 5.076974 | 3.765794 | 4.053627 | 4.434928 | 3.943515 | 6.736613 | 5.565357 |
| 2 | 7 | ID#2 | 3.8525 | 4.992489 | 4.152719 | 4.397899 | 6.152739 | 5.843925 | 3.55342 | 5.093013 | ... | 4.076655 | 4.524816 | 5.88626 | 4.237689 | 3.60237 | 4.264574 | 4.740136 | 3.102495 | 6.643107 | 5.74212 |
| 3 | 8 | ID#4 | 3.908659 | 4.500839 | 3.950798 | 4.312708 | 6.218872 | 5.589595 | 3.480545 | 5.654207 | ... | 3.989464 | 4.392388 | 5.842056 | 4.529618 | 3.759652 | 4.117235 | 4.511292 | 4.135837 | 6.576532 | 6.074085 |
| 4 | 10 | ID#10 | 3.825461 | 4.995402 | 3.731487 | 4.169024 | 6.136792 | 5.822482 | 3.411182 | 4.878298 | ... | 4.037637 | 4.478235 | 5.82092 | 4.648304 | 3.700701 | 3.96896 | 4.508478 | 2.658926 | 6.50614 | 5.551448 |
| 5 | 15 | ID#6 | 4.072171 | 5.525505 | 4.388307 | 4.246284 | 6.268295 | 5.789378 | 3.252083 | 5.355079 | ... | 3.613316 | 4.683841 | 5.812587 | 4.471625 | 3.387923 | 4.40283 | 4.371885 | 3.59418 | 6.51026 | 5.567145 |
| 6 | 23 | ID#8 | 3.765582 | 5.42221 | 3.942258 | 4.425633 | 6.320807 | 5.694414 | 3.326287 | 5.207453 | ... | 4.074724 | 3.197466 | 6.001857 | 5.150323 | 3.944234 | 4.017855 | 5.012168 | 4.715637 | 6.248246 | 5.488643 |
| 7 | 25 | ID#7 | 4.068124 | 3.818103 | 4.600771 | 4.358022 | 6.351669 | 5.830243 | 3.319232 | 5.271598 | ... | 4.071753 | 4.674318 | 5.773881 | 4.590581 | 3.870235 | 4.228076 | 4.855943 | 2.82812 | 6.859464 | 5.568242 |
| 8 | 27 | ID#5 | 3.875099 | 5.121022 | 3.728621 | 4.184402 | 6.233669 | 5.869795 | 3.264833 | 4.898369 | ... | 3.781659 | 4.507721 | 5.571638 | 4.961609 | 3.541468 | 4.167914 | 4.564952 | 3.756538 | 6.65142 | 6.223036 |
| 9 | 28 | ID#1 | 3.882167 | 5.208442 | 3.623634 | 4.431227 | 6.317787 | 5.768726 | 3.518355 | 4.927731 | ... | 4.305959 | 4.608123 | 5.963077 | 4.270345 | 3.960432 | 4.235742 | 4.947536 | 2.807305 | 6.660489 | 6.156665 |


Unnamed: 0,Idx,SampleID,C1,C2,C3,C4,C5,C6,C7,C8,...,C23,C24,C25,C26,C27,C28,C29,C30,C31,C32
0,3,ID#10,3.919331,4.963753,4.189317,4.23662,6.100745,5.803344,3.411539,5.117707,...,3.791683,4.793588,5.664445,4.298678,3.762472,4.326268,4.694767,3.839966,6.561396,6.489339
1,6,ID#3,3.610083,3.965395,3.459935,4.435408,6.228917,5.504933,3.38681,4.146029,...,4.23492,4.639966,5.564551,4.616073,4.320334,4.165107,4.786639,3.377709,6.59756,5.847381
2,9,ID#8,3.917699,5.582744,3.618568,4.346603,6.276838,5.950157,3.363764,5.19075,...,4.046965,4.70695,5.834305,5.228987,3.714319,4.153968,4.805731,3.305243,6.568619,5.695874
3,11,ID#5,3.843222,4.666306,3.634215,4.416349,6.231295,5.700932,3.419468,4.60074,...,4.226367,4.490209,5.796446,4.29313,4.142241,4.107203,4.75644,4.430915,6.502539,5.902857
4,13,ID#9,3.977001,5.106095,4.626515,4.309759,6.281414,6.028462,3.388106,5.051498,...,4.055386,4.735719,6.073818,5.203397,3.776663,3.761391,4.657941,3.564923,6.627442,5.651351
5,14,ID#6,4.210309,5.224165,4.333649,4.197604,5.358152,5.845606,3.240969,5.4596,...,3.926553,4.416488,5.887085,4.433038,3.619763,4.494782,4.619775,4.847543,6.835286,6.132373
6,16,ID#1,3.878483,5.31794,4.035537,4.293074,6.30282,5.670919,3.377875,4.786624,...,3.560818,4.553126,5.935945,4.858409,3.665768,3.26367,4.130239,4.007672,6.865227,5.583684
7,20,ID#2,4.001123,5.489663,4.544879,4.41144,6.2863,5.749986,3.194529,5.729561,...,4.133874,4.412501,6.082593,4.856832,4.004203,3.992892,4.859156,3.80876,6.747892,5.642952
8,24,ID#7,4.035726,4.971778,3.846807,4.216989,6.194265,5.552952,3.318568,4.570388,...,4.006313,4.622701,5.778009,4.289385,4.171635,4.34616,4.869988,4.34941,6.352872,5.870821
9,29,ID#4,3.963775,4.659271,3.532867,4.31373,6.188316,5.663184,3.38719,4.256073,...,4.103308,4.579221,5.884022,4.452744,3.994959,3.917849,5.134796,4.800244,6.023852,5.751274


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 6. Merge all the data
    
Merge the data from each group/block, together with any statistical results generated for each block by the multivis.utils.statistics package, so that multi-block associations can later be identified and visualised.

Multi-block data can be prepared with multivis.utils.groups2blocks from a single dataset that is split into blocks by a group/class column; the blocks are then merged on a common index. This suits comparisons between groups of samples that do not share a common sample ID. If the number of samples differs between groups, sample outliers can be identified through exploratory analysis (e.g. PCA) and removed until every group has the same number of samples. If there are no outliers, simply merge the blocks on the common index values; any samples outside the common index are removed automatically.

When the sample IDs are shared between blocks, merge on the common sample ID instead of the index, as is done below. This applies to time-series data comparing the same individuals over time, and to systems biology studies that measure the same samples across different 'omics platforms (e.g. transcriptomics, proteomics, metabolomics), where the data from each 'omics block can be placed into a dictionary keyed by the source name. The multi-omics approach does, however, require prior modelling with methods such as multi-block variable influence on orthogonal projections (MB-VIOP) and OnPLS, or with tools such as <a href="http://mixomics.org/">mixOmics</a>. A good example is the asthma study by <a href="https://pubs.acs.org/doi/10.1021/acs.analchem.8b03205">Reinke, S. et al. (2018)</a>, in which the same samples were measured across multiple 'omics platforms, then log-transformed, scaled and modelled using MB-VIOP and OnPLS.

Note: Sample ID 'ID#7' is removed after merging, as this sample is missing from the 'Day1' block.
</div>

In [14]:
MultiBlockPeaks, MultiBlockData = multivis.utils.mergeBlocks(GroupByBlockPeaks, GroupByBlockData, 'SampleID')
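The effect of merging on a shared sample ID can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not necessarily how `multivis.utils.mergeBlocks` is implemented: it assumes an inner join on `SampleID`, so a sample missing from any block (here ID#7, absent from the first block) is dropped. The three toy blocks (`block_a`, `block_b`, `block_c`, hypothetical names) each hold one peak, using a few values copied from the tables above.

```python
from functools import reduce

import pandas as pd

# Toy blocks with one peak each; values copied from the block tables above
block_a = pd.DataFrame({"SampleID": ["ID#1", "ID#2", "ID#3"],
                        "A1": [4.267015, 3.893013, 4.087825]})
block_b = pd.DataFrame({"SampleID": ["ID#1", "ID#2", "ID#3", "ID#7"],
                        "B1": [3.882167, 3.8525, 3.899148, 4.068124]})
block_c = pd.DataFrame({"SampleID": ["ID#1", "ID#2", "ID#3", "ID#7"],
                        "C1": [3.878483, 4.001123, 3.610083, 4.035726]})

# An inner join keeps only SampleIDs present in every block,
# so ID#7 (missing from block_a) is dropped automatically
merged = reduce(
    lambda left, right: pd.merge(left, right, on="SampleID", how="inner"),
    [block_a, block_b, block_c],
)
print(merged)
```

With `how="inner"`, pandas preserves the order of the left keys, so the merged rows follow the order of the first block.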

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Multi-block Peak Table

Check the Multi-block Peak Table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(MultiBlockPeaks)</span><br>
</div>

In [15]:
display(MultiBlockPeaks)

Unnamed: 0,Idx,Name,Label,Mode,mz,rt,F,pvalue,pFDR,RSD,...,PC2_sig,Block,MeanFoldChange,MeanFoldChange_CI_lower,MeanFoldChange_CI_upper,MeanFoldChange_sig,Group_mean,Group_mean_CI_lower,Group_mean_CI_upper,Group_mean_sig
0,0,A1,Ocatanedioic acid,Negative,173.081494,528.5930,1.205805,0.316809,0.290529,10.471229,...,False,Day1,0.000000,0.000000,0.000000,False,0.669912,-0.022512,1.389242,False
1,1,A2,Glycoursodeoxycholic acid,Negative,448.304880,1158.7500,6.674117,0.007382,0.079750,4.706839,...,True,Day1,0.000000,0.000000,0.000000,False,-0.409907,-1.320029,0.217838,False
2,2,A3,Dodecanedioc acid,Negative,229.143848,966.4710,3.659888,0.079039,0.191246,19.856452,...,False,Day1,0.000000,0.000000,0.000000,False,0.175319,-0.055282,0.515291,False
3,3,A4,Succinic Acid,Negative,117.019323,133.4985,0.415028,0.626102,0.341498,5.577582,...,True,Day1,0.000000,0.000000,0.000000,False,0.760985,0.170814,1.589921,True
4,4,A5,Citric Acid,Negative,191.019367,109.5440,0.741641,0.435266,0.311669,9.170874,...,True,Day1,0.000000,0.000000,0.000000,False,0.366689,-0.016821,0.661026,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,91,C28,Benzyl cinnamate,Positive,239.106873,1192.4850,0.390561,0.643624,0.344644,20.720842,...,False,Day14,6.153067,5.637729,45.380711,True,-0.152577,-1.009189,0.305538,False
92,92,C29,Cortisol,Positive,363.217288,779.0690,3.831661,0.047872,0.167497,4.942567,...,True,Day14,-1.787061,-5858.889971,-0.138721,True,0.230527,-0.512056,0.687416,False
93,93,C30,N-N'-Diphenylguanidine,Positive,212.118471,483.3210,0.689793,0.460283,0.315460,11.714564,...,False,Day14,-2.116057,-108.361021,-0.351548,True,0.430230,0.043422,0.883845,True
94,94,C31,Tetradecanedioc acid,Positive,520.340010,1436.7700,2.282723,0.157126,0.239203,16.452576,...,False,Day14,-2.154269,-151.105809,0.692659,False,-0.101811,-1.284137,0.508504,False


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display the Multi-block Data Table

Check the Multi-block Data Table simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(MultiBlockData)</span><br>
</div>

In [16]:
display(MultiBlockData)

Unnamed: 0,Idx,SampleID,A1,A2,A3,A4,A5,A6,A7,A8,...,C23,C24,C25,C26,C27,C28,C29,C30,C31,C32
0,1,ID#8,4.074447,3.899174,4.018518,4.38515,6.218149,5.742536,3.51886,5.178125,...,4.046965,4.70695,5.834305,5.228987,3.714319,4.153968,4.805731,3.305243,6.568619,5.695874
1,4,ID#4,3.754291,5.015378,4.069172,4.278707,6.238076,5.904076,3.270059,4.983532,...,4.103308,4.579221,5.884022,4.452744,3.994959,3.917849,5.134796,4.800244,6.023852,5.751274
2,12,ID#10,3.991098,4.605677,4.047178,4.770038,6.264298,5.782631,3.334607,4.480804,...,3.791683,4.793588,5.664445,4.298678,3.762472,4.326268,4.694767,3.839966,6.561396,6.489339
3,17,ID#2,3.893013,5.083472,3.959181,4.390972,6.274372,5.765614,3.543611,4.6511,...,4.133874,4.412501,6.082593,4.856832,4.004203,3.992892,4.859156,3.80876,6.747892,5.642952
4,18,ID#6,4.440662,5.295463,4.147908,4.519121,6.097693,6.020969,3.5203,6.056335,...,3.926553,4.416488,5.887085,4.433038,3.619763,4.494782,4.619775,4.847543,6.835286,6.132373
5,19,ID#3,4.087825,5.495268,3.941231,4.366286,6.300329,5.790063,3.170798,5.160773,...,4.23492,4.639966,5.564551,4.616073,4.320334,4.165107,4.786639,3.377709,6.59756,5.847381
6,21,ID#5,4.034273,4.104711,4.106464,4.426831,6.366795,5.713619,3.304789,5.871772,...,4.226367,4.490209,5.796446,4.29313,4.142241,4.107203,4.75644,4.430915,6.502539,5.902857
7,22,ID#1,4.267015,5.27977,4.150893,4.522742,6.399732,5.748116,3.369556,4.972958,...,3.560818,4.553126,5.935945,4.858409,3.665768,3.26367,4.130239,4.007672,6.865227,5.583684
8,26,ID#9,4.138,3.396817,4.478716,4.37469,6.444006,5.672045,3.469946,4.460727,...,4.055386,4.735719,6.073818,5.203397,3.776663,3.761391,4.657941,3.564923,6.627442,5.651351


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 7. Correlation analysis

Correlation is a measure of similarity that quantifies the strength and direction of the association between two variables. Pearson's correlation, a parametric method, measures the linear relationship between two variables and is calculated by dividing their covariance (joint variability) by the product of their standard deviations (see Eq1). Non-parametric methods such as Spearman's rank correlation and Kendall's tau instead measure monotonic relationships. The correlation analysis below supports Pearson, Spearman or Kendall's tau correlation.

\begin{equation*}
r = \frac{\mathrm{Cov}(X,Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}
\end{equation*}
<center>Eq1: Pearson’s correlation coefficient</center>

</div>
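Eq1 can be checked numerically. The sketch below uses NumPy only and hypothetical helper names (`pearson_r`, `ranks`, `spearman_rho`): it computes Pearson's r exactly as in Eq1, and Spearman's rho as Pearson's r applied to ranks, assuming no tied values. It is not the implementation of `multivis.utils.corrAnalysis`; in practice `scipy.stats` provides `pearsonr`, `spearmanr` and `kendalltau`, which also return p-values.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r as in Eq1: Cov(X, Y) / (SD(X) * SD(Y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance (ddof=0)
    return cov / (x.std() * y.std())                 # np.std also defaults to ddof=0

def ranks(values):
    """Rank values from 1 to n (assumes no tied values, for brevity)."""
    values = np.asarray(values, dtype=float)
    r = np.empty(len(values))
    r[np.argsort(values)] = np.arange(1, len(values) + 1)
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]          # monotonic but not linear
print(pearson_r(x, y))           # slightly below 1
print(spearman_rho(x, y))        # 1 (up to rounding)
print(np.corrcoef(x, y)[0, 1])   # NumPy's Pearson r, for comparison
```

Because y increases with x but not along a straight line, Spearman's rho reaches 1 while Pearson's r stays just below it, which is the linear versus monotonic distinction described above.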

In [17]:
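correlationType = "pearson"  # alternatives: "spearman", "kendalltau"

# Select the peak columns of the merged data, in peak-table order
X = MultiBlockData[MultiBlockPeaks['Name']]

# Pairwise correlation scores and their p-values for every pair of peaks
MultiBlockScores, MultiBlockPvalues = multivis.utils.corrAnalysis(X, correlationType)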

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 96/96 [00:00<00:00, 134.40it/s]


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 8.  Generate Edges

The similarity/distance scores are filtered and stored in a dataframe of edges, where nodes represent features (metabolites) and edges represent the similarity/distance scores between them (e.g. correlation coefficients). Each edge also carries the names, labels and blocks of its two nodes, together with other attributes such as the p-value of the correlation coefficient.

</div>

In [18]:
networkEdges = multivis.Edge(peaktable=MultiBlockPeaks, datatable=MultiBlockScores, pvalues=MultiBlockPvalues)

networkEdges.help()

Builds nodes and edges and is the base class for the Network class.

        Initial_Parameters
        ----------
        peaktable : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
        datatable : Pandas dataframe matrix containing scores
        pvalues : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None)

        Methods
        -------
        set_params : Set parameters
            filter_type: The value type to filter the data on (default: 'pvalue')
            hard_threshold: Value to filter the data on (default: 0.005)
            withinBlocks: Include scores within blocks if building multi-block network (default: False)
            sign: The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')

        help : Print this help text

        build : Builds the nodes and edges.
        getNodes : Returns a Pandas dataframe of all nodes.
        getEdges : Returns a Pandas da

In [19]:
params = dict({'filter_type': 'pvalue'               #The filter type to use for the similarities matrix ('pvalue' or 'score')
                    , 'hard_threshold': 0.1          #The hard threshold to apply to the similarities matrix (here, p-values below 0.1 are kept)
                    , 'withinBlocks': True           #Include scores within blocks if building multi-block network
                    , 'sign': "both"})               #The sign of the similarities ('pos', 'neg' or 'both')

networkEdges.set_params(**params)

networkEdges.build()

edges = networkEdges.getEdges()
nodes = networkEdges.getNodes()
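The internals of `Edge.build()` are not shown here, but the filtering that `set_params` describes can be sketched with pandas. The helper `edges_from_matrices` below is hypothetical (not the multivis API) and assumes that, with `filter_type='pvalue'`, a pair is kept when its p-value is below `hard_threshold`, which is consistent with the edge table below, where every p-value is under 0.1. Each unordered pair of features is visited once, and the `within_blocks` and `sign` arguments mimic the `withinBlocks` and `sign` options. The 3x3 matrices are illustrative values only.

```python
import numpy as np
import pandas as pd

def edges_from_matrices(scores, pvalues, blocks, hard_threshold=0.1,
                        within_blocks=True, sign="both"):
    """Hypothetical helper (not the multivis API): build an edge list from
    square score and p-value matrices that share the same feature labels."""
    names = list(scores.columns)
    rows = []
    # Visit each unordered pair of features once (upper triangle, no self-loops)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score, pval = scores.iat[i, j], pvalues.iat[i, j]
            if pval >= hard_threshold:
                continue  # keep only pairs whose p-value is below the threshold
            if not within_blocks and blocks[i] == blocks[j]:
                continue  # optionally keep only between-block pairs
            if (sign == "pos" and score <= 0) or (sign == "neg" and score >= 0):
                continue  # optionally keep only one sign
            rows.append({"start_name": names[i], "start_block": blocks[i],
                         "end_name": names[j], "end_block": blocks[j],
                         "score": score, "sign": float(np.sign(score)),
                         "pvalue": pval})
    return pd.DataFrame(rows, columns=["start_name", "start_block", "end_name",
                                       "end_block", "score", "sign", "pvalue"])

# Toy 3x3 matrices (illustrative values only)
names = ["A1", "A9", "C28"]
blocks = ["Day1", "Day1", "Day14"]
scores = pd.DataFrame([[1.0, 0.68, -0.79],
                       [0.68, 1.0, 0.10],
                       [-0.79, 0.10, 1.0]], index=names, columns=names)
pvalues = pd.DataFrame([[0.0, 0.042, 0.011],
                        [0.042, 0.0, 0.80],
                        [0.011, 0.80, 0.0]], index=names, columns=names)

print(edges_from_matrices(scores, pvalues, blocks))                       # 2 edges
print(edges_from_matrices(scores, pvalues, blocks, within_blocks=False))  # 1 edge
```

Setting `within_blocks=False` drops the A1 to A9 pair because both nodes belong to the same block, leaving only the between-block edge.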

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display node data used in the Hierarchical Edge Bundle

Check the node data simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(nodes)</span><br>
</div>

In [20]:
display(nodes)

Unnamed: 0,Idx,Name,Label,Mode,mz,rt,F,pvalue,pFDR,RSD,...,PC2_sig,Block,MeanFoldChange,MeanFoldChange_CI_lower,MeanFoldChange_CI_upper,MeanFoldChange_sig,Group_mean,Group_mean_CI_lower,Group_mean_CI_upper,Group_mean_sig
0,0,A1,Ocatanedioic acid,Negative,173.0814936,528.593,1.20580499949928,0.316809429654668,0.290528592310766,10.4712294328065,...,False,Day1,0.0,0.0,0.0,False,0.6699123921018183,-0.022511622847999273,1.3892423613498621,False
1,1,A2,Glycoursodeoxycholic acid,Negative,448.3048795,1158.75,6.6741169557049,0.00738194291064731,0.0797503547732716,4.70683906005692,...,True,Day1,0.0,0.0,0.0,False,-0.40990676543887283,-1.32002881744421,0.21783846468873636,False
2,2,A3,Dodecanedioc acid,Negative,229.143848,966.471,3.65988782518355,0.0790391935278154,0.191245625859604,19.8564515336576,...,False,Day1,0.0,0.0,0.0,False,0.175318888601908,-0.05528166863246267,0.515290853507156,False
3,3,A4,Succinic Acid,Negative,117.0193225,133.4985,0.415028135103856,0.626102281483585,0.341497981950194,5.57758173557696,...,True,Day1,0.0,0.0,0.0,False,0.7609845646230762,0.17081382726110053,1.5899214964933464,True
4,4,A5,Citric Acid,Negative,191.019367,109.544,0.741641380659948,0.43526594776887,0.311668591419043,9.17087428507904,...,True,Day1,0.0,0.0,0.0,False,0.36668945071319603,-0.01682115344543396,0.6610262234178536,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,91,C28,Benzyl cinnamate,Positive,239.1068731,1192.485,0.390561250705216,0.643623550170723,0.344644127952825,20.7208424277511,...,False,Day14,6.153066607249892,5.6377292336046,45.38071077312762,True,-0.1525769557055212,-1.0091890896533964,0.30553801528764823,False
92,92,C29,Cortisol,Positive,363.217288,779.069,3.83166105052219,0.0478722499051548,0.167497360787185,4.94256748210212,...,True,Day14,-1.787060618565915,-5858.889971198745,-0.1387212015337813,True,0.23052650486067994,-0.5120562502062447,0.6874160683786472,False
93,93,C30,N-N'-Diphenylguanidine,Positive,212.1184708,483.321,0.689793438615943,0.460282975732664,0.315459512541623,11.7145642841881,...,False,Day14,-2.1160565914777414,-108.36102102000383,-0.35154820339719384,True,0.43022999757883984,0.043422315488964386,0.8838449866341342,True
94,94,C31,Tetradecanedioc acid,Positive,520.3400103,1436.77,2.28272289102931,0.157126086204619,0.239202968102066,16.4525760572793,...,False,Day14,-2.154268647788697,-151.10580934172333,0.6926592186098471,False,-0.10181135182874819,-1.2841371357207252,0.5085035195954405,False


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Display edge data used in the Hierarchical Edge Bundle

Check the edge data simply by calling the function <span style="font-family: monaco; font-size: 14px; background-color:white;">display(edges)</span><br>
</div>

In [21]:
display(edges)

Unnamed: 0,start_index,start_name,start_label,start_block,end_index,end_name,end_label,end_block,score,sign,pvalue
0,0,A1,Ocatanedioic acid,Day1,8,A9,L-Tryptophan,Day1,0.684112,1.0,0.042114
1,0,A1,Ocatanedioic acid,Day1,27,A28,Benzyl cinnamate,Day1,-0.789901,-1.0,0.011319
2,1,A2,Glycoursodeoxycholic acid,Day1,2,A3,Dodecanedioc acid,Day1,-0.599103,-1.0,0.088223
3,1,A2,Glycoursodeoxycholic acid,Day1,5,A6,Lactic Acid,Day1,0.607819,1.0,0.082499
4,1,A2,Glycoursodeoxycholic acid,Day1,9,A10,Hexadecanedioic acid,Day1,-0.621874,-1.0,0.073763
...,...,...,...,...,...,...,...,...,...,...,...
521,87,C24,cis-Aconitic acid,Day14,93,C30,N-N'-Diphenylguanidine,Day14,-0.586715,-1.0,0.096769
522,89,C26,Adenosine,Day14,93,C30,N-N'-Diphenylguanidine,Day14,-0.656975,-1.0,0.054542
523,89,C26,Adenosine,Day14,95,C32,Bilirubin,Day14,-0.693576,-1.0,0.038254
524,91,C28,Benzyl cinnamate,Day14,95,C32,Bilirubin,Day14,0.701285,1.0,0.035285


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 9.  Sort edges

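Sort the edges according to your visualisation preference, e.g. by score, p-value or node index (see the commented-out alternatives in the cell below).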

</div>

In [22]:
#edges.sort_values(['start_index', 'end_index'], inplace=True, ascending=True)
#edges.sort_values('pvalue', inplace=True, ascending=False)
edges.sort_values('score', inplace=True, ascending=False)
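The cell above sorts by the signed score, which places the strongest negative correlations last. If you would rather rank edges by strength irrespective of sign, pandas' `sort_values` accepts a `key` function (pandas 1.1 or later). A minimal sketch on a few edges copied from the edge table above follows; `edges_demo` is a hypothetical name, and whether the drawing order in the bundle follows the dataframe order is an assumption not confirmed here.

```python
import pandas as pd

# A few edges copied from the edge table above
edges_demo = pd.DataFrame({
    "start_name": ["A1", "A1", "A2"],
    "end_name": ["A9", "A28", "A3"],
    "score": [0.684112, -0.789901, -0.599103],
})

# Strongest associations first, regardless of sign
edges_demo = edges_demo.sort_values("score", key=lambda s: s.abs(), ascending=False)
print(edges_demo)
```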

<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 10.  Plot Hierarchical edge bundle

The edges of the network are then passed to D3.js to generate a hierarchical edge bundle, which is embedded in HTML for interactive visualisation. The hierarchical edge bundle is laid out as a circular hierarchical tree: nodes sit around the circumference, and edges pass through the circle along bundled curves until they connect to other nodes. The edges represent an association value, such as a correlation coefficient, and can be coloured by the score, its sign or its p-value using a continuous colour map. Metadata such as groups/classes within the data can also be shown in the plot to illustrate how, and to what degree, the different groups/classes are correlated.

Note: The visualisation opens automatically in another tab, unless running in Binder (see step 11).

</div>
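The bundled curves described above come from hierarchical edge bundling: each edge is routed through the control points on the tree path between its two nodes, and a bundling strength β blends those points towards a straight line. The sketch below is illustrative only. It assumes a two-level hierarchy (root, block, node), hypothetical angles for three nodes, and invents the helper names `bundle_path` and `straighten`; it does not reproduce the multivis HTML. The straightening step matches the one applied by D3's `d3.curveBundle.beta`, which then draws a B-spline through the resulting points; whether the generated visualisation uses that curve, and with which β, is an assumption.

```python
import math

def polar(angle_deg, radius):
    """Convert an angle in degrees and a radius to Cartesian (x, y)."""
    a = math.radians(angle_deg)
    return (radius * math.cos(a), radius * math.sin(a))

def bundle_path(start, end, leaf_angles, leaf_blocks, radius=100.0):
    """Control points for one edge in a two-level hierarchy (root, block, leaf),
    routed through the lowest common ancestor of the two leaves."""
    def block_point(block):
        # Block node at the mean angle of its leaves (assumes a block does not
        # wrap past 0/360 degrees) and at half the outer radius
        angles = [leaf_angles[n] for n, b in leaf_blocks.items() if b == block]
        return polar(sum(angles) / len(angles), radius / 2)

    p_start = polar(leaf_angles[start], radius)
    p_end = polar(leaf_angles[end], radius)
    b_start, b_end = leaf_blocks[start], leaf_blocks[end]
    if b_start == b_end:
        # Same block: the lowest common ancestor is the block node itself
        return [p_start, block_point(b_start), p_end]
    # Different blocks: pass through both block nodes and the root at the centre
    return [p_start, block_point(b_start), (0.0, 0.0), block_point(b_end), p_end]

def straighten(points, beta):
    """Blend each control point towards the straight start-end line:
    beta=1 follows the hierarchy exactly, beta=0 gives a straight line."""
    n = len(points) - 1
    (x0, y0), (xn, yn) = points[0], points[-1]
    return [(beta * x + (1 - beta) * (x0 + i * (xn - x0) / n),
             beta * y + (1 - beta) * (y0 + i * (yn - y0) / n))
            for i, (x, y) in enumerate(points)]

leaf_angles = {"A1": 10.0, "A9": 40.0, "C28": 250.0}   # toy node positions
leaf_blocks = {"A1": "Day1", "A9": "Day1", "C28": "Day14"}

path = bundle_path("A1", "C28", leaf_angles, leaf_blocks)
curve = straighten(path, 0.85)  # 0.85 is D3's default bundle strength
print(len(path), curve[0], curve[-1])
```

Edges between nodes in the same block pass only through that block's node, while edges between blocks also pass near the centre; this shared routing is what visually bundles edges that connect the same pair of blocks.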

In [23]:
bundle = multivis.edgeBundle(nodes,edges)

bundle.help()

Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.

        Parameters
        ----------
        nodes : Pandas dataframe containing nodes generated from Edge.
        edges : Pandas dataframe containing edges generated from Edge.
        
        Methods
        -------
        set_params : Set parameters -
            html_file: Name to save the HTML file as (default: 'hEdgeBundle.html')
            innerRadiusOffset: Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
            blockSeparation: Value to set the distance between different segmented blocks (default: 1)
            linkFadeOpacity: The link fade opacity when hovering over/clicking nodes (default: 0.05)
            mouseOver: Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
            fontSize: The font size in pixels set for each node (default: 10)
            backgroundColor: Set the background colour

In [24]:
params = dict({'html_file': 'hEdgeBundle_multi-block_altitude_study.html'      #HTML file name to save to
               , 'innerRadiusOffset': 120           #The offset from the radius to determine the inner radius of the edge bundle
               , 'blockSeparation': 4               #The degree of separation between blocks
               , 'linkFadeOpacity': 0.01            #The opacity of faded links
               , 'mouseOver': True                  #Setting to 'True' swaps from clicking to hovering over nodes to select them 
               , 'fontSize': 10                     #The font size of each node
               , 'backgroundColor': 'white'         #Set the background colour of the plot
               , 'foregroundColor': 'black'         #Set the foreground colour of the plot      
               , 'node_data': ['Name', 'Label', 'Mode', 'mz', 'rt', 'F', 'pvalue', 'pFDR', 'RSD', 'Dratio', 'Group_mean', 'Group_mean_sig', 'MeanFoldChange', 'MeanFoldChange_sig','PC1', 'PC2', 'PC1_sig', 'PC2_sig']
               , 'nodeColorScale': 'linear'         #The scale to use for colouring the nodes
               , 'node_color_column': 'pvalue'      #If node_color_column contains colour values it overrides the use of node_cmap
               , 'node_cmap': 'brg'                 #Set the colour palette to use for colouring the nodes               
               , 'edge_color_value': 'sign'         #Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue'.
               , 'edgeColorScale': 'linear'         #The scale to use for colouring the edges (if edge_color_value is 'pvalue')
               , 'edge_cmap': 'cool'                #Set the colour palette to use for colouring the edges
               , 'addArcs': True                    #Setting to 'True' adds arcs around the edge bundle for each block
               , 'arcRadiusOffset': 20              #Sets the arc radius offset from the inner radius
               , 'extendArcAngle': 3                #Sets the angle value to add to each end of the arcs 
               , 'arc_cmap': 'tab20'})              #Sets the CMAP colour palette to use for colouring the arcs

bundle.set_params(**params)

bundle.build()

HTML writen to hEdgeBundle_multi-block_altitude_study.html


<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Build Dashboard

A dashboard with panels for the hierarchical edge bundle, the node data and interactive sliders is built, providing a richer interface for exploratory analysis of the data. The dashboard is automatically written to HTML and opened upon creation.

</div>

In [25]:
bundle.buildDashboard()

HTML writen to hEdgeBundle_multi-block_altitude_study_dashboard.html


<div style="background-color:rgb(255, 250, 250); padding:10px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

## 11.  Alternative visualisation options

The visualisations open automatically when the notebook is run locally in Jupyter Notebook. When running in Binder, however, security restrictions prevent them from opening in a new tab, so two alternatives are provided below: embedding the visualisation in an IFrame, or opening it in a JavaScript pop-up window.

Note 1: Because of security restrictions within Jupyter Notebook, resizing a window opened from the notebook via JavaScript or an IFrame reloads the visualisation, which results in a 403: Forbidden error. To change the size, edit the dimensions set in the cell and rerun it.

Note 2: Depending on your browser, the save button may be disabled when the visualisation is opened in an IFrame or a JavaScript pop-up. Chrome, for example, disables downloads in this context to prevent malicious behaviour. If you have trouble downloading, try a different browser or use the visualisation opened automatically in another tab when running Jupyter Notebook locally.
</div>

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Plain visualisation
</div>

In [26]:
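from IPython.display import display, IFrame, Javascript

vis_option = "IFrame"  # or "Javascript" for a pop-up window
file = params['html_file']

if vis_option.lower() == "javascript":
    # Open the HTML file in a pop-up window (pop-ups must be allowed for this domain)
    display(Javascript("window.open('{}','edgeBundle','width=1000,height=1000')".format(file)))
elif vis_option.lower() == "iframe":
    # Embed the HTML file directly below the cell
    display(IFrame(src=file, width='100%', height='1000px'))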

<div style="background-color:rgb(255, 250, 250); padding:5px;  border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">

### Visualisation with dashboard
</div>

In [27]:
from IPython.display import display, IFrame, Javascript

vis_option = "javascript"  # or "IFrame" to embed below the cell
# The dashboard file name is the bundle file name with "_dashboard" appended
file = params['html_file'].split(".")[0] + "_dashboard.html"

if vis_option.lower() == "javascript":
    display(Javascript("window.open('{}','edgeBundle','width=1500,height=1000')".format(file)))
elif vis_option.lower() == "iframe":
    display(IFrame(src=file, width='100%', height='1000px'))
