This repository contains the supplementary information for the journal article "Migrating from Partial Least Squares Discriminant Analysis to Artificial Neural Networks: A Comparison of Functionally Equivalent Feature Importance and Visualisation Tools using Jupyter Notebooks". It includes two types of workflow: a standardised partial least squares (PLS) regression workflow for visualisation and interrogation, and an equivalent artificial neural network (ANN) workflow.
Two previously published datasets are used as examples for the standardised PLS workflow and the proposed equivalent ANN workflow. The first, by Chan et al. (2016), is a urine NMR dataset comprising 149 named metabolites, publicly available on Metabolomics Workbench (Study ID: ST001047). Two classes were used: gastric cancer (n=43) and healthy controls (n=40). The second, by Ganna et al. (2014) and Ganna et al. (2015), is a plasma LC-MS dataset comprising 189 named metabolites, publicly available on MetaboLights (Study ID: MTBLS90). Samples were split into two classes by sex: males (n=485) and females (n=483).
Because of its structural equivalence with PLS, a shallow (2-layer) ANN is used in this study. If this approach to visualisation and interrogation proves successful for shallow ANNs, it may then be possible to extend it to deeper ANN architectures. In this architecture, the hidden layer consists of multiple neurons (n = 2 to 6) with a sigmoidal activation function, and the output layer consists of a single neuron, also with a sigmoidal activation function.
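To make this architecture concrete, here is a minimal NumPy sketch of the forward pass of such a sigmoid-sigmoid network. It is an illustration under stated assumptions, not the implementation used in the notebooks: the function names `sigmoid` and `ann_ss_forward`, the weight shapes, and the random, untrained weights are all hypothetical. The hidden activations `H` play a role loosely analogous to PLS latent scores (one column per hidden neuron, as PLS has one score column per latent variable).

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

def ann_ss_forward(X, W1, b1, w2, b2):
    """Forward pass of a shallow (2-layer) ANN with sigmoid hidden and output layers.

    X  : (n_samples, n_features) input matrix, e.g. metabolite concentrations
    W1 : (n_features, n_hidden) hidden-layer weights
    b1 : (n_hidden,) hidden-layer biases
    w2 : (n_hidden,) weights of the single output neuron
    b2 : scalar bias of the output neuron
    Returns predicted class probabilities with shape (n_samples,).
    """
    H = sigmoid(X @ W1 + b1)     # hidden activations: one column per hidden neuron
    return sigmoid(H @ w2 + b2)  # single sigmoid output neuron

# Illustrative random weights only (no training): 149 features, 3 hidden neurons
rng = np.random.default_rng(0)
n_features, n_hidden = 149, 3
X = rng.normal(size=(5, n_features))
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=n_hidden)
p = ann_ss_forward(X, W1, b1, w2, 0.0)
```

Training (for example, by gradient descent on a binary cross-entropy loss) is omitted; the sketch only shows how inputs flow through the 2-layer sigmoid-sigmoid structure to a single class probability.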
The standardised PLS workflow and the proposed equivalent ANN workflow include the following steps: hyperparameter optimisation, model building and training, bootstrap resampling of the model, model evaluation, and model visualisation. Each step and its accompanying visualisation methods are described in detail above the corresponding code cell in the workflows. The workflows are implemented in Python and presented as Jupyter Notebooks, which can be accessed in three ways: as static HTML files, in the cloud (using Binder), or downloaded and run on a local machine.
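The bootstrap resampling step can be illustrated with a generic sketch: refit a model on samples drawn with replacement and score it on the out-of-bag samples left out of each resample. This is one common scheme, assumed here for illustration; the notebooks' exact resampling procedure, number of resamples, and confidence-interval method may differ. The helper names `bootstrap_ci` and `centroid_accuracy` are hypothetical, and the nearest-centroid classifier is only a stand-in for PLS-DA or the ANN, run on synthetic data.

```python
import numpy as np

def bootstrap_ci(fit_predict_score, X, y, n_boot=100, alpha=0.05, seed=0):
    """Out-of-bag bootstrap: mean score and percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        ib = rng.integers(0, n, size=n)        # in-bag indices, drawn with replacement
        oob = np.setdiff1d(np.arange(n), ib)   # samples never drawn in this resample
        if oob.size == 0 or len(np.unique(y[ib])) < 2:
            continue                           # skip degenerate resamples
        scores.append(fit_predict_score(X[ib], y[ib], X[oob], y[oob]))
    scores = np.asarray(scores)
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), lo, hi

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Hypothetical stand-in model: nearest class centroid, scored by accuracy."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return np.mean(pred == y_te)

# Synthetic two-class data with the same class sizes as the ST001047 example
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 10)), rng.normal(2, 1, (43, 10))])
y = np.array([0] * 40 + [1] * 43)
mean_acc, ci_lo, ci_hi = bootstrap_ci(centroid_accuracy, X, y, n_boot=50)
```

Scoring on out-of-bag samples keeps each evaluation independent of the data the model was fitted on, and the spread of scores across resamples gives an estimate of model stability.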
To open notebooks as static HTML files:
- PLSDA_ST001047.html (Method: PLS-DA; Dataset: ST001047)
- ANNSigSig_ST001047.html (Method: ANN-SS; Dataset: ST001047)
- PLSDA_MTBLS90.html (Method: PLS-DA; Dataset: MTBLS90)
- ANNSigSig_MTBLS90.html (Method: ANN-SS; Dataset: MTBLS90)
To launch the notebooks in the cloud (using Binder):
- PLSDA_ST001047.ipynb (Method: PLS-DA; Dataset: ST001047)
- ANNSigSig_ST001047.ipynb (Method: ANN-SS; Dataset: ST001047)
- PLSDA_MTBLS90.ipynb (Method: PLS-DA; Dataset: MTBLS90)
- ANNSigSig_MTBLS90.ipynb (Method: ANN-SS; Dataset: MTBLS90)
To download and run notebooks on a local machine:
This requires Python 3.x and Jupyter to be installed on your local machine. We recommend using the Anaconda Distribution, which can be downloaded from the Anaconda website (https://www.anaconda.com/distribution/). For information on installing Python and using Jupyter Notebooks, refer to the tutorial "Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing" by Mendez et al. (2019).
Note: If you are using Windows, you first need to install Git (for example, Git for Windows).
- Open Terminal on Linux/MacOS or Command Prompt on Windows
- Enter the following into the console (one line at a time)
    git clone https://github.com/cimcb/MetabProjectionViz
    cd MetabProjectionViz
    conda env create -f environment.yml
    conda activate MetabProjectionViz
    jupyter notebook