Partial Least Squares - Discriminant Analysis (PLS-DA)
===

Author: Nathan A. Mahynski

Date: 2023/09/12

Description: Discussion and examples of different PLS-DA approaches.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mahynski/pychemauth/blob/main/docs/jupyter/gallery/plsda.ipynb)

[PLS](pls.ipynb) can also be applied to classification problems.  The general idea is to perform a PLS(2) decomposition
between $X$ and $\vec{y}$, where now $\vec{y}$ is one-hot encoded for the different classes.  For binary classification, a simple 0 or 1 is adequate.  The scores that come from the PLS decomposition are then used as an input to a
classification model.  So really, PLS-DA is just using PLS to find a good subspace, then performing classification
on the transformed coordinates in that space; this classification is the "discrimination", which can be done
by any number of methods.  PLS outputs floating point numbers, not integers, so a decision needs to be made
on how to "cluster" and assign points to a given class. Some common methods are:

* Closest class centroid - Euclidean distance in the PLS-DA score space (projection) is not necessarily a good representation of class differences, so this needs to be validated with a test set, etc. See this [paper](https://link.springer.com/article/10.1007/s11306-007-0099-6) and more discussion in [Pomerantsev et al.](https://onlinelibrary.wiley.com/doi/abs/10.1002/cem.3030).

* Thresholding (some cutoff distance).

* Some statistical confidence bounds based on tolerable misclassification error rates.

* Build a logistic regression model, or other decision boundary, in the score (projected) space - this is better than the closest class centroid option because it avoids relying on distance being proportional to the likelihood of a class.

This essentially just uses (supervised) PLS to find a good subspace to project into.  However, we could also use something like LDA instead (discussed in the previous section); for LDA we are bounded by `#dimensions <= min(n_classes - 1, n_features)`, which we are not in PLS.  So if `n_classes` is low, LDA can only find a very low-D subspace; it may be better to find some space between that and `n_features`, which PLS can provide.

PLS-DA is widely applied in cheminformatics research since we often have a $p > n$ problem (more features than samples), which
PLS handles automatically.  This is especially true of -omics fields.  However, nonlinear techniques such as [artificial neural networks](https://link.springer.com/article/10.1007/s11306-020-1640-0) can also be used and have been shown to perform as well. A thorough review of the state of the art in PLS-DA by Lee et al. is available [here](https://pubs.rsc.org/en/content/articlehtml/2018/an/c8an00599k).

In [None]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/mahynski/pychemauth@main
    import os
    os.kill(os.getpid(), 9) # Automatically restart the runtime to reload libraries

In [None]:
try:
    import pychemauth
except ImportError:
    raise ImportError("pychemauth not installed")

import matplotlib.pyplot as plt
%matplotlib inline

import watermark
%load_ext watermark

%load_ext autoreload
%autoreload 2

In [2]:
%watermark -t -m -v --iversions



Hard vs. Soft PLS-DA
---

["Multiclass partial least squares discriminant analysis: Taking the right way—A critical tutorial," by
Pomerantsev and Rodionova, Journal of Chemometrics, 32 (2018)](https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/cem.3030) suggests 2 approaches to PLS-DA. These are "hard" and "soft" PLS-DA, which are distinguished by how they determine their discrimination boundaries.  Both begin in the same way.

1. One-hot encode $Y$ for different classes.

> This means that the classes form the vertices of a $(k-1)$-dimensional simplex embedded in $k$-dimensional space, where $k$ is the number of classes.  Following [Indahl et al.](https://onlinelibrary.wiley.com/doi/abs/10.1002/cem.1061), you could reduce the dummy matrix to remove a dimension: encode each of the first $k-1$ classes with a single 1 in its own column, and the last class as a row of all zeros.  If you have 4 classes, this forms a tetrahedron.
>
> Convert
>
> $$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
$$
>
> to
>
>$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
\end{bmatrix}
$$
>
> The rows of the first matrix all lie on a single hyperplane (each row sums to 1), so strictly the space spanned by these classes is $(k-1)$-dimensional.  This is made explicit by the second matrix, which has rank $k-1$ and whose rows enclose a tetrahedral volume.

<img src="../../_static/fig1_pomerantsev_2018.png" style="width:500px;">
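
The rank argument for the two dummy matrices can be checked numerically. The sketch below uses plain NumPy with $k = 4$; the variable names are illustrative only.

```python
import numpy as np

k = 4
one_hot = np.eye(k)        # full dummy matrix: one row per class
reduced = one_hot[:, :-1]  # drop the last column; the last class becomes all zeros

# Every row of the full matrix sums to 1, so the class vertices share a hyperplane.
row_sums = one_hot.sum(axis=1)

# Centering removes the offset of that hyperplane and exposes the (k-1)-dimensional span.
centered = one_hot - one_hot.mean(axis=0)
rank_centered = np.linalg.matrix_rank(centered)
rank_reduced = np.linalg.matrix_rank(reduced)

print(row_sums)                     # [1. 1. 1. 1.]
print(rank_centered, rank_reduced)  # 3 3
```

Both encodings therefore describe the same $k-1 = 3$ dimensional arrangement of 4 class vertices.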

In [4]:
%ls ../../_static/

biased_nested_cv.png*   default.png*              pls_example_fig1.png
boruta_in_a_hurry.pdf*  imblearn_generation.png*
colab_example.gif       pipeline.png*
