# Goals

scikit-learn's PCA model provides an `inverse_transform` method for the inverse transformation. It would be nice if the scikit-learn PLS model also had this transformation. This notebook explores the mathematics behind those transformations.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn import linear_model, decomposition, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, r2_score
#%matplotlib inline

np.random.seed(127)

In [2]:
diabetes = datasets.load_diabetes()

X = diabetes.data
y = diabetes.target

## PCA Mathematics

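$y_1 = b_1 X_1 + \dots + b_k X_k + \epsilon \approx a_1 L_1 + \dots + a_j L_j$, where $L_1, \dots, L_j$ are the latent variables from the PCA transformation.

In matrix form, requiring both sets of coefficients to give the same predictions:

$\textbf{X}\bar{b}=\textbf{L}\bar{a}$

$\bar{b}=\textbf{X}^{+}\textbf{L}\bar{a}$, where $\textbf{X}^{+}$ is the pseudoinverse of $\textbf{X}$.

For mean-centered $\textbf{X}$ with full column rank, $\textbf{L}=\textbf{X}\textbf{V}$, where the columns of $\textbf{V}$ are the principal axes, so $\bar{b}=\textbf{X}^{+}\textbf{X}\textbf{V}\bar{a}=\textbf{V}\bar{a}$.

Taking the PCA inverse transform of $\bar{a}$ is equivalent to this operation here, because the diabetes features are already mean-centered and the inverse transform only adds the (zero) feature means back on top of $\textbf{V}\bar{a}$.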
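The identity above can be checked numerically without scikit-learn. The following is a minimal sketch that assumes synthetic, mean-centered data and computes the principal axes by SVD rather than with the `PCA` class; the names `components`, `b_pinv`, and `b_components` and the coefficient values in `a` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 50 samples, 6 features, columns mean-centered.
X = rng.normal(size=(50, 6))
X = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal axes.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
n_components = 3
components = Vt[:n_components]          # shape (3, 6)

# Latent variables L = X V for centered X.
L = X @ components.T                    # shape (50, 3)

# Hypothetical coefficients fitted in the latent space.
a = np.array([1.5, -2.0, 0.5])

# Route 1: the pseudoinverse formula from the text.
b_pinv = np.linalg.pinv(X) @ L @ a

# Route 2: map the coefficients back through the principal axes,
# which is what an inverse transform does when the feature means are zero.
b_components = a @ components

same = np.allclose(b_pinv, b_components)
print(same)
```

Both routes should agree as long as `X` has full column rank, since then the pseudoinverse times `X` is the identity.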

In [10]:
pca = decomposition.PCA(n_components=4)
X_transform = pca.fit_transform(X)
pca

PCA(copy=True, n_components=4, whiten=False)

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X_transform, y, test_size=0.33)

clf = linear_model.LinearRegression()
clf.fit(X_train, y_train)

clf.score(X_test, y_test)

0.50391015146979568

In [27]:
clf.coef_

array([-434.07807845, -230.49587371,  280.43266327,  629.18014863])

In [28]:
np.linalg.pinv(X) @ X_transform @ clf.coef_

array([ -38.04495689, -287.46524073,  529.07976722,  281.47055447,
        -45.34778318, -133.54835381, -178.86181996,  100.09171542,
        354.60283531,  291.86101208])

In [29]:
pca.inverse_transform(clf.coef_)

array([ -38.04495689, -287.46524073,  529.07976722,  281.47055447,
        -45.34778318, -133.54835381, -178.86181996,  100.09171542,
        354.60283531,  291.86101208])
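The two outputs above agree because the diabetes features are mean-centered. As a hedged sketch of the general case, the cell below back-projects the coefficients through `pca.components_` directly and checks that predictions made in the original feature space match those made in the latent space. It refits on the full data set rather than the train/test split used above, and the names `reg`, `b`, `pred_original`, and `offset` are illustrative.

```python
import numpy as np
from sklearn import datasets, decomposition, linear_model

X, y = datasets.load_diabetes(return_X_y=True)

pca = decomposition.PCA(n_components=4)
L = pca.fit_transform(X)

reg = linear_model.LinearRegression().fit(L, y)

# Back-project the latent-space coefficients without adding the feature means.
b = reg.coef_ @ pca.components_          # shape (10,)

# Predictions from the original features should match the latent-space ones.
pred_latent = reg.predict(L)
pred_original = (X - pca.mean_) @ b + reg.intercept_
match = np.allclose(pred_latent, pred_original)

# inverse_transform also adds pca.mean_, so it differs from b by that offset.
offset = pca.inverse_transform(reg.coef_.reshape(1, -1))[0] - b
print(match, np.allclose(offset, pca.mean_))
```

For data that is not mean-centered, the offset would be non-zero, so using `pca.components_` directly is the safer way to recover coefficients.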

## PLS Mathematics

This part still needs more work (TODO).