# Kernel Principal Component Analysis

## Objectives

- Understand and apply Kernel Principal Component Analysis (KPCA) for handling non-linearly separable data.
- Compare the effectiveness of KPCA with traditional PCA in visualizing and separating complex datasets.
- Illustrate the transformation of non-linearly separable data into a lower-dimensional subspace where it becomes linearly separable.

## Background

Linear dimensionality reduction methods such as PCA often struggle when the structure in the data is non-linear. KPCA extends PCA to such data by using a kernel function to perform PCA implicitly in a higher-dimensional feature space.

## Datasets Used

This notebook uses simulated datasets to demonstrate the capabilities of KPCA: linearly separable blobs, non-linearly separable half-moons, concentric circles, and interlocking hearts.

## Introduction

Many machine learning algorithms make assumptions about the linear separability of the input data.

If we are dealing with nonlinear problems, which we may encounter rather frequently in real-world applications, linear transformation techniques for dimensionality reduction, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), may not be the best choice.

This notebook will look at a kernelized version of PCA, or KPCA. Using KPCA, we will learn how to transform data that is not linearly separable onto a new, lower-dimensional subspace suitable for linear classifiers.

In [1]:
import numpy as np

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"

from sklearn.decomposition import PCA, KernelPCA

## Linearly Separable Data

In [2]:
from sklearn.datasets import make_blobs

Xl, yl = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)

In [3]:
# Plotting the data
figL = px.scatter(
    x=Xl[:, 0],
    y=Xl[:, 1],
    color=yl,
    color_continuous_scale='bluered'
)
figL.update_traces(
    marker=dict(size=8, opacity=0.6)
)
figL.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    title='Linearly Separable Data'
)
figL.show()

This is an example of linearly separable data. Let's apply PCA!

In [4]:
pca = PCA(n_components=2)

# Getting the transformed dataset represented in terms of the principal components
dataL = pca.fit_transform(Xl)

print(f'Explained variance:  {pca.explained_variance_.round(3)}')
print(f'Explained variance ratio: {pca.explained_variance_ratio_.round(2)}')

Explained variance:  [3.629 0.235]
Explained variance ratio: [0.94 0.06]
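
The two printed quantities are related: each ratio is a component's variance divided by the total variance of the data. A minimal check, regenerating the same blobs so that the snippet runs on its own (the `_demo` names are illustrative and not part of the notebook):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Same data as above, regenerated so the snippet is self-contained
X_demo, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
pca_demo = PCA(n_components=2).fit(X_demo)

# Total variance = sum of the per-feature sample variances (ddof=1, as PCA uses)
total_variance = X_demo.var(axis=0, ddof=1).sum()

# Ratio recomputed by hand from the explained variances
manual_ratio = pca_demo.explained_variance_ / total_variance

print(manual_ratio.round(2))
print(np.allclose(manual_ratio, pca_demo.explained_variance_ratio_))
```

Because both components are kept for two-dimensional data, the ratios sum to one.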


In [5]:
# Plotting the transformed data in a two principal component space
figL2 = px.scatter(
    x=dataL[:, 0],
    y=dataL[:, 1],
    color=yl,
    color_continuous_scale='bluered'
)
figL2.update_traces(marker=dict(size=8, opacity=0.6))
figL2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Linearly Separable Data (PCA: 2 components)'
)
figL2.show()

As you can see, the first component alone separates the two groups.

In [6]:
# Projecting the transformed data into the first component space
figL1 = px.scatter(
    x=dataL[:, 0],
    y=np.zeros_like(dataL[:, 0]),
    color=yl,
    color_continuous_scale='bluered'
)
figL1.update_traces(marker=dict(size=8, opacity=0.6))
figL1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Linearly Separable Data (PCA: 1 component)'
)
figL1.show()

Groups are linearly separated.

Plotting the first component against a constant zero axis is a common way to visualize data that has been reduced to a single dimension. Here it shows that PCA works well for feature extraction and dimensionality reduction when the classes are linearly separable.

The resulting principal component yields a subspace where the data are well separated. 

## Nonlinearly Separable Data: Half Moons

In [7]:
from sklearn.datasets import make_moons

Xm, ym = make_moons(100, noise=0.015, random_state=0)

In [8]:
# Plotting the data
figM = px.scatter(
    x=Xm[:, 0],
    y=Xm[:, 1],
    color=ym,
    color_continuous_scale='bluered'
)
figM.update_traces(marker=dict(size=8, opacity=0.6))
figM.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    title='Half Moons Data'
)
figM.show()

This is an example of nonlinearly separable data.

Since the two half-moon shapes are linearly inseparable, we expect that the "classic" PCA will fail to give us a "good" representation of the data in the principal components space. 

Let's apply PCA!

In [9]:
dataM = pca.fit_transform(Xm)

In [10]:
# Plotting the transformed data in a two principal component space
figM2 = px.scatter(
    x=dataM[:, 0],
    y=dataM[:, 1],
    color=ym,
    color_continuous_scale='bluered'
)
figM2.update_traces(marker=dict(size=8, opacity=0.6))
figM2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Half Moons Data (PCA: 2 components)'
)
figM2.show()

In [11]:
# Projecting the transformed data into the first component space
figM1 = px.scatter(
    x=dataM[:, 0],
    y=np.zeros_like(dataM[:, 0]),
    color=ym,
    color_continuous_scale='bluered'
)
figM1.update_traces(marker=dict(size=8, opacity=0.6))
figM1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Half Moons Data (PCA: 1 component)'
)
figM1.show()

Groups are not linearly separated.

The resulting principal components do not yield a subspace in which the two classes are linearly separable.

Note that PCA is an unsupervised method: it finds the directions of maximum variance without using the class labels. The blue and red colors are added here only to visualize the degree of separation.
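
A small sketch of what "unsupervised" means in practice: scikit-learn's `PCA.fit_transform` accepts a `y` argument for API consistency but ignores it, so changing the labels leaves the projection untouched. The `_demo` and `Z_*` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA

X_demo, y_demo = make_moons(100, noise=0.015, random_state=0)

# Fit once with the true labels and once with the labels flipped
Z_with_labels = PCA(n_components=2).fit_transform(X_demo, y_demo)
Z_flipped_labels = PCA(n_components=2).fit_transform(X_demo, 1 - y_demo)

# The projections are identical because y is never used
print(np.allclose(Z_with_labels, Z_flipped_labels))
```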

### Gaussian RBF kernel PCA

Now we will perform dimensionality reduction via RBF kernel PCA on our half-moon data.

In [12]:
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)

dataK = kpca.fit_transform(Xm)

In [13]:
# Plotting the transformed data in a two principal component space
figK2 = px.scatter(
    x=dataK[:, 0],
    y=dataK[:, 1],
    color=ym,
    color_continuous_scale='bluered'
)
figK2.update_traces(marker=dict(size=8, opacity=0.6))
figK2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Half Moons Data (Kernel PCA: 2 components)'
)
figK2.show()

In [14]:
# Projecting the transformed data into the first component space
figK1 = px.scatter(
    x=dataK[:, 0],
    y=np.zeros_like(dataK[:, 0]),
    color=ym,
    color_continuous_scale='bluered'
)
figK1.update_traces(marker=dict(size=8, opacity=0.6))
figK1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Half Moons Data (Kernel PCA: 1 component)'
)
figK1.show()

With the kernel PCA method, we were able to separate the groups! Let's analyse another example.
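
The visual impression can be backed by a number. The sketch below is one possible measure, not part of the original analysis: it computes the best accuracy any single cut on the first component can reach, for PCA and for kernel PCA. The helper `best_threshold_accuracy` is a hypothetical name introduced here, and it assumes binary labels coded as 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA


def best_threshold_accuracy(z, y):
    """Best accuracy of any single cut on a 1-D feature z (binary labels 0/1)."""
    order = np.argsort(z)
    y_sorted = np.asarray(y)[order]
    n = len(y_sorted)
    # ones_left[k] = number of label-1 points among the k smallest values
    ones_left = np.concatenate(([0], np.cumsum(y_sorted)))
    total_ones = ones_left[-1]
    k = np.arange(n + 1)
    # Predict 1 to the left of the cut and 0 to the right
    correct = ones_left + ((n - k) - (total_ones - ones_left))
    # The opposite labelling is allowed as well
    return np.maximum(correct, n - correct).max() / n


# Same half-moons data and gamma as above, regenerated for a standalone run
X_demo, y_demo = make_moons(100, noise=0.015, random_state=0)
z_pca = PCA(n_components=1).fit_transform(X_demo)[:, 0]
z_kpca = KernelPCA(n_components=1, kernel='rbf', gamma=15).fit_transform(X_demo)[:, 0]

acc_pca = best_threshold_accuracy(z_pca, y_demo)
acc_kpca = best_threshold_accuracy(z_kpca, y_demo)
print(f'PCA  first component, best cut accuracy: {acc_pca:.2f}')
print(f'KPCA first component, best cut accuracy: {acc_kpca:.2f}')
```

A value of 1.0 means a single threshold on that component separates the classes perfectly; values closer to 0.5 mean the component carries little class information.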

## Nonlinearly Separable Data: Concentric Circles

In [15]:
from sklearn.datasets import make_circles

Xc, yc = make_circles(n_samples=400, random_state=123, noise=0.1, factor=0.2)

In [16]:
# Plotting the data
figC = px.scatter(
    x=Xc[:, 0],
    y=Xc[:, 1],
    color=yc,
    color_continuous_scale='bluered'
)
figC.update_traces(marker=dict(size=8, opacity=0.6))
figC.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    title='Concentric Circles Data'
)
figC.show()

This is another example of nonlinearly separable data.

Since the two shapes are linearly inseparable, we expect the traditional PCA will fail to give us a "good" representation of the data in the principal components space. 

Let's apply traditional PCA!

In [17]:
dataC = pca.fit_transform(Xc)

In [18]:
# Plotting the transformed data in a two principal component space
figC2 = px.scatter(
    x=dataC[:, 0],
    y=dataC[:, 1],
    color=yc,
    color_continuous_scale='bluered'
)
figC2.update_traces(marker=dict(size=8, opacity=0.6))
figC2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Concentric Circles Data (PCA: 2 components)'
)
figC2.show()

In [19]:
# Projecting the transformed data into the first component space
figC1 = px.scatter(
    x=dataC[:, 0],
    y=np.zeros_like(dataC[:, 0]),
    color=yc,
    color_continuous_scale='bluered'
)
figC1.update_traces(marker=dict(size=8, opacity=0.6))
figC1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Concentric Circles Data (PCA: 1 component)'
)
figC1.show()

Groups are not linearly separated.

The resulting principal components do not yield a subspace in which the two classes are linearly separable.

Now we will perform dimensionality reduction via RBF kernel PCA on our concentric circles data.

In [20]:
dataCK = kpca.fit_transform(Xc)

In [21]:
# Plotting the transformed data in a two principal component space
figCK2 = px.scatter(
    x=dataCK[:, 0],
    y=dataCK[:, 1],
    color=yc,
    color_continuous_scale='bluered'
)
figCK2.update_traces(marker=dict(size=8, opacity=0.6))
figCK2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Concentric Circles Data (Kernel PCA: 2 components)'
)
figCK2.show()

In [22]:
# Projecting the transformed data into the first component space
figCK1 = px.scatter(
    x=dataCK[:, 0],
    y=np.zeros_like(dataCK[:, 0]),
    color=yc,
    color_continuous_scale='bluered'
)
figCK1.update_traces(marker=dict(size=8, opacity=0.6))
figCK1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Concentric Circles Data (Kernel PCA: 1 component)'
)
figCK1.show()

This 1-dimensional subspace obtained via Gaussian RBF kernel PCA looks much better in terms of linear class separation.

We use Kernel PCA when the data is not linearly separable in the original feature space, meaning we cannot effectively reduce dimensions using linear methods like standard PCA. 

Kernel PCA uses a kernel function to map the original data implicitly into a higher-dimensional feature space, where classes that are not linearly separable in the original space may become so. Linear PCA is then carried out in that feature space. The mapped coordinates are never computed explicitly; only the pairwise kernel values between samples are needed.
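
To make these steps concrete, here is a minimal from-scratch sketch of RBF kernel PCA for the training points, compared against scikit-learn on the half-moons data. The function name `rbf_kernel_pca` is illustrative, and the code is a teaching aid under the stated assumptions rather than a description of scikit-learn's internals.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA


def rbf_kernel_pca(X, gamma, n_components):
    """Hand-rolled RBF kernel PCA for the training points (illustrative helper)."""
    # Pairwise squared Euclidean distances
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off

    # RBF kernel matrix: k(x, x') = exp(-gamma * ||x - x'||^2)
    K = np.exp(-gamma * sq_dists)

    # Centre the kernel matrix, i.e. centre the data in the implicit feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order, so reverse and keep the largest
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]

    # Projections of the training points: unit eigenvectors scaled by sqrt(eigenvalue)
    return eigvecs * np.sqrt(eigvals)


X_demo, _ = make_moons(100, noise=0.015, random_state=0)
Z_manual = rbf_kernel_pca(X_demo, gamma=15, n_components=2)
Z_sklearn = KernelPCA(n_components=2, kernel='rbf', gamma=15).fit_transform(X_demo)

# Components are defined only up to sign, so compare absolute values
print(np.allclose(np.abs(Z_manual), np.abs(Z_sklearn), atol=1e-6))
```

The explicit n-by-n kernel matrix is what makes the method costly for large sample sizes: memory grows quadratically with the number of points.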

## Nonlinearly Separable Data: Interlocking Hearts

In [23]:
def generate_heart_shape(a, b):
    t = np.linspace(0, 2 * np.pi, 100)
    x = 16 * np.sin(t)**3
    y = 13 * np.cos(t) - 5 * np.cos(2*t) - 2 * np.cos(3*t) - np.cos(4*t)
    return a + x, b + y

In [24]:
# Generate two interlocking hearts dataset
x1, y1 = generate_heart_shape(0, 0)     # First heart
x2, y2 = generate_heart_shape(20, -10)  # Second heart (shifted)

Xh = np.vstack((np.hstack((x1, x2)), np.hstack((y1, y2)))).T
yh = np.array([0]*len(x1) + [1]*len(x2))  

In [25]:
# Plotting the data
figH = px.scatter(
    x=Xh[:, 0],
    y=Xh[:, 1],
    color=yh,
    color_continuous_scale='bluered'
)
figH.update_traces(marker=dict(size=8, opacity=0.6))
figH.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    title='Interlocking Hearts'
)
figH.show()

In [26]:
# Traditional PCA
dataH = pca.fit_transform(Xh)

In [27]:
# Plotting the transformed data in a two principal component space
figH2 = px.scatter(
    x=dataH[:, 0],
    y=dataH[:, 1],
    color=yh,
    color_continuous_scale='bluered'
)
figH2.update_traces(marker=dict(size=8, opacity=0.6))
figH2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Interlocking Hearts (PCA: 2 components)'
)
figH2.show()

In [28]:
# Projecting the transformed data into the first component space
figH1 = px.scatter(
    x=dataH[:, 0],
    y=np.zeros_like(dataH[:, 0]),
    color=yh,
    color_continuous_scale='bluered'
)
figH1.update_traces(marker=dict(size=8, opacity=0.6))
figH1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Interlocking Hearts Data (PCA: 1 component)'
)
figH1.show()

In [29]:
# Kernel PCA
dataHK = kpca.fit_transform(Xh)

In [30]:
# Plotting the transformed data in a two principal component space
figHK2 = px.scatter(
    x=dataHK[:, 0],
    y=dataHK[:, 1],
    color=yh,
    color_continuous_scale='bluered'
)
figHK2.update_traces(marker=dict(size=8, opacity=0.6))
figHK2.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='Principal Component 2',
    title='Interlocking Hearts (Kernel PCA: 2 components)'
)
figHK2.show()

In [31]:
# Projecting the transformed data into the first component space
figHK1 = px.scatter(
    x=dataHK[:, 0],
    y=np.zeros_like(dataHK[:, 0]),
    color=yh,
    color_continuous_scale='bluered'
)
figHK1.update_traces(marker=dict(size=8, opacity=0.6))
figHK1.update_layout(
    coloraxis_showscale=False,
    width=600,
    height=400,
    xaxis_title='Principal Component 1',
    yaxis_title='',
    title='Interlocking Hearts Data (Kernel PCA: 1 component)'
)
figHK1.show()
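
One caveat when reading the hearts plots: `gamma=15` was carried over from the half-moons example, but the hearts coordinates span tens of units while the moons lie within a few units. Since the RBF kernel depends on `gamma` times the squared distance, the same `gamma` implies a very different neighbourhood size on each dataset. The sketch below is a rough diagnostic, not part of the original analysis, and `median_neighbor_kernel` is a hypothetical helper name. It evaluates the kernel at the median nearest-neighbour distance of each dataset.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors


def median_neighbor_kernel(X, gamma):
    """RBF kernel value at the median nearest-neighbour distance of X."""
    # n_neighbors=2 because each training point is its own closest neighbour
    distances, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    median_nn = np.median(distances[:, 1])
    return np.exp(-gamma * median_nn ** 2)


# Rebuild the two hearts with the same parametric curve and shift as above
t = np.linspace(0, 2 * np.pi, 100)
hx = 16 * np.sin(t) ** 3
hy = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)
X_hearts = np.column_stack((np.concatenate((hx, hx + 20)),
                            np.concatenate((hy, hy - 10))))

X_moons, _ = make_moons(100, noise=0.015, random_state=0)

k_moons = median_neighbor_kernel(X_moons, gamma=15)
k_hearts = median_neighbor_kernel(X_hearts, gamma=15)
print(f'moons : kernel at median neighbour distance = {k_moons:.3g}')
print(f'hearts: kernel at median neighbour distance = {k_hearts:.3g}')
```

When this value is close to zero, most points look unrelated to their neighbours, the kernel matrix approaches the identity, and the components carry little information about the shapes. Rescaling the features or lowering `gamma` are possible remedies; whether they separate the hearts is left to verify.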

## Conclusions

Key takeaways:
- Traditional PCA works well when the classes are linearly separable, but its components do not separate classes that follow non-linear patterns.
- KPCA with an RBF kernel mapped the half-moons and concentric-circles data into a feature space where the first component alone separates the classes.
- The choice of kernel and its parameters (here, `gamma` for the RBF kernel) strongly affects how well KPCA works in practice.
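
As a sketch of how the last point might be handled in practice, `gamma` can be chosen by cross-validating a downstream linear classifier. This uses the class labels, so the selection step is supervised. The candidate values and the choice of logistic regression below are assumptions made for illustration, not settings taken from this notebook.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Same concentric-circles data as above, regenerated for a standalone run
X_demo, y_demo = make_circles(n_samples=400, random_state=123, noise=0.1, factor=0.2)

# Kernel PCA down to one component, followed by a linear classifier
pipeline = Pipeline([
    ('kpca', KernelPCA(n_components=1, kernel='rbf')),
    ('clf', LogisticRegression()),
])

# Candidate gamma values (arbitrary, for illustration only)
param_grid = {'kpca__gamma': [0.01, 0.1, 1, 5, 15]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Placing KPCA inside the pipeline means it is refit on each training fold, so the cross-validated score is not inflated by information from the held-out fold.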

## References

- https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html
- Müller, A.C. & Guido, S. (2017) Introduction to Machine Learning with Python: A Guide for Data Scientists. USA: O'Reilly Media, Inc., chapter 3.
- VanderPlas, J. (2017) Python Data Science Handbook: Essential Tools for Working with Data. USA: O'Reilly Media, Inc., chapter 5.