<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="cognitiveclass.ai logo"  />
</center>


# **Kernel Principal Component Analysis**


Rumor has it that the ultra-wealthy are all either investment bankers or tech entrepreneurs who dropped out of college. Is that stereotype really true? Have you ever wondered whether the world's top billionaires have anything in common? Although we can't say with certainty what it takes to become one, we can check whether any patterns exist among the richest people in the world.

In this notebook, you will explore Kernel Principal Component Analysis (Kernel PCA), an extension of principal component analysis (PCA), to extract key feature patterns from datasets, which are often high-dimensional. In addition to analyzing billionaires around the globe, we will also use this unsupervised learning technique to denoise images.

![img](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-ML0187EN-SkillsNetwork/labs/module%203/images/RichPeople.png)


## **Table of Contents**


<ol>
    <li><a href="https://#Objectives">Objectives</a>
        <ol>
            <li><a href="https://#Datasets">Datasets</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Setup">Setup</a>
        <ol>
            <li><a href="https://#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="https://#Importing-Required-Libraries">Importing Required Libraries</a></li>
            <li><a href="https://#Defining-Helper-Functions">Defining Helper Functions</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Background">Background</a>
        <ol>
            <li><a href="https://#What-does-kernel-PCA-do-?">What does PCA do?</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Visual-Example-Transform-a-Dataset-Before-Applying-PCA">Visual Example - Transform a Dataset Before Applying PCA  </a>
        <ol>
            <li><a href="https://#Apply-PCA"> Apply PCA </a></li>
            <li><a href="https://#Transform-a-Dataset-to-a-Higher-Dimension-and-Then-Apply-PCA">Transform a Dataset to a Higher Dimension and Then Apply PCA</a></li>
        </ol>
    </li>
     <li>
        <a href="https://#Kernel-PCA"> Kernel PCA </a>
        <ol>
            <li><a href="https://#Whats-a-Kernel">What's a Kernel? </a></li>
            <li><a href="https://#Applying-Kernel-PCA">Applying Kernel PCA</a></li>
        </ol>
    </li>
    <li>
        <a href="https://#Using-Kernel-PCA-to-Predict-if-Youre-the-Richest-Person-in-the-World"> Using  Kernel PCA to Predict if You're the Richest Person in the World </a>
        <ol>
            <li><a href="https://#Data-Analysis">Data Analysis</a></li>
            <li><a href="#Applying-Kernel-PCA">Applying Kernel PCA</a></li>
            <li><a href="https://#Using-Kernel-PCA-to-Improve-Visualization">Using Kernel PCA to Improve Visualization  </a></li>
            <li><a href="https://#Using-Kernel-PCA-to-Improve-Prediction">Using Kernel PCA to Improve Prediction </a></li>
        </ol>
    </li>
</ol>

<a href="https://#Exercises">Exercises</a>

<ol>
    <li><a href="https://#Exercise-1---Fitting-PCA-Kernel">Exercise 1 - Fitting PCA and Kernel PCA Objects</a></li>
    <li><a href="https://#Exercise-2---Reconstruct-the-Digits">Exercise 2 - Reconstruct the Digits</a></li>
    <li><a href="https://#Exercise-3---Visualize-Denoised-Digit-Images">Exercise 3 - Visualize Denoised Digit Images</a></li>
</ol>


## Objectives

After completing this lab, you will be able to:

*   **Understand** why we transform a dataset to a higher dimension before applying PCA.
*   **Understand** what Kernel PCA is.
*   **Apply** Kernel PCA effectively to real-world datasets for purposes ranging from prediction to visualization.


### Installing Required Libraries

The following required modules are pre-installed in the Skills Network Labs environment. However, if you run this notebook's commands in a different Jupyter environment (e.g., Watson Studio or Anaconda), you will need to install these libraries by removing the `#` sign before `!mamba` in the following code cell.


In [1]:
import numpy as np
import pandas as pd
from itertools import accumulate

import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

In [3]:
import warnings

warnings.filterwarnings('ignore')

In [4]:
sns.set_context('notebook')
sns.set_style('white')

### Defining Helper Functions

The helper function `plot_proj` below plots a 2-D dataset `A`, colored by the labels `y`, together with the line along the direction `v` and the projection of each data point onto that direction.


In [None]:
def plot_proj(A, v, y, name=None):
    """Scatter the 2-D data A (colored by y) and draw each point's projection onto the direction v."""

    plt.scatter(A[:, 0], A[:, 1], label='data', c=y, cmap='viridis')

    # Line through the origin along v (requires v[0] != 0)
    plt.plot(np.linspace(-1, 1), np.linspace(-1, 1) * (v[1] / v[0]), color='black', linestyle='--', linewidth=1.5, label=name)

    # Run through all the data
    for i in range(len(A)):
        # orthogonal projection of the data point onto v
        cv = np.dot(A[i, :], v) / np.dot(v, v) * v

        # label only the last segment so the legend has a single 'projections' entry
        label = 'projections' if i == len(A) - 1 else None

        # line between data point and projection
        plt.plot([A[i, 0], cv[0]], [A[i, 1], cv[1]], 'r--', linewidth=1.5, label=label)

    plt.legend()
    plt.show()
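
To make the projection step inside `plot_proj` concrete, here is a minimal numeric sketch of the same formula, `(a·v / v·v) * v`, applied to all rows at once. The toy array `A` and the direction `v` are hypothetical values chosen for illustration; they are not taken from the lab's datasets.

```python
import numpy as np

# Hypothetical toy data: three 2-D points (not from the lab's dataset)
A = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [0.0, -1.0]])

# Hypothetical direction vector (plot_proj additionally needs v[0] != 0)
v = np.array([1.0, 1.0])

# Orthogonal projection of every row of A onto v: (a.v / v.v) * v
coeffs = A @ v / (v @ v)
projections = np.outer(coeffs, v)

# Each residual (point minus its projection) should be orthogonal to v
residuals = A - projections

print(projections)
print(np.allclose(residuals @ v, 0.0))
```

The red dashed segments drawn by `plot_proj` correspond to these residuals: each one connects a data point to its projection and is perpendicular to the direction `v`.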