# Principal Component Analysis

PCA is not a full machine learning algorithm in itself. It is a technique for reducing the dimensionality of a dataset (i.e., the number of features) by keeping the directions in feature space along which the data varies the most.


Think of PCA as projecting the dataset onto the plane that passes through the directions of maximum variance in the data.

<img src=1-1.png>



From the figure, we can intuitively understand dimensionality reduction as projecting the points onto a plane, so that the plane becomes the representative of the data points plotted on the chart. PC1 and PC2 are the two principal components.
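
To make the projection idea concrete, here is a minimal sketch of PCA written with plain NumPy. It assumes the textbook formulation (eigendecomposition of the covariance matrix); the function name `pca_project` and the toy data are made up for illustration and are not part of sklearn.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the covariance is computed about the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance, largest first.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Coordinates of every point along the chosen directions.
    return X_centered @ components, eigvals[order]

# Hypothetical toy data: 200 correlated 2-D points.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X_toy = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=200)])

projected, variances = pca_project(X_toy, n_components=1)
print(projected.shape)  # (200, 1)
```

The variance of the projected points along a component equals the corresponding eigenvalue, which is why sorting by eigenvalue picks out the directions of maximum variance.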


# Step - 0

Import libraries.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline

# Step - 1

Get the data. We are going to use the **Breast cancer dataset**, which is built into sklearn.

In [None]:
from sklearn.datasets import load_breast_cancer
# Load the data. load_breast_cancer() returns a dictionary-like object;
# each key holds one part of the dataset (data, target, feature names, ...).
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])
df.head()

We observe that the dataset has 30 features. Let's use PCA and reduce the dimensions of the data.


**Note:** Before applying PCA, the features should be standardized: centered to zero mean and, if the range of one feature is not comparable to the others, scaled to unit variance. Otherwise the features with the largest ranges dominate the principal components.

In [None]:
from sklearn.preprocessing import StandardScaler
# Create the scaler object.
scaler = StandardScaler()
scaler.fit(df)
# Scale the data.
scaled_data = scaler.transform(df)
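
As a rough illustration of what the scaling step computes, the sketch below standardizes a small made-up matrix by hand with NumPy. The array `X_toy` is hypothetical; the formula (subtract the column mean, divide by the population standard deviation) is the one `StandardScaler` applies by default.

```python
import numpy as np

# Hypothetical toy matrix: two features on very different scales.
X_toy = np.array([[1.0, 1000.0],
                  [2.0, 3000.0],
                  [3.0, 5000.0],
                  [4.0, 7000.0]])

# Subtract each column's mean and divide by its standard deviation.
# ddof=0 (population standard deviation) matches StandardScaler's default.
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0, ddof=0)
X_scaled = (X_toy - mean) / std

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

After scaling, the two columns of this toy matrix are identical, so neither feature outweighs the other just because of its units.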

# Step - 2

Applying PCA is very similar to using the other machine learning algorithms in sklearn: create the object, fit it to the data, then transform the data.

In [None]:
from sklearn.decomposition import PCA
#Declaring PCA with "two" components.
pca = PCA(n_components=2)
#fitting the data.
pca.fit(scaled_data)

In [None]:
#Now we can transform this data to its first 2 principal components.
trans_data = pca.transform(scaled_data)

In [None]:
scaled_data.shape

In [None]:
trans_data.shape
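
The shapes show that 30 features were reduced to 2, but not how much information was kept. One way to check this, sketched below, is the `explained_variance_ratio_` attribute of a fitted `PCA` object. The cell rebuilds the pipeline under new names (`X_scaled`, `pca_2`, chosen here for illustration) so that it can run on its own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the same pipeline as above so this cell runs on its own.
X = load_breast_cancer()['data']
X_scaled = StandardScaler().fit_transform(X)
pca_2 = PCA(n_components=2).fit(X_scaled)

# Fraction of the total variance captured by each component.
ratios = pca_2.explained_variance_ratio_
print(ratios)
print("total retained:", ratios.sum())
```

If the retained fraction is too low for the task at hand, `n_components` can be increased.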

# Step - 3

Let's visualize the power of PCA on our dataset.

In [None]:
plt.figure(figsize=(8,6))
plt.scatter(trans_data[:,0],trans_data[:,1],c=cancer['target'],cmap='plasma')
plt.xlabel('1st principal component')
plt.ylabel('2nd principal component')
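
The principal components do not correspond to single original features; each one is a weighted combination of all 30. A possible way to inspect those weights, assuming the same scaled data and a 2-component PCA as above (the names `cancer_data`, `pca_2`, and `loadings` are illustrative), is to put `components_` into a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the pipeline so this cell runs on its own.
cancer_data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(cancer_data['data'])
pca_2 = PCA(n_components=2).fit(X_scaled)

# One row per component, one column per original feature.
loadings = pd.DataFrame(pca_2.components_,
                        columns=cancer_data['feature_names'],
                        index=['PC1', 'PC2'])

# Features with the largest absolute weight in the first component.
print(loadings.loc['PC1'].abs().sort_values(ascending=False).head())
```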
