<div class="alert alert-block alert-success">
    <h1 align="center">Machine Learning in Python</h1>
    <h3 align="center">Principal Component Analysis (PCA)</h3>
    <h4 align="center"><a href="https://github.com/AliBinary">Ali Ghanbari</a></h4>
</div>

![image.png](attachment:image.png)

# Principal Component Analysis (PCA)

![image.png](attachment:image.png)

Where can you apply PCA?

* Data Visualization:

Any data-related problem today involves large volumes of data and many variables (features) that describe it. Solving such a problem requires extensive data exploration, such as finding out how the variables are correlated or understanding how a few of them are distributed. When the data is spread across a large number of dimensions, visualizing it directly becomes difficult or practically impossible.
PCA helps here because it projects the data onto a lower-dimensional space, which lets you plot and inspect it in 2D or 3D.

* Speeding up a Machine Learning (ML) algorithm:

Since the main idea of PCA is dimensionality reduction, you can use it to shorten the training and testing time of a machine learning algorithm when your data has many features and learning is too slow.
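
As a rough illustration of the speed-up idea, the sketch below counts how many principal components are needed to keep a chosen share of the total variance. The helper name `components_for_variance`, the 95% threshold, and the synthetic data are all assumptions made for this example; they are not part of the notebook.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # The squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance / variance.sum())
    # First index where the cumulative ratio reaches the threshold, plus one;
    # clamped in case rounding keeps the last value just below the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

# Synthetic data (assumption): 50 observed features driven by 3 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X_demo = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

k = components_for_variance(X_demo, threshold=0.95)
print(X_demo.shape[1], "features ->", k, "components")
```

A model trained on `k` columns instead of 50 has far fewer inputs to process, which is where the time saving comes from.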


## Importing the libraries

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

## Importing the dataset

In [None]:
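# Load the Iris dataset (assumes iris.csv is in the working directory)
df = pd.read_csv('iris.csv')
# Columns 0-3 are used as features; column 4 is used as the target
X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values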

In [None]:
df.head()

## Feature Scaling

In [None]:
sc = StandardScaler()
X = sc.fit_transform(X)
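
To make the scaling step concrete, here is a minimal NumPy sketch of the same computation: subtract each column's mean and divide by its standard deviation. The small matrix `X_raw` is made up for illustration. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np

# Made-up feature matrix (assumption): 4 samples, 2 features
X_raw = np.array([[5.1, 3.5],
                  [4.9, 3.0],
                  [6.2, 3.4],
                  [5.9, 3.0]])

mean = X_raw.mean(axis=0)   # per-column mean
std = X_raw.std(axis=0)     # per-column population std (ddof=0)
X_scaled = (X_raw - mean) / std

# After scaling, every column has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Scaling matters for PCA because the method looks for directions of largest variance, so a feature measured in large units would otherwise dominate the components.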

![image.png](attachment:image.png)

## Applying PCA

In [None]:
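# Keep the two directions of largest variance
pca = PCA(n_components=2)
# Fit on the standardized features and replace X with the projected data
X = pca.fit_transform(X)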
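
For intuition about what `fit_transform` computes, the sketch below reproduces the projection with a singular value decomposition in plain NumPy. The function `pca_project` and the random data are assumptions for illustration. The signs of individual components may differ from scikit-learn's output, since a principal direction is only defined up to sign.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its first principal components (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    return scores, components

# Random data with one correlated pair of features (assumption)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(150, 4))
X_demo[:, 1] += 2.0 * X_demo[:, 0]

scores, components = pca_project(X_demo, n_components=2)
print(scores.shape)       # one row per sample, one column per component
print(components.shape)   # one row per component, one column per feature
```

The rows of `components` play the role of `pca.components_`: each is a unit-length vector of weights over the original features, and the resulting score columns are uncorrelated with each other.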

In [None]:
X

In [None]:
maindf = pd.DataFrame(data = X, columns = ['principal component 1', 'principal component 2'])

In [None]:
maindf

In [None]:
pca.components_

![image.png](attachment:image.png)

In [None]:
pca.explained_variance_
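
The explained variance is often easier to read as a share of the total. The sketch below derives both quantities from the singular values of centered data, dividing by `n_samples - 1` as scikit-learn does; the random data is an assumption for illustration. In scikit-learn the share is available directly as `pca.explained_variance_ratio_`.

```python
import numpy as np

# Random data whose columns have clearly different spreads (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

X_centered = X_demo - X_demo.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

n_samples = X_demo.shape[0]
# Variance captured by each principal component
explained_variance = singular_values ** 2 / (n_samples - 1)
# The same values as fractions of the total variance
explained_variance_ratio = explained_variance / explained_variance.sum()

print(explained_variance)
print(explained_variance_ratio)
```

If the first two ratios add up to most of the total, a 2D plot of the first two components loses little of the information in the original features.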

In [None]:
finaldf = pd.concat([maindf, df[['Species']]], axis = 1)
finaldf

## Visualising the results

In [None]:
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finaldf['Species'] == target
    ax.scatter(finaldf.loc[indicesToKeep, 'principal component 1'],
               finaldf.loc[indicesToKeep, 'principal component 2'],
               c=color,
               s=50)
ax.legend(targets)
ax.grid()