# Walk Through Example PCA
(artificial data set)
<br>

**0) Loading libraries**

In the first example we want to perform PCA on a simple, artificial dataset. First, we import the standard libraries:

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Next, we import *PCA* from *scikit-learn* and the *plotly* library for interactive 3D plots:

In [None]:
# Uncomment the next line if plotly is not installed yet:
# %pip install plotly

In [None]:
from sklearn.decomposition import PCA
import plotly.graph_objects as go

<br>

**1) Reading the data set**

Next, we read the dataset using *pandas*. Note that the dataset is just an ordinary, whitespace-separated text file without a header row.

In [None]:
XYZdf = pd.read_csv("Rot.txt", sep=r'\s+', header = None)  # raw string avoids an invalid-escape warning
XYZ   = np.array(XYZdf)

<br>

In [None]:
print(XYZ[:10,:])

**2) Plotting the data set**

The dataset has just three features. Therefore, we can visualize the data in a standard 3D scatter plot.

In [None]:
print(XYZ.shape)

a) 3D scatter plot

In [None]:
# standard 3D plot
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(projection='3d')
ax.scatter(XYZ[:,0], XYZ[:,1], XYZ[:,2], c = 'black', marker = 'o', s = 40)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.tick_params(axis = 'both', which = 'major', labelsize = 15)
ax.zaxis.labelpad = -3
plt.show()

In [None]:
mesh    = go.Mesh3d(x = XYZ[:,0], y = XYZ[:,1], z = XYZ[:,2], opacity = 0.5, color='rgba(244,22,100,0.6)')
scatter = go.Scatter3d(x = XYZ[:,0], y = XYZ[:,1], z = XYZ[:,2], mode = 'markers', marker = dict(size = 3, color = 'black'))
fig     = go.Figure(data = [mesh, scatter])
fig.update_layout(width = 800, height = 800, margin = dict(r = 10, b = 10, l = 10, t = 10))
fig.show()

It seems as if the x and y coordinates correlate with the z coordinate. We can check that by displaying the correlation values in a heatmap:

In [None]:
XYZdf.columns = ['X', 'Y', 'Z']
sns.heatmap(XYZdf.corr(), annot=True, cmap = sns.color_palette("Blues"))
plt.show()
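The numbers behind such a heatmap can also be computed directly with NumPy. The following is a minimal sketch on a hypothetical stand-in dataset (points on the plane z = 0.8*x + 0.6*y, which is an assumption and not the content of `Rot.txt`). `DataFrame.corr()` uses the Pearson correlation by default, which is also what `np.corrcoef` returns.

```python
import numpy as np

# Hypothetical stand-in for Rot.txt: 200 points on the plane z = 0.8*x + 0.6*y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = 0.8 * x + 0.6 * y
data = np.column_stack([x, y, z])

# np.corrcoef treats rows as variables by default; our features are columns.
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```

The off-diagonal entries in the last row and column measure how strongly x and y each correlate with z.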

The interactive *plotly* plot above, which overlays a mesh on the points, shows the correlation of the x and y coordinates with the z coordinate even more clearly.

<br>

**3) Running PCA**

Let us therefore run a PCA, i.e., transform the dataset into its proper (= eigen) coordinate system.

In [None]:
out = PCA(n_components = 3).fit(XYZ)

eigenVec = out.components_          # eigenvectors, one per row
eigenVal = out.explained_variance_  # eigenvalues, sorted in descending order
eigenXYZ = out.transform(XYZ)       # dataset transformed into its proper (= eigen) coordinate system
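To make the "eigen" terminology concrete, the same quantities can be obtained from an eigendecomposition of the covariance matrix. This is a minimal NumPy sketch, not the way *scikit-learn* is implemented internally; the data is a hypothetical planar stand-in and all variable names are illustrative. The resulting axes may differ from `components_` by a sign.

```python
import numpy as np

# Hypothetical stand-in for the notebook's data: points on the plane z = 0.8*x + 0.6*y.
rng = np.random.default_rng(1)
xy = rng.normal(size=(300, 2))
data = np.column_stack([xy, xy @ np.array([0.8, 0.6])])

centered = data - data.mean(axis=0)       # PCA operates on mean-centered data
cov = np.cov(centered, rowvar=False)      # 3x3 sample covariance (divides by n - 1)
vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues, eigenvectors in columns

order = np.argsort(vals)[::-1]            # reorder to descending eigenvalues
eigen_values = vals[order]                # counterpart of explained_variance_
eigen_vectors = vecs[:, order].T          # one eigenvector per row, like components_
projected = centered @ eigen_vectors.T    # data expressed in the eigen coordinate system

print(eigen_values)
```

The variance of each column of `projected` equals the corresponding eigenvalue, which is why the eigenvalues are reported as explained variances.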

In [None]:
print(eigenVec)
#print(eigenVal)

In [None]:
v1 = eigenVec[0,:]   # components_ stores one eigenvector per row
v2 = eigenVec[1,:]
v3 = eigenVec[2,:]

In [None]:
np.dot(v1,v2)
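The dot product above should be zero up to rounding, because the eigenvectors are mutually orthogonal. All pairs can be checked at once by multiplying the matrix of axes with its transpose. The sketch below assumes a hypothetical planar dataset and obtains the axes from an SVD of the centered data, which yields the principal axes up to sign.

```python
import numpy as np

# Hypothetical stand-in for the notebook's data: points on the plane z = 0.8*x + 0.6*y.
rng = np.random.default_rng(3)
xy = rng.normal(size=(150, 2))
data = np.column_stack([xy, xy @ np.array([0.8, 0.6])])

# Right-singular vectors of the centered data are the principal axes (one per row).
_, _, axes = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)

gram = axes @ axes.T                  # entry (i, j) is the dot product of axis i and axis j
print(np.allclose(gram, np.eye(3)))   # True when the axes are orthonormal
```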

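As expected from the strong correlation seen above, one eigenvalue is essentially zero. Keep the limited numerical accuracy in mind: a value of this size should be judged against the machine epsilon rather than read as exact.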

In [None]:
epsilon = np.finfo(float).eps
print(epsilon)

In [None]:
print(eigenVal)
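One way to turn "essentially zero" into a reproducible check is to compare each eigenvalue with a tolerance relative to the largest one. The helper `count_significant` and its default tolerance are hypothetical choices for illustration, and the example values are made up rather than taken from the notebook's output.

```python
import numpy as np

def count_significant(eigen_values, rel_tol=None):
    """Count eigenvalues that exceed a tolerance relative to the largest one."""
    eigen_values = np.asarray(eigen_values, dtype=float)
    if rel_tol is None:
        # Assumed heuristic: a modest multiple of the machine epsilon.
        rel_tol = 1e3 * np.finfo(float).eps
    return int(np.sum(eigen_values > rel_tol * eigen_values.max()))

# Illustrative values only: two sizeable eigenvalues and one at rounding level.
print(count_significant([2.1, 0.9, 3e-31]))  # → 2
```

The count corresponds to the number of dimensions that carry variance beyond rounding noise.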

<br>

**4) Plotting Eigenvalue Spectrum and Data in Eigenspace**

Finally, we want to plot the eigenvalue spectrum... 

In [None]:
xplot = np.arange(1,4)

fig = plt.figure(figsize=(5, 3))
plt.bar(xplot, eigenVal, color = (0.9, 0.9, 0.9), edgecolor = 'black')
plt.xlabel('dimension')
plt.ylabel('eigenvalue')
plt.yscale('log')
plt.xticks(xplot)
plt.show()

...and the data itself. Note that each eigen coordinate is now a linear combination of the previous coordinates.

In [None]:
# standard 3D plot
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection = '3d')
ax.scatter(eigenXYZ[:,0], eigenXYZ[:,1], eigenXYZ[:,2], c = 'black', marker = 'o', s = 40)
ax.set_xlabel('eigen X')
ax.set_ylabel('eigen Y')
ax.set_zlabel('eigen Z')
ax.tick_params(axis = 'both', which = 'major', labelsize = 5)
ax.zaxis.labelpad = -15
plt.show()
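The linear combination mentioned above can be written out explicitly. The symbols here are notation introduced for illustration and do not appear in the notebook: $x_{ij}$ is feature $j$ of sample $i$, $\mu_j$ is the mean of feature $j$, and $v_{kj}$ is the $j$-th entry of the $k$-th row of `components_`. The $k$-th eigen coordinate of sample $i$ is then

$$
t_{ik} = \sum_{j=1}^{3} v_{kj}\,\bigl(x_{ij} - \mu_j\bigr), \qquad k = 1, 2, 3,
$$

which is what `transform` computes when whitening is switched off (the default).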

Knowing that the 3rd coordinate is not needed, we can display the data with an ordinary 2D scatter plot.

In [None]:
plt.scatter(eigenXYZ[:,0], eigenXYZ[:,1], c = 'black', marker = 'o', s = 40)
plt.xlabel('eigen X')
plt.ylabel('eigen Y')
plt.show()
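That the third coordinate is dispensable can be checked by mapping the 2D eigen coordinates back to the original space and measuring the reconstruction error. The following sketch uses a hypothetical planar stand-in dataset and plain NumPy; the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the notebook's data: points on the plane z = 0.8*x + 0.6*y.
rng = np.random.default_rng(2)
xy = rng.normal(size=(100, 2))
data = np.column_stack([xy, xy @ np.array([0.8, 0.6])])

mean = data.mean(axis=0)
_, _, axes = np.linalg.svd(data - mean, full_matrices=False)  # rows are principal axes

coords_2d = (data - mean) @ axes[:2].T    # keep only the first two eigen coordinates
restored = coords_2d @ axes[:2] + mean    # map the 2D coordinates back to X, Y, Z
max_error = np.abs(restored - data).max()

print(max_error)
```

For data that lies exactly in a plane the error stays at rounding level; for real data it grows with the size of the discarded eigenvalue.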