# Exercise: Principal component analysis

In this exercise we will implement some methods to identify patterns in data. 

Author: Stefano Pagani <stefano.pagani@polimi.it>.

Date: 2024

Course: Mathematical and numerical foundations of scientific machine learning

For further details on PCA, see Chapter 1, Brunton, S. L. and Kutz, J. N., Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (2nd ed.).


In [None]:
# imports

import numpy as np
import scipy.io

import matplotlib.pyplot as plt


In [None]:
# Dataset creation

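np.random.seed(42)

# number of samples and number of time steps per signal
N  = 200
Nt = 800

delta_t = 1.0/Nt

# one signal per row
signals = np.zeros((N,Nt))

t =  np.linspace(0.0, Nt*delta_t, Nt, endpoint=False)

# maximum random shift applied to the center of each Gaussian bump
shiftcoef = 0.05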


for k in range(N):
    #plt.plot(t,(t-0)*(1-t)*( np.sin(2*np.pi*t+np.random.rand(1)) + 0.1*np.sin(2*np.pi*t*10**(np.random.rand(1))) ) )
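    if (np.mod(k,4)==0):
        # every fourth signal: a single negative Gaussian bump with random center and width
        signals[k,:] = -np.exp( - ((t-0.4-shiftcoef*np.random.rand(1))**2)/(2*(0.001+0.005*np.random.rand(1)))  )
    else:
        # all other signals: a negative bump plus a positive bump, each with its own random center and width
        signals[k,:] = -np.exp( - ((t-0.4-shiftcoef*np.random.rand(1))**2)/(2*(0.001+0.005*np.random.rand(1)))  ) + np.exp( - ((t-0.4-shiftcoef*np.random.rand(1))**2)/(2*(0.001+0.005*np.random.rand(1)))  )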
        
    plt.plot(t,signals[k,:])
plt.show()



Task 1: plot some signals from the given dataset. What can you observe?


Task 2: construct a truncated PCA expansion of your dataset based on 10 components. 

See the scikit-learn documentation of `sklearn.decomposition.PCA`.

In [None]:
from sklearn.decomposition import PCA

pca = PCA(n_components=10)

print(np.shape(signals))
pca.fit(signals)
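
To make the fit above less of a black box, here is a minimal NumPy sketch of a truncated PCA computed from the SVD of the centered data. The helper `pca_svd` and the synthetic matrix `X_demo` are hypothetical names introduced only for this illustration; the scikit-learn documentation describes `PCA` as centering the data and relying on an SVD as well, but the signs of individual components may differ between the two.

```python
import numpy as np

def pca_svd(X, n_components):
    """Truncated PCA of the rows of X via the SVD of the centered data (illustrative helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # economy SVD: Xc = U @ diag(s) @ Vt, the rows of Vt are the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # variance captured by each retained direction (unbiased normalization)
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, components, explained_variance

# hypothetical low-rank data: 50 samples of length 20
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8)) @ rng.normal(size=(8, 20))

mean_demo, comps_demo, var_demo = pca_svd(X_demo, n_components=3)
print(comps_demo.shape)
```

The rows of `comps_demo` play the role of `pca.components_`, and `mean_demo` the role of `pca.mean_`.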



Task 3: extract the mean and the first three basis functions and plot them. 

What can you observe?

In [None]:
# mean
mean_f = pca.mean_
# first basis function
u_0 = pca.components_[0]
# second basis function
u_1 = pca.components_[1]
# third basis function
u_2 = pca.components_[2]

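# plot the mean and the first three basis functions against time
plt.plot(t, mean_f, label='mean')
plt.plot(t, u_0, label='u_0')
plt.plot(t, u_1, label='u_1')
plt.plot(t, u_2, label='u_2')
plt.legend()
plt.show()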



Task 4: plot the cumulative sum of the explained variance ratio (how much variability is captured by the first n components).

In [None]:

plt.plot(np.cumsum(pca.explained_variance_ratio_),marker='*')
plt.show(block=False)
#plt.pause(0.1)
#sum(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_[0:3])
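
The cumulative curve can also be used to choose the number of components. The sketch below, in plain NumPy, computes the explained variance ratio from the singular values of centered data and finds the smallest number of components that reaches a target fraction. The data `X_demo`, the column scales, and the 95% `threshold` are assumptions made for illustration, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical data: 100 samples whose variance decays across 6 directions
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.1, 0.05])
X_demo = rng.normal(size=(100, 6)) * scales

# singular values of the centered data
Xc = X_demo - X_demo.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)

# fraction of the total variance carried by each component, and its running sum
ratio = s**2 / np.sum(s**2)
cumulative = np.cumsum(ratio)

# smallest number of components whose cumulative ratio reaches the threshold
threshold = 0.95
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print(n_needed)
```

Note that when `PCA` is fitted with `n_components=10`, `explained_variance_ratio_` only contains the first 10 ratios, so its cumulative sum need not reach 1.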



Task 5: compute the coefficients of the expansion associated with the first three basis functions. 

In [None]:
# inner product of each centered signal with each basis function
coefficient_0 = ((signals-mean_f) @ u_0)
coefficient_1 = ((signals-mean_f) @ u_1)
coefficient_2 = ((signals-mean_f) @ u_2) 
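
In formulas, and with notation that is an assumption of this note rather than of the exercise ($x_i$ for the $i$-th signal, $\bar{x}$ for the mean `mean_f`, $u_k$ for the basis functions, $c_{ik}$ for the coefficients), the three-term expansion computed above reads:

$$
c_{ik} = (x_i - \bar{x})^\top u_k, \qquad x_i \approx \bar{x} + \sum_{k=0}^{2} c_{ik}\, u_k .
$$

Since the rows of `pca.components_` are orthonormal, these inner products should coincide with the first three columns of `pca.transform(signals)` when the default `whiten=False` is used.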


Scatter plot of the first two coefficients. Are the two groups of signals clearly separated?

In [None]:

# color label: 1 for every fourth signal (single bump), 0 for all the others
col_vec = np.int32( np.mod(np.arange(0,N,1),4) ==0 )
plt.scatter(coefficient_0,coefficient_1,c=col_vec)



Task 6: compare the plot of the actual signals with the reconstructed ones.

In [None]:
ind_sel = 0
plt.plot(signals[ind_sel,:])
rec_signal = mean_f + coefficient_0[ind_sel]*u_0 + coefficient_1[ind_sel]*u_1 + coefficient_2[ind_sel]*u_2
plt.plot(rec_signal)
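
Beyond a visual comparison, the quality of the reconstruction can be quantified. The following sketch, under the assumption of synthetic rank-5 data (`X_demo`) and a hypothetical helper `reconstruct`, computes the relative reconstruction error as the number of retained components grows.

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical data of rank at most 5: 40 samples of length 30
X_demo = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 30))

mean = X_demo.mean(axis=0)
Xc = X_demo - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def reconstruct(X, mean, components):
    """Project the centered rows of X onto the given components and map back."""
    coeffs = (X - mean) @ components.T
    return mean + coeffs @ components

# relative error (with respect to the centered data) for r = 1, ..., 6 components
errors = []
for r in range(1, 7):
    X_rec = reconstruct(X_demo, mean, Vt[:r])
    errors.append(np.linalg.norm(X_demo - X_rec) / np.linalg.norm(X_demo - mean))

print(errors)
```

The error cannot increase as components are added, and for this low-rank example it drops to round-off level once five components are kept.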


Task 7: study the effect of different values of the shift coefficient (`shiftcoef`) on the reconstruction. What can you observe?

Task 8: study the effect of noise on the reconstruction. What can you observe?
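
One possible starting point for the noise study, sketched under stated assumptions: additive Gaussian noise with a hypothetical amplitude `noise_level`, synthetic rank-3 data in place of the signals, and an illustrative helper `captured_variance` that returns the fraction of variance carried by the first `r` components.

```python
import numpy as np

def captured_variance(X, r):
    """Fraction of the total variance of X carried by its first r principal components."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return np.sum(s[:r] ** 2) / np.sum(s ** 2)

rng = np.random.default_rng(3)
# hypothetical clean data of rank at most 3: 60 samples of length 40
X_clean = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))

# additive Gaussian noise; noise_level is an assumed parameter for this sketch
noise_level = 0.5
X_noisy = X_clean + noise_level * rng.normal(size=X_clean.shape)

frac_clean = captured_variance(X_clean, 3)
frac_noisy = captured_variance(X_noisy, 3)
print(frac_clean, frac_noisy)
```

With noise, part of the variance spreads over all the remaining directions, so three components no longer capture everything. The same comparison could be repeated on `signals` for several noise amplitudes.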