### Intro to Linear Combinations, Independence, Change of Basis & PCA  

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
dbp = np.array([78,80,81,82,84,86])
sbp = np.array([126,128,127,130,130,132])
health_index = np.array([0.6,0.8,0.7,0.9,0.94,0.98])
df = pd.DataFrame({"Diastolic BP":dbp, "Systolic BP":sbp, "Health Index": health_index})
df.head(10)

In [None]:
df_features = df[["Diastolic BP", "Systolic BP"]]
df_features

In [None]:
sns.set_style(style='whitegrid')

sp = sns.scatterplot(x=df["Diastolic BP"], y=df["Systolic BP"])
sp.set(xlabel="DBP", ylabel="SBP", title="Diastolic Systolic plot")

plt.show()

**Activity: Calculate the variance of DBP and SBP**   

In [None]:
df_features.var()  # sample variance (ddof=1) of each column

**Activity: Create a synthetic feature as linear combination of DBP and SBP**

1. Use beta1 = 0.8, beta2=0.6
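
A minimal sketch of one way to do this activity, re-declaring the readings from the cells above so the snippet runs on its own. The name `synthetic` is an illustrative choice, not something the notebook defines. Note that (0.8, 0.6) has unit length, since 0.8² + 0.6² = 1.

```python
import numpy as np

# Same readings as in the cells above
dbp = np.array([78, 80, 81, 82, 84, 86])
sbp = np.array([126, 128, 127, 130, 130, 132])

# Weights given in the activity
beta1, beta2 = 0.8, 0.6

# One synthetic value per record: beta1 * DBP + beta2 * SBP
synthetic = beta1 * dbp + beta2 * sbp
print(synthetic)
```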

**Activity: Write the above linear combination as Matrix vector product** 
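
One possible way to write it, assuming the two features are stacked as the columns of a data matrix `X` and the weights form a vector `beta` (both names are illustrative). The product `X @ beta` should then match the element-wise formula row by row.

```python
import numpy as np

dbp = np.array([78, 80, 81, 82, 84, 86])
sbp = np.array([126, 128, 127, 130, 130, 132])

# Data matrix: one row per record, one column per feature (shape 6 x 2)
X = np.column_stack([dbp, sbp])

# Weight vector (beta1, beta2)
beta = np.array([0.8, 0.6])

# Matrix-vector product: each entry is the dot product of one row with beta
synthetic = X @ beta

# Same numbers as the explicit linear combination of the columns
print(np.allclose(synthetic, 0.8 * dbp + 0.6 * sbp))
```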

**Activity: Create a new dataframe with synthetic feature column added and calculate variance of each column**
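
A possible sketch for this activity. The dataframe name `df_syn` and the column name `"Synthetic"` are assumptions made for illustration; `DataFrame.var()` returns the sample variance (ddof=1) of each column.

```python
import pandas as pd

# Rebuild the feature dataframe so the snippet is self-contained
df_syn = pd.DataFrame({
    "Diastolic BP": [78, 80, 81, 82, 84, 86],
    "Systolic BP": [126, 128, 127, 130, 130, 132],
})

# Add the synthetic feature with beta1 = 0.8, beta2 = 0.6
df_syn["Synthetic"] = 0.8 * df_syn["Diastolic BP"] + 0.6 * df_syn["Systolic BP"]

# Sample variance of every column
variances = df_syn.var()
print(variances)
```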

**Activity: How much of total variance is explained by synthetic feature**

Code this
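
One way to code this, assuming "total variance" means the sum of the variances of the two original features. The comparison is meaningful here because the weight vector (0.8, 0.6) has unit length. The names `features`, `synthetic` and `explained_ratio` are illustrative.

```python
import pandas as pd

features = pd.DataFrame({
    "Diastolic BP": [78, 80, 81, 82, 84, 86],
    "Systolic BP": [126, 128, 127, 130, 130, 132],
})

# Synthetic feature with beta1 = 0.8, beta2 = 0.6
synthetic = 0.8 * features["Diastolic BP"] + 0.6 * features["Systolic BP"]

# Assumption: total variance = var(DBP) + var(SBP)
total_var = features.var().sum()
synthetic_var = synthetic.var()

# Fraction of the total variance carried by the single synthetic feature
explained_ratio = synthetic_var / total_var
print(f"explained ratio = {explained_ratio:.3f}")
```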

**Activity: Add new synthetic features with different betas**

1. Create a new dataframe with the features.
2. Add the synthetic feature as before (beta1=0.8, beta2=0.6).
3. Add synthetic features for these other (beta1, beta2) pairs, applied to DBP and SBP respectively: (0.6, 0.8), (0.98, 0.2), (0.2, 0.98).
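
A sketch of these steps under the same assumptions as before: total variance is taken as the sum of the two feature variances, and the column labels of the form `"Synthetic (b1, b2)"` are made up for illustration. All four weight pairs have length close to 1, so their variances are roughly comparable.

```python
import pandas as pd

features = pd.DataFrame({
    "Diastolic BP": [78, 80, 81, 82, 84, 86],
    "Systolic BP": [126, 128, 127, 130, 130, 132],
})

# (beta1, beta2) pairs from the activity, applied to DBP and SBP respectively
betas = [(0.8, 0.6), (0.6, 0.8), (0.98, 0.2), (0.2, 0.98)]

df_multi = features.copy()
for b1, b2 in betas:
    df_multi[f"Synthetic ({b1}, {b2})"] = (
        b1 * features["Diastolic BP"] + b2 * features["Systolic BP"]
    )

# Variance of each synthetic column as a fraction of the total feature variance
synthetic_var = df_multi.drop(columns=features.columns).var()
explained = synthetic_var / features.var().sum()
print(explained)
```

Among these four directions, the largest ratio identifies the one along which the records are most spread out.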

In [None]:
cov_mat = df_features.cov()

# Watch out: np.cov treats each row as a variable by default,
# so transpose the data matrix first (or pass rowvar=False).
# cov_mat = np.cov(df_features.to_numpy().T)

cov_mat

In [None]:
correl_mat = df_features.corr()
correl_mat

In [None]:
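# Note: np.linalg.eig returns eigenvalues in no guaranteed order; the eigenvectors are the columns of eigen_vecs
eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)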

In [None]:
eigen_vals

In [None]:
eigen_vecs

In [None]:
eigen_vecs[0]  # first row of the matrix: this is not eigenvector 1

In [None]:
eigen_vecs[:, 0]  # first column: this is eigenvector 1
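
A quick check of why the columns are the eigenvectors: each column `v` should satisfy `C v = λ v` for the matching eigenvalue. This is a self-contained sketch that recomputes the covariance matrix with `np.cov(..., rowvar=False)`; the names `cov`, `vals` and `vecs` are illustrative.

```python
import numpy as np

X = np.array([[78, 126], [80, 128], [81, 127],
              [82, 130], [84, 130], [86, 132]])

# Sample covariance of the two features (columns are variables)
cov = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eig(cov)

# Each column of vecs, paired with the eigenvalue at the same index,
# should satisfy the eigenvector equation cov @ v = lambda * v
for i in range(len(vals)):
    v = vecs[:, i]
    print(i, np.allclose(cov @ v, vals[i] * v))
```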

In [None]:
X = df_features.to_numpy()
v1 = eigen_vecs[:, 0]
v2 = eigen_vecs[:, 1]

**Matrix-Vector multiplication as vector of projections**

In [None]:
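# X @ v1 gives PC1: each entry is the projection of one record
# onto the direction of v1 (v1 has unit length).
# X is not mean-centered here, so these scores are shifted by a constant
# relative to scores computed from centered data; their variance is the same.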
PC1 = np.matmul(X, v1)
PC1

In [None]:
# X @ v2 gives PC2: each entry is the projection of one record
# onto the direction of v2 (v2 has unit length).
PC2 = np.matmul(X, v2)
PC2
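
As a sanity check linking the projections back to the eigenvalues, the sample variance of each projected column should equal the corresponding eigenvalue, because var(Xv) = vᵀCv = λ for a unit eigenvector v. A self-contained sketch with illustrative names:

```python
import numpy as np

X = np.array([[78, 126], [80, 128], [81, 127],
              [82, 130], [84, 130], [86, 132]])

cov = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eig(cov)

# Column i of scores holds the projections of all records onto eigenvector i
scores = X @ vecs

# Sample variance (ddof=1) of each projected column
score_var = scores.var(axis=0, ddof=1)
print(score_var)
print(vals)
```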

**Viewing Vx1 (multiplying a record vector by the eigenvector matrix) as a change of basis**

In [None]:
# Build the eigenvector matrix: each row of V is an eigenvector (V equals eigen_vecs.T)
V = np.array([v1, v2])
V

In [None]:
val_in_std_basis = X[0,:]
print(f"val_in_std_basis= {val_in_std_basis}")

val_in_eigen_basis = np.matmul(V, val_in_std_basis)
print(f"val_in_eigen_basis= {val_in_eigen_basis}")
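
Because the eigenvectors of a symmetric covariance matrix are orthonormal, the change of basis can be undone with the transpose. The sketch below, with illustrative names, maps the first record into the eigenbasis and back, and shows that its length is preserved.

```python
import numpy as np

X = np.array([[78, 126], [80, 128], [81, 127],
              [82, 130], [84, 130], [86, 132]])

cov = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eig(cov)

# Rows of V are the eigenvectors, as in the cell above
V = vecs.T

x = X[0]              # first record in the standard basis
coords = V @ x        # coordinates of the same record in the eigenbasis
x_back = V.T @ coords # V is orthogonal, so its transpose is its inverse

print(coords)
print(x_back)
```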

**Summary**

A matrix-vector product can be viewed in at least four ways from an ML and data science perspective:
1. Projection of each dataset record onto a vector
2. Linear combination of features
3. Change of basis: the new coordinates of a record vector when viewed from a new basis (the eigenbasis or any other basis)
4. Linear transformation: the matrix acts on the vector (a special case is its action on an eigenvector, which is only scaled)

There are at least three more ways of looking at it (to be covered later):
1. Row picture
2. Reduced-dimension linear combination of archetypes
3. Spectral addition over low-rank matrices

**Dot product of vectors**
$$
    a^Tb = \|a\| \|b\| \cos\theta
$$
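
The formula can be rearranged to recover the angle between two vectors. A small sketch with made-up example vectors `a` and `b`:

```python
import numpy as np

# Example vectors chosen for illustration
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = a.b / (|a| |b|)
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(cos_theta))
print(theta_deg)
```

When the dot product is zero, cos θ is zero and the vectors are perpendicular.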

In [None]:
# Dot product of standard unit vectors e1 and e2
e1 = np.array([1,0,0])
e2 = np.array([0,1,0])
np.dot(e1, e2)

In [None]:
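# Dot product of the eigenvectors: approximately 0, because eigenvectors of a
# symmetric matrix with distinct eigenvalues are orthogonal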
np.dot(v1, v2)