### 1. Matrix addition and subtraction

In [9]:
import numpy as np

A = np.array(
    [
        [1,2],
        [3,5]
    ]
)

B = np.array(
    [
        [1,2],
        [0,1]
    ]
)
print("C=",A + B,"\n")
print("D=",A - B)

# Used in gradient updates
# Error calculation
# Weight adjustments

C= [[2 4]
 [3 6]] 

D= [[0 0]
 [3 4]]
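The comments above mention gradient updates and error calculation. A minimal sketch of both, using made-up values (`y_true`, `y_pred`, `w`, `grad`, and `learning_rate` are hypothetical and not part of the notebook):

```python
import numpy as np

# Hypothetical targets and predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])

# Error calculation is an elementwise subtraction
error = y_pred - y_true
print(error)

# A gradient update subtracts a scaled gradient from the weights
w = np.array([0.5, -0.2])      # hypothetical weights
grad = np.array([0.1, -0.3])   # hypothetical gradient
learning_rate = 0.1
w_new = w - learning_rate * grad
print(w_new)
```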


### 3. Matrix multiplication

In [12]:
C = A@B
C
# ML relevance
# Core of linear models
# Neural network forward pass
# Feature transformations

array([[ 1,  4],
       [ 3, 11]])
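Unlike addition, matrix multiplication generally depends on the order of the operands, and the inner dimensions must match. A small check with the same `A` and `B` (the `X_demo` array is a hypothetical example used only to show the shape rule):

```python
import numpy as np

A = np.array([[1, 2], [3, 5]])
B = np.array([[1, 2], [0, 1]])

AB = A @ B
BA = B @ A
print(AB)
print(BA)

# Matrix multiplication is generally not commutative
products_equal = np.array_equal(AB, BA)
print("AB == BA?", products_equal)

# Shape rule: (m, k) @ (k, n) -> (m, n)
X_demo = np.ones((3, 2))
print((X_demo @ A).shape)
```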

### 4. Identity Matrix 
**Intuition**
* Multiplying by it leaves a matrix unchanged.
* It acts like the number 1 for matrices.

In [18]:
I = np.eye(2)
print(A@I)

# ML relevance
#     Regularization
#     Matrix inversion
#     Stability in optimization

[[1. 2.]
 [3. 5.]]


### 5. Transpose : Swap rows and columns


In [21]:
print(A,"\n")
print(A.T)
# ML relevance
#     Used in covariance matrices
#     Dot products
#     Backpropagation math

[[1 2]
 [3 5]] 

[[1 3]
 [2 5]]
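Two transpose identities show up repeatedly in the ML uses listed above: the transpose of a product reverses the order, and `A.T @ A` is always symmetric. A quick check using the same `A` and `B` as before:

```python
import numpy as np

A = np.array([[1, 2], [3, 5]])
B = np.array([[1, 2], [0, 1]])

# (AB)^T equals B^T A^T
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.array_equal(lhs, rhs))

# A^T A is symmetric, which is why covariance-style matrices are symmetric
gram = A.T @ A
print(gram)
print(np.array_equal(gram, gram.T))
```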


### Determinant (scaling intuition)

In [23]:
# Intuition
#     Measures how much a matrix scales space
#     Zero determinant → information lost

print(np.linalg.det(A))

# ML meaning
#     det = 0 → matrix not invertible
#     Indicates collinearity
#     Important in PCA and Gaussian models

-1.0000000000000004
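To see the zero-determinant case concretely, here is a sketch with a hypothetical matrix `S` whose second column is exactly twice the first (collinear columns). It is not part of the notebook's data:

```python
import numpy as np

# Hypothetical singular matrix: column 2 = 2 * column 1
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det_S = np.linalg.det(S)
rank_S = np.linalg.matrix_rank(S)

print(det_S)   # zero up to floating-point rounding
print(rank_S)  # rank 1: only one independent direction remains
```

Because of floating-point rounding, comparing the determinant against a small tolerance (or checking the rank) is usually safer than testing `det == 0` exactly.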


### 7. Inverse (when it exists)

In [26]:
A_inv = np.linalg.inv(A)
print(A_inv)
print(A@A_inv)

# ML relevance
#     Normal equation in linear regression
#     Solving linear systems
#     Least squares

[[-5.  2.]
 [ 3. -1.]]
[[1. 0.]
 [0. 1.]]
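For solving a linear system `A x = b`, forming the inverse explicitly is usually unnecessary. A sketch comparing it with `np.linalg.solve`, using a hypothetical right-hand side `b`:

```python
import numpy as np

A = np.array([[1, 2], [3, 5]])
b = np.array([1.0, 2.0])  # hypothetical right-hand side

# Explicit inverse, as in the cell above
x_inv = np.linalg.inv(A) @ b

# Direct solve: avoids forming the inverse
x_solve = np.linalg.solve(A, b)

print(x_inv)
print(x_solve)
print(np.allclose(A @ x_solve, b))
```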


### Linear regression equation y = X w

In [27]:
X = np.array(
    [
        [1,2],
        [3,8],
        [2,0]
    ]
)

w = np.array(
    [
        [0.1],
        [0.2]
    ]
)

y = X@w
y

array([[0.5],
       [1.9],
       [0.2]])

### Normal Equation (closed-form regression) $$w = (X^T X)^{-1} X^T y$$

In [35]:
w = (np.linalg.inv(X.T @ X)@X.T)@y
w

# Why this matters
#     Uses transpose, multiplication, and inverse in a single formula
#     Fails when X.T @ X is singular (for example, collinear features)

array([[0.1],
       [0.2]])
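The same weights can be recovered without an explicit inverse. This sketch rebuilds `X` and `y` from the cells above and compares the normal equation with `np.linalg.lstsq`, which is one common alternative and not necessarily what the author intended:

```python
import numpy as np

X = np.array([[1, 2], [3, 8], [2, 0]], dtype=float)
w_true = np.array([[0.1], [0.2]])
y = X @ w_true

# Normal equation, as in the notebook
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver: no explicit inverse is formed
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)
print(w_lstsq)
print(rank)
```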

### Regularization idea (intuition)

In [37]:
lambda_I = 0.1*np.eye(2)
stable_inv = np.linalg.inv(A.T@A+lambda_I)

stable_inv
# This idea is used in Ridge Regression.

array([[ 5.92668024, -3.46232179],
       [-3.46232179,  2.05702648]])
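The cell above regularizes `A.T @ A`. In Ridge Regression the same idea is applied to `X.T @ X` in the normal equation, giving $w = (X^T X + \lambda I)^{-1} X^T y$. A minimal sketch, assuming the regression data from the earlier cells and a hypothetical penalty `lam = 0.1`:

```python
import numpy as np

X = np.array([[1, 2], [3, 8], [2, 0]], dtype=float)
y = X @ np.array([[0.1], [0.2]])

lam = 0.1  # hypothetical regularization strength
n_features = X.shape[1]

# Ridge closed form, solved without an explicit inverse
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Ordinary least squares for comparison
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(w_ols.ravel())
print(w_ridge.ravel())
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))
```

The ridge weights are pulled toward zero relative to the unregularized solution, which is the stabilizing effect the cell above hints at.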

### Compute eigenvalues and eigenvectors

In [38]:
A

array([[1, 2],
       [3, 5]])

In [42]:
values , vectors = np.linalg.eig(A)
print(values,"\n\n", vectors)

[-0.16227766  6.16227766] 

 [[-0.86460354 -0.36126098]
 [ 0.50245469 -0.93246475]]


In [48]:
A @ vectors[:,0] , values[0]*vectors[:,0]

(array([ 0.14030584, -0.08153717]), array([ 0.14030584, -0.08153717]))
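The check above covers one eigenpair. A sketch that verifies all pairs at once, rebuilds `A` from its eigen-decomposition, and checks two standard identities (sum of eigenvalues equals the trace, product equals the determinant):

```python
import numpy as np

A = np.array([[1, 2], [3, 5]])
values, vectors = np.linalg.eig(A)

# A V = V diag(values): column j of `vectors` is scaled by values[j]
all_pairs_ok = np.allclose(A @ vectors, vectors * values)
print(all_pairs_ok)

# Rebuild A from its eigenvalues and eigenvectors
A_rebuilt = vectors @ np.diag(values) @ np.linalg.inv(vectors)
print(np.allclose(A_rebuilt, A))

print(values.sum(), np.trace(A))         # sum of eigenvalues vs trace
print(values.prod(), np.linalg.det(A))   # product of eigenvalues vs determinant
```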

In [49]:
# 4. ML meaning of eigenvalues (for a covariance matrix)
#     Large eigenvalue → direction with high variance
#     Small eigenvalue → direction carrying little variance
#     Zero eigenvalue → redundant (linearly dependent) direction
# This is feature importance in linear algebra form.

### 5. Covariance matrix (entry point to PCA)
#### Sample dataset (2 features)

In [57]:
X = np.array([[2, 1],
              [3, 2],
              [4, 3],
              [5, 4]])
# Step 1: Center the data

X_centered = X - X.mean(axis=0)

# Always done before PCA

# Step 2: Compute covariance matrix
cov = np.cov(X_centered.T)
cov

# The covariance matrix shows:
#     variance of each feature (diagonal)
#     covariance between features (off-diagonal)

array([[1.66666667, 1.66666667],
       [1.66666667, 1.66666667]])
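What `np.cov` computes here can be written out by hand: center the data, then divide $X_c^T X_c$ by $n - 1$ (the sample covariance). A sketch with the same dataset:

```python
import numpy as np

X = np.array([[2, 1],
              [3, 2],
              [4, 3],
              [5, 4]], dtype=float)

X_centered = X - X.mean(axis=0)
n_samples = X.shape[0]

# Sample covariance written out by hand
cov_manual = X_centered.T @ X_centered / (n_samples - 1)

# np.cov centers internally; rowvar=False treats columns as features
cov_numpy = np.cov(X, rowvar=False)

print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))
```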

### 6. Eigenvectors of covariance matrix (PCA core)

In [62]:
eigenvalues, eigenvectors = np.linalg.eig(cov)

print(eigenvalues,"\n")
print(eigenvectors)

# Interpretation
#     Eigenvectors → principal directions
#     Eigenvalues → variance captured

[3.33333333 0.        ] 

[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
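Because a covariance matrix is symmetric, `np.linalg.eigh` is a natural alternative to `np.linalg.eig`: it returns real eigenvalues in ascending order and orthonormal eigenvectors. A sketch (not the notebook's original call) that also computes the fraction of variance each direction captures:

```python
import numpy as np

X = np.array([[2, 1], [3, 2], [4, 3], [5, 4]], dtype=float)
cov = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices; eigenvalues come back ascending
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue comes first
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Fraction of total variance captured by each direction
explained_ratio = eigvals / eigvals.sum()

print(eigvals)
print(explained_ratio)
```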


### 7. Choosing principal components

In [64]:
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]

# Largest eigenvalue → first principal component.

### 8. Project data onto principal component

In [65]:
PC1 = eigenvectors[:,0]
X_reduced = X_centered@PC1
X_reduced
# This is dimensionality reduction.

array([-2.12132034, -0.70710678,  0.70710678,  2.12132034])

In [66]:
# 9. Visual intuition (important)
# Before PCA:
#     data spread in 2D
# After PCA:
#     data spread mostly along 1 direction
# Potential ML benefits:
#     less noise
#     faster models
#     often better generalization

### 10. PCA using scikit-learn (real ML usage)

In [67]:
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
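X_pca = pca.fit_transform(X)  # PCA centers X internally
# Signs are flipped relative to the manual projection:
# a principal axis is only defined up to sign.
X_pca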

array([[ 2.12132034],
       [ 0.70710678],
       [-0.70710678],
       [-2.12132034]])

In [68]:
pca.explained_variance_ratio_

array([1.])

# 12. Advanced ML insight (very important)

## Covariance matrix: $\Sigma = \frac{1}{n-1} X^{T} X$ (centered $X$ with $n$ rows)

## Eigenvectors of $\Sigma$:
 *   define principal axes
 *   diagonalize the matrix

## This connects PCA to:
 * SVD
 * linear regression
 * Gaussian distributions

# 💡 Advanced ML Insight: Connecting PCA Concepts

## The Covariance Matrix and Principal Axes

The foundation of Principal Component Analysis (PCA) lies in the **Covariance Matrix**, $\Sigma$.

For a centered data matrix $X$ with $n$ rows (rows are observations, columns are features), the sample covariance matrix is defined as:

$$\Sigma = \frac{1}{n-1} X^{T} X$$

### Eigen-Decomposition and Dimensionality Reduction

The **eigenvectors** of the covariance matrix ($\Sigma$) are crucial because they:
* Define the **Principal Axes** (the directions of maximum variance in the data).
* **Diagonalize the matrix**, transforming the data into a new coordinate system where the new features (principal components) are uncorrelated.

---

## PCA's Deep Connections

This framework establishes deep mathematical connections between PCA and several fundamental machine learning and statistical concepts:

* **Singular Value Decomposition (SVD):** The principal axes of a centered data matrix $X$ are its **right singular vectors** (equivalently, the eigenvectors of $X^TX$). If $\lambda_i$ are the eigenvalues of $X^TX$, the singular values are $\sigma_i = \sqrt{\lambda_i}$, and the eigenvalues of the sample covariance matrix are $\sigma_i^2 / (n-1)$.
* **Linear Regression:** PCA finds the subspace that **best approximates** the data by minimizing perpendicular distances to that subspace. This is related to, but not the same as, the least squares minimization used in linear regression, which minimizes vertical distances in the target variable.
* **Gaussian Distributions:** When data is assumed to follow a **multivariate Gaussian distribution**, the principal axes (eigenvectors) of the covariance matrix align with the axes of the Gaussian's elliptical contours, making PCA an optimal method for linear dimensionality reduction in this context.
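The SVD connection can be checked numerically. This sketch reuses the four-point dataset from the PCA cells and assumes the sample covariance convention (division by $n - 1$) used by `np.cov`:

```python
import numpy as np

X = np.array([[2, 1], [3, 2], [4, 3], [5, 4]], dtype=float)
X_centered = X - X.mean(axis=0)
n_samples = X_centered.shape[0]

# SVD of the centered data: rows of Vt are the principal axes (up to sign)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values, scaled by (n - 1), give the covariance eigenvalues
eigvals_from_svd = S**2 / (n_samples - 1)
eigvals_cov = np.sort(np.linalg.eigvalsh(np.cov(X_centered, rowvar=False)))[::-1]

print(eigvals_from_svd)
print(eigvals_cov)

# Projecting onto the first right singular vector reproduces the PCA scores
scores = X_centered @ Vt[0]
print(np.abs(scores))
```

The absolute value is printed because the sign of a singular vector, like the sign of an eigenvector, is arbitrary.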

In [71]:
# eigenvectors = directions of variance
# eigenvalues = strength of variance
# PCA = keep strongest directions

# 13. Practice exercises (do these)

Compute eigenvalues of:  [[3, 1],[1, 3]]


* 1. Verify eigen equation A v = λ v
* 2. Create a dataset with 3 features and compute covariance
* 3. Perform PCA manually using NumPy
* 4. Reduce data from 3D → 2D
* 5. Compare with sklearn PCA output
* 6. Plot original vs PCA-reduced data