### Problem Formulation

We need to choose the vector along which the variance of the data is maximized. 

![image.png](attachment:image.png)

$u$ is the unit vector onto which we project the data.

![image.png](attachment:image.png)

Now we need to find the $u$ for which the variance of the projected data is maximized.

![image.png](attachment:image.png)

We need to choose the best such vector.

Finding the $u$ that maximizes the projected variance is a constrained optimization problem, because $u$ must stay a unit vector.
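The objective can be sketched numerically. The formula images are not reproduced in this text, so the sketch assumes the usual definition: the variance of the scalar projections of mean-centered samples onto a unit vector. The data array `X` and the helper `projected_variance` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data, stretched along the first axis.
X = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
X_centered = X - X.mean(axis=0)

def projected_variance(X_centered, u):
    """Variance of the data after projecting onto the direction u."""
    u = u / np.linalg.norm(u)   # enforce the unit-length constraint
    z = X_centered @ u          # scalar projection of every sample
    return np.mean(z ** 2)      # projections have zero mean, so this is the variance

var_x = projected_variance(X_centered, np.array([1.0, 0.0]))
var_y = projected_variance(X_centered, np.array([0.0, 1.0]))
print(var_x, var_y)
```

Projecting onto the first axis keeps far more variance than projecting onto the second, which is the kind of comparison PCA makes over all possible directions.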

### Covariance and Covariance Matrix

Variance describes the spread of a single feature; it doesn't capture the relationship between two features.

![image.png](attachment:image.png)

# 🔗 Covariance and Correlation

Both **covariance** and **correlation** measure the relationship between two variables, but they differ in scale and interpretation.

---

## 📊 Covariance

**Covariance** measures the **direction** of the linear relationship between two variables.

### 🧮 Formula (sample covariance between $X$ and $Y$):

$$
\text{Cov}(X, Y) = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})
$$

Where:
* $x_i$, $y_i$: the $i$-th values of variables $X$ and $Y$
* $\bar{x}$, $\bar{y}$: means of $X$ and $Y$
* $m$: number of samples

### 📌 Interpretation:

* $\text{Cov}(X, Y) > 0$: Both variables tend to **increase together** (positive linear relationship).
* $\text{Cov}(X, Y) < 0$: One variable tends to increase while the other **decreases** (negative linear relationship).
* $\text{Cov}(X, Y) \approx 0$: There is **little or no linear relationship** between the variables.

❗ **Important Note:** Covariance is **not normalized**, so its value depends on the scale and units of the variables. A large covariance doesn't necessarily mean a strong relationship, just that the values are large.
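A small sketch of the scale dependence, assuming the $\frac{1}{m}$ normalization from the formula above. The height and weight values are hypothetical, and `covariance` is an illustrative helper, not a library function.

```python
import numpy as np

def covariance(x, y):
    """Covariance with the 1/m normalization used in the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])    # hypothetical heights in metres
weight_kg = np.array([50.0, 58.0, 66.0, 75.0, 85.0])   # hypothetical weights in kilograms

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_m * 100, weight_kg)   # same data, heights in centimetres
print(cov_m, cov_cm)
```

Changing the unit of one variable multiplies the covariance by the same factor (100 here), even though the relationship between the variables is unchanged.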

---

## 🔗 Correlation

**Correlation** is the **normalized version of covariance**. It tells us **how strong** the linear relationship is, regardless of the units of the variables.

### 🧮 Formula (Pearson correlation coefficient):

$$
\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
$$

Where:
* $\sigma_X$, $\sigma_Y$: standard deviations of $X$ and $Y$

### 📌 Interpretation:

* $\text{Corr}(X, Y) = 1$: A **perfect positive linear relationship**. As one variable increases, the other increases proportionally.
* $\text{Corr}(X, Y) = -1$: A **perfect negative linear relationship**. As one variable increases, the other decreases proportionally.
* $\text{Corr}(X, Y) = 0$: **No linear relationship**. Changes in one variable don't predict changes in the other in a linear fashion.

✔️ **Key Feature:** Correlation is always in the range $[-1, 1]$, making it easy to compare the strength of relationships across different datasets or variable pairs.
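The normalization can be checked with a short sketch. The data values are hypothetical and `pearson_corr` is an illustrative helper; it follows the formula above with $\frac{1}{m}$ used consistently for both the covariance and the standard deviations.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())   # np.std divides by m by default, matching cov

height_m = np.array([1.50, 1.60, 1.70, 1.80, 1.90])    # hypothetical heights in metres
weight_kg = np.array([50.0, 58.0, 66.0, 75.0, 85.0])   # hypothetical weights in kilograms

corr_m = pearson_corr(height_m, weight_kg)
corr_cm = pearson_corr(height_m * 100, weight_kg)   # heights rescaled to centimetres
print(corr_m, corr_cm)
```

Unlike covariance, the correlation is the same in both unit systems, and it stays within $[-1, 1]$.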

---

## 🧠 In PCA

* The **covariance matrix** is a fundamental component in PCA (Principal Component Analysis). It tells us how each pair of features in a dataset varies together.
* PCA uses the **eigenvectors** of the covariance matrix to find the principal components, which are the directions of maximum variance in the data.
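* If your data is **not standardized** (i.e., not scaled to have a mean of zero and a standard deviation of one), PCA works directly on the **covariance matrix**, so features measured on larger scales contribute more variance and tend to dominate the principal components.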
* If your data **is standardized**, performing PCA using the covariance matrix is mathematically equivalent to using the **correlation matrix**. This is because standardizing the data makes the covariance matrix equal to the correlation matrix.
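The equivalence in the last bullet can be checked numerically. This is a sketch on made-up data; the three feature scales and the induced correlation are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: three features on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
X[:, 1] += 5.0 * X[:, 0]   # make features 0 and 1 correlated

# Standardize: zero mean and unit standard deviation per column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov_of_standardized = X_std.T @ X_std / len(X_std)   # covariance with 1/m
corr_of_raw = np.corrcoef(X, rowvar=False)           # correlation of the raw data

matches = np.allclose(cov_of_standardized, corr_of_raw)
print(matches)
```

The covariance matrix of the standardized data matches the correlation matrix of the raw data, so PCA on either gives the same components.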

# 📈 The Covariance Matrix

The **covariance matrix** is a fundamental concept in multivariate statistics, providing a way to understand the relationships and spread within a dataset containing multiple variables.

---

## 🧐 What is it?

A covariance matrix is a **square matrix** where:
* The **diagonal elements** represent the **variance** of each individual variable.
* The **off-diagonal elements** represent the **covariance** between each pair of variables.

It effectively summarizes the variances of all variables and the covariances between all pairs of variables in a dataset.

---

## 🎯 Purpose

The primary purpose of a covariance matrix is to:
* Quantify the **inter-dependencies** between multiple random variables.
* Describe the **shape and orientation** of a multidimensional data distribution (e.g., in a multivariate normal distribution).
* Serve as a crucial input for various statistical and machine learning algorithms, particularly those dealing with dimensionality reduction and feature relationships.

---

## 🧮 Formula and Construction

For a dataset with $p$ variables $X_1, X_2, \ldots, X_p$, the covariance matrix $\Sigma$ (or $C$) is a $p \times p$ symmetric matrix.

Each element $\Sigma_{jk}$ (or $C_{jk}$) of the matrix is calculated as:

$$
\Sigma_{jk} = \text{Cov}(X_j, X_k) = \frac{1}{m} \sum_{i=1}^{m} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)
$$

Where:
* $X_j$ and $X_k$ are the $j$-th and $k$-th variables (features).
* $x_{ij}$ is the $i$-th observation of the $j$-th variable.
* $\bar{x}_j$ is the mean of the $j$-th variable.
* $m$ is the number of samples (observations).

Alternatively, it can be expressed in matrix form:

Let $X$ be an $m \times p$ data matrix, where each row is an observation and each column is a variable. Let $\mathbf{\bar{x}}$ be a $1 \times p$ row vector of the means of each column. Then, the centered data matrix $X_c = X - \mathbf{1}\mathbf{\bar{x}}$, where $\mathbf{1}$ is an $m \times 1$ column vector of ones.

The covariance matrix $\Sigma$ can be calculated as:

$$
\Sigma = \frac{1}{m} X_c^T X_c
$$
*(Note: Some definitions use $\frac{1}{m-1}$ for sample covariance for an unbiased estimate, particularly in smaller samples. For population covariance or large datasets, $\frac{1}{m}$ is common.)*
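The matrix form translates almost directly into NumPy. This is a minimal sketch using the $\frac{1}{m}$ convention; the function name `covariance_matrix` and the small data matrix are illustrative assumptions.

```python
import numpy as np

def covariance_matrix(X):
    """Covariance matrix (1/m convention) of an m x p data matrix, one sample per row."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    X_c = X - X.mean(axis=0)   # broadcasting subtracts each column mean from every row
    return X_c.T @ X_c / m

# Hypothetical data: 4 samples, 3 features.
X = np.array([[2.0, 1.0, 0.0],
              [4.0, 3.0, 1.0],
              [6.0, 2.0, 5.0],
              [8.0, 6.0, 2.0]])

Sigma = covariance_matrix(X)
print(Sigma)
```

`np.cov(X, rowvar=False, bias=True)` computes the same matrix; leaving out `bias=True` switches to the $\frac{1}{m-1}$ convention mentioned in the note above.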

---

## ✨ Key Properties

1.  **Symmetric:** $\Sigma_{jk} = \Sigma_{kj}$, meaning the covariance of $X_j$ with $X_k$ is the same as $X_k$ with $X_j$.
2.  **Positive Semi-Definite:** For any non-zero vector $\mathbf{a}$, $\mathbf{a}^T \Sigma \mathbf{a} \ge 0$. This implies that all its eigenvalues are non-negative. If the matrix is positive definite, all eigenvalues are strictly positive.
3.  **Real Matrix:** Contains real numbers.
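These properties are easy to check numerically. The sketch below uses randomly generated data, so the particular numbers are arbitrary; only the properties matter.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))        # hypothetical data: 50 samples, 4 features
X_c = X - X.mean(axis=0)
Sigma = X_c.T @ X_c / len(X_c)

# Property 1: symmetry.
is_symmetric = np.allclose(Sigma, Sigma.T)

# Property 2: positive semi-definiteness, checked via the eigenvalues.
# eigvalsh is intended for symmetric matrices and returns real eigenvalues.
eigenvalues = np.linalg.eigvalsh(Sigma)

# The quadratic form a^T Sigma a is the variance of the data projected onto a.
a = rng.normal(size=4)
quadratic_form = a @ Sigma @ a

print(is_symmetric, eigenvalues.min(), quadratic_form)
```

The quadratic form being a variance is the reason it can never be negative.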

---

## 📌 Interpretation of Elements

* **Diagonal Elements ($\Sigma_{jj}$):** These are the variances of the individual variables ($\text{Var}(X_j) = \text{Cov}(X_j, X_j)$). A larger value indicates greater spread for that variable.
* **Off-Diagonal Elements ($\Sigma_{jk}$, where $j \ne k$):** These are the covariances between two different variables.
    * **Positive value:** $X_j$ and $X_k$ tend to increase or decrease together.
    * **Negative value:** As $X_j$ increases, $X_k$ tends to decrease, and vice-versa.
    * **Value close to zero:** Little to no linear relationship between $X_j$ and $X_k$.

---

## 🌐 Applications and Significance

The covariance matrix is vital in many areas:

1.  **Principal Component Analysis (PCA):** This is perhaps its most famous application. PCA uses the eigenvectors and eigenvalues of the covariance matrix to find the principal components, which represent the directions of maximum variance in the data.
2.  **Multivariate Normal Distribution:** The covariance matrix fully defines the shape of a multivariate normal distribution.
3.  **Linear Discriminant Analysis (LDA):** Used to project features into a lower-dimensional space, often involving within-class and between-class covariance matrices.
4.  **Portfolio Theory (Finance):** Used to calculate the risk (variance) of a portfolio of assets based on the covariances between asset returns.
5.  **Mahalanobis Distance:** This distance metric uses the covariance matrix to account for the correlation between variables when measuring distances.
6.  **Independent Component Analysis (ICA):** Involves decorrelation, often using a whitening step based on the covariance matrix.

---

## ↔️ Covariance Matrix vs. Correlation Matrix

* **Covariance Matrix:** Contains raw covariances and variances. Its values depend on the scale of the variables.
* **Correlation Matrix:** A normalized version of the covariance matrix. Its diagonal elements are all 1s (as correlation of a variable with itself is 1), and off-diagonal elements are Pearson correlation coefficients, ranging from -1 to 1.

The correlation matrix is often preferred when the variables have vastly different scales, as it provides a standardized measure of linear relationship. PCA using a covariance matrix is equivalent to using a correlation matrix if the data is first standardized (scaled to unit variance).

# 🧮 Matrix Transformations, Eigenvectors, and Eigenvalues

These concepts are fundamental in linear algebra and are crucial for understanding how linear transformations affect vectors, and for many applications in data science, physics, engineering, and more.

---

## 🔁 Matrix Transformations

A **matrix transformation** is a function that takes a vector as input and outputs another vector, typically by multiplying the input vector by a matrix. It essentially "transforms" the vector's position, direction, or magnitude in space.

If $A$ is an $m \times n$ matrix and $\mathbf{x}$ is an $n \times 1$ column vector, then the transformation $T(\mathbf{x}) = A\mathbf{x}$ maps vectors from $\mathbb{R}^n$ to $\mathbb{R}^m$.

### 💡 Geometric Interpretation:
Matrix transformations can represent various geometric operations on vectors or points in space:

* **Scaling (Dilation/Contraction):** Changes the length of a vector.
    Example (2D): Scaling by factor $k$ along both axes.
    $$
    A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix}
    $$
* **Rotation:** Changes the direction of a vector around the origin.
    Example (2D): Rotation by angle $\theta$ counter-clockwise.
    $$
    R = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}
    $$
* **Reflection:** Flips a vector across a line or plane.
    Example (2D): Reflection across the x-axis.
    $$
    A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
    $$
* **Shear:** Skews the shape of a vector or object, distorting angles.
    Example (2D): Shear along the x-axis.
    $$
    A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}
    $$
* **Projection:** Maps a vector onto a subspace (e.g., a line or plane).
    Example (2D): Projection onto the x-axis.
    $$
    P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
    $$
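The matrices above can be applied to a vector directly. This sketch picks $k = 2$, $\theta = 90°$, and the test vector $(1, 1)$ purely for illustration.

```python
import numpy as np

k = 2.0
theta = np.pi / 2   # 90 degrees counter-clockwise

scale = np.array([[k, 0.0], [0.0, k]])
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
reflect_x = np.array([[1.0, 0.0], [0.0, -1.0]])
shear_x = np.array([[1.0, k], [0.0, 1.0]])
project_x = np.array([[1.0, 0.0], [0.0, 0.0]])

v = np.array([1.0, 1.0])

for name, M in [("scale", scale), ("rotate", rotate), ("reflect", reflect_x),
                ("shear", shear_x), ("project", project_x)]:
    # Each transformation is just a matrix-vector product.
    print(name, np.round(M @ v, 6))
```

Scaling doubles the vector, rotation turns it to $(-1, 1)$, reflection flips its second coordinate, the shear moves it to $(3, 1)$, and projection drops the second coordinate.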

### Why are they important?
They provide a powerful framework to describe how entire spaces (or objects within them) are stretched, squashed, rotated, or moved. This is crucial in computer graphics, physics, engineering, and many areas of data science.

---

## 🌟 Eigenvectors and Eigenvalues

While a general matrix transformation can drastically change a vector's direction and magnitude, some special vectors retain their original direction (or merely reverse it) after the transformation. These special vectors are called **eigenvectors**, and the factor by which they are scaled is called their **eigenvalue**.

### 🔗 The Eigenvalue Equation:

For a square matrix $A$ (which represents a linear transformation), a non-zero vector $\mathbf{v}$ is an **eigenvector** of $A$ if applying the transformation $A$ to $\mathbf{v}$ simply scales $\mathbf{v}$ by a scalar factor $\lambda$. This relationship is expressed by the **eigenvalue equation**:

$$
A\mathbf{v} = \lambda\mathbf{v}
$$

Where:
* $A$: The square matrix representing the linear transformation.
* $\mathbf{v}$: The **eigenvector** (a non-zero vector).
* $\lambda$: The **eigenvalue** (a scalar value).

### 💡 Geometric Interpretation:

* **Eigenvectors ($\mathbf{v}$):** These are the "special directions" or "invariant directions" of the transformation. When the matrix $A$ acts on an eigenvector $\mathbf{v}$, the resulting vector $A\mathbf{v}$ lies on the same line as $\mathbf{v}$ (it's parallel to $\mathbf{v}$). The transformation merely stretches or shrinks the eigenvector, or reverses its direction, but does not rotate it off its original span.
* **Eigenvalues ($\lambda$):** This scalar tells us **how much** the eigenvector is stretched or shrunk (or if its direction is reversed).
    * If $|\lambda| > 1$, the eigenvector is stretched.
    * If $0 < |\lambda| < 1$, the eigenvector is shrunk.
    * If $\lambda = 1$, the eigenvector remains unchanged in length.
    * If $\lambda = 0$, the eigenvector is mapped to the zero vector (it lies in the null space of $A$).
    * If $\lambda < 0$, the eigenvector's direction is reversed in addition to being scaled.

### How to Find Eigenvalues and Eigenvectors:

1.  **Find Eigenvalues ($\lambda$):**
    Rearrange the eigenvalue equation:
    $A\mathbf{v} = \lambda\mathbf{v}$
    $A\mathbf{v} - \lambda\mathbf{v} = \mathbf{0}$
    $A\mathbf{v} - \lambda I\mathbf{v} = \mathbf{0}$ (where $I$ is the identity matrix)
    $(A - \lambda I)\mathbf{v} = \mathbf{0}$

    For $\mathbf{v}$ to be a non-zero vector, the matrix $(A - \lambda I)$ must be singular (non-invertible), which means its determinant must be zero:
    $$
    \text{det}(A - \lambda I) = 0
    $$
    This is called the **characteristic equation**. Solving this polynomial equation for $\lambda$ gives you the eigenvalues.

2.  **Find Eigenvectors ($\mathbf{v}$):**
    For each eigenvalue $\lambda$ found, substitute it back into the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$ and solve the system of linear equations for $\mathbf{v}$. The non-zero solutions are the eigenvectors corresponding to that eigenvalue. Note that if $\mathbf{v}$ is an eigenvector, then any non-zero scalar multiple $c\mathbf{v}$ is also an eigenvector for the same eigenvalue.
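The two steps can be mirrored in NumPy. This is only a sketch of the textbook procedure on a hypothetical 2x2 matrix; in practice `np.linalg.eig` does both steps at once and is numerically more robust than going through the characteristic polynomial. The helper `null_space_direction` is an illustrative name and assumes the null space is one-dimensional.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # hypothetical example matrix

# Step 1: coefficients of the characteristic polynomial det(lambda*I - A),
# highest power first, then its roots (the eigenvalues).
coeffs = np.poly(A)
eigenvalues = np.real_if_close(np.roots(coeffs))

def null_space_direction(M):
    """Unit vector spanning the null space of a singular matrix M (1-D null space assumed)."""
    # The right-singular vector for the smallest singular value spans the null space.
    _, _, vt = np.linalg.svd(M)
    return vt[-1]

# Step 2: for each eigenvalue, solve (A - lambda*I) v = 0.
pairs = []
for lam in eigenvalues:
    v = null_space_direction(A - lam * np.eye(A.shape[0]))
    pairs.append((lam, v))
    print(lam, v, np.allclose(A @ v, lam * v))
```

For this matrix the characteristic polynomial is $\lambda^2 - 4\lambda + 3$, so the eigenvalues are 3 and 1.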

---

## 📊 Applications and Significance

Eigenvalues and eigenvectors are incredibly powerful and have applications across numerous fields:

1.  **Principal Component Analysis (PCA):** In data science, PCA uses the eigenvectors of the **covariance matrix** (or correlation matrix) to identify the principal components. These eigenvectors represent the directions of maximum variance in the data, allowing for dimensionality reduction while retaining the most important information. The eigenvalues indicate the amount of variance captured by each principal component.

2.  **Google PageRank Algorithm:** The PageRank algorithm, which ranks web pages, is essentially an eigenvector problem. The PageRank of a page is an element of the dominant eigenvector of a matrix representing the link structure of the web.

3.  **Vibration Analysis (Engineering):** In mechanical engineering, eigenvalues represent the natural frequencies of vibration of a structure (e.g., bridges, buildings), and eigenvectors represent the corresponding mode shapes (how the structure deforms during vibration).

4.  **Quantum Mechanics:** In quantum mechanics, eigenvalues represent the possible measurable values of a physical quantity (like energy), and eigenvectors represent the states of the system associated with those values.

5.  **Facial Recognition (Eigenfaces):** Eigenvectors are used to create "eigenfaces," which are a set of basis images used to efficiently represent and recognize human faces.

6.  **Markov Chains:** Eigenvalues and eigenvectors help analyze the long-term behavior and stability of systems modeled by Markov chains. The eigenvector corresponding to an eigenvalue of 1 often represents the steady-state distribution.

7.  **Image Compression:** Techniques like Singular Value Decomposition (SVD), which is closely related to eigen-decomposition, compress images by retaining only the components with the largest singular values (the square roots of the eigenvalues of $A^T A$).

These concepts form the bedrock of many advanced analytical techniques and are indispensable tools for understanding complex systems.

# 🔢 Example: Finding Eigenvalues and Eigenvectors for a 3x3 Matrix

Let's find the eigenvalues and eigenvectors for the following 3x3 matrix $A$:

$$
A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}
$$

---

## Step 1: Set up the Characteristic Equation

The characteristic equation is given by $\text{det}(A - \lambda I) = 0$, where $I$ is the identity matrix of the same dimension as $A$, and $\lambda$ represents the eigenvalues we are trying to find.

First, let's form the matrix $(A - \lambda I)$:

$$
A - \lambda I = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 2-\lambda & 0 & 0 \\ 0 & 3-\lambda & 0 \\ 0 & 0 & 4-\lambda \end{pmatrix}
$$

Now, calculate the determinant of this matrix and set it to zero:

$$
\text{det}(A - \lambda I) = (2-\lambda)(3-\lambda)(4-\lambda) = 0
$$

---

## Step 2: Find the Eigenvalues ($\lambda$)

From the characteristic equation $(2-\lambda)(3-\lambda)(4-\lambda) = 0$, the values of $\lambda$ that satisfy this equation are:

* $2 - \lambda = 0 \implies \lambda_1 = 2$
* $3 - \lambda = 0 \implies \lambda_2 = 3$
* $4 - \lambda = 0 \implies \lambda_3 = 4$

So, the eigenvalues of matrix $A$ are $2, 3,$ and $4$.

---

## Step 3: Find the Eigenvectors ($\mathbf{v}$) for Each Eigenvalue

For each eigenvalue, we substitute its value back into the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$ and solve for the non-zero vector $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$.

### Case 1: For $\lambda_1 = 2$

Substitute $\lambda = 2$ into $(A - \lambda I)\mathbf{v} = \mathbf{0}$:

$$
\begin{pmatrix} 2-2 & 0 & 0 \\ 0 & 3-2 & 0 \\ 0 & 0 & 4-2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

$$
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

This matrix equation translates to the following system of linear equations:
1.  $0v_1 + 0v_2 + 0v_3 = 0$ (This equation is always true and provides no constraint)
2.  $0v_1 + 1v_2 + 0v_3 = 0 \implies v_2 = 0$
3.  $0v_1 + 0v_2 + 2v_3 = 0 \implies v_3 = 0$

Here, $v_1$ can be any real number. Since eigenvectors must be non-zero, we can choose a simple non-zero value for $v_1$. Let $v_1 = 1$.

Thus, the eigenvector corresponding to $\lambda_1 = 2$ is:
$$
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
$$
(Any non-zero scalar multiple of $\mathbf{v}_1$ is also an eigenvector for $\lambda_1=2$).

### Case 2: For $\lambda_2 = 3$

Substitute $\lambda = 3$ into $(A - \lambda I)\mathbf{v} = \mathbf{0}$:

$$
\begin{pmatrix} 2-3 & 0 & 0 \\ 0 & 3-3 & 0 \\ 0 & 0 & 4-3 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

$$
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

This translates to the system:
1.  $-1v_1 + 0v_2 + 0v_3 = 0 \implies v_1 = 0$
2.  $0v_1 + 0v_2 + 0v_3 = 0$ (Always true, so $v_2$ is a free variable)
3.  $0v_1 + 0v_2 + 1v_3 = 0 \implies v_3 = 0$

Here, $v_2$ can be any real number. Let $v_2 = 1$.

Thus, the eigenvector corresponding to $\lambda_2 = 3$ is:
$$
\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
$$

### Case 3: For $\lambda_3 = 4$

Substitute $\lambda = 4$ into $(A - \lambda I)\mathbf{v} = \mathbf{0}$:

$$
\begin{pmatrix} 2-4 & 0 & 0 \\ 0 & 3-4 & 0 \\ 0 & 0 & 4-4 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

$$
\begin{pmatrix} -2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

This translates to the system:
1.  $-2v_1 + 0v_2 + 0v_3 = 0 \implies v_1 = 0$
2.  $0v_1 - 1v_2 + 0v_3 = 0 \implies v_2 = 0$
3.  $0v_1 + 0v_2 + 0v_3 = 0$ (Always true, so $v_3$ is a free variable)

Here, $v_3$ can be any real number. Let $v_3 = 1$.

Thus, the eigenvector corresponding to $\lambda_3 = 4$ is:
$$
\mathbf{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
$$

---

## Summary of Results

For the matrix $A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$:

* **Eigenvalue $\lambda_1 = 2$** has eigenvector $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$
* **Eigenvalue $\lambda_2 = 3$** has eigenvector $\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$
* **Eigenvalue $\lambda_3 = 4$** has eigenvector $\mathbf{v}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$

This example nicely illustrates the concept, especially for diagonal matrices, where the eigenvalues are simply the diagonal entries and the eigenvectors are the standard basis vectors (or any non-zero scalar multiples of them). For more complex matrices, finding the roots of the characteristic polynomial and solving the resulting systems of equations can be more computationally intensive.
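The hand calculation can be cross-checked with NumPy. This is a quick sketch; note that `np.linalg.eig` returns the eigenvectors as the columns of its second output.

```python
import numpy as np

A = np.diag([2.0, 3.0, 4.0])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i].
for i in range(3):
    v = eigenvectors[:, i]
    print(eigenvalues[i], v, np.allclose(A @ v, eigenvalues[i] * v))
```

Each column satisfies $A\mathbf{v} = \lambda\mathbf{v}$ for its eigenvalue, in agreement with the summary above.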

Let's break down the process of calculating eigenvectors step-by-step, using an example. The core idea is that an eigenvector $\mathbf{v}$ for a given eigenvalue $\lambda$ satisfies the equation:

$$
A\mathbf{v} = \lambda\mathbf{v}
$$

This equation can be rewritten as:

$$
(A - \lambda I)\mathbf{v} = \mathbf{0}
$$

Where $I$ is the identity matrix. Your goal is to find the non-zero vectors $\mathbf{v}$ that satisfy this homogeneous system of linear equations. This means finding the null space (or kernel) of the matrix $(A - \lambda I)$.

## Method: Solving $(A - \lambda I)\mathbf{v} = \mathbf{0}$ using Gaussian Elimination

The most common method to solve this system is Gaussian elimination (also known as row reduction).

1.  Form the matrix $(A - \lambda I)$.
2.  Set up the augmented matrix $[(A - \lambda I) \mid \mathbf{0}]$.
3.  Perform row operations to transform $(A - \lambda I)$ into its row-reduced echelon form (RREF).
4.  Interpret the RREF to find the components of $\mathbf{v}$. You will typically find one or more "free variables" which allow you to express the eigenvector in terms of a parameter.

## Example 1: Revisit the Diagonal 3x3 Matrix (Simple Case)

Let's reuse our previous matrix $A$ and focus on finding an eigenvector for one of its eigenvalues.

$$
A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}
$$

We already found the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 3$, and $\lambda_3 = 4$. Let's find the eigenvector for $\lambda_1 = 2$.

**Goal:** Find $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$ such that $(A - 2I)\mathbf{v} = \mathbf{0}$.

**Form $(A - 2I)$:**

$$
A - 2I = \begin{pmatrix} 2-2 & 0 & 0 \\ 0 & 3-2 & 0 \\ 0 & 0 & 4-2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}
$$

**Set up the system $(A - 2I)\mathbf{v} = \mathbf{0}$:**

$$
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$

**Interpret the system:**

This directly gives us the equations:

1.  $0v_1 + 0v_2 + 0v_3 = 0$ (This equation is always true; it doesn't give us information about $v_1$)
2.  $0v_1 + 1v_2 + 0v_3 = 0 \implies v_2 = 0$
3.  $0v_1 + 0v_2 + 2v_3 = 0 \implies v_3 = 0$

Since the first equation doesn't constrain $v_1$, $v_1$ is a free variable. This means $v_1$ can be any non-zero real number. To get a specific eigenvector, we usually pick a simple non-zero value for the free variable(s). Let's choose $v_1 = 1$.

**Construct the eigenvector:**

With $v_1 = 1$, $v_2 = 0$, and $v_3 = 0$, our eigenvector for $\lambda_1 = 2$ is:

$$
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
$$

(Any scalar multiple of this vector, e.g., $\begin{pmatrix} 5 \\ 0 \\ 0 \end{pmatrix}$, is also a valid eigenvector for $\lambda_1 = 2$.)

## Example 2: A 2x2 Matrix (More General Case with Row Reduction)

Let's consider a non-diagonal matrix to see Gaussian elimination in action. Let

$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}
$$

**First, find eigenvalues (briefly):**

Characteristic equation: $\text{det}(A - \lambda I) = 0$

$$
\begin{aligned}
(1-\lambda)(2-\lambda) - (2)(3) &= 0 \\
2 - \lambda - 2\lambda + \lambda^2 - 6 &= 0 \\
\lambda^2 - 3\lambda - 4 &= 0 \\
(\lambda - 4)(\lambda + 1) &= 0
\end{aligned}
$$

So, the eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = -1$.

### Finding Eigenvector for $\lambda_1 = 4$

**Goal:** Find $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ such that $(A - 4I)\mathbf{v} = \mathbf{0}$.

**Form $(A - 4I)$:**

$$
A - 4I = \begin{pmatrix} 1-4 & 2 \\ 3 & 2-4 \end{pmatrix} = \begin{pmatrix} -3 & 2 \\ 3 & -2 \end{pmatrix}
$$

**Set up the augmented matrix:**

$$
\left[ \begin{array}{cc|c} -3 & 2 & 0 \\ 3 & -2 & 0 \end{array} \right]
$$

**Perform Row Operations (Gaussian Elimination):**

Add Row 1 to Row 2 ($R_2 \to R_2 + R_1$):

$$
\left[ \begin{array}{cc|c} -3 & 2 & 0 \\ 0 & 0 & 0 \end{array} \right]
$$

(The goal of row reduction is to get leading 1s and zeros below them, but here we see a row of zeros, which is expected for matrices $(A - \lambda I)$ because their determinant is zero, meaning they are singular and have a non-trivial null space.)

**Interpret the RREF:**

The matrix translates back to a single equation:

$$
-3v_1 + 2v_2 = 0
$$

From this, we can express one variable in terms of the other. Let's solve for $v_1$:

$$
3v_1 = 2v_2 \implies v_1 = \frac{2}{3} v_2
$$

Here, $v_2$ is the free variable. We can choose any non-zero value for $v_2$. To avoid fractions, let's pick $v_2 = 3$. Then $v_1 = \frac{2}{3}(3) = 2$.

**Construct the eigenvector:**

The eigenvector for $\lambda_1 = 4$ is:

$$
\mathbf{v}_1 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}
$$

**Verification:**

$$
A\mathbf{v}_1 = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} (1)(2) + (2)(3) \\ (3)(2) + (2)(3) \end{pmatrix} = \begin{pmatrix} 2 + 6 \\ 6 + 6 \end{pmatrix} = \begin{pmatrix} 8 \\ 12 \end{pmatrix}
$$

And $\lambda_1 \mathbf{v}_1 = 4 \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 8 \\ 12 \end{pmatrix}$. This matches!

### Finding Eigenvector for $\lambda_2 = -1$

**Goal:** Find $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ such that $(A - (-1)I)\mathbf{v} = \mathbf{0}$, which is $(A + I)\mathbf{v} = \mathbf{0}$.

**Form $(A + I)$:**

$$
A + I = \begin{pmatrix} 1+1 & 2 \\ 3 & 2+1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 3 & 3 \end{pmatrix}
$$

**Set up the augmented matrix:**

$$
\left[ \begin{array}{cc|c} 2 & 2 & 0 \\ 3 & 3 & 0 \end{array} \right]
$$

**Perform Row Operations:**

Multiply Row 1 by $\frac{1}{2}$ ($R_1 \to \frac{1}{2} R_1$):

$$
\left[ \begin{array}{cc|c} 1 & 1 & 0 \\ 3 & 3 & 0 \end{array} \right]
$$

Subtract 3 times Row 1 from Row 2 ($R_2 \to R_2 - 3R_1$):

$$
\left[ \begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right]
$$

**Interpret the RREF:**

The matrix translates to the single equation:

$$
1v_1 + 1v_2 = 0 \implies v_1 = -v_2
$$

Here, $v_2$ is the free variable. Let's choose $v_2 = 1$. Then $v_1 = -(1) = -1$.

**Construct the eigenvector:**

The eigenvector for $\lambda_2 = -1$ is:

$$
\mathbf{v}_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}
$$

**Verification:**

$$
A\mathbf{v}_2 = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} (1)(-1) + (2)(1) \\ (3)(-1) + (2)(1) \end{pmatrix} = \begin{pmatrix} -1 + 2 \\ -3 + 2 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

And $\lambda_2 \mathbf{v}_2 = -1 \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$. This matches!
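The hand-derived results for the matrix with rows $(1, 2)$ and $(3, 2)$ can be cross-checked with NumPy. This is a sketch; the helper `is_parallel` is an illustrative name. `np.linalg.eig` normalizes eigenvectors to unit length and may flip their sign, so the comparison is on direction rather than on exact values.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])
v1 = np.array([2.0, 3.0])    # hand-derived eigenvector for lambda = 4
v2 = np.array([-1.0, 1.0])   # hand-derived eigenvector for lambda = -1

# Direct check of A v = lambda v.
check_v1 = np.allclose(A @ v1, 4 * v1)
check_v2 = np.allclose(A @ v2, -1 * v2)

eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns

def is_parallel(a, b):
    """True when 2-D vectors a and b lie on the same line through the origin."""
    return bool(np.isclose(a[0] * b[1] - a[1] * b[0], 0.0))

i4 = np.argmin(np.abs(eigenvalues - 4))    # column paired with lambda = 4
im1 = np.argmin(np.abs(eigenvalues + 1))   # column paired with lambda = -1
parallel_v1 = is_parallel(eigenvectors[:, i4], v1)
parallel_v2 = is_parallel(eigenvectors[:, im1], v2)

print(check_v1, check_v2, parallel_v1, parallel_v2)
```

All four checks should come out true: the NumPy eigenvectors are rescaled versions of the vectors found by row reduction.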

The eigenvalues are the values of $\lambda$ that solve the characteristic equation.

![image.png](attachment:image.png)

Simply put, the eigenvector of the covariance matrix with the largest eigenvalue is the direction $u$ along which the variance is maximized.
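This claim can be checked empirically. The sketch below uses made-up data and compares the variance along the top eigenvector of the covariance matrix with the variance along 1000 random unit directions; the mixing matrix and sample sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical correlated 3-D data.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(300, 3)) @ mixing
X_c = X - X.mean(axis=0)
Sigma = X_c.T @ X_c / len(X_c)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
u_best = eigenvectors[:, -1]                 # eigenvector with the largest eigenvalue
best_variance = u_best @ Sigma @ u_best      # variance of the data projected onto u_best

# Variance along many random unit directions, for comparison.
random_dirs = rng.normal(size=(1000, 3))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
random_variances = np.einsum('ij,jk,ik->i', random_dirs, Sigma, random_dirs)

print(best_variance, eigenvalues[-1], random_variances.max())
```

The variance along the top eigenvector equals the largest eigenvalue, and no random direction exceeds it.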

### Step-by-Step Solution

### 🎯 1) Mean-Centering the Data

Making the data **mean-centric** (or **mean-centering**) means shifting the data so that **each feature has a mean of zero**.

---

### ❓ Why Mean-Center the Data?

PCA (and many ML algorithms) analyze how data varies around the **mean**. If data isn't centered:
- The principal components may not align with the true directions of maximum variance.
- The origin (0, 0, ..., 0) becomes a poor reference point.

Mean-centering ensures PCA captures variance relative to the **true data center**.

---

### 🔧 How Is It Done?

Let $X$ be your dataset with $m$ samples and $n$ features.

For each feature (column) $x_j$:
1. Compute the mean:
   $$
   \bar{x}_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}
   $$
2. Subtract it from each value:
   $$
   x_{ij}^{\text{centered}} = x_{ij} - \bar{x}_j
   $$

After this transformation, each feature column satisfies:
$$
\frac{1}{m} \sum_{i=1}^{m} x_{ij}^{\text{centered}} = 0
$$

---

### 🧠 Example

Original column: $[4, 6, 8]$
- Mean: $(4 + 6 + 8)/3 = 6$
- Mean-centered: $[4 - 6, 6 - 6, 8 - 6] = [-2, 0, 2]$
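In NumPy, mean-centering is a single broadcasted subtraction. In this sketch the first column is the example above; the second column is made up to show that each feature is centered independently.

```python
import numpy as np

X = np.array([[4.0, 10.0],
              [6.0, 20.0],
              [8.0, 60.0]])   # first column from the example, second is hypothetical

column_means = X.mean(axis=0)     # one mean per feature
X_centered = X - column_means     # broadcasting subtracts the mean from every row

print(column_means)
print(X_centered)
```

The first column becomes $[-2, 0, 2]$ as in the example, and every column of `X_centered` has mean zero.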

---

### ✅ Summary
- **Mean-centering** shifts data so its mean is at the origin.
- **Essential** for PCA: ensures principal components reflect **true variance** directions.
- Common preprocessing step in unsupervised learning tasks.



### 📊 2) Get the Covariance Matrix

The **covariance matrix** captures how much each pair of features in a dataset **vary together**.

It is a key component in PCA: principal components are the **eigenvectors** of the covariance matrix.

---

### 🧮 Definition

Let $X$ be the mean-centered data matrix of shape $(m \times n)$:
- $m$ = number of samples
- $n$ = number of features

The **covariance matrix** $\Sigma$ is an $n \times n$ matrix:
$$
\Sigma = \frac{1}{m} X^T X
$$

Each element $\Sigma_{ij}$ is the **covariance between feature $i$ and feature $j$**:
$$
\Sigma_{ij} = \text{Cov}(x_i, x_j) = \frac{1}{m} \sum_{k=1}^{m} x_{ki} x_{kj}
$$

Since $X$ is already mean-centered, we don't subtract means again.

---

### 📌 Properties
- $\Sigma$ is a **symmetric** matrix.
- Diagonal elements $\Sigma_{ii}$ represent the **variance** of feature $i$.
- Off-diagonal elements $\Sigma_{ij}$ represent **covariance** between features $i$ and $j$.
- $\Sigma$ is **positive semi-definite**.

---

### 🧠 Example

Suppose your centered dataset $X$ has 3 samples and 2 features:
$$
X = \begin{bmatrix}
-1 &  2 \\
 0 &  0 \\
 1 & -2
\end{bmatrix}
$$

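Then:
$$
\Sigma = \frac{1}{3} X^T X = \frac{1}{3} \begin{bmatrix} -1 & 0 & 1 \\ 2 & 0 & -2 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 0 & 0 \\ 1 & -2 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 & -4 \\ -4 & 8 \end{bmatrix}
$$

Reading off the entries:
$$
\Sigma = \begin{bmatrix} \text{Var}(x_1) & \text{Cov}(x_1, x_2) \\ \text{Cov}(x_2, x_1) & \text{Var}(x_2) \end{bmatrix} = \begin{bmatrix} 2/3 & -4/3 \\ -4/3 & 8/3 \end{bmatrix}
$$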
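The same computation in NumPy, as a quick check of the arithmetic for the 3-sample, 2-feature example (a sketch using the $\frac{1}{m}$ convention from the definition above):

```python
import numpy as np

# Centered dataset from the example: 3 samples, 2 features.
X = np.array([[-1.0,  2.0],
              [ 0.0,  0.0],
              [ 1.0, -2.0]])

m = X.shape[0]
Sigma = X.T @ X / m   # X is already mean-centered, so no means are subtracted

print(Sigma)
```

The diagonal holds the variances $2/3$ and $8/3$, and the off-diagonal covariance is $-4/3$. The second feature is exactly $-2$ times the first, so the two are perfectly negatively correlated and $\Sigma$ is singular.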

---

### ✅ Summary
- The covariance matrix captures all pairwise feature variances and covariances.
- PCA uses it to find directions (principal components) where variance is maximized.
- A foundational tool for understanding relationships between features.

### 3) Find the Eigenvectors and Eigenvalues

The $k$ eigenvectors with the largest eigenvalues become our principal components.
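The three steps can be put together in a short sketch. The function name `pca_top_k` and the random data are illustrative assumptions, and `np.linalg.eigh` is used here because the covariance matrix is symmetric; this is one possible arrangement, not the only one.

```python
import numpy as np

def pca_top_k(X, k):
    """Return the top-k eigenvalues, the components (one per row), and the projected data."""
    X = np.asarray(X, dtype=float)
    X_c = X - X.mean(axis=0)                      # 1) mean-center
    Sigma = X_c.T @ X_c / X_c.shape[0]            # 2) covariance matrix (1/m convention)
    eigenvalues, eigenvectors = np.linalg.eigh(Sigma)   # 3) eigen-decomposition
    order = np.argsort(eigenvalues)[::-1][:k]     # indices of the k largest eigenvalues
    components = eigenvectors[:, order].T         # eigenvectors are columns; store as rows
    projected = X_c @ components.T                # coordinates along each component
    return eigenvalues[order], components, projected

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 0.5])   # hypothetical data

top_values, components, Z = pca_top_k(X, k=2)
print(top_values)
print(Z.shape)
```

The variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue ranks the components by how much variance they keep.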

### How to transform points?

![image.png](attachment:image.png)

In [2]:
import numpy as np
import pandas as pd

np.random.seed(23)  # fixed seed so the samples are reproducible

# Class 1: 20 points from a 3-D Gaussian centred at the origin
mu_vec1 = np.array([0, 0, 0])
cov_mat1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20)

df = pd.DataFrame(class1_sample, columns=['feature1', 'feature2', 'feature3'])
df['target'] = 1

# Class 2: 20 points from the same Gaussian shifted to (1, 1, 1)
mu_vec2 = np.array([1, 1, 1])
cov_mat2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20)

df1 = pd.DataFrame(class2_sample, columns=['feature1', 'feature2', 'feature3'])
df1['target'] = 0

# Stack both classes into one frame, then shuffle the 40 rows
df = pd.concat([df, df1], ignore_index=True)
df = df.sample(40)

In [3]:
df.sample(3)

|    | feature1  | feature2 | feature3 | target |
|----|-----------|----------|----------|--------|
| 25 | 0.290746  | 0.866975 | 0.982643 | 0      |
| 38 | -0.764314 | 1.566504 | 1.548788 | 0      |
| 24 | 0.748855  | 2.593111 | 1.170818 | 0      |


In [6]:
%pip install plotly


Note: you may need to restart the kernel to use updated packages.


In [7]:
import plotly.express as px

# 3-D scatter of the three features, coloured by class label
fig = px.scatter_3d(df, x='feature1', y='feature2', z='feature3',
                    color=df['target'].astype('str'))
fig.update_traces(marker=dict(size=12,
                              line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))

fig.show()

### Step 1: Apply standard scaling

In [8]:
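from sklearn.preprocessing import StandardScaler

# Standardize the three feature columns: zero mean and unit variance.
# This also mean-centers the data, which PCA requires.
scaler = StandardScaler()

df.iloc[:, 0:3] = scaler.fit_transform(df.iloc[:, 0:3])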

### Step 2: Find covariance matrix

In [11]:
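# rowvar=False: each column is a feature, each row a sample.
# np.cov divides by m - 1 by default, so the diagonal is 40/39 ≈ 1.0256, not exactly 1.
covariance_matrix = np.cov(df.iloc[:, 0:3].values, rowvar=False)

print("Covariance Matrix: \n", covariance_matrix)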

Covariance Matrix: 
 [[1.02564103 0.20478114 0.080118  ]
 [0.20478114 1.02564103 0.19838882]
 [0.080118   0.19838882 1.02564103]]


### Step 3: Finding eigenvectors and eigenvalues

In [12]:
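# np.linalg.eig returns eigenvectors as COLUMNS:
# eigen_vectors[:, i] is the eigenvector for eigen_values[i].
eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)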

In [13]:
eigen_values

array([1.3536065 , 0.94557084, 0.77774573])
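Since each eigenvalue is the variance captured along its eigenvector, dividing by the sum of the eigenvalues gives the share of total variance per component. The sketch below hard-codes the eigenvalues printed above so it can run on its own; in the notebook you would use the `eigen_values` array directly.

```python
import numpy as np

# Eigenvalues copied from the output above.
eigen_values = np.array([1.3536065, 0.94557084, 0.77774573])

explained_ratio = eigen_values / eigen_values.sum()   # share of variance per component
cumulative = np.cumsum(explained_ratio)               # share kept by the first k components

print(explained_ratio.round(4))
print(cumulative.round(4))
```

For this dataset the three components carry roughly 44%, 31%, and 25% of the variance, so keeping the first two retains about three quarters of it.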

In [14]:
eigen_vectors

array([[-0.53875915, -0.69363291,  0.47813384],
       [-0.65608325, -0.01057596, -0.75461442],
       [-0.52848211,  0.72025103,  0.44938304]])

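In [15]:
# Eigenvectors are the columns of eigen_vectors, so slice columns, not rows.
# The eigenvalues above are already in descending order, so the first two
# columns are the top-2 components; transpose to store one component per row.
pc = eigen_vectors[:, 0:2].T

In [16]:
pc

array([[-0.53875915, -0.65608325, -0.52848211],
       [-0.69363291, -0.01057596,  0.72025103]])

In [18]:
# Project the standardized data onto the two principal components.
# (40, 3) @ (3, 2) -> (40, 2)
transformed_df = np.dot(df.iloc[:, 0:3].values, pc.T)

new_df = pd.DataFrame(transformed_df, columns=['PC1', 'PC2'])

new_df['target'] = df['target'].values

new_df.sample(3)
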

In [20]:
df.shape

(40, 4)

In [19]:
new_df.shape

(40, 3)