<a href="https://colab.research.google.com/github/BoomKanteng/Covariance-Analysis/blob/main/Covariance_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🚀 (A) About This Notebook

- 🎯 The objective of this notebook is to create a tool for the group project in `MTH234 Linear Algebra`.
- 🧑🏻‍💻 Python implementation by [CHOKUN](https://github.com/ChotanansubSoph) & [OnlyJust3rd](https://github.com/OnlyJust3rd)

## ⚙️ (B) Setup and Dependencies

```python
import numpy as np
import matplotlib.pyplot as plt
```

```python
# Sample data: replace x and y with the actual dataset used in the project
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
```

##  ☀️ (C) Implementation

### 📝 1. Mean ($µ$) and Variance ($σ^2$):

The sample variance, $σ^2_{x}$, measures the spread of a dataset: it sums the squared differences between each data point ($X_{i}$) and the mean ($µ$), then divides by $(n-1)$, where $n$ is the number of observations. The result describes how far individual data points typically lie from the dataset's overall mean. The formula is:

$$ σ^2_{x} = \frac{1}{n-1}\sum_{i=1}^{n} (X_{i}-µ)^2 \tag{Eq. 1} $$
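Eq. 1 can be checked with a direct loop. A minimal sketch, assuming the same sample `x` used above; the helper name `sample_variance` is hypothetical and not part of the project code. NumPy's `np.var` with `ddof=1` applies the same $(n-1)$ divisor.

```python
import numpy as np

def sample_variance(data):
    """Sample variance following Eq. 1 (divides by n - 1)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    mu = data.sum() / n                  # sample mean
    total = 0.0
    for value in data:
        total += (value - mu) ** 2       # squared deviation from the mean
    return total / (n - 1)

x = np.array([1, 2, 3, 4, 5])
print(sample_variance(x))   # 2.5
print(np.var(x, ddof=1))    # 2.5 (ddof=1 selects the n - 1 divisor)
```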

---



According to ["Statistics-and-PCA, MIT"](https://web.mit.edu/18.06/www/Spring17/Statistics-and-PCA.pdf), if we define the vector $O = [1, 1, \dots, 1]^T$ of length $n$, so that $O^TO = n$, the mean $µ$ can be written as:

$$ µ = \frac{O^Tx}{O^TO} \tag{Eq. 2}$$
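The projection view of Eq. 2 can be checked numerically: $µ$ is the coefficient of the projection of $x$ onto $O$, and the residual $x - µO$ is orthogonal to $O$. A short sketch, assuming the sample `x` from the setup cell:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
O = np.ones(len(x))            # the all-ones vector O = [1, 1, ..., 1]

mu = (O @ x) / (O @ O)         # Eq. 2: O^T x / O^T O
projection = mu * O            # orthogonal projection of x onto the line spanned by O

print(mu)                      # 3.0
print(np.mean(x))              # 3.0
print(O @ (x - projection))    # 0.0: the residual x - muO is orthogonal to O
```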

---

Here $µO$ is the orthogonal projection of $x$ onto $O$, and $µ$ is its coefficient. The sample variance is then

$$σ^2_{x} = \frac{\|x-µO\|^2}{n-1} = \frac{\|(I-\frac{OO^T}{O^TO})x\|^2}{n-1} \tag{Eq. 3}$$
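Both forms of Eq. 3 can be compared numerically. A sketch, assuming the sample `x`; the name `M` is introduced here only for the bracketed operator in Eq. 3.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
n = len(x)
O = np.ones(n)
mu = (O @ x) / (O @ O)

# Left form of Eq. 3: squared length of the centered vector
var_centered = np.linalg.norm(x - mu * O) ** 2 / (n - 1)

# Right form of Eq. 3: apply (I - O O^T / O^T O) to x, then take the squared norm
M = np.eye(n) - np.outer(O, O) / (O @ O)
var_operator = np.linalg.norm(M @ x) ** 2 / (n - 1)

print(var_centered, var_operator)   # both match Eq. 1 up to floating-point rounding
```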

---

Define $P$ as the projection operator from Eq. 3 that subtracts the mean from a vector (i.e. it projects vectors onto the subspace of vectors with zero mean):

$$P=I-\frac{OO^T}{O^TO} \tag{Eq.4}$$
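Two properties of $P$ justify the second equality in Eq. 5-2: it is symmetric ($P^T = P$) and idempotent ($P^2 = P$, so projecting twice changes nothing). A short sketch that checks both numerically, assuming $n = 5$ and the sample `x`:

```python
import numpy as np

n = 5
O = np.ones(n)
P = np.eye(n) - np.outer(O, O) / (O @ O)   # Eq. 4

x = np.array([1, 2, 3, 4, 5], dtype=float)

print(np.allclose(P, P.T))                 # True: P is symmetric
print(np.allclose(P @ P, P))               # True: P is idempotent
print(P @ x)                               # x with its mean removed
print(np.isclose((P @ x).sum(), 0.0))      # True: the result has zero mean
```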

---



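Covariance is defined as the expected product of the deviations of $x$ and $y$ from their means (Eq. 5-1). For a sample, it becomes Eq. 5-2, where the two forms agree because $P$ is symmetric and idempotent ($P^T = P$, $P^2 = P$), so $(Px)^T(Py) = x^TP^TPy = x^TPy$:


$$ \sigma(x, y) = E[(x - E(x))(y - E(y))] \tag{Eq. 5-1}$$
$$ \sigma(x, y) = \frac{(Px)^T(Py)}{n-1} = \frac{x^TPy}{n-1} \tag{Eq. 5-2}$$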
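For a sample, the expectation in Eq. 5-1 is replaced by a sum of products of deviations divided by $(n-1)$. A sketch comparing that deviation form with the projection form of Eq. 5-2, assuming the sample data above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 4, 3, 2, 1], dtype=float)
n = len(x)

# Sample version of Eq. 5-1: sum of products of deviations, with the (n - 1) divisor
cov_deviation = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Eq. 5-2: x^T P y / (n - 1), with P from Eq. 4
P = np.eye(n) - np.ones((n, n)) / n
cov_projection = x @ P @ y / (n - 1)

print(cov_deviation)                              # -2.5
print(np.isclose(cov_deviation, cov_projection))  # True: both forms agree
```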

---






```python
# Implementation (Eq.4): centering projection P = I - O O^T / (O^T O), with O^T O = n
def cal_projection(data: np.ndarray) -> np.ndarray:
    n = len(data)
    return np.eye(n) - np.outer(np.ones(n), np.ones(n)) / n

# Implementation (Eq.5-2): cov(x, y) = x^T P y / (n - 1); x and y must have the same length
def cal_covariance(x: np.ndarray, y: np.ndarray) -> float:
    n = len(x)
    P = cal_projection(x)
    cov = np.dot(np.dot(x, P), y) / (n - 1)
    return cov
```

```python
print(f"X data : {x}")
print(f"Y data : {y}")

print("\n=== Our Implementation Method ===")
print("Covariance(x,y) : ", cal_covariance(x, y))

print("\n=== Compare with Numpy ===")
print("(Reliable Standard Library)")
print("\nCovariance(x,y) : ", np.cov(x, y)[0, 1])
```

```text
X data : [1 2 3 4 5]
Y data : [5 4 3 2 1]

=== Our Implementation Method ===
Covariance(x,y) :  -2.5

=== Compare with Numpy ===
(Reliable Standard Library)

Covariance(x,y) :  -2.5
```


### 📝 2. Covariance Matrix ($Σ$):
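For two variables $x$ and $y$, the covariance matrix $Σ$ holds the variances on the main diagonal ($\sigma(x, x) = σ^2_{x}$) and the covariances off the diagonal: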

$$
\Sigma = \begin{bmatrix}
\sigma(x, x) & \sigma(x, y) \\
\sigma(y, x) & \sigma(y, y)
\end{bmatrix}
\tag{Eq. 6}
$$

---
The covariance matrix is `symmetric`: its entries are mirrored across the main diagonal, because the products $(Px)^T(Py)$ and $(Py)^T(Px)$ in Eq. 5-2 are equal. Mathematically:

$$σ(x, y) = σ(y, x) \tag{Eq. 7} $$
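Symmetry also follows from a matrix form of Eq. 5-2. Stacking the variables as the columns of an $n \times 2$ data matrix $D$ (a name introduced here for illustration), each entry of $Σ$ has the form $u^TPv/(n-1)$ for columns $u, v$ of $D$, so $Σ = D^TPD/(n-1)$, and $(D^TPD)^T = D^TP^TD = D^TPD$ because $P$ is symmetric. A minimal sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 4, 3, 2, 1], dtype=float)

D = np.column_stack((x, y))          # n x 2 data matrix, one column per variable
n = D.shape[0]
P = np.eye(n) - np.ones((n, n)) / n  # centering projection from Eq. 4

Sigma = D.T @ P @ D / (n - 1)        # every entry is u^T P v / (n - 1), as in Eq. 5-2

print(Sigma)
print(np.allclose(Sigma, Sigma.T))                  # True: Eq. 7
print(np.allclose(Sigma, np.cov(D, rowvar=False)))  # True: matches NumPy
```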

---


```python
# Implementation (Eq.6): 2x2 covariance matrix built from pairwise covariances
def cal_cov_matrix(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    cov_matrix = [
        [cal_covariance(x, x), cal_covariance(x, y)],
        [cal_covariance(y, x), cal_covariance(y, y)],
    ]
    return np.array(cov_matrix)
```

```python
print("=== Sample Data ===")
print(f"X data : {x}")
print(f"Y data : {y}")

print("\n=== Our Implementation Method ===")
sample_cov_matrix = cal_cov_matrix(x, y)
print(sample_cov_matrix)

print("\n=== Compare with Numpy ===")
print("(Reliable Standard Library)")
print("\n", np.cov(np.column_stack((x, y)), rowvar=False))
```

```text
=== Sample Data ===
X data : [1 2 3 4 5]
Y data : [5 4 3 2 1]

=== Our Implementation Method ===
[[ 2.5 -2.5]
 [-2.5  2.5]]

=== Compare with Numpy ===
(Reliable Standard Library)

 [[ 2.5 -2.5]
 [-2.5  2.5]]
```


### 📝 3. Eigenvalues ($λ$) and Eigenvectors ($\vec{V}$)


Eigenvalues, denoted $λ$, and eigenvectors, denoted $\vec{V}$, are fundamental concepts in linear algebra. An eigenvector of the covariance matrix $\Sigma$ is a nonzero vector that $\Sigma$ only scales, and the scale factor is the corresponding eigenvalue $λ$:

$$\Sigma\vec{V} = λ\vec{V} \tag{Eq. 8} $$
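Eq. 8 can be checked numerically. A minimal sketch using NumPy's `np.linalg.eigh`, a solver for symmetric matrices (swapped in here for illustration; the notebook itself compares against `np.linalg.eig`), applied to the covariance matrix of the sample data:

```python
import numpy as np

Sigma = np.array([[2.5, -2.5],
                  [-2.5, 2.5]])   # covariance matrix of the sample x, y

# np.linalg.eigh handles symmetric matrices; eigenvalues are returned in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)

for lam, V in zip(eigenvalues, eigenvectors.T):   # each column of `eigenvectors` is one V
    print(lam, np.allclose(Sigma @ V, lam * V))   # checks Eq. 8 for this pair
```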

---




#### ✏️ 3.1 Calculate Eigenvalues ($λ$)


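Rewriting Eq. 8 as $(\Sigma - \lambda I)\vec{V} = 0$ shows that a nonzero $\vec{V}$ exists only when $\Sigma - \lambda I$ is singular, that is, when its determinant is zero. The eigenvalues are therefore the roots of the characteristic equation: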

$$ \text{det}(\Sigma - \lambda I) = 0 \tag{Eq. 9} $$
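A quick numerical check of Eq. 9, assuming the covariance matrix of the sample data from Section 2, whose eigenvalues are 5 and 0 (its trace is 5 and its determinant is 0). For a value of $\lambda$ that is not an eigenvalue, the determinant is nonzero:

```python
import numpy as np

Sigma = np.array([[2.5, -2.5],
                  [-2.5, 2.5]])   # covariance matrix of the sample x, y
I = np.eye(2)

for lam in (5.0, 0.0, 1.0):      # 5 and 0 are eigenvalues of Sigma, 1 is not
    det = np.linalg.det(Sigma - lam * I)
    print(lam, np.isclose(det, 0.0))
```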

---

For a 2x2 matrix with entries $a$, $b$, $c$, and $d$, the characteristic equation reads:

$$ \text{det}\left(\begin{bmatrix} a-\lambda & b \\ c & d-\lambda \end{bmatrix}\right) = 0 \tag{Eq. 10} $$

---

Expanding the determinant gives a quadratic equation in $\lambda$:

$$ (a-\lambda)(d-\lambda)-bc = 0 $$
$$ \lambda^2 - (a + d)\lambda + (ad - bc) = 0 \tag{Eq. 11}$$
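The coefficients of Eq. 11 are the trace $a + d$ and the determinant $ad - bc$. A sketch that builds them for the 2x2 test matrix used in the eigenvalue code below and compares them with NumPy's `np.poly`, which returns characteristic-polynomial coefficients when given a square matrix; `np.roots` then solves the quadratic:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [-2.0, 1.0]])            # 2x2 test matrix (the same one used in the eigenvalue code)
a, b, c, d = A.ravel().tolist()        # unpack entries as plain Python floats

# Eq. 11: lambda^2 - (a + d) * lambda + (ad - bc)
coeffs = [1.0, -(a + d), a * d - b * c]
print(coeffs)                          # [1.0, -5.0, 6.0]

print(np.poly(A))                      # characteristic-polynomial coefficients computed by NumPy
print(np.roots(coeffs))                # roots of Eq. 11, i.e. the eigenvalues
```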

---

Here $a$, $b$, $c$, and $d$ are the entries of the 2x2 matrix. Solving Eq. 11 with the quadratic formula gives the two eigenvalues:

$$  \lambda_1 = \frac{{(a + d) + \sqrt{(a + d)^2 - 4(ad-bc)}}}{2}  \tag{Eq. 12-1} $$

$$  \lambda_2 = \frac{{(a + d) - \sqrt{(a + d)^2 - 4(ad-bc)}}}{2}  \tag{Eq. 12-2} $$
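As a worked example, the covariance matrix of the sample data computed in Section 2 has $a = d = 2.5$ and $b = c = -2.5$. Substituting into Eq. 12-1 and Eq. 12-2:

$$
\begin{aligned}
a + d &= 2.5 + 2.5 = 5 \\
ad - bc &= (2.5)(2.5) - (-2.5)(-2.5) = 0 \\
\lambda_{1,2} &= \frac{5 \pm \sqrt{5^2 - 4 \cdot 0}}{2} = \frac{5 \pm 5}{2} \;\Rightarrow\; \lambda_1 = 5,\ \lambda_2 = 0
\end{aligned}
$$

The zero eigenvalue reflects that the sample $y$ is an exact linear function of $x$ ($y = 6 - x$), so the data has no spread in one direction.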

---

```python
def cal_eigenvalue(cov_matrix: np.ndarray) -> np.ndarray:
    a, b = cov_matrix[0][0], cov_matrix[0][1]
    c, d = cov_matrix[1][0], cov_matrix[1][1]

    # Square root of the discriminant of Eq. 11; for a symmetric (covariance) matrix
    # the discriminant equals (a - d)^2 + 4b^2 >= 0, so the eigenvalues are real
    delta = np.sqrt((a + d)**2 - 4 * (a * d - b * c))

    # Calculate the two solutions for lambda (eigenvalues)
    eigenvalue_1 = (a + d + delta) / 2  # (Eq.12-1)
    eigenvalue_2 = (a + d - delta) / 2  # (Eq.12-2)

    return np.array([eigenvalue_1, eigenvalue_2])

# Note: this test matrix is not symmetric, so it is not a valid covariance matrix;
# it is used only because its eigenvalues (3 and 2) are easy to check by hand
sample_cov_matrix_2 = np.array([[4, 1], [-2, 1]])
sample_eigenvalues = cal_eigenvalue(sample_cov_matrix_2)

print("Covariance Matrix:")
print(sample_cov_matrix_2, "\n")

print("\n=== Our Implementation Method ===")
print("Eigenvalue:", sample_eigenvalues)

print("\n=== Compare with Numpy ===")
print("(Reliable Standard Library)")
print("Eigenvalue:", np.linalg.eig(sample_cov_matrix_2)[0])
```

```text
Covariance Matrix:
[[ 4  1]
 [-2  1]] 


=== Our Implementation Method ===
Eigenvalue: [3. 2.]

=== Compare with Numpy ===
(Reliable Standard Library)
Eigenvalue: [3. 2.]
```


#### ✏️ 3.2 Calculate Eigenvector ($\vec{V}$)

To find the eigenvector $\vec{V}$ belonging to each eigenvalue, substitute $\lambda$ back into Eq. 8, rearranged as:

$$(\Sigma - \lambda I) \vec{V} = 0 \tag{Eq.13}$$
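For a 2x2 matrix, the first row of Eq. 13 gives a closed form for $\vec{V}$. A short derivation, assuming $b \neq 0$:

$$
\begin{aligned}
(a-\lambda)v_1 + b\,v_2 = 0 \;&\Rightarrow\; v_2 = \frac{\lambda - a}{b}\,v_1 \\
\vec{V} &= \begin{bmatrix} 1 \\ \frac{\lambda - a}{b} \end{bmatrix} \quad (\text{choosing } v_1 = 1)
\end{aligned}
$$

For the test matrix $\begin{bmatrix} 4 & 1 \\ -2 & 1 \end{bmatrix}$ used in the eigenvalue code, $\lambda_1 = 3$ gives $\vec{V} = [1, -1]^T$ and $\lambda_2 = 2$ gives $\vec{V} = [1, -2]^T$. The second row is satisfied automatically, because $\det(\Sigma - \lambda I) = 0$ makes the two rows linearly dependent.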



---


The eigenvector $\vec{V}$ is then normalized to unit length:

$$\vec{V}^* = \frac{\vec{V}}{\|\vec{V}\|} \tag{Eq.14}$$
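A sketch of Eq. 14 for the eigenvector $[1, -1]^T$ of the test matrix $\begin{bmatrix} 4 & 1 \\ -2 & 1 \end{bmatrix}$ with $\lambda = 3$ (check: $4 - 1 = 3$ and $-2 - 1 = -3$). Normalizing fixes the length but not the sign, so $-\vec{V}^*$ is an equally valid unit eigenvector; this is why different solvers, including `np.linalg.eig`, may return columns that differ by a factor of $-1$.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [-2.0, 1.0]])      # test matrix from the eigenvalue example
V = np.array([1.0, -1.0])        # eigenvector for lambda = 3

V_unit = V / np.linalg.norm(V)   # Eq. 14: divide by the Euclidean length

print(V_unit)
print(np.isclose(np.linalg.norm(V_unit), 1.0))   # True: unit length
print(np.allclose(A @ V_unit, 3 * V_unit))       # True: still an eigenvector
print(np.allclose(A @ -V_unit, 3 * -V_unit))     # True: the sign can be flipped
```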

---


```python
def calculate_eigenvectors(cov_matrix: np.ndarray, eigenvalues: np.ndarray) -> np.ndarray:
    # Ensure that the inputs are floating-point NumPy arrays
    cov_matrix = np.asarray(cov_matrix, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)

    a, b = cov_matrix[0, 0], cov_matrix[0, 1]

    # First row of (Σ - λI)V = 0 (Eq.13): (a - λ)v1 + b*v2 = 0, so V = [1, (λ - a) / b]
    # Assumes b != 0
    v1 = np.array([1, (eigenvalues[0] - a) / b], dtype=float)
    v2 = np.array([1, (eigenvalues[1] - a) / b], dtype=float)

    # Normalize the eigenvectors to unit length (Eq.14)
    v1 /= np.linalg.norm(v1)
    v2 /= np.linalg.norm(v2)

    # Combine eigenvectors into a matrix, one eigenvector per column
    eigenvectors = np.column_stack((v1, v2))

    return eigenvectors


print("Covariance Matrix:")
print(sample_cov_matrix_2)

# Example usage
result = calculate_eigenvectors(sample_cov_matrix_2, sample_eigenvalues)
print("\n=== Our Implementation Method ===")
print("Eigenvectors:")
print(result)

# Compare with np.linalg.eig; eigenvectors are only defined up to a nonzero
# scale factor, so columns may differ from ours by a sign
_, eigenvectors_np = np.linalg.eig(sample_cov_matrix_2)
print("\n=== Compare with Numpy ===")
print("(Reliable Standard Library)")
print("Eigenvectors:")
print(eigenvectors_np)
```

```text
Covariance Matrix:
[[ 4  1]
 [-2  1]]

=== Our Implementation Method ===
Eigenvectors:
[[ 0.70710678  0.4472136 ]
 [-0.70710678 -0.89442719]]

=== Compare with Numpy ===
(Reliable Standard Library)
Eigenvectors:
[[ 0.70710678 -0.4472136 ]
 [-0.70710678  0.89442719]]
```


## 📚 (D) References

- Vincent Spruyt. "A geometric interpretation of the covariance matrix". https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/ (accessed December 2023)
- MIT (Massachusetts Institute of Technology). "Statistics and PCA" (2017, September 7). https://web.mit.edu/18.06/www/Spring17/Statistics-and-PCA.pdf (accessed December 2023)