
# Covariance and Correlation Matrices

## 1. Covariance

### Definition
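Covariance measures how two variables vary together around their means:
- Positive: the variables tend to increase together
- Negative: one tends to increase while the other decreases
- Near zero: no linear relationship (a nonlinear relationship may still exist)

Its magnitude depends on the units of both variables, so covariances are hard to compare across different pairs of variables.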
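
The three cases can be illustrated with a small sketch. The data and the helper `sample_cov` are hypothetical, chosen only for illustration; the helper uses the usual sample covariance with an $n-1$ denominator.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (illustrative helper)."""
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical data, symmetric around zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

cov_pos = sample_cov(x, 2 * x + 1)   # y rises with x
cov_neg = sample_cov(x, -x)          # y falls as x rises
cov_quad = sample_cov(x, x ** 2)     # y depends on x, but not linearly

print(cov_pos, cov_neg, cov_quad)
```

For this data the three values come out as 5, -2.5 and 0. The last case shows that zero covariance rules out a linear relationship only: here `y` is completely determined by `x`.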

### Formula (Two variables X and Y)
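$$
\text{Cov}(X,Y) = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})
$$

Where:
- $n$ = number of paired observations
- $\bar{X}$, $\bar{Y}$ = sample means of X and Y

Dividing by $n-1$ rather than $n$ gives the unbiased sample covariance.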
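
As a minimal sketch, the formula can be evaluated term by term and compared with `np.cov`, which also divides by $n-1$ by default. The arrays `x` and `y` are hypothetical illustrative data.

```python
import numpy as np

# Hypothetical illustrative data
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])

n = len(x)

# Sum of products of deviations from the means, divided by n - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns a 2x2 covariance matrix; the off-diagonal entry is Cov(X, Y)
cov_np = np.cov(x, y)[0, 1]

print(cov_xy)                       # 14 / 3, about 4.667
print(np.isclose(cov_xy, cov_np))   # the two computations agree
```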

### Covariance Matrix (for p variables)
$$
\Sigma =
\begin{bmatrix}
\text{Var}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_p) \\
\text{Cov}(X_2, X_1) & \text{Var}(X_2) & \cdots & \text{Cov}(X_2, X_p) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(X_p, X_1) & \text{Cov}(X_p, X_2) & \cdots & \text{Var}(X_p)
\end{bmatrix}
$$
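
The matrix is symmetric because $\text{Cov}(X_j, X_k) = \text{Cov}(X_k, X_j)$. It can also be written compactly in matrix form. The notation below is an assumption introduced here, not part of the definitions above: $X$ is the $n \times p$ data matrix with one observation per row, $\bar{x}$ is the vector of column means, and $\mathbf{1}_n$ is a column vector of $n$ ones.

$$
X_c = X - \mathbf{1}_n \bar{x}^\top, \qquad
\Sigma = \frac{1}{n-1} X_c^\top X_c
$$

Entry $(j,k)$ of this product is $\frac{1}{n-1} \sum_{i=1}^n (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)$, which is the two-variable covariance formula applied to columns $j$ and $k$.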

---

## 2. Correlation

### Definition
Correlation rescales covariance by the standard deviations of both variables, giving a unitless value in the range **-1 to 1** that measures the strength and direction of the linear relationship.

### Formula (Two variables X and Y)
$$
\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}
$$

Where:
- $\sigma_X$ = standard deviation of X
- $\sigma_Y$ = standard deviation of Y
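
A short sketch of this formula, using hypothetical data and comparing the result with `np.corrcoef`. It assumes sample standard deviations (`ddof=1`), so that they match the $n-1$ denominator used by `np.cov`.

```python
import numpy as np

# Hypothetical illustrative data
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])

cov_xy = np.cov(x, y)[0, 1]     # sample covariance (n - 1 denominator)
sigma_x = x.std(ddof=1)         # sample standard deviation of x
sigma_y = y.std(ddof=1)         # sample standard deviation of y

rho = cov_xy / (sigma_x * sigma_y)

# np.corrcoef returns the 2x2 correlation matrix
rho_np = np.corrcoef(x, y)[0, 1]

print(rho)
print(np.isclose(rho, rho_np))
```

The choice of denominator cancels in the ratio, so using `ddof=0` consistently for both the covariance and the standard deviations would give the same correlation.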

### Correlation Matrix (for p variables)
$$
R =
\begin{bmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{21} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{p1} & \rho_{p2} & \cdots & 1
\end{bmatrix}
$$

---

## 3. Relationship between Covariance and Correlation

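Correlation is covariance divided by the standard deviations of the two variables. Covariance carries the product of the units of X and Y, while correlation is unitless:

$$
\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)} \sqrt{\text{Var}(Y)}}
$$

Equivalently, $\text{Cov}(X,Y) = \rho_{X,Y} \, \sigma_X \sigma_Y$.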
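
The same relationship holds for the full matrices. As a sketch, let $D$ be the diagonal matrix of standard deviations (the symbol $D$ is introduced here for illustration):

$$
D = \text{diag}(\sigma_1, \ldots, \sigma_p), \qquad
R = D^{-1} \Sigma D^{-1}, \qquad
\Sigma = D R D
$$

Entry $(j,k)$ of $D^{-1} \Sigma D^{-1}$ is $\text{Cov}(X_j, X_k) / (\sigma_j \sigma_k) = \rho_{jk}$, and each diagonal entry is $\text{Var}(X_j) / \sigma_j^2 = 1$.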

---

## 4. Example

For three variables (Height, Weight, Age), the two matrices have the following structure:

- Covariance Matrix:
$$
\Sigma =
\begin{bmatrix}
\text{Var(Height)} & \text{Cov(Height, Weight)} & \text{Cov(Height, Age)} \\
\text{Cov(Weight, Height)} & \text{Var(Weight)} & \text{Cov(Weight, Age)} \\
\text{Cov(Age, Height)} & \text{Cov(Age, Weight)} & \text{Var(Age)}
\end{bmatrix}
$$

- Correlation Matrix:
$$
R =
\begin{bmatrix}
1 & \rho_{\text{Height, Weight}} & \rho_{\text{Height, Age}} \\
\rho_{\text{Weight, Height}} & 1 & \rho_{\text{Weight, Age}} \\
\rho_{\text{Age, Height}} & \rho_{\text{Age, Weight}} & 1
\end{bmatrix}
$$


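```python
import numpy as np
import pandas as pd

data = {
    "Height": [160, 165, 170, 175, 180],
    "Weight": [55, 65, 72, 80, 85],
    "Age":    [25, 30, 35, 40, 45],
}

df = pd.DataFrame(data)
X = df.values   # shape (n, p): rows are observations, columns are variables
n, p = X.shape

# Centre each column by subtracting its mean
means = np.mean(X, axis=0)
X_centred = X - means

# Sample covariance matrix (n - 1 denominator)
cov_matrix = (X_centred.T @ X_centred) / (n - 1)

# Divide each entry by the product of the two corresponding standard deviations
std_devs = np.sqrt(np.diag(cov_matrix))
corr_matrix = cov_matrix / np.outer(std_devs, std_devs)

corr_matrix
```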

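```
array([[1.        , 0.99409712, 1.        ],
       [0.99409712, 1.        , 0.99409712],
       [1.        , 0.99409712, 1.        ]])
```

Height and Age have a correlation of exactly 1 because, in this sample, Age is an exact linear function of Height (Age = Height - 135). For the same reason, Weight has the same correlation with both.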
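
As a cross-check, the manual computation can be compared with NumPy's built-in functions. This is a minimal sketch that repeats the example data in a plain array; `rowvar=False` tells NumPy that variables are in columns rather than rows.

```python
import numpy as np

# Same example data: columns are Height, Weight, Age
X = np.array([
    [160, 55, 25],
    [165, 65, 30],
    [170, 72, 35],
    [175, 80, 40],
    [180, 85, 45],
], dtype=float)

# Manual computation, following the steps used above
X_centred = X - X.mean(axis=0)
cov_manual = X_centred.T @ X_centred / (X.shape[0] - 1)
std_devs = np.sqrt(np.diag(cov_manual))
corr_manual = cov_manual / np.outer(std_devs, std_devs)

# Built-in equivalents (np.cov divides by n - 1 by default)
cov_np = np.cov(X, rowvar=False)
corr_np = np.corrcoef(X, rowvar=False)

print(np.allclose(cov_manual, cov_np))
print(np.allclose(corr_manual, corr_np))
```

With a pandas DataFrame, `df.cov()` and `df.corr()` compute the same matrices (sample covariance and Pearson correlation by default) and keep the column labels.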