# 🧠 What is Covariance?

In simple terms, **covariance** is a statistical measure that describes the **directional relationship** between two random variables. It tells you whether two variables tend to move together or in opposite directions.

* **Positive Covariance:** Indicates that as one variable increases, the other variable tends to **increase**.
* **Negative Covariance:** Indicates that as one variable increases, the other variable tends to **decrease**.
* **Zero Covariance:** Indicates that there is no linear relationship between the two variables.

---

## 🧮 The Mathematics: Formulas

The formula for covariance differs slightly depending on whether you are calculating it for an entire **population** or for a **sample**.

Let $X$ and $Y$ be two random variables, with:
* $n$ = number of observations
* $x_i, y_i$ = the individual data points
* $\mu_X, \mu_Y$ = the population means of $X$ and $Y$
* $\bar{x}, \bar{y}$ = the sample means of $X$ and $Y$
* $E[...]$ = the Expected Value (the theoretical mean)

#### 1. Population Covariance

This is the theoretical covariance between two random variables, $X$ and $Y$.

$$
\mathrm{Cov}(X, Y) = E[ (X - \mu_X)(Y - \mu_Y) ]
$$

A common computational form of this formula is:

$$
\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]
$$
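
As a quick numerical check, here is a minimal sketch that treats a small illustrative dataset as the entire population and evaluates both forms. Each observation is treated as equally likely, so every expectation is a plain mean. The arrays `x` and `y` are made-up example values, and `np.cov(..., ddof=0)` is NumPy's divide-by-$n$ (population) setting.

```python
import numpy as np

# Illustrative values, treated as the whole population (equal weights 1/n)
x = np.array([1.0, 2.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))

# Computational form: E[XY] - E[X]E[Y]
cov_computational = np.mean(x * y) - x.mean() * y.mean()

# NumPy's population covariance (divides by n)
cov_numpy = np.cov(x, y, ddof=0)[0, 1]

# All three agree, at approximately 2.4
print(cov_definition, cov_computational, cov_numpy)
```

In floating-point code, the definitional form is usually the safer choice. When the means are large relative to the spread, the subtraction $E[XY] - E[X]E[Y]$ can cancel away most of the significant digits.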

#### 2. Sample Covariance

This is the formula you use when calculating covariance from a set of data (a sample).

$$
\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
$$
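
The sample formula translates almost term for term into code. In this sketch, `sample_cov` is a hypothetical helper written only for illustration. Its result is compared against `np.cov`, whose default normalization also divides by $n - 1$.

```python
import numpy as np

def sample_cov(x, y):
    """Hypothetical helper: sum of products of deviations, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    n = x.size
    if n < 2:
        raise ValueError("the sample covariance needs at least two observations")
    # Deviations from the sample means x-bar and y-bar
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / (n - 1)

x = [1, 2, 4, 5, 6]
y = [2, 3, 5, 4, 6]

print(sample_cov(x, y))     # approximately 3.0
print(np.cov(x, y)[0, 1])   # NumPy's default (n - 1) gives the same value
```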

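> **Why divide by $n-1$?**
> This is known as **Bessel's correction**. The deviations are measured from the sample means $\bar{x}$ and $\bar{y}$, which are estimated from the same data. As a result, the sum of the products of deviations averages $(n-1)\,\mathrm{Cov}(X, Y)$ rather than $n\,\mathrm{Cov}(X, Y)$. Dividing by $n-1$ compensates for this and makes the sample covariance an **unbiased** estimator of the population covariance. The correction matters most for small samples. For large $n$, the two divisors give nearly the same result.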
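
A small Monte Carlo experiment makes the bias visible. This sketch assumes bivariate normal data with a true covariance of 1.0 and samples of size $n = 5$. The seed, trial count, and covariance matrix are arbitrary illustrative choices. Averaged over many samples, dividing by $n$ should land near $\frac{n-1}{n} \cdot 1.0 = 0.8$, while dividing by $n - 1$ should land near the true value of 1.0.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility

# Assumed population: bivariate normal with Var = 2 and Cov(X, Y) = 1
true_cov = np.array([[2.0, 1.0],
                     [1.0, 2.0]])
n = 5             # small samples, where the bias is most visible
trials = 100_000  # number of independent samples to average over

# Shape (trials, n, 2): `trials` samples of n observations of (X, Y)
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=(trials, n))
x = samples[..., 0]
y = samples[..., 1]

# Sum of products of deviations from each sample's own means
dx = x - x.mean(axis=1, keepdims=True)
dy = y - y.mean(axis=1, keepdims=True)
cross = np.sum(dx * dy, axis=1)

biased = cross / n          # divide by n
unbiased = cross / (n - 1)  # divide by n - 1 (Bessel's correction)

print(f"average divide-by-n estimate:     {biased.mean():.3f}")
print(f"average divide-by-(n-1) estimate: {unbiased.mean():.3f}")
print(f"true covariance:                  {true_cov[0, 1]:.3f}")
```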

---

## 🔑 Key Properties of Covariance

Let $X$, $Y$, and $Z$ be random variables and $a, b$ be constants.

1.  **Symmetry:** The order of the variables does not matter.
    $$
    \mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)
    $$

2.  **Relationship with Variance:** The covariance of a variable with itself is its **variance**.
    $$
    \mathrm{Cov}(X, X) = \mathrm{Var}(X)
    $$

3.  **Effect of Adding a Constant:** Adding a constant (a fixed number) to a variable does not change its covariance with another variable. The mean shifts by the same constant, so every deviation from the mean is unchanged.
    $$
    \mathrm{Cov}(X + a, Y) = \mathrm{Cov}(X, Y)
    $$

4.  **Effect of Scaling (Multiplying by a Constant):** Multiplying a variable by a constant scales the covariance by that same constant.
    $$
    \mathrm{Cov}(aX, Y) = a \cdot \mathrm{Cov}(X, Y)
    $$

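5.  **Bilinearity:** Covariance is linear in each argument. This combines the scaling property above with additivity (property 6).
    * $\mathrm{Cov}(aX + bY, Z) = a \cdot \mathrm{Cov}(X, Z) + b \cdot \mathrm{Cov}(Y, Z)$
    * By symmetry, the same rule holds for the second argument. This is what allows the covariance of any linear combination of variables to be computed from the covariance matrix alone.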

6.  **Covariance of Sums:** The covariance of a sum with another variable is the sum of the individual covariances.
    $$
    \mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)
    $$
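    Writing $\mathrm{Var}(X + Y) = \mathrm{Cov}(X + Y, X + Y)$, expanding with this rule in both arguments, and then applying properties 1 and 2 gives the variance of a sum: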
    $$
    \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2 \cdot \mathrm{Cov}(X, Y)
    $$

7.  **Independence:** If two variables $X$ and $Y$ are **independent**, their covariance is **zero**.
    $$
    X, Y \text{ independent} \implies \mathrm{Cov}(X, Y) = 0
    $$

    > **Important:** The reverse is **not** always true. A covariance of zero only means there is no *linear* relationship. Two variables can have a strong non-linear relationship (like a parabola) and still have zero covariance.
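
A parabola gives a concrete case. In this minimal sketch (illustrative values only), $Y = X^2$ is completely determined by $X$. Because the $X$ values are symmetric around zero, however, the positive and negative cross-products cancel, and the sample covariance is exactly zero.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = x ** 2            # perfect, but non-linear, dependence on x

cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)         # 0.0: no linear relationship despite full dependence
```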

---

## Covariance vs. Variance vs. Correlation

This table helps clarify the differences between these related concepts.

| Measure | What It Measures | Units | Range |
| :--- | :--- | :--- | :--- |
| **Variance** | The spread of a **single variable** around its mean. (How much does it vary?) | $\mathrm{Units}^2$ | $0$ to $+\infty$ |
| **Covariance** | The **directional relationship** between **two variables**. (Do they move together?) | $\mathrm{Units\ of\ } X \times \mathrm{Units\ of\ } Y$ | $-\infty$ to $+\infty$ |
| **Correlation** | The **strength and direction** of the linear relationship between **two variables**. | None (Standardized) | $-1$ to $+1$ |

**Key Limitation of Covariance:**
The main weakness of covariance is that its value is **not standardized**. A covariance of 100 might be very large for one dataset but tiny for another, depending on the units (e.g., dollars vs. cents). This makes it hard to compare the "strength" of relationships.

**How Correlation "Fixes" This:**
**Correlation** is the **normalized version of covariance**. You calculate it by dividing the covariance by the product of the two variables' standard deviations.

$$
\mathrm{Corr}(X, Y) = \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}
$$

This standardization scales the value to a range of **-1 to +1**, allowing you to directly compare the *strength* of the linear relationship between different pairs of variables.
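
To see the standardization at work, this sketch records the same hypothetical quantity once in dollars and once in cents. Converting to cents multiplies the covariance by 100, while the correlation stays the same. The helper `correlation` is a hypothetical name that simply applies the formula above. `np.corrcoef` is NumPy's built-in equivalent.

```python
import numpy as np

x_dollars = np.array([1.0, 2.0, 4.0, 5.0, 6.0])  # hypothetical amounts in dollars
x_cents = 100 * x_dollars                         # same amounts in cents
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])           # hypothetical second variable

def correlation(u, v):
    # Cov(u, v) / (sigma_u * sigma_v); ddof=1 throughout so the n - 1 factors cancel
    return np.cov(u, v)[0, 1] / (np.std(u, ddof=1) * np.std(v, ddof=1))

print(np.cov(x_dollars, y)[0, 1])     # approximately 3.0
print(np.cov(x_cents, y)[0, 1])       # approximately 300.0, 100 times larger
print(correlation(x_dollars, y))      # approximately 0.915
print(correlation(x_cents, y))        # the same value
print(np.corrcoef(x_cents, y)[0, 1])  # NumPy's built-in agrees
```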

# 🧮 Steps to Calculate the Covariance Matrix

## 1. Define the Data Matrix

Suppose you have a data matrix $X$ with shape $n \times m$, where:
* $n$ = number of **observations (samples)**
* $m$ = number of **variables (features)**

**Example:**

$$
X =
\begin{bmatrix}
2.1 & 8.0 \\
2.5 & 12.0 \\
3.6 & 14.0 \\
4.0 & 10.0
\end{bmatrix}
$$

Here, we have:
* $n = 4$ samples
* $m = 2$ variables (let's call them $X_1$ and $X_2$)

---

## 2. Compute the Mean of Each Variable

First, calculate the mean (average) for each variable (column).

**Mean of $X_1$:**
$$
\bar{X}_1 = \frac{2.1 + 2.5 + 3.6 + 4.0}{4} = \frac{12.2}{4} = 3.05
$$

**Mean of $X_2$:**
$$
\bar{X}_2 = \frac{8 + 12 + 14 + 10}{4} = \frac{44}{4} = 11
$$

This gives us a **mean vector** $\bar{X}$:
$$
\bar{X} = [3.05, 11]
$$
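
In NumPy, the mean vector is a column-wise mean (`axis=0`, one mean per variable). Here is a minimal sketch using the example matrix above:

```python
import numpy as np

# Example data matrix: 4 observations (rows) x 2 variables (columns)
X = np.array([[2.1,  8.0],
              [2.5, 12.0],
              [3.6, 14.0],
              [4.0, 10.0]])

mean_vector = X.mean(axis=0)   # average down each column
print(mean_vector)             # approximately 3.05 and 11.0
```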

---

## 3. Subtract the Mean (Mean-Centering)

Next, create a new matrix by subtracting the corresponding variable's mean from each value. This process is called **mean-centering** the data.

$$
X_c = X - \bar{X} =
\begin{bmatrix}
2.1 - 3.05 & 8 - 11 \\
2.5 - 3.05 & 12 - 11 \\
3.6 - 3.05 & 14 - 11 \\
4.0 - 3.05 & 10 - 11
\end{bmatrix}
=
\begin{bmatrix}
-0.95 & -3 \\
-0.55 & 1 \\
0.55 & 3 \\
0.95 & -1
\end{bmatrix}
$$
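
With NumPy broadcasting, the mean vector is subtracted from every row in one step, so mean-centering takes a single line. As a sanity check, each column of the result should average to (numerically) zero. This is a minimal sketch on the same example matrix:

```python
import numpy as np

X = np.array([[2.1,  8.0],
              [2.5, 12.0],
              [3.6, 14.0],
              [4.0, 10.0]])

# Broadcasting: the shape-(2,) mean vector is subtracted from each of the 4 rows
X_c = X - X.mean(axis=0)
print(X_c)                 # rows match the centered matrix above
print(X_c.mean(axis=0))    # approximately [0, 0]
```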

---

## 4. Compute the Covariance Matrix

The formula for the **sample covariance matrix** ($\Sigma$) is:

$$
\Sigma = \frac{1}{n - 1} (X_c)^T (X_c)
$$

Where $(X_c)^T$ is the transpose of the centered matrix.

**Step 4a: Transpose the centered matrix**

$$
(X_c)^T =
\begin{bmatrix}
-0.95 & -0.55 & 0.55 & 0.95 \\
-3 & 1 & 3 & -1
\end{bmatrix}
$$

**Step 4b: Multiply the transpose by the centered matrix**

$$
(X_c)^T (X_c) =
\begin{bmatrix}
-0.95 & -0.55 & 0.55 & 0.95 \\
-3 & 1 & 3 & -1
\end{bmatrix}
\begin{bmatrix}
-0.95 & -3 \\
-0.55 & 1 \\
0.55 & 3 \\
0.95 & -1
\end{bmatrix}
$$

Let's compute the four elements of the resulting matrix:

* **Top-left (Var $X_1$):**
    $(-0.95)^2 + (-0.55)^2 + (0.55)^2 + (0.95)^2 = 0.9025 + 0.3025 + 0.3025 + 0.9025 = 2.41$
* **Bottom-right (Var $X_2$):**
    $(-3)^2 + (1)^2 + (3)^2 + (-1)^2 = 9 + 1 + 9 + 1 = 20$
* **Top-right (Cov $X_1, X_2$):**
    $(-0.95)(-3) + (-0.55)(1) + (0.55)(3) + (0.95)(-1) = 2.85 - 0.55 + 1.65 - 0.95 = 3.0$
* **Bottom-left (Cov $X_2, X_1$):**
    $(-3)(-0.95) + (1)(-0.55) + (3)(0.55) + (-1)(0.95) = 2.85 - 0.55 + 1.65 - 0.95 = 3.0$

This gives the matrix of sums of squares (diagonal) and cross-products (off-diagonal):
$$
(X_c)^T (X_c) =
\begin{bmatrix}
2.41 & 3.0 \\
3.0 & 20
\end{bmatrix}
$$
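
The transpose-and-multiply in steps 4a and 4b is a single `@` in NumPy. This sketch recomputes the centered matrix so that it runs on its own:

```python
import numpy as np

X = np.array([[2.1,  8.0],
              [2.5, 12.0],
              [3.6, 14.0],
              [4.0, 10.0]])
X_c = X - X.mean(axis=0)

# (2 x 4) @ (4 x 2) -> 2 x 2 matrix of sums of squares and cross-products
scatter = X_c.T @ X_c
print(scatter)   # approximately [[2.41, 3.0], [3.0, 20.0]]
```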

**Step 4c: Divide by $n - 1$**

We divide by $n - 1 = 4 - 1 = 3$ (this is the degrees of freedom for a *sample* covariance).

$$
\Sigma = \frac{1}{3}
\begin{bmatrix}
2.41 & 3.0 \\
3.0 & 20
\end{bmatrix}
=
\begin{bmatrix}
2.41 / 3 & 3.0 / 3 \\
3.0 / 3 & 20 / 3
\end{bmatrix}
=
\begin{bmatrix}
0.8033... & 1.0 \\
1.0 & 6.666...
\end{bmatrix}
$$

---

## ✅ Final Covariance Matrix

Rounding to four decimal places, the covariance matrix is:

$$
\Sigma =
\begin{bmatrix}
0.8033 & 1.0 \\
1.0 & 6.6667
\end{bmatrix}
$$
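
Putting the steps together, this sketch builds $\Sigma$ by hand and cross-checks it against `np.cov`. Because rows are observations here, `rowvar=False` is needed. `np.cov` divides by $n - 1$ by default.

```python
import numpy as np

X = np.array([[2.1,  8.0],
              [2.5, 12.0],
              [3.6, 14.0],
              [4.0, 10.0]])
n = X.shape[0]

X_c = X - X.mean(axis=0)               # step 3: mean-center each column
sigma_manual = X_c.T @ X_c / (n - 1)   # step 4: (X_c^T X_c) / (n - 1)

sigma_numpy = np.cov(X, rowvar=False)  # columns are variables

print(np.round(sigma_manual, 4))
print(np.allclose(sigma_manual, sigma_numpy))  # True
```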

---

## 🧠 Interpretation

The covariance matrix $\Sigma = \begin{bmatrix} \sigma^2(X_1) & \text{Cov}(X_1, X_2) \\ \text{Cov}(X_2, X_1) & \sigma^2(X_2) \end{bmatrix}$ tells us:

* **Diagonal (Variances):**
    * **Variance of $X_1$:** $\sigma^2(X_1) \approx 0.8033$. This measures the spread of the first variable.
    * **Variance of $X_2$:** $\sigma^2(X_2) \approx 6.6667$. The second variable is much more spread out than the first.
* **Off-Diagonal (Covariance):**
    * **Covariance between $X_1$ and $X_2$:** $\text{Cov}(X_1, X_2) = 1.0$.
    * Since the covariance is **positive**, it indicates that as $X_1$ increases, $X_2$ tends to increase as well.
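
On its own, the value 1.0 says the relationship is positive but not how strong it is, because covariance is not standardized. Dividing each entry by the product of the corresponding standard deviations (the square roots of the diagonal) applies the correlation formula from earlier. Here is a minimal sketch using the matrix above:

```python
import numpy as np

# Sample covariance matrix from the worked example
sigma = np.array([[2.41 / 3, 1.0],
                  [1.0,      20 / 3]])

std_devs = np.sqrt(np.diag(sigma))            # sigma_X1 and sigma_X2
corr = sigma / np.outer(std_devs, std_devs)   # entry (i, j) divided by sigma_i * sigma_j

print(np.round(corr, 4))   # ones on the diagonal, about 0.4321 off the diagonal
```

A correlation of roughly 0.43 indicates a positive but only moderate linear relationship between $X_1$ and $X_2$.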

# Covariance Properties and Calculation With Python

## 1. Setup

First, let's import NumPy and create our sample data. We'll create three variables, $X$, $Y$, and $Z$, with 5 samples each. We'll also define our constants.

```python
import numpy as np

# Define our variables (as 1D arrays)
X = np.array([1, 2, 4, 5, 6])  # e.g., Study Hours
Y = np.array([2, 3, 5, 4, 6])  # e.g., Test Score
Z = np.array([8, 6, 7, 4, 3])  # e.g., Game Hours

# Define constants
a = 5
b = 10

print(f"X: {X}")
print(f"Y: {Y}")
print(f"Z: {Z}")
print(f"a: {a}, b: {b}\n")
```

```text
X: [1 2 4 5 6]
Y: [2 3 5 4 6]
Z: [8 6 7 4 3]
a: 5, b: 10
```

## 2. Baseline Calculations

Rather than computing each covariance separately, we can compute the full covariance matrix at once. Stack the variables into a single data matrix (one column per variable) and pass it to ``np.cov()``.

**Note:**
- ``rowvar=False`` tells NumPy that each **column** is a variable and each row is an observation. By default (``rowvar=True``), ``np.cov`` treats each row as a variable.
- ``ddof=1`` selects the **sample** covariance formula (dividing by $n-1$). This is already ``np.cov``'s default normalization, so passing it here just makes the choice explicit.

```python
# Stack X, Y, Z as columns in one data matrix
# (samples, variables) -> (5, 3)
data = np.stack([X, Y, Z], axis=1)

print("Data Matrix (samples=rows, variables=cols):\n", data)

# Calculate the 3x3 covariance matrix
cov_matrix = np.cov(data, rowvar=False, ddof=1)

print("\nFull Covariance Matrix:\n", np.round(cov_matrix, 4))
```

```text
Data Matrix (samples=rows, variables=cols):
 [[1 2 8]
 [2 3 6]
 [4 5 7]
 [5 4 4]
 [6 6 3]]

Full Covariance Matrix:
 [[ 4.3   3.   -3.7 ]
 [ 3.    2.5  -2.25]
 [-3.7  -2.25  4.3 ]]
```

### 2.1. Extract the Baseline Values

Next, we pull the individual variances and covariances out of this matrix to use in the property checks below. The matrix is laid out as:

$$
\mathbf{\Sigma} =
\begin{bmatrix}
\text{Var}(X) & \text{Cov}(X, Y) & \text{Cov}(X, Z) \\
\text{Cov}(X, Y) & \text{Var}(Y) & \text{Cov}(Y, Z) \\
\text{Cov}(X, Z) & \text{Cov}(Y, Z) & \text{Var}(Z)
\end{bmatrix}
$$

```python
var_X = cov_matrix[0, 0]
var_Y = cov_matrix[1, 1]
var_Z = cov_matrix[2, 2]

cov_XY = cov_matrix[0, 1]
cov_XZ = cov_matrix[0, 2]
cov_YZ = cov_matrix[1, 2]

print(f"Var(X): {var_X:.4f}")
print(f"Var(Y): {var_Y:.4f}")
print(f"Var(Z): {var_Z:.4f}")
print(f"Cov(X, Y): {cov_XY:.4f}")
print(f"Cov(X, Z): {cov_XZ:.4f}")
print(f"Cov(Y, Z): {cov_YZ:.4f}")
```

```text
Var(X): 4.3000
Var(Y): 2.5000
Var(Z): 4.3000
Cov(X, Y): 3.0000
Cov(X, Z): -3.7000
Cov(Y, Z): -2.2500
```


## Property 1: Symmetry

**Formula:** $\text{Cov}(X, Y) = \text{Cov}(Y, X)$

**Explanation:** The covariance matrix is always symmetric. ``cov_matrix[i, j]`` is always equal to ``cov_matrix[j, i]``.

```python
print("### Property 1: Symmetry ###")
# LHS is Cov(X, Y), RHS is Cov(Y, X)
lhs = cov_matrix[0, 1] # Cov(X, Y)
rhs = cov_matrix[1, 0] # Cov(Y, X)

print(f"  LHS (Cov(X, Y)): {lhs:.4f}")
print(f"  RHS (Cov(Y, X)): {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 1: Symmetry ###
  LHS (Cov(X, Y)): 3.0000
  RHS (Cov(Y, X)): 3.0000
  Property holds: True
```


## Property 2: Covariance with Itself

**Formula:** $\text{Cov}(X, X) = \text{Var}(X)$

**Explanation:** This is why the diagonal of the covariance matrix is the variance of each variable.

```python
print("\n### Property 2: Cov(X, X) = Var(X) ###")
# LHS: Cov(X, X) from the matrix diagonal
lhs = cov_matrix[0, 0]

# RHS: Var(X) calculated separately (or just looked up)
rhs = np.var(X, ddof=1)

print(f"  LHS (Cov(X, X)): {lhs:.4f}")
print(f"  RHS (Var(X)):    {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 2: Cov(X, X) = Var(X) ###
  LHS (Cov(X, X)): 4.3000
  RHS (Var(X)):    4.3000
  Property holds: True
```


## Property 3: Effect of Adding a Constant (Shift)

**Formula:** $\text{Cov}(X + a, Y) = \text{Cov}(X, Y)$

**Explanation:** Shifting the data (adding a constant) changes its mean but not its spread (variance) or its relationship with other variables (covariance).

```python
print(f"\n### Property 3: Cov(X + {a}, Y) = Cov(X, Y) ###")
# LHS: Calculate covariance with the new, shifted variable
X_plus_a = X + a
# We must re-calculate covariance for this new variable
lhs = np.cov(X_plus_a, Y, ddof=1)[0, 1]

# RHS: Our original, baseline Cov(X, Y)
rhs = cov_XY

print(f"  New Var (X + {a}): {X_plus_a}")
print(f"  LHS (Cov(X + {a}, Y)): {lhs:.4f}")
print(f"  RHS (Cov(X, Y)):   {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 3: Cov(X + 5, Y) = Cov(X, Y) ###
  New Var (X + 5): [ 6  7  9 10 11]
  LHS (Cov(X + 5, Y)): 3.0000
  RHS (Cov(X, Y)):   3.0000
  Property holds: True
```


## Property 4: Effect of Scaling

**Formula:** $\text{Cov}(aX, Y) = a \cdot \text{Cov}(X, Y)$

**Explanation:** Scaling a variable by $a$ directly scales its covariance with other variables by $a$.

```python
print(f"\n### Property 4: Cov({a}*X, Y) = {a}*Cov(X, Y) ###")
# LHS: Calculate covariance with the new, scaled variable
aX = a * X
lhs = np.cov(aX, Y, ddof=1)[0, 1]

# RHS: Our original Cov(X, Y), scaled by the constant
rhs = a * cov_XY

print(f"  New Var ({a}*X): {aX}")
print(f"  LHS (Cov({a}*X, Y)): {lhs:.4f}")
print(f"  RHS ({a} * Cov(X, Y)): {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 4: Cov(5*X, Y) = 5*Cov(X, Y) ###
  New Var (5*X): [ 5 10 20 25 30]
  LHS (Cov(5*X, Y)): 15.0000
  RHS (5 * Cov(X, Y)): 15.0000
  Property holds: True
```


## Property 5: Bilinearity (Distributive)

**Formula:** $\text{Cov}(aX + bY, Z) = a \cdot \text{Cov}(X, Z) + b \cdot \text{Cov}(Y, Z)$

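**Explanation:** Covariance distributes over linear combinations of variables. This property underpins portfolio-risk calculations. A portfolio's return is a weighted sum of asset returns, so its covariance with any other variable follows directly from the individual covariances.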

```python
print("\n### Property 5: Bilinearity ###")
print(f"  Test: Cov({a}X + {b}Y, Z) = {a}*Cov(X,Z) + {b}*Cov(Y,Z)")

# LHS: Create the new combined variable and find its covariance with Z
new_var = a * X + b * Y
lhs = np.cov(new_var, Z, ddof=1)[0, 1]

# RHS: Use our baseline values and combine them
rhs = a * cov_XZ + b * cov_YZ

print(f"  New Var ({a}X + {b}Y): {new_var}")
print(f"  LHS (Cov(new, Z)): {lhs:.4f}")
print(f"  RHS (a*Cov(X,Z) + b*Cov(Y,Z)): {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 5: Bilinearity ###
  Test: Cov(5X + 10Y, Z) = 5*Cov(X,Z) + 10*Cov(Y,Z)
  New Var (5X + 10Y): [25 40 70 65 90]
  LHS (Cov(new, Z)): -41.0000
  RHS (a*Cov(X,Z) + b*Cov(Y,Z)): -41.0000
  Property holds: True
```


## Property 6: Variance of a Sum

**Formula:** $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2 \cdot \text{Cov}(X, Y)$

**Explanation:** This is a crucial property. The variance of a sum is **not** just the sum of the variances; you must also account for the covariance between them. The two agree only when $\text{Cov}(X, Y) = 0$.

```python
print("\n### Property 6: Variance of a Sum ###")
print("  Test: Var(X + Y) = Var(X) + Var(Y) + 2*Cov(X, Y)")

# LHS: Create the new variable (X + Y) and find its variance
X_plus_Y = X + Y
lhs = np.var(X_plus_Y, ddof=1)

# RHS: Combine our baseline components
rhs = var_X + var_Y + 2 * cov_XY

print(f"  New Var (X + Y): {X_plus_Y}")
print(f"  LHS (Var(X + Y)): {lhs:.4f}")
print(f"  RHS (Var(X)+Var(Y)+2*Cov(X,Y)): {rhs:.4f}")
print(f"  Property holds: {np.isclose(lhs, rhs)}")
```

```text
### Property 6: Variance of a Sum ###
  Test: Var(X + Y) = Var(X) + Var(Y) + 2*Cov(X, Y)
  New Var (X + Y): [ 3  5  9  9 12]
  LHS (Var(X + Y)): 12.8000
  RHS (Var(X)+Var(Y)+2*Cov(X,Y)): 12.8000
  Property holds: True
```
