# 🧠 What is Covariance?

In simple terms, **covariance** is a statistical measure that describes the **directional relationship** between two random variables. It tells you whether two variables tend to move together or in opposite directions.

* **Positive Covariance:** Indicates that as one variable increases, the other variable tends to **increase**.
* **Negative Covariance:** Indicates that as one variable increases, the other variable tends to **decrease**.
* **Zero Covariance:** Indicates that there is no linear relationship between the two variables.
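
As a quick illustration of the three cases, the sketch below computes the sample covariance of small made-up datasets with NumPy's `np.cov`. The arrays `y_up`, `y_down`, and `y_flat` are hypothetical values chosen only to produce each sign; they are not data from this article.

```python
import numpy as np

# Hypothetical toy data, chosen only to illustrate the three cases.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2.0 * x + 1.0                           # rises as x rises
y_down = -3.0 * x + 20.0                       # falls as x rises
y_flat = np.array([2.0, 1.0, 0.0, 1.0, 2.0])   # V-shaped: no linear trend in x

# np.cov returns a 2x2 matrix; entry [0, 1] is the covariance of the pair.
cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]
cov_flat = np.cov(x, y_flat)[0, 1]

print(f"increasing pair: {cov_up:+.2f}")
print(f"decreasing pair: {cov_down:+.2f}")
print(f"V-shaped pair:   {cov_flat:+.2f}")
```

Note that the V-shaped pair has a covariance of essentially zero even though `y_flat` clearly depends on `x`; covariance only picks up the linear part of a relationship.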

---

## 🧮 The Mathematics: Formulas

The formula for covariance differs slightly depending on whether you are calculating it for an entire **population** or for a **sample**.

Let $X$ and $Y$ be two random variables, with:
* $n$ = number of observations
* $x_i, y_i$ = the individual data points
* $\mu_X, \mu_Y$ = the population means of $X$ and $Y$
* $\bar{x}, \bar{y}$ = the sample means of $X$ and $Y$
* $E[...]$ = the Expected Value (the theoretical mean)

#### 1. Population Covariance

This is the theoretical covariance between two random variables, $X$ and $Y$.

$$
\mathrm{Cov}(X, Y) = E[ (X - \mu_X)(Y - \mu_Y) ]
$$

A common computational form of this formula is:

$$
\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]
$$
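
The computational form follows from expanding the product in the definition and applying linearity of expectation, using $\mu_X = E[X]$ and $\mu_Y = E[Y]$:

$$
\begin{aligned}
E[(X - \mu_X)(Y - \mu_Y)] &= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
&= E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y \\
&= E[XY] - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E[XY] - E[X]E[Y]
\end{aligned}
$$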

#### 2. Sample Covariance

This is the formula you use when calculating covariance from a set of data (a sample).

$$
\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
$$

> **Why divide by $n-1$?**
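> This is known as **Bessel's correction**. Because the deviations are measured from the sample means rather than the true population means, dividing by $n$ systematically shrinks the estimate toward zero. Dividing by $n-1$ removes this bias, making the sample covariance an unbiased estimator of the population covariance. The difference matters most for small samples.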
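
The effect of the denominator can be seen in a minimal sketch. The arrays `x` and `y` below are hypothetical values chosen for illustration; the only library behavior relied on is that `np.cov` divides by $n-1$ by default and by $n$ when `ddof=0` is passed.

```python
import numpy as np

# Hypothetical small sample, used only for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

# Sum of products of deviations from the sample means.
cross = np.sum((x - x.mean()) * (y - y.mean()))

cov_sample = cross / (n - 1)   # Bessel-corrected (sample) estimate
cov_biased = cross / n         # plain average of the products

# np.cov uses the n - 1 denominator by default; ddof=0 switches to n.
assert np.isclose(cov_sample, np.cov(x, y)[0, 1])
assert np.isclose(cov_biased, np.cov(x, y, ddof=0)[0, 1])

print(f"n - 1 denominator: {cov_sample:.4f}")
print(f"n denominator:     {cov_biased:.4f}")
```

With only four observations the two estimates differ by a factor of $4/3$; as $n$ grows, the gap shrinks.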

---

## 🔑 Key Properties of Covariance

Let $X$, $Y$, and $Z$ be random variables and $a, b$ be constants.

1.  **Symmetry:** The order of the variables does not matter.
    $$
    \mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)
    $$

2.  **Relationship with Variance:** The covariance of a variable with itself is its **variance**.
    $$
    \mathrm{Cov}(X, X) = \mathrm{Var}(X)
    $$

3.  **Effect of Adding a Constant:** Adding a constant (a fixed number) to a variable does not change its covariance with another variable. The shift moves the variable's mean by the same amount, so its deviations from the mean are unchanged.
    $$
    \mathrm{Cov}(X + a, Y) = \mathrm{Cov}(X, Y)
    $$

4.  **Effect of Scaling (Multiplying by a Constant):** Multiplying a variable by a constant scales the covariance by that same constant.
    $$
    \mathrm{Cov}(aX, Y) = a \cdot \mathrm{Cov}(X, Y)
    $$

5.  **Bilinearity:** This combines the scaling and additivity properties.
    * $\mathrm{Cov}(aX + bY, Z) = a \cdot \mathrm{Cov}(X, Z) + b \cdot \mathrm{Cov}(Y, Z)$
    * This is fundamental to how covariance matrices are used in linear algebra.

6.  **Covariance of Sums:** Covariance distributes over sums: the covariance of $X + Y$ with $Z$ is the sum of the covariances of $X$ and of $Y$ with $Z$.
    $$
    \mathrm{Cov}(X + Y, Z) = \mathrm{Cov}(X, Z) + \mathrm{Cov}(Y, Z)
    $$
    Applying this rule to $\mathrm{Cov}(X + Y, X + Y)$, together with symmetry, gives the variance of a sum:
    $$
    \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2 \cdot \mathrm{Cov}(X, Y)
    $$

7.  **Independence:** If two variables $X$ and $Y$ are **independent**, their covariance is **zero**.
    
    If $X, Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$.

    > **Important:** The reverse is **not** always true. A covariance of zero only means there is no *linear* relationship. Two variables can have a strong non-linear relationship (like a parabola) and still have zero covariance.
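
The parabola case can be checked with a minimal sketch. The five `x` values are a hypothetical sample chosen to be symmetric about zero; `y` is completely determined by `x`, yet the sample covariance vanishes.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # symmetric about zero
y = x ** 2                                  # y depends entirely on x

# Positive and negative deviations of x pair with the same y values,
# so the products of deviations cancel.
cov_xy = np.cov(x, y)[0, 1]
print(f"Cov(x, x^2) = {cov_xy:.4f}")
```

The cancellation relies on the symmetry of `x`; for a sample that is not symmetric about its mean, the covariance of `x` and `x ** 2` is generally nonzero.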

---

## Covariance vs. Variance vs. Correlation

This table helps clarify the differences between these related concepts.

| Measure | What It Measures | Units | Range |
| :--- | :--- | :--- | :--- |
| **Variance** | The spread of a **single variable** around its mean. (How much does it vary?) | $\mathrm{Units}^2$ | $0$ to $+\infty$ |
| **Covariance** | The **directional relationship** between **two variables**. (Do they move together?) | $\mathrm{Units\ of\ } X \times \mathrm{Units\ of\ } Y$ | $-\infty$ to $+\infty$ |
| **Correlation** | The **strength and direction** of the linear relationship between **two variables**. | None (Standardized) | $-1$ to $+1$ |

**Key Limitation of Covariance:**
The main weakness of covariance is that its value is **not standardized**. A covariance of 100 might be very large for one dataset but tiny for another, depending on the units (e.g., dollars vs. cents). This makes it hard to compare the "strength" of relationships.

**How Correlation "Fixes" This:**
**Correlation** is simply the **normalized version of covariance**. You calculate it by dividing the covariance by the standard deviations of both variables.

$$
\mathrm{Corr}(X, Y) = \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}
$$

This standardization scales the value to a range of **-1 to +1**, allowing you to directly compare the *strength* of the linear relationship between different pairs of variables.
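
A short sketch of the dollars-versus-cents point, under the assumption of a made-up dataset: `price_dollars` and `units_sold` are hypothetical values, and `corr_from_cov` is an illustrative helper rather than a library function. Changing the unit of one variable rescales the covariance but leaves the correlation untouched.

```python
import numpy as np

# Hypothetical data: prices in dollars and units sold at each price.
price_dollars = np.array([1.0, 2.0, 4.0, 5.0])
units_sold = np.array([10.0, 8.0, 5.0, 1.0])
price_cents = 100.0 * price_dollars   # same prices, different unit

def corr_from_cov(u, v):
    """Correlation as covariance divided by both sample standard deviations."""
    cov_uv = np.cov(u, v)[0, 1]
    return cov_uv / (np.std(u, ddof=1) * np.std(v, ddof=1))

cov_dollars = np.cov(price_dollars, units_sold)[0, 1]
cov_cents = np.cov(price_cents, units_sold)[0, 1]
r_dollars = corr_from_cov(price_dollars, units_sold)
r_cents = corr_from_cov(price_cents, units_sold)

print(f"Cov (dollars): {cov_dollars:.2f}   Cov (cents): {cov_cents:.2f}")
print(f"Corr (dollars): {r_dollars:.4f}   Corr (cents): {r_cents:.4f}")
```

The same value is available directly from `np.corrcoef(price_dollars, units_sold)[0, 1]`; the manual version is shown only to mirror the formula.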

# 🧮 Steps to Calculate the Covariance Matrix

## 1. Define the Data Matrix

Suppose you have a data matrix $X$ with shape $n \times m$, where:
* $n$ = number of **observations (samples)**
* $m$ = number of **variables (features)**

**Example:**

$$
X =
\begin{bmatrix}
2.1 & 8.0 \\
2.5 & 12.0 \\
3.6 & 14.0 \\
4.0 & 10.0
\end{bmatrix}
$$

Here, we have:
* $n = 4$ samples
* $m = 2$ variables (let's call them $X_1$ and $X_2$)

---

## 2. Compute the Mean of Each Variable

First, calculate the mean (average) for each variable (column).

**Mean of $X_1$:**
$$
\bar{X}_1 = \frac{2.1 + 2.5 + 3.6 + 4.0}{4} = \frac{12.2}{4} = 3.05
$$

**Mean of $X_2$:**
$$
\bar{X}_2 = \frac{8 + 12 + 14 + 10}{4} = \frac{44}{4} = 11
$$

This gives us a **mean vector** $\bar{X}$:
$$
\bar{X} = [3.05, 11]
$$
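
In NumPy, the mean vector can be obtained in one call. This sketch assumes the data matrix is stored as an array named `X` with observations in rows, as in the example above.

```python
import numpy as np

X = np.array([
    [2.1, 8.0],
    [2.5, 12.0],
    [3.6, 14.0],
    [4.0, 10.0],
])

# axis=0 averages down each column, giving one mean per variable.
mean_vector = X.mean(axis=0)
print(mean_vector)
```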

---

## 3. Subtract the Mean (Mean-Centering)

Next, create a new matrix by subtracting the corresponding variable's mean from each value. This process is called **mean-centering** the data.

$$
X_c = X - \bar{X} =
\begin{bmatrix}
2.1 - 3.05 & 8 - 11 \\
2.5 - 3.05 & 12 - 11 \\
3.6 - 3.05 & 14 - 11 \\
4.0 - 3.05 & 10 - 11
\end{bmatrix}
=
\begin{bmatrix}
-0.95 & -3 \\
-0.55 & 1 \\
0.55 & 3 \\
0.95 & -1
\end{bmatrix}
$$
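
Mean-centering maps naturally onto NumPy broadcasting. The sketch below assumes the same array `X` as the example; the name `X_c` mirrors the notation above.

```python
import numpy as np

X = np.array([
    [2.1, 8.0],
    [2.5, 12.0],
    [3.6, 14.0],
    [4.0, 10.0],
])

# X has shape (4, 2) and the mean vector has shape (2,), so broadcasting
# subtracts each column's mean from every row of that column.
X_c = X - X.mean(axis=0)
print(X_c)

# After centering, every column averages to zero (up to rounding error).
print(np.allclose(X_c.mean(axis=0), 0.0))
```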

---

## 4. Compute the Covariance Matrix

The formula for the **sample covariance matrix** ($\Sigma$) is:

$$
\Sigma = \frac{1}{n - 1} (X_c)^T (X_c)
$$

Where $(X_c)^T$ is the transpose of the centered matrix.

**Step 4a: Transpose the centered matrix**

$$
(X_c)^T =
\begin{bmatrix}
-0.95 & -0.55 & 0.55 & 0.95 \\
-3 & 1 & 3 & -1
\end{bmatrix}
$$

**Step 4b: Multiply the transpose by the centered matrix**

$$
(X_c)^T (X_c) =
\begin{bmatrix}
-0.95 & -0.55 & 0.55 & 0.95 \\
-3 & 1 & 3 & -1
\end{bmatrix}
\begin{bmatrix}
-0.95 & -3 \\
-0.55 & 1 \\
0.55 & 3 \\
0.95 & -1
\end{bmatrix}
$$

Let's compute the four elements of the resulting matrix:

* **Top-left (sum of squared deviations of $X_1$):**
    $(-0.95)^2 + (-0.55)^2 + (0.55)^2 + (0.95)^2 = 0.9025 + 0.3025 + 0.3025 + 0.9025 = 2.41$
* **Bottom-right (sum of squared deviations of $X_2$):**
    $(-3)^2 + (1)^2 + (3)^2 + (-1)^2 = 9 + 1 + 9 + 1 = 20$
* **Top-right (sum of cross-products of $X_1$ and $X_2$):**
    $(-0.95)(-3) + (-0.55)(1) + (0.55)(3) + (0.95)(-1) = 2.85 - 0.55 + 1.65 - 0.95 = 3.0$
* **Bottom-left (sum of cross-products of $X_2$ and $X_1$):**
    $(-3)(-0.95) + (1)(-0.55) + (3)(0.55) + (-1)(0.95) = 2.85 - 0.55 + 1.65 - 0.95 = 3.0$

This gives the matrix of sums of squares and cross-products. Its entries become variances and covariances only after dividing by $n - 1$:
$$
(X_c)^T (X_c) =
\begin{bmatrix}
2.41 & 3.0 \\
3.0 & 20
\end{bmatrix}
$$

**Step 4c: Divide by $n - 1$**

We divide by $n - 1 = 4 - 1 = 3$ (this is the degrees of freedom for a *sample* covariance).

$$
\Sigma = \frac{1}{3}
\begin{bmatrix}
2.41 & 3.0 \\
3.0 & 20
\end{bmatrix}
=
\begin{bmatrix}
2.41 / 3 & 3.0 / 3 \\
3.0 / 3 & 20 / 3
\end{bmatrix}
=
\begin{bmatrix}
0.8033... & 1.0 \\
1.0 & 6.666...
\end{bmatrix}
$$

---

## ✅ Final Covariance Matrix

Rounding to four decimal places, the covariance matrix is:

$$
\Sigma =
\begin{bmatrix}
0.8033 & 1.0 \\
1.0 & 6.6667
\end{bmatrix}
$$
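
The whole calculation can be reproduced in a few lines as a check on the arithmetic. This is a minimal sketch that assumes the example matrix is stored as `X`; the names `sigma` and `sigma_np` are illustrative.

```python
import numpy as np

X = np.array([
    [2.1, 8.0],
    [2.5, 12.0],
    [3.6, 14.0],
    [4.0, 10.0],
])
n = X.shape[0]

# Steps 2-3: center each column on its mean.
X_c = X - X.mean(axis=0)

# Step 4: (X_c)^T (X_c) divided by n - 1.
sigma = (X_c.T @ X_c) / (n - 1)

# np.cov treats rows as variables by default; rowvar=False tells it
# that each column is a variable, matching the layout used here.
sigma_np = np.cov(X, rowvar=False)

print(np.round(sigma, 4))
print(np.allclose(sigma, sigma_np))
```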

---

## 🧠 Interpretation

The covariance matrix $\Sigma = \begin{bmatrix} \sigma^2(X_1) & \text{Cov}(X_1, X_2) \\ \text{Cov}(X_2, X_1) & \sigma^2(X_2) \end{bmatrix}$ tells us:

* **Diagonal (Variances):**
    * **Variance of $X_1$:** $\sigma^2(X_1) \approx 0.8033$. This measures the spread of the first variable.
    * **Variance of $X_2$:** $\sigma^2(X_2) \approx 6.6667$. The second variable is much more spread out than the first.
* **Off-Diagonal (Covariance):**
    * **Covariance between $X_1$ and $X_2$:** $\text{Cov}(X_1, X_2) = 1.0$.
    * Since the covariance is **positive**, it indicates that as $X_1$ increases, $X_2$ tends to increase as well.

## Covariance Properties and Calculation With Python

```python
import numpy as np

# --- 1. Setup Specific Data ---
# Define specific, non-random data
# Use the matrix from the worked example above, plus a third variable
data = np.array([
    [2.1, 8.0, 1.0],
    [2.5, 12.0, 3.0],
    [3.6, 14.0, 2.0],
    [4.0, 10.0, 4.0]
])

# Extract variables (columns)
X1 = data[:, 0]
X2 = data[:, 1]
X3 = data[:, 2]

# Define constants
a = 5
b = 10

# --- Helper Functions ---
def cov(var1, var2):
    """Calculates the sample covariance between two variables."""
    # np.cov treats each 1-D input as one variable and returns a 2x2 matrix;
    # the off-diagonal entry [0, 1] is Cov(var1, var2).
    # ddof=1 ensures we use the (n-1) sample covariance formula
    return np.cov(var1, var2, ddof=1)[0, 1]

def var(var1):
    """Calculates the sample variance."""
    # ddof=1 ensures we use the (n-1) sample variance formula
    return np.var(var1, ddof=1)

print("--- 1. Initial Data Setup ---")
print(f"Number of samples (n): {data.shape[0]}")
print(f"Number of variables: {data.shape[1]}")
print(f"\nInitial Data Matrix:\n{data}")
print(f"\nVariable X1: {X1}")
print(f"Variable X2: {X2}")
print(f"Variable X3: {X3}")
print(f"\nConstant (a): {a}")
print(f"Constant (b): {b}")

# Calculate initial variances and covariances
var_x1 = var(X1)
var_x2 = var(X2)
var_x3 = var(X3)
cov_x1_x2 = cov(X1, X2)
cov_x1_x3 = cov(X1, X3)
cov_x2_x3 = cov(X2, X3)

print(f"\nInitial Variances:")
print(f"  Var(X1): {var_x1:.4f}")
print(f"  Var(X2): {var_x2:.4f}")
print(f"  Var(X3): {var_x3:.4f}")
print(f"\nInitial Covariances:")
print(f"  Cov(X1, X2): {cov_x1_x2:.4f}")
print(f"  Cov(X1, X3): {cov_x1_x3:.4f}")
print(f"  Cov(X2, X3): {cov_x2_x3:.4f}")
print("=" * 40 + "\n")


# --- 2. Check Properties Numerically (Detailed) ---
# These are numerical checks on one dataset, not proofs.

print("### Property 1: Symmetry ###")
print("  Testing: Cov(X1, X2) = Cov(X2, X1)")
cov_x2_x1 = cov(X2, X1)
print(f"  LHS (Cov(X1, X2)): {cov_x1_x2:.4f}")
print(f"  RHS (Cov(X2, X1)): {cov_x2_x1:.4f}")
print(f"  Property holds: {np.isclose(cov_x1_x2, cov_x2_x1)}")
print("-" * 40 + "\n")


print("### Property 2: Covariance with Itself ###")
print("  Testing: Cov(X1, X1) = Var(X1)")
cov_x1_x1 = cov(X1, X1)
print(f"  LHS (Cov(X1, X1)): {cov_x1_x1:.4f}")
print(f"  RHS (Var(X1)):    {var_x1:.4f}")
print(f"  Property holds: {np.isclose(cov_x1_x1, var_x1)}")
print("-" * 40 + "\n")


print("### Property 3: Effect of Adding a Constant ###")
print(f"  Testing: Cov(X1 + {a}, X2) = Cov(X1, X2)")
print(f"  New variable (X1 + {a}): {X1 + a}")
cov_x1a_x2 = cov(X1 + a, X2)
print(f"  LHS (Cov(X1 + {a}, X2)): {cov_x1a_x2:.4f}")
print(f"  RHS (Cov(X1, X2)):      {cov_x1_x2:.4f}")
print(f"  Property holds: {np.isclose(cov_x1a_x2, cov_x1_x2)}")
print("-" * 40 + "\n")


print("### Property 4: Effect of Scaling ###")
print(f"  Testing: Cov({a}*X1, X2) = {a} * Cov(X1, X2)")
# np.round keeps the result as an array, so it prints as plain numbers
print(f"  New variable ({a}*X1): {np.round(a * X1, 2)}")
# Left side
cov_ax1_x2 = cov(a * X1, X2)
print(f"  LHS (Cov({a}*X1, X2)): {cov_ax1_x2:.4f}")
# Right side
a_cov_x1_x2 = a * cov_x1_x2
print(f"  RHS ({a} * Cov(X1, X2)): {a} * {cov_x1_x2:.4f} = {a_cov_x1_x2:.4f}")
print(f"  Property holds: {np.isclose(cov_ax1_x2, a_cov_x1_x2)}")
print("-" * 40 + "\n")


print("### Property 5: Bilinearity ###")
print(f"  Testing: Cov({a}*X1 + {b}*X2, X3) = {a}*Cov(X1, X3) + {b}*Cov(X2, X3)")
# Left side
new_var_ab = a * X1 + b * X2
print(f"  New variable ({a}*X1 + {b}*X2): {new_var_ab}")
left_side = cov(new_var_ab, X3)
print(f"  LHS (Cov({a}*X1 + {b}*X2, X3)): {left_side:.4f}")
# Right side
right_side = a * cov_x1_x3 + b * cov_x2_x3
print(f"  RHS ({a}*Cov(X1, X3) + {b}*Cov(X2, X3)):")
print(f"    = ({a} * {cov_x1_x3:.4f}) + ({b} * {cov_x2_x3:.4f})")
print(f"    = ({(a * cov_x1_x3):.4f}) + ({(b * cov_x2_x3):.4f})")
print(f"    = {right_side:.4f}")
print(f"  Property holds: {np.isclose(left_side, right_side)}")
print("-" * 40 + "\n")


print("### Property 6: Variance of Sums ###")
print("  Testing: Var(X1 + X2) = Var(X1) + Var(X2) + 2*Cov(X1, X2)")
# Left side
new_var_sum = X1 + X2
print(f"  New variable (X1 + X2): {new_var_sum}")
var_x1_plus_x2 = var(new_var_sum)
print(f"  LHS (Var(X1 + X2)): {var_x1_plus_x2:.4f}")
# Right side
right_side = var_x1 + var_x2 + 2 * cov_x1_x2
print(f"  RHS (Var(X1) + Var(X2) + 2*Cov(X1, X2)):")
print(f"    = {var_x1:.4f} + {var_x2:.4f} + (2 * {cov_x1_x2:.4f})")
print(f"    = {var_x1:.4f} + {var_x2:.4f} + {(2 * cov_x1_x2):.4f}")
print(f"    = {right_side:.4f}")
print(f"  Property holds: {np.isclose(var_x1_plus_x2, right_side)}")
print("-" * 40 + "\n")


print("### Property 7: Covariance of Independent Variables ###")
print("  If X and Y are independent, then Cov(X, Y) = 0.")
print(f"  Cov(X1, X3): {cov_x1_x3:.4f}")
print(f"  Cov(X2, X3): {cov_x2_x3:.4f}")
print("  (Note: X3 is a hand-picked column, not generated independently of X1 or X2,")
print("  so with only 4 samples these covariances are not expected to be close to 0.)")
print(f"  For comparison, Cov(X1, X2) is {cov_x1_x2:.4f}.")
print("=" * 40 + "\n")
```

--- 1. Initial Data Setup ---
Number of samples (n): 4
Number of variables: 3

Initial Data Matrix:
[[ 2.1  8.   1. ]
 [ 2.5 12.   3. ]
 [ 3.6 14.   2. ]
 [ 4.  10.   4. ]]

Variable X1: [2.1 2.5 3.6 4. ]
Variable X2: [ 8. 12. 14. 10.]
Variable X3: [1. 3. 2. 4.]

Constant (a): 5
Constant (b): 10

Initial Variances:
  Var(X1): 0.8033
  Var(X2): 6.6667
  Var(X3): 1.6667

Initial Covariances:
  Cov(X1, X2): 1.0000
  Cov(X1, X3): 0.7667
  Cov(X2, X3): 0.6667

### Property 1: Symmetry ###
  Testing: Cov(X1, X2) = Cov(X2, X1)
  LHS (Cov(X1, X2)): 1.0000
  RHS (Cov(X2, X1)): 1.0000
  Property holds: True
----------------------------------------

### Property 2: Covariance with Itself ###
  Testing: Cov(X1, X1) = Var(X1)
  LHS (Cov(X1, X1)): 0.8033
  RHS (Var(X1)):    0.8033
  Property holds: True
----------------------------------------

### Property 3: Effect of Adding a Constant ###
  Testing: Cov(X1 + 5, X2) = Cov(X1, X2)
  New variable (X1 + 5): [7.1 7.5 8.6 9. ]
  LHS (Cov(X1 + 5, X2)): 