# 7 Variance, Covariance, and Correlation
<hr>

Variance, covariance, and correlation are fundamental quantities in probability and statistics. Variance describes the spread of a single random variable, while covariance and correlation describe how two random variables vary together.

## 7.1 Variance
<hr>

Variance measures the spread of a random variable $X$: it is the expected squared deviation of $X$ from its mean $\mu = E[X]$.

$$\text{Var}(X) = E\left[ (X-\mu)^2 \right] = E[X^2] - \mu^2$$

A high variance indicates that the data points are spread out widely around the mean, while a low variance indicates they are clustered closely.
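As a quick numerical check, the sketch below computes the variance of a small sample both ways, from the definition and from the shortcut $E[X^2] - \mu^2$. The sample values are made up for illustration, and expectations are replaced by plain averages (the population form, which is also what `np.var` uses by default with `ddof=0`).

```python
import numpy as np

# Hypothetical sample, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()                          # sample mean, here 5.0
var_def = np.mean((x - mu) ** 2)       # E[(X - mu)^2]
var_short = np.mean(x ** 2) - mu ** 2  # E[X^2] - mu^2

# All three values agree (4.0 for this sample).
print(var_def, var_short, np.var(x))
```

Passing `ddof=1` to `np.var` would give the unbiased sample variance instead, which divides by $n-1$ and therefore does not match the formula above exactly.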

## 7.2 Covariance
<hr>

Covariance measures how much two random variables change together, or the extent to which one variable's deviation from its mean matches the deviation of another.

$$\text{Cov}(X,Y) = E\left[ (X-\mu_X) (Y-\mu_Y) \right]$$

- Positive covariance implies that as $X$ increases, $Y$ tends to increase.
- Negative covariance implies that as $X$ increases, $Y$ tends to decrease.
- Zero covariance implies that there is no linear relationship between $X$ and $Y$.

*Remarks:*

- $-\infty < \text{Cov}(X,Y) < \infty$
- If small values of one variable, $X$, occur in conjunction with small values of another variable, $Y$, then $\text{Cov}(X,Y)>0$. For instance, if $X < \mu_X, Y < \mu_Y$, then

$$ E\left[ (X-\mu_X) (Y-\mu_Y) \right] = E[ (-) (-) ] = + $$

- If large values of one variable, $X$, occur in conjunction with large values of another variable, $Y$, then $\text{Cov}(X,Y)>0$.

$$ E\left[ (X-\mu_X) (Y-\mu_Y) \right] = E[ (+) (+) ] = + $$

- If large values of one variable, $X$, occur in conjunction with smaller values of another variable, $Y$, then $\text{Cov}(X,Y)<0$.

$$ E\left[ (X-\mu_X) (Y-\mu_Y) \right] = E[ (+) (-) ] = - $$
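The sign argument in these remarks can be illustrated with a small sketch. The data and the helper `cov` are hypothetical, introduced only for this example: `y_up` tends to rise with `x`, and `y_down` is its mirror image, so the products of deviations are mostly positive in the first case and mostly negative in the second.

```python
import numpy as np

def cov(u, v):
    """Population covariance: the average product of deviations from the means."""
    return np.mean((u - u.mean()) * (v - v.mean()))

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # tends to increase with x
y_down = 6.0 - y_up                         # tends to decrease with x

# Per-point products of deviations: most are >= 0 for y_up, <= 0 for y_down.
print((x - x.mean()) * (y_up - y_up.mean()))
print((x - x.mean()) * (y_down - y_down.mean()))

print(cov(x, y_up))    # positive
print(cov(x, y_down))  # negative, same magnitude
```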

**Theorem:** The covariance of $X$ and $Y$ is given as:

$$\text{Cov}(X,Y)=E(XY) - \mu_X \mu_Y$$

*Proof.*

\begin{align}
\text{Cov}(X,Y) &= E\left[ (X-\mu_X) (Y-\mu_Y) \right] \\
&= E[XY - X \mu_Y - Y \mu_X + \mu_X \mu_Y] \\
&= E(XY) - E(X \mu_Y) - E(Y \mu_X) + E(\mu_X \mu_Y) \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X \mu_Y \\
&= E(XY) - \mu_Y \mu_X - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y
\end{align}
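The identity can also be checked numerically. The following is a minimal sketch on made-up data, with expectations replaced by sample averages. `np.cov` is called with `ddof=0` so that it uses the same population normalization as the two hand-written formulas.

```python
import numpy as np

# Hypothetical paired observations, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

mu_x, mu_y = x.mean(), y.mean()

cov_def = np.mean((x - mu_x) * (y - mu_y))  # E[(X - mu_X)(Y - mu_Y)]
cov_short = np.mean(x * y) - mu_x * mu_y    # E[XY] - mu_X mu_Y
cov_np = np.cov(x, y, ddof=0)[0, 1]         # off-diagonal entry of the 2x2 covariance matrix

# The three values agree up to floating-point rounding (about 1.2 here).
print(cov_def, cov_short, cov_np)
```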


### 7.2.1 Properties of Covariance

\begin{align}
\text{Cov}(X,Y) &= \text{Cov}(Y,X) \\
\text{Cov}(X,X) &= E\left[ (X-\mu_X) (X-\mu_X) \right] = E\left[ (X-\mu_X)^2 \right] = \text{Var}(X) \\
\text{Cov}(aX, Y) &= a \ \text{Cov}(X,Y) \\
\text{Var}(a X_1 + b X_2) &= a^2 \ \text{Var}(X_1) + b^2 \ \text{Var}(X_2) + 2ab \ \text{Cov}(X_1, X_2)
\end{align}
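These properties also hold exactly for sample moments, so they can be verified on simulated data. The sketch below is only an illustration: the seed, the sample size, the constants `a` and `b`, and the helper `cov` are arbitrary choices, and `x2` is deliberately built from `x1` so that the covariance term is not negligible.

```python
import numpy as np

def cov(u, v):
    """Population covariance of two equally long samples."""
    return np.mean((u - u.mean()) * (v - v.mean()))

rng = np.random.default_rng(0)           # fixed seed for reproducibility
x1 = rng.normal(size=1000)
x2 = 0.5 * x1 + rng.normal(size=1000)    # correlated with x1 by construction
a, b = 2.0, -3.0

# Symmetry, Cov(X, X) = Var(X), and scaling in the first argument.
print(np.isclose(cov(x1, x2), cov(x2, x1)))
print(np.isclose(cov(x1, x1), np.var(x1)))
print(np.isclose(cov(a * x1, x2), a * cov(x1, x2)))

# Variance of a linear combination.
lhs = np.var(a * x1 + b * x2)
rhs = a**2 * np.var(x1) + b**2 * np.var(x2) + 2 * a * b * cov(x1, x2)
print(np.isclose(lhs, rhs))
```

Dropping the covariance term from `rhs` would be valid only when `x1` and `x2` are uncorrelated, which is not the case for this construction.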


## 7.3 Correlation
<hr>

Correlation is a normalized form of covariance that measures the strength and direction of a linear relationship between two random variables.

$$\rho(X,Y) = \text{Corr}(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

- $ -1 \leq \rho(X, Y) \leq 1$
- If $\rho(X,Y) = 0$, then $\text{Cov}(X,Y)=0$, i.e., there is no linear relationship between $X$ and $Y$. Note that the absence of a linear relationship does not imply independence. Take, for instance, $Y = X^2$ with $X$ distributed symmetrically about zero, so that the joint distribution of $X$ and $Y$ lies on a parabola. The correlation is zero in this case, yet $Y$ is completely determined by $X$, so the variables are not independent.
- If $X$ and $Y$ are independent, then $\text{Cov}(X,Y)=0$ and hence $\rho(X,Y)=0$. The converse does not hold in general.
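The parabola case can be reproduced with a short sketch. The grid of `x` values is an assumption made for illustration: it is symmetric about zero, which is what makes the covariance vanish. The correlation is computed from the definition and compared with `np.corrcoef`.

```python
import numpy as np

# x is symmetric about 0, and y is a deterministic function of x.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Correlation from the definition: Cov(X, Y) / (sigma_X * sigma_Y).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())

# Both values are zero up to floating-point rounding,
# even though y is fully determined by x.
print(rho, np.corrcoef(x, y)[0, 1])
```

Restricting `x` to one side of the parabola, for example `np.linspace(0.0, 1.0, 201)`, breaks the symmetry and yields a strongly positive correlation.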

As a rough rule of thumb, the strength of a correlation is described by the magnitude $|\rho|$:

- High: 0.9 and above
- Moderate: 0.5 to 0.8
- Low: below 0.5