
In multivariate statistics, the variance-covariance matrix (also known as the covariance matrix) is a matrix that summarizes the variances and covariances of a set of random variables. It is a square matrix where the diagonal elements are the variances of each variable, and the off-diagonal elements are the covariances between pairs of variables.

For example, suppose we have a dataset with two variables, x and y, and we want to calculate the variance-covariance matrix. We can represent the dataset as a matrix X, where each row represents an observation, and each column represents a variable:

```
X = [x1, y1]
    [x2, y2]
    [x3, y3]
    ...
```

The variance-covariance matrix for this dataset is:

```
[ Var(x)   Cov(x,y) ]
[ Cov(x,y) Var(y)   ]
```

where Var(x) and Var(y) are the variances of x and y, respectively, and Cov(x,y) is the covariance between x and y.

For example, suppose we have the following dataset:

```
X = [ 1,  2]
    [ 3,  4]
    [ 5,  6]
```

The mean of x is (1+3+5)/3 = 3, and the mean of y is (2+4+6)/3 = 4. Dividing by the number of observations n = 3 (the population convention; the sample convention divides by n - 1 instead), the variances of x and y are:

```
Var(x) = ((1-3)^2 + (3-3)^2 + (5-3)^2)/3 = 8/3 ≈ 2.67
Var(y) = ((2-4)^2 + (4-4)^2 + (6-4)^2)/3 = 8/3 ≈ 2.67
```

The covariance between x and y is:

```
Cov(x,y) = ((1-3)*(2-4) + (3-3)*(4-4) + (5-3)*(6-4))/3 = 8/3 ≈ 2.67
```

Therefore, the variance-covariance matrix for this dataset is:

```
[ 2.67  2.67 ]
[ 2.67  2.67 ]
```

The diagonal elements are the variances of x and y, and the off-diagonal element is their covariance. All four entries are equal here because y = x + 1 in every row: the two variables are perfectly linearly dependent, so the matrix is singular (its determinant is 0).
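The hand calculation for the two-variable dataset can be cross-checked in code. This is a minimal sketch, assuming NumPy is available; the variable names are illustrative. It uses `np.cov` with `bias=True`, which divides by n, and also computes the default result, which divides by n - 1.

```python
import numpy as np

# The 2-variable dataset from the example: rows are observations, columns are x and y.
X = np.array([[1, 2],
              [3, 4],
              [5, 6]], dtype=float)

# rowvar=False: each column is a variable. bias=True: divide by n.
cov_population = np.cov(X, rowvar=False, bias=True)

# NumPy's default (bias=False) divides by n - 1 instead.
cov_sample = np.cov(X, rowvar=False)

print(cov_population)
print(cov_sample)
```

The two results differ only by the factor n/(n - 1), which is 3/2 for three observations. Which divisor is appropriate depends on whether the data are treated as a full population or as a sample.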

Here's an example where the variance-covariance matrix is a 3x3 square matrix:

Suppose we record three variables, x, y, and z, for each of n observations. We can represent the data as a matrix X with n rows (one per observation) and three columns (one per variable):

```
X = [x1, y1, z1]
    [x2, y2, z2]
    ...
    [xn, yn, zn]
```

The variance-covariance matrix for this dataset is a 3x3 symmetric matrix. The diagonal elements are the variances of x, y, and z, and the off-diagonal elements are the covariances between pairs of variables. Because Cov(x,y) = Cov(y,x), each covariance appears twice, and the matrix has the general form:

```
[ Var(x)   Cov(x,y)  Cov(x,z) ]
[ Cov(x,y) Var(y)    Cov(y,z) ]
[ Cov(x,z) Cov(y,z)  Var(z)   ]
```

For example, suppose we have the following dataset:

```
X = [ 1,  2,  3]
    [ 4,  5,  6]
    [ 7,  8,  9]
    [10, 11, 12]
```

The mean of x is (1+4+7+10)/4 = 5.5, the mean of y is (2+5+8+11)/4 = 6.5, and the mean of z is (3+6+9+12)/4 = 7.5. Dividing by the number of observations n = 4, the variances of x, y, and z are:

```
Var(x) = ((1-5.5)^2 + (4-5.5)^2 + (7-5.5)^2 + (10-5.5)^2)/4 = 11.25
Var(y) = ((2-6.5)^2 + (5-6.5)^2 + (8-6.5)^2 + (11-6.5)^2)/4 = 11.25
Var(z) = ((3-7.5)^2 + (6-7.5)^2 + (9-7.5)^2 + (12-7.5)^2)/4 = 11.25
```

The covariances between x, y, and z are:

```
Cov(x,y) = ((1-5.5)*(2-6.5) + (4-5.5)*(5-6.5) + (7-5.5)*(8-6.5) + (10-5.5)*(11-6.5))/4 = 11.25
Cov(x,z) = ((1-5.5)*(3-7.5) + (4-5.5)*(6-7.5) + (7-5.5)*(9-7.5) + (10-5.5)*(12-7.5))/4 = 11.25
Cov(y,z) = ((2-6.5)*(3-7.5) + (5-6.5)*(6-7.5) + (8-6.5)*(9-7.5) + (11-6.5)*(12-7.5))/4 = 11.25
```

Therefore, the variance-covariance matrix for this dataset is:

```
[ 11.25  11.25  11.25 ]
[ 11.25  11.25  11.25 ]
[ 11.25  11.25  11.25 ]
```

The diagonal elements are the variances of x, y, and z, and the off-diagonal elements are the covariances between pairs of variables. Every entry is the same because y = x + 1 and z = x + 2 in every row: the deviations from the mean are identical for all three variables, so they are perfectly linearly dependent and the matrix is singular.
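The entry-by-entry calculation for the three-variable dataset can be mirrored in code. This is a minimal sketch, assuming NumPy; it fills the matrix with explicit loops rather than a library call, using the divisor n, so each line corresponds to one of the hand-computed terms.

```python
import numpy as np

# The 3-variable dataset from the example: columns are x, y, z.
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)

n, p = X.shape
means = X.mean(axis=0)

# Fill the matrix one entry at a time: Cov(i, j) is the average
# product of deviations of variables i and j from their means.
cov = np.empty((p, p))
for i in range(p):
    for j in range(p):
        cov[i, j] = np.sum((X[:, i] - means[i]) * (X[:, j] - means[j])) / n

print(means)
print(cov)
```

The loop version is slower than a vectorized one, but it makes clear that the diagonal case i = j reduces to the variance formula.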

## The variance-covariance matrix is always positive semidefinite, and it is positive definite when the variables are not perfectly linearly dependent.



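A real symmetric matrix A is positive definite if, for any non-zero vector x, the quadratic form x^T A x is positive. That is, x^T A x > 0 for all non-zero vectors x. Equivalently, a symmetric matrix is positive definite if and only if all of its eigenvalues are positive. If the condition is relaxed to x^T A x >= 0, the matrix is called positive semidefinite.

For a covariance matrix S and a vector of weights a, the quadratic form a^T S a is the variance of the linear combination of the variables with weights a. A variance is never negative, so S is always positive semidefinite. It is zero only when that combination is constant across observations, which is exactly the case of perfect linear dependence.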
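The definition can be checked numerically through the eigenvalues. The following is a minimal sketch, assuming NumPy; the helper `is_positive_definite`, its `tol` threshold, and the random test data are hypothetical choices made for illustration.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Return True if A is symmetric and all of its eigenvalues exceed tol."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite([[2, 1], [1, 2]]))   # True: eigenvalues 1 and 3
print(is_positive_definite([[1, 1], [1, 1]]))   # False: eigenvalues 0 and 2
print(is_positive_definite([[1, 2], [2, 1]]))   # False: eigenvalues -1 and 3

# Covariance of independent random variables versus linearly dependent ones.
rng = np.random.default_rng(0)
independent = rng.normal(size=(100, 3))
dependent = np.column_stack([independent[:, 0], independent[:, 0] + 1.0])

pd_independent = is_positive_definite(np.cov(independent, rowvar=False))
pd_dependent = is_positive_definite(np.cov(dependent, rowvar=False))
print(pd_independent)   # True
print(pd_dependent)     # False: the second column is the first plus a constant
```

An alternative test is to attempt a Cholesky factorization with `np.linalg.cholesky`, which raises `LinAlgError` when the factorization fails. The eigenvalue version is used here because it makes the tolerance for "numerically zero" explicit.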

Some important properties of positive-definite matrices include:

1. All the eigenvalues of a positive-definite matrix are positive. Since no eigenvalue is zero, the matrix is invertible.

2. The determinant of a positive-definite matrix is positive, because the determinant is the product of the eigenvalues. The converse does not hold: a matrix with a positive determinant need not be positive definite.

3. The diagonal entries of a positive-definite matrix are positive. This follows by taking x to be the i-th standard basis vector e_i in the definition: e_i^T A e_i = A_ii > 0.

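4. A positive-definite matrix is symmetric by convention: for real matrices, symmetry is usually included in the definition rather than derived from it. The spectral theorem then guarantees that such a matrix has real eigenvalues and is diagonalizable by an orthogonal matrix.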

5. A positive-definite matrix defines a positive-definite quadratic form. This means that any function of the form f(x) = x^T A x, where A is a positive-definite matrix, is always positive for any non-zero vector x.

These properties make positive-definite matrices very useful in many areas of mathematics, including linear algebra, optimization, and statistics.
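The properties listed above can be observed on a small example. This is a sketch, assuming NumPy; the matrix `A` and the random test vectors are hypothetical, picked only because the eigenvalues of `A` (1 and 3) are easy to verify by hand.

```python
import numpy as np

# Hypothetical symmetric example matrix with eigenvalues 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)
print(eigenvalues)        # approximately 1 and 3, both positive
print(np.linalg.det(A))   # approximately 3, the product of the eigenvalues
print(np.diag(A))         # diagonal entries 2 and 2, both positive

# Orthogonal diagonalization: A = Q diag(eigenvalues) Q^T.
print(np.allclose(A, Q @ np.diag(eigenvalues) @ Q.T))   # True

# Invertibility: A times its inverse is the identity.
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))     # True

# Quadratic form x^T A x evaluated on 1000 random non-zero vectors.
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
quad = np.einsum("ni,ij,nj->n", xs, A, xs)
print(bool(np.all(quad > 0)))                           # True
```

Sampling random vectors illustrates the quadratic-form property but does not prove it; the eigenvalue check is the conclusive one.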