<center> <h1>Covariance Matrix</h1> </center>

A covariance matrix is a square, symmetric matrix that summarizes the pairwise covariances between multiple variables.
It captures the linear relationship between each pair of variables. The diagonal elements are the variances of the individual variables, while the off-diagonal elements are the covariances of each pair: the sign shows the direction of the relationship, and the magnitude depends on the units of both variables.
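
For reference, the entries of the matrix can be written out explicitly. The block below is the standard sample covariance formula, using the $n - 1$ denominator that pandas' `DataFrame.cov()` applies by default. The symbols $x_i$, $y_i$, $\bar{x}$, $\bar{y}$, $n$ and $\Sigma$ are notation introduced here for illustration.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
\Sigma_{jk} = \operatorname{cov}(X_j, X_k),
\quad
\Sigma_{jj} = \operatorname{var}(X_j)
```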

### Let's look at a practical example

Suppose we have three variables measured for a group of patients, and we want to understand how they relate to each other:
- blood pressure (BP)
- body mass index (BMI)
- fasting glucose levels (FG)

In [11]:
import pandas as pd

data = {
    'Blood Pressure (mmHg)': [130, 140, 125, 150, 135],
    'Body Mass Index': [25, 28, 22, 30, 26],
    'Fasting Glucose (mg/dL)': [120, 110, 130, 100, 115]
}

df = pd.DataFrame(data)

df

| | Blood Pressure (mmHg) | Body Mass Index | Fasting Glucose (mg/dL) |
|---|---|---|---|
| 0 | 130 | 25 | 120 |
| 1 | 140 | 28 | 110 |
| 2 | 125 | 22 | 130 |
| 3 | 150 | 30 | 100 |
| 4 | 135 | 26 | 115 |


In [14]:
cov_matrix = df.cov()
cov_matrix

| | Blood Pressure (mmHg) | Body Mass Index | Fasting Glucose (mg/dL) |
|---|---|---|---|
| Blood Pressure (mmHg) | 92.5 | 28.5 | -106.25 |
| Body Mass Index | 28.5 | 9.2 | -33.75 |
| Fasting Glucose (mg/dL) | -106.25 | -33.75 | 125.0 |
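
To see where these numbers come from, the matrix can be rebuilt by hand. The following is a minimal sketch, assuming the same patient data as above: it centers each column and divides the cross-products by `n - 1`. The names `centered` and `manual_cov` are illustrative and are not used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Same patient data as in the example above.
df = pd.DataFrame({
    'Blood Pressure (mmHg)': [130, 140, 125, 150, 135],
    'Body Mass Index': [25, 28, 22, 30, 26],
    'Fasting Glucose (mg/dL)': [120, 110, 130, 100, 115],
})

# Center each column by subtracting its mean.
centered = df - df.mean()

# Sample covariance: cross-products of the centered columns, divided by n - 1.
n = len(df)
manual_cov = centered.T @ centered / (n - 1)

# Check that the manual result agrees with pandas.
print(np.allclose(manual_cov, df.cov()))
```

Dividing by `n - 1` rather than `n` matches the default behavior of `df.cov()`, which is why the check prints `True`.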


###  Insights

In [18]:
# df.var() returns a Series holding the sample variance of each column
variance_matrix = df.var()
covariance_matrix = df.cov()

print("\nInsights from the Variance and Covariance Matrices:")

for i, col in enumerate(df.columns):
    print(f"Variance of {col}: {variance_matrix.iloc[i]}")

# Report the covariance of each distinct pair (upper triangle only)
for i in range(len(df.columns)):
    for j in range(i+1, len(df.columns)):
        cov_val = covariance_matrix.iloc[i, j]
        print(f"Covariance between {df.columns[i]} and {df.columns[j]}: {cov_val:.2f}")


Insights from the Variance and Covariance Matrices:
Variance of Blood Pressure (mmHg): 92.5
Variance of Body Mass Index: 9.2
Variance of Fasting Glucose (mg/dL): 125.0
Covariance between Blood Pressure (mmHg) and Body Mass Index: 28.50
Covariance between Blood Pressure (mmHg) and Fasting Glucose (mg/dL): -106.25
Covariance between Body Mass Index and Fasting Glucose (mg/dL): -33.75


### Analysis

**Blood Pressure**: The variance is 92.50, which corresponds to a standard deviation of about 9.6 mmHg, so blood pressure varies noticeably across these five patients. Its covariance with Body Mass Index is positive (28.50), suggesting that higher body mass index is associated with higher blood pressure.



**Body Mass Index**: The variance is smaller (9.20), partly because BMI values span a narrower numeric range than the other two variables. Its covariance with Blood Pressure is positive (28.50), as noted above.

**Fasting Glucose**: The variance is the largest of the three (125.00), indicating substantial variation in fasting glucose values. Its covariances with the other variables are negative:
- With Blood Pressure, the covariance is -106.25, suggesting that higher blood pressure is associated with lower fasting glucose levels.
- With Body Mass Index, the covariance is -33.75, indicating a possible inverse relationship between body mass index and fasting glucose.
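
Because covariance values depend on the units of each variable, their magnitudes are hard to compare across pairs. One common way to put them on a shared scale is to divide each covariance by the product of the two standard deviations, which gives the Pearson correlation. The following is a minimal sketch, assuming the same patient data as above. The names `std` and `corr_from_cov` are illustrative.

```python
import numpy as np
import pandas as pd

# Same patient data as in the example above.
df = pd.DataFrame({
    'Blood Pressure (mmHg)': [130, 140, 125, 150, 135],
    'Body Mass Index': [25, 28, 22, 30, 26],
    'Fasting Glucose (mg/dL)': [120, 110, 130, 100, 115],
})

cov = df.cov()

# Standard deviations are the square roots of the diagonal (the variances).
std = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations.
corr_from_cov = cov / np.outer(std, std)

print(corr_from_cov.round(3))

# Check that this agrees with the Pearson correlation computed by pandas.
print(np.allclose(corr_from_cov, df.corr()))
```

On this small sample, every pair has a correlation with magnitude above 0.97, so the three relationships are similarly strong even though the raw covariances range from 28.50 to -106.25.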

**Data Science List**: https://medium.com/@soulawalid/list/statistics-data-science-65305693779d