# Introduction

Understanding how variables relate to one another is central to statistics. Data-driven decision-making, scientific discovery, and predictive modeling all depend on our ability to uncover the connections and patterns within complex datasets.

Among the statistical measures that support this work, **covariance** and **correlation** are two of the most important, because both describe the dependencies between variables.

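Covariance and correlation are frequently used measures in statistical analysis, yet people often misunderstand them or use them interchangeably. The difference matters for interpretation: covariance is expressed in the product of the two variables' units, so its magnitude depends on scale, whereas Pearson correlation divides the covariance by the product of the two standard deviations and yields a unitless value between -1 and 1.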

Understanding what each measure actually captures is therefore essential for any data enthusiast or professional who wants to draw sound conclusions from data.
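
One practical difference is that covariance changes when a variable is expressed in different units, while correlation does not. The following is a minimal sketch in plain Python. The height and weight values are made up for illustration (they are not data from this notebook), and the helper names `covariance` and `pearson` are hypothetical.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: heights in metres and weights in kilograms.
heights_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weights_kg = [55.0, 61.0, 64.0, 72.0, 78.0]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
r_m = pearson(heights_m, weights_kg)
r_cm = pearson(heights_cm, weights_kg)

print(cov_m, cov_cm)  # the second value is about 100 times the first
print(r_m, r_cm)      # the two correlations agree up to rounding error
```

Because the covariance scales with the unit, its magnitude alone says little about how strong a relationship is; the correlation is the quantity that can be compared across variable pairs.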

1. [Correlation vs Covariance](https://www.analyticsvidhya.com/blog/2023/07/covariance-vs-correlation/)
2. [Correlation](https://www.scribbr.com/statistics/correlation-coefficient/)


Pearson, Spearman, and Kendall are three different types of correlation coefficients used to measure the relationship between two variables. Here's a brief explanation of each:

1. Pearson Correlation Coefficient:
   - The Pearson correlation coefficient, also known as Pearson's r, measures the linear relationship between two variables.
   - It is most meaningful when the relationship is approximately linear; a strong but nonlinear relationship can still produce a value of r close to 0.
   - Pearson's r ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.
   - It is sensitive to outliers. Computing r does not itself require normality, but the usual significance tests for r assume that the variables are approximately normally distributed.

2. Spearman Correlation Coefficient:
   - The Spearman correlation coefficient, also known as Spearman's rho, measures the monotonic relationship between two variables.
   - It does not assume that the relationship is linear. It is Pearson's r computed on the ranks of the values, so it detects any consistent increase or decrease in one variable as the other increases.
   - Spearman's rho ranges from -1 to 1, where -1 indicates a perfect negative monotonic relationship, 1 indicates a perfect positive monotonic relationship, and 0 indicates no monotonic relationship.
   - It is less sensitive to outliers and does not assume that the variables are normally distributed.

3. Kendall Correlation Coefficient:
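   - The Kendall correlation coefficient, also known as Kendall's tau, measures the ordinal association between two variables, that is, how consistently they order the observations in the same way.
   - It compares every pair of observations and counts concordant pairs (both variables move in the same direction) against discordant pairs (the variables move in opposite directions).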
   - Kendall's tau ranges from -1 to 1, where -1 indicates a perfect negative ordinal relationship, 1 indicates a perfect positive ordinal relationship, and 0 indicates no ordinal relationship.
   - It is also less sensitive to outliers and does not assume that the variables are normally distributed.

In summary, Pearson correlation measures linear relationships, Spearman correlation measures monotonic relationships, and Kendall correlation measures ordinal relationships. The choice of which correlation coefficient to use depends on the nature of the variables and the type of relationship you want to measure.
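
The definitions above can be sketched in plain Python to show how the three coefficients differ on a relationship that is monotonic but not linear. This is an illustrative sketch, not the implementation pandas uses: the function names are hypothetical, the data (`y = x**3`) is made up, and both rank-based functions assume there are no tied values (without ties, the tau-a variant computed here coincides with tau-b).

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's r from centred sums of products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: y always increases with x, but not along a straight line.
x = list(range(1, 11))
y = [value ** 3 for value in x]

print(pearson(x, y))   # below 1, because the relationship is curved
print(spearman(x, y))  # 1.0 up to rounding: the ranks agree perfectly
print(kendall(x, y))   # 1.0: every pair is concordant
```

Spearman and Kendall both reach their maximum here because they depend only on the ordering of the values, whereas Pearson is penalised for the curvature.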

In [1]:
import pandas as pd
import seaborn as sns

In [2]:
df = sns.load_dataset('healthexp')
df.head()

|   | Year | Country | Spending_USD | Life_Expectancy |
|---|---|---|---|---|
| 0 | 1970 | Germany | 252.311 | 70.6 |
| 1 | 1970 | France | 192.143 | 72.2 |
| 2 | 1970 | Great Britain | 123.993 | 71.9 |
| 3 | 1970 | Japan | 150.437 | 72.0 |
| 4 | 1970 | USA | 326.961 | 70.9 |


In [4]:
df.cov(numeric_only=True)

|   | Year | Spending_USD | Life_Expectancy |
|---|---|---|---|
| Year | 201.098848 | 25718.83 | 41.915454 |
| Spending_USD | 25718.827373 | 4817761.0 | 4166.800912 |
| Life_Expectancy | 41.915454 | 4166.801 | 10.733902 |


In [7]:
df.corr(numeric_only=True, method='spearman')

|   | Year | Spending_USD | Life_Expectancy |
|---|---|---|---|
| Year | 1.0 | 0.931598 | 0.896117 |
| Spending_USD | 0.931598 | 1.0 | 0.747407 |
| Life_Expectancy | 0.896117 | 0.747407 | 1.0 |


In [8]:
df.corr(numeric_only=True, method='pearson')

|   | Year | Spending_USD | Life_Expectancy |
|---|---|---|---|
| Year | 1.0 | 0.826273 | 0.902175 |
| Spending_USD | 0.826273 | 1.0 | 0.57943 |
| Life_Expectancy | 0.902175 | 0.57943 | 1.0 |
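
The Pearson values above can be cross-checked against the covariance matrix printed earlier by `df.cov(numeric_only=True)`, since Pearson's r is the covariance divided by the product of the two standard deviations. The sketch below does this for `Spending_USD` and `Life_Expectancy`. The numbers are copied from the displayed output, so they are already rounded and the result is only approximate; the variable names are chosen here for illustration.

```python
from math import sqrt

# Values copied from the displayed healthexp covariance matrix (rounded as shown).
cov_spending_life = 4166.800912  # cov(Spending_USD, Life_Expectancy)
var_spending = 4817761.0         # cov(Spending_USD, Spending_USD), i.e. the variance
var_life = 10.733902             # cov(Life_Expectancy, Life_Expectancy)

# Pearson's r = covariance / (std of first variable * std of second variable)
r = cov_spending_life / sqrt(var_spending * var_life)
print(round(r, 4))  # close to the 0.57943 reported by df.corr(method='pearson')
```

A covariance of about 4167 looks large next to the 10.73 variance of `Life_Expectancy`, yet the corresponding correlation is only moderate, which is why raw covariances are hard to compare across variables.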


In [9]:
df.corr(numeric_only=True, method='kendall')

|   | Year | Spending_USD | Life_Expectancy |
|---|---|---|---|
| Year | 1.0 | 0.787151 | 0.737203 |
| Spending_USD | 0.787151 | 1.0 | 0.57162 |
| Life_Expectancy | 0.737203 | 0.57162 | 1.0 |


In [10]:
df = sns.load_dataset('flights')
df.head()

|   | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | Jan | 112 |
| 1 | 1949 | Feb | 118 |
| 2 | 1949 | Mar | 132 |
| 3 | 1949 | Apr | 129 |
| 4 | 1949 | May | 121 |


In [11]:
df.cov(numeric_only=True)

|   | year | passengers |
|---|---|---|
| year | 12.0 | 383.087413 |
| passengers | 383.087413 | 14391.917201 |


In [14]:
df.corr(numeric_only=True, method='pearson')

|   | year | passengers |
|---|---|---|
| year | 1.0 | 0.921824 |
| passengers | 0.921824 | 1.0 |


In [15]:
df = sns.load_dataset('tips')
df.head()

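|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |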


In [16]:
df.corr(numeric_only=True)

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |


In [17]:
df = sns.load_dataset('titanic')

In [18]:
df.corr(numeric_only=True)

|   | survived | pclass | age | sibsp | parch | fare | adult_male | alone |
|---|---|---|---|---|---|---|---|---|
| survived | 1.0 | -0.338481 | -0.077221 | -0.035322 | 0.081629 | 0.257307 | -0.55708 | -0.203367 |
| pclass | -0.338481 | 1.0 | -0.369226 | 0.083081 | 0.018443 | -0.5495 | 0.094035 | 0.135207 |
| age | -0.077221 | -0.369226 | 1.0 | -0.308247 | -0.189119 | 0.096067 | 0.280328 | 0.19827 |
| sibsp | -0.035322 | 0.083081 | -0.308247 | 1.0 | 0.414838 | 0.159651 | -0.253586 | -0.584471 |
| parch | 0.081629 | 0.018443 | -0.189119 | 0.414838 | 1.0 | 0.216225 | -0.349943 | -0.583398 |
| fare | 0.257307 | -0.5495 | 0.096067 | 0.159651 | 0.216225 | 1.0 | -0.182024 | -0.271832 |
| adult_male | -0.55708 | 0.094035 | 0.280328 | -0.253586 | -0.349943 | -0.182024 | 1.0 | 0.404744 |
| alone | -0.203367 | 0.135207 | 0.19827 | -0.584471 | -0.583398 | -0.271832 | 0.404744 | 1.0 |
