In [1]:
import numpy as np
import matplotlib.pyplot as plt

# Covariance
Covariance is a measure of the tendency of two variables to vary together.<br>
Given two series, X and Y, their deviations from their respective means are
![](assets/images/cov-deviations.PNG)
Multiply the deviations pairwise. Covariance is the mean of these products:
![](assets/images/cov-formula.PNG)
where n is the length of the two series, which must have the same length.
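The definition above can be sketched directly in NumPy. This is a minimal illustration, not part of the original notebook: the lowercase names `x`, `y`, `dx`, `dy` and the fixed seed are assumptions chosen for the example. Note that `np.cov` divides by n - 1 by default, so `bias=True` is needed to match the "mean of the products" definition exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only to make the sketch reproducible
x = rng.random(50)
y = rng.random(50)

# Deviations of each series from its own mean
dx = x - x.mean()
dy = y - y.mean()

# Covariance as the mean of the products of deviations (divides by n)
cov_manual = np.mean(dx * dy)

# np.cov divides by n - 1 by default; bias=True makes it divide by n
cov_numpy = np.cov(x, y, bias=True)[0, 1]

print(np.isclose(cov_manual, cov_numpy))
```

The default `np.cov(x, y)` result differs from `cov_manual` only by the factor n / (n - 1), which is small for large n.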

In [2]:
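# Two independent uniform samples, so the covariance should be close to 0
X = np.random.rand(50)
Y = np.random.rand(50)

# np.cov returns the 2x2 covariance matrix; element [0, 1] is Cov(X, Y)
np.cov(X, Y)[0, 1]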

-0.012486954642708959

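Covariance is useful in some computations, but it is hard to interpret on its own: its units are the product of the units of X and Y, so its magnitude depends on the scale of the data.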

In [3]:
# Y is a noisy linear function of X, so the two are strongly related
X = np.random.rand(50)
Y = 2 * X + np.random.normal(0, 0.1, 50)

np.cov(X, Y)[0, 1]

0.14802277795832247

The coefficient of correlation is a standardized version of covariance that is easier to interpret.
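To see why covariance is hard to read by itself, the following sketch changes only the unit of one variable. The height and weight data are hypothetical, generated for illustration, and all variable names are assumptions. Converting metres to centimetres multiplies the covariance by 100 while leaving the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only

# Hypothetical data: heights in metres and loosely related weights in kilograms
height_m = rng.normal(1.7, 0.1, 200)
weight_kg = 60 * height_m + rng.normal(0, 5, 200)

# Same measurements expressed in centimetres
height_cm = 100 * height_m

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(cov_cm / cov_m)                # the covariance scales with the unit (factor of about 100)
print(np.isclose(corr_m, corr_cm))   # the correlation does not change
```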

# Correlation
A correlation is a statistic intended to quantify the strength of the relationship between two variables.


### Pearson’s Correlation
Divide the deviations by the standard deviation, which yields standard scores (dimensionless values), and compute the product of the standard scores:
![](assets/images/corr-pearson-product.PNG)
The mean of these products is
![](assets/images/corr-pearson-mean.PNG)
In terms of covariance:
![](assets/images/corr-pearson-cov.PNG)
- The value of ρ lies between -1 and +1
- If ρ is positive, the correlation is positive
- If ρ is negative, the correlation is negative
- The magnitude of ρ indicates the strength of the correlation
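The two formulations above (mean of the products of standard scores, and covariance divided by the product of the standard deviations) can be checked against `np.corrcoef`. This is a sketch under the assumption that the population forms (dividing by n) are used throughout; the names `zx`, `zy`, `rho_scores`, and `rho_cov` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
x = rng.random(50)
y = 2 * x + rng.normal(0, 0.1, 50)

# Standard scores: deviations divided by the standard deviation
# (ndarray.std() divides by n by default)
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# Pearson's correlation as the mean of the products of standard scores
rho_scores = np.mean(zx * zy)

# The same quantity expressed through the covariance (bias=True divides by n)
rho_cov = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())

# NumPy's built-in result for comparison
rho_numpy = np.corrcoef(x, y)[0, 1]

print(np.isclose(rho_scores, rho_numpy), np.isclose(rho_cov, rho_numpy))
```

The choice between n and n - 1 does not affect the result as long as it is applied consistently, because the factor cancels between the numerator and the denominator.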


In [4]:
print('Covariance of X and Y: %.2f' % np.cov(X, Y)[0, 1])
print('Correlation of X and Y: %.2f' % np.corrcoef(X, Y)[0, 1])

Covariance of X and Y: 0.15
Correlation of X and Y: 0.98


In [5]:
# Scatter plot of the noisy linear relationship between X and Y
plt.scatter(X, Y)
plt.xlabel('X Value')
plt.ylabel('Y Value')
plt.show()
print('Correlation of X and Y: %.2f' % np.corrcoef(X, Y)[0, 1])

Correlation of X and Y: 0.98

What if Pearson’s correlation is near 0?<br>
It is tempting to conclude that there is no relationship between the variables, but that conclusion is not valid: Pearson’s correlation only measures linear relationships, so it can be near 0 even when the variables are strongly related in a non-linear way.

<b>Examples of datasets with a range of correlations</b>
![](assets/images/corr-pearson-examples.png)
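A small sketch of the point above, using a hypothetical dataset in which `y` is completely determined by `x` and yet Pearson’s correlation is essentially 0. The symmetric grid of `x` values is an assumption chosen so that the linear association cancels out.

```python
import numpy as np

x = np.linspace(-1, 1, 201)  # values symmetric around 0
y = x ** 2                   # y is an exact (non-linear) function of x

rho = np.corrcoef(x, y)[0, 1]

# Zero up to floating-point error, despite the perfect dependence
print(abs(rho) < 1e-8)
```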

### Spearman’s Rank Correlation
Pearson’s correlation works well if the relationship between the variables is linear and the variables are roughly normal, but it is not robust in the presence of outliers.<br>
Spearman’s rank correlation is an alternative: it computes Pearson’s correlation on the ranks of the values rather than on the values themselves. This mitigates the effect of outliers and skewed distributions, and it captures any monotonic relationship, not only linear ones.
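As a rough illustration, Spearman’s rank correlation can be sketched as Pearson’s correlation applied to ranks. The helper `spearman_no_ties` below is hypothetical and written for this example; it assumes there are no tied values. In practice `scipy.stats.spearmanr` also handles ties.

```python
import numpy as np

def spearman_no_ties(a, b):
    """Pearson's correlation of the ranks of a and b (assumes no tied values)."""
    # argsort of argsort gives the 0-based rank of each element
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

rng = np.random.default_rng(3)  # fixed seed, for reproducibility only
x = rng.random(50)
y = np.exp(10 * x)  # monotonic in x, but strongly non-linear and skewed

pearson = np.corrcoef(x, y)[0, 1]
spearman = spearman_no_ties(x, y)

# The ranks of x and y agree exactly, so Spearman's correlation is 1,
# while Pearson's correlation is pulled down by the non-linearity
print(pearson < spearman, np.isclose(spearman, 1.0))
```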

# Causation
If variables A and B are correlated, there are three possible explanations: A causes B, B causes A, or some other set of factors causes both A and B.<br>
<b>Correlation does not imply causation.</b>
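The third explanation can be simulated. In the sketch below, a hypothetical hidden variable `z` drives both `a` and `b`, and neither of them influences the other. The noise levels and sample size are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)  # fixed seed, for reproducibility only
n = 1000

z = rng.normal(0, 1, n)        # hidden common cause
a = z + rng.normal(0, 0.5, n)  # a depends only on z, not on b
b = z + rng.normal(0, 0.5, n)  # b depends only on z, not on a

# a and b are strongly correlated even though neither causes the other
corr_ab = np.corrcoef(a, b)[0, 1]

# Once the contribution of z is removed, only independent noise remains
corr_resid = np.corrcoef(a - z, b - z)[0, 1]

print(round(corr_ab, 2), round(corr_resid, 2))
```

Here the common cause is known and can be subtracted out. With real data the confounding factors are usually unobserved, which is why correlation alone cannot settle the question.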

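So what can you do to provide evidence of causation?
- Use time. The order of events can help us infer the direction of causation, but it does not preclude the possibility that something else causes both A and B.
- Use randomness. If subjects are assigned to groups at random, the groups are nearly identical on average in every variable except the one being manipulated, which rules out a common cause. This is the idea behind a randomized controlled trial.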
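The effect of random assignment can be sketched with simulated data. Everything below is hypothetical: the background variable `age`, the outcome model, and the true treatment effect of 2.0 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)  # fixed seed, for reproducibility only
n = 10_000

# Hypothetical background variable that also influences the outcome
age = rng.normal(40, 10, n)

# Random assignment: shuffle the indices and treat the first half
order = rng.permutation(n)
treated = np.zeros(n, dtype=bool)
treated[order[: n // 2]] = True

# Hypothetical outcome: depends on age, plus a true treatment effect
true_effect = 2.0
outcome = 0.1 * age + true_effect * treated + rng.normal(0, 1, n)

# Randomization balances age across the two groups on average
age_gap = age[treated].mean() - age[~treated].mean()

# So the difference in mean outcomes estimates the treatment effect
estimated_effect = outcome[treated].mean() - outcome[~treated].mean()

print(round(age_gap, 2), round(estimated_effect, 2))
```

Because assignment does not depend on `age` or on any other variable, the two groups differ systematically only in the treatment, so the difference in outcomes can be attributed to it.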