### Correlation coefficient

As an exercise in NumPy, let's compute the correlation of two columns "manually".

This will help us get used to "broadcast" operations.
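As a quick illustration of what broadcasting means, here is a minimal sketch on a made-up array `a` (hypothetical, not part of the exercise data). When two operands have different shapes, NumPy stretches the smaller one to match the larger one.

```python
import numpy as np

# Hypothetical toy array, used only to illustrate broadcasting
a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Scalar with array: the scalar is applied to every element
shifted = a - 1.0

# Shape (3, 2) minus shape (2,): the column means are subtracted from every row
col_means = a.mean(axis=0)
centered = a - col_means
```

After centering, each column of `centered` has mean zero, which is the same step the correlation formula needs.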

The formula for the correlation coefficient of two variables is below.

We will make some artificial data, then try to implement the formula using broadcasting. 

NumPy's built-in `np.corrcoef` can then check our answer.


<img src="formula.png" alt="drawing" width="800"/>
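For reference, the cells below compute the usual sample (Pearson) correlation coefficient. It is written here with $x_i$ and $y_i$ for the entries of the two columns and $\bar{x}$, $\bar{y}$ for their means; this notation is an assumption and may differ from the image.

$$r = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\,\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$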



In [3]:
import numpy as np

### Make some data
data = np.random.randint(20,size=20).reshape(10,2)
data

array([[17, 19],
       [ 6,  7],
       [ 1, 14],
       [ 3,  6],
       [ 8,  7],
       [ 6, 16],
       [18,  3],
       [ 5, 18],
       [ 1,  2],
       [ 0, 13]])

In [8]:
## Use slicing to get the two columns

c1 = data[:,0]
c2 = data[:,1]

c1,c2

(array([17,  6,  1,  3,  8,  6, 18,  5,  1,  0]),
 array([19,  7, 14,  6,  7, 16,  3, 18,  2, 13]))

In [12]:
## Get the numerator

numerator=np.sum((c1-c1.mean())*(c2-c2.mean()))
numerator

12.5

In [13]:
## Get the denominator

denominator= np.sum((c1-c1.mean())**2)**(1/2)*np.sum((c2-c2.mean())**2)**(1/2)
denominator

356.4495055404061

In [14]:
## Compute r

r=numerator/denominator
r

0.0350680806277147
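The same computation can also be sketched on the whole two-dimensional array at once, which makes the broadcasting step explicit. This is one possible variant rather than the only way to do it. The array is typed in by hand from the output above so that the block runs on its own, and the name `centered` is introduced here for illustration.

```python
import numpy as np

# The ten rows printed above, entered by hand so the example is self-contained
data = np.array([[17, 19], [6, 7], [1, 14], [3, 6], [8, 7],
                 [6, 16], [18, 3], [5, 18], [1, 2], [0, 13]])

# data has shape (10, 2) and data.mean(axis=0) has shape (2,),
# so the column means are broadcast across all ten rows
centered = data - data.mean(axis=0)

# Sum of products of the centered columns
numerator = np.sum(centered[:, 0] * centered[:, 1])

# Product of the square roots of the summed squares of each centered column
denominator = np.sqrt(np.sum(centered[:, 0] ** 2) * np.sum(centered[:, 1] ** 2))

r = numerator / denominator
print(r)
```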

In [6]:
## NumPy can confirm the answer (corrcoef treats each row as a variable, hence the transpose)
np.corrcoef(data.T)

array([[1.        , 0.03506808],
       [0.03506808, 1.        ]])

In [7]:
## pandas can also confirm the answer (corr() works column by column)

import pandas as pd

pd.DataFrame(data).corr()

|   | 0 | 1 |
|---|---|---|
| 0 | 1.0 | 0.035068 |
| 1 | 0.035068 | 1.0 |


### Similar challenge

A similar challenge for you is to use NumPy to compute the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error), given a vector $\hat{y}$ of predictions and a vector $\bar{y}$ of true observations.
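In this notation, and writing $n$ for the length of the vectors (a symbol assumed here), the mean squared error is the average of the squared differences:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}_i\right)^2$$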

Please try this for the data below.


In [27]:
ybar = np.arange(10)
noise = np.random.randn(10)/100 ## small Gaussian noise (standard deviation 0.01)
yhat = ybar + noise ## elementwise sum: both arrays have shape (10,)
ybar,yhat

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([2.46675300e-03, 9.98465299e-01, 1.97799546e+00, 2.99571662e+00,
        4.00104691e+00, 5.00533823e+00, 6.00466346e+00, 7.01071427e+00,
        7.99140788e+00, 8.99986290e+00]))

In [28]:
### Your work here



### Solving a system of equations

Use NumPy to solve the system of equations given by

\begin{align}
x + 2y +z &= 1\\
2x +3z &= 3 \\
y+z &= 8
\end{align}



In [30]:
A = np.array([[1,2,1],[2,0,3],[0,1,1]])
v = np.array([1,3,8])
A,v

(array([[1, 2, 1],
        [2, 0, 3],
        [0, 1, 1]]),
 array([1, 3, 8]))

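In matrix form, with $A$ the coefficient matrix and $\bar{v}$ the right-hand side, the system reads

$$A\bar{x} = \bar{v}$$

Provided $A$ is invertible, we can solve it by applying $A^{-1}$ to both sides: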

$$\bar{x} = A^{-1}\bar{v}$$

In [32]:
x=np.linalg.inv(A)@v
x

array([-8.4,  1.4,  6.6])

In [35]:
## Check the answer

A@x

array([1., 3., 8.])
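As an alternative sketch, `np.linalg.solve` solves the system directly without forming $A^{-1}$ explicitly, which is generally preferred for numerical accuracy and speed. The arrays are redefined here only so the block runs on its own.

```python
import numpy as np

A = np.array([[1, 2, 1], [2, 0, 3], [0, 1, 1]])
v = np.array([1, 3, 8])

# Solve A x = v directly, without computing the inverse of A
x = np.linalg.solve(A, v)

# np.allclose compares with a tolerance, which is safer than == for floats
print(np.allclose(A @ x, v))
```

Like the inverse-based approach, this assumes `A` is square and invertible. For a singular matrix, `np.linalg.solve` raises `numpy.linalg.LinAlgError`.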

### Similar challenge

Try to solve the following system of equations.

\begin{align}
-x -y -z &= 11\\
2x +9y -2z &= 0\\
x &= 9
\end{align}

In [38]:
## Your work here
