# Simple OLS (ordinary least squares)

The formulas to implement in order to estimate the coefficients $a$ (intercept) and $\beta$ (slope) of the simple linear regression $y = a + \beta X$.

The formula for $a$:

![images/a_formula.png](images/a_formula.png)

The formula for $\beta$:

![images/b_formula.png](images/b_formula.png)
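For reference, and assuming the two images show the standard OLS estimators (these are the formulas the code below implements, with population moments dividing by $n$), they can be written as:

$$\hat{\beta} = \frac{\mathrm{Cov}(X, y)}{\mathrm{Var}(X)} = \frac{\overline{Xy} - \bar{X}\,\bar{y}}{\overline{X^2} - \bar{X}^2}, \qquad \hat{a} = \bar{y} - \hat{\beta}\,\bar{X}$$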

```python
# Arithmetic mean
def mean(data: list) -> float:
    total = 0.0
    for value in data:
        total += value
    return total / len(data)
```
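A more compact sketch of the same function, using the built-in `sum` and assuming we want an explicit error on an empty list (the loop version raises `ZeroDivisionError` in that case). The standard library's `statistics.fmean` (Python 3.8+) serves as a reference value:

```python
import statistics


def mean(data: list) -> float:
    """Arithmetic mean, same definition as the loop version."""
    if not data:
        raise ValueError("mean() of empty data")
    return sum(data) / len(data)


X = [1, 2, 0, 3, 1, 8, 7, 9, 1]
# statistics.fmean computes the same arithmetic mean as a float
print(mean(X), statistics.fmean(X))
```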

```python
X = [1, 2, 0, 3, 1, 8, 7, 9, 1]
```

```python
mean(X)
```

```
3.5555555555555554
```

```python
# Population variance: Var(X) = mean(X^2) - mean(X)^2
def variance(data: list) -> float:
    squared_data = [x**2 for x in data]
    return mean(squared_data) - mean(data) ** 2
```

```python
variance(X)
```

```
10.691358024691358
```
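The shortcut $\overline{X^2} - \bar{X}^2$ used by `variance` subtracts two numbers of similar size, which can lose precision when the mean is large compared with the spread (catastrophic cancellation). The sketch below (the function names are ours) compares it with a two-pass version that first centres the data; `statistics.pvariance`, which also divides by $n$, is used as a reference. Shifting the data by a large constant leaves the true variance unchanged, yet the shortcut can be far off in that case:

```python
import statistics


def variance_shortcut(data: list) -> float:
    """Population variance via mean(x^2) - mean(x)^2 (prone to cancellation)."""
    m = sum(data) / len(data)
    return sum(x * x for x in data) / len(data) - m * m


def variance_two_pass(data: list) -> float:
    """Population variance: mean of squared deviations from the mean."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / n


X = [1, 2, 0, 3, 1, 8, 7, 9, 1]
print(variance_two_pass(X), statistics.pvariance(X))

# Same spread, but a huge offset: compare both formulas
shifted = [x + 1e9 for x in X]
print(variance_shortcut(shifted), variance_two_pass(shifted))
```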

```python
# Population standard deviation
def stdev(data: list) -> float:
    return variance(data) ** 0.5
```

```python
stdev(X)
```

```
3.269764215458258
```

```python
# Population covariance: Cov(X, y) = mean(X*y) - mean(X)*mean(y)
def covariance(X: list, y: list) -> float:
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    Xy = [xi * yi for xi, yi in zip(X, y)]
    return mean(Xy) - mean(X) * mean(y)
```

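```python
y = [3, 1, 2, 5, 6, 4, 8, 7, 9]
```

```python
covariance(X, y)
```

```
2.8888888888888893
```

The covariance of a variable with itself is its variance, $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$:

```python
covariance(X, X)
```

```
10.691358024691358
```

Covariance is symmetric, $\mathrm{Cov}(y, X) = \mathrm{Cov}(X, y)$:

```python
covariance(y, X)
```

```
2.8888888888888893
```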

The fitted values of the ordinary least squares (OLS) model are given by
$$\hat{y} = \hat{a} + \hat{\beta} X$$
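Because the data are small integers, the estimators can also be evaluated exactly with `fractions.Fraction`, which gives a reference for the floating-point results that follow. This is a sketch (the helper `exact_ols` is hypothetical, not part of the notebook) that applies the same population formulas in exact rational arithmetic:

```python
from fractions import Fraction


def exact_ols(X: list, y: list) -> tuple:
    """Return (a, beta) of y = a + beta * X using exact rationals."""
    n = len(X)
    mean_x = Fraction(sum(X), n)
    mean_y = Fraction(sum(y), n)
    cov_xy = Fraction(sum(xi * yi for xi, yi in zip(X, y)), n) - mean_x * mean_y
    var_x = Fraction(sum(xi * xi for xi in X), n) - mean_x ** 2
    beta = cov_xy / var_x
    a = mean_y - beta * mean_x
    return a, beta


X = [1, 2, 0, 3, 1, 8, 7, 9, 1]
y = [3, 1, 2, 5, 6, 4, 8, 7, 9]
a_exact, beta_exact = exact_ols(X, y)
print(a_exact, beta_exact)  # 1749/433 117/433
```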

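Let us compute $\beta = \mathrm{Cov}(X, y) / \mathrm{Var}(X)$ (no square root is involved):

```python
from math import sqrt  # used below for the RMSE and the standard deviations

# Slope: beta = Cov(X, y) / Var(X)
beta = covariance(X, y) / variance(X)
round(beta, 4)
```

```
0.2702
```

The intercept then follows from $\hat{a} = \bar{y} - \hat{\beta}\bar{X}$:

```python
# Intercept: a = mean(y) - beta * mean(X)
a = mean(y) - beta * mean(X)
round(a, 4)
```

```
4.0393
```

A prediction adds the slope term to the intercept, $\hat{y} = \hat{a} + \hat{\beta}x$:

```python
def predict(x: float) -> float:
    return a + beta * x

predicted = [predict(x) for x in X]
[round(p, 4) for p in predicted]
```

```
[4.3095, 4.5797, 4.0393, 4.8499, 4.3095, 6.2009, 5.9307, 6.4711, 4.3095]
```

# Diagnostics

The mean squared error (MSE) averages the squared residuals $(y_i - \hat{y}_i)^2$; the root mean squared error (RMSE) is its square root, expressed in the same units as $y$.

```python
# Squared residuals (y_i - y_hat_i)^2
diff = [(yi - pi) ** 2 for yi, pi in zip(y, predicted)]
[round(d, 4) for d in diff]
```

```
[1.7147, 12.8141, 4.1586, 0.0225, 2.8579, 4.8441, 4.2819, 0.2797, 22.0011]
```

```python
mse = sum(diff) / len(diff)
rmse = sqrt(mse)
round(mse, 4), round(rmse, 4)
```

```
(5.8861, 2.4261)
```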
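To cross-check the hand-computed coefficients and error, here is a sketch using NumPy (already imported later in this notebook). `np.polyfit(x, y, 1)` fits a degree-1 polynomial by least squares and returns the coefficients from the highest power down, i.e. `[slope, intercept]`. The array names are ours, chosen so the notebook's lists `X` and `y` are not overwritten:

```python
import numpy as np

X_arr = np.array([1, 2, 0, 3, 1, 8, 7, 9, 1], dtype=float)
y_arr = np.array([3, 1, 2, 5, 6, 4, 8, 7, 9], dtype=float)

# Least-squares line: coefficients are returned as [slope, intercept]
beta_np, a_np = np.polyfit(X_arr, y_arr, 1)

# Residuals of the fitted line, then MSE and RMSE
residuals = y_arr - (a_np + beta_np * X_arr)
mse_np = np.mean(residuals ** 2)
rmse_np = np.sqrt(mse_np)
print(beta_np, a_np, mse_np, rmse_np)
```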

# Computing $R^2$

```python
(X, y)
```

```
([1, 2, 0, 3, 1, 8, 7, 9, 1], [3, 1, 2, 5, 6, 4, 8, 7, 9])
```

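For simple linear regression, the coefficient of determination equals the squared correlation coefficient, $R^2 = \left(\frac{\mathrm{Cov}(X, y)}{\sigma_X \sigma_y}\right)^2$, so it always lies between 0 and 1:

```python
from math import sqrt

# Coefficient of determination = squared Pearson correlation
sigmaX = sqrt(variance(X))
sigmaY = sqrt(variance(y))
r2 = (covariance(X, y) / (sigmaX * sigmaY)) ** 2
round(r2, 4)
```

```
0.1171
```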
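The general definition is $R^2 = 1 - SS_{res}/SS_{tot}$ (residual sum of squares over total sum of squares). The sketch below (the function `r_squared` and the variable names are ours) refits the line with the same population formulas and checks that, for simple OLS with an intercept, this definition matches the squared correlation:

```python
def r_squared(y_true: list, y_pred: list) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - m) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


xs = [1, 2, 0, 3, 1, 8, 7, 9, 1]
ys = [3, 1, 2, 5, 6, 4, 8, 7, 9]

# Population moments and OLS coefficients
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / n
var_x = sum((xi - mx) ** 2 for xi in xs) / n
var_y = sum((yi - my) ** 2 for yi in ys) / n
slope = cov / var_x
intercept = my - slope * mx
fitted = [intercept + slope * xi for xi in xs]

# Both values should agree
print(r_squared(ys, fitted), cov ** 2 / (var_x * var_y))
```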

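As a cross-check, NumPy's `np.var` computes the population variance by default (`ddof=0`), like `variance` above:

```python
import numpy as np

np.var(X)
```

```
10.69135802469136
```

The numerator and the denominator of the correlation coefficient:

```python
covariance(X, y)
```

```
2.8888888888888893
```

```python
sigmaX * sigmaY
```

```
8.442494901663196
```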
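NumPy can also produce the covariance and correlation directly. Unlike `np.var`, `np.cov` divides by $n - 1$ by default; passing `bias=True` divides by $n$ and matches the notebook's `covariance`. A short sketch (array names are ours), with `np.corrcoef` giving the Pearson correlation whose square is $R^2$ here:

```python
import numpy as np

x_arr = np.array([1, 2, 0, 3, 1, 8, 7, 9, 1], dtype=float)
y_arr = np.array([3, 1, 2, 5, 6, 4, 8, 7, 9], dtype=float)

# Off-diagonal entry of the 2x2 covariance matrix
cov_pop = np.cov(x_arr, y_arr, bias=True)[0, 1]   # divides by n
cov_sample = np.cov(x_arr, y_arr)[0, 1]           # divides by n - 1

# Pearson correlation; squared, it is R^2 for simple linear regression
r = np.corrcoef(x_arr, y_arr)[0, 1]
print(cov_pop, cov_sample, r ** 2)
```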