# The simple linear regression model

$Y = \beta_0 + \beta_1 x + \epsilon$
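To make the roles of $\beta_0$, $\beta_1$ and $\epsilon$ concrete, here is a minimal Python sketch that generates observations from this model. The true parameter values, the range of $x$, the sample size and the assumption that $\epsilon$ is normal with mean 0 and standard deviation `SIGMA` are illustrative choices, not part of the original notebook.

```python
import random

# Illustrative (assumed) true parameters of Y = beta_0 + beta_1 * x + epsilon
BETA_0 = 2.0
BETA_1 = 0.5
SIGMA = 1.0  # assumed standard deviation of the normal error term epsilon


def simulate(n, seed=0):
    """Return n pairs (x, Y) drawn from the simple linear regression model."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Each Y is the point on the true line plus an independent random error
    ys = [BETA_0 + BETA_1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


xs, ys = simulate(1000)
```

The simulated points scatter around the true line $y = \beta_0 + \beta_1 x$; least squares, described next, estimates $\beta_0$ and $\beta_1$ from such points.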

## The least squares principle

The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is:

$\text{(height of the point)} - \text{(height of the line)} = y_i - (b_0 + b_1 x_i)$

The sum of the squared vertical deviations of the points $(x_1, y_1), \ldots, (x_n, y_n)$ from the line is therefore:

$f(b_0, b_1) = \sum\limits_{i=1}^{n}[y_i - (b_0 + b_1 x_i)]^2$
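The function $f$ can be written directly in Python. This sketch uses the 15 $(x, y)$ pairs from the example later in this notebook; the name `f` mirrors the notation above, and the plain loop is just one straightforward way to compute it.

```python
# Data from the example later in this notebook
x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]


def f(b0, b1, xs, ys):
    """Sum of squared vertical deviations of the points (xs, ys) from y = b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))


# f can be evaluated for any candidate line; a smaller value means a closer fit
print(f(0.0, 0.0, x, y))     # the line y = 0
print(f(118.9, -0.9, x, y))  # a line close to the least squares fit
```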



The point estimates of $\beta_0$ and $\beta_1$, denoted $\hat \beta_0$ and $\hat \beta_1$ and called the **least squares estimates**, are the values that minimize $f(b_0, b_1)$. That is,

$\hat \beta_0$ and $\hat \beta_1$ are such that


$f(\hat \beta_0, \hat \beta_1) \le f(b_0, b_1)$

for any $b_0$ and $b_1$.
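The formulas for $\hat\beta_1$ and $\hat\beta_0$ given below come from the standard calculus argument: $f$ is a quadratic function of $b_0$ and $b_1$, so at its minimum both partial derivatives vanish.

$$
\begin{aligned}
\frac{\partial f}{\partial b_0} &= -2\sum_{i=1}^{n}\left[y_i - (b_0 + b_1 x_i)\right] = 0 \\
\frac{\partial f}{\partial b_1} &= -2\sum_{i=1}^{n} x_i\left[y_i - (b_0 + b_1 x_i)\right] = 0
\end{aligned}
$$

Rearranging gives the *normal equations*:

$$
\begin{aligned}
n\,b_0 + b_1 \sum x_i &= \sum y_i \\
b_0 \sum x_i + b_1 \sum x_i^2 &= \sum x_i y_i
\end{aligned}
$$

The first equation gives $b_0 = \left(\sum y_i - b_1 \sum x_i\right)/n$; substituting it into the second yields

$$
b_1\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n},
$$

which is the computational form of $\hat\beta_1$, and back in the first equation $\hat\beta_0 = \overline{y} - \hat\beta_1 \overline{x}$.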

The **estimated regression line**, or **least squares line**, is therefore the line whose equation is

$y = \hat \beta_0 + \hat \beta_1 x$

The least squares estimate of the slope $\beta_1$ of the true regression line is:

$ b_1 = \hat \beta_1 = \frac{\sum (x_i - \overline{x})(y_i - \overline{y})}{\sum (x_i - \overline{x})^2 } = \frac{S_{xy}}{S_{xx}}$
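As a sketch, the deviation form of $\hat\beta_1$ translates directly into Python. The data are the 15 pairs used in the notebook's example; the variable names are illustrative.

```python
x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]
n = len(x)

# Sample means
x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator and denominator written with deviations from the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

beta_1_hat = s_xy / s_xx
print(beta_1_hat)
```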




The computational formulas for the numerator and denominator of $\hat\beta_1$ are:

$S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$


$S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
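A quick sketch confirms that these shortcut formulas give the same values as the deviation forms on the notebook's data; only the sums of $x$, $y$, $x^2$ and $xy$ are needed for the shortcut versions.

```python
import math

x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]
n = len(x)

# Computational ("shortcut") formulas built from raw sums
S_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
S_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

# Deviation forms, for comparison
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xy_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_xx_dev = sum((xi - x_bar) ** 2 for xi in x)

# Both versions agree up to floating-point rounding
print(math.isclose(S_xy, S_xy_dev), math.isclose(S_xx, S_xx_dev))
```

The shortcut form subtracts two large, nearly equal numbers (here about 179849.73 and 179328.53 for $S_{xx}$), so in floating-point arithmetic it can lose precision when the $x$ values are large relative to their spread; the deviation form avoids that cancellation.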


The least squares estimate of the intercept $\beta_0$ of the true regression line is:

$b_0 = \hat \beta_0 = \frac{\sum y_i - \hat \beta_1 \sum x_i}{n} = \overline{y} - \hat \beta_1 \overline{x}$
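Both expressions for $\hat\beta_0$ can be checked with a short sketch on the notebook's data. Because $\hat\beta_0 = \overline{y} - \hat\beta_1 \overline{x}$, the least squares line always passes through the point $(\overline{x}, \overline{y})$, which the last line verifies.

```python
x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_1_hat = s_xy / s_xx

# Two equivalent forms of the intercept estimate
beta_0_hat = y_bar - beta_1_hat * x_bar
beta_0_alt = (sum(y) - beta_1_hat * sum(x)) / n

print(beta_0_hat, beta_0_alt)
# The fitted line evaluated at x_bar gives y_bar (up to rounding)
print(beta_0_hat + beta_1_hat * x_bar - y_bar)
```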

```python
import pandas as pd
```

```python
df = pd.read_csv("regressao_linear_dataset_ex1.csv")
df
```

|   | x | y |
|---|---|---|
| 0 | 99.0 | 28.8 |
| 1 | 101.1 | 27.9 |
| 2 | 102.7 | 27.0 |
| 3 | 103.0 | 25.2 |
| 4 | 105.4 | 22.8 |
| 5 | 107.0 | 21.5 |
| 6 | 108.7 | 20.9 |
| 7 | 110.8 | 19.6 |
| 8 | 112.1 | 17.1 |
| 9 | 112.4 | 18.9 |


```python
# Build the dataset directly from lists (15 observations)
x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]

df = pd.DataFrame({'x': x, 'y': y})
# Auxiliary columns used by the computational formulas: x^2, xy and y^2
df['xx'] = df['x'] * df['x']
df['xy'] = df['x'] * df['y']
df['yy'] = df['y'] * df['y']
df
```


|   | x | y | xx | xy | yy |
|---|---|---|---|---|---|
| 0 | 99.0 | 28.8 | 9801.0 | 2851.2 | 829.44 |
| 1 | 101.1 | 27.9 | 10221.21 | 2820.69 | 778.41 |
| 2 | 102.7 | 27.0 | 10547.29 | 2772.9 | 729.0 |
| 3 | 103.0 | 25.2 | 10609.0 | 2595.6 | 635.04 |
| 4 | 105.4 | 22.8 | 11109.16 | 2403.12 | 519.84 |
| 5 | 107.0 | 21.5 | 11449.0 | 2300.5 | 462.25 |
| 6 | 108.7 | 20.9 | 11815.69 | 2271.83 | 436.81 |
| 7 | 110.8 | 19.6 | 12276.64 | 2171.68 | 384.16 |
| 8 | 112.1 | 17.1 | 12566.41 | 1916.91 | 292.41 |
| 9 | 112.4 | 18.9 | 12633.76 | 2124.36 | 357.21 |


```python
# Column totals, in the order x, y, xx, xy, yy
# (df.sum() would return the same totals as a pandas Series)
dfsum = [df[col].sum() for col in ['x', 'y', 'xx', 'xy', 'yy']]
dfsum
```

```
[1640.1, 299.8, 179849.73, 32308.589999999997, 6430.06]
```
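With these sums ($\sum x_i = 1640.1$, $\sum y_i = 299.8$, $\sum x_i^2 = 179849.73$, $\sum x_i y_i = 32308.59$) and $n = 15$, the estimates can be worked out by hand from the computational formulas:

$$
\begin{aligned}
S_{xy} &= 32308.59 - \frac{(1640.1)(299.8)}{15} = 32308.59 - 32780.132 = -471.542 \\
S_{xx} &= 179849.73 - \frac{(1640.1)^2}{15} = 179849.73 - 179328.534 = 521.196 \\
\hat\beta_1 &= \frac{-471.542}{521.196} \approx -0.9047 \\
\hat\beta_0 &= \frac{299.8}{15} - \hat\beta_1 \cdot \frac{1640.1}{15} \approx 19.9867 + 0.9047 \cdot 109.34 \approx 118.91
\end{aligned}
$$

These agree with the values computed in the cells that follow.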

```python
df.info()
df.describe()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x       15 non-null     float64
 1   y       15 non-null     float64
 2   xx      15 non-null     float64
 3   xy      15 non-null     float64
 4   yy      15 non-null     float64
dtypes: float64(5)
memory usage: 728.0 bytes
```

|   | x | y | xx | xy | yy |
|---|---|---|---|---|---|
| count | 15.0 | 15.0 | 15.0 | 15.0 | 15.0 |
| mean | 109.34 | 19.986667 | 11989.982 | 2153.906 | 428.670667 |
| std | 6.101499 | 5.593729 | 1329.563135 | 489.264165 | 227.928 |
| min | 99.0 | 10.8 | 9801.0 | 1296.0 | 116.64 |
| 25% | 104.2 | 16.35 | 10859.08 | 1859.03 | 267.445 |
| 50% | 110.8 | 19.6 | 12276.64 | 2171.68 | 384.16 |
| 75% | 113.7 | 24.0 | 12927.7 | 2499.36 | 577.44 |
| max | 120.0 | 28.8 | 14400.0 | 2851.2 | 829.44 |


```python
# Means of x and y
x_med = df['x'].mean()
y_med = df['y'].mean()
```

$\hat \beta_1 = \frac{S_{xy}}{S_{xx}}$ 

```python
n = len(df)  # number of observations
S_xy = df['xy'].sum() - df['x'].sum() * df['y'].sum() / n
S_xx = df['xx'].sum() - df['x'].sum() ** 2 / n

beta_1 = S_xy / S_xx
beta_1
```

```
-0.9047306579482158
```

```python
beta_0 = y_med - beta_1 * x_med
beta_0
```

```
118.90991680672457
```
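One way to cross-check the hand-coded formulas is NumPy's `np.polyfit`, which fits a polynomial by least squares; with degree 1 it fits a straight line and returns the coefficients from the highest power down, i.e. `[slope, intercept]`. This is a verification sketch, not part of the original notebook.

```python
import numpy as np

x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]

# Degree-1 least squares fit; coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)
```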

Fitted linear regression model:

$\hat y = \hat\beta_1 x + \hat\beta_0$

```python
print(f'y = {round(beta_1, 2)}x + {round(beta_0, 2)}')
```

```
y = -0.9x + 118.91
```
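With the fitted line, the fitted values $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ and residuals $e_i = y_i - \hat y_i$ follow directly. The self-contained sketch below recomputes the estimates from the same 15 pairs and checks two properties of least squares: the residuals sum to zero (the line has an intercept), and moving either coefficient away from its estimate increases the sum of squares $f$. The names `sse`, `fitted` and `residuals` are illustrative.

```python
x = [99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0]
y = [28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8]
n = len(x)

# Least squares estimates (deviation form)
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
S_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_1 = S_xy / S_xx
beta_0 = y_bar - beta_1 * x_bar


def sse(b0, b1):
    """Sum of squared residuals of the line y = b0 + b1*x on the data."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


fitted = [beta_0 + beta_1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals of a least squares line with an intercept sum to (numerically) zero
print(sum(residuals))
# The estimates minimize the sum of squares: nudging either one increases it
print(sse(beta_0, beta_1) < sse(beta_0 + 0.1, beta_1))
print(sse(beta_0, beta_1) < sse(beta_0, beta_1 + 0.001))
```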
