# Linear Regression
https://scipy-lectures.org/packages/statistics/index.html#linear-models-multiple-factors-and-analysis-of-variance

## Simple Linear Regression
Given two sets of observations, x and y, we want to test the hypothesis that y is a linear function of x. In other words:
y = x * coef + intercept + e
where e is observation noise. We will use the statsmodels module to:

1. Fit a linear model. We will use the simplest strategy, ordinary least squares (OLS).
2. Test that coef is non-zero.
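
To make the first step concrete, here is a minimal sketch of what an ordinary least squares fit computes, written with plain NumPy instead of statsmodels. It assumes the same synthetic data that the next cell generates; the names `X`, `beta`, `intercept`, and `coef` are chosen here for illustration only.

```python
import numpy as np

# Same synthetic data as the notebook cell below (true intercept -5, true slope 3)
x = np.linspace(-5, 5, 20)
np.random.seed(1)
y = -5 + 3 * x + 4 * np.random.normal(size=x.shape)

# Design matrix: a column of ones for the intercept, then x
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the beta that minimizes ||y - X @ beta||^2
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, coef = beta

print(f"intercept = {intercept:.4f}, coef = {coef:.4f}")
```

The estimates should be close to, but not exactly, the true values -5 and 3, because of the added noise. This sketch only produces point estimates; statsmodels additionally reports standard errors and test statistics, which the second step relies on.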

In [5]:
import numpy as np
import pandas as pd
import scipy
x = np.linspace(-5, 5, 20)
np.random.seed(1)
# normally distributed noise
y = -5 + 3*x + 4 * np.random.normal(size=x.shape)
# Create a data frame containing all the relevant variables
data = pd.DataFrame({'x': x, 'y': y})
data.head()

           x          y
0  -5.000000 -13.502619
1  -4.473684 -20.868078
2  -3.947368 -18.954792
3  -3.421053 -19.555032
4  -2.894737 -10.222580


In [6]:
from statsmodels.formula.api import ols
model = ols("y ~ x", data).fit()

In [7]:
print(model.summary()) 

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.794
Method:                 Least Squares   F-statistic:                     74.03
Date:                Wed, 07 Dec 2022   Prob (F-statistic):           8.56e-08
Time:                        10:51:01   Log-Likelihood:                -57.988
No. Observations:                  20   AIC:                             120.0
Df Residuals:                      18   BIC:                             122.0
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -5.5335      1.036     -5.342      0.0
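
The second goal stated earlier, testing that coef is non-zero, can also be checked programmatically rather than by reading the summary table. The following is a minimal sketch that rebuilds the same model and reads the t statistic, p-value, and 95% confidence interval for `x` from the fitted results. The 0.05 significance threshold and the variable names `t_x`, `p_x`, `alpha`, and `reject_null` are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Rebuild the data and model from the cells above
x = np.linspace(-5, 5, 20)
np.random.seed(1)
y = -5 + 3 * x + 4 * np.random.normal(size=x.shape)
data = pd.DataFrame({"x": x, "y": y})
model = ols("y ~ x", data).fit()

# t statistic and two-sided p-value for the null hypothesis "coef of x is 0"
t_x = model.tvalues["x"]
p_x = model.pvalues["x"]

# 95% confidence interval for the coefficient of x
ci_low, ci_high = model.conf_int(alpha=0.05).loc["x"]

alpha = 0.05  # significance level, chosen here for illustration
reject_null = p_x < alpha

print(f"t = {t_x:.3f}, p = {p_x:.3g}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print("coef is significantly non-zero:", reject_null)
```

With a single regressor, the p-value for `x` is the same as the `Prob (F-statistic)` entry in the summary, so a very small value there already indicates that the slope differs from zero.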

Statsmodels uses statistical terminology: the y variable is called ‘endogenous’, while the x variable is called ‘exogenous’.
http://statsmodels.sourceforge.net/devel/endog_exog.html
To simplify, y (endogenous) is the value you are trying to predict, while x (exogenous) represents the features you are using to make the prediction.
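
As a small illustration of this terminology, the sketch below inspects the endogenous and exogenous parts of a fitted model. It assumes the same data and formula as above. Note that the exogenous side also contains the `Intercept` column that the formula interface adds automatically.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Rebuild the data and model from the cells above
x = np.linspace(-5, 5, 20)
np.random.seed(1)
y = -5 + 3 * x + 4 * np.random.normal(size=x.shape)
data = pd.DataFrame({"x": x, "y": y})
model = ols("y ~ x", data).fit()

# The underlying (unfitted) model object stores the design information
endog_name = model.model.endog_names   # name of the response variable
exog_names = model.model.exog_names    # names of the design matrix columns
exog_shape = model.model.exog.shape    # (observations, columns)

print("endogenous:", endog_name)
print("exogenous:", exog_names)
print("design matrix shape:", exog_shape)
```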
Exercise: retrieve the estimated parameters from the model above. Hint: use tab-completion to find the relevant attribute.

In [10]:
print(model.aic)

119.97669968555743
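
Note that `model.aic` is the Akaike information criterion, a goodness-of-fit score, not the fitted coefficients. One way to retrieve the estimated parameters asked for in the exercise is the `params` attribute of the fitted results, shown here as a minimal sketch that assumes the same data and model as above.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Rebuild the data and model from the cells above
x = np.linspace(-5, 5, 20)
np.random.seed(1)
y = -5 + 3 * x + 4 * np.random.normal(size=x.shape)
data = pd.DataFrame({"x": x, "y": y})
model = ols("y ~ x", data).fit()

# params is a pandas Series indexed by term name
params = model.params
intercept = params["Intercept"]
coef = params["x"]

print(params)
```

Related attributes such as `model.bse` (standard errors) and `model.pvalues` follow the same indexing by term name.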
