# Multi-linear regression

***

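Let's try multi-linear regression, that is, linear regression with more than one input variable, using scikit-learn (sklearn):
[https://scikit-learn.org/stable/](https://scikit-learn.org/stable/)

We will model petal width as a linear combination of three other measurements from the iris data set, where t, u and v are the coefficients and c is the intercept: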

$ \text{petalwidth} = t \, (\text{sepallength}) + u \, (\text{sepalwidth}) + v \, (\text{petallength}) + c $
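
To make the formula concrete, here is a minimal sketch of how a single prediction is computed once the coefficients are known. The coefficient values and the input row are made up purely for illustration; they are not fitted values from the iris data.

```python
import numpy as np

# Hypothetical coefficients (t, u, v) and intercept c, chosen only
# for illustration. They are NOT fitted values.
coef = np.array([1.0, 2.0, 3.0])
c = 0.5

# One made-up row of inputs: sepallength, sepalwidth, petallength.
row = np.array([1.0, 1.0, 1.0])

# The model is the dot product of the inputs with the coefficients,
# plus the intercept.
prediction = row @ coef + c
print(prediction)
```

Fitting a model amounts to choosing t, u, v and c so that predictions like this one are as close as possible, in the least-squares sense, to the observed petal widths.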

## Using sklearn

In [37]:
# Import linear_model from sklearn.
import sklearn.linear_model as lm

# Create a linear regression model instance.
model = lm.LinearRegression()

# Let's use pandas to read a csv file and organise our data.
import pandas as pd

# Read the iris csv from online.
df = pd.read_csv('https://datahub.io/machine-learning/iris/r/iris.csv')

# Let's pretend we want to do linear regression on these variables to predict petal width.
x = df[['sepallength', 'sepalwidth', 'petallength']]

# Here's petal width.
y = df['petalwidth']

# Ask our model to fit the data.
model.fit(x, y)

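# Here's our intercept (c in the formula above).
print('   intercept: ', model.intercept_)

# Here are our coefficients, in the same order as the columns of x.
print('coefficients: ', model.coef_)

# See how good our fit is: score returns R squared, the coefficient of determination.
print('       score: ', model.score(x, y))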

   intercept:  -0.248723586024455
coefficients:  [-0.21027133  0.22877721  0.52608818]
       score:  0.9380481344518986


In [38]:
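# Calculating the score (R squared) by hand.
t, u, v = model.coef_
c = model.intercept_

# Predictions from the fitted coefficients and intercept.
y_pred = t * x['sepallength'] + u * x['sepalwidth'] + v * x['petallength'] + c

# Residual sum of squares: squared differences between actual and predicted values.
ss_res = ((y - y_pred)**2).sum()
# Total sum of squares: squared differences between actual values and their mean.
ss_tot = ((y - y.mean())**2).sum()

1 - (ss_res / ss_tot)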

0.9380481344518986
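
The by-hand calculation above is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. As a rough sketch, it can be wrapped in a small helper; the name `r_squared` and the toy arrays are hypothetical and only serve to show the two extreme cases.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: error left over after the prediction.
    ss_res = ((y_true - y_pred) ** 2).sum()
    # Total sum of squares: spread of the data about its mean.
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Made-up target values for illustration.
y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions leave no residual error.
perfect = r_squared(y_true, y_true)

# Always predicting the mean explains none of the variation.
baseline = r_squared(y_true, np.full(4, y_true.mean()))

print(perfect, baseline)
```

A score near 1 means the model explains most of the variation in the target, while a score near 0 means it does no better than always predicting the mean.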

***

## Using statsmodels

In [39]:
# Using statsmodels.
import statsmodels.api as sm

# Tell statsmodels to include an intercept by adding a constant column to x.
xwithc = sm.add_constant(x)

# Create a model.
msm = sm.OLS(y, xwithc)
# Fit the data.
rsm = msm.fit()
# Print a summary.
print(rsm.summary())

                            OLS Regression Results                            
Dep. Variable:             petalwidth   R-squared:                       0.938
Model:                            OLS   Adj. R-squared:                  0.937
Method:                 Least Squares   F-statistic:                     736.9
Date:                Sun, 06 Oct 2019   Prob (F-statistic):           6.20e-88
Time:                        17:25:10   Log-Likelihood:                 36.809
No. Observations:                 150   AIC:                            -65.62
Df Residuals:                     146   BIC:                            -53.57
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const          -0.2487      0.178     -1.396      

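
Unlike sklearn's `LinearRegression`, `sm.OLS` does not add an intercept on its own, which is why the constant column is added first. The sketch below imitates that step with plain NumPy on synthetic, noise-free data; the data, the "true" coefficients and all variable names are assumptions made for illustration, and `np.linalg.lstsq` stands in for the least-squares fit rather than reproducing statsmodels itself.

```python
import numpy as np

# Synthetic inputs: 50 rows, 3 columns (made up, not the iris data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Hypothetical "true" parameters used to generate noise-free targets.
true_coef = np.array([-0.2, 0.2, 0.5])
true_intercept = -0.25
y = X @ true_coef + true_intercept

# Prepend a column of ones, analogous to what sm.add_constant does,
# so that the first fitted parameter plays the role of the intercept.
X_with_const = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares via NumPy.
beta, _, rank, _ = np.linalg.lstsq(X_with_const, y, rcond=None)
intercept, coef = beta[0], beta[1:]

print('intercept:', intercept)
print('coefficients:', coef)
```

Because the targets here contain no noise, the fit recovers the generating parameters up to floating-point error; with real data such as the iris set, the estimates come with standard errors, which is what the statsmodels summary reports.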


## End