# 1. Multiple Linear Regression, Part 2
## Objectives
*   a. use statsmodels with a constant
*   b. manual and sklearn scaling
*   c. checking different regression metrics
*   d. AOB



In [34]:
#importing libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
import sklearn.metrics as metrics
from random import gauss
from mpl_toolkits.mplot3d import Axes3D
from scipy import stats as stats
%matplotlib inline


In [None]:
#loading the dataset
df = pd.read_csv("/content/WineQT.csv")
df.head()

In [None]:
#checking dtypes
df.info()

In [None]:
#statistical summary
df.describe()

#### Preparing data for modeling

In [38]:
#making a copy to be used for modeling
wine = df.copy(deep=True)

In [39]:
#our target variable will be alcohol content.
X = wine.drop("alcohol", axis=1)  # predictors (the target is dropped to avoid data leakage)
y = wine["alcohol"]  # target

In [None]:
X.head()

In [None]:
y

##### Using statsmodels

In [None]:
# statsmodels
# use sm.add_constant() to add the constant term (y-intercept)
X_con = sm.add_constant(X)  # adds a "const" column of ones so that OLS can estimate the intercept

#building the model
model = sm.OLS(y, X_con).fit()

#getting the model summary
model.summary()

Observation: we add a constant column, "const", because the regression equation includes an intercept term, beta 0. Every value in the "const" column is 1, and the coefficient fitted on that column is the intercept.
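
To see what the constant column does, here is a minimal sketch on made-up data (the names `x_toy`, `y_toy` and the true coefficients are assumptions for illustration, not the wine data). It uses plain NumPy least squares rather than statsmodels, and fits once with a column of ones and once without.

```python
import numpy as np

# Toy data (assumed for illustration): y = 3 + 2*x1 - 1*x2 + small noise
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(200, 2))
y_toy = 3.0 + 2.0 * x_toy[:, 0] - 1.0 * x_toy[:, 1] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones; this plays the role of the "const" column.
x_con_toy = np.column_stack([np.ones(len(x_toy)), x_toy])

# With the column of ones, the first coefficient is the intercept (beta 0).
beta_with_const, *_ = np.linalg.lstsq(x_con_toy, y_toy, rcond=None)

# Without it, the fit is forced through the origin and has no intercept term.
beta_no_const, *_ = np.linalg.lstsq(x_toy, y_toy, rcond=None)

print("with const   :", beta_with_const)
print("without const:", beta_no_const)
```

The first fit should recover an intercept close to 3, while the second has only two coefficients and no way to represent the offset.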

### Note:
The coefficient of density is much larger than the others because density takes small values. Since the features are on different scales, a one-unit change in density is a very large change, so its coefficient has to be large to compensate.

We can rectify this by scaling our features. Putting all features on a common scale keeps the coefficients from being shrunk or inflated by the units of measurement, which makes the coefficients easier to interpret.
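
The effect of units on a coefficient can be sketched with made-up data (all names and numbers below are assumptions for illustration): dividing a feature by 1000 multiplies its fitted coefficient by 1000, even though the model's predictions do not change.

```python
import numpy as np

# Toy data (assumed for illustration): y = 5 + 2*x + small noise
rng = np.random.default_rng(1)
x_toy = rng.normal(size=300)
y_toy = 5.0 + 2.0 * x_toy + rng.normal(scale=0.1, size=300)


def fit_slope(feature, target):
    """Fit target = b0 + b1 * feature by least squares and return b1."""
    design = np.column_stack([np.ones(len(feature)), feature])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1]


slope_original = fit_slope(x_toy, y_toy)
# Same feature expressed in units 1000 times larger, so its values are 1000 times smaller.
slope_small_values = fit_slope(x_toy / 1000.0, y_toy)

print(slope_original, slope_small_values)
```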



### Solution: standard scaling
There are different ways of doing this.

We'll focus on standard scaling (converting each feature to its z-scores).

Benefits
1.   makes the values relatively small (each scaled feature has a mean of zero and a standard deviation of 1).
2.   easier interpretation: all coefficients are on the same scale, so features with larger absolute coefficients tend to be more influential.
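
As a quick worked example, the z-score of a value is `(value - mean) / std`. The sketch below applies it to a small made-up array (`heights_cm` is an assumption for illustration) and checks that the result has mean 0 and standard deviation 1.

```python
import numpy as np

# Made-up values, only to illustrate the formula z = (x - mean) / std
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

# NumPy's std() defaults to the population standard deviation (ddof=0).
z_scores = (heights_cm - heights_cm.mean()) / heights_cm.std()

print(z_scores)
print(z_scores.mean(), z_scores.std())
```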




In [None]:
#checking std deviation of the original predictors (population std, ddof=0)
X.std(ddof=0)

In [None]:
# standard scaling: subtract each variable's mean, then divide by its std deviation

#including all the columns
# explicit per-column statistics; ddof=0 (population std) matches sklearn's StandardScaler
X_scaled = (X - X.mean()) / X.std(ddof=0)
X_scaled.head()

#checking the statistical summary
X_scaled.describe()

In [None]:
#modeling
X_pred = sm.add_constant(X_scaled)
#building the model
model2 = sm.OLS(y, X_pred).fit()
model2.summary()

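## Note:

1. After scaling, a value of 0 in a scaled feature means the unscaled variable is equal to its mean.
2. When every scaled feature is 0, B0 (the intercept) is the only term left in the prediction. The intercept is therefore the predicted target for an observation with average values in every feature, and for OLS with an intercept it equals the mean of the target.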
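
The second point can be checked on made-up data (names and coefficients below are assumptions for illustration). This sketch standardizes two toy features, fits least squares with an intercept using NumPy, and compares the intercept with the mean of the target.

```python
import numpy as np

# Toy data (assumed for illustration) with features on different scales
rng = np.random.default_rng(2)
x_toy = rng.normal(loc=[10.0, -3.0], scale=[2.0, 0.5], size=(250, 2))
y_toy = 1.5 + 0.8 * x_toy[:, 0] + 4.0 * x_toy[:, 1] + rng.normal(scale=0.3, size=250)

# Standardize each column to z-scores (population std, ddof=0).
x_scaled_toy = (x_toy - x_toy.mean(axis=0)) / x_toy.std(axis=0)

# Fit with a column of ones so that the first coefficient is the intercept.
design = np.column_stack([np.ones(len(x_scaled_toy)), x_scaled_toy])
beta, *_ = np.linalg.lstsq(design, y_toy, rcond=None)

# With centered predictors, the intercept matches the mean of the target.
print(beta[0], y_toy.mean())
```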



### **Multiple linear Regression in scikit-learn**

After trying MLR with statsmodels, we'll try it with sklearn.

In [46]:
#data to be used
df2 = df.copy(deep=True)
#our target variable will be alcohol content.
predX = df2.drop("alcohol", axis=1) # predictors
y = df2["alcohol"]# target

#### 1. scaling the data

In [47]:
#a. create a StandardScaler object to scale the data for us
ss = StandardScaler()

In [48]:
#b. apply the StandardScaler object to our data using the fit() and transform() methods
ss.fit(predX)
predX_st_scaled = ss.transform(predX)

In [None]:
#checking whether the manual and the sklearn scaling are the same
np.allclose(predX_st_scaled, X_scaled)
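
Whether this check passes depends on which standard deviation the manual scaling used. `StandardScaler` divides by the population standard deviation (`ddof=0`), whereas pandas' `DataFrame.std()` defaults to the sample version (`ddof=1`). A minimal sketch on made-up data (`x_toy` is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): 50 rows, 3 features
rng = np.random.default_rng(3)
x_toy = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

sk_scaled = StandardScaler().fit_transform(x_toy)

# Manual scaling with the population std (ddof=0) and the sample std (ddof=1).
manual_pop = (x_toy - x_toy.mean(axis=0)) / x_toy.std(axis=0, ddof=0)
manual_sample = (x_toy - x_toy.mean(axis=0)) / x_toy.std(axis=0, ddof=1)

print(np.allclose(sk_scaled, manual_pop))     # matches StandardScaler
print(np.allclose(sk_scaled, manual_sample))  # differs by a factor of sqrt((n - 1) / n)
```

On large datasets the two versions differ only slightly, but the difference is enough to make `np.allclose` return `False` at its default tolerance.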

In [None]:
X_scaled.head()

In [None]:
#checking the mean of the target variable
y.mean()

In [None]:
#checking scaled values
predX_st_scaled[:5,:]

#### Fit the Model

In [None]:
#fit the model to our training data
lr = LinearRegression()
lr.fit(predX_st_scaled, y)

In [None]:
#checking the coefficients
#we can use the coef_ attribute to recover the fitted regression coefficients
#(an array with one coefficient per predictor)
#we can compare these with the statsmodels coefficients.
lr.coef_

In [None]:
#getting the intercept
lr.intercept_

In [None]:
#we can get the r squared of our model by using score()
lr.score(predX_st_scaled,y)

In [None]:
#getting the predictions of our model
y_hat = lr.predict(predX_st_scaled)
y_hat

In [None]:
y

We can now evaluate our model to see how it performed.

In [None]:
#checking the number of predictors again
predX_st_scaled.shape


In [None]:
#create a baseline observation: every scaled feature is zero, i.e. every feature is at its mean
base_pred = np.zeros((1, predX_st_scaled.shape[1]))
base_pred

In [None]:
#predicting for the baseline observation
lr.predict(base_pred)  # when every scaled feature is zero, the prediction equals the intercept

### **Model Evaluation**

1. **observing residuals**.

In [None]:
#making predictions
y_hat = lr.predict(predX_st_scaled)
residual = y - y_hat

#plot residuals against observation index
plt.scatter(x=range(y_hat.shape[0]), y=residual, alpha=0.4);


Observation: based on the residual plot, the model appears to meet all the assumptions of MLR:
1. Linearity
2. Independence
3. Normality
4. Homoscedasticity (equal variance)
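
A plot against the observation index can hide unequal variance, because the rows are not ordered by the size of the prediction. One rough complementary check, sketched here on made-up data that deliberately violates equal variance (all names and numbers are assumptions for illustration), is to compare the spread of the residuals for low and high fitted values.

```python
import numpy as np

# Toy data (assumed for illustration): the noise spread grows with x,
# which deliberately violates the equal-variance assumption.
rng = np.random.default_rng(4)
x_toy = rng.uniform(1.0, 10.0, size=500)
y_toy = 2.0 + 0.5 * x_toy + rng.normal(scale=0.2 * x_toy)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(len(x_toy)), x_toy])
beta, *_ = np.linalg.lstsq(design, y_toy, rcond=None)
fitted = design @ beta
residuals = y_toy - fitted

# Split the residuals into the lower and upper halves of the fitted values.
order = np.argsort(fitted)
low_half = residuals[order[:250]]
high_half = residuals[order[250:]]

# Under equal variance the two spreads would be similar; here they are not.
print(low_half.std(), high_half.std())
```

A scatter plot of residuals against fitted values shows the same thing visually: a funnel shape suggests the variance is not constant.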

### **Sklearn Metrics**

We have a couple of metrics in sklearn. These include:
1. R^2 score
2. The mean absolute error (MAE)
3. The mean squared error (MSE)
4. The root mean squared error (RMSE)

Note: the `score()` method of `LinearRegression` returns the R^2 score by default.
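
To make the four metrics concrete, this sketch computes them by hand with NumPy on a tiny made-up example (`y_true_toy` and `y_pred_toy` are assumptions for illustration). The sklearn functions should give the same numbers for the same inputs.

```python
import numpy as np

# Made-up targets and predictions, small enough to check by hand
y_true_toy = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_toy = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true_toy - y_pred_toy  # [1, 0, -1, -2]

mae = np.mean(np.abs(errors))     # mean of |error|
mse = np.mean(errors ** 2)        # mean of error squared
rmse = np.sqrt(mse)               # same units as the target

# R^2 compares the model's squared error with that of always predicting the mean.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true_toy - y_true_toy.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```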


In [None]:
#getting the r2 score
metrics.r2_score(y, lr.predict(predX_st_scaled))

Things to note:

Ensure the metric behaves as expected. If we simply use y_bar (the mean of y) as the prediction for every observation, we get an R^2 score of 0. If we instead predict, say, y_bar + 1, we get a negative R^2 score, as demonstrated below.
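
Both cases follow from the definition of R^2. The short derivation below uses the standard notation (n observations, y_hat for predictions), which is assumed here rather than taken from the notebook.

```latex
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}, \qquad
SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2

% Case 1: predict the mean everywhere
\hat{y}_i = \bar{y} \;\Rightarrow\; SS_{res} = SS_{tot} \;\Rightarrow\; R^2 = 0

% Case 2: predict the mean plus one everywhere
\hat{y}_i = \bar{y} + 1 \;\Rightarrow\;
SS_{res} = \sum_{i=1}^{n} (y_i - \bar{y} - 1)^2
         = SS_{tot} - 2 \sum_{i=1}^{n} (y_i - \bar{y}) + n
         = SS_{tot} + n
\;\Rightarrow\; R^2 = -\frac{n}{SS_{tot}} < 0
```

The middle sum vanishes because deviations from the mean add up to zero, so any constant prediction other than the mean gives a negative R^2.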


In [None]:
# checking with the mean of y
avg_alcohol = np.mean(y)
num = len(y)

metrics.r2_score(y, avg_alcohol*np.ones(num))

In [None]:
#checking with the mean of y plus 1
metrics.r2_score(y, (avg_alcohol+1) *np.ones(num))

In [None]:
#computing MAE
metrics.mean_absolute_error(y, lr.predict(predX_st_scaled))

In [None]:
#computing MSE
metrics.mean_squared_error(y, lr.predict(predX_st_scaled))

In [None]:
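#computing RMSE (root mean squared error)
#taking the square root of the MSE works in every sklearn version,
#whereas the squared=False argument is deprecated in recent sklearn releases
np.sqrt(metrics.mean_squared_error(y, lr.predict(predX_st_scaled)))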

**END**