# Optional Lab: Linear Regression using Scikit-Learn

There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.



## Goals
In this lab you will:
- Utilize scikit-learn to implement linear regression using Gradient Descent

## Tools
You will utilize functions from scikit-learn as well as matplotlib and NumPy. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
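# The course helpers (lab_utils_multi.load_house_data, lab_utils_common.dlc) are not imported:
# the data set is defined inline below, and this small color dictionary stands in for dlc.
dlc = dict(dlblue='#0096ff', dlorange='#FF9300')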
np.set_printoptions(precision=2)
plt.style.use('./deeplearning.mplstyle')

## Gradient Descent
Scikit-learn provides a stochastic gradient descent regression model, [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) performs z-score normalization, as in a previous lab; scikit-learn calls the result the 'standard score'.
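As a quick check of what `StandardScaler` computes, the sketch below applies z-score normalization by hand, $x_j \leftarrow (x_j - \mu_j)/\sigma_j$ for each column $j$, and compares it with `fit_transform`. It uses a small made-up array rather than the lab data. Note that `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also NumPy's default for `np.std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative array (made up): 4 examples, 2 features on very different scales
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [852.0, 2.0],
              [1534.0, 3.0]])

# Manual z-score: subtract each column's mean, divide by its population std
mu = X.mean(axis=0)
sigma = X.std(axis=0)          # ddof=0 by default, matching StandardScaler
X_manual = (X - mu) / sigma

# scikit-learn's version; the fitted mean and std are stored on the scaler
scaler = StandardScaler()
X_sk = scaler.fit_transform(X)

print(np.allclose(X_manual, X_sk))                                      # True
print(np.allclose(scaler.mean_, mu), np.allclose(scaler.scale_, sigma))  # True True
# After scaling, every column has mean ~0 and standard deviation 1
print(X_sk.mean(axis=0), X_sk.std(axis=0))
```

The fitted `mean_` and `scale_` are what the scaler reuses later when transforming new data, which is why the same scaler object must be kept around after training.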

### Load the data set

In [None]:
X_train = np.array([[1.24e+03, 3.00e+00, 1.00e+00, 6.40e+01],
                    [1.95e+03, 3.00e+00, 2.00e+00, 1.70e+01],
                    [1.72e+03, 3.00e+00, 2.00e+00, 4.20e+01],
                    [1.96e+03, 3.00e+00, 2.00e+00, 1.50e+01],
                    [1.31e+03, 2.00e+00, 1.00e+00, 1.40e+01],
                    [8.64e+02, 2.00e+00, 1.00e+00, 6.60e+01],
                    [1.84e+03, 3.00e+00, 1.00e+00, 1.70e+01],
                    [1.03e+03, 3.00e+00, 1.00e+00, 4.30e+01],
                    [3.19e+03, 4.00e+00, 2.00e+00, 8.70e+01],
                    [7.88e+02, 2.00e+00, 1.00e+00, 8.00e+01],
                    [1.20e+03, 2.00e+00, 2.00e+00, 1.70e+01],
                    [1.56e+03, 2.00e+00, 1.00e+00, 1.80e+01],
                    [1.43e+03, 3.00e+00, 1.00e+00, 2.00e+01],
                    [1.22e+03, 2.00e+00, 1.00e+00, 1.50e+01],
                    [1.09e+03, 2.00e+00, 1.00e+00, 6.40e+01],
                    [8.48e+02, 1.00e+00, 1.00e+00, 1.70e+01],
                    [1.68e+03, 3.00e+00, 2.00e+00, 2.30e+01],
                    [1.77e+03, 3.00e+00, 2.00e+00, 1.80e+01],
                    [1.04e+03, 3.00e+00, 1.00e+00, 4.40e+01],
                    [1.65e+03, 2.00e+00, 1.00e+00, 2.10e+01],
                    [1.09e+03, 2.00e+00, 1.00e+00, 3.50e+01],
                    [1.32e+03, 3.00e+00, 1.00e+00, 1.40e+01],
                    [1.59e+03, 0.00e+00, 1.00e+00, 2.00e+01],
                    [9.72e+02, 2.00e+00, 1.00e+00, 7.30e+01],
                    [1.10e+03, 3.00e+00, 1.00e+00, 3.70e+01],
                    [1.00e+03, 2.00e+00, 1.00e+00, 5.10e+01],
                    [9.04e+02, 3.00e+00, 1.00e+00, 5.50e+01],
                    [1.69e+03, 3.00e+00, 1.00e+00, 1.30e+01],
                    [1.07e+03, 2.00e+00, 1.00e+00, 1.00e+02],
                    [1.42e+03, 3.00e+00, 2.00e+00, 1.90e+01],
                    [1.16e+03, 3.00e+00, 1.00e+00, 5.20e+01],
                    [1.94e+03, 3.00e+00, 2.00e+00, 1.20e+01],
                    [1.22e+03, 2.00e+00, 2.00e+00, 7.40e+01],
                    [2.48e+03, 4.00e+00, 2.00e+00, 1.60e+01],
                    [1.20e+03, 2.00e+00, 1.00e+00, 1.80e+01],
                    [1.84e+03, 3.00e+00, 2.00e+00, 2.00e+01],
                    [1.85e+03, 3.00e+00, 2.00e+00, 5.70e+01],
                    [1.66e+03, 3.00e+00, 2.00e+00, 1.90e+01],
                    [1.10e+03, 2.00e+00, 2.00e+00, 9.70e+01],
                    [1.78e+03, 3.00e+00, 2.00e+00, 2.80e+01],
                    [2.03e+03, 4.00e+00, 2.00e+00, 4.50e+01],
                    [1.78e+03, 4.00e+00, 2.00e+00, 1.07e+02],
                    [1.07e+03, 2.00e+00, 1.00e+00, 1.00e+02],
                    [1.55e+03, 3.00e+00, 1.00e+00, 1.60e+01],
                    [1.95e+03, 3.00e+00, 2.00e+00, 1.60e+01],
                    [1.22e+03, 2.00e+00, 2.00e+00, 1.20e+01],
                    [1.62e+03, 3.00e+00, 1.00e+00, 1.60e+01],
                    [8.16e+02, 2.00e+00, 1.00e+00, 5.80e+01],
                    [1.35e+03, 3.00e+00, 1.00e+00, 2.10e+01],
                    [1.57e+03, 3.00e+00, 1.00e+00, 1.40e+01],
                    [1.49e+03, 3.00e+00, 1.00e+00, 5.70e+01],
                    [1.51e+03, 2.00e+00, 1.00e+00, 1.60e+01],
                    [1.10e+03, 3.00e+00, 1.00e+00, 2.70e+01],
                    [1.76e+03, 3.00e+00, 2.00e+00, 2.40e+01],
                    [1.21e+03, 2.00e+00, 1.00e+00, 1.40e+01],
                    [1.47e+03, 3.00e+00, 2.00e+00, 2.40e+01],
                    [1.77e+03, 3.00e+00, 2.00e+00, 8.40e+01],
                    [1.65e+03, 3.00e+00, 1.00e+00, 1.90e+01],
                    [1.03e+03, 3.00e+00, 1.00e+00, 6.00e+01],
                    [1.12e+03, 2.00e+00, 2.00e+00, 1.60e+01],
                    [1.15e+03, 3.00e+00, 1.00e+00, 6.20e+01],
                    [8.16e+02, 2.00e+00, 1.00e+00, 3.90e+01],
                    [1.04e+03, 3.00e+00, 1.00e+00, 2.50e+01],
                    [1.39e+03, 3.00e+00, 1.00e+00, 6.40e+01],
                    [1.60e+03, 3.00e+00, 2.00e+00, 2.90e+01],
                    [1.22e+03, 3.00e+00, 1.00e+00, 6.30e+01],
                    [1.07e+03, 2.00e+00, 1.00e+00, 1.00e+02],
                    [2.60e+03, 4.00e+00, 2.00e+00, 2.20e+01],
                    [1.43e+03, 3.00e+00, 1.00e+00, 5.90e+01],
                    [2.09e+03, 3.00e+00, 2.00e+00, 2.60e+01],
                    [1.79e+03, 4.00e+00, 2.00e+00, 4.90e+01],
                    [1.48e+03, 3.00e+00, 2.00e+00, 1.60e+01],
                    [1.04e+03, 3.00e+00, 1.00e+00, 2.50e+01],
                    [1.43e+03, 3.00e+00, 1.00e+00, 2.20e+01],
                    [1.16e+03, 3.00e+00, 1.00e+00, 5.30e+01],
                    [1.55e+03, 3.00e+00, 2.00e+00, 1.20e+01],
                    [1.98e+03, 3.00e+00, 2.00e+00, 2.20e+01],
                    [1.06e+03, 3.00e+00, 1.00e+00, 5.30e+01],
                    [1.18e+03, 2.00e+00, 1.00e+00, 9.90e+01],
                    [1.36e+03, 2.00e+00, 1.00e+00, 1.70e+01],
                    [9.60e+02, 3.00e+00, 1.00e+00, 5.10e+01],
                    [1.46e+03, 3.00e+00, 2.00e+00, 1.60e+01],
                    [1.45e+03, 3.00e+00, 2.00e+00, 2.50e+01],
                    [1.21e+03, 2.00e+00, 1.00e+00, 1.50e+01],
                    [1.55e+03, 3.00e+00, 2.00e+00, 1.60e+01],
                    [8.82e+02, 3.00e+00, 1.00e+00, 4.90e+01],
                    [2.03e+03, 4.00e+00, 2.00e+00, 4.50e+01],
                    [1.04e+03, 3.00e+00, 1.00e+00, 6.20e+01],
                    [1.62e+03, 3.00e+00, 1.00e+00, 1.60e+01],
                    [8.03e+02, 2.00e+00, 1.00e+00, 8.00e+01],
                    [1.43e+03, 3.00e+00, 2.00e+00, 2.10e+01],
                    [1.66e+03, 3.00e+00, 1.00e+00, 6.10e+01],
                    [1.54e+03, 3.00e+00, 1.00e+00, 1.60e+01],
                    [9.48e+02, 3.00e+00, 1.00e+00, 5.30e+01],
                    [1.22e+03, 2.00e+00, 2.00e+00, 1.20e+01],
                    [1.43e+03, 2.00e+00, 1.00e+00, 4.30e+01],
                    [1.66e+03, 3.00e+00, 2.00e+00, 1.90e+01],
                    [1.21e+03, 3.00e+00, 1.00e+00, 2.00e+01],
                    [1.05e+03, 2.00e+00, 1.00e+00, 6.50e+01]])

y_train = np.array([300., 509.8, 394., 540., 415., 230., 560., 294., 718.2, 200.,
                    302., 468., 374.2, 388., 282., 311.8, 401., 449.8, 301., 502.,
                    340., 400.28, 572., 264., 304., 298., 219.8, 490.7, 216.96, 368.2,
                    280., 526.87, 237., 562.43, 369.8, 460., 374., 390., 158., 426.,
                    390., 277.77, 216.96, 425.8, 504., 329., 464., 220., 358., 478.,
                    334., 426.98, 290., 463., 390.8, 354., 350., 460., 237., 288.3,
                    282., 249., 304., 332., 351.8, 310., 216.96, 666.34, 330., 480.,
                    330.3, 348., 304., 384., 316., 430.4, 450., 284., 275., 414.,
                    258., 378., 350., 412., 373., 225., 390., 267.4, 464., 174.,
                    340., 430., 440., 216., 329., 388., 390., 356., 257.8])

X_features = ['size(sqft)','bedrooms','floors','age']

### Scale/normalize the training data

In [None]:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
print(f"Peak to Peak range by column in Raw        X:{np.ptp(X_train,axis=0)}")   
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}")

### Create and fit the regression model

In [None]:
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")
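`SGDRegressor` performs *stochastic* gradient descent: it updates $w$ and $b$ after each individual training example instead of after a full pass over the data, which is also why `t_` counts many more weight updates than `n_iter_` counts epochs. The sketch below is a simplified, hypothetical illustration of that idea in plain NumPy, using squared-error loss, a constant learning rate `alpha`, and reshuffling every epoch. It is not scikit-learn's implementation: by default `SGDRegressor` uses a decaying (`'invscaling'`) learning rate and adds a small L2 penalty, so its numbers will differ.

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=50, seed=0):
    """Hypothetical minimal SGD for linear regression with squared-error loss.

    w and b are updated after every single example; examples are reshuffled each epoch.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(m):
            err = X[i] @ w + b - y[i]   # prediction error on one example
            w -= alpha * err * X[i]     # gradient of 0.5 * err**2 with respect to w
            b -= alpha * err            # gradient of 0.5 * err**2 with respect to b
    return w, b

# Usage on synthetic, already-normalized data (made up): y = 3*x0 - 2*x1 + 5 + noise
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((200, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 5.0 + 0.1 * rng.standard_normal(200)
w, b = sgd_linear_regression(X_demo, y_demo)
print(w, b)   # should be close to [3, -2] and 5
```

Updating per example makes each step cheap, at the cost of noisier parameter trajectories than batch gradient descent; a shrinking learning rate, as scikit-learn uses by default, is one common way to damp that noise.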

### View parameters
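Note that the parameters are associated with the *normalized* input data. The fitted parameters are very close to those found with this data in the previous lab. They are not identical, because `SGDRegressor` updates the parameters one example at a time and, by default, adds a small L2 regularization penalty.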

In [None]:
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"model parameters:                   w: {w_norm}, b:{b_norm}")
print( "model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16")
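Because the model was trained on $x_j' = (x_j - \mu_j)/\sigma_j$, its predictions can also be written in terms of the raw features. Substituting into $f_{\mathbf{w},b}(\mathbf{x}) = \sum_j w_j \frac{x_j - \mu_j}{\sigma_j} + b$ and regrouping gives $w^{raw}_j = w_j/\sigma_j$ and $b^{raw} = b - \sum_j w_j \mu_j/\sigma_j$. A minimal sketch of that conversion on made-up data, using the fitted scaler's `mean_` and `scale_` attributes; the helper name `unnormalize_params` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

def unnormalize_params(w_norm, b_norm, scaler):
    """Hypothetical helper: map parameters fit on z-scored inputs back to raw-feature units."""
    w_raw = w_norm / scaler.scale_
    b_raw = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)
    return w_raw, b_raw

# Made-up data with features on very different scales: [size, bedrooms]
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.uniform(800, 3000, 100), rng.integers(1, 5, 100)])
y = 0.2 * X_raw[:, 0] + 10 * X_raw[:, 1] + 50 + rng.normal(0, 5, 100)

scaler = StandardScaler()
X_n = scaler.fit_transform(X_raw)
model = SGDRegressor(max_iter=1000, random_state=0).fit(X_n, y)

w_raw, b_raw = unnormalize_params(model.coef_, model.intercept_[0], scaler)
# Both parameter sets give the same predictions, up to floating-point rounding
print(np.allclose(model.predict(X_n), X_raw @ w_raw + b_raw))  # True
```

In this lab you would pass `w_norm`, `b_norm`, and `scaler` to the same helper; the raw-unit weights are easier to interpret (for example, price change per square foot) but are no longer on comparable scales.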

### Make predictions
Predict the targets of the training data in two ways: with the `predict` method, and directly from the parameters as $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$.

In [None]:
# make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(X_norm)
# make a prediction using w,b. 
y_pred = np.dot(X_norm, w_norm) + b_norm  
print(f"prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgd)}")  # tolerant float comparison

print(f"Prediction on training set:\n{y_pred[:4]}" )
print(f"Target values \n{y_train[:4]}")
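The model only understands normalized inputs, so a new example must be scaled with the *same* fitted scaler (`transform`, not `fit_transform`) before calling `predict`; refitting on the new example would use the wrong mean and standard deviation. A minimal sketch on made-up data; the house values below are illustrative, not from the lab.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Made-up training data: [size(sqft), bedrooms], target roughly linear in both
rng = np.random.default_rng(0)
X_tr = np.column_stack([rng.uniform(800, 3000, 100), rng.integers(1, 5, 100)])
y_tr = 0.2 * X_tr[:, 0] + 10 * X_tr[:, 1] + 50 + rng.normal(0, 5, 100)

scaler = StandardScaler().fit(X_tr)          # learn mean/std from the training data only
model = SGDRegressor(max_iter=1000, random_state=0).fit(scaler.transform(X_tr), y_tr)

x_house = np.array([[1200.0, 3.0]])          # one new example, shape (1, n_features)
x_house_norm = scaler.transform(x_house)     # reuse the training mean/std
pred = model.predict(x_house_norm)
print(pred)   # should be close to the noise-free value 0.2*1200 + 10*3 + 50 = 320
```

Note that `x_house` is two-dimensional: scikit-learn estimators and transformers expect an array of shape `(n_samples, n_features)`, even for a single example.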

### Plot Results
Let's plot the predictions and the target values against each of the original (unnormalized) features.

In [None]:
# plot predictions and targets vs original features    
fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X_train[:,i],y_train, label = 'target')
    ax[i].set_xlabel(X_features[i])
    ax[i].scatter(X_train[:,i],y_pred,color=dlc["dlorange"], label = 'predict')
ax[0].set_ylabel("Price"); ax[0].legend();
fig.suptitle("target versus prediction using z-score normalized model")
plt.show()

## Congratulations!
In this lab you:
- utilized an open-source machine learning toolkit, scikit-learn
- implemented linear regression using gradient descent and feature normalization from that toolkit