# Linear Regression using Scikit-learn

Scikit-learn is a machine learning library that provides implementations of many common machine learning algorithms, along with tools for tasks such as data preprocessing.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor       # linear regression fitted by stochastic gradient descent
from sklearn.preprocessing import StandardScaler    # z-score (standard score) normalization
```

# Gradient Descent
Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor).  Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'.
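As a sketch of what `StandardScaler` computes: z-score normalization maps each column `j` to `(x_j - mu_j) / sigma_j`, where `mu_j` is the column mean and `sigma_j` is the population standard deviation (`ddof=0`, which is what `StandardScaler` uses). The snippet below applies the formula by hand with NumPy and checks it against `StandardScaler` on the same training data used in this notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])

# Manual z-score: subtract each column's mean, divide by its population std (ddof=0)
mu = x_train.mean(axis=0)
sigma = x_train.std(axis=0)
X_manual = (x_train - mu) / sigma

# StandardScaler learns the same per-column statistics in fit()
X_sklearn = StandardScaler().fit_transform(x_train)

print(np.allclose(X_manual, X_sklearn))  # True
```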

```python
# 3 training examples (rows), 4 features each (columns)
x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
```

# Scaling/Normalizing

```python
scaler = StandardScaler()
X_norm = scaler.fit_transform(x_train)
print(f"Peak to peak value by column before normalization X:{np.ptp(x_train, axis=0)}")
print(f"Peak to peak value by column after normalization X:{np.ptp(X_norm, axis=0)}")
```

```
Peak to peak value by column before normalization X:[1252    3    1   10]
Peak to peak value by column after normalization X:[2.44549494 2.40535118 2.12132034 2.44948974]
```
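The fitted scaler keeps the per-column statistics it learned in its `mean_` and `scale_` attributes. The short check below confirms that each normalized column has mean 0 and standard deviation 1, and that `inverse_transform` maps normalized values back to the original units. It is a sketch on the same training data, not part of the original lab.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
scaler = StandardScaler()
X_norm = scaler.fit_transform(x_train)

# Per-column statistics learned during fit()
print("mean_: ", scaler.mean_)
print("scale_:", scaler.scale_)

# Each normalized column now has mean 0 and standard deviation 1
print(np.allclose(X_norm.mean(axis=0), 0), np.allclose(X_norm.std(axis=0), 1))  # True True

# inverse_transform maps normalized values back to the original units
print(np.allclose(scaler.inverse_transform(X_norm), x_train))  # True
```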


# Create and fit Regression Model

```python
sgdr = SGDRegressor(max_iter=10000)
sgdr.fit(X_norm, y_train)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_} and number of weight updates: {sgdr.t_}")
```

```
SGDRegressor(max_iter=10000)
number of iterations completed: 1239 and number of weight updates: 3718.0
```
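`SGDRegressor` shuffles the training samples every epoch (`shuffle=True` by default), so the iteration count and the learned weights above can differ between runs. The sketch below assumes you want repeatable results and passes a fixed `random_state`, which the original cell does not set. It also checks the relationship documented for `t_`, namely `t_ == n_iter_ * n_samples + 1`. That relationship explains the output above: 1239 epochs over 3 samples gives 3718 updates.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
X_norm = StandardScaler().fit_transform(x_train)

def fit_once(seed):
    # A fixed random_state (an assumption, not in the original) makes the shuffling reproducible
    model = SGDRegressor(max_iter=10000, random_state=seed)
    model.fit(X_norm, y_train)
    return model

model_a, model_b = fit_once(0), fit_once(0)
print(np.array_equal(model_a.coef_, model_b.coef_))  # True: same seed, same weights

# One weight update per sample per epoch, plus the starting value t_ = 1
n_samples = X_norm.shape[0]
print(model_a.t_ == model_a.n_iter_ * n_samples + 1)  # True
```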


# View Parameters

```python
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"Model Parameters: \n w={w_norm} and b={b_norm}")
```

```
Model Parameters: 
 w=[ 38.05284259  41.53878889 -30.94608744  36.34545478] and b=[289.50468912]
```
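The parameters above apply to the normalized features, not the raw ones. Each normalized feature is `(x_j - mean_j) / scale_j`, so the model `w_norm · (x - mean) / scale + b_norm` expands into an equivalent model on raw inputs. That model has `w_raw = w_norm / scale` and `b_raw = b_norm - sum(w_norm * mean / scale)`. The sketch below performs this conversion and checks it against `sgdr.predict`. It assumes `random_state=0` so the run is repeatable, so its weights will not exactly match the values printed above.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

scaler = StandardScaler()
X_norm = scaler.fit_transform(x_train)
sgdr = SGDRegressor(max_iter=10000, random_state=0).fit(X_norm, y_train)

w_norm, b_norm = sgdr.coef_, sgdr.intercept_

# Fold the scaling into the parameters so they apply to raw (unscaled) features
w_raw = w_norm / scaler.scale_
b_raw = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)

print(f"w_raw={w_raw}, b_raw={b_raw}")
# The raw-feature model and the normalized-feature model give the same predictions
print(np.allclose(x_train @ w_raw + b_raw, sgdr.predict(X_norm)))  # True
```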


# Predict values

```python
# Make a prediction using sgdr.predict()
y_pred_sgdr = sgdr.predict(X_norm)
# Make a prediction using w, b
y_pred = np.dot(X_norm, w_norm) + b_norm

# Compare with a floating-point tolerance rather than exact equality
print(f"Prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgdr)}")

print(f"Prediction on training set: {y_pred}")
print(f"Target values: {y_train}")
```

```
Prediction using np.dot() and sgdr.predict match: True
Prediction on training set: [459.47444694 231.5664008  177.47321963]
Target values: [460 232 178]
```
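Predicting for an example outside the training set requires two steps. First, normalize the new input with the scaler already fitted on the training data, using `transform` rather than a fresh `fit_transform`. Then pass the result to the model. The sketch below wraps both steps in a hypothetical helper, `predict_raw`, so the scaling step cannot be skipped. The example `x_house` is illustrative and does not come from the original lab, and `random_state=0` is assumed for repeatability.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

x_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

scaler = StandardScaler()
X_norm = scaler.fit_transform(x_train)
sgdr = SGDRegressor(max_iter=10000, random_state=0).fit(X_norm, y_train)

def predict_raw(x_raw):
    """Hypothetical helper: scale raw features with the training statistics, then predict."""
    x_raw = np.atleast_2d(x_raw)  # predict() expects shape (n_samples, n_features)
    return sgdr.predict(scaler.transform(x_raw))

x_house = np.array([1200, 3, 1, 40])  # hypothetical new example
print(f"Predicted target: {predict_raw(x_house)}")
```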


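Scikit-learn's `StandardScaler` and `SGDRegressor` replace the hand-written normalization and gradient-descent code from earlier labs with a few library calls. The fitted model's predictions on the training set closely match the target values.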