## Use scikit-learn to implement linear regression

***Import Necessary Libraries***
*  sklearn.preprocessing.StandardScaler performs z-score normalization
*  sklearn.linear_model.SGDRegressor fits a linear model with stochastic gradient descent; it performs best with normalized inputs

In [15]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

***Load data from text file***
* The first 4 columns are the features; the 5th column is the target

In [16]:
data = np.loadtxt("houses.txt", delimiter = ',')

X_train = data[:, :4]
y_train = data[:, 4]

print("Shape of X_train is: ", X_train.shape)
print("Shape of y_train is: ", y_train.shape)
print("No of training examples, m = ", len(X_train))

Shape of X_train is:  (100, 4)
Shape of y_train is:  (100,)
No of training examples, m =  100
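
The column slicing above can be checked on a tiny in-memory file. This is a minimal sketch: `csv_text`, `data_demo`, `X_demo`, and `y_demo` are made-up stand-ins for illustration, not the contents of houses.txt.

```python
import io
import numpy as np

# Two made-up rows with 4 feature columns and 1 target column (not real data)
csv_text = "1.0,2.0,3.0,4.0,10.0\n5.0,6.0,7.0,8.0,20.0\n"
data_demo = np.loadtxt(io.StringIO(csv_text), delimiter=",")

X_demo = data_demo[:, :4]   # every row, columns 0..3 -> features
y_demo = data_demo[:, 4]    # every row, column 4    -> target

print(X_demo.shape, y_demo.shape)
```

Indexing with a single integer (`4`) drops that axis, which is why the target comes back as a 1-D array.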


***Scale / Normalize the training data***
* StandardScaler().fit_transform(X_train) learns each column's mean and standard deviation from X_train and returns the z-score normalized data

In [17]:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

print(f"Peak to Peak range by column in Raw        X:{np.ptp(X_train,axis=0)}")   
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}")

Peak to Peak range by column in Raw        X:[2.406e+03 4.000e+00 1.000e+00 9.500e+01]
Peak to Peak range by column in Normalized X:[5.83735704 6.12923357 2.06021411 3.68430905]
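
To make the z-score step concrete, the sketch below repeats it by hand and compares the result with StandardScaler. It runs on synthetic data, since houses.txt is not assumed to be available here; `X_demo`, `mu`, and `sigma` are illustrative names only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train: 100 rows, 4 columns on very different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[1500.0, 3.0, 1.5, 40.0],
                    scale=[400.0, 1.0, 0.5, 25.0],
                    size=(100, 4))

# z-score by hand: subtract the column mean, divide by the column std
mu = X_demo.mean(axis=0)
sigma = X_demo.std(axis=0)      # population std (ddof=0), as StandardScaler uses
X_manual = (X_demo - mu) / sigma

# Same operation through scikit-learn
X_scaled = StandardScaler().fit_transform(X_demo)

print(np.allclose(X_manual, X_scaled))
print(np.ptp(X_scaled, axis=0))  # peak-to-peak ranges are now comparable
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the gradient updates.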


***Create and Fit the Regression Model***

In [18]:
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)

print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")

SGDRegressor()
number of iterations completed: 129, number of weight updates: 12901.0
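
The two counters printed above are related: `n_iter_` counts passes over the training set (epochs), while `t_` counts individual weight updates, which scikit-learn documents as `n_iter_ * n_samples + 1`. With 100 examples, 129 epochs gives 12901 updates. Training can stop before `max_iter` because, by default, SGDRegressor stops once the loss has not improved by at least `tol` for several consecutive epochs. A small sketch on synthetic data (the data and the name `model` are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic, already well-scaled data: 100 examples, 4 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + 10.0 \
         + rng.normal(scale=0.1, size=100)

model = SGDRegressor(max_iter=1000, random_state=0).fit(X_demo, y_demo)

m = X_demo.shape[0]
# Each epoch performs one weight update per training example
print(model.n_iter_, model.t_, model.n_iter_ * m + 1)
```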


***View Parameters***
* The parameters apply to the normalized input data, not to the raw feature values

In [19]:
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"model parameters:                   w: {w_norm}, b:{b_norm}")

model parameters:                   w: [110.22503612 -21.3047505  -32.46757047 -37.82735321], b:[362.23592071]
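
Because the model was fit on normalized inputs, its weights are expressed per standard deviation of each feature. If weights on the original feature scale are wanted, they can be recovered from the scaler's `mean_` and `scale_` attributes. The sketch below shows one way to do this on synthetic data; `w_raw`, `b_raw`, and the data itself are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with columns on different scales
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[1500.0, 3.0, 1.5, 40.0],
                    scale=[400.0, 1.0, 0.5, 25.0],
                    size=(100, 4))
y_demo = (0.2 * X_demo[:, 0] - 20.0 * X_demo[:, 1] + 5.0 * X_demo[:, 3]
          + rng.normal(scale=5.0, size=100))

scaler_demo = StandardScaler()
X_demo_norm = scaler_demo.fit_transform(X_demo)
model = SGDRegressor(max_iter=1000, random_state=0).fit(X_demo_norm, y_demo)

# x_norm = (x - mean) / scale, so  w_norm . x_norm + b_norm
#   = (w_norm / scale) . x + (b_norm - sum(w_norm * mean / scale))
w_raw = model.coef_ / scaler_demo.scale_
b_raw = model.intercept_[0] - np.sum(
    model.coef_ * scaler_demo.mean_ / scaler_demo.scale_)

# Both parameter sets describe the same model
pred_norm = model.predict(X_demo_norm)
pred_raw = X_demo @ w_raw + b_raw
print(np.allclose(pred_norm, pred_raw))
```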


***Make predictions***

In [24]:
# make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(X_norm)

# first 4 training examples
print(f"Prediction on training set:\n{y_pred_sgd[:4]}" )
print(f"Target values \n{y_train[:4]}")

Prediction on training set:
[248.70377846 295.6132571  485.67271642 389.62881957]
Target values 
[271.5 300.  509.8 394. ]
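
For a linear model, `predict` amounts to the dot product of the normalized inputs with the weights plus the intercept. Any new example has to go through the same fitted scaler (`transform`, not `fit_transform`) before prediction. A minimal sketch on synthetic data, where `x_new` is a hypothetical raw example with four made-up feature values:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data
rng = np.random.default_rng(2)
X_demo = rng.normal(loc=[1500.0, 3.0, 1.5, 40.0],
                    scale=[400.0, 1.0, 0.5, 25.0],
                    size=(100, 4))
y_demo = (0.2 * X_demo[:, 0] - 20.0 * X_demo[:, 1] + 5.0 * X_demo[:, 3]
          + rng.normal(scale=5.0, size=100))

scaler_demo = StandardScaler()
X_demo_norm = scaler_demo.fit_transform(X_demo)
model = SGDRegressor(max_iter=1000, random_state=0).fit(X_demo_norm, y_demo)

# predict() matches the explicit linear formula w . x + b
y_pred = model.predict(X_demo_norm)
y_manual = X_demo_norm @ model.coef_ + model.intercept_
print(np.allclose(y_pred, y_manual))

# New raw example: reuse the statistics learned from the training set
x_new = np.array([[1200.0, 3.0, 1.0, 40.0]])   # hypothetical values
x_new_norm = scaler_demo.transform(x_new)
y_new = model.predict(x_new_norm)
print(y_new)
```

Calling `fit_transform` on the new example instead would recompute the mean and standard deviation from that single row and produce a meaningless input.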
