# Developing Stochastic Gradient Descent

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from linear_regressor import LinearRegressor
from sgd import sgd
```

Your Goal: Design and implement a generalized SGD system which takes a `Trainable` model as an input, and trains the model using stochastic gradient descent.
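The real `Trainable` interface lives elsewhere in this project and is not shown here. As a minimal sketch, assume a hypothetical interface with `parameters()`, `gradients(X, y)` and `loss(X, y)` methods. Only `parameters()` and `loss()` are used in this notebook; `gradients()` is an assumption. A generic minibatch SGD loop written against such an interface could look like this. `sgd_sketch`, `ConstantModel` and `toy_model` are hypothetical names chosen so they do not shadow the notebook's `sgd` import or `model` variable.

```python
from typing import Protocol, Sequence

import numpy as np


class Trainable(Protocol):
    """Hypothetical interface; the project's real `Trainable` may differ."""

    def parameters(self) -> Sequence[np.ndarray]:
        """Return the parameter arrays; SGD updates them in place."""
        ...

    def gradients(self, X: np.ndarray, y: np.ndarray) -> Sequence[np.ndarray]:
        """Return d(loss)/d(parameter) on (X, y), in the same order as parameters()."""
        ...

    def loss(self, X: np.ndarray, y: np.ndarray) -> float:
        """Return the scalar loss on (X, y)."""
        ...


def sgd_sketch(model: Trainable, X: np.ndarray, y: np.ndarray, batch_size: int = 128,
               num_iter: int = 1000, learning_rate: float = 3e-3, seed: int = 0) -> None:
    """Minibatch SGD against the hypothetical Trainable interface above."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(num_iter):
        # Draw a random minibatch of row indices (no repeats within one batch)
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        grads = model.gradients(X[idx], y[idx])
        for param, grad in zip(model.parameters(), grads):
            # `-=` modifies the array in place, so the model sees the update
            param -= learning_rate * grad


class ConstantModel:
    """Toy Trainable that ignores X and predicts one learned constant c."""

    def __init__(self) -> None:
        self.c = np.zeros(1)

    def parameters(self):
        return [self.c]

    def gradients(self, X, y):
        # d/dc of mean((c - y)^2) is 2 * mean(c - y)
        return [np.array([2.0 * np.mean(self.c[0] - y)])]

    def loss(self, X, y):
        return float(np.mean((self.c[0] - y) ** 2))


# Usage: every target is 5.0, so SGD should drive c toward 5.0
X_toy = np.zeros((200, 1))
y_toy = np.full(200, 5.0)
toy_model = ConstantModel()
sgd_sketch(toy_model, X_toy, y_toy, batch_size=32, num_iter=500, learning_rate=0.1)
print(toy_model.c, toy_model.loss(X_toy, y_toy))
```

The loop never looks inside the model: it only needs parameter arrays it can update in place and gradients in the matching order. That separation is what lets one SGD routine train any model that implements the interface.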

To help you work through this task, we will use a small example dataset to test the model and your SGD implementation.

```python
wine_data = pd.read_csv("./winequality-red.csv")
wine_data
```

|  | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |


The goal is to predict each wine's `quality` score from all of its other properties. To do this, we will use a linear regressor.

```python
model = LinearRegressor(11, 3)
weights, bias = model.parameters()
print(f"Weights: \n{weights}")
print(f"Bias: \n{bias}")
```

```
Weights:
[-0.00116493  0.00036832  0.00096218 -0.00084818  0.00114733  0.00044903
  0.00071257  0.00043261 -0.00071681 -0.00045147 -0.00183577]
Bias:
[0.00020033]
```


The `weights` vector holds the coefficients of the linear regression plane of best fit (one slope per input feature), and `bias` is the intercept term of that plane (its offset from the origin). Both are initialized to small random numbers, as the printed values above show.
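For reference, here is the math behind a linear regressor trained with mean squared error. This assumes the loss is the plain MSE with no extra factor of $\frac{1}{2}$; the project's `LinearRegressor.loss` may scale it differently. For an input $\mathbf{x}_i \in \mathbb{R}^D$ with target $y_i$, the prediction, the loss over $N$ examples, and its gradients are:

$$
\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b, \qquad
\mathcal{L}(\mathbf{w}, b) = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2
$$

$$
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{2}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right) \mathbf{x}_i, \qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{2}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)
$$

One SGD step evaluates the same gradients on a random minibatch $B$ (replace $N$ by $|B|$ and sum only over $i \in B$) and moves against them with learning rate $\eta$:

$$
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \frac{\partial \mathcal{L}_B}{\partial \mathbf{w}}, \qquad
b \leftarrow b - \eta \, \frac{\partial \mathcal{L}_B}{\partial b}
$$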

## Setup

First, let's extract the data from the DataFrame into NumPy arrays that we can use to train the model. This consists of two steps:
- Extracting the `quality` column as the target values to predict with regression
- Forming a NumPy array out of the remaining columns (recall that the linear regressor was initialized to take input vectors in $\mathbb{R}^D$, here with $D = 11$)

```python
ground_truths = wine_data["quality"].to_numpy(dtype=np.float64)
wine_data.drop("quality", axis=1, inplace=True)
inputs = wine_data.to_numpy(dtype=np.float64)
```

Then we'll split the entire dataset into a training set and a testing set:
- The training set is used to train the model and learn the coefficients of the plane of best fit.
- The testing set is used to evaluate model performance on unseen data. **DO NOT use the testing set to train the model!** Otherwise the test loss would no longer tell us whether the plane of best fit generalizes to wines it has not seen.

```python
X_train, X_test, y_train, y_test = train_test_split(inputs, ground_truths)
```
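When neither `test_size` nor `train_size` is passed, `train_test_split` holds out 25% of the rows for testing and shuffles before splitting, keeping each input row paired with its target. The sketch below checks this on synthetic data (the `_demo` arrays are made up for illustration); passing `random_state` makes the shuffle reproducible between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 synthetic rows: row i is [3i, 3i+1, 3i+2] and its target is i
X_demo = np.arange(300, dtype=np.float64).reshape(100, 3)
y_demo = np.arange(100, dtype=np.float64)

# Default split: 25% test, 75% train; random_state fixes the shuffle
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr_demo.shape, X_te_demo.shape)  # (75, 3) (25, 3)

# Rows are shuffled together with their targets, so X[:, 0] / 3 still equals y
print(np.array_equal(X_tr_demo[:, 0] / 3, y_tr_demo))  # True
```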

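The final step is to normalize each input feature to zero mean and unit variance (standardization). The raw features live on very different scales (for example, density is close to 1 while total sulfur dioxide is in the tens), and putting them on a common scale lets a single learning rate work well for every weight during gradient descent.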

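```python
def normalize(data, mean, std):
    # Standardize each feature (column) to zero mean and unit variance
    return (data - mean) / std

# Compute the statistics on the training set only and reuse them for the test set,
# so both sets are scaled identically and no test-set information leaks in
train_mean = np.mean(X_train, axis=0, keepdims=True)
train_std = np.std(X_train, axis=0, keepdims=True)
X_train = normalize(X_train, train_mean, train_std)
X_test = normalize(X_test, train_mean, train_std)
```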
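A quick sanity check of what standardization does, on synthetic data loosely mimicking the spread of density and total sulfur dioxide in the table above (the arrays here are made up for illustration). Using statistics computed on the training split, the training columns end up with mean 0 and standard deviation 1, while the test columns only come out approximately standardized, which is expected since the test set never contributed to those statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
train_demo = np.column_stack([rng.normal(0.996, 0.002, 500), rng.normal(46.0, 30.0, 500)])
test_demo = np.column_stack([rng.normal(0.996, 0.002, 100), rng.normal(46.0, 30.0, 100)])

# Statistics come from the training split only
mu = train_demo.mean(axis=0, keepdims=True)
sigma = train_demo.std(axis=0, keepdims=True)
train_z = (train_demo - mu) / sigma
test_z = (test_demo - mu) / sigma

print(train_z.mean(axis=0))  # both entries essentially 0
print(train_z.std(axis=0))   # both entries essentially 1
print(test_z.mean(axis=0))   # near 0, but not exactly
```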

## Implementation

You will implement `sgd()` in the `sgd.py` file. The code below runs SGD on the linear regressor for the wine quality dataset and prints the resulting loss on the test set. If you implemented SGD correctly, the model should reach a fairly low test loss (I was able to achieve a mean squared error of 0.364).
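To judge what counts as "fairly low", it helps to compare against a baseline that ignores the features entirely and always predicts the mean training quality. The sketch below uses synthetic integer scores (`y_tr` and `y_te` are assumptions standing in for `y_train` and `y_test`). It also checks the identity that a constant prediction $c$ has MSE $\operatorname{Var}(y) + (c - \bar{y})^2$ on targets $y$, so the baseline's test MSE is essentially the variance of the test qualities.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic integer "quality" scores in 3..8, standing in for y_train / y_test
y_tr = rng.integers(3, 9, size=1200).astype(np.float64)
y_te = rng.integers(3, 9, size=400).astype(np.float64)

# Baseline: ignore the inputs and always predict the training-set mean
baseline_pred = np.full_like(y_te, y_tr.mean())
baseline_mse = np.mean((baseline_pred - y_te) ** 2)

# For a constant prediction c: MSE = Var(y_te) + (c - mean(y_te))^2
decomposed = np.var(y_te) + (y_tr.mean() - y_te.mean()) ** 2
print(baseline_mse, decomposed)  # the two values agree
```

Substituting the notebook's `y_train` and `y_test` for the synthetic arrays gives the number a trained linear regressor should beat.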

**Challenge** (something fun to try on your own): Try to get an even lower mean squared error. However, look at what this does to the predictions, and think about why this may not be a great idea.

```python
sgd(model, X_train, y_train, batch_size=128, num_iter=1000, learning_rate=3e-3)
print(model.loss(X_test, y_test))
```