# Exciting Stuff

In this notebook you will solve a problem that was posed as follows for CS189 Spring 2017 (with minor modifications): 

"Jordan is planning the frat party of the semester. He’s completely stocked up on Franzia. Unfortunately, the
labels for 497 boxes (test set) have been scratched off, and he needs to quickly find out which boxes contain
Red wine (label 1) and White wine (label 0). Fortunately for him, the boxes still have their Nutrition Facts
(features) intact and detail the chemical composition of the wine inside the boxes (the description of these
features and the features themselves are provided in data.mat). He also has 6,000 boxes with Nutrition
Facts and labels intact (train set). Help Jordan figure out what the labels should be for the 497 mystery boxes."

Dataset creds: Jonathan Shewchuk's CS189 Spring 2017

In [2]:
from scipy.io import loadmat
import numpy as np
import matplotlib.pyplot as plt

## Important Functions
Fill these in so that we can perform training. 

In [1]:
def sigmoid(X, w):
    """
    Compute the elementwise sigmoid of the product Xw
    Data in X should be rows, weights are a column. 
    returns: s(Xw)
    """
    return 1 / (1 + np.exp(-np.dot(X, w)))

def gradient(X, y, w, onept, lamb=0):
    """
    Compute gradient of regularized loss function. 
    Accommodate the case where X is just one data point (onept=True).
    returns: gradient (should match dimensions of w)
    """
    if onept:
        return 2 * lamb * w - ((y - sigmoid(X, w)) * X).reshape(w.size, 1)
    return 2 * lamb * w - np.dot(X.T, y - sigmoid(X, w)) / y.size

def loss(X, y, w, lamb=0):
    """
    Compute average loss for the data in X, labels in y, params w
    returns: scalar value of average loss
    """
    p = np.ravel(sigmoid(X, w))  # predicted probabilities, flattened to 1-D
    t = np.ravel(y)              # labels, flattened to match p
    log_likelihood = np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
    return lamb * np.linalg.norm(w) ** 2 - log_likelihood / t.size
    
def accuracy(X, y, w):
    """
    Compute accuracy for data in X, labels in y, params w
    returns: scalar value of average accuracy
    """
    results = np.round(sigmoid(X, w))
    # Flatten both sides so the comparison is elementwise whatever their shapes
    return np.mean(np.ravel(results) == np.ravel(y))

## Load in the Data
The data is stored in a .mat file, which we read with `loadmat`. The returned object is a dictionary whose values are numpy arrays. 

In [3]:
winedata = loadmat('./data.mat')
winedata.keys()

dict_keys(['description', 'X', '__header__', 'X_test', '__version__', '__globals__', 'y'])
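
To get a feel for what `loadmat` returns without touching `data.mat`, the sketch below writes a tiny made-up .mat file to a temporary directory and reads it back. The array contents and the file name `toy.mat` are invented for illustration; only the key names `X` and `y` mirror the real file.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Made-up stand-in data (NOT the wine dataset): 4 points, 3 features.
toy_X = np.arange(12, dtype=float).reshape(4, 3)
toy_y = np.array([[0], [1], [1], [0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy.mat")  # hypothetical file name
    savemat(path, {"X": toy_X, "y": toy_y})
    toy = loadmat(path)

# loadmat adds bookkeeping keys (such as '__header__') next to the saved arrays.
print(sorted(toy.keys()))
# Arrays come back 2-D, so the labels here form a column with one row per point.
print(toy["X"].shape, toy["y"].shape)
```

The 2-D label shape matters later: it is why the weights should also be a column when they are combined with `y`.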

## Preprocessing Data
Let's add a bias feature to improve the capacity of our model. 
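
As a toy illustration of the bias trick (the array and weights below are made up, not wine data): appending a column of ones lets the last weight act as an intercept, because the product of an augmented row with the weights equals the original product plus that last weight.

```python
import numpy as np

# Made-up feature matrix: 3 points, 2 features.
toy = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])

# Append a column of ones, one entry per row.
toy_aug = np.concatenate([toy, np.ones((toy.shape[0], 1))], axis=1)

# Hypothetical weights as a column; the last entry plays the role of the bias.
w = np.array([[0.5], [-1.0], [2.0]])

with_column = toy_aug @ w             # bias handled by the extra column
explicit_bias = toy @ w[:-1] + w[-1]  # bias added by hand

print(toy_aug.shape)
print(np.allclose(with_column, explicit_bias))
```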



In [6]:
wineTrain = winedata['X'] 
wineLabels = winedata['y']

wineTrain = np.concatenate([wineTrain, np.ones((wineTrain.shape[0], 1))], axis=1) # make sure you understand this line
wineTrain.shape 

weights = np.random.rand(wineTrain.shape[1], 1)  # column vector, as sigmoid() and gradient() expect

## Batch Gradient Descent
- Create an empty list of loss values which we will fill and visualize.
- Perform training using the entire dataset for gradient calculations

In [None]:
# I know it was quick in the lecture, but training is basically just a loop. 
# Loop over something, change the weights in every iteration of the loop. 
# Batch gradient descent means using *all* of the data points for every gradient calculation. 
# Pick an epsilon as the step size, a.k.a. learning rate (don't make it too big)
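
One possible shape for that loop, sketched on synthetic data rather than the wine set so it runs on its own. The helper names `batch_gradient` and `avg_loss`, the step size, and the step count are assumptions chosen for this toy problem, and regularization is left out; treat it as a starting point, not the reference solution.

```python
import numpy as np

def sigmoid(X, w):
    return 1 / (1 + np.exp(-np.dot(X, w)))

def batch_gradient(X, y, w):
    # Gradient of the average (unregularized) logistic loss over ALL points.
    return -np.dot(X.T, y - sigmoid(X, w)) / y.size

def avg_loss(X, y, w):
    p = sigmoid(X, w)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic stand-in data (NOT the wine dataset): 200 points, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + 0.5 > 0).astype(float).reshape(-1, 1)
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)  # bias column

w = np.zeros((X.shape[1], 1))  # weights as a column
epsilon = 0.5                  # step size, picked by hand for this toy problem
losses = []

for step in range(200):
    losses.append(avg_loss(X, y, w))           # record loss before the update
    w = w - epsilon * batch_gradient(X, y, w)  # one step using every point

print(losses[0], losses[-1])
```

Starting from zero weights makes every prediction 0.5, so the first recorded loss is log 2; the list `losses` then holds one value per step, ready to plot.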


## Visualize The Loss 
- Plot loss values with respect to every training step. 
- How can you explain the shape that this graph takes? 

In [None]:
# Use matplotlib 
# just dots will be fine
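
A minimal plotting sketch, assuming the losses were collected in a plain list with one entry per training step. The function name `plot_losses` is hypothetical, and the values passed in below are placeholders for demonstration only, not real training output.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_losses(losses):
    """Plot one dot per training step and return the Axes."""
    fig, ax = plt.subplots()
    ax.plot(np.arange(len(losses)), losses, ".")  # "." draws dots, no line
    ax.set_xlabel("training step")
    ax.set_ylabel("average loss")
    return ax

# Placeholder values only; pass your own recorded losses instead.
placeholder_losses = list(np.log(2) * np.exp(-0.03 * np.arange(100)))
ax = plot_losses(placeholder_losses)
```

In a notebook the figure is displayed when the cell finishes; in a script, call `plt.show()` afterwards.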

## Repeat Training Exercise with SGD
- Train using the gradient at only one randomly chosen point per step instead of the whole dataset. 
- Plot the losses. Why does the graph take the shape that it does?

In [None]:
# What's the difference between batch and SGD? 
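
A sketch of the stochastic version, again on synthetic data so it is self-contained. The names `point_gradient` and `avg_loss`, the step size, and the number of steps are assumptions for this toy setup, and regularization is omitted.

```python
import numpy as np

def sigmoid(X, w):
    return 1 / (1 + np.exp(-np.dot(X, w)))

def point_gradient(x, y_i, w):
    # Gradient of the (unregularized) logistic loss at a single point.
    return -((y_i - sigmoid(x, w)) * x).reshape(w.size, 1)

def avg_loss(X, y, w):
    p = sigmoid(X, w)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic stand-in data (NOT the wine dataset): 200 points, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + 0.5 > 0).astype(float).reshape(-1, 1)
X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)  # bias column

w = np.zeros((X.shape[1], 1))
epsilon = 0.1  # smaller step than the batch sketch, since each gradient is noisy
losses = []

for step in range(2000):
    losses.append(avg_loss(X, y, w))  # loss over the full set, for plotting
    i = rng.integers(X.shape[0])      # pick one training point at random
    w = w - epsilon * point_gradient(X[i], y[i], w)

print(losses[0], losses[-1])
```

The loop differs from a batch loop only in the gradient call: one randomly chosen row instead of the full matrix. The loss is still evaluated on all points so that the two curves can be compared.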