# Homework 2

Implement early stopping: halt training once the absolute difference between consecutive loss values has stayed below 0.001 for five consecutive epochs.

During training, the array `loss` already holds the loss for the current iteration and all previous ones. On every iteration of the loop, check whether the absolute difference between the current loss and the loss from the previous step is smaller than 0.001. Then find a way to track whether this happens on 5 consecutive steps (for example, a counter that resets to zero whenever the difference is 0.001 or larger), and if so, use `break` to halt the loop.
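The counting logic described above can be sketched on its own, separate from the training loop. The helper name `early_stop_index`, its parameters `tol` and `patience`, and the synthetic loss curve are all assumptions made for illustration; they are not part of the homework code.

```python
import numpy as np

def early_stop_index(losses, tol=0.001, patience=5):
    """Return the index at which training would halt, or None if it never does.

    Hypothetical helper: walks a sequence of loss values and counts how many
    consecutive steps changed the loss by less than `tol`.
    """
    streak = 0
    for i in range(1, len(losses)):
        if abs(losses[i] - losses[i - 1]) < tol:
            streak += 1   # one more small change in a row
        else:
            streak = 0    # a large change breaks the streak
        if streak == patience:
            return i
    return None

# Made-up loss curve that decays exponentially toward 1.0
toy_losses = 1.0 + np.exp(-0.1 * np.arange(200))
stop_at = early_stop_index(toy_losses)
print("would stop at index", stop_at)
```

Inside a real training loop the same counter would be updated once per epoch and followed by `break`, rather than scanning a finished list of losses.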

At what iteration does your training loop with early stopping stop? How close are the betas to what they would be with the full 5000 iterations?
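One way to answer the second question is to report the absolute gap between two sets of betas. The sketch below uses a hypothetical helper `beta_gap`. As example inputs it takes the values printed at epoch 400 in the training log further down and the 5,000-iteration betas quoted in that same output; the betas at the actual stopping iteration are not printed there, so they are not used here.

```python
import numpy as np

def beta_gap(beta_a, beta_b):
    """Absolute difference between two (beta0, beta1) pairs (hypothetical helper)."""
    return np.abs(np.asarray(beta_a, dtype=float) - np.asarray(beta_b, dtype=float))

# Values copied from the notebook output: epoch 400 vs. the full 5,000-iteration run
beta_epoch_400 = (-2.01568162, 1.00415728)
beta_full_run = (-3.89522094, 1.19297739)

gap = beta_gap(beta_epoch_400, beta_full_run)
print("gap in beta0:", gap[0])
print("gap in beta1:", gap[1])
```

A large gap in the intercept alongside a small per-epoch change in the loss would suggest that the loss surface is flat in that direction, so a loss-based stopping rule can trigger well before the betas have converged.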

The homework is due on Thursday 9/29, at 11:59pm.


In [2]:
# Don't change anything in this cell

!gdown https://drive.google.com/uc?id=1Czz6iLAfIS58b5kAhKjmVzZJDuPnDgTF

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import statsmodels.api as sm
import statsmodels.formula.api as smf

from sklearn.linear_model import LinearRegression

data = pd.read_csv('data.csv', header = None, names = ['X','y'])

X = data.X.values
y = data.y.values

# Regression model
def regress(X, beta):
    f = beta[0] + beta[1]*X
    return f

# Mean squared error loss
def computeLoss(X, y, beta): 
    # number of samples
    m = X.shape[0]
    # sum of squared errors
    sqe = np.sum((regress(X, beta)-y)**2)
    # mean squared error
    msqe = sqe/(2*m)
    return msqe

def computeGrad(X, y, beta):
    m = X.shape[0]
    # derivative of the loss w.r.t. model bias b, i.e. beta 0
    dL_db = (np.sum(regress(X, beta)-y))/m 
    # derivative of the loss w.r.t model weights w, i.e. beta 1
    dL_dw = (np.sum((regress(X, beta)-y)*X))/m
    # full gradient
    gradient = (dL_db, dL_dw) 
    return gradient

Downloading...
From: https://drive.google.com/uc?id=1Czz6iLAfIS58b5kAhKjmVzZJDuPnDgTF
To: /content/data.csv
  0% 0.00/1.46k [00:00<?, ?B/s]100% 1.46k/1.46k [00:00<00:00, 1.43MB/s]
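As a sanity check on the gradient formulas in the cell above, the analytic gradient can be compared against a central finite-difference approximation of the loss. This is a minimal sketch on synthetic data: `compute_loss` and `compute_grad` are simplified re-implementations of the notebook's `computeLoss` and `computeGrad`, while `numerical_grad` and the toy data are assumptions introduced only for this check.

```python
import numpy as np

def regress(X, beta):
    return beta[0] + beta[1] * X

def compute_loss(X, y, beta):
    # half the mean squared error, matching sqe / (2 * m) in the notebook
    m = X.shape[0]
    return np.sum((regress(X, beta) - y) ** 2) / (2 * m)

def compute_grad(X, y, beta):
    # analytic gradient w.r.t. (beta0, beta1)
    m = X.shape[0]
    resid = regress(X, beta) - y
    return np.array([np.sum(resid) / m, np.sum(resid * X) / m])

def numerical_grad(X, y, beta, eps=1e-6):
    # central finite differences, one coordinate at a time (hypothetical helper)
    beta = np.asarray(beta, dtype=float)
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (compute_loss(X, y, beta + step)
                   - compute_loss(X, y, beta - step)) / (2 * eps)
    return grad

# Synthetic data, not the homework's data.csv
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=50)
y_toy = -4.0 + 1.2 * X_toy + rng.normal(0, 1, size=50)
beta_toy = np.array([0.0, 1.0])

analytic = compute_grad(X_toy, y_toy, beta_toy)
numeric = numerical_grad(X_toy, y_toy, beta_toy)
print("analytic:", analytic)
print("numeric: ", numeric)
```

Because the loss is quadratic in the betas, the two results should agree up to floating-point rounding.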


In [32]:
# Modify this cell to implement early stopping

# Convert X and y from the data frame to numpy arrays again
# (just in case we overwrote them somewhere)
X = data.X.values
y = data.y.values

# Initialize bias at 0 and weights at 1
b = np.array([0])
w = np.array([1])
beta = (b, w)

# Training loop
L = computeLoss(X, y, beta)
print("-1 L = {0}".format(L))
alpha = 0.01 # step size coefficient
n_epoch = 5000 # number of epochs (full passes through the dataset)
L_best = L
loss = L.copy()
beta0s = b.copy()
beta1s = w.copy()

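# Early-stopping state
Stopper = 0   # number of consecutive epochs with a loss change below 0.001
Losslast = L  # loss from the previous step (starts at the initial loss)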
for i in range(n_epoch):
    # gradient of the loss at the current beta
    dL_db, dL_dw = computeGrad(X, y, beta)
    b = beta[0]
    w = beta[1]
    # update rules
    newbeta0 = b-alpha*dL_db
    newbeta1 = w-alpha*dL_dw
    # override the beta
    beta = (newbeta0, newbeta1)
    # track our loss after performing a single step
    L = computeLoss(X, y, beta) 
    loss = np.append(loss, L)
    beta0s = np.append(beta0s, beta[0])
    beta1s = np.append(beta1s, beta[1])
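    # Early stopping: count consecutive epochs whose loss changed by less than 0.001
    if abs(L - Losslast) < 0.001:
        Stopper += 1
    else:
        # the streak is broken, so start counting again
        Stopper = 0
    if Stopper == 5:
        # the reference betas in this message are hard-coded, not computed in this cell
        print("\nIn the case of 5,000: Beta 0:-3.89522094 Beta 1: 1.19297739","\n\nStopped at iteration",i)
        break

    # remember this epoch's loss for the next comparison
    Losslast = L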

    # Print information about the training progress every 100 epochs
    # (w and b are the values before this epoch's update;
    #  newbeta0 and newbeta1 are the values after it)
    if i % 100 == 0:
        print("\nEpoch = ", i)
        print("Loss = ", L)
        print("weight (b1) = ", w)
        print("bias (b0) = ", b )
        print("beta0 =", newbeta0)
        print("beta1 =", newbeta1)



-1 L = 7.445855542929897

Epoch =  0
Loss =  5.8903978572788285
weight (b1) =  [1]
bias (b0) =  [0]
beta0 = [-0.02320665]
beta1 = [0.83924907]

Epoch =  100
Loss =  5.426983895839554
weight (b1) =  [0.86792364]
bias (b0) =  [-0.6595947]
beta0 = [-0.66542824]
beta1 = [0.86850968]

Epoch =  200
Loss =  5.139213127515016
weight (b1) =  [0.92159358]
bias (b0) =  [-1.19383209]
beta0 = [-1.19870262]
beta1 = [0.92208288]

Epoch =  300
Loss =  4.938611752614805
weight (b1) =  [0.96640356]
bias (b0) =  [-1.63987629]
beta0 = [-1.64394278]
beta1 = [0.96681208]

Epoch =  400
Loss =  4.798775062994989
weight (b1) =  [1.0038162]
bias (b0) =  [-2.01228644]
beta0 = [-2.01568162]
beta1 = [1.00415728]

In the case of 5,000: Beta 0:-3.89522094 Beta 1: 1.19297739 

Stopped at iteration 446
