# Assignment 5: Model Evaluation and Regularization


In this assignment, we will investigate two evaluation methods and two ways that regularization can be used to control the behavior of linear models. Most of the code here will be copied or refactored from previous assignments. You are encouraged to copy **your code only** from previous assignments.

In [2]:
require './assignment_lib'

false

# Question 1.1 (1 Point)

Copy **YOUR** implementation of ```StochasticGradientDescent``` from [Assignment 4](../assignment-4/assignment-4.ipynb) into the following cell.

In [3]:
# BEGIN YOUR CODE
# raise NotImplementedError.new()
#END YOUR CODE

class StochasticGradientDescent
  attr_reader :weights
  attr_reader :objective
  def initialize obj, w_0, lr = 0.01
    @objective = obj
    @weights = w_0
    @n = 1.0
    @lr = lr
  end
  def update x
    # BEGIN YOUR CODE
    curr_lr = @lr / Math.sqrt(@n)
    @objective.func(x, @weights)
    grad = @objective.grad(x, @weights)
    @weights = update_weights(@weights, grad, curr_lr)
    @n += 1
    #END YOUR CODE
  end
  
  def update_weights(w, dw, lr)
    w_copy = w.clone()
    dw_copy = dw.clone()
  
    dw_copy.each do |k, v|
      dw_copy[k] *= lr
    end
  
    w_copy.each do |k, v|
      if dw_copy.key?(k)
        w_copy[k] -= dw_copy[k]
      end
    end
    w_copy
  end
  
end

:update_weights

In [4]:
### Hidden Test (See Test 1.1 from Assignment 5) ###
assert_not_nil(StochasticGradientDescent.class)

## Question 1.2 (1 point)

Copy **YOUR** implementation of the ```dot``` product and ```norm``` functions from [Assignment 3](../assignment-3/assignment-3.ipynb) into the following cell. Please copy the whole function, not just the parts within the comments.

In [5]:
# BEGIN YOUR CODE
# raise NotImplementedError.new()
#END YOUR CODE

#Implement the error function given a weight vector, w
def dot x, w
  
  res = 0.0

  x.each do |k, v|
    if w[k] != nil
      res += v * w[k]
    end
  end
  return res  
end

def norm w
  return Math.sqrt(dot(w,w))
end

:norm

In [6]:
def test_12()
  assert_in_delta 2.0, norm({"a" => 1.41421, "b" => 1.41421}), 1e-2
  assert_in_delta 2.0, norm({"a" => -1.41421, "b" => 1.41421}), 1e-2
  assert_in_delta 0.0, norm({}), 1e-2

  assert_in_delta 6.0, dot({"a" => 2.0}, {"a" => 3.0}), 1e-6
  assert_in_delta 6.0, dot({"a" => 2.0}, {"a" => 3.0, "b" => 4.0}), 1e-6
  assert_equal 0.0, dot({}, {})
  assert_equal 0.0, dot({"a" => 1.0}, {"b" => 1.0})
end

test_12()

## Question 1.3 (1 Point)

Refactor **YOUR** $z$-score normalization method from [Assignment 4](../assignment-4/assignment-4.ipynb), where we called it ```create_zspambase```. It should be general enough to normalize any dataset. Only normalize features in the ```features``` key.

Note: Watch out for zero-stdev features.

In [7]:
def hashMean data
  res = Hash.new
  count = Hash.new
  data.each do |sample|
    sample["features"].each do |key, value|
      if res[key] == nil
        res[key] = 0.0
      end
      if count[key] == nil
        count[key] = 0
      end
      res[key] += value
      count[key] += 1
    end
  end
  res.each do |key, value|
    res[key] = value / count[key]
  end
  return res
end

def hashStd data, mean
  res = Hash.new
  count = Hash.new
  data.each do |sample|
    sample["features"].each do |key, value|
      if res[key] == nil
        res[key] = 0.0
      end
      if count[key] == nil
        count[key] = 0
      end
      res[key] += (value - mean[key])**2
      count[key] += 1
    end
  end
  res.each do |key, value|
    if count[key] > 1
      res[key] = Math.sqrt(value / (count[key] - 1))
    else
      res[key] = Math.sqrt(value)
    end
  end
  return res
end

def z_normalize dataset
  zdataset = dataset.clone
  zdataset["data"] = dataset["data"].collect do |r|
    u = r.clone
    u["features"] = r["features"].clone
    u
  end

  # BEGIN YOUR CODE
  mean = hashMean(zdataset["data"])
  std = hashStd(zdataset["data"], mean)
  zdataset["data"].each do |sample|
    sample["features"].each do |key, value|
      if std[key] > 0.0
        sample["features"][key] = (value - mean[key]) / std[key]
      else
        sample["features"][key] = 0.0
      end
    end
  end
  #END YOUR CODE
  return zdataset
end

:z_normalize

In [8]:
### TEST ###
def test_13()
  spambase = read_sparse_data_from_csv "spambase"
  zspambase = z_normalize spambase

  assert_in_delta 0.27, spambase["data"].first["features"]["word_freq_our"], 1e-5
  assert_in_delta -0.628106690674003, zspambase["data"].first["features"]["word_freq_our"], 1e-5

  assert_in_delta 607.0, spambase["data"].first["features"]["capital_run_length_total"], 1e-5
  assert_in_delta 0.53386, zspambase["data"].first["features"]["capital_run_length_total"], 1e-5
end

test_13()

## Question 2.1 (10 Points)

Change your ```LinearRegression``` implementation from  [Assignment 4](../assignment-4/assignment-4.ipynb) to implement regularization. The new implementation requires a value for $\lambda$. The regularization objective function for linear regression in a mini-batch is as follows:

# $L(w,X) = \frac{\lambda}{2} \left\lVert w \right\rVert ^ 2 + \frac{1}{n} \sum_{i} \frac{1}{2} \left(f(w,x_i) - y_i\right) ^ 2$

where ```reg_param``` corresponds to $\lambda$ in the formula above.

Note that there is no $\frac{1}{n}$ in front of the regularizer penalty. The ```predict``` and ```adjust``` methods have been provided for you. 

Hint: Use ```dot``` and ```norm``` as needed.

In [9]:
class LinearRegressionModelL2
  def initialize reg_param
    @reg_param = reg_param
  end

  def predict row, w
    x = row["features"]    
    yhat = dot(w, x)
  end
  
  def adjust w
    w.each_key {|k| w[k] = 0.0 if w[k].nan? or w[k].infinite?}
    w.each_key {|k| w[k] = 0.0 if w[k].abs > 1e5 }
  end
  
  def func data, w
    # BEGIN YOUR CODE
    res = 0.0
    data.each do |record|
      predict_value = predict record, w
      loss_value = (predict_value - record["label"]) ** 2 / (2 * data.length)
      res += loss_value
    end
    res += @reg_param * norm(w) ** 2 / 2
    #END YOUR CODE
  end
end

:func

In [10]:
### TEST ###
def test_21()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = 0.19845
  assert_in_delta e1, m.func(x[0,1], w), 1e-2, "1"
  
  e2 = 2.42
  assert_in_delta e2, m.func(x[1,1], w), 1e-2, "2"    
  assert_in_delta (e1 + e2) / 2.0, m.func(x, w), 1e-2, "3"  

  assert_in_delta 3.1622776602, norm(w), 1e-2, "4"
  m2 = LinearRegressionModelL2.new 1.7
  assert_in_delta 9.809225, m2.func(x, w), 1e-2, "5"
end

test_21()

## Question 2.2 (10 Points)

Implement the gradient for the regularized linear regression using the above objective function.

In [11]:
class LinearRegressionModelL2
  def grad data, w
    # BEGIN YOUR CODE
    grad_res = Hash.new
    
    data.each do |record|
      record["features"].each do |key, value|
        if(!grad_res.key?(key))
          grad_res[key] = 0.0
        end
        grad_res[key] += value * ((predict record, w) - record["label"]) / data.length
      end
    end
      
    grad_res.each do |key, value|
      grad_res[key] += @reg_param * w[key]
    end
    #END YOUR CODE
    grad_res
  end
end

:grad

In [12]:
### TEST ###
def test_22()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  g1_1 = 0.441
  assert_in_delta g1_1, m.grad(x[0,1], w)["x1"], 1e-2, "1"
  
  g2_1 = -5.94
  assert_in_delta g2_1, m.grad(x[1,1], w)["x1"], 1e-2, "2"    
  assert_in_delta (g1_1 + g2_1) / 2.0, m.grad(x, w)["x1"], 1e-2, "3"  

  m2 = LinearRegressionModelL2.new 1.7
  assert_in_delta -1.0495, m2.grad(x, w)["x1"], 1e-2, "5"
end

test_22()

## Question 2.3 (10 Points)

Implement a function that calculates the Root Mean Squared Error (RMSE) for a given dataset for a prediction model and weights.

RMSE is defined as follows:

# $e = \sqrt{\frac{\sum_{i=1}^N{ \left( \hat{y} - y \right) ^ 2 }}{N}}$

where $N$ is the number of examples in the dataset.

Hint: Use the ```mean``` function in the assignment library.

In [13]:
def score_regression_model_rmse(data, weights, model)
  # BEGIN YOUR CODE
  res = 0.0
  
  
  record_num = data.length
  data.each do |record|
    y_predict = model.predict(record, weights)
    y_truth = record["label"]
    
    res += (y_predict - y_truth) ** 2 / record_num
  end
  
  e = Math.sqrt(res)
  e
  
  # END YOUR CODE
end

:score_regression_model_rmse

In [14]:
### TEST ###
def test_23()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = (((0.7 + -3 * -0.3) - 0.97) ** 2)
  e2 = (((-2.7 + -3.0 * -1.3) - -1.0) ** 2)
  
  rmse = Math.sqrt((e1 + e2) / 2.0)
  assert_in_delta rmse, score_regression_model_rmse(x, w, m), 1e-2, "3"  
end

test_23()

## Question 3.1 (10 Points)

Using a small provided dataset, shown below, we will investigate model complexity. First, implement a _polynomial_ feature representation. 

Call the bias feature "1" for this part. For a dataset with two features, $x_1$ and $x_2$, a polynomial representation of degree 0 is as follows:

# $\phi(x, k = 0) = \left( 1 \right)$

degree 1:

# $\phi(x, k = 1) = \left( 1, x_1, x_2 \right)$

degree 2: 

# $\phi(x, k = 2) = \left( 1, x_1, x_2, x_1^2, x_2^2, x_1 x_2  \right)$

and more generally, for degree $k$:

# $\phi(x, k) = \left(1, x_1 \phi(x,k-1), x_2 \phi(x,k-1) \right)$

For your convenience, the function ```poly_features``` emits, for degree $k$, the names of the features to be multiplied. After generating the features, apply ```z_normalize``` to only the newly added features (i.e., not the original features or the bias).

Note: You may notice that the dataset we plan to use only has one feature and therefore the above seems overly complex. Don't worry, we will see this again. ;)

In [15]:
polydata = read_sparse_data_from_csv "polydata"
x1 = polydata["data"].collect {|r| r["features"]["x1"]}
x2 = polydata["data"].collect {|r| r["label"]}
puts "Polydata Regression Dataset"
Daru::DataFrame.new({x1: x1, x2: x2})
.plot(type: :scatter, x: :x1, y: :x2) do |plot, diagram|
  plot.x_label "X1"
  plot.y_label "Label"
  plot.legend false
end

Polydata Regression Dataset


In [16]:
def poly_features features, degree
  poly_features = ["1"]

  degree.times do |i|
    poly_features += poly_features.flat_map do |x_prev|
      features.reject {|x| x == "1" or x == "bias"}.collect do |x|
        [x, x_prev.split("*")].flatten.sort.join("*")
      end
    end
    poly_features.uniq!
  end
  poly_features.collect {|k| k.gsub /^1\*([^\*]*)$/, '\1'}
end

poly_features ["x1", "x2"], 3

["1", "x1", "x2", "1*x1*x1", "1*x1*x2", "1*x2*x2", "1*x1*x1*x1", "1*x1*x1*x2", "1*x1*x2*x2", "1*x2*x2*x2"]

In [17]:
def assign_formula poly_feature, feature_val 
  single_feature = poly_feature.split("\*")
  curr = 1
  single_feature.each do |single|
    if(feature_val != nil && feature_val.key?(single))
      curr *= feature_val[single]
    end
  end
  curr
end

def create_polynomial_features dataset, degree
  degree += 1
  polydataset = dataset.clone
  features = poly_features dataset["features"], degree
  polydataset["data"] = dataset["data"].collect do |r|
    u = r.clone
    u["features"] = r["features"].clone
    u
  end
  
  # BEGIN YOUR CODE
  dataset["data"].each_with_index do |record, index|
    curr_record = Hash.new
    curr_record["features"] = Hash.new
    features_clone = record["features"].clone
    features.each do |single_feature|
      curr_record["features"][single_feature] = assign_formula(single_feature, features_clone)
    end
    curr_record["label"] = record["label"].clone
    polydataset["data"][index] = curr_record
  end
  
  polydataset = z_normalize(polydataset)
  polydataset["data"].each_with_index do |record, index|
    ori_record = dataset["data"][index]
    curr_record = polydataset["data"][index]
    ori_record["features"].each do |key, value|
      if(curr_record["features"].key?(key))
        curr_record["features"][key] = value.clone
      end
    end
  end
#   puts polydataset["data"].first["features"]
  return polydataset
end

:create_polynomial_features

In [18]:
def test_31()
  data = read_sparse_data_from_csv "polydata"
  assert_in_delta 12.8132, data["data"].first["features"]["x1"], 1e-2, "1"
  
  polydata = create_polynomial_features data, 3
  
  xp = polydata["data"].first["features"]
  assert_in_delta 12.8132, xp["x1"], 1e-2, "2: Does not normalize original features"
  assert_in_delta -0.905, xp["1*x1*x1"], 1e-2, "3: Applies normalization to new features"
  assert_in_delta -0.827, xp["1*x1*x1*x1"], 1e-2, "4: Applies normalization to new features"
end

test_31()

## Question 3.2 (5 Points)

Let's fit this dataset with different polynomial degrees. First, let's see how well linear regression fits the training data. 

Implement a training function that, given a training and testing dataset, trains the model using mini-batch SGD and returns the RMSE error value on both training and testing sets.

In [19]:
def train(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  # BEGIN YOUR CODE

  num_iter = train_set["data"].length / batch_size
  total_iter = num_iter * num_epoch
  
  total_iter.times do |i|
    data = train_set["data"].sample(batch_size)
    sgd.update(data)
  end
  
  train_rmse = score_regression_model_rmse(train_set["data"], sgd.weights, obj)
  test_rmse = score_regression_model_rmse(train_set["data"], sgd.weights, obj)
  #END YOUR CODE
  return [train_rmse, test_rmse]
end

:train

In [20]:
def test_32()
  data = read_sparse_data_from_csv "polydata"
  polydata = create_polynomial_features data, 1
  x1 = polydata["data"].collect {|r| r["features"]["x1"]}
  x2 = polydata["data"].collect {|r| r["label"]}
  
  w = Hash.new {|h,k| h[k] = 0.0}
  lr = 1e-3
  obj = LinearRegressionModelL2.new 0.0
  sgd = StochasticGradientDescent.new obj, w, lr

  train_set = polydata
  test_set = polydata
  train_rmse, test_rmse = train(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  assert_true train_rmse < 2, "1"
  assert_true test_rmse < 2, "2"
  assert_true train_rmse > 0, "3"
  assert_true test_rmse > 0, "4"
  assert_in_delta train_rmse, test_rmse, 1e-5, "5"
end

test_32()

## Question 3.3 (10 Points)

Implement a simplified version of Gaussian Complexity. Observe that as model complexity increases, test error worsens.

In this simplification, we will compute the average loss of a randomly permuted datasets. Let $H(X,Y)$ be the loss on the training set on a function trained on input examples $x_i\in X$ with labels $y_i\in Y$. Permute the training labels as $y^\prime_i = g y_i$ where $g ~ N(0,1)$ is sampled from a normal distribution with mean 0 and standard deviation 1. Compute the following Gaussian Complexity:

# $R_G(X,H) = -\frac{1}{K} \sum_k H(X,Y^\prime) $

which, in words, is the average of $K$ separate trainings each with a randomly permuted label. We use negative RMSE here to indicate that a more complex model should be more sensitive to permutation and therefore its loss should be lower.


In [21]:
def gaussian_complexity(dataset, obj)
  rng = Distribution::Normal.rng(0,1, 293891)
  lr = 1e-2
  tr_rmses = []
  te_rmses = []
  norms = []
  100.times do |i|
    # BEGIN YOUR CODE
    weight = Hash.new{|h, k| h[k] = 0.0}
    sgd = StochasticGradientDescent.new obj, weight, lr
    
    train_set = dataset.clone
    train_set["data"] = dataset["data"].collect do |r|
      u = r.clone
      u["features"] = r["features"].clone
      u
    end
    
    test_set = dataset.clone
    test_set["data"] = dataset["data"].collect do |r|
      u = r.clone
      u["features"] = r["features"].clone
      u
    end
    
    train_set["data"].each do |train_record|
      train_record["label"] *= rng.call
    end
    
    test_set["data"].each do |test_record|
      test_record["label"] *= rng.call
    end
    
    batch_size = 20
    train_batch = train_set["data"].sample(batch_size)
    test_batch = test_set["data"].sample(batch_size)
    sgd.update(train_batch)
    
    tr_rmses << score_regression_model_rmse(train_batch, sgd.weights, obj)
    te_rmses << score_regression_model_rmse(test_batch, sgd.weights, obj)
    
    norms << norm(sgd.weights)
    #END YOUR CODE
  end  

  result = [mean(tr_rmses), mean(norms), mean(te_rmses)]
  puts result.join("\t")
  result
end

:gaussian_complexity

In [22]:
def test_33()
  stats = Hash.new {|h,k| h[k] = []}
  
  8.times do |i|
    data = read_sparse_data_from_csv "polydata"
    polydata = create_polynomial_features data, i
    obj = LinearRegressionModelL2.new 0.0
    tr_rmse, w_norm, te_rmse = gaussian_complexity(polydata, obj)
    
    stats[:degree] << i
    stats[:train_rmse] << tr_rmse    
    stats[:test_rmse] << te_rmse
    stats[:complexity] << -tr_rmse
  end
  tr_rmse = stats[:train_rmse]
  assert_true(tr_rmse[0] > 0.0)
  assert_true(tr_rmse[0] > tr_rmse[1])
  assert_true(tr_rmse[1] > tr_rmse[2])
  assert_true(tr_rmse[2] > tr_rmse[3])
  assert_true(tr_rmse[2] < 10.0)
  
  te_rmse = stats[:test_rmse]
  assert_true(te_rmse[0] > 0.0)
  assert_true(te_rmse[0] < te_rmse[1])
  assert_true(te_rmse[1] < te_rmse[2])
  assert_true(te_rmse.last < 10.0)
  
  z_plot = Nyaplot::Plot.new
  z_plot.x_label("Model Complexity").y_label("Test RMSE")
  z_plot.add(:line, stats[:complexity], stats[:test_rmse]).color(:black)
  z_plot.show()  
end
test_33()

3.0395113971800107	0.10539173136710334	3.4765620236368124
3.0394342639466174	0.10559820703380213	3.4773625843288083
3.0391406769717086	0.10585072106472174	3.478077093585965
3.0386317891517676	0.10614177186035872	3.4786909628158504
3.0379318722668502	0.10646029866000922	3.4792111401140393
3.0370714135708603	0.10679705669721341	3.479651879045765
3.036080032936978	0.1071452359503917	3.4800283100597067
3.0349838834388745	0.10750002170864127	3.480354057467837


## Question 3.4 (5 points)

Does regularization reduce the Gaussian Complexity? Copy ```test_33``` above and modify it to select a fixed value for the polynomial degree, say $k=5$. Validate that both norm and complexity decreases as you increase the regularization parameter. Due to limitations in SGD, some large regularization values may cause the trainer to diverge. Try adjusting the learning rate. 


In [23]:
def complexity_vs_norm()
  stats = Hash.new {|h,k| h[k] = []}
  data = read_sparse_data_from_csv "polydata"

  [0.0, 0.1, 0.5, 1.0, 1.5, 2.0, 5.0, 10.0, 15.0, 100.0].each do |reg|
    # BEGIN YOUR CODE
    polydata = create_polynomial_features data, 3
    obj = LinearRegressionModelL2.new reg
    tr_rmse, w_norm, te_rmse = gaussian_complexity(polydata, obj)
    #END YOUR CODE
    stats[:regularizer] << reg
    stats[:train_rmse] << tr_rmse    
    stats[:test_rmse] << te_rmse
    stats[:norms] << w_norm    
    stats[:complexity] << -tr_rmse
  end
  
  return stats
end

:complexity_vs_norm

In [23]:
def test_34()
  stats = complexity_vs_norm()
  
  assert_true(stats[:train_rmse].all? {|t| t > 0 and t < 5})
  assert_true(stats[:test_rmse].all? {|t| t > 0 and t < 5})  
  assert_true(stats[:norms].all? {|t| t > 0 and t < 10})    
  z_plot = Nyaplot::Plot.new
  z_plot.x_label("Weight Norm").y_label("Model Complexity")
  z_plot.add(:line, stats[:norms], stats[:complexity]).color(:black)
  z_plot.show()  
end

test_34()

3.0386317891517676	0.10614177186035874	3.4786909628158504
3.0386317891517676	0.10614177186035872	3.478690962815851
3.0386317891517676	0.10614177186035872	3.4786909628158504
3.0386317891517676	0.10614177186035872	3.478690962815851
3.0386317891517676	0.10614177186035872	3.478690962815851
3.0386317891517676	0.10614177186035874	3.4786909628158504
3.0386317891517676	0.10614177186035872	3.478690962815851
3.0386317891517676	0.10614177186035872	3.478690962815851
3.0386317891517685	0.10614177186035872	3.4786909628158504
3.0386317891517676	0.10614177186035872	3.4786909628158504


## Question 4.1 (10 Points)

Moving on to classification, implement L2 regularization for Logisitic Regression. This should follow closely what you did in Question 2.X above. 

Use the Log Loss formulation, $\log(1 + \exp(-y\cdot \hat{y}))$ when calculating the objective value.

In [24]:
class LogisticRegressionModelL2
  def initialize reg_param
    @reg_param = reg_param
  end

  def predict row, w
    x = row["features"]    
    1.0 / (1 + Math.exp(-dot(w, x)))
  end
  
  def adjust w
    w.each_key {|k| w[k] = 0.0 if w[k].nan? or w[k].infinite?}
    w.each_key {|k| w[k] = 0.0 if w[k].abs > 1e5 }
  end
  
  def func data, w
    # BEGIN YOUR CODE
#     raise NotImplementedError.new()
    update_value = 0.0
    data.each do |record|
      if record["label"] == -1
        record["label"] = 0
      end
      predict_value = predict(record, w)
      update_value -= record["label"] * Math.log(predict_value) + (1 - record["label"]) * Math.log(1 - predict_value) 
    end
    res = (@reg_param / 2) * norm(w) ** 2 + update_value / data.length
    #END YOUR CODE
  end
end

:func

In [25]:
### TEST ###
def test_41()
  m = LogisticRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = 0.1839007409
  assert_in_delta e1, m.func(x[0,1], w), 1e-2, "1"
  
  e2 = 1.4632824673
  assert_in_delta e2, m.func(x[1,1], w), 1e-2, "2"    
  assert_in_delta (e1 + e2) / 2.0, m.func(x, w), 1e-2, "3"  

  assert_in_delta 3.1622776602, norm(w), 1e-2, "4"
  m2 = LogisticRegressionModelL2.new 1.7
  assert_in_delta 9.3235916041, m2.func(x, w), 1e-2, "5"
end

test_41()

## Question 4.2 (10 Points)

Implement the gradient for L2 regularized Logisitic Regression. As in Assignment 5, use the 0 / 1 version of the loss to simplify the derivation.

In [26]:
class LogisticRegressionModelL2
  def grad data, w
    # BEGIN YOUR CODE
    
    g = Hash.new
    count = Hash.new
    
    data.each do |record|
      record["features"].each do |key, value|
        if !g.key?(key)
          g[key] = 0.0
        end
        if !count.key?(key)
          count[key] = 0
        end
        if record["label"] == -1
          record["label"] = 0
        end
        predict_value = predict(record, w)
        update_value = value * (predict_value - record["label"])
        g[key] += update_value
        count[key] += 1
      end
    end
    
    g.each do |key, value|
      g[key] = @reg_param * w[key] + value / count[key]
    end
    #END YOUR CODE  
    return g
  end
end

:grad

In [27]:
### TEST ###
def test_42()
  m = LogisticRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  g1_1 = -0.1175871304
  assert_in_delta g1_1, m.grad(x[0,1], w)["x1"], 1e-2, "1"
  
  g2_1 =  -2.0750169154
  assert_in_delta g2_1, m.grad(x[1,1], w)["x1"], 1e-2, "2"    
  assert_in_delta (g1_1 + g2_1) / 2.0, m.grad(x, w)["x1"], 1e-2, "3"  

  m2 = LogisticRegressionModelL2.new 1.7
  assert_in_delta 1.0 * 1.7 + (g1_1 + g2_1) / 2.0, m2.grad(x, w)["x1"], 1e-2, "5"
end

test_42()

## Question 4.3 (2 points)

Implement a function that will score your logistic regression model and return an array of pairs of (score, class label).

In [28]:
def score_binary_classification_model(data, weights, model)
  # BEGIN YOUR CODE
  
  scores = Array.new
    
  data.each do |record|
    score = model.predict(record, weights)
    label = record["label"]
    scores << [score, label]
  end
  
  #END YOUR CODE
  return scores
end

:score_binary_classification_model

In [29]:
### TEST ###
def test_43()
  m = LogisticRegressionModelL2.new 888.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => 0.0}  
  ]
  
  e1 = 0.8320183851
  e2 = 0.7685247835
  
  scores = score_binary_classification_model(x, w, m)
  assert_in_delta e1, scores[0][0], 1e-2, "1"
  assert_in_delta e2, scores[1][0], 1e-2, "2"  
  assert_in_delta 1.0, scores[0][1], 1e-2, "3"
  assert_in_delta 0.0, scores[1][1], 1e-2, "4"  
end

test_43()

## Question 5.1 (10 Points)

Given an array of pairs of score and class label (0,1), calculate the AUC metric. It is not necessary to draw the curve, but you are welcome to do that. Assume scores are not sorted.

Recall the definition of AUC as either the area under the ROC curve or the probability of mis-ranking a positive example. Choose one of these methods for the implementation. 

In [30]:
def calc_auc_only(scores)
  # BEGIN YOUR CODE
  res = Array.new
  
  min = (scores.min{|a,b| a[0] <=> b[0]})[0]
  max = (scores.max{|a,b| a[0] <=> b[0]})[0]
  num = 50
  step = (max - min) / num
  
  (min..max).step(step).to_a.each do |threshold|
    false_neg = 0.0 
    true_pos = 0.0
    true_neg = 0.0
    false_pos = 0.0
    scores.each do |record|
      if(record[1] == 0 or record[1] == -1) and record[0] >= threshold
        true_neg += 1.0
      end
      if(record[1] == 0 or record[1] == -1) and record[0] < threshold
        false_pos += 1.0
      end
      if(record[1] == 1 and record[0] >= threshold)
        true_pos += 1.0
      end
      if(record[1] == 1 and record[0] < threshold)
        false_neg += 1.0
      end
    end
    res << [(false_pos / (false_pos + true_neg)), (true_pos / (true_pos + false_neg))]    
  end
  auc = 0.0
  sorted = res.sort{|a,b| a[0] <=> b[0]}
  (1..res.length - 1).each do |index|
    width = res[index][0] - res[index - 1][0]
    height = res[index - 1][1]
    auc += width * height
  end

  #END YOUR CODE
  return auc
end


:calc_auc_only

In [31]:
def test_51()
  good_model = [[0.9, 1], [0.89, 1], [0.7, 0], [0.8, 1], [0.8, 0], [0.7, 1], [0.6, 0], [0.5, 0], [0.1, 0]]
  assert_true(calc_auc_only(good_model) > 0.8)
  assert_true(calc_auc_only(good_model) < 1)
  
  srand(777)
  ok_model = Array.new(100) {|i| [100 - i, (rand < (100 - i) / 100.0) ? 1 : 0] }
  ok_auc = calc_auc_only(ok_model)
  assert_in_delta(0.8631239935587761, ok_auc, 1e-3)
  
  bad_model = Array.new(1000) {|i| [1000 - i, rand < 0.5 ? 1 : 0] }
  bad_auc = calc_auc_only(bad_model)
  assert_in_delta(0.5, bad_auc, 5e-2)

end

test_51()

## Question 5.2 (10 Points)

The following dataset has _irrelevant features_. Find them and use regularization to control them. 

Implement a training method that trains a logistic regression model and returns training and testing AUC values. This follows closely question 3.2 above. Next, fill in the driver code that trains the model for each regularization value and populates an array of training AUC, testing AUC, and weight vector norm values.

Hint: The weights for regularization parameter are displayed.

In [32]:
def train_logistic_regression(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  # BEGIN YOUR CODE
  num_iter = train_set["data"].length / batch_size
  total_iter = num_iter * num_epoch
  
  total_iter.times do |i|
    data = train_set["data"].sample(batch_size)
    sgd.update(data)
  end
  
  train_score = score_binary_classification_model(train_set["data"], sgd.weights, obj)
  test_score = score_binary_classification_model(test_set["data"], sgd.weights, obj)
  
  train_auc = calc_auc_only(train_score)
  test_auc = calc_auc_only(test_score)
  #END YOUR CODE
  return [train_auc, test_auc]
end


:train_logistic_regression

In [43]:
def test_logistic_regularizers(corner)
  stats = Hash.new {|h,k| h[k] = Array.new}
  [0.0, 0.01, 0.05, 0.1, 0.15, 0.2, 0.5, 1.0, 10.0].each do |reg|
    tr_aucs = []
    te_aucs = []
    w_norms = []

    cross_validate corner, 2 do |tr, te, fold|
      w = Hash.new {|h,k| h[k] = 0.0}
      lr = 1e-2
      m = LogisticRegressionModelL2.new reg
      sgd = StochasticGradientDescent.new m, w, lr
      # BEGIN YOUR CODE
      res = train_logistic_regression(sgd, m, w, tr, te, 30, batch_size = 20)
      tr_aucs << res[0]
      te_aucs << res[1]
      w_norms << norm(sgd.weights)
      w = sgd.weights
      #END YOUR CODE
      puts w if fold == 0
    end
    puts [reg, mean(w_norms), mean(tr_aucs), mean(te_aucs), stdev(te_aucs)].join("\t")
    stats[:reg] << reg
    stats[:tr_aucs] << mean(tr_aucs)
    stats[:w_norms] << mean(w_norms)
    stats[:te_aucs] << mean(te_aucs)    
  end
  
  return stats
end

:test_logistic_regularizers

In [44]:
def test_52()
  corner = z_normalize(read_sparse_data_from_csv("corner"))
  corner["data"].first
  
  stats = test_logistic_regularizers(corner)
  assert_true(stats[:tr_aucs].all? {|a| a > 0.7 and a < 1.0}, "1")
  assert_true(stats[:te_aucs].all? {|a| a > 0.7 and a < 1.0}, "2")
  assert_true(stats[:w_norms][0] > stats[:w_norms][6], "3")
  assert_true(stats[:w_norms][6] > stats[:w_norms].last, "4")  
  Daru::DataFrame.new stats
end

test_52()

{"x1"=>0.006668614241375083, "x2"=>-0.01527417788870782, "x3"=>-0.02463687745903464, "x4"=>-0.009122212336865415, "x5"=>0.0014261509170007034, "x6"=>-0.00946291246872394, "1"=>0.0}
0.0	0.028645676154739286	0.9643851820947096	0.9527316386766381	0.015579358237889925
{"x1"=>0.00903537173447664, "x2"=>-0.026198290867591664, "x3"=>-0.016517410488557868, "x4"=>-0.008969163507992877, "x5"=>0.007931864505250505, "x6"=>-0.0012269517286207941, "1"=>0.0}
0.01	0.03292452747166931	0.9115628156764487	0.94915881047865	0.036376944821047606
{"x1"=>0.016711644171778724, "x2"=>-0.0049917288634013035, "x3"=>-0.016289191960523745, "x4"=>0.01144666850493519, "x5"=>-0.005907964847962639, "x6"=>0.003662339994143011, "1"=>0.0}
0.05	0.027700974010504677	0.9386305736439899	0.9637561893895275	0.002354513689117949
{"x1"=>0.006898754369550028, "x2"=>-0.028983405954304296, "x3"=>-0.022716105796524524, "x4"=>-0.013064184469441605, "x5"=>0.006242389028959988, "x6"=>-0.002820437709138653, "1"=>0.0}
0.1	0.03345811308925

Unnamed: 0,reg,tr_aucs,w_norms,te_aucs
0,0.0,0.9643851820947096,0.0286456761547392,0.952731638676638
1,0.01,0.9115628156764488,0.0329245274716693,0.94915881047865
2,0.05,0.93863057364399,0.0277009740105046,0.9637561893895276
3,0.1,0.9535743680188123,0.0334581130892554,0.902164273116654
4,0.15,0.9213325988624864,0.0319944696233557,0.8961105786794552
5,0.2,0.9231552868060804,0.0343980901495663,0.8850751658688167
6,0.5,0.9702868852459016,0.0232837602173881,0.9608948087431696
7,1.0,0.9553112869334638,0.0250547537663325,0.8539934679313119
8,10.0,0.8816768086544965,0.00541709659448,0.9008478953800412


## Question 5.3 (5 Points)

Make the function below return an array of feature names you think are irrelevant.

In [39]:
def guess_irrelevant_features()
  # BEGIN YOUR CODE
  answer = ["1", "x4", "x6"]
  #END YOUR CODE
  return answer
end

:guess_irrelevant_features

In [40]:
corner = read_sparse_data_from_csv("corner")

t53_answer = guess_irrelevant_features()
assert_true(t53_answer.is_a?(Array))
assert_false(t53_answer.empty?)
assert_false((corner["features"] & t53_answer).empty?)
