# CS6140 Assignments

**Instructions**
1. In each assignment cell, look for the block:
 ```
  #BEGIN YOUR CODE
  raise NotImplementedError.new()
  #END YOUR CODE
 ```
1. Replace this block with your solution.
1. Test your solution by running the cells following your block (indicated by ##TEST##)
1. Click the "Validate" button above to validate the work.

**Notes**
* You may add other cells and functions as needed
* Keep all code in the same notebook
* In order to receive credit, code must "Validate" on the JupyterHub server

---

# Assignment 6: Model Evaluation and Regularization


In this assignment, we will investigate two evaluation methods and two ways that regularization can be used to control the behavior of linear models. Most of the code here will be copied or refactored from previous assignments. You are encouraged to reuse code from previous assignments, but **only your own code**.

In [48]:
require './assignment_lib'

false

## Question 1.1 (1 Point)

Copy **YOUR** implementation of ```StochasticGradientDescent``` from [Assignment 5](../assignment-5/assignment-5.ipynb) into the following cell.

In [49]:
# BEGIN YOUR CODE
class StochasticGradientDescent
  attr_reader :weights
  attr_reader :objective
  def initialize obj, w_0, lr = 0.01
    @objective = obj
    @weights = w_0
    @n = 1.0
    @lr = lr
  end
  def update x
    # BEGIN YOUR CODE
    g = @objective.grad(x, @weights)
    learning_rate = @lr / Math.sqrt(@n)
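    # Iterate over the gradient's keys so the update does not rely on
    # @weights already containing every feature
    g.each do |k, gk|
      @weights[k] -= gk * learning_rate
    end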
    @n += 1.0
    #END YOUR CODE
  end
end
#END YOUR CODE

:update

In [50]:
### Hidden Test (See Test 1.1 from Assignment 5) ###
assert_not_nil(StochasticGradientDescent.class)

## Question 1.2 (1 Point)

Copy **YOUR** implementation of the ```dot``` product and ```norm``` functions from [Assignment 4](../assignment-4/assignment-4.ipynb) into the following cell. Please copy the whole function, not just the parts within the comments.

In [51]:
# BEGIN YOUR CODE
# Sparse dot product: sums x[k] * w[k] over the keys present in both hashes
def dot x, w
  # BEGIN YOUR CODE
  sum = 0.0
  x.each do |k, v|
    sum += v * w[k] if w.has_key?(k)
  end
  return sum
  #END YOUR CODE
end

def norm w
  # BEGIN YOUR CODE
  return Math.sqrt(dot(w, w))
  #END YOUR CODE
end
#END YOUR CODE

:norm

In [52]:
def test_12()
  assert_in_delta 2.0, norm({"a" => 1.41421, "b" => 1.41421}), 1e-2
  assert_in_delta 2.0, norm({"a" => -1.41421, "b" => 1.41421}), 1e-2
  assert_in_delta 0.0, norm({}), 1e-2

  assert_in_delta 6.0, dot({"a" => 2.0}, {"a" => 3.0}), 1e-6
  assert_in_delta 6.0, dot({"a" => 2.0}, {"a" => 3.0, "b" => 4.0}), 1e-6
  assert_equal 0.0, dot({}, {})
  assert_equal 0.0, dot({"a" => 1.0}, {"b" => 1.0})
end

test_12()

## Question 1.3 (1 Point)

Refactor **YOUR** $z$-score normalization method from [Assignment 5](../assignment-5/assignment-5.ipynb), where we called it ```create_zspambase```. It should be general enough to normalize any dataset. Only normalize features in the ```features``` key.

Note: Watch out for zero-stdev features.
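
The zero-stdev case can be illustrated on a plain array, outside the assignment's dataset structure. This is only a sketch: the helper name `z_scores` is hypothetical, and it assumes the population standard deviation, which may differ from the `stdev` in the assignment library.

```ruby
# Hypothetical helper: z-score a plain array of numbers.
# Assumes population standard deviation; the library's stdev may differ.
def z_scores(values)
  mu = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mu) ** 2 } / values.size.to_f
  sd = Math.sqrt(variance)
  # A constant feature has sd == 0.0; map it to 0.0 instead of dividing by zero.
  values.map { |v| sd == 0.0 ? 0.0 : (v - mu) / sd }
end

p z_scores([1.0, 2.0, 3.0])
p z_scores([5.0, 5.0, 5.0])
```

Without the guard, the constant column would produce `NaN` values that then propagate through every later dot product.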

In [53]:
def z_normalize dataset
  zdataset = dataset.clone
  zdataset["data"] = dataset["data"].collect do |r|
    u = r.clone
    u["features"] = r["features"].clone
    u
  end

  # BEGIN YOUR CODE
  mu = Hash.new
  dev = Hash.new
  data = zdataset["data"]
  
  zdataset["features"].each do |fname|
    processed_data = data.collect {|row| row["features"][fname]}
    processed_data = processed_data.select {|x| x != nil }
    mu[fname] = mean(processed_data)
    dev[fname] = stdev(processed_data)
  end
  
  zdataset["features"].each do |fname|
    data.each do |row|
      #set missing values to 0.0
      row["features"][fname] = dev[fname] == 0.0 ? 0.0 : (row["features"][fname] == nil ? 0.0 : (row["features"][fname] - mu[fname]) / dev[fname])
    end
  end
  #END YOUR CODE
  return zdataset
end

:z_normalize

In [54]:
### TEST ###
def test_13()
  spambase = read_sparse_data_from_csv "spambase"
  zspambase = z_normalize spambase

  assert_in_delta 0.27, spambase["data"].first["features"]["word_freq_our"], 1e-5
  assert_in_delta -0.628106690674003, zspambase["data"].first["features"]["word_freq_our"], 1e-5

  assert_in_delta 607.0, spambase["data"].first["features"]["capital_run_length_total"], 1e-5
  assert_in_delta 0.53386, zspambase["data"].first["features"]["capital_run_length_total"], 1e-5
end

test_13()

## Question 2.1 (10 Points)

Change your ```LinearRegression``` implementation from  [Assignment 5](../assignment-5/assignment-5.ipynb) to implement regularization. The new implementation requires a value for $\lambda$. The regularization objective function for linear regression in a mini-batch is as follows:

# $L(w,X) = \frac{\lambda}{2} \left\lVert w \right\rVert ^ 2 + \frac{1}{n} \sum_{i} \frac{1}{2} \left(f(w,x_i) - y_i\right) ^ 2$

where ```reg_param``` corresponds to $\lambda$ in the formula above.

Note that there is no $\frac{1}{n}$ in front of the regularizer penalty. The ```predict``` and ```adjust``` methods have been provided for you. 

Hint: Use ```dot``` and ```norm``` as needed.
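
As a sanity check on the formula, the sketch below evaluates the objective on a two-row example with plain hashes. The names `sparse_dot` and `l2_objective` are hypothetical stand-ins, not part of the assignment library, and the model is assumed to be the linear predictor $f(w,x) = w \cdot x$.

```ruby
# Hypothetical stand-alone helpers; not the assignment's classes.
def sparse_dot(a, b)
  a.sum(0.0) { |k, v| v * b.fetch(k, 0.0) }
end

def l2_objective(rows, w, lambda_)
  # (lambda / 2) * ||w||^2, deliberately not divided by n
  penalty = 0.5 * lambda_ * sparse_dot(w, w)
  # Sum of (1/2) * squared residuals, averaged over the batch below
  loss = rows.sum(0.0) do |row|
    residual = sparse_dot(w, row["features"]) - row["label"]
    0.5 * residual ** 2
  end
  penalty + loss / rows.size
end

w = {"x1" => 1.0, "x2" => -3.0}
rows = [
  {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
  {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}
]
puts l2_objective(rows, w, 0.0)
puts l2_objective(rows, w, 1.7)
```

With $\lambda = 1.7$ and $\lVert w \rVert^2 = 10$, the penalty alone contributes $8.5$, which shows how quickly the regularizer can dominate the data term.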

In [55]:
class LinearRegressionModelL2
  def initialize reg_param
    @reg_param = reg_param
  end

  def predict row, w
    x = row["features"]    
    yhat = dot(w, x)
  end
  
  def adjust w
    w.each_key {|k| w[k] = 0.0 if w[k].nan? or w[k].infinite?}
    w.each_key {|k| w[k] = 0.0 if w[k].abs > 1e5 }
  end
  
  def func data, w
    # BEGIN YOUR CODE
    l = 0.0
    data.each do |x|
      l += (predict(x, w) - x["label"]) ** 2
    end
    return 0.5 * @reg_param * norm(w) ** 2 + l / (2.0 * data.size)
    #END YOUR CODE
  end
end

:func

In [56]:
### TEST ###
def test_21()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = 0.19845
  assert_in_delta e1, m.func(x[0,1], w), 1e-2, "1"
  
  e2 = 2.42
  assert_in_delta e2, m.func(x[1,1], w), 1e-2, "2"    
  assert_in_delta (e1 + e2) / 2.0, m.func(x, w), 1e-2, "3"  

  assert_in_delta 3.1622776602, norm(w), 1e-2, "4"
  m2 = LinearRegressionModelL2.new 1.7
  assert_in_delta 9.809225, m2.func(x, w), 1e-2, "5"
end

test_21()

## Question 2.2 (10 Points)

Implement the gradient for the regularized linear regression using the above objective function.
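
One way to gain confidence in an analytic gradient is to compare it against a finite-difference estimate. The sketch below is a generic checker under stated assumptions: `numeric_partial` is a hypothetical helper, and the toy objective is just the L2 penalty, whose partial derivative is known to be $\lambda w_j$.

```ruby
# Hypothetical helper: central finite-difference estimate of d(objective)/d(w[key]).
# `objective` is any callable that maps a weight hash to a number.
def numeric_partial(objective, w, key, eps = 1e-6)
  up = w.merge(key => w.fetch(key, 0.0) + eps)
  down = w.merge(key => w.fetch(key, 0.0) - eps)
  (objective.call(up) - objective.call(down)) / (2.0 * eps)
end

# Toy objective with a known gradient: 0.5 * lambda * ||w||^2.
lambda_ = 1.7
penalty = ->(weights) { 0.5 * lambda_ * weights.values.sum { |v| v * v } }

w = {"x1" => 1.0, "x2" => -3.0}
puts numeric_partial(penalty, w, "x1")   # should be close to lambda_ * 1.0
puts numeric_partial(penalty, w, "x2")   # should be close to lambda_ * -3.0
```

The same checker could presumably be pointed at a model by passing `->(weights) { m.func(data, weights) }` and comparing the result with `m.grad(data, w)[key]`.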

In [57]:
class LinearRegressionModelL2
  def grad data, w
    g = Hash.new {|h,k| h[k] = 0.0}
    # BEGIN YOUR CODE
    data[0]["features"].each_key do |v|
      gd = 0.0
      data.each do |x|
        gd += x["features"][v] == nil ? 0.0 : (predict(x, w)-x["label"]) * x["features"][v]
      end
      g[v] = @reg_param * w[v] + gd / data.size.to_f
    end
    #END YOUR CODE
    return g
  end
end

:grad

In [58]:
### TEST ###
def test_22()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  g1_1 = 0.441
  assert_in_delta g1_1, m.grad(x[0,1], w)["x1"], 1e-2, "1"
  
  g2_1 = -5.94
  assert_in_delta g2_1, m.grad(x[1,1], w)["x1"], 1e-2, "2"    
  assert_in_delta (g1_1 + g2_1) / 2.0, m.grad(x, w)["x1"], 1e-2, "3"  

  m2 = LinearRegressionModelL2.new 1.7
  assert_in_delta -1.0495, m2.grad(x, w)["x1"], 1e-2, "5"
end

test_22()

## Question 2.3 (10 Points)

Implement a function that calculates the Root Mean Squared Error (RMSE) for a given dataset for a prediction model and weights.

RMSE is defined as follows:

# $e = \sqrt{\frac{\sum_{i=1}^N{ \left( \hat{y}_i - y_i \right) ^ 2 }}{N}}$

where $N$ is the number of examples in the dataset.

Hint: Use the ```mean``` function in the assignment library.
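
A minimal sketch of the formula on parallel arrays of predictions and labels, independent of the assignment's model classes. The helper name `rmse` is hypothetical.

```ruby
# Hypothetical helper operating on parallel arrays of predictions and labels.
def rmse(predictions, labels)
  squared = predictions.zip(labels).map { |y_hat, y| (y_hat - y) ** 2 }
  Math.sqrt(squared.sum / squared.size.to_f)
end

puts rmse([1.6, 1.2], [0.97, -1.0])
puts rmse([1.0, 2.0], [1.0, 2.0])
```

Because the errors are squared before averaging, the larger residual (2.2) dominates the result far more than the smaller one (0.63).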

In [59]:
def score_regression_model_rmse(data, weights, model)
  # BEGIN YOUR CODE
  # Squared error for each row, then the root of their mean
  err = data.collect do |row|
    y_hat = model.predict(row, weights)
    (y_hat - row["label"]) ** 2.0
  end
  return Math.sqrt(mean(err))
  #END YOUR CODE
end

:score_regression_model_rmse

In [60]:
### TEST ###
def test_23()
  m = LinearRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 0.97},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = (((0.7 + -3 * -0.3) - 0.97) ** 2)
  e2 = (((-2.7 + -3.0 * -1.3) - -1.0) ** 2)
  
  rmse = Math.sqrt((e1 + e2) / 2.0)
  assert_in_delta rmse, score_regression_model_rmse(x, w, m), 1e-2, "3"  
end

test_23()

## Question 3.1 (10 Points)

Using a small provided dataset, shown below, we will investigate model complexity. First, implement a _polynomial_ feature representation. 

Call the bias feature "1" for this part. For a dataset with two features, $x_1$ and $x_2$, a polynomial representation of degree 0 is as follows:

# $\phi(x, k = 0) = \left( 1 \right)$

degree 1:

# $\phi(x, k = 1) = \left( 1, x_1, x_2 \right)$

degree 2: 

# $\phi(x, k = 2) = \left( 1, x_1, x_2, x_1^2, x_2^2, x_1 x_2  \right)$

and more generally, for degree $k$:

# $\phi(x, k) = \left(1, x_1 \phi(x,k-1), x_2 \phi(x,k-1) \right)$

For your convenience, the function ```poly_features``` emits, for degree $k$, the names of the features to be multiplied. After generating the features, apply ```z_normalize``` to only the newly added features (i.e., not the original features or the bias).

Note: You may notice that the dataset we plan to use only has one feature and therefore the above seems overly complex. Don't worry, we will see this again. ;)
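
To make the feature names concrete, the sketch below evaluates product features such as `"1*x1*x2"` on a single row, before any normalization. The helper `product_feature` is hypothetical, and the name format is assumed to be the `*`-joined form emitted by ```poly_features```.

```ruby
# Hypothetical helper: evaluate a product feature such as "1*x1*x2" on one row.
# Assumes every factor other than the bias "1" is present in the row.
def product_feature(row_features, name)
  name.split("*").reduce(1.0) do |acc, factor|
    factor == "1" ? acc : acc * row_features.fetch(factor)
  end
end

row = {"x1" => 2.0, "x2" => 3.0}
names = ["1", "x1", "x2", "1*x1*x1", "1*x1*x2", "1*x2*x2"]
expanded = names.map { |n| [n, product_feature(row, n)] }.to_h
p expanded
```

The raw products grow quickly with the degree, which is one reason the newly added features are z-normalized afterwards.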

In [61]:
polydata = read_sparse_data_from_csv "polydata"
x1 = polydata["data"].collect {|r| r["features"]["x1"]}
x2 = polydata["data"].collect {|r| r["label"]}
puts "Polydata Regression Dataset"
Daru::DataFrame.new({x1: x1, x2: x2})
.plot(type: :scatter, x: :x1, y: :x2) do |plot, diagram|
  plot.x_label "X1"
  plot.y_label "Label"
  plot.legend false
end

Polydata Regression Dataset


In [72]:
def z_normalize_features dataset, features
  zdataset = dataset.clone
  zdataset["data"] = dataset["data"].collect do |r|
    u = r.clone
    u["features"] = r["features"].clone
    u
  end

  # BEGIN YOUR CODE
  mu = Hash.new
  dev = Hash.new
  data = zdataset["data"]
  
  features.each do |fname|
    processed_data = data.collect {|row| row["features"][fname]}
    processed_data = processed_data.select {|x| x != nil }
    mu[fname] = mean(processed_data)
    dev[fname] = stdev(processed_data)
  end
  
  features.each do |fname|
    data.each do |row|
      #missing values and zero-stdev features are set to 0.0
      value = row["features"][fname]
      row["features"][fname] = (value.nil? || dev[fname] == 0.0) ? 0.0 : (value - mu[fname]) / dev[fname]
    end
  end
  #END YOUR CODE
  return zdataset
end

:z_normalize_features

In [73]:
def poly_features features, degree
  poly_features = ["1"]

  degree.times do |i|
    poly_features += poly_features.flat_map do |x_prev|
      features.reject {|x| x == "1" or x == "bias"}.collect do |x|
        [x, x_prev.split("*")].flatten.sort.join("*")
      end
    end
    poly_features.uniq!
  end
  poly_features.collect {|k| k.gsub /^1\*([^\*]*)$/, '\1'}
end

poly_features ["x1", "x2"], 3

["1", "x1", "x2", "1*x1*x1", "1*x1*x2", "1*x2*x2", "1*x1*x1*x1", "1*x1*x1*x2", "1*x1*x2*x2", "1*x2*x2*x2"]

In [74]:
def create_polynomial_features dataset, degree
  polydataset = dataset.clone
  features = poly_features dataset["features"], degree
  # BEGIN YOUR CODE
  polydataset["features"] = []
  features.each do |fname|
    polydataset["features"].push(fname)
  end
  
  #get all the features
  polydataset["features"].uniq! 

  polydataset["data"].each do |row|
    row["features"].delete_if {|k, v| !polydataset["features"].include?(k)}
    polydataset["features"].each do |fname|
      if !row["features"].has_key?(fname)
        row["features"][fname] = 1.0
        fname_array = fname.split("*")
        fname_array.each do |ftr|
            row["features"][fname] *= row["features"][ftr]
        end
      end  
    end
  end

  #z_normalize
  polydataset = z_normalize_features(polydataset, features.reject{|f| f.split("*").size < 2})  
  return polydataset 
  #END YOUR CODE
end

:create_polynomial_features

In [75]:
def test_31()
  data = read_sparse_data_from_csv "polydata"
  assert_in_delta 12.8132, data["data"].first["features"]["x1"], 1e-2, "1"
  
  polydata = create_polynomial_features data, 3
  
  xp = polydata["data"].first["features"]
  assert_in_delta 12.8132, xp["x1"], 1e-2, "2: Does not normalize original features"
  assert_in_delta -0.905, xp["1*x1*x1"], 1e-2, "3: Applies normalization to new features"
  assert_in_delta -0.827, xp["1*x1*x1*x1"], 1e-2, "4: Applies normalization to new features"
end

test_31()

## Question 3.2 (5 Points)

Let's fit this dataset with different polynomial degrees. First, let's see how well linear regression fits the training data. 

Implement a training function that, given a training and a testing dataset, trains the model using mini-batch SGD and returns the RMSE on both the training and testing sets.

In [76]:
def train(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  # BEGIN YOUR CODE
  num_epoch.times do |i|
    sgd.update train_set["data"].sample(batch_size)
  end

  train_rmse = score_regression_model_rmse(train_set["data"], sgd.weights, obj)
  test_rmse = score_regression_model_rmse(test_set["data"], sgd.weights, obj)
  #END YOUR CODE
  return [train_rmse, test_rmse]
end

:train

In [77]:
def test_32()
  data = read_sparse_data_from_csv "polydata"
  polydata = create_polynomial_features data, 1
  x1 = polydata["data"].collect {|r| r["features"]["x1"]}
  x2 = polydata["data"].collect {|r| r["label"]}
  
  w = Hash.new {|h,k| h[k] = 0.0}
  lr = 1e-3
  obj = LinearRegressionModelL2.new 0.0
  sgd = StochasticGradientDescent.new obj, w, lr

  train_set = polydata
  test_set = polydata
  train_rmse, test_rmse = train(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  assert_true train_rmse < 2, "1"
  assert_true test_rmse < 2, "2"
  assert_true train_rmse > 0, "3"
  assert_true test_rmse > 0, "4"
  assert_in_delta train_rmse, test_rmse, 1e-5, "5"
end

test_32()

## Question 3.3 (10 Points)

Implement a simplified version of Gaussian Complexity. Observe that as model complexity increases, test error worsens.

In this simplification, we will compute the average loss over randomly permuted datasets. Let $H(X,Y)$ be the training-set loss of a function trained on input examples $x_i\in X$ with labels $y_i\in Y$. Permute the training labels as $y^\prime_i = g y_i$, where $g \sim N(0,1)$ is sampled from a normal distribution with mean 0 and standard deviation 1. Compute the following Gaussian Complexity:

# $R_G(X,H) = -\frac{1}{K} \sum_k H(X,Y^\prime) $

which, in words, is the negated average loss over $K$ separate trainings, each with randomly permuted labels. We use negative RMSE here to indicate that a more complex model should be more sensitive to permutation and therefore its loss should be lower.
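
The label perturbation and the final averaging step can be sketched without the assignment library. Everything here is an assumption for illustration: `standard_normal` is a hypothetical helper that uses the Box-Muller transform in place of `Distribution::Normal.rng`, and the loss values are made-up placeholders, not results.

```ruby
# Hypothetical helper: draw one N(0,1) sample with the Box-Muller transform.
# The assignment uses Distribution::Normal.rng; this sketch avoids that dependency.
def standard_normal(random)
  u1 = 1.0 - random.rand   # in (0, 1], so Math.log(u1) is finite
  u2 = random.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

random = Random.new(42)
labels = [1.0, -1.0, 2.0, 0.5]
# y'_i = g * y_i with a fresh g drawn for every example
perturbed = labels.map { |y| y * standard_normal(random) }
p perturbed

# Negated average of K training losses (placeholder numbers, not real results).
losses = [2.5, 2.4, 2.6]
complexity = -(losses.sum / losses.size)
puts complexity
```

Seeding the generator keeps the perturbed labels reproducible across runs, which matters when comparing models of different degree.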


In [80]:
#helper function
def permute_label dataset, rng
  permute_data = dataset.clone
  permute_data["data"] = dataset["data"].collect do |row|
    {"features"=>row["features"], "label"=>row["label"] * rng.call}
  end
 return permute_data 
end

:permute_label

In [83]:
def gaussian_complexity(dataset, obj)
  rng = Distribution::Normal.rng(0,1, 293891)
  lr = 1e-2
  tr_rmses = []
  te_rmses = []  
  norms = []

  100.times do |i|
    # BEGIN YOUR CODE
    permute_data = permute_label(dataset, rng)
    w = Hash.new {|h,k| h[k] = 0.0}
    lr = 2e-3
    sgd = StochasticGradientDescent.new obj, w, lr
    train_rmse, test_rmse = train(sgd, obj, w, permute_data, dataset, num_epoch = 5, batch_size = 20)
    tr_rmses.push(train_rmse)
    te_rmses.push(test_rmse)
    norms.push(norm(sgd.weights))
    #END YOUR CODE
  end  
  result = [mean(tr_rmses), mean(norms), mean(te_rmses)]
  puts result.join("\t")
  result
end

:gaussian_complexity

In [84]:
def test_33()
  stats = Hash.new {|h,k| h[k] = []}
  
  8.times do |i|
    data = read_sparse_data_from_csv "polydata"
    polydata = create_polynomial_features data, i
    obj = LinearRegressionModelL2.new 0.0
    tr_rmse, w_norm, te_rmse = gaussian_complexity(polydata, obj)
    
    stats[:degree] << i
    stats[:train_rmse] << tr_rmse    
    stats[:test_rmse] << te_rmse
    stats[:complexity] << -tr_rmse
  end
  tr_rmse = stats[:train_rmse]
  assert_true(tr_rmse[0] > 0.0)
  assert_true(tr_rmse[0] > tr_rmse[1])
  assert_true(tr_rmse[1] > tr_rmse[2])
  assert_true(tr_rmse[2] > tr_rmse[3])
  assert_true(tr_rmse[2] < 10.0)
  
  te_rmse = stats[:test_rmse]
  assert_true(te_rmse[0] > 0.0)
  assert_true(te_rmse[0] < te_rmse[1])
  assert_true(te_rmse[1] < te_rmse[2])
  assert_true(te_rmse.last < 10.0)
  
  z_plot = Nyaplot::Plot.new
  z_plot.x_label("Model Complexity").y_label("Test RMSE")
  z_plot.add(:line, stats[:complexity], stats[:test_rmse]).color(:black)
  z_plot.show()  
end
test_33()

2.5003728523184092	0.001236681150280377	2.435210656287256
2.43548083839562	0.01845251909510033	2.4558233103508162
2.435094888883059	0.018596735898807802	2.455838032590295
2.43462559668921	0.018769594635272696	2.455846830244951
2.4340834829644415	0.018961648517876042	2.455849727210985
2.4334811366995446	0.019163545829377498	2.4558481059826756
2.432830959518362	0.019368808213133432	2.4558435776526766
2.4321441328412106	0.019573415645213414	2.4558375341122876


## Question 3.4 (5 Points)

Does regularization reduce the Gaussian Complexity? Copy ```test_33``` above and modify it to use a fixed polynomial degree, say $k=5$. Validate that both the weight norm and the complexity decrease as you increase the regularization parameter. Due to limitations in SGD, some large regularization values may cause the trainer to diverge; if that happens, try adjusting the learning rate. 


In [85]:
def complexity_vs_norm()
  stats = Hash.new {|h,k| h[k] = []}
  data = read_sparse_data_from_csv "polydata"

  [0.0, 0.1, 0.5, 1.0, 1.5, 2.0, 5.0, 10.0, 15.0, 100.0].each do |reg|
    # BEGIN YOUR CODE
    polydata = create_polynomial_features data, 5
    obj = LinearRegressionModelL2.new reg
    tr_rmse, w_norm, te_rmse = gaussian_complexity(polydata, obj)
    #END YOUR CODE
    stats[:regularizer] << reg
    stats[:train_rmse] << tr_rmse    
    stats[:test_rmse] << te_rmse
    stats[:norms] << w_norm    
    stats[:complexity] << -tr_rmse
  end
  
  return stats
end

:complexity_vs_norm

In [86]:
def test_34()
  stats = complexity_vs_norm()
  
  assert_true(stats[:train_rmse].all? {|t| t > 0 and t < 5})
  assert_true(stats[:test_rmse].all? {|t| t > 0 and t < 5})  
  assert_true(stats[:norms].all? {|t| t > 0 and t < 10})    
  z_plot = Nyaplot::Plot.new
  z_plot.x_label("Weight Norm").y_label("Model Complexity")
  z_plot.add(:line, stats[:norms], stats[:complexity]).color(:black)
  z_plot.show()  
end

test_34()

2.4334811366995446	0.019163545829377494	2.4558481059826756
2.4334838801268392	0.01916120294621463	2.4558426980522396
2.4334948639491873	0.019151834722347806	2.455821081791134
2.4335086164392012	0.01914013188452707	2.4557940962175433
2.4335223940987536	0.019128437311328423	2.4557671492058684
2.4335361968571303	0.019116750998166388	2.455740240700157
2.433619536536236	0.019046806323348972	2.4555795951680066
2.4337603971491593	0.01893088961613826	2.455314896074094
2.433903649183996	0.018815791139506613	2.4550539537452454
2.4366426852243204	0.01697994655475885	2.4511431112880833


## Question 4.1 (10 Points)

Moving on to classification, implement L2 regularization for Logistic Regression. This should closely follow what you did in Questions 2.1 and 2.2 above. 

Use the Log Loss formulation, $\log(1 + \exp(-y\cdot \hat{y}))$ when calculating the objective value.
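
A small sketch of the per-example loss, assuming labels in $\{-1, +1\}$ and taking $\hat{y}$ to be the raw score $w \cdot x$ rather than the sigmoid output. The helper name `log_loss` is hypothetical.

```ruby
# Hypothetical helper: log loss for a label y in {-1, +1} and a raw score (w . x).
def log_loss(y, raw_score)
  Math.log(1.0 + Math.exp(-y * raw_score))
end

puts log_loss(1.0, 1.6)    # score agrees with the label: small loss
puts log_loss(-1.0, 1.2)   # score disagrees with the label: larger loss
```

The loss depends only on the product $y \cdot \hat{y}$, so a positive score is rewarded for a positive label and penalized for a negative one.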

In [87]:
class LogisticRegressionModelL2
  def initialize reg_param
    @reg_param = reg_param
  end

  def predict row, w
    x = row["features"]    
    1.0 / (1 + Math.exp(-dot(w, x)))
  end
  
  def adjust w
    w.each_key {|k| w[k] = 0.0 if w[k].nan? or w[k].infinite?}
    w.each_key {|k| w[k] = 0.0 if w[k].abs > 1e5 }
  end
  
  def func data, w
    # BEGIN YOUR CODE
    l = 0.0
    data.each do |d|
      l += Math.log(1 + Math.exp(- (dot(w, d["features"]) * d["label"])))
    end
    
    return 0.5 * @reg_param * norm(w) ** 2.0 + l / data.size
    #END YOUR CODE
  end
end

:func

In [88]:
### TEST ###
def test_41()
  m = LogisticRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  e1 = 0.1839007409
  assert_in_delta e1, m.func(x[0,1], w), 1e-2, "1"
  
  e2 = 1.4632824673
  assert_in_delta e2, m.func(x[1,1], w), 1e-2, "2"    
  assert_in_delta (e1 + e2) / 2.0, m.func(x, w), 1e-2, "3"  

  assert_in_delta 3.1622776602, norm(w), 1e-2, "4"
  m2 = LogisticRegressionModelL2.new 1.7
  assert_in_delta 9.3235916041, m2.func(x, w), 1e-2, "5"
end

test_41()

## Question 4.2 (10 Points)

Implement the gradient for L2 regularized Logistic Regression. As in Assignment 5, use the 0 / 1 version of the loss to simplify the derivation.
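
The 0 / 1 and $\pm 1$ label conventions lead to the same per-example gradient of the unregularized loss, which the sketch below checks numerically. The helper names are hypothetical, and $s$ denotes the raw score $w \cdot x$.

```ruby
def sigmoid(s)
  1.0 / (1.0 + Math.exp(-s))
end

# Partial derivative of log(1 + exp(-y * s)) with respect to w_j, for y in {-1, +1}.
def grad_pm1(y, s, x_j)
  -y * x_j * sigmoid(-y * s)
end

# The same quantity written for a 0/1 label t: (sigmoid(s) - t) * x_j.
def grad_01(t, s, x_j)
  (sigmoid(s) - t) * x_j
end

s = 1.6      # raw score w . x
x_j = 0.7    # value of feature j
puts grad_pm1(1.0, s, x_j)
puts grad_01(1.0, s, x_j)
```

The regularizer then adds $\lambda w_j$ on top of the batch average of these per-example terms.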

In [89]:
class LogisticRegressionModelL2
  def grad data, w
    # BEGIN YOUR CODE
    g = Hash.new {|h,k| h[k] = 0.0}
    
    data[0]["features"].each_key do |f|
      gd = 0.0
      data.each do |d|
        t = Math.exp(- d["label"] * dot(d["features"], w))
        gd += (t / (1 + t)) * (- d["label"] * d["features"][f])
      end
      g[f] = @reg_param * w[f] + gd / data.size
    end
    #END YOUR CODE
    return g
  end
end

:grad

In [90]:
### TEST ###
def test_42()
  m = LogisticRegressionModelL2.new 0.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => -1.0}  
  ]
  
  g1_1 = -0.1175871304
  assert_in_delta g1_1, m.grad(x[0,1], w)["x1"], 1e-2, "1"
  
  g2_1 =  -2.0750169154
  assert_in_delta g2_1, m.grad(x[1,1], w)["x1"], 1e-2, "2"    
  assert_in_delta (g1_1 + g2_1) / 2.0, m.grad(x, w)["x1"], 1e-2, "3"  

  m2 = LogisticRegressionModelL2.new 1.7
  assert_in_delta 1.0 * 1.7 + (g1_1 + g2_1) / 2.0, m2.grad(x, w)["x1"], 1e-2, "5"
end

test_42()

## Question 4.3 (2 Points)

Implement a function that will score your logistic regression model and return an array of pairs of (score, class label).

In [93]:
def score_binary_classification_model(data, weights, model)
  # BEGIN YOUR CODE
  # Pair each prediction with its true label: [[score, label], ...]
  score = data.collect do |row|
    [model.predict(row, weights), row["label"]]
  end
  #END YOUR CODE
  return score
end

:score_binary_classification_model

In [94]:
### TEST ###
def test_43()
  m = LogisticRegressionModelL2.new 888.0
  w = {"x1" => 1.0, "x2" => -3.0}
  x = [
      {"features" => {"x1" => 0.7, "x2" => -0.3}, "label" => 1.0},
      {"features" => {"x1" => -2.7, "x2" => -1.3}, "label" => 0.0}  
  ]
  
  e1 = 0.8320183851
  e2 = 0.7685247835
  
  scores = score_binary_classification_model(x, w, m)
  assert_in_delta e1, scores[0][0], 1e-2, "1"
  assert_in_delta e2, scores[1][0], 1e-2, "2"  
  assert_in_delta 1.0, scores[0][1], 1e-2, "3"
  assert_in_delta 0.0, scores[1][1], 1e-2, "4"  
end

test_43()

## Question 5.1 (10 Points)

Given an array of pairs of score and class label (0,1), calculate the AUC metric. It is not necessary to draw the curve, but you are welcome to do that. Assume scores are not sorted.

Recall that AUC can be defined either as the area under the ROC curve or as the probability that a randomly chosen positive example is ranked above a randomly chosen negative example. Choose one of these methods for the implementation. 
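
A minimal sketch of the ranking-based view, in which AUC is the fraction of (positive, negative) pairs ordered correctly. The helper `pairwise_auc` is hypothetical, counts tied scores as half a win, and is quadratic in the number of examples, so it is better suited to cross-checking a curve-based implementation than to large datasets.

```ruby
# Hypothetical helper: AUC as the fraction of (positive, negative) pairs in which
# the positive example has the higher score; tied scores count as half.
def pairwise_auc(scores)
  positives = scores.select { |_, label| label == 1 }.map(&:first)
  negatives = scores.reject { |_, label| label == 1 }.map(&:first)
  # AUC is undefined without both classes; this sketch returns 0.0 in that case.
  return 0.0 if positives.empty? || negatives.empty?

  wins = positives.sum(0.0) do |pos|
    negatives.sum(0.0) { |neg| pos > neg ? 1.0 : (pos == neg ? 0.5 : 0.0) }
  end
  wins / (positives.size * negatives.size)
end

# Three of the four positive/negative pairs are ordered correctly.
puts pairwise_auc([[0.9, 1], [0.8, 0], [0.7, 1], [0.1, 0]])   # prints 0.75
```

Because it handles ties explicitly, this version gives the same answer regardless of how equal scores happen to be ordered in the input.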

In [107]:
def cal_area (fp_rate, tp_rate)
  area = 0.0
  
  x_prev = 0.0
  y_prev = 0.0
  fp_rate.length.times do |i|
    x = fp_rate[i]
    y = tp_rate[i]
    area += (y + y_prev) * (x - x_prev) / 2.0
    x_prev = x
    y_prev = y
  end
  return area
end

def calc_auc_only(scores)
  # BEGIN YOUR CODE
  sorted_scores = scores.sort { |a,b| a[0] <=> b[0] }
  sorted_label = sorted_scores.collect { |s| s[1] }
  sorted_y_hat = sorted_scores.collect { |s| s[0] }
  positive = sorted_label.count { |l| l == 1 }.to_f
  negative = sorted_label.count { |l| l < 1 }.to_f
  
  res = []
  res << [0, 0, 0]
  true_positive = 0
  false_positive = 0
  (sorted_label.length - 1).downto(0) do |i|
    threshold = sorted_y_hat[i]
    if sorted_label[i] > 0
      true_positive += 1.0
    else
      false_positive += 1.0
    end
    fp_rate = negative == 0.0 ? 0.0 : false_positive / negative;
    tp_rate = positive == 0.0 ? 0.0 : true_positive / positive;
    res << [fp_rate, tp_rate, threshold]
  end
  curve_x = res.collect{ |r| r[0] }
  curve_y = res.collect{ |r| r[1] }
  auc = cal_area(curve_x, curve_y)
  #END YOUR CODE
  return auc
end


:calc_auc_only

In [108]:
def test_51()
  good_model = [[0.9, 1], [0.89, 1], [0.7, 0], [0.8, 1], [0.8, 0], [0.7, 1], [0.6, 0], [0.5, 0], [0.1, 0]]
  assert_true(calc_auc_only(good_model) > 0.8)
  assert_true(calc_auc_only(good_model) < 1)
  
  srand(777)
  ok_model = Array.new(100) {|i| [100 - i, (rand < (100 - i) / 100.0) ? 1 : 0] }
  ok_auc = calc_auc_only(ok_model)
  assert_in_delta(0.8631239935587761, ok_auc, 1e-3)
  
  bad_model = Array.new(1000) {|i| [1000 - i, rand < 0.5 ? 1 : 0] }
  bad_auc = calc_auc_only(bad_model)
  assert_in_delta(0.5, bad_auc, 5e-2)

end

test_51()

## Question 5.2 (10 Points)

The following dataset has _irrelevant features_. Find them and use regularization to control them. 

Implement a training method that trains a logistic regression model and returns training and testing AUC values. This closely follows Question 3.2 above. Next, fill in the driver code that trains the model for each regularization value and populates an array of training AUC, testing AUC, and weight vector norm values.

Hint: The learned weights for each regularization parameter are displayed.

In [102]:
def train_logistic_regression(sgd, obj, w, train_set, test_set, num_epoch = 100, batch_size = 20)
  # BEGIN YOUR CODE
  num_epoch.times do |t|
    train_set["data"].shuffle.each_slice(batch_size) do |batch|    
      sgd.update batch
    end
  end
  train_scores = score_binary_classification_model(train_set["data"], sgd.weights, obj)
  train_auc = calc_auc_only(train_scores)
  test_scores = score_binary_classification_model(test_set["data"], sgd.weights, obj)
  test_auc = calc_auc_only(test_scores)
  #END YOUR CODE
  return [train_auc, test_auc]
end


:train_logistic_regression

In [103]:
def test_logistic_regularizers(corner)
  stats = Hash.new {|h,k| h[k] = Array.new}
  [0.0, 0.01, 0.05, 0.1, 0.15, 0.2, 0.5, 1.0, 10.0].each do |reg|
    tr_aucs = []
    te_aucs = []
    w_norms = []

    cross_validate corner, 2 do |tr, te, fold|
      # BEGIN YOUR CODE
      obj = LogisticRegressionModelL2.new reg
      w = Hash.new {|h,k| h[k] = 0.1}
      lr = 0.5
      sgd = StochasticGradientDescent.new obj, w, lr
      tr_auc, te_auc = (train_logistic_regression(sgd, obj, w, tr, te, num_epoch = 50, batch_size = 20))
      tr_aucs.push (tr_auc)
      te_aucs.push (te_auc)
      w_norms.push (norm (sgd.weights))
      #END YOUR CODE
      puts w if fold == 0
    end
    puts [reg, mean(w_norms), mean(tr_aucs), mean(te_aucs), stdev(te_aucs)].join("\t")
    stats[:reg] << reg
    stats[:tr_aucs] << mean(tr_aucs)
    stats[:w_norms] << mean(w_norms)
    stats[:te_aucs] << mean(te_aucs)    
  end
  
  return stats
end

:test_logistic_regularizers

In [104]:
def test_52()
  corner = z_normalize(read_sparse_data_from_csv("corner"))
  corner["data"].first
  
  stats = test_logistic_regularizers(corner)
  assert_true(stats[:tr_aucs].all? {|a| a > 0.7 and a < 1.0}, "1")
  assert_true(stats[:te_aucs].all? {|a| a > 0.7 and a < 1.0}, "2")
  assert_true(stats[:w_norms][0] > stats[:w_norms][6], "3")
  assert_true(stats[:w_norms][6] > stats[:w_norms].last, "4")  
  Daru::DataFrame.new stats
end

test_52()

{"x1"=>0.1345914402945271, "x2"=>-0.11095193826210342, "x3"=>-0.1757040654199514, "x4"=>0.029541135947739466, "x5"=>0.0003601418340158256, "x6"=>-0.016785445952223633, "1"=>0.1}
0.0	0.24068641829620327	0.9804795498446293	0.9743046107331821	0.017632239388771117
{"x1"=>0.09867302615659221, "x2"=>-0.11638960045262386, "x3"=>-0.08828431069959748, "x4"=>0.040515016018545576, "x5"=>0.05959757049875124, "x6"=>-0.02684299561101966, "1"=>0.07072162519713893}
0.01	0.22536460019780602	0.9573380702815293	0.9732150356110999	0.003027343963833256
{"x1"=>0.16236020801722006, "x2"=>-0.03909353740645574, "x3"=>-0.1007260391525799, "x4"=>0.06962013849878565, "x5"=>0.0932464092492029, "x6"=>0.04355742928352505, "1"=>0.017657068421981774}
0.05	0.2419643665552974	0.8748652782489604	0.8777859685822551	0.020609625551962395
{"x1"=>0.10685847045577533, "x2"=>-0.14764964036020142, "x3"=>-0.07479004514408723, "x4"=>0.04093692725593921, "x5"=>0.01721318702953041, "x6"=>-0.00311802122632829, "1"=>0.0031024811251329

Unnamed: 0,reg,tr_aucs,w_norms,te_aucs
0,0.0,0.9804795498446292,0.2406864182962032,0.974304610733182
1,0.01,0.9573380702815292,0.225364600197806,0.9732150356111
2,0.05,0.8748652782489604,0.2419643665552974,0.8777859685822551
3,0.1,0.9580595448055766,0.1778103466423439,0.9356517174771144
4,0.15,0.9027670579013296,0.1479576401868823,0.9709775734119068
5,0.2,0.966212799390764,0.1207514064903717,0.9712771303391226
6,0.5,0.8825990437158475,0.0742859765933257,0.9070184426229516
7,1.0,0.9491974043715852,0.0462923110633741,0.9498804644808744
8,10.0,0.7998422463448598,0.0067344023015443,0.8888757310862416


## Question 5.3 (5 Points)

Make the function below return an array of feature names you think are irrelevant.

In [105]:
def guess_irrelevant_features()
  # BEGIN YOUR CODE
  answer = ["x4", "x5", "x6"]
  #END YOUR CODE
  return answer
end

:guess_irrelevant_features

In [106]:
corner = read_sparse_data_from_csv("corner")

t53_answer = guess_irrelevant_features()
assert_true(t53_answer.is_a?(Array))
assert_false(t53_answer.empty?)
assert_false((corner["features"] & t53_answer).empty?)
