# CS6140 Assignments

**Instructions**
1. In each assignment cell, look for the block:
 ```
  #BEGIN YOUR CODE
  raise NotImplementedError.new()
  #END YOUR CODE
 ```
1. Replace this block with your solution.
1. Test your solution by running the cells following your block (indicated by ##TEST##).
1. Click the "Validate" button above to validate the work.

**Notes**
* You may add other cells and functions as needed
* Keep all code in the same notebook
* In order to receive credit, code must "Validate" on the JupyterHub server

---

# Final Project Tests

In [Part 3](./part-3.ipynb) of the final project, you will create your own features and try different combinations of models. This notebook sets up the foundation for the final project. You have already implemented most of the code you need across multiple assignments. Because that code is scattered, we have copied and pasted it here where necessary.

In this assignment, you will assemble a small library of the ```Learner```s, ```FeatureTransformer```s and ```Metric```s needed for the final project. The tests here do not cover everything we did in the course or everything you need for the final project, but they are a good start. You should add more code to these files than these tests require, but don't bother adding anything you do not use in the final project.

In [1]:
require './final_project_lib.rb'
def load_german_credit_dataset; JSON.parse(File.read('german-credit.json')); end
def load_spambase_dataset; JSON.parse(File.read('spambase.json')); end

:load_spambase_dataset

## Metrics

Copy / refactor **your** implementations of the following in the file ```metrics.rb``` located in this directory. Some are already in the file for you. Note that you should copy the entire method / class to the file.

* class AUCMetric 
* cross_validate
* mean
* stdev

Note: whenever you edit the ```metrics.rb``` file, you must restart this notebook. Try adding everything in the cell below first, and move it to the file only when done.

When complete, the next cell should contain only the following:
```ruby
require '/home/<username>/final-project-3/metrics.rb'
```

where ```<username>``` is your username. You can find that in the URL for this page.
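As a reference point for the two summary helpers listed above, here is a minimal sketch. It is not the course's implementation: the names `sketch_mean` and `sketch_stdev` are hypothetical, and dividing by n - 1 (the sample standard deviation) is an assumption, so your own version may legitimately use the population form instead.

```ruby
# Hypothetical helpers; not the course's reference implementation.

# Arithmetic mean of a non-empty array of numbers.
def sketch_mean(values)
  values.sum(0.0) / values.size
end

# Sample standard deviation (divides by n - 1); this convention is an assumption.
# Returns 0.0 when fewer than two values are given.
def sketch_stdev(values)
  return 0.0 if values.size < 2
  m = sketch_mean(values)
  sum_sq = values.sum(0.0) { |v| (v - m)**2 }
  Math.sqrt(sum_sq / (values.size - 1))
end

fold_metrics = [0.70, 0.74, 0.72] # made-up per-fold values, for illustration only
puts sketch_mean(fold_metrics).round(4)
puts sketch_stdev(fold_metrics).round(4)
```

Helpers like these are what turn per-fold metrics into the mean and standard deviation reported by cross-validation.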

In [2]:
require '/home/aidasharif/final-project-3/metrics.rb'


true

## Decision Tree

Copy / refactor **your** implementations of ```DecisionTreeLearner``` and ```RandomForestLearner``` into ```decision_trees.rb``` in this directory.

Note: whenever you edit the ```decision_trees.rb``` file, you must restart this notebook. Try adding everything in the cell below first, and move it to the file when done.

When done, the cell below should contain only the following:
```ruby
require '/home/<username>/final-project-3/decision_trees.rb'
```

where ```<username>``` is your username. You can find that in the URL for this page.
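The restart requirement follows from how Ruby's `require` works: it loads a given file only once per process, so edits made after the first `require` are not picked up. The sketch below illustrates this with a throwaway temp file. The file name `sketch_lib.rb` and the global `$sketch_load_count` are hypothetical and exist only for this demonstration.

```ruby
require 'tmpdir'

# Demonstrates the load-once behaviour of `require`.
# Returns [first_require, second_require, times_the_file_ran].
def require_vs_load_demo
  $sketch_load_count = 0
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sketch_lib.rb') # throwaway file, not a course file
    File.write(path, "$sketch_load_count += 1\n")

    first  = require(path) # true: the file is executed
    second = require(path) # false: already loaded, so it is not executed again
    load(path)             # load always re-executes the file

    [first, second, $sketch_load_count]
  end
end

p require_vs_load_demo # => [true, false, 2]
```

`load` re-runs a file, but whether that is a safe substitute for restarting this notebook has not been checked here, so the restart remains the documented route.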

In [3]:
require '/home/aidasharif/final-project-3/decision_trees.rb'

true

In [4]:
def test_418b74()
  german_credit = load_german_credit_dataset()
  examples = german_credit["data"]
  learner = DecisionTreeLearner.new 1, min_size: 100, max_depth: 10
  learner.train german_credit

  scores = learner.evaluate german_credit
  metric = AUCMetric.new
  fp, tp, training_auc = metric.roc_curve scores
  plot_roc_curve fp, tp, training_auc
end

test_418b74()

In [5]:
def test_806b20()
  german_credit = load_german_credit_dataset()
  learner = RandomForestLearner.new 1, num_trees: 11, min_size: 100, max_depth: 10
  learner.train german_credit
  assert_equal 11, learner.trees.size

  scores = learner.evaluate german_credit
  metric = AUCMetric.new
  fp, tp, training_auc = metric.roc_curve scores
  plot_roc_curve fp, tp, training_auc
end
test_806b20()
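The two tests above plot a ROC curve and report a training AUC. One way to sanity-check an AUC value is the pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, counting ties as half. The sketch below is a hypothetical cross-check, not the course's `AUCMetric`. The name `pairwise_auc` and the input format (an array of `[score, label]` pairs, with label `1` marking positives) are assumptions.

```ruby
# Hypothetical cross-check; not the course's AUCMetric.
# scored: array of [score, label] pairs; label 1 = positive, anything else = negative (assumed format).
def pairwise_auc(scored)
  positives = scored.select { |_, label| label == 1 }.map(&:first)
  negatives = scored.reject { |_, label| label == 1 }.map(&:first)
  return nil if positives.empty? || negatives.empty? # AUC is undefined with one class

  wins = 0.0
  positives.each do |pos|
    negatives.each do |neg|
      if pos > neg
        wins += 1.0 # positive ranked above negative
      elsif pos == neg
        wins += 0.5 # ties count as half
      end
    end
  end
  wins / (positives.size * negatives.size)
end

toy_scores = [[0.9, 1], [0.8, 0], [0.7, 1], [0.1, 0]] # made-up scores
puts pairwise_auc(toy_scores) # => 0.75
```

This runs in quadratic time, so it is better suited to checking small samples than to replacing a sorted, ROC-based computation.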

## Linear Models
Convert your regularized logistic regression implementation into a ```LogisticRegressionLearner```, stored in ```linear_models.rb```. You should reuse the ```StochasticGradientDescent``` and ```LogisticRegressionModelL2``` classes from previous assignments.

The LogisticRegressionLearner should look like this:

```ruby
class LogisticRegressionLearner
  attr_reader :parameters
  attr_reader :weights  
  include Learner  
  
  def initialize regularization: 0.0, learning_rate: 0.01, batch_size: 20, epochs: 1
    @parameters = {"regularization" => regularization, 
      "learning_rate" => learning_rate, 
      "epochs" => epochs, "batch_size" => batch_size}
  end
    
  def train dataset
      ###
  end
    
  def predict example
      ###
  end
    
  def evaluate dataset
      ###
  end
end
```
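For orientation, the core of a logistic regression prediction is a sigmoid applied to the dot product of the weights and the example's features. The sketch below is a hypothetical illustration, not the interface of the course's `LogisticRegressionModelL2`. The `sketch_*` names are invented, and it assumes that weights and features are Hashes keyed by feature name and that an example stores its features under `"features"`.

```ruby
# Hypothetical helpers; assumes weights and features are Hashes keyed by feature name.

# Sparse dot product: features missing from the weights contribute 0.
def sketch_dot(weights, features)
  features.sum(0.0) { |name, value| weights.fetch(name, 0.0) * value }
end

# Logistic function, mapping any real number into (0, 1).
def sketch_sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# Predicted probability of the positive class for one example (assumed example format).
def sketch_predict(weights, example)
  sketch_sigmoid(sketch_dot(weights, example["features"]))
end

weights = { "x1" => 2.0, "x2" => -1.0 }             # made-up weights
example = { "features" => { "x1" => 1.0, "x2" => 2.0 } }
puts sketch_predict(weights, example) # => 0.5
```

In the learner itself, `predict` would delegate to whatever model and weights your `train` method produced.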

Note: whenever you edit the ```linear_models.rb``` file, you must restart this notebook. Try adding everything in the cell below first, and move it to the file when done. When done, the cell below should contain only the following:
```ruby
require '/home/<username>/final-project-3/linear_models.rb'
```

where ```<username>``` is your username. You can find that in the URL for this page.


In [6]:
require '/home/aidasharif/final-project-3/linear_models.rb'

true

## Transformers
Refactor / copy your ```FeatureTransformer``` implementations to a file ```transformers.rb```. An extra transformer has been implemented for you to demonstrate how to apply feature transforms before training learners.

Note: whenever you edit the ```transformers.rb``` file, you must restart this notebook. Try adding everything in the cell below first, and move it to the file when done.

When done, the cell below should contain only this:
```ruby
require '/home/<username>/final-project-3/transformers.rb'
```

where ```<username>``` is your username. You can find that in the URL for this page.
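To illustrate what a feature transform does before training, here is a minimal z-score sketch. It is a hypothetical stand-in, not the course's `ZScoreTransformer`. The class name `SketchZScore`, its `train` / `apply` methods, the population standard deviation, and the example format (a Hash with a `"features"` Hash) are all assumptions.

```ruby
# Hypothetical stand-in; not the course's ZScoreTransformer.
class SketchZScore
  def initialize(feature_names)
    @feature_names = feature_names
    @stats = {}
  end

  # Learn each feature's mean and (population) standard deviation from training examples.
  def train(examples)
    @feature_names.each do |name|
      values = examples.map { |ex| ex["features"].fetch(name, 0.0).to_f }
      mean = values.sum(0.0) / values.size
      variance = values.sum(0.0) { |v| (v - mean)**2 } / values.size
      @stats[name] = [mean, Math.sqrt(variance)]
    end
    self
  end

  # Return a copy of the example with the selected features standardized.
  def apply(example)
    features = example["features"].dup
    @stats.each do |name, (mean, sd)|
      next unless features.key?(name)
      features[name] = sd.zero? ? 0.0 : (features[name] - mean) / sd
    end
    example.merge("features" => features)
  end
end

train_set = [{ "features" => { "age" => 20.0 } }, { "features" => { "age" => 40.0 } }] # made-up data
scaler = SketchZScore.new(["age"]).train(train_set)
p scaler.apply("features" => { "age" => 40.0 }) # => {"features"=>{"age"=>1.0}}
```

The statistics are learned from the training examples only and then reused on unseen examples, which keeps test-fold information out of training.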

In [7]:
require '/home/aidasharif/final-project-3/transformers.rb'

true

In [8]:
def test_0f2399()
  spambase = load_spambase_dataset()  
  linear = LogisticRegressionLearner.new(regularization: 0.01, learning_rate: 0.001, batch_size: 20, epochs: 1)
  
  ## Provide your own name to a TransformingLearner
  zlearner = CopyingTransformingLearner.new(ZScoreTransformer.new(spambase["features"]), linear)  
  zlearner.name = "ZScore_Logistic"
  
  learners = [zlearner]
  
  df, learner_summary = parameter_search learners, spambase
  best_model_stats = learner_summary["ZScore_Logistic"]
  assert_true best_model_stats["mean_test_metric"] > 0.6, "Must return AUC > 0.6"
  
  df
end
test_0f2399()

5-fold CV: CopyingTransformingLearner, parameters: {"regularization"=>0.01, "learning_rate"=>0.001, "epochs"=>1, "batch_size"=>20, "learner"=>"LogisticRegressionLearner"}
0.7860720025711341
{
  "ZScore_Logistic": {
    "learner": "ZScore_Logistic",
    "parameters": {
      "regularization": 0.01,
      "learning_rate": 0.001,
      "epochs": 1,
      "batch_size": 20,
      "learner": "LogisticRegressionLearner"
    },
    "folds": 5,
    "mean_train_metric": 0.7820145303468843,
    "stdev_train_metric": 0.007698875670632638,
    "mean_test_metric": 0.7860720025711341,
    "stdev_test_metric": 0.013568446144670808
  }
}


| | learner | parameters | folds | mean_train_metric | stdev_train_metric | mean_test_metric | stdev_test_metric |
|---|---|---|---|---|---|---|---|
| 0 | ZScore_Logistic | `{"regularization"=>0.01, "learning_rate"=>0.001, "epochs"=>1, "batch_size"=>20, "learner"=>"LogisticRegressionLearner"}` | 5 | 0.7820145303468843 | 0.0076988756706326 | 0.7860720025711341 | 0.0135684461446708 |


In [9]:
def test_ef8eed()
  german_credit = load_german_credit_dataset()  
  
  linear = LogisticRegressionLearner.new(regularization: 0.0001, learning_rate: 0.01, batch_size: 20, epochs: 50)    
  transformer = FeatureTransformPipeline.new(
    OneHotEncoding.new(%w(checking_account credit_history purpose savings job_tenure)),
    OneHotEncoding.new(%w(personal_status_gender other_debtors property other_installments housing)),
    OneHotEncoding.new(%w(job has_telephone is_foreign_worker)),
    ZScoreTransformer.new(%w(loan_duration installment_to_salary residence_tenure age)),
    LogTransform.new(%w(credit_amount))
  )
  
  zlearner = CopyingTransformingLearner.new(transformer, linear)  
  zlearner.name = "OneHot_ZScore_Log_Logistic"
  learners = [zlearner]
  df, best_model = parameter_search learners, german_credit

  best_model_stats = best_model[zlearner.name]
  assert_true best_model_stats["mean_test_metric"] > 0.65, "Must return AUC > 0.65"
end
test_ef8eed()

5-fold CV: CopyingTransformingLearner, parameters: {"regularization"=>0.0001, "learning_rate"=>0.01, "epochs"=>50, "batch_size"=>20, "learner"=>"LogisticRegressionLearner"}
0.731868995874254
{
  "OneHot_ZScore_Log_Logistic": {
    "learner": "OneHot_ZScore_Log_Logistic",
    "parameters": {
      "regularization": 0.0001,
      "learning_rate": 0.01,
      "epochs": 50,
      "batch_size": 20,
      "learner": "LogisticRegressionLearner"
    },
    "folds": 5,
    "mean_train_metric": 0.7594472547655037,
    "stdev_train_metric": 0.0048369213421869655,
    "mean_test_metric": 0.731868995874254,
    "stdev_test_metric": 0.018248865673145685
  }
}


## Putting it together

Now, we will test different types of models on the same dataset and compare them. This is similar to what will happen in the final project, so use this test as an example.

In [None]:
def test_a0551a()
  german_credit = load_german_credit_dataset()

  linear = LogisticRegressionLearner.new(regularization: 0.0001, learning_rate: 0.01, batch_size: 20, epochs: 50)    
  transformer = FeatureTransformPipeline.new(
    OneHotEncoding.new(%w(checking_account credit_history purpose savings job_tenure)),
    OneHotEncoding.new(%w(personal_status_gender other_debtors property other_installments housing)),
    OneHotEncoding.new(%w(job has_telephone is_foreign_worker)),
    ZScoreTransformer.new(%w(loan_duration installment_to_salary residence_tenure age)),
    LogTransform.new(%w(credit_amount))
  )

  learners = [
      DecisionTreeLearner.new(1, min_size: 5, max_depth: 50),
      RandomForestLearner.new(1, num_trees: 11, min_size: 5, max_depth: 50),
      TransformingLearner.new(transformer, linear)      
  ]  
  df, model_stats = parameter_search learners, german_credit
  
  assert_true model_stats["DecisionTreeLearner"]["mean_test_metric"] > 0.65, "Decision Tree > 0.65"
  assert_true model_stats["RandomForestLearner"]["mean_test_metric"] > 0.70, "Random Forest > 0.70"  
  assert_true model_stats["TransformingLearner"]["mean_test_metric"] > 0.66, "Linear Model > 0.66"

  linear_model_auc = model_stats["TransformingLearner"]["mean_test_metric"]
  decision_tree_auc = model_stats["DecisionTreeLearner"]["mean_test_metric"]
  
  assert_true linear_model_auc > decision_tree_auc, "Linear Model with transforms beats Decision Tree out of the box"  
  df
end
test_a0551a()

5-fold CV: DecisionTreeLearner, parameters: {"min_size"=>5, "max_depth"=>50}
0.6570607349657211
5-fold CV: RandomForestLearner, parameters: {"num_trees"=>11, "min_size"=>5, "max_depth"=>50}
0.713962735276577
5-fold CV: TransformingLearner, parameters: {"regularization"=>0.0001, "learning_rate"=>0.01, "epochs"=>50, "batch_size"=>20, "learner"=>"LogisticRegressionLearner"}
