# Boosting small neural networks for binary classification
For this exercise you will implement boosting of small neural networks for a binary classification problem. You will use PyTorch to fit the neural network models and apply the boosted model to a spam dataset.

Recall from class that the loss function for binary classification with $y \in \{1, -1\}$ is $L(y, f) = \log(1 + e^{-y\cdot f(x)})$. The pseudo-residual $r_i$ is the negative gradient of this loss with respect to $f$, evaluated at $x^{(i)}$:

$$ r_i = \frac{y^{(i)}}{1+ e^{y^{(i)}\cdot f(x^{(i)})}}$$
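
As a sanity check on the sign convention, the sketch below compares the closed form $y / (1 + e^{y f})$ (the form computed by `compute_pseudo_residual` later in this notebook) with a finite-difference estimate of the negative gradient of the loss. The helper names and the step size `eps` are assumptions made for illustration only.

```python
import math

def loss(y, f):
    """Logistic loss L(y, f) = log(1 + exp(-y * f)) for y in {-1, 1}."""
    return math.log1p(math.exp(-y * f))

def pseudo_residual(y, f):
    """Closed-form negative gradient of the loss with respect to f."""
    return y / (1.0 + math.exp(y * f))

def numeric_negative_gradient(y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2.0 * eps)

# Same (y, f) pairs as the test cell for compute_pseudo_residual.
pairs = [(-1, -0.4), (-1, 0.1), (1, -0.3), (1, 2.0)]
for y, f in pairs:
    print(y, f, pseudo_residual(y, f), numeric_negative_gradient(y, f))
```

The two columns should agree up to finite-difference error, which is a quick way to catch a flipped sign in the exponent.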

In class we also derived the best constant for binary classification in this setting, $f_0 = \log\left(\frac{1 + \bar{y}}{1-\bar{y}}\right)$, where $\bar{y}$ is the mean of the training labels. In this algorithm, instead of fitting a regression tree to $\{(x^{(i)}, r_i)\}_{i=1}^N$ at every iteration, we fit a neural network model.
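
The best constant can be checked numerically. The sketch below, on hypothetical toy labels, compares $f_0$ with the log-odds $\log(n_+ / n_-)$ and with the minimizer of the summed loss over a grid of constants. The function names and the grid are assumptions for illustration, and the formula requires both classes to be present (otherwise $\bar{y} = \pm 1$ and the ratio is undefined).

```python
import numpy as np

def best_constant(y):
    """f0 = log((1 + ybar) / (1 - ybar)) for labels y in {-1, 1}."""
    y_bar = np.mean(y)
    return np.log((1.0 + y_bar) / (1.0 - y_bar))

def total_loss(y, f):
    """Sum of log(1 + exp(-y * f)) over the sample for a constant score f."""
    return np.sum(np.log1p(np.exp(-y * f)))

# Hypothetical toy labels: three positives and one negative.
y = np.array([1.0, 1.0, 1.0, -1.0])
f0 = best_constant(y)

# f0 should equal the log-odds log(n_pos / n_neg) ...
log_odds = np.log(np.sum(y == 1) / np.sum(y == -1))

# ... and no constant on a fine grid should give a lower loss.
grid = np.linspace(-3.0, 3.0, 601)
grid_losses = np.array([total_loss(y, f) for f in grid])
print(f0, log_odds, grid[np.argmin(grid_losses)])
```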

Let $T_m(x)$ denote the regression model (here a small neural network) fitted at iteration $m$. At iteration $m$, the update is $f_m = f_{m-1} + \nu T_m$, where $\nu$ is a fixed learning rate. After $M$ iterations we have learned
$f_M = f_0 + \sum_{m=1}^M \nu \cdot T_m$. Hard predictions are made with the sign function
$F(x) = \operatorname{sign}(f_M(x)) = \operatorname{sign}\left(f_0 + \sum_{m=1}^M \nu \cdot T_m(x)\right)$.
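
The additive score and the hard prediction can be sketched with plain arrays. Here `weak_outputs` is a hypothetical $(M, N)$ array holding $T_m(x^{(i)})$ for each weak learner and data point; it stands in for the outputs of the fitted networks and is not part of the assignment code.

```python
import numpy as np

def boosted_score(f0, weak_outputs, nu):
    """f_M = f0 + nu * sum_m T_m(x), with weak_outputs of shape (M, N)."""
    return f0 + nu * np.sum(weak_outputs, axis=0)

# Hypothetical outputs T_m(x) of M = 2 weak learners on N = 3 points.
weak_outputs = np.array([[1.0, -2.0, 0.5],
                         [0.5, -1.0, -0.5]])
scores = boosted_score(f0=0.0, weak_outputs=weak_outputs, nu=0.1)

# np.sign maps a score of exactly 0 to 0, which is not a valid label.
hard = np.sign(scores)

# One possible tie-breaking rule (an assumption): treat 0 as the positive class.
hard_tie_broken = np.where(scores >= 0, 1.0, -1.0)
print(scores, hard, hard_tie_broken)
```

With real-valued network outputs an exact zero score is unlikely, but the tie-broken variant guarantees predictions in $\{-1, 1\}$.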

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.preprocessing import StandardScaler

In [2]:
PATH = Path("data-hw3")

In [3]:
# accuracy computation
# this data is not highly imbalanced, so accuracy is ok
def accuracy(y, pred):
    return np.sum(y == pred) / float(y.shape[0]) 
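
To see why class balance matters for accuracy, the sketch below computes the accuracy of always predicting the most frequent label. The label vectors are hypothetical and `majority_baseline_accuracy` is an illustrative helper, not part of the assignment.

```python
import numpy as np

def accuracy(y, pred):
    return np.sum(y == pred) / float(y.shape[0])

def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most frequent label in y."""
    labels, counts = np.unique(y, return_counts=True)
    majority = labels[np.argmax(counts)]
    return accuracy(y, np.full(y.shape, majority))

# Hypothetical label vectors: one mildly and one heavily imbalanced.
y_mild = np.array([1, 1, 1, -1, -1])
y_skewed = np.array([-1] * 19 + [1])
acc_mild = majority_baseline_accuracy(y_mild)
acc_skewed = majority_baseline_accuracy(y_skewed)
print(acc_mild, acc_skewed)
```

On heavily skewed labels the trivial baseline already scores high, so a model's accuracy is only informative relative to that baseline.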

In [4]:
def parse_spambase_data(filename):
    """ Given a filename return X and Y numpy arrays

    X is of size number of rows x num_features
    Y is an array of size the number of rows
    Y is the last element of each row. (Convert 0 to -1)
    """
    dataset = np.loadtxt(filename, delimiter=",")
    K = len(dataset[0])
    Y = dataset[:, K - 1]
    X = dataset[:, 0 : K - 1]
    Y = np.array([-1. if y == 0. else 1. for y in Y])
    return X, Y

In [5]:
def normalize(X, X_val):
    """ Given X, X_val compute X_scaled, X_val_scaled
    
    Use StandardScaler()
    return X_scaled, X_val_scaled
    """
    ### BEGIN SOLUTION
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    X_val_scaled = scaler.transform(X_val)
    ### END SOLUTION
    return X_scaled, X_val_scaled
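
The point of fitting the scaler on `X` only is that the validation set is transformed with the training mean and standard deviation. A minimal NumPy sketch of the same idea follows; it assumes column-wise scaling with the population standard deviation (`ddof=0`), and `standardize` and the toy arrays are hypothetical.

```python
import numpy as np

def standardize(X_train, X_other):
    """Scale both arrays with the column mean and std of X_train only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)           # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (X_train - mean) / std, (X_other - mean) / std

# Hypothetical toy data with two features on different scales.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_other = np.array([[4.0, 40.0]])
X_train_s, X_other_s = standardize(X_train, X_other)
print(X_train_s.mean(axis=0), X_train_s.std(axis=0), X_other_s)
```

The scaled training columns have mean 0 and standard deviation 1, while the held-out rows generally do not, since they are scaled with statistics they did not contribute to.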

In [6]:
X, Y = parse_spambase_data(PATH/"spambase.train")
X_val, Y_val = parse_spambase_data(PATH/"spambase.test")
X, X_val = normalize(X, X_val)

In [7]:
xx = np.around(X[0, :3],3)
assert(np.array_equal(xx, np.array([-0.343, -0.168, -0.556])))

In [8]:
class NN(nn.Module):
    def __init__(self, D, seed):
        super(NN, self).__init__()
        torch.manual_seed(seed) # this is for reproducibility
        self.linear1 = nn.Linear(D,10)
        self.linear2 = nn.Linear(10,1)
        self.bn1 = nn.BatchNorm1d(num_features=10)
        
    def forward(self, x):
        x = self.linear1(x)
        x = self.bn1(F.relu(x))
        return self.linear2(x)

In [9]:
def fitNN(model, X, r, epochs=22, lr=0.1):
    """ Fit a regression model to the pseudo-residuals
    
    returns the fitted values on training data as a numpy array
    Shape of the return should be (N,) not (N,1).
    """
    ### BEGIN SOLUTION
    X = torch.tensor(X).float()
    r = torch.tensor(r).float()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
                                
    for _ in range(epochs):
        model.train() 
        r_hat = model(X).squeeze()
        loss = F.mse_loss(r_hat, r)

        optimizer.zero_grad()
        loss.backward()

        optimizer.step()
    
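    # note: the model is still in train mode here, so BatchNorm normalizes
    # with the statistics of this batch rather than its running averages
    r_hat = model(X).squeeze()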
    out = r_hat
    ### END SOLUTION
    return out.detach().numpy()

In [10]:
def gradient_boosting_predict(X, f0, models, nu):
    """Given X, models, f0 and nu predict y_hat in {-1, 1}
    
    y_hat should be a numpy array with shape (N,)
    """
    ### BEGIN SOLUTION
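    X_t = torch.tensor(X).float()  # convert once and reuse for every model
    with torch.no_grad():
        # eval() makes BatchNorm use its running statistics at prediction time
        T = np.array([model.eval()(X_t).numpy().squeeze() for model in models])
    y_hat = np.sign(f0 + np.sum(T * nu, axis=0))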
    ### END SOLUTION
    return y_hat 

In [11]:
def compute_pseudo_residual(Y, fm):
    """ vectorized computation of the pseudoresidual
    """
    ### BEGIN SOLUTION
    res = np.divide(Y, 1 + np.exp(np.multiply(Y, fm)))
    ### END SOLUTION
    return res

In [12]:
y = np.array([-1, -1, 1, 1])
fm = np.array([-0.4, .1, -0.3 , 2])
res = compute_pseudo_residual(y, fm)
xx = np.around(res, 3)
actual = np.array([-0.401, -0.525,  0.574,  0.119])
assert(np.array_equal(xx, actual))

In [13]:
def boostingNN(X, Y, num_iter, nu):
    """Given a numpy matrix X, an array Y, num_iter and nu, return f0 and the fitted models
   
    Input: X, Y, num_iter, nu
    Outputs: f0 and a list of fitted regression models
    Assumes Y is in {-1, 1}
    """
    models = []
    N, D = X.shape
    seeds = [s+1 for s in range(num_iter)] # use these seeds to create the models
    
    ### BEGIN SOLUTION
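    # best constant f0 = log((1 + ybar) / (1 - ybar)), with ybar the mean label
    f0 = np.log((1 + Y.mean()) / (1 - Y.mean()))
    fm = np.full(N, f0)  # current score f_m for every training point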
    for i in range(num_iter):
        model = NN(D=D, seed=seeds[i])
        r = compute_pseudo_residual(Y, fm)
        r_hat = fitNN(model, X, r)
        models.append(model)
        fm += (nu * r_hat)
    
    ### END SOLUTION
    return f0, models
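
The structure of the boosting loop does not depend on the weak learner. The sketch below runs the same loop with a least-squares linear model standing in for the neural network, so it needs only NumPy. The swap is made purely for illustration; all helper names and the toy data are hypothetical, and this is not the assignment's solution.

```python
import numpy as np

def pseudo_residual(y, f):
    """Negative gradient of log(1 + exp(-y * f)) with respect to f."""
    return y / (1.0 + np.exp(y * f))

def fit_linear(X, r):
    """Least-squares fit of r on [1, X]; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def boost_linear(X, y, num_iter, nu):
    """Gradient boosting with linear weak learners (a stand-in for the NN)."""
    y_bar = y.mean()
    f0 = np.log((1.0 + y_bar) / (1.0 - y_bar))
    fm = np.full(len(y), f0)
    coefs = []
    for _ in range(num_iter):
        r = pseudo_residual(y, fm)              # targets for this round
        coef = fit_linear(X, r)                 # fit the weak learner to them
        coefs.append(coef)
        fm = fm + nu * predict_linear(coef, X)  # f_m = f_{m-1} + nu * T_m
    return f0, coefs

def predict(X, f0, coefs, nu):
    """Hard predictions in {-1, 1}; a score of 0 is mapped to +1."""
    score = f0 + nu * sum(predict_linear(c, X) for c in coefs)
    return np.where(score >= 0, 1.0, -1.0)

# Hypothetical, nearly separable 1-D toy problem.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])[:, None]
y = np.concatenate([-np.ones(50), np.ones(50)])
f0, coefs = boost_linear(X, y, num_iter=20, nu=0.5)
y_hat = predict(X, f0, coefs, nu=0.5)
print("training accuracy:", np.mean(y_hat == y))
```

Each round fits the weak learner to the current pseudo-residuals and adds a shrunken copy of its predictions to the running score, which is the same update `boostingNN` performs with `fitNN`.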

In [14]:
X, Y = parse_spambase_data(PATH/"spambase.train")
X_val, Y_val = parse_spambase_data(PATH/"spambase.test")
X, X_val = normalize(X, X_val)

In [15]:
nu = .1
f0, models = boostingNN(X, Y, num_iter=20, nu=nu)
y_hat = gradient_boosting_predict(X, f0, models, nu=nu)

In [16]:
acc_train = accuracy(Y, y_hat)
assert(np.around(acc_train, decimals=3)==0.919)

In [18]:
y_hat = gradient_boosting_predict(X_val, f0, models, nu=nu)
acc_val = accuracy(Y_val, y_hat)
assert(np.around(acc_val, decimals=4)==0.927)