# Test notebook for boosting

The goal of this notebook is to test the boosting algorithms implemented in ```boosting.py```. We will test them on four datasets:

- Diabetes dataset
- Prostate Cancer dataset
- California Housing dataset
- Leukemia dataset



## Diabetes Dataset

We first load the dataset from scikit-learn, together with the other imports used below:

In [110]:
from sklearn.datasets import load_diabetes
import numpy as np
from boosting import Boosting
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset once: a (442, 10) feature matrix and a (442,) target vector
data, response = load_diabetes(return_X_y=True)

In [111]:
data.shape, response.shape

((442, 10), (442,))

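We have n = 442 observations and p = 10 features. Next, we normalize the data: the response is centered, and each column of the feature matrix is centered and scaled to unit Euclidean norm.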


In [112]:
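def normalize_data(data, responses):
    # Center the response so that no intercept term is needed
    y = responses - np.mean(responses)
    # Center each feature column, then scale it to unit Euclidean (l2) norm
    X = data - np.mean(data, axis=0)
    X = X / np.linalg.norm(X, axis=0)
    return X, y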

In [None]:
X, y = normalize_data(data, response)
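To see what `normalize_data` guarantees, here is a small check on synthetic data (an assumption standing in for the diabetes matrix): after the transformation every feature column has mean zero and unit ℓ2 norm, and the response has mean zero. Unit-norm columns mean that `X[:, j] @ r` is exactly the least-squares coefficient of a residual `r` on column `j`, which is the quantity coordinate-wise boosting updates typically work with.

```python
import numpy as np

def normalize_data(data, responses):
    # Same centering/scaling as the notebook helper
    y = responses - np.mean(responses)
    X = data - np.mean(data, axis=0)
    X = X / np.linalg.norm(X, axis=0)
    return X, y

# Synthetic stand-in for the diabetes matrix (assumption: any non-constant columns work)
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=3.0, size=(442, 10))
response = rng.normal(loc=150.0, scale=75.0, size=442)

X, y = normalize_data(data, response)

col_means = X.mean(axis=0)              # ~0 for every column
col_norms = np.linalg.norm(X, axis=0)   # 1 for every column, up to rounding
print(np.abs(col_means).max(), col_norms.min(), col_norms.max(), abs(y.mean()))
```

One caveat of this design: a constant column has zero norm after centering, so the division would produce `nan` values. According to the scikit-learn documentation, the diabetes features are already mean-centered and scaled so that each column's sum of squares is 1, so on this dataset the step mainly centers the response.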

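We define two helpers: a benchmark R² from the ordinary least-squares fit in ```scikit-learn```, and a function that computes the R² and the mean squared error (MSE) of a given coefficient vector: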

In [118]:
def r_squared(X, y):
    # Benchmark: R^2 of scikit-learn's ordinary least-squares fit (with intercept)
    reg = LinearRegression().fit(X, y)
    return reg.score(X, y)

def r2andmse(X, b, y):
    # R^2 and MSE of the linear predictor X @ b (no intercept term)
    predictions = np.dot(X, b)
    ss_res = np.sum((predictions - y) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1 - ss_res / ss_tot
    mse = np.mean((y - predictions) ** 2)
    return r2, mse
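The benchmark `r_squared` fits `LinearRegression` with an intercept, whereas `r2andmse` scores `X @ b` without one. The two are comparable because, once `y` and the columns of `X` are centered, the least-squares fit without intercept has the same residuals as the fit with intercept on the raw data, and rescaling columns only rescales the coefficients. The sketch below checks this with NumPy only; `np.linalg.lstsq` stands in for `LinearRegression`, and the synthetic data is an assumption.

```python
import numpy as np

def r2andmse(X, b, y):
    # Same formulas as the notebook helper: no intercept in the prediction
    predictions = X @ b
    ss_res = np.sum((predictions - y) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot, np.mean((y - predictions) ** 2)

rng = np.random.default_rng(1)
n, p = 200, 6
data = rng.normal(size=(n, p)) + 3.0            # features with nonzero means
response = data @ rng.normal(size=p) + 10.0 + rng.normal(size=n)

# (a) Least squares with an explicit intercept column on the raw data
A = np.column_stack([np.ones(n), data])
coef_int, *_ = np.linalg.lstsq(A, response, rcond=None)
r2_int, mse_int = r2andmse(A, coef_int, response)

# (b) Least squares without intercept after centering and unit-norm scaling
norms = np.linalg.norm(data - data.mean(axis=0), axis=0)
X = (data - data.mean(axis=0)) / norms
y = response - response.mean()
coef_c, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_c, mse_c = r2andmse(X, coef_c, y)

print(r2_int, r2_c)                   # same R^2
print(coef_int[1:], coef_c / norms)   # same slopes once the scaling is undone
```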


Now we can start our experiments:

In [172]:
bosting = {}                    # algorithm name -> (R^2, MSE)
sklearn_r2 = r_squared(X, y)    # OLS benchmark

Boosting_algorithms = ['LS-Boost', 'FS-Boost',
                       'R-FS-Boost', 'Path-R-FS-Boost']

exp = Boosting(X, y)


# LS-Boost: numiter iterations with step size epsilon
b_ls = exp.LS_Boost(numiter=1000, epsilon=0.01)
bosting['LS-Boost'] = r2andmse(X, b_ls, y)

# FS-Boost: numiter iterations with step size epsilon
b_fs = exp.FS_Boost(numiter=10000, epsilon=0.25)
bosting['FS-Boost'] = r2andmse(X, b_fs, y)

# R-FS-Boost: also takes a parameter delta
b_rfs = exp.R_FS(numiter=10000, epsilon=0.25, delta=10079)
bosting['R-FS-Boost'] = r2andmse(X, b_rfs, y)

# Path-R-FS-Boost: deltalist holds num_itr values of delta, one per iteration
num_itr = 50000
deltalist = np.linspace(0.1, 1, num_itr)
b_pathrfs = exp.Path_R_FS(numiter=num_itr, epsilon=0.01, delta_list=deltalist)
bosting['Path-R-FS-Boost'] = r2andmse(X, b_pathrfs, y)

ValueError: The delta_list parameter must be a list of bounded positive values.
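`boosting.py` itself is not shown here, so the following is only a sketch of how LS-Boost, FS and R-FS are commonly written as greedy coordinate updates on the residual, assuming unit-norm columns and a centered response as produced by `normalize_data`. It is not the module's implementation; the function names `ls_boost`, `fs_boost` and `r_fs` are hypothetical, and the data is synthetic. It illustrates the roles of `numiter`, `epsilon` and `delta`: LS-Boost takes a shrunken least-squares step (factor `epsilon`) on the best column, FS moves one coefficient by a fixed `epsilon`, and R-FS also shrinks all coefficients by `1 - epsilon/delta`, which keeps their ℓ1 norm at most `delta`.

```python
import numpy as np

def ls_boost(X, y, numiter, epsilon):
    # Hypothetical LS-Boost sketch: shrunken least-squares step on one column
    beta = np.zeros(X.shape[1])
    r = np.asarray(y, dtype=float).copy()      # residual y - X @ beta
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(numiter):
        u = X.T @ r / col_sq                   # LS coefficient of r on each column
        j = np.argmax(u ** 2 * col_sq)         # column with the largest loss reduction
        beta[j] += epsilon * u[j]
        r -= epsilon * u[j] * X[:, j]
    return beta

def fs_boost(X, y, numiter, epsilon):
    # Hypothetical FS sketch: move the coefficient with largest |X_j^T r| by +/- epsilon
    beta = np.zeros(X.shape[1])
    r = np.asarray(y, dtype=float).copy()
    for _ in range(numiter):
        c = X.T @ r
        j = np.argmax(np.abs(c))
        s = np.sign(c[j])
        beta[j] += epsilon * s
        r -= epsilon * s * X[:, j]
    return beta

def r_fs(X, y, numiter, epsilon, delta):
    # Hypothetical R-FS sketch: FS step plus shrinkage of all coefficients
    if not 0 < epsilon <= delta:
        raise ValueError("need 0 < epsilon <= delta")
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    r = y.copy()
    for _ in range(numiter):
        c = X.T @ r
        j = np.argmax(np.abs(c))
        beta *= 1 - epsilon / delta            # shrink every coefficient toward zero
        beta[j] += epsilon * np.sign(c[j])
        r = y - X @ beta                       # recompute the residual
    return beta

# Synthetic, normalized data (assumption)
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
X = data - data.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + 0.1 * rng.normal(size=100)
y -= y.mean()

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_ls = ls_boost(X, y, numiter=5000, epsilon=0.5)
b_fs = fs_boost(X, y, numiter=500, epsilon=0.01)
b_rfs = r_fs(X, y, numiter=500, epsilon=0.01, delta=2.0)

print(b_ols, b_ls)                             # LS-Boost approaches OLS
print(np.abs(b_fs).sum(), np.abs(b_rfs).sum()) # l1 norms stay within their budgets
```

For full-rank `X`, the LS-Boost sketch approaches the ordinary least-squares solution as the iterations grow, while FS and R-FS stay inside ℓ1 budgets (`numiter * epsilon` and `delta`, respectively). The Path-R-FS call above takes a `delta_list` instead of a single `delta`; this sketch does not cover that variant.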

In [170]:
bosting['Path-R-FS-Boost']

(1.9811682544279563e-05, 5929.767415913281)
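The `In [170]` cell has a lower execution count than the failing `In [172]` cell, so the value it shows comes from an earlier run. Its R² is essentially zero. Because `r2andmse` computes both outputs from the same residual sum of squares, they satisfy MSE = (1 - R²) · Var(y), with the population variance as in `np.var`. An R² near zero therefore means the MSE is essentially the variance of the centered response: those Path-R-FS coefficients did about as well as predicting zero, which is exactly what an all-zero coefficient vector gives. A NumPy check of that identity on synthetic data (an assumption, not the diabetes data):

```python
import numpy as np

def r2andmse(X, b, y):
    # Same formulas as the notebook helper
    predictions = X @ b
    ss_res = np.sum((predictions - y) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot, np.mean((y - predictions) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(442, 10))
y = rng.normal(scale=77.0, size=442)
y -= y.mean()                        # centered response, as after normalize_data

# Identity: mse = (1 - R^2) * var(y) for any coefficient vector b
b_any = rng.normal(size=10)
r2_any, mse_any = r2andmse(X, b_any, y)

# All-zero coefficients predict 0 (the mean of the centered y): R^2 = 0, mse = var(y)
r2_null, mse_null = r2andmse(X, np.zeros(10), y)
print(r2_null, mse_null, np.var(y))
```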