In [1]:
import numpy as np
import math
from copy import deepcopy
from sklearn.metrics import accuracy_score
import sklearn.datasets
from sklearn import preprocessing
from sklearn.svm import SVC

In [2]:
X,y = sklearn.datasets.make_hastie_10_2()
X_train = X[0:8000,:]
y_train = y[0:8000]
X_test = X[8000:,:]
y_test = y[8000:]

# Exercise 1
Implement the AdaBoost ensemble algorithm by completing the following code:

In [9]:
class AdaBoost:
    def __init__(self, weakModel, T):
        self.weakModel_ = weakModel
        self.T_ = T

    def fit(self, X, y):
        y = np.asarray(y)
        # Start from uniform example weights.
        w = np.ones(len(X)) / len(X)
        self.models_ = []
        self.a_ = []
        for i in range(self.T_):
            # Fit a fresh copy of the weak model on the reweighted sample.
            m = deepcopy(self.weakModel_)
            m.fit(X, y, sample_weight=w)
            y_pred = np.asarray(m.predict(X))
            # Weighted error: total weight of the misclassified examples,
            # clipped away from 0 and 1 so that the log below stays finite.
            err = np.clip(np.sum(w[y != y_pred]), 1e-10, 1 - 1e-10)
            a = 0.5 * np.log((1 - err) / err)
            # Increase the weight of misclassified examples, then renormalize.
            w = w * np.exp(-a * y * y_pred)
            w = w / np.sum(w)
            m.err_ = err
            self.models_.append(m)
            self.a_.append(a)
        return self

    def predict(self, X):
        # Sign of the alpha-weighted vote of all weak models.
        partials_sum = np.zeros(len(X))
        for m, a in zip(self.models_, self.a_):
            partials_sum += a * np.asarray(m.predict(X))
        return np.sign(partials_sum)

In the implementation you are free to assume:
- that the problem is a binary classification problem with labels in $\{-1, +1\}$.
- that the weakModel can fit a weighted sample set by means of the call `weakModel.fit(X,y,sample_weight=w)` where `w` is a vector of length $|y|$.
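
As a sanity check on the update rule, here is a minimal sketch of a single boosting round on a made-up four-example problem. The toy labels and predictions are assumptions chosen for illustration; they are not taken from the dataset above.

```python
import numpy as np

# Toy data (illustrative only): labels in {-1, +1}, third example misclassified.
y = np.array([1, -1, 1, -1])
y_pred = np.array([1, -1, -1, -1])

w = np.ones(len(y)) / len(y)         # uniform initial weights, 0.25 each
err = np.sum(w[y != y_pred])         # weighted error of this weak model
a = 0.5 * np.log((1 - err) / err)    # vote weight of this weak model

w = w * np.exp(-a * y * y_pred)      # shrink correct examples, grow wrong ones
w = w / np.sum(w)                    # renormalize so the weights sum to 1
```

After the update, the single misclassified example carries half of the total weight, so the next weak model is pushed to get it right.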

Test your implementation on the dataset loaded above and using an SVC with a polynomial kernel. 

In [4]:
class SVC_:
        def __init__(self, kernel="rbf", degree=3):
            self.svc = SVC(kernel=kernel, degree=degree)

        def fit(self, X,y,sample_weight=None):
            if sample_weight is not None:
                sample_weight = sample_weight * len(X)

            self.svc.fit(X,y,sample_weight=sample_weight)
            return self

        def predict(self, X):
            return self.svc.predict(X)

In [5]:
weakModel = SVC_(kernel="poly", degree=3)
adaboost = AdaBoost(weakModel, 100)
adaboost.fit(X_train, y_train)
y_train_ = adaboost.predict(X_train)
y_test_ = adaboost.predict(X_test)
print("Accuracy on the Train Set: %s" % accuracy_score(y_train, y_train_))
print("Accuracy on the Test Set: %s" % accuracy_score(y_test, y_test_))

Accuracy on the Train Set: 0.88875
Accuracy on the Test Set: 0.862


Evaluate the AdaBoost performance as usual by calculating the classification error.

**Note 1**:  
Since the labels are bound to be in $\{+1, -1\}$, the classification error can be easily computed as:
$$
   error(y,y') = \frac{1}{2} - \frac{y^T \times y'}{2N},
$$
where $N$ is the total number of examples. The formula can be derived noticing that $y^T \times y'$ calculates the number $N_c$ of examples correctly classified  minus the number $N_{\bar c}$ of examples incorrectly classified. We have then $y^T \times y' = N_c - N_{\bar c}$ and by noticing that $N = N_c + N_{\bar c}$:
$$
   N - y^T \times y' = 2 N_{\bar c} \Rightarrow \frac{N - y^T \times y'}{2 N} = \frac{N_{\bar c}}{N} = error(y,y')
$$
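
The identity can be checked numerically. The sketch below compares the formula with a direct count of mistakes on random labels; the variable names and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
y_true = rng.choice([-1, 1], size=1000)   # random labels in {-1, +1}
y_hat = rng.choice([-1, 1], size=1000)    # random "predictions"

N = len(y_true)
# Formula from Note 1: 1/2 - (y^T y') / (2N)
error_formula = 0.5 - y_true.dot(y_hat) / (2.0 * N)
# Direct definition: fraction of misclassified examples
error_direct = np.mean(y_true != y_hat)
```

Both quantities agree because every correct prediction contributes +1 to the dot product and every mistake contributes -1.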

**Note 2**:
Do not forget to deepcopy your base model before fitting it to the new data.

**Note 3**:
The SVC model allows specifying weights, but it *does not* work well when the weights are normalized (it works well when the weights are larger). The following class takes normalized weights and denormalizes them before passing them to the SVC classifier:

```python
    class SVC_:
        def __init__(self, kernel="rbf", degree=3):
            self.svc = SVC(kernel=kernel, degree=degree)

        def fit(self, X,y,sample_weight=None):
            if sample_weight is not None:
                sample_weight = sample_weight * len(X)

            self.svc.fit(X,y,sample_weight=sample_weight)
            return self

        def predict(self, X):
            return self.svc.predict(X)
```
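
A short sketch of what the rescaling in `fit` does to the weights. The example weight vector is an assumption for illustration. According to the scikit-learn documentation, `sample_weight` rescales `C` per sample, so weights that sum to 1 would act like a much smaller `C`; multiplying by `len(X)` brings the average weight back to 1.

```python
import numpy as np

N = 5
w = np.array([0.1, 0.1, 0.2, 0.3, 0.3])   # normalized weights, sum to 1
w_scaled = w * N                           # what SVC_.fit passes to SVC

uniform = np.ones(N) / N                   # the first AdaBoost round
uniform_scaled = uniform * N               # becomes all ones
```

With uniform weights the rescaled vector is all ones, which is the same as fitting the SVC without weights.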

# Exercise 2

Write a weak learner to be used with the AdaBoost algorithm you just wrote. The weak learner that you will implement shall work as follows:

- it creates a random linear model by generating the needed weight vector $\mathbf{w}$ at random; each weight shall be sampled from $U(-1,1)$;
- it evaluates the weighted loss $\epsilon_t$ on the given dataset and flips the linear model if $\epsilon_t > 0.5$;
- at prediction time it predicts +1 if $\mathbf{x} \cdot \mathbf{w} > 0$, and -1 otherwise.

In [6]:
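class RandomLinearModel:
    def loss(self, y, y_, w):
        # Weighted 0-1 loss: total weight of the misclassified examples.
        return np.sum(w[np.asarray(y) != np.asarray(y_)])

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = np.ones(len(X)) / len(X)
        # Draw each weight uniformly at random from (-1, 1).
        self.w_ = np.random.uniform(-1, 1, X.shape[1])
        err = self.loss(y, self.predict(X), sample_weight)
        if err > 0.5:
            # Flip the model so that its weighted error drops below 0.5.
            self.w_ = -self.w_
        return self

    def predict(self, X):
        return np.where(X.dot(self.w_) > 0, 1, -1)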

Learn an AdaBoost model using the RandomLinearModel weak learner, printing every $K$ iterations the weighted error and the current error of the ensemble (you are free to choose $K$ so as to make your output frequent enough to show what is happening without flooding the console with messages). Evaluate the training and test error of the final ensemble model.

In [10]:
rs = RandomLinearModel()
a = AdaBoost(rs,10000)
a.fit(X_train,y_train)

y_train_ = a.predict(X_train)
y_test_ = a.predict(X_test)

print("Accuracy on the Train Set: %s" % accuracy_score(y_train, y_train_))
print("Accuracy on the Test Set: %s" % accuracy_score(y_test, y_test_))

Accuracy on the Train Set: 0.80875
Accuracy on the Test Set: 0.517
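
The periodic report requested by the exercise is not produced by the cell above. One possible sketch, which leaves the `AdaBoost` class untouched, replays the stored weak models and their vote weights after training. The helper name `staged_report` and its return format are assumptions made for illustration; it relies only on the `models_` and `a_` lists and on the `err_` attribute that `fit` stores on each weak model.

```python
import numpy as np

def staged_report(models, alphas, X, y, K=100):
    """Replay a fitted ensemble and collect, every K rounds,
    (round, weighted error of that round's model, ensemble error so far)."""
    y = np.asarray(y)
    score = np.zeros(len(y))
    report = []
    for t, (m, a) in enumerate(zip(models, alphas), start=1):
        # Add this weak model's weighted vote to the running score.
        score += a * np.asarray(m.predict(X))
        if t % K == 0 or t == len(models):
            ensemble_err = np.mean(np.sign(score) != y)
            # err_ is the weighted training error recorded during fit, if any.
            report.append((t, getattr(m, "err_", None), ensemble_err))
    return report
```

It could be called as `staged_report(a.models_, a.a_, X_train, y_train, K=500)` and the resulting tuples printed one per line. Replaying costs one extra `predict` per weak model, but it keeps the training loop free of printing logic.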


Write a few paragraphs about what you think of the experiment and of the results you obtained.

With the SVC as weak learner, AdaBoost reaches a fairly good accuracy after about a dozen iterations. With the RandomLinearModel, by contrast, the gain in accuracy stays small even after many iterations. This is because each random linear classifier is only slightly better than random guessing, so combining them improves performance only slightly. The combined classifier also learns to recognize the training set (accuracy 0.80875) but not the test set (accuracy 0.517), i.e. it does not learn to generalize.