# Q2 Implementation of AdaBoost

The following table shows the training dataset. It consists of 10 samples, each with a label $y \in \{+1, -1\}$.

| x | 0 | 1 | 2 | 3  | 4  | 5  | 6 | 7 | 8 | 9  |
|---|---|---|---|----|----|----|---|---|---|----|
| y | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |


We assume each weak classifier is a threshold rule of the form $x < v$ or $x > v$, where the threshold $v$ is chosen so that the classifier achieves the best accuracy on the dataset. You should implement the **AdaBoost** algorithm to learn a **strong classifier**. Note that you CANNOT use an AdaBoost library; you need to implement it manually.

You should also report the final expression of the strong classifier, such as $C^*(x) = \mathrm{sign} [\alpha_1 C_1(x) + \alpha_2 C_2(x) + \alpha_3 C_3(x) + \cdots]$, where $C_i(x)$ is the $i$-th base classifier and $\alpha_i$ is its weight. You are also required to describe each base classifier in detail.
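
For reference, here is a sketch of the per-round quantities in the standard AdaBoost formulation, assuming the sample weights $w_i$ are normalized to sum to 1. The symbols $w_i$, $\epsilon_m$ and the round index $m$ are notation introduced here for illustration; they do not appear in the problem statement.

$$
\epsilon_m = \sum_{i:\, C_m(x_i) \neq y_i} w_i, \qquad
\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}, \qquad
w_i \leftarrow \left\{
\begin{aligned}
\frac{w_i}{2(1 - \epsilon_m)} & , \ C_m(x_i) = y_i \\
\frac{w_i}{2 \epsilon_m} & , \ C_m(x_i) \neq y_i
\end{aligned}
\right.
$$

With a normalized $\epsilon_m$, this update gives the correctly classified samples a total weight of $1/2$ and the misclassified samples a total weight of $1/2$, so the weights still sum to 1 after each round.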

For simplicity, the threshold $v$ should be a multiple of 0.5, i.e., $v\%0.5==0$. For example, you can set $v$ to 2, 2.5, or 3, but not to 2.1.
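
The threshold constraint can be checked and enumerated with a few lines of Python. This is only a sketch: `is_valid_threshold` and `candidates` are hypothetical names, and the range 0 to 9.5 is an assumption chosen to cover the data.

```python
def is_valid_threshold(v):
    """Return True if v is a multiple of 0.5 (hypothetical helper)."""
    # 2 * v is a whole number exactly when v is a multiple of 0.5
    return float(2 * v).is_integer()

# Assumed search range: every multiple of 0.5 from 0.0 up to 9.5
candidates = [k * 0.5 for k in range(20)]

print(is_valid_threshold(2), is_valid_threshold(2.5), is_valid_threshold(2.1))
print(candidates[:6])
```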

In [221]:
import math
import numpy as np

class BaseClassifier:
    """
    The Base Classifier Class
    
    Initial Parameters:
        v: the threshold of the base classifier
        name: (string) the name of the base classifier
        lower_v_label: (+1 or -1) the label when x < v
    """
    
    
    def __init__(self, v, name, lower_v_label):
        self.v = v
        self.name = name
        self.lower_v_label = lower_v_label
        
    def predict(self, X):
        y_predict = list()
        for x in X:
            if x < self.v:
                y_predict.append(self.lower_v_label)
            else:
                y_predict.append(-self.lower_v_label)
        return y_predict
        
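    def get_error_rate(self, X, y, weights):
        """
        Sum the weights of the misclassified samples and divide by the number of samples.

        Note: this equals the weighted error rate only when the weights sum to len(y)
        (e.g. unit weights). With weights that sum to 1, the value is scaled by 1 / len(y).
        """
        weight_count = 0
        total = len(y)
        y_predict = self.predict(X)
        for i in range(total):
            if y_predict[i] != y[i]:
                weight_count += weights[i]
        return weight_count / total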

We search for the best base classifiers in both directions. First, consider classifiers of the form

$$
C(x) = \left\{
\begin{aligned}
+1 &  , x < v \\
-1 & , \ x \geq v
\end{aligned}
\right.
$$

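With $v$ satisfying $v\%0.5==0$, the classifier achieves its lowest error rate of 0.3 at $v=2.5$ and at $v=8.5$, misclassifying only 3 samples in each case. (Since the samples are integers, $v=3$ and $v=9$ give the same predictions as $v=2.5$ and $v=8.5$.) We denote the two classifiers by $C_1$ and $C_2$: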

$$
C_1(x) = \left\{
\begin{aligned}
+1 &  , x < 2.5 \\
-1 & , \ x \geq 2.5
\end{aligned}
\right.
\ \ \ \ \
C_2(x) = \left\{
\begin{aligned}
+1 &  , x < 8.5 \\
-1 & , \ x \geq 8.5
\end{aligned}
\right.
$$

Next, consider classifiers of the form

$$
C(x) = \left\{
\begin{aligned}
-1 &  , x < v \\
+1 & , \ x \geq v
\end{aligned}
\right.
$$

Similarly, the classifier with $v=5.5$ attains the smallest error rate for this form, 0.4 (4 misclassified samples). We denote it by $C_3$:

$$
C_3(x) = \left\{
\begin{aligned}
-1 &  , x < 5.5 \\
+1 & , \ x \geq 5.5
\end{aligned}
\right.
$$
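
The choice of thresholds above can be double-checked with a brute-force search. This is a minimal sketch, not part of the original solution: `stump_errors` and `best_thresholds` are hypothetical helpers, and it assumes that only the midpoints between neighbouring samples (0.5, 1.5, ..., 8.5) are tried, since an integer threshold gives the same predictions as the half-integer just below it.

```python
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def stump_errors(X, y, lower_v_label):
    """Map each candidate threshold to its unweighted error rate."""
    # Assumption: candidates are the midpoints between neighbouring samples
    thresholds = [k + 0.5 for k in range(min(X), max(X))]
    errors = {}
    for v in thresholds:
        wrong = sum(
            1 for xi, yi in zip(X, y)
            if (lower_v_label if xi < v else -lower_v_label) != yi
        )
        errors[v] = wrong / len(y)
    return errors

def best_thresholds(errors):
    """Return the lowest error rate and every threshold that attains it."""
    best = min(errors.values())
    return best, [v for v, e in errors.items() if e == best]

print(best_thresholds(stump_errors(X, y, lower_v_label=1)))   # (0.3, [2.5, 8.5])
print(best_thresholds(stump_errors(X, y, lower_v_label=-1)))  # (0.4, [5.5])
```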

In [222]:
# Training Dataset
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

# Unit weights: get_error_rate then returns the plain misclassification rate
weights = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Initialize the three best base classifiers C1, C2, C3
# v = 2.5, lower_v_label = 1
C1 = BaseClassifier(v=2.5, name="C1", lower_v_label=1)
print("Classifier 1: x < 2.5 => y = +1, x >= 2.5 => y = -1")
print("prediction:",C1.predict(X))
print("error rate: ",C1.get_error_rate(X, y, weights))
print()

# v = 8.5, lower_v_label = 1
C2 = BaseClassifier(v=8.5, name="C2", lower_v_label=1)
print("Classifier 2: x < 8.5 => y = +1, x >= 8.5 => y = -1")
print("prediction:", C2.predict(X))
print("error rate: ", C2.get_error_rate(X, y, weights))
print()

# v = 5.5, lower_v_label = -1
C3 = BaseClassifier(v=5.5, name="C3", lower_v_label=-1)
print("Classifier 3: x < 5.5 => y = -1, x >= 5.5 => y = +1")
print("prediction:",C3.predict(X))
print("error rate: ",C3.get_error_rate(X, y, weights))
print()

Classifier 1: x < 2.5 => y = +1, x >= 2.5 => y = -1
prediction: [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
error rate:  0.3

Classifier 2: x < 8.5 => y = +1, x >= 8.5 => y = -1
prediction: [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
error rate:  0.3

Classifier 3: x < 5.5 => y = -1, x >= 5.5 => y = +1
prediction: [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1]
error rate:  0.4



In [223]:
class Adaboost:
    """
    Adaboost Classifier Class
    
    Initial Parameters:
        base_classifiers: [Classifier1, Classifier2, ...] the list of base classifiers used to train
        n_classifiers: The maximum number of classifiers at which boosting is terminated
    """
    
    def __init__(self, base_classifiers, n_classifiers):
        self.base_classifiers = base_classifiers
        self.n_classifiers = n_classifiers
    
    # train on data X -> y
    def fit(self, X, y):
        weights = [1/len(X)] * len(X) # initialize weight
        self.alphas = list()
        self.classifiers = list()
        for n in range(self.n_classifiers):
            min_error_rate = 1
            for cf in self.base_classifiers:
                e = cf.get_error_rate(X, y, weights)
                print(cf.name, "error rate = ", e)
                if e < min_error_rate:
                    best_base_classifier = cf
                    min_error_rate = e
            
            # calculate the importance of the classifier
            alpha = 1/2 * math.log((1-min_error_rate)/min_error_rate) 
            
            # save alpha and the best_base_classifier at each iteration
            self.alphas.append(alpha)
            self.classifiers.append(best_base_classifier)
            
            # update weights
            y_predict = best_base_classifier.predict(X)
            for i in range(len(weights)):
                if y_predict[i] == y[i]:
                    weights[i] = weights[i] / (2 * (1 - min_error_rate))
                else:
                    weights[i] = weights[i] / (2 * min_error_rate)
            print("weights = ", weights)
            print()
            
        # print a description of the final ensemble classifier
        terms = [str(a) + " * " + cf.name for a, cf in zip(self.alphas, self.classifiers)]
        print("final ensemble classifier = " + " + ".join(terms))

    def predict(self, X):
        scores = np.array([0.0]*len(X))
        for i in range(len(self.alphas)):
            scores += self.alphas[i] * np.array(self.classifiers[i].predict(X))
        
        return np.sign(scores)
    
    def get_error_rate(self, X, y, weights):
        weight_count = 0
        total = len(y)
        y_predict = self.predict(X)
        for i in range(total):
            if y_predict[i] != y[i]:
                weight_count += weights[i]
        return weight_count / total

In [224]:
ada = Adaboost(base_classifiers=[C1, C2, C3], n_classifiers=3)
ada.fit(X, y)

C1 error rate =  0.030000000000000006
C2 error rate =  0.030000000000000006
C3 error rate =  0.04
weights =  [0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 1.6666666666666665, 1.6666666666666665, 1.6666666666666665, 0.051546391752577324]

C1 error rate =  0.5
C2 error rate =  0.015463917525773196
C3 error rate =  0.02061855670103093
weights =  [0.02617801047120419, 0.02617801047120419, 0.02617801047120419, 1.6666666666666667, 1.6666666666666667, 1.6666666666666667, 0.8464223385689353, 0.8464223385689353, 0.8464223385689353, 0.02617801047120419]

C1 error rate =  0.25392670157068065
C2 error rate =  0.5
C3 error rate =  0.010471204188481676
weights =  [1.25, 1.25, 1.25, 0.8421516754850089, 0.8421516754850089, 0.8421516754850089, 0.42768959435626097, 0.42768959435626097, 0.42768959435626097, 1.25]

final ensemble classifier = 1.7380493449176364 * C1 + 2.07683056968926 * C2 + 2.2742999172498486 * C3
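
The error rates printed above (for example 0.03 in the first round) are the summed weights of the misclassified samples divided by the number of samples, because `get_error_rate` divides by `len(y)`. If the error is instead normalized by the total weight, as in the standard AdaBoost formulation, the same three classifiers are selected in the same order but with smaller coefficients. The following is a minimal sketch under that assumption; `STUMPS`, `stump_predict`, `weighted_error` and `fit_normalized` are hypothetical names, not part of the code above.

```python
import math

X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

# (threshold v, label for x < v), mirroring C1, C2, C3
STUMPS = {"C1": (2.5, 1), "C2": (8.5, 1), "C3": (5.5, -1)}

def stump_predict(v, lower_v_label, X):
    return [lower_v_label if x < v else -lower_v_label for x in X]

def weighted_error(pred, y, w):
    # Assumption: normalize by the total weight, not by the sample count
    return sum(wi for p, yi, wi in zip(pred, y, w) if p != yi) / sum(w)

def fit_normalized(X, y, n_rounds=3):
    w = [1 / len(X)] * len(X)
    ensemble = []  # list of (classifier name, alpha)
    for _ in range(n_rounds):
        # pick the stump with the lowest weighted error (first one wins ties)
        name, err = min(
            ((n, weighted_error(stump_predict(v, s, X), y, w))
             for n, (v, s) in STUMPS.items()),
            key=lambda t: t[1],
        )
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((name, alpha))
        # same update rule as in the class above, now with a normalized error
        pred = stump_predict(*STUMPS[name], X)
        w = [wi / (2 * (1 - err)) if p == yi else wi / (2 * err)
             for wi, p, yi in zip(w, pred, y)]
    return ensemble

ensemble = fit_normalized(X, y)
for name, alpha in ensemble:
    print(name, round(alpha, 4))  # C1 0.4236, C2 0.6496, C3 0.752
```

Under this normalization the ensemble is approximately $0.4236\, C_1 + 0.6496\, C_2 + 0.7520\, C_3$, which also classifies all 10 training samples correctly.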


In [225]:
ada.predict(X)

array([ 1.,  1.,  1., -1., -1., -1.,  1.,  1.,  1., -1.])

In [226]:
ada.get_error_rate(X, y, weights)

0.0

In the end, we obtain the strong classifier

$$C^*(x) = \mathrm{sign}\left(1.7380493449176364 \cdot C_1(x) + 2.07683056968926 \cdot C_2(x) + 2.2742999172498486 \cdot C_3(x)\right)$$

which achieves **zero classification error** on the training set, where

$$
C_1(x) = \left\{
\begin{aligned}
+1 &  , x < 2.5 \\
-1 & , \ x \geq 2.5
\end{aligned}
\right.
\ \ \ \ \
C_2(x) = \left\{
\begin{aligned}
+1 &  , x < 8.5 \\
-1 & , \ x \geq 8.5
\end{aligned}
\right.
\ \ \ \ \ 
C_3(x) = \left\{
\begin{aligned}
-1 &  , x < 5.5 \\
+1 & , \ x \geq 5.5
\end{aligned}
\right.
$$

The classification result of $C^*(x)$ on the training set is `[1, 1, 1, -1, -1, -1, 1, 1, 1, -1]`, which matches the labels $y$ exactly.
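
As a quick sanity check, the reported expression can be evaluated directly without the `Adaboost` class. This is a sketch only: the helper functions `c1`, `c2` and `c3` are hypothetical stand-ins for the three base classifiers, and the coefficients are copied from the output above.

```python
import numpy as np

X = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

# Coefficients copied from the printed final ensemble classifier
alphas = [1.7380493449176364, 2.07683056968926, 2.2742999172498486]

def c1(x):
    return np.where(x < 2.5, 1, -1)

def c2(x):
    return np.where(x < 8.5, 1, -1)

def c3(x):
    return np.where(x < 5.5, -1, 1)

scores = alphas[0] * c1(X) + alphas[1] * c2(X) + alphas[2] * c3(X)
pred = np.sign(scores).astype(int)

print(pred.tolist())            # [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
print(int((pred != y).sum()))   # 0 misclassified samples
```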