# Implementation of Adaboost Algorithm
In this part we implement AdaBoost in an IPython notebook. First, let's load and explore the data. The data set contains two classes that cannot be separated by a linear classifier.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import *
# tab-separated file with columns x1, x2 and label (+1 / -1)
url = 'https://coding.net/u/HongHuangNeu/p/Machine-Learning-Notes-Data/git/raw/master/AdaBoost/data.txt'
df = pd.read_csv(url, sep='\t')
df[:3]

In [None]:
cmap = {1: 'red', -1: 'blue'}
df.plot(x='x1', y='x2', kind='scatter', c=[cmap.get(c, 'black') for c in df.label])

plt.show()

# Adaboost Algorithm

Consider a two-class classification problem with training set $\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $y_i \in \{-1,+1\}$.

1. Initially, all training points have the same weight: $w_{i}^{(1)}=\frac{1}{N}$ for $i=1,\dots,N$. Choose a weak classification model $\phi(x;\theta)\in \{-1,+1\}$.

2. For each iteration $m=1,2,\dots,K$: train a weak classifier $\phi(x;\theta_{m})$ on the training set so that the weighted error
    $$P_{m}=\sum^{N}_{i=1} w_{i}^{(m)} I(1-y_{i}\phi(x_{i};\theta_{m}))$$

    is minimized, where $I(x)$ is 0 when $x$ is 0 and 1 when $x$ is non-zero. Since $y_{i}\phi(x_{i};\theta_{m})=1$ for a correctly classified point and $-1$ for a misclassified one, $P_{m}$ is the total weight of the misclassified points.

    The weight of the base classifier $\phi(x;\theta_{m})$ is
    $$\alpha_{m}=\frac{1}{2}\ln \frac{1-P_{m}}{P_{m}}$$

    The weight of each data point for the next iteration is

    $$w_{i}^{(m+1)}=\frac{w_{i}^{(m)}\exp(-y_{i}\alpha_{m}\phi(x_{i};\theta_{m}))}{z_{m}}$$
    where $z_{m}=\sum_{i=1}^{N} w_{i}^{(m)}\exp(-y_{i}\alpha_{m}\phi(x_{i};\theta_{m}))$ is the normalization factor that makes the new weights sum to 1. When $P_{m}<\frac{1}{2}$ (so $\alpha_{m}>0$), misclassified points gain weight relative to correctly classified ones.

3. The resulting classifier is
    $$f(x)=\operatorname{sign}(F(x))$$
    where
    $$F(x)=\sum_{k=1}^{K}\alpha_{k}\phi(x;\theta_{k})$$
    and $K$ is the number of iterations actually performed. The iterations stop when a predefined maximum number of iterations has been reached or when $f(x)$ misclassifies no training point.
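To make the update rules concrete, here is a minimal sketch of a single boosting round on four toy points. It works directly on NumPy arrays of labels and weak-classifier outputs rather than on the notebook's stump code; the function name **boosting_round** and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def boosting_round(weights, y, phi):
    """One AdaBoost update for labels y and weak-classifier outputs phi (both in {-1, +1})."""
    # weighted error P_m: total weight of the misclassified points
    P = weights[y != phi].sum()
    # classifier weight alpha_m = 0.5 * ln((1 - P_m) / P_m)
    alpha = 0.5 * np.log((1 - P) / P)
    # unnormalized new weights w_i * exp(-y_i * alpha_m * phi_i)
    new_w = weights * np.exp(-y * alpha * phi)
    # z_m rescales the weights so that they sum to 1
    z = new_w.sum()
    return P, alpha, new_w / z

# toy example: 4 points with equal weights; the weak classifier gets the third point wrong
y = np.array([1, 1, -1, -1])
phi = np.array([1, 1, 1, -1])
w = np.full(4, 0.25)
P, alpha, w_next = boosting_round(w, y, phi)
print(P, alpha, w_next)
```

With equal starting weights and one mistake, $P_m=0.25$ and $\alpha_m=\frac{1}{2}\ln 3\approx 0.549$; after normalization the misclassified point carries weight $\frac{1}{2}$ and each of the other three carries $\frac{1}{6}$. In general the points misclassified in round $m$ end up with exactly half of the total weight, because $P_m e^{\alpha_m}=(1-P_m)e^{-\alpha_m}=\sqrt{P_m(1-P_m)}$.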



## Decision Stump 
The base classifier is a decision stump, a decision tree with a single split. It looks at only one feature of the data set and compares it with a threshold. With operator **'lt'** (less than), points whose value is below the threshold are assigned to the +1 class and all other points to the -1 class; with operator **'gt'**, points whose value is greater than or equal to the threshold are assigned to the +1 class and the rest to the -1 class. The **classify** function applies a decision stump to a data set. The stump is stored in a dict with three entries: the threshold (**threshold**), the index of the feature to look at (**featureIndex**), and which side of the threshold belongs to the +1 class (**operator**).



In [None]:
import numpy as np
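def classify(dataSet, classifier):
    """Label every row of dataSet with +1 or -1 using one decision stump."""
    threshold = classifier['threshold']
    featureIndex = classifier['featureIndex']
    numberOfRows = dataSet.shape[0]
    # start with every point in the +1 class
    labels = ones((numberOfRows, 1))
    if classifier['operator'] == 'lt':      # 'lt': values below the threshold are +1
        labels[dataSet[:, featureIndex] >= threshold] = -1
    else:                                   # 'gt': values at or above the threshold are +1
        labels[dataSet[:, featureIndex] < threshold] = -1
    return labels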
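The rule implemented by **classify** can also be written with NumPy's **np.where**, which returns a flat array instead of a column vector. The sketch below is a self-contained restatement for experimenting outside the notebook; the name **stump_labels** and the three sample points are hypothetical.

```python
import numpy as np

def stump_labels(X, feature_index, threshold, operator):
    """Restate the decision-stump rule; returns a 1-D array of +1/-1 labels."""
    values = X[:, feature_index]
    if operator == 'lt':
        # 'lt': values strictly below the threshold go to class +1
        return np.where(values < threshold, 1, -1)
    # 'gt': values at or above the threshold go to class +1
    return np.where(values >= threshold, 1, -1)

X = np.array([[0.5, 2.0],
              [1.5, 0.0],
              [1.0, 1.0]])
print(stump_labels(X, 0, 1.0, 'lt'))  # labels +1, -1, -1
print(stump_labels(X, 0, 1.0, 'gt'))  # labels -1, +1, +1
```

The third point lies exactly on the threshold 1.0: with **'lt'** it goes to the -1 class and with **'gt'** to the +1 class, which matches the `>=` and `<` comparisons in **classify**.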


## Trainer of Decision Stump
The decision stump is trained by exhaustive search over every combination of three variables:
1. the feature (**featureIndex** in the code)
2. the cutting point within the range of that feature (**minValue + i * stepSize** in the code)
3. the side of the cutting point, left or right, that belongs to the +1 class (**operator** in the code)

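This gives a nested loop with three levels. For the second variable, the range [**minValue**, **maxValue**] of the feature is cut into **numberOfPieces** equal pieces of width **stepSize**. The loop index **i** runs from -1 to **numberOfPieces**, so the candidate thresholds go from one step below the minimum up to the maximum, and each one splits the feature's values into two parts. The threshold below the minimum puts every point on the same side, so a stump that assigns all points to one class is also considered.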
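The candidate thresholds for one feature can be computed directly. This sketch reproduces the arithmetic **minValue + i * stepSize** for **i** from -1 to **numberOfPieces**; the helper name **candidate_thresholds** and the sample values are hypothetical.

```python
import numpy as np

def candidate_thresholds(values, number_of_pieces):
    """Thresholds minValue + i * stepSize for i = -1, 0, ..., number_of_pieces."""
    min_value, max_value = values.min(), values.max()
    step_size = (max_value - min_value) / number_of_pieces
    return min_value + np.arange(-1, number_of_pieces + 1) * step_size

print(candidate_thresholds(np.array([2.0, 3.0, 6.0]), 4))
```

With values 2, 3, 6 and 4 pieces, the step size is 1 and the six thresholds are 1, 2, 3, 4, 5, 6: one below the minimum, then the cutting points up to and including the maximum. In general there are **numberOfPieces** + 2 candidates per feature.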

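For each combination of the three variables, we build a decision stump and measure its error on the **weighted training set**, i.e. the sum of the weights of the points it misclassifies. The stump with the lowest weighted error is returned, together with that error and its predictions on the training set.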

In [None]:

#trainer of decision stump
def trainStump(dataSet, weights):
    """Return the stump with the lowest weighted error, that error, and its predictions."""
    numberOfFeatures = dataSet.shape[1] - 1   # the last column holds the labels
    numberOfRows = dataSet.shape[0]
    numberOfPieces = 100
    bestClassifier = {}
    bestError = float("inf")
    bestLabels = None
    #labels as a column vector, so they can be multiplied with the (N, 1) predictions
    expectedLabels = dataSet[:, numberOfFeatures].reshape(numberOfRows, 1)
    #enumerate all possible features
    for featureIndex in range(numberOfFeatures):
        minValue = dataSet[:, featureIndex].min()
        maxValue = dataSet[:, featureIndex].max()

        #cut the range of the feature into numberOfPieces pieces
        stepSize = (maxValue - minValue) / numberOfPieces
        #enumerate all cutting points; i = -1 puts the threshold one step below the minimum
        for i in range(-1, numberOfPieces + 1):
            #enumerate both operators
            for operator in ['lt', 'gt']:
                #build the decision stump with the current setting
                classifier = {}
                classifier['threshold'] = minValue + i * stepSize
                classifier['featureIndex'] = featureIndex
                classifier['operator'] = operator
                #calculate the error on the weighted training set
                labels = classify(dataSet, classifier)
                #element-wise product: misclassified points give -1, correct ones +1
                error = zeros((numberOfRows, 1))
                error[(expectedLabels * labels)[:, 0] == -1] = 1
                #weighted error = sum of the weights of the misclassified points
                weightError = (weights * error).sum()
                #keep the decision stump with the lowest weighted error
                if weightError < bestError:
                    bestError = weightError
                    bestClassifier = classifier
                    #the classification result of the best classifier on the training set
                    bestLabels = labels
    return bestClassifier, bestError, bestLabels

#try the stump trainer on its own
dataSet = df.values
numberOfRows = dataSet.shape[0]
#assume that all the training points have equal weights
weights = ones((numberOfRows,1))
weights = weights/numberOfRows
classifier,error,labels=trainStump(dataSet,weights)
classifier
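The inner loop of **trainStump** calls **classify** once per threshold and operator. Below is a minimal sketch of the same grid search that scores all thresholds of a feature at once with NumPy broadcasting. **train_stump_vectorized** is a hypothetical name; it takes labels and weights as flat arrays, and it may break ties between equally good stumps in a different order than **trainStump**.

```python
import numpy as np

def train_stump_vectorized(X, y, w, number_of_pieces=100):
    """Grid search over (feature, threshold, operator), scoring all thresholds at once.

    X: (N, d) features, y: (N,) labels in {-1, +1}, w: (N,) weights summing to 1.
    Returns (best_stump_dict, best_weighted_error).
    """
    best, best_error = None, np.inf
    for j in range(X.shape[1]):
        values = X[:, j]
        step = (values.max() - values.min()) / number_of_pieces
        thresholds = values.min() + np.arange(-1, number_of_pieces + 1) * step
        # pred_lt[t, i] = +1 if values[i] < thresholds[t] else -1   (shape T x N)
        pred_lt = np.where(values[None, :] < thresholds[:, None], 1, -1)
        # 'gt' labels are exactly the negation of 'lt' labels
        for operator, pred in (('lt', pred_lt), ('gt', -pred_lt)):
            # weighted error of every threshold: sum of w over misclassified points
            errors = ((pred != y[None, :]) * w[None, :]).sum(axis=1)
            t = int(np.argmin(errors))
            if errors[t] < best_error:
                best_error = errors[t]
                best = {'threshold': thresholds[t], 'featureIndex': j, 'operator': operator}
    return best, best_error

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
w = np.full(4, 0.25)
best, err = train_stump_vectorized(X, y, w)
print(best, err)
```

On these four points any **'lt'** stump with a threshold between 2 and 3 has zero weighted error, so one of them is returned. Evaluating all thresholds at once trades memory (a T-by-N array per feature) for fewer Python-level loop iterations.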

## The complete Adaboost
The main loop of AdaBoost is implemented in the function **adaBoost**. Initially, all training points have the same weight. In each of the **numberOfIterations** iterations, we first train a new decision stump on the weighted training set, then compute the **alpha** of this stump and the new weights of the data points for the next iteration. The loop stops early once the combined classifier misclassifies no training point.

The function **applyFullClassifier** applies the complete boosted classifier and returns its predictions together with a 0/1 error indicator for each training point. The prediction of each base classifier is multiplied by that classifier's **alpha** and the results are summed; the sign of the sum is the label assigned by the complete classifier, with a sum of exactly 0 mapped to +1.

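The function **applyFullClassifierWithoutError** applies the complete boosted classifier without computing the error. Because it never reads the label column, it can also be applied to unlabeled points, such as the grid used for plotting below.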

In [None]:
#apply the complete boosted classifier and return the predictions and the error on the training set
def applyFullClassifier(dataSet,classifiers):
    numberOfRows=dataSet.shape[0]
    numberOfFeatures=dataSet.shape[1]-1
    prediction=zeros((numberOfRows,1))
    for c in classifiers.keys():
        item = classifiers[c]
        classifier = item['classifier']
        alpha = item['alpha']
        prediction = prediction + alpha*classify(dataSet, classifier)
    prediction[prediction[:,0]<0]=-1
    prediction[prediction[:,0]>=0]=1
    error = zeros((numberOfRows,1))
    temp = dataSet[:,numberOfFeatures].reshape(numberOfRows,1)*prediction
    error[temp[:,0] == -1] = 1
    return prediction, error

#body of adaboost
def adaBoost(dataSet, numberOfIterations):
    classifiers = {}
    numberOfRows = dataSet.shape[0]
    numberOfFeatures=dataSet.shape[1] - 1
    #weights of training data points, initially all data points have the same weight.
    weights = ones((numberOfRows,1))
    weights = weights/numberOfRows
    labelsOfFullClassifier = zeros((numberOfRows,1))
    expectedLabels=dataSet[:,numberOfFeatures].reshape(numberOfRows,1)
    for i in range(numberOfIterations):
        classifiers[i] = {}
        #train a new stump on the weighted data. Here error is P_m in the formula
        bestClassifier , error, labels = trainStump(dataSet,weights)
        classifiers[i]['classifier'] = bestClassifier
        #compute alpha from the weighted error of the new base classifier;
        #clamp the error away from 0 so that a perfect stump does not cause a division by zero
        error = max(error, 1e-16)
        alpha = 0.5*log((1-error)/error)
        classifiers[i]['alpha'] = alpha
        #update the weights of the training points: w_i * exp(-y_i * alpha * phi(x_i))
        weights = weights * exp(-alpha * expectedLabels * labels)
        #normalize so that the weights sum to 1 (z is z_m in the formula)
        z = weights.sum()
        weights = weights / z
        #apply the complete boosted classifier on the training set
        predictions, errors = applyFullClassifier(dataSet,classifiers)
        numberOfMisclassifiedObjects = errors.sum()
        print(str(int(numberOfMisclassifiedObjects))+" objects wrongly classified")
        #if no training points are misclassified, stop early
        if numberOfMisclassifiedObjects == 0:
            break
    
    return classifiers

#apply the complete boosted classifier without returning error.
def applyFullClassifierWithoutError(dataSet,classifiers):
    numberOfRows=dataSet.shape[0]
    numberOfFeatures=dataSet.shape[1]-1
    prediction=zeros((numberOfRows,1))
    for c in classifiers.keys():
        item = classifiers[c]
        classifier = item['classifier']
        alpha = item['alpha']
        prediction = prediction + alpha*classify(dataSet, classifier)
    prediction[prediction[:,0]<0]=-1
    prediction[prediction[:,0]>=0]=1
    return prediction

boostedClassifier = adaBoost(dataSet,10)
boostedClassifier
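The weighted vote used by **applyFullClassifier** and **applyFullClassifierWithoutError** can be written as a single matrix-vector product. In this sketch **weighted_vote** is a hypothetical name, and the outputs of three base classifiers on three points are made-up numbers chosen for illustration.

```python
import numpy as np

def weighted_vote(alphas, stump_outputs):
    """Combine base-classifier outputs into +1/-1 labels.

    alphas: (K,) classifier weights; stump_outputs: (K, N) array of +1/-1 predictions.
    A score of exactly 0 is mapped to +1, matching the notebook's rule.
    """
    F = alphas @ stump_outputs          # F(x_i) = sum_k alpha_k * phi_k(x_i)
    return np.where(F >= 0, 1, -1)

alphas = np.array([0.9, 0.5, 0.3])
outputs = np.array([[ 1, -1,  1],
                    [-1, -1,  1],
                    [-1,  1, -1]])
print(weighted_vote(alphas, outputs))
```

A plain majority vote would put the first point in the -1 class, but the first classifier's weight 0.9 exceeds 0.5 + 0.3, so the weighted sum is positive and the point is labeled +1.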

## Plot the decision boundary


In [None]:
import numpy as np
h=0.1
x_min, x_max = dataSet[:, 0].min() - 1, dataSet[:, 0].max() + 1
y_min, y_max = dataSet[:, 1].min() - 1, dataSet[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# evaluate the boosted classifier on every grid point
Z = applyFullClassifierWithoutError(np.c_[xx.ravel(), yy.ravel()],boostedClassifier)

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')

# Plot also the training points, colored by their true label
plt.scatter(dataSet[:, 0], dataSet[:, 1], c=[cmap.get(c, 'black') for c in df.label])
plt.show()