# Boosting

## AdaBoost Algorithm

* Initialize the weight distribution over the training data

$$D_1 = (w_{11}, \cdots, w_{1i}, \cdots, w_{1N}), \ w_{1i} = \frac{1}{N}, \ i=1, 2, \cdots, N$$

* For $m = 1, 2, \cdots, M$:
    
    * Train on the data set weighted by $D_m$ to obtain the base classifier $G_m(x)$
    
    * Compute the weighted classification error rate of $G_m(x)$ on the training data
    
    $$e_m = P(G_m(x_i) \neq y_i) = \sum_{i=1}^N w_{mi} I(G_m(x_i) \neq y_i)$$
    
    * Compute the coefficient of $G_m(x)$
    
    $$\alpha_m = \frac{1}{2} \log \frac{1-e_m}{e_m}$$
    
    * Update the weight distribution over the training data
    
    $$D_{m+1} = (w_{m+1, 1}, \cdots, w_{m+1, i}, \cdots, w_{m+1, N})$$
    
    $$w_{m+1, i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))$$
    
    $$Z_m = \sum_{i=1}^N w_{mi} \exp(-\alpha_m y_i G_m(x_i))$$

* Obtain the final classifier


$$G(x) = \operatorname{sign}\left(\sum_{m=1}^M \alpha_m G_m(x)\right)$$
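
The reweighting step inside the loop can be checked numerically. The sketch below is only an illustration: the function name `adaboost_round` and the toy labels are assumptions, and labels and predictions are taken to be in $\{-1, +1\}$ as the update formula requires.

```python
import numpy as np

def adaboost_round(w, y, pred):
    """One AdaBoost reweighting step.

    w: current weights D_m (summing to 1)
    y, pred: true labels and G_m(x_i), both in {-1, +1}
    Returns (e_m, alpha_m, D_{m+1}).
    """
    e = np.sum(w * (pred != y))               # weighted error rate e_m
    alpha = 0.5 * np.log((1 - e) / e)         # coefficient alpha_m, needs 0 < e_m < 1
    unnorm = w * np.exp(-alpha * y * pred)    # w_mi * exp(-alpha_m * y_i * G_m(x_i))
    return e, alpha, unnorm / unnorm.sum()    # dividing by Z_m normalizes the weights

# Hypothetical toy data: 10 samples, the weak classifier misclassifies 3 of them.
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])
w = np.full(10, 0.1)                          # D_1: uniform weights 1/N

e, alpha, w_next = adaboost_round(w, y, pred)
print(round(e, 4), round(alpha, 4))
print(np.round(w_next, 4))
```

With these inputs $e_1 = 0.3$ and $\alpha_1 \approx 0.4236$. After the update the three misclassified samples together carry half of the total weight, which is why the next base classifier is pushed to focus on them.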

## Boosting Tree

### Gradient Boosting Algorithm

* Initialize

$$f_0(x) = \arg\min_c \sum_{i=1}^N L(y_i, c)$$

* For $m = 1, 2, \cdots, M$:

    * For $i = 1, 2, \cdots, N$, compute the negative gradient
    
    $$r_{mi} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x)=f_{m-1}(x)}$$
    
    * Fit a regression tree to the $r_{mi}$ to obtain the leaf regions $R_{mj}, j = 1, 2, \cdots, J$ of the $m$-th tree
    
    * For $j = 1, 2, \cdots, J$, compute
    
    $$c_{mj} = \arg\min_c \sum_{x_i \in R_{mj}} L(y_i, f_{m-1}(x_i) + c)$$
    
    * Update $f_m(x) = f_{m-1}(x) + \sum_{j=1}^J c_{mj} I(x \in R_{mj})$
    
* Obtain the final regression tree model

$$\hat{f}(x) = f_M(x) = f_0(x) + \sum_{m=1}^M \sum_{j=1}^J c_{mj} I(x \in R_{mj})$$
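
For a concrete case, take the squared loss $L(y, f) = \frac{1}{2}(y - f)^2$. Then $f_0$ is the mean of the targets, $r_{mi} = y_i - f_{m-1}(x_i)$ is the ordinary residual, and $c_{mj}$ is the mean residual in leaf $R_{mj}$. The following is a minimal sketch under those assumptions, using two-leaf regression stumps ($J = 2$) on a single feature. The helper names (`fit_stump`, `stump_predict`, `gradient_boost`, `predict`) and the synthetic data are illustrative, not part of the original notes.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump on a 1-D feature.

    Returns (split, left value, right value), i.e. two regions and their c_mj.
    """
    best = (np.inf, None, None, None)
    xs = np.unique(x)
    for s in (xs[:-1] + xs[1:]) / 2:          # candidate splits between neighbouring values
        left, right = r[x <= s], r[x > s]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]

def stump_predict(stump, x):
    s, c_left, c_right = stump
    return np.where(x <= s, c_left, c_right)

def gradient_boost(x, y, M=50):
    f0 = y.mean()                             # f_0 = argmin_c sum_i (y_i - c)^2 / 2
    f = np.full(len(y), f0)
    stumps = []
    for _ in range(M):
        r = y - f                             # r_mi: negative gradient of the squared loss
        stump = fit_stump(x, r)               # regions R_mj; leaf means equal c_mj for this loss
        f = f + stump_predict(stump, x)       # f_m = f_{m-1} + sum_j c_mj I(x in R_mj)
        stumps.append(stump)
    return f0, stumps

def predict(model, x):
    f0, stumps = model
    return f0 + sum(stump_predict(s, x) for s in stumps)

# Synthetic data (an assumption for illustration): noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

model = gradient_boost(x, y)
mse_const = np.mean((y - y.mean()) ** 2)          # error of f_0 alone
mse_boost = np.mean((y - predict(model, x)) ** 2) # error of f_M
print(mse_const, mse_boost)
```

The sketch applies no shrinkage because the algorithm above has none. For other losses the leaf values $c_{mj}$ generally differ from the leaf means and need their own one-dimensional minimization.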

In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

In [2]:
X, y = load_iris(return_X_y=True)
X, y = X[:100], y[:100]
print(X.shape, y.shape)

(100, 4) (100,)


In [3]:
class AdaBoost():
    def __init__(self, n_estimators=50, learning_rate=1.0):
        self.n_estimators = n_estimators
        # Despite its name, learning_rate is used as the step size of the
        # threshold search in _G, not as a shrinkage factor on alpha.
        self.learning_rate = learning_rate
        self.alpha = np.zeros((n_estimators, ))      # signed coefficient: direction * alpha_m
        self.index = np.zeros((n_estimators, ), dtype=int)   # feature used by each stump
        self.threshold = np.zeros((n_estimators, ))
    
    def fit(self, X, y):
        '''
        Input:
        - X: of shape (N, D)
        - y: of shape (N, ), labels in {0, 1}
        '''
        N, self.D = X.shape
        t = np.where(y > 0, 1, -1)                   # map labels to {-1, +1}
        w = np.ones((N, )) / N
        
        for m in range(self.n_estimators):
            error, self.index[m], self.threshold[m], d = self._G(w, X, t)
            error = max(error, 1e-10)                # avoid division by zero for a perfect stump
            self.alpha[m] = d * 0.5 * np.log((1 - error) / error)
            if error <= 1e-10:                       # perfect stump: remaining alphas stay 0
                break
            w = self._W(m, w, X, t)
        return self
    
    def _G(self, w, X, t):
        '''Search all features and thresholds for the stump with the lowest weighted error.'''
        best_error = np.inf
        for i in range(self.D):
            x = X[:, i]
            lower, upper = np.min(x), np.max(x)
            j = lower
            while j < upper + self.learning_rate:
                h = np.where(x > j, 1, -1)
                error_pos = np.sum(w * (h != t))     # stump predicts +1 when x > j
                error_neg = np.sum(w * (-h != t))    # stump predicts +1 when x <= j
                
                if error_pos <= error_neg:
                    d, error = 1, error_pos
                else:
                    d, error = -1, error_neg
                
                if error < best_error:
                    best_error, index, threshold, direction = error, i, j, d
                
                j += self.learning_rate
                
        return best_error, index, threshold, direction
    
    def _W(self, m, w, X, t):
        '''Weight update: w_{m+1,i} = w_{mi} * exp(-alpha_m * y_i * G_m(x_i)) / Z_m.'''
        h = np.where(X[:, self.index[m]] > self.threshold[m], 1, -1)
        z = w * np.exp(-self.alpha[m] * t * h)
        return z / np.sum(z)
    
    def predict(self, X):
        h = np.where(X[:, self.index] > self.threshold, 1, -1)   # shape (N, n_estimators)
        return (np.sum(h * self.alpha, axis=1) >= 0).astype(int)
    
    def score(self, X, y):
        y_hat = self.predict(X)
        return np.mean(y_hat == y)

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
adaboost = AdaBoostClassifier()
model = AdaBoost()
adaboost.fit(X_train, y_train)
model.fit(X_train, y_train)

print(model.score(X_test, y_test), adaboost.score(X_test, y_test))

1.0 1.0


