## GBM - L1

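GBM with L2 loss is sensitive to outliers, because squaring the errors in MSE gives outliers a disproportionately large weight. If we cannot remove the outlier noise, we can optimize the L1 loss, i.e. MAE, instead.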
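
To make the weighting concrete, here is a minimal sketch that measures how much of each loss comes from the single largest residual. The residuals are the first-round values printed by the run at the end of this section; the variable names are only illustrative.

```python
import numpy as np

# First-round residuals, copied from the printed output of the run in this section.
residuals = np.array([-120.0, -80.0, 0.0, 170.0, 720.0])

squared = residuals ** 2
absolute = np.abs(residuals)

# Fraction of the total loss contributed by the largest residual (the outlier).
outlier_share_l2 = squared.max() / squared.sum()
outlier_share_l1 = absolute.max() / absolute.sum()

print(f"share of squared loss:  {outlier_share_l2:.3f}")
print(f"share of absolute loss: {outlier_share_l1:.3f}")
```

On these five points the outlier accounts for roughly 91% of the squared loss but only about 66% of the absolute loss.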

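Optimizing MSE trains each weak learner on the residual vector, so the magnitude of every error, not only its direction, drives the fit. With MAE, only the direction of each error is used.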

[Heading in the right direction](http://explained.ai/gradient-boosting/L1-loss.html)
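
A short derivation, under the usual view of boosting as gradient descent in function space, shows where the two vectors come from. The factor $\tfrac{1}{2}$ in the L2 loss is a convenience assumed here so that the constant cancels; it does not come from the text above.

$$
L_2:\quad L = \tfrac{1}{2}\,\bigl(y_i - F_{m-1}(x_i)\bigr)^2 \;\Rightarrow\; -\frac{\partial L}{\partial F_{m-1}(x_i)} = y_i - F_{m-1}(x_i)
$$

$$
L_1:\quad L = \bigl|y_i - F_{m-1}(x_i)\bigr| \;\Rightarrow\; -\frac{\partial L}{\partial F_{m-1}(x_i)} = \operatorname{sign}\bigl(y_i - F_{m-1}(x_i)\bigr) \quad \text{for } y_i \neq F_{m-1}(x_i)
$$

The L2 direction is the residual itself, so it grows with the error, while the L1 direction has the same size for every observation.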

### Using the sign vector

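We can use the sign vector $\operatorname{sign}(y_i - F_{m-1}(x_i))$ as the direction. Every element takes a value in $\{-1, 0, 1\}$, so an outlier contributes a step of the same size as any other observation, however large its residual is.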
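
As a small illustration, the sketch below takes the first-round residuals printed by the run in this section and shows that inflating the outlier leaves the sign vector unchanged. The names `direction` and `inflated` are only illustrative.

```python
import numpy as np

# First-round residuals, copied from the printed output of the run in this section.
residuals = np.array([-120.0, -80.0, 0.0, 170.0, 720.0])
direction = np.sign(residuals)

# Make the outlier ten times larger: the residual vector changes,
# but the sign vector stays exactly the same.
inflated = residuals.copy()
inflated[-1] *= 10
same_direction = np.array_equal(np.sign(inflated), direction)

print(direction, same_direction)
```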

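With L1 loss, each leaf of the decision stump predicts the median of the residuals that fall into it, rather than the mean used with L2. The initial model $F_0$ is likewise the median of $y$ instead of the mean, because the median is the constant that minimizes the L1 error.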
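
The claim about the median can be checked numerically. The sketch below assumes the targets are $F_0$ plus the first-round residuals printed by the run in this section, and searches a grid of constants (the grid bounds and step are arbitrary choices) for the one with the smallest L1 and L2 error.

```python
import numpy as np

# Targets reconstructed from the printed output: y = F0 + first-round residuals.
y = 1280.0 + np.array([-120.0, -80.0, 0.0, 170.0, 720.0])

def l1_error(c):
    """Sum of absolute errors when predicting the constant c."""
    return np.abs(y - c).sum()

def l2_error(c):
    """Sum of squared errors when predicting the constant c."""
    return ((y - c) ** 2).sum()

# Brute-force search over integer constants (assumed grid, for illustration).
grid = np.arange(1100.0, 2001.0, 1.0)
best_l1 = grid[np.argmin([l1_error(c) for c in grid])]
best_l2 = grid[np.argmin([l2_error(c) for c in grid])]

print(best_l1, np.median(y))  # L1 minimizer next to the median
print(best_l2, np.mean(y))    # L2 minimizer next to the mean
```

The L1 minimizer coincides with the median and the L2 minimizer with the mean, which the outlier pulls upwards.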

In [83]:
import numpy as np
import pandas as pd

In [96]:
def data():
    return pd.read_csv('rent-l1.txt', delimiter = '\t')

def mae(vals):
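    # Mean absolute deviation from the median. Dividing by the element count
    # means that when two splits tie on total absolute error, the one whose
    # side holds more elements gets the lower cost.
    return np.sum(np.abs(vals - np.median(vals))) / len(vals)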

class DecisionTree(object):
    def __init__(self):
        return
    
    def model(self, X, y, costErr = mae):
        m, n = X.shape
        min_mae = np.inf
        feature, split_value, lhs, rhs = 0, 0, 0, 0
        
        for i in range(n):
            xVals = X[:, i]
            uniVals = list(set(xVals))
            uniVals.sort()
            # Candidate thresholds: midpoints between consecutive unique values.
            candidates = [(uniVals[j] + uniVals[j + 1]) / 2 for j in range(len(uniVals) - 1)]
            
            for c in candidates:
                # Score the split on the sign of the residuals, so that an outlier's magnitude cannot dominate.
                err = costErr(np.sign(y[xVals <= c])) + costErr(np.sign(y[xVals > c]))
                if err < min_mae:
                    min_mae, feature, split_value = err, i, c
                    lhs = np.median(y[xVals <= c])
                    rhs = np.median(y[xVals > c])
                    
        self.lhs, self.rhs = lhs, rhs
        self.split_feature, self.split_value = feature, split_value
        return self
    
    def predict(self, X):
        m = X.shape[0]
        y = np.ones(m) * self.lhs
        y[X[:, self.split_feature] > self.split_value] = self.rhs
        
        return y
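
To see how the stump chooses its first split, the scoring loop can be replayed by hand. This is only a sketch: the feature values in `x` are hypothetical (the contents of `rent-l1.txt` are not shown here) and were picked so that the midpoints match the split values printed later, while the residuals are the printed first-round values. `mean_abs_dev` is a local copy of the `mae` cost.

```python
import numpy as np

def mean_abs_dev(vals):
    """Mean absolute deviation from the median (same cost as `mae`)."""
    return np.sum(np.abs(vals - np.median(vals))) / len(vals)

# Hypothetical feature values; only the residuals come from the printed output.
x = np.array([750.0, 800.0, 850.0, 900.0, 950.0])
residuals = np.array([-120.0, -80.0, 0.0, 170.0, 720.0])
signs = np.sign(residuals)

uniq = np.unique(x)                      # sorted unique feature values
candidates = (uniq[:-1] + uniq[1:]) / 2  # midpoints between neighbours

# Cost of each candidate split, computed on the sign vector.
costs = np.array([
    mean_abs_dev(signs[x <= c]) + mean_abs_dev(signs[x > c])
    for c in candidates
])

best = candidates[np.argmin(costs)]      # argmin keeps the first minimum on ties
left_value = np.median(residuals[x <= best])   # leaf values use the residuals,
right_value = np.median(residuals[x > best])   # not their signs

print(best, left_value, right_value)
```

Under these assumptions the chosen threshold and the two leaf medians agree with Tree 0 in the printed output. The split is scored on the signs, while the leaf values are medians of the raw residuals.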

In [97]:
class BoostL1(object):
    def __init__(self):
        self.trees = []
        self.f0 = 0
        return
    
    def model(self, X, y, alpha = 1, iters = 50):
        self.iters = iters
        self.alpha = alpha
        m = X.shape[0]
        self.f0 = np.median(y)
        yHat = np.ones(m) * self.f0
        
        for i in range(iters):
            residuals = y - yHat
            print('residuals:', residuals)
            tree = DecisionTree().model(X, residuals)
            yHat = yHat + alpha * tree.predict(X)
            self.trees.append(tree)
        return self
    
    def predict(self, X):
        m = X.shape[0]
        y = np.ones(m) * self.f0
        for i in range(self.iters):
            y = y + self.alpha * self.trees[i].predict(X)
        return y
    
    def printTrees(self):
        print('F0:', self.f0)
        for i in range(self.iters):
            tree = self.trees[i]
            print('Tree ', i)
            print('\t split feature:', tree.split_feature)
            print('\t split value:', tree.split_value)
            print('\t left median', tree.lhs)
            print('\t right median', tree.rhs)


In [98]:
df = data()
X = df.values[:, :-1]
y = df.values[:, -1]
bstL1 = BoostL1().model(X, y, 1, 3)
bstL1.printTrees()
bstL1.predict(X)

residuals: [-120.  -80.    0.  170.  720.]
residuals: [ -20.   20. -170.    0.  550.]
residuals: [   0.   10. -180.  -10.  540.]
F0: 1280.0
Tree  0
	 split feature: 0
	 split value: 825.0
	 left median -100.0
	 right median 170.0
Tree  1
	 split feature: 0
	 split value: 775.0
	 left median -20.0
	 right median 10.0
Tree  2
	 split feature: 0
	 split value: 925.0
	 left median -5.0
	 right median 540.0


array([1155., 1185., 1455., 1455., 2000.])
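
As a quick sanity check on the run above, the MAE before and after boosting can be computed from the printed numbers alone. The sketch assumes the targets equal `F0` plus the first-round residuals, which follows from how `BoostL1.model` initialises `yHat`; the variable names are illustrative.

```python
import numpy as np

# Targets reconstructed from the printed output: y = F0 + first-round residuals.
f0 = 1280.0
y = f0 + np.array([-120.0, -80.0, 0.0, 170.0, 720.0])

# Predictions after three stumps, copied from the printed array.
y_hat = np.array([1155.0, 1185.0, 1455.0, 1455.0, 2000.0])

mae_initial = np.mean(np.abs(y - f0))    # error of the constant model F0
mae_final = np.mean(np.abs(y - y_hat))   # error after three boosting rounds

print(mae_initial, mae_final)
```

The last stump puts the largest observation alone in its right leaf, so that point is fitted exactly, and the remaining error comes from the other four points.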