In [1]:
import numpy as np
import sys
if "../" not in sys.path:
  sys.path.append("../") 
from lib.utils import read_data_from_file, sign
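
The two helpers imported from `lib.utils` are not shown in this notebook. The following is only a hypothetical sketch of what they might look like, assuming that `sign` maps non-positive values to -1 and that the data files are whitespace-separated numeric tables with the label in the last column. The actual helpers may differ, for example in the dtype they return.

```python
import numpy as np

def sign(x):
    # Assumed convention: values > 0 map to +1, everything else (including 0) to -1.
    return np.where(np.asarray(x) > 0, 1, -1)

def read_data_from_file(path):
    # Assumed format: whitespace-separated numeric columns, label in the last column.
    return np.loadtxt(path)

print(sign(np.array([-2.0, 0.0, 3.0])))
```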

# Questions 16-18

16. For Questions 16-20, you will play with the decision stump algorithm.<br/><br/>In class, we taught the learning model of "positive and negative rays" (which is simply a one-dimensional perceptron) for one-dimensional data. The model contains hypotheses of the form:$$h_{s,\theta}(x)=s\cdot sign(x-\theta).$$The model is frequently named the "decision stump" model and is one of the simplest learning models. As shown in class, for one-dimensional data, the VC dimension of the decision stump model is 2.<br/><br/>In fact, the decision stump model is one of the few models for which we can minimize $E_{in}$ efficiently by enumerating all possible thresholds. In particular, for $N$ examples, there are at most $2N$ dichotomies (see page 2 of lecture 5 slides), and thus at most $2N$ different $E_{in}$ values. We can then easily choose the dichotomy that leads to the lowest $E_{in}$, where ties can be broken by randomly choosing among the lowest-$E_{in}$ ones. The chosen dichotomy stands for a combination of some "spot" (range of $\theta$) and $s$, and commonly the median of the range is chosen as the $\theta$ that realizes the dichotomy.<br/><br/>In this problem, you are asked to implement such an algorithm and run your program on an artificial data set. First, generate a one-dimensional data set by the procedure below:
 - Generate x by a uniform distribution in $[-1,1]$
 - Generate y by $f(x)=\bar{s}(x)$ + noise where $\bar{s}(x)=sign(x)$ and the noise flips the result with $20\%$ probability.
 
 For any decision stump $h_{s, \theta}$ with $\theta\in[-1,1]$, express $E_{out}(h_{s, \theta})$ as a function of $\theta$ and $s$.
 1. $0.3+0.5s(|\theta|-1)$
 2. $0.3+0.5s(1-|\theta|)$
 3. $0.5+0.3s(|\theta|-1)$
 4. $0.5+0.3s(1-|\theta|)$
 5. none of the other choices

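sol: Let $\lambda=\frac{4}{5}$ be the probability that a label is not flipped by the noise, and let $\mu$ be the probability that $h_{s,\theta}(x)\neq sign(x)$:
$$\mu=\frac{1+s}{2}\cdot\frac{|\theta|}{2}+\frac{1-s}{2}\cdot\frac{2-|\theta|}{2}=\frac{s|\theta|-s+1}{2}.$$
Then
$$\begin{aligned}
E_{out}(h_{s, \theta})&=\lambda\mu+(1-\lambda)(1-\mu) \\
&=0.6\mu+0.2 \\
&=0.5+0.3s(|\theta|-1),
\end{aligned}$$
which is choice 3.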
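
As a sanity check on the closed form, one can compare it with a Monte Carlo estimate. The sketch below is only illustrative: the function names `e_out_formula` and `e_out_monte_carlo`, the sample size, and the seed are assumptions made for this example and are not part of the assignment.

```python
import numpy as np

def e_out_formula(s, theta):
    # Closed form from the derivation; valid for theta in [-1, 1].
    return 0.5 + 0.3 * s * (abs(theta) - 1)

def e_out_monte_carlo(s, theta, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, n)
    # Noiseless target sign(x); x == 0 occurs with probability zero.
    clean = np.sign(x)
    # Flip each label independently with probability 0.2.
    flip = rng.random(n) < 0.2
    y = np.where(flip, -clean, clean)
    # Stump prediction s * sign(x - theta).
    pred = s * np.where(x > theta, 1.0, -1.0)
    return np.mean(pred != y)

for s, theta in [(1, 0.0), (1, 0.5), (-1, -0.3)]:
    print(s, theta, e_out_formula(s, theta), e_out_monte_carlo(s, theta))
```

The two columns should agree up to sampling noise, which shrinks as `n` grows.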

In [2]:
np.random.seed(10)
X = np.random.uniform(-1, 1, 20)
y = sign(X) * sign(np.random.uniform(-0.2, .8, 20))

print(np.vstack((X,y)).T)

[[0.542641286533492 1]
 [-0.958496101281197 1]
 [0.26729646985255084 1]
 [0.4976077650772237 1]
 [-0.0029859753948191514 -1]
 [-0.5504067089383047 -1]
 [-0.603874270480752 -1]
 [0.5210614243979175 1]
 [-0.6617783268749291 -1]
 [-0.8233203716519795 -1]
 [0.3707196367355945 1]
 [0.9067866923898731 1]
 [-0.9921034673441711 -1]
 [0.024384526771553228 1]
 [0.625241923304227 -1]
 [0.2250521336587763 1]
 [0.4435106348635991 -1]
 [-0.4162478636587337 -1]
 [0.8355482450258869 -1]
 [0.42915156679538113 1]]


In [3]:
def decision_stump_1d(X, y):
    """
    One-Dimensional Decision Stump Algorithm
    Args:
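        X: data, a 1-D array of feature values
        y: labels (+1 or -1)
    Returns:
        s: sign of the stump (+1 or -1)
        theta: threshold
        err_in: in-sample error of the chosen stump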
    """
    h = lambda s, theta: s * sign(X - theta)
    # sort data
    indices = np.argsort(X)
    X = X[indices]
    y = y[indices]
    
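    # candidate thresholds: midpoints between neighbouring sorted points,
    # plus X[0] - 0.5 below the smallest point and X[-1] + 0.5 above the largest
    thetas = (np.concatenate((X, X[-1:] + 1)) + np.concatenate((X[:1]-1, X)))/2
    # E_in for every (s, theta) pair; h reads the sorted X at call time
    all_err_in = [(h(s,theta)!=y).mean() for s in [-1, 1] for theta in thetas]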
    
    # find best s, theta (np.argmin returns the first minimum,
    # so ties are not broken randomly here)
    index = np.argmin(all_err_in)
    s = [-1, 1][index//len(thetas)]
    theta = thetas[index%len(thetas)]
    err_in = all_err_in[index]
    
    return s, theta, err_in
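
The problem statement asks for ties to be broken at random among the lowest-$E_{in}$ stumps. A minimal sketch of one way to do that is shown below. The function name `decision_stump_1d_random_ties` and the `rng` parameter are hypothetical, and the sketch enumerates $N$ thresholds (one below all points plus the $N-1$ midpoints), which with two signs gives the $2N$ dichotomies mentioned in the problem.

```python
import numpy as np

def decision_stump_1d_random_ties(x, y, rng=None):
    """Hypothetical variant: enumerate stumps, break E_in ties uniformly at random."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs = np.sort(x)
    # Candidate thresholds: one below all points, then midpoints between neighbours.
    thetas = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2))
    candidates = [(s, t) for s in (-1, 1) for t in thetas]
    # In-sample error of s * sign(x - t) for every candidate.
    errs = np.array([np.mean(s * np.where(x > t, 1.0, -1.0) != y)
                     for s, t in candidates])
    # Indices of all candidates that reach the minimum error.
    ties = np.flatnonzero(errs == errs.min())
    s, theta = candidates[rng.choice(ties)]
    return s, theta, errs.min()

x_demo = np.array([-0.5, -0.1, 0.2, 0.7])
y_demo = np.array([-1, -1, 1, 1])
print(decision_stump_1d_random_ties(x_demo, y_demo, np.random.default_rng(0)))
```

Passing a seeded generator keeps the random choice reproducible across runs.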

In [4]:
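# closed-form E_out from Question 16, derived for theta in [-1, 1];
# thresholds outside that range are not clipped here
e_out = lambda s, theta: .5 + .3 * s * (np.abs(theta) - 1)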

17. Generate a data set of size 20 by the procedure above and run the one-dimensional decision stump algorithm on the data set. Record $E_{in}$ and compute $E_{out}$ with the formula above. Repeat the experiment (including data generation, running the decision stump algorithm, and computing $E_{in}$ and $E_{out}$) $5,000$ times. What is the average $E_{in}$?

18. Continuing from the previous question, what is the average $E_{out}$?

In [5]:
np.random.seed(0)
e_ins = []
e_outs = []
for i in range(5000):
    # Generate data
    X = np.random.uniform(-1, 1, 20)
    y = sign(X) * sign(np.random.uniform(-0.2, .8, 20))
    
    # run the one-dimensional decision stump algorithm
    s, theta, err_in = decision_stump_1d(X, y)
    
    # compute E_out from the closed-form formula
    err_out = e_out(s, theta)
    
    e_ins.append(err_in)
    e_outs.append(err_out)
print('average Ein: {}\taverage Eout: {}'.format(np.mean(e_ins), np.mean(e_outs)))

average Ein: 0.1702	average Eout: 0.25734444036057114


# Questions 19-20

19. Decision stumps can also work for multi-dimensional data. In particular, each decision stump now deals with a specific dimension $i$, as shown below.$$h_{s,i,\theta}(x)=s\cdot sign(x_{i}-\theta).$$Implement the following decision stump algorithm for multi-dimensional data:<br/><br/>a) for each dimension $i=1,2,\cdots,d$, find the best decision stump $h_{s,i,\theta}$ using the one-dimensional decision stump algorithm that you have just implemented.<br/><br/>b) return the "best of best" decision stump in terms of $E_{in}$. If there is a tie, please randomly choose among the lowest-$E_{in}$ ones.<br/><br/>The training data $D_{train}$ is available at:<br/><br/>https://www.csie.ntu.edu.tw/~htlin/mooc/datasets/mlfound_math/hw2_train.dat<br/><br/>The testing data $D_{test}$ is available at:<br/><br/>https://www.csie.ntu.edu.tw/~htlin/mooc/datasets/mlfound_math/hw2_test.dat<br/><br/>Run the algorithm on $D_{train}$. Report the $E_{in}$ of the optimal decision stump returned by your program.



In [6]:
def decision_stump_multi_dim(X, y):
    """
    Multi-Dimensional Decision Stump Algorithm
    Args:
        X: data, a 2-D array with one column per dimension
        y: labels
    Returns:
        best: tuple (dim, s, theta, err_in)
    """    
    dim = len(X[0])
    
    best_param = (0,0,0,np.inf)
    for d in range(dim):
        s, theta, err_in = decision_stump_1d(X[:,d], y)
        # strict '<' keeps the earliest dimension on ties (no random tie-breaking)
        if err_in < best_param[3]:
            best_param = (d, s, theta, err_in)
    return best_param

In [7]:
data_train = read_data_from_file('hw2_train.dat')
data_test = read_data_from_file('hw2_test.dat')
print(data_train.shape)
print(data_train[:5])
print(data_test.shape)
print(data_test[:5])

(100, 10)
[[  8.105  -3.5     4.769   4.541  -9.829   5.252   3.838  -3.408  -4.824
   -1.   ]
 [ -6.273  -2.097   9.404   1.143   3.487  -5.206   0.061   5.024  -6.687
    1.   ]
 [  1.624  -1.173   4.26   -3.607  -6.632   4.431  -8.355   7.206  -8.977
    1.   ]
 [-10.      7.758  -2.67   -8.88   -1.099  -9.183  -4.086   8.962   5.841
    1.   ]
 [  8.464   1.762   2.729   2.724   8.155   6.096  -2.844   9.8     3.302
   -1.   ]]
(1000, 10)
[[ 0.531 -1.884 -0.351 -1.796 -9.891  6.12   2.486  8.44  -5.123 -1.   ]
 [ 5.123  5.047  5.404 -1.742 -0.317  9.585 -4.016 -1.8   -5.633  1.   ]
 [ 3.286  4.251 -4.837 -7.065 -7.546 -4.727  9.055  4.941 -6.287  1.   ]
 [-0.795 -1.617 -8.414 -5.391  6.641  1.269 -5.806 -7.375  9.469  1.   ]
 [-4.362  1.49  -7.232  0.802  4.424 -4.777  6.075  3.48  -9.837  1.   ]]


In [8]:
h_best = decision_stump_multi_dim(data_train[:,:-1], data_train[:,-1])
print(h_best)

(3, -1, 1.6175000000000002, 0.25)


20. Use the returned decision stump to predict the label of each example within $D_{test}$. Report an estimate of $E_{out}$ by $E_{test}$.

In [9]:
def cal_eout(X_test, y_test, i, s, theta):
    h = lambda i, s, theta, X: s * sign(X[:,i] - theta)
    return (h(i, s, theta, X_test) != y_test).mean()

In [10]:
print('Etest = ', cal_eout(data_test[:,:-1], data_test[:,-1], h_best[0], h_best[1], h_best[2]))

Etest =  0.355
