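## 11.2 Using Embeddings to Improve Neural Network Performance
	Next we build a neural network and feed it the data in two different formats: in one, the categorical features are only one-hot encoded; in the other, they are passed through Embedding layers. We then compare the performance of the two approaches.
### 11.2.1 A Model Based on One-Hot Encoding
1) One-hot encode the training dataset, then split and sample the data.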

In [5]:
# Convert the features to one-hot encoding
one_hot_as_input = True
if one_hot_as_input:
    print("Using one-hot encoding as input")
    # Note: scikit-learn 1.2+ renames this argument to sparse_output=False
    enc = OneHotEncoder(sparse=False)
    enc.fit(X)
    X = enc.transform(X)


X_train = X[:train_size]
X_val = X[train_size:]
y_train = y[:train_size]
y_val = y[train_size:]


X_train, y_train = sample(X_train, y_train, 200000)  # Simulate data sparsity
print("Number of samples used for training: " + str(y_train.shape[0]))


Using one-hot encoding as input
Number of samples used for training: 200000
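
To make the effect of this step concrete, here is a minimal sketch in plain Python of what one-hot encoding does to integer-coded categorical columns. It is not the scikit-learn implementation; the toy data and the helper name `one_hot_rows` are made up for illustration.

```python
def one_hot_rows(rows):
    """One-hot encode integer-coded categorical rows (illustrative helper)."""
    n_cols = len(rows[0])
    # The sorted distinct values of each column fix the position of each indicator.
    categories = [sorted({row[c] for row in rows}) for c in range(n_cols)]
    encoded = []
    for row in rows:
        out = []
        for c, value in enumerate(row):
            indicators = [0] * len(categories[c])
            indicators[categories[c].index(value)] = 1
            out.extend(indicators)
        encoded.append(out)
    return encoded, categories


# Toy data: column 0 has 3 distinct values, column 1 has 2.
rows = [[0, 1], [2, 0], [1, 1]]
encoded, categories = one_hot_rows(rows)
print(encoded)
print(len(encoded[0]))  # encoded width = 3 + 2
```

The encoded width is the sum of the number of distinct values in each column. For the chapter's dataset that total is the `input_dim=1183` passed to the first `Dense` layer in the next step.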


2) Build the neural network.

In [6]:
class Model(object):

    def evaluate(self, X_val, y_val):
        assert(min(y_val) > 0)
        guessed_sales = self.guess(X_val)
        relative_err = numpy.absolute((y_val - guessed_sales) / y_val)
        result = numpy.sum(relative_err) / len(y_val)
        return result


class NN(Model):

    def __init__(self, X_train, y_train, X_val, y_val):
        super().__init__()
        self.epochs = 10
        self.checkpointer = ModelCheckpoint(filepath="best_model_weights.hdf5", verbose=1, save_best_only=True)
        self.max_log_y = max(numpy.max(numpy.log(y_train)), numpy.max(numpy.log(y_val)))
        self.__build_keras_model()
        self.fit(X_train, y_train, X_val, y_val)

    def __build_keras_model(self):
        self.model = Sequential()
        self.model.add(Dense(1000, kernel_initializer="uniform", input_dim=1183))
        #self.model.add(Dense(1000, kernel_initializer="uniform", input_dim=8))
        self.model.add(Activation('relu'))
        self.model.add(Dense(500, kernel_initializer="uniform"))
        self.model.add(Activation('relu'))
        self.model.add(Dense(1))
        self.model.add(Activation('sigmoid'))

        self.model.compile(loss='mean_absolute_error', optimizer='adam')

    def _val_for_fit(self, val):
        val = numpy.log(val) / self.max_log_y
        return val

    def _val_for_pred(self, val):
        return numpy.exp(val * self.max_log_y)

    def fit(self, X_train, y_train, X_val, y_val):
        self.model.fit(X_train, self._val_for_fit(y_train),
                       validation_data=(X_val, self._val_for_fit(y_val)),
                       epochs=self.epochs, batch_size=128,
                       # callbacks=[self.checkpointer],
                       )
        # self.model.load_weights('best_model_weights.hdf5')
        print("Result on validation data: ", self.evaluate(X_val, y_val))

    def guess(self, features):
        result = self.model.predict(features).flatten()
        return self._val_for_pred(result)
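
The network ends in a sigmoid, so its output lies between 0 and 1, and `_val_for_fit` / `_val_for_pred` map the sales target into that range and back. The following sketch reproduces the two transforms with the standard library only; the sales values are made up, and the functions are standalone copies rather than the class methods themselves.

```python
import math


def val_for_fit(y, max_log_y):
    """Scale a positive target by its log, relative to the largest log value."""
    return math.log(y) / max_log_y


def val_for_pred(scaled, max_log_y):
    """Invert val_for_fit: map a network output back to the original scale."""
    return math.exp(scaled * max_log_y)


sales = [1200.0, 5600.0, 41000.0]  # made-up values, all greater than 1
max_log_y = max(math.log(v) for v in sales)

scaled = [val_for_fit(v, max_log_y) for v in sales]
restored = [val_for_pred(s, max_log_y) for s in scaled]

print(scaled)    # the largest value maps to 1.0
print(restored)  # matches the original values up to floating-point error
```

Note that the scaled target stays inside the sigmoid's range only when every target is greater than 1, so that its logarithm is positive. The `assert(min(y_val) > 0)` in `evaluate` guards the division in the error metric, not this condition.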

3) Train the model.

In [7]:
models = []
print("Fitting NN...")
for i in range(2):
    models.append(NN(X_train, y_train, X_val, y_val))

Fitting NN...
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Result on validation data:  0.1111847571553691
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Result on validation data:  0.10668033416072932


4) Evaluate the model.

In [8]:
print("Evaluate combined models...")
print("Training error...")
r_train = evaluate_models(models, X_train, y_train)
print(r_train)

print("Validation error...")
r_val = evaluate_models(models, X_val, y_val)
print(r_val)

Evaluate combined models...
Training error...
0.033962882939043926
Validation error...
0.10440801234057484
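
`evaluate_models` is defined outside this section. A plausible reading, consistent with `Model.evaluate`, is that it averages the predictions of all models in the list and reports the mean absolute relative error of that average. The sketch below works under that assumption; `evaluate_ensemble` and the numbers are hypothetical.

```python
def mean_relative_error(y_true, y_pred):
    """Mean absolute relative error, the metric computed in Model.evaluate."""
    assert min(y_true) > 0
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_ensemble(predictions_per_model, y_true):
    """Average the per-model predictions, then score the averaged prediction.

    Hypothetical stand-in for evaluate_models, which is defined elsewhere.
    """
    n_models = len(predictions_per_model)
    averaged = [sum(col) / n_models for col in zip(*predictions_per_model)]
    return mean_relative_error(y_true, averaged)


y_true = [100.0, 200.0, 400.0]
model_a = [110.0, 180.0, 400.0]  # made-up predictions
model_b = [90.0, 220.0, 440.0]

print(mean_relative_error(y_true, model_a))
print(mean_relative_error(y_true, model_b))
print(evaluate_ensemble([model_a, model_b], y_true))
```

When the individual models err in different directions, the averaged prediction can score better than either model alone. That would be consistent with the combined validation error above (0.1044) being lower than both single-model results (0.1112 and 0.1067).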


### 11.2.2 A Model Based on Embeddings
1. Load the data for the neural network with Embedding layers

In [9]:
# Reload the training data
with open('feature_train_data.pickle', 'rb') as f:
    (X, y) = pickle.load(f)

num_records = len(X)
train_size = int(train_ratio * num_records)


2. Split and sample the data

In [10]:
# Split the data
X_train = X[:train_size]
X_val = X[train_size:]
y_train = y[:train_size]
y_val = y[train_size:]


X_train, y_train = sample(X_train, y_train, 200000)  # Simulate data sparsity
print("Number of samples used for training: " + str(y_train.shape[0]))

Number of samples used for training: 200000


3. Build the neural network with Embedding layers

In [11]:
class NN_with_EntityEmbedding(Model):

    def __init__(self, X_train, y_train, X_val, y_val):
        super().__init__()
        self.epochs = 10
        self.checkpointer = ModelCheckpoint(filepath="best_model_weights.hdf5", verbose=1, save_best_only=True)
        self.max_log_y = max(numpy.max(numpy.log(y_train)), numpy.max(numpy.log(y_val)))
        self.__build_keras_model()
        self.fit(X_train, y_train, X_val, y_val)

    def preprocessing(self, X):
        X_list = split_features(X)
        return X_list

    def __build_keras_model(self):
        input_store = Input(shape=(1,))
        output_store = Embedding(1115, 10, name='store_embedding')(input_store)
        output_store = Reshape(target_shape=(10,))(output_store)

        input_dow = Input(shape=(1,))
        output_dow = Embedding(7, 6, name='dow_embedding')(input_dow)
        output_dow = Reshape(target_shape=(6,))(output_dow)

        input_promo = Input(shape=(1,))
        output_promo = Dense(1)(input_promo)

        input_year = Input(shape=(1,))
        output_year = Embedding(3, 2, name='year_embedding')(input_year)
        output_year = Reshape(target_shape=(2,))(output_year)

        input_month = Input(shape=(1,))
        output_month = Embedding(12, 6, name='month_embedding')(input_month)
        output_month = Reshape(target_shape=(6,))(output_month)

        input_day = Input(shape=(1,))
        output_day = Embedding(31, 10, name='day_embedding')(input_day)
        output_day = Reshape(target_shape=(10,))(output_day)

        input_germanstate = Input(shape=(1,))
        output_germanstate = Embedding(12, 6, name='state_embedding')(input_germanstate)
        output_germanstate = Reshape(target_shape=(6,))(output_germanstate)

        input_model = [input_store, input_dow, input_promo,
                       input_year, input_month, input_day, input_germanstate]

        output_embeddings = [output_store, output_dow, output_promo,
                             output_year, output_month, output_day, output_germanstate]

        output_model = Concatenate()(output_embeddings)
        output_model = Dense(1000, kernel_initializer="uniform")(output_model)
        output_model = Activation('relu')(output_model)
        output_model = Dense(500, kernel_initializer="uniform")(output_model)
        output_model = Activation('relu')(output_model)
        output_model = Dense(1)(output_model)
        output_model = Activation('sigmoid')(output_model)

        self.model = KerasModel(inputs=input_model, outputs=output_model)

        self.model.compile(loss='mean_absolute_error', optimizer='adam')

    def _val_for_fit(self, val):
        val = numpy.log(val) / self.max_log_y
        return val

    def _val_for_pred(self, val):
        return numpy.exp(val * self.max_log_y)

    def fit(self, X_train, y_train, X_val, y_val):
        self.model.fit(self.preprocessing(X_train), self._val_for_fit(y_train),
                       validation_data=(self.preprocessing(X_val), self._val_for_fit(y_val)),
                       epochs=self.epochs, batch_size=128,
                       # callbacks=[self.checkpointer],
                       )
        # self.model.load_weights('best_model_weights.hdf5')
        print("Result on validation data: ", self.evaluate(X_val, y_val))

    def guess(self, features):
        features = self.preprocessing(features)
        result = self.model.predict(features).flatten()
        return self._val_for_pred(result)
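
Conceptually, an Embedding layer is a trainable lookup table: selecting row `k` gives the same result as multiplying a one-hot vector for category `k` by the table. The sketch below checks that equivalence in plain Python on a made-up 4 x 3 table, and also computes the width of the concatenated vector from the layer sizes in the model above.

```python
def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec(vec, matrix):
    """Row vector times matrix, written out in plain Python."""
    n_out = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(n_out)]


# Toy embedding table: 4 categories, 3 dimensions (made-up weights).
table = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

category = 2
looked_up = table[category]                        # row lookup, as an embedding layer does
multiplied = matvec(one_hot(category, 4), table)   # one-hot vector times the table

print(looked_up)
print(multiplied)

# Width of the concatenated vector in the model above:
# store 10 + day of week 6 + promo 1 + year 2 + month 6 + day 10 + state 6
concat_width = 10 + 6 + 1 + 2 + 6 + 10 + 6
print(concat_width)
```

The lookup avoids materializing the one-hot vector, and the dense layers that follow see a compact vector of learned features per record instead of a wide, sparse one.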


4. Train the model

In [12]:
models = []

print("Fitting NN_with_EntityEmbedding...")
for i in range(1):
    models.append(NN_with_EntityEmbedding(X_train, y_train, X_val, y_val))


Fitting NN_with_EntityEmbedding...
Train on 200000 samples, validate on 84434 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Result on validation data:  0.10162505226807965


5. Evaluate the model

In [13]:
print("Evaluate combined models...")
print("Training error...")
r_train = evaluate_models(models, X_train, y_train)
print(r_train)

print("Validation error...")
r_val = evaluate_models(models, X_val, y_val)
print(r_val)

Evaluate combined models...
Training error...
0.07067204018839432
Validation error...
0.10162505226807965
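
The one-hot network shows a large gap between training error (0.034) and validation error (0.104), while the embedding network's two errors are closer (0.071 and 0.102). One factor that may contribute is model size. The arithmetic below counts trainable parameters from the layer sizes given in the two model definitions; `dense_params` is a helper introduced only for this illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# One-hot network: 1183 -> 1000 -> 500 -> 1
one_hot_total = (dense_params(1183, 1000)
                 + dense_params(1000, 500)
                 + dense_params(500, 1))

# Embedding network: (number of categories, embedding dimension) per Embedding layer
embedding_tables = [(1115, 10), (7, 6), (3, 2), (12, 6), (31, 10), (12, 6)]
embedding_params = sum(vocab * dim for vocab, dim in embedding_tables)
promo_params = dense_params(1, 1)  # the Dense(1) applied to the promo input
concat_width = sum(dim for _, dim in embedding_tables) + 1  # +1 for promo

embedding_total = (embedding_params + promo_params
                   + dense_params(concat_width, 1000)
                   + dense_params(1000, 500)
                   + dense_params(500, 1))

print(one_hot_total)
print(embedding_total)
```

Most of the difference comes from the first dense layer, which takes 1183 inputs in the one-hot network and 41 in the embedding network. This is a single-run comparison, so the link between parameter count and the error gap should be read as a hypothesis rather than a measured effect.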


6. Save the learned Embedding data

In [14]:
save_embeddings = True
if save_embeddings:
    model = models[0].model
    store_embedding = model.get_layer('store_embedding').get_weights()[0]
    dow_embedding = model.get_layer('dow_embedding').get_weights()[0]
    year_embedding = model.get_layer('year_embedding').get_weights()[0]
    month_embedding = model.get_layer('month_embedding').get_weights()[0]
    day_embedding = model.get_layer('day_embedding').get_weights()[0]
    german_states_embedding = model.get_layer('state_embedding').get_weights()[0]
    with open(saved_embeddings_fname, 'wb') as f:
        pickle.dump([store_embedding, dow_embedding, year_embedding,
                     month_embedding, day_embedding, german_states_embedding], f, -1)
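
The saved file is a pickled list, so the order of the matrices matters when it is read back. The round trip below illustrates this with an in-memory buffer instead of a file; the small nested lists are made-up stand-ins for the real weight matrices.

```python
import io
import pickle

# Made-up stand-ins for two weight matrices (lists instead of numpy arrays).
store_embedding = [[0.1, 0.2], [0.3, 0.4]]
dow_embedding = [[0.5], [0.6], [0.7]]

buffer = io.BytesIO()  # in-memory file, so nothing is written to disk
# The third argument is the pickle protocol; -1 selects the highest available one.
pickle.dump([store_embedding, dow_embedding], buffer, -1)

buffer.seek(0)
loaded_store, loaded_dow = pickle.load(buffer)
print(loaded_store == store_embedding, loaded_dow == dow_embedding)
```

Any code that loads the file has to index the list in the same order in which it was dumped (store, day of week, year, month, day, state in the cell above).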

7. Define a function that retrieves the Embedding data

In [15]:
# Read each feature's embedding vectors from the training results and use them as input values
def embed_features(X, saved_embeddings_fname):
    # f_embeddings = open("embeddings_shuffled.pickle", "rb")
    with open(saved_embeddings_fname, "rb") as f_embeddings:
        embeddings = pickle.load(f_embeddings)

    # The store_open and promo columns take at most two values and were not embedded, so they are excluded here
    index_embedding_mapping = {1: 0, 2: 1, 4: 2, 5: 3, 6: 4, 7: 5}
    X_embedded = []

    (num_records, num_features) = X.shape
    for record in X:
        embedded_features = []
        for i, feat in enumerate(record):
            feat = int(feat)
            if i not in index_embedding_mapping.keys():
                embedded_features += [feat]
            else:
                embedding_index = index_embedding_mapping[i]
                embedded_features += embeddings[embedding_index][feat].tolist()

        X_embedded.append(embedded_features)

    return numpy.array(X_embedded)
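
To see what `embed_features` does to a single record, here is a reduced sketch of the same per-record logic in plain Python. The toy table, the mapping, and the helper name `embed_record` are made up for illustration.

```python
def embed_record(record, embeddings, index_embedding_mapping):
    """Replace embedded columns by their vectors; copy other columns unchanged."""
    out = []
    for i, feat in enumerate(record):
        feat = int(feat)
        if i in index_embedding_mapping:
            out.extend(embeddings[index_embedding_mapping[i]][feat])
        else:
            out.append(feat)
    return out


# Toy setup: 3 columns, only column 1 has a learned embedding (made-up numbers).
embeddings = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # table for column 1: 3 categories x 2 dims
]
mapping = {1: 0}  # column index -> position in the embeddings list

print(embed_record([1, 2, 0], embeddings, mapping))
```

Column 1 holds category 2, so it is replaced by the third row of the table, while columns 0 and 2 pass through unchanged. Assuming the chapter's records have 8 columns (the mapping's keys run up to index 7), the embedding sizes used above would expand each record to 2 + (10 + 6 + 2 + 6 + 10 + 6) = 42 columns.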


## 11.3 Building an XGBoost Model
1) Generate the training data. Here the features are only converted to numeric codes; they are not one-hot encoded.

In [16]:
# Reload the training data
with open('feature_train_data.pickle', 'rb') as f:
    (X, y) = pickle.load(f)

num_records = len(X)
train_size = int(train_ratio * num_records)

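2) Split and sample the data  
One-hot encoding is generally suited to algorithms that rely on a vector-space metric: one-hot encoding an unordered categorical variable avoids the artificial ordering that distance computations would otherwise impose on it. Tree models usually do not need one-hot encoding, and label-encoding the categorical variables is enough, so here too we do not one-hot encode the categorical features.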
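
The point about artificial ordering can be checked with a small calculation. The sketch below compares Euclidean distances between three unordered categories under label encoding and under one-hot encoding; the category names are placeholders.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three unordered categories with placeholder names.
label = {"a": [0], "b": [1], "c": [2]}
one_hot = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}

for name, enc in (("label", label), ("one-hot", one_hot)):
    print(name,
          euclidean(enc["a"], enc["b"]),
          euclidean(enc["a"], enc["c"]),
          euclidean(enc["b"], enc["c"]))
```

Under label encoding, "a" is twice as far from "c" as from "b", an ordering that exists only because of the chosen integer codes. Under one-hot encoding all three pairs are equally distant. A tree model splits on thresholds rather than distances, which is why it is less sensitive to this issue.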

In [18]:
# Split the data
X_train = X[:train_size]
X_val = X[train_size:]
y_train = y[:train_size]
y_val = y[train_size:]


X_train, y_train = sample(X_train, y_train, 200000)  # Simulate data sparsity
print("Number of samples used for training: " + str(y_train.shape[0]))

Number of samples used for training: 200000


3) Build the XGBoost model.

In [17]:
class XGBoost(Model):

    def __init__(self, X_train, y_train, X_val, y_val):
        super().__init__()
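        # Train on log(sales); guess() inverts this with numpy.exp
        dtrain = xgb.DMatrix(X_train, label=numpy.log(y_train))
        evallist = [(dtrain, 'train')]
        # Note: newer XGBoost releases use 'verbosity' instead of 'silent'
        # and 'reg:squarederror' instead of 'reg:linear'
        param = {'nthread': -1,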
                 'max_depth': 7,
                 'eta': 0.02,
                 'silent': 1,
                 'objective': 'reg:linear',
                 'colsample_bytree': 0.7,
                 'subsample': 0.7}
        num_round = 1000
        self.bst = xgb.train(param, dtrain, num_round, evallist)
        print("Result on validation data: ", self.evaluate(X_val, y_val))

    def guess(self, feature):
        dtest = xgb.DMatrix(feature)
        return numpy.exp(self.bst.predict(dtest))


4) Train the model.

In [19]:
models = []
print("Fitting XGBoost...")
models.append(XGBoost(X_train, y_train, X_val, y_val))

Fitting XGBoost...
[0]	train-rmse:8.10002
[1]	train-rmse:7.93845
[2]	train-rmse:7.78014
[3]	train-rmse:7.625
[4]	train-rmse:7.47285
[5]	train-rmse:7.32385
[6]	train-rmse:7.17773
[7]	train-rmse:7.03462
[8]	train-rmse:6.89434
[9]	train-rmse:6.75693
[10]	train-rmse:6.62224
[11]	train-rmse:6.49033
[12]	train-rmse:6.36091
[13]	train-rmse:6.23419
[14]	train-rmse:6.11001
[15]	train-rmse:5.98827
[16]	train-rmse:5.86906
[17]	train-rmse:5.7522
[18]	train-rmse:5.63768
[19]	train-rmse:5.52545
[20]	train-rmse:5.4155
[21]	train-rmse:5.3077
[22]	train-rmse:5.20213
[23]	train-rmse:5.09864
[24]	train-rmse:4.99722
[25]	train-rmse:4.89787
[26]	train-rmse:4.80051
[27]	train-rmse:4.70508
[28]	train-rmse:4.61165
[29]	train-rmse:4.52001
[30]	train-rmse:4.43024
[31]	train-rmse:4.34227
[32]	train-rmse:4.25609
[33]	train-rmse:4.17168
[34]	train-rmse:4.08889
[35]	train-rmse:4.00782
[36]	train-rmse:3.92835
[37]	train-rmse:3.85054
[38]	train-rmse:3.77426
[39]	train-rmse:3.69948
[40]	train-rmse:3.62625
...

[325]	train-rmse:0.306073
[326]	train-rmse:0.305822
[327]	train-rmse:0.30575
[328]	train-rmse:0.305378
[329]	train-rmse:0.30495
[330]	train-rmse:0.304898
[331]	train-rmse:0.304549
[332]	train-rmse:0.304498
[333]	train-rmse:0.304378
[334]	train-rmse:0.30427
[335]	train-rmse:0.304222
[336]	train-rmse:0.304182
[337]	train-rmse:0.304126
[338]	train-rmse:0.303731
[339]	train-rmse:0.303687
[340]	train-rmse:0.303607
[341]	train-rmse:0.303539
[342]	train-rmse:0.303177
[343]	train-rmse:0.303134
[344]	train-rmse:0.303088
[345]	train-rmse:0.302846
[346]	train-rmse:0.302798
[347]	train-rmse:0.302373
[348]	train-rmse:0.30227
[349]	train-rmse:0.302185
[350]	train-rmse:0.301942
[351]	train-rmse:0.301836
[352]	train-rmse:0.301615
[353]	train-rmse:0.301575
[354]	train-rmse:0.301478
[355]	train-rmse:0.301108
[356]	train-rmse:0.300886
[357]	train-rmse:0.300849
[358]	train-rmse:0.300808
[359]	train-rmse:0.300754
[360]	train-rmse:0.300684
[361]	train-rmse:0.300636
[362]	train-rmse:0.300068
...

[642]	train-rmse:0.261011
[643]	train-rmse:0.260993
[644]	train-rmse:0.260739
[645]	train-rmse:0.260573
[646]	train-rmse:0.260538
[647]	train-rmse:0.260516
[648]	train-rmse:0.260501
[649]	train-rmse:0.260437
[650]	train-rmse:0.260336
[651]	train-rmse:0.260307
[652]	train-rmse:0.260297
[653]	train-rmse:0.260274
[654]	train-rmse:0.26026
[655]	train-rmse:0.260101
[656]	train-rmse:0.259928
[657]	train-rmse:0.259681
[658]	train-rmse:0.259671
[659]	train-rmse:0.259503
[660]	train-rmse:0.25949
[661]	train-rmse:0.259358
[662]	train-rmse:0.259339
[663]	train-rmse:0.259319
[664]	train-rmse:0.259302
[665]	train-rmse:0.259131
[666]	train-rmse:0.258843
[667]	train-rmse:0.258803
[668]	train-rmse:0.258567
[669]	train-rmse:0.258411
[670]	train-rmse:0.25825
[671]	train-rmse:0.258235
[672]	train-rmse:0.25808
[673]	train-rmse:0.257929
[674]	train-rmse:0.257902
[675]	train-rmse:0.257781
[676]	train-rmse:0.257702
[677]	train-rmse:0.257688
[678]	train-rmse:0.25767
[679]	train-rmse:0.257414
...

[959]	train-rmse:0.231535
[960]	train-rmse:0.23149
[961]	train-rmse:0.231481
[962]	train-rmse:0.23145
[963]	train-rmse:0.231273
[964]	train-rmse:0.231131
[965]	train-rmse:0.231121
[966]	train-rmse:0.231113
[967]	train-rmse:0.231006
[968]	train-rmse:0.230999
[969]	train-rmse:0.230864
[970]	train-rmse:0.230855
[971]	train-rmse:0.230827
[972]	train-rmse:0.230679
[973]	train-rmse:0.230488
[974]	train-rmse:0.230329
[975]	train-rmse:0.230213
[976]	train-rmse:0.230115
[977]	train-rmse:0.2301
[978]	train-rmse:0.230089
[979]	train-rmse:0.229923
[980]	train-rmse:0.229893
[981]	train-rmse:0.229881
[982]	train-rmse:0.229867
[983]	train-rmse:0.229754
[984]	train-rmse:0.229747
[985]	train-rmse:0.229611
[986]	train-rmse:0.229486
[987]	train-rmse:0.229474
[988]	train-rmse:0.229335
[989]	train-rmse:0.229318
[990]	train-rmse:0.229229
[991]	train-rmse:0.229217
[992]	train-rmse:0.229063
[993]	train-rmse:0.229052
[994]	train-rmse:0.228953
[995]	train-rmse:0.228855
[996]	train-rmse:0.228694
...
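
The `train-rmse` values are measured on `log(y_train)`, not on raw sales, because the model is trained on the log target. In the early rounds the logged values shrink almost geometrically, which appears consistent with each boosting round removing roughly a fraction `eta` of a large, nearly constant residual. The check below is a heuristic reading of the log, not a description of XGBoost internals; `approx_rmse` is a helper introduced for this illustration.

```python
eta = 0.02        # learning rate from the param dict
rmse_0 = 8.10002  # train-rmse at round 0, copied from the log


def approx_rmse(round_index):
    """Rough early-round estimate: the residual shrinks by (1 - eta) per round."""
    return rmse_0 * (1 - eta) ** round_index


# Values copied from the training log above.
logged = {1: 7.93845, 10: 6.62224, 20: 5.4155, 40: 3.62625}
for k, value in logged.items():
    print(k, round(approx_rmse(k), 3), value)
```

The approximation stays within about one percent of the logged values for the first 40 rounds. It stops being useful later in training, once the constant offset has been absorbed and the trees are fitting the actual structure in the data.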

5) Evaluate the model.

In [20]:
print("Evaluate combined models...")
print("Training error...")
r_train = evaluate_models(models, X_train, y_train)
print(r_train)

print("Validation error...")
r_val = evaluate_models(models, X_val, y_val)
print(r_val)

Evaluate combined models...
Training error...
0.18057345828903387
Validation error...
0.1937494686448902


## 11.4 An XGBoost Model Using the Embedding Data
1) Use the learned embeddings as the input data for the XGBoost model.

In [21]:
embeddings_as_input=True
if embeddings_as_input:
    print("Using learned embeddings as input")
    X = embed_features(X, saved_embeddings_fname)
    
X_train = X[:train_size]
X_val = X[train_size:]
y_train = y[:train_size]
y_val = y[train_size:]

Using learned embeddings as input


2) Train the model.

In [22]:
models = []
print("Fitting XGBoost...")
models.append(XGBoost(X_train, y_train, X_val, y_val))

Fitting XGBoost...
[0]	train-rmse:8.09873
[1]	train-rmse:7.93695
[2]	train-rmse:7.77837
[3]	train-rmse:7.62285
[4]	train-rmse:7.4707
[5]	train-rmse:7.32131
[6]	train-rmse:7.1752
[7]	train-rmse:7.03175
[8]	train-rmse:6.89139
[9]	train-rmse:6.7537
[10]	train-rmse:6.61877
[11]	train-rmse:6.48656
[12]	train-rmse:6.35701
[13]	train-rmse:6.22999
[14]	train-rmse:6.10559
[15]	train-rmse:5.98363
[16]	train-rmse:5.86411
[17]	train-rmse:5.74706
[18]	train-rmse:5.63222
[19]	train-rmse:5.5198
[20]	train-rmse:5.40954
[21]	train-rmse:5.30154
[22]	train-rmse:5.19566
[23]	train-rmse:5.09197
[24]	train-rmse:4.99026
[25]	train-rmse:4.89059
[26]	train-rmse:4.79296
[27]	train-rmse:4.69732
[28]	train-rmse:4.60357
[29]	train-rmse:4.51169
[30]	train-rmse:4.42166
[31]	train-rmse:4.33343
[32]	train-rmse:4.24688
[33]	train-rmse:4.16214
[34]	train-rmse:4.07916
[35]	train-rmse:3.99778
[36]	train-rmse:3.91802
[37]	train-rmse:3.83987
[38]	train-rmse:3.76333
[39]	train-rmse:3.68819
[40]	train-rmse:3.61467
...

[324]	train-rmse:0.121339
[325]	train-rmse:0.121212
[326]	train-rmse:0.121147
[327]	train-rmse:0.121077
[328]	train-rmse:0.120964
[329]	train-rmse:0.12085
[330]	train-rmse:0.120763
[331]	train-rmse:0.120668
[332]	train-rmse:0.120591
[333]	train-rmse:0.120479
[334]	train-rmse:0.120391
[335]	train-rmse:0.120297
[336]	train-rmse:0.120215
[337]	train-rmse:0.120148
[338]	train-rmse:0.120064
[339]	train-rmse:0.119979
[340]	train-rmse:0.119907
[341]	train-rmse:0.11981
[342]	train-rmse:0.119754
[343]	train-rmse:0.119681
[344]	train-rmse:0.119592
[345]	train-rmse:0.11951
[346]	train-rmse:0.119465
[347]	train-rmse:0.119404
[348]	train-rmse:0.119306
[349]	train-rmse:0.119234
[350]	train-rmse:0.119158
[351]	train-rmse:0.119033
[352]	train-rmse:0.11899
[353]	train-rmse:0.118934
[354]	train-rmse:0.118832
[355]	train-rmse:0.118786
[356]	train-rmse:0.118743
[357]	train-rmse:0.1187
[358]	train-rmse:0.118642
[359]	train-rmse:0.118561
[360]	train-rmse:0.118516
[361]	train-rmse:0.118467
...

[641]	train-rmse:0.10763
[642]	train-rmse:0.107605
[643]	train-rmse:0.107581
[644]	train-rmse:0.107567
[645]	train-rmse:0.107555
[646]	train-rmse:0.107543
[647]	train-rmse:0.107506
[648]	train-rmse:0.107481
[649]	train-rmse:0.107471
[650]	train-rmse:0.107445
[651]	train-rmse:0.107426
[652]	train-rmse:0.107394
[653]	train-rmse:0.107374
[654]	train-rmse:0.107362
[655]	train-rmse:0.107322
[656]	train-rmse:0.107305
[657]	train-rmse:0.107268
[658]	train-rmse:0.107224
[659]	train-rmse:0.107201
[660]	train-rmse:0.107184
[661]	train-rmse:0.107157
[662]	train-rmse:0.107136
[663]	train-rmse:0.107113
[664]	train-rmse:0.107102
[665]	train-rmse:0.107057
[666]	train-rmse:0.107034
[667]	train-rmse:0.10699
[668]	train-rmse:0.106963
[669]	train-rmse:0.106942
[670]	train-rmse:0.106927
[671]	train-rmse:0.106892
[672]	train-rmse:0.106883
[673]	train-rmse:0.106864
[674]	train-rmse:0.106847
[675]	train-rmse:0.106828
[676]	train-rmse:0.106815
[677]	train-rmse:0.106797
[678]	train-rmse:0.106761
...

[958]	train-rmse:0.101718
[959]	train-rmse:0.101702
[960]	train-rmse:0.101693
[961]	train-rmse:0.101687
[962]	train-rmse:0.101678
[963]	train-rmse:0.10167
[964]	train-rmse:0.101658
[965]	train-rmse:0.101643
[966]	train-rmse:0.101625
[967]	train-rmse:0.101614
[968]	train-rmse:0.101587
[969]	train-rmse:0.101555
[970]	train-rmse:0.101549
[971]	train-rmse:0.101541
[972]	train-rmse:0.101515
[973]	train-rmse:0.101489
[974]	train-rmse:0.101481
[975]	train-rmse:0.101458
[976]	train-rmse:0.101453
[977]	train-rmse:0.101444
[978]	train-rmse:0.101438
[979]	train-rmse:0.101429
[980]	train-rmse:0.101413
[981]	train-rmse:0.101398
[982]	train-rmse:0.101379
[983]	train-rmse:0.101374
[984]	train-rmse:0.101366
[985]	train-rmse:0.101357
[986]	train-rmse:0.101344
[987]	train-rmse:0.101332
[988]	train-rmse:0.10132
[989]	train-rmse:0.101304
[990]	train-rmse:0.101298
[991]	train-rmse:0.101279
[992]	train-rmse:0.101265
[993]	train-rmse:0.101255
[994]	train-rmse:0.101247
[995]	train-rmse:0.101237
...

3) Evaluate the model.

In [23]:
print("Evaluate combined models...")
print("Training error...")
r_train = evaluate_models(models, X_train, y_train)
print(r_train)

print("Validation error...")
r_val = evaluate_models(models, X_val, y_val)
print(r_val)

Evaluate combined models...
Training error...
0.07297976184932689
Validation error...
0.09668812940871037
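
To put the four experiments side by side, the snippet below collects the validation errors printed in sections 11.2 to 11.4 and computes the relative reduction obtained by switching to embeddings. The dictionary labels and the helper `relative_reduction` are introduced here only for the comparison.

```python
# Validation errors copied from the evaluation cells above.
validation_error = {
    "NN, one-hot input (2 models combined)": 0.10440801234057484,
    "NN with entity embeddings": 0.10162505226807965,
    "XGBoost, label-encoded input": 0.1937494686448902,
    "XGBoost, learned embeddings as input": 0.09668812940871037,
}


def relative_reduction(before, after):
    """Fraction by which the error dropped, relative to the starting error."""
    return (before - after) / before


nn_gain = relative_reduction(
    validation_error["NN, one-hot input (2 models combined)"],
    validation_error["NN with entity embeddings"])
xgb_gain = relative_reduction(
    validation_error["XGBoost, label-encoded input"],
    validation_error["XGBoost, learned embeddings as input"])

for name, err in sorted(validation_error.items(), key=lambda kv: kv[1]):
    print(f"{err:.4f}  {name}")
print(f"NN: {nn_gain:.1%} lower validation error with embeddings")
print(f"XGBoost: {xgb_gain:.1%} lower validation error with embeddings")
```

Two caveats apply when reading these numbers. Each figure comes from a single run, and the neural-network comparison sets a two-model combination against a single embedding model. In addition, the cell in section 11.4 does not call `sample`, so that XGBoost model appears to be trained on the full training split rather than on 200000 sampled records, which may account for part of its improvement.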
