In [43]:
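"""
The following example, which estimates the compressive strength of concrete,
shows how to solve a general regression problem with XGBoost.
The concrete compressive strength dataset estimates the compressive strength of concrete from its mixture proportions and age.
It contains 1030 samples and 9 attributes (8 input features and 1 target), described below.

    1) Cement: float, in kg per m^3 of mixture.
    2) Blast furnace slag: float, in kg per m^3 of mixture.
    3) Fly ash: float, in kg per m^3 of mixture.
    4) Water: float, in kg per m^3 of mixture.
    5) Superplasticizer: float, in kg per m^3 of mixture.
    6) Coarse aggregate: float, in kg per m^3 of mixture.
    7) Fine aggregate: float, in kg per m^3 of mixture.
    8) Age: integer, in days.
    9) Concrete compressive strength (target): float, in MPa.
    
"""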
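Because every mixture component is measured in kilograms per cubic metre of concrete, the seven component values of one row should add up to the mass of one cubic metre of fresh concrete. The short check below is only a sketch: it uses the first two rows printed by `data.head(10)` further down, copied by hand, and the variable names are illustrative.

```python
# Mixture components (kg per m^3 of concrete) for the first two rows,
# copied from the data.head(10) output of this notebook.
# Order: cement, slag, fly ash, water, superplasticizer, coarse agg., fine agg.
rows = [
    [540.0, 0.0, 0.0, 162.0, 2.5, 1040.0, 676.0],
    [540.0, 0.0, 0.0, 162.0, 2.5, 1055.0, 676.0],
]

# Adding kg/m^3 components gives the density of the mix in kg/m^3.
densities = [sum(r) for r in rows]
print(densities)  # [2420.5, 2435.5]
```

Both totals are close to the roughly 2,400 kg/m³ typical of normal-weight concrete, which is what kg/m³ units would predict and what m³/kg units would not.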


In [44]:
import pandas as pd
import numpy as np
import xgboost as xgb

data = pd.read_excel('./data/Concrete_Data.xls')
data.head(10)

Unnamed: 0,Cement (component 1)(kg in a m^3 mixture),Blast Furnace Slag (component 2)(kg in a m^3 mixture),Fly Ash (component 3)(kg in a m^3 mixture),Water (component 4)(kg in a m^3 mixture),Superplasticizer (component 5)(kg in a m^3 mixture),Coarse Aggregate (component 6)(kg in a m^3 mixture),Fine Aggregate (component 7)(kg in a m^3 mixture),Age (day),label
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.269535
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05278
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.296075
5,266.0,114.0,0.0,228.0,0.0,932.0,670.0,90,47.029847
6,380.0,95.0,0.0,228.0,0.0,932.0,594.0,365,43.698299
7,380.0,95.0,0.0,228.0,0.0,932.0,594.0,28,36.44777
8,266.0,114.0,0.0,228.0,0.0,932.0,670.0,28,45.854291
9,475.0,0.0,0.0,228.0,0.0,932.0,594.0,28,39.28979
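The leading `Unnamed: 0` column only counts 0, 1, 2, ...; it appears to be a row index that was saved into the file (an assumption), not a measurement. pandas names a column with an empty header `Unnamed: 0`, and `index_col=0`, which both `read_csv` and `read_excel` accept, reads it as the index instead. The sketch uses a toy CSV built from the first two rows above with shortened, hypothetical column names.

```python
import io

import pandas as pd

# Toy CSV: first two rows shown above, with shortened (hypothetical) column names.
# The blank first header mimics the saved row index in the Excel file.
csv_text = """,cement,slag,fly_ash,water,superplasticizer,coarse_agg,fine_agg,age,label
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.986111
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.887366
"""

# Without index_col, pandas exposes the blank header as a data column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns[0])  # Unnamed: 0

# With index_col=0 that column becomes the index: 8 features + label remain.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.shape)  # (2, 9)
```

Reading the file this way shifts every column position by one, so positional selections such as `iloc` would need to be adjusted to match.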


In [45]:
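# The rename below had no effect, so the target column was renamed to 'label' by hand in the file.
# Likely cause: a bytes key (b"...") never matches a str column label, and rename needs the exact full name.
# Renaming by position avoids both problems:
# data.rename(columns={data.columns[-1]: 'label'}, inplace=True)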
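`DataFrame.rename` only replaces labels that match a key exactly, and silently ignores keys that match nothing. A `rename` keyed by `b"Concrete compressive strength"` therefore does nothing on a frame with str column names. The toy frame below imitates a long target name; the exact name in the real file is an assumption here.

```python
import pandas as pd

# Toy frame; the long target name is a hypothetical stand-in for the real one.
df = pd.DataFrame({"Age (day)": [28, 28],
                   "Concrete compressive strength(MPa)": [79.99, 61.89]})

# A bytes key never equals a str label, so this rename is a silent no-op.
unchanged = df.rename(columns={b"Concrete compressive strength": "label"})
print(list(unchanged.columns))  # ['Age (day)', 'Concrete compressive strength(MPa)']

# Renaming by position avoids typing the exact (long) name.
renamed = df.rename(columns={df.columns[-1]: "label"})
print(list(renamed.columns))  # ['Age (day)', 'label']
```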

In [46]:
# data.head(10)

In [47]:
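# Split the dataset: about 80% of the rows for training, the rest for testing
mask = np.random.rand(len(data)) < 0.8
train = data[mask]
test = data[~mask]  # complement of mask; data[mask] would make the test set identical to the training set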
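The test set has to be the complement of the training mask (`data[~mask]`). Selecting with the same mask twice makes the test set a copy of the training set, which would explain why the training log below reports exactly the same train-rmse and test-rmse in every round. A minimal sketch on a toy frame, assuming a seeded NumPy `Generator` is acceptable for reproducibility (the seed 0 is arbitrary):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`.
df = pd.DataFrame({"x": np.arange(100), "label": np.arange(100) * 0.5})

# Seeded generator so the split is reproducible.
rng = np.random.default_rng(0)
mask = rng.random(len(df)) < 0.8

train = df[mask]   # roughly 80% of the rows
test = df[~mask]   # the complementary rows

# The two parts share no rows and together cover the whole frame.
assert train.index.intersection(test.index).empty
assert len(train) + len(test) == len(df)
```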

In [48]:
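# Inputs: every column except the saved row index ('Unnamed: 0') and the target ('label')
feature_cols = data.columns[1:-1]
xgb_train = xgb.DMatrix(train[feature_cols], label=train.label)
xgb_test = xgb.DMatrix(test[feature_cols], label=test.label)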
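Positional slicing such as `iloc[:, :7]` is easy to get wrong here: with the header shown above it keeps the `Unnamed: 0` index column and drops fine aggregate and age. Selecting inputs by name is more robust. The sketch rebuilds a one-row frame from the printed header and first row; the resulting `X` and `y` could then be passed as `xgb.DMatrix(X, label=y)`.

```python
import pandas as pd

# Header copied from the data.head(10) output, with the first data row.
columns = [
    "Unnamed: 0",
    "Cement (component 1)(kg in a m^3 mixture)",
    "Blast Furnace Slag (component 2)(kg in a m^3 mixture)",
    "Fly Ash (component 3)(kg in a m^3 mixture)",
    "Water (component 4)(kg in a m^3 mixture)",
    "Superplasticizer (component 5)(kg in a m^3 mixture)",
    "Coarse Aggregate (component 6)(kg in a m^3 mixture)",
    "Fine Aggregate (component 7)(kg in a m^3 mixture)",
    "Age (day)",
    "label",
]
row = [0, 540.0, 0.0, 0.0, 162.0, 2.5, 1040.0, 676.0, 28, 79.986111]
df = pd.DataFrame([row], columns=columns)

# All inputs: drop the saved row index and the target.
feature_cols = [c for c in df.columns if c not in ("Unnamed: 0", "label")]
X, y = df[feature_cols], df["label"]
print(X.shape)  # (1, 8)

# Inputs that a positional slice iloc[:, :7] would have left out.
first7 = list(df.iloc[:, :7].columns)
missing = [c for c in feature_cols if c not in first7]
print(missing)  # ['Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)']
```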

In [49]:
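import os

# Train the model
params = {
    'objective': 'reg:squarederror',  # squared-error regression; 'reg:linear' is its deprecated alias
    'booster': 'gbtree',
    'eta': 0.1,
    'min_child_weight': 1,
    'max_depth': 5
}
# Number of boosting rounds
num_round = 60
watch_list = [(xgb_train, 'train'), (xgb_test, 'test')]
model = xgb.train(params, xgb_train, num_boost_round=num_round, evals=watch_list)
os.makedirs('./model', exist_ok=True)  # save_model fails if the directory does not exist
model.save_model('./model/compress_concrete.model')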

[0]	train-rmse:35.92204	test-rmse:35.92204
[1]	train-rmse:32.76475	test-rmse:32.76475
[2]	train-rmse:29.96615	test-rmse:29.96615
[3]	train-rmse:27.44896	test-rmse:27.44896
[4]	train-rmse:25.20868	test-rmse:25.20868
[5]	train-rmse:23.22446	test-rmse:23.22446
[6]	train-rmse:21.46064	test-rmse:21.46064
[7]	train-rmse:19.90827	test-rmse:19.90827
[8]	train-rmse:18.53800	test-rmse:18.53800
[9]	train-rmse:17.33391	test-rmse:17.33391
[10]	train-rmse:16.28263	test-rmse:16.28263
[11]	train-rmse:15.35888	test-rmse:15.35888
[12]	train-rmse:14.55607	test-rmse:14.55607
[13]	train-rmse:13.85916	test-rmse:13.85916
[14]	train-rmse:13.26498	test-rmse:13.26498
[15]	train-rmse:12.75476	test-rmse:12.75476
[16]	train-rmse:12.31484	test-rmse:12.31484
[17]	train-rmse:11.93789	test-rmse:11.93789
[18]	train-rmse:11.61893	test-rmse:11.61893
[19]	train-rmse:11.34238	test-rmse:11.34238
[20]	train-rmse:11.10675	test-rmse:11.10675
[21]	train-rmse:10.90629	test-rmse:10.90629
[22]	train-rmse:10.74102	test-rmse:10.7410
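The steadily falling train-rmse reflects how boosting works: each round fits a new tree to the current residuals, and `eta` shrinks that tree's output before it is added, so a smaller `eta` needs more rounds. The sketch below is plain gradient boosting with one-split trees (stumps) on a synthetic one-feature problem; it is not XGBoost's regularized second-order algorithm, and `fit_stump`, `boost`, and the toy data are illustrative names and assumptions.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares one-split regression tree on a 1-D feature x for targets r."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate split thresholds
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def boost(x, y, eta=0.1, num_round=60, base_score=0.5):
    """Squared-error boosting: each round fits the residuals (the negative
    gradient) and adds the new tree's output scaled by eta."""
    pred = np.full_like(y, base_score, dtype=float)
    history = []
    for _ in range(num_round):
        tree = fit_stump(x, y - pred)
        pred = pred + eta * tree(x)
        history.append(float(np.sqrt(np.mean((y - pred) ** 2))))
    return pred, history


# Synthetic data, unrelated to the concrete dataset.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 30 + 3 * x + rng.normal(0, 2, 200)
_, rmse_history = boost(x, y)
print(f"round 1: {rmse_history[0]:.2f}, round 60: {rmse_history[-1]:.2f}")
```

Because each stump is a least-squares fit to the residuals and `eta` is at most 1, the training RMSE in this sketch can only stay the same or decrease from one round to the next, mirroring the shape of the log above.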

In [50]:
# Load the trained model from disk and predict on the test set
bst = xgb.Booster()
bst.load_model("./model/compress_concrete.model")
pred = bst.predict(xgb_test)
pred



array([71.89995  , 65.52917  , 38.608162 , 29.655378 , 49.88594  ,
       40.17383  , 49.88594  , 40.680336 , 29.655378 , 40.329777 ,
       49.486176 , 27.806362 , 51.607277 , 40.17383  , 40.680336 ,
       40.329777 , 27.806362 , 27.806362 , 49.486176 , 51.607277 ,
       40.329777 , 40.680336 , 51.714546 , 49.88594  , 29.655378 ,
       40.680336 , 46.244423 , 33.036777 , 38.608162 , 40.680336 ,
       33.036777 , 33.036777 , 49.486176 , 40.329777 , 40.329777 ,
       27.886133 , 40.17383  , 33.036777 , 40.17383  , 46.244423 ,
       33.036777 , 51.714546 , 27.806362 , 29.655378 , 40.680336 ,
       29.655378 , 38.608162 , 51.714546 , 49.88594  , 22.238033 ,
       46.244423 , 51.607277 , 27.806362 , 46.244423 , 71.28677  ,
       50.66634  , 51.982437 , 53.632565 , 57.19892  , 50.728294 ,
       61.19625  , 57.892166 , 53.632565 , 40.59715  , 52.950974 ,
       53.632565 , 45.98199  , 57.562233 , 47.13675  , 64.06043  ,
       50.839527 , 64.06043  , 57.280834 , 49.93642  , 50.6663
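The `rmse` values in the training log are root-mean-square errors between predictions and labels. A small helper makes the same number easy to compute for `pred`; with this notebook's objects, `rmse(pred, xgb_test.get_label())` should match the final `test-rmse` of the log up to floating-point rounding. The helper name is illustrative, and the check below uses hand-picked numbers, not the model's output.

```python
import numpy as np


def rmse(pred, label):
    """Root-mean-square error between predictions and true labels."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.sqrt(np.mean((pred - label) ** 2)))


# Hand-picked toy values: errors are 2 and 0, so RMSE = sqrt((4 + 0) / 2) = sqrt(2).
print(rmse([3.0, 5.0], [1.0, 5.0]))  # about 1.4142
```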

In [51]:
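# Finally, dump the trees to a text file for later analysis and tuning.
# dump_model writes a human-readable dump (not reloadable with load_model) and returns None.
bst.dump_model("./model/compress_concrete.txt")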
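In the text dump, each tree starts with a `booster[i]:` line, split nodes look like `0:[feature<threshold] yes=1,no=2,missing=1`, and leaves look like `1:leaf=value`. A small parser can summarize the file, for example to confirm that the 60 rounds above produced 60 trees. The sample below is a hypothetical two-tree dump written by hand to mimic that layout, and `summarize_dump` is an illustrative helper, not part of XGBoost.

```python
def summarize_dump(text):
    """Count trees and leaves in an XGBoost text dump (gbtree, 'text' format)."""
    trees, leaves = 0, 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("booster["):
            trees += 1
        elif ":leaf=" in line:
            leaves += 1
    return trees, leaves


# Hypothetical dump written by hand; real values come from bst.dump_model.
sample = """booster[0]:
0:[Age (day)<28.5] yes=1,no=2,missing=1
\t1:leaf=-1.25
\t2:leaf=2.5
booster[1]:
0:leaf=0.75
"""
print(summarize_dump(sample))  # (2, 3)
```

For the real file, `summarize_dump(open("./model/compress_concrete.txt").read())` should report one tree per boosting round; `bst.get_dump()` returns the same trees as a list of strings if a file is not needed.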