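## Data Loading
Our error-compensation training data is stored in <I>adjustments.tsv</I>, and the test data in <I>test_adjustments.tsv</I>.

**The test dataset holds ground-truth values and must not be adjusted. Otherwise the model's test results will diverge from its real prediction performance, producing results that are either too optimistic or too pessimistic.**

You only need to run the code below to load the training data; you can also modify it to load a different file.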

In [1]:
import numpy as np
import pandas as pd

# Load the training data
train_dataset = pd.read_csv('./adjustments.tsv',
                          sep='\t',
                          skipinitialspace=True)
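# Load the test data. !!! The test dataset holds ground-truth values and must not be adjusted; otherwise the test results will diverge from real predictions, and the final machined workpiece will not match expectations.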
test_dataset = pd.read_csv('./test_adjustments.tsv',
                          sep='\t',
                          skipinitialspace=True)
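
To load a different file, the same `read_csv` call can be wrapped in a small helper. The sketch below is illustrative only: `load_adjustments` and its column-count check are hypothetical names introduced here, not part of the platform, and the 18 + 8 column layout is an assumption taken from the shapes printed later in this notebook.

```python
import io
import pandas as pd

N_FEATURES = 18  # feature columns (assumed from the shapes shown in this notebook)
N_TARGETS = 8    # compensation columns (same assumption)

def load_adjustments(source):
    """Read a tab-separated adjustments file from a path or file-like object."""
    df = pd.read_csv(source, sep='\t', skipinitialspace=True)
    expected = N_FEATURES + N_TARGETS
    if df.shape[1] != expected:
        raise ValueError(f'expected {expected} columns, got {df.shape[1]}')
    return df

# Tiny in-memory stand-in for a real file, so the sketch runs without any data on disk.
header = '\t'.join([f'f{i}' for i in range(N_FEATURES)] + [f't{i}' for i in range(N_TARGETS)])
row = '\t'.join(['1.0'] * (N_FEATURES + N_TARGETS))
demo = load_adjustments(io.StringIO(header + '\n' + row + '\n'))
print(demo.shape)
```

Failing early on an unexpected column count keeps a wrongly formatted file from silently shifting the feature/compensation boundary used by the later cells.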

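## Data Analysis

Print and inspect the data. The first 18 columns of adjustments.tsv (特征0 to 特征17, "feature") are feature values representing real-world environmental factors that affect the machine tool, such as tool wear, temperature, and humidity; the last 8 columns (补偿0 to 补偿7, "compensation") are the compensation commands.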

In [2]:
np.set_printoptions(precision=3, suppress=True)
dataset = train_dataset.copy()
dataset.head()


| | 特征0 | 特征1 | 特征2 | 特征3 | 特征4 | 特征5 | 特征6 | 特征7 | 特征8 | 特征9 | ... | 特征16 | 特征17 | 补偿0 | 补偿1 | 补偿2 | 补偿3 | 补偿4 | 补偿5 | 补偿6 | 补偿7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.4158 | 2.9711 | 10.7935 | 7.5279 | 2.3352 | 8.1042 | 2.3096 | 3.3367 | 11.8639 | 12.7142 | ... | 171.764 | 1434.24 | 0.331511 | -0.932553 | 0.285048 | -0.1435 | -0.833982 | 0.767568 | 0.463969 | 1.9048 |
| 1 | 0.628 | 1.8616 | 10.177 | 7.4684 | 2.1915 | 8.5945 | 0.1379 | 2.9661 | 11.5816 | 12.2487 | ... | 185.824 | 1469.19 | 0.894066 | -0.446796 | 0.058519 | -0.4624 | 0.715252 | 0.999105 | 0.988844 | 0.689742 |
| 2 | 0.9648 | 1.8103 | 10.1682 | 5.9705 | 2.0629 | 6.5349 | 2.8694 | 3.1185 | 11.7464 | 12.2074 | ... | 187.576 | 1540.76 | 0.999982 | 0.460716 | 0.997809 | -0.4624 | 0.723031 | 0.992935 | 0.682903 | 0.74958 |
| 3 | 0.7119 | 1.6221 | 10.1487 | 6.8678 | 2.0694 | 6.8806 | 1.5791 | 2.3003 | 11.5545 | 12.0659 | ... | 189.938 | 1498.29 | 0.998794 | -0.862448 | 0.329694 | -0.4624 | 0.891745 | 0.015078 | 0.997127 | 0.984439 |
| 4 | 0.3797 | 1.6852 | 10.9601 | 5.0035 | 3.1659 | 5.9471 | 0.0858 | 2.6402 | 11.7458 | 12.7041 | ... | 181.275 | 1465.11 | 0.442893 | -0.990703 | -0.1514 | 0.2245 | 0.61788 | -0.506397 | -0.997502 | 0.376126 |


In [3]:
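# Names of the 18 feature columns (特征0 to 特征17)
w=['特征'+str(i) for i in range(18)]

# Training data: drop rows with missing values, then drop rows whose
# feature values repeat an earlier row
ds=dataset.dropna()
ds=ds.drop_duplicates(w)
print(ds.shape)
ds.describe()

# Test data: same cleaning steps
ds1=test_dataset.dropna()
ds1=ds1.drop_duplicates(w)
print(ds1.shape)
ds1.describe()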


(50000, 26)
(2998, 26)


| | 特征0 | 特征1 | 特征2 | 特征3 | 特征4 | 特征5 | 特征6 | 特征7 | 特征8 | 特征9 | ... | 特征16 | 特征17 | 补偿0 | 补偿1 | 补偿2 | 补偿3 | 补偿4 | 补偿5 | 补偿6 | 补偿7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | ... | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 | 2998.0 |
| mean | 1.008837 | 2.01809 | 10.501365 | 6.980399 | 2.991379 | 6.986831 | 1.983407 | 2.994988 | 11.739018 | 12.600981 | ... | 172.242767 | 1474.643939 | 0.525279 | 0.005173 | 0.250365 | 0.010439 | -0.269551 | 0.52464 | 0.222959 | 0.561442 |
| std | 0.573132 | 0.57895 | 0.286296 | 1.134771 | 0.574884 | 1.142659 | 1.144038 | 0.577503 | 0.128526 | 0.196063 | ... | 12.198489 | 54.894522 | 0.399656 | 0.699439 | 0.49797 | 0.330564 | 0.672572 | 0.494762 | 0.716798 | 1.124138 |
| min | 0.0005 | 1.0 | 10.0003 | 5.0007 | 2.0004 | 5.0008 | 0.0001 | 2.0007 | 11.1554 | 11.7506 | ... | 121.207 | 1377.98 | -0.950279 | -0.999992 | -1.24066 | -0.5332 | -1.0 | -0.863171 | -0.999998 | -3.35427 |
| 25% | 0.518975 | 1.506825 | 10.2577 | 6.033325 | 2.50175 | 6.004525 | 1.001925 | 2.509325 | 11.6681 | 12.485025 | ... | 165.2995 | 1437.355 | 0.279705 | -0.693314 | -0.09669 | -0.1435 | -0.847946 | 0.307439 | -0.492667 | -0.107755 |
| 50% | 1.0143 | 2.04255 | 10.49735 | 6.9628 | 2.9556 | 6.95675 | 1.98595 | 2.98635 | 11.7621 | 12.64445 | ... | 174.141 | 1456.825 | 0.617192 | 0.015394 | 0.250205 | -0.1435 | -0.550868 | 0.709205 | 0.495988 | 0.390784 |
| 75% | 1.488325 | 2.523625 | 10.75045 | 7.944 | 3.481775 | 7.9576 | 2.9538 | 3.496425 | 11.835725 | 12.747475 | ... | 181.63725 | 1496.5425 | 0.860046 | 0.696679 | 0.604809 | 0.2245 | 0.370577 | 0.905374 | 0.877369 | 1.15979 |
| max | 1.9982 | 2.9986 | 10.9993 | 8.9963 | 3.9994 | 8.9985 | 3.9962 | 3.9994 | 11.9501 | 12.9332 | ... | 193.145 | 1785.88 | 1.0 | 1.0 | 1.83218 | 0.6324 | 0.999997 | 0.999997 | 1.0 | 6.58587 |
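
The cleaning cell above judges duplicates on the feature columns only, so two rows with identical features but different compensation values collapse to the first occurrence. A minimal sketch on a toy frame (the column names `x0`, `x1`, `y` are made up for illustration) shows the effect:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'x0': [1.0, 1.0, 2.0, np.nan],
    'x1': [5.0, 5.0, 6.0, 7.0],
    'y':  [0.1, 0.9, 0.3, 0.4],
})

# Drop the row containing NaN, then drop rows whose (x0, x1) pair was already seen.
# drop_duplicates keeps the first occurrence by default.
clean = toy.dropna().drop_duplicates(subset=['x0', 'x1'])
print(clean)
```

Rows 0 and 1 share the same features, so only row 0 (with `y = 0.1`) survives; whether keeping the first label is appropriate depends on the data, and averaging the labels would be an alternative.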


In [4]:
# import matplotlib.pyplot as plt
# import seaborn as sns
# # Select the data columns to visualize
# #data = ds[ds]

# # Loop over every data column
# for col in ds.columns:
#     # Plot the kernel density estimate of a single column
#     fig = plt.figure(figsize=(5, 3))
#     sns.kdeplot(ds[col])

#     # Add grid lines
#     plt.grid()
#     #plt.xticks(np.arange(-0.2, 1.2, 0.1),rotation=45)

#     # Set the plot title and the x-axis label
#     plt.title(f'Kernel Density Estimate - {col}')
#     plt.xlabel(col)

#     # Show the plot
#     plt.show()

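Next we can do some simple analysis, for example by looking at distribution statistics of the feature and compensation values such as the mean and variance (the cell below covers the 18 feature columns):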

In [5]:
# Mean and variance of the 18 feature columns of the cleaned training data
average = np.average(ds.values[:,:18], axis=0)
variance = np.var(ds.values[:,:18], axis=0)
print('mean', average)
print('variance', variance)

# average1 = np.average(ds1.values[:,:18], axis=0)
# variance1 = np.var(ds1.values[:,:18], axis=0)
# print('mean', average1)
# print('variance', variance1)


mean [   1.       1.998   10.5      6.998    3.001    6.995    2.005    2.999
   11.736   12.604    1.768    0.649   19.417   19.598    3.349   17.983
  172.057 1475.054]
variance [   0.332    0.332    0.083    1.327    0.333    1.337    1.331    0.336
    0.017    0.039    0.074    0.213    0.163    0.069   10.31   525.861
  149.833 3136.668]
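
One detail worth keeping in mind when comparing these numbers with `describe()`: `np.var` and `np.std` divide by N by default (`ddof=0`), while pandas reports the sample standard deviation (`ddof=1`). With tens of thousands of rows the gap is negligible, but the small sketch below, using made-up values, makes the difference visible:

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 2.0, 3.0, 4.0])

population_var = np.var(values)        # ddof=0: divide by N
sample_var = np.var(values, ddof=1)    # ddof=1: divide by N - 1
pandas_std = pd.Series(values).std()   # pandas defaults to ddof=1

print(population_var, sample_var, pandas_std ** 2)
```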


In [6]:
# Compute the mean and standard deviation of the 18 feature columns
mean = np.mean(ds.values[:, :18], axis=0)
std = np.std(ds.values[:, :18], axis=0)

mean1 = np.mean(ds1.values[:, :18], axis=0)
std1 = np.std(ds1.values[:, :18], axis=0)

# Define the upper and lower limits (mean ± 3 standard deviations)
upper_limit = mean + 3 * std
lower_limit = mean - 3 * std

# Note: the test-set limits are centred on the test mean (mean1) but use the
# training standard deviation (std); std1 is computed above but not used.
upper_limit1 = mean1 + 3 * std
lower_limit1 = mean1 - 3 * std

# Use boolean indexing to drop every row with any feature outside the limits
cleaned_ds = ds[~((ds.values[:, :18] > upper_limit) | (ds.values[:, :18] < lower_limit)).any(axis=1)]

# Note: this also removes rows from the test set
cleaned_ds1 = ds1[~((ds1.values[:, :18] > upper_limit1) | (ds1.values[:, :18] < lower_limit1)).any(axis=1)]


# cleaned_ds=cleaned_ds.drop(cleaned_ds[(cleaned_ds['特征16']<181)&(cleaned_ds['特征16']>165)].sample(frac=0.2,random_state=0).index)
# cleaned_ds1=cleaned_ds1.drop(cleaned_ds1[(cleaned_ds1['特征16']<181)&(cleaned_ds1['特征16']>165)].sample(frac=0.2,random_state=0).index)
# # Compute the mean and standard deviation of all columns
# mean = np.mean(ds.values, axis=0)
# std = np.std(ds.values, axis=0)

# # Define the upper and lower limits
# upper_limit = mean + 3 * std
# lower_limit = mean - 3 * std

# # Use boolean indexing to drop rows outside the limits
# cleaned_ds = ds[~((ds.values > upper_limit) | (ds.values < lower_limit)).any(axis=1)]

# Print the shape of the cleaned training data
print(cleaned_ds.shape)

print(cleaned_ds.shape)
cleaned_ds.describe()

# Per-column variance of the cleaned training data (features and compensations)
cleaned_ds=pd.DataFrame(cleaned_ds)
variance=np.var(cleaned_ds)

print(variance)


(45455, 26)
(45455, 26)
特征0        0.327537
特征1        0.328899
特征2        0.081437
特征3        1.290450
特征4        0.317921
特征5        1.286620
特征6        1.317239
特征7        0.327636
特征8        0.014313
特征9        0.032006
特征10       0.054821
特征11       0.191810
特征12       0.108303
特征13       0.049248
特征14       8.091563
特征15     304.621928
特征16     118.242898
特征17    2244.934253
补偿0        0.150524
补偿1        0.501245
补偿2        0.239188
补偿3        0.100698
补偿4        0.410969
补偿5        0.250375
补偿6        0.509709
补偿7        1.214950
dtype: float64
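
The three-sigma rule used above can be factored into a reusable function. This is a minimal sketch under assumptions: `three_sigma_mask` is a hypothetical helper, the data is synthetic, and passing `mean` and `std` explicitly is a design choice that makes it obvious which statistics define the limits.

```python
import numpy as np

def three_sigma_mask(features, mean, std, k=3.0):
    """Return True for rows whose every feature lies within mean ± k * std."""
    lower = mean - k * std
    upper = mean + k * std
    return ((features >= lower) & (features <= upper)).all(axis=1)

# Synthetic data with one planted outlier in the first row.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 3))
train[0] = [50.0, 0.0, 0.0]

mean, std = train.mean(axis=0), train.std(axis=0)
mask = three_sigma_mask(train, mean, std)
kept = train[mask]
print(mask[0], kept.shape)
```

The mask keeps a row only if all of its features are inside the limits, which is the same condition as negating "any feature above the upper limit or below the lower limit".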


In [7]:
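# from sklearn.preprocessing import MinMaxScaler

# # Extract the feature data and the label data
# X = cleaned_ds.values[:, :18]  # first 18 columns are the features
# y = cleaned_ds.values[:, 18:]  # last 8 columns are the labels

# # Create a MinMaxScaler and normalize the feature data
# scaler = MinMaxScaler()
# X_normalized = scaler.fit_transform(X)

# # Print the range (minimum and maximum) of the normalized features
# print('normalized feature range:', np.min(X_normalized), np.max(X_normalized))

# # Print the per-feature minimum and maximum learned from the training data
# # (MinMaxScaler has no mean_ attribute; it exposes data_min_ and data_max_)
# print('training data minimum:', scaler.data_min_)
# print('training data maximum:', scaler.data_max_)

# # Merge the normalized features and the labels back into one dataset
# normalized_ds = np.concatenate((X_normalized, y), axis=1)
# # Convert the normalized dataset to a DataFrame
# normalized_ds = pd.DataFrame(normalized_ds)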


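## Building the Training and Test Sets

Next, we split the cleaned training data into a training set (90%) and a validation set (10%); the test set comes from the separately loaded test file.

We then extract the features and the compensation values for each set: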

In [8]:
# from sklearn.preprocessing import MinMaxScaler

# Alternative: split the raw data instead of the cleaned data
# train_ds=dataset.sample(frac=0.9,random_state=0)
# val_ds=dataset.drop(train_ds.index)

# Hold out 10% of the cleaned training data as the validation set
train_ds=cleaned_ds.sample(frac=0.9,random_state=0)
val_ds=cleaned_ds.drop(train_ds.index)


# Training set
train_features=train_ds.values[:,:18]
# sc=MinMaxScaler(feature_range=(0,1))
# train_features=sc.fit_transform(train_features)
# train_features=pd.DataFrame(train_features)
# train_features=train_features.drop(train_features[(train_features['8']<0.8)&(train_features['8']>0.53)].sample(frac=0.2,random_state=0).index)
train_labels=train_ds.values[:,18:]

# Validation set
val_features=val_ds.values[:,:18]
# sc=MinMaxScaler(feature_range=(0,1))
# val_features=sc.fit_transform(val_features)
# val_features=pd.DataFrame(val_features)
val_labels=val_ds.values[:,18:]

# Test set
# test_features=test_dataset.values[:,:18]
# test_labels=test_dataset.values[:,18:]
test_features=cleaned_ds1.values[:,:18]
test_labels=cleaned_ds1.values[:,18:]

# test_features = cleaned_ds1.copy()
# test_labels = test_features[['补偿'+str(i) for i in range(8)]].copy()
# test_features = test_features.drop(['补偿'+str(i) for i in range(8)], axis=1)

# test_features=cleaned_ds1.values[:,:18]
# test_labels=cleaned_ds1.values[:,18:]
# print(train_features)
# print(train_features.describe())
print(train_features.shape,train_labels.shape)
print(val_features.shape,val_labels.shape)
print(test_features.shape,test_labels.shape)

(40910, 18) (40910, 8)
(4545, 18) (4545, 8)
(2771, 18) (2771, 8)
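
The split above relies on `sample` followed by `drop` on the sampled index. The toy sketch below (a made-up ten-row frame) shows that the two parts are disjoint and together cover the data; note that this pattern assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'value': np.arange(10)})

# Draw 90% of the rows reproducibly, then keep whatever was not drawn.
train_part = frame.sample(frac=0.9, random_state=0)
val_part = frame.drop(train_part.index)

print(len(train_part), len(val_part))
```

Fixing `random_state` makes the split repeatable across notebook runs, which keeps validation losses comparable between experiments.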


There are many ways to analyze and preprocess the data; we have shown only one of them. You can use other methods to suit your needs.


## Model Construction

This platform supports HTTP calls through TensorFlow Serving, which works with any model deployed on TensorFlow Serving.

### TensorFlow
First, we import the required dependencies.

In [9]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from numpy import array
from numpy.random import uniform
from numpy import hstack
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

print(tf.__version__)

2.8.2


Next, we build the model.

In [25]:
#----------Build and train the model-----------------
from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras import regularizers
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import optimizers
# model = tf.keras.Sequential([
#     layers.Dense(100, input_dim=train_features.shape[1], activation="relu"),
#     Dropout(0.5),
#     layers.Dense(train_labels.shape[1])
# ])


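# Create the Normalization layer
normalizer = tf.keras.layers.experimental.preprocessing.Normalization()
# Compute and set the normalization statistics from the training features
normalizer.adapt(train_features)
# Note: each adapt() call resets the layer's statistics, so this second call
# replaces the training-set statistics with validation-set statistics.
normalizer.adapt(val_features)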


# model = tf.keras.Sequential([
#     normalizer,  # normalization layer as the first layer
#     layers.Dense(300, activation="relu",input_dim=train_features.shape[1]),
#     layers.Dense(300, activation="relu"),
#     layers.Dense(300, activation="relu"),
#     layers.Dense(train_labels.shape[1])
#     # normalizer,  # normalization layer as the first layer
#     # layers.Dense(100, activation="relu"),
#     # layers.Dense(train_labels.shape[1])
# ])

# # optimizers=optimizers.Nadam(learning_rate=0.01)
# model.compile(loss="mse", optimizer="adam")  # adjust the parameters as needed
# model.summary()


model = tf.keras.Sequential([
    normalizer,  # normalization layer as the first layer
    layers.Dense(512, activation="gelu", input_dim=train_features.shape[1]),
    layers.Dense(256, activation="gelu"),
    layers.Dense(128, activation="gelu"),
    layers.Dense(64, activation="gelu"),
    layers.Dense(32, activation="gelu"),
    layers.Dense(16, activation="gelu"),
    layers.Dropout(0.3),
    layers.Dense(train_labels.shape[1])
    # normalizer,  # normalization layer as the first layer
    # layers.Dense(100, activation="relu"),
    # layers.Dense(train_labels.shape[1])
])

# optimizers=optimizers.Nadam(learning_rate=0.01)
model.compile(loss="mse", optimizer="adam")  # adjust the parameters as needed
model.summary()





# best_features=[]
# for i in range(train_features.shape[1]):
#     selected_features=best_features+[i]
#     selected_train_features=train_features[:,selected_features]
# model.fit(   # adjust the parameters as needed
#     train_features,
#     train_labels,
#     validation_data=(val_features, val_labels),
#     epochs=20,
#     batch_size=32,
#     callbacks=[lr_scheduler]
# )   
# baselin_acc=model.evaluate(selected_train_features,train_labels,verbose=0)    
# if not best_features or baselin_acc >=best_accurancy:
#     best_features=selected_features
#     best_accurancy=baselin_acc
# best_X=train_features[:,best_features]
# model.fit(   # adjust the parameters as needed
#     best_X,
#     train_labels,
#     validation_data=(val_features, val_labels),
#     epochs=20,
#     batch_size=32,
#     callbacks=[lr_scheduler]
# )   


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization_7 (Normalizat  (None, 18)               37        
 ion)                                                            
                                                                 
 dense_10 (Dense)            (None, 512)               9728      
                                                                 
 dense_11 (Dense)            (None, 256)               131328    
                                                                 
 dense_12 (Dense)            (None, 128)               32896     
                                                                 
 dense_13 (Dense)            (None, 64)                8256      
                                                                 
 dense_14 (Dense)            (None, 32)                2080      
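
The parameter counts in the summary can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to the layer widths from the model definition; `dense_params` is a helper introduced here for illustration. The 37 parameters of the Normalization layer are, to my understanding, 18 means, 18 variances, and one sample counter.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Input width followed by the width of each Dense layer in the model above.
# Dropout has no parameters, so it is omitted.
widths = [18, 512, 256, 128, 64, 32, 16, 8]
counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(counts)
```

The first five values match the rows visible in the summary (9728, 131328, 32896, 8256, 2080); the last two come from the same formula applied to the 16-unit and 8-unit layers in the model definition.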
                                                      

## Model Training
Set the training parameters and train the model.

In [26]:
# from sklearn.metrics import mean_squared_error
# from tensorflow.keras.callbacks import EarlyStopping

# # Set up the EarlyStopping callback: stop training once the validation loss stops improving
# early_stopping = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
# lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(factor=0.9, patience=5)
# model.fit(   # adjust the parameters as needed
#     train_features,
#     train_labels,
#     validation_data=(val_features, val_labels),
#     epochs=200,
#     batch_size=32,
#     callbacks=[lr_scheduler]
# )


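from sklearn.metrics import mean_squared_error
from tensorflow.keras.callbacks import EarlyStopping

# EarlyStopping stops training once the validation loss stops improving.
# Note: early_stopping is created here but not passed to model.fit() below;
# add it to the callbacks list to enable it.
early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
# Multiply the learning rate by 0.9 after 100 epochs without improvement
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(factor=0.9, patience=100)
model.fit(  # adjust the parameters as needed
    train_features,
    train_labels,
    validation_data=(val_features, val_labels),
    epochs=500,
    batch_size=32,
    callbacks=[lr_scheduler]
)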




# baselin_acc=model.evaluate(val_features,val_labels,verbose=0)
# print(baselin_acc)
# best_features=[]
# best_accurary=0.0
# for i in range(train_features.shape[1]):
#     selected_feature=best_features+[i]
#     selected_train_features=train_features[:,selected_feature]
#     selected_val_features=val_features[:,selected_feature]
    
# selected_feature=pd.DataFrame(selected_feature)   
# select_model = tf.keras.Sequential([
#     normalizer,  # normalization layer as the first layer
#     layers.Dense(300, activation="relu", input_dim=selected_feature.shape[1]),
#     layers.Dense(300, activation="relu"),
#     layers.Dense(300, activation="relu"),
#     layers.Dense(train_labels.shape[1])
#     # normalizer,  # normalization layer as the first layer
#     # layers.Dense(100, activation="relu"),
#     # layers.Dense(train_labels.shape[1])
# ])
# select_model.compile(loss="mse", optimizer="adam")  # adjust the parameters as needed
# select_model.summary()


# select_model.fit(   # adjust the parameters as needed
#     selected_train_features,
#     train_labels,
#     validation_data=(selected_val_features, val_labels),
#     epochs=50,
#     batch_size=32,
#     callbacks=[lr_scheduler]
# )

Epoch 1/500
Epoch 2/500
...
Epoch 77/500
Epoch 78

<keras.callbacks.History at 0x7f023c792f50>
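
The training cell creates an `EarlyStopping` callback but passes only `lr_scheduler` to `fit`. To illustrate what patience-based stopping does, here is a plain-Python sketch of the idea. It is not the Keras implementation (it ignores options such as `min_delta`), and `best_epoch_with_patience` and the loss values are made up for the example.

```python
def best_epoch_with_patience(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best_loss = float('inf')
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
result = best_epoch_with_patience(losses, patience=3)
print(result)
```

With a patience of 3 the loop stops at epoch 5 and reports epoch 2 as the best, missing the later improvement at epoch 6; a larger patience trades training time for a lower chance of stopping too early.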

Evaluate the trained model on the test set:

In [None]:
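test_preds = model.predict(test_features)
# MSE averaged over all 8 compensation outputs
print("y1 MSE:%.4f" % mean_squared_error(test_labels, test_preds))
# Sanity check: the number of predictions should equal the number of test samples
print("test samples:%d" % len(test_labels), '------', len(test_preds))

y1 MSE:0.0302
test samples:2771 ------ 2771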
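
A single MSE figure averages over all eight compensation outputs, which can hide one poorly predicted output. The sketch below computes a per-output MSE with NumPy on made-up arrays; `mse_per_output` is a helper introduced here. The mean of the per-output values equals what `mean_squared_error` returns with its default uniform averaging.

```python
import numpy as np

def mse_per_output(y_true, y_pred):
    """Mean squared error of each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2, axis=0)

# Two samples, two outputs: the first output is predicted exactly, the second is not.
y_true = np.array([[0.0, 1.0], [2.0, 3.0]])
y_pred = np.array([[0.0, 2.0], [2.0, 5.0]])

per_output = mse_per_output(y_true, y_pred)
overall = per_output.mean()
print(per_output, overall)
```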


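### Model Deployment

The error-compensation model is deployed under <I>v1/models/slot0/versions/<version>/</I>, and the version must be a number. Note that when loading models, tensorflow-serving automatically loads the model with the highest version number and unloads models with lower version numbers, so the version number must be incremented every time a new model is deployed. Because our system ships with a low-accuracy model preset as version 1, a custom model should use version 2 or higher.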
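
Since every deployment needs a higher version number than the last, the next number can be derived from the directories that already exist. This is a sketch under assumptions: `next_model_version` is a hypothetical helper, and it assumes the model root contains one numerically named subdirectory per version. The demo uses a temporary directory rather than a real model root.

```python
import os
import tempfile

def next_model_version(model_root, minimum=2):
    """Return one more than the highest numeric subdirectory, but at least `minimum`."""
    versions = [
        int(name) for name in os.listdir(model_root)
        if name.isdecimal() and os.path.isdir(os.path.join(model_root, name))
    ]
    return max(versions + [minimum - 1]) + 1

with tempfile.TemporaryDirectory() as root:
    for name in ('1', '7', 'tmp'):
        os.mkdir(os.path.join(root, name))
    demo_version = next_model_version(root)

with tempfile.TemporaryDirectory() as empty_root:
    first_version = next_model_version(empty_root)

print(demo_version, first_version)
```

The `minimum=2` default reflects the preset model occupying version 1; non-numeric directory names are ignored.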

In [16]:
model_version = 11111111111111
tf.keras.models.save_model(
    model,
    f'/models/slot0/{model_version}/', # v1/models/slot0/ is the model root directory of tensorflow-serving
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

INFO:tensorflow:Assets written to: /models/slot0/11111111111111/assets


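Note that tensorflow-serving often takes tens of seconds to unload the old model version and load the new one; requests sent to the model during this period fail with a "Servable not found for request" error. You can run <I>docker logs adjustment-serving-container</I> to check whether loading has finished.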
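
Instead of reading the container logs, readiness can also be polled over HTTP. TensorFlow Serving's REST API exposes a model status endpoint (a GET on the model or version URL without `:predict`) that reports a `state` per version. The sketch below is an assumption-laden illustration: the helper names are made up, the retry counts are arbitrary, and whether the status endpoint is reachable in this deployment has not been verified. The demo at the end uses a canned response, so it runs without a server.

```python
import json
import time
import urllib.request

def version_is_available(status_json, version):
    """True if the status payload lists `version` in state AVAILABLE."""
    for entry in status_json.get('model_version_status', []):
        if str(entry.get('version')) == str(version) and entry.get('state') == 'AVAILABLE':
            return True
    return False

def fetch_status(url):
    """GET the status URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode('utf-8'))

def wait_until_available(url, version, fetch=fetch_status, attempts=30, delay=2.0):
    """Poll until the version is AVAILABLE or the attempts run out."""
    for _ in range(attempts):
        try:
            if version_is_available(fetch(url), version):
                return True
        except OSError:
            pass  # server unreachable or version not loaded yet; retry
        time.sleep(delay)
    return False

# Canned payload shaped like a model status response, used instead of a live server.
sample = {'model_version_status': [{'version': '2', 'state': 'AVAILABLE'}]}
ready = wait_until_available('unused', 2, fetch=lambda url: sample, attempts=1, delay=0)
print(ready)
```

In real use the URL would be something like `http://fireeye-test-model-container:8501/v1/models/slot0/versions/<version>`, with the host taken from the request cell in this notebook.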

Next, we test whether the deployment succeeded:

In [17]:
import json
import requests
from pprint import pprint

test_features=pd.DataFrame(test_features)
test_labels=pd.DataFrame(test_labels)
req_data = json.dumps({
            'inputs': test_features.values[:1].tolist()
        })  
print(req_data)
response = requests.post(f'http://fireeye-test-model-container:8501/v1/models/slot0/versions/{model_version}:predict', # fill in according to your deployment address
                         data=req_data,
                         headers={"content-type": "application/json"})
if response.status_code != 200:
    raise RuntimeError('Request tf-serving failed: ' + response.text)
resp_data = json.loads(response.text)    
if 'outputs' not in resp_data \
                    or type(resp_data['outputs']) is not list:
    raise ValueError('Malformed tf-serving response')

print(resp_data)
# Print the expected compensation values in the same shape as the response
print({'outputs': test_labels.values[:1].tolist()})

print("y1 MSE:%.4f" % mean_squared_error(test_labels.values[:1].tolist(), resp_data['outputs']))

{"inputs": [[1.4158, 2.9711, 10.7935, 7.5279, 2.3352, 8.1042, 2.3096, 3.3367, 11.8639, 12.7142, 1.8581, 0.3898, 19.8309, 19.771, 0.0001, 1.7768, 171.764, 1434.24]]}
{'outputs': [[0.37509656, -0.888041139, 0.309078217, -0.13684985, -0.864648044, 0.756150842, 0.469049394, 1.89120865]]}
{'outputs': [[0.331511, -0.932553, 0.285048, -0.1435, -0.833982, 0.767568, 0.463969, 1.9048]]}
y1 MSE:0.0007


Once the test succeeds, configure the service address for the relevant task on the web page. The address format is: <I>http://fireeye-test-model-container:8501/v1/models/slot0/versions/<version>:predict</I>.
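
The address format and the response check from the test cell can be captured in two small functions. This is a sketch only: `predict_url` and `parse_outputs` are hypothetical helpers, and the expected width of 8 is an assumption based on the eight compensation outputs in this notebook.

```python
import json

def predict_url(host, port, model_name, version):
    """Build the predict address in the format described above."""
    return f'http://{host}:{port}/v1/models/{model_name}/versions/{version}:predict'

def parse_outputs(response_text, expected_width=8):
    """Decode a predict response and check that each row has the expected width."""
    data = json.loads(response_text)
    outputs = data.get('outputs')
    if not isinstance(outputs, list) or not outputs:
        raise ValueError('Malformed tf-serving response')
    for row in outputs:
        if not isinstance(row, list) or len(row) != expected_width:
            raise ValueError('Unexpected output row shape')
    return outputs

url = predict_url('fireeye-test-model-container', 8501, 'slot0', 2)
rows = parse_outputs('{"outputs": [[0, 1, 2, 3, 4, 5, 6, 7]]}')
print(url)
print(rows)
```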