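In multivariate time series analysis, several features are used together to predict the current target value. If the model is trained on 5 columns [feature1, feature2, feature3, feature4, target], then for an upcoming prediction we need to supply the 4 feature columns [feature1, feature2, feature3, feature4].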

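Import the libraries needed for prediction
Keras offers two kinds of deep learning models: the sequential model (Sequential) and the general model (Model). They differ in the network topologies they can express.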

In [46]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
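from tensorflow.keras.models import Sequential  # layers are stacked in order
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense,Dropout  # Dense is the fully connected layer
from sklearn.preprocessing import MinMaxScaler  # min-max normalization
from keras.wrappers.scikit_learn import KerasRegressor   # scikit-learn regression wrapper (removed in Keras/TensorFlow 2.12; SciKeras provides a replacement)
from sklearn.model_selection import GridSearchCV   # grid search for hyperparameter tuning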

Load the data

In [47]:
df = pd.read_csv("B0005.csv")
df.head()  # first five rows

In [48]:
capacity_original_half = np.array(df)[:20,4]  # capacity values (column index 4, i.e. the fifth column) of the first 20 rows

Train/test split

In [49]:
df_for_training=df[:120]
df_for_testing=df[:]  # the whole series, so it also contains the 120 training rows
print(df_for_training.shape)
print(df_for_testing.shape)

MinMax normalization preprocessing

In [50]:
scaler = MinMaxScaler(feature_range=(0,1))
df_for_training_scaled = scaler.fit_transform(df_for_training)
df_for_testing_scaled = scaler.transform(df_for_testing)
df_for_training_scaled
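
For reference, with feature_range=(0,1) MinMaxScaler rescales each column as (x - min) / (max - min), where min and max come from the data passed to fit. The sketch below reproduces that arithmetic with plain NumPy on made-up numbers (the arrays `train` and `test` and the helper `minmax_scale` are illustrative, not taken from B0005.csv). It shows that test values outside the training range end up outside [0, 1].

```python
import numpy as np

# Toy data: 4 training rows and 2 test rows, 2 columns (illustrative values only).
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
test = np.array([[4.0, 60.0],
                 [0.0, 10.0]])

# Column-wise statistics are taken from the training data only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def minmax_scale(x, lo, hi):
    """Rescale x column-wise so that lo maps to 0 and hi maps to 1."""
    return (x - lo) / (hi - lo)

train_scaled = minmax_scale(train, col_min, col_max)
test_scaled = minmax_scale(test, col_min, col_max)

print(train_scaled)   # every column spans exactly [0, 1]
print(test_scaled)    # may fall below 0 or above 1
```

Fitting on the training rows and only transforming the test rows, as the cell above does, keeps test-set statistics out of the scaling.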

Split the data into X and Y


In [51]:
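def createXY(dataset,n_past):
    dataX = []
    dataY = []
    for i in range(n_past,len(dataset)):
        # input window: the previous n_past rows, all columns
        dataX.append(dataset[i-n_past:i,0:dataset.shape[1]])
        # label: the target (column index 4) of the row right after the window
        dataY.append(dataset[i,4])
    return np.array(dataX),np.array(dataY)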

trainX, trainY = createXY(df_for_training_scaled,20) 
testX, testY = createXY(df_for_testing_scaled,20) 

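n_past is the number of past steps to look at when predicting the next target value. With n_past = 20, the past 20 rows (all features, including the target column) are used to predict the 21st target value.
So trainX contains all the feature values, while trainY contains only the target values.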

In [52]:
print(trainX.shape)
print(trainY.shape)

print(testX.shape)
print(testY.shape)

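If you inspect trainX[1], you will find it holds the same rows as trainX[0] shifted by one: the first row of trainX[0] is dropped and one new row is appended at the end. The first 20 rows are used to predict the target in row 21; after that, the window moves down one row and takes the next 20 rows to predict the next target value.

Every window is stored in trainX and its target value in trainY.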
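
The windowing can be checked on a tiny array. The sketch below uses a hypothetical helper `create_xy` (the same logic as createXY, with the target column exposed as an assumed `target_col` parameter) on a made-up 6 x 5 array with n_past = 3. By the same arithmetic, 120 training rows with n_past = 20 give 100 windows.

```python
import numpy as np

def create_xy(dataset, n_past, target_col=4):
    """Same windowing logic as createXY, with the target column made explicit."""
    data_x, data_y = [], []
    for i in range(n_past, len(dataset)):
        data_x.append(dataset[i - n_past:i, :])   # previous n_past rows, all columns
        data_y.append(dataset[i, target_col])     # target value of the next row
    return np.array(data_x), np.array(data_y)

# Toy dataset: 6 rows, 5 columns, filled with 0..29 so rows are easy to recognise.
toy = np.arange(30, dtype=float).reshape(6, 5)
toy_x, toy_y = create_xy(toy, n_past=3)

print(toy_x.shape)  # (3, 3, 5): 3 windows, 3 time steps each, 5 features
print(toy_y.shape)  # (3,)
print(toy_y)        # column 4 of rows 3, 4 and 5

# Consecutive windows overlap: window 1 is window 0 shifted down by one row.
print(np.array_equal(toy_x[1][:-1], toy_x[0][1:]))  # True
```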

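GridSearchCV exists to automate hyperparameter tuning: you supply candidate parameter values and it returns the best-scoring combination together with its results. It suits small datasets; once the data volume grows, the search becomes too slow to finish in a reasonable time.

Train the model, using GridSearchCV to tune the hyperparameters (the parameters that must be chosen by hand) and find a base model.

The name GridSearchCV splits into two parts, GridSearch and CV: grid search and cross-validation. Grid search searches over parameters: it tries every combination of the candidate values given for each parameter, trains the learner with each combination, and keeps the combination that scores best on the validation set. It is a train-and-compare process.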

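1. Choose and build the model to train, model

2. Pass model to GridSearchCV to obtain the GridSearchCV model grid_model

3. Fit grid_model on the training data and select best_estimator, the model whose parameters perform best on the validation folds

4.1. Apply best_estimator to the training set (the result should differ from before, because cross-validation had split the training set)

4.2. Apply best_estimator to the test set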
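
The steps above can be sketched by hand on a toy problem. This is a minimal illustration, not what GridSearchCV does internally: the data is made up, a closed-form ridge regression (`fit_ridge`) stands in for the LSTM, and the `alpha` grid is hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (made up): y = 2*x0 - x1 + noise.
X = rng.normal(size=(40, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression weights (stand-in for model training)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

param_grid = {"alpha": [0.01, 1.0, 100.0]}   # hypothetical grid
n_folds = 2
folds = np.array_split(np.arange(len(X_train)), n_folds)

results = {}
for (alpha,) in itertools.product(*param_grid.values()):
    fold_scores = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = fit_ridge(X_train[train_idx], y_train[train_idx], alpha)
        fold_scores.append(mse(X_train[val_idx], y_train[val_idx], w))
    results[alpha] = np.mean(fold_scores)   # mean validation error for this setting

best_alpha = min(results, key=results.get)            # step 3: best validation score
best_w = fit_ridge(X_train, y_train, best_alpha)      # refit on the full training set
print("best alpha:", best_alpha)
print("train MSE:", mse(X_train, y_train, best_w))    # step 4.1
print("test MSE:", mse(X_test, y_test, best_w))       # step 4.2
```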



In [53]:
learning_rate = 0.01  # defined but not used below; 'adam' runs with its default learning rate
def build_model(optimizer):
    grid_model = Sequential()
    grid_model.add(LSTM(50,return_sequences=True,input_shape=(20,5)))
    grid_model.add(LSTM(50))
    grid_model.add(Dropout(0.2))
    grid_model.add(Dense(1))
    
    grid_model.compile(loss = 'mse',optimizer=optimizer)
    return grid_model
grid_model = KerasRegressor(build_fn=build_model,verbose=1)

parameters = {'batch_size':[16,24,28,32,40],
            'epochs':[300,500,800],
            'optimizer':['adam']}

grid_search = GridSearchCV(estimator = grid_model,
                          param_grid = parameters,cv = 2)
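
As a rough cost estimate: GridSearchCV fits one model per parameter combination per fold, then (with the default refit=True) fits once more on the whole training set using the best combination. The count for the grid above can be checked with the standard library; the variable names below are illustrative.

```python
import itertools

parameters = {'batch_size': [16, 24, 28, 32, 40],
              'epochs': [300, 500, 800],
              'optimizer': ['adam']}
cv = 2

combinations = list(itertools.product(*parameters.values()))
n_combinations = len(combinations)      # 5 * 3 * 1
n_cv_fits = n_combinations * cv         # one fit per combination per fold
# Total training epochs spent in cross-validation (the final refit comes on top).
total_epochs = sum(epochs for _, epochs, _ in combinations) * cv

print(n_combinations)  # 15
print(n_cv_fits)       # 30
print(total_epochs)    # 16000
```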

Fit the model to the trainX and trainY data

In [54]:
grid_search = grid_search.fit(trainX,trainY)

Find the best model parameters


In [55]:
grid_search.best_params_

Save the best model in the variable my_model

In [56]:
my_model=grid_search.best_estimator_.model

In [57]:
prediction = my_model.predict(testX)
print("prediction is\n",prediction)
print("\nprediction Shape-",prediction.shape)

In [58]:
prediction_copies_array = np.repeat(prediction,5,axis=-1)  # the scaler was fitted on rows of five columns but the prediction is a single target column, so repeat it across five identical columns

In [59]:
pred=scaler.inverse_transform(np.reshape(prediction_copies_array,(len(prediction),5)))[:,4]  # keep only the last column (slice)

Compare pred with testY. testY is also scaled, so it needs the same inverse transform.

In [60]:
original_copies_array = np.repeat(testY,5,axis=-1)
original=scaler.inverse_transform(np.reshape(original_copies_array,(len(testY),5)))[:,4]
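
The repeat-and-slice trick works because min-max scaling is inverted column by column, so only the statistics of column 4 influence the values that are kept. The sketch below demonstrates this with plain NumPy; `col_min` and `col_max` are made-up stand-ins for a fitted scaler's per-column minima and maxima (on a fitted MinMaxScaler these are exposed as `data_min_` and `data_max_`), and `inverse_minmax` is a hypothetical helper.

```python
import numpy as np

# Made-up per-column minima and maxima standing in for a fitted scaler's statistics.
col_min = np.array([1.0, 3.2, 20.0, 500.0, 1.3])
col_max = np.array([168.0, 3.6, 35.0, 3000.0, 1.9])

def inverse_minmax(scaled, lo, hi):
    """Undo (x - lo) / (hi - lo), column by column."""
    return scaled * (hi - lo) + lo

scaled_pred = np.array([[0.25], [0.5], [1.0]])      # shape (3, 1), like the model output

# Route 1: pad to 5 identical columns, invert everything, keep column 4.
padded = np.repeat(scaled_pred, 5, axis=-1)          # shape (3, 5)
via_padding = inverse_minmax(padded, col_min, col_max)[:, 4]

# Route 2: invert with the target column's statistics only.
direct = inverse_minmax(scaled_pred[:, 0], col_min[4], col_max[4])

print(via_padding)
print(direct)
print(np.allclose(via_padding, direct))  # True
```

Both routes give the same numbers; the direct route simply avoids building the padded array.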

In [61]:
capacity_original_complete = np.append(capacity_original_half,original)
pred_complete = np.append(capacity_original_half,pred)

In [62]:
plt.plot(pred_complete,color = 'blue',label = 'Predicted Capacity')
plt.plot(capacity_original_complete,color = 'red',label = 'Real Capacity')
plt.title('B0005 Battery')
plt.xlabel('Cycle')
plt.ylabel('Capacity')
plt.legend()
plt.show()

In [63]:
from math import sqrt
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
print("mean_absolute_error:",mean_absolute_error(original,pred))
print("mean_squared_error:",mean_squared_error(original,pred))
print("rmse:",sqrt(mean_squared_error(original,pred)))
print("r2 score:",r2_score(original,pred))
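
For readers who want the definitions behind these four numbers, here is a minimal NumPy sketch on made-up values (`y_true` and `y_pred` are illustrative, not results from this notebook). It computes MAE, MSE, RMSE and R² directly from their standard formulas.

```python
import numpy as np

y_true = np.array([1.80, 1.70, 1.60, 1.50])   # made-up capacities
y_pred = np.array([1.75, 1.72, 1.58, 1.55])   # made-up predictions

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                  # mean absolute error
mse = np.mean(errors ** 2)                     # mean squared error
rmse = np.sqrt(mse)                            # root mean squared error
# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("mae:", mae)
print("mse:", mse)
print("rmse:", rmse)
print("r2:", r2)
```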

Predict future values

df.loc[]: indexes rows/columns by label or by boolean array
df.iloc[]: indexes by integer position (from 0 to length - 1) or by boolean array
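
A small sketch of the difference, using a made-up frame `toy` whose index labels are deliberately not 0, 1, 2, ... Note that label slices with loc include both end points, while position slices with iloc exclude the end point, so a slice such as iloc[79:99] selects positions 79 through 98, i.e. 20 rows.

```python
import pandas as pd

# Toy frame whose index labels differ from the row positions.
toy = pd.DataFrame({"capacity": [1.9, 1.8, 1.7, 1.6]}, index=[10, 20, 30, 40])

by_label = toy.loc[10:30]      # label slice: labels 10, 20 and 30 (end included)
by_position = toy.iloc[0:2]    # position slice: positions 0 and 1 (end excluded)

print(by_label)
print(by_position)
print(toy.iloc[0, 0])          # first row, first column, by position
print(len(range(79, 99)))      # 20, the number of rows a 79:99 position slice covers
```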

In [64]:
df_cycle_past = df.iloc[79:99,:]  
df_cycle_past


In [65]:
df_cycle_future=pd.read_csv("B18_test1.csv",encoding="gbk")


In [66]:
df_cycle_future["容量"] = 0  # "容量" is the capacity column; 0 is a placeholder that is blanked out after scaling


In [67]:
# Scale the data, blank out the capacity column of the future rows, then concatenate the 20 past input rows with the 88 test points
#df_cycle_future = df_cycle_future[["循环次数","平均放电电压","平均放电温度","等压降放电时间","容量"]]
old_scaled_array = scaler.transform(df_cycle_past)
new_scaled_array = scaler.transform(df_cycle_future)
new_scaled_df = pd.DataFrame(new_scaled_array)
new_scaled_df.iloc[:,4] = np.nan
full_df = pd.concat([pd.DataFrame(old_scaled_array),new_scaled_df]).reset_index().drop(["index"],axis=1)

In [None]:
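# Rolling forecast: each predicted capacity is written back so later windows can use it
full_df_scaled_array = full_df.values.copy()  # explicit copy, updated inside the loop
all_data = []  # scaled predictions
time_step = 20
for i in range(time_step,len(full_df_scaled_array)):
    # last time_step rows, all columns, with a leading batch axis: shape (1, 20, 5)
    data_x = np.array([full_df_scaled_array[i-time_step:i,:]])
    prediction = my_model.predict(data_x)
    print(prediction)
    all_data.append(prediction)
    # prediction has shape (1, 1); store the scalar in both the array and the DataFrame
    full_df_scaled_array[i,4] = prediction[0,0]
    full_df.iloc[i,4] = prediction[0,0]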
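
The rolling scheme can be tried without a trained network. In the sketch below, `toy_predict` is a hypothetical stand-in for my_model.predict (it just averages the window's target column), and the data is made up with time_step = 3. The point is the write-back: each forecast fills the missing target so that the next window contains no NaN.

```python
import numpy as np

def toy_predict(window):
    """Hypothetical stand-in for my_model.predict: mean of the window's target column."""
    return float(window[:, 4].mean())

time_step = 3
n_future = 4

# 3 known rows followed by 4 future rows whose target (column 4) is unknown.
known = np.full((time_step, 5), 0.5)
known[:, 4] = [0.9, 0.8, 0.7]
future = np.full((n_future, 5), 0.5)
future[:, 4] = np.nan
full = np.vstack([known, future])

forecasts = []
for i in range(time_step, len(full)):
    window = full[i - time_step:i, :]      # last time_step rows, all columns
    y_hat = toy_predict(window)
    forecasts.append(y_hat)
    full[i, 4] = y_hat                     # write back so later windows can use it

print(forecasts)
print(np.isnan(full).any())  # False: every unknown target has been filled
```

Because each forecast feeds the following ones, errors can accumulate the further the forecast runs from the last observed row.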

In [None]:
full_df_scaled_array[0:,0:full_df_scaled_array.shape[1]]
full_df

In [None]:
# inverse scaling
new_array=np.array(all_data)
new_array=new_array.reshape(-1,1)
prediction_copies_array = np.repeat(new_array,5,axis=-1)
y_pred_future_cycle = scaler.inverse_transform(np.reshape(prediction_copies_array,(len(new_array),5)))[:,4]
print(y_pred_future_cycle)

Starting from cycle 80, predict the final 88 capacity values

In [None]:
capacity_original_half = np.array(df)[:100,4]
capacity_original_complete = np.array(df)[:,4]
len(capacity_original_half)

In [None]:
pred_complete = np.append(capacity_original_half,y_pred_future_cycle)

In [None]:
len(pred_complete)

In [None]:
len(capacity_original_complete)

In [None]:
plt.plot(pred_complete,color = 'blue',label = 'Predicted Capacity')
plt.plot(capacity_original_complete,color = 'red',label = 'Real Capacity')
plt.title('B0018 Battery')
plt.xlabel('Cycle')
plt.ylabel('Capacity')
plt.legend()
plt.show()

In [None]:
from math import sqrt
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
print("mean_absolute_error MAE:", mean_absolute_error(capacity_original_complete, pred_complete))
print("mean_squared_error MSE:", mean_squared_error(capacity_original_complete, pred_complete))
print("rmse:", sqrt(mean_squared_error(capacity_original_complete, pred_complete)))
print("r2 score:", r2_score(capacity_original_complete, pred_complete))