但现在我们需要制作了一个新的csv：df = pd.read_csv('/root/Download/AlgaeBloomForecast/merged_data.csv')

```
date,temp,oxygen,NH3,TP,TN,algae,area,weather,max_temperature,min_temperature,aqi,aqiLevel,wind_direction,wind_power,aqiInfo
2021-06-02,26.1875,6.6665,0.025,0.068275,1.07325,14400000.0,无锡,阴-阵雨,26,21,24,1,东南风,4级,优
2021-06-03,25.881666666666664,6.6418333333333335,0.0251166666666666,0.0637833333333333,0.9151666666666666,10867091.666666666,无锡,阴-阵雨,26,19,66,2,西北风,3级,良
2021-06-04,25.895,7.946333333333333,0.025,0.0637833333333333,0.9203333333333332,25498423.33333333,无锡,阴-多云,26,18,51,2,西南风,3级,良
2021-06-05,26.85,9.084,0.025,0.04776,0.9058,21100000.0,无锡,晴,32,19,67,2,西南风,3级,良
2021-06-06,28.256666666666664,9.514333333333331,0.025,0.0440666666666666,0.9233333333333332,15211340.0,无锡,晴,33,19,80,2,南风,3级,良
2021-06-07,27.635,8.3865,0.025,0.0366499999999999,0.7778333333333333,7994458.333333333,无锡,阴-多云,35,21,68,2,东南风,3级,良
2021-06-08,28.19666666666667,8.397499999999999,0.025,0.0418666666666666,0.7323333333333334,12259158.333333334,无锡,阴-多云,30,24,36,1,东南风,3级,优
2021-06-09,28.751666666666665,8.309166666666668,0.025,0.0389833333333333,0.601,6891956.666666667,无锡,阴-雷阵雨,32,24,52,2,东南风,3级,良
2021-06-10,28.741666666666664,7.385833333333333,0.025,0.03785,0.5256666666666666,6301236.666666667,无锡,阴,28,24,38,1,东南风,2级,优
2021-06-11,29.491666666666664,7.6176666666666675,0.025,0.0327666666666666,0.4495,6244151.666666667,无锡,阴-多云,32,23,82,2,东风,2级,良
2021-06-12,29.58666666666667,7.271999999999999,0.025,0.02975,0.3741666666666667,4201731.666666667,无锡,多云-雷阵雨,33,24,41,1,东南风,3级,优
2021-06-13,29.563333333333333,6.929333333333333,0.025,0.0302833333333333,0.2663333333333333,4964940.0,无锡,阴-小雨,28,25,34,1,西南风,2级,优
2021-06-14,29.58833333333333,6.963166666666666,0.025,0.0290666666666666,0.1886666666666666,5394340.0,无锡,阴-小雨,31,25,46,1,东南风,3级,优
2021-06-15,30.21,7.23925,0.025,0.033425,0.396,6927237.5,无锡,阴-小雨,33,24,48,1,西南风,3级,优
```

这里我们要用TCN预测藻类的爆发，来捕获时间序列数据的长距离依赖关系。这里提供了逐日的数据，需要考虑date的影响。

通过前期的数据分析，我们发现：
- 这里需要考虑['temp', 'oxygen', 'NH3', 'TP', 'TN'，'algae']对'algae'的影响
- 这里需要考虑weather这一列出现“晴”的影响，以及“晴”的长距离依赖关系。
- 温度的影响，这里，只采用temp作为特征，考虑温度的长距离依赖关系。
- 不考虑max_temperature,min_temperature。
- wind_power大于4级时是一个需要考虑的因素，考虑wind_power的长距离依赖关系。
- 这里的aqi,aqiLevel，aqiInfo与空气质量相关，不考虑。
- wind_direction也不考虑。

请你输出完整的代码。需要：
- 做归一化的处理
- 需要对数据分割，将最近180天的结果作为测试集。
- 需要绘制训练过程的图片、需要输出预测效果的图片，它将生成两个图像文件：`training_loss_TCN.png`显示训练过程，`prediction_results_TCN.png`显示预测效果
- 模型的训练参数是可调整的，比如，添加了学习率调度（ReduceLROnPlateau）、在优化器中添加了L2正则化（weight_decay）、调整了模型结构，包括隐藏层维度和dropout率。

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# 读取数据
df = pd.read_csv('/root/Download/AlgaeBloomForecast/merged_data.csv')

# 数据预处理
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

# 提取相关特征
features = ['temp', 'oxygen', 'NH3', 'TP', 'TN', 'algae']
target = 'algae'

# 添加天气和风力特征
df['is_sunny'] = (df['weather'].str.contains('晴')).astype(int)
df['high_wind'] = (df['wind_power'].str.extract(r'(\d+)').astype(float) > 4).astype(int)
# 准备特征数据
X = df[features + ['is_sunny', 'high_wind']].values
y = df[target].values

# 归一化处理
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()
X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1))

# 准备序列数据
def create_sequences(X, y, time_steps=1):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X[i:(i + time_steps)])
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)

time_steps = 30  # 使用过去30天的数据来预测
X_seq, y_seq = create_sequences(X_scaled, y_scaled, time_steps)

# 分割数据集，最近180天作为测试集
test_size = 180
X_train, X_test = X_seq[:-test_size], X_seq[-test_size:]
y_train, y_test = y_seq[:-test_size], y_seq[-test_size:]

# 转换为PyTorch张量
X_train = torch.FloatTensor(X_train)
y_train = torch.FloatTensor(y_train)
X_test = torch.FloatTensor(X_test)
y_test = torch.FloatTensor(y_test)

# 创建数据加载器
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

In [2]:


# 定义TCN模型
class TCN(nn.Module):
    def __init__(self, input_size, output_size, num_channels, kernel_size, dropout):
        super(TCN, self).__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(input_size, num_channels, kernel_size, padding=(kernel_size-1)//2),
            nn.ReLU(),
            nn.Conv1d(num_channels, num_channels, kernel_size, padding=(kernel_size-1)//2),
            nn.ReLU(),
            nn.Conv1d(num_channels, num_channels, kernel_size, padding=(kernel_size-1)//2),
            nn.ReLU(),
        )
        self.linear = nn.Linear(num_channels, output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x shape: (batch_size, sequence_length, input_size)
        x = x.transpose(1, 2)  # (batch_size, input_size, sequence_length)
        y = self.tcn(x)
        y = y[:, :, -1]  # 只取最后一个时间步
        y = self.dropout(y)
        return self.linear(y)

# 初始化模型
input_size = X_train.shape[2]
model = TCN(input_size=input_size, output_size=1, num_channels=64, kernel_size=3, dropout=0.2)

# 定义损失函数和优化器
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, patience=10, min_lr=1e-6)

# 训练模型
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 100
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    model.train()
    train_loss = 0
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    
    train_loss /= len(train_loader)
    train_losses.append(train_loss)
    
    # 验证
    model.eval()
    with torch.no_grad():
        val_outputs = model(X_test.to(device))
        val_loss = criterion(val_outputs, y_test.to(device))
        val_losses.append(val_loss.item())
    
    scheduler.step(val_loss)
    
    print(f'Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

# 绘制训练过程
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Model Training History')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.savefig('training_loss_TCN_pytorch.png')
plt.close()

# 预测
model.eval()
with torch.no_grad():
    y_pred = model(X_test.to(device)).cpu().numpy()

# 反归一化
y_test_inv = scaler_y.inverse_transform(y_test)
y_pred_inv = scaler_y.inverse_transform(y_pred)

# 绘制预测结果
plt.figure(figsize=(12, 6))
plt.plot(df['date'].values[-test_size:], y_test_inv, label='Actual')
plt.plot(df['date'].values[-test_size:], y_pred_inv, label='Predicted')
plt.title('Algae Bloom Prediction')
plt.xlabel('Date')
plt.ylabel('Algae Concentration')
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('prediction_results_TCN_pytorch.png')
plt.close()

# 计算并打印评估指标
mse = np.mean((y_test_inv - y_pred_inv)**2)
mae = np.mean(np.abs(y_test_inv - y_pred_inv))
print(f"Mean Squared Error: {mse}")
print(f"Mean Absolute Error: {mae}")

Epoch [1/100], Train Loss: 0.0186, Val Loss: 0.0104
Epoch [2/100], Train Loss: 0.0075, Val Loss: 0.0063
Epoch [3/100], Train Loss: 0.0038, Val Loss: 0.0058
Epoch [4/100], Train Loss: 0.0035, Val Loss: 0.0054
Epoch [5/100], Train Loss: 0.0033, Val Loss: 0.0056
Epoch [6/100], Train Loss: 0.0028, Val Loss: 0.0049
Epoch [7/100], Train Loss: 0.0028, Val Loss: 0.0047
Epoch [8/100], Train Loss: 0.0029, Val Loss: 0.0050
Epoch [9/100], Train Loss: 0.0030, Val Loss: 0.0042
Epoch [10/100], Train Loss: 0.0024, Val Loss: 0.0037
Epoch [11/100], Train Loss: 0.0024, Val Loss: 0.0042
Epoch [12/100], Train Loss: 0.0022, Val Loss: 0.0028
Epoch [13/100], Train Loss: 0.0024, Val Loss: 0.0030
Epoch [14/100], Train Loss: 0.0021, Val Loss: 0.0038
Epoch [15/100], Train Loss: 0.0021, Val Loss: 0.0048
Epoch [16/100], Train Loss: 0.0023, Val Loss: 0.0050
Epoch [17/100], Train Loss: 0.0021, Val Loss: 0.0040
Epoch [18/100], Train Loss: 0.0021, Val Loss: 0.0037
Epoch [19/100], Train Loss: 0.0021, Val Loss: 0.0032
Ep