# 上分思路
- 数据
- 模型
- 损失函数
- 训练方式
- 超参数

### **第一步**
在运行环境中安装对应的库 执行该命令即可

In [1]:
! pip install -r requirements.txt

[31mERROR: Could not open requirements file: [Errno 2] 没有那个文件或目录: 'requirements.txt'[0m[31m
[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1.2[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


### **第二步**
导入运行所需要的库函数 

In [2]:
import os
import pandas as pd
import xarray as xr
from torch.utils.data import Dataset, DataLoader

### **第三步**
数据集路径配置设置
- 比赛的数据部分分为**数据特征**和**数据真值**两部分，数据特征是模型训练的**输入**，数据真值是模型训练的**标签**
- 其中数据特征部分 输入的路径目录下包含年份文件夹 
 - 例如示例给出的 "输入路径/2021/..." 各年份文件夹下包含从官网下载的压缩包(e.g. weather.round1.train.ft.2021.1.zip) 解压后文件夹下有不同时段的数据文件夹(e.g. 20210101-00), 内部包含6个nc文件, 是从伏羲大模型中获取的从第6小时到第72小时的数据

- 数据真值部分 输入的路径目录下包含3个年份的.nc数据, 其中选择哪些年份的特征数据作为输入, 就在years中添加哪些年份
- fcst_steps指预测的时间步长, 从第1小时到第72小时, 间隔为1小时



In [3]:
# path config
feature_path = 'feature' #自定义路径并修改为自己的路径
gt_path = 'groundtruth' #自定义路径并修改为自己的路径
years = ['2021']
fcst_steps = list(range(1, 73, 1))

### **第四步**
Feature类和GroundTruth类是数据集的定义
方便后续自定义数据集和数据加载类, 方便我们训练时取数据

In [4]:
# Feature部分
class Feature:
    def __init__(self):
        self.path = feature_path
        self.years = years
        self.fcst_steps = fcst_steps
        self.features_paths_dict = self.get_features_paths()

    def get_features_paths(self):
        init_time_path_dict = {}
        for year in self.years:
            init_time_dir_year = os.listdir(os.path.join(self.path, year))
            for init_time in sorted(init_time_dir_year):
                init_time_path_dict[pd.to_datetime(init_time)] = os.path.join(self.path, year, init_time)
        return init_time_path_dict

    def get_fts(self, init_time):
        return xr.open_mfdataset(self.features_paths_dict.get(init_time) + '/*').sel(lead_time=self.fcst_steps).isel(
            time=0)
    
# GroundTruth部分
class GT:
    def __init__(self):
        self.path = gt_path
        self.years = years
        self.fcst_steps = fcst_steps
        self.gt_paths = [os.path.join(self.path, f'{year}.nc') for year in self.years]
        self.gts = xr.open_mfdataset(self.gt_paths)

    def parser_gt_timestamps(self, init_time):
        return [init_time + pd.Timedelta(f'{fcst_step}h') for fcst_step in self.fcst_steps]

    def get_gts(self, init_time):

        return self.gts.sel(time=self.parser_gt_timestamps(init_time))

### **第五步**
mydataset类的定义, 整合了加载特征和特征对应真值的功能, 方便后续训练时取数据

In [5]:
# 构建Dataset部分
class mydataset(Dataset):
    def __init__(self):
        self.ft = Feature()
        self.gt = GT()
        self.features_paths_dict = self.ft.features_paths_dict
        self.init_times = list(self.features_paths_dict.keys())

    def __getitem__(self, index):
        init_time = self.init_times[index]
        try:
            ft_item = self.ft.get_fts(init_time).to_array().isel(variable=0).values
            gt_item = self.gt.get_gts(init_time).to_array().isel(variable=0).values
        except KeyError as e:
            print(e)
            print(f'init_time: {init_time} not found')
            # return None, None
            return self.__getitem__(index - 1)
        
        return ft_item, gt_item

    def __len__(self):
        return len(list(self.init_times))

### **第六步**
前五步已经完成了数据预处理加载的相关类和函数的准备, 这里我们可以通过实例化mydataset类来查看数据数量
同时完成数据集的构建后, 我们可以通过DataLoader来查看数据集的数据

In [6]:
import torch
# define dataset
my_data = mydataset()
print('sample num:', mydataset().__len__())

sample num: 4


In [7]:
from torch.utils.data import Dataset, DataLoader, random_split
# Split the dataset into training and validation sets
train_size = int(0.8 * len(my_data))
val_size = len(my_data) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(my_data, [train_size, val_size])

# Create data loaders for training and validation sets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=1, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)

### **第七步**
- 完成了数据的准备工作, 接下来就是构建模型的部分
- Model这个类, 对我们的模型进行定义, 方便后续训练时调用
- 这里我们以一个简单的只有一个卷积层的网络为例
- 在本次比赛中, 我们的输入数据维度是(1, 24, 72, W, H), 输出数据维度是(1, 72, W, H) 可以在赛题中查看

In [8]:
import torch
import torch.nn as nn

#通道注意力机制
class TimeAttentionModule(nn.Module):
    def __init__(self, channels, heads=1, dropout=0.1):
        super(TimeAttentionModule, self).__init__()
        self.heads = heads
        self.channels = channels

        # 为每个头分配通道
        self.key_channels = channels // heads
        self.query_channels = channels // heads
        self.value_channels = channels // heads

        # 定义查询（Q）、键（K）、值（V）的线性变换
        self.query_layer = nn.Linear(channels, heads * self.query_channels)
        self.key_layer = nn.Linear(channels, heads * self.key_channels)
        self.value_layer = nn.Linear(channels, heads * self.value_channels)

        # 输出层
        self.output_layer = nn.Linear(channels, channels)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape  # B: 批次大小, T: 时间步长, C: 通道数
        x = x.permute(0, 2, 1)  # 调整形状为 (B, C, T)

        queries = self.query_layer(x).view(B, self.heads, self.query_channels, T)
        keys = self.key_layer(x).view(B, self.heads, self.key_channels, T)
        values = self.value_layer(x).view(B, self.heads, self.value_channels, T)

        # 计算注意力权重
        attention_scores = torch.matmul(queries, keys.transpose(-1, -2)) / math.sqrt(self.key_channels)
        attention_weights = torch.softmax(attention_scores, dim=-1)

        # 应用注意力权重
        output = torch.matmul(attention_weights, values)

        # 合并头
        output = output.reshape(B, C, T)

        # 通过输出层
        output = self.output_layer(output)

        # 添加残差连接和层归一化
        output = self.dropout(self.activation(output + x))

        return output
#注意力机制
class ChannelAttentionModule(nn.Module):
    def __init__(self, in_channels, reduction_ratio=16):
        super(ChannelAttentionModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction_ratio, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction_ratio, in_channels, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        avg_out = self.fc(self.avg_pool(x).view(b, c))
        out = x * avg_out.expand_as(x)
        return out
# 
class EnhancedModel(nn.Module):
    def __init__(self, num_in_ch, num_out_ch):
        super(EnhancedModel, self).__init__()
        self.in_channels = num_in_ch
        self.conv1 = nn.Conv2d(num_in_ch, 64, kernel_size=3, padding=1)
        self.batchnorm = nn.BatchNorm2d(64) 
        self.attention1 = ChannelAttentionModule(64)
        self.activation = nn.ReLU()
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.attention2 = ChannelAttentionModule(64)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, num_out_ch, kernel_size=3, padding=1)
        self.time_attention = TimeAttentionModule(64)
       
    
    def forward(self, x):
        B, S, C, W, H = tuple(x.shape)
        x = x.reshape(B, -1, W, H)
        out = self.conv1(x)
        out = self.attention1(out)
        out = self.activation(out)
        out = self.conv2(out)
        out = self.attention2(out)
        out = self.activation(out)
        out = self.conv3(out)
        out = self.activation(out)
        out = self.conv4(out)
        out = self.conv1(out)
        out = self.activation(out)
        out = self.time_attention(out)
        out = self.activation(out)
        out = out.reshape(B, S, W, H)
        
        return out
    
# define model
in_varibales = 24
in_times = len(fcst_steps)
out_varibales = 1
out_times = len(fcst_steps)
input_size = in_times * in_varibales
output_size = out_times * out_varibales
model = EnhancedModel(input_size, output_size).cuda()

In [9]:
# 推荐先使用这个网络
import torch
import torch.nn as nn

class EnhancedModel(nn.Module):
    def __init__(self, num_in_ch, num_out_ch):
        super(EnhancedModel, self).__init__()
        self.conv = nn.Conv2d(num_in_ch, num_out_ch, kernel_size=3, padding=1)
        self.activation = nn.ReLU()


    def forward(self, x):
        B, S, C, W, H = tuple(x.shape)
        x = x.reshape(B, -1, W, H)
        out = self.conv(x)
        out = self.activation(out)
        out = out.reshape(B, S, W, H)
        return out

# define model
in_varibales = 24
in_times = len(fcst_steps)
out_varibales = 1
out_times = len(fcst_steps)
input_size = in_times * in_varibales
output_size = out_times * out_varibales
model = EnhancedModel(input_size, output_size).cuda()

### **第八步**
定义模型的损失函数部分， 用于模型训练做反向传播

In [10]:
loss_func = nn.SmoothL1Loss()
# loss_func = nn.MSELoss()

### **第九步**
模型训练部分

In [11]:
# 模型初始化
import torch.nn.init as init
def init_weights(m):
    if isinstance(m, nn.Conv2d):
        init.xavier_uniform_(m.weight)
        if m.bias is not None:
            init.constant_(m.bias, 0)

model.apply(init_weights)

EnhancedModel(
  (conv): Conv2d(1728, 72, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (activation): ReLU()
)

In [12]:
import numpy as np
import torch
# from tqdm import tqdm
# Train the model
num_epochs = 300
optimizer = torch.optim.Adam(model.parameters(), lr=0.0004, weight_decay=1e-6)

# for epoch in tqdm(range(num_epochs)):
os.makedirs('./model', exist_ok=True)
for epoch in range(num_epochs):
    model.train()
    loss = 0.0
    for index, (ft_item, gt_item) in enumerate(train_loader):
        ft_item = ft_item.cuda().float()
        gt_item = gt_item.cuda().float()
        # print("gt", gt_item.max(), gt_item.min())
        # Backward and optimize
        optimizer.zero_grad()
        # Forward pass
        output_item = model(ft_item)
        # print(output_item.max(), output_item.min())
        loss = loss_func(output_item, gt_item)
            
        loss.backward()
        optimizer.step()
            
        loss += loss.item()
        # Print the loss for every 10 steps
        if (index+1) % 10 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{index+1}/{len(train_loader)}], Loss: {loss.item():.6f}")
            loss = 0.0
    # Save the model weights
    torch.save(model.state_dict(), f'./model/model_weights_{epoch}.pth')
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for index, (ft_item, gt_item) in enumerate(val_loader):
            ft_item = ft_item.cuda().float()
            gt_item = gt_item.cuda().float()
            output_item = model(ft_item)
            val_loss = loss_func(output_item.max(), gt_item.max())
            val_loss += val_loss.item()
    val_loss /= len(val_loader)
    print(f"[Epoch {epoch+1}/{num_epochs}], Validation Loss: {val_loss:.6f}")

print("Done!")


[Epoch 1/300], Validation Loss: 0.011621
[Epoch 2/300], Validation Loss: 0.510049
[Epoch 3/300], Validation Loss: 0.213261
[Epoch 4/300], Validation Loss: 0.006004
[Epoch 5/300], Validation Loss: 0.440476
[Epoch 6/300], Validation Loss: 1.064613
[Epoch 7/300], Validation Loss: 1.445765
[Epoch 8/300], Validation Loss: 1.589299
[Epoch 9/300], Validation Loss: 1.945040
[Epoch 10/300], Validation Loss: 2.072266
[Epoch 11/300], Validation Loss: 2.029991
[Epoch 12/300], Validation Loss: 1.829606
[Epoch 13/300], Validation Loss: 1.647306
[Epoch 14/300], Validation Loss: 1.466165
[Epoch 15/300], Validation Loss: 1.512321
[Epoch 16/300], Validation Loss: 1.719423
[Epoch 17/300], Validation Loss: 1.837920
[Epoch 18/300], Validation Loss: 1.896048
[Epoch 19/300], Validation Loss: 1.812620
[Epoch 20/300], Validation Loss: 1.769122
[Epoch 21/300], Validation Loss: 1.720227
[Epoch 22/300], Validation Loss: 1.892065
[Epoch 23/300], Validation Loss: 1.887482
[Epoch 24/300], Validation Loss: 1.792696
[

### **第十步**
- 模型推理部分, 通过加载模型使用测试数据作为输入, 得到预测结果
- 其中test_data_path需要给出从下载测试数据解压后的目录路径

In [15]:
# Inference
# Load the model weights
model.load_state_dict(torch.load('model/model_weights_99.pth'))
model.eval()
import os

test_data_path = "test/weather.round1.test"
os.makedirs("./output", exist_ok=True)
for index, test_data_file in enumerate(os.listdir(test_data_path)):
    test_data = torch.load(os.path.join(test_data_path, test_data_file))
    test_data = test_data.cuda().float()
    
    # Forward pass
    output_item = model(test_data)
    output_item = output_item.to(torch.float16)
    
    # Print the output shape
    print(f"Output shape for sample {test_data_file.split('.')[0]}: {output_item.shape}, {output_item.dtype == torch.float16}")
    print(f"{test_data_file.split('.')[0]}: max: {output_item.max()}, min: {output_item.min()}")
    
    # Save the output
    output_path = f"output/{test_data_file}"
    torch.save(output_item.cpu(), output_path)

Output shape for sample 122: torch.Size([1, 72, 57, 81]), True
122: max: 38.71875, min: 0.0
Output shape for sample 228: torch.Size([1, 72, 57, 81]), True
228: max: 11.359375, min: 0.0
Output shape for sample 226: torch.Size([1, 72, 57, 81]), True
226: max: 13.1796875, min: 0.0
Output shape for sample 227: torch.Size([1, 72, 57, 81]), True
227: max: 29.84375, min: 0.0
Output shape for sample 054: torch.Size([1, 72, 57, 81]), True
054: max: 35.59375, min: 0.0
Output shape for sample 176: torch.Size([1, 72, 57, 81]), True
176: max: 10.375, min: 0.0
Output shape for sample 058: torch.Size([1, 72, 57, 81]), True
058: max: 10.71875, min: 0.0
Output shape for sample 183: torch.Size([1, 72, 57, 81]), True
183: max: 37.6875, min: 0.0
Output shape for sample 222: torch.Size([1, 72, 57, 81]), True
222: max: 15.5859375, min: 0.0
Output shape for sample 299: torch.Size([1, 72, 57, 81]), True
299: max: 12.0078125, min: 0.0
Output shape for sample 042: torch.Size([1, 72, 57, 81]), True
042: max: 3.2

In [14]:
!zip -r output.zip output

updating: output/ (stored 0%)
updating: output/122.pt (deflated 24%)
updating: output/228.pt (deflated 95%)
updating: output/226.pt (deflated 76%)
updating: output/227.pt (deflated 24%)
updating: output/054.pt (deflated 24%)
updating: output/176.pt (deflated 93%)
updating: output/058.pt (deflated 96%)
updating: output/183.pt (deflated 23%)
updating: output/222.pt (deflated 81%)
updating: output/299.pt (deflated 79%)
updating: output/042.pt (deflated 99%)
updating: output/012.pt (deflated 26%)
updating: output/006.pt (deflated 27%)
updating: output/194.pt (deflated 28%)
updating: output/069.pt (deflated 42%)
updating: output/143.pt (deflated 73%)
updating: output/133.pt (deflated 94%)
updating: output/163.pt (deflated 23%)
updating: output/118.pt (deflated 78%)
updating: output/000.pt (deflated 95%)
updating: output/063.pt (deflated 24%)
updating: output/264.pt (deflated 77%)
updating: output/186.pt (deflated 25%)
updating: output/266.pt (deflated 85%)
updating: output/051.pt (deflated 