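Here we compare a plain RNN with the SegRNN architecture to see what patching contributes.

Why patch?

We want to gather information from a long sequence.

An RNN-type model can be viewed as a sequential information collector that walks through every time step from first to last.

At each time step the stored state is the accumulated history plus the information collected at that step, and the state after the final step is used to produce the output.

To predict better we have to lengthen the input sequence so that it covers more complete information.

But if the total span is too long, the share given to historical information must be compressed, so information from distant steps becomes faint or is lost altogether.

If the history is not compressed, the gradients explode instead.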
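
The trade-off above can be illustrated with a scalar linear recurrence, a deliberately simplified stand-in for an RNN. The function name `first_step_weight` and the weight values are illustrative assumptions, not part of this notebook. With the update h_t = w * h_{t-1} + x_t, the first input reaches the final state scaled by w^(T-1): it fades when |w| < 1 and grows without bound when |w| > 1.

```python
def first_step_weight(w: float, seq_len: int) -> float:
    """Weight with which x_0 appears in the final state of h_t = w * h_{t-1} + x_t."""
    h = 0.0
    for t in range(seq_len):
        x_t = 1.0 if t == 0 else 0.0  # unit impulse at the first step only
        h = w * h + x_t
    return h


# Compare a shrinking, a neutral, and a growing recurrence weight.
for w in (0.9, 1.0, 1.1):
    print(w, [first_step_weight(w, n) for n in (20, 100)])
```

A real RNN replaces the scalar `w` with a weight matrix and a nonlinearity, but the same multiplicative effect over many steps is what makes long sequences hard to train on.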

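We can therefore borrow an idea from CNNs and attack the problem in the manner of a 1D convolution:

First, split the original sequence into subsequences of a fixed length, taking one subsequence every few steps (the stride).

Within each subsequence, a Linear layer or an RNN can try to recognize short patterns such as a small uptrend, a downtrend, or a sideways move, and abstract them into an information vector.

Because each subsequence is short, the RNN can absorb its content fully without the information loss that affects long sequences.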
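
The splitting step can be sketched with `torch.Tensor.unfold`, which takes a window length and a stride much like a 1D convolution does. The tensor sizes below are made up for illustration. Setting `stride` equal to `patch_size` gives non-overlapping patches, and a smaller stride gives overlapping ones.

```python
import torch

batch_size, seq_len, num_channels = 2, 20, 3
patch_size, stride = 5, 5  # stride == patch_size: non-overlapping patches

x = torch.arange(batch_size * seq_len * num_channels, dtype=torch.float32)
x = x.reshape(batch_size, seq_len, num_channels)

# unfold slides a window along the time axis and appends the window as the last dim:
# (batch, seq_len, channels) -> (batch, num_patch, channels, patch_size)
patches = x.unfold(dimension=1, size=patch_size, step=stride)

# Put time before channels inside each patch, then flatten each patch into one vector.
patches = patches.transpose(2, 3)  # (batch, num_patch, patch_size, channels)
tokens = patches.reshape(batch_size, patches.shape[1], -1)

print(patches.shape, tokens.shape)
```

Each row of `tokens` is one patch, ready to be fed to a Linear layer or used as one time step of an RNN.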

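If the resulting sequence is still too long, we can add another layer that groups the outputs of the previous layer into new subsequences and again summarizes each one with a Linear layer or an RNN.

This layer can recognize more complex combinations of patterns, for example a decline after several consecutive rising segments, or a breakout after a sideways phase.

We repeat this until the sequence is short, then run one final RNN over it and pass the resulting information vector to the output layer, which produces the output for the task at hand.

This way the RNN at each layer only faces a relatively short subsequence, so long-range information is not lost.

The RNNs at different layers solve different problems, so their parameters and configurations differ as well.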
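
A minimal sketch of the layered idea, assuming two levels and GRUs at both of them. `TwoLevelPatchRNN` is a hypothetical name introduced here for illustration; it is not the model trained later in this notebook. The first GRU summarizes each patch separately, and the second GRU runs over the much shorter sequence of patch summaries.

```python
import torch
import torch.nn as nn


class TwoLevelPatchRNN(nn.Module):
    """Hypothetical two-level collector: a GRU inside each patch, then a GRU across patches."""

    def __init__(self, num_channels, patch_size, hidden_size):
        super().__init__()
        self.patch_size = patch_size
        self.local_rnn = nn.GRU(num_channels, hidden_size, batch_first=True)
        self.global_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, x):
        batch_size, seq_len, num_channels = x.shape
        num_patch = seq_len // self.patch_size
        # Keep the most recent steps so the length is a multiple of patch_size.
        x = x[:, seq_len - num_patch * self.patch_size:, :]
        # Treat every patch as its own short sequence: (batch * num_patch, patch_size, channels)
        patches = x.reshape(batch_size * num_patch, self.patch_size, num_channels)
        _, local_state = self.local_rnn(patches)      # (1, batch * num_patch, hidden)
        summaries = local_state[-1].reshape(batch_size, num_patch, -1)
        _, global_state = self.global_rnn(summaries)  # (1, batch, hidden)
        return global_state[-1]                       # (batch, hidden)


model = TwoLevelPatchRNN(num_channels=9, patch_size=5, hidden_size=10)
out = model(torch.randn(8, 22, 9))
print(out.shape)
```

The two GRUs have separate parameters, which matches the point that different layers solve different problems.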

In [2]:
import os
os.chdir('d:/future/Index_Future_Prediction')

In [3]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.optim import lr_scheduler, Adam, AdamW
from scipy.stats import norm, t

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

from IPython.display import display, Markdown

In [4]:
from utils.random_split import RandomSplit, CallableDataset
from utils.back_test import BackTest
from utils.hybrid_loss import HybridLoss
from utils.hybrid_decoder import HybridDecoder
from utils.prediction_recorder import PredictionRecorder
from utils.train_animator import TrainAnimator
from utils.model_train import ModelTrain
from utils.get_ohlcv import GetOHLCV

In [5]:
assets_list = ['IH.CFX', 'IF.CFX', 'IC.CFX', 'AU.SHF', 'FU.SHF', 'JM.DCE','RB.SHF','HC.SHF', 'I.DCE', 'M.DCE', 'CF.ZCE',]
assets_list = ['IH.CFX', 'IF.CFX', 'IC.CFX']

In [6]:
seq_len = 20
pred_len = 5
train_ratio = 0.5
validation_ratio = 0.2
test_ratio = 0.02
threshold_ratio = 0.25

patch_size = 5

hidden_size = 10
num_layers = 1

In [7]:
def get_random_split(train_ratio, validation_ratio, test_ratio):
    source = GetOHLCV()
    sample_date = source.get_data('M.DCE', 5, 0.3)
    date_column = sample_date['trade_date'].copy()
    total_size = len(date_column)
    train_size = int(train_ratio * total_size)
    validation_size = int(validation_ratio * total_size)
    test_size = int(test_ratio * total_size)
    random_split = np.random.randint(train_size, total_size - validation_size - test_size)
    validation_start = date_column.iloc[random_split]
    test_start = date_column.iloc[random_split+validation_size]
    test_end = date_column.iloc[random_split+validation_size+test_size]
    return validation_start, test_start, test_end
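
The index arithmetic in `get_random_split` can be checked on synthetic dates without the project's `GetOHLCV` source. The helper `random_split_dates` and the business-day calendar below are assumptions made for this sketch only.

```python
import numpy as np
import pandas as pd


def random_split_dates(dates, train_ratio, validation_ratio, test_ratio, rng):
    total = len(dates)
    train_size = int(train_ratio * total)
    validation_size = int(validation_ratio * total)
    test_size = int(test_ratio * total)
    # The validation window may start anywhere after the minimum training span,
    # as long as the validation and test windows still fit before the end.
    start = int(rng.integers(train_size, total - validation_size - test_size))
    return (
        dates.iloc[start],
        dates.iloc[start + validation_size],
        dates.iloc[start + validation_size + test_size],
    )


dates = pd.Series(pd.bdate_range("2015-01-01", periods=2000))
rng = np.random.default_rng(0)
validation_start, test_start, test_end = random_split_dates(dates, 0.5, 0.2, 0.02, rng)
print(validation_start, test_start, test_end)
```

Because the upper bound of the random draw is exclusive, the last index used never exceeds `total - 1`.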

In [8]:
def get_data_set(assets_list, validation_start, test_start, test_end, seq_len, pred_len, threshold_ratio):

    source = GetOHLCV()

    train_set = None
    validation_set = None
    test_set = None
    feature_column = ['open', 'high', 'low', 'close', 'log_open','log_high','log_low','log_close','log_amount',]
    label_column = ['label_return','down_prob','middle_prob','up_prob']
    
    for code in assets_list:

        data = source.get_data(code, pred_len, threshold_ratio)

        train_data = data[data['trade_date'] < validation_start].copy()
        validation_data = data[(data['trade_date'] >= validation_start) & (data['trade_date'] < test_start)].copy()
        test_data = data[(data['trade_date'] >= test_start) & (data['trade_date'] < test_end)].copy()
    
        train_feature = torch.tensor(train_data[feature_column].values, dtype = torch.float32, device = 'cuda:0')
        train_feature = train_feature.unfold(dimension = 0, size = seq_len, step = 1).transpose(1,2)

        validation_feature = torch.tensor(validation_data[feature_column].values, dtype = torch.float32, device = 'cuda:0')
        validation_feature = validation_feature.unfold(dimension = 0, size = seq_len, step = 1).transpose(1,2)

        test_feature = torch.tensor(test_data[feature_column].values, dtype = torch.float32, device = 'cuda:0')
        test_feature = test_feature.unfold(dimension = 0, size = seq_len, step = 1).transpose(1,2)



        train_label = torch.tensor(train_data[label_column].values, dtype = torch.float32, device = 'cuda:0')
        train_label = train_label[seq_len-1:]

        validation_label = torch.tensor(validation_data[label_column].values, dtype = torch.float32, device = 'cuda:0')
        validation_label = validation_label[seq_len-1:]

        test_label = torch.tensor(test_data[label_column].values, dtype = torch.float32, device = 'cuda:0')
        test_label = test_label[seq_len-1:]



        # Use identity checks against None; == may dispatch to a custom __eq__
        if train_set is None:
            train_set = CallableDataset(train_feature, train_label)
        else:
            train_set = train_set + CallableDataset(train_feature, train_label)

        if validation_set is None:
            validation_set = CallableDataset(validation_feature, validation_label)
        else:
            validation_set = validation_set + CallableDataset(validation_feature, validation_label)

        if test_set is None:
            test_set = CallableDataset(test_feature, test_label)
        else:
            test_set = test_set + CallableDataset(test_feature, test_label)

    return train_set, validation_set, test_set
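
The windowing in `get_data_set` pairs each sliding window with the label of its last row. A small sketch on toy data (the sizes and values are made up) shows why the labels are sliced with `[seq_len-1:]`.

```python
import torch

seq_len = 4
num_rows, num_features = 10, 2

# Row t holds the values t*10 + feature_index, so windows are easy to read.
feature = (torch.arange(num_rows).unsqueeze(1) * 10 + torch.arange(num_features)).float()
label = torch.arange(num_rows).float().unsqueeze(1)  # the label of row t is simply t

# unfold yields (num_windows, num_features, seq_len);
# transpose to (num_windows, seq_len, num_features)
windows = feature.unfold(dimension=0, size=seq_len, step=1).transpose(1, 2)
aligned_label = label[seq_len - 1:]

print(windows.shape, aligned_label.shape)
print(windows[0, :, 0], aligned_label[0])
```

Window `i` covers rows `i` to `i + seq_len - 1`, so its label is the one stored at row `i + seq_len - 1`, and both tensors end up with `num_rows - seq_len + 1` entries.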

In [9]:
validation_start, test_start, test_end = get_random_split(train_ratio, validation_ratio, test_ratio)

In [10]:
recorder = PredictionRecorder()
animator = TrainAnimator(figsize=(12,6))

Animator data has been reset.


In [11]:
class PriceInstanceNorm(nn.Module):
    """Per-sample normalization of log prices: each sample is standardized with its own mean and std."""
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten(start_dim = 1)
    
    def forward(self, x):
        # x: (batch, seq_len, channels), strictly positive prices
        log_x = torch.log(x)
        flattened_x = self.flatten(log_x)
        # One mean/std per sample, shaped (batch, 1, 1) so they broadcast against the 3-D log_x
        mean = torch.mean(flattened_x ,dim = 1).unsqueeze(1).unsqueeze(1)
        std = torch.std(flattened_x ,dim = 1).unsqueeze(1).unsqueeze(1)
        processed_x = (log_x - mean)/std
        return processed_x

x = torch.arange(1,141, dtype = torch.float32)
x = x.reshape(7,4,5)
pin = PriceInstanceNorm()
pin(x).shape

torch.Size([7, 4, 5])

In [12]:
class SimplePatch(nn.Module):
    """
    Simple time-series patching for feeding an RNN, so no positional information is attached.
    """
    def __init__(self, patch_size):
        super().__init__()
        self.patch_size = patch_size

    def forward(self, x):
        batch_size = x.shape[0]
        seq_len = x.shape[1]
        num_channels = x.shape[2]
        num_patch = seq_len // self.patch_size
        effective_seq_len = num_patch * self.patch_size
        # Keep the most recent steps so the length is a multiple of patch_size
        effective_x = x[:,-effective_seq_len:,:]
        # Flatten each patch into one vector: (batch, num_patch, patch_size * num_channels)
        flatten_x = effective_x.reshape(batch_size, num_patch, self.patch_size * num_channels)
        return flatten_x

In [13]:
ss = SimplePatch(patch_size = 5)
x = torch.zeros(size = (100,17,5))
ss(x).shape

torch.Size([100, 3, 25])

In [16]:
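class Patch_LSTM(nn.Module):
    """Recurrent model: patch the input, run a GRU over the patches, and decode the last output. Despite the class name, the recurrent layer is an nn.GRU."""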
    def __init__(self, input_size, hidden_size, num_layers, dropout, patch_size):
        super().__init__()
        self.device = 'cuda:0'
        self.patch = SimplePatch(patch_size = patch_size)
        self.process = nn.GRU(
            input_size = input_size * patch_size,
            hidden_size = hidden_size,
            num_layers = num_layers,
            dropout = dropout,
            batch_first = True,
            # nonlinearity='relu',
        )
        self.regularization = nn.Sequential(nn.Flatten(),nn.Dropout(dropout))
        self.output = HybridDecoder(dim_state = hidden_size, init_prob = [0.0,0.5,0.0])

    def forward(self, x):

        # To improve generalization, randomly drop a leading part of the sequence on every training pass
        if self.training:
            seq_len = x.shape[1]
            random_drop = np.random.randint(0, seq_len//2)
            x = x[:,random_drop:,:] 

        # patch
        x = self.patch(x)

        # GRU over the patch sequence; keep the output at the last patch
        x = self.process(x)[0][:,-1,:]
        
        return self.output(x)
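
The shapes flowing through the patch and GRU stages can be traced in isolation. This is a sketch under stated assumptions: `to_patch_tokens` is a hypothetical stand-in for the patching step, the project's `HybridDecoder` is left out, and the 9 input features correspond to the nine entries of `feature_column` in `get_data_set`.

```python
import torch
import torch.nn as nn

num_features, patch_size, hidden_size = 9, 5, 10  # 9 matches feature_column above


def to_patch_tokens(x, patch_size):
    """(batch, seq_len, channels) -> (batch, num_patch, patch_size * channels), keeping the latest steps."""
    batch_size, seq_len, num_channels = x.shape
    num_patch = seq_len // patch_size
    x = x[:, seq_len - num_patch * patch_size:, :]
    return x.reshape(batch_size, num_patch, patch_size * num_channels)


gru = nn.GRU(input_size=num_features * patch_size, hidden_size=hidden_size, batch_first=True)

x = torch.randn(100, 20, num_features)
tokens = to_patch_tokens(x, patch_size)  # (100, 4, 45)
last_state = gru(tokens)[0][:, -1, :]    # (100, 10)
print(tokens.shape, last_state.shape)

# A GRU built with input_size = 5 * patch_size = 25 would reject these 45-wide tokens.
```

The GRU's `input_size` has to equal the number of features times `patch_size`, because every patch is flattened into a single vector before it reaches the recurrent layer.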

In [17]:
result = np.zeros(shape = (10, len(assets_list), 4))

for i in range(10):
    validation_start, test_start, test_end = get_random_split(train_ratio, validation_ratio, test_ratio)
    train_set, validation_set, test_set = get_data_set(assets_list, validation_start, test_start, test_end, seq_len, pred_len, threshold_ratio)

    for j in range(len(assets_list)):
        code = assets_list[j]
        train_set_2, validation_set_2, test_set_2 = get_data_set([code], validation_start, test_start, test_end, seq_len, pred_len, threshold_ratio)

        # input_size must equal the number of feature columns built in get_data_set (9);
        # with input_size = 5 the GRU expects 25 inputs per patch but receives 9 * 5 = 45
        model = Patch_LSTM(input_size = 9, hidden_size = hidden_size, num_layers = num_layers, dropout = 0.5, patch_size = 5).to('cuda:0')
        loss_fn = HybridLoss(alpha = 1e-1, delta = 1)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay = 1e-1)
        scheduler = lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
        train = ModelTrain(model = model,
                        batch_size = 100,
                        train_set = train_set,
                        validation_set = validation_set,
                        test_set = test_set,
                        loss_fn = loss_fn,
                        optimizer = optimizer,
                        scheduler=scheduler,
                        recorder=recorder,
                        graph=animator,
                        )
        
        train.train_set = train_set_2
        train.validation_set = validation_set_2

        prediction, precision = train.epoch_train(epochs = 10, round = 100, early_stop = 10)
        result[i,j,0] = prediction
        result[i,j,1] = precision

        # prediction, precision = train.epoch_train(epochs = 10, round = 100, early_stop = 10)
        # result[i,j,2] = prediction
        # result[i,j,3] = precision

In [None]:
all_assets = pd.DataFrame({
    'stage_1_prediction': np.mean(result, axis = 0)[:,0],
    'stage_2_prediction': np.mean(result, axis = 0)[:,2],

    'stage_1_precision': np.mean(result, axis = 0)[:,1],
    'stage_2_precision': np.mean(result, axis = 0)[:,3],

    'stage_1_precision_std': np.std(result, axis = 0)[:,1],
    'stage_2_precision_std': np.std(result, axis = 0)[:,3],
})
all_assets.index = pd.Series(assets_list)
for col in all_assets.columns:
    all_assets[col] = all_assets[col].apply(lambda x: f"{x:.1%}")

# Convert to Markdown
markdown_table = all_assets.to_markdown(index=False)
print(f'hidden_size: {hidden_size}, num_layers: {num_layers}, seq_len: {seq_len}')
print(markdown_table)

hidden_size: 10, num_layers: 1, seq_len: 20
| stage_1_prediction   | stage_2_prediction   | stage_1_precision   | stage_2_precision   | stage_1_precision_std   | stage_2_precision_std   |
|:---------------------|:---------------------|:--------------------|:--------------------|:------------------------|:------------------------|
| 79.8%                | 0.0%                 | 7.3%                | 0.0%                | 23.3%                   | 0.0%                    |
| 54.4%                | 0.0%                 | 9.1%                | 0.0%                | 27.6%                   | 0.0%                    |
| 65.7%                | 0.0%                 | 4.6%                | 0.0%                | 13.5%                   | 0.0%                    |


hidden_size: 10, num_layers: 1, seq_len: 40
| stage_1_prediction   | stage_2_prediction   | stage_1_precision   | stage_2_precision   | stage_1_precision_std   | stage_2_precision_std   |
|:---------------------|:---------------------|:--------------------|:--------------------|:------------------------|:------------------------|
| 13.3%                | 0.0%                 | 5.8%                | 0.0%                | 41.2%                   | 0.0%                    |
| 0.6%                 | 0.0%                 | 13.5%               | 0.0%                | 30.6%                   | 0.0%                    |
| 5.5%                 | 0.0%                 | 6.4%                | 0.0%                | 24.2%                   | 0.0%                    |