# Table Tennis Temporal Action Localization Competition with PaddlePaddle: 18th Place on Leaderboard B

## Solution Overview
This project builds on the baseline and makes a small modification to the BMN model.

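## Task Description

In many large-scale video analysis scenarios, localizing and recognizing human actions that occur within short time spans in long, untrimmed videos has become a topic of great interest. Current solutions for human action detection do not scale well to large video collections, and processing large-scale video data efficiently remains a challenging task in computer vision. The core problem has two parts: first, action recognition algorithms are still computationally expensive; second, there is a lack of methods that generate fewer video proposals (proposals that focus more on the short action itself).

A video action proposal here means a candidate video segment that contains a specific action. To suit large-scale video analysis, temporal action proposals should satisfy the following two requirements as far as possible:
(1) Higher processing efficiency, for example through mechanisms that encode and score temporal video segments more efficiently;
(2) Stronger discriminative power, for example accurately localizing the time interval in which an action occurs.

This competition aims to encourage more developers and researchers to pay attention to and take part in research on video action localization, and to build better-performing action localization models.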

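## Dataset

The dataset is built from standard single-camera HD broadcast footage of international table tennis competitions (World Cup, World Championships, Asian Championships, Olympic Games) and domestic competitions (National Games, Chinese Table Tennis Super League) from the 2019 to 2021 seasons. It contains 912 video feature files in total. Each video is up to 6 minutes long, the feature dimension is 2048, and the features are stored in pkl format. Within each rally, the stroke actions of the player facing the camera are annotated, and a single action lasts up to 2 seconds. The training set has 729 annotated videos, test set A has 91 videos, and test set B has 92 videos. Training labels are provided in JSON format.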

## Changes to the Model
### Model Before the Change
```
        #Base Module
        y1 = self.b_conv1(x)
        y1 = self.b_conv1_act(y1)
        y1 = self.b_conv2(y1)
        y1 = self.b_conv2_act(y1)
```
The input x first passes through the base layers (Base Module), and the result is then fed into three different processing stages (TEM, PEM, BM).

### Model After the Change
```
        #Base Module
        y1 = self.b_conv1(x)
        y1 = self.b_conv1_act(y1)
        y1 = self.b_conv2(y1)
        y1 = self.b_conv2_act(y1)

        y2 = self.b_conv1_1(x)
        y2 = self.b_conv1_act_1(y2)
        y2 = self.b_conv2_1(y2)
        y2 = self.b_conv2_act_1(y2)
```
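The input x now passes through two separate base modules. As in the full code at the end of this document, TEM processes y1 to obtain temporal (boundary) information, while PEM, which contains the BM layer, processes y2 to obtain class (proposal confidence) information.

The aim is to push the network to learn different features for class information and temporal information, each with its own emphasis.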
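
To give a rough sense of what the second base module costs, the sketch below counts the parameters of one base module from the Conv1D settings in the code at the end of this document (kernel size 3, `groups=4`, hidden size 256). The helpers `conv1d_params` and `base_module_params` are hypothetical names, and `feat_dim=2048` is an assumption taken from the dataset's feature dimension; the value actually used comes from the training config.

```python
def conv1d_params(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights plus biases in a grouped 1D convolution."""
    assert in_channels % groups == 0 and out_channels % groups == 0
    weights = out_channels * (in_channels // groups) * kernel_size
    return weights + out_channels  # one bias per output channel


def base_module_params(feat_dim, hidden_dim=256):
    """Two stacked Conv1D layers, mirroring b_conv1 and b_conv2."""
    first = conv1d_params(feat_dim, hidden_dim, kernel_size=3, groups=4)
    second = conv1d_params(hidden_dim, hidden_dim, kernel_size=3, groups=4)
    return first + second


# feat_dim=2048 is assumed from the dataset description, not read from bmn.yaml.
extra = base_module_params(feat_dim=2048)
print(extra)
```

Under these assumptions the duplicated base module adds a few hundred thousand parameters, which is small next to the rest of the network's convolution stack.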

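### Modified BMN Code
The code is attached at the end of this document.

## Checkpoints and How to Test
The checkpoint is BMN_epoch_00008.pdparams.

Run PredictB.ipynb to reproduce the Leaderboard B result.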

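## Dataset Preprocessing

This solution uses the BMN model from PaddleVideo. BMN was developed in-house at Baidu and was the winning solution at ActivityNet 2019. It provides an efficient way to generate proposals for video action localization, and it was first open-sourced on PaddlePaddle. The model introduces a Boundary-Matching (BM) mechanism to evaluate proposal confidence: all possible proposals are arranged by start position and length into a two-dimensional BM confidence map, where the value at each point is the confidence score of the corresponding proposal. The network consists of three modules: the base module acts as the backbone and processes the input feature sequence, the TEM module predicts the probability that each temporal position is an action start or an action end, and the PEM module generates the BM confidence map.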
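
As a small illustration of how the BM confidence map is indexed, the sketch below enumerates the (start, duration) index pairs that fit inside a sequence. The helper `valid_bm_cells` is a hypothetical name; the validity rule is the same `start_index + duration_index < tscale` check used by `get_interp1d_mask` in the code at the end of this document.

```python
def valid_bm_cells(tscale, dscale):
    """List (start, duration) index pairs that fit inside the sequence."""
    cells = []
    for start in range(tscale):
        for duration in range(dscale):
            # Same validity rule as get_interp1d_mask in the BMN code:
            # the proposal must end before the sequence does.
            if start + duration < tscale:
                cells.append((start, duration))
    return cells


cells = valid_bm_cells(tscale=4, dscale=4)
print(len(cells))   # number of valid cells out of 16
print(cells[:4])    # proposals starting at position 0
```

With `tscale = dscale = 100`, the same rule leaves 5050 of the 10000 cells valid; the remaining cells get an all-zero sampling mask.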

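The data for this task contains 912 video features extracted with ppTSM and saved in pkl format. Each file name matches the video name, and after loading, the features of a single video form an array of shape (num_of_frames, 2048). Because num_of_frames varies between videos and is also large, the pkl files cannot be used for training directly. In addition, each table tennis action is very short, so the data is split into clips to help the model recognize actions better.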
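
The arithmetic behind the split can be sketched as follows, using the constants that appear in the scripts in this section (pad to 9000 frames, 100 frames per clip). The helper `clip_layout` is a hypothetical name, and `fps=25` is an assumption inferred from the scripts treating 100 frames as 4 seconds; the real value is read from the annotation file.

```python
import math

CLIP_FRAMES = 100      # frames per clip, as in the preprocessing scripts
MAX_FRAMES = 9000      # every video is padded to this length


def clip_layout(num_frames, fps=25):
    """Return padding and clip counts for one video's feature sequence."""
    if num_frames > MAX_FRAMES:
        raise ValueError("video longer than the padded length")
    return {
        "pad_frames": MAX_FRAMES - num_frames,
        "total_clips": MAX_FRAMES // CLIP_FRAMES,
        "clips_with_content": math.ceil(num_frames / CLIP_FRAMES),
        "clip_seconds": CLIP_FRAMES / fps,
    }


print(clip_layout(6543))
```

Every video yields 90 clips regardless of its length, so the clips beyond `clips_with_content` consist purely of zero padding.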


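1. First, extract the dataset.
Run the following commands to extract the dataset. Delete the archives afterwards to keep project storage under 100 GB; otherwise the project will be terminated.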

In [1]:
%cd /home/aistudio/data/
!tar xf data122998/Features_competition_train.tar.gz
!tar xf data123004/Features_competition_test_A.tar.gz
!cp data122998/label_cls14_train.json .
!rm -rf data12*

/home/aistudio/data


2. After extraction, split the label annotation file first by running the following script.

In [2]:
import json
import random

import numpy as np

random.seed(0)
source_path = "/home/aistudio/data/label_cls14_train.json"

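with open(source_path) as f:
    annos = json.load(f)
fps = annos['fps']
annos = annos['gts']
new_annos = {}
max_frames = 0

for anno in annos:
    if anno['total_frames'] > max_frames:
        max_frames = anno['total_frames']
    # Each video is cut into 90 clips of 100 frames. The hardcoded 4 below is
    # the clip length in seconds, which matches 100 / fps only when fps is 25.
    for i in range(9000//100):
        subset = 'training'
        clip_start = i * 4
        clip_end = (i + 1) * 4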
        video_name = anno['url'].split('.')[0] + f"_{i}"
        new_annos[video_name] = {
            'duration_second': 100 / fps,
            'subset': subset,
            'duration_frame': 100,
            'annotations': [],
            'feature_frame': -1

        }
        actions = anno['actions']
        for act in actions:
            start_id = act['start_id']
            end_id = act['end_id']
            new_start_id = -1
            new_end_id = -1
            if start_id > clip_start and end_id < clip_end:
                new_start_id = start_id - clip_start
                new_end_id = end_id - clip_start
            elif start_id < clip_start < end_id < clip_end:
                new_start_id = 0
                new_end_id = end_id - clip_start
            elif clip_start < start_id < clip_end < end_id:
                new_start_id = start_id - clip_start
                new_end_id = 4
            elif start_id < clip_start < clip_end < end_id:
                new_start_id = 0
                new_end_id = 4
            else:
                continue

            new_annos[video_name]['annotations'].append({
                'segment': [round(new_start_id, 2), round(new_end_id, 2)],
                'label': str(act['label_ids'][0])
            })
        if len(new_annos[video_name]['annotations']) == 0:
            new_annos.pop(video_name)


with open('new_label_cls14_train.json', 'w') as f:
    json.dump(new_annos, f)
print(len(list(new_annos.keys())))

12597


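After the script finishes, a new annotation file, new_label_cls14_train.json, is generated in the data directory. The next step is to split the training and test set features.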
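
The if/elif chain in the label script clips each action to the 4-second window and shifts it to clip-local time. A more compact way to express the same idea is an interval intersection, sketched below. `clip_segment` is a hypothetical helper, not part of the baseline, and it differs in one respect: the script uses strict comparisons and so skips actions whose start or end falls exactly on a clip boundary, while this version keeps them. Swapping it in could therefore change the clip count printed above.

```python
def clip_segment(start, end, clip_start, clip_end):
    """Intersect an action [start, end] with a clip and shift to clip time.

    Returns None when the action does not overlap the clip.
    """
    lo = max(start, clip_start)
    hi = min(end, clip_end)
    if lo >= hi:
        return None
    return (round(lo - clip_start, 2), round(hi - clip_start, 2))


print(clip_segment(5.2, 6.1, 4, 8))   # fully inside the clip
print(clip_segment(3.5, 4.6, 4, 8))   # starts before the clip
print(clip_segment(4.0, 5.0, 4, 8))   # starts exactly on the boundary
print(clip_segment(1.0, 2.0, 4, 8))   # no overlap
```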

3. Run the following script to split the training set.

In [3]:
import os
import os.path as osp
import glob
import pickle
import paddle

import numpy as np

file_list = glob.glob("/home/aistudio/data/Features_competition_train/*.pkl")

max_frames = 9000

npy_path = ("/home/aistudio/data/Features_competition_train/npy/")
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    tensor = paddle.to_tensor(video_feat['image_feature'])
    pad_num = 9000 - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    tensor = paddle.transpose(tensor, [1, 0])
    tensor = paddle.unsqueeze(tensor, axis=0)
    tensor = pad1d(tensor)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])

    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())
    pass



W0227 07:06:21.816072   150 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0227 07:06:21.820129   150 device_context.cc:465] device: 0, cuDNN Version: 7.6.


In [4]:
!rm /home/aistudio/data/Features_competition_train/*.pkl

After it runs, the numpy data used for training is generated in the data/Features_competition_train/npy directory.

In [5]:
import os
import os.path as osp
import glob
import pickle
import json

import numpy as np
import paddle

file_list = glob.glob("/home/aistudio/data/Features_competition_test_A/*.pkl")

max_frames = 9000

npy_path = ("/home/aistudio/data/Features_competition_test_A/npy/")
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    tensor = paddle.to_tensor(video_feat['image_feature'])
    pad_num = 9000 - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    tensor = paddle.transpose(tensor, [1, 0])
    tensor = paddle.unsqueeze(tensor, axis=0)
    tensor = pad1d(tensor)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])

    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())
    pass

## Training the Model

Once the dataset has been split, training can begin with the commands below. First, install the PaddleVideo dependencies.

In [None]:
%cd /home/aistudio/PaddleVideo/
!pip install -r requirements.txt

Start training the model.

In [None]:
%cd /home/aistudio/PaddleVideo/
!python main.py -c configs/localization/bmn.yaml \
                -w /home/aistudio/BMN_epoch_00004.pdparams
                # -w /home/aistudio/PaddleVideo/output/BMN/BMN_epoch_00006.pdparams


## Exporting the Model
Export the trained model for inference by running the following script.

In [7]:
%cd /home/aistudio/PaddleVideo/
!python tools/export_model.py -c configs/localization/bmn.yaml -p output/BMN/BMN_epoch_00004.pdparams -o inference/BMN

/home/aistudio/PaddleVideo
Building model(BMN)...
W0227 07:35:51.439486  4758 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0227 07:35:51.443939  4758 device_context.cc:465] device: 0, cuDNN Version: 7.6.
Loading params from (output/BMN/BMN_epoch_00004.pdparams)...
  return (isinstance(seq, collections.Sequence) and
model (BMN) has been already saved in (inference/BMN).


## Inference

Run the following command to perform inference with the exported model.

In [None]:
%cd /home/aistudio/PaddleVideo/
!python tools/predict.py --input_file /home/aistudio/data/Features_competition_test_A/npy \
 --config configs/localization/bmn.yaml \
 --model_file inference/BMN/BMN.pdmodel \
 --params_file inference/BMN/BMN.pdiparams \
 --use_gpu=True \
 --use_tensorrt=False

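The JSON files produced by the program above are predictions for the individual clips, so they still need to be merged. Run the following script: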

In [11]:
import os
import json
import glob

json_path = "/home/aistudio/data/Features_competition_test_A/npy"
json_files = glob.glob(os.path.join(json_path, '*_*.json'))

submit_dic = {"version": None,
              "results": {},
              "external_data": {}
              }
results = submit_dic['results']
for json_file in json_files:
    j = json.load(open(json_file, 'r'))
    old_video_name = list(j.keys())[0]
    video_name = list(j.keys())[0].split('/')[-1].split('.')[0]
    # Split on the last underscore only, so video names containing "_" still parse.
    video_name, video_no = video_name.rsplit('_', 1)
    start_id = int(video_no) * 4
    if len(j[old_video_name]) == 0:
        continue
    for i, top in enumerate(j[old_video_name]):
        if video_name in results.keys():
            results[video_name].append({'score': round(top['score'], 2),
                                        'segment': [round(top['segment'][0] + start_id, 2), round(top['segment'][1] + start_id, 2)]})
        else:
            results[video_name] = [{'score':round(top['score'], 2),
                                        'segment': [round(top['segment'][0] + start_id, 2), round(top['segment'][1] + start_id, 2)]}]

json.dump(submit_dic, open('/home/aistudio/submission.json', 'w', encoding='utf-8'))
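
The key step in the merge is shifting each clip-local segment back to full-video time by `clip_index * 4` seconds. The sketch below isolates that step. `to_global` is a hypothetical helper, and it assumes clip files are named `<video_name>_<clip_index>`, as produced by the splitting scripts.

```python
CLIP_SECONDS = 4  # clip length used when the features were split


def to_global(clip_file_stem, segment):
    """Map a clip-local [start, end] segment back to full-video time.

    clip_file_stem looks like "<video_name>_<clip_index>".
    """
    video_name, clip_index = clip_file_stem.rsplit("_", 1)
    offset = int(clip_index) * CLIP_SECONDS
    start, end = segment
    return video_name, [round(start + offset, 2), round(end + offset, 2)]


print(to_global("abc123_7", [0.48, 1.9]))
```

Splitting on the last underscore means a video name that itself contains underscores is still recovered intact.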


Finally, submission.json is generated in the user directory. Compress it, download it, and submit.

In [12]:
%cd /home/aistudio/
!zip submission.zip submission.json

/home/aistudio
updating: submission.json (deflated 91%)
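
Before uploading, it can be worth sanity-checking the merged file. The sketch below checks only the structure written by the merge script (the three top-level keys, and a `score` plus a two-element `segment` per proposal). `check_submission` is a hypothetical helper, and the assumption that scores lie in [0, 1] is mine; it is not taken from the competition rules.

```python
def check_submission(submission):
    """Return a list of problems found in a submission dict (empty if none)."""
    problems = []
    for key in ("version", "results", "external_data"):
        if key not in submission:
            problems.append(f"missing top-level key: {key}")
    for video, proposals in submission.get("results", {}).items():
        for i, prop in enumerate(proposals):
            seg = prop.get("segment", [])
            if len(seg) != 2 or seg[0] > seg[1]:
                problems.append(f"{video}[{i}]: bad segment {seg}")
            # Assumed range: BMN scores are built from sigmoid outputs.
            if not 0.0 <= prop.get("score", -1.0) <= 1.0:
                problems.append(f"{video}[{i}]: score out of range")
    return problems


example = {
    "version": None,
    "results": {"abc123": [{"score": 0.91, "segment": [28.48, 29.9]}]},
    "external_data": {},
}
print(check_submission(example))
```

To run it on the real file, load /home/aistudio/submission.json with `json.load` and pass the resulting dict to the function.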




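## Notes from the Original Baseline Author

1. The number of training epochs can be increased.
2. The learning rate schedule can be tuned, for example with warmup and cosine annealing.
3. I think the most critical part is still data preprocessing. This solution simply splits every 4 seconds, which is not really sound, because a single action may be cut across two files. The splitting method in [FootballAction](https://github.com/PaddlePaddle/PaddleVideo/blob/application/FootballAction/datasets/script/get_instance_for_bmn.py) can serve as a reference for further improving the training data.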
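
One generic way to reduce the boundary problem from point 3 is to use overlapping windows, so that an action cut by one window lies fully inside a neighboring one. The sketch below illustrates the idea only; it is not the FootballAction script's method, and `sliding_windows`, `windows_containing`, and the 2-second stride are all hypothetical choices.

```python
def sliding_windows(total_seconds, window=4.0, stride=2.0):
    """Yield (start, end) windows that overlap when stride < window."""
    start = 0.0
    while start < total_seconds:
        yield (start, min(start + window, total_seconds))
        start += stride


def windows_containing(action, windows):
    """Windows that fully contain the action (start, end)."""
    a_start, a_end = action
    return [w for w in windows if w[0] <= a_start and a_end <= w[1]]


windows = list(sliding_windows(12.0))
# An action straddling the 4-second boundary of the fixed split:
print(windows_containing((3.6, 4.5), windows))
```

With `stride=4.0`, which reproduces the fixed split, no window contains this action in full. The trade-off is roughly twice as many training clips and overlapping predictions that need de-duplication at merge time.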

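Finally, I wish everyone good results.

Feel free to follow my WeChat official account: 人工智能研习社
For the latest competition baselines, send the competition name or URL to the account, and I will do my best to provide a baseline.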


## Modified BMN File
Replace PaddleVideo/paddlevideo/modeling/backbones/bmn.py with the following file.
```
# Copyright (c) 2020  PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import math
import numpy as np
import paddle
from paddle import ParamAttr
from ..registry import BACKBONES


def _get_interp1d_bin_mask(seg_xmin, seg_xmax, tscale, num_sample,
                           num_sample_perbin):
    """ generate sample mask for a boundary-matching pair """
    plen = float(seg_xmax - seg_xmin)
    plen_sample = plen / (num_sample * num_sample_perbin - 1.0)
    total_samples = [
        seg_xmin + plen_sample * ii
        for ii in range(num_sample * num_sample_perbin)
    ]
    p_mask = []
    for idx in range(num_sample):
        bin_samples = total_samples[idx * num_sample_perbin:(idx + 1) *
                                    num_sample_perbin]
        bin_vector = np.zeros([tscale])
        for sample in bin_samples:
            sample_upper = math.ceil(sample)
            sample_decimal, sample_down = math.modf(sample)
            if (tscale - 1) >= int(sample_down) >= 0:
                bin_vector[int(sample_down)] += 1 - sample_decimal
            if (tscale - 1) >= int(sample_upper) >= 0:
                bin_vector[int(sample_upper)] += sample_decimal
        bin_vector = 1.0 / num_sample_perbin * bin_vector
        p_mask.append(bin_vector)
    p_mask = np.stack(p_mask, axis=1)
    return p_mask


def get_interp1d_mask(tscale, dscale, prop_boundary_ratio, num_sample,
                      num_sample_perbin):
    """ generate sample mask for each point in Boundary-Matching Map """
    mask_mat = []
    for start_index in range(tscale):
        mask_mat_vector = []
        for duration_index in range(dscale):
            if start_index + duration_index < tscale:
                p_xmin = start_index
                p_xmax = start_index + duration_index
                center_len = float(p_xmax - p_xmin) + 1
                sample_xmin = p_xmin - center_len * prop_boundary_ratio
                sample_xmax = p_xmax + center_len * prop_boundary_ratio
                p_mask = _get_interp1d_bin_mask(sample_xmin, sample_xmax,
                                                tscale, num_sample,
                                                num_sample_perbin)
            else:
                p_mask = np.zeros([tscale, num_sample])
            mask_mat_vector.append(p_mask)
        mask_mat_vector = np.stack(mask_mat_vector, axis=2)
        mask_mat.append(mask_mat_vector)
    mask_mat = np.stack(mask_mat, axis=3)
    mask_mat = mask_mat.astype(np.float32)

    sample_mask = np.reshape(mask_mat, [tscale, -1])
    return sample_mask


def init_params(name, in_channels, kernel_size):
    fan_in = in_channels * kernel_size * 1
    k = 1. / math.sqrt(fan_in)
    param_attr = ParamAttr(name=name,
                           initializer=paddle.nn.initializer.Uniform(low=-k,
                                                                     high=k))
    return param_attr


@BACKBONES.register()
class BMN(paddle.nn.Layer):
    """BMN model from
    `"BMN: Boundary-Matching Network for Temporal Action Proposal Generation" <https://arxiv.org/abs/1907.09702>`_
    Args:
        tscale (int): sequence length, default 100.
        dscale (int): max duration length, default 100.
        prop_boundary_ratio (float): ratio of expanded temporal region in proposal boundary, default 0.5.
        num_sample (int): number of samples between starting boundary and ending boundary of each proposal, default 32.
        num_sample_perbin (int):  number of selected points in each sample, default 3.
    """

    def __init__(
        self,
        tscale,
        dscale,
        prop_boundary_ratio,
        num_sample,
        num_sample_perbin,
        feat_dim=400,
    ):
        super(BMN, self).__init__()

        #init config
        self.feat_dim = feat_dim
        self.tscale = tscale
        self.dscale = dscale
        self.prop_boundary_ratio = prop_boundary_ratio
        self.num_sample = num_sample
        self.num_sample_perbin = num_sample_perbin

        self.hidden_dim_1d = 256
        self.hidden_dim_2d = 128
        self.hidden_dim_3d = 512

        # Base Module
        self.b_conv1 = paddle.nn.Conv1D(
            in_channels=self.feat_dim,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('Base_1_w', self.feat_dim, 3),
            bias_attr=init_params('Base_1_b', self.feat_dim, 3))
        self.b_conv1_act = paddle.nn.ReLU()

        self.b_conv2 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('Base_2_w', self.hidden_dim_1d, 3),
            bias_attr=init_params('Base_2_b', self.hidden_dim_1d, 3))
        self.b_conv2_act = paddle.nn.ReLU()

        # Base Module 2
        self.b_conv1_1 = paddle.nn.Conv1D(
            in_channels=self.feat_dim,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('Base_1_w_1', self.feat_dim, 3),
            bias_attr=init_params('Base_1_b_1', self.feat_dim, 3))
        self.b_conv1_act_1 = paddle.nn.ReLU()

        self.b_conv2_1 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('Base_2_w_1', self.hidden_dim_1d, 3),
            bias_attr=init_params('Base_2_b_1', self.hidden_dim_1d, 3))
        self.b_conv2_act_1 = paddle.nn.ReLU()

        # Temporal Evaluation Module
        self.ts_conv1 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('TEM_s1_w', self.hidden_dim_1d, 3),
            bias_attr=init_params('TEM_s1_b', self.hidden_dim_1d, 3))
        self.ts_conv1_act = paddle.nn.ReLU()

        self.ts_conv2 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=1,
            kernel_size=1,
            padding=0,
            groups=1,
            weight_attr=init_params('TEM_s2_w', self.hidden_dim_1d, 1),
            bias_attr=init_params('TEM_s2_b', self.hidden_dim_1d, 1))
        self.ts_conv2_act = paddle.nn.Sigmoid()

        self.te_conv1 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=self.hidden_dim_1d,
            kernel_size=3,
            padding=1,
            groups=4,
            weight_attr=init_params('TEM_e1_w', self.hidden_dim_1d, 3),
            bias_attr=init_params('TEM_e1_b', self.hidden_dim_1d, 3))
        self.te_conv1_act = paddle.nn.ReLU()
        self.te_conv2 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=1,
            kernel_size=1,
            padding=0,
            groups=1,
            weight_attr=init_params('TEM_e2_w', self.hidden_dim_1d, 1),
            bias_attr=init_params('TEM_e2_b', self.hidden_dim_1d, 1))
        self.te_conv2_act = paddle.nn.Sigmoid()

        #Proposal Evaluation Module
        self.p_conv1 = paddle.nn.Conv1D(
            in_channels=self.hidden_dim_1d,
            out_channels=self.hidden_dim_2d,
            kernel_size=3,
            padding=1,
            groups=1,
            weight_attr=init_params('PEM_1d_w', self.hidden_dim_1d, 3),
            bias_attr=init_params('PEM_1d_b', self.hidden_dim_1d, 3))
        self.p_conv1_act = paddle.nn.ReLU()

        # init to speed up
        sample_mask = get_interp1d_mask(self.tscale, self.dscale,
                                        self.prop_boundary_ratio,
                                        self.num_sample, self.num_sample_perbin)
        self.sample_mask = paddle.to_tensor(sample_mask)
        self.sample_mask.stop_gradient = True

        self.p_conv3d1 = paddle.nn.Conv3D(
            in_channels=128,
            out_channels=self.hidden_dim_3d,
            kernel_size=(self.num_sample, 1, 1),
            stride=(self.num_sample, 1, 1),
            padding=0,
            weight_attr=ParamAttr(name="PEM_3d1_w"),
            bias_attr=ParamAttr(name="PEM_3d1_b"))
        self.p_conv3d1_act = paddle.nn.ReLU()

        self.p_conv2d1 = paddle.nn.Conv2D(
            in_channels=512,
            out_channels=self.hidden_dim_2d,
            kernel_size=1,
            stride=1,
            padding=0,
            weight_attr=ParamAttr(name="PEM_2d1_w"),
            bias_attr=ParamAttr(name="PEM_2d1_b"))
        self.p_conv2d1_act = paddle.nn.ReLU()

        self.p_conv2d2 = paddle.nn.Conv2D(
            in_channels=128,
            out_channels=self.hidden_dim_2d,
            kernel_size=3,
            stride=1,
            padding=1,
            weight_attr=ParamAttr(name="PEM_2d2_w"),
            bias_attr=ParamAttr(name="PEM_2d2_b"))
        self.p_conv2d2_act = paddle.nn.ReLU()

        self.p_conv2d3 = paddle.nn.Conv2D(
            in_channels=128,
            out_channels=self.hidden_dim_2d,
            kernel_size=3,
            stride=1,
            padding=1,
            weight_attr=ParamAttr(name="PEM_2d3_w"),
            bias_attr=ParamAttr(name="PEM_2d3_b"))
        self.p_conv2d3_act = paddle.nn.ReLU()

        self.p_conv2d4 = paddle.nn.Conv2D(
            in_channels=128,
            out_channels=2,
            kernel_size=1,
            stride=1,
            padding=0,
            weight_attr=ParamAttr(name="PEM_2d4_w"),
            bias_attr=ParamAttr(name="PEM_2d4_b"))
        self.p_conv2d4_act = paddle.nn.Sigmoid()

    def init_weights(self):
        pass

    def forward(self, x):
        #Base Module
        y1 = self.b_conv1(x)
        y1 = self.b_conv1_act(y1)
        y1 = self.b_conv2(y1)
        y1 = self.b_conv2_act(y1)

        y2 = self.b_conv1_1(x)
        y2 = self.b_conv1_act_1(y2)
        y2 = self.b_conv2_1(y2)
        y2 = self.b_conv2_act_1(y2)

        #TEM
        xs = self.ts_conv1(y1)
        xs = self.ts_conv1_act(xs)
        xs = self.ts_conv2(xs)
        xs = self.ts_conv2_act(xs)
        xs = paddle.squeeze(xs, axis=[1])
        xe = self.te_conv1(y1)
        xe = self.te_conv1_act(xe)
        xe = self.te_conv2(xe)
        xe = self.te_conv2_act(xe)
        xe = paddle.squeeze(xe, axis=[1])

        #PEM
        xp = self.p_conv1(y2)
        xp = self.p_conv1_act(xp)
        #BM layer
        xp = paddle.matmul(xp, self.sample_mask)
        xp = paddle.reshape(xp, shape=[0, 0, -1, self.dscale, self.tscale])

        xp = self.p_conv3d1(xp)
        xp = self.p_conv3d1_act(xp)
        xp = paddle.squeeze(xp, axis=[2])
        xp = self.p_conv2d1(xp)
        xp = self.p_conv2d1_act(xp)
        xp = self.p_conv2d2(xp)
        xp = self.p_conv2d2_act(xp)
        xp = self.p_conv2d3(xp)
        xp = self.p_conv2d3_act(xp)
        xp = self.p_conv2d4(xp)
        xp = self.p_conv2d4_act(xp)
        return xp, xs, xe

```