
support data_parallel training and ucf101 dataset #4819

Merged
merged 9 commits on Sep 1, 2020
55 changes: 36 additions & 19 deletions dygraph/tsm/README.md
@@ -6,44 +6,61 @@
## Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)


## Introduction

The Temporal Shift Module (TSM) was proposed by Ji Lin, Chuang Gan, and Song Han of MIT and the IBM Watson AI Lab; it improves a network's video understanding ability by shifting features along the temporal dimension. For details, see the paper [Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383v1).
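For intuition, here is a minimal NumPy sketch of the shift operation the paper describes — an illustration only, not the Paddle implementation in this PR; the 1/8 channel fraction and the zero-padded clip boundaries follow the paper's defaults:

```python
import numpy as np

def temporal_shift(x, seg_num, fold_div=8):
    """Shift a fraction of channels along the temporal axis.

    x: array of shape (N*T, C, H, W), where T == seg_num frames per clip.
    The first C//fold_div channels are shifted one step toward the past,
    the next C//fold_div one step toward the future; the rest stay put.
    """
    nt, c, h, w = x.shape
    n = nt // seg_num
    x = x.reshape(n, seg_num, c, h, w)
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # frame t sees frame t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # frame t sees frame t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # unshifted channels
    return out.reshape(nt, c, h, w)
```

The shifted channels exchange information between neighboring frames at zero extra FLOPs, which is the module's central trick; a 2D ResNet with this shift inserted into its residual branches behaves like a lightweight temporal model.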

## Installation

1. Running the sample code in this model repo requires PaddlePaddle v2.0.0 or later. If the PaddlePaddle in your environment is older than this, update it following the instructions in the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.6/beginners_guide/install/index_cn.html).
2. Download the model repo: git clone https://github.com/PaddlePaddle/models

### Other dependencies

- Python >= 3.7

- CUDA >= 8.0

- CUDNN >= 7.0
## Data Preparation

TSM is trained on the UCF101 action recognition dataset, which contains 101 action classes.
Set ucf101_root in ucf101_reader.py to the UCF101 dataset directory; its videos and rawframes subdirectories hold the videos and the extracted frame images, about 6.8 GB and 56 GB in size respectively.
Preparation steps:
1. Download the official UCF101 data: wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar, then extract it into $ucf101_root/videos
2. Extract the video frames (TODO) into $ucf101_root/frames
3. Generate the video path list files (steps TODO) under ./data/dataset/ucf101/
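Step 3 is still marked TODO in this PR. As a purely hypothetical sketch — the helper name, the "path label" line format, and the split-file layout are all assumptions, not necessarily what ucf101_reader.py expects — list generation could look like:

```python
import os

def build_video_list(ucf101_root, split_file, out_path):
    """Hypothetical sketch: write one 'video_path label' line per entry of
    an official annotation split file such as trainlist01.txt, whose lines
    look like 'ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1'."""
    with open(split_file) as f, open(out_path, "w") as out:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            rel_path = parts[0]
            # official split files use 1-indexed class ids; make them 0-indexed
            label = int(parts[1]) - 1 if len(parts) > 1 else 0
            out.write("{} {}\n".format(
                os.path.join(ucf101_root, "videos", rel_path), label))
```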


## Training

Once the data is prepared, training can be launched in any of the following ways:

- Train from scratch:

    sh run_ucf101.sh

- Train from a ResNet backbone pretrained on ImageNet:

1. Download the ResNet50 weights trained on ImageNet, used as initialization: wget https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz, then extract the archive
2. Launch training with --weights=./ResNet50_pretrained/: sh run_ucf101_imagenet.sh

- Finetune from a model pretrained on Kinetics-400:

1. Download the released static-graph model: wget https://paddlemodels.bj.bcebos.com/video_classification/TSM.pdparams
2. mkdir k400_wei && mv TSM.pdparams k400_wei
3. Launch training with --weights=k400_wei/TSM.pdparams: sh run_ucf101_k400.sh

Results on the UCF101 dataset:

|Top-1|Top-5|pretrain|
|:-:|:-:|:-:|
|84.37%|95.68%|ImageNet|
|94.54%|98.96%|Kinetics-400|

For full-dataset accuracy, see the [static-graph implementation](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo)
61 changes: 46 additions & 15 deletions dygraph/tsm/model.py
@@ -16,6 +16,7 @@
import time
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
import math
@@ -28,7 +29,8 @@ def __init__(self,
filter_size,
stride=1,
groups=1,
act=None):
act=None,
name=None):
super(ConvBNLayer, self).__init__()

self._conv = Conv2D(
@@ -39,14 +41,22 @@ def __init__(self,
padding=(filter_size - 1) // 2,
groups=None,
act=None,
param_attr=fluid.param_attr.ParamAttr(),
param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]

self._batch_norm = BatchNorm(
num_filters,
act=act,
param_attr=ParamAttr(name=bn_name + "_scale"),
bias_attr=ParamAttr(name=bn_name + "_offset"),
moving_mean_name=bn_name + "_mean",
moving_variance_name=bn_name + "_variance")
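The naming branch added above can be restated as a standalone helper to show the mapping it produces — a mirror for illustration only, not additional PR code:

```python
def bn_name_for(conv_name):
    """Mirror of the naming branch in ConvBNLayer: the stem conv 'conv1'
    maps to 'bn_conv1'; residual-branch convs like 'res2a_branch2a' drop
    the 'res' prefix and gain a 'bn' prefix, matching the static-graph
    ResNet checkpoint layout the PR loads pretrained weights from."""
    if conv_name == "conv1":
        return "bn_" + conv_name
    return "bn" + conv_name[3:]
```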

def forward(self, inputs):
y = self._conv(inputs)
@@ -61,32 +71,36 @@ def __init__(self,
num_filters,
stride,
shortcut=True,
seg_num=8):
seg_num=8,
name=None):
super(BottleneckBlock, self).__init__()

self.conv0 = ConvBNLayer(
num_channels=num_channels,
num_filters=num_filters,
filter_size=1,
act='relu')
act='relu',
name=name + "_branch2a")
self.conv1 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu')
act='relu',
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters * 4,
filter_size=1,
act=None)
act=None,
name=name + "_branch2c")

if not shortcut:
self.short = ConvBNLayer(
num_channels=num_channels,
num_filters=num_filters * 4,
filter_size=1,
stride=stride)
stride=stride,
name=name + "_branch1")
self.shortcut = shortcut
self.seg_num = seg_num
self._num_channels_out = int(num_filters * 4)
@@ -119,7 +133,12 @@ def __init__(self, name_scope, config):
num_filters = [64, 128, 256, 512]

self.conv = ConvBNLayer(
num_channels=3, num_filters=64, filter_size=7, stride=2, act='relu')
num_channels=3,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name="conv1")
self.pool2d_max = Pool2D(
pool_size=3, pool_stride=2, pool_padding=1, pool_type='max')

@@ -129,14 +148,23 @@ def __init__(self, name_scope, config):
for block in range(len(depth)):
shortcut = False
for i in range(depth[block]):
if self.layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)

bottleneck_block = self.add_sublayer(
'bb_%d_%d' % (block, i),
conv_name,
BottleneckBlock(
num_channels=num_channels,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
shortcut=shortcut,
seg_num=self.seg_num))
seg_num=self.seg_num,
name=conv_name))
num_channels = int(bottleneck_block._num_channels_out)
self.bottleneck_block_list.append(bottleneck_block)
shortcut = True
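Likewise, the block-naming logic above can be mirrored in a small standalone function to show which names it generates for each depth and stage — again for illustration, not extra PR code:

```python
def conv_name_for(layers, block, i):
    """Mirror of the block-naming logic: ResNet-101/152 name the many
    blocks of stage index 2 'res4a', 'res4b1', 'res4b2', ...; all other
    stages use letters: 'res2a', 'res2b', 'res2c', ..."""
    if layers in [101, 152] and block == 2:
        return "res" + str(block + 2) + ("a" if i == 0 else "b" + str(i))
    return "res" + str(block + 2) + chr(97 + i)
```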
@@ -151,9 +179,12 @@ def __init__(self, name_scope, config):
self.class_dim,
act="softmax",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
initializer=fluid.initializer.Uniform(-stdv, stdv),
name="fc_0.w_0"),
bias_attr=fluid.param_attr.ParamAttr(
learning_rate=2.0, regularizer=fluid.regularizer.L2Decay(0.)))
learning_rate=2.0,
regularizer=fluid.regularizer.L2Decay(0.),
name="fc_0.b_0"))

def forward(self, inputs):
y = fluid.layers.reshape(
81 changes: 81 additions & 0 deletions dygraph/tsm/reader_utils.py
@@ -0,0 +1,81 @@
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.

import pickle
import cv2
import numpy as np
import random


class ReaderNotFoundError(Exception):
"Error: reader not found"

def __init__(self, reader_name, avail_readers):
super(ReaderNotFoundError, self).__init__()
self.reader_name = reader_name
self.avail_readers = avail_readers

def __str__(self):
msg = "Reader {} Not Found.\nAvailable readers:\n".format(
self.reader_name)
for reader in self.avail_readers:
msg += " {}\n".format(reader)
return msg


class DataReader(object):
"""data reader for video input"""

def __init__(self, model_name, mode, cfg):
self.name = model_name
self.mode = mode
self.cfg = cfg

def create_reader(self):
"""Not implemented"""
pass

def get_config_from_sec(self, sec, item, default=None):
if sec.upper() not in self.cfg:
return default
return self.cfg[sec.upper()].get(item, default)


class ReaderZoo(object):
def __init__(self):
self.reader_zoo = {}

def regist(self, name, reader):
assert reader.__base__ == DataReader, "Unknown reader type {}".format(
type(reader))
self.reader_zoo[name] = reader

def get(self, name, mode, cfg):
for k, v in self.reader_zoo.items():
if k == name:
return v(name, mode, cfg)
raise ReaderNotFoundError(name, self.reader_zoo.keys())


# singleton reader_zoo
reader_zoo = ReaderZoo()


def regist_reader(name, reader):
reader_zoo.regist(name, reader)


def get_reader(name, mode, cfg):
reader_model = reader_zoo.get(name, mode, cfg)
return reader_model.create_reader()
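The registry above can be exercised end to end. The following self-contained mirror of reader_utils.py shows the intended registration flow; UCF101Reader here is a stand-in name — the PR's real reader lives in ucf101_reader.py:

```python
# Minimal standalone mirror of reader_utils.py's registry pattern.

class DataReader(object):
    """Base class every registered reader must directly subclass."""

    def __init__(self, model_name, mode, cfg):
        self.name, self.mode, self.cfg = model_name, mode, cfg

    def create_reader(self):
        raise NotImplementedError


class ReaderNotFoundError(Exception):
    pass


class ReaderZoo(object):
    def __init__(self):
        self.reader_zoo = {}

    def regist(self, name, reader):
        # only direct DataReader subclasses may be registered
        assert reader.__base__ == DataReader
        self.reader_zoo[name] = reader

    def get(self, name, mode, cfg):
        if name in self.reader_zoo:
            return self.reader_zoo[name](name, mode, cfg)
        raise ReaderNotFoundError(name)


reader_zoo = ReaderZoo()


class UCF101Reader(DataReader):  # stand-in for the reader in ucf101_reader.py
    def create_reader(self):
        return "reader for {} in {} mode".format(self.name, self.mode)


reader_zoo.regist("TSM", UCF101Reader)
```

Keeping a module-level singleton zoo lets tsm.py look readers up by the model name in its config, so adding a dataset means registering one new reader class rather than touching the training loop.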
1 change: 1 addition & 0 deletions dygraph/tsm/run_ucf101.sh
@@ -0,0 +1 @@
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames tsm.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True
1 change: 1 addition & 0 deletions dygraph/tsm/run_ucf101_imagenet.sh
@@ -0,0 +1 @@
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 18989 --log_dir ./mylog.ucf101.frames.imagenet train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=./ResNet50_pretrained/
1 change: 1 addition & 0 deletions dygraph/tsm/run_ucf101_k400.sh
@@ -0,0 +1 @@
CUDA_VISIBLE_DEVICES=4,5,6,7 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400 train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=k400_wei/TSM.pdparams
1 change: 1 addition & 0 deletions dygraph/tsm/run_ucf101_k400_sing.sh
@@ -0,0 +1 @@
CUDA_VISIBLE_DEVICES=1 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400.sing train.py --config=./tsm_ucf101_sing.yaml --use_gpu=True --use_data_parallel=False --weights=k400_wei/TSM.pdparams