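# Video Classification with the TSM Model
## Requirements

With a GPU, install `paddlepaddle-gpu==2.2.1`; without one, install `paddlepaddle==2.2.1`.

See requirements.txt for the full list. The pinned versions are not strict, so nearby versions of the libraries should also work.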
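
The choice between the two packages can be scripted. The sketch below is a minimal helper and not part of this repository: `gpu_probably_available` and `paddle_requirement` are hypothetical names, and treating the presence of `nvidia-smi` on `PATH` as a sign of a usable GPU is an assumption that may not hold on every machine.

```python
import shutil


def gpu_probably_available() -> bool:
    """Heuristic (assumption): an NVIDIA driver is present if `nvidia-smi` is on PATH."""
    return shutil.which("nvidia-smi") is not None


def paddle_requirement(has_gpu: bool, version: str = "2.2.1") -> str:
    """Return the pip requirement string for the GPU or CPU build of PaddlePaddle."""
    package = "paddlepaddle-gpu" if has_gpu else "paddlepaddle"
    return f"{package}=={version}"


if __name__ == "__main__":
    # Print the requirement so it can be passed to `pip install`.
    print(paddle_requirement(gpu_probably_available()))
```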

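## File Downloads

The violence dataset (binary classification) can be downloaded from the link below. After downloading, place it in the `data` folder.

https://aistudio.baidu.com/aistudio/datasetdetail/125525

The pretrained weights required for training can be downloaded from the link below. After downloading, place them in the project root directory.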

https://videotag.bj.bcebos.com/PaddleVideo-release2.1/TSM/TSM_k400.pdparams

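## Training Steps
1. Prepare the dataset
Download the violence video dataset and place it in the `data` folder. The dataset should be laid out as follows:

```
├── data
│   ├── Violence
│   │   ├── V_1.mp4
│   │   ├── V_2.mp4
│   └── NonViolence
│       ├── NV_1.mp4
│       ├── NV_2.mp4
```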

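2. Download the pretrained weights
After downloading, place them in the project root directory.

3. Process the dataset
Run `get_annotation.py` to generate the dataset index files, which are written to the `annotation` folder. The generated format looks like this: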
```
data/data/NonViolence/NV_296.mp4 0
data/data/NonViolence/NV_462.mp4 0
data/data/NonViolence/NV_985.mp4 0
```
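4. Train
Run `train.py` to start training. Without a GPU, consider setting the `log_interval` variable in `setting.py` to 1 so that progress is easier to follow.

Training writes a number of model files to the `output` folder.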

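5. Predict
First, set the `model_file` variable in `predict.py` to point to your own model file.

Then run `predict.py` to start prediction. By default it predicts on the validation set; to predict on other data, create an index file similar to `annotation/violence_val_videos.txt`.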
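
The index file from step 3 pairs each video path with an integer label. `get_annotation.py` is not reproduced here, so the following is only a sketch of how such a file could be produced from the folder layout in step 1. The function name `build_annotation` and the label `1` for `Violence` are assumptions (the sample lines only confirm `0` for `NonViolence`), and the repository's script may also split the data into training and validation files, which this sketch does not do.

```python
from pathlib import Path

# Label 0 for NonViolence matches the sample lines; label 1 for Violence is an assumption.
LABELS = {"NonViolence": 0, "Violence": 1}


def build_annotation(data_root, out_file, suffix=".mp4"):
    """Write one '<video path> <label>' line per video and return the lines."""
    data_root = Path(data_root)
    lines = []
    for class_name, label in LABELS.items():
        class_dir = data_root / class_name
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder instead of failing
        for video in sorted(class_dir.glob(f"*{suffix}")):
            lines.append(f"{video.as_posix()} {label}")
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return lines
```

For example, `build_annotation("data", "annotation/all_videos.txt")` would list every `.mp4` file under `data/NonViolence` and `data/Violence`; the output file name here is made up for illustration.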

## I Just Want to Run Your Code
Fork the project on AI Studio and you can run my code right away.

Project link: https://aistudio.baidu.com/aistudio/projectdetail/3415438

## Q & A
Q: How do I enable the top-5 accuracy calculation?

A: Uncomment `utils.py` line 134 and `model.py` line 437, and comment out `utils.py` lines 92-93.
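
For context, top-k accuracy counts a sample as correct when its true label is among the k highest-scoring classes. The sketch below illustrates the idea in plain Python; `topk_accuracy` is a hypothetical helper, not the function used in `utils.py`. Note that with the two-class violence dataset any k of 2 or more covers every class, so top-5 only becomes informative on datasets with more than five classes.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of integer class ids, one per sample.
    """
    if not scores:
        return 0.0
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Rank class ids from highest to lowest score and keep the first k.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)
```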

Q: I am using my own dataset. Why do I get the error below?
```
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)
```
A: The system could not find the video files listed in your annotation file. The problem is with these paths:
```
data/data/NonViolence/NV_296.mp4(not found) 0
data/data/NonViolence/NV_462.mp4(not found) 0
data/data/NonViolence/NV_985.mp4(not found) 0
```
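
Before training, the annotation file can be checked for paths that do not exist. This is a minimal sketch that assumes each line has the form `<video path> <label>`, as in the samples in this README; `find_missing_videos` is a hypothetical helper, not part of the repository.

```python
import os


def find_missing_videos(annotation_file):
    """Return the video paths listed in the annotation file that are not on disk."""
    missing = []
    with open(annotation_file, encoding="utf-8") as f:
        for line in f:
            # Split from the right so paths containing spaces stay intact.
            parts = line.strip().rsplit(maxsplit=1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            path, _label = parts
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

Calling `find_missing_videos("annotation/violence_val_videos.txt")` from the project root returns an empty list when every path resolves. Relative paths are resolved against the current working directory, so run the check from the same directory as training.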

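Q: Why does the program get killed partway through a run?

A: Most likely some of your videos are too large (for example, over 10 MB). Try reducing `batch_size`; if that does not help, delete the offending video.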
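
To locate the videos that may be responsible, the files can be listed by size. The sketch below uses the 10 MB figure from the answer as its default threshold; `find_large_videos` is a hypothetical helper, and the `data` folder name follows the layout described in the training steps.

```python
from pathlib import Path


def find_large_videos(data_root, limit_mb=10.0, suffix=".mp4"):
    """Return (path, size in MB) for every video larger than limit_mb, largest first."""
    large = []
    for video in Path(data_root).rglob(f"*{suffix}"):
        size_mb = video.stat().st_size / (1024 * 1024)
        if size_mb > limit_mb:
            large.append((video.as_posix(), size_mb))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For instance, `find_large_videos("data")` walks both class folders and returns the oversized files, which can then be re-encoded or removed from the annotation files.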

# AI Studio: Training Starts

In [None]:
# Unzip the dataset
!unzip -q data/data125597/data.zip -d data
!mv data/data/* data/

In [None]:
# Download the pretrained weights
!wget https://videotag.bj.bcebos.com/PaddleVideo-release2.1/TSM/TSM_k400.pdparams

--2022-01-15 21:17:37--  https://videotag.bj.bcebos.com/PaddleVideo-release2.1/TSM/TSM_k400.pdparams
Resolving videotag.bj.bcebos.com (videotag.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to videotag.bj.bcebos.com (videotag.bj.bcebos.com)|182.61.200.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 147734703 (141M) [application/octet-stream]
Saving to: ‘TSM_k400.pdparams.1’


2022-01-15 21:17:44 (19.5 MB/s) - ‘TSM_k400.pdparams.1’ saved [147734703/147734703]



In [None]:
import os
import cv2
import copy
import time
import paddle
import random
import traceback

import numpy as np
import os.path as osp
import paddle.nn as nn
import paddle.nn.functional as F
import paddle.nn.initializer as init

from PIL import Image
from tqdm import tqdm
from paddle import ParamAttr
from collections import OrderedDict
from collections.abc import Sequence
from paddle.regularizer import L2Decay
from paddle.nn import (Conv2D, BatchNorm2D, Linear, Dropout, MaxPool2D,
                       AdaptiveAvgPool2D)

from settings import *
from data_preprocessing import *
from model import *
from utils import *

In [None]:
def train_model(validate=True):
    """Train the model.
    Args:
        validate (bool): Whether to run evaluation after each epoch. Default: True.
    """
    output_dir = f"./output/{model_name}"
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # 1. Construct model
    tsm = ResNetTSM(pretrained=pretrained,
                    layers=layers,
                    num_seg=num_seg)
    head = TSMHead(num_classes=num_classes,
                   in_channels=in_channels,
                   drop_ratio=drop_ratio)
    model = Recognizer2D(backbone=tsm, head=head)

    # 2. Construct dataset and dataloader
    train_pipeline = Compose(train_mode=True)
    train_dataset = VideoDataset(file_path=train_file_path,
                                 pipeline=train_pipeline,
                                 suffix=suffix)
    train_sampler = paddle.io.DistributedBatchSampler(train_dataset,
                                                      batch_size=batch_size,
                                                      shuffle=train_shuffle,
                                                      drop_last=True)  

    train_loader = paddle.io.DataLoader(train_dataset,
                                        batch_sampler=train_sampler,
                                        places=paddle.set_device(device),
                                        return_list=return_list)
    total_steps = len(train_dataset) / batch_size

    if validate:
        valid_pipeline = Compose(train_mode=False)
        valid_dataset = VideoDataset(file_path=valid_file_path,
                                     pipeline=valid_pipeline,
                                     suffix=suffix)
        valid_sampler = paddle.io.DistributedBatchSampler(valid_dataset,
                                                          batch_size=batch_size,
                                                          shuffle=valid_shuffle,
                                                          drop_last=True)
        valid_loader = paddle.io.DataLoader(valid_dataset,
                                            batch_sampler=valid_sampler,
                                            places=paddle.set_device(device),
                                            return_list=return_list)

    # 3. Construct solver.
    lr = paddle.optimizer.lr.PiecewiseDecay(boundaries=boundaries, values=values)
    optimizer = paddle.optimizer.Momentum(
        learning_rate=lr,
        momentum=momentum,
        parameters=model.parameters(),
        grad_clip=paddle.nn.ClipGradByGlobalNorm(clip_norm=clip_norm)
    )

    # 4. Train Model
    best = 0.
    for epoch in range(0, epochs):
        model.train()
        record_list = build_record(framework)
        tic = time.time()
        for i, data in enumerate(train_loader):
            record_list['reader_time'].update(time.time() - tic)

            # 4.1 forward
            outputs = model.train_step(data)

            # 4.2 backward
            avg_loss = outputs['loss']
            avg_loss.backward()

            # 4.3 minimize
            optimizer.step()
            optimizer.clear_grad()

            # log record
            record_list['lr'].update(optimizer._global_learning_rate(), batch_size)
            for name, value in outputs.items():
                record_list[name].update(value, batch_size)

            record_list['batch_time'].update(time.time() - tic)
            tic = time.time()

            if i % log_interval == 0:
                ips = "ips: {:.5f} instance/sec.".format(
                    batch_size / record_list["batch_time"].val)
                log_batch(record_list, i, epoch + 1, epochs, "train", ips, total_steps)

        # learning rate epoch step
        lr.step()

        ips = "avg_ips: {:.5f} instance/sec.".format(
            batch_size * record_list["batch_time"].count /
            record_list["batch_time"].sum)
        log_epoch(record_list, epoch + 1, "train", ips)

        def evaluate(best):
            model.eval()
            record_list = build_record(framework)
            record_list.pop('lr')
            tic = time.time()
            for i, data in enumerate(valid_loader):
                outputs = model.val_step(data)

                # log_record
                for name, value in outputs.items():
                    record_list[name].update(value, batch_size)

                record_list['batch_time'].update(time.time() - tic)
                tic = time.time()

                if i % log_interval == 0:
                    ips = "ips: {:.5f} instance/sec.".format(
                        batch_size / record_list["batch_time"].val)
                    log_batch(record_list, i, epoch + 1, epochs, "val", ips, total_steps)

            ips = "avg_ips: {:.5f} instance/sec.".format(
                batch_size * record_list["batch_time"].count /
                record_list["batch_time"].sum)
            log_epoch(record_list, epoch + 1, "val", ips)

            best_flag = False
            for top_flag in ['hit_at_one', 'top1']:
                if record_list.get(
                        top_flag) and record_list[top_flag].avg > best:
                    best = record_list[top_flag].avg
                    best_flag = True
            return best, best_flag

        # 5. Validation (only when enabled; valid_loader does not exist otherwise)
        if validate:
            with paddle.no_grad():
                best, save_best_flag = evaluate(best)
            # save best
            if save_best_flag:
                paddle.save(optimizer.state_dict(),
                     osp.join(output_dir, model_name + "_best.pdopt"))
                paddle.save(model.state_dict(),
                     osp.join(output_dir, model_name + "_best.pdparams"))
                print(
                    f"Saved the best model (top1 acc): {int(best * 10000) / 10000}"
                )

        # 6. Save model and optimizer
        if epoch % save_interval == 0 or epoch == epochs - 1:
            paddle.save(
                optimizer.state_dict(),
                osp.join(output_dir,
                         model_name + f"_epoch_{epoch+1:05d}.pdopt"))
            paddle.save(
                model.state_dict(),
                osp.join(output_dir,
                         model_name + f"_epoch_{epoch+1:05d}.pdparams"))

    print(f'training {model_name} finished')
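
The `PiecewiseDecay` scheduler created in the solver step of `train_model` keeps the learning rate constant between boundaries and switches to the next value once a boundary epoch is reached. The following plain-Python sketch mirrors that rule for illustration only; `piecewise_lr` is a hypothetical helper, and the boundaries and values in the note below are example numbers, not the ones configured for this project.

```python
from bisect import bisect_right


def piecewise_lr(epoch, boundaries, values):
    """Learning rate for an epoch under a piecewise-constant schedule.

    Requires len(values) == len(boundaries) + 1. values[i] applies while
    epoch < boundaries[i]; the last value applies from the final boundary on.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    return values[bisect_right(boundaries, epoch)]
```

With the example `boundaries=[3, 6]` and `values=[0.001, 0.0001, 0.00001]`, epochs 0 to 2 use 0.001, epochs 3 to 5 use 0.0001, and later epochs use 0.00001.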


In [8]:
# If 'ValueError: parameter name [conv1_weights] have be been used' appears while running this code,
# click the third button at the top, 'Restart and Run All', to fix it.
train_model(True)

W0116 11:11:16.944785   179 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0116 11:11:16.950973   179 device_context.cc:422] device: 0, cuDNN Version: 7.6.

  0%|          | 0/265 [00:00<?, ?it/s]
Loading conv._conv.weight:
Loading conv._batch_norm.weight:
Loading conv._batch_norm.bias:
Loading conv._batch_norm._mean:
Loading conv._batch_norm._variance:
Loading res2a.conv0._conv.weight:
Loading res2a.conv0._batch_norm.weight:
  3%|▎         | 8/265 [00:00<00:03, 78.94it/s]
Loading res2a.conv0._batch_norm._mean:
Loading res2a.conv0._batch_norm._variance:
Loading res2a.conv1._conv.weight:
Loading res2a.conv1._batch_norm.weight:
Loading res2a.conv1._batch_norm.bias:
Loading res2a.conv1._batch_norm._mean:
Loading res2a.conv1._batch_norm._variance:
Loading res2a.conv2._conv.weight:
  6%|▋         | 17/265 [00:00<00:03, 80.38it/

epoch:[  1/10 ] train step:0 / 344.25 loss: 0.68812 lr: 0.001000 elapse: 1.016 reader: 0.861 top1: 0.50000s ips: 3.93704 instance/sec.
epoch:[  1/10 ] train step:10 / 344.25 loss: 0.15976 lr: 0.001000 elapse: 2.660 reader: 2.534 top1: 1.00000s ips: 1.50360 instance/sec.
epoch:[  1/10 ] train step:20 / 344.25 loss: 0.29059 lr: 0.001000 elapse: 1.666 reader: 1.539 top1: 0.75000s ips: 2.40090 instance/sec.
epoch:[  1/10 ] train step:30 / 344.25 loss: 0.08492 lr: 0.001000 elapse: 1.884 reader: 1.760 top1: 1.00000s ips: 2.12275 instance/sec.
epoch:[  1/10 ] train step:40 / 344.25 loss: 0.05763 lr: 0.001000 elapse: 0.340 reader: 0.212 top1: 1.00000s ips: 11.75324 instance/sec.
epoch:[  1/10 ] train step:50 / 344.25 loss: 1.64876 lr: 0.001000 elapse: 1.870 reader: 1.746 top1: 0.50000s ips: 2.13850 instance/sec.
epoch:[  1/10 ] train step:60 / 344.25 loss: 1.43403 lr: 0.001000 elapse: 3.113 reader: 2.988 top1: 0.50000s ips: 1.28488 instance/sec.
epoch:[  1/10 ] train step:70 / 344.25 loss: 0.4

KeyboardInterrupt: 

In [None]:
def test_model(weights):
    # 1. Construct model
    tsm = ResNetTSM(pretrained=None,
                    layers=layers,
                    num_seg=num_seg)
    head = TSMHead(num_classes=num_classes,
                   in_channels=in_channels,
                   drop_ratio=drop_ratio)
    model = Recognizer2D(backbone=tsm, head=head)

    # 2. Construct dataset and dataloader.
    test_pipeline = Compose(train_mode=False)
    test_dataset = VideoDataset(file_path=valid_file_path,
                                     pipeline=test_pipeline,
                                     suffix=suffix)
    test_sampler = paddle.io.DistributedBatchSampler(test_dataset,
                                                        batch_size=batch_size,
                                                        shuffle=valid_shuffle,
                                                        drop_last=True)
    test_loader = paddle.io.DataLoader(test_dataset,
                                        batch_sampler=test_sampler,
                                        places=paddle.set_device(device),
                                        return_list=return_list)

    model.eval()

    state_dicts = paddle.load(weights)
    model.set_state_dict(state_dicts)

    # add params to metrics
    data_size = len(test_dataset)
    
    metric = CenterCropMetric(data_size=data_size, batch_size=batch_size)
    for batch_id, data in enumerate(test_loader):
        outputs = model.test_step(data)
        metric.update(batch_id, data, outputs)
    metric.accumulate()
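
The samplers in this notebook are created with `drop_last=True`, so in a single-process run a final batch smaller than `batch_size` is discarded and those videos are neither trained on nor evaluated. The sketch below shows the arithmetic; `batches_per_epoch` is a hypothetical helper and the numbers in the note are illustrative, not the size of the violence dataset.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=True):
    """Return (number of batches, number of samples left out)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        full = num_samples // batch_size
        return full, num_samples - full * batch_size
    # Without drop_last the final, smaller batch is kept.
    return math.ceil(num_samples / batch_size), 0
```

For instance, 10 videos with `batch_size=4` give 2 batches and leave 2 videos out when `drop_last=True`, but 3 batches with nothing left out when it is `False`.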


In [None]:
# Test on the validation set
# If 'ValueError: parameter name [conv1_weights] have be been used' appears while running this code,
# click the third button at the top, 'Restart and Run All', to fix it.
model_file = './output/TSM/TSM_best.pdparams'
test_model(model_file)

W0115 23:26:01.676389 22490 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0115 23:26:01.681787 22490 device_context.cc:422] device: 0, cuDNN Version: 7.6.


[TEST] Processing batch 0/294 ...
[TEST] Processing batch 20/294 ...
[TEST] Processing batch 40/294 ...
[TEST] Processing batch 60/294 ...
[TEST] Processing batch 80/294 ...
[TEST] Processing batch 100/294 ...
[TEST] Processing batch 120/294 ...
[TEST] Processing batch 140/294 ...
[TEST] Processing batch 160/294 ...
[TEST] Processing batch 180/294 ...
[TEST] Processing batch 200/294 ...
[TEST] Processing batch 220/294 ...
[TEST] Processing batch 240/294 ...
[TEST] Processing batch 260/294 ...
[TEST] Processing batch 280/294 ...
[TEST] finished, avg_acc1= 0.9812925457954407, avg_acc5= 0.0 


In [None]:
def inference(model_file):
    # 1. Construct model
    tsm = ResNetTSM(pretrained=None,
                    layers=layers,
                    num_seg=num_seg)
    head = TSMHead(num_classes=num_classes,
                   in_channels=in_channels,
                   drop_ratio=drop_ratio)
    model = Recognizer2D(backbone=tsm, head=head)

    # 2. Construct dataset and dataloader.
    test_pipeline = Compose(train_mode=False)
    test_dataset = VideoDataset(file_path=valid_file_path,
                                     pipeline=test_pipeline,
                                     suffix=suffix)
    test_sampler = paddle.io.DistributedBatchSampler(test_dataset,
                                                     batch_size=1,
                                                     shuffle=True,
                                                     drop_last=True)
    test_loader = paddle.io.DataLoader(test_dataset,
                                       batch_sampler=test_sampler,
                                       places=paddle.set_device(device),
                                       return_list=return_list)

    model.eval()
    state_dicts = paddle.load(model_file)
    model.set_state_dict(state_dicts)

    for batch_id, data in enumerate(test_loader):
        _, labels = data
        outputs = model.test_step(data)
        scores = F.softmax(outputs)
        class_id = paddle.argmax(scores, axis=-1)
        pred = class_id.numpy()[0]
        label = labels.numpy()[0][0]
        
        print('True class: {}, predicted class: {}'.format(label, pred))
        if batch_id > 5:
            break

# Run inference
# If 'ValueError: parameter name [conv1_weights] have be been used' appears while running this code,
# click the third button at the top, 'Restart and Run All', to fix it.
model_file = './output/TSM/TSM_best.pdparams'
inference(model_file)

W0115 23:30:21.339340  4658 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0115 23:30:21.344605  4658 device_context.cc:422] device: 0, cuDNN Version: 7.6.


True class: 0, predicted class: 0
True class: 0, predicted class: 0
True class: 0, predicted class: 0
True class: 1, predicted class: 1
True class: 0, predicted class: 0
True class: 0, predicted class: 0
True class: 0, predicted class: 0
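
The inference loop turns the model's raw outputs into probabilities with softmax and picks the most likely class with argmax. Here is a plain-Python sketch of those two steps, independent of Paddle; `softmax` and `predict_class` are hypothetical helper names and any scores passed to them are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return (class id, probability) of the highest-scoring class."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether argmax is taken over the raw outputs or over the probabilities; the softmax step matters only when a confidence value is needed.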
