Generative Model-based Feature Knowledge Distillation for Action Recognition

This is the official repository for the paper "Generative Model-based Feature Knowledge Distillation for Action Recognition", accepted at AAAI-24.

Abstract

Knowledge distillation (KD), a technique widely employed in computer vision, has emerged as a de facto standard for improving the performance of small neural networks. However, prevailing KD-based approaches in video tasks primarily focus on designing loss functions and fusing cross-modal information. This overlooks the spatial-temporal feature semantics, resulting in limited advancements in model compression. Addressing this gap, our paper introduces an innovative knowledge distillation framework that uses a generative model to train a lightweight student model. In particular, the framework is organized into two steps: the initial phase is Feature Representation, wherein a generative model-based attention module is trained to represent feature semantics; subsequently, the Generative-based Feature Distillation phase encompasses both Generative Distillation and Attention Distillation, with the objective of transferring attention-based feature semantics with the generative model. The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets, showing considerable improvements on the video action recognition task. Moreover, the effectiveness of our proposed framework is validated on the more intricate video action detection task.

Summary

  • Design a novel attention module that leverages the generative model to represent feature semantics within the 3D-CNN architecture.
  • Build a new framework that, for the first time, introduces the concept of using a generative model to distill attention-based feature knowledge (a minimal sketch of the idea follows below).
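The sketch below illustrates the core idea at a conceptual level only; it is not the training code in this repository. The module and function names (FeatureGenerator, spatial_temporal_attention, distillation_loss) and all hyperparameters are placeholders chosen for illustration.

# Minimal, illustrative sketch of generative-based feature distillation
# (placeholder names and hyperparameters; not the repository's implementation).
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """Toy generative module mapping student features into the teacher feature space."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(student_channels, teacher_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(teacher_channels, teacher_channels, kernel_size=3, padding=1),
        )

    def forward(self, student_feat):
        return self.net(student_feat)

def spatial_temporal_attention(feat):
    """Attention map from channel-averaged 3D features, normalized over space-time."""
    attn = feat.mean(dim=1, keepdim=True)                      # (N, 1, T, H, W)
    return F.softmax(attn.flatten(2), dim=-1).view_as(attn)

def distillation_loss(student_feat, teacher_feat, generator, alpha=1.0, beta=1.0):
    """Generative distillation (feature reconstruction) plus attention distillation."""
    generated = generator(student_feat)                        # student -> teacher space
    gen_loss = F.mse_loss(generated, teacher_feat.detach())
    attn_loss = F.mse_loss(spatial_temporal_attention(generated),
                           spatial_temporal_attention(teacher_feat.detach()))
    return alpha * gen_loss + beta * attn_loss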

Getting Started

Environment

  • Python 3.7
  • PyTorch == 1.4.0 (please make sure your PyTorch version is 1.4)
  • NVIDIA GPU

Setup

cd detection
pip3 install -r requirements.txt
python3 setup.py develop

Data Preparation

For recognition

We follow the data preparation of the repos PyTorch implementation of popular two-stream frameworks for video action recognition and Video Dataset Preprocess; please star them if you find them helpful.

Environment needed to process the datasets:

  • OS: Ubuntu 16.04
  • Python: 3.5
  • CUDA: 8.0
  • OpenCV3
  • dense_flow

To successfully install dense_flow (branch opencv-3.1), you probably need to install OpenCV 3 with opencv_contrib. (With opencv-2.4.13, dense_flow installs more easily without opencv_contrib, but you should run the code in this repository under OpenCV 3 to avoid errors.)

UCF101

Download the UCF101 data and use unrar x UCF101.rar to extract the videos.

Convert videos to frames and extract optical flow:

python build_of.py --src_dir ./UCF-101 --out_dir ./ucf101_frames --df_path <path to dense_flow>

Build file lists for training and validation:

python build_file_list.py --frame_path ./ucf101_frames --out_list_path ./settings
HMDB51
  • Download videos and train/test splits here. Make sure to arrange the video files in the following structure:
  HMDB51
  ├── brush_hair
  │   ├── April_09_brush_hair_u_nm_np1_ba_goo_0.avi
  │   └── ...
  ├── cartwheel
  │   ├── (Rad)Schlag_die_Bank!_cartwheel_f_cm_np1_le_med_0.avi
  │   └── ...
  ├── catch
  │   ├── 96-_Torwarttraining_1_catch_f_cm_np1_le_bad_0.avi
  │   └── ...
  • Convert from .avi to .jpg files using utils/video2jpg_ucf101_hmdb51.py
python utils/video2jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory
  • Generate n_frames files using utils/n_frames_ucf101_hmdb51.py (an illustrative sketch of this step appears after this list)
python utils/n_frames_ucf101_hmdb51.py jpg_video_directory
  • Generate annotation files in txt format using utils/hmdb_gen_txt.py; annotation_dir_path contains brush_hair_test_split1.txt, ...
python utils/hmdb_gen_txt.py annotation_dir_path jpg_video_directory outdir
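For reference, the n_frames step simply counts the extracted image_XXXXX.jpg frames in each class/video folder and writes the count into an n_frames file, which is what produces the layout shown below. The snippet is an illustrative approximation of this step, not a copy of utils/n_frames_ucf101_hmdb51.py.

# Illustrative approximation of the n_frames step (not the repository script):
# count image_XXXXX.jpg frames per video folder and write the count to n_frames.
import os
import sys

def write_n_frames(jpg_video_directory):
    for class_name in sorted(os.listdir(jpg_video_directory)):
        class_dir = os.path.join(jpg_video_directory, class_name)
        if not os.path.isdir(class_dir):
            continue
        for video_name in sorted(os.listdir(class_dir)):
            video_dir = os.path.join(class_dir, video_name)
            if not os.path.isdir(video_dir):
                continue
            n_frames = len([f for f in os.listdir(video_dir)
                            if f.startswith('image_') and f.endswith('.jpg')])
            with open(os.path.join(video_dir, 'n_frames'), 'w') as fh:
                fh.write(str(n_frames))

if __name__ == '__main__':
    write_n_frames(sys.argv[1])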

After pre-processing, the image output directory structure is as follows:

  hmdb51_n_frames
  ├── brush_hair
  │   ├── April_09_brush_hair_u_nm_np1_ba_goo_0
  │   │   ├── image_00001.jpg
  │   │   ├── ...
  │   │   └── n_frames
  │   └── ...
  ├── cartwheel
  │   ├── (Rad)Schlag_die_Bank!_cartwheel_f_cm_np1_le_med_0
  │   │   ├── image_00001.jpg
  │   │   ├── ...
  │   │   └── n_frames
  │   └── ...
  ├── catch
  │   ├── 96-_Torwarttraining_1_catch_f_cm_np1_le_bad_0
  │   │   ├── image_00001.jpg
  │   │   ├── ...
  │   │   └── n_frames
  │   └── ...

The train/test split directory has the following structure:

  hmdb51_TrainTestlist
  ├── hmdb51_train.txt
  ├── hmdb51_test.txt
  └── hmdb51_val.txt

For detection

We follow the data preparation of AFSD; please star it if you find it helpful.

  • THUMOS14 RGB data:
  1. Download pre-processed RGB npy data (13.7GB): Weiyun
  2. Unzip the RGB npy data to ./datasets/thumos14/validation_npy/ and ./datasets/thumos14/test_npy/
  • THUMOS14 flow data:
  1. Because generating flow data for THUMOS14 is time-consuming, we provide pre-processed flow data (3.4GB) to make the flow model easier to run: Google Drive, Weiyun
  2. Unzip the flow npy data to ./datasets/thumos14/validation_flow_npy/ and ./datasets/thumos14/test_flow_npy/

If you want to generate the npy data yourself, please refer to the following guidelines (an illustrative sketch of the RGB conversion step follows the list):

  • Manual RGB data generation:
  1. To construct THUMOS14 RGB npy inputs, please download the THUMOS14 training and testing videos. Training videos: https://storage.googleapis.com/thumos14_files/TH14_validation_set_mp4.zip Testing videos: https://storage.googleapis.com/thumos14_files/TH14_Test_set_mp4.zip (unzip password is THUMOS14_REGISTERED)
  2. Move the training videos to ./datasets/thumos14/validation/ and the testing videos to ./datasets/thumos14/test/
  3. Run the data processing script: python3 AFSD/common/video2npy.py configs/thumos14.yaml
  • Manual flow data generation:
  1. If you need to generate the flow data manually, first install denseflow.
  2. Prepare the pre-processed RGB data.
  3. Check and run the script: python3 AFSD/common/gen_denseflow_npy.py configs/thumos14_flow.yaml
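As a rough orientation, generating an RGB npy file amounts to decoding a video's frames and stacking them into a single array saved as .npy. The snippet below is a simplified illustration of that idea; the actual sampling rate, resolution, and file layout are defined by AFSD/common/video2npy.py and configs/thumos14.yaml, not by this sketch.

# Simplified illustration of converting one video into an RGB .npy array
# (not AFSD/common/video2npy.py; the resolution here is a placeholder).
import os
import cv2
import numpy as np

def video_to_npy(video_path, out_dir, size=(112, 112)):
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV decodes as BGR
        frames.append(cv2.resize(frame, size))
    cap.release()
    arr = np.stack(frames).astype(np.uint8)             # (num_frames, H, W, 3)
    name = os.path.splitext(os.path.basename(video_path))[0]
    np.save(os.path.join(out_dir, name + '.npy'), arr)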

Inference

We provide pretrained models, including Top-I3D for UCF101 & HMDB51 and Top-AFSD for THUMOS14: Google Drive

For UCF101:

cd recognition
python3 test_Top_I3D_att_fusion.py configs/stu_thumos14.yaml

For HMDB51:

cd recognition
python3 test_Top_I3D_att_fusion_hmdb51.py configs/stu_thumos14.yaml

For THUMOS14:

cd detection
# run RGB model
python3 GKD/thumos14/test.py configs/thumos14.yaml --checkpoint_path=models/thumos14/thumos14-rgb.ckpt --output_json=thumos14_rgb.json

# run flow model
python3 GKD/thumos14/test.py configs/thumos14_flow.yaml --checkpoint_path=models/thumos14/thumos14-flow.ckpt --output_json=thumos14_flow.json

# run fusion (RGB + flow) model
python3 GKD/thumos14/test.py configs/thumos14.yaml --fusion --output_json=thumos14_fusion.json

# evaluate THUMOS14 fusion result as example
python3 GKD/thumos14/eval.py output/thumos14_fusion.json

Training

For UCF101:

cd recognition
python3 train_Top_I3D_KD.py configs/stu_thumos14.yaml

For HMDB51:

cd recognition
python3 train_Top_I3D_KD_hmdb51.py configs/stu_thumos14.yaml

For THUMOS14:

cd detection
# train the RGB model
python3 GKD/thumos14/train.py configs/thumos14.yaml --lw=10 --cw=1 --piou=0.5

# train the flow model
python3 GKD/thumos14/train.py configs/thumos14_flow.yaml --lw=10 --cw=1 --piou=0.5
