This repo contains several models for video-based human action recognition, including C3D, implemented in PyTorch (0.4.0). Currently, we train the model on the Breakfast Action Dataset.
The code was tested with pip and Python 3.5.
- Clone the repo:

  ```shell
  git clone https://github.com/cantonioupao/pytorch-human_action_recognition_breakfast_dataset-C3D_model_implementation.git
  cd pytorch-human_action_recognition_breakfast_dataset-C3D_model_implementation
  ```
- Install dependencies:

  For the PyTorch dependency, see pytorch.org for more details. For custom dependencies:

  ```shell
  conda install opencv
  pip install tqdm scikit-learn tensorboardX
  ```
- Download the pretrained model from BaiduYun or GoogleDrive. Currently, only a pretrained model for C3D is supported.
- Configure your dataset and pretrained model path in mypath.py.
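As a point of reference, `mypath.py` in similar PyTorch video-recognition repos usually exposes a small `Path` helper; the sketch below is a hypothetical illustration of that shape, and the directory strings are placeholders you must replace with your own paths:

```python
# Hypothetical sketch of mypath.py; the directory strings are placeholders,
# not the repo's actual values.
class Path(object):
    @staticmethod
    def db_dir(database):
        if database == 'breakfast':
            root_dir = '/path/to/Breakfast'   # raw .avi videos + label files
            output_dir = '/path/to/break'     # preprocessed frame folders
            return root_dir, output_dir
        raise NotImplementedError('Dataset {} not configured.'.format(database))

    @staticmethod
    def model_dir():
        # location of the downloaded pretrained C3D weights
        return '/path/to/c3d-pretrained.pth'
```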
- You can choose different models and datasets in train.py. To train the model, run:

  ```shell
  python train.py
  ```
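The model and dataset choices typically live in a few variables near the top of `train.py`; the names and values below are illustrative assumptions, not the repo's actual configuration:

```python
# Illustrative assumption of the hyperparameters set in train.py;
# the actual variable names and defaults in the repo may differ.
nEpochs = 100          # number of training epochs
batch_size = 20        # clips per batch
clip_len = 16          # frames per clip fed to C3D
dataset = 'breakfast'  # which dataset to train on
modelName = 'C3D'      # which architecture to build

# only a pretrained C3D model is currently provided
assert modelName in ('C3D',)
```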
I used the Breakfast Action Dataset, downloaded from the Serre Lab: http://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/
Dataset directory tree for the downloaded Breakfast Action Dataset:

```
Breakfast
├── PO3
│   ├── webcam
│   │   ├── cereals.avi
│   │   ├── cereals.txt
│   │   └── ...
│   └── ...
├── PO4
│   ├── stereo
│   │   ├── coffee.avi
│   │   ├── coffee.txt
│   │   └── ...
│   └── ...
└── PO5
    ├── cam1
    │   ├── pancake.avi
    │   ├── pancake.txt
    │   └── ...
    └── ...
```
After pre-processing, the Breakfast Action Dataset output directory "break" is structured as follows:

```
break
├── stir_milk
│   ├── PO3_webcam_milk_123_450
│   │   ├── 00001.jpg
│   │   └── ...
│   └── ...
├── stir_coffee
│   ├── PO4_stereo_coffee_223_320
│   │   ├── 00001.jpg
│   │   └── ...
│   └── ...
└── fryegg
    ├── PO5_cam1_pancake_1_230
    │   ├── 00001.jpg
    │   └── ...
    └── ...
```
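The frame-folder names above encode the person, camera, source video, and frame range of each clip. A small helper like the following (hypothetical, not part of the repo) reproduces that naming scheme:

```python
import os

def clip_dir(output_root, action, person, camera, video, start, end):
    """Build the output directory for one action clip, e.g.
    break/stir_milk/PO3_webcam_milk_123_450 for frames 123-450 of
    PO3/webcam/milk.avi labelled 'stir_milk'. Hypothetical helper."""
    clip_name = '{}_{}_{}_{}_{}'.format(person, camera, video, start, end)
    return os.path.join(output_root, action, clip_name)
```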
These models were trained on a machine with an NVIDIA TITAN X 12 GB GPU. Note that I split the train/val/test data for each dataset using sklearn. If you want to train models using the official train/val/test splits, look at dataset.py and modify it to your needs.
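The sklearn split mentioned above can be sketched as two calls to `train_test_split`; the 60/20/20 ratios and the dummy clip names here are illustrative assumptions, and the actual ratios used by dataset.py may differ:

```python
from sklearn.model_selection import train_test_split

# Illustrative sketch of splitting clip folders with sklearn.
clips = ['clip_{:03d}'.format(i) for i in range(100)]  # dummy clip names

# First carve off 20% for test, then 25% of the remainder for val,
# leaving a 60/20/20 train/val/test split.
train_val, test = train_test_split(clips, test_size=0.2, random_state=42)
train, val = train_test_split(train_val, test_size=0.25, random_state=42)
```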
Currently, I only train the C3D model on the Breakfast Action Dataset. The train/val/test accuracy and loss curves for each experiment are shown below:
- Breakfast Action Dataset

After successfully preprocessing and dividing the Breakfast Action Dataset, the output dataset directory size is 50 GB. The new output directory holds all the action videos (converted to frames) and is organized by the 48 actions rather than the 10 activities. After running train.py and setting the hyperparameters of the framework (e.g. batch_size, number of epochs, clip length), the CMD training and TensorBoard results are shown below.
The overall accuracy for training the framework on the first set of hyperparameters is 30.26%. The actual accuracy of the framework can be tested by selecting any video from the Breakfast Action Dataset at random and running it through inference.py. The results for a random video from the dataset, with the 5 highest-probability actions, are shown below:
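The top-5 readout produced by inference.py amounts to a softmax over the 48 class scores followed by ranking. A minimal self-contained sketch of that step (the labels and scores below are made up for illustration):

```python
import math

def top_k(scores, labels, k=5):
    """Softmax the raw class scores and return the k most probable labels."""
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]
```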