This work replicates the best experiment conducted by Aya Ismail et al., as available in their GitHub repository vkhoi/KTH-Action-Recognition. It reproduces the results for recognising KTH action classes with a CNN and optical flow, with the code updated to use the PyTorch Lightning framework.
Official web page of the KTH dataset: link. The KTH dataset consists of videos of humans performing 6 types of action: boxing, handclapping, handwaving, jogging, running, and walking. There are 25 subjects performing these actions in 4 scenarios: outdoors, outdoors with scale variation, outdoors with different clothes, and indoors. The total number of videos is therefore 25 × 4 × 6 = 600. The videos have a frame rate of 25 fps and a resolution of 160x120.
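The dataset layout above can be enumerated in a few lines. The sketch below is for illustration only (it is not code from the repository); the `personXX_action_dY` naming follows the KTH dataset's own file-naming convention.

```python
# Illustrative sketch: enumerate the KTH dataset structure described above.
ACTIONS = ["boxing", "handclapping", "handwaving",
           "jogging", "running", "walking"]
NUM_SUBJECTS = 25   # person01 .. person25
NUM_SCENARIOS = 4   # d1..d4: outdoors, scale variation, clothes, indoors

videos = [
    f"person{subject:02d}_{action}_d{scenario}"
    for subject in range(1, NUM_SUBJECTS + 1)
    for action in ACTIONS
    for scenario in range(1, NUM_SCENARIOS + 1)
]

assert len(videos) == 600  # 25 subjects x 4 scenarios x 6 actions
```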
Running this model under Lightning means all of the Trainer command-line arguments are available, e.g.:
python Train_Evaluate_KTH_VideoBlockClassifier.py --data_path <your data directory> --gpus 1 --max_epochs 200
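The flags in the command above can be parsed with a small argument parser; the sketch below is an illustration of that interface, not the repository's actual script (Lightning itself can attach the Trainer flags automatically, so only `--data_path` would need to be declared by hand).

```python
import argparse

# Illustrative sketch of the command-line interface shown above.
# In a real Lightning script, Trainer flags such as --gpus and
# --max_epochs are typically added by the framework itself.
parser = argparse.ArgumentParser()
parser.add_argument("--data_path", type=str, required=True,
                    help="directory containing the KTH videos")
parser.add_argument("--gpus", type=int, default=0)
parser.add_argument("--max_epochs", type=int, default=200)

# Parse the example invocation from the README (paths are placeholders).
args = parser.parse_args(
    ["--data_path", "data/kth", "--gpus", "1", "--max_epochs", "200"]
)
```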
The Lightning framework also provides a --help option that lists all available training options.
Action recognition on the KTH dataset was attempted with only the following approaches:
- CNN on a block of frames
- CNN on a block of frames + optical flow
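The "block of frames" input can be sketched as stacking consecutive grayscale frames into a single array fed to the CNN. The block length of 15 below is an assumption for illustration, not the repository's exact preprocessing; the frame size matches the KTH resolution.

```python
import numpy as np

# Illustrative sketch: stack consecutive grayscale frames (120x160,
# the KTH resolution) into one (channels, depth, height, width) array.
# The block length of 15 is an assumption, not the repo's actual value.
BLOCK_LEN, H, W = 15, 120, 160

# Fake video: 100 random grayscale frames in place of decoded KTH frames.
rng = np.random.default_rng(0)
video = rng.random((100, H, W), dtype=np.float32)

def make_block(frames, start, length=BLOCK_LEN):
    """Slice `length` consecutive frames starting at index `start`."""
    block = frames[start:start + length]
    # Add a channel axis so the CNN sees shape (1, depth, H, W).
    return block[np.newaxis, ...]

block = make_block(video, start=0)
assert block.shape == (1, BLOCK_LEN, H, W)
```

The optical-flow variant feeds flow fields computed between neighbouring frames as additional input channels alongside such a block.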
With the above, an accuracy of xx.xx% was achieved for the second experiment (compared to the 90.27% accuracy reported by the original authors).
The lines of code were reduced by a factor of xxx, with the counts improved as follows (parenthesised figures are the lines actually used, which sum to the totals):
Old filename | LOC (used) | New filename | LOC
---|---|---|---
dataset.py | 170 (51) | KTH_DataModule.py | 73
data_utils.py | 145 (83) | |
models/cnn_block_frame.py | 35 | KTH_VideoBlockClassifier.py | 55
train_cnn_block_frame.py | 48 | Train_Evaluate_KTH_VideoBlockClassifier.py | 9
train_helper.py | 103 (94) | |
eval_cnn_block_frame.py | 58 | |
Total: | 369 | | 137
Reference: https://towardsdatascience.com/from-pytorch-to-pytorch-lightning-a-gentle-introduction-b371b7caaf09