Typically, video classification refers to the task of producing a label for the action identified in a given video: the model predicts which class the video clip belongs to.
The Lightning Flash ``flash.video.classification.model.VideoClassifier`` and ``flash.video.classification.data.VideoClassificationData`` classes rely internally on PyTorchVideo.
Let's develop a model to classify video clips of humans performing actions (such as archery, bowling, etc.). We'll use data from the Kinetics dataset. Here's an outline of the folder structure:
video_dataset
├── train
│ ├── archery
│ │ ├── -1q7jA3DXQM_000005_000015.mp4
│ │ ├── -5NN5hdIwTc_000036_000046.mp4
│ │ ...
│ ├── bowling
│ │ ├── -5ExwuF5IUI_000030_000040.mp4
│ │ ├── -7sTNNI1Bcg_000075_000085.mp4
│ ... ...
└── val
├── archery
│ ├── 0S-P4lr_c7s_000022_000032.mp4
│ ├── 2x1lIrgKxYo_000589_000599.mp4
│ ...
├── bowling
│ ├── 1W7HNDBA4pA_000002_000012.mp4
│ ├── 4JxH3S5JwMs_000003_000013.mp4
... ...
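This layout is the standard "folders" format: each immediate subdirectory of ``train`` and ``val`` names one class, and the clips inside it are that class's examples. As a minimal standard-library sketch (the miniature directory tree it builds is hypothetical, mirroring the outline above), the class labels can be derived from the layout like this:

```python
from pathlib import Path


def discover_labels(split_dir):
    """Return the sorted class labels implied by a folders-style dataset split.

    Each immediate subdirectory of ``split_dir`` (e.g. ``archery``, ``bowling``)
    names one class; the files inside it belong to that class.
    """
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


if __name__ == "__main__":
    import tempfile

    # Build a tiny stand-in for video_dataset/train from the outline above.
    with tempfile.TemporaryDirectory() as root:
        for label in ("archery", "bowling"):
            (Path(root) / "train" / label).mkdir(parents=True)
        print(discover_labels(Path(root) / "train"))  # ['archery', 'bowling']
```

This is the convention the data-loading step below relies on: no label file is needed, because the directory names carry the labels.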
Once we've downloaded the data using ``flash.core.data.download_data``, we create the ``flash.video.classification.data.VideoClassificationData``. We select a pre-trained backbone to use for our ``flash.video.classification.model.VideoClassifier`` and fine-tune it on the Kinetics data. The backbone can be any model from the PyTorchVideo Model Zoo. We then use the trained ``VideoClassifier`` for inference. Finally, we save the model. Here's the full example:
../../../flash_examples/video_classification.py
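The workflow just described can be sketched as follows. This is a non-authoritative outline, not a drop-in script: the data URL, the folder paths, the ``from_folders`` keyword arguments, the ``"x3d_xs"`` backbone name, and the ``num_classes`` parameter are assumptions that may differ between Flash versions, so check them against the full example and your installed release.

```python
def main():
    # Flash imports live inside main() so this module can be imported or
    # syntax-checked without Flash installed; the API sketched here is an
    # assumption based on Flash 0.5-0.8 and may differ in your version.
    import flash
    from flash.core.data.utils import download_data
    from flash.video import VideoClassificationData, VideoClassifier

    # 1. Download the Kinetics subset (URL is an assumption).
    download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")

    # 2. Build the DataModule from the folders-style layout shown above.
    datamodule = VideoClassificationData.from_folders(
        train_folder="data/kinetics/train",
        val_folder="data/kinetics/val",
        batch_size=1,
    )

    # 3. Pick a pre-trained backbone from the PyTorchVideo Model Zoo.
    model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)

    # 4. Fine-tune on the Kinetics data.
    trainer = flash.Trainer(max_epochs=1)
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")

    # 5. Use the trained classifier for inference (predict folder is hypothetical).
    predict_dm = VideoClassificationData.from_folders(
        predict_folder="data/kinetics/predict", batch_size=1
    )
    trainer.predict(model, datamodule=predict_dm)

    # 6. Save the model.
    trainer.save_checkpoint("video_classification.pt")


if __name__ == "__main__":
    main()
```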
The video classifier can be used directly from the command line, with zero code, using Flash Zero. You can run the above example with:
flash video_classification
To view the configuration options for running the video classifier with your own data, use:
flash video_classification --help