Note: this code is no longer maintained. For an updated version, please check this repository.
A brief description of the pipeline.
The main stages are the following:
- Extract descriptors from videos.
- Convert the descriptors into the so-called sufficient statistics.
- Convert the sufficient statistics to Fisher vectors or soft-counts.
- Compute the kernel matrix.
- Train and evaluate.
The descriptor extraction stage relies on the dense track code (add reference). The code is called directly from Python (add function name) and the extracted descriptors are passed back to Python through a pipe.
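A minimal sketch of that call, assuming a DenseTrack executable that writes one whitespace-separated descriptor per line to standard output; both the binary name and the output format are assumptions, not the project's actual interface:

```python
import subprocess
import numpy as np

def extract_descriptors(video_path, binary="./DenseTrack"):
    """Run the dense track binary and collect its descriptors via a pipe."""
    # Hypothetical binary name and output format; adjust to the real tool.
    proc = subprocess.Popen(
        [binary, video_path], stdout=subprocess.PIPE, text=True)
    descriptors = [list(map(float, line.split())) for line in proc.stdout]
    proc.wait()
    return np.array(descriptors)  # (num_descriptors, descriptor_dim)
```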
The descriptors are converted to sufficient statistics by the function [add function name]. This function requires the previously computed GMM and PCA. The GMM is stored in the yael format; the PCA is computed with sklearn and pickled to disk. The sufficient statistics are saved to disk at the location specified by [SstatsMap].
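A sketch of this conversion step. The actual pipeline uses a yael GMM; here sklearn's GaussianMixture stands in for illustration, and the layout of the statistics vector (zeroth-, first-, and second-order moments per component) is an assumption based on the standard Fisher vector derivation:

```python
import numpy as np

def descriptors_to_sstats(xx, pca, gmm):
    """xx: (N, D) descriptors; pca: fitted sklearn PCA; gmm: fitted GMM."""
    xx = pca.transform(xx)           # project with the precomputed PCA
    q = gmm.predict_proba(xx)        # (N, K) posterior probabilities
    n = xx.shape[0]
    Q = q.sum(axis=0) / n            # zeroth-order statistics, (K,)
    Q_xx = q.T @ xx / n              # first-order statistics, (K, D)
    Q_xx_2 = q.T @ (xx ** 2) / n     # second-order statistics, (K, D)
    return np.hstack((Q, Q_xx.ravel(), Q_xx_2.ravel()))
```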
The sufficient statistics are merged and then converted to Fisher vectors using the function defined in the model class.
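Merging amounts to a sample-weighted average of the per-chunk statistics. For the conversion itself, here is a sketch using the standard closed-form gradients of a diagonal-covariance GMM; the unpacking assumes the layout from the previous sketch and sigmas holding per-dimension standard deviations, which may differ from what the model class actually does:

```python
import numpy as np

def sstats_to_fisher_vector(ss, means, sigmas, weights):
    """means, sigmas: (K, D); weights: (K,); sigmas are standard deviations."""
    K, D = means.shape
    Q = ss[:K]                               # zeroth-order stats, (K,)
    Q_xx = ss[K:K + K * D].reshape(K, D)     # first-order stats
    Q_xx_2 = ss[K + K * D:].reshape(K, D)    # second-order stats
    # Gradients with respect to mixing weights, means, and variances.
    d_pi = (Q - weights) / np.sqrt(weights)
    d_mu = (Q_xx - Q[:, None] * means) / (np.sqrt(weights)[:, None] * sigmas)
    d_sigma = (
        Q_xx_2 - 2 * means * Q_xx + Q[:, None] * (means ** 2 - sigmas ** 2)
    ) / (np.sqrt(2 * weights)[:, None] * sigmas ** 2)
    return np.hstack((d_pi, d_mu.ravel(), d_sigma.ravel()))
```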
For efficient training, the Fisher vectors are combined through dot products to obtain kernel matrices for training and testing. Prior to this, we apply the square-rooting transform and L2 normalization.
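A sketch of the normalization and kernel computation; the signed square rooting and L2 normalization follow the common improved Fisher vector recipe, and all names here are illustrative:

```python
import numpy as np

def normalize(fv):
    """Apply signed square rooting, then L2-normalize each row."""
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norms = np.linalg.norm(fv, axis=1, keepdims=True)
    return fv / np.maximum(norms, 1e-12)  # guard against zero rows

def build_kernels(train_fv, test_fv):
    """Linear kernels between normalized Fisher vectors."""
    tr, te = normalize(train_fv), normalize(test_fv)
    return tr @ tr.T, te @ tr.T  # (n_train, n_train), (n_test, n_train)
```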
We use the precomputed kernels for training; the training code itself is usually dataset specific.
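A minimal training sketch on precomputed kernels with scikit-learn; the linear SVM, the value of C, and the random stand-in data are illustrative choices, and build_kernels refers to the previous sketch:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
train_fv = rng.randn(20, 16)            # stand-in Fisher vectors
test_fv = rng.randn(5, 16)
labels = rng.randint(0, 2, size=20)     # stand-in binary labels

K_train, K_test = build_kernels(train_fv, test_fv)
clf = SVC(kernel="precomputed", C=100.0)
clf.fit(K_train, labels)                 # rows/cols indexed by train samples
scores = clf.decision_function(K_test)   # one decision value per test video
```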