Technical report | 1st place in the MQ challenge and 2nd place in the NLQ challenge, Ego4D workshop at CVPR 2023.
This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023: Natural Language Queries (NLQ) and Moment Queries (MQ). The solution builds on our proposed Action Sensitivity Learning (ASL) framework to better capture the discrepant information across frames, and further incorporates a series of stronger video features and fusion strategies. Our method achieves an average mAP of 29.34, ranking 1st in the Moment Queries challenge, and a mean R@1 of 19.79, ranking 2nd in the Natural Language Queries challenge. Our code will be released.
- The code may still have some bugs or problems; we are working on improving the released code.
- release the code for MQ
- release the code for NLQ
- tidy the code
- GCC, PyTorch==1.12.0, CUDA==cu113 dependencies
- pip dependencies
```bash
conda create -n py38 python=3.8
conda activate py38
pip install tensorboard numpy pyyaml pandas h5py joblib
```
- NMS compilation
```bash
cd ./libs/utils
python setup.py install --user
cd ../..
```
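After installation, a quick sanity check that the environment matches the stated dependencies (PyTorch 1.12.0 built against CUDA 11.3) can look like the following; this is just a convenience snippet, not part of the repository:

```python
# Sanity-check the environment against the stated dependencies
# (PyTorch 1.12.0, CUDA 11.3); adjust the expectations to your own setup.
import torch

print("torch:", torch.__version__)           # expected: 1.12.0
print("cuda:", torch.version.cuda)           # expected: 11.3
print("gpu available:", torch.cuda.is_available())
```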
Ego4D MQ Annotation, Video Data / Features Preparation
- Please refer to the Ego4D website to download the features.
- In our final submission we use InternVideo, EgoVLP, SlowFast, and Omnivore features, where the combination of only InternVideo and EgoVLP can already achieve good results (a rough fusion sketch is shown below).
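As a rough illustration of channel-wise feature fusion (not necessarily the exact strategy used in the released code), two pre-extracted clip features can be resampled to a common temporal length and concatenated along the feature dimension; the shapes below are hypothetical:

```python
# A minimal sketch of fusing two pre-extracted clip features, assuming each is
# stored as a (T, C) tensor per video; the released code may fuse them differently.
import torch
import torch.nn.functional as F

def fuse_features(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Resample feat_b to feat_a's temporal length, then concatenate channels."""
    feat_b = feat_b.t().unsqueeze(0)                      # (T_b, C_b) -> (1, C_b, T_b)
    feat_b = F.interpolate(feat_b, size=feat_a.shape[0],
                           mode="linear", align_corners=False)
    feat_b = feat_b.squeeze(0).t()                        # (T_a, C_b)
    return torch.cat([feat_a, feat_b], dim=1)             # (T_a, C_a + C_b)

# Hypothetical feature dimensions for two feature sets of one video
internvideo = torch.randn(928, 2304)
egovlp = torch.randn(464, 256)
print(fuse_features(internvideo, egovlp).shape)           # torch.Size([928, 2560])
```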
Ego4D Video Features Preparation
- Run `python convert_annotation.py` to convert the official annotation into the processed format, and put it into `data/ego4d/`.
- Create a config file such as `baseline.yaml` for training and put it into `configs/`.
- In `baseline.yaml`, you can specify the annotation JSON file, video features, training split, validation split, etc. (see the sketch below).
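If you prefer to override the splits programmatically rather than editing the yaml by hand, a minimal sketch with PyYAML could look like the following; the key names (`dataset`, `train_split`, `val_split`) are assumptions, so check the released configs for the exact schema:

```python
# Hypothetical sketch of overriding the splits in a config file; the key names
# are assumptions and may not match the released configs exactly.
import yaml

with open("configs/baseline.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["dataset"]["train_split"] = ["train"]   # ["train", "val"] for the combined run
cfg["dataset"]["val_split"] = ["val"]       # ["test"] before generating the submission

with open("configs/baseline_edited.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```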
- Change the train_split to `['train']` and the val_split to `['val']`.
- Run `bash train_val.sh baseline 0`, where `baseline` is the corresponding config yaml and `0` is the GPU ordinal.
- When running `bash train_val.sh baseline 0`, once the epoch exceeds max_epoch // 3 it automatically validates performance on the val set (e.g., average mAP, Recall@1x).
- You can also run `bash val.sh checkpoint.ckpt baseline` to validate performance manually.
- You can expect an average mAP of roughly 27-28%.
- Change the train_split to `['train', 'val']` and the val_split to `['test']`.
- Run `bash train_combine.sh baseline 0`, where `baseline` is the corresponding config yaml and `0` is the GPU ordinal. In this mode it does not validate during training and instead saves checkpoints of the last 5 epochs.
- Run `python infer.py --config configs/baseline.yaml --ckpt your_checkpoint` to generate `submission.json` with the detection results.
- Then run `python merge_submission.py` to generate `submission_final.json`, which contains both the detection and retrieval results (a rough sketch of this merge follows the list).
- Upload `submission_final.json` to the Ego4D MQ test server.
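The merge step can be pictured roughly as below; this is only a hypothetical sketch of combining detection and retrieval results into one submission file, and the file and key names are illustrative rather than the repository's actual format:

```python
# Hypothetical sketch of a detection/retrieval merge; file and key names are
# illustrative, not the repository's actual submission format.
import json

with open("submission.json") as f:           # detection results from infer.py
    detection = json.load(f)
with open("retrieval.json") as f:            # hypothetical retrieval results file
    retrieval = json.load(f)

merged = {
    "version": "1.0",
    "challenge": "ego4d_moment_queries",
    "detect_results": detection,             # hypothetical key names
    "retrieve_results": retrieval,
}

with open("submission_final.json", "w") as f:
    json.dump(merged, f)
```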
Our model is based on ActionFormer. Thanks for their contributions.
```
@article{shao2023action,
  title   = {Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023},
  author  = {Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Yang, Yi},
  journal = {arXiv preprint arXiv:2306.09172},
  year    = {2023}
}

@InProceedings{Shao_2023_ICCV,
  author    = {Shao, Jiayi and Wang, Xiaohan and Quan, Ruijie and Zheng, Junjun and Yang, Jiang and Yang, Yi},
  title     = {Action Sensitivity Learning for Temporal Action Localization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {13457-13469}
}
```