Egoinstructor

Official PyTorch implementation for Egoinstructor at CVPR 2024

Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper | Project Page

Given an egocentric video, Egoinstructor automatically retrieves semantically relevant instructional videos (e.g., from HowTo100M) via a pretrained cross-view retrieval model, and leverages their visual/textual information to generate a caption for the egocentric video.
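For intuition, the pipeline can be summarized with the minimal sketch below. All names here (retrieve_topk, exo_bank, retrieval_model, caption_model) are hypothetical placeholders for illustration only; they are not the actual APIs of this repository.

import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_topk(ego_feat: torch.Tensor, exo_bank: torch.Tensor, k: int = 4):
    # ego_feat: (D,) embedding of the egocentric clip
    # exo_bank: (N, D) precomputed embeddings of exocentric (e.g., HowTo100M) clips
    sims = F.cosine_similarity(ego_feat.unsqueeze(0), exo_bank, dim=-1)  # (N,)
    return sims.topk(k).indices

# Hypothetical usage:
# ego_feat = retrieval_model.encode_video(ego_clip)            # cross-view retrieval model
# idx = retrieve_topk(ego_feat, exo_bank, k=4)                  # find relevant exocentric clips
# context = [exo_captions[i] for i in idx]                      # narrations of retrieved clips
# caption = caption_model.generate(ego_clip, context=context)   # retrieval-augmented captioning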

Roadmap

  • Retrieval code and data released
  • Captioning code and data released
  • Online Demo
  • Pre-trained checkpoints

Prepare environment

Please refer to env.md.

Cross-view Retrieval Module

To train an ego-exo cross-view retrieval module, please refer to retrieval.
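As rough intuition, cross-view retrieval modules of this kind are commonly trained with a symmetric contrastive (InfoNCE-style) objective over paired ego/exo embeddings. The sketch below is illustrative only, assuming paired batches; the repository's actual loss, sampling strategy, and architecture are defined in the retrieval subdirectory and may differ.

import torch
import torch.nn.functional as F

def ego_exo_contrastive_loss(ego: torch.Tensor, exo: torch.Tensor, temperature: float = 0.07):
    # ego, exo: (B, D) embeddings of paired clips; row i of each tensor forms a positive pair
    ego = F.normalize(ego, dim=-1)
    exo = F.normalize(exo, dim=-1)
    logits = ego @ exo.t() / temperature                       # (B, B) similarity matrix
    targets = torch.arange(ego.size(0), device=ego.device)     # positives on the diagonal
    # symmetric cross-entropy: ego-to-exo and exo-to-ego directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))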

Retrieval-augmented Captioning

To train a retrieval-augmented egocentric video captioning model, please refer to captioning.

Citation

If this work is helpful for your research, please consider citing us.

@article{xu2024retrieval,
  title={Retrieval-augmented egocentric video captioning},
  author={Xu, Jilan and Huang, Yifei and Hou, Junlin and Chen, Guo and Zhang, Yuejie and Feng, Rui and Xie, Weidi},
  journal={arXiv preprint arXiv:2401.00789},
  year={2024}
}

License

This project is released under the MIT License.

Acknowledgements

This project is built upon LaViLA and Otter. Thanks to the contributors of these great codebases.
