Egoinstructor

Official PyTorch implementation for Egoinstructor at CVPR 2024

Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper | Project Page

Given an egocentric video, Egoinstructor automatically retrieves semantically relevant instructional videos (e.g., from HowTo100M) via a pretrained cross-view retrieval model, and leverages their visual/textual information to generate a caption for the egocentric video.
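For intuition, the pipeline can be summarized with the minimal sketch below. All names here (retrieve_topk, exo_bank, retrieval_model, caption_model) are hypothetical placeholders for illustration only; they are not the actual APIs of this repository.

import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_topk(ego_feat: torch.Tensor, exo_bank: torch.Tensor, k: int = 4):
    # ego_feat: (D,) embedding of the egocentric clip
    # exo_bank: (N, D) precomputed embeddings of exocentric (e.g., HowTo100M) clips
    sims = F.cosine_similarity(ego_feat.unsqueeze(0), exo_bank, dim=-1)  # (N,)
    return sims.topk(k).indices

# Hypothetical usage:
# ego_feat = retrieval_model.encode_video(ego_clip)            # cross-view retrieval model
# idx = retrieve_topk(ego_feat, exo_bank, k=4)                  # find relevant exocentric clips
# context = [exo_captions[i] for i in idx]                      # narrations of retrieved clips
# caption = caption_model.generate(ego_clip, context=context)   # retrieval-augmented captioning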

Roadmap

  • Retrieval code and data released
  • Captioning code and data released
  • Online Demo
  • Pre-trained checkpoints

Prepare environment

Please refer to env.md.

Cross-view Retrieval Module

To train an ego-exo cross-view retrieval module, please refer to retrieval.
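As rough intuition, cross-view retrieval modules of this kind are commonly trained with a symmetric contrastive (InfoNCE-style) objective over paired ego/exo embeddings. The sketch below is illustrative only, assuming paired batches; the repository's actual loss, sampling strategy, and architecture are defined in the retrieval subdirectory and may differ.

import torch
import torch.nn.functional as F

def ego_exo_contrastive_loss(ego: torch.Tensor, exo: torch.Tensor, temperature: float = 0.07):
    # ego, exo: (B, D) embeddings of paired clips; row i of each tensor forms a positive pair
    ego = F.normalize(ego, dim=-1)
    exo = F.normalize(exo, dim=-1)
    logits = ego @ exo.t() / temperature                       # (B, B) similarity matrix
    targets = torch.arange(ego.size(0), device=ego.device)     # positives on the diagonal
    # symmetric cross-entropy: ego-to-exo and exo-to-ego directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))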

Retrieval-augmented Captioning

To train a retrieval-augmented egocentric video captioning model, please refer to captioning.

Citation

If this work is helpful for your research, please consider citing us.

@article{xu2024retrieval,
  title={Retrieval-augmented egocentric video captioning},
  author={Xu, Jilan and Huang, Yifei and Hou, Junlin and Chen, Guo and Zhang, Yuejie and Feng, Rui and Xie, Weidi},
  journal={arXiv preprint arXiv:2401.00789},
  year={2024}
}

License

This project is released under the MIT License.

Acknowledgements

This project is built upon LaViLA and Otter. Thanks to the contributors of these great codebases.
