Awesome Egocentric Vision

A curated list of egocentric vision resources.

Egocentric (first-person) vision is a sub-field of computer vision that analyses image and video data captured by a wearable camera, which approximates the wearer's visual field.

Contents

Papers

Clustered by problem statement.

Action/Activity Recognition

Object/Hand Recognition

Action/Gaze Anticipation

Localization

Clustering

Video Summarization

Social Interactions

Pose Estimation

Human Object Interaction

Temporal Boundary Detection

Privacy in Egocentric Videos

Multiple Egocentric Tasks

  • EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone - Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, and Pengchuan Zhang. In ICCV 2023. [project page] [code]

  • Egocentric Video-Language Pretraining - Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu and Mike Zheng Shou. In NeurIPS 2022. [project page] [code]

  • Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C.V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, and Jitendra Malik. In CVPR 2022. [Github] [project page] [video]

Task Understanding

Miscellaneous (New Tasks)

Clustered by conference.

CVPR

ECCV

ICCV

WACV

BMVC

Datasets

  • EgoProceL - 62 hours of egocentric videos recorded by 130 subjects performing 16 tasks for procedure learning.
  • EgoBody - Large-scale dataset capturing ground-truth 3D human motions during social interactions in 3D scenes.
  • UnrealEgo - Large-scale naturalistic dataset for egocentric 3D human pose estimation.
  • Hand-object Segments - Hand-object interactions in 11,235 frames from 1,000 videos covering daily activities in diverse scenarios.
  • Ego4D - 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
  • HOI4D - 2.4M RGB-D egocentric video frames over 4,000 sequences collected by 9 participants interacting with 800 different object instances from 16 categories across 610 different indoor rooms.
  • EgoCom - A natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives.
  • TREK-100 - Object tracking in first person vision.
  • MECCANO - 20 subjects assembling a toy motorbike.
  • EPIC-Kitchens 2020 - Subjects performing unscripted actions in their native environments.
  • EPIC-Tent - 29 participants assembling a tent while wearing two head-mounted cameras. [paper]
  • EGO-CH - 70 subjects visiting two cultural sites in Sicily, Italy.
  • EPIC-Kitchens 2018 - 32 subjects performing unscripted actions in their native environments.
  • Charade-Ego - Paired first-third person videos.
  • EGTEA Gaze+ - 32 subjects, 86 cooking sessions, 28 hours.
  • ADL - 20 subjects performing daily activities in their native environments.
  • CMU kitchen - Multimodal, 18 subjects cooking 5 different recipes: brownies, eggs, pizza, salad, sandwich.
  • EgoSeg - Long-term actions (walking, running, driving, etc.).
  • First-Person Social Interactions - 8 subjects at Disney World.
  • UEC Dataset - Two choreographed datasets with different ego-actions (walk, jump, climb, etc.) + 6 YouTube sports videos.
  • JPL - Interaction with a robot.
  • FPPA - 5 subjects performing 5 daily actions.
  • UT Egocentric - 3-5 hour-long videos capturing a person's day.
  • VINST / Visual Diaries - 31 videos capturing the visual experience of a subject walking from a metro station to work.
  • Bristol Egocentric Object Interaction (BEOID) - 8 subjects, 6 locations. Interactions with objects and the environment.
  • Object Search Dataset - 57 sequences of 55 subjects on search and retrieval tasks.
  • UNICT-VEDI - Different subjects visiting a museum.
  • UNICT-VEDI-POI - Different subjects visiting a museum.
  • Simulated Egocentric Navigations - Simulated navigations of a virtual agent within a large building.
  • EgoCart - Egocentric images collected by a shopping cart in a retail store.
  • Unsupervised Segmentation of Daily Living Activities - Egocentric videos of daily activities.
  • Visual Market Basket Analysis - Egocentric images collected by a shopping cart in a retail store.
  • Location Based Segmentation of Egocentric Videos - Egocentric videos of daily activities.
  • Recognition of Personal Locations from Egocentric Videos - Egocentric video clips of daily activities.
  • EgoGesture - 2k videos from 50 subjects performing 83 gestures.
  • EgoHands - 48 videos of interactions between two people.
  • DoMSEV - 80 hours of egocentric videos covering different activities.
  • DR(eye)VE - 74 videos of people driving.
  • THU-READ - 8 subjects performing 40 actions with a head-mounted RGBD camera.
  • EgoDexter - 4 sequences with 4 actors (2 female), featuring varied interactions with various objects against a cluttered background. [paper]
  • First-Person Hand Action (FPHA) - 3D hand-object interaction. Includes 1175 videos belonging to 45 different activity categories performed by 6 actors. [paper]
  • UTokyo Paired Ego-Video (PEV) - 1,226 pairs of first-person clips extracted from videos recorded synchronously during dyadic conversations.
  • UTokyo Ego-Surf - Contains 8 diverse groups of first-person videos recorded synchronously during face-to-face conversations.
  • TEgO: Teachable Egocentric Objects Dataset - Contains egocentric images of 19 distinct objects taken by two people for training a teachable object recognizer.
  • Multimodal Focused Interaction Dataset - Contains 377 minutes of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations.

Contribute

This is a work in progress. Contributions welcome! Read the contribution guidelines first.