
CaptainCook4D: A Dataset for Understanding Errors in Procedural Activities

(This page is under continuous update and construction.)


Update

Oct 31st 2024: Released final code for baselines and features used for training the models.

Sep 26th 2024: Our paper was accepted to NeurIPS 2024 with review scores of 8, 8, and 7.

Oct 30th 2024: Released updated final code for baselines on (a) Error Recognition (Supervised, Zero-Shot) and (b) Multi-Step Localization.

Aug 2024: Released extracted features for the dataset using Video Recognition Models.

July 2024: Released code for baselines on (a) Error Recognition and (b) Multi-Step Localization.

Dec 2023: Released dataset version 1.0


Overview



Abstract

Following step-by-step procedures is an essential component of various activities carried out by individuals in their everyday lives. These procedures serve as a guiding framework that helps achieve goals efficiently, whether assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and an ability to reason about the structure of the activity. To this end, we collected a new egocentric 4D dataset, CaptainCook4D, comprising 384 recordings (94.5 hrs) of people performing recipes in real kitchen environments. The dataset consists of two distinct activity types: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. We provide 5.3K step annotations and 10K fine-grained action annotations, and benchmark the dataset on the following tasks: supervised error recognition, multi-step localization, and procedure learning.


Normal & Error Steps

technique_error_1.mp4

Technique Error: In the recipe "butter corn cup", the first two video snippets show the outcome of correctly following the instruction "Mix the contents of the bowl well without any spillage", whereas the subsequent three snippets show the result of inducing errors by spilling corn out of the bowl while mixing.

measurement_error.mp4

Measurement Error: In the recipe "scrambled eggs", the first two video snippets show the outcome of correctly following the instruction "Peel 2 garlic cloves", whereas the subsequent three snippets show the result when a different number of garlic cloves (4, 1, and 1, respectively) is peeled instead of the intended 2 cloves.

order_error.mp4

Order Error: In the recipe "spicy tuna avocado wraps", the first two video snippets show the outcome of correctly following the instruction "Top lettuce leaves with tuna mixture", whereas the subsequent three snippets show the result when an incorrect order is followed and the avocado is added after topping the leaves with the mixture.

preparation_error.mp4

Preparation Error: In the recipe "mug cake", the first two video snippets show the outcome of correctly following the instruction "Whisk batter", while the remaining snippets show the task performed with the wrong utensil: a spoon, a tablespoon, and a hand.

technique_error_2.mp4

Technique Error: In the recipe "cucumber raita", the first two video snippets show the outcome of correctly following the instruction "Chop or grate the cucumber", while the next three snippets show the results when the cucumber is cut improperly, sliced vertically, and sliced horizontally, respectively.
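
The examples above pair each recipe step with either a correct execution or an induced error of a given category. As a rough illustration only, the sketch below shows one way such step-level labels could be represented and turned into binary targets for error recognition. The `StepAnnotation` record, its field names, and the `ErrorCategory` enum are hypothetical and are not the dataset's actual annotation schema; the enum lists only the categories illustrated in this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ErrorCategory(Enum):
    # Only the categories illustrated above; the dataset may define others.
    TECHNIQUE = "Technique Error"
    MEASUREMENT = "Measurement Error"
    ORDER = "Order Error"
    PREPARATION = "Preparation Error"


@dataclass
class StepAnnotation:
    # Hypothetical record layout, NOT the dataset's real annotation format.
    recipe: str
    instruction: str
    errors: List[ErrorCategory] = field(default_factory=list)

    @property
    def is_error(self) -> bool:
        # A step counts as an error step if any category is attached to it.
        return len(self.errors) > 0


# Toy steps modelled on the examples described in this section.
steps = [
    StepAnnotation("mug cake", "Whisk batter"),
    StepAnnotation("mug cake", "Whisk batter", [ErrorCategory.PREPARATION]),
    StepAnnotation("scrambled eggs", "Peel 2 garlic cloves",
                   [ErrorCategory.MEASUREMENT]),
]

# Binary targets: 0 = normal step, 1 = error step.
labels = [int(s.is_error) for s in steps]
print(labels)
```

Keeping the categories as a list on each step, rather than as a single flag, lets the same record yield both a binary normal/error label and a per-category label.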


4D Snippets

4D_ROHITH_HOUSE.mp4

Task Graphs

Task graphs for the recipes in the dataset are available here: Task Graphs

task_graph_blender_banana_pancakes.mp4

Annotation Overview



Data Collection and Annotation Illustration

data_collection.mp4

Error Categories



Baselines

Supervised Error Recognition


Zero-Shot Error Recognition


Multi Step Localization



Data Collection System


Download Data


Annotations


Features


Baselines


Works that used CaptainCook4D

June 2024: Differentiable Task Graph Learning


License & Consent

Our dataset is licensed under the Apache License 2.0: License.

Our dataset is approved by the Institutional Review Board (IRB) at the University of Florida: IRB Approval

All participants provided written consent for the data collection: Consent


NeurIPS Scores

PreRebuttalScores


Citation

@misc{peddi2023captaincook4d,
  title={{CaptainCook4D: A dataset for understanding errors in procedural activities}}, 
  author={Rohith Peddi and Shivvrat Arya and Bharath Challa and Likhitha Pallapothula and Akshay Vyas and Jikai Wang and Qifan Zhang and Vasundhara Komaragiri and Eric Ragan and Nicholas Ruozzi and Yu Xiang and Vibhav Gogate},
  year={2023},
  eprint={2312.14556},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Popular repositories

  1. datacollection Public

    Python 9

  2. CaptainCook4D Public

    Config files for my GitHub profile.

    7

  3. annotations Public

    [NeurIPS 2024]

    5

  4. error_recognition Public

    Code for supervised error recognition and early error recognition

    Python 4

  5. downloader Public

    [NeurIPS 2024] Downloader for downloading 2D data of the dataset

    Python 4

  6. feature_extractors Public

    Python 2