(This page is under continuous update and construction.)
Oct 31st 2024: Released final code for baselines and features used for training the models.
Sep 26th 2024: Our paper was accepted to NeurIPS 2024 with review scores of 8, 8, and 7.
Oct 30th 2024: Released updated final code for baselines on (a) Error Recognition (Supervised, Zero-Shot) and (b) Multi-Step Localization.
Aug 2024: Released extracted features for the dataset using Video Recognition Models.
July 2024: Released code for baselines on (a) Error Recognition and (b) Multi-Step Localization.
Dec 2023: Released dataset version 1.0
Following step-by-step procedures is an essential part of many everyday activities. These procedures serve as a guiding framework that helps achieve goals efficiently, whether assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and an ability to reason about the structure of the activity. To this end, we collected a new egocentric 4D dataset, CaptainCook4D, comprising 384 recordings (94.5 hrs) of people performing recipes in real kitchen environments. The dataset consists of two distinct activity types: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. We provide 5.3K step annotations and 10K fine-grained action annotations and benchmark the dataset on the following tasks: supervised error recognition, multi-step localization, and procedure learning.
technique_error_1.mp4
Technique Error: In the recipe butter corn cup, the first two video snippets show the outcome of correctly following the instruction "Mix the contents of the bowl well" without any spillage, whereas the subsequent three snippets show the result of inducing errors by spilling corn from the bowl while mixing.
measurement_error.mp4
Measurement Error: In the recipe scrambled eggs, the first two video snippets show the outcome of correctly following the instruction "Peel 2 garlic cloves", whereas the subsequent three snippets show the result when a different number of garlic cloves (4, 1, and 1, respectively) is peeled instead of the intended 2.
order_error.mp4
Order Error: In the recipe spicy tuna avocado wraps, the first two video snippets show the outcome of correctly following the instruction "Top lettuce leaves with tuna mixture", whereas the subsequent three snippets show the result of following an incorrect order, in which avocado is added after the leaves are topped with the mixture.
preparation_error.mp4
Preparation Error: In the recipe mug cake, the first two video snippets show the outcome of correctly following the instruction "Whisk batter", while the remaining snippets show the same task performed with the wrong implement: a spoon, a tablespoon, and a hand.
technique_error_2.mp4
Technique Error: In the recipe cucumber raita, the first two video snippets show the outcome of correctly following the instruction "Chop or grate the cucumber", while the next three snippets on the right show the results when the cucumber is cut improperly, sliced vertically, and sliced horizontally, respectively.
4D_ROHITH_HOUSE.mp4
You can find the task graphs for the recipes in the dataset here: Task Graphs
task_graph_blender_banana_pancakes.mp4
data_collection.mp4
June 2024: Differentiable Task Graph Learning
Our dataset is licensed under the Apache License 2.0: License.
Our dataset is approved by the Institutional Review Board (IRB) at the University of Florida: IRB Approval
All participants provided written consent for the data collection: Consent
@misc{peddi2023captaincook4d,
title={{CaptainCook4D: A dataset for understanding errors in procedural activities}},
author={Rohith Peddi and Shivvrat Arya and Bharath Challa and Likhitha Pallapothula and Akshay Vyas and Jikai Wang and Qifan Zhang and Vasundhara Komaragiri and Eric Ragan and Nicholas Ruozzi and Yu Xiang and Vibhav Gogate},
year={2023},
eprint={2312.14556},
archivePrefix={arXiv},
primaryClass={cs.CV}
}