GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

Implementation for the paper submitted to the Transactions on Computational Social Systems (TCSS) journal.

🔍💡 Abstract

Recognizing group affect in in-the-wild settings remains challenging due to two key factors: the difficulty of capturing and labeling group data and the complexity of analyzing group affect amid diverse interactions and contextual variability. The lack of comprehensive datasets annotated with multimodal and contextual information further limits advances in the field. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5091 video clips with multimodal data (video, audio and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance.

📄 Code Files

The code files are currently private and will be made public after the acceptance/publication of the corresponding paper.

✏️ Dataset Details & Access

The IIT Roorkee Multimodal Video-based Affect Recognition (GAViD) dataset was compiled by Deepak Kumar and Abhishek Singh at the Machine Vision Lab, IIT Roorkee, under the supervision of Prof. Balasubramanian Raman. It consists of 5091 video clips. The data was collected from YouTube under the Creative Commons license policy.

Compliance with YouTube's Terms & Conditions

The videos were collected manually from YouTube using keywords such as Protest, Wedding Dance and Group Meeting.

🔄📋🪜 Steps Involved in the Dataset Compilation Process

Below are the steps involved in the dataset compilation process.

Step 1: Manual search for CC-licensed videos on YouTube

Videos were sourced from YouTube under the Creative Commons CC BY license, following the ethical protocols.
Only CC-licensed videos featuring two or more individuals were included, to capture real-world group dynamics.

Step 2: Segmenting the videos into 5-second clips

All videos were split into 5-second clips using FFmpeg.
Retained up to 35 segments per source video, covering the majority of group members despite dynamic framing.
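
The segmentation step could be scripted roughly as follows. This is a minimal sketch, not the authors' actual command: the function name `build_segment_command`, the output naming pattern, and the choice to keep the first 35 segments (by limiting output to 175 seconds) are all assumptions made for illustration. It relies on FFmpeg's `segment` muxer and forces a keyframe every 5 seconds so that cuts land on clip boundaries.

```python
from pathlib import Path
from typing import List

CLIP_SECONDS = 5           # clip length stated in the README
MAX_CLIPS_PER_VIDEO = 35   # upper bound on segments per source video


def build_segment_command(src: Path, out_dir: Path) -> List[str]:
    """Build an FFmpeg command that splits `src` into 5-second clips.

    Assumption: the retained segments are the first 35 of each video;
    the README does not say how they were chosen.
    """
    pattern = out_dir / f"{src.stem}_%03d.mp4"  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-i", str(src),
        # Stop after 35 clips' worth of footage (35 * 5 = 175 s).
        "-t", str(CLIP_SECONDS * MAX_CLIPS_PER_VIDEO),
        # Force a keyframe at every clip boundary so each cut is exact.
        "-force_key_frames", f"expr:gte(t,n_forced*{CLIP_SECONDS})",
        "-f", "segment",
        "-segment_time", str(CLIP_SECONDS),
        "-reset_timestamps", "1",
        str(pattern),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. Building the command separately from running it makes the parameters easy to inspect before any video is processed.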

Step 3: Data Preprocessing

All segmented clips underwent a manual verification process.
Clips were excluded if they lacked a discernible group structure (no complete or partial group present), or if the faces of all visible group members were not clearly detectable, which could compromise annotation accuracy.

Step 4: Manual annotation of video clips

Used Labelbox for annotation.
Clips were placed into 100 bins and assigned to 108 annotators (60 male, 48 female; average age 28 ± 5 years).
Three annotators viewed each clip and labeled group emotion (positive, neutral, negative), discrete emotion (happy, sad, fear, anger, neutral), intensity (high, medium, low), interaction type (cooperative, hostile, neutral) and action cues (e.g. smiling, clapping, shouting, dancing, singing, fighting, conversation, heated debate, protest, team activity).
The final label was decided by majority vote.
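
The majority-vote rule can be sketched as below. The function name `majority_label` and the tie handling are assumptions: the README does not say how a three-way disagreement among the three annotators was resolved, so this sketch returns `None` in that case to flag the clip for review.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if there is no clear winner.

    Assumption: ties (e.g. three annotators, three different labels)
    are left unresolved; the source does not describe a tie-break rule.
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    # A tie exists when the top two labels have the same vote count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For example, `majority_label(["positive", "positive", "neutral"])` yields `"positive"`, while three distinct votes yield `None`.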

📝 Dataset Description

The table below summarizes the dataset details. Here, ‘P’: Positive, ‘N’: Negative, ‘Ne’: Neutral, ‘H’: Happy, ‘S’: Sad, ‘F’: Fear, ‘A’: Anger.

🏷️ ➜ ⚙️ ➜ 🤖 Dataset Annotation Process Pipeline

Overview of the GAViD annotation pipeline and interface. The diagram illustrates stages of data collection, sample video frames with valence, emotion, intensity, cues and contextual labels, as well as the Labelbox interface used for multi-annotator input. Sample context descriptions and VideoGPT-suggested keywords demonstrate how human and AI annotations are integrated.

🧩🛠️ Data Annotation tool used

We used Labelbox for the annotation.

Below is the image of one sample used in the annotation.

🏗️🧱 CAGNet Architecture Diagram

We propose CAGNet, a baseline group affect recognition (GAR) model that fuses visual, audio and contextual information, as shown in the diagram.

📦 Dataset Availability

The IIT-R GAViD dataset can be accessed via the Zenodo link: https://zenodo.org/records/15448846

NOTE: For now, we are providing only the training video clips. The corresponding paper is under review at the Transactions on Computational Social Systems (TCSS) journal. After its publication, access to the validation and test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.
