Code for training and evaluating the Communication Context Network (CoCoNet) for Communication Context Identification (CCI): Given an egocentric video stream with detected face bounding boxes, determine whether each individual is part of the camera wearer’s conversation group.
CCI is predicted from a set of face features extracted from egocentric video, as described in the Seeing Conversations paper.
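For orientation, here is a minimal sketch of the prediction interface this task implies. The function name, tensor shapes, and thresholding are illustrative assumptions, not the actual CoCoNet API:

# Illustrative sketch only: names and shapes below are assumptions, not the CoCoNet API.
import torch

def predict_conversation_membership(
    model: torch.nn.Module,
    face_features: torch.Tensor,  # assumed shape: (num_faces, seq_len, feat_dim)
) -> torch.Tensor:
    """Return one in-conversation probability per detected face track."""
    model.eval()
    with torch.no_grad():
        logits = model(face_features)  # assumed output shape: (num_faces,)
    return torch.sigmoid(logits)  # > 0.5 -> part of the camera wearer's conversation group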
The code was developed and tested with Python 3.12.1.
python3 -m venv .venv
source .venv/bin/activate # Linux
# .\.venv\Scripts\activate # Windows
pip install -e .
pip install -r requirements.txt
# install torch appropriately for your GPU
Download the data from https://doi.org/10.11583/DTU.31545667, unzip it, and place it in ./data.
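A quick sanity check after unzipping, assuming the archive contents sit directly under ./data (the exact file layout is not specified here):

from pathlib import Path

data_dir = Path("data")
assert data_dir.is_dir(), "Download the dataset and unzip it into ./data first."
print(sorted(p.name for p in data_dir.iterdir()))  # list what the archive contains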
The configuration can be changed in ./config/run_config/coconet.yaml. For example, on a smaller GPU or on CPU, reduce batch_size and segment_length_load. With the default parameters used in the paper, CoCoNet converges within 2 h on a single 32 GB GPU.
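As a sketch, the two settings above can also be lowered programmatically before launching a run. The key names batch_size and segment_length_load come from this README; treating them as top-level YAML keys, and the reduced values themselves, are assumptions:

import yaml  # PyYAML

cfg_path = "config/run_config/coconet.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Assumed: the defaults are larger; shrink both to fit a small GPU or CPU.
cfg["batch_size"] = 8
cfg["segment_length_load"] = 64  # assumed unit: frames loaded per segment

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)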
To run training and inference:
python -m cci.temp_cci.main --config "config/run_config/coconet.yaml"
The model checkpoint and evaluation results will be placed in ./results/runs/coconet.
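To locate and inspect the run outputs afterwards, a sketch along these lines should work. Only the directory ./results/runs/coconet is given above; the checkpoint file extensions are assumptions:

from pathlib import Path
import torch

run_dir = Path("results/runs/coconet")
ckpts = sorted(run_dir.glob("**/*.pt")) + sorted(run_dir.glob("**/*.ckpt"))
print("found checkpoints:", [p.name for p in ckpts])
if ckpts:
    state = torch.load(ckpts[-1], map_location="cpu")  # load on CPU for inspection
    print(type(state))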
If you use this code, please cite:
@inproceedings{dorszewski2026seeing,
  author    = {Tobias Dorszewski and Jens Hjortkj{\ae}r},
  title     = {Seeing Conversations: Communication Context Identification in Egocentric Video},
  booktitle = {CVPR},
  year      = {2026}
}