Authors: Salah Eddine Bekhouche, Hichem Telli, Azeddine Benlamoudi, Salah Eddine Herrouz, Abdelmalik Taleb-Ahmed, Abdenour Hadid
Code for the ABAW10 Ambivalence/Hesitancy Challenge. Implements a 6-token fusion architecture with VideoMAE, HuBERT, and RoBERTa-GoEmotions encoders, conflict features, and text-guided late fusion.
# Create conda environment (Python 3.10 recommended)
conda create -n conda3.10 python=3.10 -y
conda activate conda3.10
# Install dependencies
pip install -r requirements.txt
# Install ffmpeg (required for audio loading)
# conda install -c conda-forge ffmpeg

Place the BAH dataset in the data/ folder. Expected structure:
data/
  data/                      # labeled split
    split/                   # train.txt, val.txt, test.txt
    Videos/
    cropped-aligned-faces/
    transcription/
  test_unlabeled/            # challenge test set
    split/
    Videos/
    cropped-aligned-faces/
    transcription/
Obtain the BAH dataset from the ABAW10 Challenge / BAH dataset.
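Once the dataset is in place, a quick sanity check of the layout can save a failed run later. Below is a minimal, hypothetical helper (not part of this repo) that checks the tree shown above; the directory names come from that tree, and `missing_dirs` is our own name:

```python
from pathlib import Path

# Expected subdirectories, relative to data/, per the tree above.
EXPECTED = [
    "data/split",
    "data/Videos",
    "data/cropped-aligned-faces",
    "data/transcription",
    "test_unlabeled/split",
    "test_unlabeled/Videos",
    "test_unlabeled/cropped-aligned-faces",
    "test_unlabeled/transcription",
]

def missing_dirs(data_root="data"):
    """Return the expected subdirectories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```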
Run once before training for faster data loading:
conda run -n conda3.10 python scripts/extract_audio.py

We will upload pre-trained weights so you can reproduce our results or retrain the model.
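For reference, the audio-extraction step typically amounts to one ffmpeg call per video. The sketch below only builds such a command (function and argument names are ours; the actual scripts/extract_audio.py may differ); 16 kHz mono matches what HuBERT-Base expects:

```python
def ffmpeg_audio_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command extracting mono audio from a video.

    Hypothetical helper for illustration; run it with subprocess.run(...).
    """
    return [
        "ffmpeg", "-y",            # overwrite existing output
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # resample (16 kHz for HuBERT)
        str(wav_path),
    ]

# Example: subprocess.run(ffmpeg_audio_cmd("clip.mp4", "clip.wav"), check=True)
```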
Simplest option: pass --hf_repo (the checkpoint is downloaded automatically):
# No manual download needed; checkpoint is fetched on first run
python scripts/predict.py \
--hf_repo Bekhouche/ConflictAwareAH \
--data_root data \
--split test --num_windows 5 --output outputs/submission_test.csv
python scripts/predict.py \
--hf_repo Bekhouche/ConflictAwareAH \
--data_root data \
--split test_unlabeled --num_windows 5 --output outputs/submission.csv

Or download the checkpoint first, then point to the local path:
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='Bekhouche/ConflictAwareAH', local_dir='checkpoints/ConflictAwareAH')"
python scripts/predict.py \
--checkpoints checkpoints/ConflictAwareAH/best_model.pt \
--split test --num_windows 5 --output outputs/submission_test.csv

Challenge configuration (leaderboard AVGF1 0.715):
CUDA_VISIBLE_DEVICES=0 bash scripts/train.sh \
--unfreeze_top_k 0 \
--label_smoothing 0.1 \
--dropout 0.4 \
--text_blend 0.5

After training, replace <RUN_TIMESTAMP> with your run folder (e.g. outputs/runs/20260314_191324):
python scripts/predict.py \
--checkpoints outputs/runs/<RUN_TIMESTAMP>/best_model.pt \
--split test --num_windows 5 --output outputs/submission_test_eval.csv
python scripts/predict.py \
--checkpoints outputs/runs/<RUN_TIMESTAMP>/best_model.pt \
--split test_unlabeled --num_windows 5 --output outputs/submission.csv

To share your trained model:
pip install huggingface_hub
export HUGGING_FACE_HUB_TOKEN=hf_xxx # from https://huggingface.co/settings/tokens
# From project root (or use absolute path)
python scripts/upload_to_hf.py outputs/runs/20260314_191324 --repo_id your-username/conflict-aware-ah

- Encoders: VideoMAE-Base, HuBERT-Base, RoBERTa-GoEmotions (frozen)
- Fusion: 6-token Transformer (v, a, t, |v−a|, |v−t|, |a−t|) + MLP
- Output: text-guided blend α·σ(text_logit) + (1−α)·σ(full_logit) with α = 0.5
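The 6-token construction and the text-guided blend can be sketched in a few lines. This is an illustrative pure-Python rendering of the formulas above, not the repo's training code; the toy embeddings and function names are ours:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fusion_tokens(v, a, t):
    """Build the 6 fusion tokens fed to the Transformer: the three modality
    embeddings plus the element-wise conflict features |v-a|, |v-t|, |a-t|."""
    absdiff = lambda x, y: [abs(xi - yi) for xi, yi in zip(x, y)]
    return [v, a, t, absdiff(v, a), absdiff(v, t), absdiff(a, t)]

def text_guided_blend(text_logit, full_logit, alpha=0.5):
    """Final score: alpha*sigmoid(text_logit) + (1-alpha)*sigmoid(full_logit)."""
    return alpha * sigmoid(text_logit) + (1.0 - alpha) * sigmoid(full_logit)

# Toy 4-dim embeddings standing in for the encoder outputs
v, a, t = [1.0] * 4, [0.0] * 4, [0.5] * 4
tokens = fusion_tokens(v, a, t)
print(len(tokens))                  # 6 tokens
print(text_guided_blend(0.0, 0.0))  # 0.5 (both sigmoids equal 0.5)
```

With α = 0.5 the text-only head and the full fusion head contribute equally to the final probability.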
@inproceedings{conflictawareah2026,
title={Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition},
author={Bekhouche, Salah Eddine and Telli, Hichem and Benlamoudi, Azeddine and Herrouz, Salah Eddine and Taleb-Ahmed, Abdelmalik and Hadid, Abdenour},
year={2026}
}