Please cite the following if you use this code.
```bibtex
@inproceedings{reboud2021,
  title={Events Zero-Shot Classification for Character-Centric Video Summarization},
  author={Reboud, Alison and Harrando, Ismail and Lisena, Pasquale and Troncy, Rapha{\"e}l},
  booktitle={International Workshop on Video Retrieval Evaluation},
  year={2021}
}
```
Steps to reproduce the final EURECOM approach for the TRECVID VSUM 2021 task
- Using `shots_transcripts_alignment.ipynb`, align the transcript content with the shot IDs: given the transcript files and the master shot reference table, a CSV containing what was said in each shot (based on the transcript and the shot boundaries) is produced (`shot-aligned_transcripts.csv`). A minimal sketch of the alignment idea is shown below.
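The notebook holds the actual implementation; the sketch below only illustrates the overlap-based alignment it performs. Every file and column name here (`transcripts.csv`, `start`, `end`, `shot_start`, `shot_end`) is an assumption made for illustration.

```python
import pandas as pd

# Assumed inputs: transcript lines with timestamps, and the master shot
# reference table with one row per shot. Column names are illustrative.
transcripts = pd.read_csv("transcripts.csv")        # start, end, text
shots = pd.read_csv("master_shot_reference.csv")    # shot_id, shot_start, shot_end

rows = []
for shot in shots.itertuples():
    # A transcript line is assigned to a shot when their time spans overlap.
    spoken = transcripts[
        (transcripts["start"] < shot.shot_end) & (transcripts["end"] > shot.shot_start)
    ]
    rows.append({"shot_id": shot.shot_id, "text": " ".join(spoken["text"].astype(str))})

pd.DataFrame(rows).to_csv("shot-aligned_transcripts.csv", index=False)
```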
- Face recognition: we select the shots displaying any of the three characters of interest, keeping only the detections with a confidence score greater than 0.5 (`facerec_query_characters.csv`). To do so, we performed face recognition using our Face Recognition Service; the results can be found under `facerec_out` (`2.challenge_people`). We use `facerec_output_preprocessing.ipynb` to transform the JSON output into CSV files with timestamps for each detection, then `facerec_segmentation.ipynb` to align the detections with the shot IDs. Note: this folder also includes face recognition results for a larger pool of EastEnders characters, as well as results obtained with different confidence thresholds, which we experimented with but did not use in the final submission. A sketch of the confidence filtering is shown below.
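The exact JSON layout produced by the Face Recognition Service is handled by the two notebooks above; the following is only a sketch of the confidence filtering, with an assumed detection structure, a hypothetical file name, and placeholder character names.

```python
import json
import pandas as pd

QUERY_CHARACTERS = {"character_1", "character_2", "character_3"}  # placeholders
CONFIDENCE_THRESHOLD = 0.5

# Assumed layout: a list of detections, each carrying a timestamp, a
# character name and a confidence score. The real files are under facerec_out.
with open("facerec_out/detections.json") as f:  # hypothetical file name
    detections = json.load(f)

kept = [
    d for d in detections
    if d["character"] in QUERY_CHARACTERS and d["confidence"] > CONFIDENCE_THRESHOLD
]
pd.DataFrame(kept).to_csv("facerec_query_characters.csv", index=False)
```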
- Perform zero-shot classification with event labels with `Zero_Shot_Pipeline_eastenders.ipynb` (see the sketch below).
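The actual model and label set live in the notebook; as an illustration, zero-shot classification can be run with the Hugging Face `transformers` pipeline. The model choice, the event labels, and the use of the shot-aligned transcripts as input are all assumptions here.

```python
import pandas as pd
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
event_labels = ["wedding", "argument", "funeral"]  # illustrative labels only

shots = pd.read_csv("shot-aligned_transcripts.csv")
results = []
for row in shots.itertuples():
    if not isinstance(row.text, str) or not row.text.strip():
        continue  # skip shots with no transcript content
    out = classifier(row.text, candidate_labels=event_labels)
    # Keep the best-scoring event label for each shot.
    results.append({"shot_id": row.shot_id,
                    "label": out["labels"][0],
                    "score": out["scores"][0]})

pd.DataFrame(results).to_csv("shot_event_scores.csv", index=False)
```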
- Generate the candidate shots for the summary with `submission_generation.ipynb`. This is done by first concatenating the outputs of the previous steps (i.e. aligning the coreference-resolved transcripts and the face recognition output with the master shot reference table), so that the content of each shot is aligned (time-wise) with the shot IDs. The N shots (N varying per run) with the highest similarity scores are picked and written into XML files (`submissions`). A sketch of this selection step follows.
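A minimal sketch of the top-N selection, assuming a per-shot similarity score column and using placeholder XML element names; the notebook writes files matching the TRECVID VSUM submission format, which is not reproduced here.

```python
import pandas as pd
import xml.etree.ElementTree as ET

N = 50  # N varies per run
scores = pd.read_csv("shot_event_scores.csv")  # assumed columns: shot_id, score
top_shots = scores.nlargest(N, "score")

# Placeholder element names; the real schema is the TRECVID VSUM one.
root = ET.Element("videoSummarizationResults")
for row in top_shots.itertuples():
    ET.SubElement(root, "shot", id=str(row.shot_id))

ET.ElementTree(root).write("submissions/run_1.xml",
                           encoding="utf-8", xml_declaration=True)
```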