MeMAD approach for the TRECVID VSUM 2021 task

D2KLab/trecvid-vsum

Please cite the following if you use this code.

@inproceedings{reboud2021,
  title={Events Zero-Shot Classification for Character-Centric Video Summarization},
  author={Reboud, Alison and Harrando, Ismail and Lisena, Pasquale and Troncy, Rapha{\"e}l},
  booktitle={International Workshop on Video Retrieval Evaluation},
  year={2021}
}

trecvid-vsum

Steps to reproduce the final EURECOM approach for the TRECVID VSUM 2021 task (figure: model architecture):

  1. Using shots_transcripts_alignment.ipynb, align the transcript content with the shot IDs: given the transcript files and the master shot reference table, the notebook produces a CSV (shot-aligned_transcripts.csv) containing what was said in each shot, based on the transcript and the shot boundaries.
  2. Face recognition: we select the shots displaying any of the three characters of interest, keeping only those detections with a confidence score greater than 0.5 (facerec_query_characters.csv). To do so, we performed face recognition using our Face Recognition Service. The results can be found under facerec_out (2.challenge_people). We use facerec_output_preprocessing.ipynb to transform the JSON output into a CSV file with timestamps for each detection, then facerec_segmentation.ipynb to align the detections with the shot IDs. Note: this folder also includes facerec results for a larger pool of EastEnders characters, as well as facerec results with different confidence thresholds, which we experimented with but did not use in the final submission.
  3. Perform zero-shot classification with event labels using Zero_Shot_Pipeline_eastenders.ipynb.
  4. Generate the candidate shots for the summary with submission_generation.ipynb. This is done by first concatenating the output of the previous steps (i.e. aligning coreference-resolved transcripts and facerec output with the master shot reference table), so that the content of each shot is aligned (time-wise) with the shot IDs. The N shots (N varying per run) with the highest similarity scores are then picked and written into XML files (submissions).
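
The transcript-to-shot alignment in step 1 can be pictured as an interval-overlap join. The snippet below is only a minimal sketch of that idea, not the code of shots_transcripts_alignment.ipynb: the tuple layouts, the shot IDs, the timings and the function name `align_transcript_to_shots` are all assumptions made for illustration.

```python
from collections import defaultdict


def align_transcript_to_shots(shots, segments):
    """Assign each transcript segment to the shot it overlaps most in time.

    shots:    list of (shot_id, start_s, end_s)   -- assumed layout
    segments: list of (start_s, end_s, text)      -- assumed layout
    Returns a dict mapping shot_id to the concatenated text spoken in it.
    """
    texts = defaultdict(list)
    for seg_start, seg_end, text in segments:
        best_id, best_overlap = None, 0.0
        for shot_id, shot_start, shot_end in shots:
            # Length of the time interval shared by the segment and the shot
            overlap = min(seg_end, shot_end) - max(seg_start, shot_start)
            if overlap > best_overlap:
                best_id, best_overlap = shot_id, overlap
        if best_id is not None:  # segments overlapping no shot are dropped
            texts[best_id].append(text)
    return {shot_id: " ".join(parts) for shot_id, parts in texts.items()}


# Hypothetical shot boundaries and transcript segments (seconds)
shots = [("shot1_1", 0.0, 4.0), ("shot1_2", 4.0, 9.5)]
segments = [
    (0.5, 3.0, "Hello."),
    (3.5, 8.0, "Where have you been?"),
    (8.2, 9.0, "Out."),
]
aligned = align_transcript_to_shots(shots, segments)
print(aligned)
```

A segment that straddles a shot boundary is assigned to the shot containing most of it; splitting the text proportionally would be another reasonable choice.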

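The selection logic of steps 2 and 4 (keep shots where a query character is detected with confidence above 0.5, then take the N best-scoring shots) might look roughly like the sketch below. It is an illustration under stated assumptions rather than the contents of submission_generation.ipynb: the dictionary keys, the character labels, the scores and the function name `select_top_shots` are hypothetical, and writing the XML submission is left out.

```python
def select_top_shots(detections, shot_scores, characters, n, min_confidence=0.5):
    """Return the IDs of the n highest-scoring shots showing a query character.

    detections:  list of dicts with keys "shot_id", "character", "confidence"
                 (assumed layout of the face recognition output)
    shot_scores: dict mapping shot_id to a similarity score
                 (assumed layout of the zero-shot classification output)
    """
    # Shots in which a query character appears with sufficient confidence
    shots_with_character = {
        d["shot_id"]
        for d in detections
        if d["character"] in characters and d["confidence"] > min_confidence
    }
    candidates = [
        (score, shot_id)
        for shot_id, score in shot_scores.items()
        if shot_id in shots_with_character
    ]
    # Highest similarity score first
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [shot_id for _, shot_id in candidates[:n]]


# Hypothetical detections and scores
detections = [
    {"shot_id": "shot1_1", "character": "character_a", "confidence": 0.91},
    {"shot_id": "shot1_2", "character": "character_a", "confidence": 0.42},
    {"shot_id": "shot1_3", "character": "character_b", "confidence": 0.77},
    {"shot_id": "shot1_4", "character": "character_a", "confidence": 0.63},
]
shot_scores = {"shot1_1": 0.30, "shot1_2": 0.95, "shot1_3": 0.80, "shot1_4": 0.55}

selected = select_top_shots(detections, shot_scores, {"character_a"}, n=2)
print(selected)
```

Here shot1_2 is excluded despite its high score because its only detection falls below the confidence threshold, and shot1_3 is excluded because it shows a different character.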