This repo hosts the source code for our BMVC paper "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction".
We introduce the YouCook2-BoundingBox dataset, which contains spatial bounding boxes for video segments from YouCook2. We provide bounding box annotations for the validation and testing splits, on videos containing objects from our class list. All video widths were resized to 720px while maintaining the aspect ratio. The following files are included as part of the YouCook2-BoundingBox dataset:
- yc2_bb_skeleton.txt: Describes the expected structure/format of the JSON annotation files.
- yc2_bb_val_annotations.json: Contains the spatial bounding box annotations for videos from our validation split.
- yc2_testing_vid.json: Contains only the video dimensions from our testing split. The bounding box annotations for the testing split are withheld for server-side evaluation.
- yc2_training_vid.json: Contains only the video dimensions from the training split. Only the testing and validation splits have bounding box annotations; the training split is given only the sentence annotations as supervision.
- class_file.csv: Contains the complete list of objects in our class list. Both the singular and plural forms are included.
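As a rough illustration of how these files might be used, here is a minimal Python sketch. It assumes nothing about the JSON layout (see yc2_bb_skeleton.txt for the actual structure) and nothing about the CSV columns. The helper names `resized_dimensions`, `load_json`, and `load_class_rows` are hypothetical and not part of this repo, and rounding the scaled height to the nearest integer is an assumption about how the resize was done.

```python
import csv
import json

TARGET_WIDTH = 720  # from the README: all video widths were resized to 720px


def resized_dimensions(orig_width, orig_height, target_width=TARGET_WIDTH):
    """Scale a frame to target_width while keeping the aspect ratio.

    Rounding to the nearest integer is an assumption, not a documented detail.
    """
    scale = target_width / orig_width
    return target_width, round(orig_height * scale)


def load_json(path):
    """Load one of the JSON files (e.g. yc2_bb_val_annotations.json) as-is."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def load_class_rows(path):
    """Read class_file.csv as raw rows, skipping empty lines.

    No column layout is assumed; inspect the rows before relying on them.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f) if row]


if __name__ == "__main__":
    # A 1280x720 source frame scaled to a width of 720px.
    print(resized_dimensions(1280, 720))
```

Once the data is downloaded, `load_json("yc2_bb_val_annotations.json")` returns the parsed annotations unchanged, so the fields described in yc2_bb_skeleton.txt can be explored interactively before writing any format-specific code.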
The download link to both YouCook2 and YouCook2-BoundingBox data.
More details regarding the source code and pre-trained models are coming soon. Please stay tuned!