
Content-based Video Relevance Prediction Challenge

@ACM Multimedia 2018, Seoul, South Korea

(This challenge is fully sponsored by Hulu.)

Video relevance computation is one of the most important tasks for personalized online streaming services. Given the relevance between videos and viewer feedback, the system can provide personalized recommendations that help viewers discover more content of interest. In most online services, the video relevance table is computed from viewers' implicit feedback, e.g. watch and search history. The system analyzes viewer-to-video preferences and computes video-to-video relevance scores using collaborative-filtering-based methods. However, such methods perform poorly on the "cold-start" problem: when a new video is added to the library, the recommendation system must bootstrap its relevance scores with very little historical viewer feedback. One promising approach to the cold-start problem is to analyze the video content itself, i.e. to predict video-to-video relevance from keyframes, audio, subtitles, and metadata. With these relevance scores, we can provide better recommendations for our viewers.

Task and Data

The main task of this challenge is to predict the relevance between TV shows or movies from video content and pre-extracted features. Specifically, there are two separate tracks, one for TV shows and one for movies. The following components will be publicly available under this challenge:

Track 1: TV-shows

Pre-extracted features derived from nearly 7,000 TV-show video trailers. The whole set is divided into 3 subsets: training set (3,000 shows), validation set (over 800 shows), and testing set (3,000 shows).

Track 2: Movies

Pre-extracted features derived from over 10,000 movie video trailers. The whole set is divided into 3 subsets: training set (4,500 movies), validation set (over 1,000 movies), and testing set (4,500 movies).

For the training and validation sets in both tracks, we also provide the ground truth (relevance lists) derived from implicit viewer feedback. The viewer feedback has been cleaned to avoid any privacy issues.

The proposed methods will be evaluated by the recall rate of the top K predictions (K = 100 is the final criterion). We will provide a Python script to compute recall for evaluation.
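The Recall@K metric described above can be sketched as follows. This is not the official evaluation script; the data layout (ranked prediction lists and ground-truth relevance lists per query video) is an assumption for illustration.

```python
def recall_at_k(predicted, ground_truth, k=100):
    """Fraction of ground-truth relevant items recovered in the top-k predictions.

    predicted: list of item IDs ranked by predicted relevance (best first).
    ground_truth: list of item IDs that are actually relevant.
    """
    top_k = set(predicted[:k])
    relevant = set(ground_truth)
    if not relevant:
        return 0.0
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(all_predicted, all_ground_truth, k=100):
    """Average Recall@K over all query videos (as reported on the leaderboard)."""
    scores = [recall_at_k(p, g, k)
              for p, g in zip(all_predicted, all_ground_truth)]
    return sum(scores) / len(scores)
```

For example, with predictions `[1, 2, 3, 4]`, ground truth `[2, 5]`, and `k=3`, only item 2 of the two relevant items is recovered, giving a recall of 0.5.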


If you use the CBVRP dataset in your work, please cite it as

Mengyi Liu, Xiaohui Xie, Hanning Zhou. "Content-based Video Relevance Prediction Challenge: Data, Protocol, and Baseline". arXiv preprint arXiv:1806.00737. 2018.

The full paper is available on arXiv (arXiv:1806.00737). BibTeX:

@misc{cbvrp-acmmm-2018,
  title={Content-based Video Relevance Prediction Challenge: Data, Protocol, and Baseline},
  author={Liu, Mengyi and Xie, Xiaohui and Zhou, Hanning},
  journal={arXiv preprint arXiv:1806.00737},
  year={2018}
}


To register for the challenge and get access to the dataset, please complete the Online Agreement Form. We will send you the download instructions by email after the challenge data becomes available (April 20th, 2018).


The participants should prepare a CSV file for the testing set (please refer to the provided evaluation example for the format of the submission file) and send the CSV file to the organizers by email. After receiving the submission CSV file, we will evaluate the results and send them back to the participants by email no later than July 1st.
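A submission file could be written as sketched below. The exact column layout must follow the evaluation example shipped with the dataset; this sketch assumes one row per test video, containing the video's ID followed by its top-100 predicted relevant IDs.

```python
import csv

def write_submission(path, predictions, k=100):
    """Write a hypothetical submission CSV.

    predictions: dict mapping a test-video ID to a ranked list of
    predicted relevant video IDs (best first). Assumed layout, not the
    official format: video_id, pred_1, pred_2, ..., pred_k per row.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for video_id in sorted(predictions):
            ranked = predictions[video_id]
            writer.writerow([video_id] + list(ranked[:k]))
```

Sorting by video ID simply makes the output deterministic; check the provided example to see whether a particular row order or header is expected.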


Important Dates

April 2nd: Registration open.
April 20th: Challenge data available.
July 1st: Deadline for results submission.
July 8th: Deadline for paper submission (optional; for details, please refer to the "Submissions" section).
August 5th: Notification of winners and paper acceptance.
August 31st: Winners submit the tech report and source code.


Organizers

Peng Wang, Hulu LLC.
Xiaohui Xie, Hulu LLC.
Hanning Zhou, Hulu LLC.


Leaderboard

Track 1: TV-shows

Team Recall@100
1 USTC_I_Know_U 0.186955901351
2 ZJGSU_RUC 0.178239889102
3 UESTC_edbigdata 0.130889878206
4 zhao kun 0.126281363656
5 Jiguo Li 0.11980215084
6 MIDAS 0.11742731329
7 Yash Sanjay Bhalgat 0.11289992799
8 Sheng Li 0.109182579786
9 Dhumketu 0.074798633568

Track 2: Movies

Team Recall@100
1 ZJGSU_RUC 0.151050683793
2 USTC_I_Know_U 0.147163330488
3 UESTC_edbigdata 0.103173063529
4 zhao kun 0.100695875919
5 MIDAS 0.098817118248
6 Yash Sanjay Bhalgat 0.096555415480
7 Sheng Li 0.082584524879
8 Dhumketu 0.044701797160


The total reward is $2,000 USD per track, including the taxable amount, fully sponsored by Hulu LLC. The number of winners will depend on the number of participants and the quality of the results. The organizers reserve the right of final judgment and decision.

The winners of the challenge are required to provide a technical report describing the details of the winning algorithms and to provide the source code to the organizers. The organizers will also run the released code to verify the reproducibility of the winning algorithms. The winners will give a presentation during the conference.


Contact

If you have any questions, please contact the organizers by email.
