Skip to content

FPVSum : First-Person Video Summarization dataset

Notifications You must be signed in to change notification settings

azuxmioy/fpvsum

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FPVSum Dataset

First-Person Video Summarization (FPVSum) dataset proposed in our ECCV 2018 paper "First-Person Video Summarization From Third Person's Point of Views."

Description

First-Person Video Summarization (FPVSum) dataset is a benchmark to evaluate video summarization algorithms specifically for first-person videos. FPVSum contains 14 categories of first-person videos captured by head or body-mounted devices alongside their corrosponding human-annotated frame-level importance scores which are used for training and evalutaion in this paper.

File description

The files of FPVSum are listed as below. We provide lists and scripts to obtain selected first-person videos from YouTube. In addition, both frame-level importance scores and the user interface for annotation are also provided for public use.

fpvsum/
 ⊢ crawler/        # scripts to obtain video data
 ⊢ annotation/     # ground-truth scores of videos
 ⊢ UI/             # user interface for labeling

Videos

FPVSum consists of 14 categories of videos from YouTube, resulting in a total number of 98 first-person videos. During the collection of the first-person videos, we found that a large number of such videos on YouTube are not raw videos but edited ones, consisting of obvious frame discontinuity, transitions of point-of-view, and unrelated contents. Thus, we carefully selected continuous first-person videos only (i.e., no transition within or between points of views), and exluded those with unrelated contents.

How to get the videos

Prerequisites

Run the scripts
To download the videos in the list, you may simply run the following commands in the crawler folder:

mkdir video_dir

chmod +x fetch_fpvsum_videos.sh

./fetch_fpvsum_videos.sh video_dir FPVSum_videolist.csv

Ground-Truth Scores

We invited more than 15 volunteers to perform video annotation. Given each video, annotators are asked to produce a summary that contains most of its important content and highlight segments using our designed human interface. We observe that most annotators would lose concentration on assigning scores for very long videos. Thus, for each video category we select about 35% of the video sequences to be annotated with ground-truth scores, while the remaining are viewed as the unlabeled ones.

The details of our annotation process are shown as follows:

  • The annotators require to select highlight/non-highlight segments in each video. They need to finish watching each video once, then they start the labeling process.
  • The annotators are asked to select the video parts which they consider interesting or important (i.e., mark the parts to red color using the interface). We note that an interesting part being marked may vary in any length.
  • The annotators are encouraged to produce the summary which accounts for 10% to 20% of the full video length.
  • Each frame would get an importance score which indicates how many annotators mark on this frame. We finally select frames ranked in the top 15% of all video frames as the highlight ones.

User Interface for Human Annotation

We provide a user interface (written in MATLAB language) for labeling video sequences. The interface shows each video excluding its audio track, ensuring annotators select highlight based on visual content only. Annotators are able to use the interface for moving forward and backward and modify their annotations at any time.

References

Hsuan-I Ho, Wei-Chen Chiu, and Yu-Chiang Frank Wang: Summarizing First-Person Videos from Third Persons’ Points of Views. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)

About

FPVSum : First-Person Video Summarization dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published