Skip to content

SpencerWhitehead/NewsVideoDataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

NewsVideoDataset

This repository includes the resources and instructions for retrieving the news video dataset from Whitehead et al., 2018.

Dataset statistics

Videos Duration (seconds) Tags Sentences Unique words (vocab)
Total 2,883 151,474 13,431 3,302 9,179
Average (per video) 52.5 4.7 1.2 3.2

Retrieving the videos

First, install the youtube_dl program from here: https://github.com/rg3/youtube-dl. Next, download urls.txt and run the following command:

youtube-dl -o "META_DATA_FILE_PATH/vid.%(id)s.%(ext)s" --batch-file urls.txt --restrict-filenames --write-info-json --recode-video mp4 --sleep-interval 20 --max-sleep-interval 50

This will retrieve the videos and meta-data from YouTube. If you would like to refine the meta-data files and eliminate potentially irrelevant information like the video playback quality options, then download and run:

python pack_data.py META_DATA_FILE_PATH CLEAN_META_DATA_FILE_PATH

Citation

If you use any of these resources, please use the following citation:

@inproceedings{whitehead2018KaVD,
    Author = {Whitehead, Spencer and Ji, Heng and Bansal, Mohit and Chang, Shih-Fu and Voss, Clare R.},
    title={Incorporating Background Knowledge into Video Description Generation},
    booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year={2018},
    month={November},
    publisher={Association for Computational Linguistics},
    location={Brussels, Belgium}
}

Disclaimer

This dataset is for educational and research purposes only. The videos found at the URLs are protected under their respective YouTube licenses and the terms therein. The videos were uploaded to YouTube and created by the AFP news agency (https://www.youtube.com/user/AFP, https://www.afp.com/en). We claim no ownership of these videos or their contents. They belong to their original copyright owners.

About

News Video Dataset

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages