Identifying the narrative styles of YouTube’s vloggers

Data and code for the paper "B. Kleinberg, M. Mozes and I. van der Vegt, 2018. Identifying the sentiment styles of YouTube’s vloggers. To appear in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 2018".

The downloaded YouTube transcripts can be found in the data/output_dir directory. We provide the transcripts in two forms:

  • raw (data/output_dir/raw): contains the raw, XML-encoded transcript for each video as a .txt file, including temporal information about the start and end times at which each sequence is visible to the viewer when playing the video.
  • parsed (data/output_dir/parsed): contains the parsed transcripts for each video, i.e. a continuous text consisting of the concatenation of all sequences for a video.

We provide metadata for each video in data/overview.txt. This file contains one row per video, with comma-separated fields. The first two fields of each row uniquely identify the transcript file (vlogger name + local id). For example, the transcript file for Bratayley,1 is the file 1.txt in data/output_dir/[raw/parsed]/Bratayley. The fields in each row are, in order: username of the vlogger, local id, video url, view count, date of video publication, and url of the user's YouTube channel.
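The mapping from a metadata row to its transcript file can be sketched in Python as follows. This is a minimal illustration and not part of the repository: the names `VideoRecord`, `read_overview` and `transcript_path` are hypothetical, the sample row uses made-up values, and the sketch assumes that data/overview.txt has no header row and that no field contains a comma. All fields are kept as strings because the date and count formats are not specified here.

```python
import csv
import io
from pathlib import Path
from typing import NamedTuple


class VideoRecord(NamedTuple):
    """One row of the overview file, in the column order described above."""
    vlogger: str
    local_id: str
    video_url: str
    view_count: str
    published: str
    channel_url: str


def read_overview(handle):
    """Yield a VideoRecord for every row that has exactly six fields.

    Assumption: rows with a different number of fields (blank lines,
    a possible header) are skipped rather than reported.
    """
    for row in csv.reader(handle):
        if len(row) == len(VideoRecord._fields):
            yield VideoRecord(*(field.strip() for field in row))


def transcript_path(record, variant="parsed", root="data/output_dir"):
    """Build the transcript path <root>/<variant>/<vlogger>/<local_id>.txt."""
    if variant not in ("raw", "parsed"):
        raise ValueError("variant must be 'raw' or 'parsed'")
    return Path(root) / variant / record.vlogger / f"{record.local_id}.txt"


# Made-up row for illustration only; the URLs, count and date are placeholders.
sample = io.StringIO(
    "Bratayley,1,https://example.com/video,1000,2018-01-01,"
    "https://example.com/channel\n"
)
records = list(read_overview(sample))
example_path = transcript_path(records[0], variant="raw")
print(example_path.as_posix())
```

To run this against the actual file, pass an open file handle instead of the in-memory sample, for example `open("data/overview.txt", newline="", encoding="utf-8")`; the UTF-8 encoding is likewise an assumption.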
