youtube_extremism

This is a repository for the research into radical and extremist infospheres on YouTube. We have used this code for a series of stories at de Volkskrant (link to stories) and de Correspondent (link to stories).

The code consists of several modules, packages and collections of code.

DataCollection

DataCollection contains a library for, well, large-scale data collection. The code takes a list of channels and collects the following data types through the YouTube API:

  1. Channel information (basic statistics, relevant playlist ids and more)
  2. Videos (statistics and descriptions)
  3. Comments (all comments of the videos)
  4. Recommendations (all recommendations for the gathered videos)
  5. Transcripts (English transcripts of the videos, if available, gathered with the youtube-dl library)

You'll find additional documentation in the DataCollection folder.
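
To give a rough idea of what such a collection step involves, here is a minimal sketch. It is not the code from the DataCollection folder. The helper names `channel_request_url` and `transcript_command` are hypothetical, and the API key is a placeholder you would supply yourself. The sketch only builds a YouTube Data API v3 `channels.list` request URL and a youtube-dl command line; it does not send anything over the network.

```python
from urllib.parse import urlencode

API_ROOT = "https://www.googleapis.com/youtube/v3"


def channel_request_url(channel_ids, api_key):
    """Build a channels.list URL for the given channel ids (hypothetical helper)."""
    if not channel_ids:
        raise ValueError("at least one channel id is required")
    params = {
        # statistics -> basic counts; contentDetails -> related playlist ids
        "part": "snippet,statistics,contentDetails",
        "id": ",".join(channel_ids),
        "key": api_key,  # placeholder: your own API key
    }
    return f"{API_ROOT}/channels?{urlencode(params)}"


def transcript_command(video_id):
    """Return a youtube-dl argument list that fetches English subtitles only."""
    return [
        "youtube-dl",
        "--skip-download",   # do not download the video itself
        "--write-sub",       # uploader-provided subtitles, if any
        "--write-auto-sub",  # automatically generated captions, if any
        "--sub-lang", "en",
        f"https://www.youtube.com/watch?v={video_id}",
    ]
```

The URL can be fetched with any HTTP client, and the argument list can be passed to `subprocess.run`. Keeping the request building separate from the network call makes it easy to check what will be sent before spending API quota.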

RabbitHole

Contains scripts and notebooks to gather and analyse the data we used for an experiment into YouTube's recommendation system. This code still needs a lot of work.
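
This README does not describe how the experiment was set up. As an illustration only, one common way to explore a recommendation system is to start from a seed video and follow its recommendations breadth-first up to a fixed depth. The sketch below assumes the recommendations are already available as a plain dictionary; the function name `follow_recommendations` and the toy data are made up for this example.

```python
from collections import deque


def follow_recommendations(seed, recommendations, max_depth=2):
    """Breadth-first walk; returns {video_id: depth at which it was first reached}."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        video = queue.popleft()
        if depth[video] == max_depth:
            continue  # do not expand videos at the depth limit
        for nxt in recommendations.get(video, []):
            if nxt not in depth:
                depth[nxt] = depth[video] + 1
                queue.append(nxt)
    return depth


# Toy data: video id -> list of recommended video ids (hypothetical)
toy = {"a": ["b", "c"], "b": ["c", "d"], "d": ["e"]}
print(follow_recommendations("a", toy))
# {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```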

Notebooks

Contains some notebooks used for the analysis of the data on right- and left-wing 'infospheres.' They just scratch the surface of possible analyses, but they can help you along.

TopicModelling

Contains a lot of scripts, data and ideas for natural language processing. The transcripts are a real treasure. During two hackathons we've written code to get a grip on this data. There is still a lot that needs to be done, so please consider these scripts as suggestions.

Finally

If you are interested in the data (we have gathered around 100 GB, or 500,000 videos of far right and far left content), please drop me a line. We won't share our comment data without a clear agreement on how to process it safely, because it is really sensitive.

All code is written in Python 3.

Please let me know what we can do better. And please share your findings with us.
