NeuralBlock (NB) is a neural network built with Keras/TensorFlow that detects in-video YouTube sponsorships. It supports two prediction modes: (1) whether or not a text excerpt is a sponsorship (spot) and (2) whether or not each word in a sequence is part of a sponsorship.
Some examples of NB's predictions are provided in the `examples/` directory. The code for the web application is also provided and can be run locally.
High Level Summary
- NeuralBlock extracts transcripts from YouTube with YouTubeTranscriptApi.
- The SponsorBlock community has already labeled sponsor segments with timestamps.
- Those timestamps are used to find the sections of the transcript that are sponsorships, thereby creating a training set.
- The sequence of text is tokenized using the top 10,000 words found in sponsorships. Note: using a pre-trained fastText word embedding did not yield better performance.
- A bidirectional LSTM RNN is trained.
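The label-building step above can be sketched in plain Python. This is an illustrative sketch, not the repo's exact code: it assumes a transcript in the format YouTubeTranscriptApi returns (a list of dicts with `text`, `start`, and `duration`) and SponsorBlock segments as `(start, end)` tuples in seconds; the names `transcript` and `sponsor_segments` are made up for the example.

```python
def overlaps(seg_start, seg_end, spot_start, spot_end):
    """True if the caption segment overlaps the sponsor interval."""
    return seg_start < spot_end and spot_start < seg_end

def label_transcript(transcript, sponsor_segments):
    """Label each word 1 if its caption segment falls inside a sponsor spot."""
    labeled = []
    for entry in transcript:
        seg_start = entry["start"]
        seg_end = seg_start + entry["duration"]
        is_sponsor = any(
            overlaps(seg_start, seg_end, s, e) for s, e in sponsor_segments
        )
        for word in entry["text"].split():
            labeled.append((word, int(is_sponsor)))
    return labeled

# Toy transcript; real ones come from YouTubeTranscriptApi.
transcript = [
    {"text": "welcome back to the channel", "start": 0.0, "duration": 3.0},
    {"text": "this video is sponsored by ExampleVPN", "start": 3.0, "duration": 4.0},
    {"text": "now onto the tutorial", "start": 7.0, "duration": 3.0},
]
sponsor_segments = [(3.0, 7.0)]  # community-submitted SponsorBlock interval
labels = label_transcript(transcript, sponsor_segments)
print(labels[:3])  # → [('welcome', 0), ('back', 0), ('to', 0)]
```

The resulting `(word, label)` pairs are what the tokenizer and the bidirectional LSTM are trained on.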
Using the Web App
Somewhat outdated; to be updated later. The Dockerfile can also be used.
The `app/` directory contains a simple Flask application that performs the primary functions of `predict_timestamps.py` and presents the results in the browser.
- Install flask and other necessary libraries.
- Move the trained models into `app/models`. There should be no subfolders.
- Run `python app/application.py` from a terminal.
- Go to `localhost:5000` in a browser.
- Submit a valid video ID and click Submit.
The results should return in a few seconds. Note: if a usable transcript cannot be extracted by YouTubeTranscriptApi, the app will fail.
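A minimal Flask front end in the spirit of the app above might look like the sketch below. The route path, form field, and the stubbed `predict_timestamps` function are assumptions for illustration, not the repo's actual API.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_timestamps(vid):
    # Stand-in for the real model call in predict_timestamps.py,
    # which would fetch the transcript and run the LSTM.
    return [{"start": 12.3, "end": 45.6}]

@app.route("/predict", methods=["POST"])
def predict():
    vid = request.form.get("vid", "")
    if not vid:
        return jsonify(error="missing video ID"), 400
    return jsonify(sponsors=predict_timestamps(vid))

# To serve locally on localhost:5000, uncomment:
# app.run(port=5000)
```

The real application renders an HTML results page rather than JSON, but the request/response shape is the same idea.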
Predicting On New Data
Somewhat outdated. To be updated later.
- Install the Python libraries TensorFlow and YouTubeTranscriptApi.
- Update paths if necessary
- Provide a video ID (`vid`). The network was trained on the database as of 3/3/20; use a video created after that date to ensure the video hasn't already been seen.
- Run `predict_stream.py`
- Manually inspect the output stored in the variable
Note: overusing YouTubeTranscriptApi can get your IP banned.
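The per-word predictions have to be turned back into timestamps before inspection. A sketch of that post-processing step, with made-up names (`word_probs`, `probs_to_intervals`, `threshold`), assuming each word carries a time in seconds and a predicted sponsorship probability:

```python
def probs_to_intervals(word_probs, threshold=0.5):
    """Merge runs of consecutive above-threshold words into (start, end) intervals."""
    intervals = []
    current_start = None
    last_time = None
    for time, prob in word_probs:
        if prob >= threshold:
            if current_start is None:
                current_start = time  # a sponsor run begins here
            last_time = time
        elif current_start is not None:
            intervals.append((current_start, last_time))  # run ended
            current_start = None
    if current_start is not None:
        intervals.append((current_start, last_time))  # run reaches end of video
    return intervals

# (time in seconds, predicted sponsorship probability) per word
word_probs = [(0.0, 0.1), (1.0, 0.2), (2.0, 0.9), (3.0, 0.95),
              (4.0, 0.8), (5.0, 0.1), (6.0, 0.05)]
print(probs_to_intervals(word_probs))  # → [(2.0, 4.0)]
```

Thresholding at 0.5 is a design choice; raising it trades recall for precision on the predicted sponsor segments.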
Future Improvements
- Better transcripts: NeuralBlock depends on being able to download the full closed captions. Some creators disallow auto-generated English captions, making it impossible for NB to predict on those videos. This could be resolved through existing speech-to-text projects such as Mozilla's DeepSpeech.
- More accurate labels: The labels are imperfect because we don't know the exact moment a word is spoken, only an approximate time. For example, silence (a visual-only ad) or very short ad segments are hard to account for.
- Incorporate video: Visual cues, such as scene cuts, are also valuable in detecting ads and can help with word-level prediction.
- Support for other languages: Only English is supported at the moment.