Web tool to download data (JSON) from YouTube (a free API key is needed)


Disclaimer: this is a work in progress (WIP).

Intended goals

  • Explore YouTube from a "data point of view"
  • Download the requested data in a comprehensible format
  • Make the data analyzable by the CORTEXT platform


Features

  • Explore a single video, or many videos from a search query, playlist or channel
  • Get comments, captions and metrics from a list of videos
  • Download the data as JSON
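Comments come from the YouTube Data API v3 `commentThreads` resource, where each thread holds one top-level comment and, when requested with `part=replies`, some of its replies. A minimal sketch (not the Pytheas implementation) of flattening such a response into one row per comment, so it can be saved as JSON or loaded into a table; `flatten_threads` is a hypothetical helper and only the `textDisplay` field is kept for brevity.

```python
from typing import Any


def flatten_threads(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a commentThreads.list response into one row per comment.

    Top-level comments get parent_id=None; replies point to the id of
    their top-level comment, so the single reply level stays recoverable.
    """
    rows = []
    for thread in response.get("items", []):
        top = thread["snippet"]["topLevelComment"]
        rows.append({
            "id": top["id"],
            "parent_id": None,
            "text": top["snippet"].get("textDisplay", ""),
        })
        # "replies" is absent when a thread has no replies or when
        # part=replies was not requested.
        for reply in thread.get("replies", {}).get("comments", []):
            rows.append({
                "id": reply["id"],
                "parent_id": top["id"],
                "text": reply["snippet"].get("textDisplay", ""),
            })
    return rows
```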

Built on

  • Requests/MongoDB: data collection and storage
  • Flask/Jinja: front end and back end
  • Docker: deployment


Installation

You can use Docker for a platform-agnostic deployment (this requires a working Docker installation), or launch the app directly with Python (this requires installing all dependencies: Python 3 and MongoDB).

  • Docker service (with docker-compose):
docker-compose up

Once everything has been built and you have verified that it runs, stop it; from then on, start it with:

docker-compose start
  • Python environment:

For this variant, make sure Python 3 and MongoDB are installed, then create and activate a virtual environment:

cd cortext-pytheas-youtube
virtualenv env3 -p python3
source ./env3/bin/activate

Then, in two terminals, one in ./ and one in rest/ (the REST branch has its own app), install the requirements:

pip install -r requirements.txt

Other requirements:

Configuration files

  • Uppercase keys are required for the app to work
  • Lowercase keys are for debugging purposes


Main app (./):

    {
        "DATA_DIR": "data/",
        "PORT": 8080,
        "REDIRECT_URI": "http://localhost:8080/auth",
        "GRANT_HOST_URL": "",
        "MONGO_HOST": "mongo",
        "MONGO_DBNAME": "youtube",
        "MONGO_PORT": 27017,
        "REST_HOST": "rest",
        "REST_PORT": 5002,
        "api_key": "",
        "oauth_status": "True",
        "debug_level": "False"
    }


REST app (rest/):

    {
        "LOG_DIR": "log/",
        "PORT": 5002,
        "MONGO_HOST": "localhost",
        "MONGO_DBNAME": "youtube",
        "MONGO_PORT": 27017
    }
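A minimal sketch of how such a file could be loaded and checked, assuming it is plain JSON and applying the rule stated above (uppercase keys required, lowercase keys optional). `load_config`, `parse_flag` and `REST_REQUIRED_KEYS` are hypothetical names, and reading the `"True"`/`"False"` strings as booleans is an assumption about how those values are meant to be interpreted.

```python
import json
from typing import Any, Iterable

# Hypothetical: the uppercase keys of the REST app config shown above.
REST_REQUIRED_KEYS = {"LOG_DIR", "PORT", "MONGO_HOST", "MONGO_DBNAME", "MONGO_PORT"}


def parse_flag(value: Any) -> bool:
    """Read "True"/"False" strings (as used for oauth_status, debug_level) as booleans."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def load_config(text: str, required: Iterable[str]) -> dict[str, Any]:
    """Parse a JSON config and fail early if a required (uppercase) key is missing."""
    config = json.loads(text)
    missing = sorted(set(required) - set(config))
    if missing:
        raise ValueError(f"missing required config keys: {', '.join(missing)}")
    # Lowercase keys are optional debug settings; normalise the known flags.
    for flag in ("oauth_status", "debug_level"):
        if flag in config:
            config[flag] = parse_flag(config[flag])
    return config
```

Failing at startup with the list of missing keys is usually easier to debug than a `KeyError` deep inside a request handler.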

API key from Google:

  1. Obtain an API key from Google and activate the YouTube Data API for it
  2. Enter the API key in the Pytheas web interface, or store it persistently in the config file (api_key)
  3. Start exploring
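Requests to the YouTube Data API v3 authorized with an API key pass it as the `key` query parameter. A minimal sketch of a `search.list` request URL using only the standard library; `build_search_url` is a hypothetical helper, not a Pytheas function, and the request itself is not sent here.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key: str, query: str, page_token: Optional[str] = None) -> str:
    """Build a search.list URL returning up to 50 videos for `query`."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": 50,  # per-page maximum accepted by search.list
        "key": api_key,
    }
    if page_token:
        # Next page, taken from the previous response's nextPageToken.
        params["pageToken"] = page_token
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

Each response includes a `nextPageToken` while more pages are available; passing it back as `page_token` walks through the result pages.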


Limitations

  • YouTube search results (via both the API and the browser) are capped at roughly 500; you can get more videos by listing them from channels, playlists, arbitrary lists of video IDs, or date-bounded search queries
  • Automatic captions cannot be fully retrieved via the API (this requires workarounds with XML requests and an undocumented YouTube front-end API...)
  • Comments are retrieved with only one level of replies
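One way to stay under the search cap is to split a period into smaller windows and run one query per window, using the `publishedAfter` and `publishedBefore` parameters of `search.list` (RFC 3339 timestamps). A minimal sketch; `date_windows` and `rfc3339` are hypothetical helpers, and the window size that keeps each query below the cap depends on how popular the topic is.

```python
from datetime import datetime, timedelta, timezone


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC, e.g. 2020-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_windows(start: datetime, end: datetime, step: timedelta) -> list[dict[str, str]]:
    """Split [start, end) into consecutive windows no longer than `step`."""
    windows = []
    current = start
    while current < end:
        upper = min(current + step, end)
        windows.append({
            "publishedAfter": rfc3339(current),
            "publishedBefore": rfc3339(upper),
        })
        current = upper
    return windows
```

Each dict can be merged into the search parameters of one query. Since adjacent windows share a boundary timestamp, deduplicating the collected results by video ID is a reasonable precaution.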

Good to know

  • A search is a list of videos
  • A playlist is a list of videos
  • Author and channel are different entities
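Since searches and playlists both reduce to lists of videos, their results can be normalized to one list of video IDs. In YouTube Data API v3 responses the ID sits in different places: `search.list` items carry it in `id.videoId`, while `playlistItems.list` items carry it in `snippet.resourceId.videoId` (the item's own `id` is a playlist-item ID, not a video ID). A minimal sketch with a hypothetical `video_ids` helper:

```python
from typing import Any


def video_ids(items: list[dict[str, Any]]) -> list[str]:
    """Extract video IDs from search.list or playlistItems.list items.

    Keeps first-seen order and drops duplicates.
    """
    ids: list[str] = []
    seen: set[str] = set()
    for item in items:
        ident = item.get("id")
        if isinstance(ident, dict):
            # search.list item: {"id": {"kind": ..., "videoId": ...}};
            # channel or playlist hits have no videoId and are skipped.
            vid = ident.get("videoId")
        else:
            # playlistItems.list item: the video is referenced in the snippet.
            vid = item.get("snippet", {}).get("resourceId", {}).get("videoId")
        if vid and vid not in seen:
            seen.add(vid)
            ids.append(vid)
    return ids
```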

Very basic REST API implemented

  • /queries/
  • /queries/query_id
  • /queries/query_id/videos/
  • /videos/video_id
  • /videos/video_id/comments/
  • /comments/comment_id
  • /captions/caption_id
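A minimal client-side sketch for these endpoints, assuming the REST app listens on `REST_HOST`:`REST_PORT` from the main config and answers with JSON. `ENDPOINTS`, `endpoint_url` and `get_json` are hypothetical names; the paths and trailing slashes are copied from the list above.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

# Paths copied from the endpoint list; the dictionary keys are illustrative.
ENDPOINTS = {
    "queries": "/queries/",
    "query": "/queries/{query_id}",
    "query_videos": "/queries/{query_id}/videos/",
    "video": "/videos/{video_id}",
    "video_comments": "/videos/{video_id}/comments/",
    "comment": "/comments/{comment_id}",
    "caption": "/captions/{caption_id}",
}


def endpoint_url(host: str, port: int, name: str, **ids: str) -> str:
    """Fill an endpoint template, URL-escaping each identifier."""
    escaped = {key: quote(str(value), safe="") for key, value in ids.items()}
    return f"http://{host}:{port}{ENDPOINTS[name].format(**escaped)}"


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode its body as JSON (assumes the server returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json(endpoint_url("localhost", 5002, "video_comments", video_id=some_id))` would fetch the comments stored for one video, provided the REST app is running.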

Next steps

  • Multithreading / parallelism / queuing (Celery again?)
  • Continue refactoring: make functions more dynamic and self-contained
  • Use Flask's errorhandler directly for error handling
  • Integrate an OpenAPI specification
  • Continue integrating methods (mainly the recommendation network)
  • ...