CommonVoice Bundler

Script for bundling Common Voice clips by language.

What it does

  1. Query database for all clip data
  2. Download all those clips from an S3 bucket
  3. Anonymize each clip's client_id and filename (called path)
  4. Upload a TSV file with all the anonymized clip data
  5. Put clips into archives by language and upload them to a different S3 bucket
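
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the bundler's actual code: the helper names (hashClientId, anonymizeClips, toTsv), the use of unsalted SHA-256, the sequential filename scheme, and the sample columns sentence and locale are all assumptions made for the example.

```javascript
const crypto = require('crypto');

// Hypothetical helper: one-way hash of a client_id.
// SHA-256 without a salt is an assumption; the real script may differ.
function hashClientId(clientId) {
  return crypto.createHash('sha256').update(clientId).digest('hex');
}

// Replace client_id with its hash and path with a neutral, sequential
// filename, so neither field reveals the original uploader.
function anonymizeClips(clips) {
  return clips.map((clip, index) => ({
    ...clip,
    client_id: hashClientId(clip.client_id),
    path: `clip_${String(index).padStart(6, '0')}.mp3`,
  }));
}

// Serialize rows to TSV, using the keys of the first row as the header.
function toTsv(rows) {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]);
  const lines = rows.map(row => columns.map(c => String(row[c])).join('\t'));
  return [columns.join('\t'), ...lines].join('\n') + '\n';
}

// Made-up sample rows standing in for the database query result.
const sample = [
  { client_id: 'user-a', path: 'user-a/1.mp3', sentence: 'Hello world', locale: 'en' },
  { client_id: 'user-a', path: 'user-a/2.mp3', sentence: 'Guten Tag', locale: 'de' },
];
console.log(toTsv(anonymizeClips(sample)));
```

Hashing keeps clips from the same speaker linked to one another (both sample rows end up with the same client_id) without exposing the original identifier.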

How to run it

  1. Install node (>= 8.3.0)
  2. Install yarn
  3. Install CorporaCreator
  4. Install mp3-duration-sum
  5. Clone this repository (git clone)
  6. Override the keys defined in config.js by placing a config.json in the same directory
  7. yarn
  8. yarn start
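
The config override in step 6 might work along these lines. This is a sketch under stated assumptions, not the actual contents of config.js: the loadConfig helper and the keys dbUrl and outBucketName are hypothetical, and the merge shown is shallow (a nested object in config.json would replace the default wholesale).

```javascript
const fs = require('fs');

// Merge overrides from a JSON file over the default settings.
// A missing file simply means "use the defaults".
function loadConfig(defaults, overridePath) {
  let overrides = {};
  if (fs.existsSync(overridePath)) {
    overrides = JSON.parse(fs.readFileSync(overridePath, 'utf8'));
  }
  // Shallow merge: keys present in the override file win.
  return Object.assign({}, defaults, overrides);
}

// Hypothetical keys, for illustration only; check config.js for the real ones.
const defaults = { dbUrl: 'mysql://localhost/voice', outBucketName: 'bundler-out' };
const config = loadConfig(defaults, './config.json');
console.log(config);
```

With this approach, a config.json containing only the keys that differ from the defaults is enough; everything else falls back to the values in config.js.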
