CommonVoice Bundler

Script for bundling Common Voice clips by language.

What it does

  1. Query the database for all clip data
  2. Download all of those clips from an S3 bucket
  3. Anonymize each clip's client_id and filename (called path)
  4. Upload a TSV file with all the anonymized clip data
  5. Put the clips into archives by language and upload them to a (different) S3 bucket
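
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the README does not say how anonymization is done, so the salted SHA-256 hashing here is an assumption, and the row shape (client_id, path, sentence, locale) and the helper names anonymizeClip, groupByLocale, and toTsv are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: one-way SHA-256 hash, returned as a hex string.
function sha256Hex(value) {
  return crypto.createHash('sha256').update(String(value)).digest('hex');
}

// Assumed row shape: { client_id, path, sentence, locale }.
// Replaces client_id and path with salted hashes (an assumed scheme),
// so the same speaker maps to the same opaque id across clips.
function anonymizeClip(row, salt) {
  return Object.assign({}, row, {
    client_id: sha256Hex(salt + row.client_id),
    path: sha256Hex(salt + row.path) + '.mp3',
  });
}

// Groups rows by locale so each language can go into its own archive.
function groupByLocale(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.locale)) groups.set(row.locale, []);
    groups.get(row.locale).push(row);
  }
  return groups;
}

// Serializes rows as TSV: one header line, then one line per row.
// Tabs and newlines inside values are replaced with a space.
function toTsv(rows, columns) {
  const clean = (v) => String(v == null ? '' : v).replace(/[\t\r\n]+/g, ' ');
  const lines = [columns.join('\t')];
  for (const row of rows) {
    lines.push(columns.map((c) => clean(row[c])).join('\t'));
  }
  return lines.join('\n') + '\n';
}
```

Hashing keeps the mapping deterministic, so a speaker's clips stay linked without exposing the original id. The actual archiving and the S3 upload are left out of this sketch.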

How to run it

  1. Install node (>= 8.3.0)
  2. Install yarn
  3. Install CorporaCreator
  4. Install mp3-duration-sum
  5. git clone
  6. Override the keys defined in config.js with a config.json in the same directory
  7. yarn
  8. yarn start
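
The override in step 6 could work along these lines. This is a sketch under assumptions: the key names in defaults are made up for illustration (the real keys are the ones defined in config.js), loadConfig is a hypothetical helper, and a shallow merge is assumed since the README does not say how nested keys are handled.

```javascript
const fs = require('fs');

// Hypothetical defaults; the real keys live in the repository's config.js.
const defaults = {
  dbUrl: 'mysql://localhost/voice',
  clipBucket: 'example-clips',
  outBucket: 'example-bundles',
};

// Shallow-merges the keys found in a JSON file over the default values.
// If the file does not exist, a copy of the defaults is returned.
function loadConfig(defaultValues, jsonPath) {
  let overrides = {};
  if (fs.existsSync(jsonPath)) {
    overrides = JSON.parse(fs.readFileSync(jsonPath, 'utf8'));
  }
  return Object.assign({}, defaultValues, overrides);
}

// Example call: loadConfig(defaults, './config.json')
```

With this approach config.json only needs to list the keys that differ from the defaults, for example credentials and bucket names.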