Script for bundling Common Voice (https://voice.mozilla.org) clips by language.
What it does
- Query the database for all clip data
- Download all of those clips from an S3 bucket
- Anonymize each clip's client_id and filename (called …)
- Upload a TSV file containing all the anonymized clip data
- Package the clips into per-language archives and upload them to a second, separate S3 bucket
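The anonymize, TSV and per-language archive steps above can be sketched roughly as follows. This is not the bundler's actual code. The database query and both S3 transfers are left out, and clip audio is assumed to be in memory already. The Clip record, the HMAC-SHA256 hashing of client_id with a secret salt, the common_voice_<locale>_<n> naming scheme, the TSV columns and the <locale>/clips/ archive layout are all hypothetical choices made only for illustration.

```python
import csv
import hashlib
import hmac
import io
import tarfile
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical record shape; the real database columns may differ.
    client_id: str
    filename: str
    locale: str
    sentence: str
    audio: bytes  # stand-in for clip bytes already downloaded from S3


def anonymize_client_id(client_id: str, salt: bytes) -> str:
    # Keyed hash (assumed scheme): a given speaker always maps to the same
    # opaque ID, which cannot be reversed without the secret salt.
    return hmac.new(salt, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(clips, salt):
    """Return (row, audio) pairs with client_id and filename replaced."""
    counters = defaultdict(int)
    result = []
    for clip in clips:
        counters[clip.locale] += 1
        ext = clip.filename.rsplit(".", 1)[-1] if "." in clip.filename else "mp3"
        # Hypothetical naming scheme: locale plus a per-locale sequence number.
        new_name = f"common_voice_{clip.locale}_{counters[clip.locale]}.{ext}"
        row = {
            "client_id": anonymize_client_id(clip.client_id, salt),
            "filename": new_name,
            "locale": clip.locale,
            "sentence": clip.sentence,
        }
        result.append((row, clip.audio))
    return result


def write_tsv(rows) -> str:
    # Tab-separated text with a header row; the caller would upload this.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["client_id", "filename", "locale", "sentence"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def bundle_by_language(pairs):
    """Build one in-memory .tar.gz per locale; returns {locale: bytes}."""
    groups = defaultdict(list)
    for row, audio in pairs:
        groups[row["locale"]].append((row["filename"], audio))
    archives = {}
    for locale, files in groups.items():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:gz") as tar:
            for name, audio in files:
                info = tarfile.TarInfo(name=f"{locale}/clips/{name}")
                info.size = len(audio)
                tar.addfile(info, io.BytesIO(audio))
        archives[locale] = buf.getvalue()
    return archives


# Usage with made-up clips.
clips = [
    Clip("user-a", "abc.mp3", "en", "Hello.", b"\x00\x01"),
    Clip("user-a", "def.mp3", "de", "Hallo.", b"\x02"),
    Clip("user-b", "ghi.mp3", "en", "Hi.", b"\x03\x04\x05"),
]
pairs = anonymize(clips, salt=b"keep-this-secret")
tsv = write_tsv([row for row, _ in pairs])
archives = bundle_by_language(pairs)
print(sorted(archives))  # ['de', 'en']
```

A keyed hash keeps all clips from one speaker linked under the same anonymized ID, which matters if the data is later split by speaker. A random per-run mapping would also hide the original IDs, but it would not stay stable across separate runs.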