wav2vec-toolkit

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

  1. Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

  2. Clone your fork to your local disk, and add the base repository as a remote:

    git clone git@github.com:<your Github handle>/wav2vec-toolkit.git
    cd wav2vec-toolkit
    git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git

  3. Set up a development environment by running the following commands, which create and activate a conda virtual environment and install the toolkit in editable mode:

    conda create -n env python=3.7 -y
    conda activate env
    pip install -e ".[dev]"
    pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt

    (If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)

  4. Create a new branch to hold your development changes:

    git checkout -b a-descriptive-name-for-my-changes

    Do not work directly on the main branch.

  5. Develop the features on your branch.

    1. Adding a new language here

  6. Format your code. Run black and isort on the source directories with the following commands so that your newly added files follow the project style:

    black --line-length 119 --target-version py36 src scripts languages
    isort src scripts languages

  7. Once you're happy with your implementation, stage your changes and commit them to record them locally:

    git add .
    git commit

    It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:

    git fetch upstream
    git rebase upstream/main

    Push the changes to your account using:

    git push -u origin a-descriptive-name-for-my-changes

  8. Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.
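
The branch and formatting steps above can be combined into a small check that runs before pushing. The following is a minimal sketch, not part of wav2vec-toolkit: the function names is_feature_branch and pre_push_check are hypothetical, and it assumes git, black, and isort are available on the PATH. It uses the check-only modes of black and isort, so it reports style problems without rewriting any files.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with wav2vec-toolkit.

# Succeeds only when the given branch name is a feature branch,
# i.e. it is non-empty and is not a default branch (main or master).
is_feature_branch() {
    case "$1" in
        main|master|"") return 1 ;;
        *) return 0 ;;
    esac
}

# Checks the current branch, then runs the formatters in check-only mode
# with the same options as the formatting step of the guide.
pre_push_check() {
    branch=$(git rev-parse --abbrev-ref HEAD) || return 1
    if ! is_feature_branch "$branch"; then
        echo "Refusing to continue on '$branch'; create a feature branch first." >&2
        return 1
    fi
    black --check --line-length 119 --target-version py36 src scripts languages &&
        isort --check-only src scripts languages
}
```

After sourcing the snippet in a shell opened at the repository root, running pre_push_check returns a non-zero status if the current branch is a default branch or if either formatter would change a file.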
