wav2vec-toolkit

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

Clone your fork to your local disk, and add the base repository as a remote:

git clone git@github.com:<your GitHub handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git
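
If you repeat the setup in an existing clone, `git remote add` fails because `upstream` is already defined. A minimal sketch of a re-runnable variant, assuming a POSIX shell; the function name `ensure_upstream` is hypothetical and not part of this repository:

```shell
# Hypothetical helper (not part of wav2vec-toolkit): add the `upstream`
# remote only when it is not configured yet, so the step can be re-run safely.
ensure_upstream() {
    url="https://github.com/anton-l/wav2vec-toolkit.git"
    if git remote get-url upstream >/dev/null 2>&1; then
        echo "upstream already set to $(git remote get-url upstream)"
    else
        git remote add upstream "$url"
        echo "upstream added: $url"
    fi
}
```

Call `ensure_upstream` from inside your clone, then run `git remote -v` to confirm that both `origin` and `upstream` are listed.
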

Set up a development environment by creating a conda environment and installing the package in editable mode:
```
conda create -n env python=3.7 -y
conda activate env
pip install -e ".[dev]"
pip install -r languages/{YOUR_SPECIFIC_LANGUAGE}/requirements.txt
```
(If wav2vec-toolkit was already installed in the virtual environment, remove it with `pip uninstall wav2vec_toolkit` before reinstalling it in editable mode with the `-e` flag.)
Create a new branch to hold your development changes:
```
git checkout -b a-descriptive-name-for-my-changes
```
Do not work on the master branch.
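
To catch accidental work on the base branch early, a small guard can be run before committing. This is a sketch only: the function name `assert_not_on_base_branch` is hypothetical, and the base branch name is passed as an argument (defaulting to `master`) because it depends on your fork.

```shell
# Hypothetical helper (not part of wav2vec-toolkit): refuse to continue
# when the current branch is the base branch named in the first argument.
assert_not_on_base_branch() {
    base="${1:-master}"
    # symbolic-ref prints the checked-out branch name; it fails on a detached HEAD
    current="$(git symbolic-ref --short HEAD 2>/dev/null)" || {
        echo "not on a branch (detached HEAD?)" >&2
        return 2
    }
    if [ "$current" = "$base" ]; then
        echo "on '$base': create a feature branch with 'git checkout -b <name>'" >&2
        return 1
    fi
    echo "working on '$current'"
}
```

For example, `assert_not_on_base_branch master && git commit` only commits when you are on a feature branch.
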
Develop the features on your branch.
To add a new language, follow the steps in ADD_NEW_LANGUAGE.md.
Format your code. Run `black` and `isort` on your newly added files with the following commands so that they are consistently formatted:
```
black --line-length 119 --target-version py36 src scripts languages
isort src scripts languages
```
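If you only want to know whether anything would change, both tools have a check mode that leaves files untouched. The sketch below wraps them in a hypothetical function, `check_format`, which is not part of this repository; the options `--check` (black) and `--check-only` (isort) are the tools' own flags.

```shell
# Hypothetical helper (not part of wav2vec-toolkit): report formatting
# problems without rewriting files. Uses the same options as the commands above.
check_format() {
    status=0
    # --check makes black exit non-zero if any file would be reformatted
    black --check --line-length 119 --target-version py36 src scripts languages || status=1
    # --check-only makes isort exit non-zero if any imports are unsorted
    isort --check-only src scripts languages || status=1
    return "$status"
}
```

Running `check_format && git commit` lets the commit proceed only when both checks pass.
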
Once you're happy with your implementation, add your changes and make a commit to record your changes locally:
```
git add .
git commit
```
It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
```
git fetch upstream
git rebase upstream/main
```
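A rebase stops if tracked files have uncommitted changes, so it can help to check for them first. A minimal sketch, assuming the `upstream` remote from the earlier step; the function name `sync_with_upstream` is hypothetical, and the base branch is an argument that defaults to `main`:

```shell
# Hypothetical helper (not part of wav2vec-toolkit): rebase onto the
# upstream base branch only when tracked files have no pending changes.
sync_with_upstream() {
    base="${1:-main}"
    # --porcelain prints one line per changed tracked file; empty means clean
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "uncommitted changes found: commit or stash them first" >&2
        return 1
    fi
    git fetch upstream && git rebase "upstream/$base"
}
```

Call it from your feature branch, for example `sync_with_upstream main`.
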
Push the changes to your account using:
```
git push -u origin a-descriptive-name-for-my-changes
```
Once you are satisfied, go to the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.
