W&B report link: https://wandb.ai/aapetukhov-new-economic-school/asr_project/reports/ASR-DeepSpeech-HW--Vmlldzo5NzE3ODgz
About • Installation • How To Use • Credits • License
Automatic Speech Recognition model DeepSpeech2 implemented from scratch in PyTorch.
## About

This repository contains a project on Automatic Speech Recognition (ASR) with all the scripts and instructions needed to train the model and run inference. For better evaluation results, you can also plug in a different language model; I use a pruned one because of resource constraints.
## Installation

Follow these steps to install the project:

- (Optional) Create and activate a new environment using `venv` (+`pyenv`):

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using the default python version
  python3 -m venv project_env

  # activate env
  source project_env/bin/activate
  ```

- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Install `pre-commit`:

  ```bash
  pre-commit install
  ```
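After installing, a quick way to confirm that the key dependencies resolved is to probe for them. The package names below are assumptions for illustration — the authoritative list is in `requirements.txt`:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in the current env."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Package names here are assumptions -- take the real list from requirements.txt.
for name in missing_packages(["torch", "torchaudio", "hydra"]):
    print(f"missing: {name}")
```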
## How To Use

To train a model, log in to wandb and run the following commands:

- First, train the model with:

  ```bash
  python train.py -cn=deepspeech2
  ```

- Then, train it with:

  ```bash
  python train.py -cn=deepspeech2_360_augs_kaggle
  ```

- Then, to fine-tune for clean:

  ```bash
  python train.py -cn=ds2_finetune_strong_augs
  ```

  Or, for other:

  ```bash
  python train.py -cn=ds2_large_finetune
  ```

All configs are located in `src/configs`, and `HYDRA_CONFIG_ARGUMENTS` are optional arguments.
Download the pretrained models and the clean lexicon from here and place them in your directory. You can also do this by running the commands below, but be aware that the files are large. You can download only the clean model, since it achieved the highest scores for my grade, but the other model may also perform well.
- To download:

  ```bash
  # install gdown
  pip install gdown

  # download the best clean model
  gdown 1XpAuRCg8phPTJxmzPyvrAUpc02ZgQC0O

  # download the best other model
  gdown 197CiNFeESxA6Mo6S5tv-hV8xF528WUrm

  # download the pretrained LM
  gdown 1hqkXgR-OENH3uoILTInHCNKbmQFm-5wr

  # download the lexicon for the LM (the default one is wrong)
  gdown 1HhqKQgOE4O-mnTbTm9s1JHMFZQTGpyyf
  ```

- To run inference LOCALLY:
To run inference on clean (evaluate the model or save predictions):

```bash
python inference.py -cn=inference_clean_local
```

Or, on a custom dataset:

```bash
python inference.py -cn=inference_clean_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
```
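If you save predictions and want to score the transcriptions yourself, word error rate is the standard metric. A standalone WER sketch (not the repo's metric implementation):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sit"))  # 1 substitution over 3 words
```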
To run inference on other:

```bash
python inference.py -cn=inference_other_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
```

- To run inference ON KAGGLE:
To run inference on clean (evaluate the model or save predictions) on KAGGLE:

```bash
python inference.py -cn=inference_clean '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
```

To run inference on other:
```bash
python inference.py -cn=inference_other '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
```

## Credits

This repository is based on a PyTorch Project Template.