
# Automatic Speech Recognition (ASR) with PyTorch

W&B report link: https://wandb.ai/aapetukhov-new-economic-school/asr_project/reports/ASR-DeepSpeech-HW--Vmlldzo5NzE3ODgz

About • Installation • How To Use • Credits • License

## About

The Automatic Speech Recognition model DeepSpeech2, implemented from scratch in PyTorch.

This repository contains a project on Automatic Speech Recognition (ASR) with all the scripts and instructions needed to train the model and run inference. For better evaluation results you can also plug in a different language model; I use the pruned one because of resource constraints.
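For orientation, the sketch below shows the general DeepSpeech2-style layout: a convolutional frontend over spectrograms, a stack of bidirectional GRUs, and a linear CTC head. It is a minimal illustration only; the class name, layer sizes, and kernel/stride choices are assumptions and do not necessarily match the implementation in `src/`.

```python
import torch
from torch import nn


class DeepSpeech2Sketch(nn.Module):
    """Minimal DeepSpeech2-style model: conv frontend -> BiGRU stack -> CTC logits."""

    def __init__(self, n_feats: int = 128, n_tokens: int = 28, hidden: int = 512):
        super().__init__()
        # Convolutional frontend over (batch, 1, freq, time) spectrograms
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        conv_out_freq = n_feats // 4  # two stride-2 reductions along the frequency axis
        self.rnn = nn.GRU(
            input_size=32 * conv_out_freq,
            hidden_size=hidden,
            num_layers=5,
            batch_first=True,
            bidirectional=True,
        )
        self.fc = nn.Linear(2 * hidden, n_tokens)  # per-frame logits for CTC

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, freq, time)
        x = self.conv(spectrogram.unsqueeze(1))         # (batch, 32, freq', time')
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time', features)
        x, _ = self.rnn(x)
        return self.fc(x)                               # (batch, time', n_tokens)


if __name__ == "__main__":
    model = DeepSpeech2Sketch()
    logits = model(torch.randn(2, 128, 200))  # two random spectrograms
    print(logits.shape)  # torch.Size([2, 100, 28])
```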

## Installation

Follow these steps to install the project:

1. (Optional) Create and activate a new environment using venv (+pyenv):

   ```bash
   # create env
   ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

   # alternatively, using the default python version
   python3 -m venv project_env

   # activate env
   source project_env/bin/activate
   ```
2. Install all required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install pre-commit:

   ```bash
   pre-commit install
   ```
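Optionally, you can sanity-check the environment afterwards; this assumes only that PyTorch is among the requirements:

```python
# Quick sanity check of the environment (assumes only that PyTorch
# is listed in requirements.txt, which it is for a PyTorch project).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```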

## How To Train

To train a model, log in to wandb and run the following commands:

1. First, train the model with

   ```bash
   python train.py -cn=deepspeech2
   ```

2. Then, continue training it with

   ```bash
   python train.py -cn=deepspeech2_360_augs_kaggle
   ```

3. Then, fine-tune for the clean model:

   ```bash
   python train.py -cn=ds2_finetune_strong_augs
   ```

   Or, for the other model:

   ```bash
   python train.py -cn=ds2_large_finetune
   ```

All configs live in src/configs, and HYDRA_CONFIG_ARGUMENTS are optional Hydra overrides that you can append to any of the commands above (see the sketch below).
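The `-cn` flag is Hydra's config-name selector. Below is a minimal sketch of how such an entry point is typically wired; the config path, config name, and the `trainer.n_epochs` override mentioned in the comment are illustrative assumptions, not the repository's exact `train.py`.

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="src/configs", config_name="deepspeech2")
def main(config: DictConfig) -> None:
    # -cn=<name> on the command line swaps config_name at runtime;
    # extra HYDRA_CONFIG_ARGUMENTS such as trainer.n_epochs=50 override single keys.
    print(OmegaConf.to_yaml(config))
    # ... build the model, dataloaders, and trainer from `config`, then train ...


if __name__ == "__main__":
    main()
```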

## How To Evaluate

Download the pretrained models and the clean lexicon from here and place them in your working directory. You can also do this by running the commands below, but be aware that the files are large. You can download only the clean model, since it achieved the highest scores for my grade, but the other model may also do well.

1. To download:

   ```bash
   # install gdown
   pip install gdown

   # download the best clean model
   gdown 1XpAuRCg8phPTJxmzPyvrAUpc02ZgQC0O

   # download the best other model
   gdown 197CiNFeESxA6Mo6S5tv-hV8xF528WUrm

   # download the pretrained LM
   gdown 1hqkXgR-OENH3uoILTInHCNKbmQFm-5wr

   # download the lexicon for the LM (the default one is wrong)
   gdown 1HhqKQgOE4O-mnTbTm9s1JHMFZQTGpyyf
   ```
2. To run inference LOCALLY:

   To run inference on clean (evaluate the model or save predictions):

   ```bash
   python inference.py -cn=inference_clean_local
   python inference.py -cn=inference_clean_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```

   To run inference on other:

   ```bash
   python inference.py -cn=inference_other_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```
3. To run inference ON KAGGLE:

   To run inference on clean (evaluate the model or save predictions):

   ```bash
   python inference.py -cn=inference_clean '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```

   To run inference on other:

   ```bash
   python inference.py -cn=inference_other '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```
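For reference, shallow-fusion decoding with a KenLM-style language model (such as the pruned LM and lexicon downloaded above) looks roughly like the sketch below. It uses pyctcdecode as a stand-in decoder; the library choice, file names, vocabulary, and hyperparameters are assumptions rather than the repository's actual decoding code.

```python
import numpy as np
from pyctcdecode import build_ctcdecoder  # stand-in decoder, not necessarily what this repo uses

# Placeholder character vocabulary: "" is the CTC blank, then space and lowercase letters.
labels = [""] + [" "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]

# Placeholder paths for the downloaded LM; requires the `kenlm` package.
decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lm.arpa",  # pruned KenLM model
    alpha=0.5,                   # LM weight (tune on a dev set)
    beta=1.0,                    # word insertion bonus (tune on a dev set)
)

# `log_probs` would come from the acoustic model: shape (time, vocab), log domain.
log_probs = np.log(np.full((100, len(labels)), 1.0 / len(labels)))
print(decoder.decode(log_probs, beam_width=100))
```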

## Credits

This repository is based on a PyTorch Project Template.

## License

