
# Automatic Speech Recognition (ASR) with PyTorch

W&B report link: https://wandb.ai/aapetukhov-new-economic-school/asr_project/reports/ASR-DeepSpeech-HW--Vmlldzo5NzE3ODgz

About • Installation • How To Use • Credits • License

## About

The Automatic Speech Recognition model DeepSpeech2, implemented from scratch in PyTorch.

This repository contains a project on Automatic Speech Recognition (ASR) with all the scripts and instructions needed to train the model and run inference. For better evaluation results you can also plug in a different language model; I use the pruned one because of resource constraints.
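For orientation, the sketch below shows the general DeepSpeech2-style layout: a convolutional frontend over spectrograms, a stack of bidirectional GRUs, and a linear CTC head. It is a minimal illustration only; the class name, layer sizes, and kernel/stride choices are assumptions and do not necessarily match the implementation in `src/`.

```python
import torch
from torch import nn


class DeepSpeech2Sketch(nn.Module):
    """Minimal DeepSpeech2-style model: conv frontend -> BiGRU stack -> CTC logits."""

    def __init__(self, n_feats: int = 128, n_tokens: int = 28, hidden: int = 512):
        super().__init__()
        # Convolutional frontend over (batch, 1, freq, time) spectrograms
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        conv_out_freq = n_feats // 4  # two stride-2 reductions along the frequency axis
        self.rnn = nn.GRU(
            input_size=32 * conv_out_freq,
            hidden_size=hidden,
            num_layers=5,
            batch_first=True,
            bidirectional=True,
        )
        self.fc = nn.Linear(2 * hidden, n_tokens)  # per-frame logits for CTC

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, freq, time)
        x = self.conv(spectrogram.unsqueeze(1))         # (batch, 32, freq', time')
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time', features)
        x, _ = self.rnn(x)
        return self.fc(x)                               # (batch, time', n_tokens)


if __name__ == "__main__":
    model = DeepSpeech2Sketch()
    logits = model(torch.randn(2, 128, 200))  # two random spectrograms
    print(logits.shape)  # torch.Size([2, 100, 28])
```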

## Installation

Follow these steps to install the project:

1. (Optional) Create and activate a new environment using venv (+pyenv):

   ```bash
   # create env
   ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

   # alternatively, using the default python version
   python3 -m venv project_env

   # activate env
   source project_env/bin/activate
   ```
2. Install all required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install pre-commit:

   ```bash
   pre-commit install
   ```
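Optionally, you can sanity-check the environment afterwards; this assumes only that PyTorch is among the requirements:

```python
# Quick sanity check of the environment (assumes only that PyTorch
# is listed in requirements.txt, which it is for a PyTorch project).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```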

## How To Train

To train a model, log in to wandb and run the following commands:

1. First, train the model with

   ```bash
   python train.py -cn=deepspeech2
   ```

2. Then, continue training it with

   ```bash
   python train.py -cn=deepspeech2_360_augs_kaggle
   ```

3. Then, fine-tune for the clean model:

   ```bash
   python train.py -cn=ds2_finetune_strong_augs
   ```

   Or, for the other model:

   ```bash
   python train.py -cn=ds2_large_finetune
   ```

All configs live in src/configs, and HYDRA_CONFIG_ARGUMENTS are optional Hydra overrides that you can append to any of the commands above (see the sketch below).
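The `-cn` flag is Hydra's config-name selector. Below is a minimal sketch of how such an entry point is typically wired; the config path, config name, and the `trainer.n_epochs` override mentioned in the comment are illustrative assumptions, not the repository's exact `train.py`.

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="src/configs", config_name="deepspeech2")
def main(config: DictConfig) -> None:
    # -cn=<name> on the command line swaps config_name at runtime;
    # extra HYDRA_CONFIG_ARGUMENTS such as trainer.n_epochs=50 override single keys.
    print(OmegaConf.to_yaml(config))
    # ... build the model, dataloaders, and trainer from `config`, then train ...


if __name__ == "__main__":
    main()
```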

## How To Evaluate

Download the pretrained models and the clean lexicon from here and place them in your working directory. You can also do this by running the commands below, but be aware that the files are large. You can download only the clean model, since it achieved the highest scores for my grade, but the other model may also do well.

1. To download:

   ```bash
   # install gdown
   pip install gdown

   # download the best clean model
   gdown 1XpAuRCg8phPTJxmzPyvrAUpc02ZgQC0O

   # download the best other model
   gdown 197CiNFeESxA6Mo6S5tv-hV8xF528WUrm

   # download the pretrained LM
   gdown 1hqkXgR-OENH3uoILTInHCNKbmQFm-5wr

   # download the lexicon for the LM (the default one is wrong)
   gdown 1HhqKQgOE4O-mnTbTm9s1JHMFZQTGpyyf
   ```
2. To run inference LOCALLY:

   To run inference on clean (evaluate the model or save predictions):

   ```bash
   python inference.py -cn=inference_clean_local
   python inference.py -cn=inference_clean_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```

   To run inference on other:

   ```bash
   python inference.py -cn=inference_other_local '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```
3. To run inference ON KAGGLE:

   To run inference on clean (evaluate the model or save predictions):

   ```bash
   python inference.py -cn=inference_clean '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```

   To run inference on other:

   ```bash
   python inference.py -cn=inference_other '+datasets.test.audio_dir=<YOUR_AUDIO_DIR>' '+datasets.test.transcription_dir=<YOUR_TRANSCRIPTION_DIR>'
   ```
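For reference, shallow-fusion decoding with a KenLM-style language model (such as the pruned LM and lexicon downloaded above) looks roughly like the sketch below. It uses pyctcdecode as a stand-in decoder; the library choice, file names, vocabulary, and hyperparameters are assumptions rather than the repository's actual decoding code.

```python
import numpy as np
from pyctcdecode import build_ctcdecoder  # stand-in decoder, not necessarily what this repo uses

# Placeholder character vocabulary: "" is the CTC blank, then space and lowercase letters.
labels = [""] + [" "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]

# Placeholder paths for the downloaded LM; requires the `kenlm` package.
decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lm.arpa",  # pruned KenLM model
    alpha=0.5,                   # LM weight (tune on a dev set)
    beta=1.0,                    # word insertion bonus (tune on a dev set)
)

# `log_probs` would come from the acoustic model: shape (time, vocab), log domain.
log_probs = np.log(np.full((100, len(labels)), 1.0 / len(labels)))
print(decoder.decode(log_probs, beam_width=100))
```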

## Credits

This repository is based on a PyTorch Project Template.

## License

