# ASR Project Progress: Speech/Language Understanding
This notebook presents statistics about the dataset I assembled from various sources and compares results from pretrained, fine-tuned, and from-scratch (randomly initialized) models.
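Statistics like these can be summarized in a few lines. The sketch below assumes the dataset is stored as a NeMo-style JSON-lines manifest, with one JSON object per line containing at least "audio_filepath", "duration" (seconds), and "text". The helper name manifest_stats is hypothetical and is not part of NeMo.

```python
import json
from collections import Counter


def manifest_stats(manifest_path):
    """Summarize a JSON-lines ASR manifest (hypothetical helper).

    Assumes each non-blank line is a JSON object with "duration" (seconds)
    and "text" keys, as in NeMo-style manifests.
    """
    n_utts = 0
    total_sec = 0.0
    words = Counter()
    with open(manifest_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            n_utts += 1
            total_sec += float(entry["duration"])
            words.update(entry["text"].split())  # whitespace tokenization
    return {
        "utterances": n_utts,
        "hours": total_sec / 3600.0,
        "total_words": sum(words.values()),
        "vocab_size": len(words),
        "top_words": words.most_common(5),
    }
```

For example, pprint(manifest_stats("manifests/train.json")) would print the summary for a (hypothetical) training manifest. Whitespace tokenization is a simplification; aviation transcripts may need normalization of callsigns and digits before the vocabulary size is meaningful.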

The goal of this stage of the project is for the models to achieve good speech/language understanding of aviation English. As the results in this notebook show, models pretrained on general-domain read-speech English datasets (e.g. LibriSpeech and WSJ) perform very poorly on our data, often reaching error rates between 90% and 110% (rates above 100% are possible because inserted words count as errors).
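Error rates above 100% follow from how word error rate (WER) is defined: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in the minimum word-level edit alignment and N is the number of reference words. The notebook does not say which error rate it reports, so treat WER here as an assumption. Below is a minimal sketch using word-level Levenshtein distance. The function name is illustrative and is not NeMo's own metric implementation.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    N is the reference length, so extra hypothesis words (insertions)
    can push WER above 1.0. An empty reference is treated as N = 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# 3 substitutions + 1 insertion over a 3-word reference
print(word_error_rate("cleared to land", "clear two and then"))  # → 1.3333333333333333
```

A pretrained model that mishears every word of a short ATC phrase and also emits extra words therefore lands above 100%, which matches the range reported for the pretrained models.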

The preliminary imports are below: filesystem utilities (glob, os), PyTorch, the garbage-collector interface (gc), and pprint for pretty-printing.

In [20]:
import glob
import os
import torch
import gc
from pprint import pprint

The following cell runs a few sanity checks: it confirms that the GPU on the machine is available, lists the model checkpoints I saved in the checkpoints directory, and lists the names of the NVIDIA pretrained models used for comparison.

In [23]:
# GPU sanity check
print(f"GPU available: {torch.cuda.is_available()}\n")

# get list of model checkpoints
checkpoints = glob.glob(os.path.join("checkpoints", "*.nemo"))
pretrained = [
    "stt_en_jasper10x5dr",
    "QuartzNet15x5Base-En"
]
print("Saved model checkpoints:")
print("------------------------")
pprint(checkpoints)

print("\nPretrained models:")
print("------------------")
pprint(pretrained)

# memory clean-up just in case there was stuff running on the GPU previously
gc.collect()
torch.cuda.empty_cache()

GPU available: True

Saved model checkpoints:
------------------------
['checkpoints/jasper_finetuned.nemo',
 'checkpoints/quartznet_finetuned.nemo',
 'checkpoints/ctc_randominit.nemo']

Pretrained models:
------------------
['stt_en_jasper10x5dr', 'QuartzNet15x5Base-En']
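The cell above only lists what it finds, so a missing checkpoint would show up as a shorter list rather than an error. A minimal sketch of a stricter check follows. It assumes the expected filenames are the three printed above, and both the EXPECTED_CHECKPOINTS list and the missing_checkpoints helper are hypothetical names introduced for illustration.

```python
import glob
import os

# Hypothetical: the checkpoints this notebook expects, taken from the output above
EXPECTED_CHECKPOINTS = [
    "jasper_finetuned.nemo",
    "quartznet_finetuned.nemo",
    "ctc_randominit.nemo",
]


def missing_checkpoints(ckpt_dir="checkpoints", expected=EXPECTED_CHECKPOINTS):
    """Return expected .nemo files that are absent or zero bytes in ckpt_dir."""
    found = {
        os.path.basename(p) for p in glob.glob(os.path.join(ckpt_dir, "*.nemo"))
    }
    missing = []
    for name in expected:
        # a zero-byte file is treated as a failed or interrupted save
        if name not in found or os.path.getsize(os.path.join(ckpt_dir, name)) == 0:
            missing.append(name)
    return missing
```

Raising an error when missing_checkpoints() returns a non-empty list would stop the notebook before any evaluation runs against a model that is not actually on disk.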
