# Efficient Conformer Demo
A quick introduction to using the pretrained models and to training and evaluating a model.<br>
repo: [https://github.com/burchim/EfficientConformer](https://github.com/burchim/EfficientConformer)

# Install

In [None]:
!git clone https://github.com/burchim/EfficientConformer.git 

In [None]:
import os
os.chdir('EfficientConformer/')

In [None]:
!pip install -r requirements.txt

In [None]:
!git clone --recursive https://github.com/parlance/ctcdecode.git
!cd ctcdecode && pip install .

# Download pretrained models and tokenizer

In [None]:
!pip install gdown

In [None]:
pretrained_models = {
    "EfficientConformerCTCSmall": "1MU49nbRONkOOGzvXHFDNfvWsyFmrrBam",
    "EfficientConformerCTCMedium": "1h5hRG9T_nErslm5eGgVzqx7dWDcOcGDB",
    "EfficientConformerCTCLarge": "1U4iBTKQogX4btE-S4rqCeeFZpj3gcweA"
}

In [None]:
# Select one of the official pretrained models
pretrained_model = "EfficientConformerCTCSmall"

In [None]:
import gdown

# Create model callback directory (including any missing parent directories)
os.makedirs(os.path.join("callbacks", pretrained_model), exist_ok=True)

# Download pretrained model checkpoint
gdown.download("https://drive.google.com/uc?id=" + pretrained_models[pretrained_model], os.path.join("callbacks", pretrained_model, "checkpoints_swa-equal-401-450.ckpt"), quiet=False)

# Create tokenizer directory (including any missing parent directories)
os.makedirs("datasets/LibriSpeech", exist_ok=True)

# Download pretrained model tokenizer
gdown.download("https://drive.google.com/uc?id=1hx2s4ZTDsnOFtx5_h5R_qZ3R6gEFafRx", "datasets/LibriSpeech/LibriSpeech_bpe_256.model", quiet=False)

# Test model on LibriSpeech samples

In [None]:
# Download LibriSpeech dev-clean subset
!cd datasets && wget https://www.openslr.org/resources/12/dev-clean.tar.gz && tar xzf dev-clean.tar.gz

# Download LibriSpeech dev-other subset
!cd datasets && wget https://www.openslr.org/resources/12/dev-other.tar.gz && tar xzf dev-other.tar.gz

In [None]:
import json
import glob
import torch
import torchaudio
import IPython.display as ipd
from functions import create_model
import matplotlib.pyplot as plt


In [None]:
config_file = "configs/" + pretrained_model + ".json"

# Load model Config
with open(config_file) as json_config:
  config = json.load(json_config)

# PyTorch Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Device:", device)

# Create and Load pretrained model
model = create_model(config).to(device)
model.summary()
model.eval()
model.load(os.path.join("callbacks", pretrained_model, "checkpoints_swa-equal-401-450.ckpt"))

In [None]:
# Get audio files paths
audio_files = glob.glob("datasets/LibriSpeech/*/*/*/*.flac")
print(len(audio_files), "audio files")

In [None]:
# Random indices
indices = torch.randint(0, len(audio_files), size=(10,))

# Test model
for i in indices:

  # Load audio file
  audio, sr = torchaudio.load(audio_files[i])

  # Plot audio
  plt.title(audio_files[i].split("/")[-1])
  plt.plot(audio[0])
  plt.show()
  print()

  # Display
  ipd.display(ipd.Audio(audio, rate=sr))
  print()

  # Predict sentence
  prediction = model.gready_search_decoding(audio.to(device), x_len=torch.tensor([len(audio[0])], device=device))[0]
  print("model prediction:", prediction, '\n')

  # Print a separator line without reusing the loop variable i
  print('*' * 100 + '\n')
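
To judge a prediction, it helps to print the reference transcript next to it. LibriSpeech stores transcripts in a `<speaker>-<chapter>.trans.txt` file inside each chapter directory, with one `<utterance-id> <text>` pair per line. The helper below is a minimal sketch of that lookup; `load_reference` is a hypothetical name and is not part of the repository.

```python
import os

def load_reference(flac_path):
    """Return the reference transcript for a LibriSpeech .flac file, or None if absent."""
    chapter_dir = os.path.dirname(flac_path)
    # Utterance id, e.g. "1272-128104-0000"
    utt_id = os.path.splitext(os.path.basename(flac_path))[0]
    # Transcript file is named after the "<speaker>-<chapter>" prefix
    prefix = utt_id.rsplit("-", 1)[0]
    trans_path = os.path.join(chapter_dir, prefix + ".trans.txt")
    with open(trans_path, encoding="utf-8") as trans_file:
        for line in trans_file:
            key, _, text = line.strip().partition(" ")
            if key == utt_id:
                return text
    return None
```

Inside the loop above, `print("reference:", load_reference(audio_files[i]))` would show the reference next to the model prediction. LibriSpeech transcripts are upper-case, so compare case-insensitively if the prediction's casing differs.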


# Training
Download the LibriSpeech dataset using:

- `cd datasets && bash ./download_LibriSpeech.sh`

Or download only the LibriSpeech train-clean-100 (100 hour) subset with:

- `cd datasets && wget https://www.openslr.org/resources/12/train-clean-100.tar.gz && tar xzf train-clean-100.tar.gz`

In [None]:
# Download LibriSpeech train-clean-100 subset
!cd datasets && wget https://www.openslr.org/resources/12/train-clean-100.tar.gz && tar xzf train-clean-100.tar.gz

Train an Efficient Conformer CTC Small model.<br>
The `--prepare_dataset` flag tokenizes the text sequences and saves the sample lengths before training/evaluation.<br>
Use the `--create_tokenizer` flag if you need to create a new sentencepiece tokenizer.<br>
Training mode is selected by default.

In [None]:
# Prepare dataset and train model
!python main.py --config_file configs/EfficientConformerCTCSmall.json --prepare_dataset

# Evaluation
Run a greedy search evaluation with the `--gready` flag (note the spelling).
Use the `--mode` flag to select an evaluation mode:

- `validation-clean` for evaluation on the LibriSpeech dev-clean validation set.
- `validation-other` for evaluation on the LibriSpeech dev-other validation set.
- `test-clean` for evaluation on the LibriSpeech test-clean test set.
- `test-other` for evaluation on the LibriSpeech test-other test set.
- `eval_time` to evaluate model inference time on the LibriSpeech dev-clean validation set.
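
As background for the greedy search used in these evaluations: greedy decoding of a CTC model generally takes the most likely token at each frame, merges consecutive repeats, and drops the blank token. The snippet below is a minimal sketch of that collapse step only. It is not the repository's implementation, and both the function name `ctc_greedy_collapse` and the blank index `0` are assumptions made for illustration.

```python
from itertools import groupby

def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax token ids: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]

# Hypothetical per-frame argmax ids (0 is assumed to be the blank token)
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0, 2]
decoded = ctc_greedy_collapse(frames)
print(decoded)
```

Note that the blank between the two runs of `3` keeps them as separate tokens, which is how CTC represents a repeated token.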

Select a model checkpoint to load for evaluation using the `--initial_epoch` flag.<br>
For example, `--initial_epoch swa-equal-401-450` loads the pretrained `checkpoints_swa-equal-401-450.ckpt` file.
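
To run several evaluation modes in a row, the command line can be assembled programmatically. This is a small sketch using only the flags described in this notebook; `build_eval_command` is a hypothetical helper, not something the repository provides.

```python
def build_eval_command(config_file, mode, initial_epoch, gready=True):
    """Assemble the main.py argument list for one evaluation run."""
    command = [
        "python", "main.py",
        "--config_file", config_file,
        "--mode", mode,
        "--initial_epoch", initial_epoch,
    ]
    if gready:
        # Flag spelled as in the evaluation command used in this notebook
        command.append("--gready")
    return command

modes = ["validation-clean", "validation-other"]
commands = [
    build_eval_command("configs/EfficientConformerCTCSmall.json", mode, "swa-equal-401-450")
    for mode in modes
]
for command in commands:
    print(" ".join(command))
```

Each list can then be passed to `subprocess.run(command, check=True)` from the repository root.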

In [None]:
!python main.py --config_file configs/EfficientConformerCTCSmall.json --mode validation-clean --initial_epoch swa-equal-401-450 --gready