# PART I: Running a SpeechBrain ASR Recipe

## You have to fill in appropriate code in 4 locations in the notebook.

* The four locations start with "####TASK". Read the task specifications mentioned there.
* The required function names are already available in this notebook somewhere. You're only required to find and use the appropriate ones.
* Check the SpeechBrain documentation to find out how to use those functions. Refer to the starting material for more resources.
* By the end of this part of the assignment, you should be comfortable running a SpeechBrain recipe.
* **P.S.** Note that none of the four tasks requires you to write more than one line of code.

### Setting up the codebase.

In [None]:
%%capture

# Clone SpeechBrain repository
!git clone https://github.com/Darshan7575/speechbrain.git
%cd /content/speechbrain/

# Install required dependencies
!pip install -r requirements.txt

# Install SpeechBrain in editable mode
!pip install -e .

In [None]:
# @title
# Required imports
import os
import json
import shutil
import logging
import sys
import torch
from pathlib import Path
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml
from speechbrain.utils.data_utils import get_all_files, download_file
from speechbrain.dataio.dataio import read_audio
from speechbrain.utils.distributed import run_on_main, if_main_process

# Required variables and loggers
logger = logging.getLogger(__name__)
MINILIBRI_TRAIN_URL = "http://www.openslr.org/resources/31/train-clean-5.tar.gz"
MINILIBRI_VALID_URL = "http://www.openslr.org/resources/31/dev-clean-2.tar.gz"
MINILIBRI_TEST_URL = "https://www.openslr.org/resources/12/test-clean.tar.gz"
SAMPLERATE = 16000

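# Use the GPU when one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
run_opts = {"device": device}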

### Tokenizer Training
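In this section, we will train a subword tokenizer with **150 tokens** using `SentencePiece`. The configuration below selects the `unigram` model type; set `token_type: bpe` to train a BPE tokenizer instead.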



In [None]:
# ############################################################################
# Dataset creation helper functions
# ############################################################################

def prepare_mini_librispeech(
    data_folder, save_json_train, save_json_valid, save_json_test
):
    """
    Prepares the json files for the Mini Librispeech dataset.
    Downloads the dataset if it's not found in the `data_folder`.
    """

    # Check if this phase is already done (if so, skip it)
    if skip(save_json_train, save_json_valid, save_json_test):
        logger.info("Preparation completed in previous run, skipping.")
        return

    # If the dataset doesn't exist yet, download it
    train_folder = os.path.join(data_folder, "LibriSpeech", "train-clean-5")
    valid_folder = os.path.join(data_folder, "LibriSpeech", "dev-clean-2")
    test_folder = os.path.join(data_folder, "LibriSpeech", "test-clean")
    if not check_folders(train_folder, valid_folder, test_folder):
        download_mini_librispeech(data_folder)

    # List files and create manifest from list
    logger.info(
        f"Creating {save_json_train}, {save_json_valid}, and {save_json_test}"
    )
    extension = [".flac"]

    # List of flac audio files
    wav_list_train = get_all_files(train_folder, match_and=extension)
    wav_list_valid = get_all_files(valid_folder, match_and=extension)
    wav_list_test = get_all_files(test_folder, match_and=extension)

    # List of transcription files
    extension = [".trans.txt"]
    trans_list = get_all_files(data_folder, match_and=extension)
    trans_dict = get_transcription(trans_list)

    # Create the json files
    create_json(wav_list_train, trans_dict, save_json_train)
    create_json(wav_list_valid, trans_dict, save_json_valid)
    create_json(wav_list_test, trans_dict, save_json_test)


def get_transcription(trans_list):
    """
    Returns a dictionary with the transcription of each sentence in the dataset.
    """
    # Processing all the transcription files in the list
    trans_dict = {}
    for trans_file in trans_list:
        # Reading the text file
        with open(trans_file) as f:
            for line in f:
                uttid = line.split(" ")[0]
                text = line.rstrip().split(" ")[1:]
                text = " ".join(text)
                trans_dict[uttid] = text

    logger.info("Transcription files read!")
    return trans_dict


def create_json(wav_list, trans_dict, json_file):
    """
    Creates the json file given a list of wav files and their transcriptions.
    """
    # Processing all the wav files in the list
    json_dict = {}
    for wav_file in wav_list:

        # Reading the signal (to retrieve duration in seconds)
        signal = read_audio(wav_file)
        duration = signal.shape[0] / SAMPLERATE

        # Manipulate path to get relative path and uttid
        path_parts = wav_file.split(os.path.sep)
        uttid, _ = os.path.splitext(path_parts[-1])
        relative_path = os.path.join("{data_root}", *path_parts[-5:])

        # Create entry for this utterance
        json_dict[uttid] = {
            "wav": relative_path,
            "length": duration,
            "words": trans_dict[uttid],
        }

    # Writing the dictionary to the json file
    with open(json_file, mode="w") as json_f:
        json.dump(json_dict, json_f, indent=2)

    logger.info(f"{json_file} successfully created!")


def skip(*filenames):
    """
    Detects if the data preparation has been already done.
    If the preparation has been done, we can skip it.
    """
    for filename in filenames:
        if not os.path.isfile(filename):
            return False
    return True


def check_folders(*folders):
    """Returns False if any passed folder does not exist."""
    for folder in folders:
        if not os.path.exists(folder):
            return False
    return True


def download_mini_librispeech(destination):
    """Download dataset and unpack it.
    """
    train_archive = os.path.join(destination, "train-clean-5.tar.gz")
    valid_archive = os.path.join(destination, "dev-clean-2.tar.gz")
    test_archive = os.path.join(destination, "test-clean.tar.gz")
    download_file(MINILIBRI_TRAIN_URL, train_archive)
    download_file(MINILIBRI_VALID_URL, valid_archive)
    download_file(MINILIBRI_TEST_URL, test_archive)
    shutil.unpack_archive(train_archive, destination)
    shutil.unpack_archive(valid_archive, destination)
    shutil.unpack_archive(test_archive, destination)
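
As a rough illustration of what `create_json` writes, the sketch below builds one manifest entry by hand and resolves the `{data_root}` placeholder, which is the job the `replacements` argument of `DynamicItemDataset.from_json` does later in this notebook. The utterance ID, path, duration, and transcript are made-up values, and `resolve_placeholders` is a hypothetical helper, not a SpeechBrain function.

```python
import json
import os

# Hypothetical transcript line in the LibriSpeech "<uttid> <words...>" layout.
line = "1272-128104-0000 MISTER QUILTER IS THE APOSTLE\n"
uttid, _, text = line.rstrip().partition(" ")

# One manifest entry with the same three keys that create_json writes.
# The path and the duration are invented for illustration.
entry = {
    "wav": os.path.join(
        "{data_root}", "LibriSpeech", "dev-clean-2", "1272", "128104",
        f"{uttid}.flac",
    ),
    "length": 5.855,
    "words": text,
}
manifest = {uttid: entry}


def resolve_placeholders(manifest, replacements):
    """Hypothetical helper: substitute {key} placeholders in string fields."""
    resolved = {}
    for utt, fields in manifest.items():
        resolved[utt] = {}
        for name, value in fields.items():
            if isinstance(value, str):
                for key, target in replacements.items():
                    value = value.replace("{" + key + "}", target)
            resolved[utt][name] = value
    return resolved


resolved = resolve_placeholders(manifest, {"data_root": "data"})
print(json.dumps(resolved, indent=2))
```

Keeping the dataset root as a placeholder means the same JSON files can be reused if the data folder is moved; only the replacement value changes.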

In [None]:
tokenizer_hyperparams = """
# ############################################################################
# Tokenizer: subword unigram model with 150 tokens
# ############################################################################

output_folder: !ref results/tokenizer/

# Data files
data_folder: data
train_annotation: !ref <data_folder>/train.json
valid_annotation: !ref <data_folder>/valid.json
test_annotation: !ref <data_folder>/test.json

# Tokenizer training parameters
token_type: unigram  # ["unigram", "bpe", "char"]
token_output: 150  # index(blank/eos/bos/unk) = 0
character_coverage: 1.0
json_read: words

tokenizer: !name:speechbrain.tokenizers.SentencePiece.SentencePiece
   model_dir: !ref <output_folder>
   vocab_size: !ref <token_output>
   annotation_train: !ref <train_annotation>
   annotation_read: !ref <json_read>
   annotation_format: json
   model_type: !ref <token_type> # ["unigram", "bpe", "char"]
   character_coverage: !ref <character_coverage>
   annotation_list_to_check: [!ref <train_annotation>, !ref <valid_annotation>]

"""

In [None]:
# load required params from the hyperpyyaml file
hparams = load_hyperpyyaml(tokenizer_hyperparams)

# 1. Dataset creation

## Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

## Create dataset
run_on_main(
    prepare_mini_librispeech,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_json_train": hparams["train_annotation"],
        "save_json_valid": hparams["valid_annotation"],
        "save_json_test": hparams["test_annotation"],
    },
)

# 2. Tokenizer training
hparams["tokenizer"]()

# 3. Saving tokenizer in .ckpt extension
output_path = hparams["output_folder"]
token_output = hparams["token_output"]
token_type = hparams["token_type"]
bpe_model = f"{output_path}/{token_output}_{token_type}.model"
tokenizer_ckpt = f"{output_path}/tokenizer.ckpt"
shutil.copyfile(bpe_model, tokenizer_ckpt)

### Model Training
In this section, we will train a **6-layer Conformer** encoder-only architecture with the CTC objective.
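
With CTC, the encoder emits one label per frame, including a special blank, and a frame-level path is mapped to an output sequence by first merging consecutive repeats and then dropping blanks. A minimal pure-Python sketch of that collapse rule, applied to a best-path (greedy) decode, is given here. The function name `greedy_ctc_collapse` and the toy scores are illustrative assumptions; this is not SpeechBrain's implementation.

```python
def greedy_ctc_collapse(frame_scores, blank_id=0):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


# Toy per-frame scores over 4 labels (index 0 is the blank).
scores = [
    [0.1, 0.7, 0.1, 0.1],    # label 1
    [0.1, 0.6, 0.2, 0.1],    # label 1 again, merged with the previous frame
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.7, 0.1, 0.1],    # label 1, kept because a blank separates it
    [0.1, 0.1, 0.1, 0.7],    # label 3
]
print(greedy_ctc_collapse(scores))
```

The blank is what allows the same label to appear twice in a row in the output: without the blank frame in the middle, the two runs of label 1 would be merged into one.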

In [None]:
global_hyperparams = """
# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 2024
__set_seed: !apply:torch.manual_seed [!ref <seed>]

# Data files
data_folder: data

####TASK ADD APPROPRIATE REFERENCES TO LOAD THE FILES ##############

train_annotation: !ref <data_folder>/train.json
valid_annotation: !ref <data_folder>/valid.json
test_annotation: !ref <data_folder>/test.json
#####################################################################

# Folder that holds the tokenizer trained in the previous section
pretrained_lm_tokenizer_path: ./results/tokenizer

# Training parameters
number_of_epochs: 30 #####CHANGE
batch_size: 8
lr_adam: 0.001
max_grad_norm: 5.0
ckpt_interval_minutes: 15 # save checkpoint every N min
loss_reduction: 'batchmean'

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>

valid_dataloader_opts:
    batch_size: !ref <batch_size>

test_dataloader_opts:
    batch_size: !ref <batch_size>

# Feature parameters
sample_rate: 16000
n_fft: 400
n_mels: 80

####################### Model parameters ###########################
# Transformer
d_model: 64
nhead: 4
num_encoder_layers: 6
d_ffn: 256
transformer_dropout: 0.1
activation: !name:torch.nn.GELU
output_neurons: 150
label_smoothing: 0.0
attention_type: RelPosMHAXL

# Outputs
blank_index: 0
pad_index: 0
bos_index: 1
eos_index: 2

# Decoding parameters
min_decode_ratio: 0.0
max_decode_ratio: 1.0
test_beam_size: 1
ctc_weight_decode: 1.0

############################## models ################################

compute_features: !new:speechbrain.lobes.features.Fbank
    sample_rate: !ref <sample_rate>
    n_fft: !ref <n_fft>
    n_mels: !ref <n_mels>

CNN: !new:speechbrain.lobes.models.convolution.ConvolutionFrontEnd
    input_shape: (8, 10, 80)
    num_blocks: 2
    num_layers_per_block: 1
    out_channels: (64, 32)
    kernel_sizes: (3, 3)
    strides: (2, 2)
    residuals: (False, False)

# standard parameters for the BASE model
Transformer: !new:speechbrain.lobes.models.transformer.TransformerASR.TransformerASR
    input_size: 640
    tgt_vocab: !ref <output_neurons>
    d_model: !ref <d_model>
    nhead: !ref <nhead>
    num_encoder_layers: !ref <num_encoder_layers>
    num_decoder_layers: 0
    d_ffn: !ref <d_ffn>
    dropout: !ref <transformer_dropout>
    activation: !ref <activation>
    encoder_module: conformer
    attention_type: !ref <attention_type>
    normalize_before: True

tokenizer: !new:sentencepiece.SentencePieceProcessor

ctc_lin: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <d_model>
    n_neurons: !ref <output_neurons>

normalize: !new:speechbrain.processing.features.InputNormalization
    norm_type: global
    update_until_epoch: 4

modules:
    CNN: !ref <CNN>
    Transformer: !ref <Transformer>
    ctc_lin: !ref <ctc_lin>
    normalize: !ref <normalize>

model: !new:torch.nn.ModuleList
    - [!ref <CNN>, !ref <Transformer>, !ref <ctc_lin>]

# Optimizer (a single Adam optimizer is used throughout training)
Adam: !name:torch.optim.Adam
    lr: !ref <lr_adam>
    betas: (0.9, 0.98)
    eps: 0.000000001

log_softmax: !new:torch.nn.LogSoftmax
    dim: -1

ctc_cost: !name:speechbrain.nnet.losses.ctc_loss
    blank_index: !ref <blank_index>
    reduction: !ref <loss_reduction>

noam_annealing: !new:speechbrain.nnet.schedulers.NoamScheduler
    lr_initial: !ref <lr_adam>
    n_warmup_steps: 1500

epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats

cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats
   split_tokens: True

# The pretrainer maps pretrained files to instances declared in this YAML.
# Here, the file tokenizer.ckpt found under <pretrained_lm_tokenizer_path>
# is loaded into the "tokenizer" object defined above.
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        tokenizer: !ref <tokenizer>
    paths:
        tokenizer: !ref <pretrained_lm_tokenizer_path>/tokenizer.ckpt
"""

In [None]:
def dataio_prepare(hparams):
    """This function prepares the datasets to be used in the brain class.
    It also defines the data processing pipeline through user-defined functions.
    """
    # Define audio pipeline. In this case, we simply read the path contained
    # in the variable wav with the audio reader.
    @sb.utils.data_pipeline.takes("wav")
    @sb.utils.data_pipeline.provides("sig")
    def audio_pipeline(wav):
        """Load the audio signal. This is done on the CPU in the `collate_fn`."""
        sig = sb.dataio.dataio.read_audio(wav)
        return sig

    tokenizer = hparams["tokenizer"]
    # Define text processing pipeline. We start from the raw text and then
    # encode it using the tokenizer. The tokens with BOS are used for feeding
    # the decoder during training, and the tokens with EOS for computing the
    # cost function. The tokens without BOS or EOS are for computing the CTC loss.
    @sb.utils.data_pipeline.takes("words")
    @sb.utils.data_pipeline.provides(
        "wrd", "tokens_list", "tokens_bos", "tokens_eos", "tokens"
    )
    def text_pipeline(wrd):
        """Processes the transcriptions to generate proper labels"""
        yield wrd
        tokens_list = tokenizer.encode_as_ids(wrd)
        yield tokens_list
        tokens_bos = torch.LongTensor([hparams["bos_index"]] + (tokens_list))
        yield tokens_bos
        tokens_eos = torch.LongTensor(tokens_list + [hparams["eos_index"]])
        yield tokens_eos
        tokens = torch.LongTensor(tokens_list)
        yield tokens

    # Define datasets from json data manifest file
    # Define datasets sorted by ascending lengths for efficiency
    datasets = {}
    data_folder = hparams["data_folder"]
    for dataset in ["train", "valid", "test"]:
        datasets[dataset] = sb.dataio.dataset.DynamicItemDataset.from_json(
            json_path=hparams[f"{dataset}_annotation"],
            replacements={"data_root": data_folder},
            dynamic_items=[audio_pipeline, text_pipeline],
            output_keys=[
                "id",
                "sig",
                "wrd",
                "tokens_bos",
                "tokens_eos",
                "tokens",
            ],
        )
        hparams[f"{dataset}_dataloader_opts"]["shuffle"] = False

    datasets["train"] = datasets["train"].filtered_sorted(sort_key="length")
    hparams["train_dataloader_opts"]["shuffle"] = False

    return (
        datasets["train"],
        datasets["valid"],
        datasets["test"],
        tokenizer
    )
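
To make the label variants produced by `text_pipeline` concrete, the sketch below builds them from a made-up list of token IDs, using plain Python lists instead of tensors. The IDs are arbitrary; only the BOS and EOS indices (1 and 2) come from the YAML configuration.

```python
BOS_INDEX = 1  # bos_index in the YAML configuration
EOS_INDEX = 2  # eos_index in the YAML configuration

# Made-up token IDs standing in for the output of tokenizer.encode_as_ids(...).
tokens_list = [17, 42, 8]

tokens_bos = [BOS_INDEX] + tokens_list  # sequence shifted right by a BOS symbol
tokens_eos = tokens_list + [EOS_INDEX]  # sequence terminated by an EOS symbol
tokens = list(tokens_list)              # CTC target: neither BOS nor EOS

print("tokens_bos:", tokens_bos)
print("tokens_eos:", tokens_eos)
print("tokens:    ", tokens)
```

Only `tokens` is passed to the CTC loss in this notebook; the BOS and EOS variants follow the layout commonly used for attention-based decoders.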

In [None]:
# Define training procedure
class BaseASR(sb.Brain):
    def __init__(
        self,
        modules=None,
        opt_class=None,
        hparams=None,
        run_opts=None,
        checkpointer=None,
        profiler=None,
        tokenizer=None,
    ):
        super(BaseASR, self).__init__(
            modules=modules,
            opt_class=opt_class,
            hparams=hparams,
            run_opts=run_opts,
            checkpointer=checkpointer,
            profiler=profiler
        )
        self.tokenizer = tokenizer

    def compute_forward(self, batch, stage):
        """Performs a forward pass through the encoder"""
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig
        tokens_bos, _ = batch.tokens_bos

        # compute features
        ####TASK MAKE APPROPRIATE FUNCTION CALLS TO COMPUTE THE FEATURES BELOW
        feats = self.hparams.compute_features(wavs)  #### FILL THIS ####
        current_epoch = self.hparams.epoch_counter.current
        feats = self.modules.normalize(feats, wav_lens, epoch=current_epoch)

        # forward modules
        src = self.modules.CNN(feats)

        enc_out, _ = self.modules.Transformer(
            src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index,
        )

        # output layer for ctc log-probabilities
        logits = self.modules.ctc_lin(enc_out)

        ####TASK CALCULATE THE LOG-PROBABILITIES OF THESE LOGITS
        #### USING SPEECHBRAIN
        log_softmax = sb.nnet.activations.Softmax(apply_log=True)
        p_ctc = log_softmax(logits)  #### FILL THIS ####

        # Compute outputs (hypotheses are decoded only for validation and test)
        hyps = None
        if stage != sb.Stage.TRAIN:
            hyps = sb.decoders.ctc_greedy_decode(
                p_ctc, wav_lens, blank_id=self.hparams.blank_index
            )

        return p_ctc, wav_lens, hyps

    def compute_objectives(self, predictions, batch, stage):
        """Computes the CTC loss given predictions and targets."""

        (p_ctc, wav_lens, hyps,) = predictions

        ids = batch.id
        tokens_eos, tokens_eos_lens = batch.tokens_eos
        tokens, tokens_lens = batch.tokens

        # Calculate CTC loss
        ####TASK Make required function call to compute CTC LOSS
        #### You have to aggregate the loss in the end so make appropriate
        #### modifications to the value returned.
        loss = self.hparams.ctc_cost(
            p_ctc, tokens, wav_lens, tokens_lens
        )  #### FILL THIS ####

        if stage != sb.Stage.TRAIN:
            # Decode token terms to words
            predicted_words = [
                self.tokenizer.decode_ids(utt_seq).split(" ") for utt_seq in hyps
            ]
            target_words = [wrd.split(" ") for wrd in batch.wrd]
            self.wer_metric.append(ids, predicted_words, target_words)
            self.cer_metric.append(ids, predicted_words, target_words)

        return loss

    def on_evaluate_start(self, max_key=None, min_key=None):
        """Performs checkpoint averaging if needed"""
        super().on_evaluate_start()

        ckpts = self.checkpointer.find_checkpoints(
            max_key=max_key, min_key=min_key
        )
        ckpt = sb.utils.checkpoints.average_checkpoints(
            ckpts, recoverable_name="model", device=self.device
        )

        self.hparams.model.load_state_dict(ckpt, strict=True)
        self.hparams.model.eval()
        print("Loaded the average")

    def evaluate_batch(self, batch, stage):
        """Computations needed for validation/test batches"""
        with torch.no_grad():
            predictions = self.compute_forward(batch, stage=stage)
            loss = self.compute_objectives(predictions, batch, stage=stage)
        return loss.detach()

    def on_stage_start(self, stage, epoch):
        """Gets called at the beginning of each epoch"""
        if stage != sb.Stage.TRAIN:
            self.cer_metric = self.hparams.cer_computer()
            self.wer_metric = self.hparams.error_rate_computer()

    def on_stage_end(self, stage, stage_loss, epoch):
        """Gets called at the end of an epoch."""
        # Compute/store important stats
        stage_stats = {"loss": stage_loss}
        if stage == sb.Stage.TRAIN:
            self.train_stats = stage_stats
        else:
            stage_stats["CER"] = self.cer_metric.summarize("error_rate")
            stage_stats["WER"] = self.wer_metric.summarize("error_rate")

        # log stats and save checkpoint at end-of-epoch
        if stage == sb.Stage.VALID and sb.utils.distributed.if_main_process():

            lr = self.hparams.noam_annealing.current_lr
            steps = self.optimizer_step
            optimizer = self.optimizer.__class__.__name__

            epoch_stats = {
                "epoch": epoch,
                "lr": lr,
                "steps": steps,
                "optimizer": optimizer,
            }
            self.hparams.train_logger.log_stats(
                stats_meta=epoch_stats,
                train_stats=self.train_stats,
                valid_stats=stage_stats,
            )
            # Save only last 10 checkpoints
            self.checkpointer.save_and_keep_only(
                meta={"loss": stage_loss, "epoch": epoch},
                max_keys=["epoch"],
                num_to_keep=10,
            )

        elif stage == sb.Stage.TEST:
            self.hparams.train_logger.log_stats(
                stats_meta={"Epoch loaded": self.hparams.epoch_counter.current},
                test_stats=stage_stats,
            )
            # Write the WER metric for test dataset
            if if_main_process():
                with open(self.hparams.test_wer_file, "w") as w:
                    self.wer_metric.write_stats(w)

    def fit_batch(self, batch):
        """Performs a forward + backward pass on 1 batch
        """

        should_step = self.step % self.grad_accumulation_factor == 0

        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        loss.backward()
        if self.check_gradients(loss):
            self.optimizer.step()
        self.zero_grad()
        self.optimizer_step += 1
        self.hparams.noam_annealing(self.optimizer)

        self.on_fit_batch_end(batch, outputs, loss, should_step)
        return loss.detach().cpu()
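
The `WER` and `CER` values logged in `on_stage_end` are edit-distance error rates. The following is a minimal sketch of that idea, assuming the common definition (total insertions, deletions, and substitutions divided by the number of reference tokens, as a percentage). It is not SpeechBrain's `ErrorRateStats`, and the sentences and the character-level splitting are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_tok != hyp_tok)
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, substitution)
            )
        previous = current
    return previous[-1]


def error_rate(refs, hyps):
    """Total edits over total reference tokens, as a percentage."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * edits / total


refs = ["THE CAT SAT".split(" ")]
hyps = ["THE CAT SAT DOWN".split(" ")]

wer = error_rate(refs, hyps)
# Character-level variant: compare the sentences character by character.
cer = error_rate(
    [list(" ".join(r)) for r in refs],
    [list(" ".join(h)) for h in hyps],
)
print(f"WER: {wer:.2f}  CER: {cer:.2f}")
```

Because insertions count as errors, an error rate can exceed 100 when the hypothesis is much longer than the reference.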

In [None]:
task_hyperparameters = """
# Setup the directory to host experiment results
output_folder: !ref results/transformer/Task_1
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""

In [10]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# Here we create the datasets objects as well as tokenization and encoding
(
    train_data,
    valid_data,
    test_data,
    tokenizer
) = dataio_prepare(hparams)

# Collect the files listed in the pretrainer and load them. Here, this is only
# the tokenizer trained above, read from the local path given in the YAML
# (the same mechanism can fetch pretrained files from HuggingFace or elsewhere).
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = BaseASR(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
)

# adding objects to trainer:
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

# Testing
asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="epoch",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Task_1
speechbrain.pretrained.fetching - Destination tokenizer.ckpt: local file in /content/speechbrain/results/tokenizer/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in BaseASR
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:29<00:00,  6.40it/s, train_loss=537]
100%|██████████| 137/137 [00:09<00:00, 14.89it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 5.37e+02 - valid loss: 2.42e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-25-25+00
speechbrain.utils.epoch_loop - Going into epoch 2


100%|██████████| 190/190 [00:25<00:00,  7.57it/s, train_loss=433]
100%|██████████| 137/137 [00:08<00:00, 15.85it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 4.33e+02 - valid loss: 2.34e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-25-59+00
speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:25<00:00,  7.51it/s, train_loss=431]
100%|██████████| 137/137 [00:08<00:00, 15.62it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 4.31e+02 - valid loss: 2.34e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-26-33+00
speechbrain.utils.epoch_loop - Going into epoch 4


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=428]
100%|██████████| 137/137 [00:08<00:00, 15.66it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 4.28e+02 - valid loss: 2.31e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-27-08+00
speechbrain.utils.epoch_loop - Going into epoch 5


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=411]
100%|██████████| 137/137 [00:09<00:00, 14.68it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 4.11e+02 - valid loss: 2.13e+02, valid CER: 94.91, valid WER: 99.80





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-27-43+00
speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=370]
100%|██████████| 137/137 [00:11<00:00, 11.95it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 3.70e+02 - valid loss: 1.90e+02, valid CER: 74.91, valid WER: 94.94
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-28-20+00
speechbrain.utils.epoch_loop - Going into epoch 7



100%|██████████| 190/190 [00:25<00:00,  7.31it/s, train_loss=334]
100%|██████████| 137/137 [00:12<00:00, 11.30it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 3.34e+02 - valid loss: 1.74e+02, valid CER: 69.97, valid WER: 93.98
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-28-59+00
speechbrain.utils.epoch_loop - Going into epoch 8



100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=306]
100%|██████████| 137/137 [00:12<00:00, 11.08it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 3.06e+02 - valid loss: 1.61e+02, valid CER: 64.83, valid WER: 91.98
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-29-37+00
speechbrain.utils.epoch_loop - Going into epoch 9



100%|██████████| 190/190 [00:26<00:00,  7.30it/s, train_loss=283]
100%|██████████| 137/137 [00:13<00:00,  9.81it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 2.83e+02 - valid loss: 1.51e+02, valid CER: 58.19, valid WER: 89.42
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-30-17+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:25<00:00,  7.39it/s, train_loss=265]
100%|██████████| 137/137 [00:13<00:00, 10.28it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 2.65e+02 - valid loss: 1.44e+02, valid CER: 56.10, valid WER: 88.05
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-30-56+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:26<00:00,  7.26it/s, train_loss=250]
100%|██████████| 137/137 [00:13<00:00, 10.23it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 2.50e+02 - valid loss: 1.39e+02, valid CER: 54.56, valid WER: 87.22
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-31-36+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-25-25+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:26<00:00,  7.17it/s, train_loss=238]
100%|██████████| 137/137 [00:14<00:00,  9.71it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 2.38e+02 - valid loss: 1.34e+02, valid CER: 51.63, valid WER: 85.75
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-32-17+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-25-59+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:25<00:00,  7.31it/s, train_loss=226]
100%|██████████| 137/137 [00:13<00:00,  9.80it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 2.26e+02 - valid loss: 1.30e+02, valid CER: 50.33, valid WER: 85.40
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-32-57+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-26-33+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:26<00:00,  7.27it/s, train_loss=217]
100%|██████████| 137/137 [00:14<00:00,  9.31it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 2.17e+02 - valid loss: 1.28e+02, valid CER: 49.36, valid WER: 84.52





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-33-38+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-27-08+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=210]
100%|██████████| 137/137 [00:14<00:00,  9.73it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.10e+02 - valid loss: 1.25e+02, valid CER: 47.39, valid WER: 83.99
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-34-19+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-27-43+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:25<00:00,  7.33it/s, train_loss=201]
100%|██████████| 137/137 [00:14<00:00,  9.42it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.01e+02 - valid loss: 1.22e+02, valid CER: 46.11, valid WER: 83.20
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-35-00+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-28-20+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:26<00:00,  7.18it/s, train_loss=195]
100%|██████████| 137/137 [00:14<00:00,  9.42it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 1.95e+02 - valid loss: 1.20e+02, valid CER: 45.35, valid WER: 82.58
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-35-41+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-28-59+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=189]
100%|██████████| 137/137 [00:14<00:00,  9.43it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 1.89e+02 - valid loss: 1.18e+02, valid CER: 44.37, valid WER: 81.97
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-36-21+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-29-37+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:26<00:00,  7.14it/s, train_loss=183]
100%|██████████| 137/137 [00:15<00:00,  9.10it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 1.83e+02 - valid loss: 1.17e+02, valid CER: 43.21, valid WER: 81.49
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-37-03+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-30-17+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=177]
100%|██████████| 137/137 [00:14<00:00,  9.16it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 1.77e+02 - valid loss: 1.16e+02, valid CER: 42.49, valid WER: 81.00
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-37-44+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-30-56+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:26<00:00,  7.17it/s, train_loss=173]
100%|██████████| 137/137 [00:14<00:00,  9.31it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 1.73e+02 - valid loss: 1.15e+02, valid CER: 42.70, valid WER: 80.87
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-38-26+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-31-36+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:25<00:00,  7.34it/s, train_loss=168]
100%|██████████| 137/137 [00:14<00:00,  9.39it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 1.68e+02 - valid loss: 1.13e+02, valid CER: 41.48, valid WER: 80.62
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-39-07+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-32-17+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:26<00:00,  7.25it/s, train_loss=164]
100%|██████████| 137/137 [00:14<00:00,  9.15it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 1.64e+02 - valid loss: 1.13e+02, valid CER: 41.10, valid WER: 80.02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-39-49+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-32-57+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:25<00:00,  7.37it/s, train_loss=160]
100%|██████████| 137/137 [00:14<00:00,  9.33it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 1.60e+02 - valid loss: 1.12e+02, valid CER: 40.74, valid WER: 80.02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-40-29+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-33-38+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:25<00:00,  7.37it/s, train_loss=157]
100%|██████████| 137/137 [00:15<00:00,  8.95it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 1.57e+02 - valid loss: 1.12e+02, valid CER: 40.59, valid WER: 79.55
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-41-11+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-34-19+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:25<00:00,  7.45it/s, train_loss=153]
100%|██████████| 137/137 [00:14<00:00,  9.37it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 1.53e+02 - valid loss: 1.11e+02, valid CER: 40.20, valid WER: 78.86
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-41-52+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-35-00+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:25<00:00,  7.38it/s, train_loss=150]
100%|██████████| 137/137 [00:15<00:00,  8.75it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 1.50e+02 - valid loss: 1.10e+02, valid CER: 39.93, valid WER: 78.37
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-42-33+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-35-41+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:25<00:00,  7.43it/s, train_loss=146]
100%|██████████| 137/137 [00:14<00:00,  9.33it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 1.46e+02 - valid loss: 1.10e+02, valid CER: 39.68, valid WER: 78.15
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-43-14+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-36-21+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:25<00:00,  7.39it/s, train_loss=143]
100%|██████████| 137/137 [00:15<00:00,  8.75it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 1.43e+02 - valid loss: 1.09e+02, valid CER: 39.01, valid WER: 78.16
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-43-56+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-37-03+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=140]
100%|██████████| 137/137 [00:15<00:00,  9.13it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 1.40e+02 - valid loss: 1.09e+02, valid CER: 38.70, valid WER: 77.56
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-44-37+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_1/save/CKPT+2024-02-18+05-37-44+00
speechbrain.utils.checkpoints - Loading a checkpoint from results/transformer/Task_1/save/CKPT+2024-02-18+05-44-37+00
Loaded the average


100%|██████████| 328/328 [00:44<00:00,  7.44it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 1.22e+02, test CER: 39.90, test WER: 78.75





122.39182150073164

# Part II(A): CTC is all you need
In this section, you will train a **6-layer Conformer** encoder with both `CTC` and `inter-CTC` losses.
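The inter-CTC loss needs the outputs of selected intermediate encoder layers, which PyTorch forward hooks can capture. The following is a minimal sketch of that mechanism on a toy stack of `nn.Linear` layers, not the SpeechBrain Conformer; the names `encoder`, `save_output`, and `run_encoder` are hypothetical and only serve the illustration. The toy layers return a plain tensor, whereas the hook in the class below stores `output[0]`, presumably because the real encoder layers return a tuple.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an encoder: six layers, addressed 1-indexed like '2,4'.
encoder = nn.ModuleList([nn.Linear(8, 8) for _ in range(6)])
intermediate_layers = [int(i) for i in "2,4".split(",")]

captured = []  # filled by the hooks on every forward pass


def save_output(module, inputs, output):
    # A forward hook receives the module, its positional inputs, and its output.
    captured.append(output)


# Layer k (1-indexed) sits at position k - 1 in the ModuleList.
handles = [
    encoder[k - 1].register_forward_hook(save_output)
    for k in intermediate_layers
]


def run_encoder(x):
    # Calling each layer (not layer.forward) is what triggers the hooks.
    for layer in encoder:
        x = layer(x)
    return x


x = torch.randn(3, 5, 8)  # (batch, time, features)
final = run_encoder(x)
num_captured_with_hooks = len(captured)

# Removing the handles detaches the hooks; later passes capture nothing.
for handle in handles:
    handle.remove()
captured.clear()
run_encoder(x)
num_captured_after_removal = len(captured)
```

Keeping the handles returned by `register_forward_hook` is what makes it possible to detach the hooks later, for example before evaluation.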




In [11]:
class ASR_2A(BaseASR):
    def __init__(self, *args, **kwargs):

        super().__init__(*args, **kwargs)
        self.inter_ctc_weight = self.hparams.interctc_weight
        self.intermediate_layers = [int(layer) for layer in self.hparams.intermediate_layers.split(',')]

        # Variable to hold intermediate logits for interCTC loss calculation
        self.inter_logits = []

        # Forward hook that stores the output of an intermediate encoder layer.
        def get_intermediate_output(module, input, output):
            # output[0] is stored, so the layer is assumed to return a tuple
            # whose first element is its hidden states.
            self.inter_logits.append(output[0])

        # Register the hook on every intermediate encoder layer of interest
        # (layer numbers are 1-indexed, hence i - 1).
        # Refer to torch.nn.Module.register_forward_hook
        # (https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_forward_hook)
        # The returned handles are saved in self.hooks so they can be removed later.
        self.hooks = [
            self.modules.Transformer.encoder.layers[i - 1].register_forward_hook(get_intermediate_output)
            for i in self.intermediate_layers
        ]


    def compute_forward(self, batch, stage):
        """Performs a forward pass through the encoder"""
        batch = batch.to(self.device)
        wavs, wav_lens = batch.sig
        tokens_bos, _ = batch.tokens_bos

        # compute features
        feats = self.hparams.compute_features(wavs) #### TODO: FILL THIS BASED ON PART I ####
        current_epoch = self.hparams.epoch_counter.current
        feats = self.modules.normalize(feats, wav_lens, epoch=current_epoch)

        # forward modules
        src = self.modules.CNN(feats)

        assert len(self.inter_logits) == 0, "self.inter_logits should be empty as we haven't done a forward pass yet"
        enc_out, _ = self.modules.Transformer(
            src, tokens_bos, wav_lens, pad_idx=self.hparams.pad_index,
        )

        # Compute final-layer log-probabilities
        logits = self.modules.ctc_lin(enc_out)
        log_softmax = sb.nnet.activations.Softmax(apply_log=True)
        p_ctc = log_softmax(logits) #### TODO: FILL THIS BASED ON PART I ####

        # Populate inter_p_ctc: project every hooked intermediate output through
        # the shared CTC head and apply log-softmax. The hooks were registered for
        # the layers listed (1-indexed, comma-separated) in intermediate_layers.
        inter_p_ctc = [log_softmax(self.modules.ctc_lin(h)) for h in self.inter_logits]

        # Flush the logits saved during last forward pass.
        self.inter_logits = []

        # Compute outputs: hypotheses are only decoded outside of training
        hyps = None
        if stage == sb.Stage.TRAIN:
            assert len(inter_p_ctc) != 0, "inter_p_ctc should NOT be empty as forward pass is already done"
        else:
            hyps = sb.decoders.ctc_greedy_decode(
                p_ctc, wav_lens, blank_id=self.hparams.blank_index
            )

        return p_ctc, inter_p_ctc, wav_lens, hyps

    def on_evaluate_start(self, max_key=None, min_key=None):
        """Performs sanity operations before inferencing on the test set."""
        if self.checkpointer is not None:
            self.checkpointer.recover_if_possible(
                max_key=max_key,
                min_key=min_key,
                device=torch.device(self.device),
            )

        # Deregister hooks here as they are not needed during evaluation
        for hook in self.hooks:
            hook.remove()

    def compute_objectives(self, predictions, batch, stage):
        """Computes the CTC + inter-CTC loss given predictions and targets."""

        (p_ctc, inter_p_ctc, wav_lens, hyps,) = predictions

        ids = batch.id
        tokens_eos, tokens_eos_lens = batch.tokens_eos
        tokens, tokens_lens = batch.tokens

        # Inter-CTC loss: accumulate the CTC loss computed from the log-probabilities
        # saved for each intermediate layer in inter_p_ctc. The sum is 0 when
        # inter_p_ctc is empty, e.g. once the hooks have been removed for evaluation.
        loss_inter_ctc = sum(
            self.hparams.ctc_cost(p, tokens, wav_lens, tokens_lens) for p in inter_p_ctc
        )

        # Compute final layer CTC loss
        loss_ctc = self.hparams.ctc_cost(p_ctc, tokens, wav_lens, tokens_lens)

        # Compute final loss as a weighted combination of inter-CTC and CTC
        loss = self.inter_ctc_weight * loss_inter_ctc + (1 - self.inter_ctc_weight) * loss_ctc

        if stage != sb.Stage.TRAIN:
            # Decode token terms to words
            predicted_words = [
                self.tokenizer.decode_ids(utt_seq).split(" ") for utt_seq in hyps
            ]
            target_words = [wrd.split(" ") for wrd in batch.wrd]
            self.wer_metric.append(ids, predicted_words, target_words)
            self.cer_metric.append(ids, predicted_words, target_words)

        return loss
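
Outside of training, `compute_forward` turns the final-layer CTC posteriors into hypotheses with greedy decoding. As a rough illustration of the collapse rule behind greedy CTC decoding (merge consecutive repeats, then drop blanks), here is a pure-Python sketch. It is not SpeechBrain's implementation; the function name `greedy_ctc_collapse` and the choice of `0` as the blank id are assumptions made for this example.

```python
from itertools import groupby


def greedy_ctc_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then remove the blank id."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Per-frame argmax ids for one utterance; 0 plays the role of the blank.
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0, 2]
decoded = greedy_ctc_collapse(frames, blank_id=0)
print(decoded)  # [3, 3, 5, 2]
```

The blank between the two runs of `3` is what lets the same token appear twice in a row in the output.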

In [12]:
task_hyperparameters = """

# Setup the directory to host experiment results
output_folder: !ref results/transformer/Part_2A
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

interctc_weight: 0.3
intermediate_layers: '2,4'

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""

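To make the two task-specific hyperparameters above concrete, the sketch below shows how a comma-separated `intermediate_layers` string can be parsed and how `interctc_weight` mixes the two loss terms, following the formula used in `compute_objectives`. The helper names `parse_layers` and `combine_losses` and the loss values are made up for illustration.

```python
def parse_layers(spec):
    """Turn a comma-separated, 1-indexed layer list such as '2,4' into ints."""
    return [int(item) for item in spec.split(",")]


def combine_losses(loss_ctc, inter_losses, weight):
    """Weighted mix of the summed intermediate losses and the final CTC loss."""
    loss_inter = sum(inter_losses)
    return weight * loss_inter + (1 - weight) * loss_ctc


layers = parse_layers("2,4")

# Illustrative numbers only: 0.3 * (120 + 110) + 0.7 * 100 = 139
total = combine_losses(loss_ctc=100.0, inter_losses=[120.0, 110.0], weight=0.3)

# With no intermediate losses the result reduces to 0.7 * loss_ctc.
final_only = combine_losses(loss_ctc=100.0, inter_losses=[], weight=0.3)
```

Because the intermediate losses are summed rather than averaged, adding more entries to `intermediate_layers` increases the magnitude of the inter-CTC term for the same weight.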
In [13]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# Collect the pretrained files listed in the YAML file and load them.
# Per the log below, this only fetches the tokenizer checkpoint.
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = ASR_2A(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
)

# Dataloader options for training and validation
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

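# Testing
# NOTE: the log below prints "Would load a checkpoint here, but none found yet."
# right before testing, which suggests that no saved checkpoint carries an "ACC"
# entry, so the test runs on the weights left in memory after the final epoch.
asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="ACC",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)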

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Part_2A
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in model_checkpoints/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in ASR_2A
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:25<00:00,  7.31it/s, train_loss=747]
100%|██████████| 137/137 [00:08<00:00, 15.33it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 7.47e+02 - valid loss: 3.15e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-45-58+00
speechbrain.utils.epoch_loop - Going into epoch 2


100%|██████████| 190/190 [00:25<00:00,  7.33it/s, train_loss=564]
100%|██████████| 137/137 [00:08<00:00, 15.75it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 5.64e+02 - valid loss: 3.05e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-46-33+00
speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:26<00:00,  7.24it/s, train_loss=560]
100%|██████████| 137/137 [00:09<00:00, 15.19it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 5.60e+02 - valid loss: 3.04e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-47-09+00
speechbrain.utils.epoch_loop - Going into epoch 4


100%|██████████| 190/190 [00:26<00:00,  7.30it/s, train_loss=558]
100%|██████████| 137/137 [00:08<00:00, 15.48it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 5.58e+02 - valid loss: 3.02e+02, valid CER: 99.71, valid WER: 99.64





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-47-44+00
speechbrain.utils.epoch_loop - Going into epoch 5


100%|██████████| 190/190 [00:26<00:00,  7.29it/s, train_loss=546]
100%|██████████| 137/137 [00:09<00:00, 15.00it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 5.46e+02 - valid loss: 2.88e+02, valid CER: 94.60, valid WER: 99.69
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-48-19+00





speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:25<00:00,  7.33it/s, train_loss=503]
100%|██████████| 137/137 [00:10<00:00, 13.24it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 5.03e+02 - valid loss: 2.59e+02, valid CER: 84.33, valid WER: 97.70





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-48-56+00
speechbrain.utils.epoch_loop - Going into epoch 7


100%|██████████| 190/190 [00:26<00:00,  7.27it/s, train_loss=453]
100%|██████████| 137/137 [00:11<00:00, 11.51it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 4.53e+02 - valid loss: 2.33e+02, valid CER: 71.83, valid WER: 94.87
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-49-34+00





speechbrain.utils.epoch_loop - Going into epoch 8


100%|██████████| 190/190 [00:26<00:00,  7.16it/s, train_loss=411]
100%|██████████| 137/137 [00:12<00:00, 11.00it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 4.11e+02 - valid loss: 2.14e+02, valid CER: 65.05, valid WER: 92.21
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-50-14+00





speechbrain.utils.epoch_loop - Going into epoch 9


100%|██████████| 190/190 [00:26<00:00,  7.26it/s, train_loss=380]
100%|██████████| 137/137 [00:13<00:00, 10.14it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 3.80e+02 - valid loss: 2.01e+02, valid CER: 57.38, valid WER: 89.73
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-50-54+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:26<00:00,  7.24it/s, train_loss=356]
100%|██████████| 137/137 [00:13<00:00, 10.11it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 3.56e+02 - valid loss: 1.92e+02, valid CER: 55.86, valid WER: 88.31
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-51-34+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:26<00:00,  7.10it/s, train_loss=338]
100%|██████████| 137/137 [00:13<00:00, 10.30it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 3.38e+02 - valid loss: 1.86e+02, valid CER: 54.28, valid WER: 87.16
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-52-15+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-45-58+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:25<00:00,  7.31it/s, train_loss=324]
100%|██████████| 137/137 [00:13<00:00,  9.91it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 3.24e+02 - valid loss: 1.80e+02, valid CER: 51.79, valid WER: 86.18
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-52-55+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-46-33+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:26<00:00,  7.04it/s, train_loss=311]
100%|██████████| 137/137 [00:13<00:00,  9.88it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 3.11e+02 - valid loss: 1.75e+02, valid CER: 50.74, valid WER: 84.97
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-53-37+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-47-09+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:26<00:00,  7.22it/s, train_loss=300]
100%|██████████| 137/137 [00:14<00:00,  9.72it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 3.00e+02 - valid loss: 1.71e+02, valid CER: 48.32, valid WER: 84.25
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-54-18+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-47-44+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:26<00:00,  7.18it/s, train_loss=291]
100%|██████████| 137/137 [00:14<00:00,  9.35it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.91e+02 - valid loss: 1.68e+02, valid CER: 46.72, valid WER: 83.30
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-55-00+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-48-19+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:26<00:00,  7.21it/s, train_loss=282]
100%|██████████| 137/137 [00:14<00:00,  9.37it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.82e+02 - valid loss: 1.65e+02, valid CER: 45.29, valid WER: 82.53
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-55-41+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-48-56+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:27<00:00,  7.01it/s, train_loss=274]
100%|██████████| 137/137 [00:14<00:00,  9.39it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 2.74e+02 - valid loss: 1.63e+02, valid CER: 44.18, valid WER: 81.79
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-56-24+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-49-34+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:26<00:00,  7.21it/s, train_loss=268]
100%|██████████| 137/137 [00:14<00:00,  9.25it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 2.68e+02 - valid loss: 1.60e+02, valid CER: 43.50, valid WER: 81.46
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-57-06+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-50-14+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:26<00:00,  7.07it/s, train_loss=261]
100%|██████████| 137/137 [00:14<00:00,  9.20it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 2.61e+02 - valid loss: 1.58e+02, valid CER: 42.52, valid WER: 80.84
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-57-48+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-50-54+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:26<00:00,  7.17it/s, train_loss=255]
100%|██████████| 137/137 [00:15<00:00,  8.78it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 2.55e+02 - valid loss: 1.56e+02, valid CER: 41.43, valid WER: 80.52





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-58-31+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-51-34+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:26<00:00,  7.11it/s, train_loss=250]
100%|██████████| 137/137 [00:15<00:00,  8.97it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 2.50e+02 - valid loss: 1.55e+02, valid CER: 40.80, valid WER: 80.10
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-59-14+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-52-15+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:26<00:00,  7.09it/s, train_loss=245]
100%|██████████| 137/137 [00:16<00:00,  8.55it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 2.45e+02 - valid loss: 1.54e+02, valid CER: 40.32, valid WER: 80.26
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-59-58+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-52-55+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:26<00:00,  7.07it/s, train_loss=240]
100%|██████████| 137/137 [00:15<00:00,  8.63it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 2.40e+02 - valid loss: 1.52e+02, valid CER: 39.75, valid WER: 79.70





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-00-41+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-53-37+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:26<00:00,  7.06it/s, train_loss=235]
100%|██████████| 137/137 [00:15<00:00,  8.78it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 2.35e+02 - valid loss: 1.51e+02, valid CER: 39.37, valid WER: 79.21
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-01-25+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-54-18+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:27<00:00,  6.93it/s, train_loss=231]
100%|██████████| 137/137 [00:15<00:00,  8.59it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 2.31e+02 - valid loss: 1.50e+02, valid CER: 38.90, valid WER: 78.94
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-02-09+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-55-00+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:27<00:00,  6.99it/s, train_loss=228]
100%|██████████| 137/137 [00:16<00:00,  8.25it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 2.28e+02 - valid loss: 1.49e+02, valid CER: 38.66, valid WER: 78.81
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-02-54+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-55-41+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:26<00:00,  7.15it/s, train_loss=224]
100%|██████████| 137/137 [00:15<00:00,  8.93it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 2.24e+02 - valid loss: 1.49e+02, valid CER: 38.38, valid WER: 78.60
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-03-36+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-56-24+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:27<00:00,  7.01it/s, train_loss=220]
100%|██████████| 137/137 [00:15<00:00,  8.76it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 2.20e+02 - valid loss: 1.48e+02, valid CER: 37.96, valid WER: 77.74
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-04-20+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-57-06+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:27<00:00,  7.03it/s, train_loss=217]
100%|██████████| 137/137 [00:16<00:00,  8.31it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 2.17e+02 - valid loss: 1.48e+02, valid CER: 38.00, valid WER: 77.89
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-05-05+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-57-48+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:26<00:00,  7.12it/s, train_loss=213]
100%|██████████| 137/137 [00:15<00:00,  8.85it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 2.13e+02 - valid loss: 1.48e+02, valid CER: 37.63, valid WER: 77.50
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+06-05-48+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Part_2A/save/CKPT+2024-02-18+05-58-31+00
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.


100%|██████████| 328/328 [00:44<00:00,  7.31it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 83.68, test CER: 38.40, test WER: 78.32





83.6797285370711

# Task 2.2: The PowerConv Module
In this section, we will update the Conformer encoder by replacing its convolution module with **PowerConv**. The rest of the architecture remains the same. Note that PowerConv is added on top of inter-CTC.

In [14]:
task_hyperparameters = """

# Setup the directory to host experiment results
output_folder: !ref results/transformer/Task_2B
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

interctc_weight: 0.3
intermediate_layers: '2,4'

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        noam_scheduler: !ref <noam_annealing>
        normalizer: !ref <normalize>
        counter: !ref <epoch_counter>
"""

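The checkpointer above saves the state of `noam_annealing`, and the training logs in this notebook report a learning rate that rises during the first epochs and then decays. The sketch below shows the usual Noam-style warm-up formula as a plain function. The peak rate of `1e-3` and the `1500` warm-up steps are assumptions inferred from the logged values (for example `lr: 5.13e-04` at step 5700); they are not read from the YAML, and the function name `noam_lr` is hypothetical.

```python
def noam_lr(step, peak_lr=1e-3, warmup_steps=1500):
    """Noam-style schedule: linear warm-up, then inverse square-root decay.

    peak_lr and warmup_steps are assumed values inferred from the logs,
    not taken from the recipe's hyperparameter file.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # The scale equals 1.0 exactly at step == warmup_steps.
    scale = warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    return peak_lr * scale


if __name__ == "__main__":
    # Steps at the end of a few epochs (190 training steps per epoch in the logs).
    for step in (190, 1520, 5700):
        print(f"step {step:5d}: lr = {noam_lr(step):.2e}")
```

Under these assumptions the schedule peaks near step 1500, which falls in epoch 8 and agrees with the logs, where the learning rate is highest at the end of epoch 8 and decreases afterwards.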
In [15]:
import torch
import speechbrain as sb

from speechbrain.nnet.attention import (
    RelPosMHAXL,
    MultiheadAttention,
    PositionalwiseFeedForward,
)
from speechbrain.nnet.normalization import LayerNorm
from speechbrain.nnet.activations import Swish
from speechbrain.nnet.CNN import Conv1d
from speechbrain.nnet.linear import Linear
from speechbrain.nnet.dropout import Dropout2d

class PowerConv(torch.nn.Module):
    def __init__(
        self,
        input_size,
        kernel_size=31,
        dropout=0.0,
    ):
        super().__init__()

        # We upsample our input by a factor of 2 so that it can later be split into two halves
        input_size = input_size*2
        self.input_size = input_size

        n_channels = input_size // 2  # split input channels

        # TODO: First projection feedforward layer to upsample the input
        self.channel_proj1 = Linear(input_size=n_channels,n_neurons=2*n_channels) ### TODO: Fill in
        self.norm = LayerNorm(input_size=n_channels)### TODO: Layer normalization
        self.conv = Conv1d(in_channels=n_channels,out_channels=n_channels,kernel_size=kernel_size,groups=n_channels,padding="same") ### TODO: Initialize depthwise 1D Convolution
        ### TODO: Use the groups parameter in the Conv1D class and set it to n_channels
        ### TODO: Note that this conv operator does not change the feature dimensionality.
        ### TODO: Use the appropriate value for the padding parameter in the Conv1D class to keep the feature dimensionality unaltered.

        # TODO: Second projection feedforward layer
        self.channel_proj2 = Linear(input_size=n_channels,n_neurons=n_channels) ### TODO: Fill in
        self.dropout = Dropout2d(drop_rate=dropout)### TODO: Dropout layer

        # Initialize the convolution bias with ones.
        torch.nn.init.ones_(self.conv.conv.bias)

    def forward(self, x):
        """
            Shape of input x: (B, T, D)
            Return output of shape: (B, T, D)
        """
        # TODO: Implement the PowerConv module as described in the assignment pdf
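        # Normalize the input, then project it up to twice the channel count
        X = self.norm(x)
        V = self.channel_proj1(X)
        # Split the projection into two halves along the feature dimension
        n_channels = self.input_size // 2
        V1, V2 = V[:, :, :n_channels], V[:, :, n_channels:]
        # Gate V1 with the depthwise convolution of the normalized V2
        Z = V1 * self.conv(self.norm(V2))
        # Apply dropout, then the second projection
        O = self.channel_proj2(self.dropout(Z))
        return O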

class CustomConformerEncoderLayer(torch.nn.Module):
    def __init__(
        self,
        d_model,
        d_ffn,
        nhead,
        kernel_size=31,
        kdim=None,
        vdim=None,
        activation=Swish,
        bias=True,
        dropout=0.0,
        causal=False,
        attention_type="RelPosMHAXL",
    ):
        super().__init__()

        # Self attention block
        if attention_type == "regularMHA":
            self.mha_layer = MultiheadAttention(
                nhead=nhead,
                d_model=d_model,
                dropout=dropout,
                kdim=kdim,
                vdim=vdim,
            )
        elif attention_type == "RelPosMHAXL":
            # transformerXL style positional encoding
            self.mha_layer = RelPosMHAXL(
                num_heads=nhead,
                embed_dim=d_model,
                dropout=dropout,
                mask_pos_future=causal,
            )
        else:
            raise ValueError("Unknown attention type")

        # Create instance of our custom convolution block
        self.convolution_module = PowerConv(
            d_model, kernel_size, dropout
        )

        # Feed forward macaron block
        self.ffn_module1 = torch.nn.Sequential(
            torch.nn.LayerNorm(d_model),
            PositionalwiseFeedForward(
                d_ffn=d_ffn,
                input_size=d_model,
                dropout=dropout,
                activation=activation,
            ),
            torch.nn.Dropout(dropout),
        )

        # Feed forward block
        self.ffn_module2 = torch.nn.Sequential(
            torch.nn.LayerNorm(d_model),
            PositionalwiseFeedForward(
                d_ffn=d_ffn,
                input_size=d_model,
                dropout=dropout,
                activation=activation,
            ),
            torch.nn.Dropout(dropout),
        )

        self.norm1 = LayerNorm(d_model)
        self.norm2 = LayerNorm(d_model)
        self.drop = torch.nn.Dropout(dropout)

    def forward(
        self,
        x,
        src_mask = None,
        src_key_padding_mask = None,
        pos_embs = None,
    ):
        conv_mask = None
        if src_key_padding_mask is not None:
            conv_mask = src_key_padding_mask.unsqueeze(-1)

        # ffn module
        x = x + 0.5 * self.ffn_module1(x)

        # multi-head attention module
        skip = x
        x = self.norm1(x)
        x, self_attn = self.mha_layer(
            x,
            x,
            x,
            attn_mask=src_mask,
            key_padding_mask=src_key_padding_mask,
            pos_embs=pos_embs,
        )
        x = x + skip

        # convolution module
        x = x + self.convolution_module(x)

        # ffn module
        x = self.norm2(x + 0.5 * self.ffn_module2(x))

        return x, self_attn


class CustomConformerEncoder(torch.nn.Module):
    def __init__(
        self,
        num_layers,
        d_model,
        d_ffn,
        nhead,
        kernel_size=31,
        kdim=None,
        vdim=None,
        activation=Swish,
        bias=True,
        dropout=0.0,
        causal=False,
        attention_type="RelPosMHAXL",
    ):
        super().__init__()

        # Create layers using our custom encoder layer that utilizes PowerConv
        self.layers = torch.nn.ModuleList(
            [
                CustomConformerEncoderLayer(
                    d_ffn=d_ffn,
                    nhead=nhead,
                    d_model=d_model,
                    kdim=kdim,
                    vdim=vdim,
                    dropout=dropout,
                    activation=activation,
                    kernel_size=kernel_size,
                    bias=bias,
                    causal=causal,
                    attention_type=attention_type,
                )
                for i in range(num_layers)
            ]
        )
        self.norm = LayerNorm(d_model, eps=1e-6)
        self.attention_type = attention_type

    def forward(
        self,
        src,
        src_mask = None,
        src_key_padding_mask = None,
        pos_embs = None,
    ):

        if self.attention_type == "RelPosMHAXL":
            if pos_embs is None:
                raise ValueError(
                    "The chosen attention type for the Conformer is RelPosMHAXL. For this attention type, the positional embeddings are mandatory"
                )

        output = src
        attention_lst = []
        # Loop through the encoder layers
        for enc_layer in self.layers:
            output, attention = enc_layer(
                output,
                src_mask=src_mask,
                src_key_padding_mask=src_key_padding_mask,
                pos_embs=pos_embs,
            )
            attention_lst.append(attention)
        output = self.norm(output)

        return output, attention_lst
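
The `PowerConv.forward` method above splits the projected features into two halves and multiplies one half by a depthwise convolution of the other, with padding chosen so that the number of frames does not change. The following is a minimal pure-Python sketch of that gating step on nested lists. It is an illustration under stated assumptions, not the notebook's implementation: it uses zero padding (the library layer may pad differently), it omits the layer normalization, projections and dropout, and the function names are hypothetical.

```python
def depthwise_conv1d_same(frames, kernels, biases):
    """Depthwise 1D cross-correlation over time with zero 'same' padding.

    frames:  list of T frames, each a list of C floats
    kernels: list of C kernels (one per channel), each of odd length K
    biases:  list of C floats
    Returns a list of T frames with C channels, so T is unchanged.
    """
    n_frames = len(frames)
    out = [[0.0] * len(kernels) for _ in range(n_frames)]
    for c, kernel in enumerate(kernels):
        if len(kernel) % 2 == 0:
            raise ValueError("kernel length must be odd for symmetric padding")
        pad = (len(kernel) - 1) // 2
        for t in range(n_frames):
            acc = float(biases[c])
            for j, weight in enumerate(kernel):
                src = t + j - pad
                # Positions outside the sequence count as zeros.
                if 0 <= src < n_frames:
                    acc += weight * frames[src][c]
            out[t][c] = acc
    return out


def gated_depthwise(frames, kernels, biases):
    """Split each frame into halves (v1, v2) and return v1 * conv(v2)."""
    n_channels = len(kernels)
    v1 = [frame[:n_channels] for frame in frames]
    v2 = [frame[n_channels:] for frame in frames]
    conv = depthwise_conv1d_same(v2, kernels, biases)
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(v1, conv)]


if __name__ == "__main__":
    # 4 frames with 2 * 2 features; channel 0 uses an identity kernel,
    # channel 1 sums each frame with its two neighbours.
    frames = [[1, 2, 3, 4], [1, 2, 5, 6], [1, 2, 7, 8], [1, 2, 9, 10]]
    gated = gated_depthwise(frames, [[0, 1, 0], [1, 1, 1]], [0.0, 0.0])
    print(len(gated), len(gated[0]))  # frames and channels of the gated output
```

Because each channel has its own kernel, the cost grows linearly with the number of channels, which is the reason the cell above sets `groups=n_channels`.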

In [16]:
class ASR_2B(ASR_2A):
    def __init__(
        self, device="cpu", *args, **kwargs
    ):
        super().__init__(*args, **kwargs)

        # Remove the hooks registered by ASR_2A; they are attached to the encoder that is replaced below
        for hook in self.hooks:
            hook.remove()

        # Instantiate our custom encoder that uses PowerConv
        encoder = CustomConformerEncoder(
            nhead=self.hparams.nhead,
            num_layers=self.hparams.num_encoder_layers,
            d_ffn=self.hparams.d_ffn,
            d_model=self.hparams.d_model,
            dropout=self.hparams.transformer_dropout,
            activation=self.hparams.activation,
            attention_type=self.hparams.attention_type,
        ).to(device)

        # Replace the standard encoder with our encoder
        self.modules.Transformer.encoder = encoder

        # TODO: Copy this code from your implementation in Part II(A) within the __init__ function of ASR_2A that populates self.hooks
        def get_intermediate_output(module, input, output):
            self.inter_logits.append(output[0])

        self.hooks = [self.modules.Transformer.encoder.layers[i-1].register_forward_hook(get_intermediate_output) for i in self.intermediate_layers]
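
The hook registration above indexes `layers[i-1]`, so the layer numbers in `intermediate_layers: '2,4'` are 1-based. The sketch below shows that conversion together with one common way of combining the final CTC loss with the intermediate losses using `interctc_weight`. The weighting formula follows the usual inter-CTC formulation and is an assumption here, since the `ASR_2A` loss code is not shown in this section; both function names and the `num_layers` argument are hypothetical.

```python
def parse_intermediate_layers(spec, num_layers):
    """Turn a string such as '2,4' into 0-based layer indices [1, 3]."""
    layers = [int(token) for token in spec.split(",") if token.strip()]
    for number in layers:
        if not 1 <= number <= num_layers:
            raise ValueError(f"layer {number} is outside 1..{num_layers}")
    return [number - 1 for number in layers]


def combine_ctc_losses(final_loss, inter_losses, weight):
    """Assumed inter-CTC objective: (1 - w) * final + w * mean(intermediate)."""
    if not inter_losses:
        return final_loss
    inter_mean = sum(inter_losses) / len(inter_losses)
    return (1.0 - weight) * final_loss + weight * inter_mean


if __name__ == "__main__":
    # num_layers=6 is an illustrative value, not read from the recipe.
    print(parse_intermediate_layers("2,4", num_layers=6))
    print(combine_ctc_losses(100.0, [120.0, 110.0], weight=0.3))
```

Validating the layer numbers up front gives a clear error message instead of an `IndexError` raised while registering the hooks.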

In [17]:
hyperparams = global_hyperparams + task_hyperparameters
hparams = load_hyperpyyaml(hyperparams)

# Create experiment directory
sb.create_experiment_directory(
    experiment_directory=hparams["output_folder"],
    overrides=None,
)

# We download the pretrained LM from HuggingFace (or elsewhere depending on
# the path given in the YAML file). The tokenizer is loaded at the same time.
run_on_main(hparams["pretrainer"].collect_files)
hparams["pretrainer"].load_collected(device=run_opts["device"])

# Trainer initialization
asr_brain = ASR_2B(
    modules=hparams["modules"],
    opt_class=hparams["Adam"],
    hparams=hparams,
    checkpointer=hparams["checkpointer"],
    run_opts=run_opts,
    tokenizer=tokenizer,
    device=device
)

# Dataloader options for training and validation
train_dataloader_opts = hparams["train_dataloader_opts"]
valid_dataloader_opts = hparams["valid_dataloader_opts"]

# Training
asr_brain.fit(
    asr_brain.hparams.epoch_counter,
    train_data,
    valid_data,
    train_loader_kwargs=train_dataloader_opts,
    valid_loader_kwargs=valid_dataloader_opts
)

# Testing

asr_brain.hparams.test_wer_file = asr_brain.hparams.wer_file
asr_brain.evaluate(
    test_data,
    max_key="ACC",
    test_loader_kwargs=hparams["test_dataloader_opts"],
)

speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/transformer/Task_2B
speechbrain.pretrained.fetching - Fetch tokenizer.ckpt: Using existing file/symlink in model_checkpoints/tokenizer.ckpt.
speechbrain.utils.parameter_transfer - Set local path in self.paths[tokenizer] = model_checkpoints/tokenizer.ckpt
speechbrain.utils.parameter_transfer - Loading pretrained files for: tokenizer
speechbrain.utils.parameter_transfer - Redirecting (loading from local path): model_checkpoints/tokenizer.ckpt -> model_checkpoints/tokenizer.ckpt
speechbrain.core - Info: max_grad_norm arg from hparam file is used
speechbrain.core - Info: ckpt_interval_minutes arg from hparam file is used
speechbrain.core - 698.9k trainable parameters in ASR_2B
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1


100%|██████████| 190/190 [00:26<00:00,  7.26it/s, train_loss=752]
100%|██████████| 137/137 [00:08<00:00, 15.37it/s]

speechbrain.utils.train_logger - epoch: 1, lr: 1.26e-04, steps: 190, optimizer: Adam - train loss: 7.52e+02 - valid loss: 3.15e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-07-10+00
speechbrain.utils.epoch_loop - Going into epoch 2


100%|██████████| 190/190 [00:25<00:00,  7.46it/s, train_loss=564]
100%|██████████| 137/137 [00:08<00:00, 16.24it/s]

speechbrain.utils.train_logger - epoch: 2, lr: 2.53e-04, steps: 380, optimizer: Adam - train loss: 5.64e+02 - valid loss: 3.05e+02, valid CER: 1.00e+02, valid WER: 1.00e+02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-07-44+00





speechbrain.utils.epoch_loop - Going into epoch 3


100%|██████████| 190/190 [00:25<00:00,  7.50it/s, train_loss=561]
100%|██████████| 137/137 [00:08<00:00, 15.89it/s]

speechbrain.utils.train_logger - epoch: 3, lr: 3.79e-04, steps: 570, optimizer: Adam - train loss: 5.61e+02 - valid loss: 3.04e+02, valid CER: 1.00e+02, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-08-18+00
speechbrain.utils.epoch_loop - Going into epoch 4


100%|██████████| 190/190 [00:25<00:00,  7.45it/s, train_loss=554]
100%|██████████| 137/137 [00:08<00:00, 15.38it/s]

speechbrain.utils.train_logger - epoch: 4, lr: 5.06e-04, steps: 760, optimizer: Adam - train loss: 5.54e+02 - valid loss: 2.98e+02, valid CER: 97.42, valid WER: 1.00e+02





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-08-53+00
speechbrain.utils.epoch_loop - Going into epoch 5


100%|██████████| 190/190 [00:25<00:00,  7.42it/s, train_loss=527]
100%|██████████| 137/137 [00:09<00:00, 14.85it/s]

speechbrain.utils.train_logger - epoch: 5, lr: 6.33e-04, steps: 950, optimizer: Adam - train loss: 5.27e+02 - valid loss: 2.74e+02, valid CER: 92.44, valid WER: 99.82





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-09-28+00
speechbrain.utils.epoch_loop - Going into epoch 6


100%|██████████| 190/190 [00:25<00:00,  7.55it/s, train_loss=476]
100%|██████████| 137/137 [00:11<00:00, 12.36it/s]

speechbrain.utils.train_logger - epoch: 6, lr: 7.59e-04, steps: 1140, optimizer: Adam - train loss: 4.76e+02 - valid loss: 2.44e+02, valid CER: 74.98, valid WER: 95.03
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-10-05+00





speechbrain.utils.epoch_loop - Going into epoch 7


100%|██████████| 190/190 [00:25<00:00,  7.47it/s, train_loss=427]
100%|██████████| 137/137 [00:12<00:00, 11.19it/s]

speechbrain.utils.train_logger - epoch: 7, lr: 8.86e-04, steps: 1330, optimizer: Adam - train loss: 4.27e+02 - valid loss: 2.21e+02, valid CER: 66.11, valid WER: 92.46
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-10-43+00





speechbrain.utils.epoch_loop - Going into epoch 8


100%|██████████| 190/190 [00:26<00:00,  7.27it/s, train_loss=390]
100%|██████████| 137/137 [00:13<00:00, 10.28it/s]

speechbrain.utils.train_logger - epoch: 8, lr: 9.94e-04, steps: 1520, optimizer: Adam - train loss: 3.90e+02 - valid loss: 2.05e+02, valid CER: 60.73, valid WER: 90.75
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-11-23+00





speechbrain.utils.epoch_loop - Going into epoch 9


100%|██████████| 190/190 [00:25<00:00,  7.48it/s, train_loss=360]
100%|██████████| 137/137 [00:13<00:00, 10.42it/s]

speechbrain.utils.train_logger - epoch: 9, lr: 9.37e-04, steps: 1710, optimizer: Adam - train loss: 3.60e+02 - valid loss: 1.94e+02, valid CER: 56.91, valid WER: 88.57
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-12-02+00





speechbrain.utils.epoch_loop - Going into epoch 10


100%|██████████| 190/190 [00:25<00:00,  7.49it/s, train_loss=337]
100%|██████████| 137/137 [00:13<00:00, 10.29it/s]

speechbrain.utils.train_logger - epoch: 10, lr: 8.89e-04, steps: 1900, optimizer: Adam - train loss: 3.37e+02 - valid loss: 1.86e+02, valid CER: 54.52, valid WER: 87.60
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-12-42+00





speechbrain.utils.epoch_loop - Going into epoch 11


100%|██████████| 190/190 [00:26<00:00,  7.19it/s, train_loss=318]
100%|██████████| 137/137 [00:13<00:00,  9.81it/s]

speechbrain.utils.train_logger - epoch: 11, lr: 8.47e-04, steps: 2090, optimizer: Adam - train loss: 3.18e+02 - valid loss: 1.77e+02, valid CER: 50.10, valid WER: 85.44
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-13-23+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-07-10+00
speechbrain.utils.epoch_loop - Going into epoch 12


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=304]
100%|██████████| 137/137 [00:14<00:00,  9.77it/s]

speechbrain.utils.train_logger - epoch: 12, lr: 8.11e-04, steps: 2280, optimizer: Adam - train loss: 3.04e+02 - valid loss: 1.72e+02, valid CER: 48.05, valid WER: 84.19
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-14-04+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-07-44+00
speechbrain.utils.epoch_loop - Going into epoch 13


100%|██████████| 190/190 [00:25<00:00,  7.37it/s, train_loss=291]
100%|██████████| 137/137 [00:14<00:00,  9.34it/s]

speechbrain.utils.train_logger - epoch: 13, lr: 7.79e-04, steps: 2470, optimizer: Adam - train loss: 2.91e+02 - valid loss: 1.68e+02, valid CER: 45.89, valid WER: 83.20
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-14-46+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-08-18+00
speechbrain.utils.epoch_loop - Going into epoch 14


100%|██████████| 190/190 [00:25<00:00,  7.44it/s, train_loss=281]
100%|██████████| 137/137 [00:14<00:00,  9.57it/s]

speechbrain.utils.train_logger - epoch: 14, lr: 7.51e-04, steps: 2660, optimizer: Adam - train loss: 2.81e+02 - valid loss: 1.64e+02, valid CER: 44.53, valid WER: 82.56
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-15-27+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-08-53+00
speechbrain.utils.epoch_loop - Going into epoch 15


100%|██████████| 190/190 [00:25<00:00,  7.42it/s, train_loss=272]
100%|██████████| 137/137 [00:15<00:00,  9.08it/s]

speechbrain.utils.train_logger - epoch: 15, lr: 7.26e-04, steps: 2850, optimizer: Adam - train loss: 2.72e+02 - valid loss: 1.61e+02, valid CER: 43.43, valid WER: 82.21
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-16-08+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-09-28+00
speechbrain.utils.epoch_loop - Going into epoch 16


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=262]
100%|██████████| 137/137 [00:14<00:00,  9.43it/s]

speechbrain.utils.train_logger - epoch: 16, lr: 7.03e-04, steps: 3040, optimizer: Adam - train loss: 2.62e+02 - valid loss: 1.58e+02, valid CER: 42.40, valid WER: 80.67
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-16-50+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-10-05+00
speechbrain.utils.epoch_loop - Going into epoch 17


100%|██████████| 190/190 [00:26<00:00,  7.29it/s, train_loss=256]
100%|██████████| 137/137 [00:15<00:00,  9.06it/s]

speechbrain.utils.train_logger - epoch: 17, lr: 6.82e-04, steps: 3230, optimizer: Adam - train loss: 2.56e+02 - valid loss: 1.56e+02, valid CER: 41.48, valid WER: 80.08
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-17-32+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-10-43+00
speechbrain.utils.epoch_loop - Going into epoch 18


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=249]
100%|██████████| 137/137 [00:14<00:00,  9.27it/s]

speechbrain.utils.train_logger - epoch: 18, lr: 6.62e-04, steps: 3420, optimizer: Adam - train loss: 2.49e+02 - valid loss: 1.54e+02, valid CER: 40.67, valid WER: 79.83
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-18-14+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-11-23+00
speechbrain.utils.epoch_loop - Going into epoch 19


100%|██████████| 190/190 [00:26<00:00,  7.30it/s, train_loss=242]
100%|██████████| 137/137 [00:14<00:00,  9.29it/s]

speechbrain.utils.train_logger - epoch: 19, lr: 6.45e-04, steps: 3610, optimizer: Adam - train loss: 2.42e+02 - valid loss: 1.53e+02, valid CER: 40.48, valid WER: 78.75
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-18-55+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-12-02+00
speechbrain.utils.epoch_loop - Going into epoch 20


100%|██████████| 190/190 [00:25<00:00,  7.49it/s, train_loss=237]
100%|██████████| 137/137 [00:14<00:00,  9.19it/s]

speechbrain.utils.train_logger - epoch: 20, lr: 6.28e-04, steps: 3800, optimizer: Adam - train loss: 2.37e+02 - valid loss: 1.51e+02, valid CER: 39.42, valid WER: 78.38
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-19-37+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-12-42+00
speechbrain.utils.epoch_loop - Going into epoch 21


100%|██████████| 190/190 [00:25<00:00,  7.34it/s, train_loss=231]
100%|██████████| 137/137 [00:14<00:00,  9.33it/s]

speechbrain.utils.train_logger - epoch: 21, lr: 6.13e-04, steps: 3990, optimizer: Adam - train loss: 2.31e+02 - valid loss: 1.51e+02, valid CER: 39.12, valid WER: 77.61
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-20-19+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-13-23+00
speechbrain.utils.epoch_loop - Going into epoch 22


100%|██████████| 190/190 [00:25<00:00,  7.45it/s, train_loss=226]
100%|██████████| 137/137 [00:14<00:00,  9.21it/s]

speechbrain.utils.train_logger - epoch: 22, lr: 5.99e-04, steps: 4180, optimizer: Adam - train loss: 2.26e+02 - valid loss: 1.48e+02, valid CER: 38.47, valid WER: 77.27
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-21-00+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-14-04+00
speechbrain.utils.epoch_loop - Going into epoch 23


100%|██████████| 190/190 [00:26<00:00,  7.31it/s, train_loss=222]
100%|██████████| 137/137 [00:15<00:00,  9.04it/s]

speechbrain.utils.train_logger - epoch: 23, lr: 5.86e-04, steps: 4370, optimizer: Adam - train loss: 2.22e+02 - valid loss: 1.48e+02, valid CER: 38.05, valid WER: 77.12
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-21-43+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-14-46+00
speechbrain.utils.epoch_loop - Going into epoch 24


100%|██████████| 190/190 [00:25<00:00,  7.40it/s, train_loss=218]
100%|██████████| 137/137 [00:15<00:00,  8.68it/s]

speechbrain.utils.train_logger - epoch: 24, lr: 5.74e-04, steps: 4560, optimizer: Adam - train loss: 2.18e+02 - valid loss: 1.47e+02, valid CER: 37.68, valid WER: 76.76





speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-22-25+00
speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-15-27+00
speechbrain.utils.epoch_loop - Going into epoch 25


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=214]
100%|██████████| 137/137 [00:15<00:00,  8.92it/s]

speechbrain.utils.train_logger - epoch: 25, lr: 5.62e-04, steps: 4750, optimizer: Adam - train loss: 2.14e+02 - valid loss: 1.46e+02, valid CER: 37.31, valid WER: 76.19
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-23-08+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-16-08+00
speechbrain.utils.epoch_loop - Going into epoch 26


100%|██████████| 190/190 [00:25<00:00,  7.39it/s, train_loss=211]
100%|██████████| 137/137 [00:15<00:00,  8.65it/s]

speechbrain.utils.train_logger - epoch: 26, lr: 5.51e-04, steps: 4940, optimizer: Adam - train loss: 2.11e+02 - valid loss: 1.46e+02, valid CER: 37.00, valid WER: 76.02
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-23-51+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-16-50+00
speechbrain.utils.epoch_loop - Going into epoch 27


100%|██████████| 190/190 [00:25<00:00,  7.41it/s, train_loss=207]
100%|██████████| 137/137 [00:15<00:00,  9.03it/s]

speechbrain.utils.train_logger - epoch: 27, lr: 5.41e-04, steps: 5130, optimizer: Adam - train loss: 2.07e+02 - valid loss: 1.46e+02, valid CER: 36.85, valid WER: 75.75
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-24-33+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-17-32+00
speechbrain.utils.epoch_loop - Going into epoch 28


100%|██████████| 190/190 [00:26<00:00,  7.27it/s, train_loss=204]
100%|██████████| 137/137 [00:15<00:00,  8.95it/s]

speechbrain.utils.train_logger - epoch: 28, lr: 5.31e-04, steps: 5320, optimizer: Adam - train loss: 2.04e+02 - valid loss: 1.46e+02, valid CER: 36.62, valid WER: 75.94
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-25-16+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-18-14+00
speechbrain.utils.epoch_loop - Going into epoch 29


100%|██████████| 190/190 [00:25<00:00,  7.39it/s, train_loss=201]
100%|██████████| 137/137 [00:15<00:00,  8.97it/s]

speechbrain.utils.train_logger - epoch: 29, lr: 5.22e-04, steps: 5510, optimizer: Adam - train loss: 2.01e+02 - valid loss: 1.45e+02, valid CER: 36.24, valid WER: 75.84
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-25-58+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-18-55+00
speechbrain.utils.epoch_loop - Going into epoch 30


100%|██████████| 190/190 [00:25<00:00,  7.50it/s, train_loss=198]
100%|██████████| 137/137 [00:15<00:00,  8.94it/s]

speechbrain.utils.train_logger - epoch: 30, lr: 5.13e-04, steps: 5700, optimizer: Adam - train loss: 1.98e+02 - valid loss: 1.45e+02, valid CER: 35.93, valid WER: 75.18
speechbrain.utils.checkpoints - Saved an end-of-epoch checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-26-40+00





speechbrain.utils.checkpoints - Deleted checkpoint in results/transformer/Task_2B/save/CKPT+2024-02-18+06-19-37+00
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.


100%|██████████| 328/328 [00:44<00:00,  7.39it/s]

speechbrain.utils.train_logger - Epoch loaded: 30 - test loss: 81.52, test CER: 36.53, test WER: 75.54





81.5217464842446
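
The two `Epoch loaded: 30` lines in the logs above give the test-set error rates for the inter-CTC baseline (`Part_2A`) and for the PowerConv variant (`Task_2B`). A small sketch of how the relative reduction can be computed from those logged numbers follows; the helper name `relative_reduction` is hypothetical, and the values are copied from the logs rather than recomputed.

```python
def relative_reduction(baseline, new):
    """Relative error reduction in percent; positive means `new` is better."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


# Test-set numbers copied from the two "Epoch loaded: 30" log lines.
results = {
    "Part_2A": {"CER": 38.40, "WER": 78.32},
    "Task_2B": {"CER": 36.53, "WER": 75.54},
}

if __name__ == "__main__":
    for metric in ("CER", "WER"):
        base = results["Part_2A"][metric]
        new = results["Task_2B"][metric]
        print(
            f"{metric}: {base:.2f} -> {new:.2f} "
            f"({relative_reduction(base, new):.2f}% relative reduction)"
        )
```

On these numbers PowerConv lowers the test CER by roughly 4.9% relative and the test WER by roughly 3.5% relative. Each figure comes from a single training run, so the gap should be read as indicative only.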