# **Use Model**

**Master Thesis: Evaluate and Use Different Model Versions**

**Author**: Karin Thommen

**Date**: April/May/June/July 2023

---

**Content of the Notebook**: Testing and usage of the OpenAI Whisper ASR model and the XLSR model

---

## Setup and Import

In [None]:
%%capture
!pip install datasets
!pip install transformers==4.28.0
!pip install librosa
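# note: an unquoted ">=0.30" is read by the shell as an output redirect, so it never constrained the version
!pip install evaluate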
!pip install jiwer
!pip install gradio
!pip install audio-metadata
!pip install "dill<0.3.5"
!pip install git-lfs

In [None]:
# standard library
import json
import os
import pickle
import random
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union

# third-party
import audio_metadata
import dill
import evaluate
import gradio as gr
import IPython.display as ipd
import numpy as np
import pandas as pd
import torch
import transformers
from datasets import (
    Audio, ClassLabel, Dataset, DatasetDict, Sequence,
    list_datasets, load_dataset, load_from_disk, load_metric,
)
from datasets.fingerprint import Hasher
from huggingface_hub import notebook_login
from IPython.display import display, HTML
from transformers import (
    AutoModelForCTC, AutoModelForSpeechSeq2Seq, pipeline,
    TFWhisperForConditionalGeneration,
    Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC, Wav2Vec2Processor,
    WhisperFeatureExtractor, WhisperForConditionalGeneration, WhisperProcessor,
    WhisperTokenizer, WhisperTokenizerFast,
)
#from jiwer import wer, cer

# Colab only
from google.colab import drive

## Login

In [None]:
# login to huggingface account for data
notebook_login()


## Load SDS Test Data

In [None]:
# load dataset from huggingface (after uploading it via local machine to huggingface)
sds_test = load_dataset("karinthommen/sds200", split="test")

Generating train split:   0%|          | 0/135271 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/3638 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/3636 [00:00<?, ? examples/s]

In [None]:
# downsample dataset to a sampling rate of 16kHz for the model
sds_test = sds_test.cast_column("audio", Audio(sampling_rate=16000))

In [None]:
import re
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“\%\‘\”\�]'  # raw string avoids invalid-escape warnings

def remove_special_characters(batch):
    batch["transcription"] = re.sub(chars_to_ignore_regex, '', batch["transcription"]).lower()
    return batch

sds_test_prep = sds_test.map(remove_special_characters)

Map:   0%|          | 0/3636 [00:00<?, ? examples/s]
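
To make the effect of this normalization concrete, the following sketch applies the same character class to a single sentence. The sample sentence is an assumption made up for illustration; it is not taken from SDS-200.

```python
import re

# Same character class as in the cell above, written as a raw string.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“\%\‘\”\�]'

def remove_special_characters(batch):
    # strip the listed punctuation marks, then lowercase the reference text
    batch["transcription"] = re.sub(chars_to_ignore_regex, '', batch["transcription"]).lower()
    return batch

# hypothetical example row (not from the dataset)
example = {"transcription": "Grüezi, wie gaht's? Guet!"}
print(remove_special_characters(example)["transcription"])
```

Note that the apostrophe is not part of the character class, so it survives this step.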

In [None]:
def preparation(batch):
  batch["transcription"] = batch["transcription"].lower()
  batch["speech"] = batch["audio"]["array"]
  return batch

sds_test_prep = sds_test_prep.map(preparation)

Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

In [None]:
sds_test

Dataset({
    features: ['audio', 'transcription', 'canton', 'duration'],
    num_rows: 3636
})

In [None]:
sds_test = sds_test.remove_columns(["canton", "duration"])
sds_test_prep = sds_test_prep.remove_columns(["canton", "duration"])

### Make Vocab File for Tokenizer for Zero-Shot XLSR

In [None]:
# https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_%F0%9F%A4%97_Transformers.ipynb#scrollTo=LwCshNbbeRZR

def extract_all_chars(batch):
  all_text = " ".join(batch["transcription"])
  vocab = list(set(all_text))
  return {"vocab": [vocab], "all_text": [all_text]}
vocab = sds_test.map(extract_all_chars, batched=True, batch_size=-1, keep_in_memory=True, remove_columns=sds_test.column_names)
vocab_list = list(set(vocab["vocab"][0]))
vocab_dict = {v: k for k, v in enumerate(vocab_list)}
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)
import json
with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)

Map:   0%|          | 0/3636 [00:00<?, ? examples/s]
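
The vocabulary construction above can be followed on a toy corpus. This is a minimal sketch: the two transcripts are invented for illustration, and `sorted()` is used only to make the ids reproducible here, whereas the cell above relies on set order.

```python
# toy transcripts (assumed for illustration, not from SDS-200)
transcriptions = ["das isch guet", "ja aso"]

all_text = " ".join(transcriptions)
# sorted() only makes the ids reproducible in this sketch
vocab_list = sorted(set(all_text))
vocab_dict = {char: idx for idx, char in enumerate(vocab_list)}

# the CTC tokenizer uses "|" as a visible word delimiter instead of a blank space
vocab_dict["|"] = vocab_dict.pop(" ")
# special tokens are appended at the end of the vocabulary
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)

print(vocab_dict)
```

Every distinct character gets one id, the space is re-keyed as `|`, and `[UNK]` and `[PAD]` receive the two highest ids.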

## Load Schawinski

There are two versions of the Schawinski test data.

**Version 1**

In [None]:
# load schawinski dataset from huggingface (after uploading it via local machine to huggingface) (Version 1)
schawinski_1 = load_dataset("karinthommen/schawinski", split="test")

Generating train split:   0%|          | 0/3009 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/753 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/941 [00:00<?, ? examples/s]

**Version 2**

In [None]:
# load schawinski dataset from huggingface (after uploading it via local machine to huggingface) (Version 2)
schawinski_2 = load_dataset("karinthommen/schawinski_V2", split="test")

Generating train split:   0%|          | 0/3544 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/645 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/647 [00:00<?, ? examples/s]

In [None]:
schawinski_1 = schawinski_1.remove_columns(["duration"]) # remove duration column
schawinski_2 = schawinski_2.remove_columns(["duration"]) # remove duration column
schawinski_1 = schawinski_1.filter(lambda example: not example["transcription"].startswith("[speech-in-speech]"))
schawinski_2 = schawinski_2.filter(lambda example: not example["transcription"].startswith("[speech-in-speech]"))

Filter:   0%|          | 0/941 [00:00<?, ? examples/s]

Filter:   0%|          | 0/647 [00:00<?, ? examples/s]

In [None]:
def preprocess(batch):
  # raw strings avoid invalid-escape warnings; the patterns themselves are unchanged
  chars_to_remove_regex = r'[\-\;\:\"\“\%\‘\”\�\'\$]' # keep punctuation
  # remove annotation tags
  batch["transcription"] = re.sub(r'\[noise\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[speech-in-noise\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[breath_mouth_noise\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[no_relevant_speech\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[no-relevant-speech\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[laughter\]', '', batch["transcription"])
  batch["transcription"] = re.sub(r'\[speech-in-speech\]', '', batch["transcription"])
  # remove backslashes, slashes and asterisks
  batch["transcription"] = re.sub(r"\\", '', batch["transcription"])
  batch["transcription"] = re.sub(r"/", '', batch["transcription"])
  batch["transcription"] = re.sub(r'\*', '', batch["transcription"])
  # remove remaining special characters, lowercase, and trim whitespace
  batch["transcription"] = re.sub(chars_to_remove_regex, '', batch["transcription"]).lower()
  batch["transcription"] = batch["transcription"].strip()
  return batch
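
As an illustration, the same cleaning steps can be applied to a plain string. The helper `clean_transcription` below is a hypothetical, compact restatement of `preprocess` (it is not part of the notebook); the sample sentence is one of the raw Schawinski transcripts printed further down.

```python
import re

# annotation tags that appear in the Schawinski transcripts
TAGS = ["[noise]", "[speech-in-noise]", "[breath_mouth_noise]", "[no_relevant_speech]",
        "[no-relevant-speech]", "[laughter]", "[speech-in-speech]"]
CHARS_TO_REMOVE = r'[\-\;\:\"\“\%\‘\”\�\'\$]'

def clean_transcription(text):
    # hypothetical helper: same steps as preprocess(), but on a plain string
    for tag in TAGS:
        text = text.replace(tag, "")
    text = re.sub(r"[\\/*]", "", text)  # backslashes, slashes, asterisks
    text = re.sub(CHARS_TO_REMOVE, "", text).lower()
    return text.strip()

print(clean_transcription("[breath_mouth_noise] kä wääleraateil fürschi gmacht hät das schtimt"))
print(repr(clean_transcription("[noise]")))
```

A transcript that consists only of a tag, such as `[noise]`, becomes an empty string, which is why empty references are filtered out after preprocessing.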

In [None]:
schawinski_1_prep = schawinski_1.map(preprocess, num_proc=1)
schawinski_2_prep = schawinski_2.map(preprocess, num_proc=1)

Map:   0%|          | 0/502 [00:00<?, ? examples/s]

Map:   0%|          | 0/393 [00:00<?, ? examples/s]

In [None]:
# downsample dataset to a sampling rate of 16kHz for the model
schawinski_1 = schawinski_1.cast_column("audio", Audio(sampling_rate=16000))
schawinski_2 = schawinski_2.cast_column("audio", Audio(sampling_rate=16000))
schawinski_1_prep = schawinski_1_prep.cast_column("audio", Audio(sampling_rate=16000))
schawinski_2_prep = schawinski_2_prep.cast_column("audio", Audio(sampling_rate=16000))

In [None]:
# check shape
print(schawinski_1)
print(schawinski_2)
print(schawinski_1_prep)
print(schawinski_2_prep)

Dataset({
    features: ['audio', 'transcription'],
    num_rows: 502
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 393
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 502
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 393
})


In [None]:
# remove examples whose reference is empty or consists of a single space
schawinski_1 = schawinski_1.filter(lambda example: len(example["transcription"])!=0)
schawinski_2 = schawinski_2.filter(lambda example: len(example["transcription"])!=0)
schawinski_1_prep = schawinski_1_prep.filter(lambda example: len(example["transcription"])!=0)
schawinski_2_prep = schawinski_2_prep.filter(lambda example: len(example["transcription"])!=0)
schawinski_1 = schawinski_1.filter(lambda example: example["transcription"]!=" ")
schawinski_2 = schawinski_2.filter(lambda example: example["transcription"]!=" ")
schawinski_1_prep = schawinski_1_prep.filter(lambda example: example["transcription"]!=" ")
schawinski_2_prep = schawinski_2_prep.filter(lambda example: example["transcription"]!=" ")

Filter:   0%|          | 0/502 [00:00<?, ? examples/s]

Filter:   0%|          | 0/393 [00:00<?, ? examples/s]

Filter:   0%|          | 0/502 [00:00<?, ? examples/s]

Filter:   0%|          | 0/393 [00:00<?, ? examples/s]

Filter:   0%|          | 0/502 [00:00<?, ? examples/s]

Filter:   0%|          | 0/393 [00:00<?, ? examples/s]

Filter:   0%|          | 0/501 [00:00<?, ? examples/s]

Filter:   0%|          | 0/385 [00:00<?, ? examples/s]

In [None]:
print(schawinski_2["transcription"])

['[noise]', 'die partei mues erkänt ha das es jezt würklich viertelvor zwölfi isch [breath_mouth_noise]', 'mir chönd ois äfacht nüme erlaube [breath_mouth_noise]', 'das mir sololoifer händ wo letschtlich dänn de partei schaded [breath_mouth_noise]', 'mir schaffed nuur wider trit überzchoo wämmer gmäinsaam kämpfed [breath_mouth_noise]', 'und entlich söttigi schpiili uufhöred', '[breath_mouth_noise] ich glaube das mir gar nöd so schlächt uufgschtelt sind wämmer', 'ja aso wämmer', 'das eso wänd aaluege cha mer säge das sit nünzähundertdrüüedachzg die partei', '[breath_mouth_noise] kä wääleraateil fürschi gmacht hät das schtimt', '[breath_mouth_noise] das wäär dänn e lengeri', 'das won ich presidäntin gsii bin im kanton züri [breath_mouth_noise]', 'das mir mit em schulterschluss vier bürgerlichi regierigsräät uf aahiib ine praacht händ', '[breath_mouth_noise] das mer uf aahiib de feeligs guzwiiler ine praacht händ aso [breath_mouth_noise]', 'ganz eso dramatisch isch das wider', 'also eerli

#### Make Vocab File for Tokenizer for Zero-Shot XLSR on Spontaneous Speech

In [None]:
vocab = schawinski_2_prep.map(extract_all_chars, batched=True, batch_size=-1, keep_in_memory=True, remove_columns=schawinski_2_prep.column_names)
vocab_list = list(set(vocab["vocab"][0]))
vocab_dict = {v: k for k, v in enumerate(vocab_list)}
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]
vocab_dict["[UNK]"] = len(vocab_dict)
vocab_dict["[PAD]"] = len(vocab_dict)
import json
with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)

Map:   0%|          | 0/385 [00:00<?, ? examples/s]

In [None]:
# check shape for all sets
print("SDS 200")
print(sds_test)
print(sds_test_prep)
print("*************")
print("Schawinski")
print(schawinski_1)
print(schawinski_2)
print(schawinski_1_prep)
print(schawinski_2_prep)


SDS 200
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 3636
})
Dataset({
    features: ['audio', 'transcription', 'speech'],
    num_rows: 3636
})
*************
Schawinski
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 502
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 393
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 501
})
Dataset({
    features: ['audio', 'transcription'],
    num_rows: 385
})


## Evaluation metrics

In [None]:
wer = evaluate.load("wer") # wer.compute(predictions=predictions, references=references)
cer = evaluate.load("cer") # cer.compute(predictions=predictions, references=references)
bleu = evaluate.load("bleu") # bleu.compute(predictions=predictions, references=references)

Downloading builder script:   0%|          | 0.00/4.49k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/5.60k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/5.94k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.55k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.34k [00:00<?, ?B/s]
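
For orientation: WER is the word-level edit distance between prediction and reference divided by the number of reference words, and CER is the same quantity at the character level. The following is a minimal pure-Python sketch of that definition, not the implementation behind `evaluate.load("wer")`; the function names and the toy sentences are assumptions for illustration.

```python
def edit_distance(ref, hyp):
    # classic dynamic-programming Levenshtein distance over two sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def word_error_rate(reference, prediction):
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)

def char_error_rate(reference, prediction):
    return edit_distance(list(reference), list(prediction)) / len(reference)

reference = "das isch guet"    # toy example
prediction = "das ischt guet"
print(100 * word_error_rate(reference, prediction))
print(100 * char_error_rate(reference, prediction))

# inserted words count as errors, so WER can exceed 100 %
print(100 * word_error_rate("ja", "ja aso guet"))
```

Because insertions are counted against the length of the reference, a model that produces more words than the reference contains can score above 100 %, as happens in some of the cross-domain evaluations in this notebook.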

# Methods for Evaluation

### XLSR Loading

In [None]:
# reference: https://github.com/huggingface/transformers/blob/main/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md

def xlsr_load(repo_name):
  model = Wav2Vec2ForCTC.from_pretrained(repo_name).to("cuda")
  processor = Wav2Vec2Processor.from_pretrained(repo_name)
  return model, processor

def pred_xlsr(batch):
  inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

  with torch.no_grad():
    logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits

  pred_ids = torch.argmax(logits, dim=-1)
  batch["pred_strings"] = processor.batch_decode(pred_ids)
  return batch
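
Taking `torch.argmax` over the logits and passing the ids to `processor.batch_decode` amounts to greedy CTC decoding: pick the most likely token per frame, merge consecutive repeats, drop the blank token, and turn the word delimiter `|` back into a space. The sketch below imitates these steps in plain Python. The toy vocabulary, the frame ids, and the helper name `greedy_ctc_decode` are assumptions for illustration, and `[PAD]` is assumed to act as the CTC blank.

```python
# toy vocabulary (assumed); "[PAD]" doubles as the CTC blank token
vocab = {"|": 0, "a": 1, "j": 2, "s": 3, "o": 4, "[UNK]": 5, "[PAD]": 6}
id_to_token = {idx: tok for tok, idx in vocab.items()}

def greedy_ctc_decode(frame_ids, blank="[PAD]", delimiter="|"):
    tokens = [id_to_token[i] for i in frame_ids]
    # 1) merge consecutive duplicates
    collapsed = [t for k, t in enumerate(tokens) if k == 0 or t != tokens[k - 1]]
    # 2) drop blanks, 3) map the word delimiter to a space
    chars = [" " if t == delimiter else t for t in collapsed if t != blank]
    return "".join(chars).strip()

# per-frame argmax ids as they might come out of the model (hand-written here)
frame_ids = [2, 2, 6, 1, 1, 0, 0, 1, 6, 3, 3, 4, 6, 6]
print(greedy_ctc_decode(frame_ids))
```

The blank token is what allows genuine double letters: two identical tokens separated by a blank are kept as two characters, while directly repeated tokens are merged into one.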

### Whisper Loading

In [None]:
# Reference: https://huggingface.co/openai/whisper-medium.en

def whisper_load(repo_name):
  processor = WhisperProcessor.from_pretrained(repo_name, language="german")
  model = WhisperForConditionalGeneration.from_pretrained(repo_name).to("cuda")
  tokenizer = WhisperTokenizer.from_pretrained(repo_name, language="german", task="transcribe")
  return processor, model, tokenizer

def pred_whisper(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features.to("cuda")
    batch["transcription"] = processor.tokenizer._normalize(batch['transcription'])

    with torch.no_grad():
        predicted_ids = model.generate(input_features)[0]
    prediction = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(prediction)

    return batch

def pred_whisper_disfluency(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features.to("cuda")
    batch["reference"] = processor.tokenizer._normalize(batch['transcription'])

    with torch.no_grad():
        predicted_ids = model.generate(input_features)[0]
    prediction = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(prediction)

    return batch

### Calculations

In [None]:
def pred_vs_ref(prediction, transcription):
  wer_calc = []
  cer_calc = []
  bleu_calc = []
  t_list = []
  p_list = []

  for t, p in zip(transcription, prediction): # go through all pairs
    if len(t) > 0 and len(p) > 0:
      wer_calc.append(100 * wer.compute(predictions=[p], references=[t]))
      cer_calc.append(100 * cer.compute(predictions=[p], references=[t]))
      bleu_calc.append(bleu.compute(predictions=[p], references=[t]))
      t_list.append(t)
      p_list.append(p)

  evaluation = {"transcription": t_list, "prediction": p_list, "WER": wer_calc, "CER": cer_calc, "Bleu": bleu_calc}

  df_eval = pd.DataFrame(data=evaluation)
  return df_eval
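
The per-utterance scores returned by `pred_vs_ref` and the scores computed over a whole test set are not interchangeable. A set-level WER pools all errors and all reference words, so long utterances weigh more than they do in a plain average of per-utterance values. The sketch below uses a hand-rolled word-level edit distance (an assumption standing in for the `evaluate` metric) and two invented utterance pairs to show the difference.

```python
def word_errors(reference, prediction):
    # word-level Levenshtein distance between reference and prediction
    ref, hyp = reference.split(), prediction.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]

# toy pairs (assumed): a short utterance that is fully wrong, a long one that is nearly right
references = ["ja", "das isch hüt würklich e gueti sach gsii"]
predictions = ["nei", "das isch hüt würklich e gueti sach gsi"]

# average of the per-utterance error rates
per_utt = [word_errors(r, p) / len(r.split()) for r, p in zip(references, predictions)]
mean_wer = 100 * sum(per_utt) / len(per_utt)

# pooled errors divided by pooled reference words
total_errors = sum(word_errors(r, p) for r, p in zip(references, predictions))
total_words = sum(len(r.split()) for r in references)
corpus_wer = 100 * total_errors / total_words

print(f"mean of per-utterance WER: {mean_wer:.1f}")
print(f"corpus-level WER:          {corpus_wer:.1f}")
```

This is why the CSV files keep the whole-set scores in a separate final row instead of deriving them from the per-entry rows.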

# Comparable Models

## Small XLSR

### Prepared Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/xlsr-prep-small-2"
print("============== \nXLSR small model2\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
result = sds_test_prep.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR small model2


Downloading (…)lve/main/config.json:   0%|          | 0.00/2.06k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.26G [00:00<?, ?B/s]

Downloading (…)rocessor_config.json:   0%|          | 0.00/256 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/373 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/490 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/96.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 78.9516157327813
CHARACTER ERROR RATE: 34.484329344843296
BLEU SCORE: 0.054319450792108176


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr = pred_vs_ref(predictions, transcriptions)
df_xlsr = pd.concat([df_xlsr, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
df_xlsr.to_csv("test_xlsr_prep_small_on_prep.csv")


### Prepared Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/xlsr-prep-small-2"
print("============== \nXLSR model small model prep on spontaneous speech \n==============")
model, processor = xlsr_load(repo_name)

# get predictions
schawinski_2_prep_2 = schawinski_2_prep.map(preparation)
result = schawinski_2_prep_2.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model small model prep on spontaneous speech 


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

Map:   0%|          | 0/385 [00:00<?, ? examples/s]

EVALUATION ON Schawinski SPLIT
----------
WORD ERROR RATE: 93.26999208234362
CHARACTER ERROR RATE: 44.088364951211055
BLEU SCORE: 0.006806343326941553


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_spont = pred_vs_ref(predictions, transcriptions)
df_xlsr_spont = pd.concat([df_xlsr_spont, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
df_xlsr_spont.to_csv("test_xlsr_prep_small_on_spont.csv")


### Spontaneous Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/spont-xlsr-V1-2"
print("============== \nSpontaneous XLSR model on spontaneous speech\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
schawinski_2_prep_2 = schawinski_2_prep.map(preparation)
result = schawinski_2_prep_2.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous XLSR model on spontaneous speech


Downloading (…)lve/main/config.json:   0%|          | 0.00/2.06k [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/1.26G [00:00<?, ?B/s]

Downloading (…)rocessor_config.json:   0%|          | 0.00/256 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/373 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/372 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/96.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

EVALUATION ON Schawinski SPLIT
----------
WORD ERROR RATE: 60.8867775138559
CHARACTER ERROR RATE: 16.741790083708953
BLEU SCORE: 0.17783670518045924


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_spont = pred_vs_ref(predictions, transcriptions)
df_xlsr_spont = pd.concat([df_xlsr_spont, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
df_xlsr_spont.to_csv("test_xlsr_spont_on_spont.csv")


### Spontaneous Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/spont-xlsr-V1-2"
print("============== \nSpontaneous XLSR model on prepared speech\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
result = sds_test_prep.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON SDS 200 SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous XLSR model on prepared speech


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

EVALUATION ON SDS 200 SPLIT
----------
WORD ERROR RATE: 100.90874004420898
CHARACTER ERROR RATE: 52.135342021353416
BLEU SCORE: 0.0


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_spont = pred_vs_ref(predictions, transcriptions)
df_xlsr_spont = pd.concat([df_xlsr_spont, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
df_xlsr_spont.to_csv("test_xlsr_spont_on_prep.csv")


In [None]:
from google.colab import files
files.download('/content/test_xlsr_prep_small_on_prep.csv')
files.download('/content/test_xlsr_prep_small_on_spont.csv')
files.download('/content/test_xlsr_spont_on_spont.csv')
files.download('/content/test_xlsr_spont_on_prep.csv')

## Small Whisper

### Prepared Speech on Prepared Speech (German Tokenizer)

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-small"
print("============== \nWhisper Small on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Small on Prepared Speech

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 86.80229525299947
CHARACTER ERROR RATE: 62.36768044584821
BLEU SCORE: 0.05794046817076659


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
whisper_small_prep_german = pred_vs_ref(predictions, transcriptions)
whisper_small_prep_german = pd.concat([whisper_small_prep_german, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
whisper_small_prep_german.to_csv("test_whisper_small_prep_on_prep_german.csv")


In [None]:
files.download('/content/test_whisper_small_prep_on_prep_german.csv')

The same model is also evaluated on spontaneous speech:

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-small"
print("============== \nWhisper Small Prepared on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Small Prepared on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]


EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 103.03670451544758
CHARACTER ERROR RATE: 78.04310131285608
BLEU SCORE: 0.0


### Prepared Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-small-3"
print("============== \nWhisper Small on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Small on Prepared Speech

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 86.4962615197357
CHARACTER ERROR RATE: 63.79925728971807
BLEU SCORE: 0.036117384315157264


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
whisper_small_prep = pred_vs_ref(predictions, transcriptions)
whisper_small_prep = pd.concat([whisper_small_prep, pd.DataFrame([row_for_df])], ignore_index=True)  # DataFrame.append is deprecated
whisper_small_prep.to_csv("test_whisper_small_prep_on_prep.csv")


In [None]:
from google.colab import files

In [None]:
files.download('/content/test_whisper_small_prep_on_prep.csv')

### Prepared Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-small-3"
print("============== \nWhisper Small Prepared on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Small Prepared on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 98.78531819382097
CHARACTER ERROR RATE: 75.25885558583106
BLEU SCORE: 0.0
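
A BLEU score of exactly 0.0 does not have to mean that no word was recognised. Unsmoothed BLEU is a geometric mean over 1-gram to 4-gram precisions, so it collapses to zero as soon as one n-gram order has no match in the whole test set. The following is a minimal sketch of that effect, not the `evaluate` implementation: the helper names and the toy sentences are invented for illustration, and the brevity penalty is left out.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precisions(predictions, references, max_order=4):
    """Corpus-level clipped n-gram precision for n = 1..max_order."""
    matches = [0] * max_order
    totals = [0] * max_order
    for pred, ref in zip(predictions, references):
        pred_tok, ref_tok = pred.split(), ref.split()
        for n in range(1, max_order + 1):
            pred_counts = ngrams(pred_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram to its count in the reference
            matches[n - 1] += sum((pred_counts & ref_counts).values())
            totals[n - 1] += sum(pred_counts.values())
    return [m / t if t else 0.0 for m, t in zip(matches, totals)]


def unsmoothed_geo_mean(precisions):
    """Geometric mean of the precisions; zero as soon as one precision is zero."""
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))


# Toy data, invented for illustration (not taken from the test sets)
toy_references = ["das ist ein kleiner test", "wir gehen heute nach hause"]
toy_predictions = ["das ist kein kleiner test", "wir gingen heute nach haus"]

toy_precisions = ngram_precisions(toy_predictions, toy_references)
toy_score = unsmoothed_geo_mean(toy_precisions)
print("n-gram precisions:", toy_precisions)
print("geometric mean:", toy_score)
```

In the toy data most single words are correct, but one wrong word per sentence is enough to break every 3-gram and 4-gram, so the combined score is zero. For this reason the WER and CER values are the more informative numbers for runs like the one above.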


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
whisper_small_prep_spont = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
whisper_small_prep_spont = pd.concat([whisper_small_prep_spont, pd.DataFrame([row_for_df])], ignore_index=True)
whisper_small_prep_spont.to_csv("test_whisper_small_prep_on_spont-2.csv")


In [None]:
files.download('/content/test_whisper_small_prep_on_spont-2.csv')

### Spontaneous Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v2-4"
print("============== \nSpontaneous Whisper Comparable on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper Comparable on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 67.01874834961711
CHARACTER ERROR RATE: 33.00966063908844
BLEU SCORE: 0.1561312706308019
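
The evaluation cells in this notebook repeat the same steps for every model and test set: compute each metric, print it, and build a summary row. One possible way to factor this out is sketched below. The function name `evaluate_run`, its parameters, and the stand-in metric `exact_match_rate` are hypothetical and only serve the demo; the toy strings are invented.

```python
def evaluate_run(name, predictions, references, metric_fns):
    """Score one model/test-set pair with every metric and return a summary row."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    row = {"run": name, "n_examples": len(references)}
    for metric_name, metric_fn in metric_fns.items():
        # every metric is called with the same keyword arguments
        row[metric_name] = metric_fn(predictions=predictions, references=references)
    return row


def exact_match_rate(predictions, references):
    """Stand-in metric for the demo: percentage of predictions equal to their reference."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100 * hits / len(references)


# Toy data, invented for illustration
toy_row = evaluate_run(
    "toy model on toy set",
    predictions=["guten morgen", "auf wiedersehen"],
    references=["guten morgen", "auf wiederluege"],
    metric_fns={"exact_match": exact_match_rate},
)
print(toy_row)
```

In the notebook itself, `metric_fns` could wrap the existing metric objects, for example `{"WER": lambda **kw: 100 * wer.compute(**kw), "CER": lambda **kw: 100 * cer.compute(**kw)}`, assuming `wer` and `cer` are the objects used in the cells above.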


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
spont_whisper = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
spont_whisper = pd.concat([spont_whisper, pd.DataFrame([row_for_df])], ignore_index=True)
spont_whisper.to_csv("test_spont_whisper_on_spont.csv")


### Spontaneous Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v2-4"
print("============== \nSpontaneous Whisper Comparable on Prepared Speech \n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper Comparable on Prepared Speech 





EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 115.8059467918623
CHARACTER ERROR RATE: 64.50644181825247
BLEU SCORE: 0.0014684295906285782
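
A word error rate above 100 % is not a calculation error. WER divides the number of substitutions, deletions, and insertions by the number of reference words, so a hypothesis that contains many extra words can produce more errors than there are reference words. The sketch below illustrates this for a single sentence pair with a plain word-level edit distance; it is a simplified stand-in for the metric used above, and the function names and the toy strings are invented for illustration.

```python
def word_edit_distance(reference_words, hypothesis_words):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hypothesis_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def toy_wer(reference, hypothesis):
    """Word error rate in percent for one reference/hypothesis pair."""
    ref_words = reference.split()
    return 100 * word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Toy strings, invented for illustration
toy_reference = "guten morgen"
toy_hypothesis = "gute n mor gen zusammen"
print(toy_wer(toy_reference, toy_hypothesis))
```

Here two reference words face five hypothesis words with no match, which gives two substitutions plus three insertions and therefore a WER of 250 %.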


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
spont_whisper_prep = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
spont_whisper_prep = pd.concat([spont_whisper_prep, pd.DataFrame([row_for_df])], ignore_index=True)
spont_whisper_prep.to_csv("test_spont_whisper_on_prep.csv")


In [None]:
files.download('/content/test_whisper_small_prep_on_prep.csv')
files.download('/content/test_whisper_small_prep_on_spont.csv')
files.download('/content/test_spont_whisper_on_spont.csv')
files.download('/content/test_spont_whisper_on_prep.csv')

# Default Whisper

### Default Whisper Prepared Speech LARGE on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V2"
print("============== \nWhisper V2 on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper V2 on Prepared Speech


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 42.56651017214397
CHARACTER ERROR RATE: 24.1219865557227
BLEU SCORE: 0.4120665565505942


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_whisper_v2 = pd.concat([df_whisper_v2, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v2.to_csv("test_whisper_v2_prep.csv")


### Default Whisper Prepared Speech LARGE on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V2"
print("============== \nWhisper V2 Prepared on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper V2 Prepared on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 101.68999207816213
CHARACTER ERROR RATE: 63.55709685409958
BLEU SCORE: 0.015417111765548538


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_whisper_v2 = pd.concat([df_whisper_v2, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v2.to_csv("test_whisper_v2_prep_on_spont.csv")


### Default SMALL Whisper Prepared Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V2-default-small"
print("============== \nSmall Whisper V2 on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Small Whisper V2 on Prepared Speech


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 38.65762476091115
CHARACTER ERROR RATE: 21.907177867455633
BLEU SCORE: 0.44095632389442896


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v2_small = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_whisper_v2_small = pd.concat([df_whisper_v2_small, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v2_small.to_csv("test_whisper_v2_small_prep_on_prep.csv")


### Default SMALL Whisper Prepared Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V2-default-small"
print("============== \nSMALL Whisper V2 Prepared on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

SMALL Whisper V2 Prepared on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 90.22973329812517
CHARACTER ERROR RATE: 53.638840723309386
BLEU SCORE: 0.02749006744460128


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v2_small = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_whisper_v2_small = pd.concat([df_whisper_v2_small, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v2_small.to_csv("test_whisper_v2_small_prep_on_spont.csv")


### Default Whisper Spontaneous Speech on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/spont-whisper-default"
print("============== \nSpontaneous Whisper Default on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper Default on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 41.51043041985741
CHARACTER ERROR RATE: 12.073321773594254
BLEU SCORE: 0.349764411727205


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
spont_whisper = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
spont_whisper = pd.concat([spont_whisper, pd.DataFrame([row_for_df])], ignore_index=True)
spont_whisper.to_csv("test_spont_default_whisper_on_spont.csv")


### Default Whisper Spontaneous Speech on Prepared Speech

In [None]:
# load all parts
repo_name = "karinthommen/spont-whisper-default"
print("============== \nSpontaneous Whisper Default on Prepared Speech \n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper Default on Prepared Speech 


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 95.4094940010433
CHARACTER ERROR RATE: 44.64588040033083
BLEU SCORE: 0.010353961448151029
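
The six results of this section are easier to compare side by side. The sketch below collects the scores printed above (rounded) into one list and ranks them by WER. The tuple layout and the variable names are illustrative choices, not part of the original evaluation code.

```python
# (model repo, test set, WER %, CER %, BLEU), rounded from the printed outputs
reported_scores = [
    ("whisper-V2", "SDS_200", 42.57, 24.12, 0.412),
    ("whisper-V2", "Schawinski", 101.69, 63.56, 0.015),
    ("whisper-V2-default-small", "SDS_200", 38.66, 21.91, 0.441),
    ("whisper-V2-default-small", "Schawinski", 90.23, 53.64, 0.027),
    ("spont-whisper-default", "Schawinski", 41.51, 12.07, 0.350),
    ("spont-whisper-default", "SDS_200", 95.41, 44.65, 0.010),
]

# lowest word error rate first
ranked = sorted(reported_scores, key=lambda row: row[2])

print(f"{'model':<26}{'test set':<12}{'WER':>8}{'CER':>8}{'BLEU':>8}")
for model, test_set, wer_pct, cer_pct, bleu_val in ranked:
    print(f"{model:<26}{test_set:<12}{wer_pct:>8.2f}{cer_pct:>8.2f}{bleu_val:>8.3f}")
```

Sorted this way, each model scores far better on the test set that matches its own repository name (prepared or spontaneous speech) than on the other one.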


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
spont_whisper_prep = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
spont_whisper_prep = pd.concat([spont_whisper_prep, pd.DataFrame([row_for_df])], ignore_index=True)
spont_whisper_prep.to_csv("test_spont_default_whisper_on_prep.csv")


In [None]:
#files.download('/content/test_whisper_v2_prep.csv')
files.download('/content/test_whisper_v2_prep_on_spont.csv')

files.download('/content/test_whisper_v2_small_prep_on_prep.csv')
files.download('/content/test_whisper_v2_small_prep_on_spont.csv')

files.download('/content/test_spont_default_whisper_on_spont.csv')
files.download('/content/test_spont_default_whisper_on_prep.csv')

# Prepared Speech: XLS-R

### XLS-R Zero Shot

In [None]:
# load all parts
print("============== \nXLSR model Zero Shot\n==============")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m").to("cuda")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True, return_attention_mask=True)
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)


# get predictions
result = sds_test_prep.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model Zero Shot


Some weights of the model checkpoint at facebook/wav2vec2-xls-r-300m were not used when initializing Wav2Vec2ForCTC: ['quantizer.codevectors', 'project_q.weight', 'project_hid.bias', 'quantizer.weight_proj.weight', 'project_q.bias', 'quantizer.weight_proj.bias', 'project_hid.weight']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-xls-r-300m and are newly initialized: ['lm_head.weight', 'lm_head.bias']
You should probably TRAIN this model on a down-stream task to be able to use it 

Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 100.18595838742502
CHARACTER ERROR RATE: 188.66224488662243
BLEU SCORE: 0.0


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_zero = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_xlsr_zero = pd.concat([df_xlsr_zero, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_zero.to_csv("test_xlsr_zero_prep.csv")


### XLS-R V2 Model

In [None]:
# load all parts
repo_name = "karinthommen/xlsr-V2"
print("============== \nXLSR model V2\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
result = sds_test_prep.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model V2


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 37.18466018736184
CHARACTER ERROR RATE: 16.947749669477496
BLEU SCORE: 0.4055254110985991
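
The XLS-R cells flatten `result['pred_strings']` before scoring, which suggests that every example carries its prediction inside a list. A small sketch of that step follows, with an added length check so that predictions and references cannot silently drift out of alignment. The toy values are invented, and the one-string-per-example shape is an assumption about what `pred_xlsr` returns.

```python
from itertools import chain

# Toy stand-ins for result['transcription'] and result['pred_strings']
toy_transcriptions = ["grüezi mitenand", "uf wiederluege"]
toy_pred_strings = [["grüezi mitenand"], ["uf wiederluege"]]

# The nested comprehension used in the notebook ...
flat_comprehension = [item for sublist in toy_pred_strings for item in sublist]
# ... and an equivalent standard-library form
flat_chain = list(chain.from_iterable(toy_pred_strings))

# If any sublist held more than one string, the metrics would compare
# the wrong pairs, so fail early instead
if len(flat_chain) != len(toy_transcriptions):
    raise ValueError("number of predictions does not match number of references")

print(flat_chain)
```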


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_v2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_xlsr_v2 = pd.concat([df_xlsr_v2, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_v2.to_csv("test_xlsr_v2_prep.csv")


### XLS-R V3.2 Model

In [None]:
# load all parts
repo_name = "karinthommen/xlsr-V3-2"
print("============== \nXLSR model V3.2\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
result = sds_test_prep.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model V3.2


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 36.030314725799094
CHARACTER ERROR RATE: 16.462242664622426
BLEU SCORE: 0.42221414349359454


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_v3_2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_xlsr_v3_2 = pd.concat([df_xlsr_v3_2, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_v3_2.to_csv("test_xlsr_v3_2_prep.csv")


### XLS-R V3.2 on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/xlsr-V3-2"
print("============== \nXLSR model V3.2\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
schawinski_2_prep_2 = schawinski_2_prep.map(preparation)
result = schawinski_2_prep_2.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model V3.2


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

EVALUATION ON Schawinski SPLIT
----------
WORD ERROR RATE: 87.6748482449195
CHARACTER ERROR RATE: 42.72128386745258
BLEU SCORE: 0.013162417882716928


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_v3_2_spont = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_xlsr_v3_2_spont = pd.concat([df_xlsr_v3_2_spont, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_v3_2_spont.to_csv("test_xlsr_v3_2_spont.csv")


# Prepared Speech: Whisper

### Whisper Zero Shot

In [None]:
# load all parts
repo_name = "openai/whisper-small"
print("============== \nWhisper Zero Shot on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Zero Shot on Prepared Speech





EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 110.735524256651
CHARACTER ERROR RATE: 70.25273243001703
BLEU SCORE: 0.0064177779998913715


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_zero = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated; add the whole-set summary row with pd.concat instead
df_whisper_zero = pd.concat([df_whisper_zero, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_zero.to_csv("test_whisper_zero_prep.csv")


### Whisper V2

### Whisper V4.2

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-2"
print("============== \nWhisper V4.2 on Prepared Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = sds_test.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON SDS_200 Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper V4.2 on Prepared Speech


Map:   0%|          | 0/3636 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 64.44792210050426
CHARACTER ERROR RATE: 45.189313409306635
BLEU SCORE: 0.232914056561467


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v4 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_whisper_v4 = pd.concat([df_whisper_v4, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v4.to_csv("test_whisper_v4_prep.csv")


### Whisper V4.2 on Spontaneous Speech

In [None]:
# load all parts
repo_name = "karinthommen/whisper-V4-2"
print("============== \nWhisper V4.2 on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper V4.2 on Spontaneous Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 95.66939529970954
CHARACTER ERROR RATE: 66.14812979935596
BLEU SCORE: 0.0


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_v4_spont = pred_vs_ref(predictions, transcriptions)
# add the summary row to the spontaneous-speech frame (not df_whisper_v4);
# DataFrame.append is deprecated in pandas, so pd.concat is used instead
df_whisper_v4_spont = pd.concat([df_whisper_v4_spont, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_v4_spont.to_csv("test_whisper_v4_spont.csv")


# Spontaneous Speech: XLS-R

### Spontaneous XLSR Zero Shot

In [None]:
# load all parts
print("============== \nXLSR model Zero Shot\n==============")
model=Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m").to("cuda")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True, return_attention_mask=True)
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)


# get predictions
schawinski_2_prep_2 = schawinski_2_prep.map(preparation)
result = schawinski_2_prep_2.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

XLSR model Zero Shot


Some weights of the model checkpoint at facebook/wav2vec2-xls-r-300m were not used when initializing Wav2Vec2ForCTC: ['quantizer.codevectors', 'project_q.weight', 'project_hid.bias', 'quantizer.weight_proj.weight', 'project_q.bias', 'quantizer.weight_proj.bias', 'project_hid.weight']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-xls-r-300m and are newly initialized: ['lm_head.weight', 'lm_head.bias']
You should probably TRAIN this model on a down-stream task to be able to use it 

Map:   0%|          | 0/385 [00:00<?, ? examples/s]

EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 100.0
CHARACTER ERROR RATE: 200.88662142751002
BLEU SCORE: 0.0
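
The warning above says that `lm_head` is newly initialized for this checkpoint, so the zero-shot output is not expected to be meaningful. A character error rate above 100 % is consistent with predictions that contain far more characters than the references. For context, a CTC model emits one label per audio frame, and greedy decoding collapses repeated labels and drops the blank symbol. The block below is a minimal sketch of that step with a made-up toy vocabulary. `ctc_greedy_decode` is a hypothetical name and not the notebook's `pred_xlsr`; it assumes, as Wav2Vec2 does, that the pad token serves as the CTC blank.

```python
def ctc_greedy_decode(frame_ids, id_to_token, pad_token="[PAD]", word_delimiter="|"):
    """Collapse repeated frame labels, drop the CTC blank, map '|' to a space."""
    tokens = []
    previous = None
    for idx in frame_ids:
        if idx != previous:              # keep only the first label of a run
            token = id_to_token[idx]
            if token != pad_token:       # the pad token doubles as the CTC blank
                tokens.append(token)
        previous = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()


# toy vocabulary and frame labels, made up for illustration
toy_vocab = {0: "[PAD]", 1: "|", 2: "h", 3: "o", 4: "i"}
frames = [2, 2, 0, 3, 3, 3, 0, 4, 1, 1, 2, 0, 2]
decoded = ctc_greedy_decode(frames, toy_vocab)
print(decoded)
```

The blank between the last two `h` frames is what allows a doubled letter to survive the collapsing step.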


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_zero_spont = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_xlsr_zero_spont = pd.concat([df_xlsr_zero_spont, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_zero_spont.to_csv("test_xlsr_zero_spont.csv")


### Spontaneous XLSR V1

In [None]:
# load all parts
repo_name = "karinthommen/spont-xlsr-V1"
print("============== \nSpontaneous XLSR model V1\n==============")
model, processor = xlsr_load(repo_name)

# get predictions
schawinski_2_prep_2 = schawinski_2_prep.map(preparation)
result = schawinski_2_prep_2.map(pred_xlsr)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = [item for sublist in result['pred_strings'] for item in sublist]

# calculate wer, cer and bleu
print("============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous XLSR model V1


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

EVALUATION ON Schawinski SPLIT
----------
WORD ERROR RATE: 65.39984164687253
CHARACTER ERROR RATE: 39.065827926098365
BLEU SCORE: 0.11008074389848449


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_xlsr_spont = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_xlsr_spont = pd.concat([df_xlsr_spont, pd.DataFrame([row_for_df])], ignore_index=True)
df_xlsr_spont.to_csv("test_xlsr_spont_V1.csv")


# Spontaneous Speech: Whisper

### Spontaneous Whisper Zero Shot

In [None]:
# load all parts
repo_name = "openai/whisper-small"
print("============== \nWhisper Zero Shot on Spontaneous Speech\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Whisper Zero Shot on Prepared Speech


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 105.33403749669922
CHARACTER ERROR RATE: 73.98563289571463
BLEU SCORE: 0.0


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_whisper_spont_zero = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_whisper_spont_zero = pd.concat([df_whisper_spont_zero, pd.DataFrame([row_for_df])], ignore_index=True)
df_whisper_spont_zero.to_csv("test_spont_whisper_zero_prep.csv")


### Spontaneous Whisper V2-3

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v2-3"
print("============== \nSpontaneous Whisper V2.3\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V2.3


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 66.06812780565092
CHARACTER ERROR RATE: 31.062670299727518
BLEU SCORE: 0.19216972841128863


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v2_3 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v2_3 = pd.concat([df_spont_whisper_v2_3, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v2_3.to_csv("test_spont_whisper_v2_3.csv")


### Spontaneous Whisper V4

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v4"
print("============== \nSpontaneous Whisper V4\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V4





EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 97.86110377607605
CHARACTER ERROR RATE: 56.641070101560565
BLEU SCORE: 0.02908299337745442


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v4 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v4 = pd.concat([df_spont_whisper_v4, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v4.to_csv("test_spont_whisper_v4.csv")


### Spontaneous Whisper V4.2

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v4-2"
print("============== \nSpontaneous Whisper V4.2\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V4.2


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 99.47187747557433
CHARACTER ERROR RATE: 81.1047807778053
BLEU SCORE: 0.0


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v4_2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v4_2 = pd.concat([df_spont_whisper_v4_2, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v4_2.to_csv("test_spont_whisper_v4_2.csv")


In [None]:
from google.colab import files  # needed for files.download
files.download('/content/test_spont_whisper_v4_2.csv')


### Spontaneous Whisper V5

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v5"
print("============== \nSpontaneous Whisper V5\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V5


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 100.84499603908105
CHARACTER ERROR RATE: 68.37255387664108
BLEU SCORE: 0.07213401655156244


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v5 = pred_vs_ref(predictions, transcriptions)
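# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v5 = pd.concat([df_spont_whisper_v5, pd.DataFrame([row_for_df])], ignore_index=True)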
df_spont_whisper_v5.to_csv("test_spont_whisper_v5.csv")

### Spontaneous Whisper V5.2

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v5-2"
print("============== \nSpontaneous Whisper V5.2\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V5.2


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON SDS_200 Test SPLIT
----------
WORD ERROR RATE: 72.64325323475046
CHARACTER ERROR RATE: 51.959375774089665
BLEU SCORE: 0.16529272108560528


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v5_2 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v5_2 = pd.concat([df_spont_whisper_v5_2, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v5_2.to_csv("test_spont_whisper_v5_2.csv")


### Spontaneous Whisper V5.3

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v5-3"
print("============== \nSpontaneous Whisper V5.3\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V5.3


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 68.15421177713229
CHARACTER ERROR RATE: 33.18305672529106
BLEU SCORE: 0.1601367010022261


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v5_3 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v5_3 = pd.concat([df_spont_whisper_v5_3, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v5_3.to_csv("test_spont_whisper_v5_3.csv")


In [None]:
files.download('/content/test_spont_whisper_v5_3.csv')


### Spontaneous Whisper V5.4

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v5-4"
print("============== \nSpontaneous Whisper V5.4\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2_prep.map(pred_whisper)

# get transcription and prediction from result
transcriptions = result['transcription']
predictions = result['prediction']

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V5.4


Map:   0%|          | 0/385 [00:00<?, ? examples/s]

    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 68.15421177713229
CHARACTER ERROR RATE: 33.18305672529106
BLEU SCORE: 0.1601367010022261
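
To compare the runs on the spontaneous (Schawinski) data shown so far in this part, the printed scores can be collected and sorted. The numbers below are copied from the outputs above and rounded to two decimals. The list layout and the variable names are only one possible sketch and not part of the original notebook.

```python
# (model, WER %, CER %) copied from the printed outputs, rounded to two decimals
results = [
    ("XLSR zero shot",            100.00, 200.89),
    ("Spontaneous XLSR V1",        65.40,  39.07),
    ("Whisper zero shot",         105.33,  73.99),
    ("Whisper V4.2 (prepared)",    95.67,  66.15),
    ("Spontaneous Whisper V2.3",   66.07,  31.06),
    ("Spontaneous Whisper V4",     97.86,  56.64),
    ("Spontaneous Whisper V4.2",   99.47,  81.10),
    ("Spontaneous Whisper V5",    100.84,  68.37),
    ("Spontaneous Whisper V5.2",   72.64,  51.96),
    ("Spontaneous Whisper V5.3",   68.15,  33.18),
    ("Spontaneous Whisper V5.4",   68.15,  33.18),
]

# sort by word error rate, lowest (best) first
by_wer = sorted(results, key=lambda row: row[1])
for name, wer_pct, cer_pct in by_wer:
    print(f"{name:<28} WER {wer_pct:6.2f}  CER {cer_pct:6.2f}")

best_wer = by_wer[0][0]
best_cer = min(results, key=lambda row: row[2])[0]
```

In this listing the lowest WER and the lowest CER come from different models, so the ranking depends on which metric is used.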


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v5_4 = pred_vs_ref(predictions, transcriptions)
# DataFrame.append is deprecated in pandas; pd.concat adds the summary row instead
df_spont_whisper_v5_4 = pd.concat([df_spont_whisper_v5_4, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v5_4.to_csv("test_spont_whisper_v5_4.csv")


In [None]:
files.download('/content/test_spont_whisper_v5_4.csv')


### Spontaneous Whisper V6

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v6"
print("============== \nSpontaneous Whisper V6\n==============")
processor, model, tokenizer = whisper_load(repo_name)


# get predictions
result = schawinski_2.map(pred_whisper_disfluency)

# get transcription and prediction from result
chars = r'[\$\*\/\_\]\[\"\“\‘\”\�\']'  # raw string avoids invalid-escape warnings
reference = result['transcription']
predictions = result['prediction']

transcriptions = []
for i in reference:
  t = i.lower()
  t = re.sub(chars, '', t)
  t = re.sub('ä', 'a', t)
  t = re.sub('ö', 'o', t)
  t = re.sub('ü', 'u', t)
  t = re.sub('\$', '', t)
  transcriptions.append(t)


for t, p in zip(transcriptions, predictions):  # additional check for emtpy strings because not prepared dataset
  if len(t) < 1 or len(p) < 1:
    transcriptions.remove(t)
    predictions.remove(p)


# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V6

EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 83.5377821393523
CHARACTER ERROR RATE: 47.79259140986748
BLEU SCORE: 0.11914896871250096
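
To make the reported error rates easier to interpret, the following is a minimal sketch of how a corpus-level WER and CER can be derived from the Levenshtein distance. It assumes the scores are pooled over the whole test set (total edits divided by total reference units), which is how I understand the `evaluate` metrics to work; the helper names `edit_distance` and `corpus_error_rate` and the example sentences are hypothetical and not part of the notebook.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(references, predictions, unit="word"):
    """Error rate in percent: total edits / total reference units."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(p))
                for r, p in zip(references, predictions))
    total = sum(len(split(r)) for r in references)
    return 100 * edits / total


# hypothetical toy data, not taken from the Schawinski test set
refs = ["das isch guet", "i ha kei zit"]
hyps = ["das isch gut", "i ha zit"]
wer_demo = corpus_error_rate(refs, hyps, unit="word")
cer_demo = corpus_error_rate(refs, hyps, unit="char")
print("WER:", wer_demo, "CER:", cer_demo)
```

Because insertions count as errors but do not add reference words, a WER computed this way can exceed 100 when the model produces much more text than the reference contains.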


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v6 = pred_vs_ref(predictions, transcriptions)
# add the corpus-level scores as a final summary row (DataFrame.append is deprecated)
df_spont_whisper_v6 = pd.concat([df_spont_whisper_v6, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v6.to_csv("test_spont_whisper_v6.csv")


In [None]:
from google.colab import files
files.download('/content/test_spont_whisper_v6.csv')


### Spontaneous Whisper V6.2

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v6-2"
print("============== \nSpontaneous Whisper V6.2\n==============")
processor, model, tokenizer = whisper_load(repo_name)


# get predictions
result = schawinski_2.map(pred_whisper_disfluency)

# get transcription and prediction from result
chars = r'[\$\*\/\_\]\[\"\“\‘\”\�\']'  # raw string avoids invalid-escape warnings
reference = result['transcription']
predictions = result['prediction']

# normalize references: lowercase, strip special characters, replace umlauts
transcriptions = []
for i in reference:
  t = i.lower()
  t = re.sub(chars, '', t)
  t = re.sub('ä', 'a', t)
  t = re.sub('ö', 'o', t)
  t = re.sub('ü', 'u', t)
  t = re.sub(r'\$', '', t)
  transcriptions.append(t)

# drop pairs with an empty reference or prediction, because the dataset is not preprocessed;
# building new lists keeps both aligned (removing items while iterating skips elements)
pairs = [(t, p) for t, p in zip(transcriptions, predictions) if len(t) >= 1 and len(p) >= 1]
transcriptions = [t for t, _ in pairs]
predictions = [p for _, p in pairs]

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski Test SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V6.2



EVALUATION ON Schawinski Test SPLIT
----------
WORD ERROR RATE: 84.9841810659528
CHARACTER ERROR RATE: 51.24668435013262
BLEU SCORE: 0.0415813568536159
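
Since the same clean-up steps recur in every evaluation cell, they could be collected in two small helpers. The sketch below is one possible way to do that; `normalize_reference`, `drop_empty_pairs`, and the sample strings are hypothetical names and data, and the character class is meant to mirror the one used in the cells. Filtering into new lists also avoids calling `list.remove` inside a loop over the same list, which skips elements and deletes the first equal value rather than the paired one.

```python
import re

# characters stripped from the references (same set as in the evaluation cells)
CHARS_TO_REMOVE = r'[$*/_\]\["“‘”\ufffd\']'
UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})


def normalize_reference(text):
    """Lowercase, remove special characters and replace umlauts."""
    text = re.sub(CHARS_TO_REMOVE, "", text.lower())
    return text.translate(UMLAUTS)


def drop_empty_pairs(references, predictions):
    """Keep only pairs where both reference and prediction are non-empty."""
    kept = [(r, p) for r, p in zip(references, predictions) if r and p]
    return [r for r, _ in kept], [p for _, p in kept]


# hypothetical sample data
raw_refs = ['Grüezi "Roger"', "$", "Schön"]
preds = ["gruezi roger", "ja", ""]

clean_refs = [normalize_reference(r) for r in raw_refs]
clean_refs, preds = drop_empty_pairs(clean_refs, preds)
print(clean_refs, preds)
```

With helpers like these, each model section would only need to set `repo_name`, run the `map` call, and pass the cleaned lists to the metrics.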


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v6_2 = pred_vs_ref(predictions, transcriptions)
# add the corpus-level scores as a final summary row (DataFrame.append is deprecated)
df_spont_whisper_v6_2 = pd.concat([df_spont_whisper_v6_2, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v6_2.to_csv("test_spont_whisper_v6_2.csv")


### Spontaneous Whisper V6.3

In [None]:
# load all parts
repo_name = "karinthommen/spontaneous-whisper-v6-3"
print("============== \nSpontaneous Whisper V6.3\n==============")
processor, model, tokenizer = whisper_load(repo_name)

# get predictions
result = schawinski_2.map(pred_whisper_disfluency)

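# get transcription and prediction from result
chars = r'[\$\*\/\_\]\[\"\“\‘\”\�\']'  # raw string avoids invalid-escape warnings
reference = result['transcription']
predictions = result['prediction']

# normalize references: lowercase, strip special characters, replace umlauts
transcriptions = []
for i in reference:
  t = i.lower()
  t = re.sub(chars, '', t)
  t = re.sub('ä', 'a', t)
  t = re.sub('ö', 'o', t)
  t = re.sub('ü', 'u', t)
  t = re.sub(r'\$', '', t)
  transcriptions.append(t)

# drop pairs with an empty reference or prediction, because the dataset is not preprocessed;
# building new lists keeps both aligned (removing items while iterating skips elements)
pairs = [(t, p) for t, p in zip(transcriptions, predictions) if len(t) >= 1 and len(p) >= 1]
transcriptions = [t for t, _ in pairs]
predictions = [p for _, p in pairs]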

# calculate wer, cer and bleu
print("\n============== \nEVALUATION ON Schawinski SPLIT\n----------")
wer_score = 100 * wer.compute(predictions=predictions, references=transcriptions)
cer_score = 100 * cer.compute(predictions=predictions, references=transcriptions)
bleu_score = bleu.compute(predictions=predictions, references=transcriptions)

print("WORD ERROR RATE:", wer_score)
print("CHARACTER ERROR RATE:", cer_score)
print("BLEU SCORE:", bleu_score["bleu"])
print("==============")

# save information for whole set in a separate row
row_for_df = {"transcription": transcriptions, "prediction": predictions, "WER": wer_score, "CER": cer_score, "Bleu": bleu_score}

Spontaneous Whisper V6.3


    Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.



EVALUATION ON Schawinski SPLIT
----------
WORD ERROR RATE: 68.64573110893032
CHARACTER ERROR RATE: 35.34647932300815
BLEU SCORE: 0.18522160791611533
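
For a quick side-by-side view of the V6 variants, the scores printed in the cell outputs above can be collected and ranked. This is only a convenience sketch: the numbers are copied from those outputs, while the `results` dictionary and the ranking by WER are my own choices.

```python
# scores copied from the cell outputs of the three V6 runs
results = {
    "Spontaneous Whisper V6": {
        "WER": 83.5377821393523, "CER": 47.79259140986748, "BLEU": 0.11914896871250096},
    "Spontaneous Whisper V6.2": {
        "WER": 84.9841810659528, "CER": 51.24668435013262, "BLEU": 0.0415813568536159},
    "Spontaneous Whisper V6.3": {
        "WER": 68.64573110893032, "CER": 35.34647932300815, "BLEU": 0.18522160791611533},
}

# lower WER is better, so sort ascending
ranked = sorted(results.items(), key=lambda item: item[1]["WER"])
for name, scores in ranked:
    print(f"{name:<26} WER {scores['WER']:6.2f}  "
          f"CER {scores['CER']:6.2f}  BLEU {scores['BLEU']:.3f}")

best = ranked[0][0]
```

On all three metrics V6.3 comes out ahead of V6 and V6.2 on this test split.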


In [None]:
# calculate wer, cer and bleu per entry and save it to csv
df_spont_whisper_v6_3 = pred_vs_ref(predictions, transcriptions)
# add the corpus-level scores as a final summary row (DataFrame.append is deprecated)
df_spont_whisper_v6_3 = pd.concat([df_spont_whisper_v6_3, pd.DataFrame([row_for_df])], ignore_index=True)
df_spont_whisper_v6_3.to_csv("test_spont_whisper_v6_3.csv")


In [None]:
from google.colab import files
files.download('/content/test_spont_whisper_v6_3.csv')


In [None]:
from google.colab import files
files.download('/content/test_xlsr_spont_V1.csv')
files.download('/content/test_xlsr_zero_prep.csv')
files.download('/content/test_xlsr_zero_spont.csv')


# Interface

In [None]:
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
    title="Whisper Small Swiss German",
    description="Real-time demo for Swiss German speech recognition using a fine-tuned Whisper small model.",
)

iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://8912a7de413778e294.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
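
The `transcribe` function passed to `gr.Interface` is defined elsewhere in the notebook. As a rough sketch of the contract it has to fulfil with `type="filepath"` (Gradio hands over a file path, or `None` if nothing was recorded, and expects a string back), one could wrap any speech recognizer as follows. `make_transcribe` and `fake_asr` are hypothetical names; the assumption is that the recognizer returns a dict with a `"text"` key, as a `transformers` automatic-speech-recognition pipeline does.

```python
def make_transcribe(asr):
    """Build a Gradio-compatible function from an ASR callable.

    Assumption: `asr(path)` returns a dict with a "text" entry.
    """
    def transcribe(audio_path):
        if audio_path is None:  # no audio was recorded
            return ""
        return asr(audio_path)["text"].strip()
    return transcribe


# stand-in recognizer so the sketch runs without downloading a model;
# with a real model one might use
# pipeline("automatic-speech-recognition", model=repo_name) instead
def fake_asr(path):
    return {"text": f" transcript of {path} "}


transcribe_demo = make_transcribe(fake_asr)
print(transcribe_demo("sample.wav"))
```

Guarding against `None` keeps the demo from raising an error when the user submits without recording anything.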


