In [2]:
pip install krixik



In [3]:
import sys 
sys.path.append('..')
from dotenv import load_dotenv
import os
load_dotenv()

LUCAS_STAGING_API_KEY=os.getenv('LUCAS_STAGING_API_KEY')
LUCAS_STAGING_API_URL=os.getenv('LUCAS_STAGING_API_URL')

# import Krixik
from krixik import krixik
krixik.init(api_key = LUCAS_STAGING_API_KEY, 
            api_url = LUCAS_STAGING_API_URL)

import json
def json_print(data):
    print(json.dumps(data, indent=2))

%load_ext autoreload
%autoreload 2 

SUCCESS: You are now authenticated.


---


# Transcription Accuracy Differentials Across the Whisper Family

OpenAI's [Whisper family](https://github.com/openai/whisper) of transcription models comes in five sizes: [whisper-tiny](https://huggingface.co/openai/whisper-tiny), [whisper-base](https://huggingface.co/openai/whisper-base), [whisper-small](https://huggingface.co/openai/whisper-small), [whisper-medium](https://huggingface.co/openai/whisper-medium), and [whisper-large](https://huggingface.co/openai/whisper-large-v3) (the only size with no English-only variant, released in versions v1, v2, and v3). As their names suggest, these vary greatly in size, with [whisper-tiny](https://huggingface.co/openai/whisper-tiny) at 39M parameters and [whisper-large](https://huggingface.co/openai/whisper-large-v3) at 1,550M parameters. In theory, the larger models are more accurate, but they are also slower and more expensive to run.

This is one of two articles comparing performance across the Whisper family. **Here we'll examine accuracy differentials**, so click here[LINK] if you'd like to read the companion piece evaluating model processing times instead.

To compare transcription accuracy across the Whisper family of models we'll process the same file through each of the models. We'll be looking to confirm two basic hypotheses:

- That transcriptions are more accurate with larger models.
- That the difference in accuracy between smaller models and larger models is meaningful.

We'll first need to [create](https://krixik-docs.readthedocs.io/en/latest/system/pipeline_creation/create_pipeline/) a single [Krixik pipeline](https://krixik-docs.readthedocs.io/en/latest/) with a [transcribe](https://krixik-docs.readthedocs.io/en/latest/modules/ai_modules/transcribe_module/) module in it. Given that we can change the model we're applying every time we [process](https://krixik-docs.readthedocs.io/en/latest/system/parameters_processing_files_through_pipelines/process_method/) a file, we'll only need one pipeline for the entire exercise.

We create our [single-module](https://krixik-docs.readthedocs.io/en/latest/examples/single_module_pipelines/single_transcribe/) transcription pipeline as follows:

In [4]:
# create a single-module pipeline with a transcribe module in it
pipeline_1 = krixik.create_pipeline(name='my_transcribe_pipeline',
                                    module_chain=['transcribe'])

We'll begin with [whisper-tiny](https://huggingface.co/openai/whisper-tiny). To process our file, a two-minute clip of a man speaking about the beautiful country of Colombia, we need only a single line of code. As [whisper-tiny](https://huggingface.co/openai/whisper-tiny) is the [default model](https://krixik-docs.readthedocs.io/en/latest/modules/ai_modules/transcribe_module/#available-models-in-the-transcribe-module) for this module, it need not be specified:

In [5]:
# process our file through a transcribe module with whisper-tiny, the default model, active
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '5a8233a2-7f9d-457e-9d31-529f2ee581dc',
 'file_id': 'dd31cb8f-3f86-4063-9c69-371815559a1f',
 'message': 'SUCCESS - output fetched for file_id dd31cb8f-3f86-4063-9c69-371815559a1f.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of t

The transcript looks as follows. In theory, given that we used the smallest model available, this is the least accurate transcript we'll generate with this exercise:

>This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of the video. But if you're new here, join me every single Monday to learn about new countries from around the world. You can do that by hitting that subscribe and that belt notification button. But let's get started. So we all know, Columbia is famous for its coffee, right? Yes, right. I know. You guys are sitting there going, five bucks says he's going to talk about coffee. Well, I am. That's right, because I got my van, Columbia coffee. Right here. Boom advertisement. Yeah. Pain me for this. I'm care. So which might not know about coffee is yes, you probably already know that a lot of companies actually buy it up. Starbucks buys all had a coffee from Columbia. It's kind of like their favorite place to buy coffee. And kind of to pay tribute to that Starbucks when they were making their 1,000th store in 2016, they decided, yo, we're going to put it in Columbia. And this was in the town of Medellin, Columbia. Now here's the thing when it comes to coffee in Columbia. They are the third largest producing and exporting coffee country in the world. The amount of coffee that is exported from Columbia equals about 810,000 metric tons. Or approximately 11.5 million bags. 
However, although it might be beaten by countries like Brazil, it is actually the number one or highest country for producing and growing a specific type of being known as the Arabica being. And I know coffee is really important when it comes to talking about Columbia, but yes, *[The clip ends mid-sentence; that's not the model's fault]*

There are several evident errors here, but on the whole this is a very good transcript. Still, there's room for improvement: it refers to the "belt notification button", says "Starbucks buys all had a coffee" instead of "Starbucks buys a lot of coffee" and "being" instead of "bean", and entirely skips the softly spoken line after "but let's get started", among a few other errors.

Let's compare it to what the transcript from the next biggest model, [whisper-base](https://huggingface.co/openai/whisper-base), looks like:

In [6]:
# process our file through a transcribe module with whisper-base as the active model
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   modules={'transcribe': {'model': 'whisper-base', 'params': {}}},
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '7dbecefa-6f35-444c-b54f-dfad8a660ace',
 'file_id': '8c907e0c-3064-49cf-9cc1-c557c4c9935d',
 'message': 'SUCCESS - output fetched for file_id 8c907e0c-3064-49cf-9cc1-c557c4c9935d.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " episode looking at the great country of Columbia. We looked at some really just basic facts. Its name, a bit of its history, the type of people that lived there, land size and all that jazz. But in this video we're going to go into a little bit more of a detailed look. Yo, what is going on guys welcome back to F2DFacts. The channel where I look at people cultures and places my name is Dave Walpole and today we are going to be looking more at Columbia in our Columbia Part 2 video which just reminds me guys this is part of our Columbia playlist. I'll put it down there in the description box below and I'll talk about that more at the end of t

The transcript reads as follows:

> episode looking at the great country of Columbia. We looked at some really just basic facts. Its name, a bit of its history, the type of people that lived there, land size and all that jazz. But in this video we're going to go into a little bit more of a detailed look. Yo, what is going on guys welcome back to F2DFacts. The channel where I look at people cultures and places my name is Dave Walpole and today we are going to be looking more at Columbia in our Columbia Part 2 video which just reminds me guys this is part of our Columbia playlist. I'll put it down there in the description box below and I'll talk about that more at the end of the video. But if you're new here join me every single Monday to learn about new countries from around the world you can do that by hitting that subscribe and that bell notification button. But let's get started. Learn about Columbia. So we all know Columbia is famous for its coffee right? Yes right. I know you guys are sitting there going five bucks says he's going to talk about coffee. Well I am. That's right because I got my van huge Columbia coffee right here boom advertisement. Yeah. They're not paying me for this. I don't care. So which you might not know about coffee is yes you probably already know that a lot of companies actually buy it up. Starbucks buys a lot of coffee from Columbia. It's kind of like their favorite place to buy coffee and kind of to pay tribute to that Starbucks when they were making their 1000th store in 2016 they decided yo we're going to put it in Columbia and this was in the town of Medellin Columbia. Now here's the thing when it comes to coffee in Columbia they are the third largest producing and exporting coffee country in the world. The amount of coffee that is exported from Columbia equals about 810 thousand metric tons or approximately 11.5 million bags. 
However although it might be beaten by countries like Brazil it is actually the number one or highest country for producing and growing a specific type of bean known as the arabica bean. And I know coffee is really important when it comes to talking about Columbia but you guys really

This second transcript improves on several things. Some of these are:

- It's now, correctly, the "bell notification button"
- It's also now correctly "Starbucks buys a lot of coffee"
- At the end we now have "bean" instead of "being"

Other issues are fixed as well, including grammatical ones (e.g. "Its" instead of "It's").

However, there are also new failures: things that the previous model got right but this one gets wrong. For instance:

- A few additional words (hallucinations) appear at the very end: where the previous transcript ended on "yes", this one ends on "you guys really"
- Some commas, serial and otherwise, are no longer present. At least one period is also gone, from right after "around the world"
- Capitalization missteps, such as with "Arabica"

The larger model, [whisper-base](https://huggingface.co/openai/whisper-base), is not an absolute improvement over [whisper-tiny](https://huggingface.co/openai/whisper-tiny) in terms of accuracy, and it's certainly slower/more expensive to run.
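
Spotting these differences by eye is tedious. As a rough aid, and not something used to produce the comparisons in this article, Python's standard `difflib` module can list word-level differences between two transcripts. The helper name `word_diff` is ours, purely for illustration:

```python
from difflib import SequenceMatcher


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) for each span where two texts differ."""
    old_words, new_words = old_text.split(), new_text.split()
    # autojunk=False stops very frequent words from being ignored on long transcripts
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [(tag, old_words[i1:i2], new_words[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# short snippets copied from the whisper-tiny and whisper-base transcripts above
tiny_snippet = "hitting that subscribe and that belt notification button"
base_snippet = "hitting that subscribe and that bell notification button"
print(word_diff(tiny_snippet, base_snippet))
# → [('replace', ['belt'], ['bell'])]
```

Because the comparison splits on whitespace only, punctuation attached to a word (for example "Columbia," versus "Columbia") also shows up as a difference.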

Let's move on to the next bigger model, [whisper-small](https://huggingface.co/openai/whisper-small):

In [7]:
# process our file through a transcribe module with whisper-small as the active model
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   modules={'transcribe': {'model': 'whisper-small', 'params': {}}},
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '758a90d1-069c-4fcd-aabf-fb57381456d1',
 'file_id': '14a7035c-b048-43a9-a2e2-ebf495cc2ca9',
 'message': 'SUCCESS - output fetched for file_id 14a7035c-b048-43a9-a2e2-ebf495cc2ca9.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " Next episode looking at the great country of Colombia, we looked at some really just basic facts. Its name, a bit of its history, the type of people that live there, land size and all that jazz. But in this video we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to FTD Facts, the channel where I look at people cultures and places. My name is Dave Walpole and today we are going to be looking more at Colombia in our Colombia Part 2 video, which just reminds me guys this is part of our Colombia playlist. I'll put it down in the description box below and I'll talk about that more at the end of

The new transcript says:

> Next episode looking at the great country of Colombia, we looked at some really just basic facts. Its name, a bit of its history, the type of people that live there, land size and all that jazz. But in this video we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to FTD Facts, the channel where I look at people cultures and places. My name is Dave Walpole and today we are going to be looking more at Colombia in our Colombia Part 2 video, which just reminds me guys this is part of our Colombia playlist. I'll put it down in the description box below and I'll talk about that more at the end of the video. But if you're new here, join me every single Monday to learn about new countries from around the world. You can do that by hitting that subscribe and that bell notification button. But let's get started. Learn about Colombia. So we all know Colombia is famous for its coffee, right? Yes, right. I know, you guys are sitting there going, five bucks says he's going to talk about coffee. Well I am. That's right because I got my VanHute Colombia coffee right here. Boom, advertisement. Yeah. I'm in pain for this. I don't care. So what you might not know about coffee is yes, you probably already know that a lot of companies actually buy it up. Starbucks buys a lot of coffee from Colombia. It's kind of like their favorite place to buy coffee. And kind of to pay tribute to that Starbucks when they were making their 1000th store in 2016, they decided yo, we're going to put it in Colombia. And this was in the town of Medellin, Colombia. Now here's the thing when it comes to coffee in Colombia. They are the third largest producing and exporting coffee country in the world. The amount of coffee that is exported from Colombia equals about 810,000 metric tons. Or approximately 11.5 million bags. 
However, although it might be beaten by countries like Brazil, it is actually the number one or highest country for producing and growing a specific type of bean known as the Arabica Bean. And I know coffee is really important when it comes to talking about Colombia, but you guys really

This third transcript is an evident improvement over the previous one. "Colombia" is spelled properly, so top marks for that. Commas are also generally much better incorporated (again), and some of the trickier, less-well-pronounced lines are transcribed as they should be. [whisper-small](https://huggingface.co/openai/whisper-small) is a strong step in the right direction.

That said, it's not perfect yet. For instance:

- The hallucination at the very end continues.
- The line that the previous model had correctly transcribed as "They're not paying me" is garbled again, this time as "I'm in pain for this".

So this is a step in the right direction, but we're not quite there, and we have already increased the parameter count more than sixfold from the 39M of [whisper-tiny](https://huggingface.co/openai/whisper-tiny).

On to the next model we'll look at, [whisper-medium](https://huggingface.co/openai/whisper-medium):

In [8]:
# process our file through a transcribe module with whisper-medium as the active model
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   modules={'transcribe': {'model': 'whisper-medium', 'params': {}}},
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': 'e2da92e8-c21c-42d5-949f-cb637def266b',
 'file_id': 'd1db2347-9fcc-480a-985e-261dcf895010',
 'message': 'SUCCESS - output fetched for file_id d1db2347-9fcc-480a-985e-261dcf895010.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " Episode looking at the great country of Colombia. We looked at some really just basic facts It's name a bit of its history the type of people that live their land size and all that jazz But in this video, we're gonna go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts a channel where I look at people cultures and places My name is Dave Walpole and today we are gonna be looking more at Colombia in our Colombia part two video Which just reminds me guys. This is part of our Colombia playlist I'll put it down there in the description box below and I'll talk about that more at the end of the video 

The result is a disappointment. Take a look:

> Episode looking at the great country of Colombia. We looked at some really just basic facts It's name a bit of its history the type of people that live their land size and all that jazz But in this video, we're gonna go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts a channel where I look at people cultures and places My name is Dave Walpole and today we are gonna be looking more at Colombia in our Colombia part two video Which just reminds me guys. This is part of our Colombia playlist I'll put it down there in the description box below and I'll talk about that more at the end of the video But if you're new here join me every single Monday to learn about new countries from around the world You can do that by hitting that subscribe and that belt notification button, but let's get started learn about Colombia So we all know Colombia is famous for its coffee, right? Yes, right? I know you guys are sitting there going five bucks says he's gonna talk about coffee. Well, I am That's right, because I got my van Hute Colombia coffee right here. Boom Advertisement yeah, don't I'm paying me for this. I don't care. So what you might not know about coffee is yes You probably already know that a lot of companies actually buy it up Starbucks buys a lot of coffee from Colombia It's kind of like their favorite place to buy coffee and kind of to pay tribute to that Starbucks when they were making their 1000th store in 2016 they decided yo, we're gonna put it in Colombia and this was in the town of Medellin, Colombia Now here's the thing when it comes to coffee in Colombia, they are the third largest producing and exporting coffee country in the world the amount of coffee that is exported from Colombia equals about 810 thousand metric tons or Approximately eleven point five million bags. 
However, although it might be beaten by countries like Brazil It is actually the number one or highest country for producing and growing a specific type of bean known as the Arabica bean and I know coffee is really important when it comes to talking about Colombia, but yes really

This transcript feels like such a step down from that of the previous model that we stopped to review the code just to confirm that [whisper-medium](https://huggingface.co/openai/whisper-medium) had indeed been applied.

New and reappearing issues include:

- Butchered punctuation (commas, periods, apostrophes within words, capitalized words without periods, etc.)
- For the first time in this exercise we have "their" instead of "there"
- The "belt notification" button is back

This is probably the worst transcript we've seen so far, worse even than the one we got from our smallest, least expensive model, [whisper-tiny](https://huggingface.co/openai/whisper-tiny). Keep in mind that the difference in parameters between those two models is *19.7x*. Caveat emptor, folks... bigger and more expensive doesn't always mean better.
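
For reference, the size ratios quoted in this article can be reproduced with a few lines of arithmetic. The parameter counts for whisper-tiny and whisper-large come from the introduction; the other three (74M, 244M, 769M) are taken from OpenAI's Whisper README rather than from this article, so treat them as an outside assumption. The dictionary and function names are illustrative:

```python
# Approximate parameter counts, in millions.
# 39 and 1550 are stated in this article; 74, 244 and 769 are assumed
# from OpenAI's Whisper README.
PARAMS_MILLIONS = {
    "whisper-tiny": 39,
    "whisper-base": 74,
    "whisper-small": 244,
    "whisper-medium": 769,
    "whisper-large": 1550,
}


def size_ratio(larger, smaller="whisper-tiny"):
    """How many times more parameters `larger` has than `smaller`."""
    return PARAMS_MILLIONS[larger] / PARAMS_MILLIONS[smaller]


for name in PARAMS_MILLIONS:
    print(f"{name}: {size_ratio(name):.2f}x the size of whisper-tiny")
```

With these counts, whisper-small comes out at roughly 6.26x and whisper-medium at roughly 19.7x the size of whisper-tiny, matching the figures used in the text.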

Now for our final transcription model, [whisper-large](https://huggingface.co/openai/whisper-large-v3):

In [9]:
# process our file through a transcribe module with whisper-large-v3 as the active model
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   modules={'transcribe': {'model': 'whisper-large-v3', 'params': {}}},
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '358d1a06-8f34-4f31-9aa9-588c1cfda7f4',
 'file_id': '004d17a4-9609-485e-8211-2f9bfe860756',
 'message': 'SUCCESS - output fetched for file_id 004d17a4-9609-485e-8211-2f9bfe860756.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " Episode looking at the great country of Colombia We looked at some really just basic facts its name a bit of its history the type of people that live there Landsize and all that jazz, but in this video, we're gonna go into a little bit more of a detailed look Yo, what is going on guys? Welcome back to have to D facts a channel where I look at people cultures and places My name is Dave Walpole and today We are gonna be looking more at Colombia in our Columbia part 2 video, which just reminds me guys This is part of our Columbia playlist I'll put it down there in the description box below and I'll talk about that more at the end of the video

Very surprisingly, this final transcription is just as bad as the one from [whisper-medium](https://huggingface.co/openai/whisper-medium), if not worse. As you go over it, bear in mind that [whisper-large](https://huggingface.co/openai/whisper-large-v3) is a model with 1.55 *billion* parameters:

>Episode looking at the great country of Colombia We looked at some really just basic facts its name a bit of its history the type of people that live there Landsize and all that jazz, but in this video, we're gonna go into a little bit more of a detailed look Yo, what is going on guys? Welcome back to have to D facts a channel where I look at people cultures and places My name is Dave Walpole and today We are gonna be looking more at Colombia in our Columbia part 2 video, which just reminds me guys This is part of our Columbia playlist I'll put it down there in the description box below and I'll talk about that more at the end of the video But if you're new here join me every single Monday to learn about new countries from around the world You can do that by hitting that subscribe and that belt notification button, but let's get started Columbia so we all know Columbia is famous for its coffee, right? Yes, right I know you guys are sitting there going five bucks says he's gonna talk about coffee Well, I am that's right because I got my van huge Columbia coffee right here. Boom Advertisement. Yeah Yeah They're not even paying me for this. I don't care So what you might not know about coffee is yes You probably already know that a lot of companies actually buy it up Starbucks buys all had a coffee from Columbia It's kind of like their favorite place to buy coffee and kind of to pay tribute to that Starbucks when they were making their 1000th store in 2016 they decided yo we're gonna put it in Columbia and this was in the town of medellin, Colombia Now here's the thing when it comes to coffee in Colombia. They are the third largest largest producing and exporting coffee country in the world the amount of coffee that is exported from Colombia equals about 810 thousand metric tons or approximately 11.5 million bags. 
However, although it might be beaten by countries like Brazil it is actually the number one or highest country for producing and growing a specific type of bean known as the Arabica bean and I know coffee is really important when it comes to talking about Columbia, but you guys really

Maybe a couple of errors from the previous transcription have been fixed, but it's mostly a letdown:

- Punctuation continues to be a mess
- We're now back to "Columbia", though it's sometimes "Columbia" and sometimes "Colombia", so it's not even consistent
- It skips spaces between words, which we haven't seen before. Look at "Landsize" close to the beginning
- "Medellín", for the first time in this exercise, isn't capitalized
- And others
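
The five runs above differ only in the `model` value passed to `process`, so they could be condensed into a loop. The following is a minimal sketch, not the code used to produce the results in this article. The helper names are hypothetical, and passing `'whisper-tiny'` explicitly is an assumption by analogy with the other model names, since above it was only ever used implicitly as the default:

```python
# Model names as used in the cells above. 'whisper-tiny' is assumed by analogy:
# the article only relies on it as the module's default.
WHISPER_MODELS = ["whisper-tiny", "whisper-base", "whisper-small",
                  "whisper-medium", "whisper-large-v3"]


def transcribe_modules_arg(model_name):
    """Build the `modules` argument in the shape passed to process() above."""
    return {"transcribe": {"model": model_name, "params": {}}}


def run_all_models(pipeline, local_file_path, models=WHISPER_MODELS):
    """Process one file once per model; return transcripts keyed by model name."""
    transcripts = {}
    for model_name in models:
        result = pipeline.process(local_file_path=local_file_path,
                                  modules=transcribe_modules_arg(model_name),
                                  verbose=False)
        # the transcript sits in the first item of 'process_output', as in the output above
        transcripts[model_name] = result["process_output"][0]["transcript"]
    return transcripts
```

With the pipeline created earlier, this would be called as `run_all_models(pipeline_1, './test_files/about_Colombia.mp3')`.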

### Conclusion

The two hypotheses we kicked this exercise off with were the following:

- That transcriptions are more accurate with larger models.
- That the difference in accuracy between smaller models and larger models is meaningful.

Surprisingly, although both looked reasonable, neither held up in this experiment.

The largest model didn't provide the best transcription. In fact, the two worst transcriptions came from the two biggest and most expensive models, [whisper-medium](https://huggingface.co/openai/whisper-medium) and [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3). They were not only worse than [whisper-small](https://huggingface.co/openai/whisper-small), which was the best of the set, but also inferior to the output of the two smallest models, [whisper-base](https://huggingface.co/openai/whisper-base) and [whisper-tiny](https://huggingface.co/openai/whisper-tiny), the latter of which was arguably second best. Numbering the models from 1 (smallest) to 5 (largest), the accuracy ranking is not 5-4-3-2-1, as expected, but 3-1-2-4-5. First hypothesis smashed.

The second hypothesis suffers greatly because of the first hypothesis' failure, given that it assumes that larger models are more accurate. But let's take the only two models for which this held true, [whisper-small](https://huggingface.co/openai/whisper-small) and [whisper-tiny](https://huggingface.co/openai/whisper-tiny) ([whisper-base](https://huggingface.co/openai/whisper-base) we'll leave out because it was no better than [whisper-tiny](https://huggingface.co/openai/whisper-tiny)). We're thus comparing the two best performers in this experiment.

[whisper-small](https://huggingface.co/openai/whisper-small) has 6.26 times as many parameters as [whisper-tiny](https://huggingface.co/openai/whisper-tiny). That means significantly more cost, particularly when run at scale. But is [whisper-small](https://huggingface.co/openai/whisper-small)'s transcript really 6.26 times better than [whisper-tiny](https://huggingface.co/openai/whisper-tiny)'s, or anything close to that? Certainly not. It's a bit better, but although this is difficult to quantify, we'd hesitate to say it's even twice as good.
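
One common way to put a number on transcript quality is word error rate (WER): the word-level edit distance between a model's transcript and a hand-checked reference, divided by the number of reference words. We did not compute WER for this article, and no reference transcript ships with it, so the sketch below only illustrates how such a comparison could be set up. The function name and the sample strings are illustrative:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


# hand-written reference snippet (an assumption) vs. the whisper-tiny wording
reference = "hitting that subscribe and that bell notification button"
tiny_output = "hitting that subscribe and that belt notification button"
print(word_error_rate(reference, tiny_output))
# → 0.125 (one wrong word out of eight)
```

This version compares raw whitespace-separated tokens, so punctuation and capitalization differences count as errors. Whether to normalize those away first is a judgment call that depends on what you need from the transcript.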

What's the conclusion here? [whisper-tiny](https://huggingface.co/openai/whisper-tiny) is the smallest (and thus most cost-effective) of the [Whisper family](https://github.com/openai/whisper), and is second-best in accuracy by a relatively small margin. Our position is that, based on these results, despite it being the smallest model, [whisper-tiny](https://huggingface.co/openai/whisper-tiny) is by far the best option to leverage when you need a transcription model.

There are many caveats here, naturally. This may be an unusual file, for instance, or our informal, by-eye way of comparing model output may not be the "correct" one. But even a single experiment like this can be very telling. We recommend that, when choosing between models, you do some tests of your own and draw your own conclusions. Don't simply believe the hype and assume that bigger/pricier is better. Sometimes that's just not quite right, and sometimes it's downright inaccurate.