In [1]:
pip install krixik



In [2]:
import sys 
sys.path.append('..')
from dotenv import load_dotenv
import os
load_dotenv()

LUCAS_STAGING_API_KEY=os.getenv('LUCAS_STAGING_API_KEY')
LUCAS_STAGING_API_URL=os.getenv('LUCAS_STAGING_API_URL')

# import Krixik
from krixik import krixik
krixik.init(api_key = LUCAS_STAGING_API_KEY, 
            api_url = LUCAS_STAGING_API_URL)

import json
def json_print(data):
    print(json.dumps(data, indent=2))

%load_ext autoreload
%autoreload 2 

SUCCESS: You are now authenticated.


---


# Transcription Processing Time Differentials Across the Whisper Family

OpenAI's [Whisper family](https://github.com/openai/whisper) of transcription models comes in five sizes: [whisper-tiny](https://huggingface.co/openai/whisper-tiny), [whisper-base](https://huggingface.co/openai/whisper-base), [whisper-small](https://huggingface.co/openai/whisper-small), [whisper-medium](https://huggingface.co/openai/whisper-medium), and [whisper-large](https://huggingface.co/openai/whisper-large-v3) (the only size released solely as a multilingual model, with no English-only variant; it comes in versions v1, v2, and v3). As their names suggest, these vary greatly in size, from [whisper-tiny](https://huggingface.co/openai/whisper-tiny) at 39M parameters to [whisper-large](https://huggingface.co/openai/whisper-large-v3) at 1,550M parameters. In theory, the larger models are more accurate, but they are also slower and more expensive to run.

This is one of two articles comparing performance across the Whisper family. Here we examine processing time differentials; if you'd rather read the companion piece on model accuracy, click here[LINK].

To compare processing times across Whisper family models we'll process the same file through each model four times in quick succession. This will provide, for each of the five models, a read on processing time when it is [cold](https://en.wikipedia.org/wiki/Cold_start_(computing)), warming up, and warm.

In order to achieve this, we'll need to [create](https://krixik-docs.readthedocs.io/en/latest/system/pipeline_creation/create_pipeline/) a single [Krixik pipeline](https://krixik-docs.readthedocs.io/en/latest/) with a [transcribe](https://krixik-docs.readthedocs.io/en/latest/modules/ai_modules/transcribe_module/) module in it. Given that we can change the model we're applying every time we [process](https://krixik-docs.readthedocs.io/en/latest/system/parameters_processing_files_through_pipelines/process_method/) a file, we'll only need one pipeline for the entire exercise.

We create our [single-module](https://krixik-docs.readthedocs.io/en/latest/examples/single_module_pipelines/single_transcribe/) transcription pipeline as follows:

In [4]:
# create a single-module pipeline with a transcribe module in it
pipeline_1 = krixik.create_pipeline(name='my_transcribe_pipeline',
                                    module_chain=['transcribe'])

We'll begin with [whisper-tiny](https://huggingface.co/openai/whisper-tiny). To process our file—a two minute clip of a man speaking about the beautiful country of Colombia—we simply need a single line of code. As [whisper-tiny](https://huggingface.co/openai/whisper-tiny) is the [default model](https://krixik-docs.readthedocs.io/en/latest/modules/ai_modules/transcribe_module/#available-models-in-the-transcribe-module) for this module, it need not be specified:

In [4]:
# process our file through a transcribe module with whisper-tiny, the default model, active
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': 'edd94bd6-640b-44d6-978a-aba625e6f144',
 'file_id': '2070bb26-136e-4935-9fdc-656e40c904b2',
 'message': 'SUCCESS - output fetched for file_id 2070bb26-136e-4935-9fdc-656e40c904b2.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of t

Now three more times in quick succession:

In [5]:
# process our file through a transcribe module with whisper-tiny active a second time in quick succession
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '411f3d36-a9a1-4569-b517-e48fab6e0625',
 'file_id': '97409406-8b56-4450-b2ea-5847d28b5635',
 'message': 'SUCCESS - output fetched for file_id 97409406-8b56-4450-b2ea-5847d28b5635.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of t

In [6]:
# process our file through a transcribe module with whisper-tiny active a third time in quick succession
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '53c6e04e-bb43-470f-b1dd-2f7cc4a3c750',
 'file_id': '9bb37086-f3ea-45dc-b8fd-a695f800f53e',
 'message': 'SUCCESS - output fetched for file_id 9bb37086-f3ea-45dc-b8fd-a695f800f53e.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of t

In [7]:
# process our file through a transcribe module with whisper-tiny active a fourth time in quick succession
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '3c6d495b-30d4-485e-b964-b434efc55eb7',
 'file_id': 'd400ac97-9082-46ea-a756-df91b322ca38',
 'message': 'SUCCESS - output fetched for file_id d400ac97-9082-46ea-a756-df91b322ca38.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " This episode, looking at the great country of Columbia, we looked at some really basic facts. It's name, a bit of its history, the type of people that live there, land size, and all that jazz. But in this video, we're going to go into a little bit more of a detailed look. Yo, what is going on guys? Welcome back to F2D facts. The channel where I look at people cultures and places, my name is Dave Wouple, and today we are going to be looking more at Columbia in our Columbia Part 2 video. Which just reminds me guys, this is part of our Columbia playlist. So put it down in the description box below and I'll talk about that more at the end of t

This article has been written in a [Jupyter notebook](https://jupyter.org/) that conveniently indicates total runtime for the code in each cell. We can thus see that, in the order above, runtimes are:

- 1st iteration: 35.0s
- 2nd iteration: 20.2s
- 3rd iteration: 20.1s
- 4th iteration: 20.7s

As expected, the model is markedly faster once the initial [cold start](https://en.wikipedia.org/wiki/Cold_start_(computing)) is overcome. It takes only one invocation to warm up: the second, third, and fourth iterations take essentially the same time, allowing for a bit of variability.
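To put a number on that cold-start effect, the arithmetic can be sketched as follows. This is only an illustration built on the four runtimes listed above; the variable names are ours, and "cold-start penalty" here simply means the first runtime minus the average of the later ones.

```python
from statistics import mean

# whisper-tiny runtimes in seconds, copied from the list above
tiny_runs = [35.0, 20.2, 20.1, 20.7]

cold, warm = tiny_runs[0], tiny_runs[1:]
warm_mean = mean(warm)                 # average of the warm iterations
cold_penalty = cold - warm_mean        # extra time attributable to the cold start
warm_spread = max(warm) - min(warm)    # variability among the warm iterations

print(f"warm mean:          {warm_mean:.1f}s")
print(f"cold-start penalty: {cold_penalty:.1f}s")
print(f"warm spread:        {warm_spread:.1f}s")
```

On these figures the warm runs average about 20.3s, the cold start adds roughly 14.7s, and the warm runs differ from each other by only about 0.6s.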

On to the next model in our list, [whisper-base](https://huggingface.co/openai/whisper-base). As it is not the [default model](https://krixik-docs.readthedocs.io/en/latest/modules/ai_modules/transcribe_module/#available-models-in-the-transcribe-module) for this module, it must be specified in the code:

In [8]:
# process our file through a transcribe module with whisper-base as the active model
pipeline_1.process(local_file_path='./test_files/about_Colombia.mp3',
                   modules={'transcribe': {'model': 'whisper-base', 'params': {}}},
                   verbose=False)

{'status_code': 200,
 'pipeline': 'my_transcribe_pipeline',
 'request_id': '3f05daaa-6dbe-4885-bbea-7bcb2ab4204d',
 'file_id': '53de6791-e39e-4e99-9c30-9a7e497ec9dc',
 'message': 'SUCCESS - output fetched for file_id 53de6791-e39e-4e99-9c30-9a7e497ec9dc.Output saved to location(s) listed in process_output_files.',
 'process_output': [{'transcript': " episode looking at the great country of Columbia. We looked at some really just basic facts. Its name, a bit of its history, the type of people that lived there, land size and all that jazz. But in this video we're going to go into a little bit more of a detailed look. Yo, what is going on guys welcome back to F2DFacts. The channel where I look at people cultures and places my name is Dave Walpole and today we are going to be looking more at Columbia in our Columbia Part 2 video which just reminds me guys this is part of our Columbia playlist. I'll put it down there in the description box below and I'll talk about that more at the end of t

As expected, it takes longer to process the same file through the slightly larger model (74M parameters, almost twice as many): 52.9s instead of 35.0s.

Also bear in mind that, given the way Krixik is architected, models are self-contained and do not warm each other up. In other words, although the above is the fifth time we run this pipeline, it is the first time we run this model, which therefore undergoes its own [cold start](https://en.wikipedia.org/wiki/Cold_start_(computing)).

After repeating this exercise three more times in quick succession (we omit the code here for brevity) we get the following results:

- 1st iteration: 52.9s
- 2nd iteration: 24.6s
- 3rd iteration: 24.3s
- 4th iteration: 30.3s
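For readers who would rather script the repeated runs than re-execute cells by hand, here is a minimal sketch. It assumes the `modules` argument shown in the whisper-base call above; `build_process_kwargs` and `time_models` are hypothetical helper names, and the model identifiers other than `whisper-tiny` and `whisper-base` are assumed rather than taken from a call in this article. Passing `whisper-tiny` explicitly, instead of relying on the default, is also an assumption.

```python
import time

# 'whisper-tiny' and 'whisper-base' appear in this article's calls;
# the remaining identifiers are assumptions and may need adjusting.
MODELS = ["whisper-tiny", "whisper-base", "whisper-small",
          "whisper-medium", "whisper-large-v3"]

def build_process_kwargs(model, file_path):
    """Build keyword arguments mirroring the process call used above."""
    return {
        "local_file_path": file_path,
        "modules": {"transcribe": {"model": model, "params": {}}},
        "verbose": False,
    }

def time_models(process_fn, file_path, models=MODELS, runs=4):
    """Call process_fn `runs` times per model; return wall-clock seconds per run."""
    timings = {}
    for model in models:
        timings[model] = []
        for _ in range(runs):
            kwargs = build_process_kwargs(model, file_path)
            start = time.perf_counter()
            process_fn(**kwargs)
            timings[model].append(time.perf_counter() - start)
    return timings
```

With the pipeline created earlier, this would be invoked as `time_models(pipeline_1.process, './test_files/about_Colombia.mp3')`. Taking the process function as a parameter keeps the helper testable with a stand-in function that makes no network calls.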

Let's fast-forward to the point where all four processes have been run for each of the five models. Our final results:

| iteration | whisper-tiny | whisper-base | whisper-small | whisper-medium | whisper-large-v3 |
| --- | --- | --- | --- | --- | --- |
|    1st    |     35.0s    |     52.9s    |    1m 18.0s   |    2m 23.5s    |     3m 56.5s     |
|    2nd    |     20.2s    |     24.6s    |    1m 0.1s    |    1m 35.9s    |     2m 34.3s     |
|    3rd    |     20.1s    |     24.3s    |     55.4s     |    1m 35.2s    |     2m 30.4s     |
|    4th    |     20.7s    |     30.3s    |     55.5s     |    1m 35.1s    |     2m 34.0s     |

The pattern we observed between [whisper-tiny](https://huggingface.co/openai/whisper-tiny) and [whisper-base](https://huggingface.co/openai/whisper-base) is repeated all the way through the table. The larger the model is, the longer it takes to process the same file. All models get faster after overcoming their cold start, but with 'model warmth' being equal, the smaller model was faster in every one of our runs.
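To compare the models on equal footing, the table can be converted to seconds and the warm iterations (2nd to 4th) averaged. The following is a sketch under our own assumptions: `to_seconds` is a hypothetical helper, and choosing the mean of the three warm runs as the summary statistic is our decision, not something prescribed by Krixik.

```python
import re
from statistics import mean

def to_seconds(text):
    """Convert strings such as '35.0s' or '1m 18.0s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)

# Values copied from the results table above (1st to 4th iteration)
table = {
    "whisper-tiny":     ["35.0s", "20.2s", "20.1s", "20.7s"],
    "whisper-base":     ["52.9s", "24.6s", "24.3s", "30.3s"],
    "whisper-small":    ["1m 18.0s", "1m 0.1s", "55.4s", "55.5s"],
    "whisper-medium":   ["2m 23.5s", "1m 35.9s", "1m 35.2s", "1m 35.1s"],
    "whisper-large-v3": ["3m 56.5s", "2m 34.3s", "2m 30.4s", "2m 34.0s"],
}

# Average the warm iterations only (skip the cold first run)
warm_means = {m: mean(to_seconds(t) for t in runs[1:]) for m, runs in table.items()}
baseline = warm_means["whisper-tiny"]

for model, value in warm_means.items():
    print(f"{model:<17} warm mean {value:6.1f}s  ({value / baseline:.1f}x whisper-tiny)")
```

On the figures in the table, the warm averages work out to roughly 1.3x, 2.8x, 4.7x, and 7.5x the whisper-tiny time for whisper-base, whisper-small, whisper-medium, and whisper-large-v3 respectively.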

[An important caveat: managed services, particularly serverless managed services, sometimes behave unpredictably. If precise timing matters to you, make allowances for this. For instance, while this article was being written, [whisper-small](https://huggingface.co/openai/whisper-small) times tripled for about a day and a half, and [whisper-large](https://huggingface.co/openai/whisper-large-v3) times up to *quintupled*. They eventually returned to normal, but for a little while the file took over 3.5 min to process from a cold start on the former and 14 min on the latter.]
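One simple way to make such allowances is to compare each new runtime against a baseline built from earlier warm runs and flag anything far above it. The sketch below is an illustration only: `is_anomalous` is a hypothetical helper, and both the use of the median and the threshold factor of 2.0 are arbitrary choices on our part.

```python
from statistics import median

def is_anomalous(duration, baseline_runs, factor=2.0):
    """Return True if duration exceeds `factor` times the median of baseline_runs."""
    return duration > factor * median(baseline_runs)

# whisper-small warm runs from the table above, in seconds
small_warm = [60.1, 55.4, 55.5]

print(is_anomalous(56.0, small_warm))       # an ordinary warm run
print(is_anomalous(3 * 55.5, small_warm))   # a run three times slower than usual
```

The median is used rather than the mean so that a single slow run in the baseline does not drag the threshold upward.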

The conclusion here is straightforward: if you seek speed and aren't terribly concerned about anything else, choose a smaller model, which was faster in every comparison we ran. If you'd like to confirm that this holds true for other types of models, why not create a few [Krixik pipelines](https://krixik-docs.readthedocs.io/en/latest/) and give it a go?

We don't exist in a single-variable world, of course; other factors besides speed also matter (although speed is directly tied to cost, which to many is the defining element). To read our analysis of another critical element, model accuracy, click here[LINK].

---