# Learn OpenAI Whisper - Chapter 7
## Notebook 1: Quantizing Whisper with CTranslate2 and running inference with Faster-Whisper

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lFKZCc-mDIf8xH_v7_M1m1hfA-Ke772d)

This notebook walks through quantizing the Whisper model with [CTranslate2](https://opennmt.net/CTranslate2/guides/transformers.html#whisper), a library designed for efficient inference with Transformer models, and then running inference with Faster-Whisper. Quantization matters when deploying Automatic Speech Recognition (ASR) models like Whisper in environments where computational resources are limited.

![ch07_1-quantizing-whisper-with-ctranslate2.png](https://raw.githubusercontent.com/PacktPublishing/Learn-OpenAI-Whisper/main/Chapter07/ch07_1-quantizing-whisper-with-ctranslate2.png)

## Prerequisites

Authenticate with the Hugging Face Hub before running the rest of the notebook. The first cell opens a login prompt, and the second confirms that the login succeeded.



In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
# Verify authentication
from huggingface_hub import whoami
whoami()
# you should see something like {'type': 'user',  'id': '...',  'name': 'Wauplin', ...}

### 1.	Installing libraries

The notebook begins by installing ctranslate2, transformers, and faster-whisper.

Together, these libraries provide the model conversion and quantization tooling and the inference runtime used with Whisper in the steps below.


In [None]:
!pip -q install ctranslate2
# Quote the requirement so the shell does not treat ">=4.23" as an output redirection.
!pip -q install "transformers[torch]>=4.23"
!pip -q install faster-whisper
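
After installation, it can help to confirm which versions were actually resolved. The sketch below uses only the standard library. `installed_version` is a hypothetical helper name, and the distribution names are assumed to match the names passed to `pip install` above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the packages installed above; a missing package prints "not installed".
for name in ("ctranslate2", "transformers", "faster-whisper"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```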

### 2.	Downloading sample audio files
Two sample audio files are downloaded from our GitHub repository to test the Whisper model's transcription capabilities.

In [None]:
!wget -nv https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter01/Learn_OAI_Whisper_Sample_Audio01.mp3
!wget -nv https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter01/Learn_OAI_Whisper_Sample_Audio02.mp3

### 3.	Preprocessing audio files
The audio files are loaded with librosa, downmixed to mono, and resampled to a sampling frequency of 16,000 Hz. Whisper expects 16 kHz input, so this step ensures that the audio data is in the correct format for processing by the model.

In [None]:
import ctranslate2
from IPython.display import Audio
import librosa
import transformers
# Load and resample the audio file.
sampling_frequency = 16000
audio, _ = librosa.load("Learn_OAI_Whisper_Sample_Audio01.mp3", sr=sampling_frequency, mono=True)
Audio(audio, rate=sampling_frequency)
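
Because Whisper works on 30-second windows of 16 kHz audio, it is useful to know how a clip's length maps to windows. The following is a minimal sketch of that arithmetic. The helper names (`duration_seconds`, `window_count`) and the 75-second example length are illustrative assumptions, not part of the notebook.

```python
import math

SAMPLE_RATE = 16_000   # Hz, the rate used when loading the audio above
WINDOW_SECONDS = 30    # Whisper processes audio in 30-second windows


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Length of a clip in seconds, given its number of samples."""
    return num_samples / sample_rate


def window_count(num_samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Number of 30-second windows needed to cover the clip (the last one may be partial)."""
    samples_per_window = sample_rate * window_seconds
    return math.ceil(num_samples / samples_per_window)


# Made-up clip length of 75 seconds, for illustration only.
example_samples = 75 * SAMPLE_RATE
print(duration_seconds(example_samples))  # 75.0
print(window_count(example_samples))      # 3
```

In the notebook, `len(audio)` gives the number of samples, so `duration_seconds(len(audio))` would report the clip length.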

In [None]:
import torch
this_device = "cuda" if torch.cuda.is_available() else "cpu"

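### 4.	Converting to CTranslate2 format
In this step, we convert the Whisper models `openai/whisper-tiny` and `openai/whisper-base` to the CTranslate2 format, a more efficient inference format. No quantization is applied here, so these conversions serve as the baselines for the quantized models created in the next step.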

In [None]:
!ct2-transformers-converter --force --model openai/whisper-tiny --output_dir whisper-tiny-ct2

In [None]:
!ct2-transformers-converter --force --model openai/whisper-base --output_dir whisper-base-ct2

### 5.	Performing quantization
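The models are then converted again with `--quantization int8`, which stores the weights in an 8-bit integer format (int8). The `--copy_files` option copies `tokenizer.json` and `preprocessor_config.json` into the output directory alongside the converted model.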

In [None]:
!ct2-transformers-converter --force --model openai/whisper-tiny --output_dir whisper-tiny-ct2-int8 \
--copy_files tokenizer.json preprocessor_config.json --quantization int8

In [None]:
!ct2-transformers-converter --force --model openai/whisper-base --output_dir whisper-base-ct2-int8 \
--copy_files tokenizer.json preprocessor_config.json --quantization int8
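
To build intuition for what int8 quantization does to a weight matrix, here is a simplified pure-Python sketch of symmetric quantization applied to one row of weights. It is an illustration under stated assumptions, not CTranslate2's actual implementation. The function names and the example weights are made up.

```python
def quantize_row_int8(row):
    """Map floats to integers in [-127, 127] using one scale for the whole row."""
    max_abs = max(abs(w) for w in row)
    if max_abs == 0:
        # An all-zero row needs no scaling.
        return [0 for _ in row], 1.0
    scale = 127.0 / max_abs
    quantized = [max(-127, min(127, round(w * scale))) for w in row]
    return quantized, scale


def dequantize_row(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # made-up example values
q, scale = quantize_row_int8(weights)
restored = dequantize_row(q, scale)

print(q)  # [64, -127, 32, 0]
print(max(abs(a - b) for a, b in zip(weights, restored)))  # small rounding error
```

Each weight now fits in one byte instead of four, at the cost of a rounding error bounded by half a quantization step (`0.5 / scale`).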

### 6. Detecting language
The quantized model detects the language of the provided audio sample.

In [None]:
# Load the model on device
model = ctranslate2.models.Whisper("whisper-tiny-ct2-int8", device=this_device)

In [None]:
processor = transformers.WhisperProcessor.from_pretrained("openai/whisper-tiny")
inputs = processor(audio, return_tensors="np", sampling_rate=sampling_frequency)
features = ctranslate2.StorageView.from_array(inputs.input_features)

The cell above computes the features for the first 30 seconds of audio. The next cell passes them to the model to detect the spoken language.

In [None]:
# Detect the language.
results = model.detect_language(features)
language, probability = results[0][0]
print("Detected language %s with probability %f" % (language, probability))
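
The notebook keeps only the top candidate, `results[0][0]`, which is a pair of a language token such as `<|en|>` and its probability. The sketch below shows one way to list several candidates and strip the token wrappers. `language_code` and `top_languages` are hypothetical helpers, and the probabilities are made-up values rather than model output.

```python
def language_code(token):
    """Strip the '<|' and '|>' wrappers from a Whisper language token."""
    if token.startswith("<|") and token.endswith("|>"):
        return token[2:-2]
    return token


def top_languages(candidates, k=3):
    """Return the k most probable (code, probability) pairs."""
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [(language_code(tok), prob) for tok, prob in ranked[:k]]


# Made-up probabilities for illustration only; not output from the model.
example = [("<|en|>", 0.91), ("<|de|>", 0.05), ("<|fr|>", 0.03), ("<|es|>", 0.01)]
print(top_languages(example, k=2))  # [('en', 0.91), ('de', 0.05)]
```

With the notebook's variables, `top_languages(results[0])` would be the equivalent call. Note that the prompt in the next step still needs the wrapped token form, not the stripped code.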

### 7.	Transcribing audio files
The quantized model generates a transcription for the audio sample.

In [None]:
# Describe the task in the prompt.
# See the prompt format in https://github.com/openai/whisper.
prompt = processor.tokenizer.convert_tokens_to_ids(
    [
        "<|startoftranscript|>",
        language,
        "<|transcribe|>",
        "<|notimestamps|>",  # Remove this token to generate timestamps.
    ]
)
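
The prompt above is a fixed sequence of special tokens: start marker, language, task, and an optional flag that suppresses timestamps. As a minimal sketch, the same token list can be assembled by a small helper. `build_prompt_tokens` is a hypothetical name introduced here for illustration.

```python
def build_prompt_tokens(language_token, task="transcribe", timestamps=False):
    """Assemble the Whisper prompt tokens in the order used in the cell above."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", language_token, f"<|{task}|>"]
    if not timestamps:
        # Omitting this token asks the model to generate timestamps.
        tokens.append("<|notimestamps|>")
    return tokens


print(build_prompt_tokens("<|en|>"))
# ['<|startoftranscript|>', '<|en|>', '<|transcribe|>', '<|notimestamps|>']
```

The resulting list can be passed to `processor.tokenizer.convert_tokens_to_ids` exactly as in the cell above.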

In [None]:
# Load the model on device
model = ctranslate2.models.Whisper("whisper-tiny-ct2-int8", device=this_device)

In [None]:
# Run generation for the 30-second window.
results = model.generate(features, [prompt])
transcription = processor.decode(results[0].sequences_ids[0])
print(transcription)

### 8.	Evaluating performance
After transcription, the notebook evaluates each converted model with Faster-Whisper by measuring the time taken to transcribe the second audio sample.

In [None]:
# Load and resample the audio file.
sampling_frequency = 16000
audio, _ = librosa.load("Learn_OAI_Whisper_Sample_Audio02.mp3", sr=sampling_frequency, mono=True)
Audio(audio, rate=sampling_frequency)

In [None]:
from faster_whisper import WhisperModel
import time
import datetime

# Unquantized conversion; compute_type="int8" asks CTranslate2 to quantize the weights at load time.
model_size = "whisper-tiny-ct2"
model = WhisperModel(model_size, device=this_device, compute_type="int8")
segments, info = model.transcribe("Learn_OAI_Whisper_Sample_Audio02.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# `segments` is a lazy generator, so decoding runs (and is timed) inside this loop.
start = time.time()
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
# Record the end time, then print the elapsed time as seconds and as a timedelta.
end = time.time()
print('start: ', start)
print('end: ', end)
print('delta: ', end - start)
print('delta: ', datetime.timedelta(seconds=end - start))

In [None]:
from faster_whisper import WhisperModel
import time
import datetime

model_size = "whisper-tiny-ct2-int8"
model = WhisperModel(model_size, device=this_device, compute_type="int8")
segments, info = model.transcribe("Learn_OAI_Whisper_Sample_Audio02.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# `segments` is a lazy generator, so decoding runs (and is timed) inside this loop.
start = time.time()
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
# Record the end time, then print the elapsed time as seconds and as a timedelta.
end = time.time()
print('start: ', start)
print('end: ', end)
print('delta: ', end - start)
print('delta: ', datetime.timedelta(seconds=end - start))

In [None]:
from faster_whisper import WhisperModel
import time
import datetime

# Unquantized conversion; compute_type="int8" asks CTranslate2 to quantize the weights at load time.
model_size = "whisper-base-ct2"
model = WhisperModel(model_size, device=this_device, compute_type="int8")
segments, info = model.transcribe("Learn_OAI_Whisper_Sample_Audio02.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# `segments` is a lazy generator, so decoding runs (and is timed) inside this loop.
start = time.time()
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
# Record the end time, then print the elapsed time as seconds and as a timedelta.
end = time.time()
print('start: ', start)
print('end: ', end)
print('delta: ', end - start)
print('delta: ', datetime.timedelta(seconds=end - start))

In [None]:
from faster_whisper import WhisperModel
import time
import datetime

model_size = "whisper-base-ct2-int8"
model = WhisperModel(model_size, device=this_device, compute_type="int8")
segments, info = model.transcribe("Learn_OAI_Whisper_Sample_Audio02.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# `segments` is a lazy generator, so decoding runs (and is timed) inside this loop.
start = time.time()
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
# Record the end time, then print the elapsed time as seconds and as a timedelta.
end = time.time()
print('start: ', start)
print('end: ', end)
print('delta: ', end - start)
print('delta: ', datetime.timedelta(seconds=end - start))
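
The four timing cells repeat the same pattern: consume a lazy generator while measuring the elapsed time. A minimal sketch of that pattern as reusable helpers follows. `timed_consume`, `real_time_factor`, and `fake_segments` are hypothetical names, and the fake generator stands in for the `segments` object so the example runs without a model.

```python
import time


def timed_consume(iterable):
    """Exhaust an iterable and return (items, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock, better suited to intervals than time.time()
    items = list(iterable)
    elapsed = time.perf_counter() - start
    return items, elapsed


def real_time_factor(elapsed_seconds, audio_seconds):
    """Processing time divided by audio length; values below 1 are faster than real time."""
    return elapsed_seconds / audio_seconds


def fake_segments(n):
    """Stand-in for a lazy segment generator (illustration only)."""
    for i in range(n):
        yield f"segment {i}"


items, elapsed = timed_consume(fake_segments(3))
print(items)
print("real-time factor:", real_time_factor(elapsed, audio_seconds=10.0))
```

In the notebook, `timed_consume(segments)` would replace the manual `start`/`end` bookkeeping. The audio length could come from `info.duration`, assuming that attribute is available in the installed Faster-Whisper version.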