# Using Whisper for transcription and translation
This notebook provides a simple template for using OpenAI's Whisper to transcribe and translate audio in Google Colab.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NhpG_iZSRxbENy8yXm5exiXyHcBIG34A)
## Install Whisper
Run the cell below to install Whisper.

The Python libraries `openai`, `cohere`, and `tiktoken` are also installed because the `llmx` library depends on them. Each one provides functionality that `llmx` uses:

1. `openai`: This is the official Python library for the OpenAI API. It provides convenient access to the OpenAI REST API from any Python 3.7+ application. The library includes type definitions for all request parameters and response fields, and offers both synchronous and asynchronous clients powered by `httpx`.

2. `cohere`: This is the Python client for the Cohere platform, which adds natural language processing and generation to a product with a few lines of code. It covers a broad range of natural language use cases, including classification, semantic search, paraphrasing, summarization, and content generation.

3. `tiktoken`: This is a fast Byte Pair Encoding (BPE) tokenizer for use with OpenAI's models. It's used to tokenize text into subwords, a necessary step before feeding text into many modern language models.

In [None]:
!pip install -q cohere openai tiktoken
!pip install -q "git+https://github.com/openai/whisper.git"
!pip install -q "git+https://github.com/garywu007/pytube.git"
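To confirm that the install cell pulled in the packages described above, a small check like the following can report what is present. This is a sketch using only `importlib.metadata` from the standard library. The helper name `installed_version` is hypothetical, and the distribution names are assumed to match the `pip install` lines (in particular, `openai-whisper` as Whisper's distribution name is an assumption).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions based on the install cell above.
for name in ("openai", "cohere", "tiktoken", "openai-whisper", "pytube"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A `not installed` line for any of these suggests rerunning the install cell before continuing.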


In [None]:
import re
from pytube import YouTube

video_url = "https://youtu.be/5bs9XoTac88" #@param {type:"string"}
# episode_date = "20231220-" #@param {type:"string"}
drive_folder = "" #@param {type:"string"}

yt = YouTube(video_url)
episode_date = yt.publish_date.strftime('%Y%m%d-')
source_audio = drive_folder + episode_date + (re.sub('[^A-Za-z0-9 ]+', '', yt.title).replace(' ', '_')) + ".mp4"

# reuse the YouTube object created above instead of fetching the video again
audio_file = yt.streams.filter(only_audio=True).first().download(filename=source_audio)
print(f"Downloaded '{source_audio}'")

Downloaded '20140328-1993_Procesador_Intel_i486_DX2_Anuncio_Asegrese_de_que_su_prximo_PC_lo_lleva_dentro_En_Espaol.mp4'
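The filename above is built by dropping every character outside `A-Za-z0-9` and space, which is why accented letters disappear entirely (`Asegrese`, `prximo`, `Espaol`). If keeping the base letters matters, one option is to decompose the accents before filtering. The following is a sketch using only the standard library; `ascii_slug` is a hypothetical helper and is not part of `pytube`.

```python
import re
import unicodedata


def ascii_slug(title):
    """Turn a title into an ASCII filename stem, keeping base letters of accented characters."""
    # NFKD splits "ú" into "u" plus a combining accent mark
    decomposed = unicodedata.normalize("NFKD", title)
    # Dropping non-ASCII bytes removes the combining marks but keeps the base letters
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Same filter as the cell above: keep letters, digits, and spaces only
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", "", ascii_only)
    return cleaned.replace(" ", "_")


print(ascii_slug("Asegúrese de que su próximo PC"))  # Asegurese_de_que_su_proximo_PC
```

Swapping this in for the `re.sub(...)` expression would change the saved filename, so any later cell that relies on the old name would need the same change.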


In [None]:
import ipywidgets as widgets
widgets.Audio.from_file(audio_file, autoplay=False, loop=False)

Audio(value=b'\x00\x00\x00\x18ftypdash\x00\x00\x00\x00iso6mp41\x00\x00\x02imoov\x00\x00\x00lmvhd\x00\x00\x00\x…

In [None]:
import whisper
import torch

model = whisper.load_model("small")

# load the audio and pad or trim it to the 30-second window the model expects
audio = whisper.load_audio(audio_file)
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
audio_lang = max(probs, key=probs.get)
print(f"Detected language: {audio_lang}")

Detected language: es
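Two details of the cell above are easy to miss: `pad_or_trim` fits the audio to a single fixed-length window, and the detected language is simply the highest-probability entry in `probs`. The sketch below imitates both steps with plain Python lists and a made-up probability dictionary, so it runs without Whisper. The constants assume Whisper's 16 kHz sample rate and 30-second window; `pad_or_trim_sketch` and the values in `probs` are hypothetical.

```python
SAMPLE_RATE = 16_000     # samples per second (assumed Whisper default)
WINDOW_SECONDS = 30      # length of one decoding window (assumed Whisper default)
N_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Cut the list to `length`, or right-pad it with zeros (silence)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


short_clip = [0.1] * (SAMPLE_RATE * 12)   # 12 seconds of fake audio
long_clip = [0.1] * (SAMPLE_RATE * 45)    # 45 seconds of fake audio
# Both clips end up exactly one window long
print(len(pad_or_trim_sketch(short_clip)), len(pad_or_trim_sketch(long_clip)))

# Hypothetical probabilities, shaped like the dict returned by detect_language
probs = {"es": 0.93, "pt": 0.04, "it": 0.02, "en": 0.01}
audio_lang = max(probs, key=probs.get)
top3 = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(audio_lang, top3)
```

Under these assumptions, anything past the first 30 seconds is discarded at this step, so the decoding cells cover only the opening of a longer recording. Looking at the top few candidates rather than only the maximum can help when two languages score closely.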


In [None]:
# NLTK splits the decoded text into sentences so that each one
# prints on its own line in the output below.

import nltk
nltk.download('punkt')
from nltk import sent_tokenize

# transcribe the audio in the detected language
options = whisper.DecodingOptions(fp16=torch.cuda.is_available(), language=audio_lang, task='transcribe')
result = whisper.decode(model, mel, options)

# print the recognized text
print("----\nTranscription from audio:")
for sent in sent_tokenize(result.text):
  print(sent)

# translate the audio into English
options = whisper.DecodingOptions(fp16=torch.cuda.is_available(), language=audio_lang, task='translate')
result = whisper.decode(model, mel, options)

# print the translated text
print("----\nTranslation from audio:")
for sent in sent_tokenize(result.text):
  print(sent)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


----
Transcription from audio:
¿Quiere utilizar sus programas a toda velocidad?
¿Necesita algo dentro de su Pfe que le dé más potencia y que esté preparado para el software del futuro?
¿Necesita el microprocesador Intel 486 DX2?
Y como es de Intel, usted sabe que es compatible con todo tipo de software.
Intel 486 DX2, asegúrese de que su próximo Pfe lo lleva dentro.
----
Translation from audio:
Do you want to use your programs at full speed?
Do you need something inside your PC that gives you more power and that is prepared for the software of the future?
Do you need the micro processor Intel 486DX2?
And as it is Intel, you know that it is compatible with all kinds of software.
Intel 486DX2, make sure that your next PC takes it inside.
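`sent_tokenize` is what puts each sentence on its own line in the output above. For a rough idea of what that step produces, here is a simplified stand-in that splits after `.`, `!`, or `?` when followed by whitespace. It is not NLTK's Punkt algorithm and will split wrongly on abbreviations such as "e.g."; `naive_sentences` is a hypothetical name used only for this sketch.

```python
import re


def naive_sentences(text):
    """Split text after sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which appear when the input is empty
    return [p for p in parts if p]


sample = (
    "Do you want to use your programs at full speed? "
    "And as it is Intel, you know that it is compatible with all kinds of software."
)
for sentence in naive_sentences(sample):
    print(sentence)
```

For real transcripts, NLTK's tokenizer remains the safer choice because it is trained to recognize abbreviations and similar edge cases.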
