<a href="https://colab.research.google.com/github/JSchoonmaker/Research/blob/main/Copy_of_OpenAI's_Whisper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI's Whisper

This is a simple example of using OpenAI's Whisper, a general-purpose speech recognition model.

In [None]:
!pip install git+https://github.com/openai/whisper.git
!pip install gradio

## Available models

I copied the following list of available models from [Whisper's repo](https://github.com/openai/whisper):

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
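
As a rough illustration of how the table can guide the choice of checkpoint, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. `pick_model` and `VRAM_GB` are hypothetical names introduced here, not part of the `whisper` package, and the numbers are the approximate figures copied from the table above.

```python
# Approximate VRAM requirements in GB, copied from the table above
# and ordered from the smallest to the largest model.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_gb, english_only=False):
    """Return the name of the largest model that fits in `available_gb`.

    Hypothetical helper for illustration; not part of the whisper package.
    """
    fitting = [name for name, need in VRAM_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError("no model fits in %.1f GB" % available_gb)
    name = fitting[-1]
    # The table lists no English-only variant for `large`.
    if english_only and name != "large":
        name += ".en"
    return name


print(pick_model(6))                     # medium
print(pick_model(1, english_only=True))  # base.en
```

The returned name could then be passed to `whisper.load_model(...)`; on a GPU with more headroom, the same call would simply return a larger model.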


In [None]:
import whisper
import gradio as gr

model = whisper.load_model("medium")

100%|██████████████████████████████████████| 1.42G/1.42G [00:10<00:00, 140MiB/s]


We are going to use two functions to **transcribe** and **translate** audio files. Transcription returns text in the language that was spoken; translation returns English text.

In [None]:
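def transcribe(file):
    # Transcribe the audio in the language that was spoken.
    # best_of=5 samples five candidates whenever decoding uses a non-zero temperature.
    options = dict(task="transcribe", best_of=5)
    text = model.transcribe(file, **options)["text"]
    return text.strip()

def translate(file):
    # Translate the speech into English text.
    options = dict(task="translate", best_of=5)
    text = model.transcribe(file, **options)["text"]
    return text.strip()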
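
Besides `"text"`, the dictionary returned by `model.transcribe` also carries a `"segments"` list whose entries include `start` and `end` times (in seconds) and the segment `text`. The sketch below shows one way to turn those into timestamped lines. `format_segments` is a hypothetical helper, and `sample` is hand-written stand-in data rather than real model output, so the block runs without loading a model.

```python
def format_segments(result):
    """Render a Whisper-style result dict as one timestamped line per segment."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(
            "[%6.2f -> %6.2f] %s" % (seg["start"], seg["end"], seg["text"].strip())
        )
    return "\n".join(lines)


# Hand-written stand-in for the dict returned by model.transcribe(...).
sample = {
    "text": " Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
}
print(format_segments(sample))
```

With the model loaded in this notebook, the equivalent call would be `format_segments(model.transcribe(file, task="transcribe"))`.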

## Gradio interface

Here is a simple Gradio interface that lets you record audio directly from your computer's microphone, with two buttons to transcribe or translate the recording.

This interface was inspired by [this Hugging Face notebook](https://colab.research.google.com/drive/1xO45FeNFBYfN6GyyUr3nEa08S0iHnWKM?usp=sharing).

In [None]:
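# Note: this cell uses the Gradio 3.x API. In Gradio 4 and later, gr.Box and
# .style() are no longer available and `source` became `sources`.
block = gr.Blocks()

with block:
    with gr.Group():
        # Record from the microphone and pass the callbacks a path to the audio file.
        audio = gr.Audio(
            show_label=False,
            source="microphone",
            type="filepath"
        )
        with gr.Box():
            with gr.Row().style(equal_height=True):
                transcribe_button = gr.Button("Transcribe")
                translate_button = gr.Button("Translate")

        textbox = gr.Textbox(show_label=False)

        # Both buttons write their result into the same textbox.
        transcribe_button.click(transcribe, inputs=[audio], outputs=[textbox])
        translate_button.click(translate, inputs=[audio], outputs=[textbox])

block.launch()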

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://13801.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://huggingface.co/spaces


(<gradio.routes.App at 0x7f45c3d1f1d0>,
 'http://127.0.0.1:7860/',
 'https://13801.gradio.app')
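
One practical detail: if a button is clicked before anything has been recorded, Gradio typically passes `None` to the callback instead of a file path, and `model.transcribe` would then fail. A minimal sketch of a guard follows. The `guarded` wrapper is a hypothetical helper (not part of Gradio or Whisper), and `fake_transcribe` stands in for the notebook's `transcribe` so the block runs without a model.

```python
import functools


def guarded(fn, message="Please record some audio first."):
    """Wrap `fn` so that a missing input returns `message` instead of raising."""
    @functools.wraps(fn)
    def wrapper(file):
        if file is None:
            return message
        return fn(file)
    return wrapper


# Stand-in for the notebook's transcribe(), so the sketch runs without a model.
def fake_transcribe(file):
    return "transcript of " + file


safe = guarded(fake_transcribe)
print(safe(None))
print(safe("audio.m4a"))
```

In the interface above, this would amount to registering `guarded(transcribe)` and `guarded(translate)` in the two `click` calls instead of the bare functions.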

You can also upload individual audio files to the Colab session and transcribe or translate them by calling the two functions directly.

In [None]:
print(transcribe("audio.m4a"))
print(transcribe("spanish.m4a"))
print(translate("spanish.m4a"))
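
When there are more than a handful of files, the same calls can be wrapped in a loop. The sketch below assumes a hypothetical `transcribe_folder` helper and an assumed set of audio extensions; it takes the function to apply as an argument, so either `transcribe` or `translate` from above can be passed in.

```python
from pathlib import Path

# Assumed set of extensions to treat as audio; adjust as needed.
AUDIO_EXTENSIONS = {".m4a", ".mp3", ".wav", ".flac"}


def transcribe_folder(folder, fn):
    """Apply `fn` to every audio file in `folder` and return {file name: text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            results[path.name] = fn(str(path))
    return results
```

For example, `transcribe_folder(".", transcribe)` would process every matching file in the current working directory and skip everything else.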