# Whisper ASR: Real-time speech recognition with Gradio



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd '/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X'

/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X
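Later cells repeat the full Drive path for each audio file. A minimal sketch that defines the paths once with `pathlib` and reports any file that is missing, for example because the mount failed. `AUDIO_DIR`, `CLEAR_AUDIO`, `DISTORTED_AUDIO` and `missing_files` are hypothetical names introduced here. The directory is assumed to be the one used in the `%cd` cell.

```python
from pathlib import Path

# Assumption: the same directory the notebook changes into above.
AUDIO_DIR = Path("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X")

# File names taken from the cells that play and transcribe the clips.
CLEAR_AUDIO = AUDIO_DIR / "mixkit-male-deep-voice-countdown-925.wav"
DISTORTED_AUDIO = AUDIO_DIR / "robot-voice-drop-the-bass-82798.mp3"


def missing_files(*paths: Path) -> list:
    """Return the paths that do not exist, so a bad mount or a typo shows up early."""
    return [str(p) for p in paths if not p.exists()]


# Prints the list of missing files, or a confirmation when the list is empty.
print(missing_files(CLEAR_AUDIO, DISTORTED_AUDIO) or "All audio files found.")
```

The functions below take plain strings, so pass `str(CLEAR_AUDIO)` rather than the `Path` object to be safe.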


## Library Installation

In [None]:
! pip install -q h5py


In [None]:
! pip install  -q typing-extensions


In [None]:
! pip install -q wheel

In [None]:

! pip install -q -U openai-whisper
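Because these installs are unpinned, the versions you get can change between sessions. A small sketch that reports what is actually installed, using only the standard library's `importlib.metadata`. `installed_versions` is a hypothetical helper, and a package that is not installed is reported as `None` rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(*packages: str) -> dict:
    """Map each distribution name to its installed version string, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


print(installed_versions("openai-whisper", "h5py", "typing_extensions", "wheel"))
```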

## Load the Model

In [None]:

import whisper
model = whisper.load_model("base")


## Test Audio Files

The Google Drive working directory contains two pre-recorded clips, one WAV and one MP3, to run through the transcribe function. You can listen to them with the audio widgets displayed below.

In [None]:
from IPython.display import Audio

Audio("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X/mixkit-male-deep-voice-countdown-925.wav")


In [None]:
from IPython.display import Audio

Audio("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X/robot-voice-drop-the-bass-82798.mp3")
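The transcribe function below passes the audio through `whisper.pad_or_trim`, which keeps at most 30 seconds. This sketch checks a clip's length before transcribing it, using the standard-library `wave` module. `wav_duration_seconds` is a hypothetical helper. `wave` reads only uncompressed PCM WAV files, so it applies to the WAV clip here but not to the MP3.

```python
import wave

# Whisper's input window length in seconds (the limit applied by pad_or_trim).
WHISPER_WINDOW_SECONDS = 30


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For example, `wav_duration_seconds("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X/mixkit-male-deep-voice-countdown-925.wav") <= WHISPER_WINDOW_SECONDS` should be `True` for the whole countdown to be transcribed.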

## Transcribe Function

With the model loaded, we can define a function that takes an audio file path as input, prints the language Whisper detects, and returns the recognized text. Because `whisper.pad_or_trim` pads or cuts the audio to 30 seconds, only the first 30 seconds of a file are transcribed.

In [None]:
def transcribe(voice):
    """Transcribe the audio file at path `voice` and return the recognized text."""

    # load the audio (resampled to 16 kHz mono) and pad or trim it to 30 seconds
    voice = whisper.load_audio(voice)
    voice = whisper.pad_or_trim(voice)

    # make log-Mel spectrogram and move to the same device as the model
    sig_freq = whisper.log_mel_spectrogram(voice).to(model.device)

    # detect the spoken language; s_lang maps language codes to probabilities
    _, s_lang = model.detect_language(sig_freq)
    print(f"Detected language: {max(s_lang, key=s_lang.get)}")

    # decode the audio; fp16=False decodes in FP32, which is needed on CPU
    selections = whisper.DecodingOptions(fp16=False)
    output = whisper.decode(model, sig_freq, selections)
    return output.text
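The 30-second window translates into fixed array sizes. Audio is resampled to 16 kHz, so 30 s is 16,000 × 30 = 480,000 samples. With a hop of 160 samples (10 ms) between spectrogram frames, that gives 480,000 / 160 = 3,000 frames. For the `base` model, the log-Mel spectrogram therefore has shape (80, 3000). The constants below mirror the ones in Whisper's `whisper/audio.py`. `pad_or_trim_sketch` is a NumPy-only re-implementation for illustration, not the library function (which also accepts PyTorch tensors).

```python
import numpy as np

SAMPLE_RATE = 16_000                    # load_audio resamples to 16 kHz mono
CHUNK_LENGTH = 30                       # seconds per model input window
HOP_LENGTH = 160                        # samples between spectrogram frames (10 ms)
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH  # 480,000 samples
N_FRAMES = N_SAMPLES // HOP_LENGTH      # 3,000 mel frames


def pad_or_trim_sketch(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Truncate, or zero-pad at the end, so the last axis has exactly `length` samples."""
    if audio.shape[-1] > length:
        return audio[..., :length]
    # pad only the last axis, with zeros after the existing samples
    pad_width = [(0, 0)] * (audio.ndim - 1) + [(0, length - audio.shape[-1])]
    return np.pad(audio, pad_width)
```

A 5-second clip comes back as 480,000 samples, most of them zeros. A 45-second clip loses everything after the 30-second mark.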

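Inside `transcribe`, `s_lang` is a dictionary from language code to probability, and the function prints only the most likely entry. When a clip is ambiguous, it can help to look at the runners-up too. A minimal sketch using a hypothetical `top_languages` helper and made-up probabilities; inside `transcribe` you would call it as `top_languages(s_lang)`.

```python
def top_languages(probs: dict, k: int = 3) -> list:
    """Return the k most probable (language code, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical probabilities, for illustration only.
example = {"en": 0.91, "de": 0.04, "nl": 0.03, "fr": 0.02}
print(top_languages(example, k=2))  # [('en', 0.91), ('de', 0.04)]
```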

## Conduct a test with pre-recorded audio



Next, we run the `transcribe()` function on the two files above, one WAV and one MP3. The output for `mixkit-male-deep-voice-countdown-925.wav`, the clear-audio example, should be "10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0".

The second file, `robot-voice-drop-the-bass-82798.mp3`, is harder to transcribe because the voice is heavily distorted. Even so, the model correctly transcribes the phrase "Drop the bass."
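Whether the output "should be" a given phrase is easier to judge with a number than by eye. A common measure is word error rate (WER): the word-level edit distance between the expected text and the transcript, divided by the number of expected words. This is a minimal sketch. `normalize` and `word_error_rate` are hypothetical helpers, and the normalization (lowercase, drop punctuation) is an assumption that keeps "0." and "0" from counting as different words.

```python
import re


def normalize(text: str) -> list:
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev is the DP row for the previous reference prefix:
    # prev[j] = edit distance to the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

After running the next cell, `word_error_rate("10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0", easy_text)` returns 0.0 for a perfect transcript.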

In [None]:
print('===========================================')
print('******************* Clear Audio Output ***********************')
easy_text = transcribe("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X/mixkit-male-deep-voice-countdown-925.wav")


print(easy_text)
print('===========================================')
print('******************** Distorted Audio Output ********************')

hard_text = transcribe("/content/drive/MyDrive/Colab Notebooks/Be10X/Be10X/robot-voice-drop-the-bass-82798.mp3")
print(hard_text)
print('===========================================')

******************* Clear Audio Output ***********************
Detected language: en
10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.
******************** Distorted Audio Output ********************
Detected language: en
Drop the bass.


# Launching the user interface to record your own live audio

## Install the Web UI Toolkit

Gradio will be our tool for creating the widgets that we need for audio recording.

In [None]:
! pip install gradio -q

In [None]:
import gradio as gr
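Gradio's microphone input changed between major versions. Gradio 3.x took `gr.Audio(source="microphone")`. Gradio 4 replaced `source` with `sources`, which takes a list, and removed the old `gr.inputs` namespace. A small sketch that picks the right keyword arguments from a version string. `microphone_audio_kwargs` is a hypothetical helper, and it assumes the version string starts with the major number.

```python
def microphone_audio_kwargs(gradio_version: str) -> dict:
    """Keyword arguments for a microphone gr.Audio input, chosen by Gradio major version."""
    major = int(gradio_version.split(".")[0])
    if major >= 4:
        # Gradio 4+: `sources` takes a list of allowed input sources.
        return {"sources": ["microphone"], "type": "filepath"}
    # Gradio 3.x: a single `source` string.
    return {"source": "microphone", "type": "filepath"}
```

Usage: `gr.Audio(**microphone_audio_kwargs(gr.__version__))`.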

## Web Interface

Running the next cell launches a Gradio app with two widgets: a microphone recorder for live audio and a textbox that shows the transcription.

In [None]:

# Gradio 4+ API; on Gradio 3.x use gr.Audio(source="microphone", type="filepath")
gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="textbox",
    title="Web-based Speech recognition with OpenAI Whisper and Gradio",
    live=True,
).launch()



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
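With `live=True`, Gradio may call the function with `None`, for example when the recorded clip is cleared. Passing `None` on to `whisper.load_audio` would then fail. A minimal sketch of a wrapper that returns an empty transcript instead. `guard_empty_input` is a hypothetical helper; you would pass `fn=guard_empty_input(transcribe)` to `gr.Interface`.

```python
from typing import Callable, Optional


def guard_empty_input(fn: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Wrap a path-based transcriber so a missing recording yields an empty string."""
    def wrapped(path: Optional[str]) -> str:
        if not path:  # None or "" when there is no recording
            return ""
        return fn(path)
    return wrapped
```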



