
# 🚀  Applied Deep Learning

**Topic:** Creating transcripts of podcast episodes on specific topics with Whisper


## Load the model

In [1]:

import whisper

model = whisper.load_model("base")


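AttributeError: module 'whisper' has no attribute 'load_model'

This error usually means the imported `whisper` module is not OpenAI's. The unrelated PyPI package `whisper` is also imported as `whisper`. Install the correct package with `pip install -U openai-whisper` and restart the kernel.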
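A quick way to confirm which `whisper` distribution the kernel actually sees. This is a stdlib sketch that assumes the PyPI distribution names `openai-whisper` (OpenAI's model) and `whisper` (the unrelated Graphite storage library); `installed_version` and `diagnose_whisper` are hypothetical helpers, not part of either package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose_whisper() -> str:
    """Report which `whisper` distribution is installed (assumed PyPI names)."""
    openai_ver = installed_version("openai-whisper")
    other_ver = installed_version("whisper")
    if openai_ver is not None and other_ver is not None:
        return "both installed: run `pip uninstall whisper` to avoid the name clash"
    if openai_ver is not None:
        return f"openai-whisper {openai_ver} is installed"
    if other_ver is not None:
        return "wrong package: run `pip uninstall whisper` then `pip install -U openai-whisper`"
    return "not installed: run `pip install -U openai-whisper`"


print(diagnose_whisper())
```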

## Check that we have a GPU

You should see the output `device(type='cuda', index=0)` below. If you don't, you may be on a CPU-only Colab instance, which runs more slowly; go to `Runtime > Change runtime type` and select a GPU. When running locally without an NVIDIA GPU, the model is loaded on `cpu`.

In [None]:
model.device

## Download Test Audio Files

Here we use a few episodes of *Ologies with Alie Ward*, a very good podcast, downloaded from Apple Podcasts.

In [4]:
from IPython.display import Audio
Audio("Assignment_3/Audio files/default.mp3_ywr3ahjkcgo_c5cfa38d9e841f4e8d21d628f6afcd3f_87957109.mp3")

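ValueError: rate must be specified when data is a numpy array or list of audio samples.

The relative path does not resolve from the notebook's working directory, so `Audio` treats the string as raw audio data rather than a file name and asks for a sample rate. The next cell passes the absolute path instead.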

In [None]:
from IPython.display import Audio
Audio("/Users/yasminesarraj/Documents/GitHub/M3-Assignment-Deep-Learning/Assignment_3/Audio files/default.mp3_ywr3ahjkcgo_c5cfa38d9e841f4e8d21d628f6afcd3f_87957109.mp3")
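To keep a portable relative path instead of a machine-specific absolute one, the path can be checked before it reaches `Audio`, which only treats a string as a file name when that file exists. A stdlib sketch; `resolve_audio_path` is a hypothetical helper that fails early with a readable message listing the MP3s it did find.

```python
from pathlib import Path
from typing import Optional


def resolve_audio_path(path_str: str, base: Optional[Path] = None) -> Path:
    """Resolve `path_str` against `base` (default: current working directory).

    Raises FileNotFoundError with a hint instead of letting IPython
    misinterpret a missing path as raw audio data.
    """
    base = Path.cwd() if base is None else base
    path = Path(path_str)
    if not path.is_absolute():
        path = base / path
    if not path.is_file():
        folder = path.parent
        # List the MP3s that do exist, to make typos and wrong folders obvious
        nearby = sorted(p.name for p in folder.glob("*.mp3")) if folder.is_dir() else []
        raise FileNotFoundError(
            f"{path} does not exist (cwd: {Path.cwd()}); MP3s in {folder}: {nearby}"
        )
    return path.resolve()
```

Usage: `Audio(filename=str(resolve_audio_path("Assignment_3/Audio files/default.mp3_ywr3ahjkcgo_c5cfa38d9e841f4e8d21d628f6afcd3f_87957109.mp3")))`. Passing `filename=` explicitly also makes the intent clear to IPython.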

## Define the Transcribe Function

Now that the model is loaded, we define a function that takes an audio file path as input, logs the detected language, and returns the recognized text. Note that `whisper.pad_or_trim` pads or trims the audio to 30 seconds, so only the first 30 seconds of a file are transcribed.
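To see what `pad_or_trim` does, and why full episodes need splitting: Whisper expects 16 kHz audio (`whisper.audio.SAMPLE_RATE`), so one 30-second window is 480,000 samples. This pure-Python sketch works on plain lists (the real function operates on NumPy arrays and PyTorch tensors); `split_windows` is a hypothetical helper, not part of the `whisper` package.

```python
SAMPLE_RATE = 16_000                      # sample rate Whisper expects (Hz)
CHUNK_SECONDS = 30                        # length of Whisper's input window (s)
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window


def pad_or_trim(samples, length=N_SAMPLES):
    """Mimic whisper.pad_or_trim on a list: cut to `length` or zero-pad the end."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def split_windows(samples, length=N_SAMPLES):
    """Cut a long recording into consecutive `length`-sample windows.

    The last window is zero-padded so every window has exactly `length` samples.
    """
    return [pad_or_trim(samples[start:start + length], length)
            for start in range(0, len(samples), length)]
```

A 70-second clip becomes three windows, the last one holding 10 seconds of audio followed by silence. Fixed cuts like these can split a word across windows; Whisper's own `model.transcribe()` avoids this by advancing its 30-second window according to the timestamps it predicts.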

In [None]:
def transcribe(audio_path):
    # Gradio passes None when the recording is cleared
    if audio_path is None:
        return ""

    # load audio (resampled to 16 kHz mono) and pad/trim it to fit 30 seconds
    audio = whisper.load_audio(audio_path)
    audio = whisper.pad_or_trim(audio)

    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # detect the spoken language
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")

    # decode the audio; FP16 only works on CUDA, so fall back to FP32 on CPU
    options = whisper.DecodingOptions(fp16=(model.device.type == "cuda"))
    result = whisper.decode(model, mel, options)
    return result.text
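For whole episodes, `model.transcribe(path)` handles audio longer than 30 seconds and returns a dict whose `"segments"` entries carry `start` and `end` times (in seconds) and `text`. A minimal sketch, assuming only those three keys, that renders the segments as a timestamped podcast transcript; `format_timestamp` and `segments_to_transcript` are hypothetical helpers.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractional seconds are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_transcript(segments) -> str:
    """Render Whisper-style segments (dicts with 'start', 'end', 'text')
    as one '[start - end] text' line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)
```

With the model loaded above, `print(segments_to_transcript(model.transcribe(path)["segments"]))` prints the full episode with timestamps, which makes it easy to jump to the part of the podcast covering a given topic.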


In [None]:
import gradio as gr

In [None]:

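gr.Interface(
    title="OpenAI Whisper ASR Gradio Web UI",
    fn=transcribe,
    # Gradio 4+: `gr.inputs` was removed and `source=` became `sources=[...]`
    # (on Gradio 3.x use gr.Audio(source="microphone", type="filepath"))
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs="textbox",
    live=True,
).launch()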
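The web UI transcribes one microphone clip at a time. To produce transcripts for the downloaded episodes, a batch loop over the audio folder is enough. This is a sketch: `transcribe_folder` is a hypothetical helper, and `transcribe_fn` stands for any callable that maps a file path to text. It is injected so the loop can be exercised without loading a model.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(folder, transcribe_fn: Callable[[str], str],
                      pattern: str = "*.mp3") -> Dict[str, Path]:
    """Transcribe every file matching `pattern` in `folder` and write
    `<name>.txt` next to it. Files that already have a transcript are skipped."""
    written = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        out_path = audio_path.with_suffix(".txt")
        if out_path.exists():
            continue  # don't redo expensive transcriptions
        text = transcribe_fn(str(audio_path))
        out_path.write_text(text, encoding="utf-8")
        written[audio_path.name] = out_path
    return written
```

For full-length episodes, `transcribe_folder("Assignment_3/Audio files", lambda p: model.transcribe(p)["text"])` would write one `.txt` transcript per MP3; the `transcribe` function above would only cover the first 30 seconds of each.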