**Whisper demo**

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is a multi-task model that can perform multilingual speech recognition, speech translation, and language identification. This demo trims each recording to roughly its first 30 seconds, transcribes it with Whisper, and uses the transcript as a prompt for Stable Diffusion 2.
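
The 30-second limit comes from the fixed window that Whisper decodes: audio is resampled to 16 kHz and then padded or trimmed to 30 seconds (480,000 samples). The sketch below mimics that step in plain Python so the arithmetic is visible. `pad_or_trim_sketch` is an illustrative name, not part of the `whisper` package, which provides its own `whisper.pad_or_trim`.

```python
SAMPLE_RATE = 16_000      # Whisper works on 16 kHz mono audio
CHUNK_SECONDS = 30        # length of the window the model decodes
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480_000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Return exactly `length` samples: cut extra audio, zero-pad short audio."""
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


# A 45-second clip is cut to 30 seconds; a 5-second clip is padded with silence.
long_clip = [0.1] * (45 * SAMPLE_RATE)
short_clip = [0.1] * (5 * SAMPLE_RATE)
print(len(pad_or_trim_sketch(long_clip)) / SAMPLE_RATE)
print(len(pad_or_trim_sketch(short_clip)) / SAMPLE_RATE)
```

Anything spoken after the 30-second mark is therefore never seen by the model in this demo.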



### 1. Install dependencies

In [None]:
!git clone https://huggingface.co/spaces/openai/whisper
%cd whisper
!pip install -r requirements.txt
!pip install gradio

In [None]:
!nvidia-smi
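
`nvidia-smi` only prints the GPU status for a human to read. If a later cell should react to a missing GPU, the same check can be done from Python. This is a minimal sketch using only the standard library; `gpu_visible` is a hypothetical helper name, and it only reports whether `nvidia-smi -L` lists a device, not whether PyTorch can use it.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # driver tools are not installed on this runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


if not gpu_visible():
    print("No GPU detected: the Stable Diffusion cells below expect a CUDA device.")
```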

In [None]:
!pip uninstall -y triton
!pip install triton==2.0.0

In [None]:
!pip uninstall -y triton
!pip install openai-whisper==20230918
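
Because the install cells pin specific versions, it can help to confirm what actually ended up in the environment before moving on. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("triton", "openai-whisper"):
    print(name, installed_version(name) or "not installed")
```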

In [None]:
!pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy xformers gradio -q

In [None]:
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch

In [None]:
model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

### 2. Run

Run the cells below to launch the Gradio app.

In [None]:
# Note: -U upgrades openai-whisper past the version pinned in the install step
!pip install -U openai-whisper

In [None]:
import gradio as gr
import whisper

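model = whisper.load_model("large")

def inference(audio):
    # Load the recording and pad or trim it to Whisper's 30-second window
    audio = whisper.load_audio(audio)
    audio = whisper.pad_or_trim(audio)

    # Match the mel-bin count to the loaded checkpoint
    # (large-v3 expects 128 bins, earlier checkpoints 80)
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

    # Detect the spoken language; the probabilities are not used further in this demo
    _, probs = model.detect_language(mel)

    # Decode in full precision
    options = whisper.DecodingOptions(fp16=False)
    result = whisper.decode(model, mel, options)

    return result.text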

def txt2img(prompt):
    # Generate an image using the text as a prompt
    image = pipe(prompt, height=768, width=768, guidance_scale=10).images[0]

    # Save the image as "sd_image.png"
    image.save("sd_image.png")

    return image

def audio_to_image(audio):
    # Transcribe the audio with Whisper
    text = inference(audio)

    # Generate three images from the same prompt
    # (each call overwrites "sd_image.png", so only the last one stays on disk)
    image = txt2img(text)
    image2 = txt2img(text)
    image3 = txt2img(text)

    return text, image, image2, image3

# Create a Gradio interface with microphone input
# (Gradio 4+ takes `sources=[...]`; Gradio 3.x used `source="microphone"`)
iface = gr.Interface(
    fn=audio_to_image,
    inputs=gr.Audio(label="Input Audio", show_label=True, sources=["microphone"], type="filepath"),
    outputs=[gr.Textbox(show_label=False, elem_id="result-textarea"), "image", "image", "image"],
)

# Launch the Gradio interface
iface.launch()
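
In the cell above, `txt2img` is called three times and every call writes to the same `sd_image.png`, so only the last image survives on disk. One possible way to keep all of them is to separate generation from saving and give each file its own name. The helper below is a sketch under that assumption: `generate_many`, the `outputs` directory, and the `generate` callback are hypothetical names, and `generate` stands for any function that returns an object with a `.save(path)` method, such as a PIL image from `pipe(prompt).images[0]`.

```python
from pathlib import Path


def generate_many(prompt, generate, n=3, out_dir="outputs", stem="sd_image"):
    """Call `generate(prompt)` n times and save each result under its own name."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    images, paths = [], []
    for i in range(n):
        image = generate(prompt)          # e.g. a PIL image from the pipeline
        path = out / f"{stem}_{i}.png"    # sd_image_0.png, sd_image_1.png, ...
        image.save(path)
        images.append(image)
        paths.append(path)
    return images, paths
```

With this shape, `audio_to_image` could return `(text, *images)` to fill the one textbox and three image outputs of the interface.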