# 🗣️ Speech and Audio Processing with Whisper

This notebook walks through using OpenAI's **Whisper model** for speech-to-text transcription. It also sets up a **FastAPI backend** and a **Streamlit web app** that transcribe uploaded audio files.

## 1️⃣ Install Dependencies
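Install the required libraries for Whisper, FastAPI, and Streamlit. Note that the PyPI package named `ffmpeg` does not install the `ffmpeg` executable that Whisper calls to decode audio; that executable has to be available on the system `PATH` (Colab images typically include it).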

In [1]:
!pip install openai-whisper ffmpeg numpy soundfile streamlit fastapi uvicorn

Collecting openai-whisper
  Downloading openai-whisper-20240930.tar.gz (800 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting ffmpeg
  Downloading ffmpeg-1.4.tar.gz (5.1 kB)
  Preparing metadata (setup.py) ... done
Collecting streamlit
  Downloading streamlit-1.42.2-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting fastapi
  Downloading fastapi-0.115.8-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata 
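
Since transcription fails when the `ffmpeg` executable is missing, it can help to check for it up front. A minimal sketch using only the standard library; the helper name `find_tool` is hypothetical, and the suggested `apt-get` command assumes a Debian/Ubuntu-based environment such as Colab:

```python
import shutil


def find_tool(name):
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    # Assumption: a Debian/Ubuntu-based system where apt-get is available
    print("ffmpeg not found; try: apt-get install -y ffmpeg")
else:
    print("ffmpeg found at", ffmpeg_path)
```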

## 2️⃣ Load Whisper Model
We load the **Whisper model** (small version) for speech-to-text transcription.

In [2]:
import whisper

# Load Whisper model
model = whisper.load_model('small')

100%|███████████████████████████████████████| 461M/461M [00:09<00:00, 49.9MiB/s]
  checkpoint = torch.load(fp, map_location=device)


## 🛠️ Audio Preprocessing for Better Accuracy
We resample the audio to 16 kHz, the sample rate Whisper works at, and apply noise reduction before transcription.

In [3]:
!pip install noisereduce


Collecting noisereduce
  Downloading noisereduce-3.0.3-py3-none-any.whl.metadata (14 kB)
Downloading noisereduce-3.0.3-py3-none-any.whl (22 kB)
Installing collected packages: noisereduce
Successfully installed noisereduce-3.0.3


In [4]:
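import librosa
import noisereduce as nr
import soundfile as sf

def preprocess_audio(audio_path, output_path="processed_audio.wav"):
    # Load as mono and resample to 16 kHz
    y, sr = librosa.load(audio_path, sr=16000)
    # No separate noise clip is given, so the noise is estimated from the signal itself
    reduced_noise = nr.reduce_noise(y=y, sr=sr)
    # Save the cleaned signal as a WAV file for transcription
    sf.write(output_path, reduced_noise, sr)
    return output_path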


In [5]:
audio_path = "/content/sample_data/temp_audio.mp3"  # Change as needed
processed_audio_path = preprocess_audio(audio_path)

result = model.transcribe(processed_audio_path)
print("Transcription:", result['text'])


Transcription:  Hello, how are you?
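
Besides `result['text']`, the dictionary returned by `model.transcribe` carries a `segments` list whose entries include `start` and `end` times in seconds and the segment `text`. A small sketch that turns them into timestamped lines; `format_timestamp` and `format_segments` are hypothetical helpers, and the sample dictionary is made up for illustration rather than taken from a real run:

```python
def format_timestamp(seconds):
    """Format a duration in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Return one '[start -> end] text' line per Whisper segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


# Illustrative stand-in for a real transcription result
sample_result = {
    "text": " Hello, how are you?",
    "segments": [{"start": 0.0, "end": 1.5, "text": " Hello, how are you?"}],
}
for line in format_segments(sample_result):
    print(line)
```

In the notebook, `format_segments(result)` could be called directly on the output of `model.transcribe`.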


## 🔍 Summarization of Transcribed Text
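We use a transformer-based summarization model to condense the transcript. Summarization is only meaningful when the transcript is longer than the requested summary length.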

In [6]:
from transformers import pipeline

summarizer = pipeline('summarization')

def summarize_text(text):
    return summarizer(text, max_length=100, min_length=30, do_sample=False)[0]['summary_text']

summary = summarize_text(result['text'])
print("Summary:", summary)



No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Your max_length is set to 100, but your input_length is only 9. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=4)


Summary:  CNN.com will feature iReporter photos in a weekly Travel Snapshots gallery . Please submit your best shots for next week's gallery of snapshots of your favorite places in the gallery . Send your photos to CNN iReport .
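
The summary above has nothing to do with the audio: the transcript is only 9 tokens long, so forcing `min_length=30` makes the model generate unrelated text. One way to avoid this, sketched below, is to skip summarization for short transcripts and otherwise scale the length limits to the input. The name `safe_summarize`, the 50-word threshold, and the length ratios are assumptions for illustration; `summarizer` is expected to behave like the Hugging Face pipeline created earlier.

```python
def safe_summarize(text, summarizer, min_words=50):
    """Summarize text, returning short inputs unchanged."""
    words = text.split()
    if len(words) < min_words:
        # Too short to compress; summarizing would only invite made-up content
        return text.strip()
    # Word counts only approximate token counts, so these bounds are rough
    max_length = max(20, len(words) // 2)
    min_length = max(5, len(words) // 8)
    output = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)
    return output[0]["summary_text"]


# The short transcript is returned as-is, so no summarizer is needed here
print(safe_summarize(" Hello, how are you?", summarizer=None))
```

In the notebook this would be called as `safe_summarize(result['text'], summarizer)`.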


## 3️⃣ Download and Transcribe an Example Audio File
Download an example **audio file** and process it using Whisper.
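
No example file is bundled with the notebook, so the sketch below shows one way to fetch one using only the standard library. The URL is a placeholder, not a real sample file, and `download_audio` is a hypothetical helper; the transcription call is left commented out because it needs the model loaded earlier.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url, dest):
    """Stream the resource at `url` to the local path `dest` and return that path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Placeholder URL: replace with the address of an actual audio file
EXAMPLE_URL = "https://example.com/sample.wav"
# audio_file = download_audio(EXAMPLE_URL, "example_audio.wav")
# result = model.transcribe(str(audio_file))
# print("Transcription:", result["text"])
```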

## 4️⃣ Upload and Transcribe Custom Audio
Upload an audio file (MP3, WAV, M4A) and transcribe it using Whisper.

In [7]:
from google.colab import files

# Upload an audio file (opens a file picker in Colab)
uploaded = files.upload()
file_name = next(iter(uploaded))  # name of the first uploaded file

# Transcribe uploaded file
result = model.transcribe(file_name)
print('Transcription:', result['text'])

Saving Ward Pkwy.mp3 to Ward Pkwy.mp3
Transcription:  Hello, how are you?


## 5️⃣ Building a Web Application (FastAPI Backend)
Create a FastAPI backend to process uploaded audio files and return transcriptions.

In [8]:
!pip install python-multipart

Collecting python-multipart
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Downloading python_multipart-0.0.20-py3-none-any.whl (24 kB)
Installing collected packages: python-multipart
Successfully installed python-multipart-0.0.20


In [9]:
!pip install nest_asyncio




In [10]:
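import os
import tempfile

import nest_asyncio
import uvicorn
import whisper
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = whisper.load_model('small')

# A plain `def` endpoint runs in FastAPI's thread pool, so the blocking
# transcription call does not stall the event loop.
@app.post('/transcribe/')
def transcribe_audio(file: UploadFile = File(...)):
    # Write the upload to a temporary file instead of trusting the
    # client-supplied filename, which could contain path components.
    suffix = os.path.splitext(file.filename or '')[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(file.file.read())
        tmp_path = tmp.name
    try:
        result = model.transcribe(tmp_path)
    finally:
        os.remove(tmp_path)
    return {'transcription': result['text']}

# Let uvicorn start inside the notebook's already-running event loop
nest_asyncio.apply()

# This call blocks the cell until the server is interrupted
uvicorn.run(app, host='0.0.0.0', port=8000)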


  checkpoint = torch.load(fp, map_location=device)
INFO:     Started server process [420]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [420]
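
With the server running, any HTTP client can post a file to `/transcribe/` as multipart form data under the field name `file`, matching the handler's parameter. A minimal client sketch, assuming the `requests` package is installed and the server is reachable at `http://localhost:8000`; `transcribe_via_api` and its `post` parameter (there so the network call can be swapped out) are hypothetical:

```python
def transcribe_via_api(audio_bytes, filename,
                       api_url="http://localhost:8000/transcribe/", post=None):
    """Send audio bytes to the FastAPI endpoint and return the transcription text."""
    if post is None:
        import requests  # imported lazily so the function can be exercised without it
        post = requests.post
    # The multipart field must be called 'file' to match the endpoint's parameter
    response = post(api_url, files={"file": (filename, audio_bytes)}, timeout=300)
    response.raise_for_status()
    return response.json()["transcription"]


# Example call (requires the server from the previous cell to be running):
# with open("processed_audio.wav", "rb") as f:
#     print(transcribe_via_api(f.read(), "processed_audio.wav"))
```

Because `uvicorn.run` blocks the notebook cell, such a client would have to run from a separate process or session.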


## 6️⃣ Deploying a Streamlit Web App
We create a **Streamlit frontend** that lets users upload audio files and transcribe them. Note that the app below loads its own Whisper model (`base`) and transcribes in-process; it does not call the FastAPI backend.

In [18]:
!pip install streamlit openai torch sounddevice numpy scipy


Collecting sounddevice
  Downloading sounddevice-0.5.1-py3-none-any.whl.metadata (1.4 kB)
Downloading sounddevice-0.5.1-py3-none-any.whl (32 kB)
Installing collected packages: sounddevice
Successfully installed sounddevice-0.5.1


In [23]:
%%writefile streamlit_app.py
import os

import streamlit as st
import whisper

st.title("🎤 Speech-to-Text Transcription")

@st.cache_resource
def load_model():
    # Cache the model so it is loaded once, not on every Streamlit rerun
    return whisper.load_model("base")

model = load_model()

# Upload audio file
uploaded_file = st.file_uploader("Upload an audio file", type=["wav", "mp3", "m4a"])

if uploaded_file:
    # Keep the original file extension on the temporary copy
    extension = os.path.splitext(uploaded_file.name)[1]
    file_path = f"temp_audio{extension}"
    with open(file_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    st.success("File uploaded successfully!")

    # Transcribe audio, showing a spinner only while the model is working
    with st.spinner("Transcribing..."):
        result = model.transcribe(file_path)
    st.subheader("Transcription:")
    st.write(result["text"])


Overwriting streamlit_app.py


## 7️⃣ Running the Streamlit Web App
Start the **Streamlit web application** in the background and expose it through an ngrok tunnel so it can be reached from outside the Colab runtime:

In [26]:
!pip install pyngrok



In [28]:
import os
from pyngrok import ngrok

# Authenticate ngrok (only needed once). The token is read from an environment
# variable, assumed to be set beforehand, so that no secret is stored in the notebook.
ngrok.set_auth_token(os.environ["NGROK_AUTHTOKEN"])

# Start Streamlit in the background
os.system("streamlit run streamlit_app.py --server.port 8501 &")

# Open an ngrok tunnel
public_url = ngrok.connect(8501)
print(f"Streamlit is live at: {public_url}")


Streamlit is live at: NgrokTunnel: "https://983f-34-125-146-111.ngrok-free.app" -> "http://localhost:8501"
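
Because `streamlit run` is launched in the background, the tunnel may open before the app is accepting connections. A small sketch that polls the port first, using only the standard library; `wait_for_port` and its default timeout are assumptions, not part of pyngrok or Streamlit:

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet; wait a little and retry
            time.sleep(interval)
    return False


# Example: call before ngrok.connect(8501)
# if not wait_for_port(8501):
#     raise RuntimeError("Streamlit did not start listening on port 8501")
```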
