# Whisper Transcription via OpenAI Client (OpenAI-compatible endpoint)

This notebook demonstrates using the **OpenAI Python SDK** against an **OpenAI-compatible Whisper** service (e.g., your in-cluster service in OpenShift) to transcribe local audio files.

**Highlights**
- Uses the `OpenAI` client (`openai` package) instead of hand-written `requests` calls.
- Works with an endpoint that implements `/v1/audio/transcriptions`.
- Small ipywidgets UI to browse, play, and transcribe files.


In [1]:
# Optional: install dependencies if needed
# You may already have these in your environment. Uncomment if necessary.
# %pip install --quiet --upgrade openai ipywidgets
# %pip install --quiet --upgrade soundfile  # sometimes required for Audio playback backends

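# JupyterLab 3+ picks up widgets once these packages are installed (a kernel or server restart might be required):
# %pip install --quiet jupyterlab-widgets ipywidgets
# Classic Notebook (< 7) only; note the shell escape `!`, since `%jupyter` is not an IPython magic:
# !jupyter nbextension enable --py widgetsnbextension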

## 1) Configuration
Adjust the host/paths to match your environment. Your service should be reachable from where this notebook runs.

In [2]:
from pathlib import Path
import os

# === CONFIG ===
WHISPER_HOST = "http://whisper-large-v3-predictor.whisper-proj.svc.cluster.local:8080"  # cluster-internal service
WHISPER_MODEL = "whisper-large-v3"  # your model identifier

# Local audio directory to browse:
LOCAL_AUDIO_DIR = Path("/opt/app-root/src/audio_data/")

# Audio extensions to include:
AUDIO_EXTS = (".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aac")

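# If your service expects a bearer token, set it here or via the environment variable OPENAI_API_KEY.
# Some SDK versions fail to send requests with an empty key, so fall back to a placeholder.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or "EMPTY"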

## 2) Imports & Utilities

In [3]:
from pathlib import Path
from typing import List, Any, Dict
from IPython.display import Audio, display, clear_output
import ipywidgets as widgets
from openai import OpenAI

def list_local_audio_files(directory: Path, exts=AUDIO_EXTS) -> List[Path]:
    directory = Path(directory)
    return sorted([p for p in directory.glob("*") if p.suffix.lower() in exts and p.is_file()])

## 3) OpenAI client setup
We point the client at the **OpenAI-compatible** base URL. Note that the base URL includes `/v1`; the SDK appends `/audio/transcriptions` to it.

In [4]:
OPENAI_BASE_URL = f"{WHISPER_HOST}/v1"

client = OpenAI(
    base_url=OPENAI_BASE_URL,
    api_key=OPENAI_API_KEY,  # keep this non-empty; a placeholder is fine if your gateway ignores auth
)
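
Because the base URL must end in `/v1` exactly once, a host value that already carries `/v1` or a trailing slash would produce a malformed path. A minimal sketch of a normalizer is shown here; `normalize_base_url` is a hypothetical helper for illustration, not part of the `openai` package, and the example host is made up.

```python
def normalize_base_url(host: str, api_prefix: str = "/v1") -> str:
    """Return the host with no trailing slash and exactly one api_prefix suffix."""
    base = host.strip().rstrip("/")
    # Append the prefix only when the caller has not already included it.
    if not base.endswith(api_prefix):
        base += api_prefix
    return base


# Hypothetical host names, used only to show the behaviour.
print(normalize_base_url("http://whisper.example.internal:8080/"))
print(normalize_base_url("http://whisper.example.internal:8080/v1/"))
```

Both calls print the same URL ending in `/v1`, so `OPENAI_BASE_URL = normalize_base_url(WHISPER_HOST)` could replace the f-string if host values vary between environments.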

## 4) Transcription helper (OpenAI client)
This function calls `client.audio.transcriptions.create(...)`. You can pass optional Whisper parameters via `**extra`.

In [5]:
def transcribe_with_openai_client(
    local_audio_path: Path,
    model_name: str = WHISPER_MODEL,
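    **extra: Any,  # the annotation on **kwargs describes each value, not the dict
) -> Any:  # str when text is available, otherwise the raw response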
    """
    Uses the OpenAI Python SDK against an OpenAI-compatible Whisper endpoint.
    Returns the 'text' when available; otherwise returns the raw object for inspection.
    You can pass extras (e.g., language="de", response_format="verbose_json", temperature=0, prompt="...").
    """
    with open(local_audio_path, "rb") as f:
        resp = client.audio.transcriptions.create(
            model=model_name,
            file=f,
            **extra
        )

    # Many servers return an object with a .text field (OpenAI style)
    text = getattr(resp, "text", None)
    if text is not None:
        return text

    # Some adapters return plain dicts
    if isinstance(resp, dict):
        if "text" in resp:
            return resp["text"]
        if "choices" in resp and resp["choices"]:
            choice = resp["choices"][0]
            if isinstance(choice, dict) and "text" in choice:
                return choice["text"]

    return resp  # fallback
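
The helper above returns only the text, so per-segment timing is dropped even when `response_format="verbose_json"` is requested. In the OpenAI API a `verbose_json` response also carries a `segments` list with `start`, `end`, and `text` fields; whether your adapter populates it is an assumption to verify. The sketch below formats such segments from a raw response obtained by calling `client.audio.transcriptions.create(...)` directly. `format_segments` and its private helpers are hypothetical names, and the sample data is hand-written, not real output.

```python
from typing import Any, List


def _field(item: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(item, dict):
        return item.get(name, default)
    return getattr(item, name, default)


def _timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm."""
    total_ms = int(round(float(seconds) * 1000))
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(resp: Any) -> List[str]:
    """Return one '[start - end] text' line per segment; empty list if none."""
    segments = _field(resp, "segments") or []
    lines = []
    for seg in segments:
        start = _field(seg, "start", 0.0)
        end = _field(seg, "end", 0.0)
        text = (_field(seg, "text", "") or "").strip()
        lines.append(f"[{_timestamp(start)} - {_timestamp(end)}] {text}")
    return lines


# Hand-written sample shaped like a verbose_json response (not real output).
sample = {
    "text": "Hello there. Welcome.",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 62.25, "text": " Welcome."},
    ],
}
for line in format_segments(sample):
    print(line)
```

A response without a `segments` field simply yields an empty list, so the function is safe to call on plain `json` responses as well.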

## 5) UI widgets
A small UI to:
- choose a folder
- refresh file list
- play a selected file
- transcribe via the OpenAI client

In [6]:
# Inputs
dir_text = widgets.Text(value=str(LOCAL_AUDIO_DIR), description="Folder:", layout=widgets.Layout(width="60%"))
refresh_btn = widgets.Button(description="Refresh", icon="refresh")
file_dd = widgets.Dropdown(options=[], description="File:", layout=widgets.Layout(width="70%"))

# Actions
play_btn = widgets.Button(description="Play", icon="play")
transcribe_btn_openai = widgets.Button(description="Transcribe (OpenAI client)", icon="microphone")

# Outputs
status_out = widgets.Output()
audio_out = widgets.Output()
text_out = widgets.Output()

def refresh_files(_=None):
    folder = Path(dir_text.value).expanduser()
    files = list_local_audio_files(folder)
    file_dd.options = files
    with status_out:
        clear_output(wait=True)
        if files:
            print(f"Found {len(files)} audio file(s) in {folder}")
        else:
            print(f"No audio files found in {folder}")

def play_audio(_=None):
    sel = file_dd.value
    if not sel:
        return
    with audio_out:
        clear_output(wait=True)
        display(Audio(filename=str(sel), autoplay=False))

def run_transcription_openai(_=None):
    sel = file_dd.value
    if not sel:
        return
    with status_out:
        clear_output(wait=True)
        print(f"Transcribing (OpenAI client): {Path(sel).name}")
    try:
        txt = transcribe_with_openai_client(
            Path(sel),
            # Optional extras:
            # language="sv",  # force language if desired
            # response_format="verbose_json",  # or "text", "json", "srt", "vtt"
            # temperature=0,
            # prompt="Domain-specific hints here",
        )
        with text_out:
            clear_output(wait=True)
            print("=== Transcript (OpenAI client) ===")
            print(txt if isinstance(txt, str) else str(txt))
        with status_out:
            clear_output(wait=True)
            print("Done.")
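    except Exception as e:
        # Clear the stale "Transcribing..." status before reporting the error.
        with status_out:
            clear_output(wait=True)
            print("Failed.")
        with text_out:
            clear_output(wait=True)
            print("Transcription failed (OpenAI client):", e)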

refresh_btn.on_click(refresh_files)
play_btn.on_click(play_audio)
transcribe_btn_openai.on_click(run_transcription_openai)

# Render the UI
display(widgets.HBox([dir_text, refresh_btn]))
display(file_dd)
display(widgets.HBox([play_btn, transcribe_btn_openai]))
display(status_out, audio_out, text_out)

# Initial populate
refresh_files()


## 6) Quick smoke test cell (optional)
Runs transcription on the currently selected file or, if none is selected, on the first audio file found in the folder.

In [7]:
test_file = file_dd.value or (list_local_audio_files(Path(dir_text.value))[:1] or [None])[0]
if test_file:
    print(f"Transcribing via OpenAI client: {Path(test_file).name}")
    try:
        txt = transcribe_with_openai_client(Path(test_file))
        print("=== Transcript ===")
        print(txt if isinstance(txt, str) else str(txt))
    except Exception as e:
        print("Transcription failed:", e)
else:
    print("No audio file selected/found for quick test.")

Transcribing via OpenAI client: Speaker_0007_00000.wav
=== Transcript ===
 Hey everyone, this is Reshma from Edureka and today we'll be learning what is Ansible. Thank you all the attendees for joining today's session. So let's get started with it. First let us look at the topics that we'll be learning today. Well it's quite a long list, it means we'll be learning a lot of things today. Let us take a look at them one by one. So first we'll see the problems that were before configuration management and how configuration configuration management help to solve it. We'll see what Ansible is and the different features of Ansible. After that, we'll see how NASA has implemented Ansible to solve all their problems. After that, we'll see how we can use Ansible for orchestration, provisioning, configuration management, application deployment, and security. And in the end, we'll write some Ansible playbooks to install LAMP stack on my node machine and host a website in my node machine.


## 7) (Optional) Async variant
If you prefer `asyncio`, you can use `AsyncOpenAI`. Uncomment the contents of the cell below to use it.

In [8]:
# from openai import AsyncOpenAI
# import asyncio
# async_client = AsyncOpenAI(base_url=OPENAI_BASE_URL, api_key=OPENAI_API_KEY)
#
# async def atranscribe_with_openai_client(local_audio_path: Path, model_name: str = WHISPER_MODEL, **extra):
#     with open(local_audio_path, "rb") as f:
#         resp = await async_client.audio.transcriptions.create(
#             model=model_name,
#             file=f,
#             **extra
#         )
#     return getattr(resp, "text", resp)
#
# # Example usage:
# # await atranscribe_with_openai_client(Path(file_dd.value), language="sv")
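
One reason to use the async client is to transcribe several files concurrently without overloading a single model server. The following is a minimal sketch of bounded concurrency with `asyncio.Semaphore`; `run_bounded` and `fake_transcribe` are hypothetical names, and `fake_transcribe` stands in for the async helper above so the example runs without a network.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def run_bounded(
    items: Sequence[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 2,
) -> List[Any]:
    """Run worker(item) for every item with at most `limit` calls in flight.

    Results keep the order of `items`; a failed item yields its exception
    object instead of aborting the whole batch.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(item: Any) -> Any:
        async with sem:
            return await worker(item)

    return await asyncio.gather(
        *(guarded(item) for item in items), return_exceptions=True
    )


async def fake_transcribe(name: str) -> str:
    """Stand-in for the real async transcription call; no network involved."""
    await asyncio.sleep(0.01)
    return f"transcript of {name}"
```

In a notebook cell, `await run_bounded(files, fake_transcribe, limit=2)` works directly because Jupyter already runs an event loop; in a plain script, wrap the call in `asyncio.run(...)`. Swapping `fake_transcribe` for the uncommented `atranscribe_with_openai_client` is the intended use, with `limit` chosen to match what your server can handle.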

### Notes
- Ensure this environment can resolve and reach `whisper-...svc.cluster.local:8080`.
- If your service requires auth, set `OPENAI_API_KEY` (env var or in the config cell).
- Many Whisper adapters support extras like `language`, `temperature`, `prompt`, and `response_format` (`text`, `json`, `verbose_json`, `srt`, `vtt`).
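
To check the first note before debugging anything else, a plain TCP probe tells you whether the service name resolves and the port accepts connections. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and a successful probe shows only that something is listening, not that it speaks the transcription API.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return False
    try:
        # Fall back to the scheme's default port when none is given.
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:
        # urlsplit raises ValueError for a malformed or out-of-range port.
        return False
    try:
        # DNS failures, refused connections, and timeouts are all OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(WHISPER_HOST)` from the notebook returns `False` for DNS failures, refused connections, and timeouts alike, which is usually enough to separate network problems from API or auth problems.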