### Important Libraries

This notebook utilizes the following key libraries:

*   `os`: For interacting with the operating system, particularly for environment variables.
*   `gradio`: To create interactive web interfaces for machine learning models.
*   `whisper`: OpenAI's robust speech-to-text model for transcription.
*   `openai`: The official Python client for the OpenAI API, used here to interact with OpenRouter.
*   `pathlib`: For object-oriented filesystem paths.

In [1]:
!pip install -q openai-whisper


In [5]:
import os
from pathlib import Path

import gradio as gr
import whisper
from openai import OpenAI  # OpenAI client class, pointed at OpenRouter below

### Configure OpenRouter API

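This section configures the OpenAI client to send requests to OpenRouter, which gives access to various LLMs. The cell below reads the API key from Colab's secrets manager under the name `OPENROUTER_API_KEY`, so add a secret with that name and grant this notebook access to it before running the cell. Avoid pasting the key directly into the notebook.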

In [6]:
# Configure OpenRouter
from google.colab import userdata

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=userdata.get('OPENROUTER_API_KEY'),
)
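
Outside Colab, `google.colab.userdata` cannot be imported. The following is a minimal sketch of one way to fall back to an environment variable in that case. The helper name `resolve_api_key` is hypothetical, and the environment variable name simply mirrors the secret name used above.

```python
import os


def resolve_api_key(name="OPENROUTER_API_KEY"):
    """Return the key from Colab secrets if possible, else from the environment.

    `resolve_api_key` is an illustrative helper, not part of any library.
    """
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # Secret missing or notebook access not granted: fall through
            pass

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} secret or environment variable first.")
    return key
```

With this helper, the client could be created as `OpenAI(base_url="https://openrouter.ai/api/v1", api_key=resolve_api_key())`.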

### Load Whisper Model

This loads the `base` Whisper model for speech-to-text transcription. Depending on your needs and available resources, you might choose a larger model (e.g., `small`, `medium`, `large`) for better accuracy, or a smaller one (e.g., `tiny`) for faster inference.

In [4]:
model = whisper.load_model("base")

100%|████████████████████████████████████████| 139M/139M [00:01<00:00, 110MiB/s]
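
The size trade-off described above can be sketched as a small lookup. The VRAM figures are approximate values as listed in the openai-whisper README and should be checked against the version you install; `pick_model_name` and `APPROX_VRAM_GB` are hypothetical names introduced only for this illustration.

```python
# Approximate VRAM needs in GB, taken from the openai-whisper README.
# Treat these as rough guidance and verify them for your installed version.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model_name(vram_budget_gb, default="tiny"):
    """Return the largest model whose approximate VRAM need fits the budget."""
    fitting = [
        name for name, need in APPROX_VRAM_GB.items() if need <= vram_budget_gb
    ]
    # The dict is ordered from smallest to largest model,
    # so the last entry that fits is the largest one.
    return fitting[-1] if fitting else default
```

The result can be passed straight to the loader, for example `whisper.load_model(pick_model_name(4))`.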


### Define Transcription and Summarization Function

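The function `transcribe_and_summarize_or_translate` takes the path to an audio file, transcribes it with Whisper, and optionally summarizes the transcript with an LLM via OpenRouter. It then writes the transcript (and the summary, if one was generated) to `transcript.txt` for download. Whisper's detected language is read from the result but not returned, and the `target_language` parameter is accepted but not used, so no translation takes place yet.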

In [7]:
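def transcribe_and_summarize_or_translate(audio, summarize, target_language=None):
    """Transcribe `audio` (a file path) and optionally summarize the transcript.

    Returns (transcript, summary, path to a downloadable text file).
    `target_language` is accepted but not used yet.
    """
    # Nothing was recorded or uploaded: return empty outputs instead of failing
    if audio is None:
        return "", "", None

    # Step 1: Whisper transcription
    result = model.transcribe(audio)
    transcript = result['text']
    detected_language = result.get('language', 'unknown')  # currently unused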

    summary = None
    if summarize:
        # Example: Summarization via LLM (through OpenRouter) using the new client
        resp = client.chat.completions.create(
            model="openai/gpt-oss-20b:free",  # or other model available on OpenRouter
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Please summarize the following text:\n{transcript}"}
            ],
            temperature=0.7,
            max_tokens=400
        )
        summary = resp.choices[0].message.content

    # Build download text
    output_text = transcript
    if summary:
        output_text += "\n\n--- Summary ---\n" + summary

    download_path = Path("transcript.txt")
    download_path.write_text(output_text, encoding="utf-8")

    return transcript, summary or "", str(download_path)
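
Two parts of the function above could be factored out, sketched here under stated assumptions. First, `target_language` is never used, so `build_messages` shows one possible way to ask the LLM for a summary in that language. Second, the fixed filename `transcript.txt` is shared by every request, so `write_transcript` writes to a unique temporary file instead. Both helper names and the prompt wording are hypothetical and are not part of the original notebook.

```python
import tempfile
from pathlib import Path


def build_messages(transcript, target_language=None):
    """Build chat messages asking for a summary, optionally in another language."""
    task = "Please summarize the following text"
    if target_language:
        task += f" and write the summary in {target_language}"
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"{task}:\n{transcript}"},
    ]


def write_transcript(transcript, summary=""):
    """Write the transcript (and summary, if any) to a unique temporary .txt file."""
    text = transcript
    if summary:
        text += "\n\n--- Summary ---\n" + summary
    # delete=False keeps the file on disk so it can still be served afterwards
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", prefix="transcript_", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        return handle.name
```

Inside the function, `messages=build_messages(transcript, target_language)` would replace the inline message list, and `write_transcript(transcript, summary)` would replace the `Path("transcript.txt")` lines.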

### Launch Gradio Interface

This section creates and launches the Gradio web interface. It allows you to record audio directly from your microphone or upload an audio file, and then process it through the defined function. You can also choose to generate a summary.

In [11]:
iface = gr.Interface(
    fn=transcribe_and_summarize_or_translate,
    inputs=[
        gr.Audio(type="filepath", label="Record or Upload Audio"),
        gr.Checkbox(label="Summarize Transcript?", value=False)
    ],
    outputs=[
        gr.Textbox(label="Transcript"),
        gr.Textbox(label="Summary"),
        gr.File(label="Download Transcript")
    ],
    title="WhisperNotes APP",
    description="Transcribe speech with Whisper; optionally summarize using any LLM via OpenRouter"
)

iface.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://055045d30f37764314.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


