
Task4: Add OpenAI Whisper API transcriber support#34

Merged
Gambitnl merged 2 commits into main from
claude/add-openai-transcriber-01SEE2crJqAcPvg42ZBJ8kGm
on Nov 14, 2025

Conversation

@Gambitnl (Owner)

No description provided.

Implements OpenAITranscriber as an alternative transcription backend alongside
existing FasterWhisper and Groq options.

Changes:
- Added OpenAITranscriber class in src/transcriber.py following the same
  pattern as GroqTranscriber
- Includes API key validation, preflight checks, and retry logic
- Updated TranscriberFactory to return OpenAITranscriber for 'openai' backend
- Added comprehensive test coverage for OpenAITranscriber
- Tests include API call mocking, segment parsing, and error handling

The implementation uses OpenAI's whisper-1 model with verbose JSON response
format to get segment and word-level timestamps, matching the capabilities
of other transcriber backends.
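
As a rough sketch of what that segment parsing might look like (the helper name and the abridged response fields here are illustrative only; the actual OpenAITranscriber code in `src/transcriber.py` may differ):

```python
def parse_verbose_json(response: dict):
    """Convert a whisper-1 verbose_json response into (start, end, text) tuples.

    Sketch only: field names follow OpenAI's verbose_json format, but this
    exact helper is not part of the PR.
    """
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in response.get("segments", [])
    ]


# Minimal example of the verbose_json shape (fields abridged)
sample = {
    "text": "hello world",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": " hello world"},
    ],
}
print(parse_verbose_json(sample))  # [(0.0, 1.2, 'hello world')]
```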

Updated documentation to reflect the new OpenAI transcriber backend:

- .env.example: Clarified WHISPER_BACKEND options and added note about cloud APIs
- README.md: Updated transcription options to list OpenAI as official cloud option
  instead of "fallback", and updated technology stack table
- CLOUD_INFERENCE_OPTIONS.md: Added comprehensive OpenAI Whisper API section with
  setup instructions, pricing ($0.006/min), features, and configuration examples

The documentation now properly presents all three transcription options:
1. Local (faster-whisper) - free, runs on your hardware
2. Groq API - free cloud option with generous tier
3. OpenAI API - paid cloud option with official support
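
The backend dispatch described in the changes above might look roughly like this sketch (the stub classes and the `create` signature are assumptions for illustration; only the class names and the `WHISPER_BACKEND` setting come from the PR):

```python
class BaseTranscriber:
    """Stub standing in for the project's real base class (illustration only)."""


class FasterWhisperTranscriber(BaseTranscriber):
    pass


class GroqTranscriber(BaseTranscriber):
    pass


class OpenAITranscriber(BaseTranscriber):
    pass


class TranscriberFactory:
    """Sketch of the factory update described above; the real registry
    layout and constructor arguments are assumptions."""

    _backends = {
        "faster-whisper": FasterWhisperTranscriber,
        "groq": GroqTranscriber,
        "openai": OpenAITranscriber,
    }

    @classmethod
    def create(cls, backend: str) -> BaseTranscriber:
        try:
            return cls._backends[backend]()
        except KeyError:
            raise ValueError(f"Unknown WHISPER_BACKEND: {backend!r}")
```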
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's transcription capabilities by integrating the OpenAI Whisper API. This new backend provides users with a high-quality, cloud-based option for audio transcription, complementing the existing local and Groq-based solutions. The changes include the implementation of a dedicated OpenAI transcriber, updates to configuration, and thorough documentation to guide users on setup and usage, offering greater flexibility and choice in transcription services.

Highlights

  • OpenAI Whisper API Support: Added full support for OpenAI Whisper API as a new transcription backend, offering a high-quality, cloud-based alternative.
  • New Transcriber Implementation: Introduced a new OpenAITranscriber class responsible for handling API key validation, temporary audio file management, and robust transcription logic with OpenAI's service.
  • Configuration and Documentation Updates: Updated configuration options in .env.example to include OpenAI as a WHISPER_BACKEND choice and enhanced documentation in CLOUD_INFERENCE_OPTIONS.md and README.md with detailed setup, pricing, features, and usage examples.
  • Comprehensive Unit Testing: Implemented comprehensive unit tests for the OpenAITranscriber to ensure correct functionality, API key handling, and accurate parsing of transcription results.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request successfully adds support for the OpenAI Whisper API as a transcription backend. The changes include the core implementation in src/transcriber.py, comprehensive unit tests, and updates to documentation in README.md and CLOUD_INFERENCE_OPTIONS.md. The code is generally well-written and follows good practices like using try...finally for resource cleanup. My main feedback focuses on improving maintainability by reducing code duplication and making the preflight checks more robust. Overall, this is a great addition to the project.

Comment thread: src/transcriber.py

```python
class OpenAITranscriber(BaseTranscriber):
```

Severity: high

There is significant code duplication between this new OpenAITranscriber and the existing GroqTranscriber. Both classes handle temporary file creation, API calls with file objects, response parsing, and cleanup in a very similar way. This duplication increases maintenance overhead, as any bug fix or improvement would need to be applied in two places.

Consider introducing a new abstract base class, for example CloudAPITranscriber, that inherits from BaseTranscriber and abstracts this common logic. This new base class could implement the transcribe_chunk method to handle the file I/O and response parsing, while leaving an abstract _make_api_call method for subclasses to implement.

This refactoring would greatly reduce code duplication and make the cloud-based transcribers easier to maintain and extend in the future.
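
The reviewer's proposal could be sketched roughly as follows (the `_make_api_call` hook and this exact split are the suggestion, not code from the PR; the response shape assumed here is the verbose-JSON segment list used elsewhere in the thread):

```python
import abc
import os
import tempfile


class CloudAPITranscriber(abc.ABC):
    """Sketch of the suggested shared base class for cloud backends."""

    def transcribe_chunk(self, audio_bytes: bytes):
        # Shared temp-file handling and cleanup for all cloud backends
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        try:
            tmp.write(audio_bytes)
            tmp.close()
            with open(tmp.name, "rb") as fh:
                response = self._make_api_call(fh)
            return self._parse_response(response)
        finally:
            os.unlink(tmp.name)

    @abc.abstractmethod
    def _make_api_call(self, file_obj):
        """Backend-specific API request (OpenAI, Groq, ...)."""

    def _parse_response(self, response):
        # Common segment parsing; a backend can override if its format differs
        return [
            (seg["start"], seg["end"], seg["text"])
            for seg in response.get("segments", [])
        ]
```

Each concrete subclass would then only supply the API call itself, so retry logic and cleanup live in one place.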

Comment thread: src/transcriber.py, lines +508 to +514

```python
# Test API with minimal request
response = self.client.chat.completions.create(
    messages=[{"role": "user", "content": "test"}],
    model="gpt-3.5-turbo",
    max_tokens=1,
)
self.logger.debug("OpenAI API preflight check passed")
```

Severity: medium

The preflight check currently verifies the API key by making a call to the chat.completions endpoint. While this confirms the key is valid, it doesn't guarantee that it has the necessary permissions for the audio transcription API. A more robust check would be to test the audio.transcriptions endpoint directly.

You can do this by sending a very short, silent audio clip. This ensures that the key is not only authentic but also authorized for the specific service this class uses.

Suggested change:

```python
# Test API with a minimal audio request
import io

import numpy as np
import soundfile as sf

# Create a silent 0.1 s audio clip to test the transcription endpoint directly
sample_rate = 16000
silent_audio = np.zeros(int(0.1 * sample_rate), dtype=np.float32)
buffer = io.BytesIO()
sf.write(buffer, silent_audio, sample_rate, format="WAV")
buffer.seek(0)
buffer.name = "preflight.wav"  # The OpenAI SDK requires a name attribute on the file-like object

self.client.audio.transcriptions.create(
    file=buffer,
    model="whisper-1",
)
self.logger.debug("OpenAI API preflight check passed")
```

Gambitnl merged commit 3042d98 into main on Nov 14, 2025
3 checks passed
2 participants