
Task4: Add OpenAI Whisper API transcriber support#34

Merged
Gambitnl merged 2 commits into main from
claude/add-openai-transcriber-01SEE2crJqAcPvg42ZBJ8kGm
on Nov 14, 2025

Conversation

@Gambitnl (Owner)

No description provided.

Implements OpenAITranscriber as an alternative transcription backend alongside
existing FasterWhisper and Groq options.

Changes:
- Added OpenAITranscriber class in src/transcriber.py following the same
  pattern as GroqTranscriber
- Includes API key validation, preflight checks, and retry logic
- Updated TranscriberFactory to return OpenAITranscriber for 'openai' backend
- Added comprehensive test coverage for OpenAITranscriber
- Tests include API call mocking, segment parsing, and error handling

The implementation uses OpenAI's whisper-1 model with verbose JSON response
format to get segment and word-level timestamps, matching the capabilities
of other transcriber backends.
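
As a rough sketch of what that segment parsing might look like (the helper name and the abridged response fields here are illustrative only; the actual OpenAITranscriber code in `src/transcriber.py` may differ):

```python
def parse_verbose_json(response: dict):
    """Convert a whisper-1 verbose_json response into (start, end, text) tuples.

    Sketch only: field names follow OpenAI's verbose_json format, but this
    exact helper is not part of the PR.
    """
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in response.get("segments", [])
    ]


# Minimal example of the verbose_json shape (fields abridged)
sample = {
    "text": "hello world",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": " hello world"},
    ],
}
print(parse_verbose_json(sample))  # [(0.0, 1.2, 'hello world')]
```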

Updated documentation to reflect the new OpenAI transcriber backend:

- .env.example: Clarified WHISPER_BACKEND options and added note about cloud APIs
- README.md: Updated transcription options to list OpenAI as official cloud option
  instead of "fallback", and updated technology stack table
- CLOUD_INFERENCE_OPTIONS.md: Added comprehensive OpenAI Whisper API section with
  setup instructions, pricing ($0.006/min), features, and configuration examples

The documentation now properly presents all three transcription options:
1. Local (faster-whisper) - free, runs on your hardware
2. Groq API - free cloud option with generous tier
3. OpenAI API - paid cloud option with official support
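
The backend dispatch described in the changes above might look roughly like this sketch (the stub classes and the `create` signature are assumptions for illustration; only the class names and the `WHISPER_BACKEND` setting come from the PR):

```python
class BaseTranscriber:
    """Stub standing in for the project's real base class (illustration only)."""


class FasterWhisperTranscriber(BaseTranscriber):
    pass


class GroqTranscriber(BaseTranscriber):
    pass


class OpenAITranscriber(BaseTranscriber):
    pass


class TranscriberFactory:
    """Sketch of the factory update described above; the real registry
    layout and constructor arguments are assumptions."""

    _backends = {
        "faster-whisper": FasterWhisperTranscriber,
        "groq": GroqTranscriber,
        "openai": OpenAITranscriber,
    }

    @classmethod
    def create(cls, backend: str) -> BaseTranscriber:
        try:
            return cls._backends[backend]()
        except KeyError:
            raise ValueError(f"Unknown WHISPER_BACKEND: {backend!r}")
```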
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the application's transcription capabilities by integrating the OpenAI Whisper API. This new backend provides users with a high-quality, cloud-based option for audio transcription, complementing the existing local and Groq-based solutions. The changes include the implementation of a dedicated OpenAI transcriber, updates to configuration, and thorough documentation to guide users on setup and usage, offering greater flexibility and choice in transcription services.

Highlights

  • OpenAI Whisper API Support: Added full support for OpenAI Whisper API as a new transcription backend, offering a high-quality, cloud-based alternative.
  • New Transcriber Implementation: Introduced a new OpenAITranscriber class responsible for handling API key validation, temporary audio file management, and robust transcription logic with OpenAI's service.
  • Configuration and Documentation Updates: Updated configuration options in .env.example to include OpenAI as a WHISPER_BACKEND choice and enhanced documentation in CLOUD_INFERENCE_OPTIONS.md and README.md with detailed setup, pricing, features, and usage examples.
  • Comprehensive Unit Testing: Implemented comprehensive unit tests for the OpenAITranscriber to ensure correct functionality, API key handling, and accurate parsing of transcription results.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request successfully adds support for the OpenAI Whisper API as a transcription backend. The changes include the core implementation in src/transcriber.py, comprehensive unit tests, and updates to documentation in README.md and CLOUD_INFERENCE_OPTIONS.md. The code is generally well-written and follows good practices like using try...finally for resource cleanup. My main feedback focuses on improving maintainability by reducing code duplication and making the preflight checks more robust. Overall, this is a great addition to the project.

Comment thread: src/transcriber.py

```python
class OpenAITranscriber(BaseTranscriber):
```

Severity: high

There is significant code duplication between this new OpenAITranscriber and the existing GroqTranscriber. Both classes handle temporary file creation, API calls with file objects, response parsing, and cleanup in a very similar way. This duplication increases maintenance overhead, as any bug fix or improvement would need to be applied in two places.

Consider introducing a new abstract base class, for example CloudAPITranscriber, that inherits from BaseTranscriber and abstracts this common logic. This new base class could implement the transcribe_chunk method to handle the file I/O and response parsing, while leaving an abstract _make_api_call method for subclasses to implement.

This refactoring would greatly reduce code duplication and make the cloud-based transcribers easier to maintain and extend in the future.
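
The reviewer's proposal could be sketched roughly as follows (the `_make_api_call` hook and this exact split are the suggestion, not code from the PR; the response shape assumed here is the verbose-JSON segment list used elsewhere in the thread):

```python
import abc
import os
import tempfile


class CloudAPITranscriber(abc.ABC):
    """Sketch of the suggested shared base class for cloud backends."""

    def transcribe_chunk(self, audio_bytes: bytes):
        # Shared temp-file handling and cleanup for all cloud backends
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        try:
            tmp.write(audio_bytes)
            tmp.close()
            with open(tmp.name, "rb") as fh:
                response = self._make_api_call(fh)
            return self._parse_response(response)
        finally:
            os.unlink(tmp.name)

    @abc.abstractmethod
    def _make_api_call(self, file_obj):
        """Backend-specific API request (OpenAI, Groq, ...)."""

    def _parse_response(self, response):
        # Common segment parsing; a backend can override if its format differs
        return [
            (seg["start"], seg["end"], seg["text"])
            for seg in response.get("segments", [])
        ]
```

Each concrete subclass would then only supply the API call itself, so retry logic and cleanup live in one place.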

Comment thread: src/transcriber.py, lines +508 to +514

```python
# Test API with minimal request
response = self.client.chat.completions.create(
    messages=[{"role": "user", "content": "test"}],
    model="gpt-3.5-turbo",
    max_tokens=1,
)
self.logger.debug("OpenAI API preflight check passed")
```

Severity: medium

The preflight check currently verifies the API key by making a call to the chat.completions endpoint. While this confirms the key is valid, it doesn't guarantee that it has the necessary permissions for the audio transcription API. A more robust check would be to test the audio.transcriptions endpoint directly.

You can do this by sending a very short, silent audio clip. This ensures that the key is not only authentic but also authorized for the specific service this class uses.

Suggested change:

```python
# Test API with a minimal audio request
import io

import numpy as np
import soundfile as sf

# Create a silent 0.1 s audio clip to test the transcription endpoint directly
sample_rate = 16000
silent_audio = np.zeros(int(0.1 * sample_rate), dtype=np.float32)
buffer = io.BytesIO()
sf.write(buffer, silent_audio, sample_rate, format="WAV")
buffer.seek(0)
buffer.name = "preflight.wav"  # The OpenAI SDK requires a name attribute on the file-like object

self.client.audio.transcriptions.create(
    file=buffer,
    model="whisper-1",
)
self.logger.debug("OpenAI API preflight check passed")
```

Gambitnl merged commit 3042d98 into main on Nov 14, 2025
3 checks passed
2 participants