feat: add OpenAI GPT-4o speaker diarization support by syou6162 · Pull Request #3814 · BasedHardware/omi

syou6162 · 2025-12-17T05:28:27Z

Summary

Add support for OpenAI's gpt-4o-transcribe-diarize model to enable speaker diarization when using OpenAI as the STT provider.

Background

The default STT engine (Deepgram) supports speaker diarization, which lets the app annotate which speaker said each segment. However, for users who need higher recognition accuracy in certain languages (e.g., Japanese), OpenAI Whisper can be the better choice.

The problem is that the standard OpenAI Whisper model (whisper-1) does not support speaker diarization, resulting in all segments being attributed to "Speaker 0". This makes it difficult to:

- Correctly annotate who said what
- Review conversations with multiple speakers in the app

Solution

OpenAI recently released gpt-4o-transcribe-diarize, a model that provides both high-quality transcription and speaker diarization. This PR adds it as a selectable option in the STT settings.

References

Add support for OpenAI's gpt-4o-transcribe-diarize model with speaker diarization functionality. This enables automatic identification and separation of different speakers in transcriptions.

Changes:
- Add openaiDiarize enum and provider configuration
- Add openAIDiarize response schema for diarized_json format
- Extend speaker ID parsing to support "A", "B", "C" format
- Add factory method for OpenAI diarize provider
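The speaker ID parsing change listed above can be illustrated with a small sketch. It is written in Python for illustration only; the app's actual implementation is Dart and is not shown here. It assumes, hypothetically, that a diarized_json payload carries a `segments` list whose entries have `speaker`, `start`, `end`, and `text` fields, and that labels arrive either as letters ("A", "B", "C") or as numeric strings optionally prefixed with `SPEAKER_` (that second format is an assumption). `Segment`, `parse_speaker_id`, and `parse_diarized_segments` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical segment type; field names are assumptions, not the app's schema.
@dataclass
class Segment:
    speaker_id: int
    start: float
    end: float
    text: str


def parse_speaker_id(label: Optional[str]) -> int:
    """Map a speaker label to a zero-based integer ID.

    "A" -> 0, "B" -> 1, ...; "SPEAKER_2" or "2" -> 2.
    Missing or unrecognized labels fall back to 0 ("Speaker 0").
    """
    if not label:
        return 0
    label = label.strip()
    # Single uppercase letter, as in the "A", "B", "C" diarize format.
    if len(label) == 1 and "A" <= label <= "Z":
        return ord(label) - ord("A")
    # Numeric label, optionally prefixed with "SPEAKER_" (assumed format).
    match = re.fullmatch(r"(?:SPEAKER_)?(\d+)", label, flags=re.IGNORECASE)
    if match:
        return int(match.group(1))
    return 0


def parse_diarized_segments(payload: dict[str, Any]) -> list[Segment]:
    """Convert an assumed diarized_json-style payload into Segment objects."""
    segments = []
    for raw in payload.get("segments", []):
        segments.append(
            Segment(
                speaker_id=parse_speaker_id(raw.get("speaker")),
                start=float(raw.get("start", 0.0)),
                end=float(raw.get("end", 0.0)),
                text=str(raw.get("text", "")).strip(),
            )
        )
    return segments


if __name__ == "__main__":
    sample = {
        "segments": [
            {"speaker": "A", "start": 0.0, "end": 1.5, "text": " Hello."},
            {"speaker": "B", "start": 1.5, "end": 3.0, "text": "Hi there."},
        ]
    }
    for seg in parse_diarized_segments(sample):
        print(f"Speaker {seg.speaker_id}: {seg.text}")
    # prints "Speaker 0: Hello." then "Speaker 1: Hi there."
```

Falling back to 0 keeps the pre-diarization behavior (everything attributed to "Speaker 0") for responses without usable labels. Multi-letter labels such as "AA" are not handled in this sketch.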

gemini-code-assist

Code Review

This pull request adds support for OpenAI's GPT-4o speaker diarization feature. The changes include adding a new SttProvider enum value, defining its configuration and response schema, and updating the speaker ID parsing logic to handle the new format.

The implementation is mostly correct, but I've identified a couple of areas for improvement regarding maintainability. Specifically, there's a duplication of configuration logic for the new provider, and a hardcoded model name that could be made more flexible. My comments provide details and suggestions for refactoring these parts to make the code more robust and easier to maintain.

…parameter
- Change default language from 'ja' to 'en' for openAIDiarize provider
- Fix model parameter to use dynamic value instead of hardcoded string
- Ensure consistency with other STT providers
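The intent of this fix, keeping the model name and default language in one provider config and reading them from there instead of hardcoding the model string at the call site, can be pictured with a short sketch. This is a Python illustration, not the app's Dart code: `ProviderConfig`, `OPENAI_WHISPER`, `OPENAI_DIARIZE`, and `build_form_fields` are hypothetical names. The field names `model`, `language`, and `response_format` follow OpenAI's transcription request parameters, and `diarized_json` is the format named in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConfig:
    # Hypothetical config shape; not the app's actual SttProvider definition.
    model: str
    default_language: str = "en"
    response_format: str = "json"


OPENAI_WHISPER = ProviderConfig(model="whisper-1")
OPENAI_DIARIZE = ProviderConfig(
    model="gpt-4o-transcribe-diarize",
    response_format="diarized_json",
)


def build_form_fields(config: ProviderConfig, language: Optional[str] = None) -> dict[str, str]:
    """Build transcription request fields from a single config object.

    The model name comes from `config` rather than a string literal here,
    so adding a provider only requires a new ProviderConfig entry.
    """
    return {
        "model": config.model,
        "language": language or config.default_language,
        "response_format": config.response_format,
    }
```

With this shape, `build_form_fields(OPENAI_DIARIZE)` yields the diarize model with `language` set to "en" unless the caller passes another language code, which mirrors the 'ja' to 'en' default change described above.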

beastoin · 2025-12-18T02:52:01Z

sir give me the demo then we go.

@syou6162

syou6162 · 2025-12-18T16:46:43Z

@beastoin I tested this branch on the iOS Simulator and verified the functionality!

Steps:

1. Go to "Developer Settings" → "Transcription".
2. Select "OpenAI GPT-4o (Speaker)".
3. Save the settings and start transcription.

Test case:
I used BBC News audio as an example of multi-speaker content. The OpenAI model successfully identified and separated multiple speakers in the transcription.

Demo video and screenshots are attached below.

Simulator.Screen.Recording.-.iPhone15Pro-iOS17.-.2025-12-19.at.01.33.27.mov

beastoin · 2025-12-19T02:01:56Z

LGTM @syou6162

Thank you and congrats on the first contribution to OMI!

* feat: add OpenAI GPT-4o speaker diarization support
  Add support for OpenAI's gpt-4o-transcribe-diarize model with speaker diarization functionality. This enables automatic identification and separation of different speakers in transcriptions.
  Changes:
  - Add openaiDiarize enum and provider configuration
  - Add openAIDiarize response schema for diarized_json format
  - Extend speaker ID parsing to support "A", "B", "C" format
  - Add factory method for OpenAI diarize provider
* fix: update OpenAI Diarize default language to English and fix model parameter
  - Change default language from 'ja' to 'en' for openAIDiarize provider
  - Fix model parameter to use dynamic value instead of hardcoded string
  - Ensure consistency with other STT providers

gemini-code-assist Bot reviewed Dec 17, 2025


Comment thread: app/lib/models/stt_provider.dart (outdated)
Comment thread: app/lib/services/sockets/transcription_polling_service.dart

fix: update OpenAI Diarize default language to English and fix model …

85a23c5


syou6162 marked this pull request as ready for review December 17, 2025 06:20

beastoin merged commit cff2113 into BasedHardware:main Dec 19, 2025

syou6162 deleted the feature/openai-diarize-support branch December 19, 2025 02:17

syou6162 mentioned this pull request Dec 25, 2025

docs: add DeepWiki badge to README for LLM-powered documentation search #3894

Merged

beastoin merged 2 commits into BasedHardware:main from syou6162:feature/openai-diarize-support