gvioss (Contributor) commented Oct 31, 2025

Handle Deepgram diarized transcript

When diarize=true is set in the request, Deepgram reflects diarization in the response in several ways:

  • It adds a speaker property to each word.
  • It adds a speaker property to each paragraph, if paragraphs are present.
  • It returns a diarized transcript in ['results']['channels'][0]['alternatives'][0]['paragraphs']['transcript'], if paragraphs are present (which depends on whether paragraphs=true or smart_format=true is used in the request).
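To make the three locations concrete, here is a small sketch that reads each of them from a response. The `response` dict is a hand-written, heavily trimmed illustration, not real Deepgram output; it assumes the `results.channels[0].alternatives[0]` nesting, and the text inside `paragraphs.transcript` is only a placeholder for whatever Deepgram actually formats there.

```python
# Hand-written, trimmed illustration of a diarized response (not real output).
response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Hello, there. Hi!",
                        "words": [
                            {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
                            {"word": "there", "punctuated_word": "there.", "speaker": 0},
                            {"word": "hi", "punctuated_word": "Hi!", "speaker": 1},
                        ],
                        "paragraphs": {
                            # Placeholder text; the real formatting is Deepgram's.
                            "transcript": "Speaker 0: Hello, there.\n\nSpeaker 1: Hi!",
                            "paragraphs": [{"speaker": 0}, {"speaker": 1}],
                        },
                    }
                ]
            }
        ]
    }
}

alt = response["results"]["channels"][0]["alternatives"][0]

# 1. Speaker on each word.
word_speakers = [w["speaker"] for w in alt["words"]]

# 2. Speaker on each paragraph (only when paragraphs are present).
paragraph_speakers = [p["speaker"] for p in alt["paragraphs"]["paragraphs"]]

# 3. Diarized transcript (only when paragraphs are present).
diarized_transcript = alt["paragraphs"]["transcript"]

# The plain, non-diarized transcript sits directly on the alternative.
plain_transcript = alt["transcript"]
```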

Currently, the Deepgram transformation.py always returns the non-diarized transcript (from ['results']['channels'][0]['alternatives'][0]['transcript']).

This PR adds handling of the diarized transcript and returns it in place of the non-diarized one when it exists.
It first checks for diarization by looking for the speaker property in words (because words are always present in the response). If the result is diarized, it looks for the paragraphs property to get the full transcript. If paragraphs is absent, it builds the transcript from the word properties, preserving the text formatting (when smart_format or punctuate is used in the request).
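The lookup order described above could be sketched roughly as follows. This is not the code merged in this PR: the function names `extract_transcript` and `build_transcript_from_words` are hypothetical, the `"Speaker N: ..."` line format for the word-based fallback is an assumption, and the sketch assumes that formatted text is exposed on each word as `punctuated_word`.

```python
from typing import Any, Dict, List


def build_transcript_from_words(words: List[Dict[str, Any]]) -> str:
    """Group consecutive words by speaker (hypothetical helper).

    The "Speaker N: ..." line format is an assumption for illustration.
    """
    lines: List[str] = []
    current_speaker = None
    current_words: List[str] = []
    for w in words:
        speaker = w.get("speaker")
        # Prefer the formatted form so punctuation and casing are kept.
        text = w.get("punctuated_word") or w.get("word", "")
        if speaker != current_speaker and current_words:
            lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = speaker
        current_words.append(text)
    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return "\n".join(lines)


def extract_transcript(response: Dict[str, Any]) -> str:
    """Return the diarized transcript if available, else the plain one."""
    alt = response["results"]["channels"][0]["alternatives"][0]
    words = alt.get("words") or []

    # Step 1: a speaker property on any word signals a diarized result.
    if not any("speaker" in w for w in words):
        return alt.get("transcript", "")

    # Step 2: prefer the ready-made diarized transcript when paragraphs exist.
    paragraphs = alt.get("paragraphs") or {}
    if paragraphs.get("transcript"):
        return paragraphs["transcript"]

    # Step 3: otherwise rebuild the transcript from the words.
    return build_transcript_from_words(words)
```

Checking words first keeps the non-diarized path unchanged: a response without any speaker property still returns the plain transcript exactly as before.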

Relevant issues

Fixes #16095

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix
✅ Test

vercel bot commented Oct 31, 2025

@gvioss is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.

gvioss force-pushed the fix/handle-deepgram-diarization branch from 812839f to 92745d4 on October 31, 2025 14:25
gvioss force-pushed the fix/handle-deepgram-diarization branch from 92745d4 to c0a2d32 on October 31, 2025 14:29
krrishdholakia merged commit 3922bb6 into BerriAI:main on Nov 2, 2025
3 of 6 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]: Deepgram diarized transcription is not returned in litellm response

2 participants