Thanks for reporting this @wodka. This has been under discussion internally for the past week. #18 should address the optional speaker property of the wordBase type, and I updated the description of the speaker property of an utterance to note that it is the predicted speaker based on all the words in the utterance rather than a definitive speaker. It is derived without the diarizer's input.
Once that PR is merged and released, I'll close this issue as the product & engineering teams will handle any changes to the API moving forward and those changes are out of scope for this repository.
What is the current behavior?
The word base type (https://github.com/deepgram/node-sdk/blob/main/src/types/wordBase.ts) does not currently include a speaker property. The utterance type (https://github.com/deepgram/node-sdk/blob/main/src/types/utterance.ts) does have a speaker property, but that does not mean that all of the utterance's words belong to that speaker.
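To make the mismatch concrete, here is a condensed sketch of the two types as described above. The field names and shapes are assumptions based on this issue; see the linked files for the actual definitions.

```typescript
// Condensed sketch (not the actual SDK source) of the types discussed in
// this issue: src/types/wordBase.ts and src/types/utterance.ts.
interface WordBase {
  word: string;
  start: number;
  end: number;
  confidence: number;
  // Note: no `speaker` field here, even though the raw API response
  // includes one per word when diarization is enabled.
}

interface Utterance {
  start: number;
  end: number;
  confidence: number;
  transcript: string;
  speaker: number; // a single speaker, even if the words came from several
  words: WordBase[];
}
```

The problem in a nutshell: the per-word speaker from the API response is dropped, while the single utterance-level speaker silently claims all the words.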
Steps to reproduce
An audio file in which multiple speakers talk with less than 0.8 seconds of silence between them.
Expected behavior
One of two outcomes; both would require changes to the API result, as each has inconsistencies that are tricky to spot:
Each utterance belongs to exactly one speaker
Start a new utterance whenever the speaker changes. Then the word type does not need a speaker property, and the speaker property on the utterance can stay a single number.
Each Word has a speaker property
This would require adding the speaker property to the word type as well (it already exists in the API response). To indicate that there can be multiple speakers within one utterance, the speaker property at the utterance level would become
number[]
(as there can be multiple speakers within one utterance).
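The first option above can be sketched as a small helper that splits a run of diarized words into one utterance per contiguous speaker. This is illustrative only, under the assumption that each raw word carries a numeric speaker from the API response; none of these names (DiarizedWord, SpeakerUtterance, splitBySpeaker) are part of the SDK.

```typescript
// Hypothetical helper illustrating option 1: one utterance per contiguous
// speaker run. `speaker` on each word is assumed to come from the raw
// diarized API response.
interface DiarizedWord {
  word: string;
  start: number;
  end: number;
  speaker: number;
}

interface SpeakerUtterance {
  speaker: number;
  words: DiarizedWord[];
}

function splitBySpeaker(words: DiarizedWord[]): SpeakerUtterance[] {
  const result: SpeakerUtterance[] = [];
  for (const w of words) {
    const last = result[result.length - 1];
    if (last && last.speaker === w.speaker) {
      // Same speaker as the previous word: extend the current utterance.
      last.words.push(w);
    } else {
      // Speaker changed (or first word): start a new utterance.
      result.push({ speaker: w.speaker, words: [w] });
    }
  }
  return result;
}
```

With this shape, each utterance has exactly one speaker and the utterance-level speaker property can stay a plain number.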
Other information
https://developers.deepgram.com/documentation/features/diarize/