Diarization Result Types #17

Closed
wodka opened this issue Dec 8, 2021 · 1 comment

Comments

wodka commented Dec 8, 2021

What is the current behavior?

The word base type (https://github.com/deepgram/node-sdk/blob/main/src/types/wordBase.ts) does not currently include the speaker. The utterance type (https://github.com/deepgram/node-sdk/blob/main/src/types/utterance.ts) does have a speaker, but that does not mean all of the utterance's words belong to that speaker.
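To make the gap concrete, here is a minimal TypeScript sketch that flags words whose own speaker label differs from the speaker reported on their utterance. The `Word` and `Utterance` shapes are simplified assumptions modeled loosely on `wordBase.ts` and `utterance.ts`, not the SDK's actual definitions; `findMismatchedWords` is a hypothetical helper, and the example data is made up.

```typescript
// Simplified, assumed shapes; not the SDK's actual type definitions.
interface Word {
  word: string;
  start: number; // seconds
  end: number; // seconds
  speaker?: number; // present in diarized API responses, missing from the SDK type
}

interface Utterance {
  start: number;
  end: number;
  transcript: string;
  speaker: number; // a single speaker for the whole utterance
  words: Word[];
}

// Hypothetical helper: returns every word whose speaker label
// disagrees with the speaker reported on its utterance.
function findMismatchedWords(utterances: Utterance[]): Word[] {
  const mismatched: Word[] = [];
  for (const utterance of utterances) {
    for (const w of utterance.words) {
      // Words without a speaker label cannot be checked, so skip them.
      if (w.speaker !== undefined && w.speaker !== utterance.speaker) {
        mismatched.push(w);
      }
    }
  }
  return mismatched;
}

// Made-up example: two speakers separated by only 0.4 s of silence
// end up in one utterance attributed to speaker 0.
const example: Utterance[] = [
  {
    start: 0.0,
    end: 1.6,
    transcript: "hello there hi",
    speaker: 0,
    words: [
      { word: "hello", start: 0.0, end: 0.4, speaker: 0 },
      { word: "there", start: 0.4, end: 0.8, speaker: 0 },
      { word: "hi", start: 1.2, end: 1.6, speaker: 1 },
    ],
  },
];

console.log(findMismatchedWords(example).map((w) => w.word)); // [ 'hi' ]
```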

Steps to reproduce

An audio file in which multiple speakers talk with less than 0.8 seconds of silence between them.

Expected behavior

One of two outcomes. Both would require changes to the API result, and both involve inconsistencies that are tricky to implement!

Each Utterance has exactly one speaker

Start a new utterance whenever the speaker changes. Then the word type does not need a speaker property, and the speaker property on the utterance can stay the same.
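The first option could be approximated on the client side by splitting an utterance into consecutive runs of words that share a speaker. This is a minimal sketch, assuming every word carries a `speaker` label (as diarized API responses do); the shapes and the `splitBySpeaker` helper are hypothetical, fields such as confidence and channel are omitted, and the transcript is rebuilt from raw words, so punctuation is lost.

```typescript
// Assumed, simplified shapes; speaker is required here because diarization is on.
interface Word {
  word: string;
  start: number;
  end: number;
  speaker: number;
}

interface Utterance {
  start: number;
  end: number;
  transcript: string;
  speaker: number;
  words: Word[];
}

// Hypothetical helper: split one utterance so that each resulting
// utterance contains words from exactly one speaker.
function splitBySpeaker(utterance: Utterance): Utterance[] {
  const result: Utterance[] = [];
  let run: Word[] = [];

  // Emit the current run of same-speaker words as its own utterance.
  const flush = () => {
    if (run.length === 0) return;
    result.push({
      start: run[0].start,
      end: run[run.length - 1].end,
      transcript: run.map((w) => w.word).join(" "),
      speaker: run[0].speaker,
      words: run,
    });
    run = [];
  };

  for (const w of utterance.words) {
    // A speaker change closes the current run and starts a new one.
    if (run.length > 0 && w.speaker !== run[0].speaker) {
      flush();
    }
    run.push(w);
  }
  flush();
  return result;
}

// Made-up utterance that mixes two speakers.
const merged: Utterance = {
  start: 0.0,
  end: 1.6,
  transcript: "hello there hi",
  speaker: 0,
  words: [
    { word: "hello", start: 0.0, end: 0.4, speaker: 0 },
    { word: "there", start: 0.4, end: 0.8, speaker: 0 },
    { word: "hi", start: 1.2, end: 1.6, speaker: 1 },
  ],
};

const parts = splitBySpeaker(merged);
console.log(parts.map((u) => `${u.speaker}: ${u.transcript}`)); // [ '0: hello there', '1: hi' ]
```

With this approach the `speaker: number` field on the utterance stays accurate by construction, which is why the word type would no longer need its own speaker.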

Each Word has a speaker property

This would require adding the speaker property to the word type as well (it already exists in the API response) and, to indicate that there can be multiple speakers, changing the speaker on the utterance level to type number[] (as one utterance can contain several speakers).
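The second option could look roughly like the sketch below: each word carries its own `speaker`, and the utterance's `speaker` becomes a `number[]` listing every speaker heard in it. The type shapes and the `speakersOf` helper are illustrative assumptions, not the SDK's or the API's actual definitions, and the data is made up.

```typescript
// Illustrative shapes for the "speaker on every word" proposal.
interface Word {
  word: string;
  start: number;
  end: number;
  speaker: number; // per-word speaker, as returned by the API when diarizing
}

interface Utterance {
  start: number;
  end: number;
  transcript: string;
  speaker: number[]; // every speaker heard within the utterance
  words: Word[];
}

// Hypothetical helper: distinct speakers in order of first appearance.
function speakersOf(words: Word[]): number[] {
  const speakers: number[] = [];
  for (const w of words) {
    if (speakers.indexOf(w.speaker) === -1) speakers.push(w.speaker);
  }
  return speakers;
}

const words: Word[] = [
  { word: "hello", start: 0.0, end: 0.4, speaker: 0 },
  { word: "there", start: 0.4, end: 0.8, speaker: 0 },
  { word: "hi", start: 1.2, end: 1.6, speaker: 1 },
];

const utterance: Utterance = {
  start: 0.0,
  end: 1.6,
  transcript: "hello there hi",
  speaker: speakersOf(words),
  words,
};

console.log(utterance.speaker); // [ 0, 1 ]
```

Note that changing `speaker` from `number` to `number[]` is a breaking change for existing consumers of the type, which is part of why either option needs coordination with the API.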

Please tell us about your environment

  • Operating System/Version: OSX
  • Language: TypeScript
  • Browser: Chrome

Other information

https://developers.deepgram.com/documentation/features/diarize/

@michaeljolley
Contributor

Thanks for reporting this, @wodka. This has been under discussion internally for the past week. #18 should address the optional speaker property of the wordBase type. I also updated the description of the utterance's speaker property to note that it is the predicted speaker based on all the words in the utterance, rather than a definitive speaker. It is derived without the diarizer's input.

Once that PR is merged and released, I'll close this issue, as the product and engineering teams will handle any future changes to the API and those changes are out of scope for this repository.
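Because the utterance-level speaker is only a prediction, a client that needs per-word accuracy can read the optional word-level `speaker` directly. The thread does not describe how the API derives its predicted speaker, so the sketch below is not that method; it is a hypothetical client-side helper, `dominantSpeaker`, that picks the label attached to the most words, breaks ties in favor of the speaker heard first, and returns `undefined` when no word is labeled. The `Word` shape is an assumption.

```typescript
// Assumed word shape with an optional speaker property.
interface Word {
  word: string;
  start: number;
  end: number;
  speaker?: number; // only populated when diarization is enabled
}

// Hypothetical client-side helper, not Deepgram's algorithm: returns the
// speaker label attached to the most words, or undefined if none are labeled.
function dominantSpeaker(words: Word[]): number | undefined {
  const order: number[] = []; // speakers in order of first appearance
  const counts: { [speaker: number]: number } = {};
  for (const w of words) {
    if (w.speaker === undefined) continue; // unlabeled words do not vote
    if (counts[w.speaker] === undefined) {
      counts[w.speaker] = 0;
      order.push(w.speaker);
    }
    counts[w.speaker] += 1;
  }
  let best: number | undefined = undefined;
  for (const s of order) {
    // Strict ">" keeps the earlier speaker on a tie.
    if (best === undefined || counts[s] > counts[best]) best = s;
  }
  return best;
}

const words: Word[] = [
  { word: "hello", start: 0.0, end: 0.4, speaker: 0 },
  { word: "there", start: 0.4, end: 0.8, speaker: 0 },
  { word: "hi", start: 1.2, end: 1.6, speaker: 1 },
];

console.log(dominantSpeaker(words)); // 0
console.log(dominantSpeaker([{ word: "um", start: 0, end: 0.2 }])); // undefined
```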
