How hard would it be to integrate openai whisper to this part of the stack? #72

lahwran · 2022-09-23T23:01:30Z

Would this be an appropriate place to integrate OpenAI Whisper for handling longer dictation? Note that as of right now Whisper doesn't support streaming, and it looks to me like adding streaming support may require transfer learning.

(I've also asked this question in other repos, including Caster, and I'm cross-linking between the questions.)

lahwran · 2022-09-25T17:27:15Z

This was answered on the Caster repo: it looks like the correct level would be to write a module that depends on Whisper and provides a dragonfly API. The Whisper implementation will likely need to reuse components from KAG; in particular, Whisper doesn't seem to have VAD or any sort of guided decoding. If anyone here can point me to guidance on how to extract the guided decoding code from dragonfly/KAG, it may save me time; if not, I may figure it out myself.

1 reply

lahwran Sep 26, 2022
Author

https://github.com/openai/whisper/blob/5d8d3e75a4826fe5f01205d81c3017a805fc2bf9/whisper/model.py#L223 Looks like it should just be a matter of masking unwanted decodings, i.e. rejecting logits that the grammar disallows. A question I still have from my initial read-through is how to get the context of previous iterations, because presumably the model is invoked autoregressively? I'll look more later. Taking notes here so it's public somewhere.
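
To make the masking idea concrete, here is a rough sketch of grammar-constrained greedy decoding. Everything in it is a toy stand-in and an assumption, not Whisper's or KAG's actual API: `VOCAB`, `PHRASES`, `allowed_next`, `mask_logits`, `toy_model`, and `constrained_greedy_decode` are all hypothetical names, and the "grammar" is just a flat list of allowed phrases. It also assumes the decoder is called once per step with the tokens emitted so far, which is how the context of previous iterations gets carried forward here.

```python
import math

# Hypothetical toy vocabulary; Whisper's real tokenizer is not used here.
VOCAB = ["<eot>", "open", "close", "window", "door", "banana"]
TOK = {word: i for i, word in enumerate(VOCAB)}

# Hypothetical grammar: a flat list of allowed phrases as token-id sequences.
PHRASES = [
    [TOK["open"], TOK["window"]],
    [TOK["open"], TOK["door"]],
    [TOK["close"], TOK["door"]],
]


def allowed_next(prefix):
    """Return the set of token ids the grammar permits after `prefix`."""
    allowed = set()
    for phrase in PHRASES:
        if phrase[:len(prefix)] == prefix:
            if len(prefix) < len(phrase):
                allowed.add(phrase[len(prefix)])  # next word of this phrase
            else:
                allowed.add(TOK["<eot>"])  # phrase complete, may stop
    return allowed


def mask_logits(logits, allowed):
    """Set every logit the grammar disallows to -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def toy_model(prefix):
    """Stand-in for one decoder step: one logit per vocabulary entry.

    It ignores `prefix` and deliberately prefers 'banana', so the effect
    of the mask is visible. A real decoder would condition on the prefix
    and on the audio features.
    """
    logits = [0.0] * len(VOCAB)
    logits[TOK["banana"]] = 5.0
    logits[TOK["open"]] = 2.0
    logits[TOK["door"]] = 1.5
    logits[TOK["window"]] = 1.0
    logits[TOK["<eot>"]] = 0.5
    return logits


def constrained_greedy_decode(model, max_len=10):
    """Greedy decoding where each step only considers grammar-legal tokens."""
    prefix = []  # tokens emitted so far; fed back into the model each step
    for _ in range(max_len):
        logits = mask_logits(model(prefix), allowed_next(prefix))
        best = max(range(len(logits)), key=logits.__getitem__)
        if best == TOK["<eot>"]:
            break
        prefix.append(best)
    return [VOCAB[i] for i in prefix]


print(constrained_greedy_decode(toy_model))
```

With a real model the same shape should apply: compute the allowed set from the grammar state, mask the step's logits, then pick or sample from what remains. A real grammar would presumably track state in something richer than a phrase list.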
