Segment order regression since 10mb chunking #163

Closed
iandundas opened this issue Jun 11, 2024 · 3 comments · Fixed by #183
Labels
bug Something isn't working

Comments

@iandundas
Contributor

iandundas commented Jun 11, 2024

Since #158 was merged, we're seeing segments being delivered in the wrong order, including in the example app.

Before (correct): 8fcfadb
After (incorrect): 25a0749
[screenshots for each commit]

Settings:

[screenshot of settings]

Sample file: http://172.104.253.215/atp-7-min-clip.m4a

Full transcripts:

Full correct transcript
Full incorrect transcript

@iandundas changed the title from "Segement order regression since 10mb chunking" to "Segment order regression since 10mb chunking" on Jun 11, 2024
@ZachNagengast
Contributor

Looking at this shortly. Do you have any sense of which parts specifically changed between the two commits? That might give a clue.

@iandundas
Contributor Author

iandundas commented Jun 13, 2024

I don't have a great handle on it; the output seems completely reordered, and some segments are missing.

For example, in the correct transcription the word "Easter" occurs once:

[WhisperKit] [Segment 115] [474.04 --> 476.70] So, you know Easter just happened in.

Whilst in the bad transcription it appears four times:

[screenshot: CleanShot 2024-06-13 at 12 49 36@2x]

Meanwhile, the first line of the good transcription contains:

[WhisperKit] [Segment 0] [0.00 --> 30.00] Do you have also just finishing listening to the hot pockets episode?

whilst this doesn't appear in the bad transcription at all.

@iandundas
Contributor Author

good.txt
bad.txt
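
For anyone wanting to check the symptoms above mechanically, here is a minimal Python sketch that parses log lines shaped like the `[WhisperKit] [Segment N] [start --> end] text` lines quoted in this thread, then reports segments delivered out of order and segments present in one transcript but missing from the other. It is not part of WhisperKit. The helper names (`parse_segments`, `out_of_order`, `missing_texts`) are hypothetical, and it assumes the attached good.txt and bad.txt use the same line format as the quoted examples.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    index: int
    start: float
    end: float
    text: str


# Assumed line shape, taken from the lines quoted in this issue:
# [WhisperKit] [Segment 115] [474.04 --> 476.70] So, you know ...
LINE_RE = re.compile(
    r"\[WhisperKit\] \[Segment (\d+)\] \[(\d+\.\d+) --> (\d+\.\d+)\] (.*)"
)


def parse_segments(log: str) -> list[Segment]:
    """Extract segments in the order they appear in the log."""
    segments = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            segments.append(
                Segment(int(m.group(1)), float(m.group(2)),
                        float(m.group(3)), m.group(4).strip())
            )
    return segments


def out_of_order(segments: list[Segment]) -> list[tuple[Segment, Segment]]:
    """Adjacent pairs where the later-delivered segment starts earlier."""
    return [(a, b) for a, b in zip(segments, segments[1:]) if b.start < a.start]


def missing_texts(good: list[Segment], bad: list[Segment]) -> list[str]:
    """Texts that occur in the good transcript but nowhere in the bad one."""
    bad_texts = {s.text for s in bad}
    return [s.text for s in good if s.text not in bad_texts]
```

A possible use, assuming both attachments are saved locally: `good = parse_segments(open("good.txt").read())`, likewise for `bad`, then inspect `out_of_order(bad)` and `missing_texts(good, bad)`. Exact text matching is deliberately strict; a regression that also changes segment boundaries would show up as many "missing" texts, so the ordering check is the more reliable signal.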

@ZachNagengast ZachNagengast added the bug Something isn't working label Jun 18, 2024
@ZachNagengast ZachNagengast linked a pull request Jul 6, 2024 that will close this issue
Projects
Status: Done
2 participants