Medium models missing? #141

Closed
coder543 opened this issue May 20, 2024 · 4 comments

@coder543

The WhisperAX example seems to be missing the medium and medium.en models on my iPhone 15 Pro Max, even though it offers a few "large" models in the dropdown. I took a quick look around the code, and couldn't understand why those models were missing. Is this intentional?

As a side note, I did see a small bug in the example app code. When "show timestamps" is disabled, the various Text components over here are still adding a leading space, instead of having that space be a suffix that's part of the timestampText conditional. In the part of the code that deals with copying the transcribed text, it looks like it handles that distinction correctly.
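To illustrate the spacing issue, here is a minimal sketch using plain strings rather than the app's actual SwiftUI `Text` views. The function name `segmentLine`, its parameters, and the `[start --> end]` timestamp format are assumptions made up for this example, not code taken from WhisperAX. The point is only that the separating space belongs inside the timestamp conditional:

```swift
import Foundation

// Hypothetical helper, not part of WhisperAX: builds the display string
// for one transcribed segment. `start` and `end` are in seconds.
func segmentLine(text: String, start: Double, end: Double, showTimestamps: Bool) -> String {
    // Format seconds with two decimal places (assumed display format).
    func fmt(_ seconds: Double) -> String {
        return String(format: "%.2f", seconds)
    }

    // The trailing space is part of the timestamp prefix, so nothing is
    // prepended to the text when timestamps are hidden.
    let timestampText = showTimestamps ? "[\(fmt(start)) --> \(fmt(end))] " : ""
    return timestampText + text
}
```

With `showTimestamps` set to `false`, the result is just the segment text with no leading space; with it set to `true`, the prefix and its separating space appear together.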

Anyways, really cool demo app and I'm happy to see such a comprehensive open source Whisper library for iOS apps!

@atiorh
Contributor

atiorh commented May 21, 2024

Thanks for the note, @coder543! The medium models were hitting an edge case with the Neural Engine that we triaged away for now. Technically, you can still use https://github.com/argmaxinc/whisperkittools to prepare the medium and medium.en model assets and use them with cpuAndGPU compute units without issues. We decided not to fork model availability across compute units, so that the abstraction stays non-leaky and switching between compute units remains seamless.

@coder543
Author

Gotcha, that is unfortunate, since in my extensive testing of other Whisper apps on iPhone, the Medium model is the best one that can realistically run in real time over long durations. But, small is pretty good too, I guess!

@atiorh
Contributor

atiorh commented May 21, 2024

@coder543 Have you noticed large models being too slow? Would be great to get an example audio/video where it falls back in streaming mode on iPhone 12+. We are always looking to improve based on feedback and we can follow up when we improve performance.

@coder543
Author

coder543 commented May 21, 2024

On the 15 Pro Max that I have, the large models run at an RTF of slightly greater than 1, and they’re just slow in general. The medium models are half the size, so they are just about perfect. When I’ve tested things more in Hello Transcribe over the months, the large models are tolerable and seem to barely keep up with real time… but I prefer the balance that medium provides, if I’m running a model on my phone. (On a powerful desktop, the large models are great.)
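For reference, a rough sketch of the RTF arithmetic, assuming the conventional definition of real-time factor as processing time divided by audio duration (so a value above 1 means transcription falls behind the audio). The function names here are hypothetical and are not WhisperKit API:

```swift
// Hypothetical helpers for illustration only, not WhisperKit API.

/// Real-time factor: seconds spent processing per second of audio.
/// Returns nil when the audio duration is not positive.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double? {
    guard audioSeconds > 0 else { return nil }
    return processingSeconds / audioSeconds
}

/// True when processing is at least as fast as the audio plays back,
/// i.e. the real-time factor is at most 1.
func keepsUpWithRealTime(processingSeconds: Double, audioSeconds: Double) -> Bool {
    guard let rtf = realTimeFactor(processingSeconds: processingSeconds,
                                   audioSeconds: audioSeconds) else {
        return false
    }
    return rtf <= 1.0
}
```

Under this definition, a model that needs 45 seconds to transcribe 30 seconds of audio has an RTF of 1.5 and cannot sustain streaming, while one that needs 15 seconds has an RTF of 0.5 and leaves headroom.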

I didn’t spend too much time trying the distil models, but I’ve had mixed feelings about the accuracy of those models in past testing.
