Instructions for running the cli version? #140

Closed
jrp2014 opened this issue May 18, 2024 · 3 comments
Labels
bug (Something isn't working), documentation (Improvements or additions to documentation), enhancement (Improves existing code), good first issue (Good for newcomers)

Comments

@jrp2014

jrp2014 commented May 18, 2024

Is the word whisperkit-cli missing from the README?

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --audio-path ~/.cache/whisper/alice.mp3 
Building for debugging...
[1/1] Write swift-version--58304C5D6DBC2206.txt
Build complete! (0.09s)

If I don't include it, I get error: no executable product named 'transcribe'.

Transcription seems to be pretty slow, with no use of the GPU.

The output is a wall of text, with some capitalisation anomalies.

Using the mlx whisper, you can add timestamps to the output, so that if two people are speaking, the transcript starts each change of speaker on a new line. Is the same capability available here?

I'm not sure which MP3 formats are supported. A stereo 44.1 kHz .mp3 file gave me this error: Error when transcribing /Users/xxx.mp3: loadAudioFailed("Unable to resample audio")

I'm not sure whether I'm using the large-v3 for 30s clips, or the one for full length transcripts.

ZachNagengast added the documentation, enhancement, good first issue, and bug labels on May 19, 2024
@ZachNagengast
Contributor

It's mentioned in the readme here: https://github.com/argmaxinc/WhisperKit?tab=readme-ov-file#swift-cli

Did you see somewhere else that had swift run transcribe? We will update if you can point us to it.

Regarding the timestamps, we do have a clipTimestamps parameter in the Swift library, but it's not currently in the CLI. I'm making a note to get that brought over.

The MP3 resample bug you posted is interesting. I haven't seen this error before. Are you able to provide the audio file you used so we can debug?

@jrp2014
Author

jrp2014 commented May 19, 2024

Thanks. The README now seems to be corrected.

I'm sorry that I can't share the mp3. Perhaps the app ran out of memory as the clip is quite long.

@ZachNagengast
Contributor

OK, if you can replicate it with a file you can share, let us know. Memory seems like a good candidate; we'll see if there is a better error message we can give there.
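
For anyone hitting the same loadAudioFailed("Unable to resample audio") error, one way to narrow it down is to convert the file to 16 kHz mono WAV (the sample format Whisper models take as input) before transcribing, and to split it into shorter pieces if memory is the suspect. The sketch below does not come from the maintainers. It assumes ffmpeg is installed, and the function names, the .16k.wav suffix, and the 600-second default are illustrative choices. Nothing in this thread confirms that it avoids the error.

```shell
#!/bin/sh
# Hypothetical helpers: names and defaults are illustrative, not part of WhisperKit.

# Print the output path used for a converted file: foo.mp3 -> foo.16k.wav
wav_path_for() {
    printf '%s.16k.wav\n' "${1%.*}"
}

# Convert any ffmpeg-readable audio file to 16 kHz mono 16-bit PCM WAV.
# On success, prints the path of the new file on stdout (ffmpeg logs go to stderr).
to_whisper_wav() {
    src="$1"
    dst="$(wav_path_for "$src")"
    ffmpeg -nostdin -y -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$dst" \
        && printf '%s\n' "$dst"
}

# Split a WAV file into fixed-length pieces (default 600 s) without re-encoding.
# Pieces are written next to the input as name.part000.wav, name.part001.wav, ...
split_wav() {
    src="$1"
    seconds="${2:-600}"
    ffmpeg -nostdin -y -i "$src" -f segment -segment_time "$seconds" \
        -c copy "${src%.wav}.part%03d.wav"
}

# Example usage (paths taken from the command earlier in this issue):
#   wav="$(to_whisper_wav ~/.cache/whisper/alice.mp3)"
#   swift run whisperkit-cli transcribe \
#       --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" \
#       --audio-path "$wav"
```

If the converted WAV transcribes cleanly while the original MP3 does not, that points at MP3 decoding or resampling. If only the shorter pieces succeed, memory is the more likely cause.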
