[Enhancement]: Add Whisper support #1723

Open
apiweb opened this issue Apr 25, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@apiweb

apiweb commented Apr 25, 2023

Describe the feature/enhancement

Hi there!

I've been using AudioBookShelf for a while now, and I love the platform. I was thinking about how it could be improved, and I had an idea that I wanted to share with you all.

I think it would be great if AudioBookShelf could integrate with the Whisper speech-to-text model to automatically generate subtitles for audiobooks. This could be an external tool, like Tone and ffmpeg, that the user could enable or disable as needed.

With Whisper, it would be possible to transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. It would make it easier for people who are hard of hearing or have difficulty understanding accents to enjoy audiobooks.

Here are some tips on how to integrate this feature into the AudioBookShelf flow:

  • Use the metadata language tag to automatically set the Whisper language.
  • Automatically save the SRT file using the Title Folder Naming structure.
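The two tips above could be sketched roughly as follows. This is only an illustration, not how AudioBookShelf works today: `srt_path_for` and `whisper_command` are hypothetical helper names, and it assumes the openai-whisper command-line tool is installed and that the library's language value is already in a form Whisper accepts (for example `en`). Only the `--model`, `--language`, `--output_format` and `--output_dir` flags of that CLI are used.

```python
from pathlib import Path
from typing import List, Optional


def srt_path_for(audio_file: Path) -> Path:
    """Sidecar subtitle path: same folder and base name as the audio file."""
    return audio_file.with_suffix(".srt")


def whisper_command(audio_file: Path, language: Optional[str],
                    model: str = "base") -> List[str]:
    """Build an openai-whisper CLI call that writes an SRT next to the audio."""
    cmd = [
        "whisper", str(audio_file),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(audio_file.parent),
    ]
    if language:
        # Assumption: the metadata value is something Whisper accepts, e.g. "en".
        # Without the flag, Whisper tries to detect the language itself.
        cmd += ["--language", language]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; the CLI names its output after the audio file's base name, so the subtitle should end up at `srt_path_for(audio_file)`.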

I hope you will consider this suggestion for future updates to AudioBookShelf. Let me know if you have any questions or concerns.

Thank you for all your hard work making AudioBookShelf a great platform!

@apiweb apiweb added the enhancement New feature or request label Apr 25, 2023
@advplyr
Owner

advplyr commented Apr 26, 2023

Maybe we first add support for an srt subtitle file.

The opposite of this was also requested for ebooks #601

@tehguitarist

I guess a natural extension (and this WOULD make audiobookshelf an Audible killer) would be Whispersync-like syncing with ebooks. I guess that would just mean shifting the bookmark position of the ebook/audiobook whenever one or the other is progressed.

There isn't a lot of existing work on this specifically, but here is some from a quick Google search: https://github.com/readbeyond/aeneas/ and https://github.com/r4victor/syncabook.

Any library that can match audio against a text file would do, though obviously there's a bunch of work in finding the start of the chapter and the start of the audiobook and matching them up, so that's more of a pipe dream. aeneas probably shows the most promise, but the obvious differences between the formats (any preamble by narrators, or a table of contents in ebooks) would all be factors. Without digging into the libraries: as long as there was enough error handling to wait until both files had matches (and skip over extras in one or the other), it may go quite smoothly.
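To make the "wait until both sides match, skip the extras" idea concrete, here is a minimal sketch using Python's standard `difflib`. It is not what aeneas or syncabook do internally (those work on the audio itself); it assumes a text transcript of the audiobook already exists, and `words` and `matching_runs` are hypothetical names chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def words(text: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matching_runs(transcript: str, ebook: str,
                  min_words: int = 3) -> List[Tuple[int, int, int]]:
    """Return (transcript_index, ebook_index, length) for shared word runs.

    Material that exists on only one side (a narrator preamble, a table of
    contents) simply falls between the runs and is skipped.
    """
    a, b = words(transcript), words(ebook)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size >= min_words]
```

Each run gives a pair of anchor points that a bookmark could be mapped through. Comparing word lists this way gets slow on long texts, so a real implementation would probably work chapter by chapter.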

@advplyr
Owner

advplyr commented May 4, 2023

There is also this issue #189

@damajor

damajor commented May 5, 2023

I tested Whisper on my setup and results are kind of good but far from perfect.

Using the base model, it took me around 4 minutes to transcribe a 1-hour audiobook.
Using the large model, it took me a bit more than 1 hour to transcribe a 1-hour audiobook.

Those tests were done on a 5950X with 12 parallel threads (no GPU involved).
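As a rough back-of-the-envelope check, the timings above can be turned into a real-time factor and scaled to a longer book. The 10-hour book length is a made-up example, and the large-model figure is rounded down to exactly 1 hour, so this is only an estimate for that one CPU setup.

```python
def realtime_factor(processing_minutes: float, audio_minutes: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_minutes / audio_minutes


def estimate_hours(audio_hours: float, rtf: float) -> float:
    """Expected transcription time for a book of the given length."""
    return audio_hours * rtf


base_rtf = realtime_factor(4, 60)    # base model: about 4 min per hour of audio
large_rtf = realtime_factor(60, 60)  # large model: at least 1 hour per hour

for name, rtf in (("base", base_rtf), ("large", large_rtf)):
    print(f"{name}: ~{estimate_hours(10, rtf):.1f} h for a 10-hour audiobook")
```

On those numbers, a 10-hour audiobook would take roughly 40 minutes with the base model and at least 10 hours with the large one.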

@turnercore

I think Whisper (or some kind of speech-to-text) integration could be really nice for transcribing audiobooks if people wanted subtitles. It would be a nice accessibility feature for someone listening in a second language, for example.

@Ed1ks

Ed1ks commented Oct 31, 2023

👍 Thumbs up. This would be very nice for self-recorded audio files.

@rounakdatta

Should we consider working on this? I can volunteer to start contributing.
Currently Snipd is the only app which gets AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

@advplyr
Owner

advplyr commented Nov 2, 2023

Yeah, I'm interested in this, but I can't put much attention towards it now. We were talking about it in Discord the other day. If anyone wants to start putting something together or set up a proof of concept, that would be great. We can chat about it in Discord.

@yuchen-lea

yuchen-lea commented Nov 21, 2023

> Maybe we first add support for an srt subtitle file.
>
> The opposite of this was also requested for ebooks #601

@advplyr I agree with what you said about adding support for SRT subtitle files first. I have now used Whisper to generate corresponding subtitles for my local podcasts. On my computer, I can search and view them. Displaying subtitles while playing on the ABS mobile app is the final piece of the jigsaw.

I think these two things can share the same UI: LRC files #817 (external LRC files with the same name as the audio file or ID3 information embedded in the audio file) and SRT files #2257 (external SRT files with the same name as the audio file).
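The "same name as the audio file" convention could be checked with something like the sketch below. `find_sidecar` and `SIDECAR_EXTENSIONS` are hypothetical names, the preference for SRT over LRC is an arbitrary assumption, and embedded ID3 lyrics are not covered.

```python
from pathlib import Path
from typing import Optional

# Checked in order; preferring SRT over LRC is an assumption, not a project decision.
SIDECAR_EXTENSIONS = (".srt", ".lrc")


def find_sidecar(audio_file: Path) -> Optional[Path]:
    """Return a subtitle/lyrics file sitting next to the audio file, if any."""
    for ext in SIDECAR_EXTENSIONS:
        candidate = audio_file.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None
```

For `episode.mp3` this looks for `episode.srt` and then `episode.lrc` in the same folder. Upper-case extensions such as `.SRT` would need extra handling on case-sensitive file systems.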

@iamhenry

> Should we consider working on this? I can volunteer to start contributing. Currently Snipd is the only app which gets AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

This is exactly what would be great with ABS. I have the paid feature for Snipd, and now it's hard to take notes on audiobooks without it.

@turnercore

I agree, and it isn't hard to generate the .srt files from audio now. Maybe there should be a branch to work on this. I'd propose doing it in this order:

  1. Getting ABS to recognize .srt files next to audio files and displaying that in the UI in some way, like Snipd (I agree they do a great job with the UI).
  2. Adding an .srt upload option for the files in the UI.
  3. Creating a function that can use a Whisper URL to transcribe the files automatically, if set up.
  4. Adding settings for the Whisper API URL, options to auto-transcribe new files, and a transcribe button.
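For step 1, the core of "displaying it in the UI" is parsing the .srt file and finding the cue that belongs to the current playback position. A minimal sketch under stated assumptions: `Cue`, `parse_srt` and `cue_at` are hypothetical names, only plain SRT timing lines are handled, and this is Python for illustration rather than code from the ABS code base.

```python
import re
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues sorted by start time."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return sorted(cues)


def cue_at(cues: List[Cue], position: float) -> Optional[Cue]:
    """Return the cue covering the playback position (seconds), or None."""
    starts = [c.start for c in cues]
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i].end:
        return cues[i]
    return None
```

A player would call `cue_at(cues, current_time)` on each time update and show the returned text, or nothing during gaps between cues. Rebuilding `starts` on every call keeps the sketch short; a real player would cache it.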

Honestly as a further extension I would LOVE if you could do audio-clips like Snipd that could export to Obsidian or something, but I think having the ability and UI set up for transcriptions would be the first hurdle for that. We could add audio clips on back button like Snipd does after that.

@barolo

barolo commented Apr 27, 2024

Just FYI, there are several implementations of Whisper specifically tailored to subtitle generation. This one, for example, https://github.com/jianfch/stable-ts, can generate not only srt but also ssa/ass karaoke-style subs (meaning that the currently spoken word is highlighted), bringing us even closer to Snipd. In my experience, the base and small models are enough with it.

@turnercore

> just FYI there are several implementations of whisper specifically tailored to subtitle generation. This one for example https://github.com/jianfch/stable-ts can not only generate srt, but also ssa/ass karaoke style subs [meaning that the current spoken word is highlighted] bringing us even closer to snipd. From my experience base and small models are enough with it.

Thanks for the heads up. Hopefully I can get some time to work on a PR for this type of thing. I haven't contributed yet, though, so I imagine it will take me a bit to get familiar with the code base and what needs to be updated for this kind of feature.

@iamhenry

iamhenry commented May 2, 2024

@turnercore that would be awesome!

10 participants