[Enhancement]: Add Whisper support #1723

apiweb · 2023-04-25T20:41:12Z

Describe the feature/enhancement

Hi there!

I've been using AudioBookShelf for a while now, and I love the platform. I was thinking about how it could be improved, and I had an idea that I wanted to share with you all.

I think it would be great if AudioBookShelf could integrate with the Whisper speech-to-text model to automatically generate subtitles for audiobooks. Like Tone and ffmpeg, this could be an external tool that the user could enable or disable as needed.

With Whisper, it would be possible to transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. It would make it easier for people who are hard of hearing or have difficulty understanding accents to enjoy audiobooks.

Here are some tips on how to integrate this feature into the AudioBookShelf flow:

- Use the metadata language tag to set the Whisper language automatically.
- Automatically save the .srt file using the Title Folder Naming structure.
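Both tips are small pieces of glue code. The Python below is a minimal sketch, not Audiobookshelf code: `whisper_language_from_tag` and `srt_path_for` are hypothetical helper names, the alias table is a small illustrative subset, and it assumes the subtitle should sit next to the audio file in the same title folder with the same base name. Returning `None` leaves the language unset so Whisper can fall back to its own language detection.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, illustrative alias table: a few metadata values mapped to the
# two-letter codes (such as "en") that Whisper's language option accepts.
_LANGUAGE_ALIASES = {
    "english": "en", "eng": "en",
    "german": "de", "deu": "de", "ger": "de",
    "french": "fr", "fra": "fr", "fre": "fr",
    "spanish": "es", "spa": "es",
}


def whisper_language_from_tag(tag: Optional[str]) -> Optional[str]:
    """Map a metadata language value ("en-US", "eng", "English") to a
    two-letter code, or None to let Whisper auto-detect the language."""
    if not tag:
        return None
    value = tag.strip().lower()
    if value in _LANGUAGE_ALIASES:
        return _LANGUAGE_ALIASES[value]
    # "en-US" / "en_GB" -> primary subtag "en"
    primary = value.replace("_", "-").split("-")[0]
    if len(primary) == 2 and primary.isalpha():
        return primary
    return _LANGUAGE_ALIASES.get(primary)


def srt_path_for(audio_file: Path) -> Path:
    """Assumed layout: the subtitle sits in the same title folder as the audio,
    with the same base name (Author/Title/Title.m4b -> Author/Title/Title.srt)."""
    return audio_file.with_suffix(".srt")
```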

I hope you will consider this suggestion for future updates to AudioBookShelf. Let me know if you have any questions or concerns.

Thank you for all your hard work making AudioBookShelf a great platform!

advplyr · 2023-04-26T22:40:19Z

Maybe we first add support for an srt subtitle file.

The opposite of this was also requested for ebooks #601
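Writing an .srt file is simple once a transcription exists. A minimal sketch, assuming the segments are dictionaries with `start` and `end` (in seconds) and `text` keys, which is the shape of the segments in the result of openai-whisper's `transcribe()`; `srt_timestamp` and `segments_to_srt` are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments with 'start', 'end' and 'text' keys
    (assumed to match the segment dictionaries Whisper produces)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Joining with "\n" leaves a blank line between consecutive cues.
    return "\n".join(blocks)
```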

tehguitarist · 2023-05-04T20:30:34Z

I guess a natural extension (and this WOULD make audiobookshelf an Audible killer) would be Whispersync-like syncing with ebooks. I guess that would just mean shifting the bookmark position of the ebook or audiobook whenever one or the other is progressed.

There isn't a lot of existing work on this specifically, but a quick Google search turned up https://github.com/readbeyond/aeneas/ and https://github.com/r4victor/syncabook.

Any library that can match up audio against a text file would do. Obviously there's a bunch of work in finding the start of the chapter and the start of the audiobook and matching them up, but that's more of a pipe dream. Aeneas probably shows the most reasonable promise, but obvious differences between the formats (any preamble by narrators, or a table of contents in ebooks) would all be factors. Without digging into the libraries, as long as there was enough error handling to wait until both files had matches (and to skip over extras in one or the other), it may go quite smoothly.
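Once a forced aligner has produced a sync map, keeping the two positions in step is mostly a lookup in each direction. A minimal sketch, assuming a sorted list of fragments that each carry a begin/end time in seconds and the id of the matched ebook passage (roughly the kind of sync map aeneas produces); `Fragment`, `passage_at` and `audio_position_for` are hypothetical names. Unmatched extras simply produce no result, so the other side's bookmark is left alone.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Fragment:
    begin: float  # seconds into the audio
    end: float
    text_id: str  # id of the matched ebook passage (assumed sync-map field)


def passage_at(fragments: Sequence[Fragment], position: float) -> Optional[str]:
    """Ebook passage being narrated at `position` (fragments sorted by begin).

    Returns None in gaps with no match, e.g. a narrator's preamble or
    credits, so the ebook position is simply left where it is."""
    begins = [f.begin for f in fragments]
    i = bisect_right(begins, position) - 1
    if i >= 0 and position < fragments[i].end:
        return fragments[i].text_id
    return None


def audio_position_for(fragments: Sequence[Fragment], text_id: str) -> Optional[float]:
    """Reverse direction: where to resume the audio for an ebook passage.

    Returns None for passages with no audio match (e.g. a table of contents)."""
    for fragment in fragments:
        if fragment.text_id == text_id:
            return fragment.begin
    return None
```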

advplyr · 2023-05-04T21:08:52Z

There is also this issue #189

damajor · 2023-05-05T00:47:08Z

I tested Whisper on my setup, and the results are kind of good but far from perfect.

Using the base model, it took me around 4 minutes to transcribe a 1-hour audiobook.
Using the large model, it took me a bit more than 1 hour to transcribe a 1-hour audiobook.

Those tests were done on a 5950X with 12 parallel threads (no GPU involved).
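To put those numbers in perspective: the base model ran about 60 / 4 = 15 times faster than real time, while the large model ran at roughly real time. The sketch below (hypothetical helper names) turns that into rough estimates, assuming processing time scales linearly with audio length, which is only an approximation for CPU-only runs like these.

```python
def speedup(audio_minutes: float, processing_minutes: float) -> float:
    """How many times faster than real time a transcription ran."""
    return audio_minutes / processing_minutes


def estimated_processing_hours(audio_hours: float, speed: float) -> float:
    """Rough processing time, assuming it scales linearly with audio length."""
    return audio_hours / speed


# Figures from the test above (5950X, 12 threads, CPU only).
base_speed = speedup(60, 4)    # about 15x real time
large_speed = speedup(60, 60)  # roughly real time ("a bit more than 1 hour")

print(f"10 h audiobook, base:  {estimated_processing_hours(10, base_speed) * 60:.0f} min")
print(f"10 h audiobook, large: {estimated_processing_hours(10, large_speed):.0f} h or more")
```

On that machine, a 10-hour audiobook would take about 40 minutes with the base model and a bit more than 10 hours with the large model.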

turnercore · 2023-09-13T15:57:00Z

I think Whisper (or some kind of speech-to-text) integration could be really nice for transcribing audiobooks if people wanted subtitles. It would be a nice accessibility feature for someone listening in a second language, for example.

Ed1ks · 2023-10-31T08:19:34Z

👍 Thumbs up. This would be very nice for self-recorded audio files.

rounakdatta · 2023-11-02T15:02:36Z

Should we consider working on this? I can volunteer to start contributing.
Currently Snipd is the only app that gets the AI-generated subtitle UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

advplyr · 2023-11-02T15:27:17Z

Yeah, I'm interested in this, but I can't put much attention towards it now. We were talking about it in Discord the other day. If anyone wants to start putting something together or set up a proof of concept, that would be great. We can chat about it in Discord.

yuchen-lea · 2023-11-21T21:06:58Z

> Maybe we first add support for an srt subtitle file.
>
> The opposite of this was also requested for ebooks #601

@advplyr I agree with what you said about adding support for SRT subtitle files first. I have now used Whisper to generate corresponding subtitles for my local podcasts. On my computer, I can search and view them. Displaying subtitles while playing on the ABS mobile app is the final piece of the jigsaw.

I think these two things can share the same UI: LRC files #817 (external LRC files with the same name as the audio file or ID3 information embedded in the audio file) and SRT files #2257 (external SRT files with the same name as the audio file).
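Sharing one UI mostly means normalizing both formats into the same list of timed cues. A minimal sketch covering only the external sidecar files (not lyrics embedded in ID3 tags), with hypothetical names; it handles the common `[mm:ss.xx]text` LRC line and the standard SRT block layout, and does not handle LRC lines that carry several timestamps.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds from the start of the audio file
    text: str


# [mm:ss] or [mm:ss.xx] followed by the lyric/subtitle text.
_LRC_LINE = re.compile(r"^\[(\d+):(\d{2}(?:\.\d+)?)\](.*)$")
# Start time of an SRT cue: HH:MM:SS,mmm --> ...
_SRT_START = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def parse_lrc(content: str) -> List[Cue]:
    cues = []
    for line in content.splitlines():
        match = _LRC_LINE.match(line.strip())
        if match:  # tag lines such as [ar:Artist] do not match and are skipped
            minutes, seconds, text = match.groups()
            cues.append(Cue(int(minutes) * 60 + float(seconds), text.strip()))
    return cues


def parse_srt(content: str) -> List[Cue]:
    cues = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _SRT_START.search(line)
            if match:
                h, m, s, ms = (int(g) for g in match.groups())
                text = " ".join(t.strip() for t in lines[i + 1:])
                cues.append(Cue(h * 3600 + m * 60 + s + ms / 1000, text))
                break
    return cues
```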

iamhenry · 2024-04-19T21:00:55Z

> Should we consider working on this? I can volunteer to start contributing. Currently Snipd is the only app which gets AI-generated subtitles UX perfectly right, and this could be our chance to make Audiobookshelf the FOSS alternative to it.

This is exactly what would be great in ABS. I pay for this feature in Snipd, and now it's hard to take notes on audiobooks without it.

turnercore · 2024-04-20T21:37:31Z

I agree, and it isn't hard to generate the .srt files from audio now. Maybe there should be a branch to work on this. I'd propose doing it in this order:

1. Get ABS to recognize .srt files next to audio files and display them in the UI in some way, like Snipd does (I agree they do a great job with the UI).
2. Add an .srt upload option for the files in the UI.
3. Create a function that can use a Whisper URL to transcribe the files automatically, if set up.
4. Add settings for the Whisper API URL, options to auto-transcribe new files, and a transcribe button.
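The first step, recognizing subtitle files next to audio files, could work by matching base names. A sketch with hypothetical names, assuming .srt is preferred over .vtt and .lrc when several exist, and matching the extension case-insensitively so that `Chapter 01.SRT` is also picked up.

```python
from pathlib import Path
from typing import Optional

# Hypothetical priority order for sidecar subtitle formats.
SUBTITLE_EXTENSIONS = (".srt", ".vtt", ".lrc")


def find_sidecar_subtitle(audio_file: Path) -> Optional[Path]:
    """Return a subtitle file that sits next to `audio_file` and shares its
    base name (Chapter 01.mp3 -> Chapter 01.srt), or None if there is none."""
    matches = {}
    for entry in audio_file.parent.iterdir():
        if entry.is_file() and entry.stem == audio_file.stem:
            # Key by lower-cased extension so ".SRT" and ".srt" are equivalent.
            matches[entry.suffix.lower()] = entry
    for ext in SUBTITLE_EXTENSIONS:
        if ext in matches:
            return matches[ext]
    return None
```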

Honestly, as a further extension I would LOVE it if you could make audio clips like Snipd that could be exported to Obsidian or something, but I think having the ability and UI set up for transcriptions would be the first hurdle for that. After that, we could add audio clips on the back button like Snipd does.
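Once transcripts exist, a clip mostly needs the text around the current playback position. A minimal sketch assuming cues with start/end times in seconds; `TimedCue`, `clip_transcript`, `clip_note`, the 60-second default window and the Markdown note layout are all illustrative choices, not taken from Snipd or from any Obsidian integration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedCue:
    start: float  # seconds
    end: float
    text: str


def clip_transcript(cues: List[TimedCue], position: float, length: float = 60.0) -> str:
    """Join the text of every cue overlapping [position - length, position].

    The 60-second default is an arbitrary, illustrative clip length."""
    window_start = max(0.0, position - length)
    return " ".join(c.text for c in cues if c.end > window_start and c.start < position)


def clip_note(title: str, position: float, text: str) -> str:
    """Format a clip as a Markdown quote plus an h:mm:ss timestamp."""
    hours, rem = divmod(int(position), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"> {text}\n\n{title} @ {hours}:{minutes:02d}:{seconds:02d}"
```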

barolo · 2024-04-27T10:43:15Z

Just FYI, there are several implementations of Whisper specifically tailored to subtitle generation. This one, for example, https://github.com/jianfch/stable-ts, can generate not only .srt but also SSA/ASS karaoke-style subs (meaning that the currently spoken word is highlighted), bringing us even closer to Snipd. In my experience, the base and small models are enough with it.
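For the karaoke style, ASS puts a `{\k<duration>}` override tag, with the duration in centiseconds, in front of each syllable or word. A minimal sketch that builds such a line from word timings; it is not how stable-ts does it internally, the helper names are hypothetical, and it assumes the standard v4+ `[Events]` Format line (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text). Any gap before the next word is folded into the current word's duration.

```python
from typing import List, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, word)


def karaoke_text(words: List[Word]) -> str:
    """Text field with a {\\k<centiseconds>} tag before every word."""
    parts = []
    for i, (start, end, word) in enumerate(words):
        # Fold the pause before the next word into this word's highlight time.
        next_start = words[i + 1][0] if i + 1 < len(words) else end
        duration_cs = round((next_start - start) * 100)
        parts.append(f"{{\\k{duration_cs}}}{word}")
    return " ".join(parts)


def ass_time(seconds: float) -> str:
    """ASS timestamps use H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def dialogue_line(words: List[Word], style: str = "Default") -> str:
    """One Dialogue event, assuming the standard v4+ [Events] Format line."""
    start, end = words[0][0], words[-1][1]
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{karaoke_text(words)}"
```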

turnercore · 2024-05-01T16:45:30Z

> just FYI there are several implementations of whisper specifically tailored to subtitle generation. This one for example https://github.com/jianfch/stable-ts can not only generate srt, but also ssa/ass karaoke style subs [meaning that the current spoken word is highlighted] bringing us even closer to snipd. From my experience base and small models are enough with it.

Thanks for the heads-up. Hopefully I can get some time to work on a PR for this type of thing. I haven't contributed yet, though, so I imagine it will take me a bit to get familiar with the code base and what needs to be updated for this kind of feature.

iamhenry · 2024-05-02T04:21:52Z

@turnercore that would be awesome!

apiweb added the enhancement (New feature or request) label Apr 25, 2023

dadino mentioned this issue Apr 5, 2024 in [Enhancement]: generate images from text and show them while playing the book #2824 (Open)

iamhenry mentioned this issue Apr 19, 2024 in Request: Bookmark Summary rasmuslos/ShelfPlayer#80 (Open)

This was referenced May 4, 2024 in:

WIP: Adding Transcription/Subtitle Viewing Support to the Web Player (VTT) #2918 (Draft)

[Enhancement]: Adding Transcription/Subtitle Viewing Support #2919 (Open)

advplyr mentioned this issue Jun 17, 2024 in [Enhancement]: Enable EPUB 3 Support in Audiobookshelf for Synced Audiobooks and Ebooks #3084 (Open)
