Describe the feature/enhancement
This is a crazy idea and we're probably a few years away from being able to do it on our local machines. Treat this request more as a conversation.
When something like this is implemented to generate subtitles from the audio files, we could generate images for each page/minute/chapter/scene/whatever and show them in the player when listening to an audiobook.
When an audiobook is imported and its subtitles are generated, queue image generation using a predefined prompt (something like "an illustration for a book of this scene: %s") that the user could also change (maybe I want a specific style for the illustrations).
Since image generation prompts are usually short, I guess we should give all the previously read text to an LLM and ask it to "create a short prompt of the current scene/chapter to use in an image generator model, from this text: %s".
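To make the idea concrete, here is a minimal sketch of the prompt-building step. Everything here is hypothetical (the template text, the `max_chars` limit, the function name); it just shows the shape: keep only the most recent slice of the transcript so it fits in an LLM context window, then fill the user-configurable `%s` template before sending it to whatever LLM backend is in use.

```python
DEFAULT_TEMPLATE = (
    "Create a short prompt of the current scene/chapter to use in an "
    "image generator model, from this text: %s"
)

def build_image_prompt_request(transcript: str,
                               template: str = DEFAULT_TEMPLATE,
                               max_chars: int = 8000) -> str:
    """Build the instruction sent to the LLM.

    Only the tail of the accumulated transcript is kept, on the
    assumption that the most recent text describes the current scene.
    """
    recent = transcript[-max_chars:]
    return template % recent
```

The user's custom prompt (with its own style instructions) would simply replace `template` here.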
The file structure would be pretty simple: a folder with images and a text file with timestamps (like an SRT). This could spark a community-generated library of "book illustrations". It could also mean that ABS could just be a consumer of this format, not the creator.
Why?
I listen to audiobooks in small chunks of 5-15 minutes, and having some context when I start a session would be great.
I also love cool images, and having illustrations for spaceship battles while listening to Expeditionary Force, changing with every battle, switching to some tacticool Ruhar soldier when boots hit the ground or some crazy plan takes place on an alien planet, would throw me into the book right away.
The Reddit post that sparked this idea for me