i made (vibe coded) a script that takes large audio and optional source material to make a training zip. along with a batch and some edits to the main app #40

on22s · 2026-05-25T19:59:22Z

on22s
May 25, 2026

It took a while, and it has extensive logging because of constant crashes in the beginning. So far it doesn't crash anymore, aside from my newest feature, batch LLM chunk processing. This was only designed with ROCm in mind because that's what I have.
Made with Claude for most of it, then Google CLI, and finished with Qwen.

The README has what I have done and how it was prepared.

README.md

alexandria_preparer_rocm_compatible.py
alexandria_compare.py
alexandria_alignment.py
alexandria_batch_processor.py
PREPARER_GUIDE.md
BATCH_PROCESSOR_GUIDE.md
COMPARE_GUIDE.md
llm_enricher.py
download_model.py

on22s · 2026-05-25T22:22:01Z

on22s
May 25, 2026
Author

Here are the latest files after some bug testing of my latest feature. Ideally it should be fixed now.

alexandria_preparer_rocm_compatible.py
alexandria_compare.py
alexandria_alignment.py
alexandria_batch_processor.py
PREPARER_GUIDE.md
BATCH_PROCESSOR_GUIDE.md
COMPARE_GUIDE.md
README.md
llm_enricher.py
download_model.py

0 replies

on22s · 2026-05-26T01:41:31Z

on22s
May 26, 2026
Author

last bug fixes for today
alexandria_preparer_rocm_compatible.py
README_CHANGES.md
requirements.txt

0 replies

Finrandojin · 2026-05-26T12:39:17Z

Finrandojin
May 26, 2026
Maintainer

Thanks for sharing this, I can see a lot of work went into the tooling.

Looking at the pipeline (audiobook audio in → ASR → ebook alignment → voice training dataset out), I'm struggling to identify a use case for this that doesn't involve copyright infringement. The workflow is built around ingesting a published audiobook recording and its corresponding ebook text to extract a training dataset that clones the narrator's voice. That raises issues on multiple levels:

- Copyright on the audio recording (owned by the publisher/producer)
- Copyright on the narrator's performance (their creative work)
- Right of publicity, cloning a real person's voice without their consent
- Copyright on the book text used for alignment

There are edge cases where this could be legitimate (processing your own recordings, public domain LibriVox narrations with narrator consent, content you've explicitly licensed), but the tool isn't framed around those cases; the documentation describes working with published audiobooks and ebook source material.

For context, the use cases I'm building Alexandria around are:

1. Authors producing audiobooks of their own works, writers who lack the budget or access to hire narrators
2. Personal format-shifting, ebook/web-fiction readers converting text they already own into audiobook format for personal listening
3. Public domain works, audiobook conversion of copyright-free texts
4. Original creative projects, game dialogue, podcasts, and other content where users generate their own scripts

These all start from text the user has the right to use, with TTS voices that don't clone a real person's performance without consent. The voice cloning and LoRA training features in Alexandria are designed around short user-provided samples or the built-in Voice Designer, not extraction from existing commercial recordings.

That said, there's a core piece of this that could be genuinely useful if reframed. A lot of the hard work here (ASR transcription, chunking at natural boundaries, metadata generation) could be repurposed as a generic training dataset builder: feed it a large WAV file or a collection of WAV files you have the rights to (your own recordings, Creative Commons narrations, permissively licensed audio, etc.), and it transcribes and packages them into a metadata.jsonl + segmented WAVs ready for LoRA training. That's a much simpler reframing than a full overhaul: drop the ebook alignment step and the assumption of commercial audiobook input, and you'd have a tool that helps Alexandria users build voice training datasets from legitimately sourced audio. That's something I'd be interested in exploring.
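To make the packaging step concrete, here is a minimal sketch of what "segmented WAVs + metadata.jsonl" could look like, using only the Python standard library. The function name `package_segments`, the `seg_00000.wav` naming scheme, and the `"audio"`/`"text"` keys are assumptions for illustration; they are not Alexandria's actual dataset schema, and the segment times are assumed to come from whatever ASR step runs first.

```python
import json
import wave
from pathlib import Path


def package_segments(src_wav, segments, out_dir):
    """Cut `src_wav` into one WAV per segment and write metadata.jsonl.

    `segments` is a list of (start_sec, end_sec, text) tuples, e.g. from ASR.
    The "audio"/"text" keys are hypothetical, not a confirmed schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    with wave.open(str(src_wav), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end, text) in enumerate(segments):
            # Convert seconds to frame offsets, clamped to the file length.
            first = max(0, int(start * rate))
            last = min(total, int(end * rate))
            if last <= first or not text.strip():
                continue  # skip empty or out-of-range segments
            src.setpos(first)
            frames = src.readframes(last - first)
            name = f"seg_{i:05d}.wav"
            with wave.open(str(out / name), "wb") as dst:
                dst.setparams(params)  # same channels, width, and rate
                dst.writeframes(frames)  # header frame count is patched
            records.append({"audio": name, "text": text.strip()})
    # One JSON object per line, pairing each clip with its transcript.
    with open(out / "metadata.jsonl", "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return records
```

Called as `package_segments("my_recording.wav", [(0.0, 4.2, "First sentence."), (4.2, 9.0, "Second sentence.")], "dataset/")`, this would leave a folder of short clips plus one index file, which a user could then zip for training.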

0 replies

on22s · 2026-05-26T12:49:42Z

on22s
May 26, 2026
Author

When I was designing it I was going for the hardest pipeline so that simpler ones could flow through without issue. How would you like to see it reframed? I can look into modifying it to suit a more open-ended pipeline.

2 replies

Finrandojin May 26, 2026
Maintainer

The way I see it working: Alexandria already has a Dataset Builder tab where users create LoRA training datasets sample-by-sample (define text, generate audio via Voice Designer, review, save). What your tool could add is a second entry point, "Create from Audio", where instead of generating samples, users import their own audio files, your ASR pipeline transcribes and chunks them at natural boundaries, and the results populate the same sample editor table. From there the existing workflow takes over: the user reviews/corrects transcriptions, selects a reference sample, and saves to a training-ready dataset in the same format the Training tab already expects.

Your ASR + chunking logic is the core piece that's missing from Alexandria. The project management, sample editing UI, and save-to-dataset packaging already exist, so the integration would be wiring your transcription/segmentation pipeline into that existing structure rather than building a standalone tool around it.
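As a rough illustration of "chunks them at natural boundaries", the sketch below merges timestamped ASR segments into training-sized chunks. It is not the logic in the posted scripts: the `Segment` type, the `chunk_segments` name, and the thresholds (`max_len`, `min_len`, `min_pause`) are all assumptions chosen for the example. A chunk is closed at a sentence end or a pause once it is long enough, and always before it would exceed the maximum length.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_segments(segments, max_len=15.0, min_len=2.0, min_pause=0.6):
    """Merge ASR segments into chunks that end at natural boundaries.

    All thresholds are illustrative defaults, in seconds.
    """
    chunks = []
    current = None
    for seg in segments:
        text = seg.text.strip()
        if current is None:
            current = Segment(seg.start, seg.end, text)
            continue
        gap = seg.start - current.end
        too_long = seg.end - current.start > max_len
        sentence_end = current.text.endswith((".", "!", "?"))
        long_enough = current.end - current.start >= min_len
        # Close on a hard length limit, or on a sentence end / pause
        # once the chunk has reached the minimum useful duration.
        if too_long or (long_enough and (sentence_end or gap >= min_pause)):
            chunks.append(current)
            current = Segment(seg.start, seg.end, text)
        else:
            current = Segment(current.start, seg.end, f"{current.text} {text}")
    if current is not None:
        chunks.append(current)
    return chunks
```

Each returned chunk carries a start time, end time, and merged text, so it could map onto one row of a sample editor table for the user to review and correct.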

on22s May 26, 2026
Author

I can see if I can do that. It won't be for a few days tho. I kinda burnt myself out making it and I'm out of credits lol.

on22s · 2026-05-26T13:34:37Z

on22s
May 26, 2026
Author

There was also an edit to project.py, described in the readme. Apparently there might be a bug.

0 replies
