I made (vibe coded) a script that takes large audio and optional source material to make a training zip, along with a batch processor and some edits to the main app #40
Replies: 5 comments 2 replies
-
Here are the latest files after some bug testing of my latest feature. Ideally it should be fixed now. alexandria_preparer_rocm_compatible.py
-
Last bug fixes for today.
-
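Thanks for sharing this, I can see a lot of work went into the tooling. Looking at the pipeline (audiobook audio in → ASR → ebook alignment → voice training dataset out), I'm struggling to identify a use case for this that doesn't involve copyright infringement. The workflow is built around ingesting a published audiobook recording and its corresponding ebook text to extract a training dataset that clones the narrator's voice. That raises issues on multiple levels:
- Copyright on the audio recording (owned by the publisher/producer)
- Copyright on the narrator's performance (their creative work)
- Right of publicity: cloning a real person's voice without their consent
- Copyright on the book text used for alignment
There are edge cases where this could be legitimate (processing your own recordings, public domain LibriVox narrations with narrator consent, content you've explicitly licensed), but the tool isn't framed around those cases; the documentation describes working with published audiobooks and ebook source material. For context, the use cases I'm building Alexandria around are:
1. Authors producing audiobooks of their own works: writers who lack the budget or access to hire narrators
2. Personal format-shifting: ebook/web-fiction readers converting text they already own into audiobook format for personal listening
3. Public domain works: audiobook conversion of copyright-free texts
4. Original creative projects: game dialogue, podcasts, and other content where users generate their own scripts
These all start from text the user has the right to use, with TTS voices that don't clone a real person's performance without consent. The voice cloning and LoRA training features in Alexandria are designed around short user-provided samples or the built-in Voice Designer, not extraction from existing commercial recordings.
That said, there's a core piece of this that could be genuinely useful if reframed. A lot of the hard work here (ASR transcription, chunking at natural boundaries, metadata generation) could be repurposed as a generic training dataset builder: feed it a large WAV file or a collection of WAV files you have the rights to (your own recordings, Creative Commons narrations, permissively licensed audio, etc.), and it transcribes and packages them into a metadata.jsonl + segmented WAVs ready for LoRA training. That's a much simpler reframing than a full overhaul: drop the ebook alignment step and the assumption of commercial audiobook input, and you'd have a tool that helps Alexandria users build voice training datasets from legitimately sourced audio. That's something I'd be interested in exploring.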
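To make the reframing concrete, here is a rough sketch of what a generic dataset builder could look like, using only the Python standard library. Everything in it is an assumption for illustration: the function names `split_on_silence` and `build_dataset`, the silence thresholds, the 16-bit mono PCM input, and the `metadata.jsonl` field names (`audio`, `text`, `duration_s`) are hypothetical and are not Alexandria's actual schema or the posted script's method. ASR is left as a caller-supplied `transcribe` callable so that no particular model is implied.

```python
import json
import sys
import wave
from array import array
from pathlib import Path


def split_on_silence(samples, rate, threshold=500, min_silence_s=0.3, window_s=0.02):
    """Return (start, end) sample indices of voiced runs separated by silence.

    A window is "silent" when its peak amplitude is below `threshold`
    (an arbitrary illustrative value); a segment is closed once at least
    `min_silence_s` of consecutive silent windows has been seen.
    """
    win = max(1, int(rate * window_s))
    min_silent = max(1, round(min_silence_s / window_s))
    segments, seg_start, voiced_end, silent_run = [], None, 0, 0
    for pos in range(0, len(samples), win):
        window = samples[pos:pos + win]
        if max(abs(s) for s in window) >= threshold:
            if seg_start is None:
                seg_start = pos
            voiced_end = min(pos + win, len(samples))
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent:
                segments.append((seg_start, voiced_end))
                seg_start = None
    if seg_start is not None:
        segments.append((seg_start, voiced_end))
    return segments


def build_dataset(wav_path, out_dir, transcribe):
    """Split one WAV into segments and write metadata.jsonl beside them.

    `transcribe` is any callable taking a segment path and returning text
    (a stand-in for whatever ASR step is actually used).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(wav_path), "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit mono PCM input")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())
    samples = array("h", raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()

    records = []
    for i, (start, end) in enumerate(split_on_silence(samples, rate)):
        name = f"segment_{i:04d}.wav"
        with wave.open(str(out_dir / name), "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(raw[2 * start:2 * end])  # 2 bytes per sample
        records.append({
            "audio": name,
            "text": transcribe(out_dir / name),
            "duration_s": round((end - start) / rate, 3),
        })

    # One JSON object per line; field names are hypothetical.
    with open(out_dir / "metadata.jsonl", "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return records
```

Calling `build_dataset("my_recording.wav", "dataset_out", my_asr)` with your own recording and ASR function would leave a folder of `segment_NNNN.wav` files plus one `metadata.jsonl`. A real tool would also want a maximum segment length and stereo/float handling, which this sketch leaves out.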
-
When I was designing it I was going for the hardest pipeline so that
simpler ones could flow through without issue. How would you like to see it
reframed? I can look into modifying it to suit a more open-ended pipeline.
…On Tue, May 26, 2026, 7:39 AM Finrandojin ***@***.***> wrote:
Thanks for sharing this, I can see a lot of work went into the tooling.
Looking at the pipeline (audiobook audio in → ASR → ebook alignment →
voice training dataset out), I'm struggling to identify a use case for this
that doesn't involve copyright infringement. The workflow is built around
ingesting a published audiobook recording and its corresponding ebook text
to extract a training dataset that clones the narrator's voice. That raises
issues on multiple levels:
- Copyright on the audio recording (owned by the publisher/producer)
- Copyright on the narrator's performance (their creative work)
- Right of publicity, cloning a real person's voice without their
consent
- Copyright on the book text used for alignment
There are edge cases where this could be legitimate (processing your own
recordings, public domain LibriVox narrations with narrator consent,
content you've explicitly licensed), but the tool isn't framed around those
cases, the documentation describes working with published audiobooks and
ebook source material.
For context, the use cases I'm building Alexandria around are:
1. Authors producing audiobooks of their own works, writers who lack
the budget or access to hire narrators
2. Personal format-shifting, ebook/web-fiction readers converting text
they already own into audiobook format for personal listening
3. Public domain works, audiobook conversion of copyright-free texts
4. Original creative projects, game dialogue, podcasts, and other
content where users generate their own scripts
These all start from text the user has the right to use, with TTS voices
that don't clone a real person's performance without consent. The voice
cloning and LoRA training features in Alexandria are designed around short
user-provided samples or the built-in Voice Designer, not extraction from
existing commercial recordings.
That said, there's a core piece of this that could be genuinely useful if
reframed. A lot of the hard work here (ASR transcription, chunking at
natural boundaries, metadata generation) could be repurposed as a generic
training dataset builder: feed it a large WAV file or a collection of WAV
files you have the rights to (your own recordings, Creative Commons
narrations, permissively licensed audio, etc.), and it transcribes and
packages them into a metadata.jsonl + segmented WAVs ready for LoRA training.
That's a much simpler reframing than a full overhaul, drop the ebook
alignment step and the assumption of commercial audiobook input, and you'd
have a tool that helps Alexandria users build voice training datasets from
legitimately sourced audio. That's something I'd be interested in exploring.
-
There was also an edit to project.py, noted in the README. Apparently there might be a bug.
-
It took a while, and it has extensive logging because of constant crashes in the beginning. So far it doesn't crash anymore, aside from my newest feature, batch LLM chunk processing. It was designed only with ROCm in mind because that is what I have.
Made with Claude for most of it, then Google CLI, and finished with QWEN.
The README covers what I have done and how it was prepared.
README.md
alexandria_preparer_rocm_compatible.py
alexandria_compare.py
alexandria_alignment.py
alexandria_batch_processor.py
PREPARER_GUIDE.md
BATCH_PROCESSOR_GUIDE.md
COMPARE_GUIDE.md
llm_enricher.py
download_model.py