TTSGan - A Text To Speech Generative Adversarial Network for Audiobook Revoicing

Currently, the network produces nothing more than stylized gibberish syllables for inputs it was not trained on.

The scripts expect the following files:

source.mp3 as the source audio recording.
transcript.txt as the transcript of that recording.

High level script usage:

convertBulkAudio.py splits the recording into 5-second chunks and saves each chunk as a wav file paired with a mel spectrogram png.
tagger.py plays the chunks in order and lets the user highlight a section of the transcript. Clicking assign renames the chunk's png to match the selected section of text.
train.py uses these renamed pngs to train a GAN that reproduces the audio from a one-hot encoding of the selected transcript section.
convert.py feeds the one-hot encodings back into the saved network to recover the original audio.
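
The one-hot encoding that train.py and convert.py rely on can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the alphabet, the fixed length MAX_LEN, and the function names one_hot_encode and one_hot_decode are all assumptions made for the example.

```python
import numpy as np

# Hypothetical alphabet: the character set actually used by train.py is not documented.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '"
# Assumed fixed number of character slots per 5-second chunk.
MAX_LEN = 64


def one_hot_encode(text, max_len=MAX_LEN, alphabet=ALPHABET):
    """Return a (max_len, len(alphabet)) matrix with one 1.0 per known character."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = np.zeros((max_len, len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(text.lower()[:max_len]):
        if ch in index:  # characters outside the alphabet stay as all-zero rows
            encoded[pos, index[ch]] = 1.0
    return encoded


def one_hot_decode(encoded, alphabet=ALPHABET):
    """Recover the text, skipping all-zero rows (padding or unknown characters)."""
    return "".join(alphabet[int(row.argmax())] for row in encoded if row.any())


if __name__ == "__main__":
    matrix = one_hot_encode("Hello world")
    print(matrix.shape)
    print(one_hot_decode(matrix))
```

A fixed-size matrix keeps every training example the same shape, at the cost of truncating any selection longer than MAX_LEN characters.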

Approximately 20 minutes of tagged audio are needed to saturate the 'literal' capacity of the network and move on to mapping text to speech.
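
To put the 20-minute figure in terms of the 5-second chunks described earlier, the sketch below computes chunk boundaries for a recording of a given length. The 22050 Hz sample rate and the function name chunk_bounds are assumptions for illustration; convertBulkAudio.py may use different values and a different approach.

```python
# Assumed sample rate; the rate actually used by convertBulkAudio.py is not stated.
SAMPLE_RATE = 22050
CHUNK_SECONDS = 5


def chunk_bounds(total_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices for consecutive fixed-length chunks.

    The final chunk is shorter when the recording length is not an exact
    multiple of the chunk length.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    return [
        (start, min(start + chunk_len, total_samples))
        for start in range(0, total_samples, chunk_len)
    ]


if __name__ == "__main__":
    twenty_minutes = 20 * 60 * SAMPLE_RATE
    bounds = chunk_bounds(twenty_minutes)
    print(len(bounds))  # 20 minutes / 5 seconds = 240 chunks to tag
```

Under these assumptions, 20 minutes of audio corresponds to 240 chunks that have to be tagged by hand in tagger.py.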
