This is a simple text-to-speech synthesizer that uses a Markov chain to generate speech. It is written in Python and uses Gradio for the web interface. Use it to train a Markov chain model or to generate speech with an existing one.
- The model does not generate speech well when the data contains multiple voices, so make sure your dataset has a single voice.
- The model size grows exponentially, which makes it hard to train on large datasets.
- Install the requirements:
pip install gradio numpy librosa tqdm gzip soundfile
- Run the script:
python tts.py

You can also make the Markov chain keep speaking: use the "Length of Extra Sequence" slider to set how much extra speech is generated. Each extra sequence is generated by the Markov chain and concatenated to the previous sequence. The result is nonsense, but it's fun to play with.
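The idea of appending chain-generated runs to an existing sequence can be sketched as follows. This is a minimal illustration, not the code in tts.py: the function names (`build_chain`, `sample_sequence`, `extend`) are hypothetical, the chain is assumed to be first-order, and plain strings stand in for whatever units the real model chains together.

```python
import random
from collections import defaultdict


def build_chain(sequences):
    """Map each state to the list of states observed right after it."""
    chain = defaultdict(list)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            chain[current].append(nxt)
    return dict(chain)


def sample_sequence(chain, start, length, rng):
    """Random-walk the chain from `start` for at most `length` states."""
    out = [start]
    while len(out) < length and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return out


def extend(chain, base, extra_sequences, length, rng):
    """Append `extra_sequences` chain-generated runs to a non-empty `base`."""
    result = list(base)
    for _ in range(extra_sequences):
        last = result[-1]
        if last in chain:
            # Continue from a state that was seen after the last one.
            start = rng.choice(chain[last])
        else:
            # Dead end: restart from any state that has a successor.
            start = rng.choice(sorted(chain))
        result.extend(sample_sequence(chain, start, length, rng))
    return result


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    chain = build_chain([["a", "b", "c", "a"], ["b", "d"]])
    print(extend(chain, ["a", "b"], extra_sequences=2, length=3, rng=rng))
```

Because every run is sampled only from observed transitions, the output is locally plausible but has no overall meaning, which matches the "nonsense" described above.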
The dataset should be a directory containing WAV files and, optionally, TXT files with the same names as the WAV files. Each TXT file should contain the transcript of its WAV file; if a WAV file has no corresponding TXT file, its filename is used as the transcript. Each audio file must contain a single word: the model will not work if a transcript is more than one word.
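To check a dataset against these rules before training, something like the sketch below may help. It is not part of tts.py: `collect_dataset` is a hypothetical helper, and it assumes lowercase `.wav` and `.txt` extensions, UTF-8 transcripts, and that the filename fallback means the name without its extension.

```python
from pathlib import Path


def collect_dataset(dataset_dir):
    """Return sorted (wav_path, transcript) pairs for a dataset directory."""
    pairs = []
    for wav_path in sorted(Path(dataset_dir).glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if txt_path.exists():
            transcript = txt_path.read_text(encoding="utf-8").strip()
        else:
            # No TXT file: fall back to the filename without its extension.
            transcript = wav_path.stem
        # Each transcript must be exactly one word.
        if len(transcript.split()) != 1:
            raise ValueError(
                f"{wav_path.name}: transcript must be a single word, "
                f"got {transcript!r}"
            )
        pairs.append((wav_path, transcript))
    return pairs
```

Calling `collect_dataset("path/to/dataset")` raises a `ValueError` naming the first file whose transcript is empty or longer than one word, so problems surface before training starts.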
To train the model, run the script, navigate to the web interface, and click the "Train" tab. Select the dataset directory and click "Train Model".
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
This project was made possible by the following libraries:
(Add your name here in your first PR, or not, if you don't want to be listed)