BudEcosystem/Tansen

Democratizing access to LLMs, Multi-Modal Gen AI models for the open-source community.
Let's advance AI, together.

Tansen is a text-to-speech program built with the following priorities:

Strong multi-voice capabilities.
Highly realistic prosody and intonation.
Speaking rate control.

Hugging Face 🤗 Models

🎧 Demos

random_0_0.webm

random_0_1.webm

random_0_2.webm

💻 Getting Started on GitHub

Ready to dive in? Here's how you can get started with our repo on GitHub.

1️⃣ : Clone our GitHub repository

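First things first, you'll need to set up an environment and clone our repository. Open up your terminal, navigate to the directory where you want the repository to be cloned, and run the following commands. They create and activate a conda environment, install PyTorch and transformers, and then clone the repository: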

conda create --name Tansen python=3.9 numba inflect
conda activate Tansen
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install transformers=4.29.2
git clone https://github.com/BudEcosystem/Tansen.git
cd Tansen
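Before moving on, it can help to confirm that the packages installed above are visible from the active environment. The following is a minimal sketch using only the standard library; the helper name `check_packages` and the list of import names are assumptions made for illustration and are not part of Tansen.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Top-level import names for the packages installed in the commands above
# (assumed to match the conda package names).
REQUIRED = ("torch", "torchvision", "torchaudio", "transformers", "numba", "inflect")


def check_packages(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Report which packages can be located, without importing them."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the Tansen environment activated. Any package reported as MISSING probably needs to be reinstalled in that environment.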

2️⃣ : Install dependencies

python setup.py install

3️⃣ : Generate Audio

do_tts.py

This script allows you to speak a single phrase with one or more voices.

python do_tts.py --text "I'm going to speak this" --voice random --preset fast
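If you want to call the script from another Python program, one option is to build the argument list explicitly so that quoting in the text is handled for you. This is a sketch only: `build_do_tts_command` is a hypothetical helper, and it uses just the `--text`, `--voice` and `--preset` flags shown in the command above.

```python
import shlex
import sys
from typing import List


def build_do_tts_command(text: str, voice: str = "random", preset: str = "fast") -> List[str]:
    """Build the argument list for do_tts.py using the flags shown above."""
    return [sys.executable, "do_tts.py", "--text", text, "--voice", voice, "--preset", preset]


command = build_do_tts_command("I'm going to speak this")
print(shlex.join(command))  # shell-safe rendering, useful for logging

# To actually run it from the directory that contains do_tts.py:
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with apostrophes such as the one in "I'm".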

read.py

This script provides tools for reading large amounts of text.

python Tansen/read.py --textfile <your text to be read> --voice random

This will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series of spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and output that as well.
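The split, synthesize and combine flow described above can be sketched roughly as follows. This is not the code in read.py: the sentence splitter is a naive regular expression, and `fake_synthesize` is a hypothetical stand-in for the model so that the example runs without Tansen installed.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split on '.', '!' or '?' followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def read_text(text: str, synthesize: Callable[[str], List[float]]) -> List[float]:
    """Synthesize each sentence separately, then join the clips in order."""
    clips = [synthesize(sentence) for sentence in split_sentences(text)]
    combined: List[float] = []
    for clip in clips:
        combined.extend(clip)
    return combined


def fake_synthesize(sentence: str) -> List[float]:
    """Stand-in synthesizer (hypothetical): one silent sample per character."""
    return [0.0] * len(sentence)


audio = read_text("Hello there. How are you? Fine!", fake_synthesize)
print(len(audio))  # 29 samples: 12 + 12 + 5
```

Keeping the per-sentence clips separate until the end is what makes it possible to regenerate a single bad clip without redoing the whole text.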

Sometimes Tansen produces a bad clip. You can regenerate any bad clips by re-running read.py with the --regenerate argument.

Interested in running it as an API?

🐍 Usage in Python

Tansen can be used programmatically:

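# Assumption: the installed package is importable as `tortoise`.
from tortoise import api, utils

# clips_paths: a list of paths to reference audio clips for the target voice.
reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]
tts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
pcm_audio = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')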
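Once you have generated audio, you will usually want to write it to disk. The sketch below does this with the standard library only, for a plain sequence of mono float samples in the range [-1, 1]. The `save_wav` helper is hypothetical, and the 24000 Hz default is an assumption, so check the model's actual output rate. If `pcm_audio` is a tensor, you would first need to convert it to a flat list of floats.

```python
import os
import struct
import tempfile
import wave
from typing import Sequence


def save_wav(path: str, samples: Sequence[float], sample_rate: int = 24000) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overflow
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Example with four dummy samples written to the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "tansen_example.wav")
save_wav(out_path, [0.0, 0.5, -0.5, 1.0])
```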

Loss Curves

loss_mel_ce

loss_text_ce

Training Information

Device: a single A100

Dataset: 876 hours
