Add more VITS voices via piper #215

Open
jpenguin opened this issue Feb 29, 2024 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@jpenguin

I know I previously mentioned edge-tts, which is cloud-based, fast and free, but under the GPL. I have recently been trying out https://github.com/rhasspy/piper/, which uses the VITS model and is under the MIT license.

@danielw97

Whilst epub2tts is currently extremely stable in features and functionality, at least for me, just a +1 for this.
Piper might not sound quite as good as something like Coqui in most circumstances, although it is extremely fast and is under active development.
I'm not sure how easy it would be to implement, although it is something that would be nice to see in the future.

@jpenguin
Author

jpenguin commented Feb 29, 2024

Yes, on my 24-thread 3900X, it takes around 10 hours to encode a 5-hour book with xtts (I wish it could run on OpenCL or oneAPI on my A770). VITS is decent (epub2tts already has 335 & 307 VITS models) and quick.
I don't know Python (just a little C), but there is a Python interface and it's under a compatible license.

@aedocw
Owner

aedocw commented Feb 29, 2024

This seems like it would be worth adding, I will take a look at https://github.com/rhasspy/piper/?tab=readme-ov-file#running-in-python and see. It will probably be a while (a few weeks at least) before I've got time again for this, but it looks like it might not be very difficult.

@aedocw aedocw self-assigned this Feb 29, 2024
@aedocw aedocw added the enhancement New feature or request label Feb 29, 2024
@aedocw
Owner

aedocw commented Mar 4, 2024

This is going to be a problem that needs to be resolved before incorporating piper: rhasspy/piper#395

I'm able to install on Linux without trouble, but my primary dev environment is macOS, and I would not want to introduce a dependency that makes epub2tts impossible to install on Mac. I'll keep an eye on this though while I poke at a test integration branch.

@aedocw
Owner

aedocw commented Mar 5, 2024

There's a first pass at this, still needs a lot of work around model name and speaker, and I have not tested anything with other languages, etc. BUT the branch https://github.com/aedocw/epub2tts/tree/add-piper has a very simple implementation that seems to work in a minimal sense. Adds --engine piper option, and in the future will support --model <piper model> and --speaker <piper speaker>. Might also have to support --language but that might be covered by your model choice, need to check.
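As a rough illustration of the options described above, the sketch below wires `--engine`, `--model` and `--speaker` into an `argparse` parser. It is not the code on the add-piper branch: the function name `build_parser`, the list of engine choices, and the defaults are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI sketch; not epub2tts's actual parser."""
    parser = argparse.ArgumentParser(prog="epub2tts")
    parser.add_argument("sourcefile", help="epub or text file to read")
    # Choices and default are assumed; only the engines named in this thread are listed.
    parser.add_argument("--engine", choices=["xtts", "edge", "piper"], default="xtts")
    # --model and --speaker are the planned options mentioned in the comment.
    parser.add_argument("--model", default=None, help="piper model name or path")
    parser.add_argument("--speaker", default=None, help="speaker for multi-speaker models")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["book.epub", "--engine", "piper", "--model", "en_US-lessac-medium"]
)
print(args.engine, args.model, args.speaker)
```

Whether a separate `--language` option is needed would depend on whether the chosen model already fixes the language, as noted above.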

@michaelachrisco

michaelachrisco commented Mar 26, 2024

Awesome, this is great!

Just FYI, I got the following error using the branch with Ubuntu/PopOS:

Engine is Piper, model is /home/my_user/.local/piper/en_US-lessac-medium.onnx
Traceback (most recent call last):
  File "/home/my_user/venv/bin/epub2tts", line 33, in <module>
    sys.exit(load_entry_point('epub2tts==2.4.0', 'console_scripts', 'epub2tts')())
  File "/home/my_user/venv/lib/python3.10/site-packages/epub2tts.py", line 837, in main
    mybook.read_book(
  File "/home/my_user/venv/lib/python3.10/site-packages/epub2tts.py", line 486, in read_book
    self.voice = PiperVoice.load(self.model_name)
  File "/home/my_user/venv/lib/python3.10/site-packages/piper/voice.py", line 34, in load
    with open(config_path, "r", encoding="utf-8") as config_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/my_user/.local/piper/en_US-lessac-medium.onnx.json'

In order to fix, I manually pulled in the correct piper model via:

echo 'Welcome to the world of speech synthesis!' | piper   --model en_US-lessac-medium   --output_file welcome.wav

and then added the folder:

mkdir /home/my_user/.local/piper
cd /home/my_user/.local/piper
cp '/home/my_user/epub2tts/en_US-lessac-medium.onnx' .
cp '/home/my_user/epub2tts/en_US-lessac-medium.onnx.json' .

Obviously this was a quick and dirty fix and should be done differently, but it worked. Piper sounds great and I'll probably use this branch as a starting point with a couple of Creative Commons books (more of a proof of concept than anything, comparing them all). Piper is very quick compared to the others, but sounds a bit more robotic, which is fine to me.
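The FileNotFoundError in the traceback above comes from the config file missing next to the model. A minimal preflight sketch follows, assuming only what the traceback shows (the config is looked up at the model path plus `.json`). `missing_piper_files` is a hypothetical helper, not part of piper or epub2tts.

```python
from pathlib import Path


def missing_piper_files(model_path: str) -> list[Path]:
    """Return whichever of the model and its config file are absent."""
    model = Path(model_path).expanduser()
    # Per the traceback, the config is expected at "<model>.json".
    config = Path(str(model) + ".json")
    return [p for p in (model, config) if not p.is_file()]


# Example check against the location used in the comment above.
missing = missing_piper_files("~/.local/piper/en_US-lessac-medium.onnx")
if missing:
    print("Missing piper files:", ", ".join(str(p) for p in missing))
```

A check like this could run before loading the voice, so the user gets a message naming the files to download instead of a raw traceback.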

Here is the sample: https://github.com/michaelachrisco/epub2tts/blob/add-piper/sample-piper.m4b

@jpenguin
Author

jpenguin commented Mar 27, 2024

Tested on a Debian testing VM with pipx. Same as Michael: it doesn't download the model, but it works after doing so manually.

ln -s ~/.local/share/pipx/venvs/epub2tts/bin/piper ~/.local/bin/
echo 'Downloading voice' | piper --model en_US-lessac-medium --output-raw | aplay -r 22050 -f S16_LE -t raw -
mkdir ~/.local/piper
cp './en_US-lessac-medium.onnx' ~/.local/piper
cp './en_US-lessac-medium.onnx.json' ~/.local/piper
epub2tts ./sample.txt --sayparts --engine piper works after that
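The `aplay -r 22050 -f S16_LE -t raw` flags above imply that `--output-raw` emits headerless signed 16-bit little-endian PCM at 22050 Hz. If a file is wanted instead of playback, a small sketch with the standard `wave` module can wrap those bytes in a WAV container. `raw_pcm_to_wav` is a hypothetical helper, and mono output is an assumption (it matches aplay's default of one channel).

```python
import wave


def raw_pcm_to_wav(pcm: bytes, wav_path: str, sample_rate: int = 22050) -> None:
    """Wrap signed 16-bit little-endian PCM bytes in a WAV container."""
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)  # assumption: mono
        wav_file.setsampwidth(2)  # S16_LE means two bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

In a pipeline, `pcm` could come from `sys.stdin.buffer.read()` in a script placed where `aplay` sits in the command above.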

@aedocw
Owner

aedocw commented Mar 27, 2024

Thanks both of you for sharing this. Once the issues with installing on apple silicon are resolved, I'll do some more work to clean this up and make it usable.

In the meantime I suggest you check out the --engine edge option. It's not super fast, but it doesn't use local CPU so it's pretty painless to leave running in screen/tmux, and the quality is better than almost everything else. Arguably XTTS still sounds better, but the occasional repeats and gibberish get annoying (to me) after a while.

@danielw97

Hi again,
I'm just curious if there is any update on this at all?
Although it's somewhat robotic, piper is extremely fast even on CPU, which is a plus for certain applications.
I picked up an M1 Mac recently, so I am happy to test if that would help and come back with some feedback.
If you've moved on to epub2tts-edge I also understand, and really appreciate the work you've done on this utility.
It is really the best program I've found for converting books into good quality TTS, as the rest use either OpenAI, Azure or another online text-to-speech service.
Thanks as always.

@aedocw
Owner

aedocw commented May 4, 2024

It looks like there are still open issues around installing on mac (rhasspy/piper#395). Feel free though to try it out and see if the issues are resolved now on mac. I don't think I'll have much time to play with this over the next few weeks (getting busy these days!) but update here if it is actually working OK on mac, then I could try to clean up that piper branch and add it to main.

@danielw97

Hi,
I've done some testing, and it looks as though piper still doesn't install on ARM-based Macs, unfortunately.
I'll keep an eye on the open pull request and test it if and when it's merged.

4 participants