
Here is a model in Swedish #257

Closed
ViktorAlm opened this issue Jan 3, 2020 · 14 comments

Comments

@ViktorAlm

Not sure how to share this, but feel free to add.
https://drive.google.com/drive/folders/1jrMWhkK3KrOKsbPMYtCCcPczPmf8V-eQ?usp=sharing

It uses the default English encoder. To adapt the Tacotron text preprocessing to Swedish, follow this code:
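As background for the link above: in keithito-style Tacotron front ends like this one, each character is looked up in a fixed symbol list and, as far as I can tell, characters missing from that list are silently skipped. The minimal sketch below is not the repository's code; the character string and the helper names (`build_symbols`, `text_to_ids`) are assumptions for illustration. It shows why Swedish text loses its å, ä and ö unless the symbol list is extended.

```python
# Minimal sketch (not the repository's code): map text to symbol IDs the way
# keithito-style Tacotron front ends do, skipping characters outside the symbol set.
# The character lists below are assumptions for illustration.

PAD, EOS = "_", "~"
ENGLISH_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!'\"(),-.:;? "
SWEDISH_EXTRA = "ÅÄÖåäö"  # hypothetical additions for Swedish


def build_symbols(characters: str) -> list:
    """Return the symbol list: padding, end-of-sequence, then one entry per character."""
    return [PAD, EOS] + list(characters)


def text_to_ids(text: str, symbols: list) -> list:
    """Convert text to IDs, dropping characters the model has no embedding for."""
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    ids = [symbol_to_id[c] for c in text if c in symbol_to_id and c not in (PAD, EOS)]
    return ids + [symbol_to_id[EOS]]  # terminate with the end-of-sequence symbol


english = build_symbols(ENGLISH_CHARS)
swedish = build_symbols(ENGLISH_CHARS + SWEDISH_EXTRA)

text = "Lite är känt"
print(len(text_to_ids(text, english)))  # 11: both 'ä' characters are dropped
print(len(text_to_ids(text, swedish)))  # 13: every character is kept
```

Dropping unknown characters instead of failing keeps inference running on odd input, but it also means a model trained on the English list never sees å, ä or ö at all.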
https://github.com/codemandosch/taco2swe

It is far from perfect but was a fun experiment.

@githubtrip

@ViktorAlm Could you make a Google Colab notebook for it?

@rdrlima

rdrlima commented Jun 25, 2020

Can you share how you did it? I want to try training a Portuguese model.

@ghost

ghost commented Jul 5, 2020

@ViktorAlm I tried this out, but I get an error message when the synthesizer checkpoint is loaded:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [66,512] rhs shape= [72,512]
[[{{node save/Assign_40}}]]

What I did was replace the synthesizer and vocoder with your models. I have not made any synthesizer code changes for Swedish yet. Can you let me know if you changed any files or hparams to make this work?

Error message including full traceback
Initialized Tacotron model. Dimensions (? = dynamic shape): 
  Train mode:               False
  Eval mode:                False
  GTA mode:                 False
  Synthesis mode:           True
  Input:                    (?, ?)
  device:                   0
  embedding:                (?, ?, 512)
  enc conv out:             (?, ?, 512)
  encoder out (cond):       (?, ?, 768)
  decoder out:              (?, ?, 80)
  residual out:             (?, ?, 512)
  projected residual out:   (?, ?, 80)
  mel out:                  (?, ?, 80)
  <stop_token> out:         (?, ?)
  Tacotron Parameters       28.439 Million.
Loading checkpoint: synthesizer/saved_models/logs-pretrained/taco_pretrained/tacotron_model.ckpt-200000
2020-07-05 14:59:25.497588: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa249744430 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-05 14:59:25.497641: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [66,512] rhs shape= [72,512]
	 [[{{node save/Assign_40}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [66,512] rhs shape= [72,512]
	 [[node save/Assign_40 (defined at venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/Assign_40':
  File "demo_cli.py", line 95, in <module>
    mels = synthesizer.synthesize_spectrograms(texts, embeds)
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 77, in synthesize_spectrograms
    self.load()
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 58, in load
    self._model = Tacotron2(self.checkpoint_fpath, hparams)
  File "/Real-Time-Voice-Cloning/synthesizer/tacotron2.py", line 62, in __init__
    saver = tf.compat.v1.train.Saver()
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 350, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saving/saveable_object_util.py", line 73, in restore
    self.op.get_shape().is_fully_defined())
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/ops/state_ops.py", line 227, in assign
    validate_shape=validate_shape)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_state_ops.py", line 66, in assign
    use_locking=use_locking, name=name)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "demo_cli.py", line 95, in <module>
    mels = synthesizer.synthesize_spectrograms(texts, embeds)
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 77, in synthesize_spectrograms
    self.load()
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 58, in load
    self._model = Tacotron2(self.checkpoint_fpath, hparams)
  File "/Real-Time-Voice-Cloning/synthesizer/tacotron2.py", line 63, in __init__
    saver.restore(self.session, checkpoint_path)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1326, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [66,512] rhs shape= [72,512]
	 [[node save/Assign_40 (defined at venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/Assign_40':
  File "demo_cli.py", line 95, in <module>
    mels = synthesizer.synthesize_spectrograms(texts, embeds)
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 77, in synthesize_spectrograms
    self.load()
  File "/Real-Time-Voice-Cloning/synthesizer/inference.py", line 58, in load
    self._model = Tacotron2(self.checkpoint_fpath, hparams)
  File "/Real-Time-Voice-Cloning/synthesizer/tacotron2.py", line 62, in __init__
    saver = tf.compat.v1.train.Saver()
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 350, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/training/saving/saveable_object_util.py", line 73, in restore
    self.op.get_shape().is_fully_defined())
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/ops/state_ops.py", line 227, in assign
    validate_shape=validate_shape)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_state_ops.py", line 66, in assign
    use_locking=use_locking, name=name)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

@ghost

ghost commented Jul 18, 2020

@ViktorAlm Can you share your code with me, in a private repository if needed, so I can figure out exactly what changes are needed for your Swedish pretrained models to work?

@ghost

ghost commented Jul 26, 2020

@ViktorAlm This is the difference between your model structure and the default: the first dimension of the input embedding (one row per input symbol) changed from 66 to 72. Do you know where that could be coming from? I diffed the synthesizer code of taco2swe against this repo and can't find the cause. Using the taco2swe version of the hparams doesn't seem to fix it either.

- ('Tacotron_model/Tacotron_model/inference/inputs_embedding/Adam', [66, 512]),
- ('Tacotron_model/Tacotron_model/inference/inputs_embedding/Adam_1', [66, 512]),
+ ('Tacotron_model/Tacotron_model/inference/inputs_embedding/Adam', [72, 512]),
+ ('Tacotron_model/Tacotron_model/inference/inputs_embedding/Adam_1', [72, 512]),
- ('Tacotron_model/inference/inputs_embedding', [66, 512]),
+ ('Tacotron_model/inference/inputs_embedding', [72, 512]),
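One way to produce a diff like this is to list the variables stored in each checkpoint and compare their shapes. `tf.train.list_variables(path)` returns `(name, shape)` pairs for a TensorFlow checkpoint; the comparison below is plain Python so it can run without TensorFlow installed. The function name `diff_shapes` is hypothetical, and the sample data is copied from the diff above.

```python
# Sketch: report variables whose shapes differ between two checkpoints.
# In practice the inputs would come from tf.train.list_variables(checkpoint_path),
# which returns (name, shape) pairs; here they are passed in directly.


def diff_shapes(reference, other):
    """Return {name: (reference_shape, other_shape)} for shared names with different shapes."""
    ref = {name: list(shape) for name, shape in reference}
    oth = {name: list(shape) for name, shape in other}
    return {
        name: (ref[name], oth[name])
        for name in sorted(ref.keys() & oth.keys())
        if ref[name] != oth[name]
    }


# Example values taken from the diff in this thread (other variables omitted).
default = [("Tacotron_model/inference/inputs_embedding", [66, 512])]
swedish = [("Tacotron_model/inference/inputs_embedding", [72, 512])]

for name, (a, b) in diff_shapes(default, swedish).items():
    print(f"{name}: {a} -> {b}")
# Tacotron_model/inference/inputs_embedding: [66, 512] -> [72, 512]
```

With TensorFlow available, the call would look like `diff_shapes(tf.train.list_variables(ckpt_a), tf.train.list_variables(ckpt_b))`, where `ckpt_a` and `ckpt_b` are the two checkpoint paths.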

@ghost

ghost commented Jul 29, 2020

This is resolved, @ViktorAlm! I just needed to add the extra characters of the Swedish alphabet to synthesizer/utils/symbols.py. If anyone wants to use these models, it might be helpful to use the 400_pretrained_swe_301 branch of my fork. I will accept pull requests if anyone wants to contribute improvements.
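The numbers in the earlier diff fit this fix: the input embedding has one row per symbol, so its first dimension equals the length of the symbol list. If the stock list has 66 entries (2 special symbols plus 64 characters, as assumed below), adding upper- and lowercase å, ä and ö gives exactly 72. Which characters the fork actually adds is an assumption here. The sketch is a hypothetical pre-flight check one could run before restoring a checkpoint, to get a readable message instead of the `Assign requires shapes of both tensors to match` error.

```python
# Sketch: compare the symbol count with the embedding shape stored in a checkpoint.
# The character lists are assumptions for illustration; the real fork may differ.

ENGLISH_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!'\"(),-.:;? "
SWEDISH_EXTRA = "ÅÄÖåäö"  # assumed additions for Swedish


def symbol_list(characters):
    """Padding and end-of-sequence symbols followed by one symbol per character."""
    return ["_", "~"] + list(characters)


def check_embedding_rows(symbols, checkpoint_shape):
    """Raise a readable error if the checkpoint's embedding has a different row count."""
    rows, dim = checkpoint_shape
    if rows != len(symbols):
        raise ValueError(
            f"checkpoint embedding has {rows} rows (dim {dim}) but symbols.py "
            f"defines {len(symbols)} symbols; update the symbol list to match"
        )


swedish_ckpt_shape = (72, 512)  # inputs_embedding shape reported in this thread
check_embedding_rows(symbol_list(ENGLISH_CHARS + SWEDISH_EXTRA), swedish_ckpt_shape)  # passes
try:
    check_embedding_rows(symbol_list(ENGLISH_CHARS), swedish_ckpt_shape)
except ValueError as err:
    print(err)
```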

I am finding that all cloned voices sound male. The sound quality can be good for some speakers, and the prosody overall feels natural, though fast. It might make a decent starting point for Swedish TTS.

https://github.com/blue-fish/Real-Time-Voice-Cloning/compare/054f16e...blue-fish:400_pretrained_swe_301

Edit: Fixed the sound issue; I had a bad encoder model in my folder. Once that was fixed it worked much better; see the samples in the following post.

Thank you for sharing your models, @ViktorAlm. I do think others will find them useful for text-to-speech.

@ghost

ghost commented Jul 29, 2020

Here are some samples (samples.zip), where I take VCTK speakers p240 and p260 and clone an English utterance with Corentin's models and a Swedish utterance with these models. In my opinion the Swedish model performs better for this example, though both leave something to be desired.

  • Text: "Lite är känt om dess föda eller häckning och lätet som tillskrivs zapatarallen kan tillhöra en annan art." (source)
  • English translation: "Little is known about its diet or breeding, and the call attributed to the Zapata rail may belong to another species."

@ViktorAlm
Author

ViktorAlm commented Jul 30, 2020

Hey! I haven't followed this. Yeah, the results were not that good, but I managed to get it to say a few words that were difficult to distinguish. I tried it on more than 100 people and only one could guess all the cloned voices. If I did it again I would test something else, like Mellotron or one of the newer transformer-based models. In general I think you need a lot more data, and cleaner data, than what I was using. It mixes dialects left and right, so a separate voice-embedding model trained on Swedish would probably be necessary.

@Archviz360

Hm, strange. I got this error:

Could not find any synthesizer weights under synthesizer\saved_models\pretrained\taco_pretrained

Can someone help me?
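This message suggests the toolbox found no checkpoint in that folder. A TensorFlow 1.x `Saver` checkpoint normally consists of a small text file named `checkpoint` plus files such as `tacotron_model.ckpt-200000.index` and `tacotron_model.ckpt-200000.data-00000-of-00001`. Note also that the log earlier in this thread loads from `synthesizer/saved_models/logs-pretrained/taco_pretrained`, a different folder name from the one in this error, so it may be worth checking which folder your branch expects. The helper below is hypothetical (not part of the repository) and only reports what is present.

```python
# Hypothetical helper (not part of the repository): summarize what a TF1-style
# checkpoint folder contains, to spot missing or misplaced synthesizer weights.
from pathlib import Path


def describe_checkpoint_dir(folder):
    """Return a dict describing the checkpoint files found in `folder`."""
    path = Path(folder)
    if not path.is_dir():
        return {"exists": False, "has_checkpoint_file": False, "prefixes": []}
    # Each saved step has an "<prefix>.index" file next to its data shards;
    # Path.stem strips only the final ".index" suffix.
    prefixes = sorted(p.stem for p in path.glob("*.index"))
    return {
        "exists": True,
        "has_checkpoint_file": (path / "checkpoint").is_file(),
        "prefixes": prefixes,
    }


print(describe_checkpoint_dir(Path("synthesizer", "saved_models", "pretrained", "taco_pretrained")))
```

If `exists` is true but `has_checkpoint_file` is false or `prefixes` is empty, the weights were most likely extracted into a different folder.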

@ghost ghost mentioned this issue Oct 8, 2021
@Archviz360

I got this error:

Command "python setup.py egg_info" failed with error code 1 in C:\Users\dacia\AppData\Local\Temp\pip-install-1lqbf8bh\PyQt5\

Note: I'm running Windows 11. Could that be why I get this error?

@Archviz360

PS C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301> python3 demo_toolbox.py
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
PS C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301> python demo_toolbox.py
C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301\encoder\audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended.
  warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.")
Traceback (most recent call last):
  File "demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301\toolbox\__init__.py", line 1, in <module>
    from toolbox.ui import UI
  File "C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301\toolbox\ui.py", line 15, in <module>
    import umap
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\umap\__init__.py", line 2, in <module>
    from .umap_ import UMAP
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\umap\umap_.py", line 47, in <module>
    from pynndescent import NNDescent
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\pynndescent\__init__.py", line 3, in <module>
    from .pynndescent_ import NNDescent, PyNNDescentTransformer
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\pynndescent\pynndescent_.py", line 16, in <module>
    import pynndescent.sparse as sparse
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\pynndescent\sparse.py", line 10, in <module>
    from pynndescent.utils import norm, tau_rand
  File "C:\Users\dacia\AppData\Local\Programs\Python\Python37\lib\site-packages\pynndescent\utils.py", line 8, in <module>
    from numba.core import types
ModuleNotFoundError: No module named 'numba.core'
PS C:\Users\dacia\Desktop\Eudora\Real-Time-Voice-Cloning-400_pretrained_swe_301>
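The final error points to an outdated numba: the `numba.core` subpackage, which the installed pynndescent imports from, first appeared in numba 0.49. Below is a minimal sketch of a version check, assuming version strings of the usual `major.minor.patch` form; the helper names are hypothetical. If it reports an older numba, upgrading numba (or installing umap-learn/pynndescent versions that match it) is the likely fix.

```python
# Sketch: check whether an installed numba is new enough to provide numba.core,
# which was introduced in numba 0.49.0. Helper names are hypothetical.


def version_tuple(version):
    """Turn '0.48.0' or '0.53.1rc1' into a comparable tuple of leading integers."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_numba_core(version):
    """True if this numba version should ship the numba.core subpackage."""
    return version_tuple(version) >= (0, 49)


try:
    import numba
    print(numba.__version__, "provides numba.core:", has_numba_core(numba.__version__))
except ImportError:
    print("numba is not installed")
```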

@Archviz360

Hm? It's gone; I can't download the required saved models for the Swedish version. :( Can someone please send them to my email, peacenet@hotmail.se?

When I go to https://github.com/blue-fish/Real-Time-Voice-Cloning/tree/400_pretrained_swe_301 it says "not found". I remember there were important files there that I needed to get it to work. If someone could send them to me, that would be great. Does anyone have all the files so we can restore the page?

@raccoonML

All files needed to run the Swedish RTVC models from @ViktorAlm can be found here.
https://github.com/raccoonML/Real-Time-Voice-Cloning/releases/tag/Swedish-1

@bryant0918

bryant0918 commented Jun 10, 2022

What did you train your Swedish models on? I would love a pointer to some good datasets. I'm trying to replicate your Swedish TensorFlow models in PyTorch. @ViktorAlm

This issue was closed.