
Proposal to integrate into 🤗 Hub #555

Merged · 7 commits merged into TensorSpeech:master on May 14, 2021

Conversation

@patrickvonplaten (Contributor) commented Apr 30, 2021

Hi TensorSpeech team! I hereby propose an integration with the HuggingFace model hub 🤗

This integration would allow you to freely download/upload models from/to the Hugging Face Hub: https://huggingface.co/.

Your users could then download model weights and other files directly within Python, without having to fetch them manually.
Taking your fastspeech_2_inference.ipynb example, the following diff shows how the code could change to download weights directly from the model hub.

import tensorflow as tf

-from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
-    pretrained_path="../tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
+   pretrained_path="tensorspeech/fastspeech2_tts"
)

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

-config = AutoConfig.from_pretrained("../examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
-    config=config, 
-    pretrained_path="../examples/fastspeech2/checkpoints/model-150000.h5",
+   pretrained_path="tensorspeech/fastspeech2_tts",
    is_build=True,
    name="fastspeech2"
)

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

As an example, I uploaded some FastSpeech2 weights to this repo on the HF hub: https://huggingface.co/patrickvonplaten/tf_tts_fast_speech_2.
If you'd like to add this feature to your library, we would of course change the organization name from patrickvonplaten to tensorspeech.

You can try it out by running the following code:

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(pretrained_path="patrickvonplaten/tf_tts_fast_speech_2")

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

fastspeech2 = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_fast_speech_2",
    is_build=True,
    name="fastspeech2"
)

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

Besides freely storing your model weights, we also provide git version control and download statistics for your models :-) We can also provide you with a hosted inference API where users could try out your models directly on the website.

We've already integrated with a couple of other libraries - you can check them out here:

Sorry for the missing tests in the PR - I just made the minimal changes to show you how the integration with the HF hub could look :-) I'd also be more than happy to add you guys to a Slack channel where we could discuss further.

Cheers,
Patrick & Hugging Face team

Also cc @julien-c

@dathudeptrai dathudeptrai self-requested a review May 2, 2021 12:35
@dathudeptrai dathudeptrai self-assigned this May 2, 2021
@dathudeptrai dathudeptrai added the enhancement 🚀 (New feature or request) and Feature Request 🤗 (Feature support) labels May 2, 2021
@dathudeptrai (Collaborator)

@patrickvonplaten Thank you so much, this is a really great and useful feature :D. I have learned a lot from the huggingface transformers repo, and as you can see our repo has the same structure as the transformers repo, so it should be easy to integrate with huggingface_hub. I'm on vacation and will be back in a few days. :D.

@dathudeptrai dathudeptrai merged commit f53ecd9 into TensorSpeech:master May 14, 2021
@dathudeptrai (Collaborator)

@patrickvonplaten Merged :D. Can you tell me what the next steps are?

@patrickvonplaten (Contributor, Author)

Hey @dathudeptrai,

Awesome to see that the PR is merged 🥳 As a next step, I think we can create an organization on the hub here: https://huggingface.co/organizations/new (maybe called Tensorspeech?), and then, if you want, we can upload a bunch of your models and create a demo widget to showcase them 🙂

Also cc @julien-c , @osanseviero

@dathudeptrai (Collaborator)

@patrickvonplaten I just created the tensorspeech organization on the HF hub. Let me do the remaining jobs :D.

@osanseviero

@dathudeptrai thank you for creating the org! That's awesome.

There are some additional steps on our side. The two main things missing, I think, are:

  • Add a code snippet that shows how to use the model with TensorFlowTTS. Something along these lines, but for TensorFlowTTS.

(Screenshot: example code snippet shown on a model page)

  • Add a widget for TensorFlowTTS models. Users would input a sentence and we would provide the audio. This will be a great way to showcase the models! Something like this:

(Screenshot: example text-to-speech inference widget)

@dathudeptrai something that could be interesting is to implement a push_to_hub method. This would allow your users to easily share their models by uploading them to the hub. It would also facilitate creating automatic model cards, making sure all the tags are correct, and more. What do you think?
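
A rough sketch of what such a push_to_hub helper could look like, using huggingface_hub's HfApi (the helper name, file names, and repo layout here are assumptions for illustration, not the library's actual API):

from huggingface_hub import HfApi

def push_to_hub(config_path: str, weights_path: str, repo_id: str):
    """Hypothetical helper: upload a trained model's config and weights to the HF Hub.

    Assumes the user has already authenticated (e.g. via `huggingface-cli login`);
    the file names config.yml / model.h5 are illustrative, not a fixed convention.
    """
    api = HfApi()
    # Create the repo if it does not exist yet (no-op if it already does).
    api.create_repo(repo_id=repo_id, exist_ok=True)
    # Upload the config and checkpoint into the repo root.
    api.upload_file(path_or_fileobj=config_path, path_in_repo="config.yml", repo_id=repo_id)
    api.upload_file(path_or_fileobj=weights_path, path_in_repo="model.h5", repo_id=repo_id)

# Usage (hypothetical repo name):
# push_to_hub("conf/fastspeech2.v1.yaml", "checkpoints/model-150000.h5",
#             "tensorspeech/fastspeech2-ljspeech")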

@osanseviero

@dathudeptrai by looking at the examples and familiarizing myself with the library, I was wondering whether you have an idea of the example code snippet that will be shown to users. From what I see, there are two open questions:

  • After doing model.inference to generate the mel-spectrogram, we'll still need to do melgan.inference on top of it to get speech. Is this right, or is there a better approach to generate the speech? Alternatively, would it be ok if the code snippet only shows how to do the initial inference?
  • I see that the .inference method signature differs depending on which model we're using, which might make it a bit harder to implement things in a generic way that works for all of them. If you have an example function that deals with these differences, it would be greatly appreciated.

Thank you for the library! I've been playing with it and it's awesome!

@dathudeptrai (Collaborator) commented May 17, 2021

@osanseviero

After doing model.inference to generate the mel-spectrogram, we'll still need to do melgan.inference on top of it to get speech. Is this right, or is there a better approach to generate the speech? Alternatively, would it be ok if the code snippet only shows how to do the initial inference?

Yes, almost all TTS models are now two-stage (text2mel and mel2wav). We can combine them into one end-to-end model for the inference stage :D.
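
For illustration, a minimal sketch of that two-stage pipeline, continuing from the FastSpeech2 snippet above (the vocoder repo id is a placeholder, not a real checkpoint):

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel

# Stage 2 (mel2wav): load a vocoder the same way as the text2mel model.
# "tensorspeech/melgan_tts" is a hypothetical repo id used for illustration.
melgan = TFAutoModel.from_pretrained(
    pretrained_path="tensorspeech/melgan_tts",
    is_build=True,
    name="melgan"
)

# mel_after comes from fastspeech2.inference(...) as in the snippet above;
# the vocoder turns the mel-spectrogram into a waveform.
audio = melgan.inference(mel_after)[0, :, 0]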

I see that the .inference method signature differs depending on which model we're using, which might make it a bit harder to implement things in a generic way that works for all of them. If you have an example function that deals with these differences, it would be greatly appreciated.

Unlike transformers for NLP, where the inputs are almost always the same, text2mel inputs vary: a model can take extra inputs such as speaker_ids (for multi-speaker TTS), language_ids (for multilingual TTS), speaker_embeddings (for voice cloning), style embeddings (for emotional TTS), and inputs to adjust speed, f0, energy, and so on. But generally, we only need two inputs (input_ids and speaker_ids) :D.
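
One possible way to smooth over the varying signatures in a generic snippet (this helper is a sketch, not part of the library, and assumes model.inference exposes an inspectable Python signature):

import inspect

import tensorflow as tf

def generic_inference(model, input_ids, **optional_inputs):
    """Call model.inference with only the keyword arguments its signature accepts."""
    accepted = inspect.signature(model.inference).parameters
    kwargs = {k: v for k, v in optional_inputs.items() if k in accepted}
    return model.inference(
        input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
        **kwargs,
    )

# For FastSpeech2 the extras would include speaker_ids, speed_ratios, f0_ratios
# and energy_ratios; a model that lacks e.g. f0_ratios would simply never see it.
outputs = generic_inference(
    fastspeech2,
    input_ids,
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)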

@osanseviero

Hi @dathudeptrai. We got some exciting news!

Last week our team worked on open-sourcing the code for adding code snippets, as well as running the inference API for other libraries. This lives in the huggingface_hub repo. This PR adds the code snippet as we discussed :) Your users will already benefit from being able to search for all TensorFlowTTS models.

@dathudeptrai (Collaborator)

@osanseviero Awesome! :D. I'm uploading all our models to https://huggingface.co/tensorspeech and will add model cards soon :D

@osanseviero

Awesome! I'm looking forward to seeing this :)

As a tip, you can use the tags text-to-mel and mel-to-wav so that the code snippets are more complete for your users. Example.

@patrickvonplaten patrickvonplaten deleted the add_tf_hub branch July 10, 2021 10:55