Adding TTS Tutorials #1584

Merged
merged 7 commits into from
Jun 2, 2022

Aya-AlJafari
Contributor

No description provided.

Contributor

@reuben left a comment

Discussed feedback on Element. Looking good 🚀

@erogol
Member

erogol commented May 25, 2022

Looking good but notebooks are not testable. So far any notebook we released as a tutorial could not be maintained. We need a way to have this notebook in the CI tests.

@reuben
Contributor

reuben commented May 25, 2022

They are testable, we test our notebooks in the STT CI. Can probably copy that and adapt.

@Aya-AlJafari changed the title from "Adding inferencing notebook" to "Adding TTS Tutorials" on May 27, 2022
@erogol
Member

erogol commented May 29, 2022

> They are testable, we test our notebooks in the STT CI. Can probably copy that and adapt.

can you link me where in the STT?

@erogol
Member

erogol commented May 29, 2022

@Aya-AlJafari I see you are still committing. Should I wait for more?

@reuben
Contributor

reuben commented May 29, 2022

> > They are testable, we test our notebooks in the STT CI. Can probably copy that and adapt.
>
> can you link me where in the STT?

Sorry I shared with Aya on chat but forgot to add here.

- The STT notebook CI I referred to in the PR is here and here.
- The gist of it is that you can use `jupyter nbconvert --to notebook --execute` to run all cells of a notebook programmatically.
- There are also ways to replace variables or disable cells in certain cases, but I'm not too familiar with them; I can do some research if we need that.
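
As a rough illustration of the nbconvert approach described above, a CI step could execute every notebook in a folder and fail on the first error. This is only a sketch, not the actual STT CI: the helper names, the output folder, and the 600-second timeout are assumptions chosen for the example.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook, output_dir, timeout_s=600):
    """Build the `jupyter nbconvert` call that executes every cell of one notebook."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Per-cell timeout in seconds; 600 is an arbitrary value for this sketch.
        f"--ExecutePreprocessor.timeout={timeout_s}",
        # Write the executed copy elsewhere so the source notebook stays untouched.
        f"--output-dir={output_dir}",
        str(notebook),
    ]


def run_notebooks(notebook_dir, output_dir="executed_notebooks"):
    """Execute each notebook in notebook_dir and return the list that was run."""
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    for notebook in notebooks:
        # A failing cell makes nbconvert exit non-zero; check=True turns that
        # into CalledProcessError, which is what would fail the CI job.
        subprocess.run(nbconvert_command(notebook, output_dir), check=True)
    return notebooks
```

Calling `run_notebooks("notebooks")` from a test or a CI script would then run all tutorials; the folder name is hypothetical.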

"\n",
"So, let's jump right in!\n",
"\n",
"*PS - If you just want a working, off-the-shelf model, check out the [🐸 Model Zoo](https://www.coqui.ai/models)*"
Contributor

Model zoo doesn't have TTS models.

"\n",
"If you have a single audio file and you need to **split** it into clips. It is also important to use a lossless audio file format to prevent compression artifacts. We recommend using **wav** file format.\n",
"\n",
"The data format we will be adopting for this tutorial is taken from widely-used the **LJSpeech** dataset, where **waves** are collected under a folder:\n",
Contributor

Suggested change
- "The data format we will be adopting for this tutorial is taken from widely-used the **LJSpeech** dataset, where **waves** are collected under a folder:\n",
+ "The data format we will be adopting for this tutorial is taken from the widely-used **LJSpeech** dataset, where **waves** are collected under a folder:\n",

"\n",
"### **First things first**: we need some data.\n",
"\n",
"We're training a Text-to-Speech model, so we need some _text_ and we need some _speech_. Specificially, we want _transcribed speech_. The speech must be divided into audio clips and each clip needs transcription. \n",
Contributor

There are also many other requirements in terms of recording characteristics, background noise, vocabulary coverage, etc. Even if going into detail is not appropriate here, we should at least link to more extensive documentation.

"<span style=\"color:purple;font-size:15px\">\n",
"/wavs<br /> \n",
" &emsp;| - audio1.wav<br /> \n",
" &emsp;| - udio2.wav<br /> \n",
Contributor

Suggested change
- " &emsp;| - udio2.wav<br /> \n",
+ " &emsp;| - audio2.wav<br /> \n",

" ...<br /> \n",
"</span>\n",
"\n",
"and a **metdata.txt** file will have the audioname in parallel to the transcript, delimeted by `|`: \n",
Contributor

Suggested change
- "and a **metdata.txt** file will have the audioname in parallel to the transcript, delimeted by `|`: \n",
+ "and a **metadata.txt** file will have the audio file name in parallel to the transcript, delimited by `|`: \n",

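To make the `|`-delimited metadata format discussed above concrete, here is a minimal parsing sketch. It is not the loader 🐸TTS uses; `read_metadata` and the sample rows are made up for illustration, and it assumes the first column is the audio file name and the second the transcript (any further columns are ignored).

```python
import csv
import io


def read_metadata(lines):
    """Parse `file name|transcript` rows into (file_name, transcript) pairs."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    items = []
    for row_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 2 or not row[1].strip():
            raise ValueError(f"line {row_number}: expected 'name|transcript'")
        items.append((row[0].strip(), row[1].strip()))
    return items


# Hypothetical two-line metadata file, held in memory for the example.
sample = io.StringIO("audio1|This is my sentence.\naudio2|This is maybe my sentence.\n")
items = read_metadata(sample)
```

A check like this can be run on a metadata file before training to catch rows with a missing transcript early.
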
"## ⏳️ Loading your dataset\n",
"Load one of the dataset supported by 🐸TTS.\n",
"\n",
"For this tutorial we will be using LJSpeech dataset.\n",
Contributor

This was already said above.

" os.makedirs(output_path)\n",
"\n",
"dataset_config = BaseDatasetConfig(\n",
" name=\"ljspeech\", meta_file_train=\"metadata.csv\", path=os.path.join(output_path, \"LJSpeech-1.1/\")\n",
Contributor

In the examples above the metadata file has a .txt extension.

Contributor Author

@Aya-AlJafari May 30, 2022

Oh, that's a bug in the documentation as well. @erogol it should be CSV, right, as opposed to what's on this page?
And for a CSV, should we add a header of `audioname|text`?

"dataset_config = BaseDatasetConfig(\n",
" name=\"ljspeech\", meta_file_train=\"metadata.csv\", path=os.path.join(output_path, \"LJSpeech-1.1/\")\n",
")\n",
"# You need to download LJSpeech inside output_path\n"
Contributor

We should make the notebook do this instead of asking people to do it.
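
One way the notebook could do this itself is sketched below. It is an assumption-laden example rather than the code that was merged: `ensure_ljspeech` is a hypothetical helper, and the archive URL is the commonly used public LJSpeech location, which should be verified before relying on it.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed download location of the public LJSpeech archive; verify before use.
LJSPEECH_URL = "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2"


def ensure_ljspeech(output_path, url=LJSPEECH_URL):
    """Download and extract LJSpeech under output_path unless it is already there."""
    output_path = Path(output_path)
    dataset_dir = output_path / "LJSpeech-1.1"
    if dataset_dir.is_dir():
        return dataset_dir  # already extracted, nothing to do

    output_path.mkdir(parents=True, exist_ok=True)
    archive = output_path / "LJSpeech-1.1.tar.bz2"
    if not archive.exists():
        # Large download (a few GB), so it is skipped when the archive exists.
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(output_path)
    return dataset_dir
```

The returned folder could then be passed as `path` to the `BaseDatasetConfig` call shown in the diff above.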

Comment on lines +375 to +376
" --model_path $test_ckpt \\\n",
" --config_path $test_config \\\n",
Contributor

Jupyter lets you access Python variables in inline shell calls like this, so you don't have to set them in os.environ above, just create normal Python variables test_ckpt and test_config.
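
A small sketch of what that could look like in a notebook cell follows. The folder and file names are hypothetical placeholders, and the shell line is kept as a comment because the `!` syntax only works inside IPython/Jupyter, not in plain Python.

```python
import os

output_path = "tts_train_dir"  # hypothetical training output folder
test_ckpt = os.path.join(output_path, "best_model.pth")  # hypothetical checkpoint name
test_config = os.path.join(output_path, "config.json")   # hypothetical config name

# In a notebook cell, IPython substitutes $name with the Python variable's value
# before handing the line to the shell, so no os.environ entries are needed:
#
#   !tts --text "Text for TTS" \
#        --model_path $test_ckpt \
#        --config_path $test_config \
#        --out_path out.wav
```

Compared with exporting the paths through `os.environ`, this keeps the values as ordinary Python variables that other cells can reuse.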

"metadata": {},
"source": [
"## 🎉 Congratulations! 🎉 You now have trained your first TTS model! \n",
"Follow up with the next tutorials to learn more adnavced material."
Contributor

Suggested change
- "Follow up with the next tutorials to learn more adnavced material."
+ "Follow up with the next tutorials to learn more advanced material."

@Aya-AlJafari
Contributor Author

> @Aya-AlJafari I see you are still committing. Should I wait for more?

@erogol yes I will be adding one more tutorial today

@erogol
Member

erogol commented Jun 1, 2022

@TrycsPublic interesting way to send commits :)

How about sending a PR? It is challenging this way to see what you changed.

@reuben
Contributor

reuben commented Jun 1, 2022

You can even make a PR for another PR by setting the base branch to tutorials instead of dev :)

@erogol merged commit 68cef28 into dev on Jun 2, 2022
@erogol deleted the tutorials branch on June 2, 2022 at 11:59