# Text to Speech 🗣️

## Introduction
This notebook demonstrates how to convert text to speech (TTS) using the 🤗 Hugging Face Transformers library. TTS technology converts written text into spoken words, which is useful for applications such as accessibility tools and virtual assistants. We will go through the steps of setting up the environment, installing the necessary dependencies, suppressing non-critical warnings, creating a TTS pipeline, and converting text into speech.

### Install and Import Libraries

Here is a brief description of the required libraries:
- The Transformers library by Hugging Face provides pre-trained models and tools for natural language processing (NLP) and other tasks such as computer vision and audio processing.

- Gradio is a library for building user-friendly web-based interfaces for machine learning models and data pipelines. It allows you to create interactive demos with minimal code.

- The PyTorch Image Models (`timm`) library is a collection of pre-trained image models and tools for image classification and other computer vision tasks.

- Inflect is a library for handling English word pluralization, singularization, and inflection of numbers into words. It is useful for generating grammatically correct text dynamically. 

- Phonemizer is a library for converting text into phonemes using different backend phonetic transcribers. It supports multiple languages and phonetic notations.
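To see which of these libraries are already installed before running the install cell, you could use a small check like the following. It is a minimal sketch based on the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the libraries described above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a mapping of package name -> installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment
            result[name] = None
    return result


if __name__ == "__main__":
    required = ["transformers", "gradio", "timm", "inflect", "phonemizer"]
    for name, ver in installed_versions(required).items():
        print(f"{name}: {ver if ver else 'not installed'}")
```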


In [1]:
# Install and update the necessary libraries
%pip install transformers
%pip install gradio
%pip install timm
%pip install inflect
%pip install phonemizer


In [5]:
%pip install torch

Note: you may need to restart the kernel to use updated packages.


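**Note:** The `phonemizer` library needs the eSpeak (or eSpeak NG) speech synthesizer as its backend. eSpeak is a system package that must be installed on your system separately from `pip`; `py-espeak-ng` is only a Python wrapper around eSpeak NG.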

- Requirements to install eSpeak in a GitHub Codespace:
  - Install espeak by running the following commands in the Codespace terminal:
    - `sudo apt-get update`
    - `sudo apt-get install -y espeak`
  - Or install eSpeak NG and its Python wrapper:
    - `sudo apt-get update`
    - `sudo apt-get install espeak-ng`
    - `pip install py-espeak-ng`
  - Verify that espeak is installed correctly:
    - `espeak --version` (or `espeak-ng --version` if you installed eSpeak NG)
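You can also run a quick check from inside the notebook. The sketch below (hypothetical helper `find_espeak`) only verifies that an executable named `espeak-ng` or `espeak` is on `PATH`; `phonemizer` locates the eSpeak library on its own, so treat this as a rough sanity test rather than a guarantee that phonemization will work.

```python
import shutil


def find_espeak():
    """Return the path to an eSpeak executable on PATH, or None if none is found."""
    # Check for eSpeak NG first, then the classic eSpeak binary
    for name in ("espeak-ng", "espeak"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    path = find_espeak()
    if path is None:
        print("eSpeak not found on PATH; install it with apt-get as described above.")
    else:
        print(f"Found eSpeak at {path}")
```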

- To save your changes in a GitHub Codespace, follow these steps:
  - Make your changes in the file.
  - Open the terminal in your Codespace.
  - Stage, commit, and push the changes:
    - `git add .`
    - `git commit -m "Installation espeak"`
    - `git push`

In [6]:
# Import necessary libraries
import warnings
from transformers import pipeline
from IPython.display import Audio as IPythonAudio
from transformers.utils import logging

RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
partially initialized module 'torchvision' has no attribute 'extension' (most likely due to a circular import)

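If you encounter errors like this, check the installed libraries for compatibility issues and resolve them. This particular error usually points to an incompatible `torch`/`torchvision` installation, which the next cell addresses by reinstalling matching versions.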
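As a first check, you can compare the `torch` and `torchvision` version numbers. The sketch below assumes the pairing pattern torch 2.N with torchvision 0.(N+15), which matches the official compatibility table for torch 2.0 through 2.5; the helper names are hypothetical, and you should consult the official table for other releases. A matching number is not sufficient on its own: the failing environment in this notebook had `torch` 2.5.1+cpu and `torchvision` 0.20.1, which pass this check, so build variants and install locations can still cause trouble.

```python
def base_version(ver):
    """Strip a local build tag such as '+cpu' or '+cu124' from a version string."""
    return ver.split("+", 1)[0]


def expected_torchvision_minor(torch_version):
    """Return the torchvision minor version expected to pair with a torch 2.x release.

    Assumes the pattern torch 2.N <-> torchvision 0.(N + 15), which matches
    the official compatibility table for torch 2.0 through 2.5.
    """
    major, minor = (int(part) for part in base_version(torch_version).split(".")[:2])
    if major != 2:
        raise ValueError(f"Only torch 2.x is handled by this sketch, got {torch_version}")
    return minor + 15


def versions_look_compatible(torch_version, torchvision_version):
    """Check whether the torchvision version follows the assumed torch 2.x pairing."""
    tv_major, tv_minor = (int(p) for p in base_version(torchvision_version).split(".")[:2])
    return tv_major == 0 and tv_minor == expected_torchvision_minor(torch_version)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        torch_ver, tv_ver = version("torch"), version("torchvision")
        ok = versions_look_compatible(torch_ver, tv_ver)
    except (PackageNotFoundError, ValueError) as exc:
        print(f"Could not check versions: {exc}")
    else:
        print(f"torch {torch_ver}, torchvision {tv_ver}: {'OK' if ok else 'possible mismatch'}")
```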

In [7]:
# Update/Reinstall Libraries
%pip uninstall transformers torch torchvision -y
%pip install torch==2.5.1 torchvision==0.20.1 transformers

Found existing installation: transformers 4.47.1
Uninstalling transformers-4.47.1:
  Successfully uninstalled transformers-4.47.1
Found existing installation: torch 2.5.1+cpu
Uninstalling torch-2.5.1+cpu:
  Successfully uninstalled torch-2.5.1+cpu
Found existing installation: torchvision 0.20.1
Uninstalling torchvision-0.20.1:
  Successfully uninstalled torchvision-0.20.1
Note: you may need to restart the kernel to use updated packages.
Collecting torch==2.5.1
  Downloading torch-2.5.1-cp312-cp312-manylinux1_x86_64.whl.metadata (28 kB)
Collecting torchvision==0.20.1
  Using cached torchvision-0.20.1-cp312-cp312-manylinux1_x86_64.whl.metadata (6.1 kB)
Collecting transformers
  Using cached transformers-4.47.1-py3-none-any.whl.metadata (44 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.5.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch==2.5.1)
  Downloading nvidia

In [1]:
# Verify the installed versions to make sure they are compatible
%pip show torch torchvision transformers

Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: timm, torchvision
---
Name: torchvision
Version: 0.20.1
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: numpy, pillow, torch
Required-by: timm


In [2]:
# Import necessary libraries
import warnings
from transformers import pipeline
from IPython.display import Audio as IPythonAudio
from transformers.utils import logging

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
#from transformers.utils import logging
# Suppress non-critical log messages
logging.set_verbosity_error()

# Suppress specific warnings
#warnings.filterwarnings("ignore", message="Using the model-agnostic default `max_length`")
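The commented-out `warnings.filterwarnings(...)` line above would silence a matching warning for the rest of the session. A narrower alternative, sketched below with a hypothetical helper `call_quietly`, scopes the filter to a single call with `warnings.catch_warnings()`, so the previous filters are restored afterwards. The `message` argument is a regular expression matched case-insensitively against the start of the warning text, so the prefix is passed through `re.escape`.

```python
import re
import warnings


def call_quietly(func, *args, message_prefix="Using the model-agnostic default", **kwargs):
    """Call func while ignoring warnings whose message starts with message_prefix.

    The filter only applies inside this call; catch_warnings() restores the
    previous warning filters when the with-block exits.
    """
    with warnings.catch_warnings():
        # Escape the prefix so characters like '-' or '(' are matched literally
        warnings.filterwarnings("ignore", message=re.escape(message_prefix))
        return func(*args, **kwargs)
```

If that warning appears when running the pipeline, usage would look like `narrated_text = call_quietly(narrator, text)`.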

### Build the `text-to-speech` pipeline using the 🤗 Transformers Library

#### Build the TTS Pipeline

The `kakao-enterprise/vits-ljs` model is used here to convert text input into human-like speech. For more information, see the [kakao-enterprise/vits-ljs](https://huggingface.co/kakao-enterprise/vits-ljs) model card.

In [4]:
#from transformers import pipeline
narrator = pipeline("text-to-speech", model="kakao-enterprise/vits-ljs")
                    

If you encounter any errors, be sure to check for compatibility issues between the libraries and resolve them accordingly.

In [5]:
# Display information about the installed versions of the transformers, torch, and torchvision libraries
%pip show transformers torch torchvision


Name: transformers
Version: 4.47.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: 
---
Name: torch
Version: 2.5.1+cpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /home/codespace/.local/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, setuptools, sympy, typing-extensions
Required-by: timm, torchvision
---
Name: torchvisio
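The output above reports `torch` 2.5.1+cpu under `/home/codespace/.local/...`, while the earlier check found `torch` 2.5.1 under `/usr/local/python/3.12.1/...`. When the same package exists in both the user and the system site-packages, which copy is imported depends on the order of `sys.path`, which can lead to mismatches like this one. The sketch below (hypothetical helper `import_origin`) uses `importlib.util.find_spec` to show which file each package would be loaded from, without importing it.

```python
import importlib.util


def import_origin(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for name in ("torch", "torchvision", "transformers"):
        print(f"{name}: {import_origin(name)}")
```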

In [6]:
# Update/Reinstall Libraries
%pip uninstall transformers torch torchvision -y
%pip install torch==2.5.1 torchvision==0.20.1 transformers

Found existing installation: transformers 4.47.1
Uninstalling transformers-4.47.1:
  Successfully uninstalled transformers-4.47.1
Found existing installation: torch 2.5.1+cpu
Uninstalling torch-2.5.1+cpu:
  Successfully uninstalled torch-2.5.1+cpu
Found existing installation: torchvision 0.20.1
Uninstalling torchvision-0.20.1:
  Successfully uninstalled torchvision-0.20.1
Note: you may need to restart the kernel to use updated packages.
Collecting torch==2.5.1
  Downloading torch-2.5.1-cp312-cp312-manylinux1_x86_64.whl.metadata (28 kB)
Collecting torchvision==0.20.1
  Using cached torchvision-0.20.1-cp312-cp312-manylinux1_x86_64.whl.metadata (6.1 kB)
Collecting transformers
  Using cached transformers-4.47.1-py3-none-any.whl.metadata (44 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.5.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch==2.5.1)
  Downloading nvidia

In [1]:
# Verify the installed versions to make sure they are compatible
%pip show torch torchvision transformers

Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, setuptools, sympy, triton, typing-extensions
Required-by: timm, torchvision
---
Name: torchvision
Version: 0.20.1
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/python/3.12.1/lib/python3.12/site-packages
Requires: numpy, pillow, torch
Required-by: timm


In [5]:
# Rebuild the TTS pipeline

# Import the pipeline function from the transformers library
from transformers import pipeline

# Create a text-to-speech pipeline using the pre-trained model "kakao-enterprise/vits-ljs"
narrator = pipeline("text-to-speech", model="kakao-enterprise/vits-ljs")
                    

- Define Input Text

In [6]:
# Define the text that you want to convert to speech
text = """
Researchers at the Allen Institute for AI, \
HuggingFace, Microsoft, the University of Washington, \
Carnegie Mellon University, and the Hebrew University of \
Jerusalem developed a tool that measures atmospheric \
carbon emitted by cloud servers while training machine \
learning models. After a model’s size, the biggest variables \
were the server’s location and time of day it was active.
"""

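Inside a triple-quoted string, a trailing backslash escapes the newline, so the lines of `text` are joined into one sentence; the newlines right after the opening quotes and before the closing quotes remain, however. The sketch below (hypothetical helper `normalize_for_tts`) shows this on a short sample and collapses all whitespace into single spaces. Whether this changes the generated audio is not tested in this notebook; it mainly keeps the input tidy.

```python
def normalize_for_tts(raw):
    """Collapse all runs of whitespace (including newlines) into single spaces."""
    return " ".join(raw.split())


sample = """
First line, \
second line.
"""
# The backslash-newline pair is removed by the parser, but the leading and
# trailing newlines inside the quotes are kept
print(repr(sample))                     # '\nFirst line, second line.\n'
print(repr(normalize_for_tts(sample)))  # 'First line, second line.'
```

With the notebook's input, usage would be `narrated_text = narrator(normalize_for_tts(text))`.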
In [7]:
# Use the text-to-speech pipeline to convert the text to speech
narrated_text = narrator(text)
# Uncomment the following line to print the details of the generated audio
#print(narrated_text)
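Printing `narrated_text` dumps the raw waveform. Since the next cell reads `narrated_text["audio"]` and `narrated_text["sampling_rate"]`, a more readable summary is the number of samples and the duration in seconds. The helper below (hypothetical name `summarize_tts_output`) is a sketch assuming the audio is a NumPy array that is 1-D or has a leading axis of size 1; the 22,050 Hz value in the example is synthetic, so read the real rate from `narrated_text["sampling_rate"]`.

```python
import numpy as np


def summarize_tts_output(output):
    """Return (num_samples, duration_seconds) for a text-to-speech pipeline output dict.

    Assumes output["audio"] is mono audio (1-D, or 2-D with a leading axis of
    size 1) and output["sampling_rate"] is the sample rate in Hz.
    """
    audio = np.asarray(output["audio"]).squeeze()
    if audio.ndim != 1:
        raise ValueError(f"Expected mono audio, got shape {audio.shape}")
    num_samples = audio.shape[0]
    return num_samples, num_samples / output["sampling_rate"]


if __name__ == "__main__":
    # Synthetic stand-in for narrated_text: one second of silence at 22,050 Hz
    fake_output = {"audio": np.zeros((1, 22050), dtype=np.float32), "sampling_rate": 22050}
    print(summarize_tts_output(fake_output))  # (22050, 1.0)
```

In the notebook, usage would be `summarize_tts_output(narrated_text)`.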

- Display and play the audio output in the notebook

In [8]:
from IPython.display import Audio as IPythonAudio

# Display and play the audio output in the notebook
audio_data = narrated_text["audio"]
IPythonAudio(audio_data, rate=narrated_text["sampling_rate"])
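If you want to keep the narration outside the notebook, one option is to write it to a 16-bit PCM WAV file with the standard library's `wave` module. This is a minimal sketch assuming mono float samples in the range [-1.0, 1.0]; the helper name `write_wav` and the output filename are hypothetical.

```python
import array
import sys
import wave


def write_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Values outside [-1.0, 1.0] are clipped before conversion.
    """
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(pcm.tobytes())
```

With the pipeline output, usage would be `write_wav("narration.wav", narrated_text["audio"].squeeze().tolist(), narrated_text["sampling_rate"])`.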

## Conclusion
In conclusion, this notebook demonstrates how to set up and use the Hugging Face Transformers library for text-to-speech tasks. Pre-trained models make it possible to generate natural-sounding audio with only a few lines of code, which is valuable for applications such as accessibility tools and virtual assistants. The `kakao-enterprise/vits-ljs` model is a pre-trained VITS model trained on the LJ Speech dataset of English speech. It converts text input into human-like speech with natural prosody and tone, making it suitable for applications like virtual assistants and narration. In summary, `kakao-enterprise/vits-ljs` offers high-quality, natural-sounding speech, but it may be resource-intensive and is limited to English with a single voice. If you encounter any errors, check the installed libraries for compatibility issues and resolve them accordingly.

### Next Steps
- Try this model with your own text-to-speech examples!