# Running MMS-TTS inference in Colab
This notebook shows how to run text-to-speech inference with MMS TTS models.

By default, inference runs on a GPU. To run on the CPU instead, go to the "Runtime" menu -> "Change runtime type" and set "Hardware accelerator" to "None" before running.
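
To confirm which device the runtime actually exposes, a quick check along the following lines may help. It is a minimal sketch that assumes PyTorch is importable in the runtime (it typically is on Colab); `describe_device` is a hypothetical helper name and is not part of MMS or ttsmms.

```python
import importlib.util


def describe_device():
    """Report the accelerator visible to PyTorch, if PyTorch is installed."""
    # Assumption: PyTorch is the framework that would use the GPU.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_device())
```

If this prints "cpu" while you expected a GPU, recheck the "Hardware accelerator" setting described above.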

## 1. Preliminaries
This section installs the Python packages that the other sections need. Run it first.

In [1]:
!pip install ttsmms

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ttsmms
  Downloading ttsmms-0.7-py3-none-any.whl (29 kB)
Collecting phonemizer (from ttsmms)
  Downloading phonemizer-3.2.1-py3-none-any.whl (90 kB)
Collecting Unidecode (from ttsmms)
  Downloading Unidecode-1.3.6-py3-none-any.whl (235 kB)
Collecting monotonic-align (from ttsmms)
  Downloading monotonic_align-1.0.0.tar.gz (4.8 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting segments (from phonemizer->ttsmms)
  Downloading segments-2.2.1-py2.py3-
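
Once the install finishes, it can be worth checking that the package is importable before moving on. The snippet below is a small sketch using only the standard library; `missing_packages` is a hypothetical helper introduced here for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means ttsmms is importable in this runtime.
print(missing_packages(["ttsmms"]))
```

If the list is not empty, rerun the install cell and check its output for errors.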

## 2. Choose a language and download its checkpoint
Find the ISO code for your target language [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html). You can find more details about the languages we currently support for TTS in this [table](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).

The cell below uses curl, a command-line tool for transferring data to or from a URL. It downloads the file at https://dl.fbaipublicfiles.com/mms/tts/urd-script_arabic.tar.gz and saves it as urd-script_arabic.tar.gz in the current directory. To use a different language, replace urd-script_arabic with its code in both the URL and the output filename.

In [2]:
!curl https://dl.fbaipublicfiles.com/mms/tts/urd-script_arabic.tar.gz --output urd-script_arabic.tar.gz #update lang

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  128M  100  128M    0     0   197M      0 --:--:-- --:--:-- --:--:--  196M
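
The same download can be done from Python, which makes it easier to switch languages in one place. This is a sketch, not the notebook's own method: `checkpoint_url` and `download_checkpoint` are hypothetical helpers, and the URL pattern for other language codes is assumed to match the Urdu example above.

```python
import shutil
import urllib.request

# Base URL taken from the curl command above.
BASE_URL = "https://dl.fbaipublicfiles.com/mms/tts"


def checkpoint_url(lang_code):
    """Build the checkpoint URL, assuming every language follows the Urdu pattern."""
    return f"{BASE_URL}/{lang_code}.tar.gz"


def download_checkpoint(lang_code, dest=None):
    """Stream the archive to disk and return the local file path."""
    dest = dest or f"{lang_code}.tar.gz"
    with urllib.request.urlopen(checkpoint_url(lang_code)) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


# Example (downloads roughly 128 MB for Urdu, so it is left commented out):
# archive = download_checkpoint("urd-script_arabic")
```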


The next cell does two things:

mkdir -p audio_to_text creates a directory named "audio_to_text". The -p flag tells mkdir to create any missing parent directories and not to fail if the directory already exists.

tar -xzf urd-script_arabic.tar.gz -C audio_to_text/ extracts the contents of urd-script_arabic.tar.gz into the audio_to_text/ directory.

In [3]:
!mkdir -p audio_to_text && tar -xzf urd-script_arabic.tar.gz -C audio_to_text/ # change langcode urd-script_arabic
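
For reference, the two shell steps above can be approximated with the standard library. This is a minimal sketch; `extract_checkpoint` is a hypothetical helper, and listing the target directory afterwards is just a convenient way to confirm that the language folder was created.

```python
import os
import tarfile


def extract_checkpoint(archive_path, target_dir="audio_to_text"):
    """Create target_dir if needed, unpack the .tar.gz into it, and list its entries."""
    os.makedirs(target_dir, exist_ok=True)  # same effect as `mkdir -p`
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that block unsafe paths.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return sorted(os.listdir(target_dir))


# Example, assuming the archive from the previous step is in the current directory:
# print(extract_checkpoint("urd-script_arabic.tar.gz"))
```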

## 3. Generate audio from text
Specify the sentence you want to synthesize and generate the audio. The cell below creates an instance of the TTS class from the path "audio_to_text/urd-script_arabic", the directory that holds the extracted Urdu (Arabic script) checkpoint, and stores it in the variable urdu_lang.

In [4]:
from ttsmms import TTS
urdu_lang = TTS("audio_to_text/urd-script_arabic")  # change the language code (urd-script_arabic) to match your checkpoint

voice_ai = urdu_lang.synthesis(" لاہور (ڈیلی پاکستان آن لائن) دنیا بھر میں ماحول کے تحفظ سے متعلق شعور اجاگر کرنے کیلئے  خصوصی دن آج منایا جا رہا ہے")
# The synthesis method converts the input text to speech.
# The result is a dictionary that holds the synthesized audio data ("x") and its sampling rate ("sampling_rate").

from IPython.display import Audio
Audio(voice_ai["x"], rate=voice_ai["sampling_rate"])
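
To get a feel for the returned dictionary, the sketch below summarizes a result with the same two keys. It assumes "x" is a one-dimensional sequence of samples; `describe_synthesis` is a hypothetical helper, and the `fake` dictionary is a stand-in rather than real model output.

```python
def describe_synthesis(result):
    """Summarize a synthesis result that has "x" and "sampling_rate" keys."""
    samples = result["x"]
    rate = result["sampling_rate"]
    return {
        "num_samples": len(samples),
        "sampling_rate": rate,
        "duration_s": len(samples) / rate,  # seconds of audio
    }


# Stand-in data: 32000 silent samples at 16000 Hz, i.e. two seconds of audio.
fake = {"x": [0.0] * 32000, "sampling_rate": 16000}
print(describe_synthesis(fake))
```

In the notebook you would call `describe_synthesis(voice_ai)` instead of passing the stand-in.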


In [5]:

text = 'تفصیلات کے مطابق پاکستان موسمیاتی تبدیلیوں سے سب سے زیادہ متاثر ہونے والے ممالک میں شامل ہے، گذشتہ برس ملک میں آنے والے سیلاب سے بڑے پیمانے پر تباہی ہوئی، یہ سیلاب قدرتی آفت نہیں، بنی نوع انسان کی جانب سے آلودگی پھیلانے کے نتیجے میں ماحول سے چھیڑ چھاڑ کے سبب آیا۔'

In [6]:
voice_ai_2 = urdu_lang.synthesis(text)

from IPython.display import Audio

Audio(voice_ai_2["x"], rate=voice_ai_2["sampling_rate"])

In [7]:
from scipy.io.wavfile import write

# use the "sampling_rate" for the rate, "x" for the data
write("output.wav", voice_ai_2["sampling_rate"], voice_ai_2["x"])
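
When scipy is given floating-point samples it writes a floating-point WAV file, which some players and tools handle less reliably than 16-bit PCM. As an alternative, the sketch below writes 16-bit PCM with the standard library only. It assumes the samples are mono floats roughly in the range [-1, 1] (values outside that range are clipped); `write_pcm16_wav` is a hypothetical helper, not part of ttsmms.

```python
import struct
import wave


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples (assumed to lie in [-1, 1]) as a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav.setframerate(int(sampling_rate))
        wav.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Example with the result from the previous cell:
# write_pcm16_wav("output_pcm16.wav", voice_ai_2["x"], voice_ai_2["sampling_rate"])
```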


In [8]:
from google.colab import files

files.download('output.wav')