# Text-to-speech inference using Massively Multilingual Speech (MMS): English and Polish

Import the necessary libraries.

Load the pre-trained MMS-TTS model for English.
* `VitsModel.from_pretrained` downloads the model's weights and configuration from the Hugging Face Hub.
* `AutoTokenizer.from_pretrained` loads the matching tokenizer, which knows how to process English text for this model.

Define the input text in English and tokenize it.

Finally, run text-to-speech inference. `torch.no_grad()` disables gradient tracking, which is not needed for inference.

In [None]:
from transformers import VitsModel, AutoTokenizer
import torch

model = VitsModel.from_pretrained("facebook/mms-tts-eng")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-eng")

text = "some example text in the English language"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs).waveform

## Audio playback in an interactive environment

Create and display an audio player.
* `output.numpy()` converts the PyTorch tensor to a NumPy array.
* `rate=model.config.sampling_rate` sets the audio playback sample rate, which is required for correct speed.


In [None]:
from IPython.display import Audio

Audio(output.numpy(), rate=model.config.sampling_rate)
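
To see why the rate matters: a clip's duration is its sample count divided by the sample rate, so a wrong rate changes the playback speed. The sketch below uses an array of zeros as a stand-in for `output.numpy()` and assumes 16 kHz in place of `model.config.sampling_rate`. `clip_duration_seconds` is a hypothetical helper, not part of any library.

```python
import numpy as np


def clip_duration_seconds(waveform: np.ndarray, sampling_rate: int) -> float:
    """Return the duration of a (batch, samples) or (samples,) waveform."""
    num_samples = waveform.shape[-1]  # samples are on the last axis
    return num_samples / sampling_rate


# Stand-in for output.numpy(): one clip of 32,000 samples (assumed values).
fake_waveform = np.zeros((1, 32_000), dtype=np.float32)

# Played at the assumed native rate, the clip lasts 2 seconds.
print(clip_duration_seconds(fake_waveform, sampling_rate=16_000))  # 2.0

# Played at half that rate, the same samples take twice as long (slowed down).
print(clip_duration_seconds(fake_waveform, sampling_rate=8_000))  # 4.0
```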

## Saving the generated audio to a WAV file

In [None]:
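import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not expose it in older SciPy versions

# The waveform has shape (1, num_samples); .T turns it into (num_samples, 1), i.e. (samples, channels).
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output.float().numpy().T)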
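
Given float32 data, `scipy.io.wavfile.write` produces a 32-bit floating-point WAV file, which some players may not handle. If that is a problem, one option is to convert to 16-bit PCM first. The sketch below is a minimal version that assumes a mono waveform with values roughly in [-1.0, 1.0]. `float_to_pcm16` and `write_pcm16_wav` are hypothetical helpers, and the 440 Hz tone and 16 kHz rate are stand-ins for the model output and `model.config.sampling_rate`.

```python
import io
import wave

import numpy as np


def float_to_pcm16(waveform: np.ndarray) -> np.ndarray:
    """Scale a float waveform in [-1.0, 1.0] to little-endian 16-bit integers."""
    clipped = np.clip(waveform, -1.0, 1.0)  # guard against out-of-range samples
    return (clipped * 32767.0).astype("<i2")


def write_pcm16_wav(target, waveform: np.ndarray, sampling_rate: int) -> None:
    """Write a mono float waveform as 16-bit PCM to a path or file-like object."""
    pcm = float_to_pcm16(np.asarray(waveform).reshape(-1))  # flatten (1, N) to (N,)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())


# Stand-in for the model output: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16_000) / 16_000
tone = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

buffer = io.BytesIO()  # an in-memory file; a path such as "output_pcm16.wav" also works
write_pcm16_wav(buffer, tone, sampling_rate=16_000)
```

With the notebook's variables, the call would look like `write_pcm16_wav("output_pcm16.wav", output.float().numpy(), model.config.sampling_rate)`.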

## Speech generation for the Polish language

The steps are the same as for English. Only the checkpoint name and the input text change.

In [None]:
from transformers import VitsModel, AutoTokenizer
import torch

model = VitsModel.from_pretrained("facebook/mms-tts-pol")
tokenizer = AutoTokenizer.from_pretrained("facebook/mms-tts-pol")

text = "w szczebrzeszynie chrząszcz brzmi w trzcinie"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs).waveform
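
Since the English and Polish cells differ only in the checkpoint suffix (`eng`, `pol`) and the text, the shared steps can be wrapped in a function. The sketch below assumes the `facebook/mms-tts-<code>` pattern seen in these two checkpoints; whether a checkpoint exists for another language code is not guaranteed here. `mms_tts_checkpoint` and `synthesize` are hypothetical helper names.

```python
def mms_tts_checkpoint(language_code: str) -> str:
    """Build a Hub repo id from a language code, following the pattern used above."""
    return f"facebook/mms-tts-{language_code.strip().lower()}"


def synthesize(text: str, language_code: str):
    """Return (waveform, sampling_rate) for `text`, using the steps from this notebook."""
    # Imported lazily so the naming helper works without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, VitsModel

    checkpoint = mms_tts_checkpoint(language_code)
    model = VitsModel.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        waveform = model(**inputs).waveform
    return waveform, model.config.sampling_rate


print(mms_tts_checkpoint("pol"))  # facebook/mms-tts-pol
```

A call such as `waveform, rate = synthesize("some example text", "eng")` would then download the checkpoint on first use, just as the cells above do.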

In [None]:
from IPython.display import Audio

Audio(output.numpy(), rate=model.config.sampling_rate)

In [None]:
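import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not expose it in older SciPy versions

# The waveform has shape (1, num_samples); .T turns it into (num_samples, 1), i.e. (samples, channels).
scipy.io.wavfile.write("output.wav", rate=model.config.sampling_rate, data=output.float().numpy().T)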