# First pass at developing the TTS pipeline

Using off-the-shelf Hugging Face models to build the transcription -> translation -> summarisation pipeline.

### Let's start with a transcription model

The `openai/whisper-small` model looks appropriate: it supports French-to-French transcription.

In [3]:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import Audio, load_dataset

In [12]:
# Load model and processor
transcription_processor = WhisperProcessor.from_pretrained("openai/whisper-small")
transcription_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small"
)
forced_decoder_ids = transcription_processor.get_decoder_prompt_ids(
    language="french", task="transcribe"
)

In [13]:
# load streaming dataset and read first audio sample
ds = load_dataset(
    "facebook/multilingual_librispeech", "french", split="test", streaming=True
)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
input_features = transcription_processor(
    input_speech["array"],
    sampling_rate=input_speech["sampling_rate"],
    return_tensors="pt",
).input_features

In [14]:
# generate token ids
predicted_ids = transcription_model.generate(
    input_features, forced_decoder_ids=forced_decoder_ids
)

In [15]:
# decode token ids to text
transcription = transcription_processor.batch_decode(predicted_ids)
transcription

["<|startoftranscript|><|fr|><|transcribe|><|notimestamps|> Pendant le second siècle, je fis serment d'ouvrir tous les trésors de la terre, à qui compte-me mettre en liberté. Mais je ne fus pas plus heureux. Dans le troisième, je promis de faire puissant mon arc, mon libérateur, d'être toujours près de lui en esprit."]

In [16]:
# transcription without special tokens
transcription = transcription_processor.batch_decode(
    predicted_ids, skip_special_tokens=True
)
transcription

[" Pendant le second siècle, je fis serment d'ouvrir tous les trésors de la terre, à qui compte-me mettre en liberté. Mais je ne fus pas plus heureux. Dans le troisième, je promis de faire puissant mon arc, mon libérateur, d'être toujours près de lui en esprit."]
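The decoded transcription above is a one-element list whose text starts with a space. A minimal sketch of tidying it before translation follows; `clean_transcription` is a hypothetical helper, not part of `transformers`, and the sample string is shortened from the output above.

```python
import re


def clean_transcription(decoded):
    """Join decoded segments and normalise whitespace (hypothetical helper)."""
    # Strip each segment, then join them with single spaces.
    joined = " ".join(segment.strip() for segment in decoded)
    # Collapse any remaining runs of whitespace into one space.
    return re.sub(r"\s+", " ", joined).strip()


sample = [" Pendant le second siècle, je fis serment d'ouvrir tous les trésors de la terre."]
print(clean_transcription(sample))
```

In the notebook this would be called as `clean_transcription(transcription)`, assuming `transcription` is the list returned by `batch_decode`.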

### And now onto translation

This should be relatively straightforward.

In [1]:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

In [17]:
# load model and tokenizer
translation_model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)
translation_tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt"
)



In [18]:
# translate from French to English
translation_tokenizer.src_lang = "fr_XX"
encode_fr = translation_tokenizer(transcription[0], return_tensors="pt")
generated_tokens = translation_model.generate(
    **encode_fr, forced_bos_token_id=translation_tokenizer.lang_code_to_id["en_XX"]
)

In [19]:
# decode generated token ids to English text
translation = translation_tokenizer.batch_decode(
    generated_tokens, skip_special_tokens=True
)
translation

['In the second century, I swore to open all the treasures of the earth, to whom I was about to release, but I was no happier. In the third, I promised to make my bow, my liberator, powerful, to be always close to him in mind.']
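Whisper takes a language name (`"french"`) while mBART-50 takes a code (`"fr_XX"`), so it may help to keep the two in one place. The sketch below assumes a small hand-written lookup; `MBART50_CODES` and `mbart_code` are hypothetical names, and the table covers only a few of the languages mBART-50 supports.

```python
# Partial mapping from language names to mBART-50 codes (extend as needed).
MBART50_CODES = {
    "french": "fr_XX",
    "english": "en_XX",
    "german": "de_DE",
    "spanish": "es_XX",
}


def mbart_code(language):
    """Return the mBART-50 code for a language name (hypothetical helper)."""
    key = language.strip().lower()
    if key not in MBART50_CODES:
        raise ValueError(f"No mBART-50 code registered for {language!r}")
    return MBART50_CODES[key]


print(mbart_code("French"), "->", mbart_code("English"))
```

With this in place, the source language could be set as `translation_tokenizer.src_lang = mbart_code("french")`.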

### And Finally: Summarisation

Let's use Facebook's `facebook/bart-large-cnn` model.

In [21]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

In [None]:
# max_length and min_length must be integers, not strings
print(summarizer(translation[0], max_length=30, min_length=10, do_sample=False))
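The three stages can be chained into a single function. This is only a sketch of one possible structure: `run_pipeline` and its parameters are hypothetical names, and the stages are passed in as plain callables so the wiring can be checked with stand-in functions before loading any models.

```python
def run_pipeline(audio, transcribe, translate, summarise):
    """Chain transcription -> translation -> summarisation (hypothetical helper)."""
    french_text = transcribe(audio)        # speech -> French text
    english_text = translate(french_text)  # French text -> English text
    summary = summarise(english_text)      # English text -> short summary
    return {
        "transcription": french_text,
        "translation": english_text,
        "summary": summary,
    }


# Stand-in stages, used here only to exercise the wiring without any models.
result = run_pipeline(
    "fake-audio",
    transcribe=lambda audio: "Bonjour le monde.",
    translate=lambda text: "Hello world.",
    summarise=lambda text: text[:5],
)
print(result)
```

In the notebook, each callable would wrap the corresponding cells above: the Whisper `generate` and `batch_decode` calls, the mBART `generate` and `batch_decode` calls, and the `summarizer` pipeline.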