# Text to Audio

1. Prompt the user for an advertisement description
2. Run sentiment (emotion) analysis on the description
3. Append the detected emotions to the user prompt
4. Generate the audio for the advertisement

In [1]:
# Ensure relevant libraries are installed

!pip install transformers
!pip install scipy

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


### Sentiment Analysis

In [2]:
from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)

# Input from user is stored into userInput
userInput = input("Enter your ideal advertisement description: ")

# analysis holds the classifier output: a confidence score for every emotion label, computed on the user input
analysis = classifier(userInput)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
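
The warning above says a GPU is available but the pipeline ran on CPU because no `device` argument was passed. A minimal sketch of one way to choose a device, assuming PyTorch is the installed backend; `pick_device` is a hypothetical helper and not part of transformers:

```python
def pick_device():
    """Return a device index for transformers.pipeline: 0 for the first GPU, -1 for CPU."""
    try:
        import torch  # assumption: PyTorch is the backend in use
    except ImportError:
        return -1  # no PyTorch found, stay on CPU
    return 0 if torch.cuda.is_available() else -1


device = pick_device()

# Possible usage in the cell above (left commented out because it downloads the model):
# classifier = pipeline("text-classification",
#                       model="j-hartmann/emotion-english-distilroberta-base",
#                       return_all_scores=True, device=device)
```

The same `device` value could also be passed to the text-to-audio pipeline later in the notebook.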


In [3]:
analysis

[[{'label': 'anger', 'score': 0.0018198109464719892},
  {'label': 'disgust', 'score': 0.0022392459213733673},
  {'label': 'fear', 'score': 0.0018623342039063573},
  {'label': 'joy', 'score': 0.4943576157093048},
  {'label': 'neutral', 'score': 0.009080654941499233},
  {'label': 'sadness', 'score': 0.4619666635990143},
  {'label': 'surprise', 'score': 0.028673775494098663}]]

Filter the emotions to be appended to the user input: only emotions with a confidence score above 0.40 count toward the mood of the advertisement, sorted from most to least confident.

In [4]:
filtered_analysis = sorted([item for item in analysis[0] if item['score'] > 0.4], key=lambda x: x['score'], reverse=True)

filtered_analysis

[{'label': 'joy', 'score': 0.4943576157093048},
 {'label': 'sadness', 'score': 0.4619666635990143}]
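
If no emotion clears the 0.40 threshold, the filtered list is empty and the mood line would carry no labels. The sketch below wraps the same filter in a function and falls back to the single highest-scoring emotion in that case. The fallback rule and the name `select_moods` are assumptions for illustration, not part of the original notebook; the example scores are rounded from the output above.

```python
def select_moods(scores, threshold=0.4):
    """Return emotion labels scoring above `threshold`, most confident first.

    Assumption: when nothing passes the threshold, fall back to the
    single top-scoring label so the mood line is never empty.
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    selected = [item["label"] for item in ranked if item["score"] > threshold]
    if not selected and ranked:
        selected = [ranked[0]["label"]]
    return selected


# Scores rounded from the classifier output shown earlier in the notebook
example_scores = [
    {"label": "anger", "score": 0.0018},
    {"label": "disgust", "score": 0.0022},
    {"label": "fear", "score": 0.0019},
    {"label": "joy", "score": 0.4944},
    {"label": "neutral", "score": 0.0091},
    {"label": "sadness", "score": 0.4620},
    {"label": "surprise", "score": 0.0287},
]

print(select_moods(example_scores))  # ['joy', 'sadness']
```

With the notebook's variables, the call would be `select_moods(analysis[0])`.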

In [5]:
labels_string = "mood: " + ", ".join([item['label'] for item in filtered_analysis])
print(labels_string)

mood: joy, sadness


Final prompt used to generate the audio: the original description followed by the mood line.

In [6]:
edited_user_input = "\n".join([userInput, labels_string])
print(edited_user_input)

A happy woman walking down the street sees a lonely cat
mood: joy, sadness


### Audio Generation from User Input

In [7]:
from transformers import pipeline
import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not load it

synthesiser = pipeline("text-to-audio", "facebook/musicgen-small")

# Use edited_user_input, which carries the extra mood information from the sentiment analysis
music = synthesiser(edited_user_input, forward_params={"do_sample": True})

# Output the final audio into a .wav file
scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
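
The generation call above leaves the clip length at the pipeline default. As a rough sketch, the length could be steered by adding `max_new_tokens` to `forward_params`. The rate of about 50 audio tokens per second for MusicGen is an assumption here and should be checked against the model documentation; `tokens_for_seconds` is a hypothetical helper.

```python
import math


def tokens_for_seconds(seconds, frame_rate_hz=50):
    """Estimate how many new tokens correspond to `seconds` of audio.

    Assumption: the model emits roughly `frame_rate_hz` audio tokens per second.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * frame_rate_hz)


# Parameters that could be passed as forward_params to the synthesiser call
forward_params = {"do_sample": True, "max_new_tokens": tokens_for_seconds(5)}
print(forward_params)  # {'do_sample': True, 'max_new_tokens': 250}

# Possible usage (left commented out because it runs the model):
# music = synthesiser(edited_user_input, forward_params=forward_params)
```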
