# Story Score
Story-Score is an example project that automatically adds background scores to textual storylines.

It showcases Hugging Face `transformers` models used together with `beatoven.ai` to create a background score for a piece of text. Consider the following example storylines:

```python
storylines = [
    "Oh my god, that was so scary. The ghost of Colonel Sanders was eating at my local K F C.",
    "As Rachel was looking around, everyone was in happy spirits as they were dancing.",
]
```

### Install dependencies
```
pip install -r requirements.txt
```

We use Microsoft's SpeechT5 model (`microsoft/speecht5_tts`) to synthesize speech from the text above.

```python
from transformers import pipeline
speech_synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts")
```

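```python
from datasets import load_dataset
import torch

# SpeechT5 conditions its voice on a speaker embedding (x-vector); take one from the CMU ARCTIC x-vectors dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)  # add a batch dimension
```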
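The composition loop later derives the length of each background track from the synthesized waveform: the number of samples divided by the sampling rate gives seconds, and multiplying by 1000 gives milliseconds. A small sketch of that arithmetic; `duration_ms` is a hypothetical helper, and the example assumes SpeechT5's 16 kHz output rate:

```python
# Hypothetical helper: playback length of a mono waveform in milliseconds,
# the same arithmetic the composition loop uses for `track_duration`.
def duration_ms(num_samples: int, sampling_rate: int) -> float:
    """Return the length of `num_samples` samples at `sampling_rate` Hz, in ms."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate * 1000


# Assuming 16 kHz audio, 24,000 samples last 1.5 seconds.
print(duration_ms(24_000, 16_000))  # 1500.0
```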

We also use an emotion detection model hosted on the Hugging Face Hub (`SamLowe/roberta-base-go_emotions`).

```python
from transformers import pipeline
# top_k=3 returns the three most likely emotion labels with their scores
text_emotion_detector = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=3)
```
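The local `utils.extract_emotion` helper is not shown here. With `top_k=3`, the classifier yields, for a single input text, a list of `{"label": ..., "score": ...}` dicts. One plausible way to reduce that list to a single mood is to keep the highest-scoring label. A sketch under that assumption; `pick_mood` is hypothetical and may differ from `extract_emotion`, and the scores below are illustrative, not real model output:

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def pick_mood(predictions: List[Prediction]) -> str:
    """Return the label with the highest score from a list of {'label', 'score'} dicts."""
    if not predictions:
        raise ValueError("no predictions to choose from")
    best = max(predictions, key=lambda p: p["score"])
    return str(best["label"])


# Illustrative input shaped like the classifier's output for one sentence
example = [
    {"label": "fear", "score": 0.71},
    {"label": "nervousness", "score": 0.12},
    {"label": "surprise", "score": 0.05},
]
print(pick_mood(example))  # fear
```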

We use Beatoven.ai to compose the background score. This requires the [Beatoven Public SDK](https://github.com/Beatoven/public-api/tree/main/sdk) and an API key, provided through the `BEATOVEN_API_KEY` environment variable as in the [usage example](https://github.com/Beatoven/public-api/tree/main/sdk#usage) from the docs:

```python
import os

os.environ["BEATOVEN_API_KEY"] = ""  # paste your Beatoven API key here
```
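An empty or missing key may only surface as an error once the first API request is made, so it can help to check it up front. A minimal sketch, assuming the SDK reads the `BEATOVEN_API_KEY` variable set above; `get_beatoven_api_key` is a hypothetical helper, not part of the SDK:

```python
import os


def get_beatoven_api_key(var_name: str = "BEATOVEN_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to the environment or a .env file")
    return key
```

Calling `get_beatoven_api_key()` before the composition loop turns a silent misconfiguration into an immediate, readable error.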

```python
from dotenv import load_dotenv
load_dotenv()  # load variables such as BEATOVEN_API_KEY from a local .env file, if present
```
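The composition cell below awaits `compose_new_track` and `download_track` at the top level, which works in Jupyter but not in a plain Python script. Outside a notebook, the same calls would need to run inside a coroutine started with `asyncio.run`. A minimal sketch of that pattern; `fake_compose` is a hypothetical stand-in for the SDK call, and its URLs are placeholders:

```python
import asyncio
from typing import List, Tuple


# Hypothetical stand-in for an awaitable SDK call such as `compose_new_track`;
# it yields control once and returns placeholder values.
async def fake_compose(title: str) -> Tuple[str, str]:
    await asyncio.sleep(0)
    return f"id-{title}", f"https://example.com/{title}.mp3"


async def main(titles: List[str]) -> List[Tuple[str, str]]:
    # Await each composition in turn, as the notebook loop does
    results = []
    for title in titles:
        results.append(await fake_compose(title))
    return results


if __name__ == "__main__":
    print(asyncio.run(main(["story-1", "story-2"])))
```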

```python
import os

from pydub import AudioSegment
from scipy.io import wavfile
from beatoven_sdk import compose_new_track
from utils import extract_emotion, download_track  # local helper module

os.makedirs("audio", exist_ok=True)  # intermediate and final files are written here

output_audio = AudioSegment.empty()  # the full story, built up sentence by sentence
for count, sentence in enumerate(storylines, start=1):
    # 1. Synthesize the narration and save it as a WAV file
    speech = speech_synthesiser(sentence, forward_params={"speaker_embeddings": speaker_embedding})
    sampling_rate = speech["sampling_rate"]
    audio_data = speech["audio"].squeeze()
    duration = len(audio_data) / sampling_rate * 1000  # narration length in milliseconds
    speech_path = f"audio/text_speech_{count}.wav"
    wavfile.write(speech_path, rate=sampling_rate, data=audio_data)

    # 2. Detect the sentence's emotions and reduce them to a mood with the local helper
    mood = extract_emotion(text_emotion_detector(sentence)[0])
    print(mood)

    # 3. Compose a background track as long as the narration (top-level await works in Jupyter)
    track_id, track_url = await compose_new_track(
        title=f"my story lines {count}",
        track_duration=duration,
        track_genre="cinematic",
        track_tempo="medium",
        mood=mood,
    )
    print(track_url)
    track_path = f"audio/composed_track_{count}.mp3"
    await download_track(track_url, track_path)

    # 4. Lower the score by 6 dB and mix it under the narration
    text_speech = AudioSegment.from_wav(speech_path)
    background_score = AudioSegment.from_mp3(track_path) - 6
    sentence_audio = text_speech.overlay(background_score, position=0)
    sentence_audio.export(f"audio/output_{count}.mp3", format="mp3")
    output_audio += sentence_audio  # append this sentence to the full story

output_audio.export("audio/output.mp3", format="mp3")
```

```python
from IPython.display import Audio
Audio("audio/output.mp3")
```
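In the loop, `AudioSegment.from_mp3(...) - 6` lowers the score by 6 dB and `overlay` mixes it under the narration, keeping the narration's length. pydub does this on integer PCM samples; the pure-Python sketch below reimplements the same idea on float samples in [-1, 1] purely for illustration. The function names are hypothetical, and the clamping to [-1, 1] is this sketch's own choice:

```python
from typing import List


def db_to_gain(db: float) -> float:
    """Convert a change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def overlay(speech: List[float], background: List[float], background_db: float = -6.0) -> List[float]:
    """Mix `background` under `speech`, attenuated by `background_db` decibels.

    The result keeps the length of `speech`: extra background samples are
    dropped, and missing ones are treated as silence.
    """
    gain = db_to_gain(background_db)
    mixed = []
    for i, s in enumerate(speech):
        b = background[i] if i < len(background) else 0.0
        # Clamp so the sum stays a valid float sample in [-1.0, 1.0]
        mixed.append(max(-1.0, min(1.0, s + gain * b)))
    return mixed
```

A -6 dB change corresponds to an amplitude factor of about 0.5, so the score sits at roughly half the narration's level before mixing.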