# Mapping Emotions Between Music and Lyrics

In this project we explore how the emotions conveyed by a song's music align with those conveyed by its lyrics.

To this end we use `MuLan` embeddings. MuLan is a joint audio-language embedding model: by mapping music audio and natural-language descriptions into a shared embedding space, it lets us capture semantic and emotional similarities across modalities. Our aim is to analyze and visualize the emotional alignment between music and lyrics.
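
As a rough illustration of what a shared embedding space makes possible, the sketch below scores one audio embedding against a few text embeddings with cosine similarity. The vectors are made-up three-dimensional placeholders, not MuLan outputs, and `cosine_similarity` and `rank_texts` are hypothetical helper names. How real embeddings are obtained depends on the MuLan implementation used, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_texts(audio_embedding, text_embeddings):
    """Sort text labels by similarity to the audio embedding, best first."""
    scored = [
        (label, cosine_similarity(audio_embedding, vector))
        for label, vector in text_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors for illustration only (NOT real MuLan embeddings).
audio_clip = [0.9, 0.1, 0.2]
descriptions = {
    "upbeat and joyful": [0.8, 0.2, 0.1],
    "slow and melancholic": [0.1, 0.9, 0.3],
}

ranking = rank_texts(audio_clip, descriptions)
```

With real embeddings, the same ranking step could be applied to lyric text instead of short descriptions, which is the comparison this project is after.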

## Initial Setup

Here we set up the basic dependencies for the project: loading the required libraries and reading the API credentials from the environment file.

In [1]:
from acrcloud.recognizer import ACRCloudRecognizer
import os
from dotenv import load_dotenv
from Meta_Data_Collection import MetaDataCollection 

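# Load the variables defined in the local .env file into the environment
load_dotenv()

# ACR Cloud credentials, used to configure the recognizer
acr_host = os.getenv('HOST')
acr_access_key = os.getenv('ACCESS_KEY')
arc_secret = os.getenv('SECRET_KEY')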
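
Because `os.getenv` returns `None` for a variable that is not set, a missing entry in the `.env` file would only surface later, when the recognizer is called. A minimal sketch of an early check is shown below. The helper names `missing_env_vars` and `require_env_vars` are hypothetical; the variable names are the ones read above.

```python
import os

REQUIRED_VARS = ("HOST", "ACCESS_KEY", "SECRET_KEY")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env_vars(names, environ=None):
    """Raise a clear error if any required variable is missing."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env_vars(REQUIRED_VARS)` right after `load_dotenv()` would stop the notebook with a readable message instead of failing inside the API client.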

## Data Collection

For our data we are using a combination of the [DEAM dataset](https://cvml.unige.ch/databases/DEAM/) for audio and [LyricWiki](https://pypi.org/project/lyricwikia/) for the lyrics. 

- DEAM contains over 1,800 songs annotated with dynamic, continuous emotion labels (valence and arousal) at the level of short segments.

- LyricWiki and Musixmatch contain no raw audio, but they provide the lyrics we need for semantic and sentiment analysis.

### Getting Audio Meta-data

Although DEAM contains the raw audio files with annotations, it does not include song metadata (title, artist, year of release, etc.), which we need in order to fetch the lyrics from LyricWiki. To recover this metadata we use the `ACR Cloud` API.

- ACR Cloud is a recognition service that identifies music from audio fingerprints.
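
To make the metadata step concrete, here is a minimal sketch of pulling the fields needed for a lyrics lookup out of a recognition response. The key names (`status`, `metadata`, `music`, `title`, `artists`, `album`, `release_date`) are taken from the responses printed in this notebook. The helper name `extract_song_fields`, the choice of the first entry in `music`, and treating any non-zero status code as a failure are assumptions made for illustration.

```python
import json


def extract_song_fields(response_text):
    """Pull title, artist, album and release date out of a recognition response.

    Returns None when the status code is non-zero or no match is listed.
    """
    response = json.loads(response_text)
    if response.get("status", {}).get("code") != 0:
        return None
    matches = response.get("metadata", {}).get("music", [])
    if not matches:
        return None
    best = matches[0]  # assumption: the first entry is the match we want
    artists = [artist.get("name", "") for artist in best.get("artists", [])]
    return {
        "title": best.get("title"),
        "artist": ", ".join(artists),
        "album": best.get("album", {}).get("name"),
        "release_date": best.get("release_date"),
    }


# Trimmed-down response built from values that appear in this notebook's output.
sample = json.dumps({
    "status": {"code": 0, "msg": "Success", "version": "1.0"},
    "metadata": {"music": [{
        "title": "Little White Church",
        "album": {"name": "Hymns About Her"},
        "artists": [{"name": "Steven Dunston"}],
        "release_date": "2006-01-01",
    }]},
})

fields = extract_song_fields(sample)
```

Returning `None` for unmatched clips keeps the caller's logic simple: tracks without a match can be skipped rather than written out with empty fields.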

In [2]:
# Set up ACR Cloud for music recognition and collect the metadata

config = {
    "host": acr_host, 
    "access_key": acr_access_key,
    "access_secret": arc_secret,
    "timeout": 10 #seconds
}

recognizer = ACRCloudRecognizer(config)
metadata_collector = MetaDataCollection(recognizer)

audio_directory = "Data/MEMD_audio"
output_csv = "song_metadata.csv"

# metadata_collector.process_audio_files(audio_directory, output_csv)
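
The `MetaDataCollection` class is imported from a local module and its code is not shown in this notebook. The sketch below is therefore not that implementation, only one plausible shape for a "process a folder, write a CSV" step. The function `collect_metadata`, the `lookup` callback, the CSV column names, and the list of audio extensions are all assumptions. The demo uses a stand-in lookup so it runs without network access or API credentials.

```python
import csv
import os
import tempfile

AUDIO_EXTENSIONS = (".mp3", ".wav")  # assumption: formats present in the audio folder


def collect_metadata(audio_dir, output_csv, lookup):
    """Run `lookup` on every audio file in `audio_dir` and write the results to CSV.

    `lookup(path)` is a caller-supplied function returning a dict with
    "title", "artist" and "release_date" keys, or None when nothing matched.
    Returns the number of rows written.
    """
    fieldnames = ["file", "title", "artist", "release_date"]
    written = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for name in sorted(os.listdir(audio_dir)):
            if not name.lower().endswith(AUDIO_EXTENSIONS):
                continue  # ignore non-audio files
            fields = lookup(os.path.join(audio_dir, name))
            if fields is None:
                continue  # skip tracks that were not recognized
            row = {"file": name}
            row.update({key: fields.get(key, "") for key in fieldnames[1:]})
            writer.writerow(row)
            written += 1
    return written


# Stand-in for a real recognition call (hypothetical, for the demo only).
def fake_lookup(path):
    if path.endswith("2041.mp3"):
        return {"title": "Little White Church", "artist": "Steven Dunston",
                "release_date": "2006-01-01"}
    return None


with tempfile.TemporaryDirectory() as tmp:
    for name in ("2040.mp3", "2041.mp3", "notes.txt"):
        open(os.path.join(tmp, name), "wb").close()
    csv_path = os.path.join(tmp, "demo_metadata.csv")
    rows_written = collect_metadata(tmp, csv_path, fake_lookup)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
```

Passing the lookup in as a function keeps the file handling testable on its own; in the real pipeline the callback would wrap the recognizer call and the parsing of its JSON response.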

In [3]:
metadata_collector.process_audio_files("Data/Test_data", "test_metadata.csv")

Processing: Data/Test_data/2041.mp3
{'cost_time': 0.93899989128113, 'result_type': 0, 'status': {'code': 0, 'msg': 'Success', 'version': '1.0'}, 'metadata': {'timestamp_utc': '2024-11-16 19:49:11', 'music': [{'title': 'Little White Church', 'album': {'name': 'Hymns About Her'}, 'external_ids': {'isrc': 'USCGJ0603914', 'upc': '0842841006908'}, 'duration_ms': 155400, 'db_begin_time_offset_ms': 0, 'artists': [{'name': 'Steven Dunston'}], 'sample_begin_time_offset_ms': 0, 'sample_end_time_offset_ms': 9540, 'acrid': 'cd1cf39209e5ee0c46e223734756b802', 'result_from': 1, 'release_date': '2006-01-01', 'external_metadata': {'deezer': {'artists': [{'id': '312289', 'name': 'Steven Dunston'}], 'album': {'id': '7578517', 'name': 'Hymns About Her'}, 'track': {'id': '76429152', 'name': 'Little White Church'}}, 'spotify': {'artists': [{'id': '4Vn0PVNmV1KLouN7qQQ74I', 'name': 'Steven Dunston'}], 'album': {'id': '4z1QEGLAbKzlaPrXvSnSCj', 'name': 'Hymns About Her'}, 'track': {'id': '6RMgrY7XTjhkiSKnd8X2r

In [32]:
result = recognizer.recognize_by_file("Data/Test_Audio/2040.mp3", 0)

In [33]:
import pprint
from json import loads

res_dict = loads(result)

pprint.pprint(res_dict.get("metadata"))

{'music': [{'acrid': '3b197df2f91af89d59af236799f4c5a4',
            'album': {'name': 'The 9/11 Conspiracy Blues'},
            'artists': [{'name': 'Ralph Buckley'}],
            'db_begin_time_offset_ms': 0,
            'db_end_time_offset_ms': 9000,
            'duration_ms': 221780,
            'external_ids': {'isrc': 'USY280786817', 'upc': '0634479656262'},
            'external_metadata': {'deezer': {'album': {'id': '803322',
                                                       'name': 'The 9/11 '
                                                               'Conspiracy '
                                                               'Blues'},
                                             'artists': [{'id': '1102664',
                                                          'name': 'Ralph '
                                                                  'Buckley'}],
                                             'track': {'id': '8742204',
                                    