# Mapping Emotions Between Music and Lyrics

In this project we explore the alignment of emotions between music and lyrics.

To this end we use embeddings from `MuLan`, a joint audio-language embedding model. By mapping music audio and natural language descriptions into a shared embedding space, MuLan enables us to capture semantic and emotional similarities across modalities. Our aim is to analyze and visualize the emotional alignment between music and lyrics.
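As a rough illustration of what "similarity in a shared embedding space" means, the sketch below compares one audio vector against two lyric vectors using cosine similarity. The vectors are hypothetical 4-dimensional toy values, not real MuLan outputs, and `cosine_sim` is a helper written for this example only.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings; real embeddings have many more dimensions.
audio_emb = np.array([0.9, 0.1, 0.0, 0.4])
matching_lyrics_emb = np.array([0.8, 0.2, 0.1, 0.5])
unrelated_lyrics_emb = np.array([-0.7, 0.6, 0.2, -0.1])

sim_match = cosine_sim(audio_emb, matching_lyrics_emb)
sim_unrelated = cosine_sim(audio_emb, unrelated_lyrics_emb)
print(f"matching: {sim_match:.3f}, unrelated: {sim_unrelated:.3f}")
```

If the two modalities are well aligned, a song's audio should score higher against its own lyrics than against the lyrics of other songs.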

# Initial Setup

Here we set up the basic dependencies for the project: loading the environment file, importing libraries, and so on.

In [5]:
# from acrcloud.recognizer import ACRCloudRecognizer
import os
# from dotenv import load_dotenv
from Meta_Data_Collection import MetaDataCollection 
from Meta_Data_Collection_AudD import AudDMetaDataCollection
import pandas as pd
from Lyrics_Extraction import LyricsExtractor
import time 
from lyrics_preprocessing import process_lyrics_folder
from lyrics_audio_embeddings import process_all_embeddings


# load_dotenv()

# Data Collection

For our data we are using a combination of the [DEAM dataset](https://cvml.unige.ch/databases/DEAM/) for audio and [LyricWiki](https://pypi.org/project/lyricwikia/) for the lyrics. 

- DEAM contains over 1,800 songs with annotations for dynamic and continuous emotion labels (valence and arousal) at the level of short segments. 

- LyricWiki and Musixmatch do not contain raw audio data, but they add lyric annotations for semantic and sentiment analysis.

## Getting Audio Meta-data

Although DEAM contains the raw audio files with annotations, it does not contain structured metadata (song title, artist, year of release, etc.) for the 2014 entries. Since 2014 has the largest number of entries, we will retrieve this metadata through other APIs; it is required to look up the lyrics on LyricWiki.

- ACRCloud is a recognition service that identifies music from audio fingerprints.

In [None]:
## Let's first get the metadata and lyrics for the files with good metadata provided by DEAM

df_2013 = pd.read_csv("Data/metadata/metadata_2013.csv")
df_2015 = pd.read_csv("Data/metadata/metadata_2015.csv", on_bad_lines='skip')

df_2013 = df_2013.rename(columns={
    'song_id': 'song_id',
    'Artist': 'artist',
    'Song title': 'title',
    'Genre': 'genre'
})

# Standardize column names for 2015 dataset
df_2015 = df_2015.rename(columns={
    'id': 'song_id',
    'title': 'title',
    'artist': 'artist',
    'genre': 'genre'
})

df_2013 = df_2013[['song_id', 'artist', 'title', 'genre']]
df_2015 = df_2015[['song_id', 'artist', 'title', 'genre']]

df_2013['full song'] = False
df_2015['full song'] = True

combined_df = pd.concat([df_2013, df_2015], ignore_index=True)

for col in combined_df.select_dtypes(include=['object']):  # Select only text columns
    combined_df[col] = combined_df[col].str.strip('\t')


combined_df.to_csv('./Data/metadata/combined_metadata.csv', index=False)

print("Combined dataset saved as 'combined_metadata.csv'")

## We already got the lyrics and combined them with lyrics from other metadata (this was done in another branch; some of the code was lost here)

In [2]:
from acrcloud.recognizer import ACRCloudRecognizer  # commented out in the setup cell

acr_host = os.getenv('HOST')
acr_access_key = os.getenv('ACCESS_KEY')
acr_secret = os.getenv('SECRET_KEY')

## Set up ACRCloud for music recognition and get all the metadata

config = {
    "host": acr_host,
    "access_key": acr_access_key,
    "access_secret": acr_secret,
    "timeout": 10  # seconds
}

recognizer = ACRCloudRecognizer(config)
metadata_collector = MetaDataCollection(recognizer)

audio_directory = "Data/MEMD_audio"
output_csv = "song_metadata.csv"

metadata_collector.process_audio_files(audio_directory, output_csv)

Processing: Data/MEMD_audio/10.mp3
Processing: Data/MEMD_audio/1000.mp3
Processing: Data/MEMD_audio/1001.mp3
Processing: Data/MEMD_audio/1002.mp3
Processing: Data/MEMD_audio/1003.mp3
Processing: Data/MEMD_audio/1004.mp3
Processing: Data/MEMD_audio/1005.mp3
Processing: Data/MEMD_audio/1006.mp3
Processing: Data/MEMD_audio/1007.mp3
Processing: Data/MEMD_audio/1008.mp3
Processing: Data/MEMD_audio/1009.mp3
Processing: Data/MEMD_audio/101.mp3
Processing: Data/MEMD_audio/1010.mp3
Processing: Data/MEMD_audio/1011.mp3
Processing: Data/MEMD_audio/1012.mp3
Processing: Data/MEMD_audio/1013.mp3
Processing: Data/MEMD_audio/1014.mp3
Processing: Data/MEMD_audio/1015.mp3
Processing: Data/MEMD_audio/1016.mp3
Processing: Data/MEMD_audio/1017.mp3
Processing: Data/MEMD_audio/1018.mp3
Processing: Data/MEMD_audio/1019.mp3
Processing: Data/MEMD_audio/102.mp3
Processing: Data/MEMD_audio/1020.mp3
Processing: Data/MEMD_audio/1021.mp3
Processing: Data/MEMD_audio/1022.mp3
Processing: Data/MEMD_audio/1023.mp3
Proce

### Trying with different sources

Since we only got metadata for around 800 files, let's try a different source.

We will use AudD, but since it has an API limit of 300 requests per month, let's do it in batches.
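A minimal sketch of the batching idea, assuming the pending songs live in a DataFrame. The helper `iter_batches` and the `pending` frame are hypothetical names for this example; a batch size of 3 is used here only to keep the output short, whereas a real run would use a size that fits the monthly quota.

```python
import pandas as pd

def iter_batches(df: pd.DataFrame, batch_size: int):
    """Yield consecutive row slices of at most `batch_size` rows."""
    for start in range(0, len(df), batch_size):
        yield df.iloc[start:start + batch_size]

# Hypothetical stand-in for the songs that still lack metadata.
pending = pd.DataFrame({"audio_file": [f"{i}.mp3" for i in range(7)]})

batches = list(iter_batches(pending, batch_size=3))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch['audio_file'].tolist()}")
```

Each batch can then be sent in a separate run, which is what slicing `songs_with_no_metadata` by position achieves manually.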

In [9]:
songs = pd.read_csv("Data/Useable_data/song_metadata.csv")
songs.replace("Not Found", pd.NA, inplace=True)
songs_with_no_metadata = songs[songs['artist'].isna() & songs['title'].isna()]

In [None]:
aud_api_token = os.getenv("AUDD_API_TOKEN")

data = {
    'api_token': aud_api_token,
    'return': 'apple_music,spotify',
}
    
aud_collector = AudDMetaDataCollection(data, songs_with_no_metadata[901:])

aud_collector.process_audio_files("./Data/Useable_data/Other_Songs_Metadata.csv")

In [10]:
meta_data_from_aud = pd.read_csv("./Data/Useable_data/Other_Songs_Metadata.csv")

prev_songs = songs.copy()
songs.update(meta_data_from_aud)

def count_with_metadata(df):
    """Number of rows that have an artist or a title."""
    return len(df) - len(df[df['artist'].isna() & df['title'].isna()])

print(f"Current Meta_data length: {count_with_metadata(songs)}\nPrev Meta_data length: {count_with_metadata(prev_songs)}")

Current Meta_data length: 925
Prev Meta_data length: 816
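It may help to recall how `DataFrame.update` behaves: it modifies the frame in place, aligns rows by index label, and only overwrites cells where the other frame has a non-missing value. The sketch below uses small hypothetical frames (`songs_demo`, `extra`) to show this. Whether the real AudD CSV carries matching index labels depends on how it was written, which is not shown in this notebook.

```python
import pandas as pd

# Hypothetical metadata with two songs still unidentified.
songs_demo = pd.DataFrame(
    {
        "audio_file": ["10.mp3", "11.mp3", "12.mp3"],
        "artist": ["Artist A", None, None],
        "title": ["Song A", None, None],
    }
)

# Hypothetical result from a second service, indexed by the row it belongs to.
extra = pd.DataFrame({"artist": ["Artist C"], "title": ["Song C"]}, index=[2])

# In-place update: only row 2 is touched, because only index label 2 matches.
songs_demo.update(extra)

missing = songs_demo["artist"].isna() & songs_demo["title"].isna()
print(f"rows with metadata: {len(songs_demo) - int(missing.sum())}")
```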


In [11]:
songs.to_csv("./Data/Useable_data/song_metadata_updated.csv")

## Getting lyrics

Now that we have the metadata, let's load it and take a look. 

We don't have the meta-data for all the files, but let's work with what we have, and try to get the lyrics from it. 

In [2]:
## Read the metadata from the CSV file

metadata_df = pd.read_csv("Data/Useable_data/song_metadata_updated.csv")
metadata_df.replace("Not Found", pd.NA, inplace=True)
metadata_df.head()

Unnamed: 0.1,Unnamed: 0,audio_file,artist,title,genres,release_date,label
0,0,1002.mp3,Shun & Hoshina Anniversary,Dirty Shoes,,2012-07-18,Den Haku Records
1,1,1011.mp3,BERNARD MARQUIÉ,Wayne's Reflexions,,2022-12-28,BERNARD MARQUIÉ
2,2,1045.mp3,MarmutzZ,Semangat (Acoustic),,2023-12-09,MarmutzZ
3,3,1071.mp3,Subin Kumar,Mayur Nache,,2023-10-20,BS Music Entertainment
4,4,1074.mp3,Alan Clouds,Game,,2015-01-20,Alan Clouds


Let's drop the rows where all the metadata columns are empty.

In [3]:
non_empty_metadata_df = metadata_df.dropna(how='all', subset=["artist", "title", "genres", "release_date", "label"])
non_empty_metadata_df.head()

Unnamed: 0.1,Unnamed: 0,audio_file,artist,title,genres,release_date,label
0,0,1002.mp3,Shun & Hoshina Anniversary,Dirty Shoes,,2012-07-18,Den Haku Records
1,1,1011.mp3,BERNARD MARQUIÉ,Wayne's Reflexions,,2022-12-28,BERNARD MARQUIÉ
2,2,1045.mp3,MarmutzZ,Semangat (Acoustic),,2023-12-09,MarmutzZ
3,3,1071.mp3,Subin Kumar,Mayur Nache,,2023-10-20,BS Music Entertainment
4,4,1074.mp3,Alan Clouds,Game,,2015-01-20,Alan Clouds


Now let's get on with the lyrics.

In [4]:
GENIUS_API_TOKEN = os.getenv("GENIUS_API_TOKEN")
extractor = LyricsExtractor(GENIUS_API_TOKEN)

non_empty_metadata_df['lyrics'] = pd.NA

for idx, row in non_empty_metadata_df.iterrows():
    artist = row['label'] if pd.isna(row['artist']) else row['artist']
    print(f"Fetching lyrics for: {row['title']} by {artist}")
    lyrics = extractor.fetch_lyrics(row["title"], artist)
    non_empty_metadata_df.at[idx, "lyrics"] = lyrics
    time.sleep(1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  non_empty_metadata_df['lyrics'] = pd.NA


Fetching lyrics for: Dirty Shoes by Shun & Hoshina Anniversary
Lyrics not found for 'Dirty Shoes' by 'Shun & Hoshina Anniversary'.
Fetching lyrics for: Wayne's Reflexions by BERNARD MARQUIÉ
Lyrics not found for 'Wayne's Reflexions' by 'BERNARD MARQUIÉ'.
Fetching lyrics for: Semangat (Acoustic) by MarmutzZ
Lyrics not found for 'Semangat (Acoustic)' by 'MarmutzZ'.
Fetching lyrics for: Mayur Nache by Subin Kumar
Lyrics not found for 'Mayur Nache' by 'Subin Kumar'.
Fetching lyrics for: Game by Alan Clouds
Lyrics not found for 'Game' by 'Alan Clouds'.
Fetching lyrics for: Neapolitana by ZveroboyZ
Lyrics not found for 'Neapolitana' by 'ZveroboyZ'.
Fetching lyrics for: Wait for U by araa mikey
Lyrics not found for 'Wait for U' by 'araa mikey'.
Fetching lyrics for: Crying by Christian Hovda
Lyrics not found for 'Crying' by 'Christian Hovda'.
Fetching lyrics for: Capítulo 1.15 - los Cuervos del Vaticano - No Dramatizado by Eric Frattini
Lyrics not found for 'Capítulo 1.15 - los Cuervos del Vati
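The `SettingWithCopyWarning` in the output above appears because a column is assigned on a frame that pandas considers derived from another frame. One common way to avoid it, sketched here on hypothetical toy data, is to take an explicit `.copy()` of the filtered frame before adding columns.

```python
import pandas as pd

# Hypothetical metadata: the second row has no usable fields.
metadata_demo = pd.DataFrame(
    {"artist": ["Artist A", None], "title": ["Song A", None]}
)

# .copy() makes the filtered frame independent of metadata_demo,
# so adding a column to it is unambiguous.
subset = metadata_demo.dropna(how="all", subset=["artist", "title"]).copy()
subset["lyrics"] = pd.NA

for idx in subset.index:
    # Placeholder value; a real run would store the fetched lyrics here.
    subset.at[idx, "lyrics"] = "placeholder lyrics"

print(subset)
```

The original frame is left unchanged, and the new column exists only on the copy.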

In [36]:
lyrics_df = non_empty_metadata_df[non_empty_metadata_df['lyrics'].notna()]
lyrics_df = lyrics_df.drop(columns=['Unnamed: 0'])
print(f"Songs with lyrics available: {len(lyrics_df)}")

Songs with lyrics available: 126


In [38]:
## Save each song's lyrics in a separate file named after its audio_file so we can look them up later,
## then save lyrics_df (without the lyrics column) in a separate CSV for easier retrieval

## Ensure lyrics_df is a copy, not a view
lyrics_df = lyrics_df.copy()

for idx, row in lyrics_df.iterrows():
    audio_file = row['audio_file'].split(".")[0]
    lyrics = row['lyrics']
    
    # Write as UTF-8 so non-ASCII lyrics are saved the same way on every platform
    with open(f"Data/Useable_data/lyrics/{audio_file}.txt", "w", encoding="utf-8") as f:
        f.write(lyrics)
    
# Drop the 'lyrics' column
lyrics_df = lyrics_df.drop('lyrics', axis=1)

# Reset the index
lyrics_df.reset_index(drop=True, inplace=True)

# Append to the existing CSV (no header row is written)
lyrics_df.to_csv("Data/Useable_data/songs_with_lyrics.csv", mode='a', header=None, index=False)

In [6]:
lyrics_df = pd.read_csv("./Data/Useable_data/songs_with_lyrics.csv")
lyrics_df.drop_duplicates(subset=['audio_file'], inplace=True)
lyrics_df.drop(columns=['label', 'release_date'], inplace=True)
lyrics_df.to_csv("./Data/Useable_data/songs_with_lyrics_updated.csv", index=False)

We got around `150 songs with lyrics`. There are some other songs we might still be able to use:

- Instrumental songs (songs with no lyrics)
- Retry with another lyrics API
- Use Shazam for manual metadata collection for some songs

# Data Preprocessing

We will need to perform two types of data preprocessing:

- **For Audio**: 
    - Convert audio files to mono and resample to 16kHz if needed for MuLan compatibility.
    - Segment the audio into clips

- **For Text**: 
    - Remove special characters, stop words, and tokenize the lyrics.
    - Store the processed text for embedding extraction.
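The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual `process_lyrics_folder` or audio pipeline, whose code is not shown here. Resampling uses plain linear interpolation as a crude stand-in (no anti-aliasing filter); a real pipeline would more likely rely on a dedicated resampler from an audio library. The stop-word set and all function names are hypothetical.

```python
import re
import numpy as np

TARGET_SR = 16_000  # target sample rate named in the list above

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; expects shape (samples,) or (channels, samples)."""
    return audio if audio.ndim == 1 else audio.mean(axis=0)

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Crude linear-interpolation resampling, for illustration only."""
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, audio)

def segment(audio: np.ndarray, sr: int, clip_seconds: float) -> np.ndarray:
    """Split into non-overlapping clips, dropping any incomplete tail."""
    clip_len = int(sr * clip_seconds)
    n_clips = len(audio) // clip_len
    return audio[: n_clips * clip_len].reshape(n_clips, clip_len)

# Tiny hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def clean_lyrics(text: str) -> list:
    """Lowercase, keep only letters and apostrophes, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Synthetic stereo noise: 2 seconds at 44.1 kHz.
stereo = np.random.default_rng(0).normal(size=(2, 88_200))
mono_16k = resample_linear(to_mono(stereo), orig_sr=44_100)
clips = segment(mono_16k, TARGET_SR, clip_seconds=0.5)
print(clips.shape)

tokens = clean_lyrics("The rain is falling, and I'm in love!")
print(tokens)
```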


## Embeddings

Since we can't get the MuLan embeddings directly, we will use [musiclm-pytorch](https://github.com/lucidrains/musiclm-pytorch), an open-source library.

In [14]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /Users/user/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [16]:
input_folder = "./Data/Useable_data/lyrics"
output_folder = "./Data/Useable_data/lyrics_cleaned"

process_lyrics_folder(input_folder, output_folder)  

print(f"Lyrics processed and saved to {output_folder}")

Lyrics processed and saved to ./Data/Useable_data/lyrics_cleaned


In [6]:
lyrics_folder = './Data/Useable_data/lyrics_cleaned'
audio_folder = './Data/MEMD_audio'
lyrics_output = './Data/Embeddings/lyrics_embeddings.npy'
audio_output = './Data/Embeddings/audio_embeddings.npy'

# Process embeddings
process_all_embeddings(lyrics_folder, audio_folder, lyrics_output, audio_output)

Processing lyrics...
Lyrics embeddings saved to ./Data/Embeddings/lyrics_embeddings.npy
Processing audio...
spectrogram yielded shape of (513, 1409), but had to be cropped to (512, 1408) to be patchified for transformer
Successfully processed 746.mp3
Successfully processed 1588.mp3
Successfully processed 1563.mp3
Successfully processed 1205.mp3
Successfully processed 791.mp3
Successfully processed 949.mp3
Successfully processed 1211.mp3
Successfully processed 1577.mp3



# Testing

In [9]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity

class EmbeddingAnalyzer:
    def __init__(self, lyrics_embeddings_path, audio_embeddings_path):
        """Load embeddings from numpy files and convert to proper format"""
        # Load embeddings
        self.lyrics_embeddings = np.load(lyrics_embeddings_path, allow_pickle=True)
        self.audio_embeddings = np.load(audio_embeddings_path, allow_pickle=True)
        
        # Convert lists to numpy arrays
        self.lyrics_embeddings = np.array(self.lyrics_embeddings)
        self.audio_embeddings = np.array(self.audio_embeddings)
        
        print("Original shapes:")
        print(f"Lyrics: {self.lyrics_embeddings.shape}")
        print(f"Audio: {self.audio_embeddings.shape}")
        
        # Verify data is valid
        if self.audio_embeddings.size == 0:
            raise ValueError("Audio embeddings are empty!")
        
        if len(self.lyrics_embeddings) != len(self.audio_embeddings):
            raise ValueError(f"Number of lyrics ({len(self.lyrics_embeddings)}) and audio ({len(self.audio_embeddings)}) embeddings don't match!")

    def analyze_alignment(self):
        """Analyze alignment between modalities"""
        # Compute similarity matrix
        similarity_matrix = cosine_similarity(self.lyrics_embeddings, self.audio_embeddings)
        
        # Get matching pairs (diagonal) similarities
        matching_similarities = np.diag(similarity_matrix)
        
        # Get non-matching pairs similarities
        mask = ~np.eye(similarity_matrix.shape[0], dtype=bool)
        non_matching_similarities = similarity_matrix[mask]
        
        # Calculate statistics
        stats = {
            'avg_matching_sim': np.mean(matching_similarities),
            'std_matching_sim': np.std(matching_similarities),
            'avg_nonmatching_sim': np.mean(non_matching_similarities),
            'std_nonmatching_sim': np.std(non_matching_similarities),
            'best_match_idx': np.argmax(matching_similarities),
            'worst_match_idx': np.argmin(matching_similarities)
        }
        
        return similarity_matrix, stats
    
    def visualize_alignment(self):
        """Create visualization plots"""
        similarity_matrix, stats = self.analyze_alignment()
        
        plt.figure(figsize=(15, 5))
        
        # 1. Similarity Distribution
        plt.subplot(1, 3, 1)
        matching_sims = np.diag(similarity_matrix)
        mask = ~np.eye(similarity_matrix.shape[0], dtype=bool)
        nonmatching_sims = similarity_matrix[mask]
        
        plt.hist([matching_sims, nonmatching_sims], 
                label=['Matching pairs', 'Non-matching pairs'],
                bins=20, alpha=0.6, density=True)
        plt.xlabel('Cosine Similarity')
        plt.ylabel('Density')
        plt.title('Distribution of Similarities')
        plt.legend()
        
        # 2. Similarity Matrix Heatmap
        plt.subplot(1, 3, 2)
        sns.heatmap(similarity_matrix, 
                   cmap='YlOrRd',
                   xticklabels=False, 
                   yticklabels=False)
        plt.title('Cross-modal Similarity Matrix')
        plt.xlabel('Audio Embeddings')
        plt.ylabel('Lyrics Embeddings')
        
        # 3. Alignment Scores
        plt.subplot(1, 3, 3)
        plt.plot(matching_sims, 'b.')
        plt.axhline(y=stats['avg_matching_sim'], 
                   color='r', 
                   linestyle='--', 
                   label=f'Mean: {stats["avg_matching_sim"]:.3f}')
        plt.xlabel('Song Index')
        plt.ylabel('Alignment Score')
        plt.title('Per-Song Alignment Scores')
        plt.legend()
        
        plt.tight_layout()
        plt.show()
        
        # Print statistics
        print("\nAlignment Statistics:")
        print(f"Number of songs: {len(matching_sims)}")
        print(f"\nMatching pairs similarity: {stats['avg_matching_sim']:.3f} ± {stats['std_matching_sim']:.3f}")
        print(f"Non-matching pairs similarity: {stats['avg_nonmatching_sim']:.3f} ± {stats['std_nonmatching_sim']:.3f}")
        
        # Calculate effectiveness
        effectiveness = (stats['avg_matching_sim'] - stats['avg_nonmatching_sim']) / stats['std_nonmatching_sim']
        print(f"\nAlignment effectiveness (σ above random): {effectiveness:.2f}")

# First, let's inspect the data
lyrics_data = np.load('./Data/Embeddings/lyrics_embeddings.npy', allow_pickle=True)
audio_data = np.load('./Data/Embeddings/audio_embeddings.npy', allow_pickle=True)

print("Checking data format:")
print("Lyrics data type:", type(lyrics_data))
print("Audio data type:", type(audio_data))
print("\nLyrics data length:", len(lyrics_data))
print("Audio data length:", len(audio_data))

if len(lyrics_data) > 0:
    print("\nSample lyrics embedding shape:", np.array(lyrics_data[0]).shape)
if len(audio_data) > 0:
    print("Sample audio embedding shape:", np.array(audio_data[0]).shape)

# Then run the analysis
analyzer = EmbeddingAnalyzer(
    lyrics_embeddings_path='./Data/Embeddings/lyrics_embeddings.npy',
    audio_embeddings_path='./Data/Embeddings/audio_embeddings.npy'
)

analyzer.visualize_alignment()

Checking data format:
Lyrics data type: <class 'numpy.ndarray'>
Audio data type: <class 'numpy.ndarray'>


TypeError: len() of unsized object
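One common cause of `TypeError: len() of unsized object` after `np.load(..., allow_pickle=True)` is that a single Python object, such as a dict, was passed to `np.save`. NumPy wraps it in a 0-dimensional object array, which has no length. The sketch below reproduces this with a hypothetical dict of embeddings and unwraps it with `.item()`. Whether `process_all_embeddings` actually saves a dict is an assumption, since its code is not shown here.

```python
import os
import tempfile
import numpy as np

# Hypothetical layout: embeddings stored as a dict keyed by file name.
embeddings = {"746.mp3": np.ones(4), "1588.mp3": np.zeros(4)}

path = os.path.join(tempfile.mkdtemp(), "demo_embeddings.npy")
np.save(path, embeddings, allow_pickle=True)

loaded = np.load(path, allow_pickle=True)
print(loaded.shape)  # a 0-d array has the empty shape ()

# Unwrap the single Python object if the array is 0-dimensional.
if loaded.ndim == 0:
    loaded = loaded.item()

# Stack the per-song vectors into a 2-D matrix in a fixed (sorted) order.
matrix = np.stack([loaded[name] for name in sorted(loaded)])
print(matrix.shape)
```

If the lyrics and audio files are both stored this way, sorting by the same keys before stacking also keeps row `i` of both matrices pointing at the same song, which the diagonal of the similarity matrix relies on.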