# Spotify Metadata and Audio Feature Extraction

This notebook processes a collection of MP3 files to extract metadata, search for corresponding tracks on Spotify, and retrieve audio features using the Spotify Web API. It demonstrates the integration of local file analysis with cloud-based music data services to enrich a music dataset.

## Import Libraries

In [1]:
import os
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
import json
from mutagen.mp3 import MP3
from mutagen.easyid3 import EasyID3
from tqdm import tqdm
from glob import glob

## Initialize Spotify Client

The Spotify client is initialized with credentials stored in a JSON file. These credentials include the `client_id` and `client_secret` necessary for accessing the Spotify Web API. The `SpotifyClientCredentials` manager handles the OAuth 2.0 flow for server-to-server authentication.

Using any text editor, create a text file with the following content. Save it as `spotify_credentials.json` and place it in the `../data/reference` folder.

```
{
  "client_id":"REPLACE WITH CLIENT ID",
  "client_secret":"REPLACE WITH SECRET",
  "user_id":"REPLACE WITH SPOTIFY USERNAME"
}
```
Retrieve your client ID and secret here: https://developer.spotify.com/dashboard/

And your username here: https://www.spotify.com/us/account/overview/

In your app's settings on the Spotify developer dashboard, add `http://localhost:8080/callback` as a redirect URI.
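
Before the client is created, it can help to confirm that the credentials file was actually filled in. The following is a minimal sketch, not part of the original notebook: `load_credentials` and `REQUIRED_KEYS` are hypothetical names, and the check assumes the placeholder text from the template above (`REPLACE WITH ...`) was left untouched if a value was never edited.

```python
import json

# Keys the notebook reads when building the client; `user_id` is not used by the code shown here.
REQUIRED_KEYS = ("client_id", "client_secret")


def load_credentials(path):
    """Load the credentials JSON and fail early if a required value is missing or unedited."""
    with open(path, "r", encoding="utf-8") as fh:
        creds = json.load(fh)

    # Keys that are absent or empty
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    # Keys that still hold the template placeholder text
    placeholders = [
        key for key in REQUIRED_KEYS
        if str(creds.get(key, "")).startswith("REPLACE WITH")
    ]
    if missing or placeholders:
        raise ValueError(
            f"Missing keys: {missing}; unedited placeholders: {placeholders}"
        )
    return creds


# Example usage (path taken from the notebook):
# creds = load_credentials(r'../data/reference/spotify_credentials.json')
```

Failing here gives a clearer message than the authentication error that would otherwise surface on the first API call.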

In [2]:
# Load the credentials
credentials_path = r'../data/reference/spotify_credentials.json' # Path to your spotify_credentials.json file

with open(credentials_path, 'r') as file:
    creds = json.load(file)

# Initialize the Spotify client credentials manager with your client_id and client_secret
auth_manager = SpotifyClientCredentials(client_id=creds['client_id'], client_secret=creds['client_secret'])

# Initialize the Spotipy client with the credentials manager
sp = spotipy.Spotify(auth_manager=auth_manager)

## Functions

In [3]:
def parse_mp3_metadata(file_path):
    """
    Reads MP3 metadata from a given file path.

    This function extracts the track's title and artist(s) from an MP3 file's metadata.
    It handles multiple artists separated by commas or slashes and cleans up the title
    by removing any '(feat.)' annotations.

    Parameters:
    - file_path (str): The file path of the MP3 file to read metadata from.

    Returns:
    - tuple: A tuple containing the track's title and a list of artist names.
             If an error occurs, it returns None for the track title and an empty list for artists.
    """
    try:
        audiofile = MP3(file_path, ID3=EasyID3)
        track_title = audiofile.get('title', [None])[0]
        artist_string = audiofile.get('artist', [None])[0]
        
        # Split artists by comma and then by slash
        artists = []
        if artist_string:
            artist_parts = artist_string.split(',')
            for part in artist_parts:
                artists.extend(part.split('/'))
        
        # Remove any leading and trailing spaces from artist names
        artists = [artist.strip() for artist in artists]
        
        # Clean up the track title
        track_title = track_title.split(' (feat.')[0].strip() if track_title else None
        
        return track_title, artists
    except Exception as e:
        print(f"Error reading metadata for {file_path}: {e}")
        return None, []


def get_ids_and_genre(track_title, artists, sp):
    """
    Searches for a track's ID and associated genres on Spotify using a track's title and artists.

    The function performs a search on Spotify for each artist combined with the track's title.
    It returns the first match's track ID, artist IDs, and a list of unique genres associated
    with all artists. The search stops once a match is found.

    Parameters:
    - track_title (str): The title of the track to search for.
    - artists (list): A list of artist names associated with the track.
    - sp (Spotify client): An authenticated instance of the Spotify client to perform API calls.

    Returns:
    - tuple: A tuple containing a list of artist IDs, a list of unique genres, and the track's ID.
             If no match is found, it returns None for artist IDs and track ID, and an empty list for genres.
    """
    unique_genres = set()  # Use a set to automatically discard duplicates
    track_id, artist_ids = None, None

    for artist in artists:
        # Format the search query
        query = f"track:{track_title} artist:{artist.strip()}"

        # Search for tracks
        track_results = sp.search(q=query, type='track', limit=1)
        if track_results['tracks']['items']:
            track_item = track_results['tracks']['items'][0]
            track_id = track_item['id']
            artist_ids = [artist['id'] for artist in track_item['artists']]

            # Get genres from the track's artists
            for artist_id in artist_ids:
                artist_genre = sp.artist(artist_id)['genres']
                unique_genres.update(artist_genre)

            # Since we found the track, we don't need to continue searching
            break

    # Return the results in the documented order: artist IDs, genres, track ID
    if track_id and artist_ids:
        return artist_ids, list(unique_genres), track_id
    else:
        return None, [], None  # No match: None for artist IDs and track ID, empty list for genres


def get_spotify_audio_features(artist_ids, genre_list, track_id, sp):
    """
    Retrieves audio features for a given track ID from Spotify and returns them in a DataFrame.

    This function calls the Spotify API to get audio features for a specific track ID.
    It then creates a pandas DataFrame with these audio features, along with additional
    columns for the track ID, artist IDs, and genres. Certain unnecessary columns are dropped
    from the DataFrame before it is returned.

    Parameters:
    - artist_ids (list): A list of Spotify artist IDs.
    - genre_list (list): A list of genres associated with the track.
    - track_id (str): The Spotify track ID for which to retrieve audio features.
    - sp (Spotify client): An authenticated instance of the Spotify client to perform API calls.

    Returns:
    - DataFrame: A pandas DataFrame containing the audio features, track ID, artist IDs, and genres.
                 If no audio features are found, an empty DataFrame with the specified columns is returned.
    """
    
    # Columns returned by the API that are not needed downstream
    columns_to_drop = ['type', 'id', 'uri', 'track_href', 'analysis_url', 'duration_ms']

    # Retrieve audio features for the given track_id; the API returns a list with one entry per track
    results = sp.audio_features(track_id)
    audio_features = results[0] if results else None

    if audio_features:
        # Create a one-row DataFrame from the audio features dictionary
        df = pd.DataFrame([audio_features])

        # Add track_id, artist_ids, and genre_list as additional columns
        df['track_id'] = track_id
        df['artist_ids'] = [artist_ids]  # Enclose in a list so the whole list is stored as a single cell value
        df['genre_list'] = [genre_list]  # Enclose in a list so the whole list is stored as a single cell value

        # Drop the unneeded columns; errors='ignore' avoids a KeyError if the API omits one of them
        df = df.drop(columns=columns_to_drop, errors='ignore')
    else:
        # No audio features: return an empty DataFrame so the caller's `.empty` check skips this track
        df = pd.DataFrame(columns=['track_id', 'artist_ids', 'genre_list'])

    return df


def main(sp, directory):
    """
    Processes MP3 files in a given directory, extracting metadata, searching for Spotify track and artist IDs and genres,
    and retrieving audio features to generate a comprehensive DataFrame.

    Parameters:
    - sp (spotipy.Spotify): An authenticated instance of the Spotify client to perform API calls.
    - directory (str): The directory path containing MP3 files to be processed.

    Returns:
    - DataFrame: A pandas DataFrame containing Spotify audio features and metadata for all processed MP3 files.
    """

    # Initialize an empty DataFrame to store all tracks' info
    all_tracks_df = pd.DataFrame()
    
    # Get a list of all MP3 files in the directory
    mp3_files = glob(os.path.join(directory, '*.mp3'))

    # Start processing each MP3 file
    for index, file_path in enumerate(mp3_files, start=1):
        print(f"Processing file {index} of {len(mp3_files)}: {os.path.basename(file_path)}")

        # Parse the title and artist list from the MP3's ID3 tags
        track_title, artists = parse_mp3_metadata(file_path)

        # Continue only if metadata was successfully parsed
        if track_title and artists:
            # Search for Spotify track and artist IDs, and genres
            artist_ids, genres, track_id = get_ids_and_genre(track_title, artists, sp)

            # Continue only if a Spotify match was found
            if track_id and artist_ids:
                # Retrieve Spotify audio features for the matched track
                audio_features_df = get_spotify_audio_features(artist_ids, genres, track_id, sp)

                # Continue only if audio features were successfully retrieved
                if not audio_features_df.empty:
                    # Add track_name, artist_names, and filename columns
                    audio_features_df['track_name'] = track_title
                    audio_features_df['artist_names'] = ', '.join(artists)
                    audio_features_df['filename'] = os.path.basename(file_path)

                    # Append this track's DataFrame to the main DataFrame
                    all_tracks_df = pd.concat([all_tracks_df, audio_features_df], ignore_index=True)

    return all_tracks_df
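
The string handling inside `parse_mp3_metadata` can be tried out without any MP3 files. The sketch below pulls that logic into two standalone helpers; `split_artists` and `clean_title` are hypothetical names that do not appear in the notebook, and unlike the original loop, `split_artists` also drops empty fragments (for example from a trailing comma).

```python
import re


def split_artists(artist_string):
    """Split an artist tag on commas and slashes, trimming whitespace and dropping empties."""
    if not artist_string:
        return []
    return [name.strip() for name in re.split(r"[,/]", artist_string) if name.strip()]


def clean_title(track_title):
    """Drop a ' (feat. ...' annotation from a title, mirroring parse_mp3_metadata."""
    if not track_title:
        return None
    return track_title.split(" (feat.")[0].strip()


print(split_artists("Kaskade, Too Many Zooz"))
print(clean_title("Wat U Sed (feat. Doechii & Kal Banx)"))
```

One limitation worth keeping in mind: splitting on `/` and `,` also breaks up artist names that legitimately contain those characters (an artist tagged `AC/DC` would become `AC` and `DC`), so the per-artist search in `get_ids_and_genre` may miss such tracks.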

In [6]:
directory = r'../data/audio_files/raw_download'
tracks_df = main(sp, directory)

Processing file 1 of 364: 88rising, BIBI - The Weekend.mp3
Processing file 2 of 364: AC Slater - Bass Inside.mp3
Processing file 3 of 364: AC Slater, Chris Lorenzo - Fly Kicks - Wax Motif Remix.mp3
Processing file 4 of 364: AIKA, TOFIE - Space Girl.mp3
Processing file 5 of 364: Alesso, Nico & Vinz - I Wanna Know - Alesso & Deniz Koyu Remix.mp3
Processing file 6 of 364: Alina Baraz - Between Us.mp3
Processing file 7 of 364: Allie X, Mitski - Susie Save Your Love.mp3
Processing file 8 of 364: Anamanaguchi, HANA - On My Own.mp3
Processing file 9 of 364: Andrea Chahayed - Belong.mp3
Processing file 10 of 364: Ark Patrol, Veronika Redd - Let Go - Phantoms Remix.mp3
Processing file 11 of 364: Audien - Escape (with Nomra).mp3
Processing file 12 of 364: Audien - See You On The Other Side [Upmost Remix].mp3
Processing file 13 of 364: Audien - See You On The Other Side.mp3
Processing file 14 of 364: Audien, Cate Downey - Wish It Was You.mp3
Processing file 15 of 364: Audien, Lady A - Something B

Processing file 125 of 364: Isaiah Rashad - Wat U Sed (feat. Doechii & Kal Banx).mp3
Processing file 126 of 364: Jenevieve - Rendezvous.mp3
Processing file 127 of 364: JNTHN STEIN - Jam.mp3
Processing file 128 of 364: Jo Hill - HONEYMOON.mp3
Processing file 129 of 364: jomm, readyaimfire27 - Come and Go.mp3
Processing file 130 of 364: Jonas Aden, RebMoe - I Don't Speak French (Adieu).mp3
Processing file 131 of 364: Josie Man - jealous in my dreams.mp3
Processing file 132 of 364: JOYRYDE - 4AM.mp3
Processing file 133 of 364: Kaivon - I Feel It.mp3
Processing file 134 of 364: Kaskade - Hot Wheels.mp3
Processing file 135 of 364: Kaskade, Too Many Zooz - Jorts FTW.mp3
Processing file 136 of 364: KATIE - Faux.mp3
Processing file 137 of 364: kd, KOTONOHOUSE, RANASOL - IN YOUR EYES 「瞳の中で」.mp3
Processing file 138 of 364: keshi - us.mp3
Processing file 139 of 364: Khakii, Colde - BASS.mp3
Processing file 140 of 364: Kill Paris - Red Lights (feat. Dotter) [Just A Gent Remix].mp3
...

Processing file 257 of 364: Rome in Silver - Fool.mp3
Processing file 258 of 364: Rome in Silver - Inferno.mp3
Processing file 259 of 364: Rome in Silver - Tempest.mp3
Processing file 260 of 364: Rome in Silver - Think About It.mp3
Processing file 261 of 364: Rome in Silver - Touch.mp3
Processing file 262 of 364: Rome in Silver - Waiting....mp3
Processing file 263 of 364: Rome in Silver - Yoko.mp3
Processing file 264 of 364: Rome in Silver - You Don't Know (what it's like).mp3
Processing file 265 of 364: Rome in Silver, chae - Friends.mp3
Processing file 266 of 364: ROSÉ - On The Ground.mp3
Processing file 267 of 364: SACHI, Naïka - Enchanté - Phantoms Remix.mp3
Processing file 268 of 364: Saint Punk - Howl.mp3
Processing file 269 of 364: Saint Punk - MAD.mp3
Processing file 270 of 364: Saint Punk - Menace.mp3
Processing file 271 of 364: Saint Punk, J - 911.mp3
Processing file 272 of 364: salute - Jennifer - Extended.mp3
Processing file 273 of 364: Samia - Minnesota - MICHELLE Remix.mp

In [8]:
tracks_df

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,time_signature,track_id,artist_ids,genre_list,track_name,artist_names,filename
0,0.784,0.521,1,-5.701,1,0.0322,0.06200,0.000005,0.0995,0.817,101.491,4,5q3LwAHTqo9d3rET2EA9Nq,"[1AhjOkOLkbHUfcHDSErXQs, 6UbmqUEgjLA6jAcXwbM1Z9]","[k-pop, korean r&b, asian american hip hop]",The Weekend,"88rising, BIBI","88rising, BIBI - The Weekend.mp3"
1,0.905,0.838,6,-6.838,1,0.0499,0.00112,0.839000,0.6080,0.464,126.007,4,0oUTyBEY3NQzmu2VuoKpSH,[6EqFMCnVGBRNmwPlk2f3Uc],"[fidget house, brostep, electro house, bass ho...",Bass Inside,AC Slater,AC Slater - Bass Inside.mp3
2,0.897,0.692,11,-4.985,0,0.0492,0.01870,0.725000,0.0603,0.607,125.023,4,26xlCgHpcF4GE0FFVsS6Oy,"[6EqFMCnVGBRNmwPlk2f3Uc, 7tm9Tuc70geXOOyKhtZHI...","[fidget house, house, tech house, bass house, ...",Fly Kicks - Wax Motif Remix,"AC Slater, Chris Lorenzo, Wax Motif","AC Slater, Chris Lorenzo - Fly Kicks - Wax Mot..."
3,0.724,0.792,1,-3.332,0,0.1020,0.01140,0.000008,0.6250,0.559,109.997,4,2PAr2vrUwJs6QqRxDv0mP8,"[1NfT4THLhxNASM4xVImfNg, 4pnrk5QCXeKifcD98STB24]","[kawaii future bass, kawaii edm, future bass]",Space Girl,"AIKA, TOFIE","AIKA, TOFIE - Space Girl.mp3"
4,0.710,0.729,2,-4.978,1,0.0539,0.07430,0.000004,0.0419,0.532,124.995,4,38TlYe1miTAL6r1FyJj7PS,"[4AVFqumd2ogHFlRbKIjp1t, 0awl5piYwO0CDTHEkCjUh...","[progressive electro house, pop, dutch house, ...",I Wanna Know - Alesso & Deniz Koyu Remix,"Alesso, Nico & Vinz, Deniz Koyu","Alesso, Nico & Vinz - I Wanna Know - Alesso & ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
348,0.579,0.759,2,-5.584,0,0.0386,0.06000,0.000138,0.1170,0.127,127.875,4,0qlAureWHln3vDCsqnkyRE,[2qxJFvFYMEDqd7ui6kSAcq],"[pop, edm, complextro, german techno, pop dance]",Done With Love,Zedd,Zedd - Done With Love.mp3
349,0.583,0.860,8,-4.347,1,0.0842,0.02310,0.000001,0.0977,0.279,127.032,4,1Aw0wPIW70GsV9KoD8m7Ps,[2qxJFvFYMEDqd7ui6kSAcq],"[pop, edm, complextro, german techno, pop dance]",Straight Into The Fire,Zedd,Zedd - Straight Into The Fire.mp3
350,0.572,0.864,2,-4.098,0,0.0477,0.02150,0.000000,0.3750,0.260,149.945,4,0NWQTyapmz4GuDTSN9xTB7,"[2qxJFvFYMEDqd7ui6kSAcq, 0id62QV2SZZfvBn9xpmuCl]","[pop soul, pop, r&b, edm, complextro, german t...",Candyman,"Zedd, Aloe Blacc","Zedd, Aloe Blacc - Candyman.mp3"
351,0.509,0.781,8,-3.480,1,0.0720,0.03980,0.000000,0.0749,0.176,128.000,4,60wwxj6Dd9NJlirf84wr2c,"[2qxJFvFYMEDqd7ui6kSAcq, 7qRll6DYV06u2VuRPAVqug]","[uk pop, pop, metropopolis, edm, complextro, g...",Clarity,"Zedd, Foxes","Zedd, Foxes - Clarity.mp3"
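
The log counts 364 files, while the table's index ends at 351, which suggests that some files produced no row (for instance because tags were missing or no Spotify match was found). A possible way to list those files is sketched below; `find_unmatched_files` is a hypothetical helper, and it assumes the `filename` column holds base names exactly as `main` writes them.

```python
import os
from glob import glob

import pandas as pd


def find_unmatched_files(directory, tracks_df):
    """Return the MP3 filenames in `directory` that have no row in `tracks_df`."""
    # Base names of every MP3 on disk, matching how main() fills the 'filename' column
    on_disk = {os.path.basename(path) for path in glob(os.path.join(directory, "*.mp3"))}
    # Filenames that made it into the results table
    matched = set(tracks_df["filename"]) if "filename" in tracks_df.columns else set()
    return sorted(on_disk - matched)


# Example usage with the notebook's variables:
# unmatched = find_unmatched_files(directory, tracks_df)
# print(len(unmatched), "files without a Spotify match")
```

Reviewing that list by hand is usually the quickest way to tell whether a miss comes from the tags or from the search query.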


In [9]:
tracks_df.to_csv(r'../data/dataframes/spotify_metadata.csv', index=False)
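
Because `artist_ids` and `genre_list` hold Python lists, `to_csv` writes them as strings such as `"['pop', 'edm']"`, and a plain `pd.read_csv` returns them as strings as well. The sketch below shows one way to restore the lists when reading the file back; `read_tracks_csv` and `LIST_COLUMNS` are hypothetical names, and the demo values are made up.

```python
import ast
import io

import pandas as pd

# Columns that contain Python lists in tracks_df
LIST_COLUMNS = ["artist_ids", "genre_list"]


def read_tracks_csv(path_or_buffer):
    """Read the saved CSV and turn the stringified list columns back into Python lists."""
    converters = {col: ast.literal_eval for col in LIST_COLUMNS}
    return pd.read_csv(path_or_buffer, converters=converters)


# Tiny in-memory round trip with made-up values
demo = pd.DataFrame({
    "track_id": ["abc123"],
    "artist_ids": [["id1", "id2"]],
    "genre_list": [["pop", "edm"]],
})
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
restored = read_tracks_csv(buffer)
print(type(restored.loc[0, "genre_list"]))
```

`ast.literal_eval` raises an error on empty cells, so this approach assumes every row has a list in both columns, which holds for rows produced by `main`.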

