# Deprecated Aspects of Analysis

### Extracting Song Data Using Spotipy

To broaden the data available for analysis, I wanted to incorporate another dataset that can be joined to the Award Show Winner dataset above. Gathering the characteristics of each song seemed suitable, since it would let us see which characteristics winning songs share, which types of songs are popular in certain seasons, and so on.

#### Spotipy Authentication

To make API calls, I need to authenticate with Spotify, which requires the client ID and client secret available from the Spotify developer dashboard.

In [None]:
import json
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

with open('.keys/spotipy.json', 'r') as file:
    data = json.load(file)

auth_manager = SpotifyClientCredentials(client_id=data["client_id"], client_secret=data["client_secret"])
sp = spotipy.Spotify(auth_manager=auth_manager)

In [None]:


data = sp.search('track:BonVoyage artist:Dreamcatcher year:2000-2023', limit=1, type=['track', 'artist'], market='KR')['tracks']['items']
print(f"{data[0]['artists'][0]['name']} - {data[0]['name']}: {data[0]['id']}\n")

Dreamcatcher - BONVOYAGE: 3Jnwl9zlbFNEqKQjydxLxe



In [None]:
uniq_tracks = list(set(all_show_df_winners["Search Query"]))
uniq_tracks

['track:Love artist:Monsta X year:2000-2022',
 'track:Way 4 Luv artist:Plave year:2000-2024',
 'track:Perfect Night artist:Le Sserafim year:2000-2023',
 'track:Deja Vu artist:Tomorrow X Together year:2000-2024',
 "track:Killin' Me Good artist:Jihyo year:2000-2023",
 'track:Be There For Me artist:NCT 127 year:2000-2024',
 'track:Lighthouse artist:Tempest year:2000-2024',
 'track:S-Class artist:Stray Kids year:2000-2023',
 'track:Cream Soda artist:Exo year:2000-2023',
 'track:Birthday artist:Red Velvet year:2000-2022',
 'track:Love Shhh! artist:Jo Yuri year:2000-2022',
 'track:Selfish artist:YooA year:2000-2022',
 'track:Eunoia artist:Billlie year:2000-2023',
 'track:Erase Me artist:Oneus year:2000-2023',
 'track:Loser artist:AB6IX year:2000-2023',
 'track:Hit the Floor artist:TripleS year:2000-2024',
 'track:Give Me That artist:WayV year:2000-2024',
 'track:Girls Never Die artist:TripleS year:2000-2024',
 'track:OMG artist:NewJeans year:2000-2023',
 'track:Whisper artist:The Boyz year:2

In [None]:
uniq_song_ids = {
    ## Song Query : song ID
}

In [None]:
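def extract_song_ids(uniq_tracks, uniq_song_ids):
    """Look up the Spotify track ID for every query not already in uniq_song_ids."""
    for query in uniq_tracks:
        if query in uniq_song_ids:
            continue
        print(f"Searching for: {query}")
        # Only the 'tracks' part of the response is read, so search for tracks only
        items = sp.search(query, limit=1, type='track', market='KR')['tracks']['items']
        if not items:
            # Avoid an IndexError when the search returns nothing
            print(f"No result for: {query}\n")
            continue
        print(f"{items[0]['artists'][0]['name']} - {items[0]['name']}: {items[0]['id']}\n")
        uniq_song_ids[query] = items[0]['id']
    return uniq_song_ids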

In [None]:
uniq_song_ids = extract_song_ids(uniq_tracks, uniq_song_ids)

Searching for: track:Love artist:Monsta X year:2000-2022
MONSTA X - Love Killa: 3sPju6MEJhk7Sz8dsRBkLQ

Searching for: track:Way 4 Luv artist:Plave year:2000-2024
PLAVE - WAY 4 LUV: 1T6xi2QrnmwaebXGvWAjLg

Searching for: track:Perfect Night artist:Le Sserafim year:2000-2023
LE SSERAFIM - Perfect Night: 74X2u8JMVooG2QbjRxXwR8

Searching for: track:Deja Vu artist:Tomorrow X Together year:2000-2024
TOMORROW X TOGETHER - Deja Vu: 3aAnwyBJY9OLNLqSgd4fZU

Searching for: track:Killin' Me Good artist:Jihyo year:2000-2023
JIHYO - Killin’ Me Good: 3gafQxekHAbM52PxdX9SDR

Searching for: track:Be There For Me artist:NCT 127 year:2000-2024
NCT 127 - Be There For Me: 1k5b4EAewkP3sqLWcCmWRQ

Searching for: track:Lighthouse artist:Tempest year:2000-2024
TEMPEST - LIGHTHOUSE: 6Zv31tXdpqXPuMXIT4Pq7p

Searching for: track:S-Class artist:Stray Kids year:2000-2023
Stray Kids - 특 S-Class: 54zRGA28tVRKRmFCpywWko

Searching for: track:Cream Soda artist:Exo year:2000-2023
EXO - Cream Soda: 42h7yc9Rda1IOMYLACVg

In [None]:
## This song is consistently searched incorrectly by the API and I made the decision
## to hard-code the value rather than altering and potentially breaking the query
uniq_song_ids["track:Love artist:Monsta X year:2000-2022"] = "0dLenhMYqqeTlHrZqcXkm6"
uniq_song_ids["track:Love artist:Monsta X year:2000-2022"]

'0dLenhMYqqeTlHrZqcXkm6'
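
Mismatches like the one above (the query for `Love` returned `Love Killa`) could be flagged automatically by comparing the requested title with the title Spotify returns. The sketch below is only an illustration and was not part of the original workflow; `normalize_title` and `needs_review` are hypothetical helper names. The comparison is deliberately strict, so a title with an added prefix such as `특 S-Class` is flagged for manual review as well.

```python
import re

def normalize_title(title):
    """Casefold the title and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", title.casefold())

def needs_review(query, returned_title):
    """Return True when the returned title differs from the title in the query."""
    # Pull the title out of a 'track:... artist:... year:...' search query
    match = re.search(r"track:(.*?) artist:", query)
    requested = match.group(1) if match else query
    return normalize_title(requested) != normalize_title(returned_title)

print(needs_review("track:Love artist:Monsta X year:2000-2022", "Love Killa"))   # True
print(needs_review("track:Way 4 Luv artist:Plave year:2000-2024", "WAY 4 LUV"))  # False
```

Any query flagged this way would still need a manual lookup, as was done for the Monsta X track.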

In [None]:
import os
import pandas as pd

# Check if the CSV file already exists
csv_file_path = 'all_award_show_winners.csv'
if os.path.exists(csv_file_path):
    print(f"{csv_file_path} already exists. Loading the existing file.")
    all_show_df_winners = pd.read_csv(csv_file_path).drop('Unnamed: 0', axis=1)
else:
    print(f"{csv_file_path} does not exist. Proceeding with scraping and saving data.")
    # Map each search query to its Spotify track ID, then save the result
    all_show_df_winners['Song ID'] = all_show_df_winners["Search Query"].map(uniq_song_ids)
    all_show_df_winners.to_csv(csv_file_path)

### Extracting Audio Features

At this point, we have a fairly clean dataset along with the Spotify ID for each track. With this track ID, we are going to create another dataset containing each track's audio features.

In [None]:
sp.audio_features("3LCwQoTrdQgHsGJE5gGVqx")

### **IMPORTANT**: Pivot to Musixmatch

The Spotify API recently deprecated the `audio_features` and `audio_analysis` endpoints. I therefore spent time looking for an alternative and settled on Musixmatch. Its API has an endpoint similar to Spotify's `audio_features`, plus the additional capability of extracting a track's lyrics.

To carry out this pivot, I'm going to import the previous DataFrame and begin extracting information for each track.

In [None]:
df_all_shows = pd.read_csv('all_award_show_winners.csv').drop("Unnamed: 0", axis=1)
df_all_shows.head(5)

| | Episode | Date | Artist | Song | Points | Award Show | Search Query | Song ID |
|---|---|---|---|---|---|---|---|---|
| 0 | 1121 | January 9, 2022 | Ive | Eleven | 8533 | Inkigayo | track:Eleven artist:Ive year:2000-2022 | 7n2FZQsaLb7ZRfRPfEeIvr |
| 1 | 1122 | January 16, 2022 | Ive | Eleven | 6583 | Inkigayo | track:Eleven artist:Ive year:2000-2022 | 7n2FZQsaLb7ZRfRPfEeIvr |
| 2 | 1123 | January 23, 2022 | Ive | Eleven | 5927 | Inkigayo | track:Eleven artist:Ive year:2000-2022 | 7n2FZQsaLb7ZRfRPfEeIvr |
| 3 | 1124 | January 30, 2022 | Got the Beat | Step Back | 5612 | Inkigayo | track:Step Back artist:Got the Beat year:2000-... | 3LCwQoTrdQgHsGJE5gGVqx |
| 4 | 1125 | February 20, 2022 | Got the Beat | Step Back | 7224 | Inkigayo | track:Step Back artist:Got the Beat year:2000-... | 3LCwQoTrdQgHsGJE5gGVqx |


*Minor edit that I missed: trim the parenthetical from the title so it matches the other instances of Future Perfect.*

In [None]:
df_all_shows.loc[df_all_shows.Song == "Future Perfect (Pass the MIC)", "Song"] = "Future Perfect"

In [None]:
uniq_artists = df_all_shows.Artist.unique().tolist()
uniq_songs = df_all_shows.Song.unique().tolist()
uniq_artists.sort()
uniq_songs.sort()
print(f"Artists: {uniq_artists}\nSongs: {uniq_songs}")

Artists: ['(G)I-dle', 'AB6IX', 'AKMU', 'Aespa', 'Apink', 'Astro', 'Ateez', 'BSS', 'BTS', 'BabyMonster', 'Baekhyun', 'Bibi', 'Big Bang', 'Billlie', 'Blackpink', 'BoyNextDoor', 'BtoB', 'CSR', 'Chungha', 'Cravity', 'DKZ', 'Day6', 'Doyoung', 'Dreamcatcher', 'Enhypen', 'Everglow', 'Evnne', 'Exo', 'Fromis 9', 'G-Dragon', 'Got the Beat', 'H1-Key', 'Han Seung-woo', 'Highlight', 'IU', 'Illit', 'Itzy', 'Ive', 'J-Hope&J. Cole', 'Jaechan', 'Jaehyun', 'Jennie', 'Jihyo', 'Jimin', 'Jin', 'Jisoo', 'Jo Yuri', 'Jung Kook', 'Kai', 'Kang Daniel', 'Kara', 'Kep1er', 'Key', 'Kim Jae-hwan', 'Kim Min-seok', 'Kim Woo Seok', 'Kiss of Life', 'Kwon Eunbi', 'Le Sserafim', 'Lee Chan-won', 'Lee Gi-kwang', 'Lee Young-ji', 'Lim Young-woong', 'Loona', 'Miyeon', 'Monsta X', 'N.SSign', 'NCT 127', 'NCT DoJaeJung', 'NCT Dream', 'NCT U', 'NCT Wish', 'Nayeon', 'NewJeans', 'NiziU', 'Nmixx', 'ONF', 'Oh My Girl', 'Oneus', 'Onew', 'P1Harmony', 'Pentagon', 'Plave', 'Psy', 'QWER', 'Red Velvet', 'Riize', 'Rosé&Bruno Mars', 'SF9', 'S

In [None]:
# If you need to make a high volume of requests, consider using proxies
import json
from musicxmatch_api import MusixMatchAPI
import urllib
api = MusixMatchAPI()

In [None]:
query_track = set(df_all_shows.Song.str.cat([df_all_shows['Artist'], df_all_shows['Song ID']], sep=" | ").tolist())

In [None]:
df_all_shows.loc[(df_all_shows.Artist == "NCT 127") & (df_all_shows.Song == "Be There For Me"), "Search Query"].tolist()

['track:Be There For Me artist:NCT 127 year:2000-2024',
 'track:Be There For Me artist:NCT 127 year:2000-2024']

### Gathering the Track IDs

In [None]:
import requests
import pandas as pd
import json
from time import sleep

# Load the DataFrame (it has 'Artist' and 'Song' columns)
df = pd.read_csv('all_award_show_winners.csv').drop("Unnamed: 0", axis=1)

BASE_URL = "https://api.musixmatch.com/ws/1.1/"
APP_ID = "cf4d7395cd2d7f5618c4057426e93f26" 

results = []

for _, row in df.iterrows():
    artist = row["Artist"]
    track = row["Song"]
    
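    # Pass the parameters separately so that requests URL-encodes them
    # (artist names such as "Rosé&Bruno Mars" contain "&")
    query_url = f"{BASE_URL}matcher.track.get"
    params = {"apikey": APP_ID, "q_artist": artist, "q_track": track, "format": "json"}

    try:
        response = requests.get(query_url, params=params)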
        data = response.json()
        
        if response.status_code == 200:
            results.append(data)
        else:
            print(f"Error {response.status_code}: {data}")

        sleep(1)  # Prevent hitting rate limits
        
    except Exception as e:
        print(f"Request failed for {artist} - {track}: {e}")

# Save results to a JSON file
with open("musixmatch_results.json", "w") as f:
    json.dump(results, f, indent=4)

print("Finished retrieving track data.")


Finished retrieving track data.
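
Several artist and song names in this dataset contain characters that are meaningful inside a URL, most notably `&`. A minimal sketch of building the matcher URL with percent-encoded parameters is shown below, using only the standard library. `build_matcher_url` is a hypothetical helper name and `"YOUR_API_KEY"` is a placeholder, not a real key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.musixmatch.com/ws/1.1/"

def build_matcher_url(artist, track, api_key):
    """Build a matcher.track.get URL with every parameter percent-encoded."""
    params = {
        "apikey": api_key,
        "q_artist": artist,
        "q_track": track,
        "format": "json",
    }
    return f"{BASE_URL}matcher.track.get?{urlencode(params)}"

url = build_matcher_url("LE SSERAFIM", "Eve, Psyche & The Bluebeard's wife", "YOUR_API_KEY")
print(url)

# Decoding the query string gives back the original title, "&" included
print(parse_qs(urlsplit(url).query)["q_track"])
```

Without the encoding step, everything after the `&` in the title would be read as a separate query parameter rather than as part of `q_track`.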


In [None]:
def extract_unique_artist_tracks(json_filename):
    """
    Reads a JSON file and extracts unique (artist, track) combinations.
    
    :param json_filename: Path to the JSON file containing Musixmatch results
    :return: Set of unique (artist, track) tuples
    """
    unique_tracks = set()

    try:
        # Load JSON data
        with open(json_filename, "r", encoding="utf-8") as file:
            data = json.load(file)

        # Ensure data is a list of dictionaries
        if not isinstance(data, list):
            raise ValueError(f"Expected JSON data to be a list, but got {type(data)}")

        # Extract artist-track pairs
        for entry in data:
            if not isinstance(entry, dict):
                print(f"Skipping invalid entry (not a dict): {entry}")
                continue

            try:
                if entry["message"]["body"] == "":
                    continue
                # Ensure the structure exists before accessing keys
                track_info = entry["message"]["body"]["track"]

                # Extract values
                artist_name = track_info["artist_name"].strip()
                track_name = track_info["track_name"].strip()

                # Debugging: Print extracted data
                print(f"Extracted: Artist = {artist_name}, Track = {track_name}")

                # Add to set if both values exist
                if artist_name and track_name:
                    unique_tracks.add((artist_name, track_name))

            except KeyError as e:
                print(f"Skipping entry due to missing key: {e}")

    except json.JSONDecodeError:
        print("Error: Invalid JSON format in file.")
    except FileNotFoundError:
        print(f"Error: File {json_filename} not found.")
    except Exception as e:
        print(f"Unexpected error: {e}")

    return unique_tracks

json_file = "musixmatch_results.json"
unique_combinations = extract_unique_artist_tracks(json_file)

Extracted: Artist = IVE, Track = ELEVEN
Extracted: Artist = IVE, Track = ELEVEN
Extracted: Artist = IVE, Track = ELEVEN
Extracted: Artist = GOT the beat, Track = Step Back
Extracted: Artist = GOT the beat, Track = Step Back
Extracted: Artist = TAEYEON, Track = INVU
Extracted: Artist = TAEYEON, Track = INVU
Extracted: Artist = TAEYEON, Track = INVU
Extracted: Artist = Kim MinSeok, Track = DrunKen Confession
Extracted: Artist = (G)I-DLE, Track = TOMBOY
Extracted: Artist = NCT DREAM, Track = Glitch Mode
Extracted: Artist = BIGBANG, Track = Still Life
Extracted: Artist = BIGBANG, Track = Still Life
Extracted: Artist = BIGBANG, Track = Still Life
Extracted: Artist = IVE, Track = LOVE DIVE
Extracted: Artist = PSY feat. SUGA, Track = That That (prod.&feat. SUGA of BTS)
Extracted: Artist = PSY feat. SUGA, Track = That That (prod.&feat. SUGA of BTS)
Extracted: Artist = PSY feat. SUGA, Track = That That (prod.&feat. SUGA of BTS)
Extracted: Artist = (G)I-DLE, Track = TOMBOY
Extracted: Artist = NC

In [None]:
def extract_all_artist_tracks(json_filename):
    """
    Reads a JSON file and extracts every track ID so it can be merged with our original data frame.
    
    :param json_filename: Path to the JSON file containing Musixmatch results
    :return: List with one track ID per entry in the file ("Not Found" where no match was returned)
    """
    tracks = list()

    try:
        # Load JSON data
        with open(json_filename, "r", encoding="utf-8") as file:
            data = json.load(file)

        # Ensure data is a list of dictionaries
        if not isinstance(data, list):
            raise ValueError(f"Expected JSON data to be a list, but got {type(data)}")

        # Extract one track ID per entry
        for entry in data:
            if not isinstance(entry, dict):
                print(f"Invalid entry (not a dict), recording as not found: {entry}")
                tracks.append("Not Found")
                continue

            try:
                body = entry["message"]["body"]
                # An empty body means nothing was matched for this entry
                if not body:
                    tracks.append("Not Found")
                    continue

                track_info = body["track"]

                # Extract values (track_id is an integer, so it is not stripped)
                track_id = track_info["track_id"]
                artist_name = track_info["artist_name"].strip()
                track_name = track_info["track_name"].strip()

                # Debugging: Print extracted data
                print(f"Extracted: Artist = {artist_name}, Track = {track_name}, ID = {track_id}")

                tracks.append(track_id)

            except KeyError as e:
                print(f"Missing key {e}, recording entry as not found")
                tracks.append("Not Found")

    except json.JSONDecodeError:
        print("Error: Invalid JSON format in file.")
    except FileNotFoundError:
        print(f"Error: File {json_filename} not found.")
    except Exception as e:
        print(f"Unexpected error: {e}")

    return tracks

json_file = "musixmatch_results.json"
all_track_ids = extract_all_artist_tracks(json_file)

In [None]:
unique_df_tracks = set(zip(df["Artist"], df["Song"]))

In [None]:
def normalize_and_print_side_by_side(list1, list2, header1="DataFrame Tracks", header2="JSON Tracks", spacing=40):
    """
    Prints two lists side by side with aligned formatting after normalizing (lowercasing) and sorting.
    
    :param list1: First list (e.g., DataFrame unique tracks)
    :param list2: Second list (e.g., JSON unique tracks)
    :param header1: Title for first column
    :param header2: Title for second column
    :param spacing: Width allocated for each column
    """
    # Normalize to lowercase and sort
    list1 = sorted({(artist.lower(), track.lower()) for artist, track in list1})
    list2 = sorted({(artist.lower(), track.lower()) for artist, track in list2})

    # Print headers
    print(f"{header1.ljust(spacing)} {header2}")
    print("=" * (spacing * 2))

    # Print items side by side
    for item1, item2 in zip(list1, list2):
        print(f"{str(item1).ljust(spacing)} {str(item2)}")

    # Handle lists of different lengths: pad the missing column with "-"
    common = min(len(list1), len(list2))
    for item in list1[common:]:
        print(f"{str(item).ljust(spacing)} -")
    for item in list2[common:]:
        print(f"{'-'.ljust(spacing)} {str(item)}")

# Example usage
normalize_and_print_side_by_side(unique_df_tracks, unique_combinations)


DataFrame Tracks                         JSON Tracks
('(g)i-dle', 'fate')                     ('(g)i-dle', 'fate')
('(g)i-dle', 'klaxon')                   ('(g)i-dle', 'klaxon')
('(g)i-dle', 'nxde')                     ('(g)i-dle', 'nxde')
('(g)i-dle', 'queencard')                ('(g)i-dle', 'queencard')
('(g)i-dle', 'super lady')               ('(g)i-dle', 'super lady')
('(g)i-dle', 'tomboy')                   ('(g)i-dle', 'tomboy')
('ab6ix', 'loser')                       ('ab6ix', 'loser')
('aespa', 'armageddon')                  ('aespa', 'armageddon')
('aespa', 'drama')                       ('aespa', 'drama')
('aespa', 'girls')                       ('aespa', 'girls')
('aespa', 'spicy')                       ('aespa', 'spicy')
('aespa', 'supernova')                   ('aespa', 'supernova')
('aespa', 'up')                          ('aespa', 'up (karina solo)')
('aespa', 'whiplash')                    ('aespa', 'whiplash')
('akmu', 'love lee')                     ('akmu', 'love l
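
Scanning the side-by-side printout by eye works, but the rows drift out of alignment as soon as one list has an extra or renamed entry. A set difference on the normalized pairs lists only the mismatches. This is a sketch under the same normalization (lowercasing) used above; `normalize_pairs` and `compare_track_sets` are hypothetical names, and the toy data reuses the aespa rows from the printout with illustrative casing.

```python
def normalize_pairs(pairs):
    """Lowercase and strip each (artist, track) pair."""
    return {(artist.strip().lower(), track.strip().lower()) for artist, track in pairs}

def compare_track_sets(df_pairs, json_pairs):
    """Return the pairs found only in the DataFrame and only in the JSON results."""
    left = normalize_pairs(df_pairs)
    right = normalize_pairs(json_pairs)
    return sorted(left - right), sorted(right - left)

# Toy data based on the aespa rows printed above
df_pairs = {("Aespa", "Up"), ("Aespa", "Whiplash")}
json_pairs = {("aespa", "UP (KARINA Solo)"), ("aespa", "Whiplash")}

only_in_df, only_in_json = compare_track_sets(df_pairs, json_pairs)
print(only_in_df)    # [('aespa', 'up')]
print(only_in_json)  # [('aespa', 'up (karina solo)')]
```

With the real data, the call would be `compare_track_sets(unique_df_tracks, unique_combinations)`.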

**Wrong Search**
- Seventeen - _World
- Seventeen - Love, Money, Fame: returned a remix
- TXT - Good Boy Gone Bad: returned a remix
- Jihyo - Killin' Me Good: returned the English version
- TXT - Chasing That Feeling: returned a song by viva
- J-Hope & J. Cole: returned only the solo version of the song
- NCT - Songbird: returned the Japanese version

**404 error**
- Le Sserafim - Eve, Psyche & The Bluebeard's Wife
- Wheein - Make Me Happy (**doesn't appear to be in their catalog**)
- TXT - Sugar Rush Ride
- CSR - ♡Ticon
- Fromis 9 - almost all of their songs (3 songs)
- Lee Gi-kwang - Predator
- YooA - Selfish

**Insights and Findings**
- Wheein appears as Whee In
- TXT can be found under TOMORROW X TOGETHER, which fixes all of their issues
- ♡Ticon => LOVETICON
- Fromis_9 songs are difficult to find despite the group being a verified artist, so I opted to use their artist ID in their queries
- Lee Gi-Kwang => LEEGIKWANG
- Yooa => 유아
- SEVENTEEN - _WORLD cannot be found via queries, so I looked up its track ID: 242182341
- Most of the queries can be resolved using the track search endpoint instead of the matcher endpoint
- Jihyo => Ji Hyo
- Songbird => Songbird (Korean Version)
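
The renames in the findings above can be collected into lookup tables and applied before each query is sent. The sketch below is one possible way to do that, not the approach used in the notebook; `ARTIST_ALIASES`, `TRACK_ALIASES`, and `resolve_names` are hypothetical names, and the keys use the spellings from these notes, which may need adjusting to the exact spellings in the DataFrame.

```python
# Spellings that Musixmatch recognised, taken from the findings above
ARTIST_ALIASES = {
    "TXT": "TOMORROW X TOGETHER",
    "Wheein": "Whee In",
    "Lee Gi-kwang": "LEEGIKWANG",
    "Yooa": "유아",
    "Jihyo": "Ji Hyo",
}
TRACK_ALIASES = {
    "♡Ticon": "LOVETICON",
    "Songbird": "Songbird (Korean Version)",
}

# Casefolded copies so that lookups ignore capitalisation ("YooA" vs "Yooa")
_ARTISTS = {key.casefold(): value for key, value in ARTIST_ALIASES.items()}
_TRACKS = {key.casefold(): value for key, value in TRACK_ALIASES.items()}

def resolve_names(artist, track):
    """Return the (artist, track) spelling to send to Musixmatch."""
    return (
        _ARTISTS.get(artist.casefold(), artist),
        _TRACKS.get(track.casefold(), track),
    )

print(resolve_names("YooA", "Selfish"))
print(resolve_names("CSR", "♡Ticon"))
print(resolve_names("Ive", "Eleven"))  # unchanged: no alias needed
```

Cases that need an ID instead of a name, such as the Fromis_9 artist ID or the SEVENTEEN track ID, would still have to be handled separately.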

In [None]:
import requests
import urllib.parse

track = "Songbird (Korean)"
artist = "NCT"
APP_ID = "cf4d7395cd2d7f5618c4057426e93f26"
BASE_URL = "https://api.musixmatch.com/ws/1.1/"

## Matcher Query
query_url = f"{BASE_URL}matcher.track.get?apikey={APP_ID}&q_artist={artist}&q_track={track}&format=json"

## Track Search Query
query_url = f"{BASE_URL}track.search?apikey={APP_ID}&q_artist={artist}&q_track={track}&page_size=2&page=1&s_track_rating=desc&format=json"

## Find artist
# query_url = f"{BASE_URL}artist.search?apikey={APP_ID}&q_artist={artist}&page_size=1"

## Test Query
# query_url = f"{BASE_URL}track.search?apikey={APP_ID}&f_artist_id=34857912&q_track=menow&page_size=1&page=1&format=json"
# query_url = f"{BASE_URL}track.search?apikey={APP_ID}&f_artist_id=31899946&f_album_id=53067298&q_track=Circles&page_size=2&page=1&format=json"
# query_url = f"{BASE_URL}album.tracks.get?apikey={APP_ID}&album_id=53067298&page_size=2&page=1"

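# Get mood values (only this last assignment to query_url is actually requested;
# the earlier assignments in this cell are overwritten)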
query_url = f"{BASE_URL}track.lyrics.mood.get?apikey={APP_ID}&commontrack_id=5920049"



search = requests.get(query_url).json()
print(json.dumps(search, indent=4))

{
    "message": {
        "header": {
            "status_code": 401,
            "execute_time": 0.01217794418335,
            "hint": "moods not enabled on this plan"
        },
        "body": []
    }
}
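
The response above reports its status inside the JSON body, under `message.header.status_code`, and that value may not coincide with the HTTP status that `response.status_code` reports. A small helper that reads the embedded status makes failures like this one easier to spot. This is a sketch based only on the response shapes printed in this notebook; `musixmatch_status` and `has_body` are hypothetical helper names.

```python
def musixmatch_status(payload):
    """Return (status_code, hint) from the header of a Musixmatch response dict."""
    header = payload.get("message", {}).get("header", {})
    return header.get("status_code"), header.get("hint", "")

def has_body(payload):
    """True when the embedded status is 200 and the body is non-empty."""
    status, _ = musixmatch_status(payload)
    body = payload.get("message", {}).get("body")
    return status == 200 and bool(body)

# The mood response shown above
mood_response = {
    "message": {
        "header": {
            "status_code": 401,
            "execute_time": 0.01217794418335,
            "hint": "moods not enabled on this plan",
        },
        "body": [],
    }
}

status, hint = musixmatch_status(mood_response)
print(status, hint)             # 401 moods not enabled on this plan
print(has_body(mood_response))  # False
```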


In [None]:
import requests

track = "Eve, Psyche & The Bluebeard's wife"
artist = "LE SSERAFIM"
APP_ID = "cf4d7395cd2d7f5618c4057426e93f26"
BASE_URL = "https://api.musixmatch.com/ws/1.1/"

## Matcher Query
# query_url = f"{BASE_URL}matcher.track.get?apikey={APP_ID}&q_artist={artist}&q_track={track}&format=json"

## Track Search Query
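# Parameters are passed separately so that requests URL-encodes the "&" in the track title
query_url = f"{BASE_URL}track.search"
params = {"apikey": APP_ID, "q_artist": artist, "q_track": track, "format": "json", "page_size": 1}

search = requests.get(query_url, params=params).json()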
print(json.dumps(search, indent=4))

{
    "message": {
        "header": {
            "status_code": 200,
            "execute_time": 0.025938987731934,
            "available": 5
        },
        "body": {
            "track_list": [
                {
                    "track": {
                        "track_id": 256226625,
                        "track_name": "Eve, Psyche & The Bluebeard\u2019s wife",
                        "track_name_translation_list": [],
                        "track_rating": 71,
                        "commontrack_id": 159108270,
                        "instrumental": 0,
                        "explicit": 1,
                        "has_lyrics": 1,
                        "has_subtitles": 1,
                        "has_richsync": 1,
                        "num_favourite": 34,
                        "album_id": 57090328,
                        "album_name": "UNFORGIVEN",
                        "artist_id": 53333048,
                        "artist_name": "LE SSERAFIM",
           