# Genre Tag Comparison Across API Sources
Three APIs were chosen as sources of genre tag data: AudioDB, the Spotify API and Last.fm.
| API Source    | Documentation Link | Genre Information |
|--------------|--------------------|-------------------|
| Spotify API  | [Spotify for Developers](https://developer.spotify.com/) | Genres are attached to the track's artist, not to the track itself |
| Last.fm      | [Last.fm API Docs](https://www.last.fm/api) | Tags are attached to the track but are user-generated and inconsistent |
| AudioDB      | [TheAudioDB Free Music API Guide](https://www.theaudiodb.com/) | Genre tag attached to the track; more consistent, but many missing (NaN) values |
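
To make the table concrete, here is a minimal sketch of how one genre string might be pulled from each API's response. The response shapes are assumptions based on the public docs (Spotify artist objects carry a `genres` list, Last.fm `track.getTopTags` returns `toptags.tag`, AudioDB track results carry `strGenre`), and the helper names are hypothetical; this is not necessarily how test_AddGenreToTrack.py does it.

```python
from typing import Optional

def spotify_genre(artist: dict) -> Optional[str]:
    """Spotify (assumed shape): genres hang off the artist object as a list of strings."""
    genres = artist.get("genres") or []
    return genres[0] if genres else None

def lastfm_genre(top_tags: dict) -> Optional[str]:
    """Last.fm (assumed shape): pick the most-counted user tag from a track.getTopTags payload."""
    tags = top_tags.get("toptags", {}).get("tag") or []
    if not tags:
        return None
    return max(tags, key=lambda t: int(t.get("count", 0)))["name"]

def audiodb_genre(search_result: dict) -> Optional[str]:
    """AudioDB (assumed shape): first matching track's strGenre; '' and null both count as missing."""
    tracks = search_result.get("track") or []
    genre = tracks[0].get("strGenre") if tracks else None
    return genre or None

# Payloads trimmed to the fields used above (illustrative, not real responses)
print(spotify_genre({"genres": ["grime", "uk hip hop"]}))  # grime
print(lastfm_genre({"toptags": {"tag": [{"name": "rock", "count": 40},
                                        {"name": "indie", "count": 100}]}}))  # indie
print(audiodb_genre({"track": None}))  # None
```

Keeping a single tag per source is a simplification: Spotify and Last.fm usually return several.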


## First, let's inspect a test chart dataset
This dataset, taken from charts.spotify.com, contains 200 rows: the top 200 tracks in GB for the week starting 2024-01-04.
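
Because the later merges join on `(track_name, artist_names)`, it may be worth checking first that the chart really is a complete top 200 with no duplicate track/artist pairs. `check_weekly_chart` is a hypothetical helper, shown here on a three-row synthetic chart rather than the real CSV.

```python
import pandas as pd

def check_weekly_chart(chart: pd.DataFrame, expected_rows: int = 200) -> None:
    """Raise ValueError unless the chart is a complete top-N with unique track/artist pairs."""
    if len(chart) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(chart)}")
    if chart["rank"].tolist() != list(range(1, expected_rows + 1)):
        raise ValueError("ranks are not 1..N in ascending order")
    dupes = chart.duplicated(subset=["track_name", "artist_names"])
    if dupes.any():
        # Duplicate join keys would multiply rows in the later merges
        raise ValueError(f"{int(dupes.sum())} duplicate track/artist pairs")

# Three-row synthetic stand-in for regional-gb-weekly-2024-01-04.csv
demo = pd.DataFrame({
    "rank": [1, 2, 3],
    "track_name": ["Stick Season", "Lovin On Me", "greedy"],
    "artist_names": ["Noah Kahan", "Jack Harlow", "Tate McRae"],
})
check_weekly_chart(demo, expected_rows=3)  # passes silently
```

On the real file the call would simply be `check_weekly_chart(week_chart)`.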

In [None]:
import pandas as pd
import json

In [2]:
week_chart = pd.read_csv('regional-gb-weekly-2024-01-04.csv')
week_chart.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   rank            200 non-null    int64 
 1   uri             200 non-null    object
 2   artist_names    200 non-null    object
 3   track_name      200 non-null    object
 4   source          200 non-null    object
 5   peak_rank       200 non-null    int64 
 6   previous_rank   200 non-null    int64 
 7   weeks_on_chart  200 non-null    int64 
 8   streams         200 non-null    int64 
dtypes: int64(5), object(4)
memory usage: 14.2+ KB


## Clean the dataset to make the API results easier to compare

In [4]:
# Keep only the columns needed for the genre comparison
week_chart_genres = week_chart[['rank','track_name','artist_names']]
week_chart_genres.tail(5)

|     | rank | track_name | artist_names |
|-----|------|------------|--------------|
| 195 | 196 | I Wonder | Kanye West |
| 196 | 197 | Wet Dreamz | J. Cole |
| 197 | 198 | Wish You The Best | Lewis Capaldi |
| 198 | 199 | Fortunate Son | Creedence Clearwater Revival |
| 199 | 200 | Karma | Taylor Swift |


In [21]:
# Load the data extracted from the APIs (test_AddGenreToTrack.py is used for this)
with open('test_AudioDB_genres.json', 'r') as file:
    test_AudioDB_genres = json.load(file)

with open('test_Spotify_genres.json', 'r') as file:
    test_Spotify_genres = json.load(file)

with open('test_LastFM_genres.json', 'r') as file:
    test_LastFM_genres = json.load(file)

with open('final_genre_tags.json', 'r') as file:
    final_genre_tags = json.load(file)
    

In [38]:
# Build one DataFrame per genre source, keyed by the column name it gets after merging
genre_sources = {
    'Spotify_Genre': pd.DataFrame(test_Spotify_genres),
    'AudioDB_Genre': pd.DataFrame(test_AudioDB_genres),
    'LastFM_Genre': pd.DataFrame(test_LastFM_genres),
    'Final_Genre': pd.DataFrame(final_genre_tags),
}

# Merge each source onto the chart by (track name, artist), then rename its 'genre' column.
# how='inner' keeps only tracks that are matched in every source.
chart_genre_merge4 = week_chart_genres
for genre_column, genre_df in genre_sources.items():
    chart_genre_merge4 = (
        chart_genre_merge4
        .merge(genre_df, left_on=['track_name', 'artist_names'], right_on=['song_name', 'artist'], how='inner')
        .rename(columns={'genre': genre_column})
        .drop(columns=['song_name', 'artist'])
    )


chart_genre_merge4

|     | rank | track_name | artist_names | Spotify_Genre | AudioDB_Genre | LastFM_Genre | Final_Genre |
|-----|------|------------|--------------|---------------|---------------|--------------|-------------|
| 0 | 1 | Stick Season | Noah Kahan | | | indie | indie |
| 1 | 2 | Lovin On Me | Jack Harlow | | | rap | rap |
| 2 | 3 | greedy | Tate McRae | | | pop | pop |
| 3 | 4 | Prada | cassö, RAYE, D-Block Europe | grime | | | grime |
| 4 | 5 | Strangers | Kenya Grace | | | Drum and bass | Drum and bass |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | I Wonder | Kanye West | rap | Hip-Hop | Hip-Hop | Hip-Hop |
| 196 | 197 | Wet Dreamz | J. Cole | rap | Hip-Hop | MySpotigramBot | Hip-Hop |
| 197 | 198 | Wish You The Best | Lewis Capaldi | soft pop | Singer Songwriter | pop | Singer Songwriter |
| 198 | 199 | Fortunate Son | Creedence Clearwater Revival | classic rock | Classic Rock | classic rock | Classic Rock |
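
The merged table above ends at index 198, so at least one chart track appears to have been lost: each `how='inner'` merge keeps only tracks matched in that source. A minimal sketch for listing the unmatched tracks, assuming each genre file has `song_name` and `artist` columns as in the merge cell; `unmatched_tracks` is a hypothetical helper and the data below is synthetic.

```python
import pandas as pd

def unmatched_tracks(chart: pd.DataFrame, genre_df: pd.DataFrame) -> pd.DataFrame:
    """Return the chart rows with no (song_name, artist) match in genre_df."""
    keys = genre_df[["song_name", "artist"]].drop_duplicates()  # avoid row multiplication
    merged = chart.merge(
        keys,
        left_on=["track_name", "artist_names"],
        right_on=["song_name", "artist"],
        how="left",
        indicator=True,  # adds '_merge' with values 'both' or 'left_only'
    )
    return merged.loc[merged["_merge"] == "left_only", list(chart.columns)]

# Synthetic example: one track has a genre record, the other does not
chart = pd.DataFrame({
    "rank": [199, 200],
    "track_name": ["Fortunate Son", "Karma"],
    "artist_names": ["Creedence Clearwater Revival", "Taylor Swift"],
})
genres = pd.DataFrame({
    "song_name": ["Fortunate Son"],
    "artist": ["Creedence Clearwater Revival"],
    "genre": ["classic rock"],
})
print(unmatched_tracks(chart, genres))
```

Running it against each genre DataFrame built in the merge cell would show which source drops which tracks. Because matching is on exact strings, differences in capitalisation or featured-artist formatting would also surface here.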


In [31]:
chart_genre_merge4[chart_genre_merge4['Spotify_Genre'].notnull()]

|     | rank | track_name | artist_names | Spotify_Genre | AudioDB_Genre | LastFM_Genre | Final_Genre |
|-----|------|------------|--------------|---------------|---------------|--------------|-------------|
| 3 | 4 | Prada | cassö, RAYE, D-Block Europe | grime | | | grime |
| 7 | 8 | Sprinter | Dave, Central Cee | grime | | | grime |
| 8 | 9 | Water | Tyla | afrobeats | | rnb | afrobeats |
| 10 | 11 | Popular (with Playboi Carti & Madonna) - From ... | The Weeknd, Playboi Carti, Madonna | grime | | | rage rap |
| 11 | 12 | Mr. Brightside | The Killers | alternative rock | Indie | rock | Indie |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 193 | 194 | The Less I Know The Better | Tame Impala | indie | Psychedelic Rock | psychedelic | Psychedelic Rock |
| 195 | 196 | I Wonder | Kanye West | rap | Hip-Hop | Hip-Hop | Hip-Hop |
| 196 | 197 | Wet Dreamz | J. Cole | rap | Hip-Hop | MySpotigramBot | Hip-Hop |
| 197 | 198 | Wish You The Best | Lewis Capaldi | soft pop | Singer Songwriter | pop | Singer Songwriter |
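
Several rows above carry a tag from more than one source, so one way to quantify how consistent the sources are is a pairwise agreement rate over rows where both sources have a tag. This sketch lower-cases tags and treats hyphens as spaces so that 'Hip-Hop' and 'hip hop' match; that cleaning rule and the helper names are assumptions. The four sample rows are I Wonder, Wet Dreamz, Wish You The Best and Fortunate Son from the output above.

```python
import pandas as pd

def normalise_tag(tag):
    """Lower-case, trim and treat hyphens as spaces; non-strings and blanks become None."""
    if not isinstance(tag, str) or not tag.strip():
        return None
    return tag.strip().lower().replace("-", " ")

def agreement_rate(df: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Share of rows with a tag in both columns where the cleaned tags are equal."""
    a = df[col_a].map(normalise_tag)
    b = df[col_b].map(normalise_tag)
    both = a.notna() & b.notna()
    if not both.any():
        return float("nan")
    return float((a[both] == b[both]).mean())

agreement_sample = pd.DataFrame({
    "Spotify_Genre": ["rap", "rap", "soft pop", "classic rock"],
    "AudioDB_Genre": ["Hip-Hop", "Hip-Hop", "Singer Songwriter", "Classic Rock"],
    "LastFM_Genre": ["Hip-Hop", "MySpotigramBot", "pop", "classic rock"],
})
print(agreement_rate(agreement_sample, "AudioDB_Genre", "LastFM_Genre"))  # 0.5
print(agreement_rate(agreement_sample, "Spotify_Genre", "AudioDB_Genre"))  # 0.25
```

Exact matching understates agreement between related tags such as 'rap' and 'Hip-Hop'; a synonym map would be the next step if that matters.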


In [19]:
chart_genre_merge4.groupby('AudioDB_Genre')['AudioDB_Genre'].count()

# AudioDB problematic genre tags: ['...', 'Singer Songwriter', ''] 

AudioDB_Genre
                      1
...                   1
Alternative Rock      6
Classic Rock          1
Country               2
Country Pop           1
Electronic            1
Hip-Hop              14
House                 1
Indie                12
Indie Pop             2
Indie Rock            2
Pop                  18
Pop-Rock              3
Psychedelic Rock      1
R&B                   8
Reggae fusion         1
Rock                  5
Singer Songwriter     3
Soul                  3
Synthpop              2
Name: AudioDB_Genre, dtype: int64
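
Following the comment above, the blank and `'...'` AudioDB values look like placeholders rather than genres. A minimal sketch that turns them into missing values before comparing sources; the placeholder set is taken from the counts above, and `clean_audiodb` is a hypothetical helper. 'Singer Songwriter' is deliberately left alone, since it is a real tag that arguably needs mapping to a genre rather than deleting.

```python
import numpy as np
import pandas as pd

# Placeholder values seen in the AudioDB counts above
AUDIODB_PLACEHOLDERS = {"", "..."}

def clean_audiodb(tags: pd.Series) -> pd.Series:
    """Strip whitespace and turn placeholder tags into NaN, leaving real tags untouched."""
    stripped = tags.str.strip()
    return stripped.mask(stripped.isin(AUDIODB_PLACEHOLDERS))

raw = pd.Series(["Hip-Hop", "...", "", np.nan, " Indie "])
print(clean_audiodb(raw).tolist())
```

In the notebook this would be applied as `clean_audiodb(chart_genre_merge4['AudioDB_Genre'])`, after which `isna()` also counts the placeholders as missing.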

In [20]:
chart_genre_merge4.groupby('LastFM_Genre')['LastFM_Genre'].count()

LastFM_Genre
2010s                  1
70s                    1
80s                    4
Adele                  1
Awesome                1
Coldplay               1
Drum and bass          1
Hip-Hop               10
House                  3
Kanye West             1
Lana Del Rey           1
MySpotigramBot        10
Soundtrack             1
acoustic               1
americana              1
atb                    1
britpop                2
chamber pop            1
classic rock           2
country                1
country pop            1
cyndi lauper           1
dance                  1
dirty masterpiece      1
dream pop              1
electronic             3
electropop             2
folk                   1
hip hop                1
indie                 12
indie folk             1
indie pop              4
indie rock             6
motomano               1
overrated              1
peter                  1
pop                   29
pop rock               3
power pop              1
psychedelic 
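
The Last.fm counts mix genres with decades (`70s`, `2010s`), artist names (`Kanye West`, `Adele`), opinions (`Awesome`, `overrated`) and a bot tag (`MySpotigramBot`). A heuristic filter such as this sketch could drop the obvious cases. The decade regex, the `NOISE_TAGS` blocklist and the helper names are hypothetical and built by eye from the counts above; tags such as `atb` or `peter` would still need a manual check.

```python
import re
import pandas as pd

DECADE = re.compile(r"^\d{2}(\d{2})?s$")  # matches '70s', '80s', '2010s'
# Hypothetical blocklist built by eye from the Last.fm counts above
NOISE_TAGS = {"myspotigrambot", "awesome", "overrated", "dirty masterpiece"}

def is_genre_like(tag, artist_names="") -> bool:
    """Heuristically reject Last.fm tags that describe something other than a genre."""
    if not isinstance(tag, str) or not tag.strip():
        return False
    t = tag.strip().lower()
    if DECADE.match(t) or t in NOISE_TAGS:
        return False
    # Reject tags that just repeat an artist credited on the track, e.g. 'Kanye West'
    credited = {a.strip().lower() for a in str(artist_names).split(",")}
    return t not in credited

def clean_lastfm(df: pd.DataFrame, tag_col="LastFM_Genre", artist_col="artist_names") -> pd.Series:
    """Return the tag column with non-genre tags replaced by NaN."""
    keep = pd.Series(
        [is_genre_like(t, a) for t, a in zip(df[tag_col], df[artist_col])],
        index=df.index,
    )
    return df[tag_col].where(keep)

# Synthetic rows pairing tags from the counts above with artists from the chart
lastfm_sample = pd.DataFrame({
    "artist_names": ["J. Cole", "Kanye West", "Noah Kahan", "Tame Impala"],
    "LastFM_Genre": ["MySpotigramBot", "Kanye West", "indie", "80s"],
})
print(clean_lastfm(lastfm_sample).tolist())
```

The artist check only catches tags that equal a credited artist on the same row, so an artist-name tag on someone else's track (for example a cover) would slip through.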

In [36]:
# Number of tracks in the merged chart with no final genre tag
chart_genre_merge4['Final_Genre'].isnull().sum()

np.int64(17)
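
The 17 missing final tags can be put in context by counting missing values per source column. The sketch also shows one hypothetical way to combine sources: take the first available tag in the order AudioDB, Spotify, Last.fm. That order is an assumption; it agrees with `Final_Genre` on several rows above (e.g. Water, Mr. Brightside) but not on the Popular row (`grime` vs `rage rap`), so it is not the rule behind final_genre_tags.json. The sample rows are Stick Season, Prada, Water and Mr. Brightside from the merged table, reading empty cells as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical priority order: AudioDB first, then Spotify, then Last.fm
SOURCE_COLUMNS = ["AudioDB_Genre", "Spotify_Genre", "LastFM_Genre"]

def coverage_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing tags in each source column."""
    missing = df[SOURCE_COLUMNS].isna().sum()
    return pd.DataFrame({"missing": missing, "share_missing": missing / len(df)})

def first_available_genre(df: pd.DataFrame) -> pd.Series:
    """Take the first non-null tag across SOURCE_COLUMNS, in priority order."""
    result = df[SOURCE_COLUMNS[0]]
    for col in SOURCE_COLUMNS[1:]:
        result = result.where(result.notna(), df[col])
    return result.rename("Coalesced_Genre")

coverage_sample = pd.DataFrame({
    "Spotify_Genre": [np.nan, "grime", "afrobeats", "alternative rock"],
    "AudioDB_Genre": [np.nan, np.nan, np.nan, "Indie"],
    "LastFM_Genre": ["indie", np.nan, "rnb", "rock"],
})
print(coverage_report(coverage_sample))
print(first_available_genre(coverage_sample).tolist())
```

Note that `isna()` does not count empty-string placeholders such as AudioDB's blank tag as missing, so those should be converted to NaN first.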