## Spotify API Data Analysis

This analysis aims to explore and process production-like data from the Spotify API and to create blueprints for data processing, on the assumption that the sample will match the full population.

In [3]:
import os
import sys

sys.dont_write_bytecode = True

import requests
import json
import time
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from dotenv import load_dotenv

load_dotenv()

False

Load the input data, harvested by Natalia Krebesova from the official public Spotify API.

In [4]:
data = pd.read_csv('./Data/spotify_tracks.csv')

In [6]:
data.head()

Unnamed: 0,id,name,track_number,disc_number,duration_ms,popularity,explicit,artists,artist_ids,album,album_total_tracks,album_artists,album_artist_ids,album_release_date,restrictions,available_markets
0,3d9DChrdc6BOeFsbrZ3Is0,Under the Bridge,11,1,264306,81,False,Red Hot Chili Peppers,0L8ExT028jH3ddEcZwqJJ5,Blood Sugar Sex Magik (Deluxe Edition),19,Red Hot Chili Peppers,0L8ExT028jH3ddEcZwqJJ5,1991-09-24,,"['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA..."
1,6sy3LkhNFjJWlaeSMNwQ62,Counting Stars,1,1,257840,71,False,OneRepublic,5Pwc4xIPtQLFEnJriah9YJ,Native,15,OneRepublic,5Pwc4xIPtQLFEnJriah9YJ,2014-01-01,,"['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA..."
2,5YJtMNWKe55yr49cyJgxva,Everytime We Touch,1,1,197124,72,False,Cascada,0N0d3kjwdY2h7UVuTdJGfp,Everytime We Touch (Premium Edition),45,Cascada,0N0d3kjwdY2h7UVuTdJGfp,2010-02-12,,"['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA..."
3,27AHAtAirQapVldIm4c9ZX,Jump,2,1,195106,64,False,Kris Kross,2zrZfs23sjuHDv4E6YRmNf,Totally Krossed Out,15,Kris Kross,2zrZfs23sjuHDv4E6YRmNf,1992-03-17,,"['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA..."
4,5G1sTBGbZT5o4PNRc75RKI,Lonely Boy,1,1,193653,68,False,The Black Keys,7mnBLXK823vNxN3UWB7Gfz,El Camino,11,The Black Keys,7mnBLXK823vNxN3UWB7Gfz,2011-12-06,,"['AR', 'AU', 'AT', 'BE', 'BO', 'BR', 'BG', 'CA..."


NOTE:

After further consultation with the team, we decided that the data from the official Spotify API might not be the best fit for data analysis and modelling. Instead, we will use other data: the Kaggle data used in the first introductory analyses is a strong candidate, since it contains acoustic features, which are important for modelling.

___

### Other data source exploration

Candidate datasets for the project's source data:
- https://www.kaggle.com/datasets/gauthamvijayaraj/spotify-tracks-dataset-updated-every-week

#### Weekly-update dataset

This dataset contains data on roughly 60,000 tracks. According to its Kaggle documentation, it covers 6 different languages. A duplicate analysis is therefore needed to determine whether repeated tracks are the same song listed under different languages or genuinely distinct songs sung in different languages.
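Here is a minimal sketch of that cross-language check. It assumes the `track_id` and `language` columns as they appear in the dataset and counts how many distinct language labels each ID carries. The rows are hypothetical toy data, not taken from the Kaggle file.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle weekly-update data
data = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3", "c3"],
    "track_name": ["Song A", "Song A", "Song B", "Song C", "Song C"],
    "language": ["Tamil", "Tamil", "English", "Korean", "Hindi"],
})

# Number of distinct language labels attached to each track_id
langs_per_id = data.groupby("track_id")["language"].nunique()

# IDs listed under more than one language point to cross-language duplicates
multi_language_ids = langs_per_id[langs_per_id > 1].index.tolist()
print(multi_language_ids)  # ['c3']
```

Suppose an ID repeats under a single language, like `a1` here. That is a plain duplicate row rather than a language issue.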

In [8]:
data = pd.read_csv('./Data/spotify_tracks_kaggle_weekly.csv')

In [13]:
data.shape

(62317, 22)

Quick attribute analysis: what kind of data are we dealing with?

- track_id : unique song ID -> primary key (PK)
- track_url : Spotify endpoint + track_id -> also unique, but redundant because the ID is enough
- the remaining columns appear to be ordinary data attributes
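Whether `track_url` is truly redundant with `track_id` can be checked directly. This sketch assumes the prefix `https://open.spotify.com/track/`, inferred from the visible, truncated URLs in the `head()` output. The IDs are hypothetical.

```python
import pandas as pd

# Assumed prefix, inferred from the visible part of the track_url values
SPOTIFY_TRACK_PREFIX = "https://open.spotify.com/track/"

# Hypothetical rows standing in for the real data
data = pd.DataFrame({
    "track_id": ["id001", "id002", "id003"],
    "track_url": [
        "https://open.spotify.com/track/id001",
        "https://open.spotify.com/track/id002",
        "https://open.spotify.com/track/id003",
    ],
})

# Element-wise check: does every URL equal prefix + track_id?
url_is_derived = bool((SPOTIFY_TRACK_PREFIX + data["track_id"] == data["track_url"]).all())
print(url_is_derived)  # True
```

If this holds on the full data, `track_url` carries no information beyond `track_id`.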

In [22]:
data.head()

Unnamed: 0,track_id,track_name,artist_name,year,popularity,artwork_url,album_name,acousticness,danceability,duration_ms,...,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,track_url,language
0,2r0ROhr7pRN4MXDMT1fEmd,"Leo Das Entry (From ""Leo"")",Anirudh Ravichander,2024,59,https://i.scdn.co/image/ab67616d0000b273ce9c65...,"Leo Das Entry (From ""Leo"")",0.0241,0.753,97297.0,...,8.0,0.1,-5.994,0.0,0.103,110.997,4.0,0.459,https://open.spotify.com/track/2r0ROhr7pRN4MXD...,Tamil
1,4I38e6Dg52a2o2a8i5Q5PW,AAO KILLELLE,"Anirudh Ravichander, Pravin Mani, Vaishali Sri...",2024,47,https://i.scdn.co/image/ab67616d0000b273be1b03...,AAO KILLELLE,0.0851,0.78,207369.0,...,10.0,0.0951,-5.674,0.0,0.0952,164.995,3.0,0.821,https://open.spotify.com/track/4I38e6Dg52a2o2a...,Tamil
2,59NoiRhnom3lTeRFaBzOev,Mayakiriye Sirikiriye - Orchestral EDM,"Anirudh Ravichander, Anivee, Alvin Bruno",2024,35,https://i.scdn.co/image/ab67616d0000b27334a1dd...,Mayakiriye Sirikiriye (Orchestral EDM),0.0311,0.457,82551.0,...,2.0,0.0831,-8.937,0.0,0.153,169.996,4.0,0.598,https://open.spotify.com/track/59NoiRhnom3lTeR...,Tamil
3,5uUqRQd385pvLxC8JX3tXn,Scene Ah Scene Ah - Experimental EDM Mix,"Anirudh Ravichander, Bharath Sankar, Kabilan, ...",2024,24,https://i.scdn.co/image/ab67616d0000b27332e623...,Scene Ah Scene Ah (Experimental EDM Mix),0.227,0.718,115831.0,...,7.0,0.124,-11.104,1.0,0.445,169.996,4.0,0.362,https://open.spotify.com/track/5uUqRQd385pvLxC...,Tamil
4,1KaBRg2xgNeCljmyxBH1mo,Gundellonaa X I Am A Disco Dancer - Mashup,"Anirudh Ravichander, Benny Dayal, Leon James, ...",2024,22,https://i.scdn.co/image/ab67616d0000b2735a59b6...,Gundellonaa X I Am a Disco Dancer (Mashup),0.0153,0.689,129621.0,...,7.0,0.345,-9.637,1.0,0.158,128.961,4.0,0.593,https://open.spotify.com/track/1KaBRg2xgNeCljm...,Tamil


Duplicate count based on track_id -> this should sidestep the language question, since the ID serves as the unique key for songs.

In [12]:
data['track_id'].duplicated().sum()

78

In [19]:
data[data['track_id'].duplicated(keep=False)].sort_values(by='track_id')

Unnamed: 0,track_id,track_name,artist_name,year,popularity,artwork_url,album_name,acousticness,danceability,duration_ms,...,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,track_url,language
2494,08elXSPBnP9uFYkjpObT5T,"Yaaro Ucchikilai Meley (From ""Taramani"")",Yuvan Shankar Raja,2020,0,https://i.scdn.co/image/ab67616d0000b27320428b...,U1 For Life,0.0217,0.643,177867.0,...,8.0,0.1460,-5.697,0.0,0.0510,179.991,4.0,0.509,https://open.spotify.com/track/08elXSPBnP9uFYk...,Tamil
2572,08elXSPBnP9uFYkjpObT5T,"Yaaro Ucchikilai Meley (From ""Taramani"")",Yuvan Shankar Raja,2020,0,https://i.scdn.co/image/ab67616d0000b27320428b...,U1 For Life,0.0217,0.643,177867.0,...,8.0,0.1460,-5.697,0.0,0.0510,179.991,4.0,0.509,https://open.spotify.com/track/08elXSPBnP9uFYk...,Tamil
2432,0A0fukU8m9V1DHzzIOFUHq,"Ye Rasa - From ""MaaManithan""","Yuvan Shankar Raja, Ilayaraaja, U1 Records",2021,35,https://i.scdn.co/image/ab67616d0000b2738bc6a0...,"Ye Rasa (From ""MaaManithan"")",0.7260,0.702,242659.0,...,9.0,0.0731,-7.922,1.0,0.0323,125.990,3.0,0.412,https://open.spotify.com/track/0A0fukU8m9V1DHz...,Tamil
2510,0A0fukU8m9V1DHzzIOFUHq,"Ye Rasa - From ""MaaManithan""","Yuvan Shankar Raja, Ilayaraaja, U1 Records",2021,35,https://i.scdn.co/image/ab67616d0000b2738bc6a0...,"Ye Rasa (From ""MaaManithan"")",0.7260,0.702,242659.0,...,9.0,0.0731,-7.922,1.0,0.0323,125.990,3.0,0.412,https://open.spotify.com/track/0A0fukU8m9V1DHz...,Tamil
2491,0DqKN7Wg0lsQaf6MuaREpX,Munnala Ninna Pothum,"Yuvan Shankar Raja, Saindhavi, Sunandan",2020,4,https://i.scdn.co/image/ab67616d0000b2734ae5bd...,Dabangg 3,0.2700,0.496,246625.0,...,9.0,0.1390,-3.419,1.0,0.1120,128.921,4.0,0.848,https://open.spotify.com/track/0DqKN7Wg0lsQaf6...,Tamil
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2506,7rto8ANj8esy1YSfDekGSL,"Valimai Motion Poster Theme (From ""Valimai"")",Yuvan Shankar Raja,2021,34,https://i.scdn.co/image/ab67616d0000b2733a8f59...,"Valimai Motion Poster Theme (From ""Valimai"")",0.3060,0.722,87600.0,...,8.0,0.0546,-8.597,1.0,0.0748,100.032,4.0,0.596,https://open.spotify.com/track/7rto8ANj8esy1YS...,Tamil
2489,7t89o04EbT4flij8QFYGuK,Super Deluxe (BGM Cover),Yuvan Shankar Raja,2020,20,https://i.scdn.co/image/ab67616d0000b273992681...,Super Deluxe (BGM Cover),0.3250,0.361,66130.0,...,3.0,0.1100,-8.770,0.0,0.0330,158.910,3.0,0.238,https://open.spotify.com/track/7t89o04EbT4flij...,Tamil
2567,7t89o04EbT4flij8QFYGuK,Super Deluxe (BGM Cover),Yuvan Shankar Raja,2020,20,https://i.scdn.co/image/ab67616d0000b273992681...,Super Deluxe (BGM Cover),0.3250,0.361,66130.0,...,3.0,0.1100,-8.770,0.0,0.0330,158.910,3.0,0.238,https://open.spotify.com/track/7t89o04EbT4flij...,Tamil
2538,7uihm93NAp8PWqyn1wn4OB,Glance of Rukmani,"Yuvan Shankar Raja, Vivek - Mervin",2021,5,https://i.scdn.co/image/ab67616d0000b273e3d93e...,Sulthan (Original Background Score),0.7880,0.544,50583.0,...,5.0,0.0807,-9.090,0.0,0.0799,143.750,4.0,0.923,https://open.spotify.com/track/7uihm93NAp8PWqy...,Tamil


Duplicate count across all attributes (full-row duplicates):

In [24]:
data.duplicated().sum()

78

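No data anomalies are present. A full-row duplicate necessarily repeats its track_id, so equal counts (78 ID duplicates == 78 row duplicates) mean every ID duplicate is an exact copy of an earlier row. These rows can therefore be dropped.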
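The subset argument can be made explicit with boolean masks. A small hypothetical frame stands in for the real data. The same two `duplicated()` calls apply unchanged to `data`.

```python
import pandas as pd

# Hypothetical frame: row 2 is an exact copy of row 0
data = pd.DataFrame({
    "track_id": ["x1", "y2", "x1", "z3"],
    "popularity": [35, 20, 35, 5],
})

id_dupes = data["track_id"].duplicated()  # later repeats of an ID
row_dupes = data.duplicated()             # later repeats of a whole row

# A full-row duplicate always repeats its track_id, so row_dupes implies id_dupes
assert not (row_dupes & ~id_dupes).any()

# With equal counts, both masks flag exactly the same rows
print(int(id_dupes.sum()), int(row_dupes.sum()), bool((id_dupes == row_dupes).all()))  # 1 1 True

# After dropping full-row duplicates, the ID should be a clean key
processed_data = data.drop_duplicates()
print(processed_data["track_id"].is_unique)  # True
```

Both calls use the default `keep="first"`, so `drop_duplicates()` removes exactly the flagged rows. On the full dataset, that leaves 62,317 - 78 = 62,239 rows.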

In [25]:
processed_data = data.drop_duplicates()

In [28]:
processed_data.sort_values(by='popularity')

Unnamed: 0,track_id,track_name,artist_name,year,popularity,artwork_url,album_name,acousticness,danceability,duration_ms,...,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,track_url,language
31197,7wFbKKM7TXNwj1SWBe4vz7,Actuality,Twice,2016,0,https://i.scdn.co/image/ab67616d0000b2739c03ee...,Actuality,0.05920,0.212,270493.0,...,9.0,0.3640,-6.907,1.0,0.0554,123.948,4.0,0.351,https://open.spotify.com/track/7wFbKKM7TXNwj1S...,Korean
38685,6gPaEqLowtl4Q3X6geC8cy,"Yen Ooru Madura, Pt. 2",Deva,1992,0,https://i.scdn.co/image/ab67616d0000b273ea9df4...,Vaasalil Oru Vennila (Original Motion Picture ...,0.85000,0.591,306307.0,...,4.0,0.3810,-6.245,0.0,0.0383,86.688,4.0,0.798,https://open.spotify.com/track/6gPaEqLowtl4Q3X...,Tamil
38686,3LmyzTgiIkQgNxbyjc6RUN,Padikavandhadhu - Male Vocals,"Deva, Mano, Kalidasan",1992,0,https://i.scdn.co/image/ab67616d0000b2731cec4d...,En Rajangam (Original Motion Picture Soundtrack),0.43500,0.600,285525.0,...,9.0,0.0552,-6.956,1.0,0.3300,66.408,4.0,0.833,https://open.spotify.com/track/3LmyzTgiIkQgNxb...,Tamil
38687,2KOl72TPOjfROBtBdQmRiJ,Chinna Roja Ival,"Deva, Vasan, Sujatha",1992,0,https://i.scdn.co/image/ab67616d0000b273146768...,Manasukkul Varalama (Original Motion Picture S...,0.37300,0.643,281800.0,...,8.0,0.3630,-7.761,1.0,0.0427,101.067,4.0,0.444,https://open.spotify.com/track/2KOl72TPOjfROBt...,Tamil
38689,1mBSAPdKwtfBR9OWJm184x,Manasu Thudikkuthu,"Deva, K. S. Chithra",1992,0,https://i.scdn.co/image/ab67616d0000b27329f84a...,Kavalukku Kannellai (Original Motion Picture S...,0.75400,0.580,240466.0,...,1.0,0.1930,-10.207,1.0,0.0735,118.496,4.0,0.803,https://open.spotify.com/track/1mBSAPdKwtfBR9O...,Tamil
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24273,4R2kfaDFhslZEMJqAFNpdd,cardigan,Taylor Swift,2020,89,https://i.scdn.co/image/ab67616d0000b27395f754...,folklore,0.53700,0.613,239560.0,...,0.0,0.2500,-8.588,0.0,0.0424,130.033,4.0,0.551,https://open.spotify.com/track/4R2kfaDFhslZEMJ...,English
54962,62bOmKYxYg7dhrC6gH9vFn,Bye Bye Bye - From Deadpool and Wolverine Soun...,*NSYNC,2000,90,https://i.scdn.co/image/ab67616d0000b273a6cb8f...,No Strings Attached,0.03100,0.610,200400.0,...,8.0,0.0821,-4.843,0.0,0.0479,172.638,4.0,0.861,https://open.spotify.com/track/62bOmKYxYg7dhrC...,English
26432,0VjIjW4GlUZAMYd2vXMi3b,Blinding Lights,The Weeknd,2020,91,https://i.scdn.co/image/ab67616d0000b2738863bc...,After Hours,0.00146,0.514,200040.0,...,1.0,0.0897,-5.934,1.0,0.0598,171.005,4.0,0.334,https://open.spotify.com/track/0VjIjW4GlUZAMYd...,English
26580,7MXVkk9YMctZqd1Srtv4MB,Starboy,"The Weeknd, Daft Punk",2016,91,https://i.scdn.co/image/ab67616d0000b2734718e2...,Starboy,0.14100,0.679,230453.0,...,7.0,0.1370,-7.015,1.0,0.2760,186.003,4.0,0.486,https://open.spotify.com/track/7MXVkk9YMctZqd1...,English


Duplicate analysis completed; next up is the anomaly analysis.

**Data type analysis**

In [40]:
categorical_columns = processed_data.dtypes[processed_data.dtypes == object].index
numerical_columns = processed_data.dtypes[processed_data.dtypes != object].index

Inspection of the categorical columns follows. The numerical columns show no apparent anomaly that would force a recast to a categorical dtype.
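Below is a version-robust way to split the columns. It also includes one possible heuristic for double-checking numeric columns that might really be categorical codes: a low count of distinct values. The `MAX_LEVELS` threshold and the toy rows are illustrative assumptions, not values from this analysis.

```python
import pandas as pd

# Hypothetical slice of the processed data
n = 20
processed_data = pd.DataFrame({
    "track_id": [f"id{i}" for i in range(n)],
    "language": ["Tamil", "English"] * (n // 2),
    "tempo": [100.0 + i for i in range(n)],
    "key": [float(i % 12) for i in range(n)],
    "mode": [float(i % 2) for i in range(n)],
})

# select_dtypes(include="number") skips bool columns, unlike a `!= object` test
numerical_columns = processed_data.select_dtypes(include="number").columns.tolist()
categorical_columns = [c for c in processed_data.columns if c not in numerical_columns]

# Illustrative threshold: numeric columns with few distinct values get a second look
MAX_LEVELS = 12
n_unique = processed_data[numerical_columns].nunique()
low_cardinality = n_unique[n_unique <= MAX_LEVELS].index.tolist()

print(categorical_columns)  # ['track_id', 'language']
print(low_cardinality)      # ['key', 'mode']
```

A flagged column is only a candidate for review; whether it should be recast remains a judgement call.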

Apart from anomaly detection, the following columns can go on the drop list, since they act as identifiers or carry no analytical value:
- track_name : ID-like attribute
- artwork_url : not important for analytical results
- track_url : not important for analytical results
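Here is a sketch of applying that drop list, assuming the three column names listed above. `track_id` is kept as the key, and the frame is hypothetical.

```python
import pandas as pd

# Columns flagged as identifiers or analytically irrelevant
DROP_COLUMNS = ["track_name", "artwork_url", "track_url"]

# Hypothetical rows standing in for processed_data
processed_data = pd.DataFrame({
    "track_id": ["t1", "t2"],
    "track_name": ["Song A", "Song B"],
    "artwork_url": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "track_url": ["https://open.spotify.com/track/t1", "https://open.spotify.com/track/t2"],
    "language": ["Tamil", "English"],
    "popularity": [59, 47],
})

# errors="ignore" makes the step safe to re-run once the columns are gone
analysis_data = processed_data.drop(columns=DROP_COLUMNS, errors="ignore")
print(list(analysis_data.columns))  # ['track_id', 'language', 'popularity']
```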

In [45]:
processed_data[categorical_columns]

Unnamed: 0,track_id,track_name,artist_name,artwork_url,album_name,track_url,language
0,2r0ROhr7pRN4MXDMT1fEmd,"Leo Das Entry (From ""Leo"")",Anirudh Ravichander,https://i.scdn.co/image/ab67616d0000b273ce9c65...,"Leo Das Entry (From ""Leo"")",https://open.spotify.com/track/2r0ROhr7pRN4MXD...,Tamil
1,4I38e6Dg52a2o2a8i5Q5PW,AAO KILLELLE,"Anirudh Ravichander, Pravin Mani, Vaishali Sri...",https://i.scdn.co/image/ab67616d0000b273be1b03...,AAO KILLELLE,https://open.spotify.com/track/4I38e6Dg52a2o2a...,Tamil
2,59NoiRhnom3lTeRFaBzOev,Mayakiriye Sirikiriye - Orchestral EDM,"Anirudh Ravichander, Anivee, Alvin Bruno",https://i.scdn.co/image/ab67616d0000b27334a1dd...,Mayakiriye Sirikiriye (Orchestral EDM),https://open.spotify.com/track/59NoiRhnom3lTeR...,Tamil
3,5uUqRQd385pvLxC8JX3tXn,Scene Ah Scene Ah - Experimental EDM Mix,"Anirudh Ravichander, Bharath Sankar, Kabilan, ...",https://i.scdn.co/image/ab67616d0000b27332e623...,Scene Ah Scene Ah (Experimental EDM Mix),https://open.spotify.com/track/5uUqRQd385pvLxC...,Tamil
4,1KaBRg2xgNeCljmyxBH1mo,Gundellonaa X I Am A Disco Dancer - Mashup,"Anirudh Ravichander, Benny Dayal, Leon James, ...",https://i.scdn.co/image/ab67616d0000b2735a59b6...,Gundellonaa X I Am a Disco Dancer (Mashup),https://open.spotify.com/track/1KaBRg2xgNeCljm...,Tamil
...,...,...,...,...,...,...,...
62312,3eHDwMQYPEziy2DWRBNoLv,Sani - G.O.A.T Remix,"Arvind Raj, Sheezay, Music Kitchen, FSPROD Vin...",https://i.scdn.co/image/ab67616d0000b273819d23...,Sani (G.O.A.T Remix),https://open.spotify.com/track/3eHDwMQYPEziy2D...,Tamil
62313,5hHtCqkNv5eo99OrEFFcgS,Life of Bachelor,"A H Kaashif, Navakkarai Naveen Prabanjam, Asal...",https://i.scdn.co/image/ab67616d0000b2736cd651...,Bachelor (Original Motion Picture Soundtrack),https://open.spotify.com/track/5hHtCqkNv5eo99O...,Tamil
62314,08foF9YHgKmIgOy3xMWRZy,Yo Baby,"Rakesh Ambigapathy, Asal Kolaar, MC Vickey",https://i.scdn.co/image/ab67616d0000b27300da25...,Yo Baby,https://open.spotify.com/track/08foF9YHgKmIgOy...,Tamil
62315,2wLFbVlQGKJSd9lwzwL47F,Fast Fast Vadiley,"Asal Kolaar, Priyadarshan Balasubramanian",https://i.scdn.co/image/ab67616d0000b273e051e1...,Arjuna Phalguna,https://open.spotify.com/track/2wLFbVlQGKJSd9l...,Tamil
