In [11]:
#Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

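# sets the theme of the charts
# (Matplotlib 3.6+ renamed this style to 'seaborn-v0_8-darkgrid'; use whichever name is available)
style_name = 'seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'seaborn-darkgrid'
plt.style.use(style_name)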

%matplotlib inline

In [12]:
#Loading the three datasets
charts_df = pd.read_csv('ph_spotify_daily_charts.csv')
charts_artists_df = pd.read_csv('ph_spotify_daily_charts_artists.csv')
charts_tracks_df = pd.read_csv('ph_spotify_daily_charts_tracks.csv')


## Dataset Description

This notebook uses the dataset "Spotify Daily Top 200 Tracks in the Philippines".
It contains information about the 200 most-streamed songs on Spotify each day from January 1, 2017 to March 31, 2023.

The dataset is separated into three files.

The **first file** contains data about the tracks that made it to the Top 200 for a specific day. The following are descriptions of each variable in the dataset. 
- **`date`**: the date on which the track was in the Daily Top 200 chart.
- **`position`**: the rank of the track based on its number of streams. Values range from 1 to 200.
- **`track_id`**: the unique identifier Spotify uses for the track on its platform. It can be used to build the track's URL: "https://open.spotify.com/track/<track_id>"
- **`track_name`**: the name of the track/song
- **`artist`**: the name of the artist who made the track/song
- **`streams`**: number of times the song was played

The file was loaded above and is inspected in the code blocks below. It contains *456201* observations, each representing a track that was part of the Daily Top 200 chart on a specific day. The same track can therefore appear multiple times, once for each date on which it charted.
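Because a track gets one row for every day it charts, counting the distinct dates per `track_id` gives the number of days each track spent in the Top 200. The sketch below illustrates this on a small hand-made frame with the same columns as `charts_df`; the sample rows and the helper name `days_on_chart` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Illustrative sample with the same columns as charts_df (not real chart data)
sample_charts = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"],
    "position": [1, 2, 1, 2],
    "track_id": ["track_a", "track_b", "track_a", "track_c"],
    "track_name": ["Song A", "Song B", "Song A", "Song C"],
    "artist": ["Artist X", "Artist Y", "Artist X", "Artist Z"],
    "streams": [1000, 900, 1100, 800],
})

def days_on_chart(df: pd.DataFrame) -> pd.Series:
    """Number of distinct chart dates per track_id, largest first."""
    return df.groupby("track_id")["date"].nunique().sort_values(ascending=False)

chart_days = days_on_chart(sample_charts)
print(chart_days)
```

The same function could be called as `days_on_chart(charts_df)` once the first file is loaded.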

In [13]:
charts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 456201 entries, 0 to 456200
Data columns (total 6 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   date        456201 non-null  object
 1   position    456201 non-null  int64 
 2   track_id    456201 non-null  object
 3   track_name  456191 non-null  object
 4   artist      456191 non-null  object
 5   streams     456201 non-null  int64 
dtypes: int64(2), object(4)
memory usage: 20.9+ MB


In [14]:
#Top 5 songs for January 1st, 2017
charts_df.head(5)

Unnamed: 0,date,position,track_id,track_name,artist,streams
0,2017-01-01,1,0kN8xEmgMW9mh7UmDYHlJP,Versace on the Floor,Bruno Mars,185236
1,2017-01-01,2,5uCax9HTNlzGybIStD3vDh,Say You Won't Let Go,James Arthur,180552
2,2017-01-01,3,7BKLCZ1jbUBVqRi2FVlTVw,Closer,The Chainsmokers,158720
3,2017-01-01,4,2rizacJSyD9S1IQUxUxnsK,All We Know,The Chainsmokers,130874
4,2017-01-01,5,5MFzQMkrl1FOOng9tq6R9r,Don't Wanna Know,Maroon 5,129656
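The `info()` output above lists `date` with dtype `object`, so the dates are stored as strings. A possible next step, sketched here under the assumption that every date follows the `YYYY-MM-DD` pattern seen in the preview, is to parse them and check which days do not have a complete set of positions. The helper name `incomplete_dates` and the sample rows are hypothetical.

```python
import pandas as pd

def incomplete_dates(df: pd.DataFrame, chart_size: int = 200) -> pd.Series:
    """Distinct-position counts for dates that lack exactly `chart_size` positions."""
    # Parse the string dates; the format is an assumption based on the preview
    dates = pd.to_datetime(df["date"], format="%Y-%m-%d")
    per_date = df.groupby(dates)["position"].nunique()
    return per_date[per_date != chart_size]

# Toy example with a chart size of 2: the second day is missing position 2
toy_charts = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-01", "2017-01-02"],
    "position": [1, 2, 1],
})
short_days = incomplete_dates(toy_charts, chart_size=2)
print(short_days)
```

On the real data, `incomplete_dates(charts_df)` would list any days whose chart has fewer than 200 distinct positions.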


The **second file** contains information about the artists who made the songs that reached the Daily Top 200 Charts. The following are descriptions of each variable in the dataset.
- **`artist_id`**: the unique identifier Spotify uses to identify the artists on their platform.
- **`artist_name`**: the name of the artist
- **`total_followers`**: the number of followers the artist had at the time the data was recorded
- **`genres`**: an array containing the genres the artist is associated with (if any). [Source](https://developer.spotify.com/documentation/web-api/reference/get-an-artist)
- **`popularity`**: an integer from 0 (lowest) to 100 (highest) representing the popularity of the artist, calculated from all of their tracks.

The code blocks below show that there are *988* observations in this file, each representing a unique artist whose tracks made it into the Daily Top 200 Charts at least once.

In [8]:
charts_artists_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 988 entries, 0 to 987
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   artist_id        988 non-null    object
 1   artist_name      988 non-null    object
 2   total_followers  988 non-null    int64 
 3   genres           988 non-null    object
 4   popularity       988 non-null    int64 
dtypes: int64(2), object(3)
memory usage: 38.7+ KB


In [9]:
#First 5 artists in the dataset
charts_artists_df.head(5)

Unnamed: 0,artist_id,artist_name,total_followers,genres,popularity
0,0du5cEVh5yTK9QJze8zA0C,Bruno Mars,47387027,"['dance pop', 'pop']",89
1,4IWBUUAFIplrNtaOHcJPRM,James Arthur,11471232,"['pop', 'talent show', 'uk pop']",82
2,69GGBxA162lTqCwzJG5jLp,The Chainsmokers,20036566,"['dance pop', 'edm', 'electropop', 'pop', 'pop...",81
3,04gDigrS5kc9YWfZHwBETP,Maroon 5,40125006,['pop'],86
4,5p7f24Rk5HkUZsaS3BLG5F,Hailee Steinfeld,8535540,"['dance pop', 'pop', 'post-teen pop']",73
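The preview shows each `genres` value written like a Python list, and `info()` reports the column as `object`, so after reading the CSV the values are most likely plain strings rather than lists. A minimal sketch for turning them into real lists, assuming every value is a valid Python list literal (the column name `genres_list` is an assumption introduced here):

```python
import ast
import pandas as pd

# Two rows copied from the preview above, with genres kept as strings
sample_artists = pd.DataFrame({
    "artist_name": ["Bruno Mars", "Maroon 5"],
    "genres": ["['dance pop', 'pop']", "['pop']"],
})

# Safely evaluate each string into a Python list
sample_artists["genres_list"] = sample_artists["genres"].apply(ast.literal_eval)

# One row per (artist, genre) pair, then count artists per genre
genre_counts = sample_artists.explode("genres_list")["genres_list"].value_counts()
print(genre_counts)
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals and does not execute arbitrary code.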


The **third file** contains specific details about the tracks listed in the **first file**. The following are descriptions of each variable in the dataset.
- **`track_id`**: the unique identifier Spotify uses to identify the songs on their platform. They can be specifically used this way: "https://open.spotify.com/track/<track_id>". Identical to their respective observations in the **first file**
- **`track_name`**: the name of the track/song. Identical to their respective observations in the **first file**
- **`artist_id`**: the unique identifier Spotify uses to identify the artists on their platform. Identical to their respective observations in the **second file**
- **`artist_name`**: the name of the artist. Identical to their respective observations in the **second file**
- **`album_id`**: the unique identifier Spotify uses for the album that contains the track/song (if any). [Source](https://developer.spotify.com/documentation/web-api/reference/get-an-album)
- **`duration`**: the duration of the song in milliseconds
- **`release_date`**: the date the album containing the track/song was released
- **`popularity`**: the popularity of the album, represented as an integer from 0 (lowest) to 100 (highest)
- **`danceability`**: describes how suitable a song is for dancing based on musical elements present in the track, specifically the tempo, rhythm stability, beat strength, and overall regularity. Values range from 0 (least danceable) to 1 (most danceable).
- **`energy`**: a value between 0 and 1 representing a measure of intensity and activity.
- **`key`**: a value ranging from -1 to 11 representing the key the track is in, using standard Pitch Class notation, e.g. 0 = C, 1 = C#/Db, and so on. A value of -1 means no key was detected.
- **`loudness`**: the overall (average) loudness of a track in decibels (dB). Values typically range between -60 and 0 dB.
- **`mode`**: a value of either 0 or 1 indicating the modality of a track; 0 represents *minor* and 1 represents *major*.
- **`speechiness`**: a value representing the presence of spoken words detected in a track. It ranges from 0 to 1, where tracks that are exclusively speech-like have a value closer to 1.0.
- **`acousticness`**: a value between 0 and 1 representing the confidence that the track is acoustic. 1.0 represents high confidence that the track is acoustic.
- **`instrumentalness`**: a value between 0 and 1 representing a prediction of whether the track has no vocals. The closer the value is to 1.0, the greater the likelihood that it has no vocal content.
- **`liveness`**: a value between 0 and 1 representing the presence of an audience detected in the recording. A value above 0.8 indicates a strong likelihood that the track is live (performed and recorded in front of an audience).
- **`valence`**: a value between 0 and 1 representing the positiveness conveyed by a track. A value closer to 1.0 represents a positive track (e.g. happy or cheerful) and a value closer to 0.0 represents a negative track (e.g. sad or depressing).
- **`tempo`**: a value representing the overall tempo (speed or pace) of a track in beats per minute (BPM). [Source](https://developer.spotify.com/documentation/web-api/reference/get-audio-features)

The code blocks below show that the file contains *4768* observations, each representing a unique track that was in the Daily Top 200 Charts at least once.
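The numeric `key` and `mode` columns are easier to read once translated into a label such as "D major". The sketch below follows the Pitch Class convention described in the list (0 = C, 1 = C#/Db, and so on) and treats -1 or a missing value as "no key"; the helper `key_name` and the constant `PITCH_CLASSES` are names made up for this illustration.

```python
import pandas as pd

# Pitch classes 0..11 in standard Pitch Class notation
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def key_name(key, mode):
    """Return a label like 'D major', or None when the key is unknown."""
    if pd.isna(key) or pd.isna(mode) or int(key) == -1:
        return None
    quality = "major" if int(mode) == 1 else "minor"
    return f"{PITCH_CLASSES[int(key)]} {quality}"

# Key/mode pairs stored as floats, as they appear in a float64 column
sample_keys = pd.DataFrame({"key": [2.0, 10.0, 0.0], "mode": [1.0, 1.0, 1.0]})
sample_keys["key_name"] = sample_keys.apply(
    lambda row: key_name(row["key"], row["mode"]), axis=1
)
print(sample_keys)
```

The helper accepts floats because a numeric column that contains a missing value is read by pandas as `float64` rather than `int64`.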

In [39]:
charts_tracks_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4768 entries, 0 to 4767
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   track_id          4768 non-null   object 
 1   track_name        4763 non-null   object 
 2   artist_id         4768 non-null   object 
 3   artist_name       4763 non-null   object 
 4   album_id          4768 non-null   object 
 5   duration          4768 non-null   int64  
 6   release_date      4768 non-null   object 
 7   popularity        4768 non-null   int64  
 8   danceability      4767 non-null   float64
 9   energy            4767 non-null   float64
 10  key               4767 non-null   float64
 11  loudness          4767 non-null   float64
 12  mode              4767 non-null   float64
 13  speechiness       4767 non-null   float64
 14  acousticness      4767 non-null   float64
 15  instrumentalness  4767 non-null   float64
 16  liveness          4767 non-null   float64


In [40]:
charts_tracks_df.head(5)

Unnamed: 0,track_id,track_name,artist_id,artist_name,album_id,duration,release_date,popularity,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo
0,0kN8xEmgMW9mh7UmDYHlJP,Versace on the Floor,0du5cEVh5yTK9QJze8zA0C,Bruno Mars,4PgleR09JVnm3zY1fW3XBA,261240,17/11/2016,75,0.578,0.574,2.0,-6.209,1.0,0.0454,0.196,0.0,0.083,0.301,174.152
1,5uCax9HTNlzGybIStD3vDh,Say You Won't Let Go,4IWBUUAFIplrNtaOHcJPRM,James Arthur,7oiJYvEJHsmYtrgviAVIBD,211466,28/10/2016,87,0.358,0.557,10.0,-7.398,1.0,0.059,0.695,0.0,0.0902,0.494,85.043
2,7BKLCZ1jbUBVqRi2FVlTVw,Closer,69GGBxA162lTqCwzJG5jLp,The Chainsmokers,0rSLgV8p5FzfnqlEk4GzxE,244960,29/07/2016,85,0.748,0.524,8.0,-5.599,1.0,0.0338,0.414,0.0,0.111,0.661,95.01
3,2rizacJSyD9S1IQUxUxnsK,All We Know,69GGBxA162lTqCwzJG5jLp,The Chainsmokers,0xmaV6EtJ4M3ebZUPRnhyb,194080,29/09/2016,71,0.662,0.586,0.0,-8.821,1.0,0.0307,0.097,0.00272,0.115,0.296,90.0
4,5MFzQMkrl1FOOng9tq6R9r,Don't Wanna Know,04gDigrS5kc9YWfZHwBETP,Maroon 5,0fvTn3WXF39kQs9i3bnNpP,214480,11/10/2016,0,0.783,0.623,7.0,-6.126,1.0,0.08,0.338,0.0,0.0975,0.447,100.048
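Two columns in this preview usually need conversion before analysis: `release_date` is written day-first (for example `17/11/2016`), and `duration` is in milliseconds. The sketch below parses the dates with an explicit format so that an ambiguous value such as `11/10/2016` is read as 11 October, and adds a duration in minutes. It assumes every date follows the day/month/year pattern shown; `errors="coerce"` turns anything else into `NaT` instead of raising. The column name `duration_min` is an assumption introduced here.

```python
import pandas as pd

# Two rows copied from the preview above
sample_tracks = pd.DataFrame({
    "track_name": ["Versace on the Floor", "Don't Wanna Know"],
    "duration": [261240, 214480],
    "release_date": ["17/11/2016", "11/10/2016"],
})

# Explicit day-first format avoids misreading 11/10/2016 as November 10
sample_tracks["release_date"] = pd.to_datetime(
    sample_tracks["release_date"], format="%d/%m/%Y", errors="coerce"
)

# Milliseconds to minutes
sample_tracks["duration_min"] = sample_tracks["duration"] / 60_000
print(sample_tracks)
```

The same two assignments could be applied to `charts_tracks_df`, after which the number of `NaT` values in `release_date` shows how many rows did not match the assumed format.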
