## My Spotify Data Analysis Using Python, Excel and Tableau

Spotify is a digital music, podcast, and video streaming service that gives you access to millions of songs and other content from creators all over the world. It is one of the most popular streaming platforms.

Every year, Spotify users receive their "Spotify Wrapped," which is a summary of their listening habits for the year. However, you don't have to wait until December to get a summary of your listening habits. There are apps like stats.fm that can help you analyze your data, or you can do it yourself.

I have a stats.fm account, but I decided to analyze my data myself anyway. I used the data that I downloaded from Spotify last year when I subscribed to a stats.fm premium account. The data is not up to date, but I will make do with it.

The data set contains my streaming data from 20th of February, 2020 to 7th of July, 2022. it contains 58,607 rows and 5 columns

This project will include four stages:

Importing libraries and dataset

Data cleaning

Analysis

Visualization

Explanation of each stage:

Stage 1: Importing libraries and dataset

In this stage, I will import the necessary libraries and the dataset into my programming environment. I will use Python for this project.

Stage 2: Data cleaning

In this stage, I will clean the data to remove any errors or inconsistencies. I will also check for duplicate and null entries and remove them.

Stage 3: Analysis

In this stage, I will analyze the data to identify patterns and trends. I will also use statistical methods to calculate summary statistics, such as the total number of streams per artist or the total time spent listening to each album.

Stage 4: Visualization

In this stage, I will create visualizations to communicate the results of my analysis. I will use charts and graphs to show the top artists, albums, and songs, as well as other interesting trends.

I am excited to start this project and learn more about my listening habits. I am also curious to see what new insights I can gain from my data.




### Stage 1: Importing of libraries and Dataset

Here i imported all the libraries i would use for this project. A library is a collection of pre-written code used to optimize tasks. Every library is used to solve specific problems. For example the pandas and numpy libraries are used fo data analysis while the matplotlib is used for visualization.

After importing the libraries, i will import the data which is in a json file.

In [1]:
# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import os


#import plotly.graph_objects as go



Next i will import the data set which are json files 

In [2]:
# importing the json data

Quin0 = open('endsong_0.json', encoding='utf8')
data = json.load(Quin0)

Quin1 = open('endsong_1.json', encoding='utf8')
data1 = json.load(Quin1)

Quin2 = open('endsong_2.json', encoding='utf8')
data2 = json.load(Quin2)

Quin3 = open('endsong_3.json', encoding='utf8')
data3 = json.load(Quin3)


In [3]:
print(data[:7])
print(data1[:7])
print(data2[:7])
print(data3[:7])

[{'ts': '2020-04-27T22:48:54Z', 'username': 'lf1e2m0ko4zhz7r9za98dwf4s', 'platform': 'Android-tablet OS 9 API 28 (INFINIX MOBILITY LIMITED,   xyz  )', 'ms_played': 271215, 'conn_country': 'NG', 'ip_addr_decrypted': '41.190.2.188', 'user_agent_decrypted': 'unknown', 'master_metadata_track_name': 'Popo (How deep is our love?)', 'master_metadata_album_artist_name': 'Yerin Baek', 'master_metadata_album_album_name': 'Every letter I sent you.', 'spotify_track_uri': 'spotify:track:5XAJzwa0B2Hf8Rb1q0rowN', 'episode_name': None, 'episode_show_name': None, 'spotify_episode_uri': None, 'reason_start': 'trackdone', 'reason_end': 'trackdone', 'shuffle': False, 'skipped': None, 'offline': False, 'offline_timestamp': 1588027461324, 'incognito_mode': False}, {'ts': '2020-11-30T14:42:27Z', 'username': 'lf1e2m0ko4zhz7r9za98dwf4s', 'platform': 'Android-tablet OS 9 API 28 (INFINIX MOBILITY LIMITED,   xyz  )', 'ms_played': 186168, 'conn_country': 'NG', 'ip_addr_decrypted': '41.190.30.186', 'user_agent_decr

### Stage 2: Data Cleaning

The steps i will be taking in this stage includes:

Step 1: Convert the data set from a json file to data frames

Step 2: Join all 4 dataframes together

Step 3: Change the end time column data type to a date-time data type

Step 4: Convert the ms_played column from milliseconds to minutes then rename it

Step 5: Check for null values

Step 6: Check for duplcate values



In [3]:
# Step 1 : Converting the json files into dataframes. 
#          A json file is in a textual format which makes it hard to read.

streaming_data_1 = pd.DataFrame()

def extract_json_value(column_name):
    
    return [i[column_name] for i in data]
streaming_data_1['track_url'] = extract_json_value('spotify_track_uri')
streaming_data_1['artist_name'] = extract_json_value('master_metadata_album_artist_name')
streaming_data_1['end_time'] = extract_json_value('ts')
streaming_data_1['ms_played'] = extract_json_value('ms_played')
streaming_data_1['track_name'] = extract_json_value('master_metadata_track_name')
streaming_data_1['album_name'] = extract_json_value('master_metadata_album_album_name')

In [13]:
streaming_data_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15350 entries, 0 to 15349
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   track_url    15342 non-null  object
 1   artist_name  15342 non-null  object
 2   end_time     15350 non-null  object
 3   ms_played    15350 non-null  int64 
 4   track_name   15342 non-null  object
 5   album_name   15342 non-null  object
dtypes: int64(1), object(5)
memory usage: 719.7+ KB


In [14]:
# Step 1: Converting the json files into dataframes

streaming_data_2 = pd.DataFrame()

def extract_json_value(column_name):
    
    return [i[column_name] for i in data1]

streaming_data_2['track_url'] = extract_json_value('spotify_track_uri')
streaming_data_2['artist_name'] = extract_json_value('master_metadata_album_artist_name')
streaming_data_2['end_time'] = extract_json_value('ts')
streaming_data_2['ms_played'] = extract_json_value('ms_played')
streaming_data_2['track_name'] = extract_json_value('master_metadata_track_name')
streaming_data_2['album_name'] = extract_json_value('master_metadata_album_album_name')

In [7]:
streaming_data_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15323 entries, 0 to 15322
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   artist_name  15314 non-null  object
 1   end_time     15323 non-null  object
 2   ms_played    15323 non-null  int64 
 3   track_name   15314 non-null  object
 4   album_name   15314 non-null  object
dtypes: int64(1), object(4)
memory usage: 598.7+ KB


In [15]:
# Step 1: Converting the json files into a dataframe

streaming_data_3 = pd.DataFrame()

def extract_json_value(column_name):
    
    return [i[column_name] for i in data2]

streaming_data_['track_url'] = extract_json_value('spotify_track_uri')
streaming_data_3['artist_name'] = extract_json_value('master_metadata_album_artist_name')
streaming_data_3['end_time'] = extract_json_value('ts')
streaming_data_3['ms_played'] = extract_json_value('ms_played')
streaming_data_3['track_name'] = extract_json_value('master_metadata_track_name')
streaming_data_3['album_name'] = extract_json_value('master_metadata_album_album_name')


In [9]:
streaming_data_3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15307 entries, 0 to 15306
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   artist_name  15295 non-null  object
 1   end_time     15307 non-null  object
 2   ms_played    15307 non-null  int64 
 3   track_name   15295 non-null  object
 4   album_name   15295 non-null  object
dtypes: int64(1), object(4)
memory usage: 598.1+ KB


In [16]:
# Step 1: Converting the json files into a dataframe

streaming_data_4 = pd.DataFrame()

def extract_json_value(column_name):
    
    return [i[column_name] for i in data3]

streaming_data_4['track_url'] = extract_json_value('spotify_track_uri')
streaming_data_4['artist_name'] = extract_json_value('master_metadata_album_artist_name')
streaming_data_4['end_time'] = extract_json_value('ts')
streaming_data_4['ms_played'] = extract_json_value('ms_played')
streaming_data_4['track_name'] = extract_json_value('master_metadata_track_name')
streaming_data_4['album_name'] = extract_json_value('master_metadata_album_album_name')

In [11]:
streaming_data_4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12627 entries, 0 to 12626
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   artist_name  12622 non-null  object
 1   end_time     12627 non-null  object
 2   ms_played    12627 non-null  int64 
 3   track_name   12622 non-null  object
 4   album_name   12622 non-null  object
dtypes: int64(1), object(4)
memory usage: 493.4+ KB


Next i concat (which means to join) all 4 data frames together so i can view and analyze them all at once

In [17]:
# Step 2: concating the data frames

streaming_data = (pd.concat([streaming_data_1,streaming_data_2,streaming_data_3,streaming_data_4]))

In [18]:
streaming_data


Unnamed: 0,track_url,artist_name,end_time,ms_played,track_name,album_name
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,2020-04-27T22:48:54Z,271215,Popo (How deep is our love?),Every letter I sent you.
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,2020-11-30T14:42:27Z,186168,Go Away,APOLLO
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,2020-12-14T10:47:30Z,215920,long story short,evermore
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,2022-01-06T12:32:26Z,242093,Haunted,Speak Now
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,2021-10-08T21:57:04Z,203851,Million Reasons,Joanne
...,...,...,...,...,...,...
12622,spotify:track:0ZNU020wNYvgW84iljPkPP,Taylor Swift,2020-12-01T07:29:11Z,313963,mirrorball,folklore
12623,spotify:track:14hJ5tc1VCFMWhVn9axRTC,Shawn Mendes,2021-06-25T16:36:45Z,1428,Life Of The Party,Handwritten
12624,spotify:track:6OKKfwf75kQ4veyLktYV7F,Selena Gomez,2020-04-23T08:31:18Z,184080,Only You,13 Reasons Why
12625,spotify:track:5YqltLsjdqFtvqE7Nrysvs,Taylor Swift,2021-12-27T09:00:36Z,193146,We Are Never Ever Getting Back Together (Taylo...,Red (Taylor's Version)


In [19]:
# Step 3: changing end_time from object to datetime

streaming_data['end_time'] = pd.to_datetime(streaming_data['end_time'] )

In [20]:
# sorted the data using the end time column because i want
# to view the data from the first day to the last

streaming_data.sort_values('end_time', ascending = True)

Unnamed: 0,track_url,artist_name,end_time,ms_played,track_name,album_name
7029,spotify:track:7FKbTRXXIWVFQmPH8zGfU0,Taylor Swift,2020-02-20 14:37:28+00:00,219385,The Man - Live From Paris,The Man
15305,spotify:track:1kHEuJRasudLhjvnbfc4yS,Taylor Swift,2020-02-20 15:11:00+00:00,43178,Blank Space,1989
36,spotify:track:2slqvGLwzZZYsT4K4Y1GBC,Taylor Swift,2020-02-20 15:12:24+00:00,35175,Only The Young - Featured in Miss Americana,Only The Young
1770,spotify:track:1dGr1c8CrMLDpV6mPbImSI,Taylor Swift,2020-02-20 15:16:04+00:00,221306,Lover,Lover
4621,spotify:track:630sXRhIcfwr2e4RdNtjKN,Zac Efron,2020-02-20 15:16:54+00:00,46484,Rewrite The Stars,Rewrite The Stars
...,...,...,...,...,...,...
4343,spotify:track:3TcL0dyCMyr0kyTTc4NLgI,Selena Gomez & The Scene,2022-07-07 23:51:53+00:00,195613,Who Says,When The Sun Goes Down
6624,spotify:track:4upQnh3K5k1xbVOr97fdG7,Kizz Daniel,2022-07-07 23:55:23+00:00,91792,Oshe (feat. The Cavemen.),Barnabas
13575,spotify:track:3jjujdWJ72nww5eGnfs2E7,Harry Styles,2022-07-07 23:55:24+00:00,1230,Adore You,Fine Line
8626,spotify:track:5x5JM1BSB6vollcIzDocqT,Miley Cyrus,2022-07-07 23:55:25+00:00,1184,The Climb,The Time Of Our Lives


In [21]:
# step 4: converting from milliseconds to minutes and renaming the ms_played column

streaming_data['mins_played'] = streaming_data.ms_played.divide(60000)
streaming_data.drop('ms_played', axis=1, inplace=True)
streaming_data


Unnamed: 0,track_url,artist_name,end_time,track_name,album_name,mins_played
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,2020-04-27 22:48:54+00:00,Popo (How deep is our love?),Every letter I sent you.,4.520250
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,2020-11-30 14:42:27+00:00,Go Away,APOLLO,3.102800
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,2020-12-14 10:47:30+00:00,long story short,evermore,3.598667
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,2022-01-06 12:32:26+00:00,Haunted,Speak Now,4.034883
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,2021-10-08 21:57:04+00:00,Million Reasons,Joanne,3.397517
...,...,...,...,...,...,...
12622,spotify:track:0ZNU020wNYvgW84iljPkPP,Taylor Swift,2020-12-01 07:29:11+00:00,mirrorball,folklore,5.232717
12623,spotify:track:14hJ5tc1VCFMWhVn9axRTC,Shawn Mendes,2021-06-25 16:36:45+00:00,Life Of The Party,Handwritten,0.023800
12624,spotify:track:6OKKfwf75kQ4veyLktYV7F,Selena Gomez,2020-04-23 08:31:18+00:00,Only You,13 Reasons Why,3.068000
12625,spotify:track:5YqltLsjdqFtvqE7Nrysvs,Taylor Swift,2021-12-27 09:00:36+00:00,We Are Never Ever Getting Back Together (Taylo...,Red (Taylor's Version),3.219100


     Note: Spotify counts time in milliseconds so i conveted the time of each stream to minutes.

In [22]:
#s Step 5: Checking for null values
pd.isnull(streaming_data).sum()

track_url      34
artist_name    34
end_time        0
track_name     34
album_name     34
mins_played     0
dtype: int64

There are 34 null values in the artist_name, track_name and album_name colums. These columns will not be usefull so i will delete them.

In [23]:
# drop null columns

streaming_data.dropna()

Unnamed: 0,track_url,artist_name,end_time,track_name,album_name,mins_played
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,2020-04-27 22:48:54+00:00,Popo (How deep is our love?),Every letter I sent you.,4.520250
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,2020-11-30 14:42:27+00:00,Go Away,APOLLO,3.102800
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,2020-12-14 10:47:30+00:00,long story short,evermore,3.598667
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,2022-01-06 12:32:26+00:00,Haunted,Speak Now,4.034883
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,2021-10-08 21:57:04+00:00,Million Reasons,Joanne,3.397517
...,...,...,...,...,...,...
12622,spotify:track:0ZNU020wNYvgW84iljPkPP,Taylor Swift,2020-12-01 07:29:11+00:00,mirrorball,folklore,5.232717
12623,spotify:track:14hJ5tc1VCFMWhVn9axRTC,Shawn Mendes,2021-06-25 16:36:45+00:00,Life Of The Party,Handwritten,0.023800
12624,spotify:track:6OKKfwf75kQ4veyLktYV7F,Selena Gomez,2020-04-23 08:31:18+00:00,Only You,13 Reasons Why,3.068000
12625,spotify:track:5YqltLsjdqFtvqE7Nrysvs,Taylor Swift,2021-12-27 09:00:36+00:00,We Are Never Ever Getting Back Together (Taylo...,Red (Taylor's Version),3.219100


In [24]:
# Step 5: check for duplicates rows
duplicate = streaming_data[streaming_data.duplicated()]



In [25]:
# deleting duplicate rows
streaming_data.drop_duplicates()



Unnamed: 0,track_url,artist_name,end_time,track_name,album_name,mins_played
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,2020-04-27 22:48:54+00:00,Popo (How deep is our love?),Every letter I sent you.,4.520250
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,2020-11-30 14:42:27+00:00,Go Away,APOLLO,3.102800
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,2020-12-14 10:47:30+00:00,long story short,evermore,3.598667
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,2022-01-06 12:32:26+00:00,Haunted,Speak Now,4.034883
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,2021-10-08 21:57:04+00:00,Million Reasons,Joanne,3.397517
...,...,...,...,...,...,...
12622,spotify:track:0ZNU020wNYvgW84iljPkPP,Taylor Swift,2020-12-01 07:29:11+00:00,mirrorball,folklore,5.232717
12623,spotify:track:14hJ5tc1VCFMWhVn9axRTC,Shawn Mendes,2021-06-25 16:36:45+00:00,Life Of The Party,Handwritten,0.023800
12624,spotify:track:6OKKfwf75kQ4veyLktYV7F,Selena Gomez,2020-04-23 08:31:18+00:00,Only You,13 Reasons Why,3.068000
12625,spotify:track:5YqltLsjdqFtvqE7Nrysvs,Taylor Swift,2021-12-27 09:00:36+00:00,We Are Never Ever Getting Back Together (Taylo...,Red (Taylor's Version),3.219100


 It is possible that columns like artist_name, track_name and album_name are duplicated because i could listen to an artist, song or album multiple times. hence why i looked for duplicate rows across all of the columns of the DataFrame and dropped them

After data cleaning, there are 58,257 rows left. i did not import the 'shuffle' and 'skipped' column because there were not enough rowS available for analysis.

## Stage 3: Analysis

Here i analyze the data frame. The questions i aim to answer include:
1) Who are my top 15 artist
2) What are my top 10 most streamed songs
3) What are my to 10 most played albums
4) What date did i listen to music the most
5) WHat day of the week did i listen to music the most
6) What year did i listen to music the most?


In [26]:
# Question 1: who are my top 15 artists based on total streams

streaming_data.groupby(['artist_name'])['track_name'].count().sort_values(ascending= False)[:15]

artist_name
Taylor Swift      17650
One Direction      1558
Olivia Rodrigo     1132
Harry Styles       1099
BTS                1098
Camila Cabello     1007
Ed Sheeran         1000
Little Mix          814
ZAYN                780
Ariana Grande       684
Halsey              633
Billie Eilish       623
Lauv                580
Katy Perry          551
BLACKPINK           544
Name: track_name, dtype: int64

It appears that Taylor Swift is my most streamed artist with a total of 17,650 streams followed by One direction with 1,558 streams. They both are my favourite artists so this comes as no surprise.

In [28]:
# Question 1b : Who are my top 15 artists based of minutes streamed

mins_listened = streaming_data.groupby(by='artist_name')['mins_played'].sum().sort_values(ascending=False)[:10]

mins_listened

artist_name
Taylor Swift      56453.886700
One Direction      3739.169833
Olivia Rodrigo     3240.399017
Harry Styles       3027.331483
Camila Cabello     2583.564883
BTS                2431.850233
Little Mix         2000.582467
Ed Sheeran         1797.139783
ZAYN               1512.517217
BLACKPINK          1422.257800
Name: mins_played, dtype: float64

Once again, Taylor Swift is my most streamed artist. i spent 56, 453.88 minutes listening to her. that is roughly 941 hours.

In [29]:
#Question 2: What are my top 10 most streamed songs

top_tracks = streaming_data.groupby(['artist_name', 'track_name']).size().reset_index(name='count')
top_tracks = top_tracks.sort_values(by='count', ascending=False).head(10)

top_tracks

Unnamed: 0,artist_name,track_name,count
5779,Taylor Swift,willow,323
4418,Olivia Rodrigo,drivers license,274
5735,Taylor Swift,champagne problems,223
5511,Taylor Swift,Blank Space,196
5682,Taylor Swift,The Man,191
5732,Taylor Swift,cardigan,189
5727,Taylor Swift,august,183
5533,Taylor Swift,Cruel Summer,183
1110,Camila Cabello,Bam Bam (feat. Ed Sheeran),178
5608,Taylor Swift,Lover,178


In [30]:
#Question 3: What are my top 10 most played albums


top_albums = streaming_data.groupby(['artist_name', 'album_name']).size().reset_index(name='count')
top_albums = top_albums.sort_values(by='count', ascending=False).head(10)

top_albums

Unnamed: 0,artist_name,album_name,count
3294,Taylor Swift,Lover,2617
3320,Taylor Swift,evermore,2178
3321,Taylor Swift,folklore,1919
3274,Taylor Swift,1989,1686
3301,Taylor Swift,Red (Taylor's Version),1347
3303,Taylor Swift,Speak Now,1208
3300,Taylor Swift,Red,1105
3324,Taylor Swift,reputation,1018
3287,Taylor Swift,Fearless (Taylor's Version),877
2593,Olivia Rodrigo,SOUR,874


In [33]:
# Question 4: What date did i listen to music the most

streaming_data['day'] = [d.date() for d in streaming_data['end_time']]
streaming_data['time'] = [d.time() for d in streaming_data['end_time']]
streaming_data.drop('end_time', axis=1, inplace=True)
streaming_data.head()

KeyError: 'end_time'

In [38]:
day = streaming_data.groupby(by=['day'], as_index=False).sum().sort_values('mins_played', ascending =False)
day

  day = streaming_data.groupby(by=['day'], as_index=False).sum().sort_values('mins_played', ascending =False)


Unnamed: 0,day,mins_played,year
314,2021-04-09,1177.370150,701287
551,2022-01-09,843.932417,679392
621,2022-03-22,754.310800,485280
350,2021-05-21,706.880283,549712
387,2021-07-05,685.149817,491103
...,...,...,...
260,2021-02-05,0.077950,2021
628,2022-03-29,0.063633,2022
652,2022-04-23,0.016683,2022
682,2022-05-31,0.000000,2022


This shows that i streamed music the most on April 9th, 2021. This was the day Taylor Swift released her first rerecorded album  Fearless TV so that makes perfect sense. Though i am surprised it was not November 12th, 2021 when she released her second rerecorded album Red TV.

In [39]:
# Question 5: What day of the week do i listen to music the most

streaming_data['day_of_the_week'] = pd.DatetimeIndex(streaming_data['day']).day_name()

streaming_data.head()


Unnamed: 0,track_url,artist_name,track_name,album_name,mins_played,day,time,year,day_of_the_week
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,Popo (How deep is our love?),Every letter I sent you.,4.52025,2020-04-27,22:48:54,2020,Monday
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,Go Away,APOLLO,3.1028,2020-11-30,14:42:27,2020,Monday
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,long story short,evermore,3.598667,2020-12-14,10:47:30,2020,Monday
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,Haunted,Speak Now,4.034883,2022-01-06,12:32:26,2022,Thursday
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,Million Reasons,Joanne,3.397517,2021-10-08,21:57:04,2021,Friday


In [40]:
day = streaming_data.groupby(by=['day_of_the_week'], as_index=False).sum().sort_values('mins_played', ascending =False)
day

  day = streaming_data.groupby(by=['day_of_the_week'], as_index=False).sum().sort_values('mins_played', ascending =False)


Unnamed: 0,day_of_the_week,mins_played,year
0,Friday,24168.24325,19626906
4,Thursday,20967.165633,18056308
5,Tuesday,19991.888367,19035693
1,Monday,19506.20625,16212282
6,Wednesday,17331.83405,14760834
2,Saturday,17274.462117,15914705
3,Sunday,16720.6878,14832491


i listen to music more on fridays than i do on other week days. Friday is the day new music get released every week. So i can attribute this habit to te fact that i check out new music on fridays. Sunday appears to be the day i stream the least. i have no reason why that is.

In [35]:
# Question 6: What year did i listen to music the most?
streaming_data['year'] = pd.DatetimeIndex(streaming_data['day']).year

streaming_data.head()


Unnamed: 0,track_url,artist_name,track_name,album_name,mins_played,day,time,year
0,spotify:track:5XAJzwa0B2Hf8Rb1q0rowN,Yerin Baek,Popo (How deep is our love?),Every letter I sent you.,4.52025,2020-04-27,22:48:54,2020
1,spotify:track:7HJkl4z2MXokJoGtWPL2pz,Fireboy DML,Go Away,APOLLO,3.1028,2020-11-30,14:42:27,2020
2,spotify:track:2o2sgVJIgFXk8GQjWTgI6U,Taylor Swift,long story short,evermore,3.598667,2020-12-14,10:47:30,2020
3,spotify:track:2YjKcFQ1vYFhQPgsmCOjss,Taylor Swift,Haunted,Speak Now,4.034883,2022-01-06,12:32:26,2022
4,spotify:track:7dZ1Odmx9jWIweQSatnRqo,Lady Gaga,Million Reasons,Joanne,3.397517,2021-10-08,21:57:04,2021


In [37]:
years = streaming_data.groupby(by=['year'], as_index=False).sum().sort_values('mins_played', ascending =False)
years

  years = streaming_data.groupby(by=['year'], as_index=False).sum().sort_values('mins_played', ascending =False)


Unnamed: 0,year,mins_played
1,2021,58341.8512
0,2020,46531.2843
2,2022,31087.351967


I streamed music more in 2021 than i did in the other years. Though it is worth noting that the other years do not have complete data so this is not an accurate representation. 2020 is missing January data while 2022 is missing data from July 8th till year end.

## Data Visualization

The visualization for this project was done in Tableau. I visualized my top 5 artists, albums and songs, total number of songs I streamed how many minutes I spent streaming, my total streams and many more.
Before visualizing in Tableau, I added 3 extra columns to the table using Power Query in Excel. The columns were namely: Artist Image, Track Image and Album Image. These columns contained URL links to each artist, album and track image.
as I was using a function called Image Role in Tableau which requires me to have URLs that navigate to images not more than 200kb and some other requirements.

The dashboard is available [here](https://public.tableau.com/app/profile/quincy.oluwaji/viz/MySpotifyAnalysis4/Story1?publish=yes)

## Insights From My Visualisation

In [None]:
1) My data revealed that i stream music more  between 8am to 3pm. 9am -10am was the hour i streamed music the most.
2) 10th January 2022 - 16th January 2022 was the week i had streamed music the most.
3) I spent 135,960 minutes streaming which is 2266 hours of listening to music.
4) Based on total streams, BTS is my 5th most streamed artist but based on minutes listened, Camilla Cabello is my 5th most streamed artist.