---
title: "Spotify Favorites Analysis"
format: html
date: 2025-03-02
categories: [Python, Spotify]
image: spotify_logo.png
execute:
  warning: False
---

### Spotify Favorites Analysis

In [2]:
import pandas as pd

The following data frame contains Spotify playlist data from the [2018 Spotify Million Playlist Dataset Challenge](https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge).

In [3]:
spotify = pd.read_csv('https://bcdanl.github.io/data/spotify_all.csv')
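
Before filtering, it can help to confirm that the columns used later are present. The following is a minimal sketch, assuming the column names that appear in the output further down this post (`pid`, `playlist_name`, `pos`, `artist_name`, `track_name`, `duration_ms`, `album_name`). The `missing_columns` helper and the tiny stand-in frame are hypothetical and exist only so the snippet runs without a download:

```python
import pandas as pd

# Columns this post relies on, taken from the output shown later in the post
EXPECTED_COLUMNS = ['pid', 'playlist_name', 'pos', 'artist_name',
                    'track_name', 'duration_ms', 'album_name']

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Tiny stand-in frame with made-up values; swap in `spotify` after loading the real CSV
toy = pd.DataFrame({
    'pid': [0], 'playlist_name': ['example'], 'pos': [0],
    'artist_name': ['Some Artist'], 'track_name': ['Some Track'],
    'duration_ms': [200000], 'album_name': ['Some Album'],
})

print(missing_columns(toy))                              # empty list: nothing missing
print(missing_columns(toy.drop(columns='duration_ms')))  # ['duration_ms']
```

With the real data, `missing_columns(spotify)` should return an empty list if the file has the layout assumed here.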

Out of all these brilliant artists, I'd like to highlight and analyze some of my favorites:

- Saint Motel
- Tame Impala
- Castlecomer
- CRX
- Two Door Cinema Club

First, let's find which songs from each of these artists are in the larger data frame and how many times each song appears:

In [4]:
artist_favorites = ['Saint Motel', 'Tame Impala', 'Castlecomer', 'CRX', 'Two Door Cinema Club']
spotify_favorites = spotify[spotify['artist_name'].isin(artist_favorites)]

# Song names for each artist
song_names_fav = spotify_favorites.drop_duplicates(subset='track_name', keep='first')[['artist_name', 'track_name']].sort_values('artist_name')
song_names_fav

|  | artist_name | track_name |
|---|---|---|
| 101794 | CRX | Slow Down |
| 129597 | Castlecomer | Fire Alarm |
| 21124 | Saint Motel | Cold Cold Man |
| 132905 | Saint Motel | Ace In The Hole - Live from Spotify San Francisco |
| 1235 | Saint Motel | Born Again |
| ... | ... | ... |
| 23358 | Two Door Cinema Club | You're Not Stubborn |
| 12902 | Two Door Cinema Club | What You Know - Live |
| 16457 | Two Door Cinema Club | Undercover Martyn |
| 68025 | Two Door Cinema Club | Eat That Up, Its Good For You |
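
One caveat with `drop_duplicates(subset='track_name')`: it treats two rows as duplicates whenever their titles match, even if different artists recorded them. Whether that matters for these five artists is not visible in the output above, but deduplicating on artist and title together is a safer habit. Here is a minimal sketch on made-up rows (the artist and track names are placeholders, not taken from the dataset):

```python
import pandas as pd

# Made-up rows: two different artists share the title 'Home'
toy = pd.DataFrame({
    'artist_name': ['Artist A', 'Artist B', 'Artist A', 'Artist B'],
    'track_name':  ['Home',     'Home',     'Home',     'Away'],
})

# Title-only deduplication drops Artist B's 'Home'
by_title = toy.drop_duplicates(subset='track_name', keep='first')

# Deduplicating on the (artist, title) pair keeps one row per distinct song
by_pair = toy.drop_duplicates(subset=['artist_name', 'track_name'], keep='first')

print(len(by_title))  # 2
print(len(by_pair))   # 3
```

Applied to this post, that would mean passing `subset=['artist_name', 'track_name']` when building `song_names_fav`.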


In [5]:
# Number of songs each artist has in the larger spotify DataFrame
songs_count_fav = spotify_favorites.drop_duplicates(subset='track_name', keep='first').value_counts('artist_name').sort_values()
songs_count_fav

| artist_name | count |
|---|---|
| CRX | 1 |
| Castlecomer | 1 |
| Saint Motel | 11 |
| Two Door Cinema Club | 24 |
| Tame Impala | 40 |
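
A closely related tally can be computed without dropping duplicates first, by counting distinct titles within each artist. It matches the result above whenever no two artists share a track title. Here is a small sketch on made-up rows (placeholder names, not from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    'artist_name': ['Artist A', 'Artist A', 'Artist A', 'Artist B'],
    'track_name':  ['Song 1',   'Song 1',   'Song 2',   'Song 3'],
})

# Number of distinct track titles per artist, smallest first
distinct_tracks = (
    toy.groupby('artist_name')['track_name']
       .nunique()
       .sort_values()
)
print(distinct_tracks)
```

On the real data, the equivalent call would be `spotify_favorites.groupby('artist_name')['track_name'].nunique()`.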


In [6]:
# Times each individual track was listed
songs_listed_fav = spotify_favorites.value_counts(['artist_name', 'track_name']).sort_index()
songs_listed_fav

| artist_name | track_name | count |
|---|---|---|
| CRX | Slow Down | 1 |
| Castlecomer | Fire Alarm | 1 |
| Saint Motel | Ace In The Hole - Live from Spotify San Francisco | 1 |
| Saint Motel | Born Again | 2 |
| Saint Motel | Cold Cold Man | 14 |
| ... | ... | ... |
| Two Door Cinema Club | This Is The Life | 1 |
| Two Door Cinema Club | Undercover Martyn | 10 |
| Two Door Cinema Club | What You Know | 39 |
| Two Door Cinema Club | What You Know - Live | 1 |


Next, let's see which songs were listed most and least often:

In [7]:
songs_listed_fav.nlargest(5, keep='all')

| artist_name | track_name | count |
|---|---|---|
| Two Door Cinema Club | What You Know | 39 |
| Tame Impala | The Less I Know The Better | 30 |
| Tame Impala | Feels Like We Only Go Backwards | 20 |
| Saint Motel | My Type | 19 |
| Two Door Cinema Club | Something Good Can Work | 19 |


In [8]:
songs_listed_fav.nsmallest(5, keep='all')

| artist_name | track_name | count |
|---|---|---|
| CRX | Slow Down | 1 |
| Castlecomer | Fire Alarm | 1 |
| Saint Motel | Ace In The Hole - Live from Spotify San Francisco | 1 |
| Saint Motel | Daydream / Wetdream / Nightmare | 1 |
| Saint Motel | Local Long Distance Relationship (LA2NY) | 1 |
| Saint Motel | Something About Us - Recorded at Spotify Studios NYC | 1 |
| Saint Motel | Sweet Talk | 1 |
| Saint Motel | You Can Be You | 1 |
| Tame Impala | 'Cause I'm A Man - HAIM Remix | 1 |
| Tame Impala | Desire Be Desire Go | 1 |


Many tracks are listed only once, which is a shame, but these artists also have plenty of more popular tracks.

Judging by the most-listed tracks, 'Two Door Cinema Club' and 'Tame Impala' appear to be the two most popular artists of the bunch: between them they account for four of the five most-listed tracks.
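
That impression rests on the most-listed tracks. Counting total listings per artist would check it more directly. Here is a sketch on made-up rows standing in for `spotify_favorites` (one row per playlist entry, placeholder names):

```python
import pandas as pd

toy = pd.DataFrame({
    'artist_name': ['Artist A', 'Artist A', 'Artist B', 'Artist A', 'Artist C'],
    'track_name':  ['Song 1',   'Song 2',   'Song 3',   'Song 1',   'Song 4'],
})

# Each row is one playlist entry, so counting rows per artist gives total listings,
# sorted from most to least listed
listings_per_artist = toy['artist_name'].value_counts()
print(listings_per_artist)
```

On the real data, the equivalent call would be `spotify_favorites['artist_name'].value_counts()`.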

Going back to the 'spotify_favorites' DataFrame, we can also look at song lengths, sorted from longest to shortest:

In [9]:
# Copy first so the new column is not written to a slice of `spotify`; then convert ms to minutes
spotify_favorites = spotify_favorites.copy()
spotify_favorites['duration_min'] = spotify_favorites['duration_ms'] / 1000 / 60
spotify_favorites.sort_values('duration_min', ascending=False)

|  | pid | playlist_name | pos | artist_name | track_name | duration_ms | album_name | duration_min |
|---|---|---|---|---|---|---|---|---|
| 64923 | 969 | Marshall | 33 | Tame Impala | Let It Happen - Soulwax Remix | 556924 | Let It Happen | 9.282067 |
| 142147 | 999121 | .::March::. | 36 | Tame Impala | Let It Happen - Soulwax Remix | 556924 | Let It Happen | 9.282067 |
| 165421 | 999491 | Play this at my funeral | 25 | Tame Impala | Let It Happen | 467585 | Currents | 7.793083 |
| 19178 | 303 | Tame Impala | 25 | Tame Impala | Let It Happen | 467585 | Currents | 7.793083 |
| 85204 | 1276 | FIREFLY 2016 | 27 | Tame Impala | Let It Happen | 467585 | Currents | 7.793083 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19184 | 303 | Tame Impala | 31 | Tame Impala | Disciples | 108546 | Currents | 1.809100 |
| 171295 | 999586 | Fall 2017 | 1 | Tame Impala | Disciples | 108546 | Currents | 1.809100 |
| 101795 | 1521 | yoga | 17 | Tame Impala | Disciples | 108546 | Currents | 1.809100 |
| 57271 | 849 | chill | 63 | Tame Impala | Nangs | 107533 | Currents | 1.792217 |


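Interestingly, the longest and shortest entries shown all come from 'Tame Impala', most of them from the same album ('Currents'), with durations ranging from 9.28 minutes down to 1.79 minutes. Tracks repeat in the table because each row is one playlist entry rather than one distinct song.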
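
To rank distinct tracks rather than playlist entries, one option is to deduplicate before sorting. Here is a sketch on made-up rows (placeholder names and durations), using `assign` so the new column is added to a fresh frame rather than to a filtered slice:

```python
import pandas as pd

toy = pd.DataFrame({
    'artist_name': ['Artist A', 'Artist A', 'Artist A', 'Artist B'],
    'track_name':  ['Long Song', 'Long Song', 'Short Song', 'Mid Song'],
    'duration_ms': [480000, 480000, 120000, 240000],
})

distinct_durations = (
    toy.drop_duplicates(subset=['artist_name', 'track_name'])
       .assign(duration_min=lambda df: df['duration_ms'] / 60_000)  # ms -> minutes
       .sort_values('duration_min', ascending=False)
)

print(distinct_durations.head(1))  # longest distinct track
print(distinct_durations.tail(1))  # shortest distinct track
```

Run against `spotify_favorites`, the same chain would list each track once, which makes the longest and shortest songs easier to compare across artists.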