# A Musical Interlude: Understanding my music tastes

- In this project, I will use a combination of my streaming history from Spotify and the Spotify API (using spotipy) to see what interesting things we can learn from it.

## Requirements
- Python (tested with 3.9 and 3.11)
- Spotipy (``pip install spotipy``) >= 2.21
- NumPy
- Pandas

## Initialisation

In [1]:
import os
import re
import json
import numpy as np
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials


## Importing my streaming history
The core dataset for this project, my streaming history, can be downloaded from the Spotify Account/Privacy page ([here](https://www.spotify.com/uk/account/privacy/)). It takes a while to arrive, but when it does you'll have a folder containing a whole host of JSON files. Of interest for this project are those named ``StreamingHistory[x].json``; each record in them contains:
- ``artistName`` - The name of the artist
- ``trackName`` - The name of the song
- ``endTime`` - Timestamp at which the song finished playing, ``YYYY-MM-DD HH:MM``
- ``msPlayed`` - How much of the song was played, in milliseconds

There are multiple files, named from 0 through [n], split up to ease reading.
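
As a rough illustration of that record layout, the sketch below parses two made-up entries (the artist names, track names and values are invented for illustration, not taken from my export), converts `endTime` to a proper timestamp and `msPlayed` to minutes. The `minutesPlayed` column name is my own choice, not something Spotify provides.

```python
import json
import pandas as pd

# Two invented records in the StreamingHistory layout (illustrative values only)
sample_json = """[
  {"endTime": "2022-01-01 10:15", "artistName": "Artist A", "trackName": "Song 1", "msPlayed": 180000},
  {"endTime": "2022-01-01 10:16", "artistName": "Artist B", "trackName": "Song 2", "msPlayed": 30000}
]"""

sample = pd.DataFrame(json.loads(sample_json))

# endTime arrives as a string; parse it using the YYYY-MM-DD HH:MM layout
sample["endTime"] = pd.to_datetime(sample["endTime"], format="%Y-%m-%d %H:%M")

# msPlayed is in milliseconds; minutes are easier to read
sample["minutesPlayed"] = sample["msPlayed"] / 60_000

print(sample)
```

Parsing `endTime` up front should make any later grouping by hour, weekday or month more convenient.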

In and of itself this data is interesting, but it can be combined with Spotify's catalog to allow for more advanced analysis. The catalog contains a wide variety of data points about each song, including its BPM, danceability, valence and energy. We'll use these traits later to assess my listening habits.

Now, let's import all of these StreamingHistory files into a dataframe:

In [28]:
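dataPath = "./data/2022/"
# Keep only the StreamingHistory JSON files; this skips the other JSON files in the
# export as well as the .ipynb_checkpoints entries that sometimes appear
fileList = sorted(
    path for path in os.listdir(dataPath)
    if path.startswith("StreamingHistory") and path.endswith(".json")
)
frames = []
for path in fileList:
    content = pd.read_json(os.path.join(dataPath, path))
    frames.append(content)
# Stack the per-file frames into one dataframe and keep a copy on disk
history = pd.concat(frames)
history.to_csv("./data/output/history.csv")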

This gives us a dataframe with ~17000 entries, i.e. a record of 17000 streams by this user, which should be quite enough for this project.

## Adding Details about each song
Let's use *spotipy* to get some extra info about each song. API requests are slow and expensive, so we want to keep them to a minimum. Quite a lot of the songs in the ``history`` dataframe are duplicates, so let's filter those out first.

In [20]:
unique = history.copy(deep=True)
unique.drop_duplicates(
    subset=['artistName', 'trackName'], keep="first", inplace=True)
unique

Unnamed: 0,endTime,artistName,trackName,msPlayed
0,2021-11-28 18:50,Professor Green,Read All About It,77971
1,2021-12-06 23:59,WALK THE MOON,Shut Up and Dance,119774
2,2021-12-07 04:49,Stephanie Hsu,A Guy That I'd Kinda Be Into,12068
4,2021-12-07 04:52,Maisie Peters,Boy,178013
5,2021-12-07 04:56,Maisie Peters,Feels Like This,223346
...,...,...,...,...
7079,2022-12-07 17:10,Johnny Hodges,The Last Time I Saw Paris,174960
7080,2022-12-07 17:11,Ella Fitzgerald,Let's Fall In Love,27365
7081,2022-12-07 17:18,Paul Desmond,The Way You Look Tonight - 2003 Remastered,439507
7082,2022-12-07 17:22,Hank Jones,My Wish,212587


We now have 4279 rows, which seems like about the right number of unique tracks for this period.
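
One side effect worth noting: dropping duplicates also discards how often each track was played. The sketch below, on a tiny invented dataframe, shows one way to keep that information by counting plays per track alongside the de-duplication. The `plays` and `totalMs` column names are my own and don't appear elsewhere in this project.

```python
import pandas as pd

# Toy stand-in for the history dataframe (invented values)
toy = pd.DataFrame({
    "artistName": ["A", "A", "B", "A"],
    "trackName": ["x", "x", "y", "z"],
    "msPlayed": [1000, 2000, 3000, 4000],
})

# Number of streams and total listening time per (artist, track) pair
play_counts = (
    toy.groupby(["artistName", "trackName"], as_index=False)
       .agg(plays=("msPlayed", "size"), totalMs=("msPlayed", "sum"))
)

# The same de-duplication as above: one row, and so one API lookup, per track
toy_unique = toy.drop_duplicates(subset=["artistName", "trackName"], keep="first")

print(len(toy), "streams ->", len(toy_unique), "API lookups")
print(play_counts)
```

The counts can later be merged back onto the per-track details using `artistName` and `trackName` as keys.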

Now, let's initialise spotipy

In [21]:
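# Credentials are read from environment variables rather than hard-coded,
# so the client secret never ends up in the notebook or in version control
SPOTIPY_CLIENT_ID = os.environ["SPOTIPY_CLIENT_ID"]
SPOTIPY_CLIENT_SECRET = os.environ["SPOTIPY_CLIENT_SECRET"]

auth_manager = SpotifyClientCredentials(client_id=SPOTIPY_CLIENT_ID, client_secret=SPOTIPY_CLIENT_SECRET)
sp = spotipy.Spotify(auth_manager=auth_manager)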

In [22]:
mp = sp.search("Maisie Peters", type="artist")
mp['artists']['items'][0]['uri']
trying = sp.search("Maisie Peters I'm Trying (Not Friends)", type="track", market="GB", limit=1)
trying['tracks']['items'][0]

{'album': {'album_type': 'album',
  'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2RVvqRBon9NgaGXKfywDSs'},
    'href': 'https://api.spotify.com/v1/artists/2RVvqRBon9NgaGXKfywDSs',
    'id': '2RVvqRBon9NgaGXKfywDSs',
    'name': 'Maisie Peters',
    'type': 'artist',
    'uri': 'spotify:artist:2RVvqRBon9NgaGXKfywDSs'}],
  'external_urls': {'spotify': 'https://open.spotify.com/album/1X1EZB1hCoymZ9gU8JKv86'},
  'href': 'https://api.spotify.com/v1/albums/1X1EZB1hCoymZ9gU8JKv86',
  'id': '1X1EZB1hCoymZ9gU8JKv86',
  'images': [{'height': 640,
    'url': 'https://i.scdn.co/image/ab67616d0000b273084229044ca0f2f9f43584cc',
    'width': 640},
   {'height': 300,
    'url': 'https://i.scdn.co/image/ab67616d00001e02084229044ca0f2f9f43584cc',
    'width': 300},
   {'height': 64,
    'url': 'https://i.scdn.co/image/ab67616d00004851084229044ca0f2f9f43584cc',
    'width': 64}],
  'name': 'You Signed Up For This',
  'release_date': '2021-08-27',
  'release_date_precision': 

Let's test it: looking in the Spotify app, we see that Maisie Peters's URI is `spotify:artist:2RVvqRBon9NgaGXKfywDSs`, which matches what the API just gave us.

Doing this for a track returns lots of details about each song; for this project I've selected the following as 'interesting':
- release_date
- duration_ms
- explicit
- popularity
- uri

Now, let's go through the entire `unique` dataframe and add track details. The search can return multiple tracks, but let's assume that whichever one comes back first is the correct one; we can test that later if we need to.
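
If the "first result is the right one" assumption ever needs testing, a check along these lines might do. It is only a sketch: `looks_like_match` is a hypothetical helper of my own, and it assumes the search hit is a track object with a top-level `name` and an `artists` list of objects that each have a `name`. The demo uses a hand-built dictionary rather than a real API response.

```python
def looks_like_match(item, artist, track):
    """Return True if a search hit plausibly matches the requested artist and track."""
    def norm(text):
        # Compare case-insensitively and ignore repeated whitespace
        return " ".join(text.casefold().split())

    hit_artists = {norm(a["name"]) for a in item.get("artists", [])}
    artist_ok = norm(artist) in hit_artists
    track_ok = norm(track) == norm(item.get("name", ""))
    return artist_ok and track_ok


# Hand-built stand-in for one entry of results['tracks']['items']
fake_item = {"name": "Boy", "artists": [{"name": "Maisie Peters"}]}

print(looks_like_match(fake_item, "maisie peters", "Boy"))              # True
print(looks_like_match(fake_item, "Maisie Peters", "Feels Like This"))  # False
```

Exact matching on the track name is deliberately strict; titles with suffixes such as "- 2003 Remastered" may need a looser comparison.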

In [None]:
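def getSongDetails(artist, track):
    """Search Spotify for a track and return a dict of the details we care about."""
    searchRequest = f"{artist} {track}"
    # Returned when a lookup fails, so the column expansion later still works
    empty = {"name": None, "release_date": None, "duration_ms": None,
             "explicit": None, "popularity": None, "uri": None}
    try:
        # limit=1 because we only ever use the first hit
        results = sp.search(searchRequest, type="track", market="GB", limit=1)
    except Exception as err:
        print(f"Failed to download this song: {searchRequest} ({err})")
        return empty
    items = results['tracks']['items']
    if not items:
        print(f"No match found for: {searchRequest}")
        return empty
    first = items[0]
    return {
        "name": first['name'],
        "release_date": first['album']['release_date'],
        "duration_ms": first['duration_ms'],
        "explicit": first['explicit'],
        "popularity": first['popularity'],
        "uri": first['uri']
    }

getSongDetails("Maisie Peters", "You Signed Up For This")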


This function appears to be working correctly, so let's try it on the first two rows of the dataframe.

*This cell will give warnings because the test works on a slice of the dataframe; for testing purposes that isn't an issue, as we'll overwrite all of the results in a later cell. I've hidden the warnings to make this document tidier, but they can be enabled again by removing the first line (`%%capture --no-display`) of the cell below.*

In [None]:
%%capture --no-display
detail_unique = unique.head(2)
detail_unique
detail_unique['details'] =  detail_unique.apply(lambda row: getSongDetails(row['artistName'], row['trackName']), axis = 1)
detail_unique[['name','release_date', 'duration_ms', 'explicit', 'popularity', 'uri']] = detail_unique['details'].apply(pd.Series)
detail_unique.drop("details",inplace=True, axis=1)
detail_unique

That did what we expected, so now let's do it to the whole unique frame, and save the result in a CSV file (to avoid having to re-call it regularly).


**This takes about 10 minutes to run and, depending on how Spotify is feeling right now, may crash after a few minutes. If that happens, either wait for the rate limit to reset (seems to be ~15 mins) or just load up the CSV file. I've included a demo of this working above.**
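
One way to make a long run like this less fragile is to retry a failed request after a pause instead of letting the whole cell crash. The sketch below is not what I actually ran: `call_with_retries` is a hypothetical helper, and the demo uses a stand-in function that fails twice rather than a real Spotify call.

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying with exponentially growing waits if it raises."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception as err:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({err}); waiting {delay:.0f}s")
            sleep(delay)


# Demo with a stand-in for the API call that fails twice, then succeeds
state = {"calls": 0}

def flaky_lookup(query):
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return {"query": query}

waits = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky_lookup, "Maisie Peters Boy", sleep=waits.append)
print(result, waits)
```

In the notebook, the wrapped call would be the search request itself (for example `sp.search`), with a `base_delay` large enough to give the rate limit a chance to reset.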

In [27]:
print(len(unique))
unique.sort_values('artistName', inplace=True)
unique['details'] =  unique.apply(lambda row: getSongDetails(row['artistName'], row['trackName']), axis = 1)
unique[['name','release_date', 'duration_ms', 'explicit', 'popularity', 'uri']] = unique['details'].apply(pd.Series)
# unique.drop("details",inplace=True, axis=1)
unique.to_csv("./data/output/unique_with_details.csv")

4279
!llmind Take A Break - Interlude
"Weird Al" Yankovic Party In the CIA (Parody of "Party In The U.S.A." by Miley Cyrus)
"Weird Al" Yankovic The Hamilton Polka
'Come From Away' Band Screech Out
'Come From Away' Company In The Bar / Heave Away
'Come From Away' Company Lead Us Out Of The Night
'Come From Away' Company Darkness And Trees
'Come From Away' Company 38 Planes
'Come From Away' Company Blankets And Bedding
'Come From Away' Company Darkness And Trees - Reprise
'Come From Away' Company 10 Years Later
'Come From Away' Company Something's Missing
'Come From Away' Company On The Edge
'Come From Away' Company On The Bus
'Come From Away' Company Phoning Home
'In The Heights' Original Broadway Company Breathe
13 Original Broadway Cast 13 / Becoming a Man
2014 Golden Cove Superheroes (Instrumental Version)
5 Seconds of Summer Amnesia
5 Seconds of Summer Don't Stop
5 Seconds of Summer Youngblood
5 Seconds of Summer She Looks So Perfect
5 Seconds of Summer Me Myself & I
5 Seconds of Su

In case you can't be bothered to wait for this to happen, I've saved the output in a CSV file in `data/output`. From here on in this project, we'll load it from there to avoid hitting the API too often. As such, I've put the actual analysis in another file, `index2`.
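
The "load from disk if it's there, otherwise rebuild" idea can be sketched as a small helper. `load_or_build` is a hypothetical name of my own, and the demo builds a tiny invented dataframe in a temporary folder rather than touching the real `data/output` files.

```python
import os
import tempfile
import pandas as pd

def load_or_build(csv_path, build):
    """Return the cached CSV if it exists; otherwise call build() and cache the result."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path, index_col=0)
    frame = build()
    frame.to_csv(csv_path)
    return frame


# Demo: the expensive step is replaced by a stand-in that records each call
calls = []

def fake_build():
    calls.append(1)
    return pd.DataFrame({"trackName": ["Song 1"], "popularity": [50]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unique_with_details.csv")
    first = load_or_build(path, fake_build)   # builds and saves
    second = load_or_build(path, fake_build)  # loads from the CSV

print(len(calls))  # 1
```

Passing `index_col=0` when reading avoids the extra `Unnamed: 0` column that otherwise appears when a saved index is read back in.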