# importSpotifyData

 - This notebook is built upon the an eaieler project where I pulled simmilar data from the Spotify Web API: [getSpotifyData.ipynb](https://github.com/sreidy/DATA601_HW2/blob/main/getSpotifyData.ipynb)

## Table Of Contents 

### Importing Packages

In [33]:
import spotipy
import pandas as pd
from spotipy.oauth2 import SpotifyClientCredentials


import getpass
import pickle
import json

### Seting up Spotify Web API Credentials & Using Spotipy

 - We will need the following:
     + [A spotify Account](https://developer.spotify.com/)
     + Create an App in the [Spotify Developer Dashboard](https://developer.spotify.com/dashboard/applications) 
     + Within the newly create Spotify App
         + Spotify Client ID 
         + Spotify Client Secret 

In [2]:
CLIENT_ID = getpass.getpass(prompt = "your-spotify-client-id")

your-spotify-client-id········


In [3]:
CLIENT_SECRET = getpass.getpass(prompt = "your-spotify-client-secret")

your-spotify-client-secret········


##### Get Oauth Token

In [74]:
import warnings;
warnings.filterwarnings('ignore');

token = SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
cache_token = token.get_access_token()
#save token as sp, will be used in future API calls
sp = spotipy.Spotify(cache_token)

##### Test Spotify Web API Token

In [5]:
# Enter artist name 
artist_name = input("Name of Artist to Search: ")

Name of Artist to Search: My Bloody Valentine


In [6]:
#Search for an Artists and print out 20 popular tracks 
results = sp.search(q=artist_name, limit=20)
for i, t in enumerate(results['tracks']['items']):
    print(' ', i, t['name'])

  0 My Bloody Valentine
  1 When You Sleep
  2 Only Shallow
  3 Sometimes
  4 Soon
  5 To Here Knows When
  6 Loomer
  7 I Only Said
  8 Only Shallow
  9 Come in Alone
  10 Touched
  11 What You Want
  12 Blown a Wish
  13 Lose My Breath
  14 My Bloody Valentine
  15 Off Your Face
  16 Soft as Snow (But Warm Inside)
  17 Cupid Come
  18 Honey Power (EP Version)
  19 (When You Wake) You're Still in a Dream


### Geting Audio Feature Data 

 - Expanding upon the code I wrote for an [ealier project](https://github.com/sreidy/DATA601_HW2/blob/main/getSpotifyData.ipynb) 
 - Spotify handles genre by providing a list of genres that a particualr artists is associated with 
     + from the spotify documentaiton 
     <blockquote>
        ... list of the genres the artist is associated with. For example: "Prog Rock" , "Post-Grunge". (If not yet classified, the array is empty.)
</blockquote> 
 - We will need to find a way to manualy group simmilar genres and also use a the Artist_id of a track to pull and data about genre 

##### Audio Features 

The Spotify Web Api provided the following Audio Features for each track [more info here](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/)

- track_id : a spotify primary key; unique for each track
- artist: name of artist
- album: name of album
- trackName: title of track
- acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. 
- duration_ms: The duration of the track in milliseconds.
- energy: a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. 
- instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context.
- key: The estimated overall key of the track. Integers map to pitches using standard [Pitch Class notation](https://en.wikipedia.org/wiki/Pitch_class) 
- liveness: Detects the presence of an audience in the recording.
- loudness: The overall loudness of a track in decibels (dB). 
- mode: indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- speechiness: Speechiness detects the presence of spoken words in a track.
- tempo: BPM of track
- time_signature: An estimated overall time signature of a track. 
- valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. 
- category_id: The Spotify Category ID of the track 


In [34]:
###### Example 
# a spotify playlist ID 
# Playlist ID of any playlist can be found by looking at the spotify URL string 
playlistID_1 = "37i9dQZF1DX4jsulumEbDn"

# Get All tracks in the playlist 
tracks = sp.user_playlist_tracks("spotify",playlistID_1)['items']

#get the first song in the playlist
song = tracks[0]

#get song meta data 
#spotify track_id (primary key)
track_id = song['track']['id']
track_name = song['track']['name']
# For tracks where there is more than one listed artist, we will take the first listed artist 
artist = song["track"]["album"]["artists"][0]["name"]
artist_id = song['track']['artists'][0]['id']

# Get list of genres from the artist 
genre_list = sp.artist(artist_id)['genres']


print(track_name + " : " + artist )
print(genre_list)


All My Friends : LCD Soundsystem
['alternative dance', 'alternative rock', 'dance-punk', 'electronic rock', 'electronica', 'indie pop', 'indie rock', 'indietronica', 'modern rock', 'new rave', 'rock']


In [13]:
sp.audio_features(track_id)

[{'danceability': 0.701,
  'energy': 0.788,
  'key': 2,
  'loudness': -5.611,
  'mode': 1,
  'speechiness': 0.0425,
  'acousticness': 0.148,
  'instrumentalness': 0.625,
  'liveness': 0.0977,
  'valence': 0.795,
  'tempo': 142.584,
  'type': 'audio_features',
  'id': '2Ud3deeqLAG988pfW0Kwcl',
  'uri': 'spotify:track:2Ud3deeqLAG988pfW0Kwcl',
  'track_href': 'https://api.spotify.com/v1/tracks/2Ud3deeqLAG988pfW0Kwcl',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/2Ud3deeqLAG988pfW0Kwcl',
  'duration_ms': 462267,
  'time_signature': 4}]

In [61]:
#query_spotify_playlist: Creates a dataframe of songs and assorted metadata from a playlist.

# Parameters 
# :creator: (string) username who authored playlist 
# :playlist_id: (string) a unique primary key for the playlist (found from playlist URL extention)
# :track_feature_list: (array) a list of strings of different features to be queried in dataframe 
#                      learn more about track feautres here
#                     https://developer.spotify.com/documentation/web-api/reference/tracks/get-several-audio-features/
# :metadata_list: (array) [Default] a list of mettadata to be queried, same for all tracks

# Returns 
# :playlist_df: a pandas dataframe of the playlist 
 
def query_spotify_playlist(creator, playlist_id,track_feature_list,
                           metadata_list = ["track_id","artist","artist_id","album","trackName"]):
    
    colNames = metadata_list + track_feature_list
    
    #set DataFrame for the playlist 
    playlist_df = pd.DataFrame(columns = colNames)
    
    #pull playlist data from Spotify Web API
    playlist_raw_dict = sp.user_playlist_tracks(creator,playlist_id)
    playlist = playlist_raw_dict['items']
    check_if_next = playlist_raw_dict['next']
    total_len = playlist_raw_dict['total']
    
    #Handling playlist of over 100 songs, spotify web api limits to 100 tracks at a time
    if check_if_next != None:
        spotify_offset = 100
        while check_if_next != None:
            next_playlist_chunk = sp.user_playlist_tracks(creator,playlist_id,offset = spotify_offset)
            playlist.extend(next_playlist_chunk['items'])
            check_if_next = next_playlist_chunk['next']
            spotify_offset += 100
            
    print(len(playlist))
    
    #loop over all items in a playlist 
    for i in playlist:
        playlist_elem = {}
        
        # Metadata (hardcoded locations of mettadata)
        playlist_elem["track_id"] = i["track"]["id"]
        playlist_elem["artist"] = i["track"]["album"]["artists"][0]["name"]
        playlist_elem["artist_id"] = i["track"]["album"]["artists"][0]["id"]
        playlist_elem["album"] = i["track"]["album"]["name"]
        playlist_elem["trackName"] = i["track"]["name"]
        
        # Artist Genre list, 
        track_genre_list = sp.artist(i["track"]["album"]["artists"][0]["id"])['genres']
        playlist_elem['genre'] = str(track_genre_list)
        
        # Track Audio Features
        audio_features = sp.audio_features(playlist_elem["track_id"])[0]
        for feature in track_feature_list:
            playlist_elem[feature] = audio_features[feature]
            
        #print(playlist_elem) 
        # add track data to playlist dataframe     
        track_df = pd.DataFrame(playlist_elem, index = [0])
        playlist_df = pd.concat([playlist_df, track_df], ignore_index = True)
        
    return playlist_df 

In [62]:
playlistID_1 = "37i9dQZF1DX4jsulumEbDn"
audioFeatureList = ["acousticness","danceability","duration_ms","energy","instrumentalness","key","liveness","loudness","mode","speechiness","tempo","time_signature","valence"]

playlist1_df = query_spotify_playlist("spotiy",playlistID_1, audioFeatureList)
playlist1_df.head()

23


Unnamed: 0,track_id,artist,artist_id,album,trackName,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,genre
0,2Ud3deeqLAG988pfW0Kwcl,LCD Soundsystem,066X20Nz7iquqkkCW6Jxy6,Sound of Silver,All My Friends,0.148,0.701,462267,0.788,0.625,2,0.0977,-5.611,1,0.0425,142.584,4,0.795,"['alternative dance', 'alternative rock', 'dan..."
1,2cmRpmO04TLaKPzmAzySYZ,LCD Soundsystem,066X20Nz7iquqkkCW6Jxy6,This Is Happening,Dance Yrself Clean,0.00557,0.739,536471,0.611,0.725,11,0.04,-9.829,1,0.0622,98.004,4,0.794,"['alternative dance', 'alternative rock', 'dan..."
2,53PkA8aXiwH4ppa0V0iO7o,LCD Soundsystem,066X20Nz7iquqkkCW6Jxy6,american dream,oh baby,0.0192,0.58,349693,0.622,0.653,5,0.648,-12.005,1,0.0352,169.442,4,0.781,"['alternative dance', 'alternative rock', 'dan..."
3,2VGDntFPvgvqSiUf9ITEfW,LCD Soundsystem,066X20Nz7iquqkkCW6Jxy6,Sound of Silver,Someone Great,0.0418,0.722,390013,0.894,0.00374,7,0.11,-5.903,1,0.0325,113.003,4,0.396,"['alternative dance', 'alternative rock', 'dan..."
4,1XlDNpWy8dyEljyRd0RC2J,LCD Soundsystem,066X20Nz7iquqkkCW6Jxy6,LCD Soundsystem,Losing My Edge,0.000208,0.769,473067,0.959,0.631,5,0.331,-8.717,0,0.0634,116.168,4,0.893,"['alternative dance', 'alternative rock', 'dan..."


In [36]:
#geting list of genres back from data frame
eval(playlist1_df['genre'][0])

['alternative dance',
 'alternative rock',
 'dance-punk',
 'electronic rock',
 'electronica',
 'indie pop',
 'indie rock',
 'indietronica',
 'modern rock',
 'new rave',
 'rock']

- Idea for Future project: Use a differnt API to pull song lyrics

In [64]:
playlist_of_everything = "69fEt9DN5r4JQATi52sRtq"


In [66]:
anti_folk_playlist = "1SXOzFKlMx7ODmxI1jnxux"

playlist_df = query_spotify_playlist("thesoundsofspotify",anti_folk_playlist,audioFeatureList)

299


In [67]:
playlist_df

Unnamed: 0,track_id,artist,artist_id,album,trackName,acousticness,danceability,duration_ms,energy,instrumentalness,key,liveness,loudness,mode,speechiness,tempo,time_signature,valence,genre
0,6UGpcXcENaUqQKPc6oqOe4,The Magnetic Fields,6RWjTQqILL7a1tQ0VapyLK,69 Love Songs,The Book of Love,0.978000,0.484,162413,0.262,0.000060,7,0.1170,-13.286,1,0.0263,98.516,4,0.332,"['alternative rock', 'anti-folk', 'art pop', '..."
1,2sg3oUyWN2M6MmIXfGC9k3,Jeffrey Lewis,2Icsf5D1lAs2EFx6kggg5D,The Last Time I Did Acid I Went Insane,The Last Time I Did Acid I Went Insane,0.874000,0.403,307293,0.299,0.735000,4,0.0837,-14.448,0,0.0598,166.102,4,0.114,"['anti-folk', 'folk punk', 'lo-fi']"
2,2pKi1lRvXNASy7ybeQIDTy,The Moldy Peaches,7GGv4mV9JTJK9e7nIzUliU,The Moldy Peaches,Anyone Else But You,0.923000,0.521,180093,0.237,0.000000,7,0.1440,-12.535,1,0.0510,145.856,4,0.580,"['anti-folk', 'lo-fi']"
3,4tUOIdAbmMCYnrxSVbyc9V,Kimya Dawson,5PPCkoOKabpGGhqrUwSikz,Remember That I Love You,Tire Swing,0.840000,0.513,188280,0.222,0.000054,5,0.1210,-13.239,1,0.0787,165.885,3,0.393,"['anti-folk', 'folk punk', 'olympia wa indie']"
4,2M1Qc1mGSI1IYtmJzQtfPq,The Mountain Goats,3hyGGjxu73JuzBa757H6R5,The Sunset Tree,This Year,0.118000,0.699,232893,0.719,0.002590,9,0.0913,-9.481,1,0.0312,138.958,4,0.831,"['alternative rock', 'anti-folk', 'chamber pop..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
294,4VbOjDapnSHrT1kbtTnIaW,"Defiance, Ohio",275dFoyS7o1g1O4rFgy3Ix,The Calling - EP,I'm Against the Government,0.000326,0.391,153995,0.914,0.079900,4,0.2230,-2.529,1,0.0586,102.695,4,0.887,"['acoustic punk', 'anti-folk', 'columbus ohio ..."
295,46KacVTVWMkyXS0jf7TwOX,Harley Poe,6lqmbDjxe5rhvXIyfNl23o,"Satan, Sex and No Regrets",Transvestities Can Be Cannibals Too,0.151000,0.499,276973,0.614,0.000004,7,0.0555,-8.175,1,0.0621,89.038,4,0.605,"['acoustic punk', 'anti-folk', 'folk punk', 's..."
296,3pn6wBx1BT0auNXvwxLYmQ,Amanda Palmer,726Dh6A5VyDfAAQxilT6A0,Ukulele Anthem - Single,Ukulele Anthem,0.896000,0.629,333084,0.434,0.000000,11,0.1060,-7.455,1,0.0699,86.668,4,0.583,"['anti-folk', 'dark cabaret', 'piano rock', 's..."
297,6tIncEukhkHYONzISltkLC,Mirah,5AZHg5fZjuN4FzGUnaiQcj,You Think It's Like This But Really It's Like ...,Sweepstakes Prize,0.629000,0.497,289720,0.502,0.000057,9,0.3030,-9.509,1,0.0295,111.395,4,0.501,"['anti-folk', 'freak folk', 'olympia wa indie']"


### Simplifying Genre 

 - Spotify dose not curently have a large list of every genre defined on the service
 - For the sake of building a clasifer it would be best to simplify each track to one genre 
 - [Every Noise](http://everynoise.com/) visilizes all the differnt spotify genres. Wow there are alot

##### List of playlists for to pull

In [72]:
Genre_list = ['pop','dance_pop','rap','rock','hip_hop','country','edm','indie_pop','indie_rock','punk','metal','alternative_rock','jazz']
audioFeatureList = ["acousticness","danceability","duration_ms","energy","instrumentalness","key","liveness","loudness","mode","speechiness","tempo","time_signature","valence"]
Every_Noise_Playlists = {
    'The Sound of Pop' : ["The Sounds of Spotify","6gS3HhOiI17QNojjPuPzqc",'pop'],
    'The Sound of Dance Pop' : ["The Sounds of Spotify","2ZIRxkFuqNPMnlY7vL54uK",'dance_pop'],
    'The Sound of Rock' : ["The Sounds of Spotify","7dowgSWOmvdpwNkGFMUs6e",'rock'],
    'The Sound of Rap' : ["The Sounds of Spotify","6s5MoZzR70Qef7x4bVxDO1", 'rap'],
    'The Sound of Hip Hop' : ["The Sounds of Spotify","6MXkE0uYF4XwU4VTtyrpfP",'hip_hop'],
    'The Sound of Country' : ["The Sounds of Spotify","4mijVkpSXJziPiOrK7YX4M",'country'],
    'The Sound of EDM' : ["The Sounds of Spotify","3pDxuMpz94eDs7WFqudTbZ",'edm'],
    'The Sound of Indie Pop' : ["The Sounds of Spotify","1aYiM4zLmBuFq0Fg6NQb6a",'indie_folk'],
    'The Sound of Indie Rock' : ["The Sounds of Spotify","4XXr357Jej7eUBh7XPK8hb",'indie_rock'],
    'The Sound of Punk' : ["The Sounds of Spotify","17qQT0G3yFjOJ02wWZaNCw","punk"],
    'The Sound of Metal' : ["The Sounds of Spotify","3pBfUFu8MkyiCYyZe849Ks","metal"],
    'The Sound of Alternitive Rock' : ["The Sounds of Spotify","3dlw4x21qVajwZLPNHtS3u",'alternative_rock'],
    'The Sound of Jazz' : ["The Sounds of Spotify", "5EyFMotmvSfDAZ4hSdKrbx",'jazz']
}

In [76]:
#query_multiple_spotify_playlists: Takes a dictionary of playlists and returns a dictionary of pandas dataframes 

# Parameters 
# :playlists: (dict) Key -> (str) Playlist name: 
#                            [:creator: (str)username who authored playlist,
#                            playlist_id: (string) a unique primary key for the playlist (found from playlist URL extention)
#                            category_id: (string) The Spotify category ID (genera)

# :track_feature_list: (array) a list of strings of different features to be queried in dataframe 
#                      learn more about track feautres here
#                     https://developer.spotify.com/documentation/web-api/reference/tracks/get-several-audio-features/
# :metadata_list: (array) [Default] a list of mettadata to be queried, same for all tracks

# Returns 
# :playlist_dict: a dictionary of a pandas dataframe of the playlist
#                Key -> (str) Playlist name: [dataframe of tracks in playlist]


def query_multiple_spotify_playlists(playlists,track_feature_list,
                                    metadata_list = ["track_id","artist","artist_id","album","trackName"], echo = False):
    
    # Create playlist_dict to hold all output playlists 
    playlist_dict = {}
    
    #Loop over every playlist in playlists  
    for playlist in playlists.items():
        
        playlist_name = playlist[0];playlist_creator = playlist[1][0];playlist_id = playlist[1][1];playlist_category_id = playlist[1][2];
        
        # Print curent playlist being pulled from API
        if echo:
            print(playlist_name + " " + playlist_id + " " + playlist_category_id)
        
        # if API unable to find playlist data, skip playlist 
        try:
            #query track data from playlist, save as dataframe 
            playlist_df = query_spotify_playlist(playlist_creator, playlist_id,track_feature_list)
        
            #lable track data with spotify category ID (genera)
            playlist_df = playlist_df.assign(category_id = playlist_category_id )
        
            #add playlsit_df to playlist_dict
            playlist_dict[playlist_name] = playlist_df
            
        except:
            print("playlist Error: Unable to query " + playlist_name + " by " + playlist_creator)
            print(playlist_id)
            continue

        
    #check if all playlists dataframes added to dictionary     
    #assert len(playlists) == len(playlist_dict)
    #return dict of data_frames 
    return playlist_dict
    

In [75]:
all_playlists = query_multiple_spotify_playlists(Every_Noise_Playlists,audioFeatureList, echo = True)
spotify_df = pd.concat(all_playlists)

The Sound of Pop 6gS3HhOiI17QNojjPuPzqc pop
1094
The Sound of Dance Pop 2ZIRxkFuqNPMnlY7vL54uK dance_pop
1220
playlist Error: Unable to query The Sound of Dance Pop by The Sounds of Spotify
2ZIRxkFuqNPMnlY7vL54uK
The Sound of Rock 7dowgSWOmvdpwNkGFMUs6e rock
1208
playlist Error: Unable to query The Sound of Rock by The Sounds of Spotify
7dowgSWOmvdpwNkGFMUs6e
The Sound of Rap 6s5MoZzR70Qef7x4bVxDO1 rap
playlist Error: Unable to query The Sound of Rap by The Sounds of Spotify
6s5MoZzR70Qef7x4bVxDO1
The Sound of Hip Hop 6MXkE0uYF4XwU4VTtyrpfP hip_hop
playlist Error: Unable to query The Sound of Hip Hop by The Sounds of Spotify
6MXkE0uYF4XwU4VTtyrpfP
The Sound of Country 4mijVkpSXJziPiOrK7YX4M country
playlist Error: Unable to query The Sound of Country by The Sounds of Spotify
4mijVkpSXJziPiOrK7YX4M
The Sound of EDM 3pDxuMpz94eDs7WFqudTbZ edm
playlist Error: Unable to query The Sound of EDM by The Sounds of Spotify
3pDxuMpz94eDs7WFqudTbZ
The Sound of Indie Pop 1aYiM4zLmBuFq0Fg6NQb6a indi

AssertionError: 