# Introduction
Spotify is one of the most popular streaming platforms in the world. It offers an API that lets developers use its huge music database to build interesting applications and uncover insights into our listening habits.

This notebook focuses on how to extract Spotify data for a chosen artist using the Spotify API.

![](https://miro.medium.com/max/10342/1*m3xkO7nM64KwpmdWhXr_Vw.jpeg)
Spotify Technology S.A. is a Swedish media services provider and music streaming service. It is legally domiciled in Luxembourg and is headquartered in Stockholm, Sweden.

Founded in 2006, the company's primary business is the "Spotify" audio streaming platform, which provides DRM-restricted music, videos and podcasts from record labels and media companies. As a freemium service, basic features are free with advertisements or automatic music videos, while additional features, such as offline listening and commercial-free listening, are offered via paid subscriptions.

Launched in October 2008, the Spotify platform provides access to over 50 million tracks. Users can browse by parameters such as artist, album, or genre, and can create, edit, and share playlists. Spotify is available in most of Europe and the Americas, Australia, New Zealand, and parts of Africa and Asia, and on most modern devices, including Windows, macOS, and Linux computers, and iOS and Android smartphones and tablets.

#### So what do we need to get started?

* Spotify API permissions and credentials. To apply for them, log in to the “dashboard” of "**developer.spotify.com**", select “create client id”, and follow the instructions. Spotify is not too strict about granting permissions, so enter whatever you like when asked about commercial applications.

![](https://i.ytimg.com/vi/prayNyuN3w0/maxresdefault.jpg)

* The Python module spotipy, installed and imported.

Once you have all that, you can dive into Spotify’s API via Python.

In [4]:
!pip install spotipy



#### *To access Spotify's authorised data we need to create our credentials, client_id and client_secret, as well as a Spotify object to call the API*

In [5]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials 
client_id = "0054a24f2fc643c69d56d020dd5f70be"
client_secret = "98b4a4b772ad4eca934a92ca60c246a0"
client_credentials_manager = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager) 
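Hard-coding the client ID and secret in a shared notebook exposes them to every reader. The sketch below shows one possible alternative, assuming the two values were exported as environment variables beforehand. The helper name `load_spotify_credentials` is hypothetical, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` are an assumption of this sketch, not something the notebook uses.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    Hypothetical helper: the variable names below are an assumption
    of this sketch, chosen for illustration.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        # Fail early with a readable message instead of a later auth error
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET first"
        )
    return client_id, client_secret
```

The returned pair can be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` in the same way as the literal strings in the cell above.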

#### Search for a particular artist to see which albums are available. In this case the search resolves to **Michael Jackson** as the artist

In [6]:
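# sp.search expects a single query string, so search for one artist at a time
name = "Michael Jackson"
result = sp.search(q=name)
# Show the artists credited on the second track returned by the search
result['tracks']['items'][1]['artists']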

[{'external_urls': {'spotify': 'https://open.spotify.com/artist/3fMbdgg4jU18AjLCKBhRSm'},
  'href': 'https://api.spotify.com/v1/artists/3fMbdgg4jU18AjLCKBhRSm',
  'id': '3fMbdgg4jU18AjLCKBhRSm',
  'name': 'Michael Jackson',
  'type': 'artist',
  'uri': 'spotify:artist:3fMbdgg4jU18AjLCKBhRSm'}]
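A search response nests artists inside tracks, so it can help to list every credited artist before picking one by index. The following is a minimal sketch that works on any dictionary shaped like the output above. The helper name `list_artists` and the `sample` dictionary are made up for illustration; the sample is hand-written, not real API output.

```python
def list_artists(search_result):
    """Return unique (name, uri) pairs for all artists credited on the
    tracks of a search result, in first-seen order."""
    seen = set()
    artists = []
    for track in search_result["tracks"]["items"]:
        for artist in track["artists"]:
            if artist["uri"] not in seen:
                seen.add(artist["uri"])
                artists.append((artist["name"], artist["uri"]))
    return artists


# Hand-written sample with the same nesting as a search response
sample = {
    "tracks": {
        "items": [
            {"artists": [{"name": "Artist A", "uri": "spotify:artist:aaa"}]},
            {"artists": [{"name": "Artist A", "uri": "spotify:artist:aaa"},
                         {"name": "Artist B", "uri": "spotify:artist:bbb"}]},
        ]
    }
}
artist_pairs = list_artists(sample)
```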

### Extract Spotify albums 
We will store the album names and Spotify URIs in separate lists that we can refer to later during data analysis.

In [7]:
# Extract the artist's URI from the first track returned by the search
artists_uris = result['tracks']['items'][0]['artists'][0]['uri']
# Pull the artist's albums
artist_albums = sp.artist_albums(artists_uris, album_type='album')
# Store the album names and URIs in separate lists, kept in the same
# order so that duplicate albums can be tracked later
artist_album_names = []
artist_album_uris = []
for item in artist_albums['items']:
    artist_album_names.append(item['name'])
    artist_album_uris.append(item['uri'])

artist_album_names
artist_album_uris

['spotify:album:2X8UOIkZQdcz2Hi5Ynt2uk',
 'spotify:album:7pomP86PUhoJpY3fsC0WDQ',
 'spotify:album:54WFfZzds26NrwcmStfamd',
 'spotify:album:0yYWeLqonFk11pmb0RiMbP',
 'spotify:album:52E4RP7XDzalpIrOgSTgiQ',
 'spotify:album:0RNsFWWdiz1rrdLI1pwbvJ',
 'spotify:album:6BYNnU3IYdmYCNHtvlFngH',
 'spotify:album:3OBhnTLrvkoEEETjFA3Qfk',
 'spotify:album:3Y4ZambRFbDzlnCtqEUrgY',
 'spotify:album:0oX4SealMgNXrvRDhqqOKg',
 'spotify:album:3Us57CjssWnHjTUIXBuIeH',
 'spotify:album:24TAupSNVWSAHL0R7n71vm',
 'spotify:album:2ANVost0y2y52ema1E9xAZ',
 'spotify:album:5Ne1SPpiXQqpCZ6pZCRfIM',
 'spotify:album:1C2h7mLntPSeVYciMRTF4a',
 'spotify:album:2ZytN2cY4Zjrr9ukb2rqTP',
 'spotify:album:3dZ91ooVpNEdukDWuNzbpZ',
 'spotify:album:6leeBdAohG7zhybC3Cmu8W',
 'spotify:album:3RNMz43pauYka09DdYHpHZ',
 'spotify:album:6o7W0hkHepZ2cDBF4gJsEg']
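The list above holds exactly 20 URIs, which suggests that only one page of results was returned and the artist may have more albums. A possible way to read every page is sketched below, assuming the response carries a `next` link and the client exposes a `next()` method, as spotipy's `Spotify` object does. The function name `collect_all_albums` is hypothetical.

```python
def collect_all_albums(client, artist_uri):
    """Follow the 'next' links until every page of albums has been read.

    `client` is expected to behave like spotipy.Spotify: it needs an
    artist_albums() method and a next() method.
    """
    page = client.artist_albums(artist_uri, album_type="album")
    albums = list(page["items"])
    while page.get("next"):
        page = client.next(page)
        albums.extend(page["items"])
    return albums
```

With the objects from this notebook the call would be `collect_all_albums(sp, artists_uris)`, and the names and URIs can then be read from the returned list as before.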

### **Extract all the songs from every album**
Next, we loop through each album to extract key track data.

In [8]:
def album_songs(uri):
    album = uri 
    spotify_albums[album] = {}
    #Create keys-values of empty lists inside nested dictionary for album
    spotify_albums[album]['album'] = [] 
    spotify_albums[album]['track_number'] = []
    spotify_albums[album]['id'] = []
    spotify_albums[album]['name'] = []
    spotify_albums[album]['uri'] = []
    #pull data on album tracks
    tracks = sp.album_tracks(album) 
    for n in range(len(tracks['items'])): 
        spotify_albums[album]['album'].append(artist_album_names[album_count]) 
        spotify_albums[album]['track_number'].append(tracks['items'][n]['track_number'])
        spotify_albums[album]['id'].append(tracks['items'][n]['id'])
        spotify_albums[album]['name'].append(tracks['items'][n]['name'])
        spotify_albums[album]['uri'].append(tracks['items'][n]['uri'])

Now we create an empty dictionary called spotify_albums to store the album data, then apply the function to each album URI in the list to pull the track data.

In [9]:
spotify_albums = {}
album_count = 0
for i in artist_album_uris: #each album
    album_songs(i)
    print(str(artist_album_names[album_count]) + " album songs has been added to spotify_albums dictionary")
    album_count+=1 #Updates album count once all tracks have been added

Scream album songs has been added to spotify_albums dictionary
XSCAPE album songs has been added to spotify_albums dictionary
XSCAPE - Track by Track Commentary album songs has been added to spotify_albums dictionary
Michael album songs has been added to spotify_albums dictionary
Invincible album songs has been added to spotify_albums dictionary
BLOOD ON THE DANCE FLOOR/ HIStory In The Mix album songs has been added to spotify_albums dictionary
BLOOD ON THE DANCE FLOOR/ HIStory In The Mix album songs has been added to spotify_albums dictionary
HIStory - PAST, PRESENT AND FUTURE - BOOK I album songs has been added to spotify_albums dictionary
HIStory - PAST, PRESENT AND FUTURE - BOOK I album songs has been added to spotify_albums dictionary
Dangerous album songs has been added to spotify_albums dictionary
Bad (Remastered) album songs has been added to spotify_albums dictionary
Bad 25th Anniversary album songs has been added to spotify_albums dictionary
Thriller album songs has been adde
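The `album_songs` function depends on the global variables `spotify_albums` and `album_count` being kept in step by the calling loop. One alternative, sketched here under the assumption that the track response has the `items` layout used above, is a function that takes the album name as an argument and returns the columns. The name `tracks_to_columns` and the sample response are made up for illustration.

```python
def tracks_to_columns(album_name, tracks_response):
    """Turn one album's track listing into a dict of equal-length lists."""
    columns = {"album": [], "track_number": [], "id": [], "name": [], "uri": []}
    for item in tracks_response["items"]:
        columns["album"].append(album_name)
        for key in ("track_number", "id", "name", "uri"):
            columns[key].append(item[key])
    return columns


# Hand-written sample in the shape of an album_tracks response
sample_tracks = {
    "items": [
        {"track_number": 1, "id": "id1", "name": "Song One", "uri": "spotify:track:id1"},
        {"track_number": 2, "id": "id2", "name": "Song Two", "uri": "spotify:track:id2"},
    ]
}
sample_columns = tracks_to_columns("Sample Album", sample_tracks)
```

In the notebook this could be used as `spotify_albums[uri] = tracks_to_columns(name, sp.album_tracks(uri))` while iterating over `zip(artist_album_names, artist_album_uris)`, which removes the need for a separate counter.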

### Extract audio features for each song

To store the audio features of each album track, we add further key-value pairs to the album's dictionary and append each track's values to lists that cover all the tracks on that album.

In [10]:
def audio_features(album):
    #Add new key-values to store audio features
    spotify_albums[album]['acousticness'] = []
    spotify_albums[album]['danceability'] = []
    spotify_albums[album]['energy'] = []
    spotify_albums[album]['instrumentalness'] = []
    spotify_albums[album]['liveness'] = []
    spotify_albums[album]['loudness'] = []
    spotify_albums[album]['speechiness'] = []
    spotify_albums[album]['tempo'] = []
    spotify_albums[album]['valence'] = []
    spotify_albums[album]['popularity'] = []
    
    track_count = 0
    for track in spotify_albums[album]['uri']:
        #pull audio features per track
        features = sp.audio_features(track)
        
        #Append to relevant key-value
        spotify_albums[album]['acousticness'].append(features[0]['acousticness'])
        spotify_albums[album]['danceability'].append(features[0]['danceability'])
        spotify_albums[album]['energy'].append(features[0]['energy'])
        spotify_albums[album]['instrumentalness'].append(features[0]['instrumentalness'])
        spotify_albums[album]['liveness'].append(features[0]['liveness'])
        spotify_albums[album]['loudness'].append(features[0]['loudness'])
        spotify_albums[album]['speechiness'].append(features[0]['speechiness'])
        spotify_albums[album]['tempo'].append(features[0]['tempo'])
        spotify_albums[album]['valence'].append(features[0]['valence'])
        #popularity is stored elsewhere
        pop = sp.track(track)
        spotify_albums[album]['popularity'].append(pop['popularity'])
        track_count+=1
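The function above sends one request per track. `sp.audio_features` also accepts a list of tracks, so the number of requests can be reduced by sending tracks in batches. This is a sketch only: the batch size of 100 is my understanding of the endpoint's limit and should be checked against the API documentation, and the names `chunked` and `fetch_audio_features` are hypothetical.

```python
def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def fetch_audio_features(client, track_uris, batch_size=100):
    """Request audio features in batches instead of one track at a time.

    `client` is expected to behave like spotipy.Spotify. The default
    batch size is an assumption about the endpoint's limit.
    """
    features = []
    for batch in chunked(track_uris, batch_size):
        features.extend(client.audio_features(batch))
    return features
```

The returned list is in the same order as `track_uris`, so each entry can be appended to the per-album lists exactly as `features[0]` is above.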

Now let's loop through the albums and extract the audio features. We add a random delay after every five albums to avoid sending too many requests to Spotify's API.

In [11]:
import time
import numpy as np
sleep_min = 2
sleep_max = 5
start_time = time.time()
request_count = 0
for i in spotify_albums:
    audio_features(i)
    request_count+=1
    if request_count % 5 == 0:
        print(str(request_count) + " playlists completed")
        time.sleep(np.random.uniform(sleep_min, sleep_max))
        print('Loop #: {}'.format(request_count))
        print('Elapsed Time: {} seconds'.format(time.time() - start_time))

5 playlists completed
Loop #: 5
Elapsed Time: 11.971925497055054 seconds
10 playlists completed
Loop #: 10
Elapsed Time: 24.387006282806396 seconds
15 playlists completed
Loop #: 15
Elapsed Time: 42.136138677597046 seconds
20 playlists completed
Loop #: 20
Elapsed Time: 50.553205490112305 seconds
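The pause-every-few-albums pattern can be wrapped in a small reusable helper. The sketch below uses only the standard library, and the sleep function is passed in as a parameter so the logic can be checked without actually waiting. The name `run_with_pauses` and its parameters are hypothetical.

```python
import random
import time


def run_with_pauses(items, work, every=5, sleep_range=(2, 5), sleep=time.sleep):
    """Call work(item) for each item and pause for a random interval
    after every `every` items. Returns the number of pauses taken."""
    pauses = 0
    for count, item in enumerate(items, start=1):
        work(item)
        if count % every == 0:
            sleep(random.uniform(*sleep_range))
            pauses += 1
    return pauses
```

In this notebook the call would be `run_with_pauses(spotify_albums, audio_features)`, which pauses between 2 and 5 seconds after every fifth album.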


Now it's time to add the Spotify music data above to a new dataframe. Before we do so, we first need to organise the data into a dictionary that can be converted into a dataframe.

In [12]:
dic_df = {}
dic_df['album'] = []
dic_df['track_number'] = []
dic_df['id'] = []
dic_df['name'] = []
dic_df['uri'] = []
dic_df['acousticness'] = []
dic_df['danceability'] = []
dic_df['energy'] = []
dic_df['instrumentalness'] = []
dic_df['liveness'] = []
dic_df['loudness'] = []
dic_df['speechiness'] = []
dic_df['tempo'] = []
dic_df['valence'] = []
dic_df['popularity'] = []
for album in spotify_albums: 
    for feature in spotify_albums[album]:
        dic_df[feature].extend(spotify_albums[album][feature])
        
len(dic_df['album'])

300
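The same merge can be written without listing every column by hand. The sketch below assumes each album maps to a dictionary of equal-length lists, as `spotify_albums` does; the function name `flatten_albums` and the sample data are made up for illustration.

```python
def flatten_albums(albums):
    """Merge {album_uri: {column: [values]}} into one {column: [values]}."""
    merged = {}
    for columns in albums.values():
        for key, values in columns.items():
            merged.setdefault(key, []).extend(values)
    return merged


# Hand-written sample with two albums and two columns
sample_albums = {
    "album:1": {"name": ["a", "b"], "popularity": [10, 20]},
    "album:2": {"name": ["c"], "popularity": [30]},
}
flat = flatten_albums(sample_albums)
# Every column should end up with the same number of rows
column_lengths = {len(values) for values in flat.values()}
```

Checking that `column_lengths` contains a single value is a cheap guard before building the dataframe, since columns of unequal length cannot be combined into one.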

In [13]:
import pandas as pd
dataframe = pd.DataFrame.from_dict(dic_df)
dataframe

Unnamed: 0,album,track_number,id,name,uri,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,popularity
0,Scream,1,2BU4dK1fu1CFLjTBz7ju7Q,This Place Hotel (a.k.a. Heartbreak Hotel),spotify:track:2BU4dK1fu1CFLjTBz7ju7Q,0.2430,0.731,0.851,0.000142,0.1170,-4.085,0.1030,97.005,0.582,57
1,Scream,2,7azo4rpSUh8nXgtonC6Pkq,Thriller,spotify:track:7azo4rpSUh8nXgtonC6Pkq,0.0816,0.764,0.887,0.000108,0.8470,-3.725,0.0738,118.421,0.721,68
2,Scream,3,0QgsvILuE9VQiAVsLStD3W,Blood on the Dance Floor,spotify:track:0QgsvILuE9VQiAVsLStD3W,0.0148,0.806,0.816,0.009780,0.0623,-2.767,0.0764,108.980,0.520,48
3,Scream,4,6A4Jc8npNo79BOgsrPptLA,Somebody's Watching Me - Single Version,spotify:track:6A4Jc8npNo79BOgsrPptLA,0.0409,0.767,0.712,0.000081,0.1320,-4.128,0.0324,124.091,0.804,69
4,Scream,5,4FC7GMnGwYeAjuvC83Srbs,Dirty Diana - 2012 Remaster,spotify:track:4FC7GMnGwYeAjuvC83Srbs,0.1780,0.734,0.801,0.000401,0.3240,-4.525,0.0437,131.424,0.331,63
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,Music & Me,6,4Lmu0By2w5pHiLkKN6Xr44,Doggin' Around,spotify:track:4Lmu0By2w5pHiLkKN6Xr44,0.2480,0.403,0.759,0.000246,0.0932,-10.920,0.0524,94.041,0.897,21
296,Music & Me,7,4tizeXJqI5BP4IU4xJJEJm,Johnny Raven,spotify:track:4tizeXJqI5BP4IU4xJJEJm,0.4260,0.670,0.698,0.000354,0.1180,-10.498,0.0295,111.809,0.877,20
297,Music & Me,8,0QGze7AKyvgFCRfABSj0WN,Euphoria,spotify:track:0QGze7AKyvgFCRfABSj0WN,0.1540,0.473,0.742,0.004140,0.1510,-11.202,0.1170,151.037,0.876,21
298,Music & Me,9,1xwdyB2h7UN7pQyITEm3nK,Morning Glow,spotify:track:1xwdyB2h7UN7pQyITEm3nK,0.2250,0.599,0.461,0.000352,0.1070,-11.723,0.0244,101.105,0.744,21


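Spotify's catalogue has a duplicate issue: the same song title can appear more than once, and the album list above already contains repeated albums. We address it by keeping only the most popular entry for each song name.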

In [14]:
print(len(dataframe))
final_df = dataframe.sort_values('popularity', ascending=False).drop_duplicates('name').sort_index()
print(len(final_df))

300
191
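To see what the chained call does, here is the same sequence applied to a four-row toy dataframe. The popularity values are invented for illustration and are not taken from the API.

```python
import pandas as pd

# Toy data: each song name appears twice with different popularity
toy = pd.DataFrame({
    "name": ["Thriller", "Thriller", "Bad", "Bad"],
    "album": ["Scream", "Thriller", "Bad (Remastered)", "Bad 25th Anniversary"],
    "popularity": [68, 75, 60, 55],  # invented values
})

deduped = (
    toy.sort_values("popularity", ascending=False)  # most popular first
       .drop_duplicates("name")                     # keep the first row per name
       .sort_index()                                # restore the original order
)
print(deduped)
```

Rows 1 and 2 survive because they hold the highest popularity for their song name. Note that matching on `name` alone treats different songs that share a title as duplicates, while a renamed version such as "Dirty Diana - 2012 Remaster" would be kept as a separate song.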


In [15]:
final_df.head()

Unnamed: 0,album,track_number,id,name,uri,acousticness,danceability,energy,instrumentalness,liveness,loudness,speechiness,tempo,valence,popularity
0,Scream,1,2BU4dK1fu1CFLjTBz7ju7Q,This Place Hotel (a.k.a. Heartbreak Hotel),spotify:track:2BU4dK1fu1CFLjTBz7ju7Q,0.243,0.731,0.851,0.000142,0.117,-4.085,0.103,97.005,0.582,57
1,Scream,2,7azo4rpSUh8nXgtonC6Pkq,Thriller,spotify:track:7azo4rpSUh8nXgtonC6Pkq,0.0816,0.764,0.887,0.000108,0.847,-3.725,0.0738,118.421,0.721,68
3,Scream,4,6A4Jc8npNo79BOgsrPptLA,Somebody's Watching Me - Single Version,spotify:track:6A4Jc8npNo79BOgsrPptLA,0.0409,0.767,0.712,8.1e-05,0.132,-4.128,0.0324,124.091,0.804,69
4,Scream,5,4FC7GMnGwYeAjuvC83Srbs,Dirty Diana - 2012 Remaster,spotify:track:4FC7GMnGwYeAjuvC83Srbs,0.178,0.734,0.801,0.000401,0.324,-4.525,0.0437,131.424,0.331,63
5,Scream,6,0XOuVAHsANAWw0LF0Yxkw5,Torture,spotify:track:0XOuVAHsANAWw0LF0Yxkw5,0.0179,0.754,0.836,3e-06,0.35,-4.703,0.0425,131.131,0.965,56


Finally, let us save this dataframe to a CSV file.

In [16]:
final_df.to_csv("spotify_music.csv")
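By default `to_csv` also writes the dataframe index as an unnamed first column. The `Unnamed: 0` header in the tables above is the kind of column pandas produces when such a file is read back. If the index is not needed, `index=False` leaves it out, as this small round trip on a toy dataframe shows.

```python
import io

import pandas as pd

toy = pd.DataFrame({"name": ["Thriller", "Bad"], "popularity": [68, 60]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False keeps the index out of the file
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'name', 'popularity']
print(cols_without_index)  # ['name', 'popularity']
```

Since `final_df` keeps the original row labels after de-duplication, writing the index can still be useful when rows need to be traced back to the full dataframe.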

### Hope you now know how to fetch music data from Spotify using its API.

### The next notebook, linked below, performs a more detailed exploratory data analysis and takes a deep dive into the data from a song/album popularity perspective.

### https://www.kaggle.com/pavansanagapati/spotify-music-eda-part2-music-classification

# Please do leave your comments or suggestions, if any. An UPVOTE is greatly appreciated if you like this notebook.