Danielle Paes Barretto de Arruda Camara

**LAST VERSION: 02-06-19 (review)**

This first notebook retrieves playlist information for each Spotify category.

**Input:**

* Spotify API credentials 
* Optional function arguments:

        country - An ISO 3166-1 alpha-2 country code (optional).
        locale - The desired language, consisting of an ISO 639 language code and an ISO 3166-1 alpha-2 
        country code, joined by an underscore (optional).
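
The two formats described above can be checked before calling the API. The following is a minimal sketch, not part of the original notebook: the helper names `is_valid_country` and `is_valid_locale` are hypothetical, and the patterns only test the shape of the string (two uppercase letters for the country, a two- or three-letter lowercase language code plus an underscore and a country), not whether the code is actually assigned.

```python
import re

# Shape checks only: these patterns do not verify that a code really exists.
COUNTRY_RE = re.compile(r"[A-Z]{2}")            # e.g. "NL"
LOCALE_RE = re.compile(r"[a-z]{2,3}_[A-Z]{2}")  # e.g. "nl_NL"


def is_valid_country(code: str) -> bool:
    """Return True if `code` looks like an ISO 3166-1 alpha-2 country code."""
    return COUNTRY_RE.fullmatch(code) is not None


def is_valid_locale(code: str) -> bool:
    """Return True if `code` looks like <language>_<COUNTRY>."""
    return LOCALE_RE.fullmatch(code) is not None


print(is_valid_country("NL"), is_valid_locale("nl_NL"), is_valid_locale("nl-NL"))
```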


**Output:**

* pickled list of category ids, saved with a .txt extension (e.g. category_ids_list_country_NL_2019-04-28.txt)
* csv files with playlist_name and playlist_id for all categories (e.g. playlists_category_afro_NL_2019-04-28.csv)
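
The naming pattern of the two output files can be sketched as follows. The helper `build_output_filenames` is hypothetical and only mirrors the pattern visible in the examples above, including the placeholder the notebook uses when no country is given.

```python
from datetime import date
from typing import Optional, Tuple


def build_output_filenames(category_id: str, country: Optional[str], day: date) -> Tuple[str, str]:
    """Return (categories_filename, playlists_filename) for one category and day."""
    # Placeholder label used when no country is passed
    country_label = country if country is not None else "Non-specified-country"
    stamp = day.strftime("%Y-%m-%d")
    categories_file = f"category_ids_list_country_{country_label}_{stamp}.txt"
    playlists_file = f"playlists_category_{category_id}_{country_label}_{stamp}.csv"
    return categories_file, playlists_file


print(build_output_filenames("afro", "NL", date(2019, 4, 28)))
```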

The following notebook (02-retrieve_playlist_tracks_audio_features_Categories_070319.ipynb) will use the information obtained here to retrieve tracks for each playlist and their audio features.

# Libraries

In [1]:
import pandas as pd
import time
TodaysDate = time.strftime("%Y-%m-%d")

from tqdm import tqdm

import pickle

# Folders

In [2]:
# output_folder = "./data/playlists/"

output_folder = "D:/DATA_02062019/NEW_DATA/playlists/"

# Access to Spotify API 

Credentials can be obtained from the Spotify developer dashboard: https://developer.spotify.com/dashboard/login


In [3]:
import spotipy 
from spotipy.oauth2 import SpotifyClientCredentials 
cid ="********************************" 
secret = "********************************" 

# a redirect URI may be required for some commands
redirect_uri = 'http://127.0.0.1:5001/login/authorized'

# the user id of my account
username = 'your_spotify_user_name'

client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret) 
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
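
As an alternative to pasting the credentials into the notebook, they could be read from environment variables. This is a minimal sketch under assumptions: the helper `load_spotify_credentials` is hypothetical, and the variable names `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` follow the names spotipy documents for its own credential lookup.

```python
import os
from typing import Mapping, Optional, Tuple


def load_spotify_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (client_id, client_secret) read from the given mapping or os.environ."""
    source = os.environ if env is None else env
    try:
        return source["SPOTIPY_CLIENT_ID"], source["SPOTIPY_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of an opaque API error later
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None


# Demonstration with a plain dictionary standing in for the environment
demo_env = {"SPOTIPY_CLIENT_ID": "my-client-id", "SPOTIPY_CLIENT_SECRET": "my-secret"}
cid_demo, secret_demo = load_spotify_credentials(demo_env)
print(cid_demo)
```

The returned pair could then be passed to `SpotifyClientCredentials(client_id=..., client_secret=...)` as in the cell above.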

# Functions

## Step 1: Get list of categories

https://spotipy.readthedocs.io/en/latest/#spotipy.client.Spotify.categories

https://developer.spotify.com/documentation/web-api/reference/browse/get-list-categories/


categories(country=None, locale=None, limit=20, offset=0)

Parameters:

* country - An ISO 3166-1 alpha-2 country code.
* locale - The desired language, consisting of an ISO 639 language code and an ISO 3166-1 alpha-2 country code, joined by an underscore.
* limit - The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50
* offset - The index of the first item to return. Default: 0 (the first object). Use with limit to get the next set of items.
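
How `limit` and `offset` work together can be illustrated without calling the API. In this sketch, `fetch_all_items` and `fake_fetch_page` are hypothetical names; the fake function stands in for a paged endpoint and simply slices a local list of 45 entries.

```python
from typing import Callable, Dict, List


def fetch_all_items(fetch_page: Callable[[int, int], List[Dict]], limit: int = 20) -> List[Dict]:
    """Collect every item from a paged source by advancing offset in steps of limit."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    items: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if not page:            # an empty page means there is nothing left
            break
        items.extend(page)
        if len(page) < limit:   # a short page is the last one
            break
        offset += limit         # the next page starts right after this one
    return items


def fake_fetch_page(limit: int, offset: int) -> List[Dict]:
    """Stand-in for an API call: returns a slice of a fixed 45-item catalogue."""
    catalogue = [{"id": f"category_{n}"} for n in range(45)]
    return catalogue[offset:offset + limit]


all_items = fetch_all_items(fake_fetch_page, limit=20)
print(len(all_items))
```

Advancing the offset by exactly `limit` matters: a larger step would skip items between pages.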


In [4]:
def get_list_categories(country=None, locale=None, limit=20):
    """ Generate a list of category_id.
    
    Input: 
        country - An ISO 3166-1 alpha-2 country code (optional).
        locale - The desired language, consisting of an ISO 639 language code and an ISO 3166-1 alpha-2 
        country code, joined by an underscore (optional).
        limit - The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50
            
    Output:
        list of category_id
        
    P.S.: offset - The index of the first item to return. Default: 0 (the first object). Use with limit 
        to get the next set of items.
        
    """
    
    category_ids_list = []
    max_nr_categories = 2000
    # Step by `limit` (not a fixed 50) so that no page is skipped when limit < 50
    for offset in range(0, max_nr_categories, limit):
        categories = sp.categories(country, locale, limit, offset=offset)
        items = categories['categories']['items']
        if not items:   # no more categories to fetch
            break
        for category in items:
            category_ids_list.append(category['id'])
            
    # save the list to disk
    
    if country is None:
        country = 'Non-specified-country'
    
    filename = "category_ids_list_country_"+country+"_"+TodaysDate +".txt"
       
    with open(output_folder+filename, "wb") as fp:   #Pickling
        pickle.dump(category_ids_list, fp)
    
    return category_ids_list    

## Step 2: Get category's Playlists 

https://spotipy.readthedocs.io/en/latest/#spotipy.client.Spotify.category_playlists

https://developer.spotify.com/documentation/web-api/reference/browse/get-categorys-playlists/

category_playlists(category_id=None, country=None, limit=20, offset=0)
 
 
Parameters:

* category_id - The Spotify category ID for the category.
* country - An ISO 3166-1 alpha-2 country code.
* limit - The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50
* offset - The index of the first item to return. Default: 0 (the first object). Use with limit to get the next set of items.
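
The part of the response used in this step can be sketched on a hand-written dictionary. The helper `extract_playlist_rows` is hypothetical; the nesting (`'playlists'` then `'items'`, each item with `'id'` and `'name'`) is the one the function below relies on. Skipping `None` entries is a defensive assumption, not something the original code does.

```python
from typing import Dict, List


def extract_playlist_rows(response: Dict, category_id: str) -> List[Dict[str, str]]:
    """Turn one category_playlists-style response into rows of category, id and name."""
    rows = []
    for item in response["playlists"]["items"]:
        if item is None:   # defensive assumption: ignore missing entries
            continue
        rows.append({
            "category": category_id,
            "playlist_id": item["id"],
            "playlist_name": item["name"],
        })
    return rows


# Hand-written response with the same nesting as the API result
sample_response = {
    "playlists": {
        "items": [
            {"id": "37i9dQZF1DWXHyhanaNMoy", "name": "La Vida Loca"},
            None,
            {"id": "37i9dQZF1DX10zKzsJ2jva", "name": "¡Viva Latino!"},
        ]
    }
}

rows = extract_playlist_rows(sample_response, "latin")
print(rows)
```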

In [5]:
def get_playlists_per_category(category_id=None, country=None, limit=20):
    
    """ Retrieve information (id and name) of playlists belonging to a category_id
    
    Input: 
        category_id - The Spotify category ID for the category.
        country - An ISO 3166-1 alpha-2 country code (optional).
        limit - The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50
        offset - The index of the first item to return. Default: 0 (the first object). 
        Set internally while paging through the results; it is not a function argument.
        
    Output:
        Dictionary with playlist_ids and playlist_names for category_id
        .csv file from a dataframe generated from the dictionary obtained 
        
    P.S.: A dataframe is built from this dictionary and saved to disk as a .csv file
    """ 
    
    playlist_id_list = []
    playlist_name_list = []
    category_list = []
    max_nr_playlist = 2000
    # Step by `limit` (not a fixed 50) so that no page is skipped when limit < 50
    for offset in range(0, max_nr_playlist, limit):
        playlists_per_category = sp.category_playlists(category_id, country, limit, offset=offset)
        items = playlists_per_category['playlists']['items']
        if not items:   # no more playlists in this category
            break
        for playlist in items:
            playlist_id_list.append(playlist['id'])
            playlist_name_list.append(playlist['name'])
            category_list.append(category_id)
        
    dic_playlists_category = {"category": category_list,
                              "playlist_id": playlist_id_list,
                               "playlist_name": playlist_name_list}
    
    df_playlists_category = pd.DataFrame(dic_playlists_category)
    
    
    # save dataframe in csv
    
    if country is None:
        country = 'Non-specified-country'
    
    filename = "playlists_category_"+category_id+"_"+country+"_"+TodaysDate +".csv"
    df_playlists_category.to_csv(output_folder+filename,index = False)
    
    
    return dic_playlists_category


## Get playlist information (id and name) for all categories

To obtain playlist_id and playlist_name for all categories it is only necessary to use the list of category_id obtained by using the function 'get_list_categories' and then call function 'get_playlists_per_category' for each category_id.

However, sending many requests in quick succession can trigger the following error: 

ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))


To work around this, the categories are processed in batches of three using two nested for-loops. 
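
A different way to handle dropped connections, shown here only as a sketch and not as the approach used in this notebook, is to retry a failed call after an explicit pause that grows with each attempt. The helper `call_with_retry` and its parameters are hypothetical. It retries on `OSError` by default, on the assumption that connection errors derive from it; the `sleep` argument exists so the demonstration can record the delays instead of waiting.

```python
import time
from typing import Any, Callable, Tuple, Type


def call_with_retry(func: Callable[..., Any], *args: Any,
                    retries: int = 3, base_delay: float = 1.0,
                    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                    sleep: Callable[[float], Any] = time.sleep, **kwargs: Any) -> Any:
    """Call func; on a listed exception wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:   # out of retries: let the error propagate
                raise
            sleep(base_delay * 2 ** attempt)


# Simulated request that fails twice before succeeding
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionResetError(10054, "connection reset (simulated)")
    return "ok"


recorded_delays = []
result = call_with_retry(flaky_request, retries=3, base_delay=1.0, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In the notebook, a call such as `get_playlists_per_category` could be wrapped the same way, leaving `sleep` at its default so that the pauses are real.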

In [6]:
def get_all_categories_playlists(country = None, locale = None, limit=20):
    """ Obtain all playlists of all categories for a certain country if specified 
    
    Input: 
        country - An ISO 3166-1 alpha-2 country code (optional).
        locale - The desired language, consisting of an ISO 639 language code and an ISO 3166-1 alpha-2 
        country code, joined by an underscore (optional).
        limit - The maximum number of items to return. Default: 20. Minimum: 1. Maximum: 50
    
    """
    
    
    # Get list of category
    list_category_ids = get_list_categories(country, locale, limit)
    
    # Generate a dataframe with playlists
    df_playlists = pd.DataFrame({'categories':list_category_ids})
    
    # Process the categories in batches of three (two for-loops) to avoid 'Connection aborted'
    
    for i in tqdm(range(0,len(df_playlists),3)):
        for j in range(3):
            try:
                get_playlists_per_category(category_id=df_playlists.iloc[i+j, 0],country=country, limit=limit)
            except IndexError:
                # the last batch may hold fewer than three categories
                pass
        
    if country is None:
        country = 'Non-specified-country'
        
    print("Playlists of {} categories for {} retrieved!".format(df_playlists.shape[0], country))
    
    

# Get all categories' playlists for NL

In [7]:
get_all_categories_playlists(country="NL",locale=None,limit=50)

100%|██████████████████████████████████████████████████████████████████████████████████| 13/13 [02:35<00:00, 11.94s/it]


Playlists of 37 categories for NL retrieved!


# Checking the data obtained

In [8]:
# opening list to check

with open(output_folder+ "category_ids_list_country_NL_2019-06-02.txt", "rb") as fp:   # Unpickling
    list_categories_NL = pickle.load(fp)


len(list_categories_NL)

37
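
Because the category list is written with pickle, the .txt file is binary and has to be opened in "rb" mode, as in the cell above. A small round trip in a temporary folder illustrates this; the three ids and the temporary path are only for demonstration.

```python
import os
import pickle
import tempfile

category_ids = ["toplists", "mood", "pop"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "category_ids_list_country_NL_2019-06-02.txt")

    # Write the list in binary mode, as get_list_categories does
    with open(path, "wb") as fp:
        pickle.dump(category_ids, fp)

    # Read it back in binary mode; text mode would not work for pickled data
    with open(path, "rb") as fp:
        restored = pickle.load(fp)

print(restored == category_ids)
```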

In [9]:
list_categories_NL

['toplists',
 'mood',
 'pop',
 'hiphop',
 'dutch',
 'chill',
 'edm_dance',
 'indie_alt',
 'focus',
 'workout',
 'party',
 'soul',
 'sleep',
 'rock',
 'dinner',
 'rnb',
 'afro',
 'latin',
 'jazz',
 'roots',
 'country',
 'blues',
 'decades',
 'summer',
 'arab',
 'romance',
 'travel',
 'sessions',
 'metal',
 'classical',
 'reggae',
 'kpop',
 'desi',
 'punk',
 'funk',
 'gaming',
 'kids']

In [10]:
# Checking one of the csv for playlists categories

df_playlists_category_latin_NL = pd.read_csv(output_folder+"playlists_category_latin_NL_2019-06-02.csv")
df_playlists_category_latin_NL.head()

| | category | playlist_id | playlist_name |
|---|---|---|---|
| 0 | latin | 37i9dQZF1DWXHyhanaNMoy | La Vida Loca |
| 1 | latin | 37i9dQZF1DX10zKzsJ2jva | ¡Viva Latino! |
| 2 | latin | 37i9dQZF1DWY7IeIP1cdjF | Baila Reggaeton |
| 3 | latin | 37i9dQZF1DX7MTlMMRl0MD | Bachata Lovers |
| 4 | latin | 37i9dQZF1DX4qKWGR9z0LI | Salsa Nation |


In [11]:
df_playlists_category_latin_NL.info(null_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
category         100 non-null object
playlist_id      100 non-null object
playlist_name    100 non-null object
dtypes: object(3)
memory usage: 2.4+ KB


In [12]:
len(df_playlists_category_latin_NL.playlist_id.unique())

100

In [13]:
len(df_playlists_category_latin_NL.playlist_name.unique())

100