# Lab | API wrappers - Create your collection of songs & audio features

**Instructions**

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.
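
With this many requests, the Web API can answer with HTTP 429 (Too Many Requests). The fixed `time.sleep` pauses in this notebook help, but a retry wrapper with exponential backoff handles bursts more gracefully. The sketch below assumes that a rate-limit error is an exception whose `http_status` attribute equals 429 (spotipy's `SpotifyException` carries such an attribute). The helper name `call_with_backoff` is hypothetical. Recent spotipy versions also accept retry settings such as `retries` and `status_retries` on the `spotipy.Spotify` constructor, so check your version before adding your own layer.

```python
import time


def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on HTTP 429.

    Assumption: rate-limit errors are exceptions with an `http_status`
    attribute equal to 429 (as on spotipy's SpotifyException).
    """
    for attempt in range(max_retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            is_rate_limited = getattr(exc, "http_status", None) == 429
            # re-raise anything that is not a rate limit, or when retries are used up
            if not is_rate_limited or attempt == max_retries:
                raise
            # wait base_delay * 1, 2, 4, ... seconds before the next attempt
            sleep(base_delay * 2 ** attempt)


# Usage (hypothetical): results = call_with_backoff(sp.playlist_tracks, playlist_id=plst_id)
```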

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The number of songs you collect per playlist grows very quickly, since every artist contributes all of their albums and every album all of its tracks!
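
The playlist → artists → albums → tracks strategy can be sketched as a loop that deduplicates at every level, so an artist who appears in several playlists is only expanded once. This is a minimal sketch: `crawl`, `get_artists` and `get_tracks` are hypothetical names; in this notebook the two callables would wrap `get_artists_from_playlist` and `get_tracks_from_artists_albums`.

```python
def crawl(playlist_ids, get_artists, get_tracks):
    """Collect unique track ids by expanding playlists -> artists -> tracks.

    get_artists(playlist_id) -> iterable of artist ids   (hypothetical callable)
    get_tracks(artist_id)    -> iterable of track ids    (hypothetical callable)
    """
    seen_artists = set()
    seen_tracks = set()
    track_ids = []
    for pl_id in playlist_ids:
        for artist_id in get_artists(pl_id):
            # an artist appearing in several playlists is only expanded once
            if artist_id in seen_artists:
                continue
            seen_artists.add(artist_id)
            for track_id in get_tracks(artist_id):
                # keep the first occurrence of every track id
                if track_id not in seen_tracks:
                    seen_tracks.add(track_id)
                    track_ids.append(track_id)
    return track_ids
```

Each level multiplies the number of API calls, which is why skipping already-expanded artists saves a lot of waiting.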

### 1. Library imports

In [60]:
import pandas as pd
import requests
import json
import time
import random
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

### 2. Authentication

In [61]:
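# Initialize SpotiPy with user credentials
# read the client id and secret from environment variables instead of hard-coding them in the notebook
import os

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id=os.environ["SPOTIPY_CLIENT_ID"],
                                                           client_secret=os.environ["SPOTIPY_CLIENT_SECRET"]))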


### 3. Fetching playlists
### 3.1. Featured playlists

In [62]:
# get featured playlists from Spotify
# note: the offset does not seem to work; possibly only 10 featured playlists are available at the moment
# inputs: number of playlists, outputs: list of playlist ids, df with playlist names and ids

def get_featured_playlists(n_playlists):
    playlist_names, playlist_ids = [], []
    # fetch 10 playlists at a time via the API (the default limit would be 20)
    for ofs in range(0,n_playlists,10):
        featured_pl_10 = sp.featured_playlists(locale='en_US', country='US', limit=10, offset=ofs)
        items = featured_pl_10['playlists']['items']
        # stop early when the API has no more playlists to return
        if not items:
            break
        
        # collect id and name of every playlist in this batch
        for item in items:
            playlist_ids.append(item['id'])
            playlist_names.append(item['name'])
        time.sleep(2)
    
    # keep at most n_playlists entries
    playlist_ids, playlist_names = playlist_ids[:n_playlists], playlist_names[:n_playlists]
    
    # overview df
    featured_playlists_total = pd.DataFrame({'name': playlist_names, 'ids': playlist_ids})
    return playlist_ids, featured_playlists_total
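
The playlist, track and album fetchers in this notebook all page through results with `limit` and `offset`. A generic offset paginator that stops as soon as a page comes back short or empty would also cover the case noted above where fewer playlists exist than requested. This is a sketch: `paginate` is a hypothetical helper and `fetch_page` a hypothetical callable taking `limit` and `offset`, for example `lambda limit, offset: sp.featured_playlists(limit=limit, offset=offset)['playlists']['items']`.

```python
def paginate(fetch_page, page_size, max_items):
    """Collect up to max_items items from an offset-based API.

    fetch_page(limit, offset) must return a list of items (hypothetical callable).
    """
    items = []
    offset = 0
    while len(items) < max_items:
        page = fetch_page(page_size, offset)
        # an empty page means the API has nothing more to give
        if not page:
            break
        items.extend(page)
        # a short page is the last page, so skip the extra request
        if len(page) < page_size:
            break
        offset += page_size
    return items[:max_items]
```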

In [63]:
featured_10_ids, featured_10 = get_featured_playlists(n_playlists=10)


In [64]:
featured_10_ids

['37i9dQZF1DX4JAvHpjipBk',
 '37i9dQZF1DX3LyU0mhfqgP',
 '37i9dQZF1DXcRXFNfZr7Tp',
 '37i9dQZF1DWUxHPh2rEiHr',
 '37i9dQZF1DWZryfp6NSvtz',
 '37i9dQZF1DX8CopunbDxgW',
 '37i9dQZF1DX0bUGQdz5BJG',
 '37i9dQZF1DWUzFXarNiofw',
 '37i9dQZF1DWT6SJaitNDax',
 '37i9dQZF1DWZoF06RIo9el']

### 3.2. User playlists (alternative)

In [70]:
# get playlists from a Spotify user
# note: the offset may not work as expected here either
# inputs: user id, optional: number of playlists (-1 = all), outputs: list of playlist ids, df with playlist names and ids

def get_user_playlists(user_id, n_playlists=-1):
    playlist_names, playlist_ids = [], []
    
    # fetch first 20 playlists 
    results = sp.user_playlists(user=user_id, limit=20)
    user_playlists = results['items']
    time.sleep(2)
    
    # n_playlists <= 0 (or more than exist) means: fetch all playlists of the user
    if n_playlists <= 0 or n_playlists > results['total']:
        n_playlists = results['total']
    
    # fetch playlist 21 - n_playlists
    for ofs in range(20,n_playlists,20):
        results = sp.user_playlists(user=user_id, limit=20, offset=ofs)
        user_playlists += results['items']
        time.sleep(2)
        
    # limit the number of playlists retrieved
    user_playlists = user_playlists[:n_playlists]
        
    # extract name and id
    for pl in user_playlists:
        playlist_ids.append(pl['id'])
        playlist_names.append(pl['name'])
        
    # overview df
    user_playlists_total = pd.DataFrame({'name': playlist_names, 'ids': playlist_ids})
    return playlist_ids, user_playlists_total
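
Instead of computing offsets by hand, every paging object returned by the API contains a `next` URL, and spotipy can follow it with `sp.next(results)`. A sketch of that approach, assuming `client` is a `spotipy.Spotify` instance such as `sp`; `collect_all_items` is a hypothetical helper name.

```python
def collect_all_items(client, first_page, max_items=None):
    """Follow a paging object's 'next' links until exhausted or max_items reached."""
    items = list(first_page["items"])
    page = first_page
    while page.get("next") and (max_items is None or len(items) < max_items):
        # client.next() requests the URL stored in page['next']
        page = client.next(page)
        items.extend(page["items"])
    return items if max_items is None else items[:max_items]


# Usage (assumes `sp` from the authentication cell):
# playlists = collect_all_items(sp, sp.user_playlists('spotify', limit=50), max_items=30)
```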

In [71]:
# get official Spotify playlists
# inputs: number of playlists, outputs: list of playlist ids, df with playlist names and ids

def get_spotify_playlists(n_playlists):
    spo_playlist_ids, spo_playlists_total = get_user_playlists(user_id='spotify', n_playlists=n_playlists)
    return spo_playlist_ids, spo_playlists_total

In [74]:
spotify_playlist_ids, spotify_playlists_total = get_spotify_playlists(n_playlists=30)

In [75]:
spotify_playlists_total

|    | name | ids |
|---:|:-----|:----|
| 0 | Today's Top Hits | 37i9dQZF1DXcBWIGoYBM5M |
| 1 | RapCaviar | 37i9dQZF1DX0XUsuxWHRQd |
| 2 | Hot Country | 37i9dQZF1DX1lVhptIYRda |
| 3 | ¡Viva Latino! | 37i9dQZF1DX10zKzsJ2jva |
| 4 | New Music Friday | 37i9dQZF1DX4JAvHpjipBk |
| 5 | Peaceful Piano | 37i9dQZF1DX4sWSpwq3LiO |
| 6 | Are & Be | 37i9dQZF1DX4SBhb3fqCJd |
| 7 | Rock Classics | 37i9dQZF1DWXRqgorJj26U |
| 8 | mint | 37i9dQZF1DX4dyzvuaRJ0n |
| 9 | Rock This | 37i9dQZF1DXcF6B6QPhFDv |


### 4. Fetching artists

In [49]:
# fetch all artists present in a playlist
# inputs: playlist id, outputs: list of unique artist ids

def get_artists_from_playlist(plst_id):
    # get first 100 tracks from playlist
    playlist = sp.playlist_tracks(playlist_id=plst_id)
    tracks = playlist['items']
    artist_ids = []

    # get track 101 - end from playlist
    for ofs in range(100,playlist['total'],100):
        playlist = sp.playlist_tracks(playlist_id=plst_id, offset=ofs)
        time.sleep(2)
        tracks += playlist['items']
    
    # extract artist ids, skipping missing tracks and artists without an id (e.g. local files)
    for track in tracks:
        if track['track'] is None:
            continue
        for a in track['track']['artists']:
            if a['id'] is not None:
                artist_ids.append(a['id'])
            
    return list(set(artist_ids))
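
Playlist items can contain `None` where a track used to be (for example removed or unavailable tracks), and local files list artists whose `id` is `None`. The extraction step can be isolated and checked offline against a hand-made response. The item shape below mirrors the `items` list returned by `sp.playlist_tracks`; the sample data is made up and `artist_ids_from_items` is a hypothetical helper.

```python
def artist_ids_from_items(items):
    """Return sorted unique artist ids from playlist items, skipping missing tracks and local files."""
    artist_ids = set()
    for item in items:
        track = item.get("track")
        # removed/unavailable entries may come back without a track object
        if track is None:
            continue
        for artist in track.get("artists", []):
            # local files have artists without a Spotify id
            if artist.get("id"):
                artist_ids.add(artist["id"])
    # sorting makes the result deterministic (a plain set has no stable order)
    return sorted(artist_ids)
```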


In [50]:
artist_ids = []

for playlist_id in spotify_playlist_ids[11:15]:
    pl_artist_ids = get_artists_from_playlist(plst_id=playlist_id)
    artist_ids += pl_artist_ids


In [51]:
len(artist_ids)

468
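
Each playlist's artist list is deduplicated only within that playlist, so `artist_ids` can still contain the same artist several times when an artist appears in more than one playlist, and the count above may include such repeats. A minimal, order-preserving way to drop them keeps the first occurrence of each id; `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(ids):
    """Drop repeated ids while keeping the position of each first occurrence."""
    # dicts preserve insertion order (Python 3.7+), so keys come out in first-seen order
    return list(dict.fromkeys(ids))


# Usage: artist_ids = unique_in_order(artist_ids)
```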

### 5. Fetching Tracks from Artists

In [52]:
# pick random items from multiple lists of the same length, picking items at the same positions in each list
# inputs: list of lists, number of items, outputs: list of lists with the selected items

def select_random_list_items(lists, n_items):
    # nothing to sample if the lists are not longer than n_items
    if len(lists[0]) <= n_items:
        return lists
    # draw n_items distinct positions; range(len(...)) makes every index, including the last, eligible
    items_selected = random.sample(range(len(lists[0])), n_items)
    # take the items at the selected positions from every list
    return [[l[x] for x in items_selected] for l in lists]
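
Picking the same random positions from several parallel lists is equivalent to sampling whole rows. A sketch that zips the lists into rows, samples them with a seeded `random.Random`, and transposes the result back. The `seed` parameter is an addition of this sketch, only there to make runs reproducible, and `sample_rows` is a hypothetical name.

```python
import random


def sample_rows(lists, n_items, seed=None):
    """Sample n_items aligned positions from parallel lists of equal length."""
    rows = list(zip(*lists))
    # keep everything when there are not more rows than requested
    if len(rows) <= n_items:
        return [list(l) for l in lists]
    rng = random.Random(seed)
    picked = rng.sample(rows, n_items)
    if not picked:
        return [[] for _ in lists]
    # transpose the sampled rows back into one list per input list
    return [list(col) for col in zip(*picked)]
```

Sampling rows keeps ids, names and artists aligned by construction, so there is no index bookkeeping to get wrong.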

In [53]:
# fetch albums of an artist (ToDo: max. 20 albums per artist)
# inputs: artist id, outputs: list of album ids

def get_albums_from_artists(art_id):
    # get first 20 albums from artist
    results = sp.artist_albums(artist_id=art_id, album_type='album', limit=20)
    albums = results['items']
    album_ids, album_names = [], []

    # get album 21 - end from artist
    for ofs in range(20,results['total'],20):
        results = sp.artist_albums(artist_id=art_id, album_type='album', limit=20, offset=ofs)
        albums += results['items']
        time.sleep(3)
    
    # extract album ids, skipping duplicates
    for album in albums:
        if album['id'] not in album_ids:
            album_ids.append(album['id'])
            album_names.append(album['name'])
        
    return album_ids
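
The ToDo (at most 20 albums per artist) could be handled after fetching by keeping only the most recent albums. This sketch assumes each album dict carries Spotify's `release_date` string (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`), which compares chronologically as a plain string closely enough for this purpose. `most_recent_albums` and its `max_albums` parameter are hypothetical.

```python
def most_recent_albums(albums, max_albums=20):
    """Return the ids of up to max_albums albums, newest release first, without duplicates."""
    # ISO-like date strings ('2019', '2019-05', '2019-05-01') compare chronologically as strings
    ordered = sorted(albums, key=lambda a: a.get("release_date", ""), reverse=True)
    album_ids = []
    for album in ordered:
        if len(album_ids) >= max_albums:
            break
        if album["id"] not in album_ids:
            album_ids.append(album["id"])
    return album_ids


# Usage inside get_albums_from_artists: album_ids = most_recent_albums(albums, max_albums=20)
```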

In [58]:
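# fetch tracks from all albums of an artist
# inputs: artist id, optional: number of random tracks to keep (n_items), outputs: list of track ids, df with artists, track names and ids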

def get_tracks_from_artists_albums(art_id, n_items=-1):
    # lists for tracks fetch and outputs
    tracks, track_ids, track_names, artists = [], [], [], []
    # fetch albums from artists
    artist_albums_list = get_albums_from_artists(art_id=art_id)
    
    for abm_id in artist_albums_list:
        # get first 50 tracks from album
        results = sp.album_tracks(album_id=abm_id, limit=50)
        tracks += results['items']
        time.sleep(2)
        
        # get track 51 - end from album
        for ofs in range(50,results['total'],50):
            results = sp.album_tracks(album_id=abm_id, limit=50, offset=ofs)
            tracks += results['items']
            time.sleep(2)

    # extract track info, skipping tracks whose name was already collected
    for track in tracks:
        if track['name'] not in track_names:
            # track id
            track_ids.append(track['id'])
            # track name
            track_names.append(track['name'])
            # track artists, joined as "Artist A, Artist B"
            artists.append(', '.join(a['name'] for a in track['artists']))
    
    # option to select random number of tracks from each artist
    if n_items > 0:
        if n_items < len(track_ids):
            track_info_lists = [track_ids, track_names, artists]
            track_info_lists_selected = select_random_list_items(lists=track_info_lists, n_items=n_items)
            track_ids = track_info_lists_selected[0]
            track_names = track_info_lists_selected[1]
            artists = track_info_lists_selected[2]
            
    # df overview
    tracklist_overview = pd.DataFrame({'artists': artists, 'track_name': track_names, 'track_id': track_ids})
        
    return track_ids, tracklist_overview
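
`get_tracks_from_artists_albums` makes at least one request per album. Spotify's "Get Several Albums" endpoint (`sp.albums` in spotipy) accepts up to 20 album ids per call and embeds the first page of each album's tracks, so a batched variant could cut the request count considerably. A sketch, assuming `client` is the `sp` object; `tracks_from_albums` is a hypothetical name, and albums with more tracks than the embedded page would still need `sp.album_tracks` with an offset, which this sketch leaves out.

```python
import time


def tracks_from_albums(client, album_ids, batch_size=20, pause=1.0):
    """Fetch the embedded track page of many albums, batch_size albums per request."""
    tracks = []
    for start in range(0, len(album_ids), batch_size):
        batch = album_ids[start:start + batch_size]
        response = client.albums(batch)
        for album in response["albums"]:
            # unknown ids may come back as None
            if album is None:
                continue
            tracks.extend(album["tracks"]["items"])
        # be gentle with the rate limit between batches
        time.sleep(pause)
    return tracks
```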

In [76]:
track_ids_10_20, overview_frames = [], []

for arst_id in artist_ids[50:70]:
    track_ids, many_tracks_overview = get_tracks_from_artists_albums(art_id=arst_id, n_items=100)
    track_ids_10_20 += track_ids
    overview_frames.append(many_tracks_overview)
    
# DataFrame.append was removed in pandas 2.0, so concatenate the per-artist frames once at the end
many_tracks_overview_10_20 = pd.concat(overview_frames, ignore_index=True)
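
Since a full crawl can run for hours, it can pay to persist progress so an interrupted run resumes instead of starting over. The notebook already imports `json`; the sketch below stores the processed artist ids and collected track ids in a JSON file. Both helpers and the file name `tracks_checkpoint.json` are hypothetical.

```python
import json
import os


def save_checkpoint(path, done_artist_ids, track_ids):
    """Write progress to disk via a temp file, so a crash cannot leave a half-written checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"done_artist_ids": sorted(done_artist_ids), "track_ids": list(track_ids)}, f)
    # os.replace swaps the file in one step (atomic on POSIX when on the same filesystem)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (done_artist_ids, track_ids), or empty collections if no checkpoint exists yet."""
    if not os.path.exists(path):
        return set(), []
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return set(data["done_artist_ids"]), data["track_ids"]


# Usage (hypothetical):
# done, track_ids_10_20 = load_checkpoint('tracks_checkpoint.json')
# for arst_id in artist_ids[50:70]:
#     if arst_id in done:
#         continue
#     track_ids, _ = get_tracks_from_artists_albums(art_id=arst_id, n_items=100)
#     track_ids_10_20 += track_ids
#     done.add(arst_id)
#     save_checkpoint('tracks_checkpoint.json', done, track_ids_10_20)
```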

In [56]:
len(track_ids_10_20)

577

In [57]:
len(many_tracks_overview_10_20)

577
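
The lab title also calls for audio features. spotipy's `sp.audio_features(tracks)` wraps Spotify's audio-features endpoint, which accepts at most 100 track ids per request and returns `None` for tracks it has no features for. Spotify restricted this endpoint for newly registered apps in late 2024, so check that it still responds for your credentials. A sketch that fetches features in batches, assuming `client` is the `sp` object; `fetch_audio_features` is a hypothetical name.

```python
import time


def fetch_audio_features(client, track_ids, batch_size=100, pause=1.0):
    """Map each track id to its audio-features dict (tracks without features are left out)."""
    features = {}
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        # one request for up to batch_size ids
        for feat in client.audio_features(batch):
            if feat is not None:
                features[feat["id"]] = feat
        time.sleep(pause)
    return features


# Usage (assumes sp, track_ids_10_20 and many_tracks_overview_10_20 from above):
# feats = fetch_audio_features(sp, track_ids_10_20)
# audio_df = pd.DataFrame(list(feats.values()))
# songs = many_tracks_overview_10_20.merge(audio_df, left_on='track_id', right_on='id', how='inner')
```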