
 - This notebook builds a dataset of track names, track IDs, and contributing artists.
 - The dataset, a pandas DataFrame, is then saved as a Parquet file.
 - Tracks are gathered from the playlists of some prominent EDM labels on Spotify,
  > e.g. Armada, Enhanced, Emergent, Anjunabeats, Silk Music, Spinnin' Records,
 - and from some EDM playlists curated by Spotify.
 - The only exception to the mainstream EDM genres included here is chillhop/lo-fi.
 - We also filter playlists by keyword. This is especially effective for Spotify's own playlists, since many of them are of no interest here.
    
 - All tracks are collected in one run except those from Anjunabeats and Silk Music (the reason is given in a note further down).
 - These tracks are then concatenated to the main output DataFrame.


In [1]:
%load_ext autoreload

In [2]:
from cap_package import SpotipyCollectPub as scp
from dotenv import load_dotenv
import os
import pandas as pd
from pathlib import Path


In [3]:
from IPython.utils.text import columnize

%autoreload 2
#%aimport cap_package.SpotipyCollect

In [4]:
def disp_col(list_):
    """Print the repr of each item in list_ in columns, at most 120 characters wide."""
    # requires: from IPython.utils.text import columnize
    items = [repr(x) + ',' for x in list_]
    print(columnize(items, displaywidth=120))

In [5]:
disp_col(dir(scp))

'SpotifyClientCredentials',  '__file__',     '__spec__',              'get_tracks',      're',                 
'__builtins__',              '__loader__',   'filterby_keyword',      'get_tracks_df',   'spotipy',            
'__cached__',                '__name__',     'get_artist_name',       'json_normalize',  'spotipy_client_cred',
'__doc__',                   '__package__',  'get_public_playlists',  'pd',              'uri_to_id',          



In [6]:
load_dotenv()
SPOTIPY_CLIENT_ID = os.getenv('CLIENT_ID')
SPOTIPY_CLIENT_SECRET = os.getenv('CLIENT_SECRET')

In [7]:
sp = scp.spotipy_client_cred(client_id=SPOTIPY_CLIENT_ID, client_secret=SPOTIPY_CLIENT_SECRET)

In [8]:
keywords = ['Progressive', 'melodic', 'chillout', 'deep', 'lofi', 'lo-fi', 'house', 'trance', 'trap', 'drum', 'bass',
            'electronic', 'techno', 'dubstep', 'releases', 'released']
exclude = ['rock', 'country', 'classical', 'Orchestra', 'pop', 'reggae', 'indie', 'punk', 'metal', 'soul']

In [9]:
user_uri = ['spotify:user:vcdikomxn7xuhp0ci0pn31pn9', 'spotify:user:anjunadeep', 'spotify:user:spinninrecordsofficial',\
            'spotify:user:enhanced_music', 'spotify:user:emergentmusic', 'spotify:user:armadamusicofficial',\
            'spotify:user:spotify']

In [10]:
user_id = scp.uri_to_id(user_uri)
user_id

['vcdikomxn7xuhp0ci0pn31pn9',
 'anjunadeep',
 'spinninrecordsofficial',
 'enhanced_music',
 'emergentmusic',
 'armadamusicofficial',
 'spotify']
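
Judging by the input and output above, `uri_to_id` keeps only the last colon-separated field of each URI. The following is a minimal sketch of that idea, not the code in `cap_package`; the name `uri_to_id_sketch` is hypothetical.

```python
def uri_to_id_sketch(uris):
    """Return the last colon-separated field of each Spotify URI.

    Hypothetical stand-in for scp.uri_to_id, inferred from its output only.
    """
    return [uri.rsplit(":", 1)[-1] for uri in uris]


print(uri_to_id_sketch(["spotify:user:anjunadeep",
                        "spotify:playlist:7mLu08jeo1UJRAPhcNo1NA"]))
```

The same helper would cover both `spotify:user:...` and `spotify:playlist:...` URIs, since only the final field is kept.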

In [11]:
users_pl = scp.get_public_playlists(sp, user_id)

In [12]:
# Let's see what the above output looks like
print( 'There are a total of {} users'.format(len(users_pl)))
print('Some public playlists of a user : ')
users_pl[3]

There are a total of 7 users
Some public playlists of a user : 


[('4PzgMtDpYGAKUD6HUtFTkN',
  'Enhanced Latest Releases',
  'The most recent Enhanced releases all in one playlist, updated weekly. Related Playlists: <a href="https://open.spotify.com/user/enhanced_music/playlist/0lfuRuuTMfrTkzaDOLFlYM?si=NFOw2zjoRQeWcl182lECZw">Dance &amp; EDM 2020</a>.'),
 ('4sjsye4xyuVeYFQh0MJlMG',
  'Electronic Chill',
  'A selection of Ambient and Electronica, lose yourself in these sonic gems.'),
 ('0lfuRuuTMfrTkzaDOLFlYM',
  'Dance & EDM 2020',
  'THE hottest tracks in Dance Music. Featuring the latest from Martin Garrix, Deadmau5, Kaskade, Tritonal, Alesso and more.'),
 ('2AYDglM1LTpPSn5cSRvGDM',
  'Witness The Fitness ',
  'Workout with Witness The Fitness for the ultimate gym playlist, featuring Regard, Gorgon City, Mabel and more! '),
 ('1JM1GoxgcTkyiKLJUJIYiu',
  'Progressive Trance',
  'Progressive and Melodic Trance with Above &amp; Beyond, Farius, Tritonal, ALPHA 9, Cosmic Gate and more. <a href="https://show.co/osrz29G">Submit here</a>.'),
 ('3CfUZ5lcV
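
The output above is one list per user, each holding `(id, name, description)` tuples. A function like `get_public_playlists` could produce that shape by paging through spotipy's `user_playlists` results. This is a sketch under that assumption, not the code in `cap_package`; `public_playlists_sketch` is a hypothetical name, and `sp` is assumed to be any client exposing `user_playlists` and `next`, as `spotipy.Spotify` does.

```python
def public_playlists_sketch(sp, user_ids):
    """Collect (id, name, description) tuples for each user's public playlists.

    Hypothetical stand-in for scp.get_public_playlists.
    """
    all_users = []
    for user in user_ids:
        playlists = []
        page = sp.user_playlists(user, limit=50)
        while page:
            for item in page["items"]:
                playlists.append(
                    (item["id"], item["name"], item.get("description", ""))
                )
            # Follow the paging link until the API reports no further page.
            page = sp.next(page) if page.get("next") else None
        all_users.append(playlists)
    return all_users
```

With the client created earlier, a call would look like `public_playlists_sketch(sp, user_id)`.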

In [13]:
# Let's filter these playlists based on the keywords we are looking for
filtered = scp.filterby_keyword(keywords, exclude, users_pl)
print('We can see some playlists have been filtered out of the list above')
filtered[3]

We can see some playlists have been filtered out of the list above


[('4PzgMtDpYGAKUD6HUtFTkN',
  'Enhanced Latest Releases',
  'The most recent Enhanced releases all in one playlist, updated weekly. Related Playlists: <a href="https://open.spotify.com/user/enhanced_music/playlist/0lfuRuuTMfrTkzaDOLFlYM?si=NFOw2zjoRQeWcl182lECZw">Dance &amp; EDM 2020</a>.'),
 ('4sjsye4xyuVeYFQh0MJlMG',
  'Electronic Chill',
  'A selection of Ambient and Electronica, lose yourself in these sonic gems.'),
 ('1JM1GoxgcTkyiKLJUJIYiu',
  'Progressive Trance',
  'Progressive and Melodic Trance with Above &amp; Beyond, Farius, Tritonal, ALPHA 9, Cosmic Gate and more. <a href="https://show.co/osrz29G">Submit here</a>.'),
 ('3CfUZ5lcVkkimVJxYOgWvd',
  'Tritonal Selects',
  'A collection of our latest releases and current favorites. Enjoy - Chad &amp; Dave')]
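
The before/after lists suggest that a playlist is kept when its name or description mentions at least one keyword and none of the excluded words. A minimal sketch of such a filter follows. It assumes case-insensitive substring matching, which may differ from what `filterby_keyword` actually does; `filter_playlists_sketch` is a hypothetical name, and the sample descriptions are abbreviated from the output above.

```python
def filter_playlists_sketch(keywords, exclude, users_playlists):
    """Keep playlists that mention a keyword and no excluded word.

    Hypothetical stand-in for scp.filterby_keyword; matching is an assumption.
    """
    kw = [k.lower() for k in keywords]
    ex = [e.lower() for e in exclude]
    filtered = []
    for playlists in users_playlists:
        kept = []
        for pl_id, name, description in playlists:
            text = f"{name} {description}".lower()
            if any(k in text for k in kw) and not any(e in text for e in ex):
                kept.append((pl_id, name, description))
        filtered.append(kept)
    return filtered


sample = [[
    ("4PzgMtDpYGAKUD6HUtFTkN", "Enhanced Latest Releases",
     "The most recent Enhanced releases all in one playlist, updated weekly."),
    ("0lfuRuuTMfrTkzaDOLFlYM", "Dance & EDM 2020",
     "THE hottest tracks in Dance Music."),
    ("1JM1GoxgcTkyiKLJUJIYiu", "Progressive Trance",
     "Progressive and Melodic Trance."),
]]
result = filter_playlists_sketch(["progressive", "trance", "releases"], ["rock", "pop"], sample)
print([name for _, name, _ in result[0]])
```

Substring matching is the simplest choice but can over-match: an excluded word such as `pop` would also reject a description containing "popular". Word-boundary regular expressions would be a stricter alternative.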

In [14]:
# Let's see some names of the filtered playlists (first three per user)
for p in filtered:
    disp_col([x[1] for x in p][:3])
    print('-'*100)

'Chill Beats Weekly 🦔 groove, relax, study (chill lofi jazzhop)',
'Chill Jazz Vibes 🎷 jazz beats to chillout (mellow jazzhop)',    
'Mellow Lofi 💙 melodic lofi hip hop beats to study to',          

----------------------------------------------------------------------------------------------------
'Anjunadeep New Releases (Deep House, Techno, Electronica)',  'Summer Chillout with Anjunadeep',
'Anjunadeep Recommends',                                    

----------------------------------------------------------------------------------------------------
'Spinnin’ Records Brand New',  "Progressive House - by Spinnin' Records",  "Deep House Hits - by Spinnin' Records",

----------------------------------------------------------------------------------------------------
'Enhanced Latest Releases',  'Electronic Chill',  'Progressive Trance',

----------------------------------------------------------------------------------------------------
'Emergent Music - New Releases',  'Emergent Musi

In [15]:
# extract playlist ids for input
pl_id = [x[0] for y in filtered for x in y]
print('There are a total of {} playlists'.format(len(pl_id)))

There are a total of 127 playlists


In [16]:
full_df = scp.get_tracks_df(sp, pl_id)

In [17]:
# the last index label is larger than the row count, since repeated tracks were removed without resetting the index
print('So far from the above playlists we have collected : {} tracks'.format(len(full_df)))
full_df

So far from the above playlists we have collected : 16074 tracks


Unnamed: 0,name,id,artists_name
0,Between Waves,7BlRx52vRmpTUXYLRiNZsR,Fletcher Reed
1,Late Summer,1JkwkL02sWvXXrZBxxueEJ,"Chill Beats Music, Teo"
2,Peace of Mind,7m9DwYCtmES5QkZzELMuIp,"Chill Beats Music, Hanimo"
3,After Hours,51LsiDyQKJbH5CVLHpsowc,"Chill Beats Music, Board-Man"
4,Standing Still,3pGk6zfgPFMEdTXhvEdP7t,"Chill Beats Music, kevatta"
...,...,...,...
19792,Kafka,6QlAW39LcyCNTxqsYjRBki,Ercos Blanka
19793,Branches,6q5QgqXvW1lgIBSdSihcVN,Nora En Pure
19794,Whispers,0V8okvZJmTFqnbsW9L1v1N,Valante
19795,Setting Sail - Chill Mix,3Am7bRVPs1DPbTkYQb9giH,Eastern Odyssey
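
Each row above holds a track name, a track ID, and the artist names joined with commas. One way to gather such rows is to page through spotipy's `playlist_items` for every playlist ID. The sketch below assumes exactly that and skips track IDs it has already seen; it is not the code behind `get_tracks_df`, and `track_rows_sketch` is a hypothetical name.

```python
def track_rows_sketch(sp, playlist_ids):
    """Return one dict per unique track: name, id, comma-joined artist names.

    Hypothetical stand-in for the collection step inside scp.get_tracks_df.
    """
    rows, seen = [], set()
    for pl_id in playlist_ids:
        page = sp.playlist_items(pl_id)
        while page:
            for item in page["items"]:
                track = item.get("track")
                # Skip removed or local entries and tracks already collected.
                if not track or not track.get("id") or track["id"] in seen:
                    continue
                seen.add(track["id"])
                rows.append({
                    "name": track["name"],
                    "id": track["id"],
                    "artists_name": ", ".join(a["name"] for a in track["artists"]),
                })
            page = sp.next(page) if page.get("next") else None
    return rows
```

Passing the result to `pd.DataFrame(rows)` would give a frame with the same three columns. Older spotipy releases expose the same endpoint as `playlist_tracks`.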


Let's include tracks from the Silk Music and Anjunabeats discographies.
Note: unlike most other labels, Silk Music is present on Spotify as an artist rather than a user,
and the Spotify Web API has no endpoint that gives access to an artist's public playlists.
Hence, we collect the URIs of these playlists manually.

In [18]:
uris = ['spotify:playlist:7mLu08jeo1UJRAPhcNo1NA', 'spotify:playlist:2WXcnnjBAVWh3pyPme2AzA',\
        'spotify:playlist:2Gv5OEPtSFXdVRIR9XcCYl', 'spotify:playlist:5YBoaqYu4nR1klcylFyPDO',\
        'spotify:playlist:0KOeFoxHzINtljRqURylxF', 'spotify:playlist:1CMTxvbJpzJ7aBSiyjqvkD']
id_list = scp.uri_to_id(uris)
df = scp.get_tracks_df(sp, id_list)

In [19]:
df.tail()

Unnamed: 0,name,id,artists_name
2898,Volume One - Anjuna Deep Mix,4qKcK7oKhB2oMF1Ny2T4Rr,Anjunabeats
2899,Volume One - Original Mix,1ZfyvweqqR3csr8K4ACjt4,Anjunabeats
2900,Volume One - Tease Dub,4GYyq7cPaTjyynNWpPkC9P,Anjunabeats
2901,Remember - Genix Remix,5pHHeT460Tb9HkTca1HY7v,"Gabriel & Dresden, Centre, Genix"
2902,Won't Sleep Tonight - Moody Dub Mix,3G2fWbQSseRglXGRhC99Jm,Super8 & Tab


In [20]:
# update full_df by appending the tracks from the manually collected playlists; duplicates are removed in a later cell
full_df = pd.concat([full_df, df], ignore_index=True)

In [21]:
# index has reset
full_df.tail()

Unnamed: 0,name,id,artists_name
18615,Volume One - Anjuna Deep Mix,4qKcK7oKhB2oMF1Ny2T4Rr,Anjunabeats
18616,Volume One - Original Mix,1ZfyvweqqR3csr8K4ACjt4,Anjunabeats
18617,Volume One - Tease Dub,4GYyq7cPaTjyynNWpPkC9P,Anjunabeats
18618,Remember - Genix Remix,5pHHeT460Tb9HkTca1HY7v,"Gabriel & Dresden, Centre, Genix"
18619,Won't Sleep Tonight - Moody Dub Mix,3G2fWbQSseRglXGRhC99Jm,Super8 & Tab


In [22]:
# drop rows that repeat an earlier (name, artists_name) pair, keeping the first occurrence
subdf = full_df[full_df.duplicated(subset=['name', 'artists_name'], keep='first')]
full_df = full_df.drop(subdf.index)
print('The final number of total tracks is:', len(full_df))

The final number of total tracks is: 18423
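
The concatenate-then-deduplicate step can be checked on a toy frame. The sketch below reuses three rows from the tables above as sample data and compares the `duplicated` + `drop` pattern with `drop_duplicates`, which is assumed here to be an equivalent shorthand for the same subset and `keep` setting.

```python
import pandas as pd

a = pd.DataFrame({
    "name": ["Branches", "Whispers"],
    "id": ["6q5QgqXvW1lgIBSdSihcVN", "0V8okvZJmTFqnbsW9L1v1N"],
    "artists_name": ["Nora En Pure", "Valante"],
})
b = pd.DataFrame({
    "name": ["Whispers", "Kafka"],
    "id": ["0V8okvZJmTFqnbsW9L1v1N", "6QlAW39LcyCNTxqsYjRBki"],
    "artists_name": ["Valante", "Ercos Blanka"],
})

# ignore_index=True renumbers the combined rows 0..3.
combined = pd.concat([a, b], ignore_index=True)

# Pattern used in the notebook: locate the repeats, then drop them by index.
dupes = combined[combined.duplicated(subset=["name", "artists_name"], keep="first")]
via_drop = combined.drop(dupes.index)

# Shorthand that should give the same rows and the same index labels.
via_drop_duplicates = combined.drop_duplicates(subset=["name", "artists_name"], keep="first")

print(list(via_drop.index))
print(via_drop.equals(via_drop_duplicates))
```

Both variants keep the original index labels, so the label of the dropped row is simply missing afterwards unless `reset_index(drop=True)` is applied.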


In [23]:
# the index is not reset here, so dropped duplicates leave gaps (e.g. 18618 is missing below)
#full_df = full_df.reset_index(drop=True)
full_df.tail()

Unnamed: 0,name,id,artists_name
18614,Volume One - Free State vs Dirt Devils Remix,03NSRR3kgp4oOKqGtTEjOZ,Anjunabeats
18615,Volume One - Anjuna Deep Mix,4qKcK7oKhB2oMF1Ny2T4Rr,Anjunabeats
18616,Volume One - Original Mix,1ZfyvweqqR3csr8K4ACjt4,Anjunabeats
18617,Volume One - Tease Dub,4GYyq7cPaTjyynNWpPkC9P,Anjunabeats
18619,Won't Sleep Tonight - Moody Dub Mix,3G2fWbQSseRglXGRhC99Jm,Super8 & Tab


Save the dataframe above as a Parquet file so it can be loaded later.

In [24]:
p = Path.cwd().joinpath('Dataset')
p.mkdir(exist_ok=True)

In [25]:
out_path = p.joinpath('musiclabels_tracks.parquet')
# write only if the file does not exist yet, so an earlier export is never overwritten
if not out_path.exists():
    full_df.to_parquet(out_path, engine='pyarrow')