All Spotify Web API requests require authorization. Register an app in your [dashboard](https://developer.spotify.com/dashboard) to get its credentials. When creating the app, select "Web API" and set `http://localhost:8888/callback` as the "Redirect URI" (as suggested in a [video](https://youtu.be/3RGm4jALukM) in the Spotipy [documentation](https://spotipy.readthedocs.io)). This notebook uses the client credentials flow, so only the client ID and client secret are needed. Set them as environment variables in the terminal from which you launch the notebook:

* `export SPOTIPY_CLIENT_ID=...`
* `export SPOTIPY_CLIENT_SECRET=...`
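If either variable is missing, the failure only shows up at the first API call. A minimal sketch of an early check, assuming the two variable names above (the helper name `missing_credentials` is hypothetical):

```python
import os

# Variable names read by spotipy's SpotifyClientCredentials when called without arguments
REQUIRED_VARS = ('SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET')

def missing_credentials(environ=os.environ):
    """Return the names of required credential variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
if missing:
    print('Set these variables before launching the notebook:', ', '.join(missing))
```

The function takes the environment as a parameter so it can be checked against any mapping, not only the live `os.environ`.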

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# SpotifyClientCredentials reads SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET from the environment
spotify = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())

Tracks are collected from playlists of different genres. A playlist's ID is the path segment after `/playlist/` in its URL. Some of the genres are:

* [pop](https://open.spotify.com/genre/0JQ5DAqbMKFEC4WFtoNRpw)
* [blues](https://open.spotify.com/genre/0JQ5DAqbMKFQiK2EHwyjcU)
* [country](https://open.spotify.com/genre/0JQ5DAqbMKFKLfwjuJMoNC)
* [rock](https://open.spotify.com/genre/0JQ5DAqbMKFDXXwE9BDJAr)
* [classical](https://open.spotify.com/genre/0JQ5DAqbMKFPrEiAOxgac3)
* [reggae](https://open.spotify.com/genre/0JQ5DAqbMKFJKoGyUMo2hE)
* [disco](https://open.spotify.com/genre/0JQ5DAqbMKFwT0m6BbKvGP) (listed under Dance/Electronic)
* [hiphop](https://open.spotify.com/genre/0JQ5DAqbMKFQ00XGBls6ym)
* [jazz](https://open.spotify.com/genre/0JQ5DAqbMKFAJ5xb0fwo9m)
* [metal](https://open.spotify.com/genre/0JQ5DAqbMKFDkd668ypn6O)

You can find more genres by searching on [Spotify](https://open.spotify.com/search).
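The genre links above open genre pages, not playlists; opening a playlist from one of those pages gives a URL of the form `https://open.spotify.com/playlist/<ID>`, often followed by a query string such as `?si=...`. A minimal sketch that extracts the ID, assuming that URL shape (`playlist_id_from_url` is a hypothetical helper):

```python
from urllib.parse import urlparse

def playlist_id_from_url(url):
    """Return the playlist ID that follows '/playlist/' in a Spotify playlist URL."""
    # urlparse separates the path from the query string, so '?si=...' is ignored
    segments = urlparse(url).path.strip('/').split('/')
    if 'playlist' not in segments:
        raise ValueError(f'not a playlist URL: {url}')
    i = segments.index('playlist')
    if i + 1 >= len(segments) or not segments[i + 1]:
        raise ValueError(f'no playlist ID in URL: {url}')
    return segments[i + 1]

print(playlist_id_from_url('https://open.spotify.com/playlist/37i9dQZF1DX6wfQutivYYr?si=abc'))
# → 37i9dQZF1DX6wfQutivYYr
```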

In [2]:
playlists = {  # genre -> playlist IDs; TODO re-choose them with some criteria
    'pop': ['37i9dQZF1DX6wfQutivYYr', '37i9dQZF1DX01NP73ErE8b'],
    'blues': ['37i9dQZF1DXd9rSDyQguIk', '37i9dQZF1DXcu3QLJudo4X'],
    'country': ['37i9dQZF1DWZBCPUIUs2iR', '37i9dQZF1DWTkxQvqMy4WW'],
    'rock': ['37i9dQZF1DWViUlcvfltyZ', '37i9dQZF1DWW9tK1GiTdMf'],
    'classical': ['37i9dQZF1DWWEJlAGA9gs0', '37i9dQZF1DWV0gynK7G6pD'],
    'reggae': ['37i9dQZF1DWSiyIBdVQrkk', '37i9dQZF1DXbSbnqxMTGx9'],
    'disco': ['37i9dQZF1DX3jWba5xiDhV', '37i9dQZF1DX1MUPbVKMgJE'],
    'hiphop': ['37i9dQZF1DX14EWeH2Pwf3', '37i9dQZF1DWYQKKZPHGppf'],
    'jazz': ['37i9dQZF1DX30D074EWuy7', '37i9dQZF1DX55mNLGdHona'],
    'metal': ['37i9dQZF1DWWOaP4H0w5b0', '37i9dQZF1DXe6bgV3TmZOL']}
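Since the playlist IDs are pasted by hand, a quick sanity check can catch copy-paste mistakes before any API call. A minimal sketch, assuming IDs are 22-character base-62 strings (which is what every ID in the dictionary above looks like); `check_playlists` is a hypothetical helper:

```python
import re
from collections import Counter

# Assumption: Spotify IDs are 22 base-62 characters (0-9, a-z, A-Z)
SPOTIFY_ID = re.compile(r'[0-9A-Za-z]{22}')

def check_playlists(playlists):
    """Return a list of problems: malformed IDs and IDs listed more than once."""
    problems = []
    counts = Counter(pid for ids in playlists.values() for pid in ids)
    for genre, ids in playlists.items():
        for pid in ids:
            if not SPOTIFY_ID.fullmatch(pid):
                problems.append(f'{genre}: malformed ID {pid!r}')
    problems += [f'duplicate ID {pid!r}' for pid, n in counts.items() if n > 1]
    return problems

# Toy input with one malformed and one duplicated ID
print(check_playlists({'pop': ['37i9dQZF1DX6wfQutivYYr'],
                       'rock': ['not-an-id', '37i9dQZF1DX6wfQutivYYr']}))
```

In the notebook, `check_playlists(playlists)` should return an empty list.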

In [3]:
from contextlib import suppress

#get features
features_ = []
for genre, playlists_ in playlists.items():
    for playlist in playlists_:
        tracks = []

        # get track IDs, preview URLs and genres
        for item in spotify.playlist_tracks(playlist)['items']:
            with suppress(TypeError):  # skip items with no track data (item['track'] is None)
                tracks.append({
                    'id': item['track']['id'],
                    'preview_url': item['track']['preview_url'],
                    'genre': genre})

        # get danceability values (audio_features returns None for tracks without features)
        ids = [track['id'] for track in tracks]
        for track, features in zip(tracks, spotify.audio_features(ids)):
            if features is None or features['id'] != track['id']:
                continue  # skip missing or mismatched features
            track['danceability'] = features['danceability']

        features_.extend(tracks)
len(features_)

1505
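The loop above reads only the first page of each playlist (spotipy's `playlist_tracks` returns at most 100 items by default), so longer playlists are truncated, and the Web API accepts at most 100 IDs per audio-features request. A sketch of both fixes, assuming spotipy's `Spotify.next(page)` fetches the page named by the `next` field; the helper names are hypothetical:

```python
def all_playlist_items(sp, playlist_id):
    """Yield every item of a playlist, following the 'next' link of each page."""
    page = sp.playlist_tracks(playlist_id)  # first page
    while page is not None:
        yield from page['items']
        page = sp.next(page) if page['next'] else None

def batched(seq, size):
    """Yield consecutive slices of seq with at most size elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def danceability_by_id(sp, track_ids, batch_size=100):
    """Map track ID -> danceability, requesting at most batch_size IDs per call."""
    ids = [tid for tid in track_ids if tid]  # drop None IDs (e.g. local files)
    result = {}
    for batch in batched(ids, batch_size):
        for features in sp.audio_features(batch):
            if features is not None:  # None: no audio features for that ID
                result[features['id']] = features['danceability']
    return result
```

Usage would be `items = list(all_playlist_items(spotify, playlist))` followed by `danceability_by_id(spotify, [item['track']['id'] for item in items if item['track']])`. Keying the result by ID removes the need to keep the track list and the features list aligned.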

In [4]:
import os
import pandas as pd

# save features (tracks without a preview URL or danceability are dropped)
data_dir = 'data'
os.makedirs(data_dir, exist_ok=True)
features_path = os.path.join(data_dir, 'features.csv')
features = pd.DataFrame(features_).dropna().drop_duplicates('id', ignore_index=True)
features.sample(frac=1, random_state=42).to_csv(features_path, index=False)  # shuffle rows, then save
features = pd.read_csv(features_path)
features.tail(2)

|     | id | preview_url | genre | danceability |
|----:|----|-------------|-------|-------------:|
| 956 | 35QTv0RV3hMqSDAYeWsQaf | https://p.scdn.co/mp3-preview/4194cd2c8720672b... | classical | 0.324 |
| 957 | 4YAtqSH9gdzlNhVMUVmZMy | https://p.scdn.co/mp3-preview/9c0572e96bc7e099... | blues | 0.248 |
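The cleaning line keeps only complete rows and one row per track ID. Because `drop_duplicates` keeps the first occurrence by default, a track that appears in playlists of two genres keeps the genre of the playlist processed first (dictionary order). A toy example with made-up rows, not real Spotify data, illustrates both effects:

```python
import pandas as pd

# Made-up rows for illustration only
rows = [
    {'id': 'a', 'preview_url': 'https://example.com/a.mp3', 'genre': 'pop', 'danceability': 0.7},
    {'id': 'b', 'preview_url': None, 'genre': 'pop', 'danceability': 0.5},  # no preview URL
    {'id': 'c', 'preview_url': 'https://example.com/c.mp3', 'genre': 'rock'},  # no danceability
    {'id': 'a', 'preview_url': 'https://example.com/a.mp3', 'genre': 'disco', 'danceability': 0.7},  # same track again
]
cleaned = pd.DataFrame(rows).dropna().drop_duplicates('id', ignore_index=True)
print(cleaned[['id', 'genre']].to_dict('records'))
# → [{'id': 'a', 'genre': 'pop'}]
```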


In [5]:
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

audio_dir = os.path.join(data_dir, 'audio')
os.makedirs(audio_dir, exist_ok=True)

def download(row):
    """Save the track's preview clip as <audio_dir>/<id>.mp3 and print a progress dot."""
    audio_path = os.path.join(audio_dir, f"{row['id']}.mp3")
    urlretrieve(row['preview_url'], audio_path)
    print('.', end='')

# download previews; list() consumes every result, so an error from any download is re-raised
with ThreadPoolExecutor() as executor:
    rows = [row for _, row in features[-2:].iterrows()]  # TODO use all data after debugging
    list(executor.map(download, rows))

..
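`Executor.map` stores each call's exception and re-raises it only when the iterator reaches that position, so calling `next()` once checks the first download only. A self-contained sketch with a stand-in function (`work` is hypothetical) shows the difference:

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    """Stand-in for download(): fails on one input."""
    if x == 2:
        raise ValueError(f'failed on {x}')
    return x * 10

with ThreadPoolExecutor() as executor:
    results = executor.map(work, [0, 1, 2, 3])
    first = next(results)  # only looks at the first call, which succeeded

caught = None
try:
    list(results)  # consuming the remaining results re-raises the stored ValueError
except ValueError as error:
    caught = str(error)

print(first, caught)  # → 0 failed on 2
```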

In [6]:
from torch.utils.data import Dataset
from PIL import Image

class SpectrogramsDataset(Dataset):

    def __init__(self, spec_dir, features_path, target='genre', transform=None):
        self.spec_dir = spec_dir
        self.features = pd.read_csv(features_path)[-2:] #TODO use all features after debugging
        self.target = target
        self.transform = transform

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        row = self.features.iloc[i]
        spec_path = os.path.join(self.spec_dir, f"{row['id']}.png")
        with Image.open(spec_path) as img:  # close the file, as torchvision's ImageFolder loader does
            spec = img.convert('RGB')  # 3 channels, as torchvision.datasets.ImageFolder
        if self.transform:
            spec = self.transform(spec)
        label = row[self.target] #TODO if target=='genre' return class index instead (create self.classes as in ImageFolder)
        return spec, label
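For the TODO in `__getitem__`, one option is to mirror the `classes` and `class_to_idx` attributes of `torchvision.datasets.ImageFolder`, which sorts class names alphabetically. A minimal sketch, assuming the mapping is built from the full CSV (before the debugging slice `[-2:]`) so it does not depend on which rows are loaded; `make_class_index` is a hypothetical helper and the genre column below is toy data:

```python
import pandas as pd

def make_class_index(labels):
    """Return (classes, class_to_idx) with classes sorted alphabetically, as ImageFolder does."""
    classes = sorted(set(labels))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx

# Toy genre column (made up); in the dataset this would be the 'genre' column of the full CSV
genres = pd.Series(['rock', 'pop', 'rock', 'jazz'])
classes, class_to_idx = make_class_index(genres)
labels = genres.map(class_to_idx).tolist()
print(classes, labels)  # → ['jazz', 'pop', 'rock'] [2, 1, 2, 0]
```

In `__init__`, the dataset could store `self.classes, self.class_to_idx = make_class_index(...)`, and `__getitem__` could then return `self.class_to_idx[row['genre']]` when `target == 'genre'`.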

# TODO

* spectrograms
* TODOs
* Git LFS