# Song data download
This notebook downloads and saves song data based on your inputs. Each step below explains what you need to provide.
Depending on the number of songs in the playlists you provide, this can take a long time. To skip the download and use the sample data provided instead, go straight to the next notebook.

## 1. Config load and Spotify Authentication
The following cell loads the Spotify and Last.fm credentials that you specified in the config.json file.
You will need:
- A Spotify username
- A Spotify client ID and secret (https://developer.spotify.com/documentation/general/guides/app-settings/)
- A Last.fm username and password
- A Last.fm API key and secret (https://www.last.fm/api)

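Once you have filled out your config file, run the cell below. It sends you to the Spotify authorization page and then redirects you to google.com. Copy the full URL you were redirected to and paste it into the prompt in the cell output. The redirect URI `https://google.com/` must also be listed in your Spotify app's settings.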
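Before running the cell, it can help to check that config.json contains every key the notebook reads. The sketch below is illustrative only: `missing_config_keys` is a hypothetical helper, and all values in `example_config` are placeholders rather than real credentials. The key names are the ones the cell below looks up.

```python
import json

# Keys the notebook reads from config.json
REQUIRED_KEYS = [
    'spotify_username', 'client_id', 'client_secret',
    'lastfm_username', 'lastfm_password',
    'LASTFM_API_KEY', 'LASTFM_API_SECRET',
]

def missing_config_keys(config):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]

# Placeholder config in the expected shape; replace every value with your own
example_config = json.loads("""
{
    "spotify_username": "your-spotify-username",
    "client_id": "your-client-id",
    "client_secret": "your-client-secret",
    "lastfm_username": "your-lastfm-username",
    "lastfm_password": "your-lastfm-password",
    "LASTFM_API_KEY": "your-lastfm-api-key",
    "LASTFM_API_SECRET": "your-lastfm-api-secret"
}
""")

print(missing_config_keys(example_config))
```

An empty list means every required key is present and non-empty.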

In [None]:
import spotipy
import spotipy.util as util
import json
import pandas as pd
import numpy as np

with open('config.json') as json_file:
    data = json.load(json_file)

spotify_username = data['spotify_username']
client_id = data['client_id']
client_secret = data['client_secret']
lastfm_username = data['lastfm_username']
lastfm_password = data['lastfm_password']
LASTFM_API_KEY = data['LASTFM_API_KEY']
LASTFM_API_SECRET = data['LASTFM_API_SECRET']

scope = 'user-library-read'

token = util.prompt_for_user_token(spotify_username, scope, client_id, client_secret,
                                   redirect_uri='https://google.com/')
if token:
    sp = spotipy.Spotify(auth=token)
    print("login successful.")
else:
    print("Can't get token for", spotify_username)

songs = []
song_ids = []

full_genres = [['soundtrack'],
               ['jazz'],
               ['classical'],
               ['metal'],
               ['indietronica','wave','synth'],
               ['downtempo','trip hop'],
               ['edm','electronica','idm','dubstep','techno'],
               ['house'],
               ['r&b','rnb','soul'],
               ['rock'],
               ['hip hop','rap','trap','hiphop'],
               ['pop']]

## 2. Import/Export functions
The following cell defines a few functions for saving and loading data. If you already have a song list or dataframe from a previous run, you can use these functions to import it.
If you are starting from scratch, simply run the cell so that you can export your final dataframe later.
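To illustrate the round trip, here is a minimal sketch of saving and reloading a song list as JSON. It is not the notebook's own implementation: `save_song_list` and `load_song_list` are hypothetical variants that take and return values instead of using global variables, and the song row is made up. The row layout (name, URI, artist, album, genre pair) follows the one built in step 3.

```python
import json
import os
import tempfile

def save_song_list(songs, filename):
    """Write the song rows to a JSON file."""
    with open(filename, "w", encoding="utf8") as outfile:
        json.dump(songs, outfile)

def load_song_list(filename):
    """Read the song rows back and return them together with their URIs."""
    with open(filename, "r", encoding="utf8") as infile:
        songs = json.load(infile)
    song_ids = [row[1] for row in songs]
    return songs, song_ids

# Made-up row: name, URI, artist, album, [genre index, genre name]
example_songs = [["Example Song", "spotify:track:example", "Example Artist",
                  "Example Album", [9, "rock"]]]

# Save to a temporary directory and load the list back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "song_list_example.txt")
    save_song_list(example_songs, path)
    restored_songs, restored_ids = load_song_list(path)

print(restored_ids)
```

Passing the list in and returning it avoids depending on which cells have already run, at the cost of a slightly longer call.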

In [2]:
def saveSongList(filename):
    with open(filename, "w", encoding="utf8") as outfile:
        global songs
        json.dump(songs,outfile)

def loadSongList(filename):
    global songs
    with open(filename, 'r', encoding="utf8") as myfile:
        songs = json.load(myfile)
    extractSongIds()

def extractSongIds():
    global song_ids
    song_ids = [row[1] for row in songs]

def saveDataFrame(filename):
    df.to_pickle(filename)

def loadDataFrame(filename):
    # Assign the global df; a plain assignment would only create a local variable
    global df
    df = pd.read_pickle(filename)

## 3. Download song metadata from Spotify playlists
Run this cell with the list of playlists you want to use as training sets. For each playlist, you need the username of the playlist owner and the unique ID at the end of the playlist URI. To find this ID, open the playlist, select 'Share', and then select 'Copy URI'. The ID is the final segment of the copied URI.
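As a small illustration, the ID can be pulled out of a copied URI by taking its final colon-separated segment. This is only a sketch: `playlist_id_from_uri` is a hypothetical helper that is not part of the notebook, and the two URI shapes shown are assumptions about what 'Copy URI' produces.

```python
def playlist_id_from_uri(uri):
    """Return the final colon-separated segment of a Spotify playlist URI."""
    return uri.strip().split(':')[-1]

# Both assumed URI shapes end with the playlist ID
print(playlist_id_from_uri('spotify:user:thesoundsofspotify:playlist:2RmQ1WAONeEih0FWEK7CW5'))
print(playlist_id_from_uri('spotify:playlist:2RmQ1WAONeEih0FWEK7CW5'))
```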

In [None]:
playlists = [
['thesoundsofspotify', '2RmQ1WAONeEih0FWEK7CW5'],
# ['thesoundsofspotify', '5EyFMotmvSfDAZ4hSdKrbx'],
# ['thesoundsofspotify', '3HYK6ri0GkvRcM6GkKh0hJ'],
# ['thesoundsofspotify', '3pBfUFu8MkyiCYyZe849Ks'],
# ['thesoundsofspotify', '0yqVOsxA2U4P260ad60QuU'],
# ['thesoundsofspotify', '5zxj55xrB53As97uvt5x8x'],
# ['thesoundsofspotify', '3pDxuMpz94eDs7WFqudTbZ'],
# ['thesoundsofspotify', '6AzCASXpbvX5o3F8yaj1y0'],
# ['thesoundsofspotify', '1rLnwJimWCmjp3f0mEbnkY'],
# ['thesoundsofspotify', '7dowgSWOmvdpwNkGFMUs6e'],
# ['thesoundsofspotify', '6MXkE0uYF4XwU4VTtyrpfP'],
# ['thesoundsofspotify', '6gS3HhOiI17QNojjPuPzqc']
]

for j,playlist in enumerate(playlists): 
    playlist_data = sp.user_playlist(playlist[0],playlist[1])
    playlist_name = playlist_data['name']
    playlist_size = playlist_data['tracks']['total']

    print('*********************',playlist_name,'|',j, 'of ', len(playlists), ' playlists','*********************')
    
    for i in range(0,playlist_size,100):
        print(i,'out of',playlist_size)
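        playlist_items = None
        # Retry a few times; reassigning i inside a for loop would not repeat the request
        for attempt in range(3):
            try:
                playlist_items = sp.user_playlist_tracks(playlist[0], playlist[1],offset=i)['items']
                break
            except Exception:
                print('Connection error, retrying')
        if playlist_items is None:
            print('Skipping tracks', i, 'to', i+99)
            continue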
        for item in playlist_items:
            song = []
            if item is None:
                continue
            if item['track'] is None:
                continue
            song.append(item['track']['name'])
            song.append(item['track']['uri'])
            song.append(item['track']['artists'][0]['name'])
            song.append(item['track']['album']['name'])
            artist_id = item['track']['artists'][0]['uri']
            try:
                artist_info = sp.artist(artist_id)
            except:
                continue
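            song.append([len(full_genres),'other'])
            matchFound = False
            # genre_index avoids reusing j, which already counts playlists in the outer loop
            for genre_index,genres in enumerate(full_genres):
                for artist_genre in artist_info['genres']:
                    for genre in genres:
                        # Keywords such as 'wave' or 'rock' are matched inside longer
                        # Spotify genre names such as 'new wave' or 'indie rock'
                        if genre in artist_genre:
                            song[-1] = [genre_index,genres[0]]
                            matchFound = True
                            break
                    if matchFound:
                        break
                if matchFound:
                    break
            if matchFound:
                songs.append(song)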
print(len(songs))

ids = [row[1] for row in songs]

## 4. Download song genres from LastFM
Using Last.fm, we look up the top tags for each track and check whether any of them match our list of accepted genres. A matching tag replaces the genre assigned from Spotify in the previous step. This step uses the Last.fm credentials provided in config.json.
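The matching itself is a substring search of each genre keyword inside a tag. The sketch below shows the idea on plain strings, without calling Last.fm. `match_genre` is a hypothetical helper, and lowercasing the tag and replacing hyphens with spaces is an assumption about how tags should be normalized. The genre list is copied from step 1.

```python
full_genres = [['soundtrack'],
               ['jazz'],
               ['classical'],
               ['metal'],
               ['indietronica','wave','synth'],
               ['downtempo','trip hop'],
               ['edm','electronica','idm','dubstep','techno'],
               ['house'],
               ['r&b','rnb','soul'],
               ['rock'],
               ['hip hop','rap','trap','hiphop'],
               ['pop']]

def match_genre(tag, genre_groups):
    """Return [group index, group name] for the first keyword found in the tag, else None."""
    normalized = tag.lower().replace('-', ' ')  # assumed normalization
    for index, keywords in enumerate(genre_groups):
        for keyword in keywords:
            if keyword in normalized:
                # The first keyword of a group serves as the group's name
                return [index, keywords[0]]
    return None

for example_tag in ['Hip-Hop', 'synthpop', 'polka']:
    print(example_tag, '->', match_genre(example_tag, full_genres))
```

Because groups are checked in order, a tag such as 'synthpop' lands in the 'indietronica' group (via 'synth') before 'pop' is ever considered.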

In [None]:
import pylast
import math
import re

password_hash = pylast.md5(lastfm_password)

network = pylast.LastFMNetwork(api_key=LASTFM_API_KEY, api_secret=LASTFM_API_SECRET,
                               username=lastfm_username, password_hash=password_hash)


# Print progress roughly every 5% of songs; max() avoids a modulo by zero for fewer than 20 songs
progress_step = max(1, len(songs)//20)

for i in range(len(songs)):
    if i % progress_step == 0:
        print('*********************',i, 'of ', len(songs), ' songs','*********************')
    song = songs[i]
    track = network.get_track(song[2],song[0])
    try:
        topItems = track.get_top_tags(limit=3)
    except Exception:
        topItems = []
    matchFound = False
    for item in topItems:
        # Normalize the tag name so that e.g. 'Hip-Hop' matches 'hip hop'
        tag = item.item.get_name().lower().replace('-',' ')
        for j,genres in enumerate(full_genres):
            for genre in genres:
                if genre in tag:
                    song[-1] = [j,genres[0]]
                    matchFound = True
                    break
            if matchFound:
                break
        if matchFound:
            break

# saveSongList('song_list_with_lastfm_tags.txt')

ids = [row[1] for row in songs]

## 5. Construct the final dataframe of audio features
Run the cell below to build a dataframe with one row per song and one column per audio feature, plus the genre label. The next notebook uses this dataframe, so don't forget to export it in the following step!
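Audio features are requested in batches because Spotify accepts at most 100 track IDs per audio-features request. The sketch below shows only the batching pattern, without calling Spotify. `chunked` is a hypothetical helper and the IDs are made up.

```python
def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Made-up track IDs standing in for the URIs collected in step 3
example_ids = ['spotify:track:example%d' % n for n in range(250)]

batch_sizes = [len(batch) for batch in chunked(example_ids)]
print(batch_sizes)
```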

In [None]:
feature_names = ['danceability','energy','loudness','speechiness',
                 'acousticness','instrumentalness','liveness','valence','tempo']
column_names = feature_names + ['genre']

# Request audio features in batches of 100 tracks
raw_audio_features = []
print(len(songs))
print(len(ids))
for k in range(0,len(songs),100):
    raw_audio_features.extend(sp.audio_features(ids[k:k+100]))

# Collect rows in a list; calling np.append per song copies the whole array each time
rows = []
for i,feature in enumerate(raw_audio_features):
    if feature is None:
        # Spotify returns None for tracks without audio features. These songs are
        # skipped, so the dataframe can have fewer rows than the song list.
        print('No audio features for', songs[i][0])
        continue
    rows.append([feature[name] for name in feature_names] + [songs[i][-1][0]])

audio_features = np.array(rows, dtype=float).reshape(-1, len(column_names))

df = pd.DataFrame(data=audio_features, columns=column_names)
print(df.size)
df[:10]

## 6. Export the dataframe and song list
Save the dataframe and the song list with the functions defined in step 2. You are ready to train!

In [None]:
saveSongList('song_list_final.txt')
saveDataFrame('dataframe.pkl')