# Getting song information from spotify

In [1]:
import pandas as pd
import time
import sys
sys.path.append('../../Source')
from source import *

There is already a list of songs from a previous notebook, so we use that to search Spotify.

In [2]:
df_songs = pd.read_fwf('../../Data/Raw/unique_songs.txt', header=None, names=['SongAndArtist'])
print(df_songs.describe())

                                            SongAndArtist
count                                              204470
unique                                             204428
top     Nightwish - All the Works of Nature Which Ador...
freq                                                    8


For every song name we have, we search spotify and get the top result. That might lead to duplicates, but they should be rare enough to not matter and we can easily filter them out using the spotify id.
<br>
We then save them into Data/Raw/unique_songs.txt

20 hours estimate for all 200k songs - first 1k in 6 mins; first 10k in 58mins
<br>
However, Spotify API limits this to 25k songs a day, in about 150 minutes.

In [3]:
access_token = get_token()
with open('../../Data/Raw/spotify_song_data_3.txt', "a+", encoding="utf-8") as f:
    for index, row in df_songs.iloc[50000:75000].iterrows():
        this_song = (index, row['SongAndArtist'])
        if index % 1000 == 0:
            print("Index: ", index)
        # The token expires in around an hour, so we need to get a new one every so often
        if index % 8000 == 0:
            access_token = get_token()
        try:
            track_info = get_tracks(access_token, row['SongAndArtist'], limit=1, offset=0)
            # print(track_info.json()['tracks']['items'])
            f.write(str(track_info.json()['tracks']["items"][0]) + '\n')
        except Exception as e:
            print("Failed to get a track: ", this_song)
            try:
                print("With Python throwing: ", e)
                print("With the response message: ", track_info.json())
            except Exception as e:
                # We don't get the tempalted response from spotify, meaning we get a error message instead
                # Very likely just a 429 error, so we wait a bit and try again
                # Or in some cases other issues, leading to 529/other errors
                print(track_info)
                time.sleep(300)
            access_token = get_token()

        time.sleep(0.1)

Access token: BQC5xglCAwBTDzEB4v7sFbONUM_sIOEMUZ14aqMyezHGKkzELvExBW2-Ox3JEFCzSo6JEbPDixvQ9GsWjDUEJ_nafighg_Pm1FPrVlIvr5AwlNLeVGr_WY7ITROuJWcb9iRAT5ajE3I
Index:  50000
Failed to get a track:  (50411, 'W Sound - Mi Novio Tiene Novia - W Sound 02')
With Python throwing:  Expecting value: line 1 column 1 (char 0)
<Response [504]>
Access token: BQDISlpIwKPqwW-M_KgvL7ti_EwGB3HpP1WDIpj1a_3UnQ3GXYjPDvx07K8gHSXR8Yti3hF7yeXocjI2eLwPP-6TUSrj8g2W7P8Q5V2Hf6TdrxY4lQx4_QRZZ-tEoLVB3ozJbw2pUgk
Failed to get a track:  (50619, 'Ricardo Ray - Bella Es la Navidad')
With Python throwing:  ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
With the response message:  {'tracks': {'href': 'https://api.spotify.com/v1/search?offset=0&limit=1&query=Los%20Graduados%20-%20La%20Pelea%20Del%20Siglo&type=track', 'limit': 1, 'next': 'https://api.spotify.com/v1/search?offset=1&limit=1&query=Los%20Graduados%20-%20La%20Pelea%20Del%20Siglo&type=track', 'offset': 0, 'previous': Non

Songs we couldn't get the first time around, either due to internet issues or spotify not finding anything
<br>
We can get those once we get all the others

10907, 'Milky Chance - Stolen Dance'
13570, 'Stephen Dawes - Teenage Dream'
17994, 'Kollegah - Der Boss is Back'
18253, "Destiny's Child - 8 Days of Christmas - Live"

25294, 'Gert Verhulst - Altijd', 25446, "Vald - J'pourrai", 27775, 'Le classico organisé - À la rue marié', 33373, 'IZA - Pesadão (Participação especial Marcelo Falcão)', 38297, 'Thiago Brava - Lei do Desapego', 40740, 'Basta - Моя Вселенная', 46643, 'Lo & Leduc - Gschirr'

50411, 'W Sound - Mi Novio Tiene Novia - W Sound 02', 50619, 'Ricardo Ray - Bella Es la Navidad', 50685, 'Andy Rivera - Pa Que Me Recuerdes', 56583, 'Revolverheld - Lass uns gehen - Single Version', 57678, 'pd stone - Telephone Box', 59386, 'Musso - Paris', 64980, 'Troels Gustavsen - GIV MIG TID', 65950, 'Matvey Emerson - I Know You Care - Radio Mix', 65972, 'George Michael - You Have Been Loved', 67019, 'Lander Rey - La Compe', 70238, 'kewin - 1idee', 73486, 'Pole. - Roma'