# Get the song ID's

In this file, we get the song id's and do a little bit of data preprocessing by cleaning out duplicates.
We use the Spotify API library called Spotipy to easily access the data. While manually making the GET- requests would result in more clean data and more control, this is by far the easiest way to gain access to the API.

In the end, we write the ID's down to a CSV file so we don't have to get them again once we have them. This also allows us to 

In [1]:
## Import all the necessary libraries
from requests import HTTPError
import spotipy as spy
import pandas as pd
import csv

In [2]:
## The secret must be hidden so not to compromise the app
with open('secret.txt') as s:
    client_secret = next(s)
s.close()
client_id = '66b50b394b214bd399d67a70be01a80e'
## By default use a playlist which contains roughly 10,000 songs
user = 'Willis Orr'
playlist_uri = 'spotify:playlist:5S8SJdl1BDc0ugpkEvFsIL'

In [3]:
## Set up the credentials manager
credentials_manager = spy.oauth2.SpotifyClientCredentials(client_id, client_secret)
sp = spy.Spotify(client_credentials_manager=credentials_manager)

In [4]:
## Get all the song id's from the playlist. The structure is quite complex, but there's practically nothing in this point we can do.
## Append all the ID's to a list that we'll later on use to write everything down.
pl_t = sp.user_playlist_tracks(user, playlist_uri)
tmp = pl_t['items']
ids = []
while pl_t['next']:
    pl_t = sp.next(pl_t)
    tmp.extend(pl_t['items'])
    for s in tmp:
        ids.append(s["track"]["id"])

In [5]:
## Do the first step of preprocessing: delete duplicate ID's from the list by converting it to a set and then back to a list.
i = len(ids)
print("Amount of rows:", i)
if len(ids) != len(set(ids)):
    ids = list(set(ids))
    print("Duplicates found, {} rows cleaned.".format(i-len(ids)))
    print("The remaining rows: ", len(ids))
else:
    print("No duplicates")

Amount of rows: 504900
Duplicates found, 494940 rows cleaned.
The remaining rows:  9960


In [6]:
## Finally, write down the ID's in the list.
## In this part, we can also check if the given ID already is in the list so we don't end up adding it again.
## If we would like to do this process to multiple playlists, this could come in handy.
with open('Datasets/ids.csv', 'r') as r:
    old_ids = []
    reader = r.read()
    for item in reader:
        old_ids.append(item)
r.close()
with open('Datasets/ids.csv', 'w') as w:
    writer = csv.writer(w)
    if "track_id" not in old_ids:
        writer.writerow(["track id"])
    for i in range(len(ids)):
        if ids[i] not in old_ids:
            writer.writerow([ids[i]])
w.close()