# SPOTIFY ALBUM DATA

[How to get data from Spotify](https://github.com/saracoop/modeling_spotify_api_data)  
[Spotify API](https://developer.spotify.com/documentation/web-api/)  
[Spotify Dashboard](https://developer.spotify.com/dashboard/applications)  
[How to get Spotify Playlist id](https://clients.caster.fm/knowledgebase/110/How-to-find-Spotify-playlist-ID.html)  
[spotipy](https://spotipy.readthedocs.io/en/2.6.1/)  

In [1]:
import os
import pandas as pd
import numpy as np
import json
import random, string
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm
%matplotlib inline

In [2]:
# Settings for pandas
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Settings for seaborn
sns.set(rc = {'figure.figsize':(20,8)})

In [3]:
import warnings

warnings.filterwarnings('ignore')  # suppress all warnings

## Loading the playlists

In [4]:
path = 'C:\\Users\\admin\\Documents\\Academics\\JupyterLab\\CUAI 동계 컨퍼런스\\album\\'

In [5]:
playlist_id_list = pd.read_csv(path+'playlist\\angry_id-2.csv')
playlist_id_list

| | emotion | id | playlist name |
|---|---|---|---|
| 0 | angry | 2pPLItxjDyaqN4yXhQsFya | Annoyed |
| 1 | angry | 5TfA5OD30FWxNJ2dF5wmx5 | annoyed |
| 2 | angry | 3NHPNhRnriNAhwgHfSzJ7U | Angry, Angsty and Annoyed |
| 3 | angry | 65fvPmZ7nvAH71I7YVY5NF | Annoyed |
| 4 | angry | 0pced4seHhiUTLHPWKWaRc | Annoyed. |
| ... | ... | ... | ... |
| 108 | angry | 54VgzaNfaOe1C7TpT0CKmc | 퇴근길 만원버스 짜증방지용 |
| 109 | angry | 7B0SkSpz5yJ1X0nilufCa7 | 짜증나 |
| 110 | angry | 55HoLLE0ccHLumPMtOetXb | 짜증나 |
| 111 | angry | 56PAwSWpF9INIUaq1uv0OW | K-Rap 짜증나 |


## Gathering data using spotipy

In [6]:
# import necessary libraries
import spotipy
import spotipy.util as util
import requests
import shutil
import urllib.request
from csv import writer  # for metadata


# set up authorization (more information on this in the setup authorization file)
# The client id and secret are read from environment variables instead of being hardcoded in the notebook
username = 'jaeyonggy'
client_id = os.environ['SPOTIPY_CLIENT_ID']
client_secret = os.environ['SPOTIPY_CLIENT_SECRET']
redirect_uri = 'http://localhost:7778/callback'
scope = 'user-library-read'
token = util.prompt_for_user_token(username=username,
                                   client_id=client_id,
                                   client_secret=client_secret,
                                   redirect_uri=redirect_uri,
                                   scope=scope)
spotify = spotipy.Spotify(auth=token)


something_happened = 0  # counts the playlists whose processing stopped on an exception


# loop for each playlist_id
for loop_i in tqdm(range(len(playlist_id_list))):
    try:
        # fetch the first page of tracks (up to 100) for the current playlist
        playlist = spotify.playlist_tracks(playlist_id=playlist_id_list['id'][loop_i])

        # create list for track ids
        track_id = []

        # add the id of each song in the playlist
        for x in playlist['items']:
            track_id.append(x['track']['id'])

        # if there are over 100 songs in the playlist run this:
        while playlist['next']:
            # move to next 100 songs
            playlist = spotify.next(playlist)
            for x in playlist['items']:
                track_id.append(x['track']['id'])


        # a loop for each song id
        for tid in track_id:  # 'tid' avoids shadowing the built-in id()
            song = spotify.track(tid)
            # the text file that will contain all the unique album ids
            with open(path+"no_duplicate_albums\\"+playlist_id_list['emotion'][0]+"_no_duplicate_albums.txt","r+",encoding='utf-8') as f_txt:
                contents = f_txt.read().splitlines()  
                # If it's not a duplicate album; album id accessed by song['album']['id']
                if song['album']['id'] not in contents: 
                    f_txt.write(song['album']['id']+"\n")  # add the id to the no duplicate list
                    with open('metadata.csv', 'a+', newline='', encoding='utf-8') as f_csv:  # adding a new row of metadata of an album to a csv
                        row = [song['album']['id'],  # album id will be used to see the metadata
                            song['album']['name'],  # album name
                            song['album']['artists'][0]['name'],  # artist name
                            song['album']['release_date'],  # album release date
                            int(song['album']['total_tracks']),  # total number of tracks in the album
                            playlist_id_list['emotion'][0],  # the emotion group the album belongs to according to the search result on the Spotify web app
                            playlist_id_list['playlist name'][loop_i]]  # the name of the playlist the album was included in
                        writer_object = writer(f_csv, delimiter=',')  # wrap the CSV file object in a writer object
                        writer_object.writerow(row)  # append the album's metadata as one row

                    # downloading the album image; the index in song['album']['images'][0]['url'] selects the size: 0 is 640*640, 1 is 300*300, 2 is 64*64
                    urllib.request.urlretrieve(song['album']['images'][0]['url'], path+"emotion_album_images\\"+playlist_id_list['emotion'][0]+"\\{}.jpg".format(song['album']['id']))
                    
    except Exception:
        # any error skips the rest of the current playlist; Ctrl-C still interrupts the run
        something_happened += 1

print("Number of times something happened:", something_happened)

100%|████████████████████████████████████████████████████████████████████████████████| 113/113 [23:04<00:00, 12.25s/it]

Number of times something happened: 14
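
The loop above gives up on a whole playlist as soon as any step raises. One plausible cause, which is an assumption and not confirmed by this run, is a playlist item whose `track` field is `None` (the Spotify API can return this for removed or unavailable tracks). The sketch below pages through a playlist result and skips such items. `collect_track_ids` and its `fetch_next` parameter are hypothetical names introduced for illustration; in the notebook `fetch_next` would be `spotify.next`.

```python
def collect_track_ids(first_page, fetch_next):
    """Return all track ids of a paged playlist result, skipping empty entries.

    first_page: dict shaped like the result of spotify.playlist_tracks(...)
    fetch_next: callable that takes a page and returns the following page
                (spotify.next in the notebook)
    """
    track_ids = []
    page = first_page
    while page:
        for item in page['items']:
            track = item.get('track')
            # Skip items without a track object or without an id
            if track and track.get('id'):
                track_ids.append(track['id'])
        # Follow the 'next' link only while the API reports more pages
        page = fetch_next(page) if page.get('next') else None
    return track_ids


# Offline demonstration with two hand-made pages standing in for API responses
page2 = {'items': [{'track': {'id': 'c'}}], 'next': None}
page1 = {'items': [{'track': {'id': 'a'}}, {'track': None}, {'track': {'id': None}}],
         'next': 'page2'}
demo_ids = collect_track_ids(page1, lambda page: page2)
print(demo_ids)
```

In the notebook this could be called as `collect_track_ids(spotify.playlist_tracks(playlist_id=...), spotify.next)`, so that a single unavailable track no longer discards the rest of its playlist.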





## Metadata for the albums

In [7]:
metadata = pd.read_csv("metadata.csv")
metadata

| | album_id | album_name | artist | album_release_date | album_total_tracks | emotion | playlist_name |
|---|---|---|---|---|---|---|---|
| 0 | 5D8Rdb09BkmHscEGSWAlA6 | Cold Heart (PNAU Remix) | Elton John | 2021-08-13 | 1 | happy | Happy |
| 1 | 39McjovZ3M6n5SFtNmWTdp | My Universe | Coldplay | 2021-09-24 | 2 | happy | Happy |
| 2 | 7GEzhoTiqcPYkOprWQu581 | One Kiss (with Dua Lipa) | Calvin Harris | 2018-04-06 | 1 | happy | Happy |
| 3 | 350m7qAQ1c2NmkBoTuUvHl | Lose To Find | George Shelley | 2021-12-17 | 1 | happy | Happy |
| 4 | 0JpJfu78KAB1yTRJCs0Jgi | BASEMENT | Micah Emrich | 2021-11-12 | 8 | happy | Happy |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 63322 | 2QlO3x8scDNugjs9Va3Dab | Astronaut In The Ocean (Remix) [feat. G-Eazy &... | Masked Wolf | 2021-05-07 | 2 | angry | shall we 내적댄스? |
| 63323 | 12XOStb1MxtXZp3qehRm5f | Bang! | AJR | 2020-02-12 | 1 | angry | shall we 내적댄스? |
| 63324 | 0Z9AAcNOWjlkrOm47iRwpJ | Aquanetta Dr | Mack Keane | 2019-10-25 | 4 | angry | shall we 내적댄스? |
| 63325 | 5jZs0pEMbz0ZDdEqd0GmrI | Monte Carlo | Remi Wolf | 2020-08-03 | 1 | angry | shall we 내적댄스? |


## Things to watch out for when using the Spotify API

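A Spotify API token expires after one hour and then has to be reinitialized, which can be done by creating a new app and deleting the `.cache` file.

As a result, a large number of album images has to be downloaded across several sessions. Each album image is named after its album id so that it can be looked up in the metadata. If the number of downloaded album images does not match the number of metadata entries, remove from the metadata every album id that has no image file of that name.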
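
The clean-up step described above, dropping metadata entries that have no matching image file, could be sketched as follows. This is a minimal illustration and not the notebook's own code: `album_ids_with_images` and `keep_rows_with_images` are hypothetical helper names, and the images are assumed to be stored as `<album_id>.jpg` in a single directory, as in the download cell.

```python
import tempfile
from pathlib import Path


def album_ids_with_images(image_dir):
    """Return the set of album ids that have a '<album_id>.jpg' file in image_dir."""
    return {p.stem for p in Path(image_dir).glob('*.jpg')}


def keep_rows_with_images(rows, image_dir, id_key='album_id'):
    """Keep only the metadata rows whose album id has an image in image_dir."""
    available = album_ids_with_images(image_dir)
    return [row for row in rows if row[id_key] in available]


# Offline demonstration: one image on disk, two metadata rows
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'abc.jpg').touch()
    rows = [{'album_id': 'abc'}, {'album_id': 'xyz'}]
    kept = keep_rows_with_images(rows, tmp)
print(kept)

# With the pandas DataFrame from the notebook, the same filter would be:
# metadata = metadata[metadata['album_id'].isin(album_ids_with_images(image_dir))]
```

Building the set of file names once and filtering against it keeps the check fast even for tens of thousands of albums.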