**Instructions**

To move forward with the project, you need to create a collection of songs with their audio features - as large as possible!

These are the songs that we will cluster. And, later, when the user inputs a song, we will find the cluster to which the song belongs and recommend a song from the same cluster. The more songs you have, the more accurate and diverse recommendations you'll be able to give. Although... you might want to make sure the collected songs are "curated" in a certain way. Try to find playlists of songs that are diverse, but also that meet certain standards.

The process of sending hundreds or thousands of requests can take some time - it's normal if you have to wait a few minutes (or, if you're ambitious, even hours) to get all the data you need.

An idea for collecting as many songs as possible is to start with all the songs of a big, diverse playlist and then go to every artist present in the playlist and grab every song of every album of that artist. The amount of songs you'll be collecting per playlist will grow exponentially!

In [14]:
import config
import spotipy
import json
from spotipy.oauth2 import SpotifyClientCredentials
import pandas as pd

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id= config.client_id,
                                                           client_secret= config.client_secret))

**1. Firstly I'll try with a small playlist**

In [15]:
playlist = sp.user_playlist_tracks("spotify", "35eNWNVrgv1xWTEw9vF0X9",market="GB")

In [16]:
def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username,playlist_id,market="GB")
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks

In [17]:
tracks=get_playlist_tracks("spotify","35eNWNVrgv1xWTEw9vF0X9")

In [18]:
list_of_audio_features=[]
for item in range(len(playlist['items'])):
    list_of_audio_features.append(sp.audio_features(tracks[item]["track"]["id"])[0])

In [19]:
list_of_audio_features[0]

{'danceability': 0.462,
 'energy': 0.531,
 'key': 2,
 'loudness': -13.373,
 'mode': 1,
 'speechiness': 0.185,
 'acousticness': 0.847,
 'instrumentalness': 0.923,
 'liveness': 0.0792,
 'valence': 0.631,
 'tempo': 169.945,
 'type': 'audio_features',
 'id': '5zBMbktf0ufYlRsLhwcSSr',
 'uri': 'spotify:track:5zBMbktf0ufYlRsLhwcSSr',
 'track_href': 'https://api.spotify.com/v1/tracks/5zBMbktf0ufYlRsLhwcSSr',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/5zBMbktf0ufYlRsLhwcSSr',
 'duration_ms': 130784,
 'time_signature': 4}

In [20]:
df=pd.DataFrame(list_of_audio_features)    
df=df[["danceability","energy","loudness","speechiness","acousticness",
    "instrumentalness","liveness","valence","tempo","id","duration_ms"]]
df

Unnamed: 0,danceability,energy,loudness,speechiness,acousticness,instrumentalness,liveness,valence,tempo,id,duration_ms
0,0.462,0.531,-13.373,0.1850,0.847000,0.923000,0.0792,0.6310,169.945,5zBMbktf0ufYlRsLhwcSSr,130784
1,0.654,0.754,-6.274,0.0361,0.075900,0.000000,0.0388,0.9840,148.829,32RvD4tQKXEQ3TfXnh8cbS,173129
2,0.786,0.552,-8.940,0.2440,0.012400,0.047600,0.1200,0.7600,147.402,5x82hqOUyleS7E1u9qRpNL,260979
3,0.639,0.346,-13.063,0.5380,0.095000,0.000427,0.1550,0.1340,146.435,2crDGxJzeL6CsmuWFuERKi,391899
4,0.578,0.595,-7.013,0.2090,0.986000,0.000046,0.7120,0.8420,108.479,6FauEkShd10tj0KNga5U8U,92451
...,...,...,...,...,...,...,...,...,...,...,...
80,0.876,0.282,-12.618,0.3190,0.050300,0.000048,0.0741,0.8010,130.958,64xslN8DB3jG1tK2l3H1mA,207733
81,0.339,0.625,-25.877,0.5280,0.949000,0.934000,0.1130,0.0521,123.711,5jnpCbKjKliHoOb9Eeog0q,296067
82,0.610,0.588,-8.406,0.0490,0.095300,0.000000,0.6530,0.5000,132.623,4wElJpVuDuCo2NpvCVa6a3,189573
83,0.316,0.484,-9.110,0.0308,0.000334,0.004450,0.0912,0.2990,117.363,4mn2kNTqiGLwaUR8JdhJ1l,269907


**2. Now, let's try with different playlist to get a wide range of variaty.**

In [21]:
#1. get dictionary with playlist, Goal: get the amount of songs
playlist_2 = sp.user_playlist_tracks("spotify", "5S8SJdl1BDc0ugpkEvFsIL",market="GB")
len(playlist_2["items"])
playlist_2.keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

In [39]:
#2. get all the tracks in a dictionary
tracksplaylist_2=get_playlist_tracks("spotify","5S8SJdl1BDc0ugpkEvFsIL")

In [37]:
tracksplaylist_2[9999]["track"]["id"]

'003vvx7Niy0yvhvHt4a68B'

In [38]:
#3. get all the audio features in a dictionary / expect is that if it crass we say to pass
#we add print(i) in order to monitor de for loop, if it crashes at some point we will just 'pass' it.
list_of_audio_features_playlist_2=[]
for i in range(0,9999,10):
    try:
        print(i)
        list_of_audio_features_playlist_2.append(sp.audio_features(tracksplaylist_2[i]["track"]["id"])[0])
    except:
        pass

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
260
270
280
290
300
310
320
330
340
350
360
370
380
390
400
410
420
430
440
450
460
470
480
490
500
510
520
530
540
550
560
570
580
590
600
610
620
630
640
650
660
670
680
690
700
710
720
730
740
750
760
770
780
790
800
810
820
830
840
850
860
870
880
890
900
910
920
930
940
950
960
970
980
990
1000
1010
1020
1030
1040
1050
1060
1070
1080
1090
1100
1110
1120
1130
1140
1150
1160
1170
1180
1190
1200
1210
1220
1230
1240
1250
1260
1270
1280
1290
1300
1310
1320
1330
1340
1350
1360
1370
1380
1390
1400
1410
1420
1430
1440
1450
1460
1470
1480
1490
1500
1510
1520
1530
1540
1550
1560
1570
1580
1590
1600
1610
1620
1630
1640
1650
1660
1670
1680
1690
1700
1710
1720
1730
1740
1750
1760
1770
1780
1790
1800
1810
1820
1830
1840
1850
1860
1870
1880
1890
1900
1910
1920
1930
1940
1950
1960
1970
1980
1990
2000
2010
2020
2030
2040
2050
2060
2070
2080
2090
2100
2110
2120
2130
2140
2150
2160
2170
2180
2190
2200
2210
2

In [None]:
list_of_audio_features_playlist_2[0].keys()


In [None]:
type(list_of_audio_features_playlist_2[0])
pd.DataFrame.from_dict(list_of_audio_features_playlist_2[0])

In [None]:
#create dataframe
list_of_audio_features_playlist_2

In [None]:
df_2.to_csv("Data/spotifysongs2.csv")

-------

**3. Now a list of paylists**

In [23]:
list_playlists = ['59SUAzPx3rLT1OwT3xeSO6',
 '085xgJMt3iw4fYLhDGbSic',
 '5O7kE8EhCQxxyN5ZtvNZRq',
 '4lO91AUk7HfuyoanUyJAgR',
 '6YbFgcQiKKxOqa0NjX5kXq',
 '6PYUsr8PT5j8ivYwsyw84k',
 '5dcb1jWziI9f6cqAh1fFF6',
 '2OcOlzeUwis6JYpoRauln3',
 '3ANGJ9boEGZkOzCx5D1LRA',
 '67kTdycjhxJYTjU5kEv08R',
 '4BVAD6ubqQ4pCO5TEelZPC']

In [24]:
#1. get dictionary with the 11 playlists, Goal: get the amount of songs
playlist_2=[]

for item in list_playlists:
    playlist_2.append(sp.user_playlist_tracks("spotify",item,market="GB"))

In [25]:
playlist_2[10]["items"]   [88]["track"]["id"]   #1-list - playlists (10) - items  2- [88] song

'56sxpXFgha5NcJexZUPGXS'

In [26]:
#1. create a list with tracks, adding all the tracks of each playlist

all_tracks=[]

for i in range(len(playlist_2)):
    print(i)
    for k in playlist_2[i]["items"]:
        all_tracks.append(playlist_2[i]["items"])  

0
1
2
3
4
5
6
7
8
9
10


In [27]:
len(all_tracks)

1100

In [36]:
audio_features_playlist=[]
for i in range(len(all_tracks)):
    print(i)
    audio_features_playlist.append(sp.audio_features(all_tracks[i][0]["track"]["id"]))

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
27

In [34]:
sp.audio_features(all_tracks[i][0]["track"]["id"])

{'danceability': 0.241,
 'energy': 0.898,
 'key': 2,
 'loudness': -9.639,
 'mode': 0,
 'speechiness': 0.127,
 'acousticness': 0.0236,
 'instrumentalness': 0.936,
 'liveness': 0.131,
 'valence': 0.132,
 'tempo': 131.469,
 'type': 'audio_features',
 'id': '4mNt7EG9vNDGNizhBqoxKy',
 'uri': 'spotify:track:4mNt7EG9vNDGNizhBqoxKy',
 'track_href': 'https://api.spotify.com/v1/tracks/4mNt7EG9vNDGNizhBqoxKy',
 'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4mNt7EG9vNDGNizhBqoxKy',
 'duration_ms': 223840,
 'time_signature': 4}

In [43]:
x = pd.DataFrame.from_dict(audio_features_playlist)