# Training and Testing Datasets

> Athina Davari 8180020 \
> Department of Management Science and Technology \
> Athens University of Economics and Business

## Description
Create the training and testing datasets for the Dissect Spotify's Valence assignment.

## Setting the Scene
The data analysis process starts with importing the packages we'll need.\
As a good practice, all the imports live in a single cell at the beginning of the notebook.

In [1]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotify_config import config
import pandas as pd

To run the notebook, you need a Spotify developer account. Create an application and add your Spotify `client_id` and `client_secret` to the `spotify_config.py` file, so that your app can use the Spotify API:

`spotify_config.py`:

 ```
  config = {
      'client_id' : 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
      'client_secret' :'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
  }
  ```

In [2]:
client_credentials_manager = SpotifyClientCredentials(config['client_id'],
                                                      config['client_secret'])
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
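
As an alternative to keeping the secrets in a Python file, the same two values could come from environment variables. The sketch below is only an illustration under that assumption: `load_spotify_config` is a hypothetical helper that is not part of this notebook, and it reads `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET`, the variable names spotipy documents for its own credential lookup.

```python
import os


def load_spotify_config(env=None):
    """Return a dict shaped like spotify_config.config (hypothetical helper).

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    names = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "client_id": env["SPOTIPY_CLIENT_ID"],
        "client_secret": env["SPOTIPY_CLIENT_SECRET"],
    }


# Placeholder values, not real credentials.
example = load_spotify_config(
    {"SPOTIPY_CLIENT_ID": "abc", "SPOTIPY_CLIENT_SECRET": "xyz"}
)
print(sorted(example))
```

The returned dictionary has the same keys as `config`, so it could be passed to `SpotifyClientCredentials` in the same way as in the cell above.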

## Training Dataset

To create the random dataset, I first found the [Music Map of all genres on Spotify](https://everynoise.com/everynoise1d.cgi?scope=all&vector=popularity). Then I downloaded the HTML page, cleaned up the HTML and saved it as `site.html`.

From that page I took all the Spotify genres.

In [3]:
music_types = pd.read_html('site.html')[0]
music_types.drop(1, axis=1, inplace=True)
music_types


Unnamed: 0,0,2
0,1,pop
1,2,dance pop
2,3,rap
3,4,rock
4,5,latin
...,...,...
5751,5752,yunnan traditional
5752,5753,swazi traditional
5753,5754,classical string trio
5754,5755,himene tarava


I searched for up to 50 tracks in each of the 450 most popular genres.

In [4]:
rows = []
for n in range(450):
    genre = music_types[2].iloc[n]
    # Ask for up to 50 tracks tagged with this genre.
    results = sp.search(q='genre:' + genre, type="track", limit=50)
    items = results.get("tracks").get("items")
    for track in items:
        rows.append({'id': track['id'], 'name': track['name']})

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0.
df = pd.DataFrame(rows, columns=['id', 'name'])

print("Unique songs")
print(len(df["id"].unique()))

Unique songs
17608
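
The search cell relies on the response being a nested dictionary, with the tracks under `tracks` and then `items`. A minimal sketch of that extraction step, runnable without API access, is shown below. `extract_tracks` and `fake_response` are hypothetical names introduced only for this illustration, and the ids and names in the stand-in response are copied from the table shown in this notebook.

```python
def extract_tracks(search_response):
    """Return a list of {'id', 'name'} dicts from one search response.

    Missing 'tracks' or 'items' keys are treated as an empty result
    instead of raising an error.
    """
    items = (search_response.get("tracks") or {}).get("items") or []
    return [{"id": t["id"], "name": t["name"]} for t in items if t]


# Hand-written stand-in for a search response (not real API output).
fake_response = {
    "tracks": {
        "items": [
            {"id": "4fouWK6XVHhzl78KzQ1UjL", "name": "abcdefu"},
            {"id": "3Vi5XqYrmQgOYBajMWSvCi", "name": "Need to Know"},
        ]
    }
}

rows = []
for response in [fake_response, {"tracks": {"items": []}}, {}]:
    rows.extend(extract_tracks(response))

print(rows)
```

Keeping the extraction in one small function makes it easier to tolerate a genre that returns no tracks at all.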


In [5]:
df

Unnamed: 0,id,name
0,4fouWK6XVHhzl78KzQ1UjL,abcdefu
1,3Vi5XqYrmQgOYBajMWSvCi,Need to Know
2,4ZtFanR9U6ndgddUvNcjcG,good 4 u
3,6Uj1ctrBOjOas8xZXGqKk4,Woman
4,27NovPIUIRrOZoCHxABJwK,INDUSTRY BABY (feat. Jack Harlow)
...,...,...
20546,3Evwk0LpaDBxZw5XivAnAW,Te Olvidaré (feat. Demarco Flamenco)
20547,6lVnWrqFYadbLQQXJ3eSbq,Una pequeña historia
20548,6o70OF4tqIT43azQ8gR7Y6,Flamenco
20549,62QYD9QijBwm26Id8azkcR,New Mego's Flamenco - Boston Show


In [6]:
df.drop_duplicates()

Unnamed: 0,id,name
0,4fouWK6XVHhzl78KzQ1UjL,abcdefu
1,3Vi5XqYrmQgOYBajMWSvCi,Need to Know
2,4ZtFanR9U6ndgddUvNcjcG,good 4 u
3,6Uj1ctrBOjOas8xZXGqKk4,Woman
4,27NovPIUIRrOZoCHxABJwK,INDUSTRY BABY (feat. Jack Harlow)
...,...,...
20546,3Evwk0LpaDBxZw5XivAnAW,Te Olvidaré (feat. Demarco Flamenco)
20547,6lVnWrqFYadbLQQXJ3eSbq,Una pequeña historia
20548,6o70OF4tqIT43azQ8gR7Y6,Flamenco
20549,62QYD9QijBwm26Id8azkcR,New Mego's Flamenco - Boston Show
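
The table has more rows than there are unique ids, presumably because the same track can be returned for more than one genre. Note also that `drop_duplicates()` returns a new DataFrame and leaves `df` itself unchanged unless the result is assigned back. The idea of keeping the first occurrence of each id can be sketched in plain Python as follows; `dedupe_by_id` is a hypothetical helper, and the sample rows reuse ids from the table above.

```python
def dedupe_by_id(rows):
    """Keep the first occurrence of each track id, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique_rows.append(row)
    return unique_rows


rows = [
    {"id": "4fouWK6XVHhzl78KzQ1UjL", "name": "abcdefu"},
    {"id": "3Vi5XqYrmQgOYBajMWSvCi", "name": "Need to Know"},
    # Same track found again under a second genre.
    {"id": "4fouWK6XVHhzl78KzQ1UjL", "name": "abcdefu"},
]

unique_rows = dedupe_by_id(rows)
print(len(rows), len(unique_rows))  # prints: 3 2
```

In this notebook the later cells call `df['id'].unique()`, so the duplicates do not reach the feature download either way.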


Then I got the audio features for each of the tracks.

To do that, I created a dictionary keyed by track id, whose values are the audio features of that track.

In [7]:
features = {}
all_track_ids = list(df['id'].unique())
start = 0
num_tracks = 100
  
while start < len(all_track_ids):
    print(f'getting from {start} to {start+num_tracks}')
    tracks_batch = all_track_ids[start:start+num_tracks]
    features_batch = sp.audio_features(tracks_batch)
    features.update({ track_id : track_features 
                     for track_id, track_features in zip(tracks_batch, features_batch) })
    start += num_tracks

getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
getting from 300 to 400
getting from 400 to 500
getting from 500 to 600
getting from 600 to 700
getting from 700 to 800
getting from 800 to 900
getting from 900 to 1000
getting from 1000 to 1100
getting from 1100 to 1200
getting from 1200 to 1300
getting from 1300 to 1400
getting from 1400 to 1500
getting from 1500 to 1600
getting from 1600 to 1700
getting from 1700 to 1800
getting from 1800 to 1900
getting from 1900 to 2000
getting from 2000 to 2100
getting from 2100 to 2200
getting from 2200 to 2300
getting from 2300 to 2400
getting from 2400 to 2500
getting from 2500 to 2600
getting from 2600 to 2700
getting from 2700 to 2800
getting from 2800 to 2900
getting from 2900 to 3000
getting from 3000 to 3100
getting from 3100 to 3200
getting from 3200 to 3300
getting from 3300 to 3400
getting from 3400 to 3500
getting from 3500 to 3600
getting from 3600 to 3700
getting from 3700 to 3800
getting from 3800 to 3900
getting
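
The batching loop can also be written as a small reusable function. The sketch below is one possible shape, not the code that produced the output above: `chunked`, `collect_features` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `sp.audio_features` so that the example runs offline. It also assumes that the returned list can contain `None` for an id without features, and keeps such ids in a separate list.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def collect_features(ids, fetch, size=100):
    """Call `fetch` once per batch and key the results by track id.

    `fetch` takes a list of ids and returns a list of the same length.
    Entries that come back as None (an assumption about the API) are
    skipped and reported in `missing`.
    """
    features, missing = {}, []
    for batch in chunked(ids, size):
        for track_id, track_features in zip(batch, fetch(batch)):
            if track_features is None:
                missing.append(track_id)
            else:
                features[track_id] = track_features
    return features, missing


def fake_fetch(batch):
    """Offline stand-in for sp.audio_features (hypothetical)."""
    return [None if i == "missing" else {"id": i} for i in batch]


ids = [f"track{n}" for n in range(250)] + ["missing"]
batch_sizes = [len(batch) for batch in chunked(ids)]
features, missing = collect_features(ids, fake_fetch)
print(batch_sizes, len(features), missing)
```

With the real client, `fetch` would simply be `sp.audio_features`, and the same function could serve both the training and the testing ids.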

This is what the features look like:

In [8]:
features

{'4fouWK6XVHhzl78KzQ1UjL': {'danceability': 0.695,
  'energy': 0.54,
  'key': 4,
  'loudness': -5.692,
  'mode': 1,
  'speechiness': 0.0493,
  'acousticness': 0.299,
  'instrumentalness': 0,
  'liveness': 0.367,
  'valence': 0.415,
  'tempo': 121.932,
  'type': 'audio_features',
  'id': '4fouWK6XVHhzl78KzQ1UjL',
  'uri': 'spotify:track:4fouWK6XVHhzl78KzQ1UjL',
  'track_href': 'https://api.spotify.com/v1/tracks/4fouWK6XVHhzl78KzQ1UjL',
  'analysis_url': 'https://api.spotify.com/v1/audio-analysis/4fouWK6XVHhzl78KzQ1UjL',
  'duration_ms': 168602,
  'time_signature': 4},
 '3Vi5XqYrmQgOYBajMWSvCi': {'danceability': 0.664,
  'energy': 0.609,
  'key': 1,
  'loudness': -6.509,
  'mode': 1,
  'speechiness': 0.0707,
  'acousticness': 0.304,
  'instrumentalness': 0,
  'liveness': 0.0926,
  'valence': 0.194,
  'tempo': 130.041,
  'type': 'audio_features',
  'id': '3Vi5XqYrmQgOYBajMWSvCi',
  'uri': 'spotify:track:3Vi5XqYrmQgOYBajMWSvCi',
  'track_href': 'https://api.spotify.com/v1/tracks/3Vi5XqYrmQ

Here I turned the dictionary into a DataFrame.

In [9]:
tracks_analysis = pd.DataFrame.from_dict(features, orient='index')
tracks_analysis

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
4fouWK6XVHhzl78KzQ1UjL,0.695,0.540,4,-5.692,1,0.0493,0.2990,0.00000,0.3670,0.4150,121.932,audio_features,4fouWK6XVHhzl78KzQ1UjL,spotify:track:4fouWK6XVHhzl78KzQ1UjL,https://api.spotify.com/v1/tracks/4fouWK6XVHhz...,https://api.spotify.com/v1/audio-analysis/4fou...,168602,4
3Vi5XqYrmQgOYBajMWSvCi,0.664,0.609,1,-6.509,1,0.0707,0.3040,0.00000,0.0926,0.1940,130.041,audio_features,3Vi5XqYrmQgOYBajMWSvCi,spotify:track:3Vi5XqYrmQgOYBajMWSvCi,https://api.spotify.com/v1/tracks/3Vi5XqYrmQgO...,https://api.spotify.com/v1/audio-analysis/3Vi5...,210560,4
4ZtFanR9U6ndgddUvNcjcG,0.563,0.664,9,-5.044,1,0.1540,0.3350,0.00000,0.0849,0.6880,166.928,audio_features,4ZtFanR9U6ndgddUvNcjcG,spotify:track:4ZtFanR9U6ndgddUvNcjcG,https://api.spotify.com/v1/tracks/4ZtFanR9U6nd...,https://api.spotify.com/v1/audio-analysis/4ZtF...,178147,4
6Uj1ctrBOjOas8xZXGqKk4,0.824,0.764,5,-4.175,0,0.0854,0.0888,0.00294,0.1170,0.8810,107.998,audio_features,6Uj1ctrBOjOas8xZXGqKk4,spotify:track:6Uj1ctrBOjOas8xZXGqKk4,https://api.spotify.com/v1/tracks/6Uj1ctrBOjOa...,https://api.spotify.com/v1/audio-analysis/6Uj1...,172627,4
27NovPIUIRrOZoCHxABJwK,0.736,0.704,3,-7.409,0,0.0615,0.0203,0.00000,0.0501,0.8940,149.995,audio_features,27NovPIUIRrOZoCHxABJwK,spotify:track:27NovPIUIRrOZoCHxABJwK,https://api.spotify.com/v1/tracks/27NovPIUIRrO...,https://api.spotify.com/v1/audio-analysis/27No...,212000,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3Evwk0LpaDBxZw5XivAnAW,0.739,0.874,7,-3.317,0,0.2360,0.3140,0.00000,0.1050,0.7670,95.055,audio_features,3Evwk0LpaDBxZw5XivAnAW,spotify:track:3Evwk0LpaDBxZw5XivAnAW,https://api.spotify.com/v1/tracks/3Evwk0LpaDBx...,https://api.spotify.com/v1/audio-analysis/3Evw...,188211,4
6lVnWrqFYadbLQQXJ3eSbq,0.692,0.767,11,-5.791,0,0.0401,0.3500,0.00000,0.0905,0.7600,139.930,audio_features,6lVnWrqFYadbLQQXJ3eSbq,spotify:track:6lVnWrqFYadbLQQXJ3eSbq,https://api.spotify.com/v1/tracks/6lVnWrqFYadb...,https://api.spotify.com/v1/audio-analysis/6lVn...,213333,4
6o70OF4tqIT43azQ8gR7Y6,0.590,0.735,8,-4.621,1,0.3000,0.0901,0.76500,0.1120,0.0584,137.925,audio_features,6o70OF4tqIT43azQ8gR7Y6,spotify:track:6o70OF4tqIT43azQ8gR7Y6,https://api.spotify.com/v1/tracks/6o70OF4tqIT4...,https://api.spotify.com/v1/audio-analysis/6o70...,340870,4
62QYD9QijBwm26Id8azkcR,0.547,0.685,9,-10.666,0,0.0312,0.1610,0.93300,0.9140,0.3660,130.063,audio_features,62QYD9QijBwm26Id8azkcR,spotify:track:62QYD9QijBwm26Id8azkcR,https://api.spotify.com/v1/tracks/62QYD9QijBwm...,https://api.spotify.com/v1/audio-analysis/62QY...,187227,4


I moved the index into a column called `song_id`.

In [10]:
tracks_analysis = tracks_analysis.reset_index().rename(columns={'index' : 'song_id'})
tracks_analysis

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,4fouWK6XVHhzl78KzQ1UjL,0.695,0.540,4,-5.692,1,0.0493,0.2990,0.00000,0.3670,0.4150,121.932,audio_features,4fouWK6XVHhzl78KzQ1UjL,spotify:track:4fouWK6XVHhzl78KzQ1UjL,https://api.spotify.com/v1/tracks/4fouWK6XVHhz...,https://api.spotify.com/v1/audio-analysis/4fou...,168602,4
1,3Vi5XqYrmQgOYBajMWSvCi,0.664,0.609,1,-6.509,1,0.0707,0.3040,0.00000,0.0926,0.1940,130.041,audio_features,3Vi5XqYrmQgOYBajMWSvCi,spotify:track:3Vi5XqYrmQgOYBajMWSvCi,https://api.spotify.com/v1/tracks/3Vi5XqYrmQgO...,https://api.spotify.com/v1/audio-analysis/3Vi5...,210560,4
2,4ZtFanR9U6ndgddUvNcjcG,0.563,0.664,9,-5.044,1,0.1540,0.3350,0.00000,0.0849,0.6880,166.928,audio_features,4ZtFanR9U6ndgddUvNcjcG,spotify:track:4ZtFanR9U6ndgddUvNcjcG,https://api.spotify.com/v1/tracks/4ZtFanR9U6nd...,https://api.spotify.com/v1/audio-analysis/4ZtF...,178147,4
3,6Uj1ctrBOjOas8xZXGqKk4,0.824,0.764,5,-4.175,0,0.0854,0.0888,0.00294,0.1170,0.8810,107.998,audio_features,6Uj1ctrBOjOas8xZXGqKk4,spotify:track:6Uj1ctrBOjOas8xZXGqKk4,https://api.spotify.com/v1/tracks/6Uj1ctrBOjOa...,https://api.spotify.com/v1/audio-analysis/6Uj1...,172627,4
4,27NovPIUIRrOZoCHxABJwK,0.736,0.704,3,-7.409,0,0.0615,0.0203,0.00000,0.0501,0.8940,149.995,audio_features,27NovPIUIRrOZoCHxABJwK,spotify:track:27NovPIUIRrOZoCHxABJwK,https://api.spotify.com/v1/tracks/27NovPIUIRrO...,https://api.spotify.com/v1/audio-analysis/27No...,212000,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17603,3Evwk0LpaDBxZw5XivAnAW,0.739,0.874,7,-3.317,0,0.2360,0.3140,0.00000,0.1050,0.7670,95.055,audio_features,3Evwk0LpaDBxZw5XivAnAW,spotify:track:3Evwk0LpaDBxZw5XivAnAW,https://api.spotify.com/v1/tracks/3Evwk0LpaDBx...,https://api.spotify.com/v1/audio-analysis/3Evw...,188211,4
17604,6lVnWrqFYadbLQQXJ3eSbq,0.692,0.767,11,-5.791,0,0.0401,0.3500,0.00000,0.0905,0.7600,139.930,audio_features,6lVnWrqFYadbLQQXJ3eSbq,spotify:track:6lVnWrqFYadbLQQXJ3eSbq,https://api.spotify.com/v1/tracks/6lVnWrqFYadb...,https://api.spotify.com/v1/audio-analysis/6lVn...,213333,4
17605,6o70OF4tqIT43azQ8gR7Y6,0.590,0.735,8,-4.621,1,0.3000,0.0901,0.76500,0.1120,0.0584,137.925,audio_features,6o70OF4tqIT43azQ8gR7Y6,spotify:track:6o70OF4tqIT43azQ8gR7Y6,https://api.spotify.com/v1/tracks/6o70OF4tqIT4...,https://api.spotify.com/v1/audio-analysis/6o70...,340870,4
17606,62QYD9QijBwm26Id8azkcR,0.547,0.685,9,-10.666,0,0.0312,0.1610,0.93300,0.9140,0.3660,130.063,audio_features,62QYD9QijBwm26Id8azkcR,spotify:track:62QYD9QijBwm26Id8azkcR,https://api.spotify.com/v1/tracks/62QYD9QijBwm...,https://api.spotify.com/v1/audio-analysis/62QY...,187227,4


Finally, I saved the dataset to `tracks_features.csv`. The dataset I used for dissecting Spotify's Valence metric is stored in `data/tracks_features.csv`.

In [11]:
tracks_analysis.to_csv('tracks_features.csv', index=False) 
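
The cell above writes to the working directory, while the dataset used later lives under `data/`. A minimal sketch of writing straight into a subfolder, creating it when needed, could look like this. It uses only the standard library and a temporary directory so that it can run anywhere; `save_rows` is a hypothetical helper, and the two sample rows reuse ids and valence values from the tables above.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, path):
    """Write a list of dicts to `path` as CSV, creating parent folders first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [
    {"song_id": "4fouWK6XVHhzl78KzQ1UjL", "valence": 0.415},
    {"song_id": "3Vi5XqYrmQgOYBajMWSvCi", "valence": 0.194},
]

with tempfile.TemporaryDirectory() as tmp:
    out = save_rows(rows, Path(tmp) / "data" / "tracks_features.csv")
    lines = out.read_text(encoding="utf-8").splitlines()

print(lines[0])
```

With pandas, the equivalent would be to create the folder first and then pass the full path to `to_csv`.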

## Testing Dataset

For the testing dataset, I took all the ids from `spotify_ids.txt`.

In [12]:
# A forward slash avoids the invalid '\s' escape and also works on Windows.
testdf = pd.read_csv('data/spotify_ids.txt', header=None, sep=' ')
testdf

Unnamed: 0,0
0,7lPN2DXiMsVn7XUKtOW1CS
1,5QO79kh1waicV47BqGRL3g
2,0VjIjW4GlUZAMYd2vXMi3b
3,4MzXwWMhyBbmu6hOcLVD49
4,5Kskr9LcNYa0tpt5f0ZEJx
...,...
1157,4lUmnwRybYH7mMzf16xB0y
1158,1fzf9Aad4y1RWrmwosAK5y
1159,3E3pb3qH11iny6TFDJvsg5
1160,3yTkoTuiKRGL2VAlQd7xsC
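
Before sending the ids to the API, it can help to check that every line of the file really looks like a track id. The sketch below assumes that a Spotify id is 22 characters drawn from digits and ASCII letters, which matches every id shown in this notebook but is an assumption rather than something the file guarantees. `split_ids` is a hypothetical helper.

```python
import re

# Assumption: a track id is exactly 22 base-62 characters.
SPOTIFY_ID = re.compile(r"[0-9A-Za-z]{22}")


def split_ids(lines):
    """Separate well-formed ids from anything else; blank lines are ignored."""
    valid, rejected = [], []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue
        if SPOTIFY_ID.fullmatch(candidate):
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


sample = [
    "7lPN2DXiMsVn7XUKtOW1CS\n",
    "5QO79kh1waicV47BqGRL3g\n",
    "\n",
    "not-an-id\n",
]
valid, rejected = split_ids(sample)
print(len(valid), rejected)
```

Anything that lands in `rejected` can be inspected by hand instead of failing halfway through a batch request.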


Then I renamed the `0` column to `id`.

In [13]:
testdf = testdf.rename(columns={0 : 'id'})
testdf

Unnamed: 0,id
0,7lPN2DXiMsVn7XUKtOW1CS
1,5QO79kh1waicV47BqGRL3g
2,0VjIjW4GlUZAMYd2vXMi3b
3,4MzXwWMhyBbmu6hOcLVD49
4,5Kskr9LcNYa0tpt5f0ZEJx
...,...
1157,4lUmnwRybYH7mMzf16xB0y
1158,1fzf9Aad4y1RWrmwosAK5y
1159,3E3pb3qH11iny6TFDJvsg5
1160,3yTkoTuiKRGL2VAlQd7xsC


Then I got the audio features for each of the tracks.

To do that, I created a dictionary keyed by track id, whose values are the audio features of that track.

In [14]:
test_features = {}
all_test_track_ids = list(testdf['id'].unique())
start = 0
num_tracks = 100
  
while start < len(all_test_track_ids):
    print(f'getting from {start} to {start+num_tracks}')
    tracks_batch = all_test_track_ids[start:start+num_tracks]
    features_batch = sp.audio_features(tracks_batch)
    test_features.update({ track_id : track_features 
                     for track_id, track_features in zip(tracks_batch, features_batch) })
    start += num_tracks

getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
getting from 300 to 400
getting from 400 to 500
getting from 500 to 600
getting from 600 to 700
getting from 700 to 800
getting from 800 to 900
getting from 900 to 1000
getting from 1000 to 1100
getting from 1100 to 1200


Here I turned the dictionary into a DataFrame.

In [15]:
test_tracks_analysis = pd.DataFrame.from_dict(test_features, orient='index')
test_tracks_analysis

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
7lPN2DXiMsVn7XUKtOW1CS,0.585,0.436,10,-8.761,1,0.0601,0.72100,0.000013,0.1050,0.132,143.874,audio_features,7lPN2DXiMsVn7XUKtOW1CS,spotify:track:7lPN2DXiMsVn7XUKtOW1CS,https://api.spotify.com/v1/tracks/7lPN2DXiMsVn...,https://api.spotify.com/v1/audio-analysis/7lPN...,242014,4
5QO79kh1waicV47BqGRL3g,0.680,0.826,0,-5.487,1,0.0309,0.02120,0.000012,0.5430,0.644,118.051,audio_features,5QO79kh1waicV47BqGRL3g,spotify:track:5QO79kh1waicV47BqGRL3g,https://api.spotify.com/v1/tracks/5QO79kh1waic...,https://api.spotify.com/v1/audio-analysis/5QO7...,215627,4
0VjIjW4GlUZAMYd2vXMi3b,0.514,0.730,1,-5.934,1,0.0598,0.00146,0.000095,0.0897,0.334,171.005,audio_features,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,200040,4
4MzXwWMhyBbmu6hOcLVD49,0.731,0.573,4,-10.059,0,0.0544,0.40100,0.000052,0.1130,0.145,109.928,audio_features,4MzXwWMhyBbmu6hOcLVD49,spotify:track:4MzXwWMhyBbmu6hOcLVD49,https://api.spotify.com/v1/tracks/4MzXwWMhyBbm...,https://api.spotify.com/v1/audio-analysis/4MzX...,205090,4
5Kskr9LcNYa0tpt5f0ZEJx,0.907,0.393,4,-7.636,0,0.0539,0.45100,0.000001,0.1350,0.202,104.949,audio_features,5Kskr9LcNYa0tpt5f0ZEJx,spotify:track:5Kskr9LcNYa0tpt5f0ZEJx,https://api.spotify.com/v1/tracks/5Kskr9LcNYa0...,https://api.spotify.com/v1/audio-analysis/5Ksk...,205458,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.596,0.650,9,-5.167,1,0.3370,0.13800,0.000000,0.1400,0.188,133.997,audio_features,4lUmnwRybYH7mMzf16xB0y,spotify:track:4lUmnwRybYH7mMzf16xB0y,https://api.spotify.com/v1/tracks/4lUmnwRybYH7...,https://api.spotify.com/v1/audio-analysis/4lUm...,257428,4
1fzf9Aad4y1RWrmwosAK5y,0.588,0.850,4,-6.431,1,0.0318,0.16800,0.002020,0.0465,0.768,93.003,audio_features,1fzf9Aad4y1RWrmwosAK5y,spotify:track:1fzf9Aad4y1RWrmwosAK5y,https://api.spotify.com/v1/tracks/1fzf9Aad4y1R...,https://api.spotify.com/v1/audio-analysis/1fzf...,187310,4
3E3pb3qH11iny6TFDJvsg5,0.754,0.660,0,-6.811,1,0.2670,0.17900,0.000000,0.1940,0.316,83.000,audio_features,3E3pb3qH11iny6TFDJvsg5,spotify:track:3E3pb3qH11iny6TFDJvsg5,https://api.spotify.com/v1/tracks/3E3pb3qH11in...,https://api.spotify.com/v1/audio-analysis/3E3p...,209299,4
3yTkoTuiKRGL2VAlQd7xsC,0.584,0.836,0,-4.925,1,0.0790,0.05580,0.000000,0.0663,0.484,104.973,audio_features,3yTkoTuiKRGL2VAlQd7xsC,spotify:track:3yTkoTuiKRGL2VAlQd7xsC,https://api.spotify.com/v1/tracks/3yTkoTuiKRGL...,https://api.spotify.com/v1/audio-analysis/3yTk...,202204,4


I moved the index into a column called `song_id`.

In [16]:
test_tracks_analysis = test_tracks_analysis.reset_index().rename(columns={'index' : 'song_id'})
test_tracks_analysis

Unnamed: 0,song_id,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,type,id,uri,track_href,analysis_url,duration_ms,time_signature
0,7lPN2DXiMsVn7XUKtOW1CS,0.585,0.436,10,-8.761,1,0.0601,0.72100,0.000013,0.1050,0.132,143.874,audio_features,7lPN2DXiMsVn7XUKtOW1CS,spotify:track:7lPN2DXiMsVn7XUKtOW1CS,https://api.spotify.com/v1/tracks/7lPN2DXiMsVn...,https://api.spotify.com/v1/audio-analysis/7lPN...,242014,4
1,5QO79kh1waicV47BqGRL3g,0.680,0.826,0,-5.487,1,0.0309,0.02120,0.000012,0.5430,0.644,118.051,audio_features,5QO79kh1waicV47BqGRL3g,spotify:track:5QO79kh1waicV47BqGRL3g,https://api.spotify.com/v1/tracks/5QO79kh1waic...,https://api.spotify.com/v1/audio-analysis/5QO7...,215627,4
2,0VjIjW4GlUZAMYd2vXMi3b,0.514,0.730,1,-5.934,1,0.0598,0.00146,0.000095,0.0897,0.334,171.005,audio_features,0VjIjW4GlUZAMYd2vXMi3b,spotify:track:0VjIjW4GlUZAMYd2vXMi3b,https://api.spotify.com/v1/tracks/0VjIjW4GlUZA...,https://api.spotify.com/v1/audio-analysis/0VjI...,200040,4
3,4MzXwWMhyBbmu6hOcLVD49,0.731,0.573,4,-10.059,0,0.0544,0.40100,0.000052,0.1130,0.145,109.928,audio_features,4MzXwWMhyBbmu6hOcLVD49,spotify:track:4MzXwWMhyBbmu6hOcLVD49,https://api.spotify.com/v1/tracks/4MzXwWMhyBbm...,https://api.spotify.com/v1/audio-analysis/4MzX...,205090,4
4,5Kskr9LcNYa0tpt5f0ZEJx,0.907,0.393,4,-7.636,0,0.0539,0.45100,0.000001,0.1350,0.202,104.949,audio_features,5Kskr9LcNYa0tpt5f0ZEJx,spotify:track:5Kskr9LcNYa0tpt5f0ZEJx,https://api.spotify.com/v1/tracks/5Kskr9LcNYa0...,https://api.spotify.com/v1/audio-analysis/5Ksk...,205458,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1157,4lUmnwRybYH7mMzf16xB0y,0.596,0.650,9,-5.167,1,0.3370,0.13800,0.000000,0.1400,0.188,133.997,audio_features,4lUmnwRybYH7mMzf16xB0y,spotify:track:4lUmnwRybYH7mMzf16xB0y,https://api.spotify.com/v1/tracks/4lUmnwRybYH7...,https://api.spotify.com/v1/audio-analysis/4lUm...,257428,4
1158,1fzf9Aad4y1RWrmwosAK5y,0.588,0.850,4,-6.431,1,0.0318,0.16800,0.002020,0.0465,0.768,93.003,audio_features,1fzf9Aad4y1RWrmwosAK5y,spotify:track:1fzf9Aad4y1RWrmwosAK5y,https://api.spotify.com/v1/tracks/1fzf9Aad4y1R...,https://api.spotify.com/v1/audio-analysis/1fzf...,187310,4
1159,3E3pb3qH11iny6TFDJvsg5,0.754,0.660,0,-6.811,1,0.2670,0.17900,0.000000,0.1940,0.316,83.000,audio_features,3E3pb3qH11iny6TFDJvsg5,spotify:track:3E3pb3qH11iny6TFDJvsg5,https://api.spotify.com/v1/tracks/3E3pb3qH11in...,https://api.spotify.com/v1/audio-analysis/3E3p...,209299,4
1160,3yTkoTuiKRGL2VAlQd7xsC,0.584,0.836,0,-4.925,1,0.0790,0.05580,0.000000,0.0663,0.484,104.973,audio_features,3yTkoTuiKRGL2VAlQd7xsC,spotify:track:3yTkoTuiKRGL2VAlQd7xsC,https://api.spotify.com/v1/tracks/3yTkoTuiKRGL...,https://api.spotify.com/v1/audio-analysis/3yTk...,202204,4


Finally, I saved the testing dataset to `test_dataset.csv`. The testing dataset is stored in `data/test_dataset.csv`.

In [17]:
test_tracks_analysis.to_csv('test_dataset.csv', index=False) 