# Dissecting Spotify Valence

This is the third assignment for the course "Applied Machine Learning".

> Nikolas Moatsos <br />
>Department of Management Science and Technology <br />
>Athens University of Economics and Business

Spotify uses a metric called *valence* to measure the happiness of a track. The metric itself, however, was not developed by Spotify. It was originally developed by Echo Nest, a company that was bought by Spotify in 2014. We don't know exactly how valence is calculated. Some details are given by a blog post, which you can find here:

https://web.archive.org/web/20170422195736/http://blog.echonest.com/post/66097438564/plotting-musics-emotional-valence-1950-2013

The task of this assignment is to untangle the mystery behind valence.

## Data Collection

This notebook demonstrates the acquisition of the necessary data for the assignment. To achieve that we will use the API offered by Spotify, which provides information about the general and audio features of a track. The Web API can be found here: https://developer.spotify.com/documentation/web-api/reference/#/

Specifically the endpoints that will be used are:
* [Get Several Tracks](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-tracks), which provides general features of a track.
* [Get Tracks' Audio Features](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features), which provides audio features of a track at a higher level.
* [Get Track's Audio Analysis](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-analysis), which provides audio features of a track at a lower level.

*In order to connect with the API we must create a spotify_config.py file that contains our credentials. The file must have the following structure:*
  ```
  config = {
      'client_id' : 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
      'client_secret' :'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
  }
  ```

* First, we import all the necessary libraries for the data collection.

In [1]:
import pandas as pd
import numpy as np

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotify_config import config

import random
import os

* In order to obtain a large sample of songs and their Spotify ids (necessary to use the API), we will use a dataset from *Kaggle* that contains the top 100 songs of each year from 1921 to 2020. The dataset contains 169,909 songs in total and can be found here: https://www.kaggle.com/ektanegi/spotifydata-19212020.

* The dataset already includes some features from the Spotify API. To make sure that we collect the correct data ourselves, for now we keep only the id and the song name.

In [2]:
songs = pd.read_csv('data/data.csv', usecols=['id', 'name'])
songs

Unnamed: 0,id,name
0,6KbQ3uYMLKb5jDxLF7wYDD,Singende Bataillone 1. Teil
1,6KuQTIu1KoTTkLXKrwlLPV,"Fantasiestücke, Op. 111: Più tosto lento"
2,6L63VW0PibdM1HDSBoqnoM,Chapter 1.18 - Zamek kaniowski
3,6M94FkXd15sOAOQYRnWPN8,Bebamos Juntos - Instrumental (Remasterizado)
4,6N6tiFZ9vLTSOIxkj8qKrd,"Polonaise-Fantaisie in A-Flat Major, Op. 61"
...,...,...
169904,4KppkflX7I3vJQk7urOJaS,Skechers (feat. Tyga) - Remix
169905,1ehhGlTvjtHo2e4xJFB0SZ,Sweeter (feat. Terrace Martin)
169906,52eycxprLhK3lPcRLbQiVk,How Would I Know
169907,3wYOGJYD31sLRmBgCvWxa4,I Found You


* We drop any duplicated ids. The row count stays the same, so every id was already unique.

In [3]:
songs.drop_duplicates(subset='id', inplace=True)
songs.shape

(169909, 2)

* We also check for duplicated song names, because the same song can exist under different ids (for example, a single and an album edition). To ensure that we have only unique songs, we drop the duplicated names.

In [4]:
songs.drop_duplicates(subset='name', inplace=True)
songs.shape

(132940, 2)
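
* As a small illustration of the two de-duplication steps, the sketch below applies them to a made-up frame (the ids and names are invented for the example). `drop_duplicates` keeps the first occurrence by default, so which edition of a song survives depends on the row order.

```python
import pandas as pd

# Toy data: ids and names are invented for illustration only.
toy = pd.DataFrame({
    'id':   ['a1', 'a2', 'a3', 'a3'],
    'name': ['Song A', 'Song A', 'Song B', 'Song B'],
})

by_id = toy.drop_duplicates(subset='id')        # removes the repeated id 'a3'
by_name = by_id.drop_duplicates(subset='name')  # keeps one row per song name

print(len(toy), len(by_id), len(by_name))
print(list(by_name['id']))
```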

* We initialize the spotipy object with our credentials from the *spotify_config.py* file. 

In [5]:
client_credentials_manager = SpotifyClientCredentials(config['client_id'],
                                                      config['client_secret'])
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

* We initialize the list with the ids that will be used to perform the requests.

In [6]:
all_track_ids = list(songs['id'])

* Because we will later use only a subset of the songs (the first 60,000) when collecting the low level analysis features, we shuffle the ids so that this subset is a random sample.
* The `random.shuffle` method shuffles the list in place.

In [7]:
random.seed(0)
random.shuffle(all_track_ids)
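
* A minimal sketch of why the seed matters: with the same seed, the shuffle produces the same order on every run, so the "first 60,000" ids are always the same sample. The ids below are placeholders.

```python
import random

ids = [f'id_{i}' for i in range(10)]  # placeholder ids

first = ids.copy()
random.seed(0)
random.shuffle(first)    # shuffles in place and returns None

second = ids.copy()
random.seed(0)
random.shuffle(second)

print(first == second)               # same seed, same order
print(sorted(first) == sorted(ids))  # same elements, only the order changed
```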

### Track and High Level Audio Features

* We get all the high level audio features of a track with the `sp.audio_features` method.
* Spotify permits a maximum of 100 song ids per batch.
* We collect the results in a dictionary keyed by track id and return it.

*The code was obtained from the notes of Professor Louridas.* 

In [8]:
def get_audio_features(track_ids):
    audio_features_dict = {}
    start = 0
    num_tracks = 100

    while start < len(track_ids):
        print(f'getting from {start} to {start+num_tracks}')
        try:
            tracks_batch = track_ids[start:start+num_tracks]
            audio_batch = sp.audio_features(tracks_batch)
            audio_features_dict.update({ track_id : features 
                            for track_id, features in zip(tracks_batch, audio_batch) })
        except Exception:
            print('error:', start)
        start += num_tracks
        
    return audio_features_dict
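
* The function above prints the start index of a failed batch and then moves on. As a sketch of the same batching pattern that also records the failed batches so they can be retried later, consider the helper below. The names `fetch_in_batches` and `fake_fetch` are hypothetical, and `fake_fetch` only stands in for the real `sp.audio_features` call.

```python
def fetch_in_batches(track_ids, fetch, batch_size=100):
    """Call fetch(batch) on consecutive slices and collect one result per id.

    Returns (results, failed_starts). `fetch` is any callable that takes a
    list of ids and returns one result per id, in the same order.
    """
    results = {}
    failed_starts = []
    for start in range(0, len(track_ids), batch_size):
        batch = track_ids[start:start + batch_size]
        try:
            results.update(zip(batch, fetch(batch)))
        except Exception:
            # Remember where the failure happened instead of only printing it.
            failed_starts.append(start)
    return results, failed_starts


def fake_fetch(batch):
    # Hypothetical stand-in for the API call: fails for one specific batch.
    if 'id_100' in batch:
        raise RuntimeError('simulated API error')
    return [{'id': track_id} for track_id in batch]


ids = [f'id_{i}' for i in range(250)]
results, failed = fetch_in_batches(ids, fake_fetch)
print(len(results), failed)
```

* The returned start indices can be used to slice the id list again and retry only the batches that failed.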

In [9]:
audio_features = get_audio_features(all_track_ids)

getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
...

* We use the same approach to acquire the track features, this time with the `sp.tracks` method.
* For this API endpoint Spotify permits a maximum of 50 ids per batch.

In [13]:
def get_track_features(track_ids):
    track_features_dict = {}
    start = 0
    num_tracks = 50

    while start < len(track_ids):
        print(f'getting from {start} to {start+num_tracks}')
        try:
            tracks_batch = track_ids[start:start+num_tracks]
            features_batch = sp.tracks(tracks_batch)
            track_features_dict.update({ track_id : {'explicit': features['explicit'], 'release_date': features['album']['release_date']} 
                            for track_id, features in zip(tracks_batch, features_batch['tracks']) })
        except Exception:
            print('error:', start)
        start += num_tracks
        
    return track_features_dict

In [15]:
track_features = get_track_features(all_track_ids)

getting from 0 to 50
getting from 50 to 100
getting from 100 to 150
...

* We check that we fetched the features for all the songs successfully.

In [16]:
print(len(audio_features), len(track_features))

132940 132940


* We also check for None results from the API.
* First, we check for the high level audio features and delete if necessary.

In [17]:
test_audio_features = audio_features.copy()

In [18]:
# Iterate over the copy so that we can safely delete from the original.
for track_id, features in test_audio_features.items():
    if features is None:
        del audio_features[track_id]

* We create the DataFrame with the high level audio features data.
* We keep only the columns that might be useful for our task.
* The selected columns are:
    1. **acousticness**: A confidence measure of whether the track is acoustic.
    2. **danceability**: How suitable the track is for dancing.
    3. **duration_ms**: The duration of the track in ms.
    4. **energy**: A perceptual measure of the intensity and the activity, expressed by the track.
    5. **instrumentalness**: The likelihood that the track contains no vocals and only instruments.
    6. **key**: The key that the track is in. There are 12 integers that map to pitches using the [Pitch class notation](https://en.wikipedia.org/wiki/Pitch_class).
    7. **liveness**: The probability that the track was performed live.
    8. **loudness**: The overall loudness of a track in decibels (dB).
    9. **mode**: The modality of the track (major or minor).
    10. **speechiness**: The presence of spoken words in the track.
    11. **tempo**: The overall estimated tempo of the track in beats per minute (BPM).
    12. **time_signature**: How many beats are in each bar of the track.
    13. **valence**: A measure describing the musical positiveness conveyed by the track *(our target value)*. 

*More information about the attributes [here](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features).*

In [19]:
tracks = pd.DataFrame.from_dict(audio_features, orient='index')
tracks.drop(columns=['type', 'id', 'uri', 'track_href', 'analysis_url'], inplace=True)
tracks

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
4a2qoYY6xINDXtUOBFbJ6d,0.122,0.0182,2,-27.890,1,0.0383,0.967000,0.834000,0.0735,0.0473,77.590,477733,3
238ByVSDcz260ewWOEgAQf,0.415,0.1320,0,-17.186,1,0.0436,0.981000,0.179000,0.1650,0.4180,92.253,200040,4
533Q9BiC93uzp3AnvMNLmc,0.769,0.4360,7,-5.716,1,0.0505,0.715000,0.000000,0.1700,0.9710,119.675,297494,3
3DytidwwDXv5gZVilj2lO1,0.559,0.7200,0,-5.948,1,0.0302,0.111000,0.000105,0.3280,0.6830,125.139,228133,4
1Xkpw8X6WVmkditTPZ7YSk,0.481,0.3870,2,-16.633,1,0.0416,0.901000,0.820000,0.1560,0.7400,110.564,112039,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4K87CebUVshxBCfwv4Gjs8,0.704,0.2350,9,-17.272,1,0.9530,0.750000,0.000000,0.3700,0.4010,131.762,329497,4
36PNa2n55qLfQd7KLXyeoy,0.723,0.7240,0,-6.378,1,0.0304,0.294000,0.000000,0.1210,0.9270,117.990,251787,4
0ukMJRQa8UoAbtNVIhEgYE,0.543,0.5640,2,-10.260,1,0.0404,0.047100,0.000000,0.0936,0.8460,92.839,103173,4
2iguaRelnM3P06EOcBjsv9,0.497,0.6540,0,-9.714,1,0.0296,0.000507,0.390000,0.2030,0.8300,144.099,125453,3


* We follow the same procedure for the track features.

In [20]:
test_track_features = track_features.copy()

In [21]:
# Iterate over the copy so that we can safely delete from the original.
for track_id, features in test_track_features.items():
    if features is None:
        del track_features[track_id]

* Again we do not keep all the columns, but only those that might be useful for our task.
* The selected columns are:
    1. **explicit**: Whether or not the track has explicit lyrics (transformed to binary (0,1)).
    2. **year**: The year that the track was released (created from the *release_date*). 

*More information about the attributes [here](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-track).*

In [22]:
extra_features = pd.DataFrame.from_dict(track_features, orient='index')
extra_features['explicit'] = np.where(extra_features['explicit'] == True, 1, 0)
extra_features['year'] = extra_features['release_date'].apply(lambda x: int(x[0:4]))
extra_features.drop(columns='release_date', inplace=True)
extra_features

Unnamed: 0,explicit,year
4a2qoYY6xINDXtUOBFbJ6d,0,1985
238ByVSDcz260ewWOEgAQf,0,1949
533Q9BiC93uzp3AnvMNLmc,0,2004
3DytidwwDXv5gZVilj2lO1,0,1979
1Xkpw8X6WVmkditTPZ7YSk,0,1959
...,...,...
4K87CebUVshxBCfwv4Gjs8,0,1935
36PNa2n55qLfQd7KLXyeoy,0,2008
0ukMJRQa8UoAbtNVIhEgYE,0,1968
2iguaRelnM3P06EOcBjsv9,0,1947
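
* The transformation of `explicit` and `release_date` can be checked on a toy frame. Spotify reports `release_date` at year, month, or day precision, so taking the first four characters gives the year in each case. The sketch below uses pandas' `astype` and string accessor instead of `np.where` and `apply`; the rows are invented.

```python
import pandas as pd

# Invented rows covering the three release_date precisions.
toy = pd.DataFrame(
    {'explicit': [True, False, False],
     'release_date': ['1985', '1949-06', '2004-11-23']},
    index=['t1', 't2', 't3'],
)

toy['explicit'] = toy['explicit'].astype(int)          # True/False -> 1/0
toy['year'] = toy['release_date'].str[:4].astype(int)  # first 4 characters = year
toy = toy.drop(columns='release_date')

print(toy)
```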


* We merge the two DataFrames to acquire the complete DataFrame for the track and high level audio features.

In [24]:
tracks = tracks.merge(extra_features, left_index=True, right_index=True, how='inner')
tracks

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,explicit,year
4a2qoYY6xINDXtUOBFbJ6d,0.122,0.0182,2,-27.890,1,0.0383,0.967000,0.834000,0.0735,0.0473,77.590,477733,3,0,1985
238ByVSDcz260ewWOEgAQf,0.415,0.1320,0,-17.186,1,0.0436,0.981000,0.179000,0.1650,0.4180,92.253,200040,4,0,1949
533Q9BiC93uzp3AnvMNLmc,0.769,0.4360,7,-5.716,1,0.0505,0.715000,0.000000,0.1700,0.9710,119.675,297494,3,0,2004
3DytidwwDXv5gZVilj2lO1,0.559,0.7200,0,-5.948,1,0.0302,0.111000,0.000105,0.3280,0.6830,125.139,228133,4,0,1979
1Xkpw8X6WVmkditTPZ7YSk,0.481,0.3870,2,-16.633,1,0.0416,0.901000,0.820000,0.1560,0.7400,110.564,112039,4,0,1959
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4K87CebUVshxBCfwv4Gjs8,0.704,0.2350,9,-17.272,1,0.9530,0.750000,0.000000,0.3700,0.4010,131.762,329497,4,0,1935
36PNa2n55qLfQd7KLXyeoy,0.723,0.7240,0,-6.378,1,0.0304,0.294000,0.000000,0.1210,0.9270,117.990,251787,4,0,2008
0ukMJRQa8UoAbtNVIhEgYE,0.543,0.5640,2,-10.260,1,0.0404,0.047100,0.000000,0.0936,0.8460,92.839,103173,4,0,1968
2iguaRelnM3P06EOcBjsv9,0.497,0.6540,0,-9.714,1,0.0296,0.000507,0.390000,0.2030,0.8300,144.099,125453,3,0,1947


* We also make some checks to ensure that there are no NaN values and that all the ids are unique.

In [7]:
tracks.isna().sum()

danceability        0
energy              0
key                 0
loudness            0
mode                0
speechiness         0
acousticness        0
instrumentalness    0
liveness            0
valence             0
tempo               0
duration_ms         0
time_signature      0
explicit            0
year                0
dtype: int64

In [9]:
len(tracks.index.unique()) == len(tracks)

True

* Lastly, because the predictive models will be evaluated on a specific set of tracks, we discard all the tracks that also appear in that final evaluation set.

In [6]:
# Use a context manager so that the file is closed after reading.
with open('data/spotify_ids.txt') as f:
    test_ids = [line.strip() for line in f]
len(test_ids)

1162

In [30]:
excluded_ids = tracks.index.isin(test_ids)
tracks = tracks.loc[~excluded_ids]
tracks

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,explicit,year
4a2qoYY6xINDXtUOBFbJ6d,0.122,0.0182,2,-27.890,1,0.0383,0.967000,0.834000,0.0735,0.0473,77.590,477733,3,0,1985
238ByVSDcz260ewWOEgAQf,0.415,0.1320,0,-17.186,1,0.0436,0.981000,0.179000,0.1650,0.4180,92.253,200040,4,0,1949
533Q9BiC93uzp3AnvMNLmc,0.769,0.4360,7,-5.716,1,0.0505,0.715000,0.000000,0.1700,0.9710,119.675,297494,3,0,2004
3DytidwwDXv5gZVilj2lO1,0.559,0.7200,0,-5.948,1,0.0302,0.111000,0.000105,0.3280,0.6830,125.139,228133,4,0,1979
1Xkpw8X6WVmkditTPZ7YSk,0.481,0.3870,2,-16.633,1,0.0416,0.901000,0.820000,0.1560,0.7400,110.564,112039,4,0,1959
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4K87CebUVshxBCfwv4Gjs8,0.704,0.2350,9,-17.272,1,0.9530,0.750000,0.000000,0.3700,0.4010,131.762,329497,4,0,1935
36PNa2n55qLfQd7KLXyeoy,0.723,0.7240,0,-6.378,1,0.0304,0.294000,0.000000,0.1210,0.9270,117.990,251787,4,0,2008
0ukMJRQa8UoAbtNVIhEgYE,0.543,0.5640,2,-10.260,1,0.0404,0.047100,0.000000,0.0936,0.8460,92.839,103173,4,0,1968
2iguaRelnM3P06EOcBjsv9,0.497,0.6540,0,-9.714,1,0.0296,0.000507,0.390000,0.2030,0.8300,144.099,125453,3,0,1947


* We can now see that there are no common ids.

In [50]:
# Code obtained from Stack Overflow, https://stackoverflow.com/questions/2864842/common-elements-comparison-between-2-lists
list(set(test_ids).intersection(list(tracks.index)))

[]
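
* The exclusion and the follow-up check can be sketched on toy data as follows. The index labels and the `held_out` list are invented for the example.

```python
import pandas as pd

toy_tracks = pd.DataFrame({'valence': [0.1, 0.5, 0.9]}, index=['a', 'b', 'c'])
held_out = ['b', 'z']   # hypothetical evaluation ids

mask = toy_tracks.index.isin(held_out)   # True where the id is held out
train = toy_tracks.loc[~mask]            # keep everything else

overlap = set(held_out) & set(train.index)
print(list(train.index), overlap)
```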

* We export the data to a csv file.

In [32]:
tracks.to_csv('data/tracks.csv')

### Low Level Audio Features

* To enrich our data we will use some low level audio features that provide more detailed information about the tracks.
* Specifically, we will concentrate on the segments of a track.
* Each segment contains a roughly consistent sound throughout its duration.

* From the segments we will obtain three attributes:
    1. **Pitches**: Pitches provide a "chroma" vector that describes the dominance of every pitch (12 in total) in the chromatic scale. 
    2. **Timbre**: Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments or voices. It is also referred to as sound color. The timbre provided by Spotify's API is the result of a PCA and is represented by a 12-value vector.
    3. **Loudness Attributes**: The loudness attributes are *loudness_start*, which indicates the onset loudness of the segment, *loudness_max*, which indicates the peak loudness of the segment, and *loudness_max_time*, which indicates the segment-relative offset of the segment's peak loudness in seconds. Together, they describe the "attack" of the song.

    *More information about the attributes [here](https://developer.spotify.com/documentation/web-api/reference/#/operations/get-audio-analysis).*

* To facilitate the data extraction we create three functions.


* The first function returns a 2D numpy array with the pitches of every segment.

In [58]:
def get_pitch(segments):
    pitch_a = np.empty((0,12), int)
    for i in segments:
        pitch_a = np.append(pitch_a, np.array([i['pitches']]), axis=0)
    return pitch_a

* The second function returns a 2D numpy array with the timbre of every segment.

In [59]:
def get_timbre(segments):
    timbre_a = np.empty((0,12), int)
    for i in segments:
        timbre_a = np.append(timbre_a, np.array([i['timbre']]), axis=0)
    return timbre_a

* The last function returns a 2D numpy array with the loudness attributes of every segment.

In [60]:
def get_loudness(segments):
    loudness_a = np.empty((0,3), int)
    for i in segments:
        l = [i['loudness_start'], i['loudness_max_time'], i['loudness_max']]
        loudness_a = np.append(loudness_a, np.array([l]), axis=0)
    return loudness_a
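
* The three helpers grow their arrays with `np.append`, which copies the whole array on every iteration. As a sketch of an alternative that builds each array in a single pass, the functions below collect the rows in a list first. The names `stack_field` and `stack_loudness` are hypothetical, and the toy segments contain invented values and only the fields used here.

```python
import numpy as np

def stack_field(segments, key, width=12):
    """Stack one vector-valued field (e.g. 'pitches') into a (n, width) array."""
    rows = [seg[key] for seg in segments]
    # reshape keeps the 2D shape even when there are no segments at all
    return np.asarray(rows, dtype=float).reshape(-1, width)

def stack_loudness(segments):
    """Stack the three scalar loudness attributes into a (n, 3) array."""
    keys = ('loudness_start', 'loudness_max_time', 'loudness_max')
    rows = [[seg[k] for k in keys] for seg in segments]
    return np.asarray(rows, dtype=float).reshape(-1, 3)

# Invented segments for illustration.
segments = [
    {'loudness_start': -20.0, 'loudness_max_time': 0.05, 'loudness_max': -10.0,
     'pitches': [0.1] * 12, 'timbre': [1.0] * 12},
    {'loudness_start': -18.0, 'loudness_max_time': 0.07, 'loudness_max': -9.0,
     'pitches': [0.3] * 12, 'timbre': [2.0] * 12},
]

pitches = stack_field(segments, 'pitches')
timbre = stack_field(segments, 'timbre')
loudness = stack_loudness(segments)
print(pitches.shape, timbre.shape, loudness.shape)
```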

* Also, we create the function that performs the requests.
* The function uses the `sp.audio_analysis` method to acquire the data.
* Because this endpoint does not provide batch requests, we will acquire the audio analysis for one track per request.
* The function returns three dictionaries with the previously mentioned 2D arrays, for each track.

In [71]:
def get_audio_analysis(track_ids, limit):    
    state = 0
    audio_analysis_pitch = {}
    audio_analysis_timbre = {}
    audio_analysis_loudness = {}

    # Also stop when the ids run out; otherwise the loop would never end.
    while len(audio_analysis_pitch) < limit and state < len(track_ids):

        try:
            track = track_ids[state]

            track_analysis = sp.audio_analysis(track)
            pitches = get_pitch(track_analysis['segments'])
            timbre = get_timbre(track_analysis['segments'])
            loudness = get_loudness(track_analysis['segments'])

            audio_analysis_pitch.update({ track : pitches})
            audio_analysis_timbre.update({ track : timbre})
            audio_analysis_loudness.update({ track : loudness})

        except Exception:
            print('error:', state)

        state += 1
    
    return audio_analysis_pitch, audio_analysis_timbre, audio_analysis_loudness

* Due to lack of resources we will obtain the data for 60,000 tracks (~5 hours).

*(Depending on your RAM you may need to obtain fewer tracks or obtain them in batches. This run was performed with 16 GB of RAM.)*


In [None]:
audio_analysis_pitch, audio_analysis_timbre, audio_analysis_loudness = get_audio_analysis(all_track_ids, 60000)

* We remove all the tracks that exist in the final evaluation set.

In [7]:
collected_ids = list(audio_analysis_pitch.keys())

for track_id in collected_ids:
    if track_id in test_ids:
        del audio_analysis_pitch[track_id]
        del audio_analysis_timbre[track_id]
        del audio_analysis_loudness[track_id]

In [10]:
print(len(audio_analysis_pitch), len(audio_analysis_timbre), len(audio_analysis_loudness))

59960 59960 59960


* In the analysis notebook the pitches and the timbre will be transformed and used as input to the Convolutional Neural Network.
* To save them using as little space as possible, we keep only the first 150 segments.
* First, we find the tracks that have at least 150 segments.

In [8]:
accept_list = []

for track_id, a in audio_analysis_pitch.items():
    dim = a.shape[0]
    if (dim >= 150):
        accept_list.append(track_id)
        
len(accept_list)

59559

* Then we perform the downsizing on these tracks.

In [9]:
for i in accept_list:
    audio_analysis_timbre[i] = audio_analysis_timbre[i][0:150,]
    audio_analysis_pitch[i] = audio_analysis_pitch[i][0:150,]
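
* The effect of the truncation can be seen on toy arrays. Tracks with fewer segments than the limit are left unchanged by the loop above, so only the accepted tracks end up with a common shape. The sketch uses a limit of 4 instead of 150, and the track names and data are invented.

```python
import numpy as np

MAX_SEGMENTS = 4   # stands in for the 150 used in the notebook

# Invented tracks with different numbers of segments (rows).
toy_pitch = {
    'long_track':  np.ones((6, 12)),
    'exact_track': np.ones((4, 12)),
    'short_track': np.ones((2, 12)),
}

accepted = [tid for tid, arr in toy_pitch.items() if arr.shape[0] >= MAX_SEGMENTS]
for tid in accepted:
    toy_pitch[tid] = toy_pitch[tid][:MAX_SEGMENTS]

# Only the accepted tracks share a shape, so only they can be stacked.
stacked = np.stack([toy_pitch[tid] for tid in accepted])
print(accepted, stacked.shape, toy_pitch['short_track'].shape)
```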

* We save the results in .npy files.
* In this way we can load the data and make further transformations at any time.

In [11]:
np.save('data/pitch.npy',audio_analysis_pitch)
np.save('data/timbre.npy',audio_analysis_timbre)
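
* Because the saved objects are Python dictionaries rather than plain arrays, `np.save` stores them as pickled object arrays. Reading them back therefore needs `allow_pickle=True`, and `.item()` unwraps the zero-dimensional array into the original dictionary. The sketch below uses a temporary file and invented data.

```python
import os
import tempfile

import numpy as np

toy = {'track_a': np.zeros((3, 12)), 'track_b': np.ones((2, 12))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_pitch.npy')
    np.save(path, toy)                                # dict -> pickled 0-d object array
    loaded = np.load(path, allow_pickle=True).item()  # back to a dict

print(sorted(loaded), loaded['track_b'].shape)
```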

* We will not only use the raw pitches and timbre.
* For each of the three attributes (pitches, timbre, loudness attributes) we find the mean values, from all the segments of a track.
* As a result we get:
    1. For the *pitches*, a 12 value vector with the mean value of each element.
    2. For the *timbre*, a 12 value vector with the mean of each element.
    3. For the *loudness attributes*, a 3 value vector with the mean of each attribute. 

In [39]:
audio_analysis_pitch_mean = {}
for track_id, analysis in audio_analysis_pitch.items():
    pitch_mean = np.mean(analysis, axis=0)
    audio_analysis_pitch_mean.update({ track_id : pitch_mean})

In [40]:
audio_analysis_timbre_mean = {}
for track_id, analysis in audio_analysis_timbre.items():
    timbre_mean = np.mean(analysis, axis=0)
    audio_analysis_timbre_mean.update({ track_id : timbre_mean})

In [41]:
audio_analysis_loudness_mean = {}
for track_id, analysis in audio_analysis_loudness.items():
    loudness_mean = np.mean(analysis, axis=0)
    audio_analysis_loudness_mean.update({ track_id : loudness_mean})
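
* The averaging step, together with the conversion to a prefixed DataFrame, can be sketched on toy data as follows. The track ids and values are invented.

```python
import numpy as np
import pandas as pd

# Invented per-track pitch matrices: rows are segments, columns are pitch classes.
toy_pitch = {
    'track_a': np.array([[0.0] * 12, [1.0] * 12]),
    'track_b': np.array([[0.2] * 12, [0.4] * 12, [0.6] * 12]),
}

# Averaging over axis 0 collapses the segments and keeps the 12 columns.
means = {tid: arr.mean(axis=0) for tid, arr in toy_pitch.items()}

frame = pd.DataFrame.from_dict(means, orient='index').add_prefix('pitch_')
print(frame.shape)
print(frame['pitch_0'].round(2).tolist())
```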

* Next, we transform the 3 vectors with the mean values to DataFrames. 

In [42]:
pitch_analysis = pd.DataFrame.from_dict(audio_analysis_pitch_mean, orient='index')
pitch_analysis = pitch_analysis.add_prefix('pitch_')
pitch_analysis

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,pitch_10,pitch_11
238ByVSDcz260ewWOEgAQf,0.460020,0.201881,0.240555,0.169500,0.334406,0.346195,0.259822,0.380519,0.182167,0.226228,0.162203,0.276561
533Q9BiC93uzp3AnvMNLmc,0.360192,0.385989,0.551523,0.210450,0.270202,0.172840,0.367805,0.393697,0.203855,0.374937,0.216645,0.352609
3DytidwwDXv5gZVilj2lO1,0.846527,0.426190,0.288279,0.187522,0.303811,0.432586,0.225100,0.355768,0.215971,0.292785,0.277842,0.427218
1Xkpw8X6WVmkditTPZ7YSk,0.259775,0.444462,0.419316,0.237481,0.421523,0.316850,0.162951,0.329795,0.294014,0.365389,0.247331,0.184245
49C6EGQhCUSgyADHYvJ7ez,0.348984,0.402577,0.329118,0.272448,0.513416,0.340941,0.331070,0.469664,0.486889,0.565561,0.319361,0.282272
...,...,...,...,...,...,...,...,...,...,...,...,...
5XgCk8ikjfTx02rgxNULEy,0.179790,0.424339,0.463222,0.172609,0.244581,0.182408,0.344043,0.235784,0.211337,0.425495,0.149920,0.193404
5K99RGHiLcXAEgrPvw5x8B,0.283515,0.459829,0.238144,0.115733,0.212125,0.548562,0.185078,0.187220,0.262938,0.482391,0.838862,0.157858
1Um0R2yaEzp6EmfUhmWkzc,0.371257,0.360626,0.358172,0.328551,0.216945,0.205512,0.229466,0.449279,0.393903,0.255304,0.286725,0.231071
1p3vijWl4pezgUtyCmCVln,0.399122,0.421652,0.236681,0.374762,0.200595,0.232562,0.289443,0.191954,0.392973,0.296376,0.508678,0.377908


In [43]:
timbre_analysis = pd.DataFrame.from_dict(audio_analysis_timbre_mean, orient='index')
timbre_analysis = timbre_analysis.add_prefix('timbre_')
timbre_analysis

Unnamed: 0,timbre_0,timbre_1,timbre_2,timbre_3,timbre_4,timbre_5,timbre_6,timbre_7,timbre_8,timbre_9,timbre_10,timbre_11
238ByVSDcz260ewWOEgAQf,37.392476,-60.129075,31.480129,-10.155410,25.239797,-16.627843,-13.930937,0.223931,-6.450813,-5.146013,-6.132500,4.896064
533Q9BiC93uzp3AnvMNLmc,49.336098,27.805093,28.391246,7.129908,46.889570,-4.948037,-20.320301,8.109223,3.246466,2.691535,-12.141557,-1.579897
3DytidwwDXv5gZVilj2lO1,49.779793,66.461928,32.378580,-6.702217,19.016137,-20.668999,7.369926,6.275858,0.816352,-0.849888,-7.378406,0.654819
1Xkpw8X6WVmkditTPZ7YSk,38.458846,-36.962499,32.025077,27.126580,34.563018,-28.811045,26.761367,3.937700,-0.728032,11.823055,-16.817598,2.946972
49C6EGQhCUSgyADHYvJ7ez,48.275552,46.146274,17.062510,-7.208005,-1.829843,-20.445113,4.539008,13.935807,-9.770754,-3.445520,-9.008125,-12.408444
...,...,...,...,...,...,...,...,...,...,...,...,...
5XgCk8ikjfTx02rgxNULEy,40.067333,-33.145186,4.390391,-22.575345,5.941944,-21.037345,-16.483287,-2.807175,1.498037,0.754020,-6.064358,2.763354
5K99RGHiLcXAEgrPvw5x8B,40.374324,6.136207,31.638784,-10.354102,15.277647,-22.826544,-13.960940,-5.026567,-11.716491,-6.564302,-3.742398,10.847945
1Um0R2yaEzp6EmfUhmWkzc,42.567093,-22.582350,7.295955,-22.177771,40.039101,-17.311905,-28.104022,-3.548257,21.026206,6.822775,-6.141358,11.447684
1p3vijWl4pezgUtyCmCVln,44.230405,-31.686588,18.077242,-3.326328,44.804192,-22.838964,8.894932,-0.502899,-5.193930,-8.395782,-8.869433,-0.952230


In [44]:
loudness_analysis = pd.DataFrame.from_dict(audio_analysis_loudness_mean, orient='index')
loudness_analysis.columns = ['loudness_start', 'loudness_max_time', 'loudness_max']
loudness_analysis

Unnamed: 0,loudness_start,loudness_max_time,loudness_max
238ByVSDcz260ewWOEgAQf,-26.655862,0.083081,-19.040579
533Q9BiC93uzp3AnvMNLmc,-16.976020,0.063622,-5.893591
3DytidwwDXv5gZVilj2lO1,-13.735490,0.063940,-6.919745
1Xkpw8X6WVmkditTPZ7YSk,-26.885414,0.042667,-17.600172
49C6EGQhCUSgyADHYvJ7ez,-15.827730,0.059064,-8.513015
...,...,...,...
5XgCk8ikjfTx02rgxNULEy,-22.969223,0.085119,-17.191883
5K99RGHiLcXAEgrPvw5x8B,-22.160860,0.079966,-17.219731
1Um0R2yaEzp6EmfUhmWkzc,-21.376024,0.084418,-14.154549
1p3vijWl4pezgUtyCmCVln,-19.955407,0.072397,-11.868696


* We merge all the DataFrames with the low level audio features.

In [45]:
audio_analysis = pitch_analysis.merge(timbre_analysis.merge(loudness_analysis, left_index=True, right_index=True, how='inner'), left_index=True, right_index=True, how='inner')
audio_analysis

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,...,timbre_5,timbre_6,timbre_7,timbre_8,timbre_9,timbre_10,timbre_11,loudness_start,loudness_max_time,loudness_max
238ByVSDcz260ewWOEgAQf,0.460020,0.201881,0.240555,0.169500,0.334406,0.346195,0.259822,0.380519,0.182167,0.226228,...,-16.627843,-13.930937,0.223931,-6.450813,-5.146013,-6.132500,4.896064,-26.655862,0.083081,-19.040579
533Q9BiC93uzp3AnvMNLmc,0.360192,0.385989,0.551523,0.210450,0.270202,0.172840,0.367805,0.393697,0.203855,0.374937,...,-4.948037,-20.320301,8.109223,3.246466,2.691535,-12.141557,-1.579897,-16.976020,0.063622,-5.893591
3DytidwwDXv5gZVilj2lO1,0.846527,0.426190,0.288279,0.187522,0.303811,0.432586,0.225100,0.355768,0.215971,0.292785,...,-20.668999,7.369926,6.275858,0.816352,-0.849888,-7.378406,0.654819,-13.735490,0.063940,-6.919745
1Xkpw8X6WVmkditTPZ7YSk,0.259775,0.444462,0.419316,0.237481,0.421523,0.316850,0.162951,0.329795,0.294014,0.365389,...,-28.811045,26.761367,3.937700,-0.728032,11.823055,-16.817598,2.946972,-26.885414,0.042667,-17.600172
49C6EGQhCUSgyADHYvJ7ez,0.348984,0.402577,0.329118,0.272448,0.513416,0.340941,0.331070,0.469664,0.486889,0.565561,...,-20.445113,4.539008,13.935807,-9.770754,-3.445520,-9.008125,-12.408444,-15.827730,0.059064,-8.513015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5XgCk8ikjfTx02rgxNULEy,0.179790,0.424339,0.463222,0.172609,0.244581,0.182408,0.344043,0.235784,0.211337,0.425495,...,-21.037345,-16.483287,-2.807175,1.498037,0.754020,-6.064358,2.763354,-22.969223,0.085119,-17.191883
5K99RGHiLcXAEgrPvw5x8B,0.283515,0.459829,0.238144,0.115733,0.212125,0.548562,0.185078,0.187220,0.262938,0.482391,...,-22.826544,-13.960940,-5.026567,-11.716491,-6.564302,-3.742398,10.847945,-22.160860,0.079966,-17.219731
1Um0R2yaEzp6EmfUhmWkzc,0.371257,0.360626,0.358172,0.328551,0.216945,0.205512,0.229466,0.449279,0.393903,0.255304,...,-17.311905,-28.104022,-3.548257,21.026206,6.822775,-6.141358,11.447684,-21.376024,0.084418,-14.154549
1p3vijWl4pezgUtyCmCVln,0.399122,0.421652,0.236681,0.374762,0.200595,0.232562,0.289443,0.191954,0.392973,0.296376,...,-22.838964,8.894932,-0.502899,-5.193930,-8.395782,-8.869433,-0.952230,-19.955407,0.072397,-11.868696
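
* The nested `merge` aligns the three frames on their index and keeps only the ids present in all of them. For frames with unique indexes, a flatter way to express the same idea is `pd.concat` with `axis=1` and `join='inner'`. The sketch below compares the two on invented toy frames.

```python
import pandas as pd

pitch = pd.DataFrame({'pitch_0': [0.1, 0.2, 0.3]}, index=['a', 'b', 'c'])
timbre = pd.DataFrame({'timbre_0': [10.0, 20.0]}, index=['a', 'b'])
loud = pd.DataFrame({'loudness_max': [-5.0, -6.0]}, index=['b', 'a'])

# Nested merge, as in the notebook.
nested = pitch.merge(
    timbre.merge(loud, left_index=True, right_index=True, how='inner'),
    left_index=True, right_index=True, how='inner',
)

# Flat alternative: align on the index and keep only the shared ids.
flat = pd.concat([pitch, timbre, loud], axis=1, join='inner')

print(sorted(nested.index) == sorted(flat.index))
print(flat.loc['a'].tolist())
```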


* Lastly, we merge the DataFrame that contains the high level audio and track features with the DataFrame that contains the low level audio features.

In [46]:
low_level_tracks = audio_analysis.merge(tracks, left_index=True, right_index=True, how='inner')
low_level_tracks

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,...,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,explicit,year
238ByVSDcz260ewWOEgAQf,0.460020,0.201881,0.240555,0.169500,0.334406,0.346195,0.259822,0.380519,0.182167,0.226228,...,0.0436,0.981000,0.179000,0.1650,0.4180,92.253,200040,4,0,1949
533Q9BiC93uzp3AnvMNLmc,0.360192,0.385989,0.551523,0.210450,0.270202,0.172840,0.367805,0.393697,0.203855,0.374937,...,0.0505,0.715000,0.000000,0.1700,0.9710,119.675,297494,3,0,2004
3DytidwwDXv5gZVilj2lO1,0.846527,0.426190,0.288279,0.187522,0.303811,0.432586,0.225100,0.355768,0.215971,0.292785,...,0.0302,0.111000,0.000105,0.3280,0.6830,125.139,228133,4,0,1979
1Xkpw8X6WVmkditTPZ7YSk,0.259775,0.444462,0.419316,0.237481,0.421523,0.316850,0.162951,0.329795,0.294014,0.365389,...,0.0416,0.901000,0.820000,0.1560,0.7400,110.564,112039,4,0,1959
49C6EGQhCUSgyADHYvJ7ez,0.348984,0.402577,0.329118,0.272448,0.513416,0.340941,0.331070,0.469664,0.486889,0.565561,...,0.0370,0.000942,0.000008,0.1460,0.8960,152.098,159301,4,0,1969
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5XgCk8ikjfTx02rgxNULEy,0.179790,0.424339,0.463222,0.172609,0.244581,0.182408,0.344043,0.235784,0.211337,0.425495,...,0.0287,0.859000,0.156000,0.0922,0.4310,85.578,186440,4,0,1997
5K99RGHiLcXAEgrPvw5x8B,0.283515,0.459829,0.238144,0.115733,0.212125,0.548562,0.185078,0.187220,0.262938,0.482391,...,0.0273,0.867000,0.354000,0.1130,0.0603,152.970,190827,4,0,1973
1Um0R2yaEzp6EmfUhmWkzc,0.371257,0.360626,0.358172,0.328551,0.216945,0.205512,0.229466,0.449279,0.393903,0.255304,...,0.0338,0.976000,0.000178,0.1820,0.1420,112.643,155387,4,0,1944
1p3vijWl4pezgUtyCmCVln,0.399122,0.421652,0.236681,0.374762,0.200595,0.232562,0.289443,0.191954,0.392973,0.296376,...,0.0744,0.260000,0.000542,0.1010,0.2540,139.886,418507,4,1,2012


* Before exporting the data we make some checks to ensure that there are no NaN values and that all the ids are unique.

In [93]:
low_level_tracks.isna().sum()

pitch_0              0
pitch_1              0
pitch_2              0
pitch_3              0
pitch_4              0
pitch_5              0
pitch_6              0
pitch_7              0
pitch_8              0
pitch_9              0
pitch_10             0
pitch_11             0
timbre_0             0
timbre_1             0
timbre_2             0
timbre_3             0
timbre_4             0
timbre_5             0
timbre_6             0
timbre_7             0
timbre_8             0
timbre_9             0
timbre_10            0
timbre_11            0
loudness_start       0
loudness_max_time    0
loudness_max         0
danceability         0
energy               0
key                  0
loudness             0
mode                 0
speechiness          0
acousticness         0
instrumentalness     0
liveness             0
valence              0
tempo                0
duration_ms          0
time_signature       0
explicit             0
year                 0
dtype: int64

In [94]:
low_level_tracks.index.is_unique

True
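
The two checks can also be wrapped in one helper that fails loudly, which is convenient because the same checks apply to the evaluation data. This is a minimal sketch: the name `check_frame` and the toy DataFrame are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def check_frame(df: pd.DataFrame) -> None:
    """Raise AssertionError if df has NaN values or duplicate index labels."""
    nan_counts = df.isna().sum()
    bad_columns = nan_counts[nan_counts > 0]
    assert bad_columns.empty, f"columns with NaN values: {bad_columns.to_dict()}"

    duplicates = df.index[df.index.duplicated()].tolist()
    assert not duplicates, f"duplicate track ids: {duplicates}"


# Toy frame standing in for low_level_tracks (ids and values are made up).
toy = pd.DataFrame(
    {"valence": [0.4, 0.9], "tempo": [92.3, 119.7]},
    index=["id_a", "id_b"],
)
check_frame(toy)  # passes silently
```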

* We export the data to a csv file.

In [47]:
low_level_tracks.to_csv('data/low_level_tracks.csv')

### Final Evaluation Data

The prediction models will be evaluated on a final data set, so we follow the same procedure to obtain the evaluation data.

* First, we acquire the high level audio features.

In [51]:
audio_features_eval = get_audio_features(test_ids)

getting from 0 to 100
getting from 100 to 200
getting from 200 to 300
getting from 300 to 400
getting from 400 to 500
getting from 500 to 600
getting from 600 to 700
getting from 700 to 800
getting from 800 to 900
getting from 900 to 1000
getting from 1000 to 1100
getting from 1100 to 1200


In [66]:
tracks_eval = pd.DataFrame.from_dict(audio_features_eval, orient='index')
tracks_eval.drop(columns=['type', 'id', 'uri', 'track_href', 'analysis_url'], inplace=True)
tracks_eval

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
7lPN2DXiMsVn7XUKtOW1CS,0.585,0.436,10,-8.761,1,0.0601,0.72100,0.000013,0.1050,0.132,143.874,242014,4
5QO79kh1waicV47BqGRL3g,0.680,0.826,0,-5.487,1,0.0309,0.02120,0.000012,0.5430,0.644,118.051,215627,4
0VjIjW4GlUZAMYd2vXMi3b,0.514,0.730,1,-5.934,1,0.0598,0.00146,0.000095,0.0897,0.334,171.005,200040,4
4MzXwWMhyBbmu6hOcLVD49,0.731,0.573,4,-10.059,0,0.0544,0.40100,0.000052,0.1130,0.145,109.928,205090,4
5Kskr9LcNYa0tpt5f0ZEJx,0.907,0.393,4,-7.636,0,0.0539,0.45100,0.000001,0.1350,0.202,104.949,205458,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.596,0.650,9,-5.167,1,0.3370,0.13800,0.000000,0.1400,0.188,133.997,257428,4
1fzf9Aad4y1RWrmwosAK5y,0.588,0.850,4,-6.431,1,0.0318,0.16800,0.002020,0.0465,0.768,93.003,187310,4
3E3pb3qH11iny6TFDJvsg5,0.754,0.660,0,-6.811,1,0.2670,0.17900,0.000000,0.1940,0.316,83.000,209299,4
3yTkoTuiKRGL2VAlQd7xsC,0.584,0.836,0,-4.925,1,0.0790,0.05580,0.000000,0.0663,0.484,104.973,202204,4


* Next, we acquire the track features.

In [52]:
track_features_eval = get_track_features(test_ids)

getting from 0 to 50
getting from 50 to 100
getting from 100 to 150
getting from 150 to 200
getting from 200 to 250
getting from 250 to 300
getting from 300 to 350
getting from 350 to 400
getting from 400 to 450
getting from 450 to 500
getting from 500 to 550
getting from 550 to 600
getting from 600 to 650
getting from 650 to 700
getting from 700 to 750
getting from 750 to 800
getting from 800 to 850
getting from 850 to 900
getting from 900 to 950
getting from 950 to 1000
getting from 1000 to 1050
getting from 1050 to 1100
getting from 1100 to 1150
getting from 1150 to 1200
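
The progress messages suggest that `get_audio_features` and `get_track_features` (defined earlier in the notebook) request the ids in batches of 100 and 50 respectively, which matches the per-request limits of the two endpoints. A minimal sketch of such batching follows; the helper name `batches` and the placeholder ids are assumptions, not the notebook's actual code.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of ids with at most `size` elements each."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Placeholder ids; 1162 is an assumed count chosen to reproduce the number of messages above.
ids = [f"track_{i}" for i in range(1162)]

audio_batches = list(batches(ids, 100))  # 12 batches of up to 100 ids
track_batches = list(batches(ids, 50))   # 24 batches of up to 50 ids
```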


In [67]:
extra_features_eval = pd.DataFrame.from_dict(track_features_eval, orient='index')
extra_features_eval['explicit'] = np.where(extra_features_eval['explicit'] == True, 1, 0)
extra_features_eval['year'] = extra_features_eval['release_date'].apply(lambda x: int(x[0:4]))
extra_features_eval.drop(columns='release_date', inplace=True)
extra_features_eval

Unnamed: 0,explicit,year
7lPN2DXiMsVn7XUKtOW1CS,1,2021
5QO79kh1waicV47BqGRL3g,1,2020
0VjIjW4GlUZAMYd2vXMi3b,0,2020
4MzXwWMhyBbmu6hOcLVD49,1,2020
5Kskr9LcNYa0tpt5f0ZEJx,1,2021
...,...,...
4lUmnwRybYH7mMzf16xB0y,1,2021
1fzf9Aad4y1RWrmwosAK5y,0,2021
3E3pb3qH11iny6TFDJvsg5,1,2021
3yTkoTuiKRGL2VAlQd7xsC,0,2021
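
Spotify reports release dates with year, month or day precision (for example `1959`, `1979-06` or `2021-03-05`), so taking the first four characters yields the year in every case. The same transformation can be sketched with vectorised pandas operations; the toy `raw` frame below is made up for illustration.

```python
import pandas as pd

# Toy input standing in for the track features (ids and values are made up).
raw = pd.DataFrame(
    {
        "explicit": [True, False, True],
        "release_date": ["2021-03-05", "1979-06", "1959"],
    },
    index=["id_a", "id_b", "id_c"],
)

features = pd.DataFrame(index=raw.index)
features["explicit"] = raw["explicit"].astype(int)          # True/False -> 1/0
features["year"] = raw["release_date"].str[:4].astype(int)  # first four characters are the year
```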


* We merge the two DataFrames so that the high level audio features and the track features sit in a single DataFrame.

In [68]:
tracks_eval = tracks_eval.merge(extra_features_eval, left_index=True, right_index=True, how='inner')
tracks_eval

Unnamed: 0,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,explicit,year
7lPN2DXiMsVn7XUKtOW1CS,0.585,0.436,10,-8.761,1,0.0601,0.72100,0.000013,0.1050,0.132,143.874,242014,4,1,2021
5QO79kh1waicV47BqGRL3g,0.680,0.826,0,-5.487,1,0.0309,0.02120,0.000012,0.5430,0.644,118.051,215627,4,1,2020
0VjIjW4GlUZAMYd2vXMi3b,0.514,0.730,1,-5.934,1,0.0598,0.00146,0.000095,0.0897,0.334,171.005,200040,4,0,2020
4MzXwWMhyBbmu6hOcLVD49,0.731,0.573,4,-10.059,0,0.0544,0.40100,0.000052,0.1130,0.145,109.928,205090,4,1,2020
5Kskr9LcNYa0tpt5f0ZEJx,0.907,0.393,4,-7.636,0,0.0539,0.45100,0.000001,0.1350,0.202,104.949,205458,4,1,2021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.596,0.650,9,-5.167,1,0.3370,0.13800,0.000000,0.1400,0.188,133.997,257428,4,1,2021
1fzf9Aad4y1RWrmwosAK5y,0.588,0.850,4,-6.431,1,0.0318,0.16800,0.002020,0.0465,0.768,93.003,187310,4,0,2021
3E3pb3qH11iny6TFDJvsg5,0.754,0.660,0,-6.811,1,0.2670,0.17900,0.000000,0.1940,0.316,83.000,209299,4,1,2021
3yTkoTuiKRGL2VAlQd7xsC,0.584,0.836,0,-4.925,1,0.0790,0.05580,0.000000,0.0663,0.484,104.973,202204,4,0,2021
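
An inner merge silently drops any id that is missing from either side, so it can be useful to list what was lost. This is a small sketch with made-up frames; `audio`, `extra` and `dropped` are illustrative names only.

```python
import pandas as pd

# Toy frames: "id_b" has audio features but no track features.
audio = pd.DataFrame({"valence": [0.132, 0.644, 0.334]}, index=["id_a", "id_b", "id_c"])
extra = pd.DataFrame({"explicit": [1, 0], "year": [2021, 2020]}, index=["id_a", "id_c"])

merged = audio.merge(extra, left_index=True, right_index=True, how="inner")

# Ids that were present in the left frame but did not survive the merge.
dropped = audio.index.difference(merged.index)
print(list(dropped))
```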


* Furthermore, we acquire the low level audio features.
* We adapt the function that performs the requests so that it works on a specific list of tracks (rather than stopping at a limit).

In [78]:
def get_audio_analysis_specific(track_ids):
    """Fetch the pitch, timbre and loudness segment data for the given track ids."""
    audio_analysis_pitch = {}
    audio_analysis_timbre = {}
    audio_analysis_loudness = {}

    # `track_id` avoids shadowing the built-in `id`.
    for track_id in track_ids:
        try:
            track_analysis = sp.audio_analysis(track_id)
            segments = track_analysis['segments']
            pitches = get_pitch(segments)
            timbre = get_timbre(segments)
            loudness = get_loudness(segments)

            # A track is stored only when all three extractions succeed.
            audio_analysis_pitch[track_id] = pitches
            audio_analysis_timbre[track_id] = timbre
            audio_analysis_loudness[track_id] = loudness

        except Exception:
            # Tracks without an analysis (e.g. HTTP 404) are reported and skipped.
            print('error:', track_id)

    return audio_analysis_pitch, audio_analysis_timbre, audio_analysis_loudness

In [79]:
audio_analysis_pitch_eval, audio_analysis_timbre_eval, audio_analysis_loudness_eval = get_audio_analysis_specific(test_ids)

HTTP Error for GET to https://api.spotify.com/v1/audio-analysis/4LaGu95Ui2s4vprSQYWUAZ with Params: {} returned 404 due to analysis not found


error: 4LaGu95Ui2s4vprSQYWUAZ


* One of the evaluation tracks (`4LaGu95Ui2s4vprSQYWUAZ`) has no low level analysis (the API returns 404), so we will not use it during the final evaluation.

In [80]:
print(len(audio_analysis_pitch_eval), len(audio_analysis_timbre_eval), len(audio_analysis_loudness_eval))

1161 1161 1161
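
Printing the failing id works for a single error, but collecting failures in a list makes them easier to exclude later. The sketch below illustrates the idea with a stand-in fetcher; `collect_analyses` and `fake_fetch` are hypothetical names, and `fake_fetch` only imitates a call such as `sp.audio_analysis`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_analyses(
    track_ids: Iterable[str],
    fetch: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[str]]:
    """Return the fetched analyses and the list of ids whose request failed."""
    results: Dict[str, dict] = {}
    failed: List[str] = []
    for track_id in track_ids:
        try:
            results[track_id] = fetch(track_id)
        except Exception:
            failed.append(track_id)
    return results, failed


def fake_fetch(track_id: str) -> dict:
    """Stand-in for the API call: pretends that 'id_b' has no analysis."""
    if track_id == "id_b":
        raise LookupError("analysis not found")
    return {"segments": []}


results, failed = collect_analyses(["id_a", "id_b", "id_c"], fake_fetch)
```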


* We perform the same transformations as before (calculate means and merge) to obtain the final DataFrame that has all the necessary data.

* Mean calculations:

In [81]:
audio_analysis_pitch_mean_eval = {}
for track_id, analysis in audio_analysis_pitch_eval.items():
    pitch_mean = np.mean(analysis, axis=0)
    audio_analysis_pitch_mean_eval.update({ track_id : pitch_mean})

In [82]:
audio_analysis_timbre_mean_eval = {}
for track_id, analysis in audio_analysis_timbre_eval.items():
    timbre_mean = np.mean(analysis, axis=0)
    audio_analysis_timbre_mean_eval.update({ track_id : timbre_mean})

In [83]:
audio_analysis_loudness_mean_eval = {}
for track_id, analysis in audio_analysis_loudness_eval.items():
    loudness_mean = np.mean(analysis, axis=0)
    audio_analysis_loudness_mean_eval.update({ track_id : loudness_mean})
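
To make the `axis=0` averaging concrete: assuming each track's pitch data is a list of segments with 12 values per segment (one per pitch class), the mean collapses the segment dimension and leaves one 12-value vector per track. The toy array below is made up for illustration.

```python
import numpy as np

# Three made-up segments, each with 12 pitch-class values.
segments = np.vstack([np.zeros(12), np.ones(12), np.full(12, 0.5)])

# Averaging over axis 0 (the segments) keeps one value per pitch class.
track_mean = np.mean(segments, axis=0)
print(segments.shape, track_mean.shape)
```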

* Merge procedures:

In [84]:
pitch_analysis_eval = pd.DataFrame.from_dict(audio_analysis_pitch_mean_eval, orient='index')
pitch_analysis_eval = pitch_analysis_eval.add_prefix('pitch_')
pitch_analysis_eval

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,pitch_10,pitch_11
7lPN2DXiMsVn7XUKtOW1CS,0.349608,0.367562,0.381938,0.333165,0.233992,0.421520,0.217720,0.279433,0.162397,0.243067,0.512964,0.229582
5QO79kh1waicV47BqGRL3g,0.509999,0.369865,0.293584,0.260545,0.470775,0.272121,0.243054,0.378965,0.232045,0.332873,0.217322,0.355135
0VjIjW4GlUZAMYd2vXMi3b,0.657973,0.627336,0.334794,0.341640,0.292579,0.488248,0.285965,0.315812,0.264716,0.234030,0.281505,0.227051
4MzXwWMhyBbmu6hOcLVD49,0.533442,0.548317,0.340552,0.299635,0.430683,0.286694,0.319173,0.355134,0.270570,0.365329,0.251225,0.323886
5Kskr9LcNYa0tpt5f0ZEJx,0.428253,0.406315,0.398415,0.222132,0.477949,0.250945,0.362671,0.211838,0.106474,0.132667,0.120139,0.378395
...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.561508,0.683205,0.419908,0.403616,0.429285,0.359302,0.275779,0.266855,0.178643,0.214506,0.153354,0.288517
1fzf9Aad4y1RWrmwosAK5y,0.327824,0.477446,0.236671,0.310013,0.575318,0.267171,0.362626,0.286104,0.506351,0.591351,0.304501,0.369560
3E3pb3qH11iny6TFDJvsg5,0.511021,0.497643,0.347654,0.238971,0.327831,0.325581,0.199572,0.265358,0.228669,0.443037,0.218408,0.156399
3yTkoTuiKRGL2VAlQd7xsC,0.663608,0.480316,0.348203,0.273951,0.353918,0.437275,0.351312,0.498068,0.301556,0.348925,0.265853,0.265831


In [85]:
timbre_analysis_eval = pd.DataFrame.from_dict(audio_analysis_timbre_mean_eval, orient='index')
timbre_analysis_eval = timbre_analysis_eval.add_prefix('timbre_')
timbre_analysis_eval

Unnamed: 0,timbre_0,timbre_1,timbre_2,timbre_3,timbre_4,timbre_5,timbre_6,timbre_7,timbre_8,timbre_9,timbre_10,timbre_11
7lPN2DXiMsVn7XUKtOW1CS,44.286929,-24.384765,13.186668,-9.708527,33.463126,-16.275168,-0.111490,-0.116343,-2.815301,-2.546181,-9.382599,3.806793
5QO79kh1waicV47BqGRL3g,51.124150,27.238452,-7.260601,-13.447243,19.275827,-27.172416,3.210723,-6.942141,-7.276782,-3.543406,-9.379946,-2.194406
0VjIjW4GlUZAMYd2vXMi3b,50.461401,49.290388,-10.005773,-14.477796,12.029462,-27.334700,7.383236,-0.745243,8.223896,2.743308,-12.995984,-0.989638
4MzXwWMhyBbmu6hOcLVD49,43.681466,-53.936549,-4.694368,-5.696588,35.700039,-20.637354,18.525245,8.093477,-0.432036,0.537412,-13.412574,-7.004807
5Kskr9LcNYa0tpt5f0ZEJx,43.645008,7.541005,-36.346186,2.057222,48.877285,-12.869335,-2.818465,-14.925499,1.476694,2.807862,-21.408289,4.122929
...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,49.001178,20.498719,-34.752324,-11.186367,29.765570,-9.769350,-3.421517,0.537746,-6.164586,6.263427,-21.499195,-6.566107
1fzf9Aad4y1RWrmwosAK5y,49.467759,16.223403,-4.131437,-8.027241,9.179778,-26.533251,-5.507732,-5.732575,-7.295153,-1.181860,-5.934610,0.560949
3E3pb3qH11iny6TFDJvsg5,46.650649,21.901945,-17.097046,-4.784885,38.901842,-13.813517,4.519290,0.073836,-7.527253,1.242939,-20.744944,-4.064045
3yTkoTuiKRGL2VAlQd7xsC,51.227427,69.133421,5.102990,-11.223282,36.341432,-27.087110,-0.022057,-3.249742,-8.522278,5.259177,-13.858022,-6.516408


In [86]:
loudness_analysis_eval = pd.DataFrame.from_dict(audio_analysis_loudness_mean_eval, orient='index')
loudness_analysis_eval.columns = ['loudness_start', 'loudness_max_time', 'loudness_max']
loudness_analysis_eval

Unnamed: 0,loudness_start,loudness_max_time,loudness_max
7lPN2DXiMsVn7XUKtOW1CS,-20.136485,0.071430,-11.846137
5QO79kh1waicV47BqGRL3g,-12.366643,0.059288,-5.266731
0VjIjW4GlUZAMYd2vXMi3b,-13.010345,0.059692,-6.141568
4MzXwWMhyBbmu6hOcLVD49,-21.296216,0.059611,-12.379307
5Kskr9LcNYa0tpt5f0ZEJx,-23.091459,0.055798,-10.516077
...,...,...,...
4lUmnwRybYH7mMzf16xB0y,-16.965315,0.058101,-6.245689
1fzf9Aad4y1RWrmwosAK5y,-14.089263,0.049178,-7.443791
3E3pb3qH11iny6TFDJvsg5,-19.211222,0.061354,-8.765307
3yTkoTuiKRGL2VAlQd7xsC,-12.448574,0.062063,-5.485316


In [87]:
audio_analysis_eval = pitch_analysis_eval.merge(timbre_analysis_eval.merge(loudness_analysis_eval, left_index=True, right_index=True, how='inner'), left_index=True, right_index=True, how='inner')
audio_analysis_eval

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,...,timbre_5,timbre_6,timbre_7,timbre_8,timbre_9,timbre_10,timbre_11,loudness_start,loudness_max_time,loudness_max
7lPN2DXiMsVn7XUKtOW1CS,0.349608,0.367562,0.381938,0.333165,0.233992,0.421520,0.217720,0.279433,0.162397,0.243067,...,-16.275168,-0.111490,-0.116343,-2.815301,-2.546181,-9.382599,3.806793,-20.136485,0.071430,-11.846137
5QO79kh1waicV47BqGRL3g,0.509999,0.369865,0.293584,0.260545,0.470775,0.272121,0.243054,0.378965,0.232045,0.332873,...,-27.172416,3.210723,-6.942141,-7.276782,-3.543406,-9.379946,-2.194406,-12.366643,0.059288,-5.266731
0VjIjW4GlUZAMYd2vXMi3b,0.657973,0.627336,0.334794,0.341640,0.292579,0.488248,0.285965,0.315812,0.264716,0.234030,...,-27.334700,7.383236,-0.745243,8.223896,2.743308,-12.995984,-0.989638,-13.010345,0.059692,-6.141568
4MzXwWMhyBbmu6hOcLVD49,0.533442,0.548317,0.340552,0.299635,0.430683,0.286694,0.319173,0.355134,0.270570,0.365329,...,-20.637354,18.525245,8.093477,-0.432036,0.537412,-13.412574,-7.004807,-21.296216,0.059611,-12.379307
5Kskr9LcNYa0tpt5f0ZEJx,0.428253,0.406315,0.398415,0.222132,0.477949,0.250945,0.362671,0.211838,0.106474,0.132667,...,-12.869335,-2.818465,-14.925499,1.476694,2.807862,-21.408289,4.122929,-23.091459,0.055798,-10.516077
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.561508,0.683205,0.419908,0.403616,0.429285,0.359302,0.275779,0.266855,0.178643,0.214506,...,-9.769350,-3.421517,0.537746,-6.164586,6.263427,-21.499195,-6.566107,-16.965315,0.058101,-6.245689
1fzf9Aad4y1RWrmwosAK5y,0.327824,0.477446,0.236671,0.310013,0.575318,0.267171,0.362626,0.286104,0.506351,0.591351,...,-26.533251,-5.507732,-5.732575,-7.295153,-1.181860,-5.934610,0.560949,-14.089263,0.049178,-7.443791
3E3pb3qH11iny6TFDJvsg5,0.511021,0.497643,0.347654,0.238971,0.327831,0.325581,0.199572,0.265358,0.228669,0.443037,...,-13.813517,4.519290,0.073836,-7.527253,1.242939,-20.744944,-4.064045,-19.211222,0.061354,-8.765307
3yTkoTuiKRGL2VAlQd7xsC,0.663608,0.480316,0.348203,0.273951,0.353918,0.437275,0.351312,0.498068,0.301556,0.348925,...,-27.087110,-0.022057,-3.249742,-8.522278,5.259177,-13.858022,-6.516408,-12.448574,0.062063,-5.485316
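
The nested merge can be hard to read once more than two frames are involved. One alternative, sketched here on made-up frames, is to fold a list of DataFrames with `functools.reduce`; the helper name `merge_on_index` is an assumption for illustration.

```python
from functools import reduce

import pandas as pd


def merge_on_index(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Inner-join two frames on their index."""
    return left.merge(right, left_index=True, right_index=True, how="inner")


# Toy frames standing in for the pitch, timbre and loudness means.
pitch = pd.DataFrame({"pitch_0": [0.35, 0.51]}, index=["id_a", "id_b"])
timbre = pd.DataFrame({"timbre_0": [44.3, 51.1]}, index=["id_a", "id_b"])
loudness = pd.DataFrame({"loudness_max": [-11.8, -5.3]}, index=["id_a", "id_b"])

combined = reduce(merge_on_index, [pitch, timbre, loudness])
```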


In [88]:
low_level_tracks_eval = audio_analysis_eval.merge(tracks_eval, left_index=True, right_index=True, how='inner')
low_level_tracks_eval

Unnamed: 0,pitch_0,pitch_1,pitch_2,pitch_3,pitch_4,pitch_5,pitch_6,pitch_7,pitch_8,pitch_9,...,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,explicit,year
7lPN2DXiMsVn7XUKtOW1CS,0.349608,0.367562,0.381938,0.333165,0.233992,0.421520,0.217720,0.279433,0.162397,0.243067,...,0.0601,0.72100,0.000013,0.1050,0.132,143.874,242014,4,1,2021
5QO79kh1waicV47BqGRL3g,0.509999,0.369865,0.293584,0.260545,0.470775,0.272121,0.243054,0.378965,0.232045,0.332873,...,0.0309,0.02120,0.000012,0.5430,0.644,118.051,215627,4,1,2020
0VjIjW4GlUZAMYd2vXMi3b,0.657973,0.627336,0.334794,0.341640,0.292579,0.488248,0.285965,0.315812,0.264716,0.234030,...,0.0598,0.00146,0.000095,0.0897,0.334,171.005,200040,4,0,2020
4MzXwWMhyBbmu6hOcLVD49,0.533442,0.548317,0.340552,0.299635,0.430683,0.286694,0.319173,0.355134,0.270570,0.365329,...,0.0544,0.40100,0.000052,0.1130,0.145,109.928,205090,4,1,2020
5Kskr9LcNYa0tpt5f0ZEJx,0.428253,0.406315,0.398415,0.222132,0.477949,0.250945,0.362671,0.211838,0.106474,0.132667,...,0.0539,0.45100,0.000001,0.1350,0.202,104.949,205458,4,1,2021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4lUmnwRybYH7mMzf16xB0y,0.561508,0.683205,0.419908,0.403616,0.429285,0.359302,0.275779,0.266855,0.178643,0.214506,...,0.3370,0.13800,0.000000,0.1400,0.188,133.997,257428,4,1,2021
1fzf9Aad4y1RWrmwosAK5y,0.327824,0.477446,0.236671,0.310013,0.575318,0.267171,0.362626,0.286104,0.506351,0.591351,...,0.0318,0.16800,0.002020,0.0465,0.768,93.003,187310,4,0,2021
3E3pb3qH11iny6TFDJvsg5,0.511021,0.497643,0.347654,0.238971,0.327831,0.325581,0.199572,0.265358,0.228669,0.443037,...,0.2670,0.17900,0.000000,0.1940,0.316,83.000,209299,4,1,2021
3yTkoTuiKRGL2VAlQd7xsC,0.663608,0.480316,0.348203,0.273951,0.353918,0.437275,0.351312,0.498068,0.301556,0.348925,...,0.0790,0.05580,0.000000,0.0663,0.484,104.973,202204,4,0,2021


* We export the evaluation data to a csv file.

In [89]:
low_level_tracks_eval.to_csv('data/low_level_tracks_eval.csv')

* Lastly, we save the detailed per-segment data for future use (CNN).
* We save the results in .npy files.

In [None]:
np.save('pitch_eval.npy',audio_analysis_pitch_eval)
np.save('timbre_eval.npy',audio_analysis_timbre_eval)
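
Because the saved objects are dictionaries, `np.save` stores them as pickled object arrays, and reading them back requires `allow_pickle=True` followed by `.item()` to recover the dictionary. A minimal round trip on made-up data (written to a temporary directory so that no real file is touched):

```python
import os
import tempfile

import numpy as np

# Made-up per-segment data: two tracks with a different number of segments.
segments_by_track = {
    "id_a": [[0.1] * 12, [0.3] * 12],
    "id_b": [[0.5] * 12],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pitch_eval.npy")
    np.save(path, segments_by_track)                 # stored as a pickled 0-d object array
    loaded = np.load(path, allow_pickle=True).item() # .item() unwraps the dict
```

Loading pickled data executes arbitrary code in principle, so `allow_pickle=True` should be used only on files from a trusted source.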