# Extraction of Audio Features

In [1]:
import pandas as pd
import math
import os
import librosa
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

The main aim of this notebook is to extract MFCC (mel-frequency cepstral coefficient) features from the audio files. Each extracted feature has length 250.

In [2]:
musicLocation = './fma_medium_tracks'
tracksMetaData = pd.read_csv('./tracksMetaData.csv')

In [3]:
tracksMetaData

Unnamed: 0,id,trackName,genre,trackArtist
0,2,Food,Hip-Hop,AWOL
1,3,Electric Ave,Hip-Hop,AWOL
2,5,This World,Hip-Hop,AWOL
3,10,Freeway,Pop,Kurt Vile
4,134,Street Music,Hip-Hop,AWOL
...,...,...,...,...
24995,155297,Nebula Reborn,Instrumental,Alex Mason/BlackSunAeon Music
24996,155298,An Idiot Abroad,Folk,Greg Atkinson
24997,155306,Tiny Man,Folk,Greg Atkinson
24998,155307,Kolka,Experimental,AWOTT


To extract the MFCC features we use the librosa library, which reads each audio file and computes the MFCC features from it. What MFCC features are and how they work is explained in depth in the report and PowerPoint slides. In total we have around 25,000 songs, each 30 seconds long, from which we read and extract the features.

Since we also need lyrics for each song, we expected many songs to be eliminated because of missing lyrics or problems reading the audio files. To compensate, we divide each song into 5 segments, so a 30-second song becomes 5 clips of roughly 6 seconds each (the code uses the first 29 seconds, so each segment is 5.8 seconds long). The split is controlled by the number of samples per segment. For each segment we extract the MFCC features, which have shape [250 * 15]: 15 coefficients, each a sequence of 250 frames. The extracted features are placed in a DataFrame along with the segment's id and genre. Any song that fails to load or process is skipped and recorded in a separate DataFrame.
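
The segment arithmetic can be checked without loading any audio. The sketch below assumes librosa's default centered framing, under which a signal of n samples yields 1 + n // hop_length frames. The helper name frames_per_segment is illustrative and not part of the notebook.

```python
SAMPLE_RATE = 22050   # Hz, as used by extractFeatures
DURATION_S = 29       # seconds of audio actually used per track
NUM_SEGMENTS = 5
HOP_LENGTH = 512

def frames_per_segment(sample_rate, duration_s, num_segments, hop_length):
    """Return (samples per segment, expected MFCC frames per segment).

    Assumes centered framing, where n samples give 1 + n // hop_length frames.
    """
    samples = int(sample_rate * duration_s / num_segments)
    frames = 1 + samples // hop_length
    return samples, frames

samples, frames = frames_per_segment(SAMPLE_RATE, DURATION_S, NUM_SEGMENTS, HOP_LENGTH)
print(samples, frames)                   # 127890 250
print(round(samples / SAMPLE_RATE, 2))   # 5.8
```

Under the same assumption, using the full 30 seconds would give 259 frames per segment, so the 29-second window is what produces a length of exactly 250.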

In [4]:
def extractFeatures(musicPath, seg1, seg2):
    # Extract MFCC features for the files at positions seg1..seg2 of the directory listing
    sampleRate = 22050
    numSegment = 5
    numMfcc = 15

    data = pd.DataFrame()    # one row per successfully processed segment
    broken = pd.DataFrame()  # one row per file that could not be processed
    cnt = 0
    totalCnt = 0
    # Only the first 29 seconds are used, split evenly across the segments
    samples_per_segment = int(sampleRate * 29 / numSegment)
    filenames = os.listdir(musicPath)
    for f in tqdm(filenames[seg1:seg2]):
        try:
            file_path = os.path.join(musicPath, str(f))
            y, sr = librosa.load(file_path, sr=sampleRate)
            for n in range(numSegment):
                totalCnt = totalCnt + 1
                start = samples_per_segment * n
                finish = start + samples_per_segment

                # Pass y and sr by keyword; recent librosa versions reject them positionally
                mfcc = librosa.feature.mfcc(y=y[start:finish],
                                            sr=sampleRate, n_mfcc=numMfcc,
                                            n_fft=2048, hop_length=512)
                idd = str(int(f.split('.')[0]))  # e.g. '001486.mp3' -> '1486'
                df = pd.DataFrame()
                df['id'] = [idd + '_' + str(n)]
                df['genre'] = tracksMetaData.loc[tracksMetaData['id'] == int(idd)]['genre'].values
                # One column per coefficient, each holding the list of frame values
                for i in range(numMfcc):
                    df['mfccFeature_' + str(i)] = [list(mfcc[i])]
                data = pd.concat([data, df])
                cnt = cnt + 1
        except Exception as e:
            # Record the failing file and move on to the next one
            df = pd.DataFrame()
            df['filename'] = [f]
            print('Broke at ' + f, e)
            broken = pd.concat([broken, df])
            continue

    print(totalCnt, cnt)
    return data, broken

While the function "extractFeatures" does the actual extraction, we need an efficient way of calling it on all the songs. For that we use "saveFiles", which takes the start and end part numbers of a segment; each part number is multiplied by 1000 to get the range of files to process.

E.g. startPart = 0 and endPart = 5 covers songs 0 to 5000.

We give each segment a unique name. Saving every part under its own name gives us a checkpoint system, so that if anything goes wrong we do not lose all progress.
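
One way to take advantage of these checkpoints is to skip parts whose CSV already exists when a run is restarted. The sketch below is an assumption about how that could look, not something the notebook does. Both part_path and pending_parts are hypothetical helpers, and they rely only on the file-naming pattern used by saveFiles.

```python
import os

def part_path(out_dir, segment_name, part):
    """Build the checkpoint filename that saveFiles writes for one part."""
    name = 'tracksAudioFeatures_' + segment_name + '_part_' + str(part) + '.csv'
    return os.path.join(out_dir, name)

def pending_parts(out_dir, segment_name, start_part, end_part):
    """Return the part indices in [start_part, end_part) with no saved CSV yet."""
    return [p for p in range(start_part, end_part)
            if not os.path.exists(part_path(out_dir, segment_name, p))]

# Each part index p covers files p*1000 up to (but not including) (p+1)*1000.
for p in pending_parts('tracksAudioFeatures', 'firstSegment', 0, 5):
    print(p, p * 1000, (p + 1) * 1000)
```

Only the parts returned by pending_parts would then need to be passed back to saveFiles.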

In [5]:
def saveFiles(startPart, endPart, segmentName):
    # Process parts startPart..endPart-1; each part covers 1000 files and is saved separately
    for i in range(startPart, endPart):
        print('Starting Part ' + str(i) + ' from ' + str(i*1000) + ' to ' + str((i+1)*1000))
        data, broken = extractFeatures(musicLocation, i*1000, (i+1)*1000)
        data.to_csv('tracksAudioFeatures/tracksAudioFeatures_' + segmentName + '_part_' + str(i) + '.csv')
        # Files that failed to load are saved separately, only when there are any
        if len(broken) > 0:
            print('Broken Part saved ' + str(i))
            print('Length of broken ', len(broken))
            broken.to_csv('tracksAudioFeatures/broken_tracksAudioFeatures_' + segmentName + '_part_' + str(i) + '.csv')
        print('Finished Part ' + str(i))
        print('Saved as ' + 'tracksAudioFeatures/tracksAudioFeatures_' + segmentName + '_part_' + str(i) + '.csv')

The following cells call saveFiles to create five segments of 5,000 songs each. Each segment is processed in parts of 1,000 songs. A part took about 11 minutes on average, so the 25 parts took roughly 25 * 11 = 275 minutes in total.

In [7]:
saveFiles(0,5,'firstSegment')

Starting Part 0 from 0 to 1000


 32%|████████████████████████▉                                                      | 316/1000 [03:48<08:07,  1.40it/s]

Broke at 001486.mp3 


 98%|█████████████████████████████████████████████████████████████████████████████▏ | 977/1000 [11:54<00:16,  1.39it/s]

Broke at 005574.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [12:09<00:00,  1.37it/s]


4990 4990
Broken Part saved 0
Length of broken  2
Finished Part 0
Saved as tracksAudioFeatures/tracksAudioFeatures_firstSegment_part_0.csv
Starting Part 1 from 1000 to 2000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [14:37<00:00,  1.14it/s]


5000 5000
Finished Part 1
Saved as tracksAudioFeatures/tracksAudioFeatures_firstSegment_part_1.csv
Starting Part 2 from 2000 to 3000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:48<00:00,  1.41it/s]


5000 5000
Finished Part 2
Saved as tracksAudioFeatures/tracksAudioFeatures_firstSegment_part_2.csv
Starting Part 3 from 3000 to 4000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [12:14<00:00,  1.36it/s]


5000 5000
Finished Part 3
Saved as tracksAudioFeatures/tracksAudioFeatures_firstSegment_part_3.csv
Starting Part 4 from 4000 to 5000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [19:18<00:00,  1.16s/it]


5000 5000
Finished Part 4
Saved as tracksAudioFeatures/tracksAudioFeatures_firstSegment_part_4.csv


In [8]:
saveFiles(5,10,'secondSegment')

Starting Part 5 from 5000 to 6000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:55<00:00,  1.40it/s]


5000 5000
Finished Part 5
Saved as tracksAudioFeatures/tracksAudioFeatures_secondSegment_part_5.csv
Starting Part 6 from 6000 to 7000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:46<00:00,  1.71it/s]


5000 5000
Finished Part 6
Saved as tracksAudioFeatures/tracksAudioFeatures_secondSegment_part_6.csv
Starting Part 7 from 7000 to 8000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:39<00:00,  1.73it/s]


5000 5000
Finished Part 7
Saved as tracksAudioFeatures/tracksAudioFeatures_secondSegment_part_7.csv
Starting Part 8 from 8000 to 9000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:40<00:00,  1.72it/s]


5000 5000
Finished Part 8
Saved as tracksAudioFeatures/tracksAudioFeatures_secondSegment_part_8.csv
Starting Part 9 from 9000 to 10000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:49<00:00,  1.70it/s]


5000 5000
Finished Part 9
Saved as tracksAudioFeatures/tracksAudioFeatures_secondSegment_part_9.csv


In [9]:
saveFiles(10,15,'thirdSegment')

Starting Part 10 from 10000 to 11000


 68%|█████████████████████████████████████████████████████▎                         | 675/1000 [06:26<03:13,  1.68it/s]

Broke at 065753.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:34<00:00,  1.74it/s]


4995 4995
Broken Part saved 10
Length of broken  1
Finished Part 10
Saved as tracksAudioFeatures/tracksAudioFeatures_thirdSegment_part_10.csv
Starting Part 11 from 11000 to 12000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:45<00:00,  1.71it/s]


5000 5000
Finished Part 11
Saved as tracksAudioFeatures/tracksAudioFeatures_thirdSegment_part_11.csv
Starting Part 12 from 12000 to 13000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:33<00:00,  1.74it/s]


5000 5000
Finished Part 12
Saved as tracksAudioFeatures/tracksAudioFeatures_thirdSegment_part_12.csv
Starting Part 13 from 13000 to 14000


 15%|███████████▌                                                                   | 146/1000 [01:25<08:00,  1.78it/s]

Broke at 080391.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:43<00:00,  1.71it/s]


4995 4995
Broken Part saved 13
Length of broken  1
Finished Part 13
Saved as tracksAudioFeatures/tracksAudioFeatures_thirdSegment_part_13.csv
Starting Part 14 from 14000 to 15000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:35<00:00,  1.74it/s]


5000 5000
Finished Part 14
Saved as tracksAudioFeatures/tracksAudioFeatures_thirdSegment_part_14.csv


In [10]:
saveFiles(15,20,'fourthSegment')

Starting Part 15 from 15000 to 16000


 63%|█████████████████████████████████████████████████▍                             | 626/1000 [06:05<03:31,  1.77it/s]

Broke at 098558.mp3 
Broke at 098559.mp3 
Broke at 098560.mp3 
Broke at 098565.mp3 
Broke at 098566.mp3 
Broke at 098567.mp3 
Broke at 098568.mp3 
Broke at 098569.mp3 
Broke at 098571.mp3 


 84%|██████████████████████████████████████████████████████████████████             | 836/1000 [08:01<01:37,  1.69it/s]

Broke at 099134.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:36<00:00,  1.74it/s]


4950 4950
Broken Part saved 15
Length of broken  10
Finished Part 15
Saved as tracksAudioFeatures/tracksAudioFeatures_fourthSegment_part_15.csv
Starting Part 16 from 16000 to 17000


 30%|████████████████████████                                                       | 305/1000 [02:54<06:32,  1.77it/s]

Broke at 105247.mp3 


 96%|███████████████████████████████████████████████████████████████████████████▋   | 958/1000 [09:14<00:24,  1.72it/s]

Broke at 108924.mp3 Input signal length=0 is too small to resample from 44100->22050
Broke at 108925.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:37<00:00,  1.73it/s]


4985 4985
Broken Part saved 16
Length of broken  3
Finished Part 16
Saved as tracksAudioFeatures/tracksAudioFeatures_fourthSegment_part_16.csv
Starting Part 17 from 17000 to 18000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:33<00:00,  1.74it/s]


5000 5000
Finished Part 17
Saved as tracksAudioFeatures/tracksAudioFeatures_fourthSegment_part_17.csv
Starting Part 18 from 18000 to 19000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:45<00:00,  1.71it/s]


5000 5000
Finished Part 18
Saved as tracksAudioFeatures/tracksAudioFeatures_fourthSegment_part_18.csv
Starting Part 19 from 19000 to 20000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:36<00:00,  1.74it/s]


5000 5000
Finished Part 19
Saved as tracksAudioFeatures/tracksAudioFeatures_fourthSegment_part_19.csv


In [11]:
saveFiles(20,25,'fifthSegment')

Starting Part 20 from 20000 to 21000


 62%|█████████████████████████████████████████████████                              | 621/1000 [05:55<03:30,  1.80it/s]

Broke at 126981.mp3 


 78%|█████████████████████████████████████████████████████████████▌                 | 780/1000 [07:26<02:10,  1.68it/s]

Broke at 127336.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:33<00:00,  1.74it/s]


4990 4990
Broken Part saved 20
Length of broken  2
Finished Part 20
Saved as tracksAudioFeatures/tracksAudioFeatures_fifthSegment_part_20.csv
Starting Part 21 from 21000 to 22000


 99%|██████████████████████████████████████████████████████████████████████████████ | 988/1000 [09:44<00:07,  1.71it/s]

Broke at 133297.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:50<00:00,  1.69it/s]


4995 4995
Broken Part saved 21
Length of broken  1
Finished Part 21
Saved as tracksAudioFeatures/tracksAudioFeatures_fifthSegment_part_21.csv
Starting Part 22 from 22000 to 23000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:41<00:00,  1.72it/s]


5000 5000
Finished Part 22
Saved as tracksAudioFeatures/tracksAudioFeatures_fifthSegment_part_22.csv
Starting Part 23 from 23000 to 24000


 62%|████████████████████████████████████████████████▉                              | 620/1000 [06:01<03:37,  1.74it/s]

Broke at 143992.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:43<00:00,  1.71it/s]


4995 4995
Broken Part saved 23
Length of broken  1
Finished Part 23
Saved as tracksAudioFeatures/tracksAudioFeatures_fifthSegment_part_23.csv
Starting Part 24 from 24000 to 25000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [09:39<00:00,  1.73it/s]


5000 5000
Finished Part 24
Saved as tracksAudioFeatures/tracksAudioFeatures_fifthSegment_part_24.csv


In [3]:
datasets = os.listdir('./tracksAudioFeatures')

Once we have all the dataset parts, we merge them into a single dataset. The broken-file lists are merged into a single dataset as well.

In [4]:
tracksAudioFeatures_complete = pd.DataFrame()
broken_tracksAudioFeatures_complete = pd.DataFrame()

for i in tqdm(datasets):
    x = pd.read_csv('./tracksAudioFeatures/' + i)
    if 'broken' not in i:
        print(i)
        print(type(x['mfccFeature_0'].iloc[0]))
        print(len(x['mfccFeature_0'].iloc[0]))
        print(len(x['mfccFeature_0'].iloc[0].split(',')))
        tracksAudioFeatures_complete = pd.concat([tracksAudioFeatures_complete,x])
    else:
        broken_tracksAudioFeatures_complete = pd.concat([broken_tracksAudioFeatures_complete,x])
        
tracksAudioFeatures_complete = tracksAudioFeatures_complete.drop(['Unnamed: 0'], axis=1)
broken_tracksAudioFeatures_complete = broken_tracksAudioFeatures_complete.drop(['Unnamed: 0'], axis=1)
       

 27%|██████████████████████▋                                                            | 9/33 [00:01<00:03,  6.71it/s]

tracksAudioFeatures_fifthSegment_part_20.csv
<class 'str'>
2940
250


 30%|████████████████████████▊                                                         | 10/33 [00:02<00:07,  3.22it/s]

tracksAudioFeatures_fifthSegment_part_21.csv
<class 'str'>
2858
250


 33%|███████████████████████████▎                                                      | 11/33 [00:04<00:10,  2.09it/s]

tracksAudioFeatures_fifthSegment_part_22.csv
<class 'str'>
2905
250


 36%|█████████████████████████████▊                                                    | 12/33 [00:05<00:13,  1.53it/s]

tracksAudioFeatures_fifthSegment_part_23.csv
<class 'str'>
2802
250


 39%|████████████████████████████████▎                                                 | 13/33 [00:06<00:16,  1.21it/s]

tracksAudioFeatures_fifthSegment_part_24.csv
<class 'str'>
2893
250


 42%|██████████████████████████████████▊                                               | 14/33 [00:08<00:18,  1.02it/s]

tracksAudioFeatures_firstSegment_part_0.csv
<class 'str'>
2895
250


 45%|█████████████████████████████████████▎                                            | 15/33 [00:09<00:19,  1.10s/it]

tracksAudioFeatures_firstSegment_part_1.csv
<class 'str'>
2956
250


 48%|███████████████████████████████████████▊                                          | 16/33 [00:11<00:20,  1.21s/it]

tracksAudioFeatures_firstSegment_part_2.csv
<class 'str'>
2925
250


 52%|██████████████████████████████████████████▏                                       | 17/33 [00:12<00:20,  1.29s/it]

tracksAudioFeatures_firstSegment_part_3.csv
<class 'str'>
2947
250


 55%|████████████████████████████████████████████▋                                     | 18/33 [00:14<00:20,  1.34s/it]

tracksAudioFeatures_firstSegment_part_4.csv
<class 'str'>
2937
250


 58%|███████████████████████████████████████████████▏                                  | 19/33 [00:15<00:19,  1.38s/it]

tracksAudioFeatures_fourthSegment_part_15.csv
<class 'str'>
2749
250


 61%|█████████████████████████████████████████████████▋                                | 20/33 [00:17<00:18,  1.40s/it]

tracksAudioFeatures_fourthSegment_part_16.csv
<class 'str'>
2965
250


 64%|████████████████████████████████████████████████████▏                             | 21/33 [00:18<00:17,  1.43s/it]

tracksAudioFeatures_fourthSegment_part_17.csv
<class 'str'>
2937
250


 67%|██████████████████████████████████████████████████████▋                           | 22/33 [00:20<00:15,  1.44s/it]

tracksAudioFeatures_fourthSegment_part_18.csv
<class 'str'>
2934
250


 70%|█████████████████████████████████████████████████████████▏                        | 23/33 [00:21<00:14,  1.46s/it]

tracksAudioFeatures_fourthSegment_part_19.csv
<class 'str'>
2915
250


 73%|███████████████████████████████████████████████████████████▋                      | 24/33 [00:23<00:13,  1.48s/it]

tracksAudioFeatures_secondSegment_part_5.csv
<class 'str'>
2882
250


 76%|██████████████████████████████████████████████████████████████                    | 25/33 [00:24<00:11,  1.49s/it]

tracksAudioFeatures_secondSegment_part_6.csv
<class 'str'>
2928
250


 79%|████████████████████████████████████████████████████████████████▌                 | 26/33 [00:26<00:10,  1.50s/it]

tracksAudioFeatures_secondSegment_part_7.csv
<class 'str'>
2668
250


 82%|███████████████████████████████████████████████████████████████████               | 27/33 [00:27<00:08,  1.49s/it]

tracksAudioFeatures_secondSegment_part_8.csv
<class 'str'>
2935
250


 85%|█████████████████████████████████████████████████████████████████████▌            | 28/33 [00:29<00:07,  1.50s/it]

tracksAudioFeatures_secondSegment_part_9.csv
<class 'str'>
2639
250


 88%|████████████████████████████████████████████████████████████████████████          | 29/33 [00:30<00:05,  1.48s/it]

tracksAudioFeatures_thirdSegment_part_10.csv
<class 'str'>
2884
250


 91%|██████████████████████████████████████████████████████████████████████████▌       | 30/33 [00:32<00:04,  1.47s/it]

tracksAudioFeatures_thirdSegment_part_11.csv
<class 'str'>
2942
250


 94%|█████████████████████████████████████████████████████████████████████████████     | 31/33 [00:33<00:02,  1.46s/it]

tracksAudioFeatures_thirdSegment_part_12.csv
<class 'str'>
2951
250


 97%|███████████████████████████████████████████████████████████████████████████████▌  | 32/33 [00:35<00:01,  1.47s/it]

tracksAudioFeatures_thirdSegment_part_13.csv
<class 'str'>
2904
250


100%|██████████████████████████████████████████████████████████████████████████████████| 33/33 [00:36<00:00,  1.11s/it]

tracksAudioFeatures_thirdSegment_part_14.csv
<class 'str'>
2941
250





Finally, we have 124,895 rows in total, one per segment (24,979 tracks with 5 segments each), and 21 broken tracks. We explore the dataset further to verify that all the shapes and types are correct.

In [5]:
tracksAudioFeatures_complete.head(5)

Unnamed: 0,id,genre,mfccFeature_0,mfccFeature_1,mfccFeature_2,mfccFeature_3,mfccFeature_4,mfccFeature_5,mfccFeature_6,mfccFeature_7,mfccFeature_8,mfccFeature_9,mfccFeature_10,mfccFeature_11,mfccFeature_12,mfccFeature_13,mfccFeature_14
0,124423_0,Old-Time / Historic,"[-328.7624, -206.09636, -150.4038, -144.09038,...","[132.91347, 150.61551, 144.43718, 140.92017, 1...","[-90.37503, -126.661026, -140.2472, -138.8731,...","[-63.457577, -50.04628, -51.89651, -52.70818, ...","[-38.716564, -46.91884, -50.58444, -47.508934,...","[-56.29294, -62.14529, -53.36077, -51.67916, -...","[-17.247395, -34.123608, -40.161243, -42.32673...","[-3.6903424, -5.2592907, -1.0339475, -0.248538...","[-16.171959, -23.949322, -23.311352, -21.07062...","[-1.7130346, -6.9061756, -2.5145044, 4.5913672...","[-17.067202, -17.211754, -16.577923, -9.204768...","[-27.971401, -30.455574, -20.908623, -16.16786...","[8.152201, 8.3445635, 1.8432193, -5.7282104, -...","[10.642556, 9.837347, 18.859747, 17.205154, 8....","[-4.3580694, -3.3247807, 6.962681, 11.387974, ..."
1,124423_1,Old-Time / Historic,"[-236.76901, -203.01874, -200.4282, -208.92003...","[164.211, 159.14294, 136.04929, 136.82301, 156...","[-85.3822, -117.465, -142.91583, -143.71155, -...","[-32.252678, -37.033913, -54.37635, -57.882896...","[-7.4214034, -20.556595, -37.773483, -39.49316...","[-31.144537, -44.259975, -59.811573, -61.17159...","[-14.987923, -23.720203, -43.693214, -45.93542...","[4.4418592, 3.9385574, -10.675606, -14.657623,...","[-12.005516, -25.275543, -41.375755, -39.71647...","[-5.6790867, 1.2063706, -4.721801, -8.164688, ...","[-6.2902756, -3.91095, -4.4361215, -4.327603, ...","[-5.6098385, -7.864852, -7.2575703, -2.4607427...","[7.336014, 14.364861, 14.558414, 20.445679, 16...","[2.09269, -0.17045055, -0.978363, 0.28908974, ...","[-4.1928773, -3.4265738, -3.9279053, -5.465062..."
2,124423_2,Old-Time / Historic,"[-390.48816, -339.31622, -317.2405, -326.96646...","[97.122345, 115.60747, 127.4955, 128.93234, 14...","[-84.24397, -109.15258, -110.69359, -99.36203,...","[-40.43964, -44.553093, -47.186836, -41.452816...","[0.28208655, -1.2433429, -6.5626597, -0.983691...","[-36.504112, -43.72944, -42.5904, -35.0524, -4...","[-22.552574, -24.396482, -21.35398, -15.982605...","[1.2889054, 0.24470624, -1.0263059, 3.6228871,...","[-9.089966, -12.901368, -15.339406, -10.585323...","[-9.621838, -6.8267345, -4.4633904, -3.513043,...","[-6.904684, -6.3937654, -4.0830145, -0.6454271...","[-2.2965891, -8.502451, -8.448065, -3.9730012,...","[11.558136, 10.830755, 6.7897396, 8.929026, 13...","[5.913314, 5.5654097, 1.5172951, 0.10546613, 2...","[-6.5406847, -3.1975913, -1.6860085, -6.242577..."
3,124423_3,Old-Time / Historic,"[-352.81958, -337.61414, -345.47974, -343.2563...","[142.35503, 135.62234, 113.44443, 108.335236, ...","[-43.732452, -74.83505, -103.43189, -107.36707...","[-15.263801, -28.205692, -44.415802, -39.63460...","[9.44633, 6.6091423, -2.8628724, 1.0015795, 3....","[-31.255424, -38.347614, -47.272774, -40.84847...","[-19.78806, -21.916473, -29.354237, -18.958858...","[-6.792322, -5.6661687, -13.356573, -5.396633,...","[-20.756138, -21.027271, -25.952667, -26.81138...","[-11.619511, -10.518908, -14.541344, -15.04974...","[-2.953545, -3.8872366, -10.412258, -8.673106,...","[-2.9083943, -5.7381706, -7.6637254, -6.109688...","[5.6500626, 5.615141, 9.0784645, 14.550588, 17...","[3.5961199, 5.6481404, 5.7618313, 4.961307, 1....","[-0.0033575296, 1.9707386, -0.80081874, -2.087..."
4,124423_4,Old-Time / Historic,"[-298.37216, -311.58813, -341.85043, -346.8217...","[143.62546, 148.77249, 137.96545, 133.75113, 1...","[-61.373726, -84.11337, -84.06207, -85.60765, ...","[-43.16261, -43.88006, -47.562035, -44.25102, ...","[-34.505463, -32.188267, -25.248846, -13.07193...","[-53.48815, -64.33479, -62.00576, -53.099655, ...","[-34.04146, -38.559437, -35.13096, -35.746147,...","[-5.90814, -7.9788213, -15.199701, -14.585123,...","[-15.080654, -18.701542, -19.008532, -15.62354...","[-21.62012, -18.04889, -8.216867, -7.8816576, ...","[-12.881039, -12.1749, -11.892162, -13.694319,...","[-4.0920386, -6.83904, -6.9548445, -9.170607, ...","[4.849468, 10.788048, 11.474641, 9.209568, 10....","[5.5359507, 4.742424, 4.962161, 5.2920237, 6.5...","[1.0970776, 2.4824529, 0.74116087, -1.4359027,..."


In [6]:
tracksAudioFeatures_complete.shape

(124895, 17)

In [7]:
broken_tracksAudioFeatures_complete.head(5)

Unnamed: 0,filename
0,126981.mp3
1,127336.mp3
0,133297.mp3
0,143992.mp3
0,001486.mp3


In [8]:
len(broken_tracksAudioFeatures_complete)

21

In [9]:
print(type(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0]))
print(len(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0]))
print(len(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0].split(',')))

<class 'str'>
2940
250
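
Because each MFCC column is written to CSV as the text of a Python list, it is read back as a string, which is what the type check above shows. Below is a minimal sketch of turning such a cell back into numbers, assuming the cell is a plain bracketed list of floats. The name parse_mfcc_cell is illustrative and not part of the notebook.

```python
import ast

def parse_mfcc_cell(cell):
    """Convert a stringified list such as '[-1.5, 2.0]' into a list of floats."""
    values = ast.literal_eval(cell)
    return [float(v) for v in values]

example = '[-328.7624, -206.09636, -150.4038]'
parsed = parse_mfcc_cell(example)
print(type(parsed), len(parsed))   # <class 'list'> 3
```

On the full DataFrame the same function could be applied one column at a time, for example tracksAudioFeatures_complete['mfccFeature_0'].apply(parse_mfcc_cell).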


The final audio features dataset is saved.

In [10]:
tracksAudioFeatures_complete.to_csv('./tracksAudioFeatures/tracksAudioFeatures_complete.csv')
broken_tracksAudioFeatures_complete.to_csv('./tracksAudioFeatures/broken_tracksAudioFeatures_complete.csv')

The audio features will then be combined with theme features of the same length to create the final dataset.