# Extraction of Audio Features


In [1]:
import pandas as pd
import math
import os
import librosa
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

The main aim of this notebook is to extract the MFCC features, i.e. the audio data, from the track files. Here we extract features of length 125. The notebook does the same thing as the Audio_FeatureExtraction_250 notebook, except that each song is split into 10 parts, so each part has MFCC features of length 125.

Disclaimer: some cells use 150 in file and folder names. This is a naming mistake that we noticed too late to change; wherever you see 150, read 125. It affects only the names of the datasets, not their contents, which all have length 125.



In [2]:
musicLocation = './fma_medium_tracks'
tracksMetaData = pd.read_csv('./tracksMetaData.csv')

In [3]:
tracksMetaData

Unnamed: 0,id,trackName,genre,trackArtist
0,2,Food,Hip-Hop,AWOL
1,3,Electric Ave,Hip-Hop,AWOL
2,5,This World,Hip-Hop,AWOL
3,10,Freeway,Pop,Kurt Vile
4,134,Street Music,Hip-Hop,AWOL
...,...,...,...,...
24995,155297,Nebula Reborn,Instrumental,Alex Mason/BlackSunAeon Music
24996,155298,An Idiot Abroad,Folk,Greg Atkinson
24997,155306,Tiny Man,Folk,Greg Atkinson
24998,155307,Kolka,Experimental,AWOTT


To extract the MFCC features we use the librosa library, which reads each audio file and computes the MFCC features from it. What MFCC features are and how they work is explained in depth in the report and PowerPoint slides. In total we had around 25,000 songs, each 30 seconds long, from which we read and extract the features.

Since we also need lyrics for each song, we expected many songs to be eliminated because of missing lyrics or unreadable audio files. To compensate, we divide each song into 10 segments: a 30-second song becomes 10 clips of roughly 3 seconds each (the code budgets 29 seconds per song, so each segment is 2.9 seconds long). The split is controlled by the number of samples per segment. For each segment we extract MFCC features of shape [125 * 15], i.e. 125 frames for each of 15 coefficients. The extracted features are placed in a DataFrame along with the segment id and genre. Any file that cannot be processed is skipped and recorded in a separate DataFrame.
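The 125 in the feature shape follows from the segment length and the hop length. Below is a minimal sketch of that arithmetic, assuming librosa's default centered framing (one frame per hop, plus one). The function name and its parameters are illustrative and not part of the notebook.

```python
def frames_per_segment(sample_rate=22050, clip_seconds=29,
                       num_segments=10, hop_length=512):
    """Return (samples per segment, MFCC frames per segment)."""
    samples = int(sample_rate * clip_seconds / num_segments)
    # With centered framing, a signal of n samples yields 1 + n // hop frames.
    frames = 1 + samples // hop_length
    return samples, frames


samples, frames = frames_per_segment()
print(samples, frames)  # 63945 125
```

With 15 coefficients per frame, this gives the 125 x 15 matrix stored for each segment.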

In [4]:
def extractFeatures(musicPath,seg1,seg2):
    # Extract MFCCs for the files at positions seg1..seg2 of the directory listing.
    # Returns (data, broken): per-segment features and the files that failed.
    sampleRate = 22050
    numSegment=10
    numMfcc=15

    data = pd.DataFrame()
    broken = pd.DataFrame()
    cnt = 0
    totalCnt = 0
    # 29 s of audio split into 10 segments: 63,945 samples (2.9 s) per segment
    samples_per_segment = int(sampleRate * 29 / numSegment)
    filenames = os.listdir(musicPath)
    for f in tqdm(filenames[seg1:seg2]):
        try:
            file_path = os.path.join(musicPath, str(f))
            y, sr = librosa.load(file_path, sr = sampleRate)
            for n in range(numSegment):
                totalCnt = totalCnt + 1
                start = samples_per_segment * n
                finish = start + samples_per_segment

                # Recent librosa versions accept y and sr only as keyword arguments
                mfcc = librosa.feature.mfcc(y = y[start:finish],
                                            sr = sampleRate, n_mfcc = numMfcc,
                                            n_fft = 2048, hop_length = 512)
                # File names look like 001486.mp3; drop the extension and leading zeros
                idd = str(int(f.split('.')[0]))
                df = pd.DataFrame()
                df['id'] = [idd+'_'+str(n)]
                df['genre'] = tracksMetaData.loc[tracksMetaData['id'] == int(idd)]['genre'].values
                # One column per coefficient, each holding the list of 125 frame values
                for i in range(numMfcc):
                    df['mfccFeature_'+str(i)] = [list(mfcc[i])]
                data = pd.concat([data,df])
                cnt = cnt + 1
        except Exception as e:
            # Record the file that failed and move on to the next one
            df = pd.DataFrame()
            df['filename'] = [f]
            print('Broke at '+ f,e)
            broken = pd.concat([broken,df])
            continue

    print(totalCnt,cnt)
    return data,broken
    

While the function "extractFeatures" does the actual extraction, we need an efficient way of calling it on all the songs. For that we use "saveFiles", which takes the start and end of a segment as part numbers; each part number is multiplied by 1000 to get the file index.

E.g. startPart = 0 and endPart = 5 covers songs 0 to 5000.

We give each part a unique file name. This acts as a checkpoint system: results are saved after every part, so if anything goes wrong we do not lose all progress.
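Because every part is written to its own predictably named file, an interrupted run could in principle be resumed by skipping the parts whose file already exists. The sketch below is an assumption about how that might look, not something the notebook does. `part_filename` and `pending_parts` are hypothetical helpers that mirror the naming used by "saveFiles".

```python
def part_filename(segment_name, part):
    """File name used for one part (inside the tracksAudioFeatures_150 folder)."""
    return 'tracksAudioFeatures_150_' + segment_name + '_part_' + str(part) + '.csv'


def pending_parts(existing_files, segment_name, start_part, end_part, part_size=1000):
    """Return (part, first_index, last_index_exclusive) for parts not yet saved."""
    done = set(existing_files)
    return [
        (part, part * part_size, (part + 1) * part_size)
        for part in range(start_part, end_part)
        if part_filename(segment_name, part) not in done
    ]


# Example: parts 0 and 1 of 'firstSegment' are already on disk.
saved = [part_filename('firstSegment', 0), part_filename('firstSegment', 1)]
print(pending_parts(saved, 'firstSegment', 0, 5))
# [(2, 2000, 3000), (3, 3000, 4000), (4, 4000, 5000)]
```

In practice `existing_files` could come from `os.listdir` on the output folder.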

In [5]:
def saveFiles(startPart,endPart,segmentName):
    # Process parts startPart..endPart-1 (1000 files each) and save one CSV per part.
    os.makedirs('tracksAudioFeatures_150', exist_ok=True)
    for i in range(startPart,endPart):
        print('Starting Part '+ str(i) +' from '+str(i*1000) +' to ' + str((i+1)*1000))
        data,broken = extractFeatures(musicLocation,i*1000,(i+1)*1000)
        data.to_csv('tracksAudioFeatures_150/tracksAudioFeatures_150_'+segmentName+'_part_'+str(i)+'.csv')
        # Files that failed are saved separately, only when there are any
        if len(broken) >0:
            print('Broken Part saved '+ str(i))
            print('Length of broken ',len(broken))
            broken.to_csv('tracksAudioFeatures_150/broken_tracksAudioFeatures_150_'+segmentName+'_part_'+str(i)+'.csv')
        print('Finished Part '+ str(i))
        print('Saved as '+ 'tracksAudioFeatures_150/tracksAudioFeatures_150_'+segmentName+'_part_'+str(i)+'.csv')
        

The following cells call saveFiles() to create five segments of 5,000 songs each. Each segment is processed in parts of 1,000 songs. A part took 11 to 12 minutes on average, so completing all 25 parts took roughly 25*11 mins, i.e. about 4.5 to 5 hours.

In [6]:
saveFiles(0,5,'firstSegment')

Starting Part 0 from 0 to 1000


 32%|████████████████████████▉                                                      | 316/1000 [03:37<07:52,  1.45it/s]

Broke at 001486.mp3 


 98%|█████████████████████████████████████████████████████████████████████████████▏ | 977/1000 [11:32<00:16,  1.43it/s]

Broke at 005574.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:48<00:00,  1.41it/s]


9980 9980
Broken Part saved 0
Length of broken  2
Finished Part 0
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_firstSegment_part_0.csv
Starting Part 1 from 1000 to 2000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:56<00:00,  1.39it/s]


10000 10000
Finished Part 1
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_firstSegment_part_1.csv
Starting Part 2 from 2000 to 3000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:43<00:00,  1.42it/s]


10000 10000
Finished Part 2
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_firstSegment_part_2.csv
Starting Part 3 from 3000 to 4000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:42<00:00,  1.42it/s]


10000 10000
Finished Part 3
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_firstSegment_part_3.csv
Starting Part 4 from 4000 to 5000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:37<00:00,  1.43it/s]


10000 10000
Finished Part 4
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_firstSegment_part_4.csv


In [7]:
saveFiles(5,10,'secondSegment')

Starting Part 5 from 5000 to 6000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:38<00:00,  1.43it/s]


10000 10000
Finished Part 5
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_secondSegment_part_5.csv
Starting Part 6 from 6000 to 7000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:59<00:00,  1.39it/s]


10000 10000
Finished Part 6
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_secondSegment_part_6.csv
Starting Part 7 from 7000 to 8000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:41<00:00,  1.42it/s]


10000 10000
Finished Part 7
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_secondSegment_part_7.csv
Starting Part 8 from 8000 to 9000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:47<00:00,  1.41it/s]


10000 10000
Finished Part 8
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_secondSegment_part_8.csv
Starting Part 9 from 9000 to 10000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:44<00:00,  1.42it/s]


10000 10000
Finished Part 9
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_secondSegment_part_9.csv


In [8]:
saveFiles(10,15,'thirdSegment')

Starting Part 10 from 10000 to 11000


 68%|█████████████████████████████████████████████████████▎                         | 675/1000 [08:00<03:59,  1.36it/s]

Broke at 065753.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:51<00:00,  1.40it/s]


9990 9990
Broken Part saved 10
Length of broken  1
Finished Part 10
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_thirdSegment_part_10.csv
Starting Part 11 from 11000 to 12000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [12:34<00:00,  1.33it/s]


10000 10000
Finished Part 11
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_thirdSegment_part_11.csv
Starting Part 12 from 12000 to 13000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [12:34<00:00,  1.33it/s]


10000 10000
Finished Part 12
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_thirdSegment_part_12.csv
Starting Part 13 from 13000 to 14000


 15%|███████████▌                                                                   | 146/1000 [01:47<09:37,  1.48it/s]

Broke at 080391.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:53<00:00,  1.40it/s]


9990 9990
Broken Part saved 13
Length of broken  1
Finished Part 13
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_thirdSegment_part_13.csv
Starting Part 14 from 14000 to 15000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:44<00:00,  1.42it/s]


10000 10000
Finished Part 14
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_thirdSegment_part_14.csv


In [9]:
saveFiles(15,20,'fourthSegment')

Starting Part 15 from 15000 to 16000


 63%|█████████████████████████████████████████████████▍                             | 626/1000 [07:28<04:25,  1.41it/s]

Broke at 098558.mp3 
Broke at 098559.mp3 
Broke at 098560.mp3 
Broke at 098565.mp3 
Broke at 098566.mp3 
Broke at 098567.mp3 
Broke at 098568.mp3 
Broke at 098569.mp3 
Broke at 098571.mp3 


 84%|██████████████████████████████████████████████████████████████████             | 836/1000 [09:53<01:56,  1.41it/s]

Broke at 099134.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:50<00:00,  1.41it/s]


9900 9900
Broken Part saved 15
Length of broken  10
Finished Part 15
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fourthSegment_part_15.csv
Starting Part 16 from 16000 to 17000


 30%|████████████████████████                                                       | 305/1000 [03:35<08:00,  1.45it/s]

Broke at 105247.mp3 


 96%|███████████████████████████████████████████████████████████████████████████▋   | 958/1000 [11:21<00:29,  1.44it/s]

Broke at 108924.mp3 Input signal length=0 is too small to resample from 44100->22050
Broke at 108925.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:49<00:00,  1.41it/s]


9970 9970
Broken Part saved 16
Length of broken  3
Finished Part 16
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fourthSegment_part_16.csv
Starting Part 17 from 17000 to 18000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:40<00:00,  1.43it/s]


10000 10000
Finished Part 17
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fourthSegment_part_17.csv
Starting Part 18 from 18000 to 19000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:54<00:00,  1.40it/s]


10000 10000
Finished Part 18
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fourthSegment_part_18.csv
Starting Part 19 from 19000 to 20000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:43<00:00,  1.42it/s]


10000 10000
Finished Part 19
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fourthSegment_part_19.csv


In [10]:
saveFiles(20,25,'fifthSegment')

Starting Part 20 from 20000 to 21000


 62%|█████████████████████████████████████████████████                              | 621/1000 [07:24<04:11,  1.50it/s]

Broke at 126981.mp3 


 78%|█████████████████████████████████████████████████████████████▌                 | 780/1000 [09:12<02:26,  1.50it/s]

Broke at 127336.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:44<00:00,  1.42it/s]


9980 9980
Broken Part saved 20
Length of broken  2
Finished Part 20
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fifthSegment_part_20.csv
Starting Part 21 from 21000 to 22000


 99%|██████████████████████████████████████████████████████████████████████████████ | 988/1000 [11:24<00:08,  1.47it/s]

Broke at 133297.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:31<00:00,  1.45it/s]


9990 9990
Broken Part saved 21
Length of broken  1
Finished Part 21
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fifthSegment_part_21.csv
Starting Part 22 from 22000 to 23000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:36<00:00,  1.44it/s]


10000 10000
Finished Part 22
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fifthSegment_part_22.csv
Starting Part 23 from 23000 to 24000


 62%|████████████████████████████████████████████████▉                              | 620/1000 [06:59<04:14,  1.49it/s]

Broke at 143992.mp3 


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [11:21<00:00,  1.47it/s]


9990 9990
Broken Part saved 23
Length of broken  1
Finished Part 23
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fifthSegment_part_23.csv
Starting Part 24 from 24000 to 25000


100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [10:30<00:00,  1.59it/s]


10000 10000
Finished Part 24
Saved as tracksAudioFeatures_150/tracksAudioFeatures_150_fifthSegment_part_24.csv


Once we have all the dataset parts, we merge them into a single dataset. The broken-file lists are merged into a single dataset too.

In [11]:
datasets = os.listdir('./tracksAudioFeatures_150')

In [12]:
tracksAudioFeatures_complete = pd.DataFrame()
broken_tracksAudioFeatures_complete = pd.DataFrame()

for i in tqdm(datasets):
    x = pd.read_csv('./tracksAudioFeatures_150/' + i)
    if 'broken' not in i:
        print(i)
        print(type(x['mfccFeature_0'].iloc[0]))
        print(len(x['mfccFeature_0'].iloc[0]))
        print(len(x['mfccFeature_0'].iloc[0].split(',')))
        tracksAudioFeatures_complete = pd.concat([tracksAudioFeatures_complete,x])
    else:
        broken_tracksAudioFeatures_complete = pd.concat([broken_tracksAudioFeatures_complete,x])
        
tracksAudioFeatures_complete = tracksAudioFeatures_complete.drop(['Unnamed: 0'], axis=1)
broken_tracksAudioFeatures_complete = broken_tracksAudioFeatures_complete.drop(['Unnamed: 0'], axis=1)
       

 27%|██████████████████████▋                                                            | 9/33 [00:01<00:03,  6.40it/s]

tracksAudioFeatures_150_fifthSegment_part_20.csv
<class 'str'>
1466
125


 30%|████████████████████████▊                                                         | 10/33 [00:02<00:07,  3.07it/s]

tracksAudioFeatures_150_fifthSegment_part_21.csv
<class 'str'>
1432
125


 33%|███████████████████████████▎                                                      | 11/33 [00:04<00:11,  1.98it/s]

tracksAudioFeatures_150_fifthSegment_part_22.csv
<class 'str'>
1456
125


 36%|█████████████████████████████▊                                                    | 12/33 [00:05<00:14,  1.47it/s]

tracksAudioFeatures_150_fifthSegment_part_23.csv
<class 'str'>
1429
125


 39%|████████████████████████████████▎                                                 | 13/33 [00:07<00:16,  1.19it/s]

tracksAudioFeatures_150_fifthSegment_part_24.csv
<class 'str'>
1453
125


 42%|██████████████████████████████████▊                                               | 14/33 [00:08<00:18,  1.02it/s]

tracksAudioFeatures_150_firstSegment_part_0.csv
<class 'str'>
1460
125


 45%|█████████████████████████████████████▎                                            | 15/33 [00:10<00:20,  1.12s/it]

tracksAudioFeatures_150_firstSegment_part_1.csv
<class 'str'>
1477
125


 48%|███████████████████████████████████████▊                                          | 16/33 [00:11<00:20,  1.20s/it]

tracksAudioFeatures_150_firstSegment_part_2.csv
<class 'str'>
1459
125


 52%|██████████████████████████████████████████▏                                       | 17/33 [00:13<00:20,  1.31s/it]

tracksAudioFeatures_150_firstSegment_part_3.csv
<class 'str'>
1471
125


 55%|████████████████████████████████████████████▋                                     | 18/33 [00:14<00:20,  1.36s/it]

tracksAudioFeatures_150_firstSegment_part_4.csv
<class 'str'>
1463
125


 58%|███████████████████████████████████████████████▏                                  | 19/33 [00:16<00:19,  1.39s/it]

tracksAudioFeatures_150_fourthSegment_part_15.csv
<class 'str'>
1378
125


 61%|█████████████████████████████████████████████████▋                                | 20/33 [00:17<00:18,  1.41s/it]

tracksAudioFeatures_150_fourthSegment_part_16.csv
<class 'str'>
1475
125


 64%|████████████████████████████████████████████████████▏                             | 21/33 [00:18<00:16,  1.42s/it]

tracksAudioFeatures_150_fourthSegment_part_17.csv
<class 'str'>
1461
125


 67%|██████████████████████████████████████████████████████▋                           | 22/33 [00:20<00:15,  1.43s/it]

tracksAudioFeatures_150_fourthSegment_part_18.csv
<class 'str'>
1464
125


 70%|█████████████████████████████████████████████████████████▏                        | 23/33 [00:21<00:14,  1.43s/it]

tracksAudioFeatures_150_fourthSegment_part_19.csv
<class 'str'>
1457
125


 73%|███████████████████████████████████████████████████████████▋                      | 24/33 [00:23<00:12,  1.44s/it]

tracksAudioFeatures_150_secondSegment_part_5.csv
<class 'str'>
1448
125
tracksAudioFeatures_150_secondSegment_part_6.csv
<class 'str'>
1471
125


 79%|████████████████████████████████████████████████████████████████▌                 | 26/33 [00:27<00:11,  1.67s/it]

tracksAudioFeatures_150_secondSegment_part_7.csv
<class 'str'>
1339
125
tracksAudioFeatures_150_secondSegment_part_8.csv
<class 'str'>
1461
125


 82%|███████████████████████████████████████████████████████████████████               | 27/33 [00:30<00:12,  2.04s/it]

tracksAudioFeatures_150_secondSegment_part_9.csv
<class 'str'>
1317
125


 85%|█████████████████████████████████████████████████████████████████████▌            | 28/33 [00:33<00:12,  2.43s/it]

tracksAudioFeatures_150_thirdSegment_part_10.csv
<class 'str'>
1447
125


 88%|████████████████████████████████████████████████████████████████████████          | 29/33 [00:37<00:11,  2.99s/it]

tracksAudioFeatures_150_thirdSegment_part_11.csv
<class 'str'>
1475
125


 94%|█████████████████████████████████████████████████████████████████████████████     | 31/33 [00:41<00:04,  2.48s/it]

tracksAudioFeatures_150_thirdSegment_part_12.csv
<class 'str'>
1476
125
tracksAudioFeatures_150_thirdSegment_part_13.csv
<class 'str'>
1458
125


100%|██████████████████████████████████████████████████████████████████████████████████| 33/33 [00:46<00:00,  1.42s/it]

tracksAudioFeatures_150_thirdSegment_part_14.csv
<class 'str'>
1469
125





Finally, we have a total of 249,790 rows, one per segment (24,979 readable tracks * 10 segments each), and 21 broken tracks. We explore the dataset further to verify that all the shapes and types are correct.
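The row count can be cross-checked against the extraction logs. Here is a small sketch that uses only the numbers printed above: 25 parts of 1,000 files, 21 broken files, and 10 segments per track. `expected_rows` is an illustrative helper, not part of the notebook.

```python
def expected_rows(num_files, num_broken, segments_per_track=10):
    """Rows expected in the merged dataset: one per segment of each readable file."""
    return (num_files - num_broken) * segments_per_track


print(expected_rows(25 * 1000, 21))  # 249790
```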

In [13]:
tracksAudioFeatures_complete.head(5)

Unnamed: 0,id,genre,mfccFeature_0,mfccFeature_1,mfccFeature_2,mfccFeature_3,mfccFeature_4,mfccFeature_5,mfccFeature_6,mfccFeature_7,mfccFeature_8,mfccFeature_9,mfccFeature_10,mfccFeature_11,mfccFeature_12,mfccFeature_13,mfccFeature_14
0,124423_0,Old-Time / Historic,"[-331.76343, -206.83507, -150.79898, -144.4678...","[136.35646, 151.65372, 144.99507, 141.45316, 1...","[-92.903046, -127.679794, -140.80212, -139.403...","[-62.66017, -49.059425, -51.346558, -52.181923...","[-38.805756, -47.862038, -51.127518, -48.02935...","[-56.836433, -61.2565, -52.826473, -51.1662, -...","[-17.287859, -34.948433, -40.68494, -42.830673...","[-3.4888515, -4.506544, -0.52261865, 0.2448553...","[-16.90616, -24.62344, -23.808636, -21.55201, ...","[-1.4969412, -6.315495, -2.0328617, 5.059345, ...","[-17.040787, -17.715935, -17.042439, -9.658015...","[-28.81857, -30.039116, -20.462616, -15.730592...","[8.894039, 8.015279, 1.41698, -6.1483583, -11....","[9.8792515, 10.081762, 19.26508, 17.607117, 9....","[-4.434949, -3.4882398, 6.579256, 11.005165, 5..."
1,124423_1,Old-Time / Historic,"[-141.70447, -151.02011, -204.6398, -219.88483...","[70.76153, 69.304504, 54.755623, 62.831505, 70...","[-69.07377, -96.71388, -131.54568, -131.25073,...","[-2.5633497, -6.4746003, -19.870838, -15.17943...","[-10.86871, -12.382185, -22.568533, -15.358562...","[-53.357117, -58.777496, -64.76073, -66.34716,...","[-30.973467, -42.006577, -52.474915, -52.40824...","[14.31975, 16.98157, 23.953087, 23.335302, 18....","[-11.362137, -24.974201, -46.38675, -46.458347...","[-13.941832, -9.682327, -2.2792099, -0.9318579...","[11.06698, 18.50761, 18.23468, 19.086136, 18.4...","[0.7836997, -5.3090725, -9.245029, -15.167604,...","[-2.434061, 11.934198, 29.585491, 28.20034, 28...","[-0.4299671, 3.6665812, 6.398292, 3.2390728, -...","[2.2310064, -1.6334496, -8.868323, -8.320509, ..."
2,124423_2,Old-Time / Historic,"[-236.76901, -204.88608, -202.79424, -211.7985...","[164.211, 161.77617, 139.38297, 140.27386, 161...","[-85.3822, -120.07537, -146.21246, -147.70352,...","[-32.252678, -34.46131, -51.14087, -54.568718,...","[-7.4214034, -23.076878, -40.924683, -43.25477...","[-31.144537, -41.805965, -56.76655, -58.119328...","[-14.987923, -26.094719, -46.611755, -49.33288...","[4.4418592, 6.2212133, -7.901986, -11.970848, ...","[-12.005516, -27.454962, -43.98812, -42.64355,...","[-5.6790867, 3.272326, -2.2846823, -5.917389, ...","[-6.2902756, -5.854392, -6.686449, -6.7126894,...","[-5.6098385, -6.0516605, -5.2029867, -0.692430...","[7.336014, 12.688293, 12.705888, 18.635963, 14...","[2.09269, 1.3645461, 0.6684495, 1.5747235, 1.6...","[-4.1928773, -4.8164897, -5.367966, -6.704213,..."
3,124423_3,Old-Time / Historic,"[-270.1199, -205.64766, -187.19708, -196.86424...","[132.28531, 125.65335, 118.08946, 109.44469, 1...","[-109.53642, -122.49601, -123.42473, -127.4861...","[-33.72352, -29.988556, -31.954174, -41.72948,...","[-19.570417, -19.239956, -17.20611, -22.123035...","[-69.30449, -66.40855, -59.141136, -52.02983, ...","[-46.290672, -56.59155, -51.76654, -42.49778, ...","[7.9393797, 9.805471, 5.3692656, 4.1625504, 2....","[-21.933971, -32.40931, -30.753263, -21.176008...","[-10.03478, -6.5470505, 2.297513, 11.649738, 6...","[-7.3654776, -8.952282, -1.9247813, 1.6845284,...","[-17.044058, -15.736282, -6.0419655, -5.505317...","[24.518562, 24.02679, 13.123839, -2.288925, -4...","[2.2488708, -2.7423797, -3.6004543, -4.172913,...","[-22.445217, -25.029558, -21.466507, -17.66163..."
4,124423_4,Old-Time / Historic,"[-390.48816, -339.31622, -317.2405, -326.96646...","[97.122345, 115.60747, 127.4955, 128.93234, 14...","[-84.24397, -109.15258, -110.69359, -99.36203,...","[-40.43964, -44.553093, -47.186836, -41.452816...","[0.28208655, -1.2433429, -6.5626597, -0.983691...","[-36.504112, -43.72944, -42.5904, -35.0524, -4...","[-22.552574, -24.396482, -21.35398, -15.982605...","[1.2889054, 0.24470624, -1.0263059, 3.6228871,...","[-9.089966, -12.901368, -15.339406, -10.585323...","[-9.621838, -6.8267345, -4.4633904, -3.513043,...","[-6.904684, -6.3937654, -4.0830145, -0.6454271...","[-2.2965891, -8.502451, -8.448065, -3.9730012,...","[11.558136, 10.830755, 6.7897396, 8.929026, 13...","[5.913314, 5.5654097, 1.5172951, 0.10546613, 2...","[-6.5406847, -3.1975913, -1.6860085, -6.242577..."


In [14]:
tracksAudioFeatures_complete.shape

(249790, 17)

In [15]:
broken_tracksAudioFeatures_complete.head(5)

Unnamed: 0,filename
0,126981.mp3
1,127336.mp3
0,133297.mp3
0,143992.mp3
0,001486.mp3


In [16]:
len(broken_tracksAudioFeatures_complete)

21

In [17]:
print(type(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0]))
print(len(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0]))
print(len(tracksAudioFeatures_complete['mfccFeature_0'].iloc[0].split(',')))

<class 'str'>
1466
125
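As the check shows, each MFCC cell is read back from CSV as a string rather than a list, so it has to be parsed before it can be used as numeric data. Below is a minimal sketch using the standard library's `ast.literal_eval`. `parse_mfcc_cell` and `segment_matrix` are hypothetical helpers, and the row is assumed to be a mapping (for example a pandas row) with the mfccFeature_0 to mfccFeature_14 columns shown above.

```python
import ast


def parse_mfcc_cell(cell):
    """Turn a stringified list such as '[-331.76, -206.83]' into a list of floats."""
    return [float(v) for v in ast.literal_eval(cell)]


def segment_matrix(row, num_mfcc=15):
    """Build a frames x coefficients matrix (list of lists) from one dataset row."""
    coeffs = [parse_mfcc_cell(row['mfccFeature_' + str(i)]) for i in range(num_mfcc)]
    # coeffs is num_mfcc x frames; transpose it to frames x num_mfcc.
    return [list(frame) for frame in zip(*coeffs)]


# Tiny illustrative row with 2 coefficients and 3 frames.
row = {'mfccFeature_0': '[1.0, 2.0, 3.0]', 'mfccFeature_1': '[4.0, 5.0, 6.0]'}
print(segment_matrix(row, num_mfcc=2))
# [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

This assumes the cells contain plain bracketed numbers, as in the table above. If the CSV was written in a different format, the parsing step would need adjusting.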


In [20]:
tracksAudioFeatures_complete.to_csv('./tracksAudioFeatures_150/tracksAudioFeatures_125_complete.csv')
broken_tracksAudioFeatures_complete.to_csv('./tracksAudioFeatures_150/broken_tracksAudioFeatures_125_complete.csv')

The audio features will then be combined with theme features of the same length to create the final dataset.