#### COMSAR Tutorial

# Loading Timbre Track feature files

To use this notebook, you first have to export a set of feature files from ESRA.
If you haven't yet, go to [ESRA](https://esra.fbkultur.uni-hamburg.de/), create a new selection,
add some titles to it, and export their audio features.

In [1]:
import numpy as np
import pandas as pd

## 1. Load features from ESRA

ESRA Timbre Track feature files are plain CSV files. Each row corresponds to one STFT segment, so row $n$ holds the values for the $n$th segment. Each column holds one feature.

In [2]:
raw_515 = pd.read_csv('../data/515-timbre.csv')
raw_515

Unnamed: 0,https://esra.fbkultur.uni-hamburg.de/explore/view?entity_id=515,centroid,spread,skewness,kurtosis,loudness,sharpness,roughness,flux
0,0,9983.942910,3718.935397,-0.744758,2.588002,65.542956,0.054649,0.000064,0.322574
1,1,9805.327623,4244.812836,-0.881107,2.553325,65.037413,0.021519,0.000011,0.134093
2,2,9203.276384,4440.022020,-0.626652,2.153447,65.028410,0.020421,0.000010,0.730260
3,3,8881.367429,4512.140976,-0.382792,1.754592,65.148284,0.023020,0.000013,2.856374
4,4,7879.407845,3866.439516,-0.016991,1.916608,65.705151,0.044893,0.000060,0.584241
...,...,...,...,...,...,...,...,...,...
15219,15219,6878.016265,3984.745093,0.348447,1.957656,65.355012,0.026131,0.000021,0.578096
15220,15220,8187.715794,4209.887036,-0.205816,1.820182,65.102992,0.022144,0.000014,0.680125
15221,15221,8866.021319,4239.620564,-0.335393,1.820388,65.170248,0.026088,0.000018,0.641975
15222,15222,9636.780512,3882.403590,-0.662754,2.376123,64.924752,0.024321,0.000014,0.522111
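
After loading, it can help to confirm that the table has the expected shape and that the features you plan to use are present. The following is only a sketch: it reads a tiny in-memory CSV whose values and `segment` column name are made up for illustration. With a real export you would pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for an exported feature file: one row per STFT segment.
csv_text = """segment,centroid,spread,roughness,flux
0,9983.9,3718.9,0.000064,0.32
1,9805.3,4244.8,0.000011,0.13
2,9203.3,4440.0,0.000010,0.73
"""

demo = pd.read_csv(io.StringIO(csv_text))

# Rows are segments, columns are features (plus the segment counter).
n_segments, n_columns = demo.shape
print(n_segments, "segments,", n_columns, "columns")

# Check that the features we want to select later actually exist.
wanted = ['centroid', 'roughness']
missing = [name for name in wanted if name not in demo.columns]
print("missing features:", missing)
```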


## 2. Feature selection

Selection of features is crucial for neural network training. Which features you use depends on your goals. Also, the self-organizing map (SOM) cannot (easily) learn time series data, so you have to aggregate over the time dimension.

### 2.1 Take only certain features

In [3]:
my_features = ['centroid', 'roughness']
raw_515[my_features]

Unnamed: 0,centroid,roughness
0,9983.942910,0.000064
1,9805.327623,0.000011
2,9203.276384,0.000010
3,8881.367429,0.000013
4,7879.407845,0.000060
...,...,...
15219,6878.016265,0.000021
15220,8187.715794,0.000014
15221,8866.021319,0.000018
15222,9636.780512,0.000014


### 2.2 Take only certain STFT segments

In [4]:
start = 20
stop = 100
step = 5
my_segments = slice(start, stop, step)
raw_515.iloc[my_segments]

Unnamed: 0,https://esra.fbkultur.uni-hamburg.de/explore/view?entity_id=515,centroid,spread,skewness,kurtosis,loudness,sharpness,roughness,flux
20,20,10784.393442,3463.970148,-1.068177,3.431734,65.319716,0.053023,5.7e-05,1.066819
25,25,8349.881147,4114.936646,-0.182253,1.742108,65.645011,0.044125,4.8e-05,2.738632
30,30,8182.244923,4645.318724,-0.253286,1.701299,65.075039,0.017986,8e-06,0.520296
35,35,9039.10829,4293.150636,-0.768592,2.264628,65.04004,0.018678,1e-05,0.072892
40,40,8800.271448,4303.346504,-0.363012,1.864678,65.348445,0.030777,2.2e-05,0.113309
45,45,9150.418601,4576.45583,-0.694357,2.091417,64.89801,0.015661,7e-06,0.271658
50,50,10170.135786,3745.373725,-0.859136,2.852902,65.257716,0.038051,3e-05,6.407647
55,55,10327.159526,3632.650515,-0.923328,3.036293,65.354458,0.046754,4.7e-05,4.822905
60,60,8898.491855,4404.858012,-0.354163,1.798745,65.195749,0.027105,1.8e-05,0.649228
65,65,9226.562308,4571.146966,-0.556022,1.967588,64.901857,0.017318,7e-06,0.162573


### 2.3 Aggregate over time

Choose a form of aggregation, such as the mean or the standard deviation.

In [5]:
features = ['centroid', 'spread', 'roughness', 'flux']
feat_515 = raw_515[features].iloc[20:]    # iloc[20:] keeps segment 20 through the end
feat_515 = feat_515.mean().to_numpy()

In [6]:
feat_515    # This is how a single title is presented to the SOM.

array([1.91478845e+03, 2.11132686e+03, 1.57729118e-03, 2.57756297e+00])
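
The mean is only one possible aggregation. As a sketch of how several statistics could be combined into one vector per title, the example below computes the mean and the standard deviation of each feature and flattens them. The data is random stand-in data, not real ESRA values, and the names `demo`, `stats`, and `labels` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Random stand-in for one title: 100 segments, 4 features.
rng = np.random.default_rng(0)
features = ['centroid', 'spread', 'roughness', 'flux']
demo = pd.DataFrame(rng.random((100, len(features))), columns=features)

# One row per statistic, one column per feature.
stats = demo.agg(['mean', 'std'])

# Flatten row by row: first all means, then all standard deviations.
feat_vector = stats.to_numpy().ravel()

# Matching labels, in the same order as the flattened values.
labels = [f'{feat}_{stat}' for stat in stats.index for feat in stats.columns]
```

Keeping a label for every entry makes it easier to name the columns of the feature matrix later on.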

## 3. Repeat

Choose a method and repeat the above steps for all your CSV files. Then stack the resulting feature vectors into one (big) SOM feature matrix (sfm), with one row per title.

In [7]:
# dummy data just for the example
feat_516 = np.random.rand(1, 4)
feat_517 = np.random.rand(1, 4)

In [8]:
feat_matrix = np.vstack((feat_515, feat_516, feat_517))
feat_matrix

array([[1.91478845e+03, 2.11132686e+03, 1.57729118e-03, 2.57756297e+00],
       [1.14007368e-01, 2.30311383e-01, 1.00486885e-02, 4.90907724e-01],
       [2.00552648e-01, 8.69712500e-01, 6.22862326e-01, 1.70603317e-01]])
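
With real data, the dummy rows would be replaced by vectors computed from each exported file. A minimal sketch of such a loop is shown below. `aggregate_file` is a hypothetical helper, not part of any library, and the temporary files with random values exist only to make the example runnable. In practice you would point the glob pattern at your own data directory.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

features = ['centroid', 'spread', 'roughness', 'flux']

def aggregate_file(path, features, first_segment=20):
    """Hypothetical helper: mean of the chosen features from first_segment on."""
    raw = pd.read_csv(path)
    return raw[features].iloc[first_segment:].mean().to_numpy()

with tempfile.TemporaryDirectory() as tmp:
    # Write three small random files that mimic the feature columns.
    rng = np.random.default_rng(0)
    for entity_id in (515, 516, 517):
        fake = pd.DataFrame(rng.random((50, len(features))), columns=features)
        fake.to_csv(Path(tmp) / f'{entity_id}-timbre.csv')

    # Aggregate every file and stack the vectors row by row.
    paths = sorted(Path(tmp).glob('*-timbre.csv'))
    feat_matrix = np.vstack([aggregate_file(p, features) for p in paths])
    titles = [p.stem for p in paths]
```

Sorting the paths keeps the row order reproducible, and `titles` records which file each row came from.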

It is good practice to convert the raw sfm array to a dataframe in order to retain information about the included features and songs.

In [9]:
song_names = ['Song1', 'Song2', 'SOng3']
sfm = pd.DataFrame(feat_matrix, index=song_names, columns=features)

In [10]:
sfm

Unnamed: 0,centroid,spread,roughness,flux
Song1,1914.788445,2111.326855,0.001577,2.577563
Song2,0.114007,0.230311,0.010049,0.490908
SOng3,0.200553,0.869713,0.622862,0.170603


## 4. Save your work

Once you have finished constructing the feature matrix, it is useful to store your work for later reuse.
Since the feature matrix is simply a table of numerical values, you can safely export it as a CSV file
or in any other file format you like.

In [11]:
sfm.to_csv('../data/my_project.sfm')

Read the file again with:

In [12]:
sfm2 = pd.read_csv('../data/my_project.sfm', index_col=0)

In [13]:
sfm2

Unnamed: 0,centroid,spread,roughness,flux
Song1,1914.788445,2111.326855,0.001577,2.577563
Song2,0.114007,0.230311,0.010049,0.490908
SOng3,0.200553,0.869713,0.622862,0.170603
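
If you want to verify that nothing was lost when saving and reloading, you could compare the two tables programmatically. The sketch below does this with a small made-up table and a temporary file. The names `demo` and `restored` are illustrative only.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Small made-up feature matrix with row and column labels.
demo = pd.DataFrame(
    np.arange(6, dtype=float).reshape(2, 3) / 7,
    index=['Song1', 'Song2'],
    columns=['centroid', 'spread', 'flux'],
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.sfm'
    demo.to_csv(path)
    # index_col=0 restores the song names as the row index.
    restored = pd.read_csv(path, index_col=0)

# Raises an AssertionError if labels or values differ
# (floating point values are compared with a small tolerance).
pd.testing.assert_frame_equal(demo, restored)
```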
