# Exploring AcousticBrainz classifier stability

AcousticBrainz makes a large amount of classifier output available, and this output is often used as a source for psychological claims about music, such as how listening preferences change over time, how the seasons influence our preferences, or that [pop music is getting sadder](http://www.bbc.com/culture/story/20190513-is-pop-music-really-getting-sadder-and-angrier).

Low-level features are known to be unstable (see reference in my notes), so the hypothesis is that the results of the AcousticBrainz classifiers depend heavily on things like source quality. High-level features have additional problems: which emotion model should be used, and how should labels be interpreted (for example, what counts as a 'party mood')? The 'ground truth' that these models are trained on is also subjective. If scientific claims are based on the results of such classifiers, then these claims might not hold if the classifiers are unreliable.

Because AcousticBrainz is crowdsourced, multiple submissions exist for the same recording, meaning that the classifiers have been run on several different submissions of the same recording. If these classifiers are accurate, their results should remain fairly stable under minor variations in, for example, audio quality: a sad song should not become happy just because the quality is higher.

Thus, the first question to answer is: **How stable are the classifiers included in AcousticBrainz?** A second question follows from it: **Which classifiers are relatively stable, and which are relatively unstable?**


First, we import all required packages and load the AcousticBrainz dataset, which was generated by running the scripts in `acousticbrainz_data_generation`:

In [65]:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Use seaborn style defaults and set the default figure size
sns.set(rc={'figure.figsize':(10, 10)})

# Load in the acousticbrainz dataset into the variable 'acousticbrainz'
acousticbrainz = pd.read_hdf(Path.cwd() / 'datasets' / 'acousticbrainz.h5')

The dataframe has a two-level index: the first level is the MBID (the MusicBrainz identifier of the recording) and the second level is the submission ID. Cell values are the label probabilities returned by the classifier. The dataframe looks as follows:

In [66]:
acousticbrainz

Unnamed: 0_level_0,Unnamed: 1_level_0,danceability,danceability,gender,gender,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,...,moods_mirex,moods_mirex,moods_mirex,moods_mirex,timbre,timbre,tonal_atonal,tonal_atonal,voice_instrumental,voice_instrumental
Unnamed: 0_level_1,Unnamed: 1_level_1,danceable,not_danceable,female,male,alternative,blues,electronic,folkcountry,funksoulrnb,jazz,...,Cluster2,Cluster3,Cluster4,Cluster5,bright,dark,atonal,tonal,instrumental,voice
000009a8-34f1-4c58-a8de-1d99809cd626,0,3.000001e-14,1.000000,0.622127,0.377873,0.048124,0.010853,0.815454,0.032144,0.002521,0.059219,...,0.044404,0.227748,0.030632,0.619994,0.010919,0.989081,0.999987,0.000013,0.999995,0.000005
000009a8-34f1-4c58-a8de-1d99809cd626,1,2.751654e-01,0.724835,0.588847,0.411153,0.276660,0.022330,0.096782,0.438989,0.004316,0.097606,...,0.359569,0.389190,0.099670,0.060685,0.766210,0.233790,0.029674,0.970326,0.049349,0.950651
00000baf-9215-483a-8900-93756eaf1cfc,0,9.999093e-01,0.000091,0.500000,0.500000,0.123210,0.029354,0.678225,0.018370,0.004644,0.027921,...,0.059108,0.078950,0.058210,0.450918,0.382892,0.617108,0.222329,0.777671,0.508506,0.491494
00000baf-9215-483a-8900-93756eaf1cfc,1,9.999142e-01,0.000086,0.500000,0.500000,0.122299,0.029067,0.678300,0.017725,0.004513,0.029749,...,0.059855,0.078234,0.056700,0.455807,0.323493,0.676507,0.265459,0.734541,0.546402,0.453598
00000baf-9215-483a-8900-93756eaf1cfc,2,3.000001e-14,1.000000,0.622127,0.377873,0.001199,0.000155,0.997356,0.000136,0.000028,0.000596,...,0.044404,0.227748,0.030632,0.619994,0.011846,0.988154,0.993315,0.006685,0.997375,0.002625
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ffff5419-7355-4c06-9ad1-8e39f4d25627,0,3.000001e-14,1.000000,0.622127,0.377873,0.001664,0.000172,0.997439,0.000236,0.000015,0.000166,...,0.044404,0.227748,0.030632,0.619994,0.011438,0.988562,0.993962,0.006038,0.473732,0.526268
ffff6ec8-62eb-4791-b7aa-6a4394c12ae7,0,2.363820e-01,0.763618,0.172870,0.827130,0.183566,0.053612,0.496118,0.140909,0.008814,0.013020,...,0.256713,0.245888,0.219054,0.097303,0.896236,0.103764,0.068383,0.931617,0.011001,0.988999
ffff9c3f-85da-404e-b2e8-14a500b62648,0,2.547902e-01,0.745210,0.777648,0.222352,0.147801,0.104918,0.152298,0.328587,0.013151,0.018729,...,0.502939,0.170402,0.179711,0.044762,0.997130,0.002870,0.011740,0.988260,0.018471,0.981529
ffffa8a0-5949-403d-aa19-ec7346db2254,0,3.602032e-01,0.639797,0.003200,0.996800,0.087463,0.015773,0.857965,0.019118,0.001772,0.002774,...,0.097182,0.424299,0.241829,0.179418,0.989849,0.010151,0.190831,0.809169,0.000866,0.999134
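
To make the two-level layout concrete, here is a minimal sketch on toy data. The MBIDs (`mbid-a`, `mbid-b`), the probabilities, and the level names (`mbid`, `submission`, `classifier`, `label`) are all hypothetical; the real dataframe appears to have unnamed levels, so it is addressed by position there.

```python
import pandas as pd

# Hypothetical toy data: two recordings (MBIDs), three submissions in total.
index = pd.MultiIndex.from_tuples(
    [("mbid-a", 0), ("mbid-a", 1), ("mbid-b", 0)],
    names=["mbid", "submission"],  # level names are an assumption
)
columns = pd.MultiIndex.from_tuples(
    [("danceability", "danceable"), ("danceability", "not_danceable"),
     ("timbre", "bright"), ("timbre", "dark")],
    names=["classifier", "label"],  # level names are an assumption
)
toy = pd.DataFrame(
    [[0.25, 0.75, 0.5, 0.5],
     [0.75, 0.25, 0.25, 0.75],
     [0.5, 0.5, 1.0, 0.0]],
    index=index, columns=columns,
)

# All submissions of one recording (first index level).
submissions_a = toy.loc["mbid-a"]

# One classifier (first column level) across all submissions.
danceability = toy["danceability"]

# The label probabilities of a single classifier sum to 1 per submission.
row_sums = danceability.sum(axis=1)
```

Selecting on the first index level returns one row per submission, and selecting on the first column level returns one column per label of that classifier.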


Some recordings have many submissions, like `Bohemian Rhapsody`:

In [67]:
acousticbrainz.loc['b1a9c0e9-d987-4042-ae91-78d6a3267d69']

Unnamed: 0_level_0,danceability,danceability,gender,gender,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,...,moods_mirex,moods_mirex,moods_mirex,moods_mirex,timbre,timbre,tonal_atonal,tonal_atonal,voice_instrumental,voice_instrumental
Unnamed: 0_level_1,danceable,not_danceable,female,male,alternative,blues,electronic,folkcountry,funksoulrnb,jazz,...,Cluster2,Cluster3,Cluster4,Cluster5,bright,dark,atonal,tonal,instrumental,voice
0,0.1489382,0.851062,0.578761,0.421239,1.375808e-08,1.664143e-08,0.999978,7.003212e-06,2.72635e-07,1.4e-05,...,0.079465,0.353936,0.10881,0.379845,0.768909,0.231091,0.910963,0.089037,0.218576,0.781424
1,0.2619256,0.738074,0.596835,0.403165,1.789594e-08,2.16274e-08,0.99998,7.017995e-06,2.669164e-07,1.2e-05,...,0.076259,0.33565,0.108706,0.406341,0.831004,0.168996,0.967799,0.032201,0.432707,0.567293
2,0.1438872,0.856113,0.566937,0.433063,1.386684e-08,1.677844e-08,0.999978,7.049242e-06,2.723325e-07,1.4e-05,...,0.079381,0.352216,0.10907,0.381422,0.785194,0.214806,0.913616,0.086384,0.201462,0.798538
3,3.000001e-14,1.0,0.622127,0.377873,1.741867e-10,1.964787e-10,0.999995,3.106572e-07,4.512222e-08,4e-06,...,0.044404,0.227748,0.030632,0.619994,0.072778,0.927222,0.995248,0.004752,0.999997,3e-06
4,0.1855529,0.814447,0.628536,0.371464,2.096072e-08,2.513858e-08,0.999976,8.445537e-06,3.218076e-07,1.4e-05,...,0.07927,0.358264,0.111655,0.374395,0.717762,0.282238,0.965311,0.034689,0.273957,0.726043
5,0.1131953,0.886805,0.928412,0.071588,3.121276e-08,3.67638e-08,0.99997,9.376518e-06,3.10173e-07,1.9e-05,...,0.080853,0.378578,0.112081,0.350486,0.692965,0.307035,0.816579,0.183421,0.26265,0.73735
6,0.151217,0.848783,0.798306,0.201694,1.611008e-08,1.925473e-08,0.999977,7.245709e-06,2.484353e-07,1.4e-05,...,0.077502,0.375621,0.110121,0.362332,0.777267,0.222733,0.896466,0.103534,0.305524,0.694476
7,3.000001e-14,1.0,0.622127,0.377873,1.525879e-10,1.720485e-10,0.999996,2.916429e-07,3.915462e-08,4e-06,...,0.044404,0.227748,0.030632,0.619994,0.1036,0.8964,0.995494,0.004506,0.999999,1e-06
8,0.1466795,0.85332,0.640508,0.359492,1.397012e-08,1.68537e-08,0.999978,7.063604e-06,2.700113e-07,1.4e-05,...,0.079751,0.354635,0.108111,0.37964,0.796506,0.203494,0.896532,0.103468,0.222471,0.777529
9,3.000001e-14,1.0,0.622127,0.377873,1.595018e-10,1.858952e-10,0.999996,3.262027e-07,4.674204e-08,4e-06,...,0.044404,0.227748,0.030632,0.619994,0.066084,0.933916,0.997094,0.002906,0.999994,6e-06


The dataframe contains the following classifications:

In [68]:
acousticbrainz.columns

MultiIndex([(      'danceability',           'danceable'),
            (      'danceability',       'not_danceable'),
            (            'gender',              'female'),
            (            'gender',                'male'),
            (    'genre_dortmund',         'alternative'),
            (    'genre_dortmund',               'blues'),
            (    'genre_dortmund',          'electronic'),
            (    'genre_dortmund',         'folkcountry'),
            (    'genre_dortmund',         'funksoulrnb'),
            (    'genre_dortmund',                'jazz'),
            (    'genre_dortmund',                 'pop'),
            (    'genre_dortmund',           'raphiphop'),
            (    'genre_dortmund',                'rock'),
            (  'genre_electronic',             'ambient'),
            (  'genre_electronic',                 'dnb'),
            (  'genre_electronic',               'house'),
            (  'genre_electronic',              'techno'

# Classifier variance
There are multiple ways to look at this. We can look at how stable the *probabilities* are, i.e. how stable the classifier's confidence in a given label is, or at how stable the *labels* are, i.e. whether the predicted label is the same for all submissions of a recording or whether it flips.
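
The difference between the two views can be sketched on toy data. This is only an illustration under assumptions: the MBIDs and probabilities are made up, and taking the most probable column as "the label" is one plausible reading of label stability, not necessarily the definition used later.

```python
import pandas as pd

# Hypothetical probabilities of one binary classifier for two recordings.
index = pd.MultiIndex.from_tuples(
    [("mbid-a", 0), ("mbid-a", 1), ("mbid-b", 0), ("mbid-b", 1)],
    names=["mbid", "submission"],  # level names are an assumption
)
probs = pd.DataFrame(
    {"danceable": [0.25, 0.75, 0.875, 0.625],
     "not_danceable": [0.75, 0.25, 0.125, 0.375]},
    index=index,
)

# View 1: spread of the probabilities within each recording.
prob_variance = probs.groupby(level="mbid").var()

# View 2: take the most probable column as the predicted label; a recording
# "flips" when its submissions do not all agree on that label.
labels = probs.idxmax(axis=1)
label_flips = labels.groupby(level="mbid").nunique() > 1
```

In this toy case both recordings have non-zero probability variance, but only `mbid-a` changes its predicted label between submissions, so the two views can rank stability differently.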

We'll begin with the first case.

We are interested in how the probability of a given label varies within each MBID. Some MBIDs have only one submission. These give us no information about the variance and should be filtered out:

In [69]:
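# Flag every MBID (index level 0) that has more than one submission
filt = acousticbrainz.groupby(level=0).size() > 1
# Look up that flag for each row's MBID and keep only the flagged rows
acousticbrainz = acousticbrainz[filt[acousticbrainz.index.get_level_values(level=0)].values]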

acousticbrainz

Unnamed: 0_level_0,Unnamed: 1_level_0,danceability,danceability,gender,gender,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,genre_dortmund,...,moods_mirex,moods_mirex,moods_mirex,moods_mirex,timbre,timbre,tonal_atonal,tonal_atonal,voice_instrumental,voice_instrumental
Unnamed: 0_level_1,Unnamed: 1_level_1,danceable,not_danceable,female,male,alternative,blues,electronic,folkcountry,funksoulrnb,jazz,...,Cluster2,Cluster3,Cluster4,Cluster5,bright,dark,atonal,tonal,instrumental,voice
000009a8-34f1-4c58-a8de-1d99809cd626,0,3.000001e-14,1.000000e+00,0.622127,0.377873,0.048124,0.010853,0.815454,0.032144,0.002521,0.059219,...,0.044404,0.227748,0.030632,0.619994,0.010919,0.989081,0.999987,1.304279e-05,0.999995,0.000005
000009a8-34f1-4c58-a8de-1d99809cd626,1,2.751654e-01,7.248346e-01,0.588847,0.411153,0.276660,0.022330,0.096782,0.438989,0.004316,0.097606,...,0.359569,0.389190,0.099670,0.060685,0.766210,0.233790,0.029674,9.703256e-01,0.049349,0.950651
00000baf-9215-483a-8900-93756eaf1cfc,0,9.999093e-01,9.071009e-05,0.500000,0.500000,0.123210,0.029354,0.678225,0.018370,0.004644,0.027921,...,0.059108,0.078950,0.058210,0.450918,0.382892,0.617108,0.222329,7.776706e-01,0.508506,0.491494
00000baf-9215-483a-8900-93756eaf1cfc,1,9.999142e-01,8.580889e-05,0.500000,0.500000,0.122299,0.029067,0.678300,0.017725,0.004513,0.029749,...,0.059855,0.078234,0.056700,0.455807,0.323493,0.676507,0.265459,7.345413e-01,0.546402,0.453598
00000baf-9215-483a-8900-93756eaf1cfc,2,3.000001e-14,1.000000e+00,0.622127,0.377873,0.001199,0.000155,0.997356,0.000136,0.000028,0.000596,...,0.044404,0.227748,0.030632,0.619994,0.011846,0.988154,0.993315,6.685311e-03,0.997375,0.002625
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
fffe453c-b68b-4e43-9cef-b6767a587415,2,4.945789e-01,5.054211e-01,0.474031,0.525969,0.005022,0.002608,0.970523,0.004708,0.000768,0.012312,...,0.346396,0.130623,0.242688,0.130820,0.256248,0.743752,0.807171,1.928292e-01,0.174782,0.825218
fffe453c-b68b-4e43-9cef-b6767a587415,3,4.946274e-01,5.053725e-01,0.473776,0.526224,0.005021,0.002607,0.970527,0.004708,0.000768,0.012311,...,0.346324,0.130625,0.242698,0.130857,0.256226,0.743774,0.807171,1.928292e-01,0.174712,0.825288
fffe453c-b68b-4e43-9cef-b6767a587415,4,9.999999e-01,1.000000e-07,0.622150,0.377850,0.000541,0.000164,0.997072,0.000250,0.000028,0.001558,...,0.044661,0.220980,0.030723,0.627361,0.002617,0.997383,1.000000,3.400401e-07,0.966488,0.033512
fffe453c-b68b-4e43-9cef-b6767a587415,5,8.023580e-01,1.976421e-01,0.416326,0.583674,0.006733,0.004116,0.956743,0.006299,0.000994,0.019151,...,0.343183,0.130389,0.226018,0.149125,0.256608,0.743392,0.912610,8.739018e-02,0.209344,0.790656
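
The same filtering step can also be written with a group-wise `transform`, which some readers may find easier to follow. This is a sketch on hypothetical toy data with a single made-up column `p`, not the code used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: mbid-b has a single submission and should be dropped.
index = pd.MultiIndex.from_tuples(
    [("mbid-a", 0), ("mbid-a", 1), ("mbid-b", 0), ("mbid-c", 0), ("mbid-c", 1)]
)
toy = pd.DataFrame({"p": [0.1, 0.2, 0.3, 0.4, 0.5]}, index=index)

# Broadcast each group's size back onto its rows, then keep only the rows
# whose recording has at least two submissions.
group_sizes = toy.groupby(level=0)["p"].transform("size")
filtered = toy[group_sizes > 1]
```

Because `transform` returns a result aligned with the original rows, the boolean mask can be applied directly without a lookup on the index values.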


We have many populations (one per MBID) that are relatively small, and a few that are somewhat larger (~30 submissions). None of these samples is large enough on its own to give a good estimate of the classifier variance for a single recording.

However, we can calculate the sample variance for each individual population $i$, with the $n_{i}$ probabilities in that population indexed $j=0,\dots,n_{i}-1$:
$$s_{i}^{2} = \frac{1}{n_{i}-1} \sum_{j=0}^{n_{i}-1}(y_{ij} - \bar{y}_{i})^2$$

We then compute the pooled variance for each classifier by taking the weighted average over all $k$ populations, indexed $i=0,\dots,k-1$:
$$s_{p}^{2} = \frac{\sum_{i=0}^{k-1}(n_{i}-1)s_{i}^{2}}{\sum_{i=0}^{k-1}(n_{i}-1)}$$
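
As a sanity check on the formula, here is a small worked example with two hypothetical populations of sizes 3 and 2. The values and MBIDs are made up for illustration.

```python
import pandas as pd

# Two hypothetical populations (recordings) of different sizes.
index = pd.MultiIndex.from_tuples(
    [("mbid-a", 0), ("mbid-a", 1), ("mbid-a", 2), ("mbid-b", 0), ("mbid-b", 1)]
)
y = pd.Series([0.0, 0.5, 1.0, 0.25, 0.75], index=index)

grouped = y.groupby(level=0)
s2 = grouped.var()   # per-population sample variances s_i^2 (ddof=1)
n = grouped.size()   # per-population sizes n_i

# Weighted average of the variances, with weights n_i - 1.
pooled = ((n - 1) * s2).sum() / (n - 1).sum()
```

Here $s_{0}^{2} = 0.25$ and $s_{1}^{2} = 0.125$, so $s_{p}^{2} = (2 \cdot 0.25 + 1 \cdot 0.125) / 3 \approx 0.208$. This lies closer to the variance of the larger population than the unweighted mean of the two variances (0.1875) does.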

In [115]:
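# Per-recording sample variance (ddof=1) and number of submissions
variances = acousticbrainz.groupby(level=0).var()
samplesizes = acousticbrainz.groupby(level=0).size()

# Pooled variance: weight each variance by (n_i - 1) and divide by the sum of (n_i - 1)
pooledvariance = (variances.mul(samplesizes-1, axis=0).sum()) / (samplesizes.sum() - samplesizes.count())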

print(pooledvariance.sort_values().to_string())

genre_dortmund      funksoulrnb            0.000011
                    pop                    0.000044
genre_tzanetakis    met                    0.000162
                    reg                    0.000233
                    dis                    0.000279
ismir04_rhythm      Rumba-Misc             0.000285
genre_dortmund      raphiphop              0.000288
genre_tzanetakis    cou                    0.000361
genre_rosamerica    spe                    0.000393
genre_tzanetakis    pop                    0.000414
                    blu                    0.000493
genre_electronic    dnb                    0.000512
genre_dortmund      jazz                   0.000747
ismir04_rhythm      Quickstep              0.000812
moods_mirex         Cluster1               0.001002
ismir04_rhythm      Rumba-International    0.001010
genre_tzanetakis    cla                    0.001295
genre_dortmund      blues                  0.001583
                    alternative            0.001803
            