# What Is Human Performance? And What Is Optimal Performance?

To build intuition about how well it is possible to do on this data, it helps to compare against human performance and against the best possible performance on the composer classification problem.

* What accuracy does an average person reach classifying a clip as Bach or not?

* What accuracy does an average person reach classifying among Bach, Brahms, Beethoven, and Schubert?

* What accuracy do expert classical musicians reach classifying Bach or not?

* What accuracy do expert classical musicians reach classifying all four?

* What is the optimal limit, and how much room is there for ML algorithms to improve beyond experts?

Initial hypothesis:

* Average person: ~0.5-0.6 accuracy for Bach or not, ~0.3-0.4 accuracy for picking among the four

    - from my own experience and the EDA, Bach has a distinctive sound, while the others are more similar to each other

    - I would also guess Beethoven should be easy for an average person, but in the EDA the Beethoven data resembles the Brahms and Schubert data

* Expert classical musician: ~0.95 Bach or not, ~0.8-0.9 all four

    - I would expect an expert to recognize not just the composer but often the name of the piece itself

    - but there may be lesser-known works, or the clip may be too short for even an expert to be sure

* Bayes optimal: ~0.95-1.0 Bach or not, ~0.95 all four

    - given how much detailed structure and how many features could be extracted from a clip

    - but there may be similarities between composers, or the clips may be short enough that ~1.0 is not possible
    
## Check: how well can we do on a small sample of streams from the data set?

See Feature_Extraction_Pipeline for function details.

In [1]:
import music21
import music21.features as features
from music21 import midi
from music21 import stream
from helper_functions import *
from numpy.random import randint

import numpy as np

from collections import namedtuple

In [28]:
# build a dictionary mapping each composer to the paths of their *.mid files

Composers = ['Bach','Beethoven','Brahms','Schubert']

midi_files = make_composer_dict(Composers,data_dir = './')

# print the number of files in each composer's folder

for composer in Composers:
    print(f'There are {len(midi_files[composer])} compositions by {composer}')

There are 17 compositions by Bach
There are 132 compositions by Beethoven
There are 20 compositions by Brahms
There are 25 compositions by Schubert
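
The file counts above are far from balanced, which matters when reading any accuracy figure. As a rough sketch, the block below computes two trivial baselines from those counts. It assumes clips would be drawn in proportion to the number of files per composer, which is an assumption and not how the small listening test later in this notebook is sampled. The names `file_counts` and `majority_baseline` are illustrative and not part of `helper_functions`.

```python
# File counts copied from the printed output above.
file_counts = {'Bach': 17, 'Beethoven': 132, 'Brahms': 20, 'Schubert': 25}

def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())

total = sum(file_counts.values())

# Four-way task: always answer the most common composer.
four_way_majority = majority_baseline(file_counts)

# Bach-or-not task: collapse the three other composers into one class.
bach_or_not_majority = majority_baseline(
    {'Bach': file_counts['Bach'], 'Not Bach': total - file_counts['Bach']}
)

# Uniform random guessing among the four composers.
uniform_guess = 1 / len(file_counts)

print(f'four-way majority baseline:    {four_way_majority:.3f}')
print(f'Bach-or-not majority baseline: {bach_or_not_majority:.3f}')
print(f'uniform four-way guessing:     {uniform_guess:.3f}')
```

With the same number of clips per composer instead, the corresponding baselines are 0.25 for the four-way task and 0.75 for Bach or not (always answering "not Bach").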


In [3]:
# random generator used to draw clip start measures (no seed is set, so samples differ between runs)

from numpy.random import Generator, PCG64
rng = Generator(PCG64())

In [7]:
# extract clips of roughly T seconds each from a composer's tracks

# draws n_clips clips from each of the first n_tracks tracks

# namedtuples are a convenient way to store each clip together with its composer info

Clip = namedtuple('Clip',['composer','path','start','stop','seconds','stream'])

def extract_samples(composer,T=30,n_tracks=10,n_clips=3,show_streams=False):
    delta_measures = list(range(5,50,1))

    clips = []
    
    for audio_path in midi_files[composer][:n_tracks]:

        mf = midi.MidiFile()
        mf.open(audio_path) # path='abc.midi'
        mf.read()
        mf.close()
        s = midi.translate.midiFileToStream(mf)

        total_time = s.secondsMap[0]['durationSeconds']

        number_of_measures = len(s[0])  # length of the first element of the stream; assumed (not verified) to be the number of measures

        print(audio_path,'lasts approx ',total_time,' seconds')
        
        #pick n_clips random measures to start clipping from 
        
        #start_measures = randint(1,number_of_measures//2,size=n_clips)
        start_measures = rng.integers(1,number_of_measures//2,size=n_clips)
        
        for i,start in enumerate(start_measures):
            
            for delta in delta_measures:

                stop = start+delta

                excerpt_stream = s.measures(start,stop)

                clip_time = excerpt_stream.secondsMap[0]['durationSeconds']

                if clip_time>T: # keep lengthening the excerpt until it lasts longer than T seconds
                    print('clip: ',i,' --> ','t = ',clip_time,'start = ',start,'stop =',stop,' delta = ',delta)

                    #clip = (composer,audio_path,start,stop,clip_time,excerpt_stream)
                    
                    clip = Clip(composer=composer,
                                path=audio_path,
                                start=start,
                                stop=stop,
                                seconds=clip_time,
                                stream = excerpt_stream)
                    
                    clips.append(clip)
                    
                    if show_streams: excerpt_stream.show('midi')
                        
                    break
                    
    
    return clips
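
The clipping logic in `extract_samples` can be looked at in isolation, without music21. The sketch below is a simplified model, not the notebook's implementation: it assumes each measure has a known duration in seconds and that the window `start..stop` includes both end measures (as `s.measures(start, stop)` is assumed to do). The function name `grow_clip` and the example tracks are hypothetical.

```python
def grow_clip(measure_seconds, start, T=30, deltas=range(5, 50)):
    """Return (stop, seconds) for the first window start..stop
    (1-based, both ends included) that lasts longer than T seconds,
    or None if no window in `deltas` is long enough."""
    for delta in deltas:
        stop = start + delta
        # measures start..stop map to list indices start-1 .. stop-1
        seconds = sum(measure_seconds[start - 1:stop])
        if seconds > T:
            return stop, seconds
    return None

# Hypothetical track: 60 measures of 2 seconds each.
print(grow_clip([2.0] * 60, start=10))   # (25, 32.0)

# Hypothetical slow track: the shortest window already overshoots T.
print(grow_clip([10.0] * 20, start=6))   # (11, 60.0)

# Hypothetical short track: no window exceeds T, so no clip is produced.
print(grow_clip([1.0] * 20, start=1))    # None
```

The second and third cases illustrate two side effects of this approach: clips can run well past the target time when measures are long, and a start measure can yield no clip at all, in which case a composer ends up with fewer clips than `n_tracks * n_clips`.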

In [5]:
target_time = 30 # desired clip duration in seconds

n_clips = 1 # clips drawn per track; tracks last a few minutes, so larger values would give slices with some, but not much, overlap

n_tracks = 3 # tracks per composer: 3 tracks x 1 clip = 3 clips per composer, 12 in total

# keep track of the extracted clip lists in a dictionary

composer_clips = dict()

In [9]:
# build the list of clips for each composer

for composer in Composers:
    composer_clips[composer] = extract_samples(composer,T=target_time,n_tracks=n_tracks,n_clips=n_clips)

./Bach/Cello Suite 3_BWV1009_2222_cs3-6gig.mid lasts approx  178.0581818181818  seconds
clip:  0  -->  t =  30.75 start =  17 stop = 57  delta =  40
./Bach/Violin Sonata in B minor_BWV1014_2284_vhs1_3.mid lasts approx  151.34728996720207  seconds
clip:  0  -->  t =  32.0 start =  10 stop = 25  delta =  15
./Bach/Violin Sonata No 1 in G minor_BWV1001_2243_vs1_3.mid lasts approx  186.63523499970867  seconds
clip:  0  -->  t =  48.478229203229205 start =  6 stop = 11  delta =  5
./Brahms/String Sextet No 2 in G major_OP36_2147_br36m4.mid lasts approx  555.7777490470717  seconds
clip:  0  -->  t =  31.85835686053077 start =  1 stop = 13  delta =  12
./Brahms/Piano Quartet No 1 in G minor_OP25_2151_br25m4.mid lasts approx  480.1360988926574  seconds
clip:  0  -->  t =  30.2 start =  205 stop = 233  delta =  28
./Brahms/Horn Trio in E-flat major_OP40_2158_bra40_1.mid lasts approx  484.7308941926688  seconds
clip:  0  -->  t =  31.0 start =  20 stop = 50  delta =  30
./Beethoven/String Quarte

In [18]:
clip_indices = []

for composer in Composers: 
    
    for i in range(3):
        
        clip_indices.append((composer,i))
        
clip_indices = np.random.permutation(clip_indices) # shuffle the samples; the result holds strings, so each index is cast back to int below
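
Because `np.random.permutation` turns the list of `(composer, i)` tuples into a NumPy array, the integer index has to be cast back with `int(index)` in the next cell. A possible alternative, sketched here under the same assumption of 3 clips per composer, is to shuffle the positions and keep the original tuples. The names `pairs`, `order`, and `shuffled_pairs` are illustrative.

```python
import numpy as np

composers = ['Bach', 'Beethoven', 'Brahms', 'Schubert']

# One (composer, clip index) pair per clip, assuming 3 clips per composer.
pairs = [(composer, i) for composer in composers for i in range(3)]

rng = np.random.default_rng()

# Shuffle the positions 0..len(pairs)-1 instead of the tuples themselves.
order = rng.permutation(len(pairs))

# Index back into the original list so each tuple keeps its (str, int) types.
shuffled_pairs = [pairs[k] for k in order]

for composer, index in shuffled_pairs[:3]:
    print(composer, index, type(index).__name__)
```

With this variant the clip index stays an `int`, so no cast is needed when looking up a clip.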

In [35]:
guess_list = []

for composer,index in clip_indices: 
    
    index = int(index)
    
    clip_stream = composer_clips[composer][index].stream

    clip_stream.show('midi')
    
    print("Listen and take a guess... 0 - Bach, 1 - Beethoven, 2 - Brahms, 3 - Schubert")
    
    guess_list.append(Composers[int(input("Guess ??? :  "))])

Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  0


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  2


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  1


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  1


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  2


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  3


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  2


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  3


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  2


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  1


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  2


Listen and take a Guess... 0 - Bach , 1 Beethoven, 2 - Brahams, 3 - Schubert 
Guess ??? :  3


In [38]:
print('Guess: ',guess_list)
    
answers = [c for c,i in clip_indices]

print('Answers: ',answers)

Guess:  ['Bach', 'Brahms', 'Beethoven', 'Beethoven', 'Brahms', 'Schubert', 'Brahms', 'Schubert', 'Brahms', 'Beethoven', 'Brahms', 'Schubert']
Answers:  ['Bach', 'Brahms', 'Schubert', 'Brahms', 'Beethoven', 'Bach', 'Brahms', 'Beethoven', 'Schubert', 'Schubert', 'Beethoven', 'Bach']


In [44]:
# compare guesses and answers pairwise instead of hard-coding the sample size
number_correct = sum(g == a for g, a in zip(guess_list, answers))

print(f'You got {number_correct} out of {len(answers)} = {number_correct/len(answers)} accuracy')

You got 3 of our 12 = 0.25 accuracy
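
With only 12 clips, the measured accuracy is a very noisy estimate. One common way to quantify this is a Wilson score interval for a proportion; the sketch below applies it to the 3-of-12 result. The function name `wilson_interval` is illustrative, and treating the 12 guesses as independent trials is an assumption.

```python
from math import sqrt

def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion correct/n."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

low, high = wilson_interval(3, 12)
print(f'accuracy 0.25, approximate 95% interval: {low:.2f} to {high:.2f}')
```

The interval runs from roughly 0.09 to 0.53, so this single run cannot distinguish chance-level guessing (0.25) from the initial four-way guess of 0.3-0.4.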


## Lessons


* After trying the experiment, I got 0.25 accuracy (3 of 12) and identified only 1 of the 3 Bach clips, so my intuition that I could recognize Bach was probably misguided.

* Listening helped me see that the task is pretty hard, at least for me, someone who is not a classical music lover or a regular listener.

* I am going to lower my rough guesses by about 0.1; these numbers are just something to keep in mind later when thinking about ideal ML model performance.
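
The recorded guesses can also be broken down per composer and collapsed into the Bach-or-not task. This is a small sketch using the `Guess` and `Answers` lists printed earlier, copied by hand; the variable names are illustrative.

```python
from collections import Counter

# Copied from the printed Guess / Answers output above.
guesses = ['Bach', 'Brahms', 'Beethoven', 'Beethoven', 'Brahms', 'Schubert',
           'Brahms', 'Schubert', 'Brahms', 'Beethoven', 'Brahms', 'Schubert']
answers = ['Bach', 'Brahms', 'Schubert', 'Brahms', 'Beethoven', 'Bach',
           'Brahms', 'Beethoven', 'Schubert', 'Schubert', 'Beethoven', 'Bach']

# Correct guesses per true composer.
hits = Counter(a for g, a in zip(guesses, answers) if g == a)
totals = Counter(answers)
for composer in totals:
    print(f'{composer}: {hits[composer]} of {totals[composer]} correct')

# Bach-or-not: a guess counts as correct when both sides agree on "is Bach".
bach_or_not_correct = sum((g == 'Bach') == (a == 'Bach')
                          for g, a in zip(guesses, answers))

# Baseline for comparison: always answering "not Bach".
always_not_correct = sum(a != 'Bach' for a in answers)

print(f'Bach or not: {bach_or_not_correct} of {len(answers)}')
print(f'always "not Bach": {always_not_correct} of {len(answers)}')
```

On this sample the Bach-or-not score is 10 of 12, only one more than the 9 of 12 that always answering "not Bach" would give, which is consistent with only 1 of the 3 Bach clips being recognized.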







Revised hypothesis:

* Average person: ~0.4 accuracy for Bach or not, ~0.2-0.3 accuracy for picking among the four

* Expert classical musician: ~0.8 Bach or not, ~0.7-0.8 all four

* Bayes optimal: ~0.9-1.0 Bach or not, ~0.9 all four

  