In this notebook we extract features in two ways: we compute features described in Panda et al 2018 directly from the MIDI files, and we convert the features that jSymbolic exports as xml into csv (xml2csv). All features are then saved in a csv file laid out like a 2D numpy array (one row per sample, one column per feature). The feature names are the column titles of that file.
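
The target layout (one row per sample, feature names as column titles) can be sketched with the standard `csv` module. This is only an illustration under assumptions: the rows, the `source_id` values and the in-memory buffer are made up, and it is not necessarily how this notebook writes its file.

```python
import csv
import io

# Hypothetical rows: one dict per sample, the keys are the feature names.
rows = [
    {"sample_id": "song_a", "source_id": "human", "MIDImax": 84, "MIDImin": 40},
    {"sample_id": "song_b", "source_id": "model", "MIDImax": 79, "MIDImin": 36},
]

buffer = io.StringIO()  # stands in for an opened csv file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
writer.writeheader()    # feature names become the column titles
writer.writerows(rows)  # one line per sample
csv_text = buffer.getvalue()
```

Every dict has to contain the same keys as `fieldnames`, otherwise `DictWriter` raises an error or leaves cells empty.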

In [1]:
# for jSymbolic csv conversion:
import xml.etree.ElementTree as Xet
import pandas as pd
import glob
import os

# for Panda et al 2018 feature extraction:
import mido
from mido import MidiFile
import numpy as np

In [2]:
# Getting the used versions of imports:
import sys
import_list = [sys, pd, mido, np] # sys gives us the python version
my_versions = ['3.7.3', '1.3.5','1.2.10','1.21.6']
# When using MusicBERT you get an error for numpy versions >= 1.24!
for ele, my_version in zip(import_list, my_versions):
    try:
        v = ele.__version__
    except AttributeError: # e.g. sys has no __version__ attribute
        try:
            v = ele.version
        except AttributeError:
            v = 'cannot determine version'
    print(ele, ': \nyour version ', v, '\noriginally used version ', my_version, '\n')

<module 'sys' (built-in)> : 
your version  3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0] 
originally used version  3.7.3 

<module 'pandas' from '/home/c/anaconda3/lib/python3.7/site-packages/pandas/__init__.py'> : 
your version  1.3.5 
originally used version  1.3.5 

<module 'mido' from '/home/c/anaconda3/lib/python3.7/site-packages/mido/__init__.py'> : 
your version  1.2.10 
originally used version  1.2.10 

<module 'numpy' from '/home/c/anaconda3/lib/python3.7/site-packages/numpy/__init__.py'> : 
your version  1.21.6 
originally used version  1.21.6 



# Features from Panda et al 2018

In [3]:
def create_feature_names(arr, feature_name:str, one_row:dict):
    """ To store the feature values of Panda et al 2018 in a csv file we fill a dictionary (one_row)
    that maps the name of each feature (feature_name, plus an index for array input) to its value """
    try: # array arr as input
        for i,ele in enumerate(arr):
            globals()[feature_name + str(i)] = ele # use globals() to store the vars in the global stack
            # locals() would only work inside this function
            one_row[feature_name+str(i)] = ele
    except TypeError: # arr is a scalar and cannot be iterated
        one_row[feature_name] = arr
    return one_row


# Extract information

Handle both midi files where each track has its own channel and midi files where all channels are in one track.

Note: a message of type 'note_on' with velocity=0 is a valid alternative to 'note_off' (https://www.midi.org/forum/228-writing-midi-software-send-note-off,-or-zero-velocity-note-on), so both have to be treated as the end of a note!
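
A minimal sketch of this rule, not the code in `basic_info_extraction`: `is_note_end` is a hypothetical helper name, and `Msg` is only a stand-in for a mido message, which exposes the same `type` and `velocity` attributes.

```python
from collections import namedtuple

# Stand-in for a mido message; only the two attributes used here.
Msg = namedtuple("Msg", ["type", "velocity"])

def is_note_end(msg):
    """True for 'note_off' and for 'note_on' with velocity 0."""
    if msg.type == "note_off":
        return True
    return msg.type == "note_on" and msg.velocity == 0

messages = [Msg("note_on", 64), Msg("note_on", 0), Msg("note_off", 0)]
note_ends = [is_note_end(m) for m in messages]
```

Only the second and third message end a note; the first one starts it.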


In [4]:
import basic_info_extraction as bie
# import bie.basic_tools_panda, bie.channel_mask

Your version identical with original version 
-> numpy 
-> 1.21.6


Your version identical with original version 
-> pandas 
-> 1.3.5


Your version identical with original version 
-> mido 
-> 1.2.10


Your version identical with original version 
-> sys 
-> 3.7.3




Panda et al., 2018, lines 389-391: The paper only considers the main track. However, there are samples where several tracks play an important role, so valuable information would get lost. To prevent this I created the arrays below. If you wish, you can use only a part of those arrays to get the main track, which should be the one with the most note entries != 0:<br><br>
**note_collector_all** -> 2D array that collects ALL notes, even when a track/channel has a polyphonic structure. Each row is one track/channel, the columns hold the MIDI note values. <br>
**note_collector** -> Same as note_collector_all BUT assumes a monophonic structure per track/channel, so part of the polyphonic notes within one track/channel is ignored.<br>
**time_passed_ar** -> Refers to note_collector. Gives the start time of each note measured from time 0, so that tracks/channels with different timings can be compared. Rows are the tracks/channels, columns the start times of the notes found in note_collector.<br>
**time_passed_ar_end** -> Same as time_passed_ar, but holds the end time of each note (again measured from time 0) instead of the start time.<br>
**track_names** -> 1D array containing the names of the individual tracks<br>
**SAL** -> Has the same row-column structure as the other arrays. Gives the median velocity/intensity of each note. Assumes a monophonic structure of the single tracks.<br>
**nd** -> Aligned with note_collector. How long each note lasts. <br>
**CD** -> Aligned with note_collector. Do we have a crescendo 'c', a decrescendo 'd' or neither 'n' between notes?<br>
**pause_collector** -> Aligned with note_collector. The pauses between notes. <br>
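
A small sketch of this row/column layout and of picking a main-track candidate. The toy array is made up, and it assumes, as the functions in this notebook do, that empty slots are marked with -1.

```python
import numpy as np

# Hypothetical toy data: 3 tracks/channels, -1 marks an empty slot.
note_collector_toy = np.array([
    [60, 62, 64, -1, -1],
    [48, 50, 52, 53, 55],
    [72, -1, -1, -1, -1],
])

# Number of real notes per track/channel.
notes_per_track = np.sum(note_collector_toy != -1, axis=1)

# Candidate main track: the row with the most notes.
main_track_index = int(np.argmax(notes_per_track))
main_track_notes = note_collector_toy[main_track_index]
main_track_notes = main_track_notes[main_track_notes != -1]
```

The same boolean filter works for the other arrays, because they share the row/column structure.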

Sanity check: note_collector_all finds all notes which are played at the same point in time, as the code below shows. c.mid has exactly 26 notes in the track 'Voice'/Alt which were played at the same moment.

In [5]:
#len(np.where(note_collector_all[4]!=-1)[0]) - len(np.where(note_collector[4]!=-1)[0])

# Finding out features

Basics. For each note $note_i$ of the N notes we have:<br>
the sequence of $f_0$ values over its $L_i$ frames (the frequency varies within one note i) -> $f_{j,i}$ where $j=1,\dots,L_i$ <br>
MIDI note values (for each $f_0$) -> simply use only the $f_0$ values of the current note i -> $midi_{j,i}$<br>
MIDI note value for the entire note -> simply the value of one note -> $MIDI_i$<br>
sequence of pitch saliences -> $sal_{j,i}$<br>
note duration in sec -> $nd_i$<br>
starting time in sec -> $st_i$<br>
ending time in sec -> $et_i$
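
For orientation, the relation between an $f_0$ value and a MIDI note value can be sketched as follows. It assumes equal temperament with A4 = 440 Hz = MIDI note 69 (the usual convention); the function names are hypothetical, and this notebook works on MIDI files directly, so it does not need this conversion itself.

```python
import math

def frequency_to_midi(frequency_hz, a4_hz=440.0):
    """MIDI note number (possibly fractional) for a frequency in Hz."""
    return 69 + 12 * math.log2(frequency_hz / a4_hz)

def midi_to_frequency(midi_note, a4_hz=440.0):
    """Frequency in Hz for a MIDI note number."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)

a4 = frequency_to_midi(440.0)          # the reference pitch itself
c4 = round(frequency_to_midi(261.63))  # middle C, rounded to the nearest note
```

Rounding the fractional value gives the note number of the entire note, while the unrounded values correspond to the per-frame $midi_{j,i}$.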

## 3.4.2 Melodic Features

### 3.4.2.1 MIDI Note Number (MNN) statistics

We need: $MIDI_i$ which gives us the note value.

In [6]:
def six_statistics(collecting_ar,output_1D=True):
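    ''' Returns six statistics (mean, std, skewness, kurtosis, max, min), either for each channel/voice
    or, with output_1D=True, once for all entries of collecting_ar together. Entries of -1 are ignored.'''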

    MIDImean2, MIDIstd2, MIDIskew2, MIDIkurt2, MIDImax2, MIDImin2 = np.array([]), np.array([]), np.array([]), np.array([]), np.array([]), np.array([])
    for line in collecting_ar:
        
        if output_1D:
            line = collecting_ar.flatten() # use all tracks/channels at once
            note_collector_all_filtered = line[np.where(line!=-1)] # treats -1
        else: 
            note_entries_ind = np.where(line!=-1) # treats -1
            note_collector_all_filtered = line[note_entries_ind]
        
        number_note_entries = len(note_collector_all_filtered)

        summed = np.sum(note_collector_all_filtered)
        MIDImean_single = summed/number_note_entries
        MIDImean2 = np.append(MIDImean2,MIDImean_single)

        MIDIstd_single = (np.sum((note_collector_all_filtered - MIDImean_single)**2)/number_note_entries)**0.5
        MIDIstd2 = np.append(MIDIstd2,MIDIstd_single)
       
        MIDIskew_single = np.sum((note_collector_all_filtered - MIDImean_single)**3)/((number_note_entries-1)*MIDIstd_single**3)
        if np.isnan(MIDIskew_single):
            MIDIskew_single = -1 
        MIDIskew2 = np.append(MIDIskew2,MIDIskew_single)

        # kurtosis: N * sum((x - mean)^4) / (sum((x - mean)^2))^2
        MIDIkurt_single = number_note_entries * np.sum((note_collector_all_filtered - MIDImean_single)**4) / np.sum((note_collector_all_filtered - MIDImean_single)**2)**2
        if np.isnan(MIDIkurt_single):
            MIDIkurt_single = -1
        MIDIkurt2 = np.append(MIDIkurt2,MIDIkurt_single)

        try:
            MIDImax_single = np.max(note_collector_all_filtered)
        except ValueError: # we have no note in this track
            MIDImax_single = -1
        MIDImax2 = np.append(MIDImax2,MIDImax_single)

        try:
            MIDImin_single = np.min(note_collector_all_filtered)
        except ValueError: # we have no note in this track
            MIDImin_single = -1
        MIDImin2 = np.append(MIDImin2,MIDImin_single)    
        
        if output_1D == True:
            MIDImean2,MIDIstd2,MIDIskew2,MIDIkurt2 = MIDImean2[0],MIDIstd2[0],MIDIskew2[0],MIDIkurt2[0]
            MIDImax2 = MIDImax2[0]
            MIDImin2 = MIDImin2[0]
            break
        
    return MIDImean2, MIDIstd2, MIDIskew2, MIDIkurt2, MIDImax2, MIDImin2

**-> MIDImean**<br>
Take all the note values of note_collector_all, add them up and divide by the number of notes. Do this for all tracks at once, i.e. for the whole piece.
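
A worked sketch of the whole-piece mean and the other moment statistics on made-up data. It assumes -1 marks an empty slot and uses plain population moments; note that `six_statistics` in this notebook divides the skewness by (N-1) instead of N.

```python
import numpy as np

# Hypothetical toy array: 2 tracks, -1 marks an empty slot.
notes = np.array([[1.0, 2.0, 3.0, -1.0],
                  [4.0, 5.0, -1.0, -1.0]])

values = notes[notes != -1]  # all real notes of the whole piece
n = len(values)
mean = np.sum(values) / n
deviations = values - mean
std = (np.sum(deviations**2) / n) ** 0.5
skew = np.sum(deviations**3) / (n * std**3)
kurt = n * np.sum(deviations**4) / np.sum(deviations**2) ** 2
```

For the values 1 to 5 the distribution is symmetric, so the skewness is zero.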

In [7]:
# note_entries_ind = np.where(note_collector_all!=-1)
# note_collector_all_filtered = note_collector_all[note_entries_ind]
# number_note_entries = len(note_collector_all_filtered)
# summed = np.sum(note_collector_all_filtered)
# MIDImean = summed/number_note_entries
# MIDImean

<b><p style="color:red;">MIDImean already used in jSymbolic "Mean Pitch"</p></b>

For each track get its own mean:

In [8]:
# MIDImean2, MIDIstd2, MIDIskew2, MIDIkurt2, MIDImax2, MIDImin2 = six_statistics()
# MIDImean2[mask_over_voices]


**-> MIDIstd**

In [9]:
# MIDIstd = (np.sum((note_collector_all_filtered - MIDImean)**2)/number_note_entries)**0.5
# MIDIstd

<b><p style="color:red;">MIDIstd already used in jSymbolic "Pitch Variability"</p></b>

In [10]:
# MIDIstd2[mask_over_voices]

**-> MIDIskew**

In [11]:
# MIDIskew = np.sum((note_collector_all_filtered - MIDImean)**3)/((number_note_entries-1)*MIDIstd**3)
# MIDIskew

<b><p style="color:red;">MIDIskew already used in jSymbolic "Pitch Skewness"</p></b>

In [12]:
# MIDIskew2[mask_over_voices]

**-> MIDIkurt**

In [13]:
# MIDIkurt = number_note_entries * np.sum((note_collector_all_filtered - MIDImean)**4) / np.sum((note_collector_all_filtered - MIDImean)**2)**2
# MIDIkurt

<b><p style="color:red;">MIDIkurt already used in jSymbolic "Pitch Kurtosis"</p></b>

In [14]:
# MIDIkurt2[mask_over_voices]

**-> MIDImax**

In [15]:
# MIDImax = np.max(note_collector_all_filtered)
# MIDImax

In [16]:
# MIDImax2[mask_over_voices]

**-> MIDImin**

In [17]:
# MIDImin = np.min(note_collector_all_filtered)
# MIDImin

In [18]:
# MIDImin2[mask_over_voices]

### 3.4.2.2 Note Space Length (NSL), Chroma NSL (CNSL) -> Not in features.md

### 3.4.2.3 Register Distribution

"This class of features indicates how the notes of the predominant melody are distributed across different pitch ranges."<br>
Use https://computermusicresource.com/midikeys.html to find out different pitch range values.
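
As a worked example of such a feature: the share of notes that fall into one register. The melody is made up; the borders 72 and 96 are the soprano borders used further down in this notebook.

```python
import numpy as np

notes = np.array([60, 72, 76, 84, 55, 96])  # hypothetical melody (MIDI note values)
lower_border, upper_border = 72, 96         # soprano range used in this notebook

in_register = (notes >= lower_border) & (notes <= upper_border)
rd_soprano = np.count_nonzero(in_register) / len(notes)
```

Four of the six notes lie inside the range, so the ratio is 4/6.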

In [19]:
def RD(lower_border, upper_border, note_ar):
    '''Treats each track/channel separately. Otherwise tracks/channels with different pitch ranges
    would be mixed into one register distribution, which would not make sense.'''
    RD_arr = np.array([])
    #for track in note_collector: # error seen at 11102022
    for track in note_ar:  
        #summed = np.sum(track[np.where((track>=lower_border) & (track<=upper_border))])
        valid = len(np.where((track>=lower_border) & (track<=upper_border))[0])
        N = len(np.where(track>-1)[0])
        rd = valid/N
        RD_arr = np.append(RD_arr, rd)
    return RD_arr

In [20]:
def RD_1D(lower_border, upper_border, note_ar):
    '''Same as RD, but flattens note_ar first and returns one ratio
    for all tracks/channels together.'''
    
    a = note_ar.flatten()
    notes_cleaned = a[np.where(a!=-1)]
    valid = len(np.where((notes_cleaned>=lower_border) & (notes_cleaned<=upper_border))[0])
    N = len(notes_cleaned)
    return valid/N


**-> RDsoprano**


In [21]:
# RDsoprano = RD(72,96,note_collector_all)
# RDsoprano[mask_over_voices]

**-> RDmezzosoprano**


In [22]:
# RDmezzosoprano = RD(69,93,note_collector_all)
# RDmezzosoprano[mask_over_voices]

**-> RDcontralto**


In [23]:
# RDcontralto = RD(65,88,note_collector_all)
# RDcontralto[mask_over_voices]

**-> RDtenor**


In [24]:
# RDtenor = RD(59,81,note_collector_all)
# RDtenor[mask_over_voices]

**-> RDbaritone**


In [25]:
# RDbaritone = RD(55,77,note_collector_all)
# RDbaritone[mask_over_voices]

**-> RDbass**


In [26]:
#RDbass = RD(52,76,note_collector_all)
# RDbass[mask_over_voices]

### 3.4.2.4 Register Distribution per Second -> Not in features.md

### 3.4.2.5 Ratios of Pitch Transitions 

Is a note followed by a higher, a lower or the same note? This describes the melody contour/movement: is the melody smooth/conjunct or disjunct?<br>
Here it makes sense to look at note_collector, because it holds a monophonic note sequence per track.
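
The idea can be sketched on a single made-up monophonic track. This sketch divides by the number of transitions; the functions in this notebook use their own normalisation and handle the -1 placeholders in addition.

```python
import numpy as np

melody = np.array([60, 62, 62, 59, 64])  # hypothetical monophonic track
steps = np.diff(melody)                  # next note minus current note

n_transitions = len(steps)
higher_ratio = np.count_nonzero(steps > 0) / n_transitions
lower_ratio = np.count_nonzero(steps < 0) / n_transitions
equal_ratio = np.count_nonzero(steps == 0) / n_transitions
```

The three ratios add up to 1, because every transition goes up, down or stays.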

In [27]:
def ratios_of_transition_NS(ar, nd_usage):
    note_collector_mod = np.array(ar, dtype=float) # float copy, because nan cannot be stored in an integer array
    note_collector_mod[np.where(note_collector_mod==-1)] = float('nan')
    if nd_usage == False:
        spec_diff = (note_collector_mod[:,:-1] - note_collector_mod[:,1:])
    else:
        spec_diff = (note_collector_mod[:,:-1] / note_collector_mod[:,1:]) - 1
        
    
    THPNR, TLPNR, TEPNR = np.array([]),np.array([]),np.array([])

    NSmean, NSstd, NSskew, NSkurt, NSmax, NSmin =  np.array([]),np.array([]),np.array([]),np.array([]),np.array([]),np.array([])

    for line in spec_diff:
        if nd_usage==False:
            equal_ind = np.where(line==0)[0]
            lower_ind = np.where(line>0)[0]
            higher_ind = np.where(line<0)[0]
        else:
            equal_ind = np.where((line<0.1)&(line>-0.1))[0]
            lower_ind = np.where(line<-0.1)[0]
            higher_ind = np.where(line>0.1)[0]
            
        line_mod = line[~np.isnan(line)] # treats nan
        N = len(line_mod) # remove nan
        THPNR_single, TLPNR_single, TEPNR_single = len(higher_ind)/(N-1),len(lower_ind)/(N-1),len(equal_ind)/(N-1)
        THPNR = np.append(THPNR, THPNR_single)
        TLPNR = np.append(TLPNR, TLPNR_single)
        TEPNR = np.append(TEPNR, TEPNR_single)

        line_abs = abs(line_mod)
        NSmean_single = sum(line_abs)/(N-1)
        NSmean = np.append(NSmean, NSmean_single)

        NSstd_single = (np.sum((line_abs - NSmean_single)**2)/N)**0.5
        NSstd = np.append(NSstd, NSstd_single)

        NSskew_single = np.sum((line_abs - NSmean_single)**3)/((N-1)*NSstd_single**3)
        NSskew = np.append(NSskew, NSskew_single)

        # kurtosis: N * sum((x - mean)^4) / (sum((x - mean)^2))^2
        NSkurt_single = N * np.sum((line_abs - NSmean_single)**4) / np.sum((line_abs - NSmean_single)**2)**2
        NSkurt = np.append(NSkurt, NSkurt_single)

        try:
            NSmax_single = np.max(line_abs)
        except ValueError: # We don't have a single note in this track
            NSmax_single = -1
        NSmax = np.append(NSmax, NSmax_single)

        try:
            NSmin_single = np.min(line_abs)
        except ValueError: # We don't have a single note in this track
            NSmin_single = -1
        NSmin = np.append(NSmin, NSmin_single)

    return NSmean, NSstd, NSskew, NSkurt, NSmax, NSmin, THPNR, TLPNR, TEPNR

In [28]:
def ratios_of_transition_NS_1D(ar, nd_usage=False):
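    '''Same as ratios_of_transition_NS, but returns one value per feature for all tracks/channels together.
    In this notebook the ratios (THPNR, ...) are taken from the call with nd_usage=True
    and the NS statistics (NSmean, ...) from the call with nd_usage=False.'''
    note_collector_mod = np.array(ar, dtype=float) # float copy, because nan cannot be stored in an integer array
    note_collector_mod[np.where(note_collector_mod==-1)] = float('nan')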
    if nd_usage == False:
        spec_diff = (note_collector_mod[:,:-1] - note_collector_mod[:,1:])
    else:
        spec_diff = (note_collector_mod[:,:-1] / note_collector_mod[:,1:]) - 1
        
    
    THPNR, TLPNR, TEPNR = 0,0,0 # counters, summed over all tracks/channels
    total_note_number_N = -1
    all_values_abs = []
    NS_part = 0
    for line in spec_diff:
        if nd_usage==False:
            equal_ind = np.where(line==0)[0]
            lower_ind = np.where(line>0)[0]
            higher_ind = np.where(line<0)[0]
        else:
            equal_ind = np.where((line<0.1)&(line>-0.1))[0]
            lower_ind = np.where(line<-0.1)[0]
            higher_ind = np.where(line>0.1)[0]
            
        line_mod = line[~np.isnan(line)] # treats nan
        N = len(line_mod) # remove nan
        total_note_number_N += N 

        THPNR += len(higher_ind)
        TLPNR += len(lower_ind)
        TEPNR += len(equal_ind)

        line_abs = abs(line_mod)
        NS_part += sum(line_abs)
        all_values_abs.extend(line_abs)

    NSmean = NS_part/total_note_number_N
    NSstd = (np.sum((np.asarray(all_values_abs)- NSmean)**2)/total_note_number_N)**0.5
    NSskew = np.sum((np.asarray(all_values_abs) - NSmean)**3)/((total_note_number_N)*NSstd**3)
    # kurtosis: N * sum((x - mean)^4) / (sum((x - mean)^2))^2
    NSkurt = total_note_number_N * np.sum((np.asarray(all_values_abs) - NSmean)**4) / np.sum((np.asarray(all_values_abs) - NSmean)**2)**2
    
    NSmin = min(all_values_abs)
    NSmax = max(all_values_abs)
    
    
    return NSmean, NSstd, NSskew, NSkurt, NSmax, NSmin, THPNR/total_note_number_N, TLPNR/total_note_number_N, TEPNR/total_note_number_N

In [29]:
# NSmean, NSstd, NSskew, NSkurt, NSmax, NSmin, THPNR, TLPNR, TEPNR = ratios_of_transition_NS()

In [30]:
# nd

**-> Transitions to Higher Pitch Notes Ratio (THPNR)**<br>
Each track/channel gets its own entry.

In [31]:
# THPNR[mask_over_voices]

**-> Transitions to Lower Pitch Notes Ratio (TLPNR)**

In [32]:
# TLPNR[mask_over_voices]

**-> Transitions to Equal Pitch Notes Ratio (TEPNR)**

In [33]:
# TEPNR[mask_over_voices]

### 3.4.2.6 Note Smoothness (NS) statistics

Smoothness describes how close consecutive notes are in pitch. We take note_collector again, because we look at a note sequence and not at polyphonic events.<br>
Compute the 6 statistics.
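
To illustrate what the smoothness values express, here is a sketch that compares the mean absolute interval of two made-up melodies. `mean_abs_interval` is a hypothetical helper and uses a plain mean, not the normalisation of the functions in this notebook.

```python
import numpy as np

smooth = np.array([60, 62, 64, 65, 67])   # stepwise melody
leaping = np.array([60, 72, 55, 79, 48])  # melody with wide leaps

def mean_abs_interval(melody):
    """Mean absolute distance (in semitones) between consecutive notes."""
    return float(np.mean(np.abs(np.diff(melody))))

ns_smooth = mean_abs_interval(smooth)
ns_leaping = mean_abs_interval(leaping)
```

The smaller the value, the more conjunct the melody.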

**-> NSmean**

In [34]:
# NSmean[mask_over_voices]

**-> NSstd**

In [35]:
# NSstd[mask_over_voices]

**-> NSskew**

In [36]:
# NSskew[mask_over_voices]

**-> NSkurt**

In [37]:
# NSkurt[mask_over_voices]

**-> NSmax**

In [38]:
# NSmax[mask_over_voices]

**-> NSmin**

In [39]:
# NSmin[mask_over_voices]

## 3.4.3 Dynamics Features

### 3.4.3.0 Getting started

### 3.4.3.1 Note Intensity (NI) statistics

Computing the 6 statistics based on median pitch salience:

In [40]:
# SALmean2, SALstd2, SALskew2, SALkurt2, SALmax2, SALmin2 = six_statistics(SAL)
# SALmean2[mask_over_voices]

### 3.4.3.2 Note Intensity Distribution

In [41]:
def note_intensity_distribution(note_ar, mean, std,output_1D=False):
    '''Splits the entries of each track/channel into low, medium and high values relative to
    mean -/+ 0.5*std and returns the ratio of each group (LINR, MINR, HINR).
    With output_1D=True all tracks/channels are treated together.'''
    
    LINR, MINR, HINR = np.array([]),np.array([]),np.array([])
    for ind, track in enumerate(note_ar):
        
        if output_1D==True:
            track = note_ar
            mean2 = mean
            std2 = std
        else:
            mean2 = mean[ind]
            std2 = std[ind]
        
        low_ind = len(np.where((track<=mean2-0.5*std2) & (track>-1) ) [0])
        medium_ind = len(np.where((track>mean2-0.5*std2)&(track<mean2+0.5*std2) & (track>-1) )[0])
        high_ind = len(np.where((track>=mean2+0.5*std2) & (track>-1) )[0])
        
        N = len(np.where(track>-1)[0]) # number of valid entries; np.where returns a tuple, so index it first
        
        LINR_single = low_ind/N
        LINR = np.append(LINR, LINR_single)
        
        MINR_single = medium_ind/N
        MINR = np.append(MINR, MINR_single)
        
        HINR_single = high_ind/N
        HINR = np.append(HINR, HINR_single)
        
        if output_1D==True:
            LINR,MINR,HINR = LINR[0],MINR[0],HINR[0]
            break
        
        
    return LINR, MINR, HINR
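
The split into low, medium and high intensity can be sketched on made-up velocities, using the same borders of mean minus/plus half a standard deviation as the function above. This is a simplified illustration for one track without -1 placeholders.

```python
import numpy as np

velocities = np.array([40, 50, 60, 70, 80, 60, 60])  # hypothetical note velocities
mean, std = velocities.mean(), velocities.std()

low = velocities <= mean - 0.5 * std
high = velocities >= mean + 0.5 * std
medium = ~low & ~high

n = len(velocities)
low_ratio = low.sum() / n
medium_ratio = medium.sum() / n
high_ratio = high.sum() / n
```

Each note falls into exactly one group, so the three ratios add up to 1.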

In [42]:
# LINR, MINR, HINR = note_intensity_distribution()


### 3.4.3.3 Note Intensity Distribution per Second -> not in features.md

### 3.4.3.4 Ratios of Note Intensity Transitions

In [43]:
# a,b,c,d,e,f, THINR, TLINR, TELNR =ratios_of_transition_NS(SAL)
# TELNR[mask_over_voices]

### 3.4.3.5 Crescendo, Decrescendo (CD) statistics

In [44]:
# D = np.zeros((CD.shape))
# D[np.where(CD=='d')] = 1
# D

In [45]:
# C = np.zeros((CD.shape))
# C[np.where(CD=='c')] = 1
# C

In [46]:
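def CD_seq_finder(ar):
    '''ar: 2D array of 0/1 flags per track/channel (1 = part of a crescendo or decrescendo sequence).
    Returns the length of each sequence of consecutive 1s (len_arr, padded with -1)
    and the number of such sequences per track/channel (number_arr).'''
    s2 = np.zeros((ar.shape[0],1)) # column of zeros, so that sequences at the borders are found too
    D = np.append(ar, s2,axis=1)
    D = np.append(s2,D,axis=1)
    D_spec_diff = D[:,:-1]-D[:,1:]

    number_arr = np.array([])
    len_arr = np.zeros((D.shape))-1

    for ind, track in enumerate(D_spec_diff):
        end_seq = np.where(track==1)[0] # 1 -> 0: a sequence ends
        start_seq = np.where(track==-1)[0] # 0 -> 1: a sequence starts

        number = len(start_seq)
        number_arr = np.append(number_arr, number)
        
        len_seq = end_seq - start_seq + 1 # error corrected at 11102022
        len_arr[ind, :number] = len_seq
    
    return len_arr, number_arr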
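
The underlying technique, finding runs of consecutive 1s via differences of a zero-padded array, can be sketched for one made-up track. Unlike `CD_seq_finder`, this sketch does not add 1 to each length; it counts the flagged entries themselves.

```python
import numpy as np

flags = np.array([0, 1, 1, 0, 1, 1, 1])     # hypothetical: 1 = crescendo between notes

padded = np.concatenate(([0], flags, [0]))  # so runs touching the borders are closed
change = np.diff(padded)                    # +1 at a run start, -1 right after a run end
starts = np.where(change == 1)[0]
ends = np.where(change == -1)[0]
run_lengths = ends - starts
```

Here two runs are found, one of length 2 and one of length 3.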

In [47]:
# len_seq_C, number_seq_C = CD_seq_finder(ar=C)
# print(number_seq_C[mask_over_voices])
# Cmean2, Cstd2, Cskew2, Ckurt2, Cmax2, Cmin2 = six_statistics(len_seq_C)

# len_seq_D, number_seq_D = CD_seq_finder(ar=D)
# Dmean2, Dstd2, Dskew2, Dkurt2, Dmax2, Dmin2 = six_statistics(len_seq_D)
# print(number_seq_D[mask_over_voices])


## 3.4.4 Rhythmic Features

### 3.4.4.1 Note Duration (ND) statistics

In [48]:
# ndmean2, ndstd2, ndskew2, ndkurt2, ndmax2, ndmin2  = six_statistics(nd)

### 3.4.4.2 Note Duration Distribution

In [49]:
# SNR, MLNR, LNR = note_intensity_distribution(note_ar=nd, mean=ndmean2, std=ndstd2)

### 3.4.4.3 Note Duration Distribution per Second -> Not in features.md

### 3.4.4.4 Ratios of Note Duration transitions

In [50]:
# a,b,c,d,e,f, TLNR,TSNR,TELNR = ratios_of_transition_NS(nd, True)

## 3.4.5 Musical Texture Features -> Based on F0 estimates not MIDI

### 3.4.5.1  Musical Layers (ML) statistics

### 3.4.5.2 Musical Layers Distribution (MLD)

### 3.4.5.3  Ratio of Musical Layers Transitions (RMLT)

## 3.4.6 Expressivity Features

### 3.4.6.1 Articulation Features

Staccato (short, detached notes), Legato (smoothly connected notes). MuseScore, the program used here for MIDI visualization and playback, applies articulation by its own rules, so it is not written down in the MIDI messages (https://musescore.org/en/node/323749). One possibility to get legato: the midi message control_change with control=68, where value<=63 turns legato off and any larger value turns it on (https://anotherproducer.com/online-tools-for-musicians/midi-cc-list/). However, MuseScore connects notes without this control change. 
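
A sketch of how such a legato switch could be read from a message. `legato_state` is a hypothetical helper, and `ControlChange` is only a stand-in that mirrors the `type`, `control` and `value` attributes of a mido control_change message.

```python
from collections import namedtuple

# Stand-in for a mido 'control_change' message (same attribute names).
ControlChange = namedtuple("ControlChange", ["type", "control", "value"])

LEGATO_CONTROLLER = 68

def legato_state(msg):
    """True/False for a legato switch message, None for any other message."""
    if msg.type != "control_change" or msg.control != LEGATO_CONTROLLER:
        return None
    return msg.value >= 64

states = [legato_state(ControlChange("control_change", 68, v)) for v in (0, 63, 64, 127)]
other = legato_state(ControlChange("control_change", 7, 100))
```

Since MuseScore does not write this control change, such a check would find nothing in files exported from it.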

In [51]:
# Algorithm 1:
def articulation_detection(time_passed_ar, pause_collector, nd, output_1D=False):

    time_passed_ar_mod = np.copy(time_passed_ar)
    time_passed_ar_mod[np.where(time_passed_ar_mod==-1)] = float('nan')
    # IOI (inter-onset interval): nd + pause until the next note starts
    IOI = (time_passed_ar_mod[:,1:]-time_passed_ar_mod[:,:-1])
    IOI[np.where(IOI==0)] = 0.0000000000000000000000001 # so that dividing by IOI gives a ratio of 0 instead of nan
    INS = np.copy(pause_collector[:,1:]) # the pause at the start is not interesting here, only the pauses BETWEEN notes
    INS[np.where(INS==-1)] = float('nan')
    ratio = INS/IOI

    # --> We ignore the nd of the last note, because we always look at the relation between two notes,
    # so the ratio array becomes smaller than the nd array

    art = (np.zeros((ratio.shape))+float('nan'))

    art[np.where(INS <= 0.48)] = 1 # Legato
    art[np.where((ratio>=0.25)&(ratio<=0.75)&(nd[:,:-1]<=24))] = 2  # Staccato # this staccato matches the first
    # staccato appearing in MuseScore
    art[np.where((art!=1)&(art!=2)&(~np.isnan(ratio)))] = 0 # Rest # error removed 12102022
    
    if output_1D: # Make art 1D
    
        art = art.flatten()
        art = art[np.where(~np.isnan(art))]

    return art
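
The ratio used above relates the silence after a note to the inter-onset interval. A sketch with made-up onsets and durations (same time unit); the thresholds that turn the ratio into legato or staccato are left out here, because they depend on the time unit of the data.

```python
import numpy as np

# Hypothetical onsets and durations of four notes.
onsets = np.array([0.0, 1.0, 2.0, 3.0])
durations = np.array([1.0, 0.5, 0.25, 1.0])

ioi = np.diff(onsets)            # inter-onset intervals
silence = ioi - durations[:-1]   # gap between the end of a note and the next onset
ratio = silence / ioi
```

A ratio of 0 means the note is connected to the next one, a ratio close to 1 means the note is very short compared to the time until the next note.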

#### 3.4.6.1.1 Ratios

In [52]:
def art_ratios(ar, nd):
    SR, LR, OTR = np.array([]), np.array([]), np.array([])
    SNDR, LNDR, OTNDR =  np.array([]), np.array([]), np.array([])
    nd_S = np.zeros((nd.shape)) -1 #+ float('nan')
    nd_L, nd_OT = np.copy(nd_S), np.copy(nd_S)
    track_ind = 0
    
    for line, nd_line in zip(ar, nd):
        
        nd_line_mod = nd_line[np.where(nd_line!=-1)]
        line_mod = line[~np.isnan(line)]
        
        ind_legato = np.where(line_mod==1)
        number_legato = len(ind_legato[0])
        LNDR_single = np.sum(nd_line_mod[ind_legato])
        nd_L[track_ind, ind_legato[0]] = nd_line_mod[ind_legato]

        ind_staccato = np.where(line_mod==2)
        number_staccato = len(ind_staccato[0])
        SNDR_single = np.sum(nd_line_mod[ind_staccato])
        nd_S[track_ind, ind_staccato[0]] = nd_line_mod[ind_staccato]
        
        ind_rest = np.where(line_mod==0)
        number_rest = len(ind_rest[0])
        OTNDR_single = np.sum(nd_line_mod[ind_rest])
        nd_OT[track_ind, ind_rest[0]] = nd_line_mod[ind_rest]
        
        all_notes = len(line_mod)
        whole_duration = np.sum(nd_line_mod)
        
        if all_notes == 0:
            all_notes=1 # because below we don't want to divide by 0
            whole_duration = 1
            
        SR = np.append(SR, number_staccato/all_notes)
        LR = np.append(LR, number_legato/all_notes)
        OTR = np.append(OTR, number_rest/all_notes)
        
        SNDR = np.append(SNDR, SNDR_single/whole_duration)
        LNDR = np.append(LNDR, LNDR_single/whole_duration)
        OTNDR = np.append(OTNDR, OTNDR_single/whole_duration)        
        
        track_ind += 1 
        
    return SR, LR, OTR, SNDR, LNDR, OTNDR, nd_S, nd_L, nd_OT

**-> Staccato Ratio (SR)**

In [53]:
# SR, LR, OTR, SNDR, LNDR, OTNDR, nd_S, nd_L, nd_OT = art_ratios()

**-> Legato Ratio (LR)**

**-> Other Transitions Ratio (OTR)**

#### 3.4.6.1.2 Note ratios statistics

**-> Staccato Notes Duration Ratio (SNDR) statistics**

In [54]:
# nd_Smean2, nd_Sstd2, nd_Sskew2, nd_Skurt2, nd_Smax2, nd_Smin2 = six_statistics(nd_S)

**-> Legato Notes Duration Ratio (LNDR) statistics**

In [55]:
# nd_Lmean2, nd_Lstd2, nd_Lskew2, nd_Lkurt2, nd_Lmax2, nd_Lmin2 = six_statistics(nd_L)

**-> Other Transition Notes Duration Ratio (OTNDR) statistics**

In [56]:
# nd_OTmean2, nd_OTstd2, nd_OTskew2, nd_OTkurt2, nd_OTmax2, nd_OTmin2 = six_statistics(nd_OT)

### 3.4.6.2 Glissando Features -> f0 estimates, MIDI files not directly usable

About using midi: An exact notation of a glissando by notes alone is not possible, because an ideal glissando is a continuous change of the pitch frequency, whereas notes denote discrete pitches (https://de.wikipedia.org/wiki/Glissando). 

Imitate glissando with help of pitch wheel: https://music.tutsplus.com/tutorials/imitate-guitar-techniques-with-midi-part-3-glissando-1-octave--audio-7299
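
How pitch-wheel values map to pitch can be sketched as follows. This rests on assumptions: bend values centred on 0 in the 14-bit range -8192..8191, and a bend range of 2 semitones, which is a common default but depends on the synthesizer. Both function names are hypothetical.

```python
def pitch_bend_to_semitones(bend_value, bend_range_semitones=2.0):
    """Map a pitch-bend value (-8192..8191) to a pitch offset in semitones."""
    return bend_value / 8192 * bend_range_semitones

def glissando_bends(n_steps, target_semitones, bend_range_semitones=2.0):
    """Evenly spaced bend values that slide from 0 up to target_semitones."""
    top = target_semitones / bend_range_semitones * 8192
    return [min(8191, round(top * step / (n_steps - 1))) for step in range(n_steps)]

bends = glissando_bends(5, 2.0)
half_way = pitch_bend_to_semitones(4096)
```

More steps give a smoother slide, but the result stays a staircase of discrete bend values rather than a truly continuous glissando.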

#### 3.4.6.2.1 Glissando Presence (GP)

#### 3.4.6.2.2 Glissando Extent (GE) statistics

#### 3.4.6.2.3 Glissando Duration (GD) and Glissando Slope (GS)

#### 3.4.6.2.4 Glissando Coverage (GC)

#### 3.4.6.2.5 Glissando Direction (GDIR)

#### 3.4.6.2.6 Glissando to Non-Glissando Ratio (GNGR)

### 3.4.6.3 Vibrato and Tremolo Features -> MIDI not fitting

Reason MIDI not suited for that task: "Unfortunately, the general MIDI implementation of modulation has one parameter which corresponds somewhat to width of the vibrato.  It is up to the synthesizer playing the file to create the vibrato.  Unfortunately, the modulation MIDI event is limited in type of vibrato it can create.  It has a limited range of control." (http://www-classes.usc.edu/engr/ise/599muscog/2004/projects/yang/)

Tremolo: like vibrato, but it refers to the change in amplitude and **uses the pitch saliences of each note** instead of the f0 variation. However, we only have two velocity/intensity/salience values per note, and in c.mid they always change in the same way.

#### 3.4.6.3.1 Vibrato Presence (VP)

#### 3.4.6.3.2 Vibrato Rate (VR) statistics

#### 3.4.6.3.3 Vibrato Coverage (VC)

#### 3.4.6.3.4 High-Frequency Vibrato Coverage (HFVC) 

#### 3.4.6.3.5 Vibrato to Non-Vibrato Ratio (VNVR)

#### 3.4.6.3.6 Vibrato Notes Base Frequency (VNBF) statistics

## 3.4.7 Voice Analysis Toolbox (VAT) Features

### 3.4.7.1 All features of 3.4.... using only the singing voice instead of the instruments --> Makes no sense for midi without singing voice

<b><p style="color:red;">Repeat the steps with another audio file that contains only the singing voice, i.e. separate the voice from the mixed file. Panda et al. 2018 used the approach of Fan et al.; I downloaded a different one, which worked well on samples.</p></b>

### 3.4.7.2 Voice quality toolkit analysis directly from audio file -> not MIDI related

# xml2csv conversion

In [57]:
def getting_feature_values_one_sample_xml(ele, one_row,first_file_loop,cols):
    for feature in ele.findall('feature'): # go through each feature per sample

        # values may be written with a decimal comma, so convert it to a point before parsing
        value_one_feat = [float(v.text.replace(",",".")) for v in feature.findall("v")]

        if len(value_one_feat) == 1:
            value_one_feat = value_one_feat[0]
        elif len(value_one_feat) == 0:
            value_one_feat = -1 # feature without a value

        name = feature.find("name").text

        if first_file_loop:
            cols.append(name)

        one_row[name] = value_one_feat
    return one_row, cols
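
The conversion can be tried on a small inline document. The xml below is a made-up fragment that only mirrors the element names used in this notebook (`feature_vector_file`, `data_set`, `data_set_id`, `feature`, `name`, `v`); the path and the feature 'Toy Histogram' are hypothetical.

```python
import xml.etree.ElementTree as ET

xml_text = """
<feature_vector_file>
  <data_set>
    <data_set_id>/some/dir/song_a.mid</data_set_id>
    <feature><name>Mean Pitch</name><v>64,5</v></feature>
    <feature><name>Toy Histogram</name><v>0.25</v><v>0.75</v></feature>
  </data_set>
</feature_vector_file>
"""

root = ET.fromstring(xml_text.strip())
rows = []
for data_set in root.findall("data_set"):  # one data_set per sample
    path = data_set.find("data_set_id").text
    row = {"sample_id": path.split("/")[-1].split(".")[0]}
    for feature in data_set.findall("feature"):
        # decimal commas are converted to points before parsing
        values = [float(v.text.replace(",", ".")) for v in feature.findall("v")]
        row[feature.find("name").text] = values[0] if len(values) == 1 else values
    rows.append(row)
```

Each dict in `rows` then corresponds to one line of the csv file; features with several values stay lists in this sketch.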

# Putting feature extraction from Panda et al 2018 and xml2csv conversion together

In [58]:
def feature_extraction(aim_storage_path_file,xml_path_file,midi_path, source_id): 
    '''aim_storage_path_file, xml_path_file: path + file name;
    midi_path: only the path of the stored midi files;
    source_id: kind of label telling us how the corresponding midi file was generated'''


    # create the target directory (first part of the path) if it does not exist yet
    aimed_directory = aim_storage_path_file.split('/')[0]
    try:
        os.mkdir(aimed_directory)
    except OSError: # directory already exists or cannot be created
        pass

    cols = ["sample_id","source_id"]
    rows = []
    first_file_loop = True

    xmlparse = Xet.parse(xml_path_file) 
    root = xmlparse.getroot()

    if 'feature_vector_file' in str(root): # sort of file saving features
        base = root.findall("data_set") # sample list

        for ind_base, ele in enumerate(base): # go through each sample
            file_path = base[ind_base].find("data_set_id").text
            file_name = file_path.split('/')[-1].split('.')[0]
            print('current file:', file_name)

            ###################################################################
            
            # xml2csv conversion of jSymbolic features:

            one_row,cols = getting_feature_values_one_sample_xml(ele, {"sample_id": file_name, "source_id": source_id},first_file_loop,cols)
            # fill this row with all features of our current sample 

            print(f'done: conversion of jSymbolic xml feature file into csv for {file_name}')
            
            ###################################################################
            
            # Extracting the Panda et al features and saving them in the final csv dictionary
            # together with the jSymbolic features:

            # basic tools for feature extraction:

            pause_collector, nd, note_collector, note_collector_all,SAL,CD,time_passed_ar, nd_short, _ = bie.basic_tools_panda(midi_path, file_name)
            mask = bie.channel_mask(note_collector) # mask to get only relevant channels/voices


            # feature: Statistics which are not covered by jSymbolic and refer to ALL notes of the whole sample:

            note_entries_ind = np.where(note_collector_all!=-1)
            note_collector_all_filtered = note_collector_all[note_entries_ind]
            MIDImax = np.max(note_collector_all_filtered)
            MIDImin = np.min(note_collector_all_filtered)

            one_row['MIDImax'] = MIDImax
            one_row['MIDImin'] = MIDImin

            # feature: Register Distribution:

            RDsoprano = RD_1D(72,96,note_collector_all)
            RDmezzosoprano = RD_1D(69,93,note_collector_all)
            RDcontralto = RD_1D(65,88,note_collector_all)
            RDtenor = RD_1D(59,81,note_collector_all)
            RDbaritone = RD_1D(55,77,note_collector_all)
            RDbass = RD_1D(52,76,note_collector_all)

            one_row = create_feature_names(RDsoprano, 'RDsoprano', one_row)
            one_row = create_feature_names(RDmezzosoprano, 'RDmezzosoprano', one_row)
            one_row = create_feature_names(RDcontralto, 'RDcontralto', one_row)
            one_row = create_feature_names(RDtenor, 'RDtenor', one_row)
            one_row = create_feature_names(RDbaritone, 'RDbaritone', one_row)
            one_row = create_feature_names(RDbass, 'RDbass', one_row)    


            # feature: Ratio of Pitch Transitions +  Note smoothness statistics:

            a,b,c,d,e,f, THPNR, TLPNR, TEPNR = ratios_of_transition_NS_1D(note_collector,True)

            one_row = create_feature_names(THPNR, 'THPNR', one_row)
            one_row = create_feature_names(TLPNR, 'TLPNR', one_row)
            one_row = create_feature_names(TEPNR, 'TEPNR', one_row)

            NSmean, NSstd, NSskew, NSkurt, NSmax, NSmin, a,b,c = ratios_of_transition_NS_1D(note_collector,False)

            one_row = create_feature_names(NSmean, 'NSmean', one_row)
            one_row = create_feature_names(NSstd, 'NSstd', one_row)
            one_row = create_feature_names(NSskew, 'NSskew', one_row)
            one_row = create_feature_names(NSkurt, 'NSkurt', one_row)
            one_row = create_feature_names(NSmax, 'NSmax', one_row)
            one_row = create_feature_names(NSmin, 'NSmin', one_row) 


            # feature: Note intensity statistics:

            SALmean2, SALstd2, SALskew2, SALkurt2, SALmax2, SALmin2 = six_statistics(SAL,True)

            one_row = create_feature_names(SALmean2, 'SALmean', one_row)
            one_row = create_feature_names(SALstd2, 'SALstd', one_row)
            one_row = create_feature_names(SALskew2, 'SALskew', one_row)
            one_row = create_feature_names(SALkurt2, 'SALkurt', one_row)
            one_row = create_feature_names(SALmax2, 'SALmax', one_row)
            one_row = create_feature_names(SALmin2, 'SALmin', one_row) 


            # feature: Note intensity distribution:

            LINR, MINR, HINR = note_intensity_distribution(SAL, SALmean2, SALstd2,True)

            one_row = create_feature_names(LINR, 'LINR', one_row)
            one_row = create_feature_names(MINR, 'MINR', one_row)
            one_row = create_feature_names(HINR, 'HINR', one_row)


            # feature: Ratios of Note Intensity Transitions:

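            # stored as 'TEINR' (transitions to equal intensity) so that the key does not
            # collide with the note duration feature 'TELNR' further down
            a,b,c,d,e,f, THINR, TLINR, TEINR = ratios_of_transition_NS_1D(SAL, True)

            one_row = create_feature_names(THINR, 'THINR', one_row)
            one_row = create_feature_names(TLINR, 'TLINR', one_row)
            one_row = create_feature_names(TEINR, 'TEINR', one_row)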


            # feature: Crescendo and Decrescendo + their statistics:


            C = np.zeros((CD.shape))
            C[np.where(CD=='c')] = 1
            len_seq_C, number_seq_C = CD_seq_finder(ar=C) 
            number_seq_C_1D = np.sum(number_seq_C)
            Cmean2, Cstd2, Cskew2, Ckurt2, Cmax2, Cmin2 = six_statistics(len_seq_C,True)

            D = np.zeros((CD.shape))
            D[np.where(CD=='d')] = 1
            len_seq_D, number_seq_D = CD_seq_finder(ar=D)
            number_seq_D_1D = np.sum(number_seq_D)
            Dmean2, Dstd2, Dskew2, Dkurt2, Dmax2, Dmin2 = six_statistics(len_seq_D,True)

            one_row = create_feature_names(number_seq_C_1D, 'number_cresendo_notes', one_row)
            one_row = create_feature_names(Cmean2, 'Cmean', one_row)
            one_row = create_feature_names(Cstd2, 'Cstd', one_row)
            one_row = create_feature_names(Cskew2, 'Cskew', one_row)
            one_row = create_feature_names(Ckurt2, 'Ckurt', one_row)
            one_row = create_feature_names(Cmax2, 'Cmax', one_row)
            one_row = create_feature_names(Cmin2, 'Cmin', one_row) 

            one_row = create_feature_names(number_seq_D_1D, 'number_decresendo_notes', one_row)

            one_row = create_feature_names(Dmean2, 'Dmean', one_row)
            one_row = create_feature_names(Dstd2, 'Dstd', one_row)
            one_row = create_feature_names(Dskew2, 'Dskew', one_row)
            one_row = create_feature_names(Dkurt2, 'Dkurt', one_row)
            one_row = create_feature_names(Dmax2, 'Dmax', one_row)
            one_row = create_feature_names(Dmin2, 'Dmin', one_row) 


            # feature: Note Duration statistics:

            ndmean2, ndstd2, ndskew2, ndkurt2, ndmax2, ndmin2  = six_statistics(nd,True)
            # Not stored as separate features (ndmean2 and ndstd2 are still used below):
            # one_row = create_feature_names(ndmean2, 'ndmean', one_row)
            # one_row = create_feature_names(ndstd2, 'ndstd', one_row)
            # one_row = create_feature_names(ndskew2, 'ndskew', one_row)
            # one_row = create_feature_names(ndkurt2, 'ndkurt', one_row)
            # one_row = create_feature_names(ndmax2, 'ndmax', one_row)
            # one_row = create_feature_names(ndmin2, 'ndmin', one_row)

            # feature: note duration distribution:

            SNR, MLNR, LNR = note_intensity_distribution(nd, ndmean2, ndstd2,True)

            one_row = create_feature_names(SNR, 'SNR', one_row)
            one_row = create_feature_names(MLNR, 'MLNR', one_row)
            one_row = create_feature_names(LNR, 'LNR', one_row)


            # feature: ratios of note duration transitions:
            a,b,c,d,e,f, TLNR,TSNR,TELNR = ratios_of_transition_NS_1D(nd, True)

            one_row = create_feature_names(TLNR, 'TLNR', one_row)
            one_row = create_feature_names(TSNR, 'TSNR', one_row)
            one_row = create_feature_names(TELNR, 'TELNR', one_row)


            # feature: Staccato Ratio, Legato Ratio, Other Transitions Ratio, their note duration
            # ratios and the statistics of the corresponding note durations:

            art = articulation_detection(time_passed_ar,pause_collector,nd_short,True)

            #art = articulation_detection(time_passed_ar,pause_collector,nd,True)
            nd_flat = nd.flatten()
            nd_flat = (nd_flat[np.where(nd_flat!=-1)])[:-1] # drop the last note: the art array
            # describes the relation between consecutive note pairs and is therefore one entry shorter
            nd_flat = nd_flat.reshape((1,len(nd_flat)))
            art = art.reshape((1,len(art)))
            
            SR, LR, OTR, SNDR, LNDR, OTNDR, nd_S, nd_L, nd_OT = art_ratios(art, nd_flat)

            nd_Smean2, nd_Sstd2, nd_Sskew2, nd_Skurt2, nd_Smax2, nd_Smin2 = six_statistics(nd_S,True)
            nd_Lmean2, nd_Lstd2, nd_Lskew2, nd_Lkurt2, nd_Lmax2, nd_Lmin2 = six_statistics(nd_L,True)
            nd_OTmean2, nd_OTstd2, nd_OTskew2, nd_OTkurt2, nd_OTmax2, nd_OTmin2 = six_statistics(nd_OT,True)

            one_row = create_feature_names(SR, 'SR', one_row)
            one_row = create_feature_names(LR, 'LR', one_row)
            one_row = create_feature_names(OTR, 'OTR', one_row)
            one_row = create_feature_names(SNDR, 'SNDR', one_row)
            one_row = create_feature_names(LNDR, 'LNDR', one_row)
            one_row = create_feature_names(OTNDR, 'OTNDR', one_row)               

            one_row = create_feature_names(nd_Smean2, 'nd_Smean', one_row)
            one_row = create_feature_names(nd_Sstd2, 'nd_Sstd', one_row)
            one_row = create_feature_names(nd_Sskew2, 'nd_Sskew', one_row)
            one_row = create_feature_names(nd_Skurt2, 'nd_Skurt', one_row)
            one_row = create_feature_names(nd_Smax2, 'nd_Smax', one_row)
            one_row = create_feature_names(nd_Smin2, 'nd_Smin', one_row)             

            one_row = create_feature_names(nd_Lmean2, 'nd_Lmean', one_row)
            one_row = create_feature_names(nd_Lstd2, 'nd_Lstd', one_row)
            one_row = create_feature_names(nd_Lskew2, 'nd_Lskew', one_row)
            one_row = create_feature_names(nd_Lkurt2, 'nd_Lkurt', one_row)
            one_row = create_feature_names(nd_Lmax2, 'nd_Lmax', one_row)
            one_row = create_feature_names(nd_Lmin2, 'nd_Lmin', one_row)             

            one_row = create_feature_names(nd_OTmean2, 'nd_OTmean', one_row)
            one_row = create_feature_names(nd_OTstd2, 'nd_OTstd', one_row)
            one_row = create_feature_names(nd_OTskew2, 'nd_OTskew', one_row)
            one_row = create_feature_names(nd_OTkurt2, 'nd_OTkurt', one_row)
            one_row = create_feature_names(nd_OTmax2, 'nd_OTmax', one_row)
            one_row = create_feature_names(nd_OTmin2, 'nd_OTmin', one_row)             

            print(f'done: getting features out of midi with instructions of Panda et al for {file_name}\n')

            ########################################################################
            # Finish this sample by appending its dictionary row to the list of all rows:
            rows.append(one_row)
            first_file_loop = False
         #   except: 
          #      with open(f'{aimed_directory}/failed_xml2csv_file_names','a') as f:
           #         f.write(file_name+'\n')
                    # here should be especially files which are not human generated

    cols = list(one_row.keys())

    df = pd.DataFrame(rows, columns=cols)#.set_index("name")
    df = df.replace(float('nan'),-1)

    # Writing dataframe to csv
    df.to_csv(aim_storage_path_file) #f'{aimed_directory}/features_Panda_jSymbolic.csv') 
    print('done: whole process')
    #except: # xml contains error such that convertion not possible
    #with open(f'{aimed_directory}/failed_xml2csv_file_names','a') as f:
     #   f.write(file_name+'\n')
        
    return df
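The last lines of the function turn the list of per-sample dictionaries into one table. The following is only a minimal sketch of that step with made-up values (the sample ids and numbers are illustrative, not real results). It assumes that a feature can be missing for some samples: such entries become NaN in the DataFrame and are then replaced by -1.

```python
import pandas as pd

# Hypothetical per-sample rows; the second one lacks the 'SR' feature.
rows = [
    {"sample_id": "sample_a", "source_id": "human generator", "MIDImax": 84, "SR": 0.25},
    {"sample_id": "sample_b", "source_id": "human generator", "MIDImax": 79},
]

# Column names collected over all rows, in first-seen order.
cols = list(dict.fromkeys(key for row in rows for key in row))

df = pd.DataFrame(rows, columns=cols)
df = df.fillna(-1)  # -1 marks a feature value that is not available
print(df)
```

The function above takes the column names from the last processed row instead. Collecting them over all rows, as in this sketch, is a variation that also keeps features which only some samples have.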

<b>
 1. Replace corrupted Internet MIDI files (human generator)<br>
    Some files are actually corrupted; others simply cannot be opened by the jSymbolic XML file generator.<br>
 2. Rename the Internet MIDI files (human generator) to Folder_name + '_accompaniment.mid' so that they have the same file names as the MIDI files of the audio2midi generator<br>
 3. Open the jSymbolic program and generate an XML file; its path is the value of the variable 'xml_path_file'.<br>
 4. Extract features from the MIDI files in 'midi_path' and create CSV files.<br>
 5. After generating two CSV files (one human generated, one audio2midi generated), keep the samples whose file name ('sample_id') occurs in both (this gives datasets B and B2) and drop the samples that were ONLY generated by audio2midi (dataset A)
</b>
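Step 5 of the list above can be sketched with pandas as follows. This is only an illustration: the two small tables and their values are hypothetical stand-ins for the CSV files (which would normally be loaded with `pd.read_csv`), and only the column name `sample_id` is taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the two feature tables.
df_human = pd.DataFrame({"sample_id": ["song_1", "song_2"], "MIDImax": [84, 79]})
df_audio2midi = pd.DataFrame(
    {"sample_id": ["song_1", "song_2", "song_3"], "MIDImax": [80, 77, 90]}
)

# sample_ids that occur in both tables
shared_ids = set(df_human["sample_id"]) & set(df_audio2midi["sample_id"])

# Keep only the shared samples in each table.
human_shared = df_human[df_human["sample_id"].isin(shared_ids)].reset_index(drop=True)
audio2midi_shared = df_audio2midi[
    df_audio2midi["sample_id"].isin(shared_ids)
].reset_index(drop=True)

print(sorted(shared_ids))
```

Note that the intersection also drops samples that exist only in the human generated table. The list above does not mention this case, so it is an assumption of the sketch.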
 

2. Renaming the human generated files:

In [59]:
'''
import glob
import os
path_folder_names = '../midi-files-filtered2_Mvtausgetauscht_fehlerhafteFilesErsetzt_mitFoldern_14152022/*'
for single_folder_path in glob.glob(path_folder_names):
    try:
        folder_name = single_folder_path.split('/')[-1]
        file_path = glob.glob(single_folder_path + '/*.mid')[0]
        new_name_path = single_folder_path  + '_accompaniment.mid'
        os.rename(file_path,new_name_path)
        os.rmdir(single_folder_path)
    except: 
        pass
print('done')
'''


3. <br>
Observation: the audio2midi generated MIDI files seem to have little content, because jSymbolic needs only seconds to convert all of them into XML files (including the files that do not exist as human generated MIDI files). For the human generated MIDI files, however, the jSymbolic program runs out of storage. This could also be caused by bugs in the human generated MIDI files: a handful of them had to be replaced, because otherwise jSymbolic could not extract features from them. The same was true for some MIDI files that otherwise worked.
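A cheap first filter for clearly broken files is to check the header chunk before handing them to jSymbolic. The sketch below is not part of the original workflow. It relies only on the fact that a Standard MIDI File starts with the chunk id `MThd` followed by a 4-byte big-endian length of 6. The function name `looks_like_midi` is made up for this example, and passing the check does not guarantee that jSymbolic can process the file.

```python
import os
import tempfile


def looks_like_midi(path):
    """Return True if the file starts with a Standard MIDI File header chunk."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    # 'MThd' chunk id followed by the 4-byte big-endian chunk length, which is 6
    return header[:4] == b"MThd" and int.from_bytes(header[4:8], "big") == 6


# Small demonstration with two temporary files
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.mid")
    bad = os.path.join(tmp, "bad.mid")
    with open(good, "wb") as f:
        # header chunk only: format 0, one track, 480 ticks per quarter note
        f.write(b"MThd" + (6).to_bytes(4, "big") + b"\x00\x00\x00\x01\x01\xe0")
    with open(bad, "wb") as f:
        f.write(b"not a midi file")
    results = {"good": looks_like_midi(good), "bad": looks_like_midi(bad)}

print(results)
```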


4.  Extracting features out of midi files 'midi_path' and creating csv-files.

In [60]:
aim_folder = 'resulting_symbolic_dataframes' 
os.makedirs(aim_folder, exist_ok = True)
title_start = 'symbolic_dataframe'

midi_path_start = '../../stage1_data_collecting_phase/'

storage_folder_xml = 'GeneratedXml_jSymbolic'
title_start_xml = 'jSymbolic'

--> audio2midi converter Wang:

In [61]:
#df_wang = feature_extraction(aim_storage_path_file=f'{aim_folder}/{title_start}_audio2midiWang.csv',xml_path_file=f'{storage_folder_xml}/{title_start_xml}_audio2midiWang.xml',midi_path=midi_path_start+'audio2midi_converter/audio2midi_Wang/GeneratedMIDI_Wang', source_id='audio2midi generator')
#df_wang

--> audio2midi generator with merged single voices:

In [62]:
#df_merged = feature_extraction(aim_storage_path_file=f'{aim_folder}/{title_start}_audio2midiWang_mergedMIDIs.csv',xml_path_file=f'{storage_folder_xml}/{title_start_xml}_audio2midiWang_mergedMIDIs.xml',midi_path=midi_path_start+'audio2midi_converter/audio2midi_Wang/GeneratedFusionedMIDI_afterInstrumentalSplit', source_id='audio2midi generator merged')
#df_merged

--> human generator full songs:

In [63]:
#df_human = feature_extraction(aim_storage_path_file=f'{aim_folder}/{title_start}_human_fullSongs.csv',xml_path_file=f'{storage_folder_xml}/{title_start_xml}_human_fullSongs.xml',midi_path= midi_path_start+'webscraping/CollectedMIDI_webscraping_SolvedErrorsOfMvtAndCorruptedMIDI_RemovedFolderStructure', source_id='human generator')
#df_human

--> human generator with cut out snippets:

In [64]:
#df_snippet = feature_extraction(aim_storage_path_file=f'{aim_folder}/{title_start}_human_snippets.csv',xml_path_file=f'{storage_folder_xml}/{title_start_xml}_human_snippets.xml',midi_path=midi_path_start+'webscraping/MIDISnippetGeneration/GeneratedMIDISnippets_webscraping', source_id='human generator snippet')
#df_snippet

--> audio2midi converter basic-pitch

In [65]:
#df_basic_pitch = feature_extraction(aim_storage_path_file=f'{aim_folder}/{title_start}_audio2midiBasicPitch.csv',xml_path_file=f'{storage_folder_xml}/{title_start_xml}_human_audio2midiBasicPitch.xml',midi_path=midi_path_start+'audio2midi_converter/audio2midi_BasicPitch/GeneratedMIDI_BasicPitch', source_id='basic-pitch generator')
#df_basic_pitch

current file: P_Rihanna_Umbrell_accompaniment
done: convertion jSymbolic xml feature file into csv for P_Rihanna_Umbrell_accompaniment
information extraction (and midi snippet generation) done
done: getting features out of midi with instructions of Panda et al for P_Rihanna_Umbrell_accompaniment

current file: K_Tschaikowski_Schwane_accompaniment
done: convertion jSymbolic xml feature file into csv for K_Tschaikowski_Schwane_accompaniment




FileNotFoundError: [Errno 2] No such file or directory: '../../stage1_data_collecting_phase/audio2midi_converter/audio2midi_BasicPitch/GeneratedMIDI_BasicPitch/K_Tschaikowski_Schwane_accompaniment.mid'
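The error above suggests that a sample was requested for which no MIDI file exists in `midi_path`. A minimal sketch for finding such samples before starting the extraction, assuming (as the path in the error message suggests) that each file is looked up as `midi_path/<sample_id>.mid`. The helper name `find_missing_midi` and the sample ids are hypothetical.

```python
import os
import tempfile


def find_missing_midi(sample_ids, midi_dir):
    """Return the sample ids for which midi_dir/<sample_id>.mid does not exist."""
    return [
        sample_id
        for sample_id in sample_ids
        if not os.path.isfile(os.path.join(midi_dir, sample_id + ".mid"))
    ]


# Demonstration with a temporary directory that contains only one of two files
with tempfile.TemporaryDirectory() as midi_dir:
    open(os.path.join(midi_dir, "sample_a.mid"), "wb").close()
    missing = find_missing_midi(["sample_a", "sample_b"], midi_dir)

print(missing)
```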