# Position Classification

The second part of this work tries to predict the vertical staff line position of an object.

## Training
We'll start by preparing the dataset. Since the neural network only sees a small patch of the full input image, the training data has to be prepared the same way, as fixed-size crops around each annotated symbol. The parameters are as follows:

- `group_by`: Classification can be performed by `staff_position` or by `class_name`. Since the object detector already provides the class name, we are only interested in `staff_position` here.
- `ignore_classes_without_semantic_staff_position`: Some classes, such as barlines, beams or time signatures, have no meaningful staff position; they simply appear in the score. To avoid confusing the classifier, this flag excludes those symbols from classification.
- `add_padding_to_force_center`: The position classification network works best if the object to be classified always appears in the center of the image. Objects located at the very border of the image are problematic, because a crop around them cannot be centered. This option enables synthetic centering: the image borders are padded with a reflection of the image along the corresponding axis, so that the object of interest always appears in the center of the crop.
- `width`: the fixed width of the image crop used for classification.
- `height`: the fixed height of the image crop used for classification.
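A minimal sketch of the reflection-padding idea behind `add_padding_to_force_center`, assuming a grayscale image stored as a list of pixel rows and the object's center given in pixel coordinates. `reflect_index` and `crop_centered` are hypothetical helpers, not functions from `extract_sub_image_for_classification.py`. Instead of physically padding the image, out-of-range indices are mirrored back inside the image (without repeating the edge pixel, like NumPy's `mode="reflect"`), which yields the same crop as padding first and cropping afterwards.

```python
from typing import List


def reflect_index(i: int, n: int) -> int:
    """Mirror an out-of-range index back into [0, n) without repeating the edge pixel."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period  # Python's % is non-negative for a positive divisor
    return period - i if i >= n else i


def crop_centered(image: List[List[int]], cx: int, cy: int,
                  width: int, height: int) -> List[List[int]]:
    """Return a height x width crop whose center pixel is image[cy][cx].

    Hypothetical helper: pixels that fall outside the image are filled by
    reflecting across the border, emulating reflection padding.
    """
    rows, cols = len(image), len(image[0])
    top = cy - height // 2
    left = cx - width // 2
    return [
        [image[reflect_index(top + r, rows)][reflect_index(left + c, cols)]
         for c in range(width)]
        for r in range(height)
    ]


# A 4 x 5 toy image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(4)]
# Object in the top-left corner: the crop is still centered on pixel value 0.
print(crop_centered(image, cx=0, cy=0, width=3, height=3))
# [[11, 10, 11], [1, 0, 1], [11, 10, 11]]
```

The object at the corner ends up in the middle of the crop, surrounded by mirrored context rather than black borders, which is the property the flag is meant to guarantee.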

In [1]:
%run extract_sub_image_for_classification.py --group_by staff_position --ignore_classes_without_semantic_staff_position --add_padding_to_force_center --width 224 --height 448

Padding images to force objects to appear in the center


Extracting sub-images for each annotated symbol: 100%|██████████| 60/60 [01:53<00:00,  1.90s/it]


Next, we split the entire dataset into three disjoint sets for training, validation and testing by running the following command. It keeps 80% of each class for training, 10% for validation and 10% for final testing.

In [2]:
%run dataset_splitter.py

Deleting split directories... 
Splitting data into training, validation and test sets...
Copying 21 training files of L0...
Copying 2 validation files of L0...
Copying 2 test files of L0...
Copying 184 training files of L1...
Copying 22 validation files of L1...
Copying 22 test files of L1...
Copying 849 training files of L2...
Copying 106 validation files of L2...
Copying 106 test files of L2...
Copying 1651 training files of L3...
Copying 206 validation files of L3...
Copying 206 test files of L3...
Copying 2296 training files of L4...
Copying 286 validation files of L4...
Copying 286 test files of L4...
Copying 1039 training files of L5...
Copying 129 validation files of L5...
Copying 129 test files of L5...
Copying 99 training files of L6...
Copying 12 validation files of L6...
Copying 12 test files of L6...
Copying 98 training files of S0...
Copying 12 validation files of S0...
Copying 12 test files of S0...
Copying 208 training files of S1...
Copying 26 validation files of S1...
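The per-class counts printed above fit a simple rule: a class with n files gives floor(n/10) files each to validation and test, and training keeps the rest (for L2: 1061 files become 849 + 106 + 106). A minimal sketch of such a per-class (stratified) split follows; `split_class`, the integer percentages and the fixed shuffle seed are assumptions, not taken from `dataset_splitter.py`.

```python
import random
from typing import List, Sequence, Tuple


def split_class(files: Sequence[str], val_percent: int = 10, test_percent: int = 10,
                seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Split one class's files into disjoint training/validation/test lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    # Integer arithmetic rounds down, which reproduces the counts in the log above.
    n_val = len(shuffled) * val_percent // 100
    n_test = len(shuffled) * test_percent // 100
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    training = shuffled[n_val + n_test:]
    return training, validation, test


files = [f"L2_{i:04d}.png" for i in range(1061)]  # hypothetical file names
train, val, test = split_class(files)
print(len(train), len(val), len(test))  # 849 106 106
```

Running the split once per class folder keeps the class proportions equal across the three sets, which matters here because the classes are very unbalanced (25 files for L0 versus 2868 for L4).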


The last step is to run the training on the prepared dataset:
- `model_name`: the name of the model to train.
- `width` and `height`: the fixed dimensions of the images created above.
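`--width` and `--height` set the crop size, and Keras (channels-last) receives tensors of shape `(None, height, width, 3)`, i.e. `(None, 448, 224, 3)` here. Because the stride-2 layers act on each axis independently, height and width shrink separately. The sketch below traces the feature-map size through the stem and the two reduction modules, assuming the layer layout of the Keras `InceptionResNetV2` application (kernel sizes, strides and `valid`/`same` padding as listed in `STEM`); the results match the shapes in the model summary printed by the training cell.

```python
from typing import List, Tuple

# (kernel, stride, padding) of the spatial layers in the Inception-ResNet-v2 stem.
STEM = [(3, 2, "valid"), (3, 1, "valid"), (3, 1, "same"), (3, 2, "valid"),
        (1, 1, "valid"), (3, 1, "valid"), (3, 2, "valid")]
# The reduction modules (mixed_6a, mixed_7a) downsample with 3x3, stride 2, valid.
REDUCTION = (3, 2, "valid")


def out_size(n: int, kernel: int, stride: int, padding: str) -> int:
    """Output length of a conv/pooling layer along one axis."""
    if padding == "same":
        return -(-n // stride)  # ceil(n / stride)
    return (n - kernel) // stride + 1


def stage_shapes(height: int, width: int) -> List[Tuple[int, int]]:
    """(height, width) after conv2d_1, after the stem, and after each reduction."""
    h, w = height, width
    shapes = []
    for i, (k, s, p) in enumerate(STEM):
        h, w = out_size(h, k, s, p), out_size(w, k, s, p)
        if i == 0:
            shapes.append((h, w))  # output of the first convolution
    shapes.append((h, w))  # input size of the block35 modules
    for _ in range(2):
        h, w = out_size(h, *REDUCTION), out_size(w, *REDUCTION)
        shapes.append((h, w))  # block17, then block8 resolution
    return shapes


print(stage_shapes(448, 224))  # [(223, 111), (53, 25), (26, 12), (12, 5)]
print(stage_shapes(299, 299))  # [(149, 149), (35, 35), (17, 17), (8, 8)]
# First convolution: 3x3 kernel, 3 input channels, 32 filters, no bias term.
print(3 * 3 * 3 * 32)  # 864
```

The 299 x 299 line is the network's default input size and serves as a sanity check of the layer list.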

In [1]:
%run train_model.py --model_name inception_resnet_v2 --width 224 --height 448

Using TensorFlow backend.


Loading configuration and data-readers...
Found 10610 images belonging to 14 classes.
Found 1318 images belonging to 14 classes.
Found 1318 images belonging to 14 classes.
[Model summary of inception_resnet_v2, truncated in this export. Recoverable output shapes: input_1 (None, 448, 224, 3); conv2d_1 (None, 223, 111, 32) with 864 parameters; block35 modules (None, 53, 25, 320); block17 modules (None, 26, 12, 1088); mixed_7a and block8 modules (None, 12, 5, 2080).]


Epoch 00001: val_acc improved from -inf to 0.53490, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00002: val_acc improved from 0.53490 to 0.65857, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00003: val_acc improved from 0.65857 to 0.79894, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00004: val_acc improved from 0.79894 to 0.87102, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00005: val_acc did not improve from 0.87102
Epoch 00006: val_acc improved from 0.87102 to 0.89985, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00007: val_acc improved from 0.89985 to 0.91275, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00008: val_acc did not improve from 0.91275
Epoch 00009: val_acc improved from 0.91275 to 0.93020, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00010: val_acc did not improve from 0.93020
Epoch 00011: val_acc did not improve from 0.93020
Epoch 00012: val_acc improved from 0.93020 to 0.94310, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00013: val_acc improved from 0.94310 to 0.94841, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00014: val_acc did not improve from 0.94841
Epoch 00015: val_acc improved from 0.94841 to 0.95372, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00016: val_acc improved from 0.95372 to 0.96206, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00017: val_acc did not improve from 0.96206
Epoch 00018: val_acc did not improve from 0.96206
Epoch 00019: val_acc did not improve from 0.96206
Epoch 00020: val_acc did not improve from 0.96206
Epoch 00021: val_acc did not improve from 0.96206
Epoch 00022: val_acc improved from 0.96206 to 0.96510, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00023: val_acc improved from 0.96510 to 0.97496, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00024: val_acc did not improve from 0.97496
Epoch 00025: val_acc did not improve from 0.97496
Epoch 00026: val_acc did not improve from 0.97496
Epoch 00027: val_acc did not improve from 0.97496
Epoch 00028: val_acc did not improve from 0.97496
Epoch 00029: val_acc did not improve from 0.97496
Epoch 00030: val_acc did not improve from 0.97496
Epoch 00031: val_acc did not improve from 0.97496
Epoch 00031: ReduceLROnPlateau reducing learning rate to 0.5.
Epoch 00032: val_acc did not improve from 0.97496
Epoch 00033: val_acc did not improve from 0.97496
Epoch 00034: val_acc did not improve from 0.97496
Epoch 00035: val_acc did not improve from 0.97496
Epoch 00036: val_acc did not improve from 0.97496
Epoch 00037: val_acc did not improve from 0.97496
Epoch 00038: val_acc did not improve from 0.97496
Epoch 00039: val_acc did not improve from 0.97496
Epoch 00039: ReduceLROnPlateau reducing learning rate to 0.25.
Epoch 00040: val_acc did not improve from 0.97496
Epoch 00041: val_acc improved from 0.97496 to 0.97648, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00042: val_acc did not improve from 0.97648
Epoch 00043: val_acc did not improve from 0.97648
Epoch 00044: val_acc did not improve from 0.97648
Epoch 00045: val_acc improved from 0.97648 to 0.97800, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00046: val_acc did not improve from 0.97800
Epoch 00047: val_acc did not improve from 0.97800
Epoch 00048: val_acc did not improve from 0.97800
Epoch 00049: val_acc did not improve from 0.97800
Epoch 00050: val_acc did not improve from 0.97800
Epoch 00051: val_acc did not improve from 0.97800
Epoch 00052: val_acc improved from 0.97800 to 0.97876, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00053: val_acc did not improve from 0.97876
Epoch 00054: val_acc did not improve from 0.97876
Epoch 00055: val_acc did not improve from 0.97876
Epoch 00056: val_acc did not improve from 0.97876
Epoch 00057: val_acc improved from 0.97876 to 0.97951, saving model to 2018-05-10_inception_resnet_v2.h5
Epoch 00058: val_acc did not improve from 0.97951
Epoch 00059: val_acc did not improve from 0.97951
Epoch 00060: val_acc did not improve from 0.97951
Epoch 00061: val_acc did not improve from 0.97951
Epoch 00062: val_acc did not improve from 0.97951
Epoch 00063: val_acc did not improve from 0.97951
Epoch 00064: val_acc did not improve from 0.97951
Epoch 00065: val_acc did not improve from 0.97951
Epoch 00065: ReduceLROnPlateau reducing learning rate to 0.125.
Epoch 00066: val_acc did not improve from 0.97951
Epoch 00067: val_acc did not improve from 0.97951
Epoch 00068: val_acc did not improve from 0.97951
Epoch 00069: val_acc did not improve from 0.97951
Epoch 00070: val_acc did not improve from 0.97951
Epoch 00071: val_acc did not improve from 0.97951
Epoch 00072: val_acc did not improve from 0.97951
Epoch 00073: val_acc did not improve from 0.97951
Epoch 00073: ReduceLROnPlateau reducing learning rate to 0.0625.
Epoch 00074: val_acc did not improve from 0.97951
Epoch 00075: val_acc did not improve from 0.97951
Epoch 00076: val_acc did not improve from 0.97951
Epoch 00077: val_acc did not improve from 0.97951
Epoch 00077: early stopping
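The log is consistent with a learning rate that starts at 1.0 and is halved whenever `val_acc` has not improved for 8 consecutive epochs, with training stopped after 20 epochs without improvement. These patience and factor values are inferred from the epoch numbers in the log, not read from `train_model.py`, and the sketch assumes all callbacks monitor `val_acc` with no cooldown and no minimum delta. It replays the logged improvement pattern through the same wait-counter logic that Keras' `ReduceLROnPlateau` and `EarlyStopping` callbacks use; `replay_schedule` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def replay_schedule(improved: Sequence[bool], initial_lr: float = 1.0, factor: float = 0.5,
                    lr_patience: int = 8, stop_patience: int = 20
                    ) -> Tuple[List[Tuple[int, float]], Optional[int]]:
    """Return the (epoch, new_lr) reductions and the early-stopping epoch, if any."""
    lr = initial_lr
    lr_wait = stop_wait = 0
    reductions = []
    for epoch, better in enumerate(improved, start=1):
        if better:
            lr_wait = stop_wait = 0  # any improvement resets both counters
            continue
        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:
            lr *= factor
            reductions.append((epoch, lr))
            lr_wait = 0  # counting starts over after each reduction
        if stop_wait >= stop_patience:
            return reductions, epoch
    return reductions, None


# Epochs for which "val_acc improved" was logged above.
IMPROVED_EPOCHS = {1, 2, 3, 4, 6, 7, 9, 12, 13, 15, 16, 22, 23, 41, 45, 52, 57}
history = [epoch in IMPROVED_EPOCHS for epoch in range(1, 78)]
reductions, stopped = replay_schedule(history)
print(reductions)  # [(31, 0.5), (39, 0.25), (65, 0.125), (73, 0.0625)]
print(stopped)     # 77
```

Both counters reproduce the logged events exactly: the four learning-rate reductions at epochs 31, 39, 65 and 73, and early stopping at epoch 77, twenty epochs after the best checkpoint at epoch 57.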

Loading best model from check-point and testing...
Reporting classification statistics with micro average
           precision    recall  f1-score   support

       L0      1.000     1.000     1.000         2
       L1      1.000     0.955     0.977        22
       L2      0.926     0.943     0.935       106
       L3      0.947     0.947     0.947       206
       L4      0.986     0.965     0.975       286
       L5      0.992     1.000     0.996       129
       L6      1.000     1.000     1.000        12
       S0      1.000     1.000     1.000        12
       S1      0.963     1.000     0.981        26
       S2      0.988     0.988     0.988        81
       S3      0.995     1.000     0.997       182
       S4      0.990     0.995     0.992       192
       S5      0.984     1.000     0.992        62

micro avg      0.977     0.977     0.977      1318

Reporting classification statistics with macro
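In the report above, the micro-averaged precision, recall and F1 are identical (0.977). That is expected for single-label classification: every misclassified sample counts once as a false positive for the predicted class and once as a false negative for the true class, so pooled precision and pooled recall both reduce to accuracy. A small sketch of the computation; `classification_summary` is a hypothetical helper, not the reporting code used by the script.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classification_summary(y_true: Sequence[str], y_pred: Sequence[str]
                           ) -> Tuple[Dict[str, Tuple[float, float, float, int]],
                                      Tuple[float, float, float]]:
    """Per-class (precision, recall, f1, support) and micro-averaged (P, R, F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # wrong prediction counted against the predicted class
            fn[truth] += 1  # and as a miss for the true class

    def prf(t: int, p: int, n: int) -> Tuple[float, float, float]:
        precision = t / (t + p) if t + p else 0.0
        recall = t / (t + n) if t + n else 0.0
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        f1 = 2 * t / (2 * t + p + n) if t + p + n else 0.0
        return precision, recall, f1

    per_class = {label: prf(tp[label], fp[label], fn[label]) + (tp[label] + fn[label],)
                 for label in sorted(set(y_true) | set(y_pred))}
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, micro


y_true = ["L2", "L2", "L3", "L3", "S1"]
y_pred = ["L2", "L3", "L3", "L3", "S1"]
per_class, micro = classification_summary(y_true, y_pred)
print(micro)  # (0.8, 0.8, 0.8), i.e. 4 of 5 correct
```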

For a full list of all models, you may run:

In [4]:
%run models/ConfigurationFactory.py

Available configurations are:
- vgg4
- vgg_global_average
- inception_resnet_v2
- inception_resnet_v2_pretrained
- vgg16
- res_net_50
- dense_net_201
- res_net_4
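These names are presumably the values accepted by `--model_name`. How `models/ConfigurationFactory.py` maps a name to a model is not shown here; a common way to implement such a lookup is a name-to-factory registry, sketched below with two of the listed names. `ModelConfiguration`, `register`, `available_configurations` and `get_configuration` are hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ModelConfiguration:
    """Hypothetical stand-in for a training configuration."""
    name: str
    width: int
    height: int

    @property
    def input_shape(self) -> Tuple[int, int, int]:
        return (self.height, self.width, 3)  # channels-last: rows come first


Factory = Callable[[int, int], ModelConfiguration]
_REGISTRY: Dict[str, Factory] = {}


def register(name: str) -> Callable[[Factory], Factory]:
    """Decorator that stores a factory under the given configuration name."""
    def decorator(factory: Factory) -> Factory:
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("inception_resnet_v2")
def _inception_resnet_v2(width: int, height: int) -> ModelConfiguration:
    return ModelConfiguration("inception_resnet_v2", width, height)


@register("vgg16")
def _vgg16(width: int, height: int) -> ModelConfiguration:
    return ModelConfiguration("vgg16", width, height)


def available_configurations() -> List[str]:
    return sorted(_REGISTRY)


def get_configuration(name: str, width: int, height: int) -> ModelConfiguration:
    if name not in _REGISTRY:
        raise ValueError(f"Unknown configuration {name!r}; "
                         f"available: {', '.join(available_configurations())}")
    return _REGISTRY[name](width, height)


print(get_configuration("inception_resnet_v2", 224, 448).input_shape)  # (448, 224, 3)
```

A registry keeps the command-line interface and the list of available models in sync: adding a model only requires one decorated factory function.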
