6x Current Status

The purpose of this chapter is to present the results of the Audiveris 6.0 prototype, now that we have reached a first integration of the page classifier into Audiveris and a demo of the patch classifier.

We have not carried out a thorough analysis on numerous scores, but we detail here a handful of score examples which exhibit typical behaviors of the 5.x and 6.0 versions:

Example          Main characteristics
Dichterliebe     perfect synthetic image
Chula            simple real scan
BachInvention5   non-uniform illumination
Wunder           poor-quality and crowded

Dichterliebe

This score is part of the Audiveris standard examples (by courtesy of Michael Good). The original PDF is available here. It is a 2-page score, digitally rendered (perhaps by Finale?) and thus of perfect quality.

Here are the outputs of the page classifier:

  • Page #1, 275 annotations reported, with 9 errors:

    • Dot in "1." title mistaken for articStaccatoBelow
    • Dot in "Mai" title mistaken for articStaccatoAbove
    • accidentalSharp mistaken for accidentalNatural
    • keyboardPedalPed "Ped" mark not detected
    • flag16thDown mistaken for flag32ndDown
    • flag8thUp mistaken for graceNoteAppoggiaturaStemUp
    • flag16thDown mistaken for flag8thDown
    • "8" text mistaken for fingering2
    • lyrics "da ist" (bar 9) mistaken for fingering1
  • Page #2, 361 annotations reported, with 16 errors:

    • accidentalSharp mistaken for accidentalNatural (4 times)
    • flag8thUp mistaken for graceNoteAppoggiaturaStemUp
    • flag8thDown mistaken for flag16thDown (3 times)
    • flag16thDown mistaken for flag8thDown (2 times)
    • "le" text mistaken for fingering3
    • bar number "16" mistaken for fingering0
    • bar number "20" mistaken for fingering2
    • fermataBelow not detected
    • keyboardPedalPed "Ped" mark not detected
    • keyboardPedalUp "*" mark not detected

Total: 25 mistakes on a population of about 640 symbols (636 annotations + 4 non-detected symbols) gives an error rate of 3.9%.
We can consider this result pretty good. By comparison, Audiveris 5.1 exhibits 0 errors on the same input.

It is worth noting that, while symbol detection is OK, the accuracy of the bounding boxes is rather poor. So much so that several detected flags were initially rejected by the OMR engine because their left side was too far, abscissa-wise, from the related stem. We had to relax several OMR engine checks to get these flags accepted.
See the case below, where the left side of the flag box should be aligned with the stem:
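
As a minimal illustration (hypothetical names and tolerance value, not the actual Audiveris code), the kind of geometric gate involved looks like this: a flag candidate is kept only if its left side lies close enough to the stem abscissa, which is exactly where loose bounding boxes break down.

```java
import java.awt.Rectangle;

public class FlagStemCheck
{
    /** Maximum tolerated horizontal gap, in pixels (illustrative value). */
    private static final int MAX_DX = 3;

    /** Accepts a flag only if its left side is close to the stem abscissa. */
    public static boolean isFlagAcceptable (Rectangle flagBox, int stemX)
    {
        return Math.abs(flagBox.x - stemX) <= MAX_DX;
    }

    public static void main (String[] args)
    {
        Rectangle looseBox = new Rectangle(110, 40, 20, 30); // classifier output
        int stemX = 100; // stem abscissa

        // A 10-pixel gap: the flag is rejected unless the check is relaxed.
        System.out.println(isFlagAcceptable(looseBox, stemX)); // false
    }
}
```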

Chula

This score is part of Audiveris examples, available here.

It is a scan of a real-world printed score: rather simple, of good quality though not digitally rendered.

  • 214 annotations reported, with 15 errors:
    • timeSig4 not detected (2 times)
    • rest16th mistaken for rest64th
    • repeatDot mistaken for articStaccatoAbove (3 times)
    • repeatDot mistaken for augmentationDot (2 times)
    • repeatDot not detected
    • rest8th mistaken for rest16th (2 times)
    • pair of repeatDot's mistaken for flag32ndDown
    • keyFlat mistaken for accidentalFlat (2 times)

15 errors for about 217 symbols gives an error rate of 6.9%, which is not so good. (Audiveris 5.1 makes 3 errors, or 1.4%, on the same input.)

Note that a case of overlapping symbols (a flag and a rest) is well detected; see the snapshot below. Unfortunately, this rest8th was mistaken for a rest16th, but it is worth noting that this case could never be resolved by the Audiveris 5.x engine.

Here below are the outputs of the patch classifier, for the flag and for the rest. They are both correct:

A wrong classification can lead to an invalid key signature, as below, where one of the keyFlat symbols was mistaken for an accidentalFlat:

Another problem is the non-detection of the two timeSig4 symbols; see below. This resulted in invalid time signatures (a numerator without a denominator).

What should the OMR engine do when it encounters such invalid configurations? As of this writing, the "confidence" attribute provided with any detected symbol always comes with the value 1.0, so this information cannot be used to resolve invalid configurations.
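
To make the point concrete, here is a hypothetical sketch (Annotation and best are made-up names, not Audiveris classes): picking the best candidate for an ambiguous location degenerates into an arbitrary choice when every annotation carries the same confidence of 1.0.

```java
import java.util.Comparator;
import java.util.List;

public class ConfidenceDemo
{
    /** A detected symbol with its reported confidence (hypothetical type). */
    record Annotation (String shape, double confidence) {}

    /** Keeps the most confident candidate for a given location. */
    static Annotation best (List<Annotation> candidates)
    {
        return candidates.stream()
                .max(Comparator.comparingDouble(Annotation::confidence))
                .orElseThrow();
    }

    public static void main (String[] args)
    {
        // Both interpretations reported with confidence 1.0: no usable
        // signal, so the "winner" is arbitrary.
        List<Annotation> candidates = List.of(
                new Annotation("timeSig4", 1.0),
                new Annotation("timeSig7", 1.0));
        System.out.println(best(candidates));
    }
}
```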

Here, we are hitting a limitation of the page classifier. It performs detection, segmentation and classification:

  • Segmentation today results in poor bounding boxes, but we can more or less live with that.
  • Errors in detection and/or classification are more difficult to cope with.

Some errors can be detected by the OMR engine, as is the case for the invalid configurations mentioned above. But then we cannot really relaunch the page classifier to disambiguate these cases, and anyway its outputs today carry no usable confidence values: it is an "all or nothing" binary situation.

We thus need an extra tool to help the engine online, and this may be just a simple classifier: either a glyph classifier or, preferably, a patch classifier, since in these invalid configurations the engine knows rather precisely where to look.
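
A minimal sketch of this envisioned fallback, assuming a PatchClassifier interface (a made-up name, not an existing Audiveris class): when a time-signature numerator has no denominator, the engine knows the missing symbol must sit just below the numerator, and can query the patch classifier on that precise region.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical classifier interface: classifies the patch around a region. */
interface PatchClassifier
{
    /** Returns the best shape name for the patch, or null if none is found. */
    String classify (BufferedImage sheetImage, Rectangle region);
}

class InvalidConfigResolver
{
    private final PatchClassifier patchClassifier;

    InvalidConfigResolver (PatchClassifier patchClassifier)
    {
        this.patchClassifier = patchClassifier;
    }

    /**
     * Called when a numerator (say a timeSig3) was detected with no
     * denominator below it: probe the area just below the numerator box.
     */
    String resolveMissingDenominator (BufferedImage sheet, Rectangle numeratorBox)
    {
        Rectangle below = new Rectangle(numeratorBox.x,
                                        numeratorBox.y + numeratorBox.height,
                                        numeratorBox.width,
                                        numeratorBox.height);

        return patchClassifier.classify(sheet, below);
    }
}
```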

Fortunately, here are the patch classifier outputs on the two problems listed above: respectively, the mistaken keyFlat and the missing timeSig4 symbols are correctly recognized:

Also, we can compare the behavior of the page and patch classifiers on less frequent symbols, as in the horizontal sequence of pedal symbols here below.

The page annotations show a correct keyboardPedalPed (though its bounds are strongly shifted), a keyboardPedalUp mistaken for a restQuarter and a keyboardPedalPed not detected:

On the same sequence, the patch classifier works much better:

Is this to say that the patch classifier is our silver bullet?

Well, it sometimes exhibits a weird behavior; the most problematic one relates to black note heads.

Very often, standard black note heads are mistaken for small black note heads. This is so frequent that we can suspect a bug somewhere.

Here below, in a horizontal sequence of natural / head / sharp / head symbols, we can observe that the accidental symbols are well recognized by the patch classifier, but the black note heads are not.

In short, we can say that the patch classifier is promising but still needs to be worked upon.

BachInvention5

This score is part of the Audiveris examples, available here. It is a scan of a real-world printed score, moderately complex but with non-uniform illumination.

This example is interesting to test whether the page classifier should be run on the initial (gray) image or on the binarized (black-and-white) image. By default, the classifiers run on the binary image, but this can easily be changed by setting the option org.audiveris.omr.sheet.Picture.classifiersUseInitial to true.

The result is clear:

  • On the initial (gray) image, 447 symbols are detected, and almost every staff header (clef and key) is left blank of symbols.
  • On the binary (B&W) image, 490 symbols are detected, and most staff headers are correctly detected.

This is the reason why the binary image has been chosen as the default input for 6.0 classifiers.
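
As a minimal sketch of what this option amounts to (hypothetical names mirroring the classifiersUseInitial constant named above, not the actual Audiveris code), the classifiers' input boils down to a boolean switch between the two images:

```java
import java.awt.image.BufferedImage;

class ClassifierInput
{
    /**
     * Mirrors org.audiveris.omr.sheet.Picture.classifiersUseInitial:
     * false by default, so the classifiers read the binary image.
     */
    static boolean classifiersUseInitial = false;

    /** Selects the image the classifiers will run on. */
    static BufferedImage pick (BufferedImage initialGray, BufferedImage binary)
    {
        return classifiersUseInitial ? initialGray : binary;
    }
}
```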

Wunder

This example exhibits input that is both of low quality and crowded. It is definitely a challenge for OMR programs.

Here are the links for full OMR outputs:

Let's now have a look at a few typical measures.

Measure 2

This measure is very crowded, with an alteration sign that even got merged with the starting barline.

On the 6.x annotations above:

  • One accidentalNatural sign is detected in the first staff; the others are globally mistaken for a gClef.
  • The left part of a decrescendo wedge is mistaken for a noteheadWhole.
  • An accidentalNatural in the second staff is mistaken for a rest32nd.

But all the other shapes in this measure are correctly recognized.

On the 5.1 output above, both on the first and second staves, a group of accidentals is mistaken for a noteheadWhole.

This mistake happens frequently in 5.x, because of the way a noteheadWhole is recognized:

  • It uses template matching, checking black and white pixels within and around the template.
    But, since this specific head is never linked to a stem (as opposed to all other note heads), it cannot get any support from a nearby stem.
  • To alleviate this lack of potential support, the OMR engine artificially "boosts" its grade. This allows the candidate to compete with other heads, but has the side effect of increasing the number of noteheadWhole false positives.

And since note heads are no longer called into question when other symbols are searched for, there is no search for accidental symbols in the corresponding areas.
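
The following simplified sketch (hypothetical names and boost value, not the actual Audiveris implementation) illustrates this mechanism: the raw template-matching grade counts agreeing foreground and background cells, and the whole-head grade is then artificially boosted.

```java
class WholeHeadMatcher
{
    /** Artificial boost for stemless whole heads (illustrative value). */
    static final double BOOST = 0.1;

    /** Fraction of template cells whose expected color matches the image. */
    static double templateGrade (boolean[][] template, boolean[][] image,
                                 int left, int top)
    {
        int ok = 0;
        int total = 0;

        for (int y = 0; y < template.length; y++) {
            for (int x = 0; x < template[y].length; x++) {
                total++;

                if (template[y][x] == image[top + y][left + x]) {
                    ok++;
                }
            }
        }

        return (double) ok / total;
    }

    /**
     * A whole head can get no support from a stem, so its grade is boosted
     * to let it compete with stemmed heads, at the price of more
     * noteheadWhole false positives.
     */
    static double wholeHeadGrade (boolean[][] template, boolean[][] image,
                                  int left, int top)
    {
        return Math.min(1.0, templateGrade(template, image, left, top) + BOOST);
    }
}
```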

In this measure, the 6.0 page classifier behaves significantly better than the 5.1 engine based on template matching and the glyph classifier.

Measure 4

The page classifier works rather well on this measure. The only mistakes are:

  • On the first staff, an articAccentAbove is mistaken for a noteheadWhole, and there is a false positive noteheadBlack.
  • On the second staff, an augmentationDot is not detected and 2 noteheadHalf's are mistaken for noteheadBlack's (one of them strongly shifted).

On the same measure, the 5.1 engine totally missed the two accidentalSharp's in the left corner because of two noteheadHalf false positives.
And on the second staff, the false noteheadWhole struck again, preventing the recognition of a rest8th and an accidentalSharp. Two augmentationDot symbols (bottom center) were missed, while a small stain was taken for an augmentationDot (bottom right).

Here again, the 5.1 engine did not shine...

Measure 20

Above, the page classifier made very few mistakes:

  • First staff, 2 flag8thUp mistaken for graceNoteAppoggiaturaStemUp.
  • Second staff, 1 articTenutoBelow mistaken for rest8th, and 2 other articTenutoBelow mistaken for noteheadWhole symbols.
  • Third staff, 1 augmentationDot missed, 1 false positive noteheadWhole.

On the same measure, the 5.1 engine was much worse:

  • First staff, 2 flag8thUp symbols missed.
  • Second staff, a disaster: 8 heads, 2 accidentalNatural and 3 articTenutoBelow missed!
  • Third staff, 1 noteheadHalf and its augmentationDot missed.

Early lessons

Although it is still too early to draw general conclusions, we can nevertheless see from the examples studied that:

  • The (old) 5.1 engine, which combines template matching, a glyph classifier and many ad hoc strategies, generally gives good results on good-quality and rather clear scores, but suffers a lot on degraded and crowded ones.
  • The (new) 6.0 page classifier appears to be less dependent on score quality and complexity. It comes second on good and clear scores, but behaves better, and surprisingly well, on degraded and crowded ones.
  • The (new) 6.0 patch classifier appears to be rather efficient, apart from its frequent confusions between noteheadBlack and noteheadBlackSmall symbols.

From the perspective of the Audiveris OMR software, we think we have gone as far as possible with the old 5.1 engine regarding the precise task of symbol recognition. Besides the engine, the OMR software needs a full set of end-user interactive tools for score validation and correction, and one of the main goals of the coming 5.2 version is to complete these UI tools.

We believe that the next big engine improvements will come from the page and patch classifiers. We already have very promising prototypes, with well-identified weaknesses to be worked upon:

  • Severely unbalanced training sets (page & patch)
  • Poor bounding boxes (page)
  • No confidence information (page)
  • Recurrent mistakes on black heads (patch)
  • Huge CPU needs at inference time (page)