Simple quality measurement tool - Java adaptation #563

Open · wants to merge 7 commits into base: development
Conversation

PeterGreth

This is the Java/JUnit adaptation of #122:

In order to make Audiveris better, we need some way to measure how good it already is. This is an attempt to actually measure the recognition quality with the simplest possible metric: number of notes and rests correctly placed in the recognition result.

One sample test case is included (actually taken from issue #78), along with its "ideal" recognition result.

The proposed workflow is this (updated):

  • before starting work on a recognition issue, add a minimal example of it to src/test/resources/conversion/score (source.png or source.pdf and target.xml);
  • measure recognition quality by executing the JUnit test conversion.score.ConversionScoreRegressionTest;
  • fix the issue in the code;
  • measure quality again, checking that it went up. If it went down, it means that we have either broken something else or not fixed what we intended.

The quality score calculation is quite simplistic right now, just checking pitch and duration for the notes and rests. Later on, within the same framework and using existing test cases, we can start taking into account more stuff (keys, time signatures, measure durations, dynamics, etc).
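As a rough illustration of the idea, a minimal sketch of such a metric (not the actual ScoreSimilarity code; the Note record and the position-wise counting rule here are simplified stand-ins):

```java
import java.util.List;

// Illustrative sketch only: counts notes/rests that match in pitch and duration.
public class SimilaritySketch {

    // Hypothetical minimal note representation (pitch and duration).
    record Note(String pitch, int duration) {}

    // Count how many expected notes appear at the same position in the
    // actual output with identical pitch and duration.
    static int score(List<Note> expected, List<Note> actual) {
        int matches = 0;
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (expected.get(i).equals(actual.get(i))) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Note> expected = List.of(new Note("C4", 4), new Note("D4", 2));
        List<Note> actual = List.of(new Note("C4", 4), new Note("E4", 2));
        System.out.println(score(expected, actual)); // prints 1
    }
}
```

Extending the metric later (keys, time signatures, dynamics, etc.) would then just mean enriching the note representation and the equality rule.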

This version's Mxl.Input closes the underlying resources, allowing regression test execution to clean up afterwards
This frees the underlying log file, allowing regression test execution to clean up afterwards
Java/JUnit adaptation of the PR by Alexander Myltsev (@avm).
Test cases can be configured using ConversionScoreTestCase.java.
Tests are executed using ConversionScoreRegressionTest.java.
The similarity of actual output and expected output is calculated in ScoreSimilarity.java.
As provided in PR Audiveris#122 by Alexander Myltsev (@avm).
@hbitteur
Contributor

hbitteur commented Apr 6, 2022

Hi Peter,

First I need to apologize:

  • I have been very busy these last couple of days helping new contributors on issue #33 (Drum notation?)
  • I'm still not used to PRs. So please be kind with me.

I had a look at your Java code. I'd suggest you limit source line length to 100 characters; it's a minor point, but it would make reading the code much more comfortable for me. I'm sure you can ask your IDE to take care of this automagically.

Then, I checked out your PR to test it locally. I did so via a gh pr checkout 563 in a local dedicated folder.
And then I tried your documented workflow, using the 01-klavier example.

Running test ConversionScoreRegressionTest failed with an exception.
Here is the top of the stack dump:

java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertTrue(Assert.java:53)
	at conversion.score.ConversionScoreRegressionTest.audiverisBatchExport(ConversionScoreRegressionTest.java:170)

Reading the code around line 170, I understand that the program did not find a .xml file, but which one precisely? I opened a file manager, and there is indeed an input.mxl file in the build/resources/test/conversion/score/01-klavier folder.
So what happened? I don't know the full path of the requested file.

I'd suggest you modify the assertTrue(Files.exists(outputMxlFile)); instruction into something which, in case of failure, prints out an understandable message for the end user, such as the full path of the requested file.

I need some help.
Thanks
/Hervé

@PeterGreth
Author

Hi Hervé,

First I need to apologize

you don't have to apologize. I think it's wonderful that you put your time into this project, especially helping others with their needs :)

I had a look at your Java code.

Sure, I set a hard wrap at 100 characters and reformatted my changes again. I initially spent some time configuring IntelliJ to autoformat consistently with the existing code, but deviations still exist here and there.
Since this is entirely your code base, feel free to propose further changes.

I'd suggest you modify [...] asserts

Good point, thanks for the suggestion. All asserts now have a (hopefully understandable) message.
Your missing file is the MusicXML output file that Audiveris should produce when given 01-klavier/input.png as input.

I'm surprised that the test is failing for you. I just successfully tried it with the GitHub CLI and a clean copy of Audiveris. I'm working on Windows with JDK 17.0.2 installed. I used exactly these commands:

gh repo clone https://github.com/Audiveris/audiveris
cd audiveris
gh pr checkout 563
gradlew test --tests conversion.score.ConversionScoreRegressionTest --rerun-tasks

The test command produces the following output:

> Configure project:
targetOS=windows-x86_64
> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
> Task :test
conversion.score.ConversionScoreRegressionTest > testConversionScoreChanged[TestCase 01-klavier] STANDARD_OUT
20:46:05.718 [cachedLow-thread-1] WARN org.audiveris.omr.sheet.ScaleBuilder - No reliable beam height found, guessed value: 12
Estimating resolution as 384
BUILD SUCCESSFUL in 15s
8 actionable tasks: 8 executed

Maybe this helps :)
Thanks
Peter

@hbitteur
Contributor

hbitteur commented Apr 7, 2022

Hi Peter,

I repeated exactly the same commands as yours:

$ git clone https://github.com/Audiveris/audiveris.git pull-request
Cloning into 'pull-request'...
remote: Enumerating objects: 107228, done.
remote: Counting objects: 100% (9793/9793), done.
remote: Compressing objects: 100% (4243/4243), done.
remote: Total 107228 (delta 6445), reused 8120 (delta 5079), pack-reused 97435
Receiving objects: 100% (107228/107228), 276.88 MiB | 10.53 MiB/s, done.
Resolving deltas: 100% (85776/85776), done.
Updating files: 100% (1707/1707), done.

$ cd pull-request/

$ gh pr checkout 563
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 64 (delta 26), reused 63 (delta 25), pack-reused 0
Unpacking objects: 100% (64/64), 518.75 KiB | 1.13 MiB/s, done.
From https://github.com/Audiveris/audiveris
 * [new ref]             refs/pull/563/head -> feature/quality-measurement-java
Updating files: 100% (952/952), done.
Switched to branch 'feature/quality-measurement-java'


A new release of gh is available: 2.4.0 → v2.7.0
https://github.com/cli/cli/releases/tag/v2.7.0


$ ./gradlew test --tests conversion.score.ConversionScoreRegressionTest --rerun-tasks

> Configure project :
targetOS=windows-x86_64

> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :test

conversion.score.ConversionScoreRegressionTest > testConversionScoreChanged[TestCase 01-klavier] STANDARD_OUT
    22:03:55.640 [cachedLow-thread-2] WARN org.audiveris.omr.sheet.ScaleBuilder - No reliable beam height found, guessed value: 12

conversion.score.ConversionScoreRegressionTest > testConversionScoreChanged[TestCase 01-klavier] FAILED
    java.lang.AssertionError at ConversionScoreRegressionTest.java:184
Warning: Parameter not found: enable_new_segsearch
Estimating resolution as 384

1 test completed, 1 failed

> Task :test FAILED

I'm puzzled by this enable_new_segsearch; it seems to come from Tesseract.

Now, the exception is raised on line 184 below:

        assertTrue(String.format("Audiveris batch export seems to have failed. Cannot find " +
                                         "output file '%s'", outputMxlFile),
                   Files.exists(outputMxlFile));

I'm a bit lost.

@hbitteur
Contributor

hbitteur commented Apr 8, 2022

Hold on, I think the problem is on my side.
There is some confusion about where the exported .xml file is, and where it should be.
More on this later.

@hbitteur
Contributor

hbitteur commented Apr 8, 2022

OK, the guilty item is the constant BookManager.useInputBookFolder, whose tool tip reads "Should we store book outputs next to book input?"
This still-undocumented feature allows storing all the outputs of a book (such as its exported .mxl file) as siblings of its input file. In other words, the outputs for a some/path/to/input.pdf file will be stored as some/path/to/input.mxl, regardless of the -output option!

I use that strategy a lot to organize files related to Audiveris issues. For example, say Issue 123 relates to foo.pdf and bar.pdf files, then I create an Issue-123 folder into which I store foo.pdf and bar.pdf. Then all processing files related to these input files will be stored in the same Issue-123 folder.

By default (your case) the switch is set to false, in my case it was set to true. Hence the "file not found" errors.
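To make the two strategies concrete, a rough sketch of the path logic (illustrative only, not the actual BookManager code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the two output-folder strategies described above
// (illustrative, not the actual BookManager logic).
public class OutputPathSketch {

    // With useInputBookFolder=true, outputs land next to the input file;
    // otherwise they go under the folder given by -output.
    static Path exportPath(Path input, Path outputDir, boolean useInputBookFolder) {
        // Strip the extension from the input file name, e.g. "input.pdf" -> "input".
        String name = input.getFileName().toString().replaceFirst("\\.[^.]+$", "");
        Path folder = useInputBookFolder ? input.getParent() : outputDir;
        return folder.resolve(name + ".mxl");
    }

    public static void main(String[] args) {
        Path input = Paths.get("some/path/to/input.pdf");
        Path out = Paths.get("build/test-output");
        System.out.println(exportPath(input, out, true));  // e.g. some/path/to/input.mxl
        System.out.println(exportPath(input, out, false)); // e.g. build/test-output/input.mxl
    }
}
```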

In my defense :-(, I can say that I created issue #555 on this point, last February. And it is still on the TODO list...

So now, I can successfully run your test case.
Well, not that successfully, because if the score increases between expected and actual, the test FAILS. It succeeds only when the scores do not change. I would understand such a failure in case of a decrease, but not for an increase. You are killing any bit of motivation in the developer's head!!! :-)

@hbitteur
Contributor

hbitteur commented Apr 9, 2022

Hi Peter,
I understand that a simple increase in the number of recognized notes is not always evidence of improvement; it can even be the opposite in the case of false positives.

So, we have to be careful when analyzing results.
To this end, I suggest that we don't use temporary folders to hold Audiveris outputs, be they .xml, .mxl or .omr files. If the test tells us that the number of notes has changed, in one direction or the other, it is important to be able to:

  • visually check these outputs after the end of the test
  • perhaps use these outputs as new references for future tests (new "expected-output" files)

Today, the Audiveris project is published with a small number of score examples, located in the data/examples folder.
We could:

  • augment these examples with other representative scores
  • provide the corresponding .mxl artifacts of all the data/examples

More generally, I think your work can be the basis for something we lack today in the Audiveris project: "continuous integration" à la Jenkins. Is it something you are familiar with (or simply interested in) and could take the lead on?

@PeterGreth
Author

Hi Hervé,
sorry for not responding for so long.

So now, I can successfully run your test case.

Perfect! 😃

I suggest that we don't use temporary folders to hold Audiveris outputs

You are right, I understand the benefits. I think it would be possible to store test outputs in subfolders of the build directory. Would this be okay for you?

the guilty item is the constant BookManager.useInputBookFolder

We have to find the output file nevertheless. Is using -option org.audiveris.omr.sheet.BookManager.useInputBookFolder=false the correct way to go (for now)? I probably have to set useSeparateBookFolders=false too, correct?
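To make the question concrete, here is a rough sketch of how the test could assemble the batch arguments (the option names come from this thread; treat the exact flag syntax as an assumption to confirm):

```java
import java.util.List;

// Sketch: assembling Audiveris batch arguments inside the test so that
// outputs always land in a known folder. The option names are taken from
// this discussion; the exact CLI syntax is an assumption to verify.
public class BatchArgsSketch {

    static List<String> batchArgs(String input, String outputDir) {
        return List.of(
                "-batch", "-export",
                "-option", "org.audiveris.omr.sheet.BookManager.useInputBookFolder=false",
                "-option", "org.audiveris.omr.sheet.BookManager.useSeparateBookFolders=false",
                "-output", outputDir,
                input);
    }

    public static void main(String[] args) {
        System.out.println(batchArgs("01-klavier/input.png", "build/test-output"));
    }
}
```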

If the score increases between expected and actual, the test FAILS. You are killing any bit of motivation in developer's head!!! :-)

The conversion scores will certainly go up and down over time. The score for the 01-klavier example might be 15 right now, but after a year of Audiveris development it might, for example, be 30. If the expected score in the test file were then still set to 15, the test would pass after every change, no matter whether the change improved the score or not.
This is why I let the test fail (in my defense: the test prints "Well done, the conversion score increased from xx to yy (diff: zz). Please adapt ... accordingly."). I see, however, that it might be demotivating and cumbersome for the developer to manually copy the new results into the file.
An alternative might be the following: the test fails only if the conversion score decreased. Expected conversion scores are not stored in the source code, but in a csv/json/properties file. If the score improved for some test case, the file is automatically updated with the new score. The developer sees the change in git and can enjoy his success by simply committing the updated file 😛 What do you think about this solution?
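A rough sketch of that rule (using java.util.Properties and a hypothetical score store; nothing here is part of the current PR):

```java
import java.util.Properties;

// Sketch of the proposed rule: fail on decrease, auto-record improvements.
// The Properties object stands in for a hypothetical scores file that the
// developer would commit after an improvement.
public class ExpectedScoresSketch {

    // Updates the stored score in place; throws if the score decreased.
    static void check(Properties expected, String testCase, int actualScore) {
        int expectedScore = Integer.parseInt(expected.getProperty(testCase, "0"));
        if (actualScore < expectedScore) {
            throw new AssertionError("Conversion score for " + testCase
                    + " decreased from " + expectedScore + " to " + actualScore);
        }
        if (actualScore > expectedScore) {
            // Improvement: record the new reference so the developer can
            // simply review and commit the updated file.
            expected.setProperty(testCase, Integer.toString(actualScore));
        }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("01-klavier", "15");
        check(p, "01-klavier", 30);
        System.out.println(p.getProperty("01-klavier")); // prints 30
    }
}
```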

Audiveris project is published with a small number of score examples, located in data/examples folder. We could augment these examples with other representative scores

I think this is a good idea for starting with this kind of test 👍

We could provide the corresponding .mxl artifacts of all the data/examples

The original author (of the equivalent PR in Python) decided to provide an .xml as the expected output. You propose to use .mxl. What would be the advantage? Do you want the expected output files to be stored in data/examples as well?

I think your work can be the basis for something we lack today in Audiveris project: a "continuous integration" à la Jenkins. Is it something you are familiar with (or simply interested in) and could take the lead on?

Yes! 😄 I created a discussion for that: #581

Looking forward to hearing your thoughts 👍

@hbitteur
Contributor

The original author (of the equivalent PR in python) decided to provide an .xml as the expected output. You propose to use .mxl. What would be the advantage? Do you want the expected output files to be stored in data/examples as well?

The .mxl format is merely a compressed version of the original .xml file, plus a manifest file. It is the default exchange format between MusicXML-compliant applications.
And yes, I think keeping the expected output files next to their input files in data/examples would be a good idea.
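For illustration, since .mxl is just a zip archive, it can be handled with plain java.util.zip (a self-contained sketch that builds a minimal in-memory archive and reads the score back; real files follow the MusicXML container layout, with a META-INF/container.xml manifest pointing at the root score file):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch: an .mxl is a zip containing the MusicXML plus a manifest.
public class MxlSketch {

    static byte[] writeMxl(String scoreXml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // Manifest telling readers where the root score file lives.
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(("<container><rootfiles>"
                    + "<rootfile full-path=\"score.xml\"/>"
                    + "</rootfiles></container>").getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("score.xml"));
            zip.write(scoreXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    static String readEntry(byte[] mxl, String name) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(mxl))) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals(name)) {
                    // readAllBytes stops at the end of the current entry.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] mxl = writeMxl("<score-partwise/>");
        System.out.println(readEntry(mxl, "score.xml")); // prints <score-partwise/>
    }
}
```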
