Inspired by the well-known MNIST public database (60,000 labelled images of handwritten digits), we see the need for a widely recognized and representative data set to support the development of applications in the specific domain of Optical Music Recognition (OMR).
Such a reference should provide:
- OMR samples for the training and testing of symbol classifiers
- Ground-truth material for the evaluation or comparison of OMR engines
Ultimately, once data structuring and content are sufficiently validated, we think this reference should preferably be hosted by the International Music Score Library Project (IMSLP).
Meanwhile, the purpose of this omr-dataset GitHub repository is to gather the material used to build preliminary versions of the target reference.
This project is built with Gradle, and can be driven from an IDE or from the command line.
[NOTE: Noise addition tools are not yet included in this Gradle build]
From the command line, for a full rebuild, use:
gradle clean build
To just display usage rules, use:
This will display:
    Syntax:
        [OPTIONS] -- [INPUT_FILES]

    @file:
        Content to be extended in line

    Options:
        -clean             : Cleans up output
        -controls          : Generates control images
        -features          : Generates .csv and .dat files
        -help              : Displays general help then stops
        -mistakes          : Saves mistake images
        -model <.zip file> : Defines path to model
        -names             : Prints all possible symbol names
        -nones             : Generates none symbols
        -output <folder>   : Defines output directory
        -subimages         : Generates subimages
        -training          : Trains classifier on features

    Input file extensions:
        .xml: annotations file
To clean up output, use:
gradle run -PcmdLineArgs="-output,data/output,-clean"
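As the command above suggests, application arguments are passed as a single comma-separated value of the `cmdLineArgs` project property. The sketch below is one possible way to build that value from ordinary shell arguments. The helper names `join_args` and `omr_cmd` are hypothetical and not part of this repository, and the helper only prints the Gradle command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this repository.

# Join all arguments with commas. The body runs in a subshell ( ... )
# so that the IFS change does not leak into the caller.
join_args() (
    IFS=','
    printf '%s' "$*"
)

# Print (do not run) the Gradle command for the given application arguments.
omr_cmd() {
    printf 'gradle run -PcmdLineArgs="%s"\n' "$(join_args "$@")"
}

omr_cmd -output data/output -clean
# prints: gradle run -PcmdLineArgs="-output,data/output,-clean"
```

Note that this simple join assumes no argument itself contains a comma.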
To generate features, with all options, using input from
gradle run -PcmdLineArgs="-output,data/output,-features,-nones,-controls,-subimages,--,data/input-images"
To launch training on generated features, while saving mistaken images, and targeting a specific model file, use:
gradle run -PcmdLineArgs="-output,data/output,-training,-mistakes,-model,data/patch-classifier.zip"
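The three commands above (clean, generate features, train) can be chained into a single script. The following is a minimal sketch, not a script shipped with this repository: the `run_step` and `pipeline` functions and the `DRY_RUN` variable are assumptions introduced for illustration. By default it only prints the commands; set `DRY_RUN=0` to actually invoke Gradle.

```shell
#!/bin/sh
# Hypothetical wrapper around the commands listed in this README.

# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Run (or print) one application invocation; $1 is the comma-separated argument list.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'gradle run -PcmdLineArgs="%s"\n' "$1"
    else
        gradle run -PcmdLineArgs="$1"
    fi
}

# Clean, generate features, then train; '&&' stops at the first failing step.
pipeline() {
    run_step "-output,data/output,-clean" &&
    run_step "-output,data/output,-features,-nones,-controls,-subimages,--,data/input-images" &&
    run_step "-output,data/output,-training,-mistakes,-model,data/patch-classifier.zip"
}

pipeline
```

Chaining with `&&` means training is not started if feature generation fails, which avoids training on stale or partial output.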
Remark: the training task takes about 15 minutes when run on the toy example.
To monitor the neural network while it is being trained, open http://localhost:9000 in a browser.
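Before opening the browser, it can be handy to check from the command line that the monitoring page is answering. The sketch below assumes `curl` is installed and that the page is served only while training is running; the function names `monitor_is_up` and `wait_for_monitor` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this repository. Requires curl.

# Succeeds if something answers at the given URL within 2 seconds.
monitor_is_up() {
    curl --silent --output /dev/null --max-time 2 "${1:-http://localhost:9000}"
}

# Poll the URL up to $2 times (default 5), one second apart.
# The body runs in a subshell, so 'exit' only ends this function.
wait_for_monitor() (
    url="${1:-http://localhost:9000}"
    tries="${2:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if monitor_is_up "$url"; then
            echo "monitor reachable at $url"
            exit 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "monitor not reachable at $url" >&2
    exit 1
)
```

For example, `wait_for_monitor http://localhost:9000 30` polls for roughly half a minute after training has been launched in another terminal.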
See the related wiki for more details.