Music Object Detector with TensorFlow


Music Object Detector

This is the repository for the fast and reliable music symbol detector with deep learning, based on the Tensorflow Object Detection API.

If you want to try out the full-page detection on your own images, you can try it online in the DIVAServices Spotlight.

The scientific reasoning can be found in this scientific article. The detailed results for various combinations of object-detector, feature-extractor, etc. can be found in this spreadsheet.

Music object detection in image crops

If you are interested in the previous work on cropped images like these, presented at DAS 2018, please refer to the corresponding release.

[Figure: original image (left) and image with detected objects, showing detection results as training progresses (right)]


Preparing the application

This repository contains several scripts that can be used independently of each other. Before running them, make sure that you have the necessary requirements installed.

Install required libraries

  • Python 3.6
  • Tensorflow 1.8.0 (or optionally tensorflow-gpu 1.8.0)
  • pycocotools (more information)
    • On Linux, run pip install "git+https://github.com/waleedka/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI" (the quotes keep the shell from interpreting the & character)
    • On Windows, run pip install git+https://github.com/philferriere/cocoapi.git#egg=pycocotools^&subdirectory=PythonAPI
  • Some libraries, as specified in requirements.txt
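
As a quick sanity check before running any of the scripts, the following sketch reports which of the listed packages the current interpreter cannot find. The helper name `missing_modules` is hypothetical and not part of this repository, and the import names `tensorflow` and `pycocotools` are assumed to match the packages listed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names that the current interpreter cannot locate."""
    # find_spec only looks a top-level module up; it does not import it.
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names assumed for the requirements listed above.
required = ["tensorflow", "pycocotools"]
missing = missing_modules(required)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All checked packages were found.")
```

This only checks that the packages can be located; it does not verify their versions.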

Build Protobuf files on Linux

cd research
protoc object_detection/protos/*.proto --python_out=.

Build Protobuf files on Windows

Run DownloadAndBuildProtocolBuffers.ps1 to automate this step, or build the protobufs manually by first installing protocol buffers and then running:

cd research
protoc object_detection/protos/*.proto --python_out=.

Note that you have to use version 3.4.0 of protocol buffers because of a bug in 3.5.0 and 3.5.1.
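
To check which version is installed before building, a small sketch along the following lines may help. It assumes that `protoc --version` prints a line such as `libprotoc 3.4.0`; the function names and the `AFFECTED_VERSIONS` constant are hypothetical and not part of this repository.

```python
import re
import subprocess


def parse_protoc_version(version_output):
    """Extract (major, minor, patch) from text such as 'libprotoc 3.4.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("No version number found in: " + repr(version_output))
    return tuple(int(part) for part in match.groups())


def installed_protoc_version():
    """Ask the protoc executable on the PATH for its version."""
    output = subprocess.check_output(["protoc", "--version"])
    return parse_protoc_version(output.decode("utf-8"))


# Versions that the note above reports as affected by the bug.
AFFECTED_VERSIONS = {(3, 5, 0), (3, 5, 1)}

print(parse_protoc_version("libprotoc 3.4.0") in AFFECTED_VERSIONS)  # False
print(parse_protoc_version("libprotoc 3.5.1") in AFFECTED_VERSIONS)  # True
```

Calling `installed_protoc_version()` requires protoc to be installed and on the PATH; the two print statements only parse example strings.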

Dataset

Run PrepareDatasetsForTensorflow.ps1 to automate this step on Windows or manually prepare the datasets with the following steps (on Linux).

Run the following scripts to reproduce the dataset locally:

# cd into MusicObjectDetector folder
python download_muscima_dataset.py
python prepare_muscima_annotations.py
python dataset_splitter.py --source_directory=data/muscima_pp_cropped_images_with_stafflines --destination_directory=data/training_validation_test_with_stafflines

These scripts will download the datasets automatically, prepare the annotations and split the images into three reproducible parts for training, validation and test.
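
What makes such a split reproducible can be illustrated with a short sketch: sort the file names, shuffle them with a fixed seed, and cut the list at fixed fractions. The function name, the fractions and the seed below are illustrative assumptions and are not taken from dataset_splitter.py, which may split differently (for example per writer).

```python
import random


def split_reproducibly(file_names, validation_fraction=0.1,
                       test_fraction=0.1, seed=0):
    """Split file names into training, validation and test lists."""
    # Sort first so the result does not depend on directory listing order.
    names = sorted(file_names)
    # A dedicated, seeded generator yields the same shuffle on every run.
    random.Random(seed).shuffle(names)

    number_of_validation = int(len(names) * validation_fraction)
    number_of_test = int(len(names) * test_fraction)

    validation = names[:number_of_validation]
    test = names[number_of_validation:number_of_validation + number_of_test]
    training = names[number_of_validation + number_of_test:]
    return training, validation, test


images = ["image_{0:03d}.png".format(i) for i in range(100)]
training, validation, test = split_reproducibly(images)
print(len(training), len(validation), len(test))  # 80 10 10
```

Because the seed is fixed and the input is sorted, repeated runs on the same set of files produce the same three parts.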

Now you can create the Tensorflow Records that are required for actually running the training.

python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=training --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/training.record --label_map_path=mapping_all_classes.txt
python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=validation --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/validation.record --label_map_path=mapping_all_classes.txt
python create_muscima_tf_record.py --data_dir=data/training_validation_test_with_stafflines --set=test --annotations_dir=Annotations --output_path=data/all_classes_with_staff_lines_writer_independent_split/test.record --label_map_path=mapping_all_classes.txt
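
Since the three commands differ only in the name of the set, they can also be generated in a loop. The following sketch builds the same argument lists as above; the function name `build_tf_record_commands` is hypothetical, and the commands are only printed, not executed.

```python
def build_tf_record_commands(
        data_dir="data/training_validation_test_with_stafflines",
        output_dir="data/all_classes_with_staff_lines_writer_independent_split",
        label_map_path="mapping_all_classes.txt"):
    """Return one argument list per dataset part."""
    commands = []
    for dataset in ("training", "validation", "test"):
        commands.append([
            "python", "create_muscima_tf_record.py",
            "--data_dir={0}".format(data_dir),
            "--set={0}".format(dataset),
            "--annotations_dir=Annotations",
            "--output_path={0}/{1}.record".format(output_dir, dataset),
            "--label_map_path={0}".format(label_map_path),
        ])
    return commands


for command in build_tf_record_commands():
    # To execute instead of print: subprocess.run(command, check=True)
    print(" ".join(command))
```

Passing a different `label_map_path` switches all three records to another mapping at once.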

By providing a different mapping, e.g. mapping_71_classes.txt, you can reduce the set of classes that you want to be able to detect.

Running the training

Adding source to Python path

There are two ways of making sure that the Python scripts discover the required source packages:

Permanently linking the source code as pip package

To permanently link the source code of the project so that Python can find it, install the two packages in editable mode by running:

# From tensorflow/models/research/
pip install -e .
cd slim
# From inside tensorflow/models/research/slim
pip install -e .

Temporarily adding the source code before starting the training

Make sure you have all required folders appended to the Python path. This can be done temporarily inside a shell, before calling any training scripts, with the following commands:

For Linux:

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

For Windows (Powershell):

$pathToGitRoot = "[GIT_ROOT]"
$pathToSourceRoot = "$($pathToGitRoot)/object_detection"
$env:PYTHONPATH = "$($pathToGitRoot);$($pathToSourceRoot);$($pathToGitRoot)/slim"
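
The same effect can be achieved for a single interpreter session from within Python by extending `sys.path`. This is only a sketch: the helper name `add_research_paths` is hypothetical, and it assumes that the folder passed in is the research folder that contains the slim subfolder.

```python
import os
import sys


def add_research_paths(research_root):
    """Prepend research_root and research_root/slim to sys.path if missing."""
    added = []
    for path in (research_root, os.path.join(research_root, "slim")):
        absolute = os.path.abspath(path)
        if absolute not in sys.path:
            sys.path.insert(0, absolute)
            added.append(absolute)
    return added


# Example call, using the current working directory as the research folder.
print(add_research_paths(os.getcwd()))
```

Like the shell variants above, this change is lost when the Python process exits.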

Adjusting paths

Before running the training, you need to adjust the paths to match your system:

  • in the configuration that you want to run, e.g. configurations/faster_rcnn_inception_resnet_v2_atrous_muscima_pretrained_reduced_classes.config
  • in the PowerShell scripts in the training_scripts folder, if you use them.

Run the actual training either by using the pre-defined PowerShell scripts in the training_scripts folder or by directly calling:

# Start the training
python [GIT_ROOT]/research/object_detection/train.py --logtostderr --pipeline_config_path="[GIT_ROOT]/MusicObjectDetector/configurations/[SELECTED_CONFIG].config" --train_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-train"

# Start the validation
python [GIT_ROOT]/research/object_detection/eval.py --logtostderr --pipeline_config_path="[GIT_ROOT]/MusicObjectDetector/configurations/[SELECTED_CONFIG].config" --checkpoint_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-train" --eval_dir="[GIT_ROOT]/MusicObjectDetector/data/checkpoints-[SELECTED_CONFIG]-validate"

A few remarks: The two scripts can and should be run at the same time to get a live evaluation during the training. The values can be visualized by calling tensorboard --logdir=[GIT_ROOT]/MusicObjectDetector/data.
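
Running both scripts side by side can also be scripted. The sketch below assembles the two commands shown above from a repository root and a configuration name, and starts them as parallel processes. The function names are hypothetical, and `/path/to/repo` and `my_config` are placeholders that you would replace with your own values.

```python
import subprocess


def build_commands(git_root, selected_config):
    """Return the training and validation commands as argument lists."""
    detector = "{0}/MusicObjectDetector".format(git_root)
    config = "{0}/configurations/{1}.config".format(detector, selected_config)
    train_dir = "{0}/data/checkpoints-{1}-train".format(detector, selected_config)
    eval_dir = "{0}/data/checkpoints-{1}-validate".format(detector, selected_config)
    train = ["python",
             "{0}/research/object_detection/train.py".format(git_root),
             "--logtostderr",
             "--pipeline_config_path={0}".format(config),
             "--train_dir={0}".format(train_dir)]
    evaluate = ["python",
                "{0}/research/object_detection/eval.py".format(git_root),
                "--logtostderr",
                "--pipeline_config_path={0}".format(config),
                "--checkpoint_dir={0}".format(train_dir),
                "--eval_dir={0}".format(eval_dir)]
    return train, evaluate


def run_training_and_validation(git_root, selected_config):
    """Start both processes at the same time and wait for them to exit."""
    processes = [subprocess.Popen(command)
                 for command in build_commands(git_root, selected_config)]
    return [process.wait() for process in processes]


# Only print the commands here; nothing is started.
for command in build_commands("/path/to/repo", "my_config"):
    print(" ".join(command))
```

If both processes share one graphics card, the memory restriction described in the next section is likely to be needed.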

Restricting GPU memory usage

Note that Tensorflow usually allocates the entire memory of your graphics card for the training. In order to run both training and validation at the same time, you might have to restrict Tensorflow from doing so by opening train.py and eval.py and uncommenting the respective (prepared) lines in the main function, e.g.:

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Training with pre-trained weights

It is recommended that you use pre-trained weights for known networks to speed up training and improve overall results. To do so, head over to the Tensorflow detection model zoo, then download and unzip the respective trained model, e.g. faster_rcnn_inception_resnet_v2_atrous_coco for reproducing the best results that we obtained. The path to the unzipped files must be specified in the train_config section of the configuration, e.g.:

train_config: {
  fine_tune_checkpoint: "C:/Users/Alex/Repositories/MusicObjectDetector-TF/MusicObjectDetector/data/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08/model.ckpt"
  from_detection_checkpoint: true
}

Note that inside that folder there is no actual file called model.ckpt, but multiple files called model.ckpt.[something].
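
Because the configured value is a prefix rather than a file, a typo in the path is easy to miss. The following sketch lists the files that belong to a given prefix so you can confirm that the path resolves to something. The function name `checkpoint_files` and the example path are hypothetical.

```python
import glob


def checkpoint_files(checkpoint_prefix):
    """Return all files named <checkpoint_prefix>.<something>, sorted."""
    # glob.escape protects special characters such as [ or ] in the path.
    return sorted(glob.glob(glob.escape(checkpoint_prefix) + ".*"))


# Hypothetical location of the unzipped model; adjust it to your system.
prefix = "data/faster_rcnn_inception_resnet_v2_atrous_coco_2017_11_08/model.ckpt"
files = checkpoint_files(prefix)
if files:
    print("Found checkpoint files:", files)
else:
    print("No files found for prefix:", prefix)
```

An empty result suggests that the value of fine_tune_checkpoint points to the wrong folder.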

Dimension clustering

To optimize the performance of the detector, we adopted the dimension clustering algorithm proposed in the YOLO 9000 paper. While preparing the dataset, the muscima_image_cutter.py script created a file called Annotations.csv and a folder called Annotations. Both contain the same annotations, but in different formats: the csv-file contains all annotations in a plain list, whereas the Annotations folder contains one xml-file per image, complying with the format used for the Pascal VOC project.
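
To give an idea of what such an xml-file looks like and how it can be read, here is a minimal sketch that parses bounding boxes from a Pascal VOC style annotation. The embedded example document, its class name and its coordinates are made up for illustration; the files generated by this repository may contain additional elements.

```python
import xml.etree.ElementTree as ET

# Made-up annotation following the common Pascal VOC layout.
EXAMPLE_XML = """<annotation>
  <filename>example.png</filename>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>notehead-full</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>30</xmax><ymax>36</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_annotations(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for annotated_object in root.iter("object"):
        box = annotated_object.find("bndbox")
        # int(float(...)) also accepts coordinates written as "10.0".
        coordinates = tuple(int(float(box.findtext(tag)))
                            for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((annotated_object.findtext("name"),) + coordinates)
    return boxes


print(read_voc_annotations(EXAMPLE_XML))
```

For a file on disk, the text can be read with `open(path).read()` and passed to the same function.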

To perform dimension clustering on the cropped images, run the following scripts:

python generate_muscima_statistics.py
python muscima_dimension_clustering.py

The first script loads all annotations and creates four csv-files containing the dimensions of each annotation from all images, including their sizes relative to the entire image. The second script loads those statistics and performs dimension clustering, using a k-means algorithm on the relative dimensions of the annotations.
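
The idea behind dimension clustering can be sketched in a few lines of plain Python: boxes are reduced to (width, height) pairs, and k-means groups them using 1 − IoU as the distance, as proposed in the YOLO 9000 paper. This is an illustration under assumptions, not the code of muscima_dimension_clustering.py; the function names, the seed, the initialization and the example boxes are all hypothetical.

```python
import random


def iou(box, centroid):
    """IoU of two (width, height) boxes that share one corner."""
    intersection = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - intersection
    return intersection / union


def cluster_dimensions(boxes, k, iterations=100, seed=0):
    """Run k-means on (width, height) pairs and return sorted centroids."""
    centroids = random.Random(seed).sample(boxes, k)
    for _ in range(iterations):
        # Assign each box to the centroid with the highest IoU,
        # which is the same as the smallest distance 1 - IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou(box, centroids[i]))
            clusters[best].append(box)
        # Move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = []
        for index, cluster in enumerate(clusters):
            if cluster:
                mean_width = sum(b[0] for b in cluster) / len(cluster)
                mean_height = sum(b[1] for b in cluster) / len(cluster)
                new_centroids.append((mean_width, mean_height))
            else:
                new_centroids.append(centroids[index])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


# Made-up relative dimensions: three small and three large boxes.
boxes = [(0.010, 0.020), (0.012, 0.018), (0.011, 0.021),
         (0.200, 0.100), (0.220, 0.110), (0.210, 0.090)]
print(cluster_dimensions(boxes, k=2))
```

The resulting centroids can serve as candidate anchor box dimensions for the detector configuration.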

Inference

Standalone inference

We recommend checking out the demo folder first, which provides a self-contained script for performing object detection and does not depend on this library. It comes with a pre-trained model for convenience and a simple text output for interoperability with other applications.

Inference from within this library

If you have trained a model yourself, this document describes how to prepare it. Basically, you just run export_inference_graph.py with appropriate arguments, or freeze_model.ps1 after setting the paths accordingly. Alternatively, a pre-trained model can be downloaded from here: 2018-05-15_faster-rcnn_inception-resnet-v2_2000-proposals_full-page-detection_muscima-pp.pb.

Once you have the frozen model, you can perform inference on a single image by running

# From [GIT_ROOT]/MusicObjectDetector
python inference_over_image.py \
    --inference_graph ${frozen_inference_graph.pb} \
    --label_map mapping.txt \
    --input_image ${IMAGE_TO_BE_CLASSIFIED} \
    --output_image image_with_detection.jpg

or for an entire directory of images by running

# From [GIT_ROOT]/MusicObjectDetector
python inference_over_directory.py \
    --inference_graph ${frozen_inference_graph.pb} \
    --label_map mapping.txt \
    --input_directory ${DIRECTORY_TO_IMAGES} \
    --output_directory ${OUTPUT_DIRECTORY}

License

Published under the MIT License.

Copyright (c) 2018 Alexander Pacha, TU Wien and Kwon-Young Choi

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.