
Benchmark utility

Kilian Brachtendorf edited this page Jan 16, 2019 · 3 revisions

Figuring out which algorithm to use with which settings is a bit tricky. Starting with version 2.0.0, the AlgorithmBenchmarker lets you look directly at your test images and compiles statistics on how the individual algorithms are doing.

Preconditions

The benchmark utility requires JavaFX, which is not an automatically imported dependency of JImageHash, in order to keep the dependency tree as clean as possible. If you did not clone the project but imported JImageHash, and you use a newer JDK that no longer ships JavaFX by default, make sure to explicitly declare the JavaFX dependencies in your pom. https://mvnrepository.com/search?q=javafx

Sample

//Easy setup. Configure a SingleImageMatcher with the algorithms you want to test.

SingleImageMatcher matcher = new SingleImageMatcher();
	
//Naively add a bunch of algorithms using normalized hamming distance with a 40% threshold
matcher.addHashingAlgorithm(new AverageHash(16), 0.4f);
matcher.addHashingAlgorithm(new AverageHash(64), 0.4f);
matcher.addHashingAlgorithm(new PerceptiveHash(16), 0.4f);
....

//Create the benchmarker 
boolean includeSpeedMicrobenchmark = true;
AlgorithmBenchmarker bm = new AlgorithmBenchmarker(matcher,includeSpeedMicrobenchmark);

//Add test images: category label, image file
bm.addTestImages(new TestData(0, new File("src/test/resources/ballon.jpg")));
bm.addTestImages(new TestData(1, new File("src/test/resources/copyright.jpg")));
bm.addTestImages(new TestData(1, new File("src/test/resources/highQuality.jpg")));
bm.addTestImages(new TestData(1, new File("src/test/resources/lowQuality.jpg")));
bm.addTestImages(new TestData(1, new File("src/test/resources/thumbnail.jpg")));

//Enjoy your benchmark
//bm.display(); bm.toConsole();
bm.toFile();

You will get something like this:

[Image: benchmark report]

Note: The report is generated as an HTML document and may either be displayed directly in a JavaFX WebView via the display() call or saved as an .html file via toFile() to be opened in a browser of your choice. Because the WebView does not fully support JavaScript, the chart component only works when the report is viewed in a browser.

Let's run down the table and see if we can interpret the results. Each image is tested against every other image using each of the supplied hashing algorithms. If two images carry the same class label, they are expected to match. Yellow numbers indicate a deviation from the expected behaviour. The table displays the distances (normalized or not, depending on what was specified in the addHashingAlgorithm() method call). Images in the same category are expected to have a distance below the threshold, while images in distinct categories are expected to have a distance above it.
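The comparison rule described above can be sketched in plain Java. This is an illustration only and not JImageHash's implementation: the class `DeviationCheck` and its method names are hypothetical, hashes are assumed to be equal-length bit strings held in a non-negative `BigInteger`, and treating a distance exactly equal to the threshold as a match is an assumption.

```java
import java.math.BigInteger;

/** Illustrative sketch only; not part of JImageHash. */
final class DeviationCheck {

    /** Fraction of differing bits between two hashes (0 = identical, 1 = every bit differs). */
    static double normalizedHammingDistance(BigInteger a, BigInteger b, int bitLength) {
        if (bitLength <= 0) {
            throw new IllegalArgumentException("bitLength must be positive");
        }
        return a.xor(b).bitCount() / (double) bitLength;
    }

    /** True if the outcome contradicts the labels, i.e. a cell the report would highlight. */
    static boolean deviatesFromExpectation(int labelA, int labelB, double distance, double threshold) {
        boolean expectedMatch = labelA == labelB;     // same label: a match is expected
        boolean actualMatch = distance <= threshold;  // assumption: equality counts as a match
        return expectedMatch != actualMatch;
    }

    public static void main(String[] args) {
        BigInteger hashA = new BigInteger("11110000", 2);
        BigInteger hashB = new BigInteger("11111111", 2);
        double distance = normalizedHammingDistance(hashA, hashB, 8);
        System.out.println("distance = " + distance);
        System.out.println("deviation = " + deviatesFromExpectation(1, 1, distance, 0.4));
    }
}
```

Normalizing by the hash length is what makes a single threshold such as 0.4 comparable across hashes of different bit resolutions.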

  • The avg match column displays the average distance between all images in the same category (expected matches).
  • The avg distinct column shows the average distance between all images that are not in the same category (expected distinct images).
The actual values of these two cells are not really important. It does not matter whether the avg match is .2 or .5, as long as the delta between match and distinct is big enough to differentiate between categories.
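To make the two averages concrete, here is a minimal sketch of how they could be computed from a matrix of pairwise distances. The class `CategoryAverages` is a hypothetical name and not part of JImageHash. Counting each unordered pair once and skipping self-comparisons are assumptions; the benchmarker may aggregate differently.

```java
/** Illustrative sketch only; not part of JImageHash. */
final class CategoryAverages {

    /**
     * Returns {avgMatch, avgDistinct} over all unordered image pairs (i < j).
     * labels[i] is the category of image i, distances[i][j] the distance between images i and j.
     * A group without any pair yields NaN.
     */
    static double[] averages(int[] labels, double[][] distances) {
        double matchSum = 0;
        double distinctSum = 0;
        int matchCount = 0;
        int distinctCount = 0;
        for (int i = 0; i < labels.length; i++) {
            for (int j = i + 1; j < labels.length; j++) {
                if (labels[i] == labels[j]) {
                    matchSum += distances[i][j];
                    matchCount++;
                } else {
                    distinctSum += distances[i][j];
                    distinctCount++;
                }
            }
        }
        double avgMatch = matchCount == 0 ? Double.NaN : matchSum / matchCount;
        double avgDistinct = distinctCount == 0 ? Double.NaN : distinctSum / distinctCount;
        return new double[] { avgMatch, avgDistinct };
    }

    public static void main(String[] args) {
        int[] labels = { 0, 1, 1 };
        double[][] distances = {
            { 0.0, 0.6, 0.8 },
            { 0.6, 0.0, 0.2 },
            { 0.8, 0.2, 0.0 }
        };
        double[] result = averages(labels, distances);
        System.out.println("avg match = " + result[0] + ", avg distinct = " + result[1]);
        System.out.println("delta = " + (result[1] - result[0]));
    }
}
```

The delta printed at the end is the quantity that matters: the larger it is, the easier it becomes to place a threshold between the two groups.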

The perceptive hash shows a great differentiation between matches and distinct images. We can see that the difference hash (32, triple precision) is struggling. When using this algorithm, you may want to alter the threshold to split the two groups of images.

While the average values are a great indication, what is really important are the min and max values. If they do not overlap, a perfect categorization of your test images is possible by simply picking any value in between these two bounds. At the bottom of the table you can find a confusion matrix, which allows you to calculate recall or any other metric as desired.
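The min/max reasoning can be expressed as a small check. The following is a sketch with hypothetical names (`ThresholdPicker`, `separatingThreshold`), not a JImageHash API. It assumes you have copied the distances of the expected matches and of the expected distinct pairs out of the report, and it picks the midpoint, although any value between the two bounds would separate the test images equally well.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

/** Illustrative sketch only; not part of JImageHash. */
final class ThresholdPicker {

    /**
     * Returns a threshold that separates both groups perfectly, or an empty result
     * if the largest match distance is not strictly below the smallest distinct distance.
     */
    static OptionalDouble separatingThreshold(double[] matchDistances, double[] distinctDistances) {
        if (matchDistances.length == 0 || distinctDistances.length == 0) {
            throw new IllegalArgumentException("both groups need at least one distance");
        }
        double maxMatch = Arrays.stream(matchDistances).max().getAsDouble();
        double minDistinct = Arrays.stream(distinctDistances).min().getAsDouble();
        if (maxMatch >= minDistinct) {
            return OptionalDouble.empty(); // ranges overlap: no perfect split exists
        }
        return OptionalDouble.of((maxMatch + minDistinct) / 2); // midpoint is one valid choice
    }

    public static void main(String[] args) {
        double[] matches = { 0.1, 0.2, 0.3 };
        double[] distinct = { 0.5, 0.7 };
        System.out.println("threshold = " + separatingThreshold(matches, distinct));
    }
}
```

A threshold found this way is only perfect for the images in the benchmark, so it is worth testing with images that resemble your real data.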

Precision indicates how likely it is that two images considered a match actually are a match. Note that a weak precision value can be improved by chaining algorithms.
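For reference, precision and recall follow directly from the counts in a confusion matrix. The sketch below uses the standard definitions; the class `MatchMetrics` and its parameter names are hypothetical and not part of JImageHash, and the counts in `main` are made-up example values, not benchmark results.

```java
/** Illustrative sketch only; not part of JImageHash. */
final class MatchMetrics {

    /** Of all pairs reported as a match, the fraction that really belong to the same category. */
    static double precision(int truePositives, int falsePositives) {
        int reportedMatches = truePositives + falsePositives;
        return reportedMatches == 0 ? Double.NaN : (double) truePositives / reportedMatches;
    }

    /** Of all pairs that really belong to the same category, the fraction reported as a match. */
    static double recall(int truePositives, int falseNegatives) {
        int actualMatches = truePositives + falseNegatives;
        return actualMatches == 0 ? Double.NaN : (double) truePositives / actualMatches;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        int truePositives = 8;
        int falsePositives = 2;
        int falseNegatives = 8;
        System.out.println("precision = " + precision(truePositives, falsePositives));
        System.out.println("recall = " + recall(truePositives, falseNegatives));
    }
}
```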

Let's apply the benchmark to a different set of images:

	//Running the test with your expected type of images is important!
	bm.addTestImages(new TestData(0, new File("src/test/resources/ballon.jpg")));

	//Rotated images
	bm.addTestImages(new TestData(2, new File("src/test/resources/Lenna.png")));
	bm.addTestImages(new TestData(2, new File("src/test/resources/Lenna90.png")));
	bm.addTestImages(new TestData(2, new File("src/test/resources/Lenna180.png")));
	bm.addTestImages(new TestData(2, new File("src/test/resources/Lenna270.png")));

[Image: benchmark report for the rotated images]

Suddenly a lot of the other algorithms fail, since they are not robust against rotational attacks. If the threshold of the RotPHash algorithm were adjusted to .1, it would be able to identify all images perfectly.
