Support for transparent images #38

Closed
fbarrella opened this issue May 20, 2019 · 10 comments
Labels: bug, enhancement

Comments

@fbarrella

I'm having an unusual problem in my project where I'm getting the same hash for two different images. These are the images in question:
citroen
mini
The project simply uses the hash(BufferedImage image) method from the API. I'm not sure whether there are other ways to do this, but is there any possible solution to this problem? Thanks in advance!

@KilianB
Owner

KilianB commented May 20, 2019

Hash collisions happen by design, because arbitrarily many images are mapped to a fixed-length hash.

For example, Java's default hashCode implementation maps the string Hash to the numeric value 2241838. Looking only at 4-character strings, every one of the following has exactly the same hash code:

  • Hc5h
  • ID5h
  • IBsh
  • HbTh
  • ICTh
  • Hc6I
  • ID6I
  • HatI
  • IBtI
  • HbUI
  • ICUI
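
The list above is easy to check with plain JDK code. The class and method names below are made up for illustration; only String.hashCode() is the real API being exercised.

```java
import java.util.List;

public class HashCollisionDemo {
    // "Hash" plus the eleven strings from the list above.
    static final List<String> CANDIDATES = List.of(
        "Hash", "Hc5h", "ID5h", "IBsh", "HbTh", "ICTh",
        "Hc6I", "ID6I", "HatI", "IBtI", "HbUI", "ICUI");

    // True if every candidate has the same hashCode as the first entry.
    static boolean allCollide() {
        int expected = CANDIDATES.get(0).hashCode();
        return CANDIDATES.stream().allMatch(s -> s.hashCode() == expected);
    }

    public static void main(String[] args) {
        for (String s : CANDIDATES) {
            System.out.println(s + " -> " + s.hashCode());
        }
        System.out.println("all collide: " + allCollide());
    }
}
```

String.hashCode() is specified as s[0]*31^(n-1) + ... + s[n-1], so lowering one character by 1 and raising the next by 31 leaves the sum unchanged. That is how collisions like these can be constructed on purpose.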

This is also true for images. A solution would be to use a secondary hash function that looks at different features of the image (e.g. average hash and perceptive hash) to confirm your classification. If they agree, the image is most likely a duplicate.

The chain algorithm example shows how you might approach this:

https://github.com/KilianB/JImageHash/blob/master/src/main/java/com/github/kilianB/examples/ChainAlgorithms.java
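
Independently of the linked example, the "confirm with a second hash" idea can be sketched with the JDK alone. This is not JImageHash's implementation: the class name, the two simplified hashes (an 8x8 average hash and a 9x8 difference hash) and the maxBits parameter are all assumptions made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class TwoHashCheck {
    // Scale to w x h and return integer gray values (Rec. 601 weights).
    // Drawing onto TYPE_INT_RGB discards alpha, so transparency is NOT handled here.
    static int[][] grayGrid(BufferedImage src, int w, int h) {
        BufferedImage small = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = small.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = small.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, gr = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y][x] = (299 * r + 587 * gr + 114 * b) / 1000;
            }
        }
        return gray;
    }

    // 64-bit hash: one bit per cell, set when the cell is brighter than the mean.
    static long averageHash(BufferedImage img) {
        int[][] g = grayGrid(img, 8, 8);
        long sum = 0;
        for (int[] row : g) for (int v : row) sum += v;
        double mean = sum / 64.0;
        long bits = 0;
        for (int[] row : g) for (int v : row) bits = (bits << 1) | (v > mean ? 1 : 0);
        return bits;
    }

    // 64-bit hash: one bit per horizontal neighbor pair, set when brightness increases.
    static long differenceHash(BufferedImage img) {
        int[][] g = grayGrid(img, 9, 8);
        long bits = 0;
        for (int[] row : g) {
            for (int x = 0; x < 8; x++) bits = (bits << 1) | (row[x] < row[x + 1] ? 1 : 0);
        }
        return bits;
    }

    // Report a likely duplicate only if BOTH hashes are within maxBits of each other.
    static boolean likelyDuplicate(BufferedImage a, BufferedImage b, int maxBits) {
        int d1 = Long.bitCount(averageHash(a) ^ averageHash(b));
        int d2 = Long.bitCount(differenceHash(a) ^ differenceHash(b));
        return d1 <= maxBits && d2 <= maxBits;
    }
}
```

Requiring both distances to be small means a collision in one hash alone no longer counts as a match. It does not help when both hashes are fed identical pixel data, though.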

@fbarrella
Author

fbarrella commented May 23, 2019

Thanks!!! Sadly, I have to say I'm having a hard time finding a solution to this. Oddly enough, I've tried several of the available hashing methods, but I always end up with the same digits for both images. Even changing the "bit resolution" didn't work. What intrigues me the most is that, as you can see from the images uploaded in my previous comment, they are indeed different images. I would love to find a way to make the hash methods show that as well, hahaha. Are there any other possibilities? Anyway, thank you very much for the help!

@fbarrella
Author

fbarrella commented May 23, 2019

I'm also going to leave here the actual piece of code I'm using to test the similarity between the images! Maybe I'm doing something wrong that I can't see right now! I would appreciate some insights!

@PostMapping(value = "/v1.0/hashTest")
public ResponseEntity<Map<String, String>> getImageHashTest(@RequestParam(name = "file") MultipartFile file,
                                                            @RequestParam(name = "file2") MultipartFile file2) {
    SingleImageMatcher matcher = new SingleImageMatcher();
    Map<String, String> hashMap = new HashMap<>();

    try {
        // Decode both uploads into BufferedImages
        BufferedImage image = ImageIO.read(file.getInputStream());
        BufferedImage image2 = ImageIO.read(file2.getInputStream());

        // Register several algorithms and bit resolutions, each with a threshold of 0.4
        matcher.addHashingAlgorithm(new AverageHash(8), 0.4);
        matcher.addHashingAlgorithm(new AverageHash(32), 0.4);
        matcher.addHashingAlgorithm(new AverageHash(64), 0.4);

        matcher.addHashingAlgorithm(new PerceptiveHash(32), 0.4);
        matcher.addHashingAlgorithm(new PerceptiveHash(64), 0.4);

        matcher.addHashingAlgorithm(new MedianHash(32), 0.4);
        matcher.addHashingAlgorithm(new MedianHash(64), 0.4);

        matcher.addHashingAlgorithm(new DifferenceHash(64, DifferenceHash.Precision.Simple), 0.4);
        matcher.addHashingAlgorithm(new DifferenceHash(32, DifferenceHash.Precision.Triple), 0.4);

        // Report the matcher's verdict as "yes" or "no"
        hashMap.put("similarity", matcher.checkSimilarity(image, image2) ? "yes" : "no");

        return ResponseEntity.ok(hashMap);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Only reached if reading one of the uploads failed
    return ResponseEntity.noContent().build();
}

@KilianB
Owner

KilianB commented May 23, 2019

I did some testing, and those images do indeed produce the same hash no matter what you try. Upon further investigation, the issue arises from the alpha channel. The black parts of the image are solid black; the white parts simply have an opacity of 0.
The program computes the luminosity values from the RGB values only, and those are the same for every pixel.
Are there any guidelines on how transparency should be treated when calculating Y in the YCbCr color model? I assume that for this trivial case an alpha of 0 can be treated as white, but this isn't correct for every use case.
For now, an ugly workaround would be to replace the transparent pixels with white until I can figure out how to compute luminosity correctly. (Is there a formula for handling alpha? Always assume white?)
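
On the formula question: the usual way to give a partly transparent pixel a definite color is "source over" compositing against a chosen background, result = alpha * foreground + (1 - alpha) * background per channel, and only then computing luma. The sketch below assumes non-premultiplied ARGB and the Rec. 601 luma weights; the class and method names are hypothetical, and which weights the library actually uses is not stated in this thread.

```java
import java.awt.Color;

public class AlphaLuma {
    // Rec. 601 luma from the RGB channels only; the alpha byte is ignored.
    static double luma(int argb) {
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Composite a non-premultiplied ARGB pixel over an opaque background,
    // then compute luma from the blended channels.
    static double lumaOver(int argb, Color background) {
        double a = ((argb >>> 24) & 0xFF) / 255.0;
        int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
        double rr = a * r + (1 - a) * background.getRed();
        double gg = a * g + (1 - a) * background.getGreen();
        double bb = a * b + (1 - a) * background.getBlue();
        return 0.299 * rr + 0.587 * gg + 0.114 * bb;
    }

    public static void main(String[] args) {
        int opaqueBlack = 0xFF000000;      // alpha 255, RGB 0
        int transparentBlack = 0x00000000; // alpha 0, RGB 0
        // Identical when alpha is ignored, which is the situation described above.
        System.out.println(luma(opaqueBlack) + " vs " + luma(transparentBlack));
        // Distinct once the pixels are composited over white.
        System.out.println(lumaOver(opaqueBlack, Color.WHITE) + " vs "
                + lumaOver(transparentBlack, Color.WHITE));
    }
}
```

The background color remains a free choice. The formula only makes that choice explicit; it does not say what a transparent pixel "really" looks like.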

@KilianB KilianB added bug Something isn't working help wanted Extra attention is needed labels May 23, 2019
@KilianB
Owner

KilianB commented May 23, 2019

Yes, choosing a different hash method will not make a difference, since the issue lies in the precalculation step that runs before hashing.
I see where you are coming from, and this is indeed an issue. Semantically there isn't a valid solution, I am afraid: we never know what color a missing (invisible) pixel will be.
More often than not those pixels are displayed as white, as seen in the images above. I will add an option to let people choose how to handle transparency. This will fix your current issue.

@fbarrella
Author

That's awesome! I was actually going to come back with this exact answer! After a little searching, I noticed how much the transparency affected the hash calculated by the API, and I ended up with the idea of simply redrawing the original BufferedImage on a white background and then generating a hash from that. The only problem I see is when the actual image is a white PNG icon. Maybe we could iterate on this solution to get to a good place.
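
The white-background preprocessing described above can be done with the JDK alone before the image is passed to any hasher. This is a minimal sketch with a hypothetical class and method name, not code from the library.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class FlattenAlpha {
    // Return an opaque copy of src drawn on top of a solid background color.
    static BufferedImage flatten(BufferedImage src, Color background) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, out.getWidth(), out.getHeight());
            // Default SrcOver compositing blends src onto the filled background.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Calling flatten(image, Color.WHITE) before hashing gives the transparent regions a definite value. For the white-icon case mentioned above, a different background color has to be passed in, otherwise the icon disappears into the background.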

@KilianB
Owner

KilianB commented May 24, 2019

Everything is pretty much implemented; I just need to write some unit tests to make sure I didn't break anything else.
The heavy lifting is done in the utility code repository.

From now on you can define:

   HashingAlgorithm aHasher = new AverageHash(64);
   // Define how to handle transparent pixels
   double alphaThreshold = 0;
   aHasher.setOpaqueHandling(Color.white, alphaThreshold);

   // Proceed as normal

Will this suit you or do you have any other ideas? By default I will retain the old behavior to not break backwards compatibility.
For strictly black and white images with transparent background we simply use an arbitrary color and handle both use cases.

@fbarrella
Author

Ok, if I got it right, the hasher will by default treat the image as having a white background, while also letting me choose another color/threshold if needed, right? If so, that's amazing! It solves the problem, since we can get even more precise hashes! About the black and white images with no background: what if you calculated the background color from the luminance of the predominant image color? That way we could avoid, as much as possible, making the user set the color manually!
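
One way to read the suggestion above is as a simple contrast heuristic: look at the visible pixels, and pick white if they are mostly dark or black if they are mostly light. The sketch below uses the alpha-weighted mean luma as a stand-in for "predominant color". That substitution, the 128 cutoff, and all names are assumptions made for illustration, not anything the library is known to do.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

public class ContrastBackground {
    // Pick white or black, whichever contrasts with the visible content.
    static Color pickBackground(BufferedImage img) {
        double weightedLuma = 0, totalAlpha = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int argb = img.getRGB(x, y);
                double a = (argb >>> 24) / 255.0;
                int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
                // Weight each pixel's luma by its opacity so invisible pixels do not count.
                weightedLuma += a * (0.299 * r + 0.587 * g + 0.114 * b);
                totalAlpha += a;
            }
        }
        if (totalAlpha == 0) {
            return Color.WHITE; // fully transparent image: arbitrary fallback
        }
        return (weightedLuma / totalAlpha) < 128 ? Color.WHITE : Color.BLACK;
    }
}
```

A black icon on a transparent canvas gets a white background and a white icon gets a black one, which covers the white PNG icon case. Mid-gray or multicolored content has no clear answer under this heuristic, so a manual override is still worth keeping.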

@KilianB
Owner

KilianB commented May 26, 2019

While refactoring the utility code I changed a few design decisions, which is taking a while longer than expected. I really wanted to get the new version released tonight, but sadly it will take a tiny bit longer.

@fbarrella
Author

Cool! Man, I would like to report a new unusual case after applying the solution of adding a white background to transparent images... For some reason my code produced equal hashes (once again) when hashing the following two images with new PerceptiveHash(32) (the first is the transparent PNG and the second is just a regular JPEG):

GLE 350D HIGHWAY - LATERAL 2
audi_vermelho

Would you please try hashing them so we can check whether the anomaly is only on my side?

@KilianB KilianB mentioned this issue Jan 20, 2020
KilianB added a commit to KilianB/UtilityCode that referenced this issue Jun 15, 2021
@KilianB KilianB self-assigned this Jun 17, 2021
@KilianB KilianB added enhancement New feature or request and removed help wanted Extra attention is needed labels Jun 18, 2021
@KilianB KilianB changed the title Different images resulting into same hash Support for transparent images Jun 18, 2021