Feature request: Allow usage of perceptual hashes for images #65
Comments
Interesting. I think this would break my difference algorithm, but I'm intrigued. Can you elaborate on the use case a bit more?
First use case (saved wallpapers or memes): The user has a lot of images saved on his computer. Images are often reuploaded on the internet (think memes, for example) and in the process are usually recompressed, which costs quality. If I come across an image I like, I might not remember that I already downloaded it (think of wallpapers for a slideshow) and download it again. Now I have the same image twice, but the two files are unlikely to share a SHA-512 or similar hash. Even on the same page, a different URL usually means a reupload with a different quality or even a different image format (JPG vs. PNG vs. WebP, etc.). A perceptual hash can identify duplicates despite quality loss such as compression artifacts or reduced resolution.
Second use case (memes again): Memes reuse the same template over and over, each time producing new (hopefully original) content by adding different text or using face swap (bron memes).
Third use case (searching for a higher-quality version of an image): A user might download an image and then remember having the same image in better quality somewhere on his PC. By searching for perceptual duplicates of the image, the user will find the higher-quality version already stored on his PC.
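To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the average hash (aHash), plus the Hamming distance used to compare two hashes. This is an illustration only, not the algorithm of any particular crate or of this project. It assumes the image has already been decoded and downscaled to an 8x8 grayscale thumbnail elsewhere; the function names `average_hash` and `hamming_distance` are made up for this example.

```rust
/// Average hash (aHash) over an 8x8 grayscale thumbnail.
/// Assumption: the caller has already decoded the image and downscaled it
/// to 64 luma values; that step is out of scope for this sketch.
fn average_hash(luma: &[u8; 64]) -> u64 {
    // Mean brightness of the thumbnail (integer division is fine here).
    let sum: u32 = luma.iter().map(|&p| p as u32).sum();
    let mean = sum / 64;

    // One bit per pixel: set when the pixel is brighter than the mean.
    let mut hash = 0u64;
    for (i, &p) in luma.iter().enumerate() {
        if p as u32 > mean {
            hash |= 1u64 << i;
        }
    }
    hash
}

/// Number of differing bits between two hashes.
/// A small distance suggests the images look alike.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Two files that differ only by recompression tend to produce thumbnails with nearly the same bright/dark pattern, so their hashes differ in few bits, whereas a cryptographic hash such as SHA-512 changes completely. What distance counts as "the same image" is a tuning decision.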
So if I have this right:
Use case 1: I have one image, so I perceptual hash (phash) that and look for similar images.
Use case 2: I have one or more directories of images and I want to see the buckets of images as defined by the phash (maybe ordered by quality or file size or something).
Use case three seems like a repeat of use case one to me in the
But two questions
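The "buckets" idea from use case 2 could be sketched roughly as follows. Everything here is hypothetical: the `HashedImage` record, the `bucket_by_phash` function, and the `max_distance` threshold are invented for illustration, and the greedy grouping is just one simple option. It assumes each file already has a 64-bit perceptual hash computed by some other means.

```rust
use std::path::PathBuf;

/// Hypothetical record: a file, its size, and a precomputed 64-bit
/// perceptual hash (how the hash is computed is not shown here).
#[derive(Debug, Clone, PartialEq)]
struct HashedImage {
    path: PathBuf,
    size_bytes: u64,
    phash: u64,
}

/// Greedy bucketing: an image joins the first bucket whose first member is
/// within `max_distance` differing bits; otherwise it starts a new bucket.
/// Each bucket is then ordered largest file first.
fn bucket_by_phash(images: Vec<HashedImage>, max_distance: u32) -> Vec<Vec<HashedImage>> {
    let mut buckets: Vec<Vec<HashedImage>> = Vec::new();
    for img in images {
        // Compare against each bucket's representative (its first member).
        let idx = buckets
            .iter()
            .position(|b| (b[0].phash ^ img.phash).count_ones() <= max_distance);
        match idx {
            Some(i) => buckets[i].push(img),
            None => buckets.push(vec![img]),
        }
    }
    // Sort only after all assignments so representatives stay stable above.
    for bucket in &mut buckets {
        bucket.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));
    }
    buckets
}
```

Because membership is decided against a single representative, the result depends on input order. A real tool might prefer a proper clustering step or a metric tree, but that is beyond this sketch.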
This strikes me as its own project to be honest, but if you're willing to provide feedback and testing then I'm at least willing to give it a crack 😄
I would totally beta test and use this. No idea how to design the user interface though. Hashes for videos are a bit more difficult. There was a C# solution for GIFs. Maybe they just took a frame every 0.25 seconds and compared them. You would have to rasterize SVG to make use of phash. Same for GIMP.
Cool. I'll try to get some time this weekend to start experimenting with it. Probably just JPEGs to start with but we'll see. I'll ping you on the new repo 👍
@grayfallstown give darakian/rustExperiments#2 a gander. It's a quick and dirty first pass but let me know.
This would allow users to dedup pictures even if they have different file formats, resolutions or quality. Usually one wants to keep the largest file.
im_hash provides this functionality in Rust.