Perceptual Image Hashing v1 #2
If we can find a video- or image-hashing algorithm that can hash the media into a string, this is very easy to implement. The interface would just need an object representing what is being uploaded and the file path of the media as the input to that algorithm; then we can use the string output as the … It doesn't matter if the algorithm is slow or old; it just needs to work so we can get the system off the ground.
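A minimal sketch of that interface in Python, assuming nothing about the eventual algorithm: `Upload`, `fingerprint`, and `sha256_file_hash` are hypothetical names. The SHA-256 placeholder only matches byte-identical files, so it stands in for a real image/video hash rather than being one.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Any algorithm that maps a media file path to a string fits this shape.
HashFunction = Callable[[Path], str]


@dataclass(frozen=True)
class Upload:
    """Hypothetical record describing one uploaded item."""
    upload_id: str
    media_path: Path
    media_type: str  # e.g. "image" or "video"


def fingerprint(upload: Upload, hash_fn: HashFunction) -> str:
    """Run the chosen hashing algorithm on the upload's file and return its string output."""
    return hash_fn(upload.media_path)


def sha256_file_hash(path: Path) -> str:
    """Placeholder hash: exact-match only, NOT perceptual (any byte change alters it)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Swapping in a perceptual hash later only means passing a different `hash_fn`; the rest of the upload pipeline keeps working with plain strings.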
Starting with image hashing; video hashing is much more complex, so we'll build up to that. I'm looking into perceptual hashes, which hash an image to the same (or a nearly identical) value despite minor modifications (e.g. slightly different resolution or colors, added text, etc.). I found an implementation of pHash, a well-known open-source algorithm, so I'll try implementing it this week. I'll also write up a guide on digital signal/image processing, which would be useful for programming image hashing. Soon we'll also need a "hashing guide" that covers general hashing algorithms, image hashing, and video hashing.
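Because perceptual hashes of slightly modified images are usually close rather than identical, they are commonly compared by Hamming distance (the number of differing bits). A minimal sketch, assuming equal-length hex-string hashes; `hamming_distance` and `looks_similar` are hypothetical names, and the default `threshold` of 10 bits is an arbitrary illustrative value, not a tuned one.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree; count those 1s.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def looks_similar(hash_a: str, hash_b: str, threshold: int = 10) -> bool:
    """Treat two images as near-duplicates if their hashes differ in at most `threshold` bits."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

For example, `hamming_distance("0f0f0f0f0f0f0f0f", "0f0f0f0f0f0f0f0e")` returns 1, since only the last hex digit differs and `f` vs `e` differ in a single bit.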
Here is a basic guide with some pseudocode for implementing a perceptual hash. It is not as robust as pHash, though. Here is a really robust perceptual hash developed by the creator of pHash (above). It will take more time to implement, so we can defer it, but it should be strongly considered later.
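The guide's pseudocode isn't reproduced here. As a rough illustration of the DCT-based idea behind pHash-style image hashes, this pure-Python sketch assumes the image is already a 32x32 grid of grayscale values, keeps only the top-left 8x8 low-frequency DCT coefficients, and sets each bit by whether a coefficient is above their median. `dct_low_freq` and `dct_hash` are hypothetical names; this is not the pHash library's implementation.

```python
import math
import statistics


def dct_low_freq(pixels: list[list[float]], keep: int = 8) -> list[list[float]]:
    """Orthonormal 2-D DCT-II of a square grid, computing only the top-left keep x keep coefficients."""
    n = len(pixels)
    # Basis values cos((2x + 1) * u * pi / (2n)) for the low frequencies u < keep.
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # Orthonormal scaling: sqrt(1/n) for frequency 0, sqrt(2/n) for the rest.
    scale = [math.sqrt((1 if u == 0 else 2) / n) for u in range(keep)]
    coeffs = []
    for u in range(keep):
        row = []
        for v in range(keep):
            total = 0.0
            for x in range(n):
                bu = basis[u][x]
                for y in range(n):
                    total += pixels[x][y] * bu * basis[v][y]
            row.append(scale[u] * scale[v] * total)
        coeffs.append(row)
    return coeffs


def dct_hash(gray_32x32: list[list[float]]) -> str:
    """64-bit DCT-based hash of a 32x32 grayscale grid, as a 16-digit hex string."""
    if len(gray_32x32) != 32 or any(len(row) != 32 for row in gray_32x32):
        raise ValueError("expected a 32x32 grid")
    low = [c for row in dct_low_freq(gray_32x32) for c in row]
    med = statistics.median(low)
    bits = 0
    for c in low:
        # One bit per low-frequency coefficient: 1 if above the median.
        bits = (bits << 1) | (1 if c > med else 0)
    return f"{bits:016x}"
```

Keeping only low frequencies is what makes this kind of hash tolerant of small edits: a uniform brightness change only moves the zero-frequency (DC) term, and fine detail lives in the discarded high frequencies.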
I implemented a rough version of perceptual image hashing. For now, it takes an image as input, downscales it to 8x8 pixels, converts it to grayscale, extracts one bit from each of the 64 pixels, and outputs those 64 bits as a 16-digit hex string (4 bits per hex digit). This is far from a complete version of image hashing, but it is a good enough first step for v0.1.
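The steps above can be sketched as follows. How each bit is extracted isn't specified, so this assumes the common "average hash" rule: a bit is 1 when the pixel is brighter than the mean of all 64 pixels. It also assumes the 8x8 grayscale grid has already been produced (for example with Pillow's `convert("L")` and `resize((8, 8))` image methods); `average_hash` is a hypothetical name.

```python
def average_hash(gray_8x8: list[list[float]]) -> str:
    """Hash an 8x8 grid of grayscale values into a 16-digit hex string (64 bits)."""
    if len(gray_8x8) != 8 or any(len(row) != 8 for row in gray_8x8):
        raise ValueError("expected an 8x8 grid")
    pixels = [p for row in gray_8x8 for p in row]  # row-major order
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # Shift in one bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return f"{bits:016x}"  # 64 bits -> exactly 16 hex digits, zero-padded
```

An image whose left half is black (0) and right half is white (255) hashes to `"0f0f0f0f0f0f0f0f"`: each row contributes the byte `00001111`.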
I'm removing the v0.1 label and adding the v0.3 label. This is enough to get us through v0.2. Edit: I'm undoing that relabel so this shows as progress in the v0.1 category, and I'll open a new issue for a future version of image hashing.
The system currently only supports text data as a test. At a minimum, we need to be able to upload and hash images and videos.
I will edit this issue with more information and a specific course of action.