exiftool based image exif cleaner #263
I think it might be best to only handle local images if possible, and not make it the job of the instance owner to scrub all remote media. I don't think anything legally harmful can come from this metadata; it's more of a personal data / security thing? I would expect most incoming media to already be scrubbed, it looks like
cleaning only local images does make sense, but then I'll either run back into the 2nd point, where reusing an existing image is less effective, or have to duplicate the

also, choosing to clean remote images may not be entirely pointless: it reduces the number of image copies that contain sensitive information. ideally there would be only one copy with sensitive information, unintentionally added on the origin instance/server, instead of who knows how many instances the same image was federated out to and mirrored on. if anything, maybe make it an opt-in option
.env.example
@@ -72,6 +72,12 @@ OAUTH_KEYCLOAK_VERSION=
# If true, sign ins and sign ups will only be possible through the OAuth providers configured above
SSO_ONLY_MODE=

# image exif cleaning options
# available value: none, sanitize, scrub
EXIF_CLEAN_MODE=sanitize
keep in mind to also set some defaults, to avoid a regression on update for server owners who do not yet have this option in their config.
I'll see what I can do
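To illustrate the defaults concern raised above, here is a minimal sketch in Python (not the project's own code) of reading `EXIF_CLEAN_MODE` with a fallback, so that a config file that predates the option still behaves sensibly. The function name `resolve_exif_clean_mode` and the choice to fall back on invalid values are assumptions made for this example.

```python
import os

# Modes listed in the .env.example comment in this PR.
VALID_MODES = {"none", "sanitize", "scrub"}


def resolve_exif_clean_mode(env=None, default="sanitize"):
    """Return the configured mode, or `default` if the variable is
    unset, empty, or not one of the known modes (hypothetical policy)."""
    env = os.environ if env is None else env
    raw = (env.get("EXIF_CLEAN_MODE") or "").strip().lower()
    return raw if raw in VALID_MODES else default


if __name__ == "__main__":
    # An older config without the option falls back to the default.
    print(resolve_exif_clean_mode({}))
    print(resolve_exif_clean_mode({"EXIF_CLEAN_MODE": "scrub"}))
```

Passing the default as a parameter would let the caller pick a different fallback per image source, for example `none` for external images.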
23e65d0 to 7294902
7294902 to fbe4ed1
updates:
5baf07b to d795f48
d795f48 to 029ec17
lgtm, ran with both sanitize and scrub on local uploads and it correctly removed the GPS data in the first test and all the shutter/focal data in the second test
one thing to note is that uploading an image that had already been uploaded before cleaning was in place seemed to just reuse the same image and not clean it, which I think is expected. I imagine it detects the existing image before running the cleaning logic, which makes sense; just a note that reusing previously uploaded images won't change their metadata after this
add an image metadata sanitizer/cleaner based on exiftool, which comes with the following modes, configurable via env variables:
- none: disable image metadata cleaning. default for external/federated images
- sanitize: strips just GPS data and what might look like serial numbers. default for uploaded images
- scrub: strips most metadata, save for what is needed for proper image rendition and some attribution tags

tried to set this up as an optional feature: it works when the `exiftool` binary exists and is usable, and does not impede the existing image processing/store flow if it isn't present
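The modes and the "no-op without exiftool" behaviour described above could look roughly like the following Python sketch. It is not the PR's implementation: the tag lists in `MODE_ARGS` and the helper names `build_command` and `clean_image` are assumptions for illustration, and the PR's actual selection of tags may differ. The exiftool options used (`-gps:all=`, `-all=`, `-tagsfromfile @`, `-overwrite_original`) are standard exiftool command-line options.

```python
import shutil
import subprocess

# Hypothetical mode -> exiftool arguments; the PR's real tag lists may differ.
MODE_ARGS = {
    "none": None,
    # Delete GPS tags and two common serial-number tags.
    "sanitize": ["-gps:all=", "-SerialNumber=", "-LensSerialNumber="],
    # Delete everything, then copy back orientation, colour-space
    # and a couple of attribution tags from the same file.
    "scrub": ["-all=", "-tagsfromfile", "@",
              "-Orientation", "-ColorSpaceTags", "-Artist", "-Copyright"],
}


def build_command(mode, path, exiftool_path):
    """Return the argv list to run, or None when cleaning is a no-op
    (mode is 'none'/unknown, or no exiftool binary was found)."""
    args = MODE_ARGS.get(mode)
    if exiftool_path is None or not args:
        return None
    return [exiftool_path, *args, "-overwrite_original", path]


def clean_image(path, mode):
    """Clean `path` in place. Returns True only if exiftool ran and
    exited successfully; never raises if the binary is missing."""
    command = build_command(mode, path, shutil.which("exiftool"))
    if command is None:
        return False
    try:
        result = subprocess.run(command, capture_output=True, check=False)
    except OSError:
        return False  # binary vanished or is not executable: skip cleaning
    return result.returncode == 0
```

Keeping `build_command` free of side effects makes the mode mapping easy to check without exiftool installed, while `clean_image` swallows a missing or broken binary so the rest of the upload flow is unaffected.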
029ec17 to 65e1d82
something I mentioned to put up weeks ago
add an image metadata sanitizer/cleaner based on exiftool, which comes with the following modes, configurable via env variables:
- `none`: disable image metadata cleaning. default for external images.
- `sanitize`: strips just GPS data and what might look like serial numbers. default for uploaded images.
- `scrub`: strips most metadata, save for what is needed for proper image rendition and some attribution tags.

I tried to rig this up as an optional feature: it only works when a valid `exiftool` binary is found and is executable. when that's not the case it's effectively a no-op that doesn't impede the rest of the image processing flow.

concerns about where to plug the processor
now the hard part, where I could use some suggestions: right now I've plugged the cleaner inside `ImageRepository::findOrCreateFromPath`, right before storing the image, and I'm not sure if there's a more appropriate place to put this:
- `findOrCreateFrom{Path,Upload}` could result in image lookup by hash being less effective, because the input image has to be cleaned before checking the hash even though the same, cleaned image might already exist. that means the supposedly fast path is now slower and the cleaning job is practically wasted
- ended up refactoring/rearranging the `findOrCreateFrom{Path,Upload}` functions, reviewed separately at #287
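The ordering concern above can be made concrete with a toy in-memory model in Python. This is only a sketch of the trade-off, not the refactor reviewed in #287: `ImageStore`, its two methods, and the fake cleaner are all hypothetical names introduced for the example.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ImageStore:
    """Toy store keyed by content hash; stands in for the repository."""

    def __init__(self, cleaner):
        self.cleaner = cleaner
        self.by_hash = {}
        self.cleaner_calls = 0

    def _clean(self, data: bytes) -> bytes:
        self.cleaner_calls += 1
        return self.cleaner(data)

    def find_or_create_clean_first(self, data: bytes) -> bytes:
        # Every call pays for cleaning, even if the result already exists.
        cleaned = self._clean(data)
        return self.by_hash.setdefault(sha256(cleaned), cleaned)

    def find_or_create_hash_first(self, data: bytes) -> bytes:
        # Fast path: look up by the hash of the raw input, clean on a miss.
        key = sha256(data)
        if key not in self.by_hash:
            self.by_hash[key] = self._clean(data)
        return self.by_hash[key]


def strip_gps(data: bytes) -> bytes:
    """Fake cleaner for the demo: drops a literal marker."""
    return data.replace(b"GPS", b"")


if __name__ == "__main__":
    a, b = ImageStore(strip_gps), ImageStore(strip_gps)
    for _ in range(2):
        a.find_or_create_clean_first(b"pixelsGPS")
        b.find_or_create_hash_first(b"pixelsGPS")
    print(a.cleaner_calls, b.cleaner_calls)
```

In this model, hashing the raw input first avoids the repeated cleaning, at the cost that anything already stored under a raw hash is reused as-is. That would be consistent with the reviewer's observation that re-uploading a previously stored image does not clean it.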