Increase maximum EOF offset for all JPG signatures #53
Comments
Hi Scott, this has come up a few times in the past. Increasing the EOF offset range is an option, but many folks keep the default maximum byte scan setting of 65536 in DROID. This definitely needs to be addressed, as we see it more and more. The change would affect multiple PUIDs, some of which might never have this padding. The EXIF specs state that any JPG renderer should ignore anything after the last FFD9 marker, so we should find something similar for PRONOM. I discussed this a bit and linked to samples here: https://preservation.tylerthorsted.com/2023/06/23/jpg-structure/
If you aggregate all the PRONOM signatures, the ceiling for EOF scanning is currently 131084. Making this offset larger than that would raise the ceiling and mean doing much larger end-of-file reads for all file types. Another option is to define additional signatures where the FFD9 marker is anchored to the beginning of the file but at a wildcard offset. As for changing the default setting in DROID, I definitely agree. The benchmarks I run show that the impact should be tolerable (DROID still works really well with a -1 setting): https://www.itforarchivists.com/siegfried/benchmarks
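To make the difference between the two anchoring strategies concrete, here is a minimal Python sketch. It is not DROID or PRONOM signature syntax; the regular expression and both function names are hypothetical stand-ins, and the `FFD8FF` prefix is only an assumed simplification of the real BOF sequences.

```python
import re

# Hypothetical stand-in for a BOF-anchored signature: FFD8FF at offset 0,
# then a wildcard gap of any length, then the FFD9 marker.
BOF_ANCHORED = re.compile(rb"\xFF\xD8\xFF.*?\xFF\xD9", re.DOTALL)


def matches_bof_anchored(data: bytes) -> bool:
    """True if FFD8FF is at the start and FFD9 occurs anywhere after it."""
    return BOF_ANCHORED.match(data) is not None


def matches_eof_anchored(data: bytes, max_eof_offset: int = 65536) -> bool:
    """True if FFD8FF is at the start and FFD9 ends within
    max_eof_offset bytes of the end of the file."""
    if not data.startswith(b"\xFF\xD8\xFF"):
        return False
    # FFD9 is 2 bytes long, so the window is max_eof_offset + 2 bytes.
    tail = data[-(max_eof_offset + 2):]
    return b"\xFF\xD9" in tail


# Synthetic data only: a tiny fake JPG, then the same bytes with padding.
jpg = b"\xFF\xD8\xFF\xE0" + b"\x00" * 10 + b"\xFF\xD9"
padded = jpg + b"\x00" * 100_000
```

With this synthetic data the EOF-anchored check fails on `padded` at the 65536 window, while the BOF-anchored check still matches, which is the trade-off being discussed: the second approach does not depend on the EOF scan window at all.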
Richard's suggestion of anchoring FFD9 at the end of the BOF signature might work best. Two concerns:
I echo Tyler's questions. I like the idea of anchoring FFD9 from the BOF, but what happens if there is more than one FFD9 in a file? Does that pose a problem?
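One way to explore the multiple-FFD9 question on sample files is to list every occurrence of the marker. The sketch below is only an illustration: `eoi_offsets` is a hypothetical helper, and the sample bytes are synthetic, loosely imitating a file with an embedded thumbnail that carries its own FFD9.

```python
EOI = b"\xFF\xD9"


def eoi_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every FFD9 occurrence in data."""
    offsets = []
    pos = data.find(EOI)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(EOI, pos + 1)
    return offsets


# Synthetic bytes: outer start marker, an inner "thumbnail" with its own
# FFD9, a few more bytes, the final FFD9, then zero padding.
sample = (
    b"\xFF\xD8\xFF\xE1"      # outer start (4 bytes)
    + b"\xFF\xD8\xFF\xDB"    # inner thumbnail start (4 bytes)
    + b"\x01\x02" + EOI      # inner FFD9 at offset 10
    + b"\x03\x04\x05" + EOI  # final FFD9 at offset 15
    + b"\x00" * 8            # trailing padding
)
```

Assuming a BOF-anchored wildcard is satisfied by the first FFD9 it finds, extra occurrences would not prevent a match. The match would then say nothing about where the image data actually ends, so it is a weaker check than the EOF-anchored sequence.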
Any further thoughts on how to handle these JPG files with extra data after the final FFD9? As I've experimented with anchoring to the BOF, I find that it may identify files that wouldn't otherwise be identified, but since we're using JHOVE to validate these files afterward, I'm not so worried about missing a file that has a problem. Is anchoring to the BOF a viable option for everyone?
We are getting a lot of .JPG files from modern camera phones that add a lot of zeroes after the final FFD9, so many as to exceed the maximum EOF offset. The result is that DROID doesn't identify those files.
The existing JPG signature files have a maximum EOF offset of 16000 or 65536 or 131072. Were those offsets chosen for a specific reason? Is there any reason the offset couldn't be extended much higher to account for the extra padding in these modern files?
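To check how far a given file overshoots those limits, something like the following sketch can measure the padding after the final FFD9. The helper names are hypothetical, and the three limits are simply the values quoted above.

```python
from typing import Optional

# Maximum EOF offsets quoted above for the existing JPG signatures.
KNOWN_LIMITS = (16000, 65536, 131072)


def trailing_bytes_after_last_eoi(data: bytes) -> Optional[int]:
    """Number of bytes after the final FFD9, or None if FFD9 is absent."""
    pos = data.rfind(b"\xFF\xD9")
    if pos == -1:
        return None
    return len(data) - (pos + 2)


def smallest_sufficient_limit(padding: int, limits=KNOWN_LIMITS) -> Optional[int]:
    """Smallest listed limit that still covers the padding, else None."""
    for limit in sorted(limits):
        if padding <= limit:
            return limit
    return None


# Synthetic example: a fake JPG followed by 70,000 zero bytes.
fake = b"\xFF\xD8\xFF\xE0" + b"\x00" * 4 + b"\xFF\xD9" + b"\x00" * 70_000
padding = trailing_bytes_after_last_eoi(fake)
```

To run it against a real file, read the bytes with `open(path, "rb").read()` and pass them to `trailing_bytes_after_last_eoi`. A result of `None` from `smallest_sufficient_limit` means the padding exceeds every limit listed.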
As a test, I created a signature that matches fmt/645 but I increased the maximum EOF offset to 999999999 (note: I found that going higher by adding even one more 9 resulted in DROID failing to load the profile). I ran a sample file through and it identified it correctly. There is apparently a ceiling past which the profile won't load, but even 999999999 should be sufficient, I think.
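One possible explanation for that ceiling, offered only as an unverified guess, is that the offset is parsed into a 32-bit signed integer. Nothing in this thread confirms how DROID stores the value; the arithmetic below just shows that the two numbers tried fall on either side of that limit.

```python
# Assumption (not confirmed for DROID): the offset is held in a
# 32-bit signed integer, whose maximum value is 2**31 - 1.
INT32_MAX = 2**31 - 1  # 2147483647


def fits_in_int32(value: int) -> bool:
    """True if value is within the 32-bit signed integer range."""
    return -(2**31) <= value <= INT32_MAX


worked = 999_999_999    # offset that loaded successfully
failed = 9_999_999_999  # one more 9, profile failed to load
```

If that guess is right, any offset up to 2147483647 would load, but this has not been tested here.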
Any thoughts or experience with increasing the maximum EOF offset? I've attached a sample .jpg that identifies as fmt/645 if you either remove the padding or increase the maximum EOF offset.
fmt645 if you remove padding.JPG.zip