Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMAGING-351] EXIF data corruption during copy #275

Merged
merged 2 commits into from May 8, 2023

Conversation

charleshope
Copy link

@charleshope charleshope commented Feb 5, 2023

I believe I've uncovered a bug. Using the pattern shown in the WriteExifMetadataExample example code, it seems that some fields are not accurately copied.

This unit test copies a test file to a temp file, and then compares the EXIF data of the two files. Each test inspects a different metadata class. Both tests fail, as there are minor differences found.

[IMAGING-351] EXIF data corruption during copy

@charleshope charleshope changed the title Unit test that checks for EXIF data corruption Bug: failing unit test that checks for EXIF data corruption Feb 5, 2023
@charleshope
Copy link
Author

I'm not sure if this is a proper bug I've uncovered, or a misuse of the API. I'd appreciate a comment from someone much more familiar with the code, before I spend any time attempting to fix it.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Thanks for providing a unit test. I was not expecting the output to match, since I am not sure if we write everything we are able to read, nor if we were writing everything in the same order.

But some of the results look concerning, e.g.

org.opentest4j.AssertionFailedError: ImageMetadataItem toString mismatch. ==> 
Expected :Orientation: 1
Actual   :Orientation: 256

Expected :ExifOffset: 212
Actual   :ExifOffset: 7216

Expected :ExifOffset: 196
Actual   :ExifOffset: 3070

...

It may take a while to investigate all these differences 👍 It would help to have a JIRA issue @charleshope, as that will be required for the fix & changelog entry. Could you create it, please?

-Bruno

@charleshope
Copy link
Author

@kinow I'd love to create a Jira ticket, but the instructions for getting an account weren't clear to me. How do I sign up for one now? Thank you!

@kinow
Copy link
Member

kinow commented Feb 24, 2023

@kinow I'd love to create a Jira ticket, but the instructions for getting an account weren't clear to me. How do I sign up for one now? Thank you!

Hi @charleshope . Sorry the process is not very clear. It was the only solution found for the numerous spam bots creating accounts in the ASF JIRA. For your new account, you need to follow these steps: https://infra.apache.org/jira-guidelines.html#who

See the part in bold, as well as the bullet list in the section below, with the information that will be requested from you (email address, preferred account names, display name). Someone will read your email and ask JIRA to create an account. JIRA will send you an email and you will be able to then access it.

Bruno

@charleshope
Copy link
Author

Right, so I followed the instructions and tried to send it to private@imaging.apache.org, but the email address doesn't exist.

@aherbert
Copy link
Contributor

Try private@commons.apache.org. imaging is a sub-project of commons

@charleshope
Copy link
Author

Try private@commons.apache.org. imaging is a sub-project of commons

Excellent, thank you.

@charleshope charleshope changed the title Bug: failing unit test that checks for EXIF data corruption [IMAGING-351] EXIF data corruption during copy Feb 27, 2023
Copies a test file to a temp file, then compares the EXIF data.
Fails if not equal.
@StefanOltmann
Copy link
Contributor

StefanOltmann commented May 8, 2023

@gwlucastrig has identified the reason for data corruption and proposed a fix in https://issues.apache.org/jira/browse/IMAGING-319. I conducted a test on some of the Unsplash Images using the suggested solution and compared the results using exiftool -g1 -a. My result is that it indeed fixes problems in 20 out of 56 test files.

@garydgregory In light of these findings, I strongly recommend an immediate release of a new version with the proposed fix, as the current version 1.0-alpha3 is causing data corruption during write operations.

@garydgregory
Copy link
Member

@gwlucastrig has identified the reason for data corruption and proposed a fix in https://issues.apache.org/jira/browse/IMAGING-319. I conducted a test on some of the Unsplash Images using the suggested solution and compared the results using exiftool -g1 -a. My result is that it indeed fixes problems in 20 out of 56 test files.

@garydgregory In light of these findings, I strongly recommend an immediate release of a new version with the proposed fix, as the current version 1.0-alpha3 is causing data corruption during write operations.

-1 as the PR breaks the build.

1 similar comment
@garydgregory
Copy link
Member

@gwlucastrig has identified the reason for data corruption and proposed a fix in https://issues.apache.org/jira/browse/IMAGING-319. I conducted a test on some of the Unsplash Images using the suggested solution and compared the results using exiftool -g1 -a. My result is that it indeed fixes problems in 20 out of 56 test files.

@garydgregory In light of these findings, I strongly recommend an immediate release of a new version with the proposed fix, as the current version 1.0-alpha3 is causing data corruption during write operations.

-1 as the PR breaks the build.

@StefanOltmann
Copy link
Contributor

-1 as the PR breaks the build.

I want to clarify that the current pull request under discussion is not related to fixing the issue of corrupted EXIF data during copy. Rather, it only adds a unit test that demonstrates the problem.

To fix the issue, we need to refer to the code provided by @gwlucastrig in the last comment on https://issues.apache.org/jira/browse/IMAGING-319. I have personally tested this code and can confirm that it effectively resolves the problem addressed by the unit test in this pull request.

@garydgregory
Copy link
Member

@StefanOltmann
Thanks for the clarification, I'll try it tonight.

@garydgregory garydgregory merged commit 95f95ed into apache:master May 8, 2023
9 checks passed
@StefanOltmann
Copy link
Contributor

StefanOltmann commented May 9, 2023

@garydgregory, I believe that the ExifRewriterRoundtripTest still requires some adjustments. Specifically, we need to ignore the offsets because they may have changed after reordering.

The suggested addition of length -= 1; by @gwlucastrig is correct.

Please find below another test photo: testphoto.jpg

Load and save the photo without making any changes, and then review the metadata using a viewer of your choice. You will notice that the copyright "CC-BY 3.0" has become ".C-BY 3.0" due to the bug. Additionally, other files may have been impacted, including camera and lens names, as well as other string types.

It is easy to confirm the correctness of the fix without a unit test.

@garydgregory
Copy link
Member

@StefanOltmann
Thank you for the update.

We need a unit test to validate the fix, otherwise we are just an innocuous commit away from a regression.

Feel free to open a PR ;-)

@StefanOltmann
Copy link
Contributor

Hey, @garydgregory!

I also needed the metadata functionality on iOS, so I rewrote the parts that handle metadata in Kotlin Multiplatform. I have just published it on https://github.com/realashampoo/kim

I have included tests there to ensure that the fix we spoke about is correct.

@gwlucastrig, thank you so much for figuring it out. You saved me a great deal of time.

@charleshope, thank you for suggesting the idea of having a test that confirms that no data is lost when saving an unmodified OutputSet. I have also included such a test in my library.

Commons Imaging has been incredibly helpful thus far, and I would like to express my gratitude to the entire team.

From now on, I will be using my own fork that includes the fixes that were made via pull request to this repo but have not been merged.

@StefanOltmann
Copy link
Contributor

Here is my version of this test: https://github.com/RealAshampoo/kim/blob/44dbd2c5f0a182e4d401fd17d14ac0310da04440/src/jvmTest/kotlin/com/ashampoo/kim/format/jpeg/JpegRewriterTest.kt#LL156C1-L156C1

Note how I need to skip some offset fields that change due to the reordering:

if (expectedField.tagInfo == ExifTag.EXIF_TAG_EXIF_OFFSET ||
                        expectedField.tagInfo == ExifTag.EXIF_TAG_GPSINFO ||
                        expectedField.tagInfo == ExifTag.EXIF_TAG_INTEROP_OFFSET ||
                        expectedField.tagInfo == TiffTag.TIFF_TAG_JPEG_INTERCHANGE_FORMAT
                    )
                        continue

@kinow
Copy link
Member

kinow commented Jun 10, 2023

Passing by, just cleaning GitHub pending notifications. I don't have bandwidth to work on ASF projects at the moment due to $dayjob. I checked the IMAGING-351 issue and it doesn't have this latest information from @StefanOltmann (for example) in the comments (it might be hidden in activity, but I rarely check that).

Might be worth commenting there as this PR has been merged (with any other further context that might help whoever works on that JIRA issue 👍 )

Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants