New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IMAGING-319] Fix for lost first character on String #359
Conversation
@StefanOltmann |
@garydgregory https://github.com/Ashampoo/kim/blob/5d1b28e4423e6bc240196d3ad7db25495da095c9/NOTICE.txt#L13-L15 |
@StefanOltmann |
Alright, this might take a bit of time. I've added it to the regular test files, but none of the existing tests are failing. I'm not entirely sure what they cover, but it seems like they are overlooking some aspects. In my library, the JpegRewriterTest is able to identify this issue. Unfortunately, the author of This pull request is also related to https://issues.apache.org/jira/browse/IMAGING-351, as it addresses the same underlying problem. Also it's worth to note that it does not always affect String fields. It can corrupt any field / offset. In the provided sample the MakerNotes get corrupted, due to an offset shift. @kinow FYI |
@StefanOltmann |
I did not commit it. I added the file above locally to the test folders hoping that one of the rewrite tests will pick it up and fail. It gets actually processed, but the existing tests do not fail, because they don’t detect the issue. There is apparently a lack of coverage here. The test that might pick it up is still disabled, because the author never finished it. See this comment: #275 (review) The offsets are actually expected to change and must be ignored. The wrong orientation value is most likely a problem that came from changing the byte order. I guess I need to fix that test first to have a proper unit test, before I can add an image that makes it fail. |
@StefanOltmann |
@garydgregory I have fixed the test first. As soon you pulled the change I will refresh this PR. |
See my comments in your other PR. To summarize, please provide your main change and test in the same PR. |
As I commented on the other PR you will not see a proof that it fixes something then. You demanded a proof, because you won't believe my words. So I fixed the roundtrip test that it actually tests something and will show the problem. If I commit the test and the fix together you won't see it. This is why I instructed you to drop the test file locally to see it fail. |
@garydgregory As you requested #367 is a combined PR of bugfix and repaired unit test plus a file that shows the problem. |
A PR should include main and test changes, not one PR for test changes, and a different PR for main changes. |
Yes, but the bugfix and the test are unrelated. One PR fixes a bug without a test. The other PR completes a test that’s right now disabled, because it wasn’t working. |
Oh, I just see that @gwlucastrig once provided a PR for his fix in #203. For some reason he closed the PR a few months later, but the fix is still valid. Looks like a unit test was missing. I provided that with #366, but unfortunately we don't have a test image that can be used to proof the test. You will need to verify that locally without checking the image in. You could also pull his PR, because he is the one who found and fixed the bug. |
Oh, I will have to re-read the issues and pull requests as I'm a bit out of the loop, but thanks for pinging me here, @StefanOltmann ! I know @gwlucastrig from previous issues and emails. Let's give him some days, and I can ping him by email too if needed. I had a look and the PR was indeed closed... but I cannot recall if there was any reason for that 😥
Sounds good too, although I am sure he wouldn't mind, will keep it in mind to see if I can pull the branches locally and maybe merge commits or combine to give credit where it's due. I will add something in the Thanks @StefanOltmann ! |
(in the meantime, approved the GH Actions to run -- gotta go 🏃 ) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #359 +/- ##
============================================
+ Coverage 71.48% 71.58% +0.09%
- Complexity 3580 3584 +4
============================================
Files 352 352
Lines 16919 16907 -12
Branches 2648 2648
============================================
+ Hits 12095 12103 +8
+ Misses 3785 3764 -21
- Partials 1039 1040 +1 ☔ View full report in Codecov by Sentry. |
Still no unit test. I'm not sure how we can avoid a future regression without one. 😞 |
Wrong. There is a unit test in #366. |
Reviewed #366, I think that one can be merged. That pull request fixes a Once that's merged, and the test is re-enabled, there is a test image in #366 that should fail. I think here we have to:
Cheers |
@kinow Great news, thanks! Maybe we can use the image attached to https://issues.apache.org/jira/browse/IMAGING-319. I just asked the reporter. |
Excellent idea! That'd save us a lot of time, thanks for the initiative, @StefanOltmann 👍 |
Merged your other PR, @StefanOltmann (thanks very much for that), I'll push a new commit here but that'll be just to rebase it for now 👍 |
@kinow Yes, you can pull this one or the original #203, which are both the same change @gwlucastrig proposed in JIRA. Credit for this should solely go to him. :) Unfortunately the author of https://issues.apache.org/jira/browse/IMAGING-319 did not respond. So we could ask legal if attaching an image to the JIRA ticket can be seen as a permission to use that one. Just to make progress and have this nasty bug fixed for 1.0-alpha5 |
Thanks @StefanOltmann . I searched my hard drive for JPEG images to check if I could reproduce the issue with any, but only the IMAGING-319 had this issue (maybe it happened in another field, but I only tested for the Offset Time like the issue reported). package org.apache.commons.imaging;
import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Search for images that cause the bug in IMAGING-319.
*/
public class LibraryTest {
public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
File path = new File("/home/kinow/Desktop/haystack/");
String needle = "Offset Time\\s+:\\s+";
Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
File result = Files.createTempFile("test_", ".jpg").toFile();
try {
final ImageMetadata metadata = Imaging.getMetadata(source);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata != null) {
final TiffImageMetadata exif = jpegMetadata.getExif();
if (exif != null) {
TiffOutputSet outputSet = exif.getOutputSet();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
String[] cmd = {
"/bin/sh",
"-c",
String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
};
Process process = Runtime.getRuntime().exec(cmd);
process.waitFor();
String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
// System.out.println(output);
Matcher m = pattern.matcher(output);
if (m.find()) {
System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
}
}
}
} catch (RuntimeException|ImagingException ignore) {
ignore.printStackTrace();
} // invalid metadata or bad format
}
}
} I will debug the code again to either craft a test image by hand, or write a test code + mock. Let's see if that works... |
@kinow I spent some time looking around different sources, too. Luckily I just found another test image. It's https://github.com/drewnoakes/metadata-extractor-images/blob/main/jpg/Sony%20DigitalMavica.jpg There is no LICENSE file in the repository, but the README says For my metadata library I used some other images of that repository, too. I added them to my NOTICE for completeness: https://github.com/Ashampoo/kim/blob/9ab5fb627d3d450286c94d1d7bb0f3335629c6c2/NOTICE.txt#L17C1-L19C73 But I believe this is not strictly necessary in this case. What do you think? Sounds like "public domain with no attribution required" to me. |
Thanks for searching @StefanOltmann ! I modified the script to search for any field in the Code: package org.apache.commons.imaging;
import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.jpeg.exif.ExifRewriter;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;
import org.apache.commons.imaging.formats.tiff.write.TiffOutputSet;
import org.apache.commons.io.IOUtils;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Search for images that cause the bug in IMAGING-319.
*/
public class LibraryTest {
//
// public static void main2(String[] args) throws ImagingException, IOException {
// File source = new File("/home/kinow/Desktop/haystack/20220514_102409.jpg");
// File result = new File("/home/kinow/Desktop/haystack/editted-20220514_102409.jpg");
// final ImageMetadata metadata = Imaging.getMetadata(source);
// final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
// final TiffImageMetadata exif = jpegMetadata.getExif();
// TiffOutputSet outputSet = exif.getOutputSet();
// BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
// new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
// }
public static void main(String[] args) throws ImagingException, IOException, InterruptedException {
File path = new File("/home/kinow/Desktop/haystack/");
String needle = ".*([\\s+])*\\s+:\\s+";
Pattern pattern = Pattern.compile(String.format("^%s$", needle), Pattern.MULTILINE);
for (File source : Objects.requireNonNull(path.listFiles(s -> s.getName().toLowerCase().endsWith(".jpg")))) {
File result = Files.createTempFile("test_", ".jpg").toFile();
try {
final ImageMetadata metadata = Imaging.getMetadata(source);
final JpegImageMetadata jpegMetadata = (JpegImageMetadata) metadata;
if (jpegMetadata != null) {
final TiffImageMetadata exif = jpegMetadata.getExif();
if (exif != null) {
TiffOutputSet outputSet = exif.getOutputSet();
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(Files.newOutputStream(result.toPath()));
new ExifRewriter().updateExifMetadataLossless(source, bufferedOutputStream, outputSet);
String[] cmd = {
"/bin/sh",
"-c",
String.format("/home/kinow/Desktop/haystack/test.sh %s %s", source.getAbsolutePath(), result.getAbsolutePath())
// String.format("exiftool %s | grep \"Offset Time\"", result.getAbsolutePath())
};
Process process = Runtime.getRuntime().exec(cmd);
process.waitFor();
String output = IOUtils.toString(process.getInputStream(), Charset.defaultCharset());
// System.out.println(output);
Matcher m = pattern.matcher(output);
if (m.find()) {
System.out.printf("Bug detected in %s%n", source.getAbsolutePath());
}
}
}
} catch (RuntimeException|ImagingException ignore) {
ignore.printStackTrace();
} // invalid metadata or bad format
}
}
} Test image (I drew something some years ago, covered parts of it that were irrelevant). To confirm this is causing the issue, I tried the following:
kinow@ranma:~/Desktop/haystack$ diff <(exiftool 20220514_102409.jpg) <(exiftool editted-20220514_102409.jpg)
2c2
< File Name : 20220514_102409.jpg
---
> File Name : editted-20220514_102409.jpg
5,8c5,8
< File Modification Date/Time : 2024:04:14 00:29:34+02:00
< File Access Date/Time : 2024:04:14 00:29:37+02:00
< File Inode Change Date/Time : 2024:04:14 00:29:34+02:00
< File Permissions : -rwxrwxrwx
---
> File Modification Date/Time : 2024:04:14 00:36:13+02:00
> File Access Date/Time : 2024:04:14 00:36:13+02:00
> File Inode Change Date/Time : 2024:04:14 00:36:13+02:00
> File Permissions : -rw-rw-r--
15c15
< Camera Model Name : SM-S901E
---
> Camera Model Name :
29c29
< Offset Time : +12:00
---
> Offset Time : +12:00.
40c40
< Thumbnail Offset : 586
---
> Thumbnail Offset : 38 (Camera Model Name ended up being replaced by empty string, like the JIRA example)
if (offset != length && (offset & 1L) != 0 && length > outputItemLength) {
int i = 1; // breakpoint here to just confirm the file triggers this...
} And the breakpoint was reached for the iPhone photo from JIRA, and also for my image file 🎉 I will clean up my local modifications, and add my test image to this branch, then rebase + squash. Feel free to check the image file and confirm it if you'd like, @StefanOltmann. Thank you for the help, and for the JIRA OP for providing a useful code snippet to verify the issue. |
I am glad that we finally have a usable test image for this PR! 🎉💪 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nearly 1AM now, and I rebased/edited commits. @StefanOltmann feel free to have a look at the changes to confirm everything looks OK.
I've left one extra commit here where I added the image, and the commit message points that the author (I) is the author of the image. Hopefully that way we won't have any more issues with Legal/etc. (someday I'm really going to write a test images generator based on specs).
👋 thanks! (p.s. if you find nothing wrong here, @StefanOltmann , I think this is one more ready to be merged for alpha5).
Thanks for the help and enjoy the weekend!
Bruno
<!-- UPDATE --> | ||
<action dev="ggregory" type="update" due-to="Dependabot">Bump org.apache.commons:commons-parent from 67 to 69 #382.</action> | ||
<action dev="ggregory" type="update" due-to="Dependabot">Bump commons-io:commons-io from 2.16.0 to 2.16.1 #385.</action> | ||
<action issue="IMAGING-351" dev="kinow" type="update" due-to="Stefan Oltmann, Gary Lucas"> | ||
<action issue="IMAGING-351" dev="kinow" type="update" due-to="Stefan Oltmann, Gary Lucas, Charles Hope"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StefanOltmann didn't have much time to look at history (a bit late, won't have time tomorrow/next week), so I added your suggestion here 👍, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it’s fair. Charles wrote the first version and did half the job. I think he did not understand why the offsets changed and lost interest in fixing his test, but we are a community, so I stepped in. 🙂
Gary was not actually involved in the unit test, but as he is the hero solving the problem, I’m fine with mentioning him. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~450kb, tried reducing the image with GIMP, but that caused the image to not fail the test anymore 🤷♂️ , but I think 450kb for a new test file is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s a really small image and should not be a problem.
To my lib I added images that are 5 MB in size. 🫣
@kinow By porting this library to Kotlin I found and fixed many minor bugs, but this was the only really serious one as it corrupts users data in a non-recoverable way. (For example the problem of dropping the GpsVersionId, which is optional and has always the same value if the file is written/modified by ExifTool, is not a serious bug.) I don’t know what the planned release cycles are, but the last release was two years ago. I would recommend against allowing this bug to stay for another two years. People might use this library in production, so I think the next release should be soon. You may agree that the problem is serious enough. (I don’t understand why Gary made a release with knowledge of this bug anyway.) |
👍
We do not really have planned release cycles. Sometimes volunteers have time for cutting releases, sometimes they get busy with other projects, $work, life, etc. There are times where we have a lot of people active, and times of low bandwidth. For me it is more or less based on weather and some yearly milestones (cold? ==> open source for me, taxes season or good weather ==> no open source for a while for me). And for the past few years I have been working with Python on HPC with climate models. So I don't have $work time to dedicate to open source Java projects like Commons. But if I work again on Java projects, I will probably be more active around here.
RERO, release early, release often (as often as we can). If we have unreleased changes, and one of us (Apache committers) has time, it's common that we just go around releasing multiple components so users can use at least what was not released but fixed. Unfortunately, I think Gary is the only one that has been doing that, so I must thank him for going around and keeping the release train going on Apache Commons components. We may be slow to fix/release, but believe me, without what Gary is doing, we would definitely be much slower! Thanks! |
All good. Your changes look good to me. 🙂 |
FWIW and in general I put whomever was involved in an issue in the changes file in chrono order. This avoids (for me) deciding whom is worth mentioning and in what order. HTH. |
Thanks for the prompt reply (as always!). Going to merge and update JIRA then. |
As to why a bug was present in any release, we are still in alpha and we can create an RC whenever we want, and (in general) I prefer RERO instead of big bang releases. |
Out of curiosity: |
Well, I can do one in the morning (EST) but why don't you make sure there are no other changes you'd like to see in there. |
There are no other open PRs for bugfixes. And right now I don’t plan to create one. I would suggest a timely release because of the serious bug that got fixed by this PR, but of course you don’t have to do that. Especially not for me as I don’t depend on this library (no more). |
@kinow Should we create a release candidate for Alpha 5 ASAP or wait? |
I'd wait a bit more to add more things into alpha5. In the meantime users can use the snapshot jars from Maven (IIRC we had that enabled in Jenkins?). For alpha5 I think it'd be good to knock out a few more 1.0 issues, so that we continue progressing towards the final release. I should have more time this year as I'm finally getting settled in in Spain 🌞 🥘 So I plan to continue voting on threads (a bit quicker so we have enough binding votes sooner rather than later) and slowly working my way through the Imaging/Jena/OpenNLP issues over next months. But up to you, if you'd like to cut an alpha5 now that's Ok to me too (no harm done :)) Cheers |
Feel free to approach this matter in any way you see fit, but I'd like to offer my perspective: It's generally ill-advised to utilize a snapshot version of a dependency in a production application, as its behavior can change unpredictably and may not undergo thorough testing. Currently, this project boasts over 400 stars, suggesting that potentially 400+ individuals or projects are employing it in production environments - thus risking the corruption of user metadata in this very moment. Despite being labeled as "alpha", there's no stable alternative version available - at least not one that's not many years old. While Android offers "ExifInterface" for handling metadata, JVM desktop applications lack adequate alternatives, especially for writing metadata. While "metadata-extractor" suffices for reading, options for writing are limited, with "pixymeta" being the only one I found. I chose to build upon Commons Imaging for "Ashampoo Kim" due to issues encountered with "pixymeta", notably its tendency to corrupt Makernote data during writing due to a design flaw. Commons Imaging only had this one serious issue, which Gary Lucas addressed in said JIRA comment. In my view, it's crucial to recognize the likelihood of numerous production projects relying on this library, underscoring the responsibility that accompanies its maintenance. "Alpha" status or not, people trust Apache projects and you shouldn't let them down. Should any bugs be discovered in "Ashampoo Kim" I would promptly release a fix — not only due to the dependency of my own production project "Ashampoo Photos", but also out of a broader sense of responsibility to the community. Even with the existence of my library, people might still want to use Commons Imaging, because my library comes with a Kotlin dependency. That's something a pure JVM project might not want. So in my view Commons Imaging is for pure JVM projects (that need to write metadata and can't accept a Kotlin dependency) still the go-to project for production use. Keep that in mind. |
@kinow |
That is a good and very well put argument @StefanOltmann, thanks! I hadn't considered the impact of a user with metadata being written incorrectly, although I am afraid it has probably happened for a while (the JIRA Issue had alpha2 in the impacted version, but I wouldn't be surprised if the code ended up being from when the project was still Sanselan).
I started working on Imaging as I needed a pure-java for IIIF servers and couldn't find one, and the metadata is really important. @garydgregory if you have time to cut a release in the next days/weeks that'd be good. But given you have done so many times, I can do that in one or two weeks if you prefer, so that you can continue with the other projects you were planning to release. |
I encountered a bug while testing the Kotlin Multiplatform port (https://github.com/ashampoo/kim) of this library. The bug is a critical issue causing metadata corruption during the write process.
This bug has been reported at https://issues.apache.org/jira/browse/IMAGING-319. Fortunately, a functional fix was identified by @gwlucastrig, although, for some reason, it hasn't been incorporated into a pull request. All credit for the fix goes to him.
Here is the equivalent code in my library:
https://github.com/Ashampoo/kim/blob/01fbb7825d3ccf53170c4625aad63b713614919e/src/commonMain/kotlin/com/ashampoo/kim/format/tiff/write/TiffWriterLossless.kt#L257-L261
I can confirm that the fix is effective. Removing that specific line causes my unit tests to fail, particularly with the test image photo_28.jpg.
Unfortunately, I'm unable to include this image in the pull request since I don't own it. The image falls under the Unsplash license, requiring attribution, and I cannot make decisions regarding this for Apache.