Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip XMP raw packet before decoding #2133

Merged
merged 2 commits into from
Mar 10, 2022
Merged

Strip XMP raw packet before decoding #2133

merged 2 commits into from
Mar 10, 2022

Conversation

kmilos
Copy link
Collaborator

@kmilos kmilos commented Mar 8, 2022

Addresses XML parsing in #2126

@kmilos kmilos requested a review from clanmills March 8, 2022 09:12
@piponazo
Copy link
Collaborator

piponazo commented Mar 8, 2022

Is it possible to add a new test case with the image provided in #2126 ?

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

Can do once we're happy w/ the fix...

Copy link
Collaborator

@clanmills clanmills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three comments:

  1. This feels in the wrong place. actions.cpp is in the exiv2 application. Shouldn't this code be in the library? Of is this code only used when reading sidecars?

  2. What about `\n' ?

  3. Are you certain only the XMLsdk can throw XMLsdk error messages.

@codecov
Copy link

codecov bot commented Mar 8, 2022

Codecov Report

Merging #2133 (134c476) into main (89e66ff) will decrease coverage by 0.00%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2133      +/-   ##
==========================================
- Coverage   63.35%   63.35%   -0.01%     
==========================================
  Files          97       97              
  Lines       19174    19176       +2     
  Branches     9712     9713       +1     
==========================================
+ Hits        12148    12149       +1     
  Misses       4758     4758              
- Partials     2268     2269       +1     
Impacted Files Coverage Δ
src/xmp.cpp 77.21% <75.00%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89e66ff...134c476. Read the comment docs.

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

  1. This feels in the wrong place. actions.cpp is in the exiv2 application. Shouldn't this code be in the library? Of is this code only used when reading sidecars?

The library already takes std::string as input to Exiv2::Image::setXmpPacket(), which is not supposed to be terminated. This addresses the root of the problem, i.e. it sanitizes the original input: blob -> string happens in the app only. We can potentially add a doc comment to Exiv2::Image::setXmpPacket() to highlight the string is supposed to be the C++ unterminated kind...

  1. What about `\n' ?

The LF didn't create the problem, as I commented here. It really is the NUL terminator only.

  1. Are you certain only the XMLsdk can throw XMLsdk error messages.

Nope, haven't addressed this (yet).

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

Actually @clanmills you might be right, could be an idea to sanitize this first thing in Exiv2::XmpParser::decode() instead?

@clanmills
Copy link
Collaborator

clanmills commented Mar 8, 2022

@kmilos You've understood my point. Trap this between Exiv2 and XMPsdk. It the case of OM.JPEG, JpegImage::readMetada() parses the JPEG to find the XMP "blob" and sends it to the XMPsdk. Every image handler (pngimage.cpp etc) uses the same strategy.

Kev wanted check_input() for good reasons that I have forgotten. That's where we need the chomp before calling XMP_Parse() to verify that we have valid XML. I believe XMPsdk can throw when presented with legal XML. We shouldn't throw an XMPsdk exception for illegal XML. We should have a new exception from check_input() such as "XMPdata is not valid XML".

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

I'm still not 100% satisfied w/ that approach: it only makes libexpat happy at the point of check(), but still passes the unsanitized buffer to the XMP toolkit (which luckily is not unhappy with it so far)... OTOH, checking this in every image handler at the point of xmlPacket_.assign() is cumbersome, might have to add yet another abstraction method for setting this...

@clanmills
Copy link
Collaborator

clanmills commented Mar 8, 2022

@kmilos. Can you circle back and discuss check_input() with @kevinbackhouse.

Here are the goals that I see for this PR:

  1. OM.JPEG shouldn't throw Validator errors.
  2. Let's ask @hassec to deal with the Olympus/MakerNote for that file. That's a different PR.
  3. When I run exiv2 on my 80,000 files, the number of 'Validator' issues is not to increase from 151.
  4. There could be issues with sidecars and other XMP metadata sources.
  5. The test suite must pass, preferably without changes.
  6. Exiv2 code should never thrown an XMPsdk error (only Adobe code should do that).

@kmilos kmilos marked this pull request as draft March 8, 2022 11:45
@kmilos kmilos changed the title Chomp XML binary blob before constructing string Strip XMP raw packet before decoding Mar 8, 2022
@kevinbackhouse
Copy link
Collaborator

Is the XMLValidator causing an issue? I added it in #1878. The reason is that the Adobe xmpsdk library has various bugs that OSS-Fuzz keeps hitting. Most of them are triggered by malformed XML. Because xmpsdk is third party code, we try not to modify it. Instead, I decided to add XMLValidator, which checks that the XML isn't malformed before we hand it off to xmpsdk. It reduces the chance of hitting a bug in xmpsdk.

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

Is the XMLValidator causing an issue?

Yes. libexpat is not tolerating a trailing \0 while XMP SDK is.

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 8, 2022

Ha, some tests that were failing before seem to pass now? Or throw a different error... Quite a few to update anyway.

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 9, 2022

Ok, ended up listening to gray hair @clanmills but a level above so we can pass the same data to both the libexpat validator and then the XMP SDK - this is less messy than touching all the image handlers...

@kmilos
Copy link
Collaborator Author

kmilos commented Mar 9, 2022

@piponazo Turns out we already had test files that exhibit this problem!

@kmilos kmilos marked this pull request as ready for review March 9, 2022 17:07
@clanmills
Copy link
Collaborator

Ok, ended up listening to gray hair @clanmills but a level above so we can pass the same data to both the libexpat validator and then the XMP SDK - this is less messy than touching all the image handlers...

Ah, Milos, you say the nicest things. At least I'm still gray and not white yet. Alison said "tell him you haven't got and old brain". Mind you, my hearing isn't so good, maybe she said "tell him you haven't got an old nag!".

Copy link
Collaborator

@piponazo piponazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work guys! 👏

@kmilos kmilos merged commit 1e2b9da into main Mar 10, 2022
@kmilos kmilos deleted the fix_2126_xml_term branch March 10, 2022 12:16
@kmilos kmilos added the bug label Mar 10, 2022
@kmilos kmilos added this to the v1.00 milestone Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants