Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix quadratic complexity performance bug #1657

Merged

Conversation

kevinbackhouse
Copy link
Collaborator

This fixes GHSA-w8mv-g8qq-36mj

The problem is that xmlParser->ParseBuffer() keeps reparsing the entire string from the beginning. On a reasonably large (malicious) input file, containing lots of invalid (not UTF-8) characters, this causes very slow performance due to quadratic complexity. As far as I can see, this is primarily a limitation of libexpat1, which is outside of our control. But we can make it much better by making sure that we only call xmlParser->ParseBuffer() once during this function, rather than repeatedly during the loop.

I have implemented it by copying everything into a std::string and then calling xmlParser->ParseBuffer() at the end.

The poc for the bug is a large invalid file. I am not sure if it's worth wasting storage space on that in our repo, so I didn't add it as a test. Let me know if you think I should add it.

@piponazo
Copy link
Collaborator

Wow, awesome work done here. I can imagine you had to spend quite some time investigating this issue. These are my thoughts:

  • The fix is done in xmpsdk/src/XMPMeta-Parse.cpp. XMPSdk which is an "external" dependency of the project that is included in our source repository for historical reasons. We could say that xmpsdk is not part of Exiv2.
  • Just a piece of information: In conanfile.py we added few years ago the option to bring a newer version of XMPSdk (XmpSdk/2016.7). At that time, we could use the newer version instead of the one existing in our repository (which is quite old). Nonetheless, I think nobody is using that option and people is just using the copy we have inside the exiv2 repository.
  • Regarding the poc, I would not include it either because we would be testing xmpsdk rather than exiv2.
  • I would also port this fix to main.

Thanks for your efforts! 💪

@kevinbackhouse kevinbackhouse added the forward-to-main Forward changes in a 0.28.x PR to main with Mergify label May 16, 2021
@kevinbackhouse kevinbackhouse merged commit 8c53e46 into Exiv2:0.27-maintenance May 16, 2021
@hassec hassec added this to the v0.27.4 milestone May 21, 2021
@clanmills clanmills mentioned this pull request May 22, 2021
@clanmills
Copy link
Collaborator

That's a really interesting find, Kev. Let's not pollute the repository with large test files.

One of the Topics for v1.00 is "Deep Test Coverage". One day somebody will enhance the test suite to run more extensive tests on lots of big files. The folks at pixels.us have offered to host it. When that's done, your big fat file might find a home and associated test script.

Another Topic for v1.00 is "Consider removing XMPsdk from code base". The tree xmpsdk/ was added in the early days of the project. That made sense because of the state of the Adobe build support for XMPsdk (which is shudder horrible) so Andreas (or Brad) beafed up the Exiv2 autoconf build code.

15 years later, we have Conan available to build dependencies and we may be about to remove XMPsdk. #1466 (comment)

@kevinbackhouse kevinbackhouse deleted the GHSA-w8mv-g8qq-36mj branch September 18, 2021 12:51
@nohle
Copy link

nohle commented Oct 3, 2023

@kevinbackhouse thanks so much for the fix with a detailed explanation! I ran into the same issue using XMP toolkit, and this pull request was super helpful.

Was there any appetite for landing this upstream in https://github.com/adobe/XMP-Toolkit-SDK? I can file a bug there if this hasn't been proposed already.

@kevinbackhouse
Copy link
Collaborator Author

@nohle: I'm glad it helped you! I haven't had any success with upstream (example), but you're welcome to try. Exiv2's codebase contains an out-of-date copy of XMP, which we try to do minimal maintenance on to keep it working. We'd prefer to pull XMP in as a dependency but haven't had any success with that so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug forward-to-main Forward changes in a 0.28.x PR to main with Mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants