Fix quadratic complexity performance bug #1657
Merged
This fixes GHSA-w8mv-g8qq-36mj
The problem is that `xmlParser->ParseBuffer()` keeps reparsing the entire string from the beginning. On a reasonably large (malicious) input file containing lots of invalid (not UTF-8) characters, this causes very slow performance due to quadratic complexity. As far as I can see, this is primarily a limitation of libexpat1, which is outside of our control. But we can make it much better by making sure that we only call `xmlParser->ParseBuffer()` once during this function, rather than repeatedly during the loop.

I have implemented it by copying everything into a `std::string` and then calling `xmlParser->ParseBuffer()` at the end.

The PoC for the bug is a large invalid file. I am not sure if it's worth wasting storage space on that in our repo, so I didn't add it as a test. Let me know if you think I should add it.