feat(xml/unstable): add XML parsing and serialization module #6981
New XML parsing and serialization module
The goal of this PR is to implement XML parsing, especially streaming, to handle large XML files without consuming a lot of memory. I chose saxes as the library to beat, since it seems to be the best streaming parser out there.
Functionality - Deno std vs saxes
Deno Parser Advantages
Saxes Advantages
XML 1.1 support is really the only significant feature saxes has that Deno lacks, and XML 1.1 is rarely used in practice.
Performance streaming - Deno std vs saxes
All tests were run on a MacBook M1 Max. I used a 597 MB XML file with product data in the Google product format.
Parsing time is similar to saxes, with a gain on the memory side. Compared to SAX and htmlparser2, this module is a lot faster.
Performance DOM-style/non-streaming - Deno std vs others
DOM-style (non-streaming) parsing is not the main focus of this PR, but in benchmarks against a few dozen files (various types of XML, both small and larger files), this module performs very well while offering significantly stronger XML conformance than these libs:
Conformance - Deno std vs saxes
No mainstream npm XML library aims to pass the entire W3C XML test suite (especially DTD/external entity/validation cases). Full support is complex and often disabled or omitted in JS parsers for security reasons (XXE/entity-expansion DoS).
I chose the same approach as saxes: custom entity expansion from the DTD is not supported, because it poses a security risk (XXE vulnerabilities and billion-laughs attacks).
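To illustrate why DTD entity expansion is risky, here is a minimal sketch that computes how large a billion-laughs style payload would become if a parser expanded it. This is not code from this PR or from saxes: `buildLaughs`, `expandedLength`, and the `lol0`..`lolN` entity names are hypothetical helpers for illustration only, and the entity table stands in for declarations that would normally come from a DTD.

```typescript
// Hypothetical entity table: name -> replacement text, where the text may
// reference other entities as &name;. Not part of any real parser API.
type EntityTable = Map<string, string>;

// Builds a billion-laughs style table: lol0 is literal text, and every
// further level references the previous level `fanOut` times.
function buildLaughs(levels: number, fanOut: number): EntityTable {
  const table: EntityTable = new Map([["lol0", "lol"]]);
  for (let i = 1; i <= levels; i++) {
    table.set(`lol${i}`, `&lol${i - 1};`.repeat(fanOut));
  }
  return table;
}

// Computes the fully expanded length of an entity without building the
// expanded string, so the calculation itself stays cheap.
function expandedLength(
  name: string,
  table: EntityTable,
  cache: Map<string, number> = new Map(),
  active: Set<string> = new Set(),
): number {
  const cached = cache.get(name);
  if (cached !== undefined) return cached;

  const text = table.get(name);
  if (text === undefined) throw new Error(`undefined entity: ${name}`);
  if (active.has(name)) throw new Error(`recursive entity: ${name}`);

  active.add(name);
  let total = 0;
  let pos = 0;
  const ref = /&([A-Za-z_][\w.-]*);/g;
  let m: RegExpExecArray | null;
  while ((m = ref.exec(text)) !== null) {
    total += m.index - pos; // literal characters before the reference
    total += expandedLength(m[1], table, cache, active);
    pos = m.index + m[0].length;
  }
  total += text.length - pos; // literal characters after the last reference
  active.delete(name);

  cache.set(name, total);
  return total;
}

const laughs = buildLaughs(9, 10);
console.log(laughs.get("lol9")!.length); // size of the declaration itself
console.log(expandedLength("lol9", laughs)); // size after full expansion
```

With nine levels and a fan-out of ten, the `lol9` declaration is only 60 characters long, yet expanding it would produce 3,000,000,000 characters. Refusing to expand custom DTD entities removes this amplification entirely.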
The below results are based on the official test suite, with the following caveats:
With Skip List (1294 tests)
Without Skip List (1736 tests)