@tomas-zijdemans tomas-zijdemans commented Feb 3, 2026

New XML parsing and serialization module

Sorry, I accidentally closed this PR. This is a revamp that adds more functionality and copes better with XML that is not well-formed.

The goal of this PR is to implement XML parsing, especially streaming, to handle large XML files without consuming a lot of memory. I chose saxes as the library to beat, since it seems to be the best streaming parser out there.

Functionality - Deno std vs saxes

Deno Parser Advantages

| Feature | Notes |
| --- | --- |
| Dual API (DOM + Streaming) | parse() returns a tree, parseXmlStream() for events. Saxes is streaming-only. |
| Round-trip Support | Built-in stringify() for serialization. Saxes is parse-only. |
| Zero-allocation Streaming | Callback API with XmlAttributeIterator avoids per-event object creation |
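
To illustrate what "avoids per-event object creation" can look like, here is a minimal sketch of a callback API that reuses a single attribute cursor for every start tag. It is not the implementation in this PR: `AttrCursor` and `scanStartTags` are hypothetical names, the scanner only understands simple start tags, and it assumes attribute values never contain `>`. Name and value strings are still created on demand; what the pattern avoids is a fresh event object and attribute array per tag.

```typescript
// Hypothetical illustration only: not the API added in this PR.
function isSpace(code: number): boolean {
  return code === 32 || code === 9 || code === 10 || code === 13;
}

// A cursor over the attributes of the current start tag. One instance is
// reused for every tag instead of building an attribute array per event.
class AttrCursor {
  private src = "";
  private pos = 0;
  private end = 0;
  name = "";
  value = "";

  // Point the cursor at the attribute region of a new tag.
  reset(src: string, start: number, end: number): void {
    this.src = src;
    this.pos = start;
    this.end = end;
  }

  // Advance to the next name="value" pair; returns false when exhausted.
  next(): boolean {
    let i = this.pos;
    while (i < this.end && isSpace(this.src.charCodeAt(i))) i++;
    if (i >= this.end) return false;
    const eq = this.src.indexOf("=", i);
    if (eq === -1 || eq >= this.end) return false;
    const quote = this.src.charAt(eq + 1);
    if (quote !== '"' && quote !== "'") return false;
    const close = this.src.indexOf(quote, eq + 2);
    if (close === -1 || close >= this.end) return false;
    this.name = this.src.slice(i, eq);
    this.value = this.src.slice(eq + 2, close);
    this.pos = close + 1;
    return true;
  }
}

// Calls onStartTag for each start tag. The cursor is only valid during the call.
function scanStartTags(
  xml: string,
  onStartTag: (name: string, attrs: AttrCursor) => void,
): void {
  const attrs = new AttrCursor(); // allocated once, reused for every tag
  let pos = 0;
  for (;;) {
    const lt = xml.indexOf("<", pos);
    if (lt === -1) return;
    const gt = xml.indexOf(">", lt + 1);
    if (gt === -1) return;
    pos = gt + 1;
    const first = xml.charAt(lt + 1);
    // Skip end tags, comments/declarations, and processing instructions.
    if (first === "/" || first === "!" || first === "?") continue;
    let nameEnd = lt + 1;
    while (
      nameEnd < gt &&
      !isSpace(xml.charCodeAt(nameEnd)) &&
      xml.charAt(nameEnd) !== "/"
    ) nameEnd++;
    // Exclude the trailing "/" of a self-closing tag from the attribute region.
    const attrEnd = xml.charAt(gt - 1) === "/" ? gt - 1 : gt;
    attrs.reset(xml, nameEnd, attrEnd);
    onStartTag(xml.slice(lt + 1, nameEnd), attrs);
  }
}

// Usage: collect the id attribute of every <item>.
const ids: string[] = [];
scanStartTags('<feed><item id="1" title="A"/><item id="2"/></feed>', (name, attrs) => {
  if (name !== "item") return;
  while (attrs.next()) {
    if (attrs.name === "id") ids.push(attrs.value);
  }
});
```

The trade-off of this style is that the callback must copy anything it wants to keep, since the cursor is overwritten on the next tag.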

Saxes Advantages

Feature Notes
XML 1.1 Support Full XML 1.1 spec. Deno is XML 1.0 only.

XML 1.1 support is the only significant feature saxes has that Deno lacks, and XML 1.1 is rarely used in practice.

Performance streaming - Deno std vs saxes

All tests were run on an M1 Max MacBook, using a 597 MB XML file of product data in the Google product format.

| Metric | Deno std | Saxes |
| --- | --- | --- |
| Heap Delta | 4.14 MB | 21.26 MB |
| Peak Heap | 34.89 MB | 34.77 MB |
| Duration | 5.00s | 5.56s |

Parsing time is similar (5.00s vs 5.56s) and peak heap is essentially the same, while the heap delta is roughly five times smaller (4.14 MB vs 21.26 MB). Compared to SAX and htmlparser2, this module is a lot faster.
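
For anyone wanting to reproduce numbers of this kind, here is a minimal sketch of how a heap delta and a peak heap could be tracked. The definitions are assumptions, since the exact measurement method is not described here: "heap delta" is taken as heap in use after the run minus heap in use before it, and "peak heap" as the highest sample seen. `HeapTracker` is a hypothetical helper that takes the sampling function as a parameter so it does not depend on a particular runtime.

```typescript
// Hypothetical helper: the definitions of "delta" and "peak" are assumptions.
class HeapTracker {
  private readonly sample: () => number;
  private readonly baseline: number;
  private peak: number;

  // `sample` returns the heap currently in use, in bytes.
  constructor(sample: () => number) {
    this.sample = sample;
    this.baseline = sample();
    this.peak = this.baseline;
  }

  // Call periodically during the run (for example once per chunk).
  tick(): void {
    const used = this.sample();
    if (used > this.peak) this.peak = used;
  }

  // Call once after the run to get the final figures.
  finish(): { heapDelta: number; peakHeap: number } {
    const used = this.sample();
    if (used > this.peak) this.peak = used;
    return { heapDelta: used - this.baseline, peakHeap: this.peak };
  }
}

// Demo with a scripted sampler so the arithmetic is easy to follow.
let fakeHeap = 10;
const tracker = new HeapTracker(() => fakeHeap); // baseline = 10
fakeHeap = 30;
tracker.tick(); // peak = 30
fakeHeap = 25;
tracker.tick(); // peak stays 30
fakeHeap = 14;
const heapStats = tracker.finish(); // heapDelta = 14 - 10 = 4, peakHeap = 30
```

In Deno the sampler could be `() => Deno.memoryUsage().heapUsed`. Readings depend on when the garbage collector last ran, so a single run gives only a rough figure.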

Performance DOM-style/non-streaming - Deno std vs others

DOM-style (non-streaming) parsing is not the main focus of this PR. Still, in benchmarks against a few dozen files (various types of XML, both small and large), this module performs very well while offering significantly stronger XML conformance than these libraries:

| Library | Small Files | Large Files |
| --- | --- | --- |
| htmlparser2 | 1.2x slower | 1.7x slower |
| SAX | 1.4x slower | 2.8x slower |
| fast-xml-parser | 2.2x slower | 4.1x slower |
| xml2js | 2.9x slower | 5.3x slower |

Conformance - Deno std vs saxes

No mainstream npm XML library aims to pass the entire W3C XML test suite (especially DTD/external entity/validation cases). Full support is complex and often disabled or omitted in JS parsers for security reasons (XXE/entity-expansion DoS).

I chose the same approach as saxes: not to support custom entity expansion from the DTD, since that poses a security risk (XXE vulnerabilities and billion-laughs attacks).
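
To make the billion-laughs risk concrete, here is a small sketch that computes how large a nested entity would become if a parser expanded it, without actually building the string. `expandedLength` is a hypothetical helper written for this illustration and is not part of the module. It assumes replacement text contains only plain characters and `&name;` references to other declared entities (no character references or predefined entities).

```typescript
// Hypothetical helper for illustration; not part of the module in this PR.
// Returns the length of entity `name` after full expansion.
function expandedLength(
  entities: Map<string, string>,
  name: string,
  memo: Map<string, number> = new Map(),
  active: Set<string> = new Set(),
): number {
  const cached = memo.get(name);
  if (cached !== undefined) return cached;
  const text = entities.get(name);
  if (text === undefined) throw new Error(`undefined entity: ${name}`);
  // XML forbids entities that refer to themselves, directly or indirectly.
  if (active.has(name)) throw new Error(`recursive entity: ${name}`);
  active.add(name);

  let total = 0;
  let pos = 0;
  while (pos < text.length) {
    const amp = text.indexOf("&", pos);
    if (amp === -1) {
      total += text.length - pos; // trailing literal text
      break;
    }
    const semi = text.indexOf(";", amp + 1);
    if (semi === -1) throw new Error(`unterminated reference in: ${name}`);
    total += amp - pos; // literal text before the reference
    total += expandedLength(entities, text.slice(amp + 1, semi), memo, active);
    pos = semi + 1;
  }

  active.delete(name);
  memo.set(name, total);
  return total;
}

// The classic shape: each level references the previous level ten times.
const laughs = new Map<string, string>([["lol0", "lol"]]);
for (let i = 1; i <= 9; i++) {
  laughs.set(`lol${i}`, `&lol${i - 1};`.repeat(10));
}
const laughSize = expandedLength(laughs, "lol9"); // 3 * 10^9 characters
```

Ten short declarations are enough to demand about three billion characters of output, which is why refusing to expand DTD-defined entities is a common defensive choice.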

The results below are based on the official test suite, with the following caveats:

  • DTD expansion and validation tests are placed on a skip list
  • Only XML 1.0 Fifth Edition tests are included

With Skip List (1294 tests)

| Parser | Passed | Total | Rate |
| --- | --- | --- | --- |
| Saxes | 1290 | 1294 | 99.7% |
| Deno (sync) | 1294 | 1294 | 100.0% |
| Deno (stream) | 1294 | 1294 | 100.0% |

Without Skip List (1736 tests)

| Parser | Passed | Total | Rate |
| --- | --- | --- | --- |
| Saxes | 1312 | 1736 | 75.6% |
| Deno (sync) | 1442 | 1736 | 83.1% |
| Deno (stream) | 1417 | 1736 | 81.6% |
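
The rates above can be re-derived from the raw counts, and comparing the two runs shows how many of the skipped tests each parser still passes. The second part is a sketch under one assumption that is not stated above: every test that passes with the skip list also passes in the full run. `passRate` and the variable names are just for this example.

```typescript
// Formats passed/total as a percentage with one decimal place.
function passRate(passed: number, total: number): string {
  return `${((passed / total) * 100).toFixed(1)}%`;
}

// Counts copied from the two tables above.
const withSkipList = { total: 1294, saxes: 1290, denoSync: 1294, denoStream: 1294 };
const fullSuite = { total: 1736, saxes: 1312, denoSync: 1442, denoStream: 1417 };

const conformanceRates = {
  saxesWithSkip: passRate(withSkipList.saxes, withSkipList.total), // "99.7%"
  saxesFull: passRate(fullSuite.saxes, fullSuite.total), // "75.6%"
  denoSyncFull: passRate(fullSuite.denoSync, fullSuite.total), // "83.1%"
  denoStreamFull: passRate(fullSuite.denoStream, fullSuite.total), // "81.6%"
};

// Number of tests on the skip list.
const skippedTests = fullSuite.total - withSkipList.total; // 442

// Skipped tests that still pass (assumes skip-list-run passes carry over).
const passedAmongSkipped = {
  saxes: fullSuite.saxes - withSkipList.saxes, // 22
  denoSync: fullSuite.denoSync - withSkipList.denoSync, // 148
  denoStream: fullSuite.denoStream - withSkipList.denoStream, // 123
};
```

Under that assumption, most of the gap between the two tables comes from the 442 DTD expansion and validation tests rather than from core well-formedness checks.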

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 82.07850% with 726 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.18%. Comparing base (e80e6bb) to head (23dc3d4).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| xml/_tokenizer.ts | 79.05% | 392 Missing and 15 partials ⚠️ |
| xml/_parse_sync.ts | 74.80% | 252 Missing and 13 partials ⚠️ |
| xml/_common.ts | 89.66% | 28 Missing and 3 partials ⚠️ |
| xml/_parser.ts | 96.46% | 16 Missing ⚠️ |
| xml/_name_chars.ts | 89.09% | 2 Missing and 4 partials ⚠️ |
| xml/parse_stream.ts | 96.77% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6981      +/-   ##
==========================================
- Coverage   94.26%   93.18%   -1.08%     
==========================================
  Files         602      613      +11     
  Lines       43662    47713    +4051     
  Branches     7063     8175    +1112     
==========================================
+ Hits        41159    44463    +3304     
- Misses       2447     3159     +712     
- Partials       56       91      +35     

Co-authored-by: Cursor <cursoragent@cursor.com>