Skip to content

Adds parser options and safer parsing defaults#35

Merged
redvers merged 1 commit into
mainfrom
feature/parser-options
May 22, 2026
Merged

Adds parser options and safer parsing defaults#35
redvers merged 1 commit into
mainfrom
feature/parser-options

Conversation

@redvers
Copy link
Copy Markdown
Collaborator

@redvers redvers commented May 22, 2026

Closes #12. Replaces the internals of Xml2Doc.parseDoc / Xml2Doc.parseFile so they route through libxml2's options-aware xmlReadDoc / xmlReadFile, and adds an Xml2ParserOptions class for callers who need explicit control.

Both constructors gain an optional options parameter; without it they use a safe-by-default Xml2ParserOptions instance:

no_net = true (XML_PARSE_NONET)
substitute_entities = false (XML_PARSE_NOENT off)
load_dtd = false (XML_PARSE_DTDLOAD off)
load_dtd_attrs = false (XML_PARSE_DTDATTR off)

This is the review H6 fix: XXE / SSRF surface closed for all existing parseDoc / parseFile callers, not just those who knew to ask for safer parsing. Callers that rely on libxml2's previous permissive behaviour can restore it by passing an explicit options instance.

Xml2ParserOptions exposes typed boolean fields mapping 1:1 to libxml2 XML_PARSE_* flags. The error_recovery field is named with the error_ prefix because recover is a Pony keyword.

Tests cover:

  • defaults are safe (no_net on, substitute_entities off, all other flags off; to_flags() == 2048)
  • to_flags() composition across the full flag set
  • no_blanks discards inter-element whitespace
  • error_recovery accepts malformed XML that strict parsing rejects
  • default parsing leaves internal entity references un-expanded, while substitute_entities = true expands them

Counterfactual: flipping the substitute_entities default to true fails three tests including the XXE-relevant behaviour assertion, confirming the safety defaults are covered from multiple angles.

Closes #12. Replaces the internals of Xml2Doc.parseDoc /
Xml2Doc.parseFile so they route through libxml2's options-aware
xmlReadDoc / xmlReadFile, and adds an Xml2ParserOptions class for
callers who need explicit control.

Both constructors gain an optional `options` parameter; without it
they use a safe-by-default Xml2ParserOptions instance:

  no_net               = true   (XML_PARSE_NONET)
  substitute_entities  = false  (XML_PARSE_NOENT off)
  load_dtd             = false  (XML_PARSE_DTDLOAD off)
  load_dtd_attrs       = false  (XML_PARSE_DTDATTR off)

This is the review H6 fix: XXE / SSRF surface closed for all
existing parseDoc / parseFile callers, not just those who knew to
ask for safer parsing. Callers that rely on libxml2's previous
permissive behaviour can restore it by passing an explicit options
instance.

Xml2ParserOptions exposes typed boolean fields mapping 1:1 to
libxml2 XML_PARSE_* flags. The `error_recovery` field is named with
the `error_` prefix because `recover` is a Pony keyword.

Tests cover:
  - defaults are safe (no_net on, substitute_entities off, all
    other flags off; to_flags() == 2048)
  - to_flags() composition across the full flag set
  - no_blanks discards inter-element whitespace
  - error_recovery accepts malformed XML that strict parsing rejects
  - default parsing leaves internal entity references un-expanded,
    while substitute_entities = true expands them

Counterfactual: flipping the substitute_entities default to true
fails three tests including the XXE-relevant behaviour assertion,
confirming the safety defaults are covered from multiple angles.
@redvers redvers merged commit aed8ac5 into main May 22, 2026
7 checks passed
github-actions Bot pushed a commit that referenced this pull request May 22, 2026
github-actions Bot pushed a commit that referenced this pull request May 22, 2026
@redvers redvers deleted the feature/parser-options branch May 22, 2026 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1. Parser options and robust parsing

1 participant