Adds parser options and safer parsing defaults #35
Merged
Closes #12. Replaces the internals of Xml2Doc.parseDoc / Xml2Doc.parseFile so they route through libxml2's options-aware xmlReadDoc / xmlReadFile, and adds an Xml2ParserOptions class for callers who need explicit control.

Both constructors gain an optional `options` parameter. Without it, they use a safe-by-default Xml2ParserOptions instance:

- `no_net = true` (`XML_PARSE_NONET`)
- `substitute_entities = false` (`XML_PARSE_NOENT` off)
- `load_dtd = false` (`XML_PARSE_DTDLOAD` off)
- `load_dtd_attrs = false` (`XML_PARSE_DTDATTR` off)

This is the review H6 fix: the XXE / SSRF surface is closed for all existing parseDoc / parseFile callers, not just those who knew to ask for safer parsing. Callers that rely on libxml2's previous permissive behaviour can restore it by passing an explicit options instance.

Xml2ParserOptions exposes typed boolean fields that map 1:1 to libxml2 `XML_PARSE_*` flags. The `error_recovery` field is named with the `error_` prefix because `recover` is a Pony keyword.

Tests cover:

- defaults are safe (`no_net` on, `substitute_entities` off, all other flags off; `to_flags() == 2048`)
- `to_flags()` composition across the full flag set
- `no_blanks` discards inter-element whitespace
- `error_recovery` accepts malformed XML that strict parsing rejects
- default parsing leaves internal entity references un-expanded, while `substitute_entities = true` expands them

Counterfactual: flipping the `substitute_entities` default to `true` fails three tests, including the XXE-relevant behaviour assertion, confirming that the safety defaults are covered from multiple angles.
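The field-to-flag mapping described above can be illustrated with a minimal Pony sketch. It is not the PR's code: `ParserOptionsSketch` and `XmlParseFlags` are hypothetical names, `to_flags()` is assumed to return an `I32`, and only the six flags named in this description are covered (the real Xml2ParserOptions maps a larger set). The numeric values are libxml2's `xmlParserOption` bits, written out by hand. The sketch shows why the safe defaults give `to_flags() == 2048`: only `no_net` is on, and `XML_PARSE_NONET` is `1 << 11`.

```pony
// Hypothetical sketch, not the PR's implementation. XmlParseFlags and
// ParserOptionsSketch are illustrative names; the values below are
// libxml2's xmlParserOption bits, copied by hand.
primitive XmlParseFlags
  fun parse_recover(): I32 => 1 // XML_PARSE_RECOVER  (1 << 0)
  fun noent(): I32 => 2         // XML_PARSE_NOENT    (1 << 1)
  fun dtdload(): I32 => 4       // XML_PARSE_DTDLOAD  (1 << 2)
  fun dtdattr(): I32 => 8       // XML_PARSE_DTDATTR  (1 << 3)
  fun noblanks(): I32 => 256    // XML_PARSE_NOBLANKS (1 << 8)
  fun nonet(): I32 => 2048      // XML_PARSE_NONET    (1 << 11)

class ParserOptionsSketch
  // One Bool per libxml2 flag; the defaults mirror the safe defaults.
  let no_net: Bool
  let substitute_entities: Bool
  let load_dtd: Bool
  let load_dtd_attrs: Bool
  let no_blanks: Bool
  // `recover` is a Pony keyword, hence the error_ prefix.
  let error_recovery: Bool

  new create(
    no_net': Bool = true,
    substitute_entities': Bool = false,
    load_dtd': Bool = false,
    load_dtd_attrs': Bool = false,
    no_blanks': Bool = false,
    error_recovery': Bool = false)
  =>
    no_net = no_net'
    substitute_entities = substitute_entities'
    load_dtd = load_dtd'
    load_dtd_attrs = load_dtd_attrs'
    no_blanks = no_blanks'
    error_recovery = error_recovery'

  fun to_flags(): I32 =>
    // OR together the bit of every field that is switched on.
    var flags: I32 = 0
    if error_recovery then flags = flags or XmlParseFlags.parse_recover() end
    if substitute_entities then flags = flags or XmlParseFlags.noent() end
    if load_dtd then flags = flags or XmlParseFlags.dtdload() end
    if load_dtd_attrs then flags = flags or XmlParseFlags.dtdattr() end
    if no_blanks then flags = flags or XmlParseFlags.noblanks() end
    if no_net then flags = flags or XmlParseFlags.nonet() end
    flags

actor Main
  new create(env: Env) =>
    // Safe defaults: only no_net is set.
    let defaults = ParserOptionsSketch
    env.out.print("defaults: " + defaults.to_flags().string())

    // The counterfactual case: flip only substitute_entities.
    let flipped = ParserOptionsSketch(where substitute_entities' = true)
    env.out.print("flipped: " + flipped.to_flags().string())

    // Every flag in this sketch switched on.
    let everything = ParserOptionsSketch(where substitute_entities' = true,
      load_dtd' = true, load_dtd_attrs' = true, no_blanks' = true,
      error_recovery' = true)
    env.out.print("all flags: " + everything.to_flags().string())
```

Running it prints 2048 for the defaults, 2050 when only `substitute_entities` is flipped (`2048 | 2`, the change the counterfactual exercises), and 2319 with every flag in the sketch enabled. Keeping one typed Bool per flag, rather than a raw integer mask, is what lets the defaults be asserted field by field as well as through the composed value.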