Skip to content

[REVIEW] secure-code-review: add multi-language XXE parser evidence gates #882

@minorstep

Description

@minorstep

Skill Being Reviewed

Skill name: secure-code-review
Skill path: skills/appsec/secure-code-review/

False Positive Analysis

Benign XML parsing that can be over-reported if every XML parser use is flagged:

from defusedxml.ElementTree import fromstring

def parse_metadata(xml_body: bytes):
    return fromstring(xml_body)

Why this is a false positive:

The parser is from defusedxml, which is designed to disable dangerous XML features such as external entities and entity expansion attacks. XML parsing itself is not automatically XXE. A review needs evidence that untrusted XML reaches a parser with DTDs, external entities, external schemas, or network/entity resolvers enabled before reporting CWE-611.

Coverage Gaps

Missed variant 1: Java parser leaves external entities and DTD loading enabled

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(request.getInputStream());

Why it should be caught:

The main secure-code-review skill does not require reviewers to inspect XML parser configuration. In Java, Python, .NET, PHP, Ruby, and Node, unsafe parser defaults or explicit resolver settings can expose local files, internal HTTP endpoints, or trigger denial-of-service through entity expansion.

Missed variant 2: external schema or XSLT resolution is enabled even when entities appear disabled

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new StreamSource(request.getInputStream()));

Why it should be caught:

XXE-style risk is not limited to <!DOCTYPE> in ordinary document parsing. External schema, XSLT, XInclude, and resolver APIs can still perform filesystem or network access. The skill should require parser, schema, transformer, and resolver evidence rather than only looking for DocumentBuilderFactory.

Edge Cases

  • JSON-only APIs do not need XML parser findings unless XML is accepted through alternate content types, uploads, SOAP/SAML, webhooks, or import jobs.
  • Internal-only XML processing is still risky when files originate from partners, tenants, support bundles, SAML responses, or uploaded documents.
  • A parser may disable external entities but still permit large entity expansion or external schema fetches unless those features are separately disabled.
  • Some legacy integrations require DTDs. The finding should then document source trust, resolver allowlists, network isolation, size limits, and why the exception is needed.

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: Add a multi-language XXE evidence gate to the main secure-code-review skill. Require reviewers to record XML input source, parser type, DTD/entity settings, external resolver/schema/XSLT settings, size limits, and false-positive rationale for safe parsers such as defusedxml or explicitly hardened framework settings.

Comparison to Other Tools

Tool Catches this? Notes
Semgrep Partial Can match unsafe parser constructors and missing features, but needs language-specific safe-parser exceptions.
CodeQL Partial Can model unsafe XML parser configuration in supported languages, but reviewers still need source trust and resolver evidence.
OWASP Cheat Sheet Series Yes Provides language-specific guidance for disabling XXE and related external resolution features.
CWE Yes CWE-611 tracks improper restriction of XML external entity references.

Overall Assessment

Strengths:

The skill has good coverage of common injection, authentication, authorization, logging, file upload, deserialization, and SSRF paths.

Needs improvement:

The main multi-language secure-code-review process does not include an XML parser evidence gate. That can cause false negatives for unsafe XML parsers and false positives for hardened parser libraries or configurations.

Priority recommendations:

  1. Add CWE-611 to Step 2 or Step 8 coverage.
  2. Add vulnerable and safe examples for XML parser configuration.
  3. Add language-aware search patterns for Java, Python, .NET, PHP, Ruby, and Node XML parsers.
  4. Add report evidence fields for parser type, source, DTD/entity state, external resolver/schema/XSLT state, size limits, and false-positive rationale.

References

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: Payment details can be provided privately after maintainer acceptance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions