Skip to content

WIP on doctype restriction #519

Closed
mbeckerle wants to merge 1 commit intoapache:masterfrom
mbeckerle:daf-1422-doctype
Closed

WIP on doctype restriction #519
mbeckerle wants to merge 1 commit intoapache:masterfrom
mbeckerle:daf-1422-doctype

Conversation

@mbeckerle
Copy link
Copy Markdown
Contributor

DAFFODIL-1422 is a ticket about restricting the XML we accept so that we do not allow DOCTYPE declarations in it.
This is a security related provision, as use of DOCTYPEs leaves XML loaders subject to a variety of problems such as documents exploding in size as the DOCTYPE declarations are expanded.
DOCTYPEs are an old obsolete idea, and simply disallowing them entirely is the best option.

There are other jira tickets about disallowing resolvers and loaders from dereferencing URIs treating them as internet URLs.

The upshot of all this is that "loading" XML is tricky, and needs to be done carefully via a centralized library that provides the various options different loading requires, while not exposing/allowing the security vulnerabilities.

This change set does not yet include establishing a single central library that all XML loading goes through.

Right now this is at the point where it is apparent such a library is needed, because there are too many places that are invoking XML loaders for them to just all be "done the right way". Under maintenance this is too likely to drift.

A key starting point is to survey every place Daffodil does XML loading. These include loading of:

  • DFDL schemas and the include/import schema files they reference
    ** This can include DFDL schemas, but also XSD for annotation languages (e.g., schematron annotations)
    ** Note that this validating loader is loading a schema, but loading it not as a schema, but as ordinary XML. This should be validated against the schema for DFDL schemas.
  • XML Infosets being unparsed
    ** (Currently not a validating loader)
  • TDML files for testing - this can in turn lead to loading of DFDL schemas
    ** Test cases can load XML Infoset files.
    ** validation here involves validating defineSchema elements which contain DFDL schema.
  • Config files
  • daffodil-propgen loads the XSD files for DFDL annotations, tunables, etc.
  • Xerces validator loads a schema, and the include/import files it references
  • Xerces validator loads XML being validated

This may not be a comprehensive list.

@mbeckerle
Copy link
Copy Markdown
Contributor Author

We need tests that use doctype decls and general entities and verify that we reject them.

@mbeckerle
Copy link
Copy Markdown
Contributor Author

Note: if we disable DOCTYPE we are disabling general entities, as those can only be supplied in a doctype.

@mbeckerle mbeckerle force-pushed the daf-1422-doctype branch from 78c9448 to c3583ea Compare May 7, 2021 17:24
factory.setResourceResolver(DFDLCatalogResolver.get)
val schema = factory.newSchema(new StreamSource(extVarXsd))
val validator = schema.newValidator()
validator.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to XMLUtils.setSecureDefaults(v: Validator) ??

def setSecureDefaults(xmlReader: XMLReader) : Unit = {
try {
xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment that these next 2 restrictions are not strictly speaking necessary, as they are implied by disallowing doctypes.

}
}

// def setSecureDefaults(schemaFactory: org.apache.xerces.jaxp.validation.XMLSchemaFactory) : Unit = {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code. Add comment above that the disallowing of doctypes works for XMLReader and Validator, but not XMLSchemaFactory.

}
val enc = determineEncoding(file) // The encoding is needed for ConstructingParser
val input = scala.io.Source.fromURI(file.toURI)(enc)
val node = ConstructingParser.fromSource(input, true).document.docElem
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment that this was changed from the Constructing loader because of a unexpected error, and reusing the DaffodlXMLLoader works, no need for a special case loader for external variable bindings.

@mbeckerle
Copy link
Copy Markdown
Contributor Author

Closing. Will open a new PR for this.

@mbeckerle mbeckerle closed this May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant