We need tests that use DOCTYPE declarations and general entities and verify that we reject them.

Note: disabling DOCTYPEs also disables general entities, since those can only be declared in a DOCTYPE.
DAFFODIL-1422, DAFFODIL-1659
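Here is a minimal sketch of such a test. It uses the JDK's built-in SAX parser, not Daffodil's own loader. It assumes that XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE holds the Xerces feature URI `http://apache.org/xml/features/disallow-doctype-decl`. The object and method names are hypothetical. The same document parses, with its entity expanded, when the feature is off, and it is rejected when the feature is on:

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object DoctypeRejectionSketch {
  // Assumed value of XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  // The general entity &x; can only be declared inside the DOCTYPE.
  val withEntity = """<!DOCTYPE a [ <!ENTITY x "expanded"> ]><a>&x;</a>"""

  /** Parses `xml` and returns the concatenated character data of the document. */
  def parse(xml: String, disallowDoctype: Boolean): String = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    spf.setFeature(DisallowDoctype, disallowDoctype)
    val text = new java.lang.StringBuilder
    val handler = new DefaultHandler {
      override def characters(ch: Array[Char], start: Int, length: Int): Unit =
        text.append(ch, start, length)
    }
    spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
    text.toString
  }

  /** True if parsing with DOCTYPEs disallowed fails with a SAXParseException. */
  def isRejected(xml: String): Boolean =
    try { parse(xml, disallowDoctype = true); false }
    catch { case _: SAXParseException => true }
}
```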
daffodil-lib/src/main/scala/org/apache/daffodil/api/DaffodilSchemaSource.scala
    factory.setResourceResolver(DFDLCatalogResolver.get)
    val schema = factory.newSchema(new StreamSource(extVarXsd))
    val validator = schema.newValidator()
    validator.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
Move this into an XMLUtils.setSecureDefaults(v: Validator) overload?
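On the Validator question, there is an alternative to setting features on the Validator itself. You can hand it a SAXSource that wraps an already-hardened XMLReader, so the instance document is parsed by a reader whose settings we control. Below is a minimal sketch using the JDK's built-in JAXP implementation, not Daffodil's XMLUtils. The feature URI is assumed to be the value of XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, and SecureValidationSketch is a hypothetical name. The schema text itself is still parsed by SchemaFactory, which this approach does not harden:

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.sax.SAXSource
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.{InputSource, SAXException, XMLReader}

object SecureValidationSketch {
  // Assumed value of XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  /** A namespace-aware SAX reader that fails on any DOCTYPE declaration. */
  def secureReader(): XMLReader = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true) // schema validation needs namespace-aware parsing
    val reader = spf.newSAXParser().getXMLReader
    reader.setFeature(DisallowDoctype, true)
    reader
  }

  /**
   * Validates `xml` against the schema text `xsd` and returns true if it is accepted.
   * Because the Validator is given a SAXSource carrying our reader, it parses the
   * instance with that reader instead of creating its own.
   */
  def isValid(xsd: String, xml: String): Boolean = {
    val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    val schema = factory.newSchema(new StreamSource(new StringReader(xsd)))
    val validator = schema.newValidator() // no ErrorHandler set: errors are thrown
    try {
      validator.validate(new SAXSource(secureReader(), new InputSource(new StringReader(xml))))
      true
    } catch {
      case _: SAXException => false
    }
  }
}
```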
    def setSecureDefaults(xmlReader: XMLReader): Unit = {
      try {
        xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
        xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false)
Add a comment saying that the next two restrictions are not, strictly speaking, necessary, since they are implied by disallowing DOCTYPEs.
      }
    }
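Here is a sketch of how setSecureDefaults might read with the requested comment added. It makes several assumptions:

- XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE and XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE hold the standard Xerces and SAX feature URIs.
- The second, elided restriction is the SAX external-general-entities feature.
- The original's try/catch, whose handler is not shown, is left out.

SecureReaderSketch and its helpers are hypothetical:

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXException, XMLReader}

object SecureReaderSketch {
  // Assumed values of the XMLUtils feature constants.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"
  val ExternalGeneralEntities = "http://xml.org/sax/features/external-general-entities"
  val ExternalParameterEntities = "http://xml.org/sax/features/external-parameter-entities"

  def setSecureDefaults(xmlReader: XMLReader): Unit = {
    xmlReader.setFeature(DisallowDoctype, true)
    // The next two restrictions are not, strictly speaking, necessary: entities can
    // only be declared in a DOCTYPE, and DOCTYPEs are already disallowed above.
    // They are kept as defense in depth.
    xmlReader.setFeature(ExternalGeneralEntities, false)
    xmlReader.setFeature(ExternalParameterEntities, false)
  }

  def newSecureReader(): XMLReader = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    val reader = spf.newSAXParser().getXMLReader
    setSecureDefaults(reader)
    reader
  }

  /** True if the hardened reader parses `xml` without error. */
  def accepts(xml: String): Boolean =
    try { newSecureReader().parse(new InputSource(new StringReader(xml))); true }
    catch { case _: SAXException => false }
}
```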
    // def setSecureDefaults(schemaFactory: org.apache.xerces.jaxp.validation.XMLSchemaFactory) : Unit = {
Remove the commented-out code. Add a comment above it noting that disallowing DOCTYPEs works for XMLReader and Validator, but not for XMLSchemaFactory.
    }
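Feature support differs between XMLReader, Validator and XMLSchemaFactory. So rather than assuming a feature took effect, a call site could probe for it. XMLReader, Validator and SchemaFactory all declare `setFeature(String, boolean)`, throwing SAXNotRecognizedException or SAXNotSupportedException, so one helper can serve all three. The names below are hypothetical. A probe only shows that a component *recognizes* a feature, not that it *enforces* it, so rejection tests like those DAFFODIL-1422 calls for are still needed:

```scala
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{SAXNotRecognizedException, SAXNotSupportedException, XMLReader}

object FeatureProbeSketch {
  /**
   * Calls `setFeature(name, value)` and reports whether the component accepted it,
   * instead of letting the exception escape. Pass e.g. `(n, v) => reader.setFeature(n, v)`,
   * or the same lambda over a Validator or SchemaFactory.
   */
  def trySetFeature(setFeature: (String, Boolean) => Unit, name: String, value: Boolean): Boolean =
    try {
      setFeature(name, value)
      true
    } catch {
      case _: SAXNotRecognizedException | _: SAXNotSupportedException => false
    }

  def newReader(): XMLReader = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(true)
    spf.newSAXParser().getXMLReader
  }
}
```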
    val enc = determineEncoding(file) // The encoding is needed for ConstructingParser
    val input = scala.io.Source.fromURI(file.toURI)(enc)
    val node = ConstructingParser.fromSource(input, true).document.docElem
Add a comment explaining that this was changed from the ConstructingParser because of an unexpected error, and that reusing DaffodilXMLLoader works, so there is no need for a special-case loader for external variable bindings.
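The underlying idea is that external variable bindings need no special-case loader. Here is a sketch of it in plain JAXP DOM, which is not DaffodilXMLLoader's actual API. A single hardened loading function is used for a bindings file exactly as for any other input. SharedLoaderSketch and loadFile are hypothetical names, and so is the `<bindings>`/`<bind>` document used in the usage check:

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.{ErrorHandler, SAXParseException}

object SharedLoaderSketch {
  // Assumed value of XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  // Fail on errors rather than relying on the default error handler.
  private object Strict extends ErrorHandler {
    def warning(e: SAXParseException): Unit = ()
    def error(e: SAXParseException): Unit = throw e
    def fatalError(e: SAXParseException): Unit = throw e
  }

  /** The single entry point every call site would use, bindings files included. */
  def loadFile(file: File): Document = {
    val dbf = DocumentBuilderFactory.newInstance()
    dbf.setNamespaceAware(true)
    dbf.setFeature(DisallowDoctype, true)
    val builder = dbf.newDocumentBuilder()
    builder.setErrorHandler(Strict)
    builder.parse(file)
  }
}
```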
Closing. Will open a new PR for this.
DAFFODIL-1422 is a ticket about restricting the XML we accept so that we do not allow DOCTYPE declarations in it.
This is a security-related provision: accepting DOCTYPEs leaves XML loaders subject to a variety of problems, such as documents exploding in size as the entities declared in the DOCTYPE are expanded.
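The well-known "billion laughs" document makes the size explosion concrete. Its DOCTYPE declares entity lol1 as ten copies of the three-character string "lol", and each lol_k (for k = 2..9) as ten references to lol_{k-1}. A single reference to lol9 in a document of under 1 KB then expands to

$$
|\mathrm{lol}_1| = 10 \cdot 3 = 30, \qquad |\mathrm{lol}_k| = 10\,|\mathrm{lol}_{k-1}| \;\Longrightarrow\; |\mathrm{lol}_9| = 3 \cdot 10^{9}\ \text{characters}
$$

That is roughly 3 GB of text for single-byte characters.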
DOCTYPEs are an old, obsolete feature, and simply disallowing them entirely is the best option.
There are other JIRA tickets about preventing resolvers and loaders from dereferencing URIs as if they were Internet URLs.
The upshot of all this is that "loading" XML is tricky. It needs to be done carefully, through a centralized library that provides the options that different kinds of loading require, without exposing the security vulnerabilities.
This change set does not yet establish a single central library through which all XML loading goes.
It is now apparent that such a library is needed. Too many places invoke XML loaders for all of them to be "done the right way" independently, and under maintenance they are too likely to drift.
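Below is a minimal sketch of what such a central entry point could look like. It uses plain JAXP/SAX rather than Daffodil's actual XMLUtils or DaffodilXMLLoader API, and CentralLoaderSketch, LoadOptions and elementNames are hypothetical names. The design point is that callers choose only the options that genuinely vary between kinds of loading. The DOCTYPE restriction is applied unconditionally in one place, and no option turns it off:

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource, XMLReader}
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

/** Hypothetical loading options; deliberately no switch to re-enable DOCTYPEs. */
final case class LoadOptions(namespaceAware: Boolean = true)

object CentralLoaderSketch {
  // Assumed value of XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE.
  private val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  /** The only place readers are created, so security settings cannot drift per call site. */
  private def newReader(opts: LoadOptions): XMLReader = {
    val spf = SAXParserFactory.newInstance()
    spf.setNamespaceAware(opts.namespaceAware)
    val reader = spf.newSAXParser().getXMLReader
    reader.setFeature(DisallowDoctype, true) // applied regardless of opts
    reader
  }

  /** Loads `xml` and, for illustration, returns its element names in document order. */
  def elementNames(xml: String, opts: LoadOptions = LoadOptions()): List[String] = {
    val names = ListBuffer.empty[String]
    val reader = newReader(opts)
    reader.setContentHandler(new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit = {
        // localName is empty when namespace processing is off; fall back to qName.
        names += Option(localName).filter(_.nonEmpty).getOrElse(qName)
      }
    })
    reader.parse(new InputSource(new StringReader(xml)))
    names.toList
  }
}
```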
A key starting point is to survey every place Daffodil does XML loading. These include loading of:
** This can include DFDL schemas, but also XSDs for annotation languages (e.g., Schematron annotations).
** Note that this validating loader loads a schema not as a schema but as ordinary XML. It should be validated against the schema for DFDL schemas.
** (Currently not a validating loader)
** Test cases can load XML Infoset files.
** Validation here involves validating defineSchema elements, which contain DFDL schemas.
This may not be a comprehensive list.