Verify that we get errors if the files contain a DOCTYPE
Added tests that show this error detection for each of:
* DFDL schemas
* included DFDL schemas
* imported DFDL schemas
* TDML files
* XML Infoset files used in TDML tests
* Config files used in TDML tests
* External variable binding files
* XML input to XML Text Infoset Inputters (using Woodstox library)
* XML input to JDOM infoset Inputters
* XML input to Scala XML Infoset Inputters

Eliminating use of DOCTYPE also eliminates any possibility of
XML General or Parameter Entities.
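
As a rough illustration of the idea (not the code in this commit), the
sketch below uses the JDK's SAXParserFactory with the Xerces
"disallow-doctype-decl" feature. The object and method names are
hypothetical. Once the parser refuses any DOCTYPE, no entity declaration
can ever be read, so entity expansion cannot occur.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, for illustration only.
object DisallowDoctypeSketch {
  // Xerces feature name; the JDK's built-in parser also recognizes it.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  /** Returns true if the parser rejected the document with a parse error. */
  def rejects(xml: String): Boolean = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature(DisallowDoctype, true)
    val parser = factory.newSAXParser()
    try {
      // DefaultHandler ignores content events and rethrows fatal errors.
      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      false
    } catch {
      case _: SAXParseException => true
    }
  }
}
```

Under this assumption, a document that declares an entity in an internal
DTD subset is rejected before the entity is ever defined, while the same
document without a DOCTYPE parses normally.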

Consolidated loader/validators of XML and XSD.

Everything that loads XML, XSD, TDML, config, external vars,
or ".dfdl.xsd", now uses DaffodilXMLLoader to do it.

Specific validators that were for Config files or
External Variable Binding files are now gone.

DaffodilXMLLoader was simplified and made more uniform.
Single purpose mixin traits and adapters were eliminated.

The validation in DaffodilXMLLoader uses two methods.

First it uses the XercesValidator (used by the validation feature).

Second, it does a validating load with Xerces. This seems to
catch/report different validation problems. (Instrumentation to
confirm this may be worthwhile, so that we can eliminate redundant work
if possible.)

DFDL schemas are validated by constructing an XML Schema
object from them, as well as by loading them and validating
against the XML Schema for DFDL schemas.
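
A minimal sketch of the first of these checks, assuming only the standard
javax.xml.validation API (the object and method names are hypothetical and
this is not the DaffodilXMLLoader code): constructing a Schema object from
XSD text makes the schema processor report errors in the schema itself.
The example uses an unresolvable type reference as the error.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

// Hypothetical helper, for illustration only.
object SchemaObjectCheckSketch {
  /** Returns None if the XSD compiles into a Schema object, else the error message. */
  def schemaError(xsd: String): Option[String] = {
    val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    try {
      // With no ErrorHandler set, schema errors are thrown as exceptions.
      factory.newSchema(new StreamSource(new StringReader(xsd)))
      None
    } catch {
      case e: SAXException => Some(e.getMessage)
    }
  }
}
```

This check only looks at the schema as an XSD. Validating the same file as
an XML instance against the XML Schema for DFDL schemas is a separate step.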

The DaffodilXMLLoader validates (if requested) using a supplied XML
Schema, but it then always loads using the DaffodilConstructingParser,
which uses the underlying ConstructingParser. This is needed because
in many cases we are dependent on properly handling CDATA regions
(e.g., TDML files, test infoset XML files, etc.), which Xerces doesn't
do properly.

I also attempted to implement DAFFODIL-288, validating the infoset XML
before unparsing, but was unsuccessful. A TODO for DAFFODIL-288 marks
the place where that fix goes.

New validation is more uniform and thorough. This caught a number of
small issues like missing "tdml:" prefixes on numerous files'
testSuite elements. There were various adjustments to accommodate the
stricter validation.

Changes to SAX due to simple types now being a series of Text, Atom, and
EntityRef.

Various other small fixes to TDML runner to ensure no diagnostic
errors are being hidden.

Fix MS-Windows failure due to CRLF issues. There should be less CRLF
sensitivity now. The new unified DaffodilXMLLoader, which we use
everywhere, always normalizes CRLF to LF and isolated CR to LF.
This is done in Text, CDATA, and COMMENT objects.
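
The normalization rule itself can be sketched as a small string function.
This is an illustration with hypothetical names, not the loader's actual
code: CRLF pairs are replaced first, so that only isolated CR characters
remain for the second replacement.

```scala
// Hypothetical helper, for illustration only.
object LineEndingSketch {
  /** Replaces CRLF with LF first, then any remaining isolated CR with LF. */
  def normalize(s: String): String =
    s.replace("\r\n", "\n").replace('\r', '\n')
}
```

The order of the two replacements matters: replacing CR first would turn
each CRLF into two LF characters.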

The only reason we use the constructing parser now is the behavior
around CDATA/PCDATA nodes, which is broken in Xerces. There are tests
to characterize this behavior in Xerces so if it does get fixed we can
adapt. However, if it did get fixed it would require a mode switch to
turn this different behavior on, so we probably just won't notice.

Upgraded scala-xml library to version 2.0.0

TDMLRunner no longer gets NPE in one situation.

Also documented why we need the second Xerces load beyond just the
regular XercesValidator call: it is needed for xsi:schemaLocation.

Note added about xsi:noNamespaceSchemaLocation

Added comments about useDefaultNamespace in tdml.xsd. Also
added a TODO in the code. We really want this to be false by
default, but 81 tests in daffodil-test fail if you change that, so
that is not done in this change set.

DAFFODIL-1422, DAFFODIL-1659, DAFFODIL-1816
mbeckerle committed May 26, 2021
1 parent 41cf56f commit 62ac1047a18979bea1f5d6c668eefeaaae39d64b
Showing 74 changed files with 2,024 additions and 772 deletions.
@@ -1265,20 +1265,22 @@ class TestCLIdebugger {
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.expect(contains("bitPosition: 0 -> 8"))
shell.expect(contains("foundDelimiter: (no value) -> ,"))
shell.expect(contains("foundField: (no value) -> 0"))
shell.sendLine("step")
shell.sendLine("step")
shell.expect(contains("bitPosition: 8 -> 16"))
shell.expect(contains("childIndex: 1 -> 2"))
shell.expect(contains("foundDelimiter: , -> (no value)"))
shell.expect(contains("foundField: 0 -> (no value)"))
shell.expect(contains("groupIndex: 1 -> 2"))
shell.expect(contains("occursIndex: 1 -> 2"))
shell.sendLine("step")
shell.expect(contains("bitPosition: 8 -> 16"))
shell.expect(contains("foundDelimiter: , -> (no value)"))
shell.expect(contains("foundField: 0 -> (no value)"))
shell.sendLine("step")
shell.expect(contains("bitPosition: 16 -> 24"))
shell.expect(contains("foundDelimiter: (no value) -> ,"))
shell.expect(contains("foundField: (no value) -> 1"))
shell.sendLine("quit")
} finally {
shell.close()
@@ -1331,26 +1333,29 @@ class TestCLIdebugger {
shell.expect(contains("(debug)"))
shell.sendLine("display info diff")
shell.expect(contains("(debug)"))
shell.sendLine("set diffExcludes childIndex")
shell.expect(contains("(debug)"))
shell.sendLine("step")
shell.expect(contains("bitPosition: 0 -> 8"))
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.expect(contains("bitPosition: 0 -> 8"))
shell.sendLine("set diffExcludes childIndex")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.expect(regexp("\\+ Suppressable.* for cell"))
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
shell.expect(regexp("\\+ Suppressable.* for cell"))
shell.sendLine("step")
shell.expect(regexp("\\+ Alignment.* for cell"))
shell.expect(regexp("RegionSplit.* for cell"))
shell.sendLine("info suspensions")
shell.expect(regexp("Suppressable.* for cell"))
shell.expect(regexp("Alignment.* for cell"))
shell.expect(regexp("RegionSplit.* for cell"))
shell.sendLine("quit")
} finally {
shell.close()
@@ -28,7 +28,6 @@ import java.nio.channels.Channels
import java.nio.file.Paths
import java.util.Scanner
import java.util.concurrent.Executors

import com.typesafe.config.ConfigFactory

import scala.concurrent.Await
@@ -37,9 +36,7 @@ import scala.concurrent.Future
import scala.concurrent.duration.Duration
import scala.xml.Node
import scala.xml.SAXParseException

import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
@@ -55,7 +52,6 @@ import org.apache.daffodil.api.ValidationMode
import org.apache.daffodil.api.WithDiagnostics
import org.apache.daffodil.compiler.Compiler
import org.apache.daffodil.compiler.InvalidParserException
import org.apache.daffodil.configuration.ConfigurationLoader
import org.apache.daffodil.debugger.CLIDebuggerRunner
import org.apache.daffodil.debugger.InteractiveDebugger
import org.apache.daffodil.debugger.TraceDebuggerRunner
@@ -98,6 +94,7 @@ import org.apache.daffodil.util.LoggingDefaults
import org.apache.daffodil.util.Misc
import org.apache.daffodil.util.Timer
import org.apache.daffodil.validation.Validators
import org.apache.daffodil.xml.DaffodilSAXParserFactory
import org.apache.daffodil.xml.QName
import org.apache.daffodil.xml.RefQName
import org.apache.daffodil.xml.DaffodilXMLLoader
@@ -106,8 +103,10 @@ import org.rogach.scallop
import org.rogach.scallop.ArgType
import org.rogach.scallop.ScallopOption
import org.rogach.scallop.ValueConverter
import org.xml.sax.XMLReader

import scala.util.matching.Regex
import scala.xml.SAXParser

class CommandLineSAXErrorHandler() extends org.xml.sax.ErrorHandler with Logging {

@@ -576,7 +575,7 @@ object Main extends Logging {
*/
def loadConfigurationFile(file: File) = {
val loader = new DaffodilXMLLoader()
val node = ConfigurationLoader.getConfiguration(loader, file.toURI)
val node = loader.load(URISchemaSource(file.toURI), Some(XMLUtils.dafextURI))
node
}

@@ -803,14 +802,27 @@ object Main extends Logging {
case Left(bytes) => new ByteArrayInputStream(bytes)
case Right(is) => is
}
scala.xml.XML.load(is)
val parser: SAXParser = {
val f = DaffodilSAXParserFactory()
f.setNamespaceAware(false)
val p = f.newSAXParser()
p
}
scala.xml.XML.withSAXParser(parser).load(is)
}
case InfosetType.JDOM => {
val is = data match {
case Left(bytes) => new ByteArrayInputStream(bytes)
case Right(is) => is
}
new org.jdom2.input.SAXBuilder().build(is)
val builder = new org.jdom2.input.SAXBuilder() {
override protected def createParser(): XMLReader = {
val rdr = super.createParser()
XMLUtils.setSecureDefaults(rdr)
rdr
}
}
builder.build(is)
}
case InfosetType.W3CDOM => {
val byteArr = data match {
@@ -821,6 +833,7 @@ object Main extends Logging {
override def initialValue = {
val dbf = DocumentBuilderFactory.newInstance()
dbf.setNamespaceAware(true)
dbf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
val db = dbf.newDocumentBuilder()
db.parse(new ByteArrayInputStream(byteArr))
}
@@ -1484,7 +1497,7 @@ object Main extends Logging {
private def unparseWithSAX(
is: InputStream,
contentHandler: DFDL.DaffodilUnparseContentHandler): UnparseResult = {
val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
val xmlReader = DaffodilSAXParserFactory().newSAXParser.getXMLReader
xmlReader.setContentHandler(contentHandler)
xmlReader.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
xmlReader.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
@@ -20,13 +20,13 @@ package org.apache.daffodil.dsom
import org.xml.sax.SAXParseException
import org.apache.daffodil.xml.DaffodilXMLLoader
import org.apache.daffodil.xml.NS
import org.apache.daffodil.xml.XMLUtils
import org.apache.daffodil.api._
import org.apache.daffodil.dsom.IIUtils._
import org.apache.daffodil.api.Diagnostic
import org.apache.daffodil.oolag.OOLAG
import org.apache.daffodil.util.LogLevel
import org.apache.daffodil.util.Misc
import org.apache.daffodil.xml.XMLUtils

/**
* represents one schema document file
@@ -114,14 +114,14 @@ final class DFDLSchemaFile(
}
val node = try {
log(LogLevel.Resolver, "Loading %s.", diagnosticDebugName)
val ldr = new DaffodilXMLLoader(this)
//
// We do not want to validate here ever, because we have to examine the
// root xs:schema eleemnt of a schema to decide if it is a DFDL schema
// root xs:schema element of a schema to decide if it is a DFDL schema
// at all that we're even supposed to compile.
//
ldr.setValidation(false)
val node = ldr.load(schemaSource)
val loader = new DaffodilXMLLoader(this)
// need line numbers for diagnostics
val node = loader.load(schemaSource, None, addPositionAttributes = true)
schemaDefinitionUnless(node != null, "Unable to load XML from %s.", diagnosticDebugName)
node
} catch {
@@ -134,20 +134,17 @@ final class DFDLSchemaFile(

lazy val isDFDLSchemaFile = iiXMLSchemaDocument.isDFDLSchema

private lazy val loader = new DaffodilXMLLoader(this)

lazy val iiXMLSchemaDocument = LV('iiXMLSchemaDocument) {
val res = loadXMLSchemaDocument(seenBefore, Some(this))
if (res.isDFDLSchema && sset.validateDFDLSchemas) {
//
// We validate DFDL schemas, only if validation is requested.
// Some things, tests generally, want to turn this validation off.
//

val ldr = new DaffodilXMLLoader(this)
ldr.setValidation(true)
try {
ldr.load(schemaSource) // validate as XML file with XML Schema for DFDL Schemas
ldr.validateSchema(schemaSource) // validate as XSD (catches UPA errors for example)
} catch {
try loader.validateAsDFDLSchema(schemaSource) // validate as XSD (catches UPA errors for example)
catch {
// ok to absorb SAX Parse Exception as we've captured those errors in error handling
// elsewhere.
case _: org.xml.sax.SAXParseException => // ok
@@ -31,6 +31,8 @@ import org.apache.daffodil.processors.parsers.NotParsableParser
import org.apache.daffodil.processors.unparsers.NotUnparsableUnparser
import org.apache.daffodil.util.LogLevel

import java.io.ObjectOutputStream

trait SchemaSetRuntime1Mixin { self : SchemaSet =>

requiredEvaluationsAlways(parser)
@@ -72,37 +74,61 @@ trait SchemaSetRuntime1Mixin { self : SchemaSet =>
}.value

def onPath(xpath: String): DFDL.DataProcessor = {
Assert.usage(!isError)
if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path support is not yet implemented.""")
val rootERD = root.elementRuntimeData
root.schemaDefinitionUnless(
rootERD.outputValueCalcExpr.isEmpty,
"The root element cannot have the dfdl:outputValueCalc property.")
val validationMode = ValidationMode.Off
val p = if (!root.isError) parser else null
val u = if (!root.isError) unparser else null
val ssrd = new SchemaSetRuntimeData(
p,
u,
this.diagnostics,
rootERD,
variableMap,
typeCalcMap)
if (root.numComponents > root.numUniqueComponents)
log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
root.numUniqueComponents, root.numComponents)
val dataProc = new DataProcessor(ssrd, tunable, self.compilerExternalVarSettings)
if (dataProc.isError) {
// NO longer printing anything here. Callers must do this.
// val diags = dataProc.getDiagnostics
// log(LogLevel.Error,"Compilation (DataProcessor) reports %s compile errors/warnings.", diags.length)
// diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
} else {
log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
log(LogLevel.Compile, "Compilation (DataProcesor) completed with no errors.")
}
dataProc
Assert.usage(!isError)
if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path support is not yet implemented.""")
val rootERD = root.elementRuntimeData
root.schemaDefinitionUnless(
rootERD.outputValueCalcExpr.isEmpty,
"The root element cannot have the dfdl:outputValueCalc property.")
val validationMode = ValidationMode.Off
val p = if (!root.isError) parser else null
val u = if (!root.isError) unparser else null
val ssrd = new SchemaSetRuntimeData(
p,
u,
this.diagnostics,
rootERD,
variableMap,
typeCalcMap)
if (root.numComponents > root.numUniqueComponents)
log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
root.numUniqueComponents, root.numComponents)
val dataProc = new DataProcessor(ssrd, tunable, self.compilerExternalVarSettings)
//
// now we fake serialize to a dev/null-type output stream which forces
// any lazy evaluation that hasn't completed to complete.
// Those things could signal errors, so we do this before we check for errors.
//
// Note that calling preSerialization is not sufficient, since that's only mixed into
// objects with lazy evaluation. A SSRD is just a tuple-like object, does not mixin
// preSerialization, and shouldn't need to. We need to
// serialize all its substructure to insure all preSerializations, that force
// all lazy evaluations, are done.
//
// Overhead-wise, this is costly, if the caller is about to save the processor themselves
// But as there have been cases of Runtime1 processors which end up doing lazy evaluation
// that ends up happening late, this eliminates a source of bugs, albeit, by masking them
// so they are not detectable.
//
// Best to address this for real when we refactor Runtime1 to fully separate it from
// the schema compiler. At that point we can draw a firmer line about the compiler's output
// being fully realized before runtime objects are constructed.
//
// We don't call save() here, because that does a few other things than just serialize.
val oos = new ObjectOutputStream(org.apache.commons.io.output.NullOutputStream.NULL_OUTPUT_STREAM)
oos.writeObject(dataProc)

if (dataProc.isError) {
// NO longer printing anything here. Callers must do this.
// val diags = dataProc.getDiagnostics
// log(LogLevel.Error,"Compilation (DataProcessor) reports %s compile errors/warnings.", diags.length)
// diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
} else {
log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
log(LogLevel.Compile, "Compilation (DataProcesor) completed with no errors.")
}
dataProc
}

}
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--
This is a bad DTD, on purpose. The external DTD will not be found
and an error to that effect tells us if the XML processor was processing
the DTD, or ignoring it.
-->
<!DOCTYPE root SYSTEM "notFound.dtd">
<root xmlns="http://example.com">
<foo xmlns="">bar</foo>
</root>
@@ -19,7 +19,6 @@ package org.apache.daffodil.infoset

import org.apache.daffodil.xml.XMLUtils
import org.apache.daffodil.util._
import org.apache.daffodil.Implicits._
import org.apache.daffodil.compiler._
import org.junit.Assert._
import org.junit.Test
@@ -90,7 +89,6 @@ object TestInfoset {
val msgs = pf.getDiagnostics.map { _.getMessage() }.mkString("\n")
fail("pf compile errors: " + msgs)
}
pf.sset.root.erd.preSerialization // force evaluation of all compile-time constructs
val dp = pf.onPath("/").asInstanceOf[DataProcessor]
if (dp.isError) {
val msgs = dp.getDiagnostics.map { _.getMessage() }.mkString("\n")
