Support XML strings in XMLTextInfosetInputter/Outputter
- When outputting string elements with the stringAsXml runtime property
  set to true, instead of escaping the simple element content, we
  output the content as raw XML, as if it were part of the infoset.
  When inputting, we read the XML and convert it back to a string that
  becomes the simple content. When the XML is added to the infoset, a
  wrapper element called "stringAsXml" is added to ensure the default
  namespace is reset. As an example:

  Infoset without the stringAsXml property:

    <foo>&lt;payload&gt;content&lt;/payload&gt;</foo>

  Infoset with the stringAsXml property:

    <foo><stringAsXml xmlns=""><payload>content</payload></stringAsXml></foo>

  Note that the same XML can be written in multiple ways that are
  syntactically different but semantically equivalent (for example,
  attribute order, quote style, or <a/> versus <a></a>), so unparsed
  XML is unlikely to be byte-for-byte identical to the original XML.

  Also note that the output of the XMLTextInfosetOutputter is used for
  "full" validation. Because stringAsXml changes that output, essentially
  converting a simple string type into a complex type, using stringAsXml
  breaks full validation. If full validation is needed, it must instead
  be done externally with a modified schema. An example of such a schema
  is included in the new tests.

- Previously, the return values of InfosetOutputter functions were
  ignored, and any exceptions thrown simply bubbled up to the top and
  appeared as unexpected exceptions. Now, if an InfosetOutputter throws
  an exception, we create an SDE (Schema Definition Error). The logic
  for finalizing the walker is moved into the doParse function so that
  the SDE is caught and correctly added as a diagnostic. This is needed
  to handle string content that is not well-formed XML. Additionally, an
  InfosetOutputter returning false is now deprecated and results in a
  usage error.

- Normal TDML tests cannot be used to test this behavior because the
  XMLTextInfosetOutputter outputs a different infoset than the other
  infoset outputters, so hand-written tests were added to verify the
  correct behavior.

DAFFODIL-2708
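The stringAsXml round trip described in the first bullet can be sketched with the JDK's StAX API (javax.xml.stream). This is a minimal illustration under stated assumptions, not Daffodil's XMLTextInfosetOutputter or XMLTextInfosetInputter code: StringAsXmlSketch, writeEscaped, writeAsXml, and readAsXml are hypothetical names, processing instructions and DTDs in the payload are simply dropped, and DTD processing is disabled as a precaution. A payload that is not well-formed makes the StAX reader throw XMLStreamException, which is the kind of outputter failure the second bullet turns into an SDE.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.stream.{XMLInputFactory, XMLOutputFactory, XMLStreamConstants, XMLStreamReader, XMLStreamWriter}

// Hypothetical helper illustrating stringAsXml; not Daffodil's implementation.
object StringAsXmlSketch {
  private val inFactory: XMLInputFactory = {
    val f = XMLInputFactory.newInstance()
    // Assumption: payloads are untrusted data, so DTD processing is disabled.
    f.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    f
  }
  private val outFactory: XMLOutputFactory = XMLOutputFactory.newInstance()

  // Copy the reader's current start tag (name, namespace declarations, attributes).
  private def copyStartElement(r: XMLStreamReader, w: XMLStreamWriter): Unit = {
    val prefix = r.getPrefix
    if (prefix == null || prefix.isEmpty) w.writeStartElement(r.getLocalName)
    else w.writeStartElement(prefix, r.getLocalName, r.getNamespaceURI)
    for (i <- 0 until r.getNamespaceCount) {
      val p = r.getNamespacePrefix(i)
      if (p == null || p.isEmpty) w.writeDefaultNamespace(r.getNamespaceURI(i))
      else w.writeNamespace(p, r.getNamespaceURI(i))
    }
    for (i <- 0 until r.getAttributeCount) {
      val p = r.getAttributePrefix(i)
      if (p == null || p.isEmpty) w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i))
      else w.writeAttribute(p, r.getAttributeNamespace(i), r.getAttributeLocalName(i), r.getAttributeValue(i))
    }
  }

  // Copy text-like events; everything else (document start/end, PIs, DTDs) is dropped.
  private def copyOther(r: XMLStreamReader, w: XMLStreamWriter, event: Int): Unit = event match {
    case XMLStreamConstants.CHARACTERS | XMLStreamConstants.SPACE => w.writeCharacters(r.getText)
    case XMLStreamConstants.CDATA => w.writeCData(r.getText)
    case XMLStreamConstants.COMMENT => w.writeComment(r.getText)
    case _ =>
  }

  private def finish(w: XMLStreamWriter, sw: StringWriter): String = {
    w.flush(); w.close(); sw.toString
  }

  // Default behavior: the string is simple content, so markup characters are escaped.
  def writeEscaped(elemName: String, value: String): String = {
    val sw = new StringWriter()
    val w = outFactory.createXMLStreamWriter(sw)
    w.writeStartElement(elemName)
    w.writeCharacters(value)
    w.writeEndElement()
    finish(w, sw)
  }

  // stringAsXml=true: parse the string and emit it as raw XML inside a
  // <stringAsXml xmlns=""> wrapper that resets the default namespace.
  def writeAsXml(elemName: String, value: String): String = {
    val sw = new StringWriter()
    val w = outFactory.createXMLStreamWriter(sw)
    val r = inFactory.createXMLStreamReader(new StringReader(value))
    w.writeStartElement(elemName)
    w.writeStartElement("stringAsXml")
    w.writeDefaultNamespace("")
    while (r.hasNext) {
      r.next() match {
        case XMLStreamConstants.START_ELEMENT => copyStartElement(r, w)
        case XMLStreamConstants.END_ELEMENT => w.writeEndElement()
        case event => copyOther(r, w, event)
      }
    }
    w.writeEndElement() // </stringAsXml>
    w.writeEndElement() // </elemName>
    finish(w, sw)
  }

  // Input direction: re-serialize everything inside <stringAsXml> back to a string.
  def readAsXml(infosetXml: String): String = {
    val r = inFactory.createXMLStreamReader(new StringReader(infosetXml))
    while (!(r.isStartElement && r.getLocalName == "stringAsXml")) r.next()
    val sw = new StringWriter()
    val w = outFactory.createXMLStreamWriter(sw)
    var depth = 0
    var done = false
    while (!done) {
      r.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          depth += 1
          copyStartElement(r, w)
        case XMLStreamConstants.END_ELEMENT =>
          if (depth == 0) done = true // reached </stringAsXml>
          else { depth -= 1; w.writeEndElement() }
        case event => copyOther(r, w, event)
      }
    }
    finish(w, sw)
  }
}
```

With the JDK's built-in StAX implementation, writeAsXml("foo", "<payload>content</payload>") should produce the stringAsXml form shown in the commit message, and readAsXml recovers the original string from it. Because the payload is re-serialized event by event, other inputs can come back syntactically different (for example, an empty <a/> may be written back as <a></a>), which is the byte-for-byte caveat noted above.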
stevedlawrence committed Aug 12, 2022
1 parent 87a0d8b commit 3b213ce
Showing 26 changed files with 1,135 additions and 59 deletions.
@@ -17,7 +17,12 @@
 
 package org.apache.daffodil.infoset
 
+import scala.util.Failure
+import scala.util.Success
+import scala.util.Try
+
 import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.exceptions.ThrowsSDE
 import org.apache.daffodil.util.MStackOfInt
 import org.apache.daffodil.util.MStackOf
 
@@ -408,14 +413,28 @@ class InfosetWalker private (
     containerIndexStack.setTop(top + 1)
   }
 
+  private def doOutputter(outputterFunc: => Boolean, desc: String, context: ThrowsSDE): Unit = {
+    Try(outputterFunc) match {
+      case Success(true) => // success
+      // $COVERAGE-OFF$
+      case Success(false) => Assert.usageError("InfosetOutputter false return value is deprecated. Throw an Exception instead.")
+      // $COVERAGE-ON$
+      case Failure(e) => {
+        val cause = e.getCause
+        val msg = if (cause == null) e.toString else cause.toString
+        context.SDE("Failed to %s: %s", desc, msg)
+      }
+    }
+  }
+
   /**
    * Start the document. Note that because the top of container index is
    * initialized to one less that the starting index, we also call
    * moveToNextSibling to increment the starting index to the correct
    * position.
    */
   private def infosetWalkerStepStart(): Unit = {
-    outputter.startDocument()
+    doOutputter(outputter.startDocument(), "start infoset document", startingContainerNode.erd)
     moveToNextSibling()
   }
 
@@ -425,7 +444,7 @@ class InfosetWalker private (
    * should not call walk() again because it is finished.
    */
   private def infosetWalkerStepEnd(): Unit = {
-    outputter.endDocument()
+    doOutputter(outputter.endDocument(), "end infoset document", startingContainerNode.erd)
     containerNodeStack = null
     containerIndexStack = null
     finished = true
@@ -452,8 +471,8 @@ class InfosetWalker private (
       if (child.isSimple) {
         if (!child.isHidden || walkHidden) {
           val simple = child.asInstanceOf[DISimple]
-          outputter.startSimple(simple)
-          outputter.endSimple(simple)
+          doOutputter(outputter.startSimple(simple), "start infoset simple element", simple.erd)
+          doOutputter(outputter.endSimple(simple), "end infoset simple element", simple.erd)
         }
         // now we can remove this simple element to free up memory
         containerNode.freeChildIfNoLongerNeeded(containerIndex, releaseUnneededInfoset)
@@ -462,9 +481,11 @@ class InfosetWalker private (
         // must be complex or array, exact same logic for both
         if (!child.isHidden || walkHidden) {
           if (child.isComplex) {
-            outputter.startComplex(child.asInstanceOf[DIComplex])
+            val complex = child.asInstanceOf[DIComplex]
+            doOutputter(outputter.startComplex(complex), "start infoset complex element", complex.erd)
           } else {
-            outputter.startArray(child.asInstanceOf[DIArray])
+            val array = child.asInstanceOf[DIArray]
+            doOutputter(outputter.startArray(array), "start infoset array", array.erd)
           }
           moveToFirstChild(child)
         } else {
@@ -485,9 +506,11 @@ class InfosetWalker private (
 
       // create appropriate end event
       if (containerNode.isComplex) {
-        outputter.endComplex(containerNode.asInstanceOf[DIComplex])
+        val complex = containerNode.asInstanceOf[DIComplex]
+        doOutputter(outputter.endComplex(complex), "end infoset complex element", complex.erd)
       } else {
-        outputter.endArray(containerNode.asInstanceOf[DIArray])
+        val array = containerNode.asInstanceOf[DIArray]
+        doOutputter(outputter.endArray(array), "end infoset array", array.erd)
       }
 
       // we've ended this array/complex associated with the container, so we
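To make the effect of the new doOutputter helper easier to see outside of Daffodil, here is a small self-contained sketch that mirrors its Try-based structure. It is an illustration under assumptions, not Daffodil's API: SchemaDefinitionError, OutputterErrorSketch, and doParse are hypothetical stand-ins for Daffodil's SDE machinery and its doParse function, and IllegalStateException stands in for Assert.usageError.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a Schema Definition Error (not Daffodil's class).
final class SchemaDefinitionError(msg: String) extends Exception(msg)

object OutputterErrorSketch {
  // Same shape as doOutputter in the diff: evaluate the by-name outputter
  // call exactly once and translate its outcome.
  def doOutputter(outputterFunc: => Boolean, desc: String): Unit = {
    Try(outputterFunc) match {
      case Success(true) => // success
      case Success(false) =>
        // Stand-in for Assert.usageError: returning false is now a caller bug.
        throw new IllegalStateException(
          "InfosetOutputter false return value is deprecated. Throw an Exception instead.")
      case Failure(e) =>
        // Prefer the underlying cause, if any, for a more useful message.
        val cause = e.getCause
        val msg = if (cause == null) e.toString else cause.toString
        throw new SchemaDefinitionError(s"Failed to $desc: $msg")
    }
  }

  // Hypothetical parse driver: each event pairs a description with an outputter
  // call. SDEs are caught here and recorded as diagnostics instead of escaping
  // as unexpected exceptions; any other exception still propagates.
  def doParse(events: Seq[(String, () => Boolean)]): List[String] = {
    val diagnostics = ListBuffer.empty[String]
    try {
      events.foreach { case (desc, call) => doOutputter(call(), desc) }
    } catch {
      case sde: SchemaDefinitionError => diagnostics += sde.getMessage
    }
    diagnostics.toList
  }
}
```

In this sketch, an outputter call that throws (for example, because a stringAsXml payload is not well-formed) becomes a single diagnostic such as "Failed to start infoset simple element: ...", while a call that returns false escapes as a usage error, matching the two behaviors the commit message describes.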
