Skip to content
Permalink
Browse files
Evaluate suspensions earlier
- The goal here is to evaluate suspensions much earlier during an
  unparse rather than waiting to the very end. This should hopefully
  allow Daffodil to release buffered data and reduce the memory overhead
  required while parsing some files.
- The implementation is inspired by how Java handles garbage collection.
  A new SuspensionTracker is add to maintain two lists of suspensions, a
  young and an old. The young list is evaluated shortly after the
  suspension is created. If a young suspension fails to evaluate, we
  bump it to the old suspension list. We evaluate old suspensions much
  less frequently. This logic allows us to hopefully reduce buffering
  without adding much overhead for evaluating suspensions that are
  blocked for a while, all without making the code too complex.
- This SuspensionTracker outputs statistics in debug mode so we can
  easily determine how many suspensions were created and how many times
  suspensions were evaluated. Ideally the numbers would be the same.
- Add new Assert.invariant macro to provide a custom error message when an
  invariant fails

These changes also revealed a few bugs related to unparsing. The
following changes were to resolve those issues:

- Remove the ability to create nested suspensions. It is possible for
  the optional separator suspension to require a mandatory text
  alignment suspension, which means it's possible to have nested
  suspensions, which breaks assumptions of how our data output streams
  work. Rather than trying to support this, this uncouples the MTA and
  separator unparsers and adds special logic to handle optional
  alignment with optional separators. Additionally, add test showing
  that MTA is correctly skipped when optional separators are not
  unparsed due to a zero length field
- Fix bug where setting the absolute end position state modified the
  maybeStartDOS pointer rather than the maybeEndDOS pointer. Because the
  maybeDOS pointers are used to determine if the position value
  represents an absolute or relative position, this bug made is so
  setting the absolute end position changed a relative start position
  into absolute. This broke some length calculations since they then
  thought they knew were an element starts.
- Fix bug when delivering buffered content. We currently use the format
  that causes the direct DOS to be finished to deliver all following
  finished DOS's. But it's possible that the bitOrder's of DOS can
  change and thus this FormatInfo isn't correct for following DOSs.
  Instead, when a DOS is finished, we save the FormatInfo last used when
  it was finished. Then when delivering DOSs, we use this saved
  FormatInfo. This way, we always deliver buffered DOS using the
  FormatInfo that was last used to write.
- Fix issue in logging macro. If you did something like:

    log(LogLevel.Error, "%d %f", 1, 3.0)

  The log macro wraps the args in a Seq(). With no type on the Seq,
  Scala will cast the parameters to similar types, so Seq(1, 3.0)
  becomes Seq[Double](1.0, 3.0). These types no longer match the types
  in the printf style message, and so you get a TypeCast failure. By
  specifying Seq[Any] in the logger macro, it disables this casting and
  parameters keep their expected types. Add a test for this issue and
  refactor logger testing a bit to be more thread safe and clear.

DAFFODIL-1272
  • Loading branch information
stevedlawrence committed Oct 14, 2020
1 parent 565712f commit bfe0a9d03efb41224ed5b4a5324eb31a23fb3709
Show file tree
Hide file tree
Showing 24 changed files with 515 additions and 168 deletions.
@@ -183,7 +183,18 @@ trait SequenceGrammarMixin
*/
lazy val hasSeparator = !separatorParseEv.isConstantEmptyString

/**
* Note that the sequence separator does not include the delimMTA grammar
* like initiators/terminators. This is because unparsing needs to uncouple
* MTA and Separator unparsers to properly support optional separators with
* potential alignment. Grammars are expected to handle the delimMTA when
* necessary
*/
lazy val sequenceSeparatorMTA = prod("sequenceSeparatorMTA", hasSeparator) {
delimMTA
}
lazy val sequenceSeparator = prod("separator", hasSeparator) {
delimMTA ~ SequenceSeparator(this)
SequenceSeparator(this)
}

}
@@ -66,7 +66,7 @@ trait TermGrammarMixin
/**
* Mandatory text alignment for delimiters
*/
protected final lazy val delimMTA = prod(
final lazy val delimMTA = prod(
"delimMTA",
{
hasDelimiters
@@ -92,6 +92,8 @@ abstract class SequenceChild(protected val sq: SequenceTermBase, child: Term, gr
protected def separatedHelper: SeparatedHelper
protected def unseparatedHelper: UnseparatedHelper

protected lazy val sepMtaGram = sq.delimMTA

protected lazy val sepGram = {
sscb match {
case _: PositionalLike =>
@@ -101,8 +103,8 @@ abstract class SequenceChild(protected val sq: SequenceTermBase, child: Term, gr
sq.sequenceSeparator
}

protected lazy val sepParser = sepGram.parser
protected lazy val sepUnparser = sepGram.unparser
protected lazy val sepParser = (sepMtaGram ~ sepGram).parser
protected lazy val sepUnparser = (sepMtaGram ~ sepGram).unparser

final protected def srd = sq.sequenceRuntimeData
final protected def trd = child.termRuntimeData
@@ -24,6 +24,8 @@ import org.apache.daffodil.processors.parsers._
import org.apache.daffodil.schema.annotation.props.SeparatorSuppressionPolicy
import org.apache.daffodil.processors.unparsers._
import org.apache.daffodil.util.Misc
import org.apache.daffodil.util.Maybe
import org.apache.daffodil.util.MaybeInt

/**
* Base class for all kinds of sequences.
@@ -45,8 +47,27 @@ abstract class SequenceCombinator(sq: SequenceTermBase, sequenceChildren: Seq[Se
class OrderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceChild])
extends SequenceCombinator(sq, sequenceChildrenArg) {

private lazy val sepMtaGram = sq.sequenceSeparatorMTA
// Note that we actually only ever use one of these depending on
// various factors. If there is an optional separator and a suspension is
// used to unparse that separator, then we cannot use the sepMtaUnparser
// because it results in nested suspensions, which isn't allowed. In that
// case, the suspension ends up handling both the optional separator and
// alignment using sepMtaAlignmentMaybe.
private lazy val (sepMtaAlignmentMaybe, sepMtaUnparserMaybe) =
if (sepMtaGram.isEmpty) {
(MaybeInt.Nope, Maybe.Nope)
} else {
(MaybeInt(sq.knownEncodingAlignmentInBits), Maybe(sepMtaGram.unparser))
}

private lazy val sepGram = sq.sequenceSeparator
private lazy val sepParser = sepGram.parser

private lazy val sepParser = (sepMtaGram ~ sepGram).parser
// we cannot include the mtaGram in the sepUnparser. This is because the
// sepUnparser is run in a suspension, and the mtaGram can result in
// suspension, which means nested suspensions. Instead, the unparser handles
// the mta parser differently to avoid this
private lazy val sepUnparser = sepGram.unparser

private lazy val sequenceChildren = sequenceChildrenArg.toVector
@@ -82,7 +103,11 @@ class OrderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceChi
sq.hasSeparator match {
case true => new OrderedSeparatedSequenceUnparser(
srd,
sq.separatorSuppressionPolicy, sq.separatorPosition, sepUnparser,
sq.separatorSuppressionPolicy,
sq.separatorPosition,
sepMtaAlignmentMaybe,
sepMtaUnparserMaybe,
sepUnparser,
childUnparsers)
case false =>
new OrderedUnseparatedSequenceUnparser(
@@ -98,8 +123,27 @@ class UnorderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceC

import SeparatedSequenceChildBehavior._

private lazy val sepMtaGram = sq.delimMTA
// Note that we actually only ever use one of these depending on
// various factors. If there is an optional separator and a suspension is
// used to unparse that separator, then we cannot use the sepMtaUnparser
// because it results in nested suspensions, which isn't allowed. In that
// case, the suspension ends up handling both the optional separator and
// alignment using sepMtaAlignmentMaybe.
private lazy val (sepMtaAlignmentMaybe, sepMtaUnparserMaybe) =
if (sepMtaGram.isEmpty) {
(MaybeInt.Nope, Maybe.Nope)
} else {
(MaybeInt(sq.knownEncodingAlignmentInBits), Maybe(sepMtaGram.unparser))
}

private lazy val sepGram = sq.sequenceSeparator
private lazy val sepParser = sepGram.parser

private lazy val sepParser = (sepMtaGram ~ sepGram).parser
// we cannot include the mtaGram in the sepUnparser. This is because the
// sepUnparser is run in a suspension, and the mtaGram can result in
// suspension, which means nested suspensions. Instead, the unparser handles
// the mta parser differently to avoid this
private lazy val sepUnparser = sepGram.unparser

private lazy val sequenceChildren = sequenceChildrenArg.toVector
@@ -173,7 +217,11 @@ class UnorderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceC
sq.hasSeparator match {
case true => new OrderedSeparatedSequenceUnparser(
srd,
SeparatorSuppressionPolicy.AnyEmpty, sq.separatorPosition, sepUnparser,
SeparatorSuppressionPolicy.AnyEmpty,
sq.separatorPosition,
sepMtaAlignmentMaybe,
sepMtaUnparserMaybe,
sepUnparser,
childUnparsers)
case false =>
new OrderedUnseparatedSequenceUnparser(
@@ -99,6 +99,7 @@ trait SchemaSetRuntime1Mixin { self : SchemaSet =>
// diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
} else {
log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
log(LogLevel.Compile, "Compilation (DataProcesor) completed with no errors.")
}
dataProc
@@ -411,8 +411,24 @@ class DirectOrBufferedDataOutputStream private[io] (
Assert.invariant(isDirect)
}

/**
* We need to keep track of what FormatInfo was used as the last write to
* this DOS. This way, when this DOS becomes direct and we start
* delivering following buffered DOS's, we are using the correct FormatInfo
* to check for bitOrder changes and to write data correctly. We set this
* value when this DOS is marked as finished with setFinished() and then use
* it when delivering buffered content.
*/
private var finishedFormatInfo: Maybe[FormatInfo] = Nope

override def setFinished(finfo: FormatInfo): Unit = {
Assert.usage(!isFinished)
Assert.usage(finishedFormatInfo.isEmpty)

// this DOS is finished, save this format info so we use it when delivering
// buffered content
finishedFormatInfo = One(finfo)

// if we are direct, and there's a buffer following this one
//
// we know it isn't finished (because of flush() above)
@@ -433,9 +449,14 @@ class DirectOrBufferedDataOutputStream private[io] (
first.setAbsStartingBitPos0b(dabp)
}

DirectOrBufferedDataOutputStream.deliverBufferContent(directStream, first, finfo) // from first, into direct stream's buffers
// from first, into direct stream's buffers. Make sure we use the
// format info last used for this DOS.
DirectOrBufferedDataOutputStream.deliverBufferContent(
directStream,
first,
directStream.finishedFormatInfo.get)

// so now the first one is an EMPTY not necessarily a finished buffered DOS
//
first.convertToDirect(directStream) // first is now the direct stream
directStream.setDOSState(Uninitialized) // old direct stream is now dead
directStream = first // long live the new direct stream!
@@ -84,12 +84,21 @@ object Assert extends Assert {
def usage(testAbortsIfFalse: Boolean): Unit = macro AssertMacros.usageMacro1

/**
* test for something that the program is supposed to be insuring.
* test for something that the program is supposed to be ensuring.
*
* This is for more complex invariants than the simple 'impossible' case.
*/
def invariant(testAbortsIfFalse: Boolean): Unit = macro AssertMacros.invariantMacro1

/**
* test for something that the program is supposed to be ensuring, with a custom error message.
*
* This is for more complex invariants than the simple 'impossible' case.
*
* The msg parameter is only evaluated if the test fails
*/
def invariant(testAbortsIfFalse: Boolean, msg: String): Unit = macro AssertMacros.invariantMacro2

/**
* Conditional behavior for NYIs
*/
@@ -96,18 +96,6 @@ abstract class LogWriter {
}
}

object ForUnitTestLogWriter extends LogWriter {
var loggedMsg: String = null
// protected val writer = actor { loop { react { case msg : String =>
// loggedMsg = msg
// Console.out.println("Was Logged: " + loggedMsg)
// Console.out.flush()
// } } }
def write(msg: String): Unit = {
loggedMsg = msg
}
}

object NullLogWriter extends LogWriter {
//protected val writer = actor { loop { react { case msg : String => } } }
def write(msg: String): Unit = {
@@ -21,63 +21,72 @@ import org.junit.Assert._
import org.apache.daffodil.exceptions._
import org.junit.Test

class MyClass extends Logging {

lazy val msg = {
// System.err.println("computing the message string.")
"Message %s"
}

lazy val argString = {
// System.err.println("computing the argument string.")
"about nothing at all."
class ForUnitTestLogWriter extends LogWriter {
var loggedMsg: String = null
def write(msg: String): Unit = {
loggedMsg = msg
}
}

lazy val bombArg: String = Assert.abort("bombArg should not be evaluated")
lazy val bombMsg: String = Assert.abort("bombMsg should not be evaluated")
class TestLogger {
@Test def test1(): Unit = {

def logSomething(): Unit = {
class A extends Logging {

ForUnitTestLogWriter.loggedMsg = null
lazy val msg = "Message %s"
lazy val argString = "about nothing at all."

setLogWriter(ForUnitTestLogWriter)
lazy val bombArg: String = Assert.abort("bombArg should not be evaluated")
lazy val bombMsg: String = Assert.abort("bombMsg should not be evaluated")

// won't log because below threshhold. Won't even evaluate args.
log(LogLevel.Debug, msg, bombArg) // Won't show up in log. Won't bomb.
def logSomething(): Unit = {
// won't log because below threshhold. Won't even evaluate args.
log(LogLevel.Debug, msg, bombArg) // Won't show up in log. Won't bomb.

// alas, no by-name passing of var-args.
// so instead, we pass by name, a by-name/lazy constructed tuple
// instead.
// alas, no by-name passing of var-args.
// so instead, we pass by name, a by-name/lazy constructed tuple
// instead.

// Illustrates that our Glob object, because its parts are all passed by name,
// does NOT force evaluation of the pieces that go into it.
// So it really makes the whole system behave like it was entirely lazy.
// Illustrates that our Glob object, because its parts are all passed by name,
// does NOT force evaluation of the pieces that go into it.
// So it really makes the whole system behave like it was entirely lazy.

// If we're logging below the threshhold of Debug, then this log line
// doesn't evaluate bombMsg or bombArg. So it is ok if those are expensive
// to compute.
// If we're logging below the threshhold of Debug, then this log line
// doesn't evaluate bombMsg or bombArg. So it is ok if those are expensive
// to compute.

log(LogLevel.Debug, bombMsg, bombArg) // bomb is not evaluated at all.
log(LogLevel.Error, msg, argString) // Will show up in log.
log(LogLevel.Debug, bombMsg, bombArg) // bomb is not evaluated at all.
log(LogLevel.Error, msg, argString) // Will show up in log.
}
}

setLoggingLevel(LogLevel.Info)
setLogWriter(ConsoleWriter)
}
}

class TestLogger {

@Test def test1(): Unit = {
val c = new MyClass
val lw = new ForUnitTestLogWriter
val c = new A
c.setLoggingLevel(LogLevel.Error)
c.setLogWriter(lw)
c.logSomething()
Console.out.flush()
val fromLog = ForUnitTestLogWriter.loggedMsg
// println(fromLog)
val fromLog = lw.loggedMsg
val hasExpected = fromLog.contains("Message about nothing at all.")
val doesntHaveUnexpected = !fromLog.contains("number 1")
assertTrue(hasExpected)
assertTrue(doesntHaveUnexpected)
}

@Test def test_var_args_different_primitives(): Unit = {

class A extends Logging {
def logSomething(): Unit = {
log(LogLevel.Error, "Message: int=%d float=%f", 1, 3.0)
}
}

val lw = new ForUnitTestLogWriter
val a = new A
a.setLoggingLevel(LogLevel.Error)
a.setLogWriter(lw)
a.logSomething()
assertTrue(lw.loggedMsg.contains("Message: int=1 float=3.0"))
}

}
@@ -57,6 +57,16 @@ object AssertMacros {
"""
}

def invariantMacro2(c: Context)(testAbortsIfFalse: c.Tree, msg: c.Tree): c.Tree = {
import c.universe._

q"""
if (!($testAbortsIfFalse)) {
Assert.abort("Invariant broken. " + { $msg })
}
"""
}

def notYetImplementedMacro0(c: Context)(): c.Tree = {
import c.universe._

@@ -59,7 +59,7 @@ object LoggerMacros {
val $level = $lvl
val $l = $level.lvl
if ($ths.getLoggingLevel().lvl >= $l)
$ths.doLogging($level, $msg, Seq(..$args))
$ths.doLogging($level, $msg, Seq[Any](..$args))
}
"""
}

0 comments on commit bfe0a9d

Please sign in to comment.