Add support for dfdl:checkRangeInclusive/checkRangeExclusive functions by olabusayoT · Pull Request #1199 · apache/daffodil

olabusayoT · 2024-03-28T20:09:51Z

add implementation for checkRangeInclusive/Exclusive in Expressions and DFDLFunctions
add tests for checkRangeInclusive/Exclusive for numeric types and non-numeric types

tuxji

Some questions first...

tuxji · 2024-03-29T00:47:31Z

+    checkArgCount(3)
+    val argDPath = args(0).compiledDPath
+    val rangeFrom = args(1).compiledDPath
+    val rangeTo = args(2).compiledDPath


You can use arg1, arg2, and arg3 which are evaluated only once (their evaluation calls checkArgCount(3) too) instead of repeating work. That is, let's remove the checkArgCount(3) call and change the next 3 lines to:

val argDPath = arg1.compiledDPath val rangeFrom = arg2.compiledDPath val rangeTo = arg3.compiledDPath

Also, if you reference arg1/arg2/arg3 it will force evaluation of line 3056 and call checkArgCount(3), so you can also remove the duplicate check on 3061.

tuxji · 2024-03-29T00:48:42Z

+    checkArgCount(3)
+    val argDPath = args(0).compiledDPath
+    val rangeFrom = args(1).compiledDPath
+    val rangeTo = args(2).compiledDPath


Likewise, can use arg1, arg2, arg3 here.

tuxji · 2024-03-29T00:53:05Z

+    val res = executeCheck(
+      dataVal: DataValue.DataValuePrimitiveNullable,
+      minVal: DataValue.DataValuePrimitiveNullable,
+      maxVal: DataValue.DataValuePrimitiveNullable,


I'm confused by the syntax here. If this is a function call, which it appears to be, why do the arguments look like parameter definitions? That is, why not just call executeCheck(dataVal, minVal, maxVal)?

tuxji · 2024-03-29T01:06:24Z

+    val saved = dstate.currentNode
+    dataRecipe.run(dstate)
+    val dataVal = dstate.currentValue
+    dstate.setCurrentNode(saved)
+    minRecipe.run(dstate)
+    val minVal = dstate.currentValue
+    dstate.setCurrentNode(saved)
+    maxRecipe.run(dstate)
+    val maxVal = dstate.currentValue


I'm not familiar with evalution of expressions. Why does this code call dstate.setCurrentNode(saved) after both dataRecipe.run(state) and minRecipe.run(dstate), but not after maxRecipe.run(dstate)? Why doesn't dstate need to have its saved state restored after the last maxRecipe,run(dstate)?

The currentNode DState can be used by a run implementation to know where to evaluate a relative path expression from (e.g. ../foo/bar/baz).

But calling run can also potentially mutate currentNode, changing it to something else, which can interfere with other calls to run, so it needs to be reset.

For example, say we had something like this:

dfdl:checkRangeExclusive(../foo/bar/baz, ../uno/dos/tres, ../one/two/three)`

Before we call run(), let's say currentNode points to an element called root.

After we call dataRecipe.run() which evaluates ../foo/bar/baz relative to root, the currentNode will point the baz element, because DState was mutated. If we do not reset currentNode back to root, then when we call minRecipe.run() to evaluate ../uno/dos/tres that relative path will be evaluated relative to the baz element instead of the root element. So that's why we need to save and restore curentNode after each call to run.

Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.

Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.

Actually, why do run implementations even mutate expression state (current node, current value) inside dstate at all? A more functional way would be to pass dstate or a smaller expression state contained by dstate to run(exprState) as a parameter and receive a new expression state (with the new current node and new current value) as a result value from the run(exprState) call. Then these dstate.setCurrentNode(saved) calls would go away because dstate's expression state (current node, current value) never gets mutated even after it's passed to the run implementation.

I don't remember, but maybe we do it to mimic PState, which is mutable? Back in the early days PState wasn't mutable and we just copied it each time something changed, but they overhead of copies started becoming very noticeable so we switched to a mutable PState.

DState might be small enough that the overhead of creating multiple DStates might not be noticeable? Hard to say with out making the change, and it would probably be a pretty significant change--all the run() functions would need to be changed to allocate a new DState. I would be much cleaner and less prone to error though.

tuxji · 2024-03-29T01:09:36Z

+    (dataVal.value, minVal.value, maxVal.value) match {
+      case (data: Integer, min: Integer, max: Integer) => data >= min && data < max
+      case (data: java.lang.Double, min: java.lang.Double, max: java.lang.Double) =>
+        data >= min && data < max


This range check looks like it's still doing an inclusive comparison on the min side and doing an exclusive comparison only on the max side. Is that how the DFDL function "checkRangeExclusive" is supposed to work?

There wasn't any information on the DFDL spec about how to treat exclusive, and the redmine link on the tickets leads to a 404 error and attempts to find the issue referenced were not successful so I used the mathematical definition of exclusive, which deals with just the upper value (as seen in this SO post: https://stackoverflow.com/questions/39010041/what-is-the-meaning-of-exclusive-and-inclusive-when-describing-number-ranges)

Here's the latest DFDL spec: https://github.com/OpenGridForum/DFDL/blob/master/docs/current/gwdrp-dfdl-v1.0.8-GFD-R-P.240.pdf

Page 184 talks about these functions, but doesn't clarify what it means by inclusive/exclusive.

An alternative interpretation is that the min/max values are treated similar to minInclusive, maxInclusive, minExclusive, and maxExclusive facet restrictions, which is more like what tuxji (and me) were expecting. But your reasoning makes sense too.

@mbeckerle, do you know the intention of these functions?

Inclusive means both ends are part of the range, Exclusive means both ends are outside the range.

This is only seems ambiguous in English as in "1 to 10 inclusive" where it becomes a question as to whether you mean inclusive to apply to "1 to 10", or just "10". But as "inclusive" is an adjective that describes a characteristic of an interval/range, not a single number. So I claim this is not ambiguous. It applies to both ends of the range, always. It requires more words to talk about the mixed combinations of including and excluding.

In the DFDL functions, both ends of the ranges are definitely what was intended.

tuxji · 2024-03-29T01:13:28Z

+<!--          <isInRangeExc>false</isInRangeExc>-->
+<!--        </expMinMax_mixedIntAndFloat>-->
+<!--      </tdml:dfdlInfoset>-->
+<!--    </tdml:infoset>-->


Commented out code shouldn't be left around for a long time.

stevedlawrence

Small suggestions to reduce duplication.

stevedlawrence · 2024-03-29T11:20:47Z

+    checkArgCount(3)
+    val argDPath = args(0).compiledDPath
+    val rangeFrom = args(1).compiledDPath
+    val rangeTo = args(2).compiledDPath


Also, if you reference arg1/arg2/arg3 it will force evaluation of line 3056 and call checkArgCount(3), so you can also remove the duplicate check on 3061.

stevedlawrence · 2024-03-29T11:33:49Z

+    val rangeFrom = args(1).compiledDPath
+    val rangeTo = args(2).compiledDPath
+    val c = conversions
+    val res = new CompiledDPath(DFDLCheckRangeExclusive(argDPath, rangeFrom, rangeTo) +: c)


DFDLCheckRangeExclusiveExpr is exactly the same as the Inclusive version except for this one line. What are your thoughts on only having a single DFDLCheckRangeExpr that takes a boolean parameter (e.g. isExcsluve) that signifies inclusive vs exclusive? Then, using a suggestion below you could have a single DFDLCheckRange class that accepts min/max comparision operations, something like:

val (_, _, lt, le, _) = ComparisonOpts.forType(targetType) val compare = if (isExclusive) lt else le val res = new CompiledDPath(DFDLCheckRange(argDPath, rangeFrom, rangeTo, compare) +: c)

Then we can combine DFDLCheckRangeInclusive/Exclusive to a single DFDLCheckRange class that does something like:

def run(...) { ... val res = compare.operate(minValue, dataValue) && compare.operate(dataValue, maxValue) res }

The benefit to this is approach is we avoid lots of duplication and we also figure out the right comparison logic/function at schema compile time instead of run-time.

Just a note that the above code is run treats the minVal as inclusive, or exclusive depending on the function called, answering the question within this comment: #1199 (comment)

stevedlawrence · 2024-03-29T11:58:28Z

+      case (data: java.math.BigDecimal, min: java.math.BigDecimal, max: java.math.BigDecimal) =>
+        data.compareTo(min) >= 0 && data.compareTo(max) <= 0
+      case (data: BigInteger, min: BigInteger, max: BigInteger) =>
+        data.compareTo(min) >= 0 && data.compareTo(max) <= 0


This match case isn't exhaustive. It's missing Short, Byte, and Long.

Though, instead of doing the type check at runtime and duplicating a bunch of checks, what if we figure out the right ComparisonOps to use at compile time?

I suggest we create a new static function, something like

object ComparisonOps { def forType(type: NodeInfo.Kind): (CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase) = { type match { case PrimType.Boolean => (EQ_Compare, NE_Compare, LT_Boolean, LE_Boolean, GT_Boolean, GE_Boolean) case PrimType.Integer => (EQ_Compare, NE_Compare, LT_Integer, LE_Integer, GT_Integer, GE_Integer) ... } } ... }

So it returns a 6-tuple of all the comparison functions that might be needed for a specific type. We can use this anywhere we need comparison functions, like in the DFDLCheckRangeExpr suggested above. We could also use it in ComparisonExpression. That probably wants to be something like this:

lazy val compareOp = { val (eq, ne, lt, le, gt, ge) = CopmarisonOps.forType(convertedArgType) op match { ... case "eq"=> eq case "ne" => ne case "lt" => lt ... } }

Note that all the comparison functions used in ComparisionExpression should be used in the new forType function.

We could also could use it in repTypeComparers, e.g.

lazy val ( repTypeCompareLT: NumberCompareOp, repTypeCompareLE: NumberCompareOp, ) = LV('repTypeComparers) { val (_, _, lt, le, _, _) = ComparisonOps.forType(repTypeElementDec.primType) (lt, le) }

There might be other places where we have a match/case to figure out comparison ops that could be simplified in this way an remove some duplication.

An improvement, maybe forType can be implemented as a Map for constant lookup and reduced allocations? e.g.

object ComparisonOps { val forType: Map[NodeInfo.Kind, (CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase)] = Map( PrimType.Boolean -> (EQ_Compare, NE_Compare, LT_Boolean, LE_Boolean, GT_Boolean, GE_Boolean), PrimType.Integer -> (EQ_Compare, NE_Compare, LT_Integer, LE_Integer, GT_Integer, GE_Integer), ... ) }

The use is still the same, e.g. ComparisonOps.forType(somePrimType)

stevedlawrence · 2024-03-29T12:18:38Z

+    val saved = dstate.currentNode
+    dataRecipe.run(dstate)
+    val dataVal = dstate.currentValue
+    dstate.setCurrentNode(saved)
+    minRecipe.run(dstate)
+    val minVal = dstate.currentValue
+    dstate.setCurrentNode(saved)
+    maxRecipe.run(dstate)
+    val maxVal = dstate.currentValue


The currentNode DState can be used by a run implementation to know where to evaluate a relative path expression from (e.g. ../foo/bar/baz).

But calling run can also potentially mutate currentNode, changing it to something else, which can interfere with other calls to run, so it needs to be reset.

For example, say we had something like this:

dfdl:checkRangeExclusive(../foo/bar/baz, ../uno/dos/tres, ../one/two/three)`

Before we call run(), let's say currentNode points to an element called root.

After we call dataRecipe.run() which evaluates ../foo/bar/baz relative to root, the currentNode will point the baz element, because DState was mutated. If we do not reset currentNode back to root, then when we call minRecipe.run() to evaluate ../uno/dos/tres that relative path will be evaluated relative to the baz element instead of the root element. So that's why we need to save and restore curentNode after each call to run.

Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.

stevedlawrence

+1 👍 just a couple small suggestion

stevedlawrence · 2024-03-30T10:20:14Z

+      case "gt" => gt
+      case "le" => le
+      case "ge" => ge
      case _ => subsetError("Unsupported operation '%s' on type %s.", op, convergedArgType)


I think we currently hit this subsetError at compile time for lt, le, gt, ge operations with xs:hexBinary, but I don't think we will anymore? Instead we will create the new LT_HexBinary, etc. classes at compile time and fail at runtime with an assertion. We probably want to keep the same subset behavior so it SDEs at compile time?

So, can you add a check somewhere in this function that creates this subset error for hex binary and those operations so we get the compile time subset error instead of a runtime assertion?

Also, I think this case probably wants to be an Assert.invariantFailed now, since this ComparisonExpression should only be created with one of these operators (see def Comp in DFDLExpressionParser.scala). If we ever hit this case, it's probably a bug.

Also, should we create a ticket to implement the rest of the operators for hexBinary and can remove this subset error and support those operators for hexBinary? Maybe we wait until we drop support for Java 8 and can use Arrays.compare() so it should be a trivial change.

Created https://issues.apache.org/jira/browse/DAFFODIL-2886

Though I think for ease of creating TDML tests, we should leave the error as a subsetError that results in a SDE rather than an Assert.invariantFailed. I can easily see someone creating a schema that tries to compare hexBinary elems and an SDE would be a better error for them to get until this is implemented.

we should leave the error as a subsetError that results in a SDE rather than an Assert.invariantFailed

Yeah, the way you made the change this should stay a subset error. In my head the change would be something like this:

if (convergedArgType == PrimType.HexBinary && (op == "lt || op == "gt" || op == "le" || op == ge")) { subsetError(...) } op match { ... }

So the hexBinary test would be outside the match case. If it were done that way the default case would want to be an Assert.invariant. But they way did in the match case you're right that it should be a subset error. And your way is probably more scala-y so is better than what I was thinking. And your new hexBinary tests confirm its doing the right thing 👍

stevedlawrence · 2024-03-30T10:36:55Z

 import org.apache.daffodil.runtime1.infoset.DataValue.DataValuePrimitive

+object ComparisonOps {
+  lazy val forType: Map[


Can you add scaladoc for this, especially making sure to mention the order of the returned 6-tuple so we don't have to read the code to figure out which operation is which? It's not too hard to figure it out, but it would make things just a little easier. Maybe also include an example of its use with the val (...) = ... to easily extract the comparison ops?

Another option might be to create a new case class and make that the value of the map, e.g.:

case class ComparisonOps(eq: ComparisonOp, ne: ComparisonOp, lt: ComparisonOp, ...) ... Map( PrimType.Boolean -> ComparisonOps(EQ_Compare, NE_Compare, LT_Boolean, ...), ... )

Then uses become something like:

val compOps = ComparisonOps.forType(targetTypeForSubExpr) val compare = if (isExclusive) compOps.lt else compOps.le

It's a little more verbose, but using it is a bit more clear and less prone to errors since it's impossible for callers of forType to get the order of the 6-tuple wrong.

stevedlawrence · 2024-03-30T10:54:54Z

+  // call to targetTypeForSubexpression since we don't care about the specific
+  // value of subexpr in targetTypeForSubexpression, and we need all 3 sub
+  // expressions to come to the right target target for the args
+  private lazy val targetTypeForSubExpr: NodeInfo.Kind = {


I think in places where we do something like this (i.e. figure out a common type for all arguments to convert to) we call this convergedArgType. We should do the same here for consistentcy.

stevedlawrence · 2024-03-30T11:07:53Z

+          <data>1</data>
+          <min>1.9</min>
+          <max>2.5</max>
+          <isInRangeInc>false</isInRangeInc>


Nice test 👍, if we converged the types wrong there's a good chance 1.9 would be cast to int, become 1, and isInRangeInc would be true and the test would fail.

tuxji

+1

Looks very nice, especially the descriptive names used in the tests!

tuxji · 2024-04-01T14:26:11Z

+ * @param eq represents the equality comparison
+ * @param ne represents the unequality comparison
+ * @param lt represents the less than comparison
+ * @param le represents the less than and equal to comparison


less than or equal to

tuxji · 2024-04-01T14:26:26Z

+ * @param lt represents the less than comparison
+ * @param le represents the less than and equal to comparison
+ * @param gt represents the greater than comparison
+ * @param ge represents the greater than and equal to comparison


greater than or equal to

- add implementation for checkRangeInclusive/Exclusive in Expressions and DFDLFunctions - create case class ComparisonOp instead of the 6-tuple for the forType Map value - add ComparisonOps.forType lazy val map and use in ComparisonExpression, RepTypeMixin and DFDLCheckRangeExpr - add tests for checkRangeInclusive/Exclusive for numeric types and non-numeric types - add unittest to test out usage exceptions for hexBinary comparisons DAFFODIL-1515

olabusayoT requested review from mbeckerle and stevedlawrence March 28, 2024 20:09

tuxji reviewed Mar 29, 2024

View reviewed changes

stevedlawrence requested changes Mar 29, 2024

View reviewed changes

stevedlawrence approved these changes Mar 30, 2024

View reviewed changes

tuxji approved these changes Apr 1, 2024

View reviewed changes

olabusayoT force-pushed the daf-1515-dfdlCheckRangeIncExc branch from 22ac0dd to 8bf9547 Compare April 1, 2024 22:29

olabusayoT merged commit fe6646f into apache:main Apr 2, 2024

olabusayoT deleted the daf-1515-dfdlCheckRangeIncExc branch April 2, 2024 15:28

Conversation

olabusayoT commented Mar 28, 2024

Uh oh!

tuxji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevedlawrence left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stevedlawrence left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tuxji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants