Add support for dfdl:checkRangeInclusive/checkRangeExclusive functions#1199
Conversation
| checkArgCount(3) | ||
| val argDPath = args(0).compiledDPath | ||
| val rangeFrom = args(1).compiledDPath | ||
| val rangeTo = args(2).compiledDPath |
There was a problem hiding this comment.
You can use arg1, arg2, and arg3 which are evaluated only once (their evaluation calls checkArgCount(3) too) instead of repeating work. That is, let's remove the checkArgCount(3) call and change the next 3 lines to:
val argDPath = arg1.compiledDPath
val rangeFrom = arg2.compiledDPath
val rangeTo = arg3.compiledDPathThere was a problem hiding this comment.
Also, if you reference arg1/arg2/arg3 it will force evaluation of line 3056 and call checkArgCount(3), so you can also remove the duplicate check on 3061.
| checkArgCount(3) | ||
| val argDPath = args(0).compiledDPath | ||
| val rangeFrom = args(1).compiledDPath | ||
| val rangeTo = args(2).compiledDPath |
There was a problem hiding this comment.
Likewise, can use arg1, arg2, arg3 here.
| val res = executeCheck( | ||
| dataVal: DataValue.DataValuePrimitiveNullable, | ||
| minVal: DataValue.DataValuePrimitiveNullable, | ||
| maxVal: DataValue.DataValuePrimitiveNullable, |
There was a problem hiding this comment.
I'm confused by the syntax here. If this is a function call, which it appears to be, why do the arguments look like parameter definitions? That is, why not just call executeCheck(dataVal, minVal, maxVal)?
| val saved = dstate.currentNode | ||
| dataRecipe.run(dstate) | ||
| val dataVal = dstate.currentValue | ||
| dstate.setCurrentNode(saved) | ||
| minRecipe.run(dstate) | ||
| val minVal = dstate.currentValue | ||
| dstate.setCurrentNode(saved) | ||
| maxRecipe.run(dstate) | ||
| val maxVal = dstate.currentValue |
There was a problem hiding this comment.
I'm not familiar with evalution of expressions. Why does this code call dstate.setCurrentNode(saved) after both dataRecipe.run(state) and minRecipe.run(dstate), but not after maxRecipe.run(dstate)? Why doesn't dstate need to have its saved state restored after the last maxRecipe,run(dstate)?
There was a problem hiding this comment.
The currentNode DState can be used by a run implementation to know where to evaluate a relative path expression from (e.g. ../foo/bar/baz).
But calling run can also potentially mutate currentNode, changing it to something else, which can interfere with other calls to run, so it needs to be reset.
For example, say we had something like this:
dfdl:checkRangeExclusive(../foo/bar/baz, ../uno/dos/tres, ../one/two/three)`
Before we call run(), let's say currentNode points to an element called root.
After we call dataRecipe.run() which evaluates ../foo/bar/baz relative to root, the currentNode will point the baz element, because DState was mutated. If we do not reset currentNode back to root, then when we call minRecipe.run() to evaluate ../uno/dos/tres that relative path will be evaluated relative to the baz element instead of the root element. So that's why we need to save and restore curentNode after each call to run.
Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.
There was a problem hiding this comment.
Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.
Actually, why do run implementations even mutate expression state (current node, current value) inside dstate at all? A more functional way would be to pass dstate or a smaller expression state contained by dstate to run(exprState) as a parameter and receive a new expression state (with the new current node and new current value) as a result value from the run(exprState) call. Then these dstate.setCurrentNode(saved) calls would go away because dstate's expression state (current node, current value) never gets mutated even after it's passed to the run implementation.
There was a problem hiding this comment.
I don't remember, but maybe we do it to mimic PState, which is mutable? Back in the early days PState wasn't mutable and we just copied it each time something changed, but they overhead of copies started becoming very noticeable so we switched to a mutable PState.
DState might be small enough that the overhead of creating multiple DStates might not be noticeable? Hard to say with out making the change, and it would probably be a pretty significant change--all the run() functions would need to be changed to allocate a new DState. I would be much cleaner and less prone to error though.
| (dataVal.value, minVal.value, maxVal.value) match { | ||
| case (data: Integer, min: Integer, max: Integer) => data >= min && data < max | ||
| case (data: java.lang.Double, min: java.lang.Double, max: java.lang.Double) => | ||
| data >= min && data < max |
There was a problem hiding this comment.
This range check looks like it's still doing an inclusive comparison on the min side and doing an exclusive comparison only on the max side. Is that how the DFDL function "checkRangeExclusive" is supposed to work?
There was a problem hiding this comment.
There wasn't any information on the DFDL spec about how to treat exclusive, and the redmine link on the tickets leads to a 404 error and attempts to find the issue referenced were not successful so I used the mathematical definition of exclusive, which deals with just the upper value (as seen in this SO post: https://stackoverflow.com/questions/39010041/what-is-the-meaning-of-exclusive-and-inclusive-when-describing-number-ranges)
There was a problem hiding this comment.
Here's the latest DFDL spec: https://github.com/OpenGridForum/DFDL/blob/master/docs/current/gwdrp-dfdl-v1.0.8-GFD-R-P.240.pdf
Page 184 talks about these functions, but doesn't clarify what it means by inclusive/exclusive.
An alternative interpretation is that the min/max values are treated similar to minInclusive, maxInclusive, minExclusive, and maxExclusive facet restrictions, which is more like what tuxji (and me) were expecting. But your reasoning makes sense too.
@mbeckerle, do you know the intention of these functions?
There was a problem hiding this comment.
Inclusive means both ends are part of the range, Exclusive means both ends are outside the range.
This is only seems ambiguous in English as in "1 to 10 inclusive" where it becomes a question as to whether you mean inclusive to apply to "1 to 10", or just "10". But as "inclusive" is an adjective that describes a characteristic of an interval/range, not a single number. So I claim this is not ambiguous. It applies to both ends of the range, always. It requires more words to talk about the mixed combinations of including and excluding.
In the DFDL functions, both ends of the ranges are definitely what was intended.
| <!-- <isInRangeExc>false</isInRangeExc>--> | ||
| <!-- </expMinMax_mixedIntAndFloat>--> | ||
| <!-- </tdml:dfdlInfoset>--> | ||
| <!-- </tdml:infoset>--> |
There was a problem hiding this comment.
Commented out code shouldn't be left around for a long time.
stevedlawrence
left a comment
There was a problem hiding this comment.
Small suggestions to reduce duplication.
| checkArgCount(3) | ||
| val argDPath = args(0).compiledDPath | ||
| val rangeFrom = args(1).compiledDPath | ||
| val rangeTo = args(2).compiledDPath |
There was a problem hiding this comment.
Also, if you reference arg1/arg2/arg3 it will force evaluation of line 3056 and call checkArgCount(3), so you can also remove the duplicate check on 3061.
| val rangeFrom = args(1).compiledDPath | ||
| val rangeTo = args(2).compiledDPath | ||
| val c = conversions | ||
| val res = new CompiledDPath(DFDLCheckRangeExclusive(argDPath, rangeFrom, rangeTo) +: c) |
There was a problem hiding this comment.
DFDLCheckRangeExclusiveExpr is exactly the same as the Inclusive version except for this one line. What are your thoughts on only having a single DFDLCheckRangeExpr that takes a boolean parameter (e.g. isExcsluve) that signifies inclusive vs exclusive? Then, using a suggestion below you could have a single DFDLCheckRange class that accepts min/max comparision operations, something like:
val (_, _, lt, le, _) = ComparisonOpts.forType(targetType)
val compare = if (isExclusive) lt else le
val res = new CompiledDPath(DFDLCheckRange(argDPath, rangeFrom, rangeTo, compare) +: c)Then we can combine DFDLCheckRangeInclusive/Exclusive to a single DFDLCheckRange class that does something like:
def run(...) {
...
val res = compare.operate(minValue, dataValue) && compare.operate(dataValue, maxValue)
res
}The benefit to this is approach is we avoid lots of duplication and we also figure out the right comparison logic/function at schema compile time instead of run-time.
There was a problem hiding this comment.
Just a note that the above code is run treats the minVal as inclusive, or exclusive depending on the function called, answering the question within this comment: #1199 (comment)
| case (data: java.math.BigDecimal, min: java.math.BigDecimal, max: java.math.BigDecimal) => | ||
| data.compareTo(min) >= 0 && data.compareTo(max) <= 0 | ||
| case (data: BigInteger, min: BigInteger, max: BigInteger) => | ||
| data.compareTo(min) >= 0 && data.compareTo(max) <= 0 |
There was a problem hiding this comment.
This match case isn't exhaustive. It's missing Short, Byte, and Long.
Though, instead of doing the type check at runtime and duplicating a bunch of checks, what if we figure out the right ComparisonOps to use at compile time?
I suggest we create a new static function, something like
object ComparisonOps {
def forType(type: NodeInfo.Kind): (CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase) = {
type match {
case PrimType.Boolean => (EQ_Compare, NE_Compare, LT_Boolean, LE_Boolean, GT_Boolean, GE_Boolean)
case PrimType.Integer => (EQ_Compare, NE_Compare, LT_Integer, LE_Integer, GT_Integer, GE_Integer)
...
}
}
...
}So it returns a 6-tuple of all the comparison functions that might be needed for a specific type. We can use this anywhere we need comparison functions, like in the DFDLCheckRangeExpr suggested above. We could also use it in ComparisonExpression. That probably wants to be something like this:
lazy val compareOp = {
val (eq, ne, lt, le, gt, ge) = CopmarisonOps.forType(convertedArgType)
op match {
...
case "eq"=> eq
case "ne" => ne
case "lt" => lt
...
}
}
Note that all the comparison functions used in ComparisionExpression should be used in the new forType function.
We could also could use it in repTypeComparers, e.g.
lazy val (
repTypeCompareLT: NumberCompareOp,
repTypeCompareLE: NumberCompareOp,
) = LV('repTypeComparers) {
val (_, _, lt, le, _, _) = ComparisonOps.forType(repTypeElementDec.primType)
(lt, le)
}There might be other places where we have a match/case to figure out comparison ops that could be simplified in this way an remove some duplication.
There was a problem hiding this comment.
An improvement, maybe forType can be implemented as a Map for constant lookup and reduced allocations? e.g.
object ComparisonOps {
val forType: Map[NodeInfo.Kind, (CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase, CompareOpBase)] =
Map(
PrimType.Boolean -> (EQ_Compare, NE_Compare, LT_Boolean, LE_Boolean, GT_Boolean, GE_Boolean),
PrimType.Integer -> (EQ_Compare, NE_Compare, LT_Integer, LE_Integer, GT_Integer, GE_Integer),
...
)
}The use is still the same, e.g. ComparisonOps.forType(somePrimType)
| val saved = dstate.currentNode | ||
| dataRecipe.run(dstate) | ||
| val dataVal = dstate.currentValue | ||
| dstate.setCurrentNode(saved) | ||
| minRecipe.run(dstate) | ||
| val minVal = dstate.currentValue | ||
| dstate.setCurrentNode(saved) | ||
| maxRecipe.run(dstate) | ||
| val maxVal = dstate.currentValue |
There was a problem hiding this comment.
The currentNode DState can be used by a run implementation to know where to evaluate a relative path expression from (e.g. ../foo/bar/baz).
But calling run can also potentially mutate currentNode, changing it to something else, which can interfere with other calls to run, so it needs to be reset.
For example, say we had something like this:
dfdl:checkRangeExclusive(../foo/bar/baz, ../uno/dos/tres, ../one/two/three)`
Before we call run(), let's say currentNode points to an element called root.
After we call dataRecipe.run() which evaluates ../foo/bar/baz relative to root, the currentNode will point the baz element, because DState was mutated. If we do not reset currentNode back to root, then when we call minRecipe.run() to evaluate ../uno/dos/tres that relative path will be evaluated relative to the baz element instead of the root element. So that's why we need to save and restore curentNode after each call to run.
Note that we don't have to restore currentNode after calling maxReipce.run() because we aren't personally calling anymore runs, so whatever currentNode ended up as doesn't matter to us. If some other expression was called that led to this run() being called, then they were responsible for saving and restoring currentNode if they needed it.
stevedlawrence
left a comment
There was a problem hiding this comment.
+1 👍 just a couple small suggestion
| case "gt" => gt | ||
| case "le" => le | ||
| case "ge" => ge | ||
| case _ => subsetError("Unsupported operation '%s' on type %s.", op, convergedArgType) |
There was a problem hiding this comment.
I think we currently hit this subsetError at compile time for lt, le, gt, ge operations with xs:hexBinary, but I don't think we will anymore? Instead we will create the new LT_HexBinary, etc. classes at compile time and fail at runtime with an assertion. We probably want to keep the same subset behavior so it SDEs at compile time?
So, can you add a check somewhere in this function that creates this subset error for hex binary and those operations so we get the compile time subset error instead of a runtime assertion?
Also, I think this case probably wants to be an Assert.invariantFailed now, since this ComparisonExpression should only be created with one of these operators (see def Comp in DFDLExpressionParser.scala). If we ever hit this case, it's probably a bug.
Also, should we create a ticket to implement the rest of the operators for hexBinary and can remove this subset error and support those operators for hexBinary? Maybe we wait until we drop support for Java 8 and can use Arrays.compare() so it should be a trivial change.
There was a problem hiding this comment.
Created https://issues.apache.org/jira/browse/DAFFODIL-2886
Though I think for ease of creating TDML tests, we should leave the error as a subsetError that results in a SDE rather than an Assert.invariantFailed. I can easily see someone creating a schema that tries to compare hexBinary elems and an SDE would be a better error for them to get until this is implemented.
There was a problem hiding this comment.
we should leave the error as a subsetError that results in a SDE rather than an Assert.invariantFailed
Yeah, the way you made the change this should stay a subset error. In my head the change would be something like this:
if (convergedArgType == PrimType.HexBinary && (op == "lt || op == "gt" || op == "le" || op == ge")) {
subsetError(...)
}
op match {
...
}So the hexBinary test would be outside the match case. If it were done that way the default case would want to be an Assert.invariant. But they way did in the match case you're right that it should be a subset error. And your way is probably more scala-y so is better than what I was thinking. And your new hexBinary tests confirm its doing the right thing 👍
| import org.apache.daffodil.runtime1.infoset.DataValue.DataValuePrimitive | ||
|
|
||
| object ComparisonOps { | ||
| lazy val forType: Map[ |
There was a problem hiding this comment.
Can you add scaladoc for this, especially making sure to mention the order of the returned 6-tuple so we don't have to read the code to figure out which operation is which? It's not too hard to figure it out, but it would make things just a little easier. Maybe also include an example of its use with the val (...) = ... to easily extract the comparison ops?
There was a problem hiding this comment.
Another option might be to create a new case class and make that the value of the map, e.g.:
case class ComparisonOps(eq: ComparisonOp, ne: ComparisonOp, lt: ComparisonOp, ...)
...
Map(
PrimType.Boolean -> ComparisonOps(EQ_Compare, NE_Compare, LT_Boolean, ...),
...
)Then uses become something like:
val compOps = ComparisonOps.forType(targetTypeForSubExpr)
val compare = if (isExclusive) compOps.lt else compOps.leIt's a little more verbose, but using it is a bit more clear and less prone to errors since it's impossible for callers of forType to get the order of the 6-tuple wrong.
| // call to targetTypeForSubexpression since we don't care about the specific | ||
| // value of subexpr in targetTypeForSubexpression, and we need all 3 sub | ||
| // expressions to come to the right target target for the args | ||
| private lazy val targetTypeForSubExpr: NodeInfo.Kind = { |
There was a problem hiding this comment.
I think in places where we do something like this (i.e. figure out a common type for all arguments to convert to) we call this convergedArgType. We should do the same here for consistentcy.
| <data>1</data> | ||
| <min>1.9</min> | ||
| <max>2.5</max> | ||
| <isInRangeInc>false</isInRangeInc> |
There was a problem hiding this comment.
Nice test 👍, if we converged the types wrong there's a good chance 1.9 would be cast to int, become 1, and isInRangeInc would be true and the test would fail.
tuxji
left a comment
There was a problem hiding this comment.
+1
Looks very nice, especially the descriptive names used in the tests!
| * @param eq represents the equality comparison | ||
| * @param ne represents the unequality comparison | ||
| * @param lt represents the less than comparison | ||
| * @param le represents the less than and equal to comparison |
| * @param lt represents the less than comparison | ||
| * @param le represents the less than and equal to comparison | ||
| * @param gt represents the greater than comparison | ||
| * @param ge represents the greater than and equal to comparison |
- add implementation for checkRangeInclusive/Exclusive in Expressions and DFDLFunctions - create case class ComparisonOp instead of the 6-tuple for the forType Map value - add ComparisonOps.forType lazy val map and use in ComparisonExpression, RepTypeMixin and DFDLCheckRangeExpr - add tests for checkRangeInclusive/Exclusive for numeric types and non-numeric types - add unittest to test out usage exceptions for hexBinary comparisons DAFFODIL-1515
22ac0dd to
8bf9547
Compare
DAFFODIL-1515