Add support for validation of length facet#1192
Conversation
| Assert.invariant(hasLength) | ||
| schemaDefinitionUnless( | ||
| isSimpleType, | ||
| "Facet length is allowed only on types string and hexBinary.", |
There was a problem hiding this comment.
How is calling isSimpleType supposed to tell you if the type is string, hexBinary, or something else?
There was a problem hiding this comment.
You're right! I updated the message to reflect the check in both length and computeMinMaxLength (where this was copied from)
7fd68d5 to
225b876
Compare
stevedlawrence
left a comment
There was a problem hiding this comment.
+1 minor suggestions and things to confirm with the XSD spec
| repElement.hasMaxLength, | ||
| "String with dfdl:lengthKind='implicit' must have an XSD maxLength facet value.", | ||
| repElement.hasMaxLength || repElement.hasLength, | ||
| "String with dfdl:lengthKind='implicit' must have an XSD maxLength or XSD Length facet value.", |
There was a problem hiding this comment.
Make XSD Length a lowercase l
There was a problem hiding this comment.
Even better, could dispense with XSD (some people won't know XSD means XML Schema Language) and shorten the rest to just must have a length or maxLength facet value.
| typeDef.optRestriction.flatMap { _.enumerationValues } | ||
| } | ||
|
|
||
| final lazy val length: java.math.BigDecimal = { |
There was a problem hiding this comment.
This looks like mostly a copy from computeMinMaxLength, I wonder if we can move this logic into computeMinMaxLenghth I'd like to avoid duplicating code--if we ever need to change one place we're probably going to forget to change it in the other.
There was a problem hiding this comment.
Agreed, something like:
final lazy val (length: java.math.BigDecimal, minLength: java.math.BigDecimal, maxLength: java.math.BigDecimal) =
computeMinMaxLengthThere was a problem hiding this comment.
We might not even need length in the tuple? Maybe deep down in this function we can do something like
if (hasLength)
(length, length)
else
(minLength, maxLength)So the fact that length was used kindof disappears, and length just means minLength == maxLength
| } else { | ||
| computeMinMaxLength | ||
| } | ||
| } |
There was a problem hiding this comment.
This condition goes away if we can merge length into computeMinMaxLength, so it simplifies the code a bit.
| schemaDefinitionUnless( | ||
| isSimpleType, | ||
| "Facets minLength and maxLength are allowed only on types string and hexBinary.", | ||
| "Facets minLength and maxLength are allowed only on simple types or types derived from simple types.", |
There was a problem hiding this comment.
Is this correct, do we allow min/maxLength on numeric types? I wonder if the message was correct but the check was wrong?
There was a problem hiding this comment.
I see the check for the correct type (string or hexBinary) and corresponding error message is performed on lines 994 and 1065 of this ElementBase.scala's class already (note: Steve already suggested merging all checks together into one function, computeMinMaxLength, so these two checks will be merged together later):
// length at line 994
val typeOK = pt == PrimType.String || pt == PrimType.HexBinary
schemaDefinitionWhen(
!typeOK && hasLength,
"Facet length is not allowed on types derived from type %s.\nIt is allowed only on types derived from string and hexBinary.",
pt.name,
)
// minLength, maxLength at line 1065
val typeOK = pt == PrimType.String || pt == PrimType.HexBinary
schemaDefinitionWhen(
!typeOK && (hasMinLength || hasMaxLength),
"Facets minLength and maxLength are not allowed on types derived from type %s.\nThey are allowed only on typed derived from string and hexBinary.",
pt.name,
)Therefore, what we have here is a preliminary check saying you can't use the min/max/length facets on a complex type. Calling isSimpleType is what's being checked here. I do agree with Steve, the error message should not say the facets are allowed on all simple types - a more correct message would say Facets length, minLength, maxLength are not allowed on complex types.
| // so we won't get to this code in those cases unless Xerces validation is turned off | ||
| Assert.usage( | ||
| ml.isEmpty || localLengthValue.isEmpty, | ||
| "Length and minLength cannot be specified together", |
There was a problem hiding this comment.
It might be wroth specify something like "Length and minLength facets ...", just to make it clear this is about facets and now about dfdl:length?
There was a problem hiding this comment.
Agreed, and let's put the word "facets" first so we don't have to capitalize length (because the length facet always begins with the lowercase letter l): "Facets length and maxLength ..."
| } | ||
| case Facet.maxLength | Facet.fractionDigits => { | ||
| case Facet.length | Facet.maxLength | Facet.fractionDigits => { | ||
| errorOnLocalGreaterThanBaseFacet(theLocalFacet, theRemoteFacet, facetType) |
There was a problem hiding this comment.
This means that a local length facet must be less than the base length facet, is that correct? Have we confirmed with the XSD spec that that is the right behavior? Feels like make a local length should always be the same as a base length, since there is no way to narrow a single value instead of a range like min/maxValue have.
There was a problem hiding this comment.
Found this in the XSD spec:
Schema Component Constraint: length valid restriction
It is an ·error· if length is among the members of {facets} of {base type definition} and {value} is not equal to the {value} of the parent length.
So I think this is wrong. This needs to error if local is not equal to base.
| else if (hasMinLength || hasMaxLength) { | ||
| if (maxLengthLong != -1 && len > maxLengthLong) warn("maxL", maxLengthLong) | ||
| Assert.invariant(minLengthLong >= 0) | ||
| if (minLengthLong > 0 && len < minLengthLong) warn("minL", minLengthLong) |
There was a problem hiding this comment.
Suggest we just pass in the string "minLength", "maxLength" and "length", the above %sength is kindof hard to parse and only saves us a few characters.
| e.length.foreach { length => | ||
| val lenAsLong = length.longValue() | ||
| val isLengthGreaterThanEqToZero = lenAsLong.compareTo(0L) >= 0 | ||
| if (isLengthGreaterThanEqToZero) { |
There was a problem hiding this comment.
I don't really understand why we do a length >= 0 check. I see it's just copied from min/maxLength, but seems like that should never happen and should be caught during complication? length, minLength, and maxLength can never be negative.
There was a problem hiding this comment.
Codecov also warns that line 353 is not covered by tests, so that may be another clue that this code is redundant.
| val data = diNode.dataValue.getByteArray | ||
|
|
||
| val dataLen = data.length.toLong | ||
| val isDataLengthEqual = dataLen.compareTo(lenAsLong) == 0 |
There was a problem hiding this comment.
Should length compare against the actual number of bytes, or the number of chars in the string representation of the infoset?
Note that min/maxLength have the exact same logic, but shouldn't they have less-than or greather-than? I wonder if those are unrelated bugs?
There was a problem hiding this comment.
Found this in the XSD spec:
1.2 if {primitive type definition} is hexBinary or base64Binary, then the length of the value, as measured in octets of the binary data, must be equal to {value};
So what we have is correct since we are checking the number of bytes. I'm still not sure why min/maxLength use equals for minLength/maxLength, feels like a bug.
There was a problem hiding this comment.
For hexBinary, it's definitely number of bytes, not number of characters.
For strings,...it's supposed to be characters, but a little DFDL tricky thing is that if dfdl:lengthKind is 'implicit' then the length is in characters, but the charset must be SBCS (single-byte character set) when dfdl:lengthUnits='bytes' so that the maxLength facet, which is used to obtain the length, will be correct for either bytes or characters. Otherwise DFDL's interpretation of the maxLength value could be inconsistent with XSDs interpretation.
tuxji
left a comment
There was a problem hiding this comment.
+1, but could be improved with some changes.
| repElement.hasMaxLength, | ||
| "String with dfdl:lengthKind='implicit' must have an XSD maxLength facet value.", | ||
| repElement.hasMaxLength || repElement.hasLength, | ||
| "String with dfdl:lengthKind='implicit' must have an XSD maxLength or XSD Length facet value.", |
There was a problem hiding this comment.
Even better, could dispense with XSD (some people won't know XSD means XML Schema Language) and shorten the rest to just must have a length or maxLength facet value.
| typeDef.optRestriction.flatMap { _.enumerationValues } | ||
| } | ||
|
|
||
| final lazy val length: java.math.BigDecimal = { |
There was a problem hiding this comment.
Agreed, something like:
final lazy val (length: java.math.BigDecimal, minLength: java.math.BigDecimal, maxLength: java.math.BigDecimal) =
computeMinMaxLength| } | ||
| } | ||
| // TODO: why are we using java.math.BigDecimal, when scala has a much | ||
| // nicer decimal class? |
There was a problem hiding this comment.
While we're all reviewing this PR, can anyone explain why we're using java.math.BigDecimal, or should we change this comment to // TODO: Consider replacing all uses of java.math.BigDecimal with Scala's much nicer decimal class?
There was a problem hiding this comment.
I think we moved away from Scala types in most cases a while ago. I don't remember why, but for example all of our Infoset primitives just java types.
There was a problem hiding this comment.
I think this was just avoiding boxes on the boxes on the boxes. Scala's decimal is built on java's decimal so one more layer of object between the actual data and operations on it. We did not strictly speaking need the additional functionality of the scala decimal class, so we decided to go with Java's boxed number types as our representation types in the infoset. I rather wish we had gone with ULong for xs:unsignedLong because it is far more efficient, but this decision is 10 years ago and probably before we learned of the ULong library.
| schemaDefinitionUnless( | ||
| isSimpleType, | ||
| "Facets minLength and maxLength are allowed only on types string and hexBinary.", | ||
| "Facets minLength and maxLength are allowed only on simple types or types derived from simple types.", |
There was a problem hiding this comment.
I see the check for the correct type (string or hexBinary) and corresponding error message is performed on lines 994 and 1065 of this ElementBase.scala's class already (note: Steve already suggested merging all checks together into one function, computeMinMaxLength, so these two checks will be merged together later):
// length at line 994
val typeOK = pt == PrimType.String || pt == PrimType.HexBinary
schemaDefinitionWhen(
!typeOK && hasLength,
"Facet length is not allowed on types derived from type %s.\nIt is allowed only on types derived from string and hexBinary.",
pt.name,
)
// minLength, maxLength at line 1065
val typeOK = pt == PrimType.String || pt == PrimType.HexBinary
schemaDefinitionWhen(
!typeOK && (hasMinLength || hasMaxLength),
"Facets minLength and maxLength are not allowed on types derived from type %s.\nThey are allowed only on typed derived from string and hexBinary.",
pt.name,
)Therefore, what we have here is a preliminary check saying you can't use the min/max/length facets on a complex type. Calling isSimpleType is what's being checked here. I do agree with Steve, the error message should not say the facets are allowed on all simple types - a more correct message would say Facets length, minLength, maxLength are not allowed on complex types.
| // so we won't get to this code in those cases unless Xerces validation is turned off | ||
| Assert.usage( | ||
| ml.isEmpty || localLengthValue.isEmpty, | ||
| "Length and minLength cannot be specified together", |
There was a problem hiding this comment.
Agreed, and let's put the word "facets" first so we don't have to capitalize length (because the length facet always begins with the lowercase letter l): "Facets length and maxLength ..."
| e.length.foreach { length => | ||
| val lenAsLong = length.longValue() | ||
| val isLengthGreaterThanEqToZero = lenAsLong.compareTo(0L) >= 0 | ||
| if (isLengthGreaterThanEqToZero) { |
There was a problem hiding this comment.
Codecov also warns that line 353 is not covered by tests, so that may be another clue that this code is redundant.
| if (isDataLengthEqual) java.lang.Boolean.TRUE | ||
| else java.lang.Boolean.FALSE | ||
| } | ||
| case _ => e.SDE("Length facet is only valid for string and hexBinary.") |
There was a problem hiding this comment.
This line is not covered by tests, codecov says, so an earlier check prevents execution from coming here. I'd still leave the check in for the fallback "match everything else" case, but put the word "Facet" first and spell length with a lowercase l.
| <xs:element name="e2_8" dfdl:lengthKind="delimited" dfdl:encoding="iso-8859-1"> | ||
| <xs:simpleType> | ||
| <xs:restriction base="xs:hexBinary"> | ||
| <xs:length value="0"/> | ||
| </xs:restriction> | ||
| </xs:simpleType> | ||
| </xs:element> |
There was a problem hiding this comment.
Please note that e2_8 is a type derived from hexBinary like the others, but the comment on the test case that uses e2_8 says the type "int" which indicates either e2_8 should be a type derived from int or that comment is incorrect.
Also please note that element names like e2_2, etc., are not well chosen or easily maintainable names. You can't insert new elements between them easily without having to renumber or tack on new numbers. In fact, the numbers tacked on the original element names seem like they were intended to provide some information about the facets, not just give some kind of ordering to the elements, see this example below:
<xs:element name="e2_2" dfdl:lengthKind="delimited">
<xs:simpleType>
<xs:restriction base="ex:stMinLength_1">
<xs:minLength value="2" />
</xs:restriction>
</xs:simpleType>
</xs:element>A name like ex:stMinLength_1 is much more clear that it's defining a string with a minLength of 1. So, these tests would become easier to maintain if they used similarly descriptive names such as hex_length_0 instead of e2_8.
|
|
||
| <tdml:parserTestCase name="validation_testFacets_03" | ||
| root="e2_8" model="TestFacets" | ||
| description="checks that int doesn't work with length facet" |
There was a problem hiding this comment.
Either this description or the entire test is incorrect since e2_8 is really a hex_length_0.
| <tdml:documentPart type="byte">deadbeef</tdml:documentPart> | ||
| </tdml:document> | ||
|
|
||
| <!-- this is a Xerces Error --> |
There was a problem hiding this comment.
Is Xerces used in both validation modes "limited" and "on"?? I thought the point of "limited" validation was to validate faster by using only Daffodil itself, skipping calling Xerces since the user doesn't want to wait for Xerces to validate the infoset.
There was a problem hiding this comment.
I think this comes from validating the DFDL schema, not the infoset. So limited/on/full doesn't affect this.
The Java and Scala API's don't allow turning off validating the DFDL schema. It can be disabled using the internal ProcessorFactory:
I think the TDML runner can also disable validating dfdl schemas, but that's only done in special cases I think.
| // if hasLength is true, then optRestriction is defined | ||
| val r = st.optRestriction.get | ||
| (r.lengthValue, r.lengthValue) | ||
| } |
There was a problem hiding this comment.
If repType is defined then we need to get length information from the repTypeElementDecl, not this element (i.e r). Since repTypeElementDec.minLength and repTypeElementDecl.maxlength come from computeMinMaxLength, and since that takes into account the length facet of the rep type element, I think we can remove this new case and just rely on the case below. That should should use the correct minLength/maxLength/length facet.
| schemaDefinitionWhen( | ||
| (pt == PrimType.String || pt == PrimType.HexBinary) && lengthKind == LengthKind.Implicit, | ||
| "Facets minLength and maxLength must be defined for type %s with lengthKind='implicit'", | ||
| "The length, minLength and maxLength facets must be defined for type %s with lengthKind='implicit'", |
There was a problem hiding this comment.
This makes it sound like we need all three of length, minLength, and maxLength. We should reword this to make it more clear you need minLength/maxLength or length.
| case (false, false, false, _) => (zeroBD, unbBD) | ||
| case (false, true, false, _) => (r.minLengthValue, unbBD) | ||
| case (true, _, _, _) => | ||
| SDE("Facet length cannot be defined with minLength and maxLength facets") |
There was a problem hiding this comment.
This is a minor organizational thing, but it might be helpful to move this case right after the (true, false,false, _)case. That way the first two cases are when hasLength is true--we either use it or error if either minLength/maxLength is defined. That handles all the cases where hasLength is true, so it's maybe a little more mentally easier to think about the more complex minLength/maxLength cases that follow, since we don't have to consider hasLength anymore.
Also, should this be an assert.invariantFailed, since xerces should fail before we get here if length is defined with a minLength/maxLength?
165e14b to
b417b85
Compare
- add support for length facet, setting minLength and maxLength with the value of length, when hasLength is true, and those variables are queried - ensure length and minLength and maxLength facets are not declared together (Xerces Error, but we have asserts in case Xerces validation is turned off) - fix error message for isSimpleType check in computeMinMaxLength definition - add tests for validation=limited and validation=on - added hexBinary tests including test with length=0 - remove greater than or equal to zero checks for check[Length,MinLength,MaxLength] functions - remove some commented out code - add test to verify local facet len must be equal to base restriction length - add test that ensures dfdl:length is not less than base length facet - add reptype tests - convert SDE to Assert.invariantFailed for the case when hasLength is true and minLen/maxLen is anything but false DAFFODIL-2842
b417b85 to
250eaef
Compare
DAFFODIL-2842