Add emptyElementParsePolicy to main DFDL namespace.#898
Conversation
stevedlawrence
left a comment
There was a problem hiding this comment.
Looks good! Just a couple minor comments and questions
| /** | ||
| * Must deal with nils, emptyness and string/hexBinary exceptional behavior | ||
| * including the behavior for dfdlx:emptyElementParsePolicy 'treatAsMissing' which special cases | ||
| * including the behavior for dfdlx:emptyElementParsePolicy 'treatAsAbsent' which special cases |
There was a problem hiding this comment.
Should we remove the x in the dfdlx prefix here since the extension is deprecated?
There was a problem hiding this comment.
Mike, please grep (or rg) your checkout thoroughly for all such remaining places (the dfdlx prefix where dfdl should be used and the treatAsMissing value where treatAsAbsent should be used). We prefer PR authors to perform such searches rather than leave it to reviewers to look for overlooked places. In places where we have to keep "dfdlx" or "treatAsMissing" to avoid breaking schemas immediately, I suggest adding "// deprecated" as a comment so we can locate such places when removing all old properties/values in a future cleanup.
| <xs:restriction base="xs:string"> | ||
| <xs:enumeration value="treatAsEmpty" /> | ||
| <xs:enumeration value="treatAsMissing" /> | ||
| <xs:enumeration value="treatAsAbsent" /> |
There was a problem hiding this comment.
I don't think treatAsAbsent should be added here? This simple type restriction is used only for the dfdx extension property, which uses treatAsMissing instead of treatAsAbsent. If someone wants to use treatAsAbsent, let's force them to use the non-extension property.
There was a problem hiding this comment.
This is a bit of a hack and I couldn't find a way to easily make this cleaner. Let me know if you can think of a better way.
The issue is that we want to support the new and old attribute, for now, at the same time. An element could have a dfdl:emptyElementParsePolicy or a dfdlx:emptyElementParsePolicy but the property generator fabricates variables using the unprefixed name. Therefore we cannot easily have both of these properties with different types since they map to the same symbol. The schema validator correctly uses the empty/absent enumeration when evaluating the attribute with the dfdl prefix but then the parser uses the dfdlx enumeration definition (which is manually constructed and ignored by the property generator). The absent value was added to the dfdlx enumeration to support both new and old since there can be only one type. I guess the cleanest way would be to change the property generator to use fully qualified attribute names, or at least use them when necessary, but I'm not sure what the wider consequences of that change would be.
There was a problem hiding this comment.
I think I'm still confused.
I understand why ByHandMixin.scala needs all three enumerations since both dfdl and dfdlx properties use the same EmptyElementParsePolicy object to convert a string to the type.
but the property generator fabricates variables using the unprefixed name
The property generator excludes the EmptyElementParsePolicy type, so doesn't it completely ignore the definitions in this dfdlx.xsd file? So aren't the enumerations in this file just about doing DFDL schema validation and nothing else? And since treatAsAbsent isn't valid in he dfdlx property shouldn't it be removed here and only be defined in the DFDL_partX xsd file?
There was a problem hiding this comment.
I think I was the one that was confused. I didn't realize that with the exclusion that the xsd file was completely ignored. I've removed that value from the schema and all the tests run fine. Thanks for that clarification.
| * dfdlx:emptyElementParsePolicy is 'treatAsMissing', a required string/hexBinary that has EmptyRep | ||
| * dfdl:emptyElementParsePolicy is 'treatAsAbsent', a required string/hexBinary that has EmptyRep | ||
| * causes a Parse Error, and an optional EmptyRep causes nothing to be added to the infoset (the empty string | ||
| * or hexBinary value is suppressed). When dfdlx:emptyElementParsePolicy is 'treatAsEmpty', a required |
There was a problem hiding this comment.
dfdlx should be dfdl here.
daffodil-lib/src/main/scala/org/apache/daffodil/schema/annotation/props/ByHandMixins.scala
Show resolved
Hide resolved
| Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy | ||
| once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent' | ||
| --> | ||
| emptyElementParsePolicy="treatAsEmpty"/><!-- remove extension property for IBM cross tests --> |
There was a problem hiding this comment.
Does IBM support this property now? Do we no longer need to remove this property for cross tests to work?
There was a problem hiding this comment.
I refactored this a bit.
Schema 4 now uses the old dfdlx:emptyElementParsePolicy and the IBM test for that is the same as before. I added a new test for daffodil only that checks that a parse error and a deprecation warning is issued.
Schema 5 uses the new dfdl:emptyElementParsePolicy and that test only runs on daffodil. I'm not sure what happens on IBM for that.
There was a problem hiding this comment.
The property dfdl:emptyElementParsePolicy value 'treatAsEmpty' is an optional feature of DFDL.
IBM does not support 'treatAsEmpty' behavior, only 'treatAsAbsent'.
As for whether they added this property, let's test it. We have the IBM DFDL Cross tester. https://github.com/OpenDFDL/ibmDFDLCrossTester
There was a problem hiding this comment.
Unfortunately, https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-unsupported-features does not say anything about emptyElementParsePolicy.
There was a problem hiding this comment.
@mbeckerle helped me out this morning getting the IBM test runner working and we found out that this property is not supported at this time. We updated the tests to reflect this.
| Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy | ||
| once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent' | ||
| --> | ||
| emptyElementParsePolicy="treatAsAbsent"/> <!-- remove extension property for IBM cross tests --> |
There was a problem hiding this comment.
Same question here, does this still need to be removed for IBM?
tuxji
left a comment
There was a problem hiding this comment.
I think this PR needs to re-enable a warning, and I hope it won't break any tests, but we need to check first.
| /** | ||
| * Must deal with nils, emptyness and string/hexBinary exceptional behavior | ||
| * including the behavior for dfdlx:emptyElementParsePolicy 'treatAsMissing' which special cases | ||
| * including the behavior for dfdlx:emptyElementParsePolicy 'treatAsAbsent' which special cases |
There was a problem hiding this comment.
Mike, please grep (or rg) your checkout thoroughly for all such remaining places (the dfdlx prefix where dfdl should be used and the treatAsMissing value where treatAsAbsent should be used). We prefer PR authors to perform such searches rather than leave it to reviewers to look for overlooked places. In places where we have to keep "dfdlx" or "treatAsMissing" to avoid breaking schemas immediately, I suggest adding "// deprecated" as a comment so we can locate such places when removing all old properties/values in a future cleanup.
daffodil-lib/src/main/scala/org/apache/daffodil/schema/annotation/props/ByHandMixins.scala
Show resolved
Hide resolved
| <xsd:element name="BodyPart" dfdl:lengthKind="delimited" | ||
| minOccurs="1" maxOccurs="unbounded" type="xsd:string" | ||
| dfdl:occursCountKind="implicit" | ||
| dfdlx:emptyElementParsePolicy="treatAsEmpty" /> | ||
| dfdl:emptyElementParsePolicy="treatAsEmpty" /> |
There was a problem hiding this comment.
Previous lines were indented with tabs and the changed line is indented with spaces, which makes the changed line seem to be indented differently than the previous lines in this diff. An automatic formatter would make this problem go away, but I suggest re-indenting this line with tabs to keep it consistent with the previous lines for now.
| Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy | ||
| once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent' | ||
| --> | ||
| emptyElementParsePolicy="treatAsEmpty"/><!-- remove extension property for IBM cross tests --> |
There was a problem hiding this comment.
Unfortunately, https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-unsupported-features does not say anything about emptyElementParsePolicy.
mbeckerle
left a comment
There was a problem hiding this comment.
Small clean ups I think, and we need to find out whether IBM DFDL accepts the real property name or not yet.
daffodil-lib/src/main/scala/org/apache/daffodil/schema/annotation/props/ByHandMixins.scala
Show resolved
Hide resolved
| minOccurs="1" maxOccurs="unbounded" type="xsd:string" | ||
| dfdl:occursCountKind="implicit" | ||
| dfdlx:emptyElementParsePolicy="treatAsEmpty" /> | ||
| dfdl:emptyElementParsePolicy="treatAsEmpty" /> |
There was a problem hiding this comment.
Indentation in this file is totally goofed up. Rather than fix it, and have whitespace fixing be part of this PR, let's accumulate whitespace fixups on a ticket.
I already added this file to this ticket: https://issues.apache.org/jira/browse/DAFFODIL-2759
There was a problem hiding this comment.
I went ahead and updated this from an earlier request.
| representation="text" | ||
| lengthKind="delimited" | ||
| separatorPosition="infix" | ||
| dfdlx:emptyElementParsePolicy="treatAsMissing"/> <!-- remove extension property for IBM cross tests --> |
There was a problem hiding this comment.
Should this be just no-prefix emptyElementParsePolicy='treatAsAbsent' now?
I probably wrote this test originally, but I don't understand the comment "remove extension property for IBM cross tests" since this is used by an IBM cross test below. It must be that IBM DFDL is able to ignore the dfdlx extensions or this wouldn't have worked in test test_sep_ssp_never_4_ibm.
So I think the comment line "remove extension property for IBM cross tests" is spurious now.
There was a problem hiding this comment.
Updated during our meeting.
| dfdlx:emptyElementParsePolicy="treatAsMissing"/> <!-- remove extension property for IBM cross tests --> | ||
|
|
||
| <!-- | ||
| treatAsMissing should have no effect here, because everything is optional. |
There was a problem hiding this comment.
change comment to 'treatAsAbsent'.
There was a problem hiding this comment.
Updated during our meeting
| <!-- Same, but has minOccurs=maxOccurs for the arrays. | ||
| That makes all array elements "required" | ||
|
|
||
| treatAsMissing causes this to fail. Required empty non-defaultable is an error in that case. |
There was a problem hiding this comment.
change comment to treatAsAbsent, and remove parenthetic note about fixing 2496.
There was a problem hiding this comment.
This is either updated or got revised out
daffodil-test/src/test/resources/org/apache/daffodil/usertests/SepTests.tdml
Outdated
Show resolved
Hide resolved
| </tdml:errors> | ||
| </tdml:parserTestCase> | ||
|
|
||
| </tdml:testSuite> No newline at end of file |
There was a problem hiding this comment.
While you are at it, add a final new-line.
tuxji
left a comment
There was a problem hiding this comment.
+1
Have you run the IBM cross tester successfully? If so, changes look good now, but wait for Steve or Mike to review too.
stevedlawrence
left a comment
There was a problem hiding this comment.
+1 👍
Have you run the IBM cross tester successfully?
Yeah, it would be good to confirm that a schema using GeneralFormatPortable still works when used with IBM DFDL.
daffodil-lib/src/main/resources/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd
Show resolved
Hide resolved
The emptyElementParsePolicy is currently implemented as a property that is an extension to DFDL. As of DFDL 1.0, this property is now available in the main schema. This property has now been added and a deprecation message is emitted when the old property is used. The property value of 'treatAsMissing' has been renamed to 'treatAsAbsent' in DFDL 1.0. DAFFODIL-2496
b445d16 to
b7e8b75
Compare
|
I just noticed that this PR needs to be merged by a Daffodil committer. I confirm that all PR comments have been resolved, the PR has over 2 approvals, @mike-mcgann has squashed the PR to one commit, and all tests have passed successfully. |
The emptyElementParsePolicy is currently implemented as a property that is an extension to DFDL. As of DFDL 1.0, this property is now available in the main schema. This property has now been added and a deprecation message is emitted when the old property is used. The property value of
treatAsMissinghas been renamed totreatAsAbsentin DFDL 1.0.DAFFODIL-2496