Skip to content

Add emptyElementParsePolicy to main DFDL namespace.#898

Merged
tuxji merged 1 commit intoapache:mainfrom
mike-mcgann:daffodil-2429-emptyElementParsePolicy
Dec 27, 2022
Merged

Add emptyElementParsePolicy to main DFDL namespace.#898
tuxji merged 1 commit intoapache:mainfrom
mike-mcgann:daffodil-2429-emptyElementParsePolicy

Conversation

@mike-mcgann
Copy link
Member

The emptyElementParsePolicy is currently implemented as a property that is an extension to DFDL. As of DFDL 1.0, this property is now available in the main schema. This property has now been added and a deprecation message is emitted when the old property is used. The property value of treatAsMissing has been renamed to treatAsAbsent in DFDL 1.0.

DAFFODIL-2496

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple minor comments and questions

/**
* Must deal with nils, emptyness and string/hexBinary exceptional behavior
* including the behavior for dfdlx:emptyElementParsePolicy 'treatAsMissing' which special cases
* including the behavior for dfdlx:emptyElementParsePolicy 'treatAsAbsent' which special cases
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the x in the dfdlx prefix here since the extension is deprecated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike, please grep (or rg) your checkout thoroughly for all such remaining places (the dfdlx prefix where dfdl should be used and the treatAsMissing value where treatAsAbsent should be used). We prefer PR authors to perform such searches rather than leave it to reviewers to look for overlooked places. In places where we have to keep "dfdlx" or "treatAsMissing" to avoid breaking schemas immediately, I suggest adding "// deprecated" as a comment so we can locate such places when removing all old properties/values in a future cleanup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

<xs:restriction base="xs:string">
<xs:enumeration value="treatAsEmpty" />
<xs:enumeration value="treatAsMissing" />
<xs:enumeration value="treatAsAbsent" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think treatAsAbsent should be added here? This simple type restriction is used only for the dfdx extension property, which uses treatAsMissing instead of treatAsAbsent. If someone wants to use treatAsAbsent, let's force them to use the non-extension property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a hack and I couldn't find a way to easily make this cleaner. Let me know if you can think of a better way.

The issue is that we want to support the new and old attribute, for now, at the same time. An element could have a dfdl:emptyElementParsePolicy or a dfdlx:emptyElementParsePolicy but the property generator fabricates variables using the unprefixed name. Therefore we cannot easily have both of these properties with different types since they map to the same symbol. The schema validator correctly uses the empty/absent enumeration when evaluating the attribute with the dfdl prefix but then the parser uses the dfdlx enumeration definition (which is manually constructed and ignored by the property generator). The absent value was added to the dfdlx enumeration to support both new and old since there can be only one type. I guess the cleanest way would be to change the property generator to use fully qualified attribute names, or at least use them when necessary, but I'm not sure what the wider consequences of that change would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still confused.

I understand why ByHandMixin.scala needs all three enumerations since both dfdl and dfdlx properties use the same EmptyElementParsePolicy object to convert a string to the type.

but the property generator fabricates variables using the unprefixed name

The property generator excludes the EmptyElementParsePolicy type, so doesn't it completely ignore the definitions in this dfdlx.xsd file? So aren't the enumerations in this file just about doing DFDL schema validation and nothing else? And since treatAsAbsent isn't valid in he dfdlx property shouldn't it be removed here and only be defined in the DFDL_partX xsd file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was the one that was confused. I didn't realize that with the exclusion that the xsd file was completely ignored. I've removed that value from the schema and all the tests run fine. Thanks for that clarification.

* dfdlx:emptyElementParsePolicy is 'treatAsMissing', a required string/hexBinary that has EmptyRep
* dfdl:emptyElementParsePolicy is 'treatAsAbsent', a required string/hexBinary that has EmptyRep
* causes a Parse Error, and an optional EmptyRep causes nothing to be added to the infoset (the empty string
* or hexBinary value is suppressed). When dfdlx:emptyElementParsePolicy is 'treatAsEmpty', a required
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dfdlx should be dfdl here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy
once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent'
-->
emptyElementParsePolicy="treatAsEmpty"/><!-- remove extension property for IBM cross tests -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does IBM support this property now? Do we no longer need to remove this property for cross tests to work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this a bit.

Schema 4 now uses the old dfdlx:emptyElementParsePolicy and the IBM test for that is the same as before. I added a new test for daffodil only that checks that a parse error and a deprecation warning is issued.

Schema 5 uses the new dfdl:emptyElementParsePolicy and that test only runs on daffodil. I'm not sure what happens on IBM for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property dfdl:emptyElementParsePolicy value 'treatAsEmpty' is an optional feature of DFDL.
IBM does not support 'treatAsEmpty' behavior, only 'treatAsAbsent'.

As for whether they added this property, let's test it. We have the IBM DFDL Cross tester. https://github.com/OpenDFDL/ibmDFDLCrossTester

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-unsupported-features does not say anything about emptyElementParsePolicy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbeckerle helped me out this morning getting the IBM test runner working and we found out that this property is not supported at this time. We updated the tests to reflect this.

Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy
once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent'
-->
emptyElementParsePolicy="treatAsAbsent"/> <!-- remove extension property for IBM cross tests -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, does this still need to be removed for IBM?

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR needs to re-enable a warning, and I hope it won't break any tests, but we need to check first.

/**
* Must deal with nils, emptyness and string/hexBinary exceptional behavior
* including the behavior for dfdlx:emptyElementParsePolicy 'treatAsMissing' which special cases
* including the behavior for dfdlx:emptyElementParsePolicy 'treatAsAbsent' which special cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mike, please grep (or rg) your checkout thoroughly for all such remaining places (the dfdlx prefix where dfdl should be used and the treatAsMissing value where treatAsAbsent should be used). We prefer PR authors to perform such searches rather than leave it to reviewers to look for overlooked places. In places where we have to keep "dfdlx" or "treatAsMissing" to avoid breaking schemas immediately, I suggest adding "// deprecated" as a comment so we can locate such places when removing all old properties/values in a future cleanup.

Comment on lines +54 to +57
<xsd:element name="BodyPart" dfdl:lengthKind="delimited"
minOccurs="1" maxOccurs="unbounded" type="xsd:string"
dfdl:occursCountKind="implicit"
dfdlx:emptyElementParsePolicy="treatAsEmpty" />
dfdl:emptyElementParsePolicy="treatAsEmpty" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous lines were indented with tabs and the changed line is indented with spaces, which makes the changed line seem to be indented differently than the previous lines in this diff. An automatic formatter would make this problem go away, but I suggest re-indenting this line with tabs to keep it consistent with the previous lines for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Note: dfdlx:emptyElementParsePolicy should become regular DFDL emptyElementParsePolicy
once implemented in DAFFODIL-2496. The enum 'treatAsMissing' is renamed to 'treatAsAbsent'
-->
emptyElementParsePolicy="treatAsEmpty"/><!-- remove extension property for IBM cross tests -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, https://www.ibm.com/docs/en/integration-bus/10.0?topic=dfdl-unsupported-features does not say anything about emptyElementParsePolicy.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small clean ups I think, and we need to find out whether IBM DFDL accepts the real property name or not yet.

minOccurs="1" maxOccurs="unbounded" type="xsd:string"
dfdl:occursCountKind="implicit"
dfdlx:emptyElementParsePolicy="treatAsEmpty" />
dfdl:emptyElementParsePolicy="treatAsEmpty" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation in this file is totally goofed up. Rather than fix it, and have whitespace fixing be part of this PR, let's accumulate whitespace fixups on a ticket.

I already added this file to this ticket: https://issues.apache.org/jira/browse/DAFFODIL-2759

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and updated this from an earlier request.

representation="text"
lengthKind="delimited"
separatorPosition="infix"
dfdlx:emptyElementParsePolicy="treatAsMissing"/> <!-- remove extension property for IBM cross tests -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be just no-prefix emptyElementParsePolicy='treatAsAbsent' now?

I probably wrote this test originally, but I don't understand the comment "remove extension property for IBM cross tests" since this is used by an IBM cross test below. It must be that IBM DFDL is able to ignore the dfdlx extensions or this wouldn't have worked in test test_sep_ssp_never_4_ibm.

So I think the comment line "remove extension property for IBM cross tests" is spurious now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated during our meeting.

dfdlx:emptyElementParsePolicy="treatAsMissing"/> <!-- remove extension property for IBM cross tests -->

<!--
treatAsMissing should have no effect here, because everything is optional.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change comment to 'treatAsAbsent'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated during our meeting

<!-- Same, but has minOccurs=maxOccurs for the arrays.
That makes all array elements "required"

treatAsMissing causes this to fail. Required empty non-defaultable is an error in that case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change comment to treatAsAbsent, and remove parenthetic note about fixing 2496.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is either updated or got revised out

</tdml:errors>
</tdml:parserTestCase>

</tdml:testSuite> No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at it, add a final new-line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Have you run the IBM cross tester successfully? If so, changes look good now, but wait for Steve or Mike to review too.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍

Have you run the IBM cross tester successfully?

Yeah, it would be good to confirm that a schema using GeneralFormatPortable still works when used with IBM DFDL.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

The emptyElementParsePolicy is currently implemented as a property
that is an extension to DFDL. As of DFDL 1.0, this property is now
available in the main schema. This property has now been added and a
deprecation message is emitted when the old property is used. The
property value of 'treatAsMissing' has been renamed to 'treatAsAbsent'
in DFDL 1.0.

DAFFODIL-2496
@mike-mcgann mike-mcgann force-pushed the daffodil-2429-emptyElementParsePolicy branch from b445d16 to b7e8b75 Compare December 24, 2022 12:46
@tuxji
Copy link
Contributor

tuxji commented Dec 27, 2022

I just noticed that this PR needs to be merged by a Daffodil committer. I confirm that all PR comments have been resolved, the PR has over 2 approvals, @mike-mcgann has squashed the PR to one commit, and all tests have passed successfully.

@tuxji tuxji merged commit abe36cb into apache:main Dec 27, 2022
@mike-mcgann mike-mcgann deleted the daffodil-2429-emptyElementParsePolicy branch February 21, 2023 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants