
GH-2828: Remove validation for XML-only datatypes #2846

Merged: afs merged 1 commit into apache:main from Ostrzyciel:xerces-1 on Nov 20, 2024



@Ostrzyciel
Contributor

GitHub issue resolved #2828

Pull request Description:

Remove the code for validating datatypes that, according to the RDF 1.1 spec, SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove the code for validating XSD lists and unions. This should not change Jena's behavior for users, unless they were relying on Jena to validate unusual XML data, which I don't believe is an officially supported Jena feature.
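For reference, the RDF 1.1 Concepts specification names exactly these eight XSD built-ins as unsuitable for RDF (QName and ENTITY need an enclosing XML document, ID and IDREF are XML cross-references, NOTATION is not for direct use, and the three plural types are sequence-valued). A minimal sketch of how calling code might flag them by IRI, assuming a hypothetical helper class (DiscouragedXsdTypes is illustrative, not a Jena API); it compares datatype IRIs only and does no lexical validation:

```java
import java.util.Set;

// Hypothetical helper, not part of Jena: flags the XSD built-in datatypes
// that RDF 1.1 Concepts says SHOULD NOT be used in RDF.
final class DiscouragedXsdTypes {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Local names of the discouraged XSD built-ins.
    private static final Set<String> LOCAL_NAMES = Set.of(
            "QName", "ENTITY", "ID", "IDREF", "NOTATION",
            "ENTITIES", "NMTOKENS", "IDREFS");

    private DiscouragedXsdTypes() {}

    // True only if the IRI is in the XSD namespace and names a discouraged type.
    // Note: xsd:NMTOKEN (singular) is RDF-compatible and is not flagged.
    static boolean isDiscouraged(String datatypeIri) {
        if (datatypeIri == null || !datatypeIri.startsWith(XSD)) {
            return false;
        }
        return LOCAL_NAMES.contains(datatypeIri.substring(XSD.length()));
    }

    public static void main(String[] args) {
        System.out.println(isDiscouraged(XSD + "QName"));   // true
        System.out.println(isDiscouraged(XSD + "NMTOKEN")); // false
        System.out.println(isDiscouraged(XSD + "integer")); // false
    }
}
```

A plain set lookup is enough here because the list in the spec is closed; nothing about these types needs to be parsed or validated to recognize them.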

This is the first step toward resolving #2828: first I wanted to focus on untangling the unneeded logic. In future PRs, I will remove all the dead code.

After applying these changes, the size of the jena-core JAR was reduced by 13 489 bytes. Not huge, but still something.

Details:

  • XSDDatatype
    • Removed commented-out datatype definitions for ENTITIES, NMTOKENS, IDREFS.
    • Removed some other commented-out temporary code that was supposed to be cleaned up some time ago (I guess).
    • I left in the registered datatypes for QName, ENTITY, ID, IDREF, NOTATION, because they are not doing any harm here.
  • XSSimpleType
    • Removed method isIdType() along with its implementations (was unused).
    • Removed method getPrimitiveKind() along with its implementations (was unused).
  • XSSimpleTypeDecl
    • Removed the registration of type validators for: QName, ENTITY, ID, IDREF, NOTATION.
    • Removed validation code for these datatypes.
    • Removed all code related to XSD unions and lists (not needed in RDF). Notably, this left atomic types as the only possible types, which greatly reduced the number of branches.
    • Note: there was some weirdness in the code where anySimpleType was treated as NOTATION for some reason. I have untangled this. I think any changes to this won't matter anyway, because neither NOTATION nor xsd:anySimpleType make particular sense in RDF.
  • XSSimpleTypeDefinition
    • Removed constants VARIETY_UNION, VARIETY_LIST.
    • Removed methods related to handling lists and unions.
  • BaseSchemaDVFactory and FullDVFactory
    • Replaced the code for: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS with dummy registrations. We still need to register something for these datatypes here, because this is used by XSDDatatype in a public interface.
    • Removed the code for XSD lists and unions.
  • BaseDVFactory and SchemaDVFactory
    • Removed the code for XSD lists and unions.
  • Removed validator classes (now unused): IDDV, IDREFDV, EntityDV, ListDV, UnionDV
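The BaseSchemaDVFactory/FullDVFactory change above keeps a registration for every datatype name, including the discouraged ones, because XSDDatatype exposes those names through a public interface, while the validation logic behind them is dropped. A rough sketch of that "dummy registration" pattern under stated assumptions: ValidatorRegistry, PLACEHOLDER and the accept-everything behavior are hypothetical and do not reproduce the real Jena/Xerces classes or what the PR's dummy registrations actually do internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not Jena's BaseSchemaDVFactory: each XSD type name
// maps to a lexical-form check. Discouraged types still get an entry, so a
// lookup by name never fails, but no real validation code is kept for them.
final class ValidatorRegistry {
    // Assumption: the placeholder simply accepts every lexical form.
    static final Predicate<String> PLACEHOLDER = lexical -> true;

    private final Map<String, Predicate<String>> validators = new HashMap<>();

    ValidatorRegistry() {
        // A real (simplified) atomic validator: xsd:boolean lexical space.
        validators.put("boolean", lexical ->
                lexical.equals("true") || lexical.equals("false")
                        || lexical.equals("1") || lexical.equals("0"));
        // Dummy registrations: the names stay resolvable.
        for (String name : new String[] {"QName", "ENTITY", "ID", "IDREF",
                "NOTATION", "ENTITIES", "NMTOKENS", "IDREFS"}) {
            validators.put(name, PLACEHOLDER);
        }
    }

    boolean isRegistered(String name) {
        return validators.containsKey(name);
    }

    boolean isValid(String name, String lexical) {
        Predicate<String> validator = validators.get(name);
        if (validator == null) {
            throw new IllegalArgumentException("Unknown datatype: " + name);
        }
        return validator.test(lexical);
    }
}
```

The design point is that name resolution and validation are separable: keeping the map entries preserves the public lookup behavior, so the heavier list, union and ID/IDREF validator classes can be deleted without breaking callers.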

  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

```java
// If you see this, remove commented lines.
// Merely temporary during switch over and testing.
// public static final XSDDatatype XSDQName = new XSDDatatype("QName");
public static final XSDDatatype XSDQName = new XSDPlainType("QName");
```
Member


Let's add @Deprecated(forRemoval = true) and /** @deprecated Do not use */ to XSDQName and the other non-RDF XSD datatypes.

Member


Other than that, the rest looks great!

Contributor Author


Done. I've also added a note to remove these in Jena 6, together with the associated dummy code in BaseSchemaDVFactory.
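The suggestion in this thread pairs the @Deprecated(forRemoval = true) annotation (capital D, Java 9+) with the lowercase Javadoc @deprecated tag. A minimal sketch of that pattern on a hypothetical constants class (Datatypes, XSD_QNAME, XSD_INTEGER and isMarkedForRemoval are illustrative names, not the real XSDDatatype code), including the Jena 6 removal note mentioned above:

```java
// Hypothetical stand-in for a datatype constants class; not Jena's XSDDatatype.
final class Datatypes {
    private Datatypes() {}

    /** An RDF-compatible datatype: kept as-is. */
    public static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    /**
     * @deprecated Do not use. xsd:QName is not suitable for RDF;
     *     to be removed in Jena 6.
     */
    @Deprecated(forRemoval = true)
    public static final String XSD_QNAME = "http://www.w3.org/2001/XMLSchema#QName";

    // Reads the runtime-retained @Deprecated annotation via reflection.
    // Returns false if the field does not exist or is not marked for removal.
    static boolean isMarkedForRemoval(String fieldName) {
        try {
            Deprecated d = Datatypes.class.getField(fieldName).getAnnotation(Deprecated.class);
            return d != null && d.forRemoval();
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

With forRemoval = true, javac emits a removal warning by default at use sites outside the declaring class, which gives downstream users notice before the constant is dropped; the Javadoc tag carries the human-readable reason.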

Issue: apache#2828

Remove the code for validating datatypes that according to the RDF 1.1 spec SHOULD NOT be used: QName, ENTITY, ID, IDREF, NOTATION, ENTITIES, NMTOKENS, IDREFS. Also remove code for validating XSD lists and unions.
afs merged commit dc89c67 into apache:main on Nov 20, 2024
Ostrzyciel deleted the xerces-1 branch on December 23, 2024 12:52


Development

Successfully merging this pull request may close these issues.

Clean up code in jena.ext.xerces

2 participants