NodeValue needlessly materializes lexical forms of non-XSD datatypes #1801

@Aklakan

Change

NodeValue's _setByValue method only handles XSD datatypes; however, it eagerly materializes the lexical form even for datatypes outside the XSD namespace. This introduces a noticeable performance overhead when dealing with datatype extensions such as geometries or JSON objects that are only used as intermediate values.
With my current workload of many small JSON objects, the overhead is around 5-10%.

NodeValue itself bears the following comment:

  1. Conversely, delaying turning a value into a graph node is
    valuable because intermediates, like the result of 2+3, will not
    be needed as nodes unless assignment (and there is no assignment
    in SPARQL even if there is for ARQ).
    Node level operations like str() don't need a full node.

The simple solution is to defer materialization of the lexical form until after checking that the given Node has a datatype in the XSD namespace.
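To illustrate the reordering, here is a minimal sketch that does not use Jena's actual classes. `Literal`, `classifyEager`, `classifyDeferred`, and the example datatype URI are hypothetical stand-ins invented for this illustration; only the XSD namespace URI is real. The `serializations` counter stands in for the cost of producing a lexical form (for example, serializing a JSON object).

```java
import java.util.function.Supplier;

class DeferredLexicalFormSketch {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema#";

    /** Hypothetical literal whose lexical form is expensive to produce. */
    static final class Literal {
        final String datatypeUri;
        private final Supplier<String> serializer;
        int serializations = 0; // counts how often the lexical form was built

        Literal(String datatypeUri, Supplier<String> serializer) {
            this.datatypeUri = datatypeUri;
            this.serializer = serializer;
        }

        String lexicalForm() {
            serializations++;
            return serializer.get();
        }
    }

    /** Eager variant: builds the lexical form before looking at the datatype. */
    static String classifyEager(Literal lit) {
        String lex = lit.lexicalForm(); // paid even if the datatype is not XSD
        if (!lit.datatypeUri.startsWith(XSD_NS)) {
            return "non-xsd";
        }
        return "xsd:" + lex;
    }

    /** Deferred variant: checks the namespace first, then builds the lexical form. */
    static String classifyDeferred(Literal lit) {
        if (!lit.datatypeUri.startsWith(XSD_NS)) {
            return "non-xsd"; // lexical form never materialized
        }
        return "xsd:" + lit.lexicalForm();
    }

    public static void main(String[] args) {
        Literal a = new Literal("http://example.org/datatype#json", () -> "{\"k\":1}");
        Literal b = new Literal("http://example.org/datatype#json", () -> "{\"k\":1}");
        classifyEager(a);
        classifyDeferred(b);
        System.out.println("eager serializations:    " + a.serializations);
        System.out.println("deferred serializations: " + b.serializations);
    }
}
```

Both variants return the same result for every input; the deferred one simply avoids the serialization call when the datatype lies outside the XSD namespace.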

A related question: is it really necessary for _setByValue to always go via the lexical form for all XSD types, or would it be possible, as a future improvement, to reuse the LiteralLabel's Java object?

Profile without enhancement:
[image]

Profile with enhancement (note that JsonWriter.string() no longer appears):
[image]

Are you interested in contributing a pull request for this task?

Yes
