Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JENA-1948: JSON-LD 1.1 - reader and writer #1045

Merged
merged 2 commits into from Aug 18, 2021
Merged

JENA-1948: JSON-LD 1.1 - reader and writer #1045

merged 2 commits into from Aug 18, 2021

Conversation

afs
Copy link
Member

@afs afs commented Aug 8, 2021

No description provided.

@afs afs marked this pull request as draft August 8, 2021 20:45
@afs afs changed the title JENA-1948: Convert between Jena and Titanium RDF objects JENA-1948: JSON-LD 1.1 - reader and writer Aug 9, 2021
@afs
Copy link
Member Author

afs commented Aug 9, 2021

Now includes prefix extraction (see LangJSONLD11.extractPrefixes - similar to jsonld-java except it follows the comment in the jsonld-java code and only considers an alias to be a prefix the uri string ends in #, / or :.

Includes a basic writer.

@acoburn - haven't figured out how to preload/cache contexts but that looks important for both performance and privacy/security. I think you have experience here?

@ajs6f
Copy link
Member

ajs6f commented Aug 9, 2021

From a security POV, we'd ideally like to offer pluggable context loading with a simple contract, e.g. some folks may want to load from a secure immutable store. That scenario was much-discussed during 1.1 WG. Does Titanium already offer something like that?

@afs
Copy link
Member Author

afs commented Aug 10, 2021

@ajs6f Try asking on JENA-1948 which is more likely to get attention.

(Judging by class names, the answer is "yes" - the question is "how?")

@afs afs marked this pull request as ready for review August 10, 2021 09:55
@afs
Copy link
Member Author

afs commented Aug 10, 2021

This work is now sufficiently complete to be put into the codebase.

@acoburn
Copy link
Member

acoburn commented Aug 10, 2021

haven't figured out how to preload/cache contexts but that looks important for both performance and privacy/security.

In Titanium, one would write something like this:

JsonLd.toRdf(...).loader(myCustomLoader).get();

There is a built-in document loader with certain caching properties; the code above allows a user to override that default with their own implementation of the DocumentLoader interface.

If you're able to use your own DocumentLoader, then you have complete control over what is loaded and how that resolution is performed. That structure also allows you to selectively accept/reject certain context documents based on allow/deny lists. This gets to @ajs6f's comment about using an immutable store. In my own code, I have found this to be really useful, but also very specific to the particular use case.

@afs
Copy link
Member Author

afs commented Aug 10, 2021

@acoburn -- thanks.

It's a JSON-LD issues that contexts are like a data version of tracking cookies and pixels, and not specifically about Titanium.

Anyway, if this PR looks good, we can go with the state here.

This is further along than I hoped for and more complete than the basic first PR. We may be able to smooth the legal issues if we have time. It's 425K more in the downloads.

@acoburn
Copy link
Member

acoburn commented Aug 10, 2021

This PR looks good in its present form.

Eventually, it will be nice to have support for doing something like this:

RDFParser.create().source(...).lang(JSONLD11).context(jsonldReadContext).parse(...);

where that context contains some mechanism to adjust how the JSON-LD @context resolution happens in Titanium (e.g. document loader). For instance, since that Jena Context object is already sent to the LangJSONLD11::read method, it might be possible to use that object to customize how the Titanium parsing proceeds.

@ajs6f
Copy link
Member

ajs6f commented Aug 10, 2021

From an API POV, I'd distinguish .context(jsonldReadContext) from .contextLoader(jsonldReadContextLoader).

return new TriGWriter() ;
if ( Objects.equals(RDFFormat.TRIG_BLOCKS, serialization) )
return new TriGWriterBlocks() ;
if ( Objects.equals(RDFFormat.TRIG_FLAT, serialization) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be convenient for the future to let RDFFormat feature methods to get writers and readers? Then TRIG_BLOCKS has writer() { return new TriGWriterBlocks(); } etc. Happy to send an independent PR if so.

Copy link
Member Author

@afs afs Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These few lines are only in the PR as a small bit of clearup to switch to using a function not an anonymous class.

What capability does that give that isn't available currently?

The right way to get a writer is via RDFWriter.create().

RDFFormat doesn't have readers.

One writer can service several RDFFormats - as happens for current JSON-LD.

The registry+factory approach means the association can be altered dynamically. We can use this to - for example - make Lang.JSONLD/RDFFormat.JSONLD switch from jsonld-java (1.0) and titanium (1.1).

As RDFFormat are keys into maps, returning "new" objects sort of breaks the substitution principle of value-equality. It would work for enums because there is only one object for each enum value. But enums are closed and RDFFormat is open.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see your point about closed vs open-- I guess I haven't ever added a format or had any reason to. In any event, this is a tiny thing and I am happy to leave it with your deep knowledge of the (de)ser systems. I haven't ever used them enough to understand where the proper entry points are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These formats (and more generally Lang) are a way to have "type checked constants". The use of string for everything causes problems e.g. this week a question about model.read(String, String) where the second string is base URI, not syntax. A legacy thing - we are where we are.

Lang in particular,may not even be implemented. Lang also cover result sets.

It's a bit clunky in Java but there again, a programming language with all possible features makes long-term readability suffer.

@afs
Copy link
Member Author

afs commented Aug 10, 2021

The way I think this would be done is via:

public RDFParserBuilder set(Symbol symbol, Object value)
RDFParser.create().source(...).lang(JSONLD11).set(symJsonldContext, jsonldReadContext).parse(...);

This style is done on newer WIP:

public QueryExecDatasetBuilder set(Symbol symbol, Object value) {

where we have set operations for the context as well as context which is replacement.

    public QueryExecDatasetBuilder set(Symbol symbol, Object value)

    public QueryExecDatasetBuilder set(Symbol symbol, boolean value)

    public QueryExecDatasetBuilder context(Context context)

RDFParser only has the last form.
(I am trying to support this style by some simple code pattern but not had time; inheritance is used for something else in other cases.)

This avoids RDFParserBuilder having an API dependency specially for JSON-LD which I think is important.

@ajs6f
Copy link
Member

ajs6f commented Aug 11, 2021

Then we can distinguish with two different symbols, one for context and one for context loader, yes?

@afs
Copy link
Member Author

afs commented Aug 11, 2021

two different symbols, one for context and one for context loader,

Yes.

The (RIOT) context is a map of symbols (best named with URIs for global uniqueness) to Objects.

So (JSON-LD) context setup and the (Titanium) context loader can be (RIOT) context entries.

@afs afs requested a review from acoburn August 11, 2021 10:33
@afs afs merged commit 48187f6 into apache:main Aug 18, 2021
@afs afs deleted the jsonld11-2 branch August 18, 2021 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants