Introducing Linked Data Fragments #608

Merged
merged 49 commits into from
Jan 3, 2024

Conversation

langsamu
Contributor

@langsamu langsamu commented Nov 12, 2023

Summary

This change resolves #200 and introduces a Triple Pattern Fragments (TPF) client to dotNetRDF.

The client is an implementation of the IGraph interface which dispatches graph operations to a TPF endpoint.

Value

The advantage of this approach is that developers can interact with Linked Data Fragments in the exact same manner as they do with any other graph (constructed programmatically or loaded from disk or network).

A corollary is the ability to run SPARQL queries against a TPF endpoint. This is a feature that other LDF clients (like Comunica) achieve by mapping the SPARQL algebra to LDF operations. In our case this is achieved by the Leviathan query engine that maps the same to methods exposed by IGraph, which the implementation introduced here translates to TPF network calls.
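
To make the dispatch idea concrete, here is a minimal, language-neutral sketch in Python (it is not dotNetRDF code). It shows how a graph lookup such as "all triples with a given predicate" could become a single fragment request. The function name `fragment_url`, the endpoint `https://example.org/tpf` and the query parameter names `subject`, `predicate` and `object` are assumptions for illustration; a TPF server advertises its actual parameter names through hypermedia controls in each response.

```python
from urllib.parse import urlencode


def fragment_url(endpoint, subject=None, predicate=None, obj=None):
    """Build the URL of the fragment matching a triple pattern.

    None means "any value" for that position (hypothetical helper).
    """
    pattern = {"subject": subject, "predicate": predicate, "object": obj}
    # Unbound positions are left out of the query string entirely.
    query = urlencode({k: v for k, v in pattern.items() if v is not None})
    return f"{endpoint}?{query}" if query else endpoint


# "All triples with predicate foaf:name" maps to one fragment request.
url = fragment_url(
    "https://example.org/tpf",
    predicate="http://xmlns.com/foaf/0.1/name",
)
print(url)
```

A query engine that only ever asks the graph for triples matching a pattern never needs to know that each of those lookups is a network call.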

Limitations

I've chosen to implement TPF as opposed to Quad Pattern Fragments (QPF) because the chosen implementation approach is not viable for QPF in dotNetRDF as we don't really have quads. Of course QPF could be implemented in DNR in multiple ways, but I think there is still value in exploring this implementation paradigm.

This is a naive implementation. There are no robustness/resilience measures such as the ability to retry and back off. Production usage is very likely to suffer from network outages and throttling. For now, mitigating these scenarios is left to applications hosting this client.
This is an intentional choice: I'd rather start with a naive solution and let usage patterns drive further development than introduce plumbing here based on assumption only.

The client introduced here is also naive in its lack of support for caching. Again, this is by design.

Same applies to the current lack of support for authentication, which is likely to be required in all but the simplest use cases.

Mitigation

There is a limited extensibility mechanism introduced here that might help alleviate some of the above shortcomings: the client accepts an optional Loader. Callers could implement resilience, caching or authentication there.
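
As a rough illustration of what a caller-supplied loader could add, the sketch below wraps a loader with retry, exponential backoff and caching. It is written in Python and is not the dotNetRDF `Loader` API: modelling the loader as a plain callable from URL to response body is an assumption, and `with_retry`, `with_cache` and `flaky` are hypothetical names.

```python
import time


def with_retry(load, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Wrap a loader so transient failures are retried with exponential backoff."""
    def resilient(url):
        for attempt in range(attempts):
            try:
                return load(url)
            except OSError:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                sleep(base_delay * 2 ** attempt)
    return resilient


def with_cache(load):
    """Wrap a loader so each URL is fetched at most once."""
    cache = {}

    def cached(url):
        if url not in cache:
            cache[url] = load(url)
        return cache[url]
    return cached


# Demo with a simulated endpoint that fails twice before answering.
calls = []


def flaky(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated outage")
    return f"<response for {url}>"


delays = []  # record the backoff delays instead of actually sleeping
loader = with_cache(with_retry(flaky, sleep=delays.append))
first = loader("https://example.org/tpf")
second = loader("https://example.org/tpf")  # served from the cache
```

Authentication could be layered the same way, by wrapping the loader with one that attaches credentials to each request.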

Architecture

This feature focuses on the consumption of TPF, so it is under a Client namespace.
Future work could implement server features.

  • VDS.RDF.LDF.Client.TpfLiveGraph is the single public API of this client.
    It's an IGraph whose triples are a TpfTripleCollection. This is its major contribution.
    In addition it overrides several methods to advertise lack of support for mutation, blank nodes and RDF*.

  • VDS.RDF.LDF.Client.TpfTripleCollection is the main mapping between our graph operations and LDF.
    It's a BaseTripleCollection implemented by linking WithXXX and other methods to our TpfEnumerable.

  • VDS.RDF.LDF.Client.TpfEnumerator (and TpfEnumerable) is the workhorse of the library.
    It is responsible for returning triples from the TPF endpoint to our client.
    It is the implementation of LDF paging.

  • VDS.RDF.LDF.Client.TpfLoader is our representation of the TPF response.

  • VDS.RDF.LDF.Client.TpfMetadataGraph is our representation of the TPF response metadata.

  • VDS.RDF.LDF.Client.TpfParameters is our representation of a TPF request.
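
The paging role described above for the enumerator can be sketched as a lazy generator: fetch a page, yield its triples, then follow the link to the next page only if the consumer asks for more. This is a Python illustration under stated assumptions, not the actual implementation. The page shape (a dict with `triples` and `next`), the `load_page` callable and the URLs are all hypothetical; in a real response the data and the next-page link would come from the parsed RDF and its metadata.

```python
def iter_triples(load_page, first_url):
    """Yield triples page by page, requesting a page only when it is needed."""
    url = first_url
    while url is not None:
        page = load_page(url)
        yield from page["triples"]
        url = page.get("next")  # None on the last page ends the loop


# Two fake pages standing in for responses from an endpoint.
PAGES = {
    "https://example.org/tpf?page=1": {
        "triples": [("s1", "p", "o1"), ("s2", "p", "o2")],
        "next": "https://example.org/tpf?page=2",
    },
    "https://example.org/tpf?page=2": {
        "triples": [("s3", "p", "o3")],
        "next": None,
    },
}

requested = []


def load_page(url):
    requested.append(url)
    return PAGES[url]


triples = iter_triples(load_page, "https://example.org/tpf?page=1")
first = next(triples)               # triggers only the first request
fetched_so_far = list(requested)
rest = list(triples)                # draining the iterator fetches page 2
```

Because pages are fetched on demand, a consumer that stops early never pays for the remaining requests.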

Contributor

@kal kal left a comment


Hi @langsamu. As the PR is still in draft I thought I would just leave these comments for you. I'm happy to take on any/all of the suggested changes myself, but I don't want to step on your toes if there is still some further work you want to do on the PR before submitting it as ready for review.

Libraries/dotNetRdf.LinkedDataFragments/Enumerable.cs (outdated, resolved)
Libraries/dotNetRdf.LinkedDataFragments/Enumerator.cs (outdated, resolved)
Libraries/dotNetRdf.LinkedDataFragments/Graph.cs (outdated, resolved)
Libraries/dotNetRdf.LinkedDataFragments/Graph.cs (outdated, resolved)
Libraries/dotNetRdf.LinkedDataFragments/Parameters.cs (outdated, resolved)
Libraries/dotNetRdf.LinkedDataFragments/TripleStore.cs (outdated, resolved)
Testing/dotNetRdf.LinkedDataFragments.Tests/GraphTests.cs (outdated, resolved)
@langsamu
Contributor Author

langsamu commented Nov 19, 2023

Thanks for the initial review @kal.

This is far from done.
I was working on mock responses when I noticed your comment.
I'll definitely tackle naming and assertions as well.

@langsamu
Contributor Author

@kal I think this is ready for an interim review.

@kal
Contributor

kal commented Dec 27, 2023

This is great, thanks @langsamu! I've taken a read through the code and don't have any comments to add beyond the few things that you have already noted in TODOs. FWIW I think that the use of SPARQL rather than SHACL is fine internally. Being able to configure the parser used would be good, but I don't feel that it's absolutely critical to getting the feature merged (though if it's not done then it's worth creating an issue as a reminder to get it done later).

The only missing element for me now is documentation of public classes and methods and ideally also some overview documentation somewhere in the Users' Guide.

@langsamu langsamu changed the title Linked Data Fragments Introducing Linked Data Fragments Dec 29, 2023
@langsamu
Contributor Author

langsamu commented Dec 29, 2023

This is great, thanks @langsamu! I've taken a read through the code and don't have any comments to add beyond the few things that you have already noted in TODOs. FWIW I think that the use of SPARQL rather than SHACL is fine internally. Being able to configure the parser used would be good, but I don't feel that it's absolutely critical to getting the feature merged (though if it's not done then it's worth creating an issue as a reminder to get it done later).

The only missing element for me now is documentation of public classes and methods and ideally also some overview documentation somewhere in the Users' Guide.

@langsamu langsamu marked this pull request as ready for review December 29, 2023 23:11
@langsamu
Contributor Author

All yours @kal.

Contributor

@kal kal left a comment


Thanks for all of this work @langsamu! This is a really useful addition to the library and I'm really happy to approve! 🎉

@kal kal merged commit ca3ebdf into dotnetrdf:main Jan 3, 2024
62 checks passed
@langsamu langsamu deleted the ldf branch January 3, 2024 15:53