feat(api-v2): Add an RDF processing façade (DSP-1020) #1754

Merged
merged 36 commits into from Nov 17, 2020

@benjamingeer
Collaborator

@benjamingeer benjamingeer commented Nov 6, 2020

This PR adds an RDF processing façade in webapi/src/main/scala/org/knora/webapi/messages/util/rdf, with two different implementations (Jena and RDF4J).

The API

  • RdfModel, which represents a set of RDF graphs (a default graph and/or one or more named graphs)
  • RdfNode and its subclasses, which represent RDF nodes (IRIs, blank nodes, and literals)
  • Statement, which represents a triple or quad
  • RdfNodeFactory, which creates nodes and statements
  • RdfModelFactory, which creates empty RDF models
  • RdfFormatUtil, which parses and formats RDF
  • RdfFeatureFactory, which returns instances of RdfNodeFactory, RdfModelFactory, and RdfFormatUtil, using feature toggle configuration.
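
To make the shape of the API more concrete, here is a minimal, self-contained sketch of the façade idea. The names RdfNode, Statement, RdfNodeFactory, RdfModel and RdfModelFactory come from the list above, and the makeStatement, makeIriNode and makeDatatypeLiteral calls mirror the test code quoted further down in the review. Everything else (addStatement, contains, size, makeEmptyModel, the context parameter, and the in-memory implementation) is an assumption made for illustration and is not the code in this PR.

```scala
// Illustrative sketch only: signatures and the in-memory implementation
// are assumptions, not the actual Knora façade.

sealed trait RdfNode
final case class IriNode(iri: String) extends RdfNode
final case class BlankNode(id: String) extends RdfNode
final case class DatatypeLiteral(value: String, datatype: String) extends RdfNode

// A triple, or a quad when `context` names a graph (assumed representation).
final case class Statement(subj: RdfNode, pred: IriNode, obj: RdfNode, context: Option[String])

// Creates nodes and statements without exposing the underlying library.
trait RdfNodeFactory {
  def makeIriNode(iri: String): IriNode
  def makeDatatypeLiteral(value: String, datatype: String): DatatypeLiteral
  def makeStatement(subj: RdfNode, pred: IriNode, obj: RdfNode, context: Option[String] = None): Statement
}

// A hypothetical minimal model interface.
trait RdfModel {
  def addStatement(statement: Statement): Unit
  def contains(statement: Statement): Boolean
  def size: Int
}

trait RdfModelFactory {
  def makeEmptyModel: RdfModel
}

// Toy implementations standing in for the Jena- or RDF4J-backed ones.
object InMemoryNodeFactory extends RdfNodeFactory {
  def makeIriNode(iri: String): IriNode = IriNode(iri)
  def makeDatatypeLiteral(value: String, datatype: String): DatatypeLiteral =
    DatatypeLiteral(value, datatype)
  def makeStatement(subj: RdfNode, pred: IriNode, obj: RdfNode, context: Option[String]): Statement =
    Statement(subj, pred, obj, context)
}

class InMemoryRdfModel extends RdfModel {
  private val statements = scala.collection.mutable.Set.empty[Statement]
  def addStatement(statement: Statement): Unit = {
    statements += statement
    ()
  }
  def contains(statement: Statement): Boolean = statements.contains(statement)
  def size: Int = statements.size
}

object InMemoryModelFactory extends RdfModelFactory {
  def makeEmptyModel: RdfModel = new InMemoryRdfModel
}

object FacadeExample {
  val RdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label"
  val XsdString = "http://www.w3.org/2001/XMLSchema#string"

  // Client code depends only on the traits, never on a concrete library.
  def buildModel(nodeFactory: RdfNodeFactory, modelFactory: RdfModelFactory): RdfModel = {
    val model = modelFactory.makeEmptyModel
    model.addStatement(
      nodeFactory.makeStatement(
        subj = nodeFactory.makeIriNode("http://example.org/6"),
        pred = nodeFactory.makeIriNode(RdfsLabel),
        obj = nodeFactory.makeDatatypeLiteral(value = "Lucky's Discount X-Wing Repair", datatype = XsdString)
      )
    )
    model
  }
}
```

Because buildModel only sees the traits, swapping the toy factories for library-backed ones would not require changes to the calling code, which is the point of the façade.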

The implementations

  • The Jena-based implementation, in package org.knora.webapi.messages.util.rdf.jenaimpl
  • The RDF4J-based implementation, in package org.knora.webapi.messages.util.rdf.rdf4jimpl

The feature toggle

jena-rdf-library:

  • on means use the Jena implementation
  • off (the default) means use the RDF4J implementation, which was previously the main one used in Knora
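
As a rough illustration of how a toggle like this can drive the choice of implementation, the sketch below maps the toggle state to a library. Only the toggle name jena-rdf-library and its on/off meaning are taken from the description above; ToggleConfig, RdfLibrary and RdfLibrarySelector are hypothetical names, and the real RdfFeatureFactory and feature factory configuration may look quite different.

```scala
// Hypothetical sketch: all types here are illustrative stand-ins.

sealed trait RdfLibrary
case object JenaLibrary extends RdfLibrary
case object Rdf4jLibrary extends RdfLibrary

// Stand-in for the feature toggle configuration: toggle name -> enabled.
final case class ToggleConfig(toggles: Map[String, Boolean]) {
  // A toggle that is not set explicitly is treated as off (the default).
  def isOn(name: String): Boolean = toggles.getOrElse(name, false)
}

object RdfLibrarySelector {
  val JenaToggle = "jena-rdf-library"

  // on  -> Jena implementation
  // off -> RDF4J implementation
  def select(config: ToggleConfig): RdfLibrary =
    if (config.isOn(JenaToggle)) JenaLibrary else Rdf4jLibrary
}
```

For example, RdfLibrarySelector.select(ToggleConfig(Map.empty)) yields Rdf4jLibrary, matching the documented default.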

Tasks

  • Add traits for the façade.
  • Wrap model building and querying functionality (RdfModel, RdfModelFactory, RdfNodeFactory).
  • Add a feature toggle and feature factory (RdfFeatureFactory).
  • Wrap formatting and parsing (RdfFormatUtil).
  • Use the façade:
    • JsonLDUtil
    • RouteUtilV2
    • KnoraRequestV2
    • KnoraResponseV2
    • HttpTriplestoreConnector:
      • SparqlConstructRequest
      • SparqlExtendedConstructRequest
  • Add featureFactoryConfig to everything that depends on CONSTRUCT requests (most of the changes in this PR)
  • Add abstract test classes, with subclasses that test using Jena and RDF4J:
    • RDF4JModelSpec
    • RDF4JFormatUtilSpec
    • JsonLDUtilSpec
    • KnoraResponseV2Spec
  • Update tests:
    • E2ESpec and subclasses.
    • R2RSpec and subclasses.
    • MetadataMessagesV2Spec
    • MetadataRouteV2E2ESpec
  • Add docs.
  • Clean up Bazel dependencies.
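
The "abstract test classes, with subclasses" task follows a common pattern: shared checks live in an abstract class, and each subclass only decides which implementation to test. The sketch below shows that pattern with plain assertions instead of a test framework. LabelStore and its two toy implementations are hypothetical stand-ins for a façade trait and its Jena and RDF4J implementations; none of these names exist in the PR.

```scala
// Hypothetical stand-in for a façade trait.
trait LabelStore {
  def add(iri: String, label: String): Unit
  def labelOf(iri: String): Option[String]
}

// Two interchangeable toy implementations, standing in for two RDF libraries.
class MapLabelStore extends LabelStore {
  private val labels = scala.collection.mutable.Map.empty[String, String]
  def add(iri: String, label: String): Unit = labels.update(iri, label)
  def labelOf(iri: String): Option[String] = labels.get(iri)
}

class ListLabelStore extends LabelStore {
  private var labels = List.empty[(String, String)]
  def add(iri: String, label: String): Unit = {
    labels = (iri, label) :: labels
  }
  def labelOf(iri: String): Option[String] =
    labels.find(_._1 == iri).map(_._2)
}

// The shared checks are written once, against the trait.
abstract class LabelStoreSpec(makeStore: () => LabelStore) {
  def run(): Unit = {
    val store = makeStore()
    store.add("http://example.org/7", "Mos Eisley Used Droids")
    assert(store.labelOf("http://example.org/7").contains("Mos Eisley Used Droids"))
    assert(store.labelOf("http://example.org/unknown").isEmpty)
  }
}

// Each subclass only selects the implementation under test.
class MapLabelStoreSpec extends LabelStoreSpec(() => new MapLabelStore)
class ListLabelStoreSpec extends LabelStoreSpec(() => new ListLabelStore)
```

Running new MapLabelStoreSpec().run() and new ListLabelStoreSpec().run() exercises the same checks against both implementations, so a behavioural difference between them shows up as a failing assertion in only one subclass.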

What still uses RDF4J directly

  • Things that use RDF4J's streaming API to process large amounts of data, especially to avoid constructing a large string in TriG format:
    • ProjectsResponderADM.projectDataGetRequestADM
    • HttpTriplestoreConnector.turtleToTrig
    • RepositoryUpdater
  • The repository update plugin tests, which use SPARQL
  • TEIHeader: uses XSLT that depends on the exact format of RDF/XML generated by RDF4J. The XSLT would need to be improved to handle rdf:Description.
  • GravsearchParser: uses RDF4J's SPARQL parser; replacing it is not worth the effort.
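
To illustrate why a streaming approach avoids constructing a large string, the sketch below writes each statement to a java.io.Writer as soon as it is produced, wrapping the output in a TriG named-graph block. It deliberately does not use RDF4J's streaming API. Triple and StreamingTrigSketch are hypothetical names, and the sketch assumes that each object is already serialized in Turtle syntax and that IRIs need no escaping.

```scala
import java.io.Writer

// Hypothetical minimal triple type; `obj` must already be valid Turtle,
// for example "<http://example.org/x>" or "\"some literal\"".
final case class Triple(subj: String, pred: String, obj: String)

object StreamingTrigSketch {
  // Writes the triples one at a time into a named graph block and returns
  // how many were written. Only one triple is held in memory at a time,
  // so the full TriG document is never built as a single string.
  def writeNamedGraph(graphIri: String, triples: Iterator[Triple], out: Writer): Int = {
    out.write(s"<$graphIri> {\n")
    var count = 0
    triples.foreach { t =>
      out.write(s"  <${t.subj}> <${t.pred}> ${t.obj} .\n")
      count += 1
    }
    out.write("}\n")
    out.flush()
    count
  }
}
```

In practice the Writer would wrap a file or an HTTP response stream, and the Iterator would be fed by a streaming parser rather than by an in-memory collection.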

TODO in a later PR

  • SHACL validation
  • SPARQL querying
  • A streaming parsing/formatting API for processing very large graphs
@benjamingeer benjamingeer marked this pull request as draft Nov 6, 2020
@benjamingeer benjamingeer self-assigned this Nov 6, 2020
@benjamingeer benjamingeer changed the title feat: Add an RDF processing façade (DSP-1020) feat(api-v2): Add an RDF processing façade (DSP-1020) Nov 6, 2020
benjamingeer added 20 commits Nov 9, 2020
@benjamingeer benjamingeer marked this pull request as ready for review Nov 13, 2020
@benjamingeer benjamingeer requested a review from SepidehAlassi Nov 13, 2020
benjamingeer added 11 commits Nov 14, 2020
graphContents match {
    case jsonLDArray: JsonLDArray =>
        // Add each of the array's elements to the model.
        for (elem <- jsonLDArray.value) {
            elem match {

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

Please add a comment here, like: "Is the element a JSON-LD object? If yes, add it to the model. If no, the graph is invalid."

*/
object JsonLDConstants {
    object JsonLDKeywords {

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

I am so glad you renamed this.


case jsonLDArray: JsonLDArray =>
    // It has more than one @type.
    // More than one.
    for (elem <- jsonLDArray.value) {
        elem match {

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

Please add a comment here, like: "Is each element of @type a string? If yes, add the type to the model. If no, throw an exception."

val thisModelNamedGraphIris: Set[jena.graph.Node] = datasetGraph.listGraphNodes.asScala.toSet
val thatModelNamedGraphIris: Set[jena.graph.Node] = thatDatasetGraph.listGraphNodes.asScala.toSet

// The two models are isomorphic if:

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

Nice! Without this explanation, I wouldn't have understood this part.

val graph1LabelStatement = nodeFactory.makeStatement(
    subj = nodeFactory.makeIriNode("http://example.org/6"),
    pred = labelPred,
    obj = nodeFactory.makeDatatypeLiteral(value = "Lucky's Discount X-Wing Repair", datatype = OntologyConstants.Xsd.String),

val graph2LabelStatement = nodeFactory.makeStatement(
    subj = nodeFactory.makeIriNode("http://example.org/7"),
    pred = labelPred,
    obj = nodeFactory.makeDatatypeLiteral(value = "Mos Eisley Used Droids", datatype = OntologyConstants.Xsd.String),

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

"Mos Eisley used droids that were not allowed in the Cantina"


// Compare that with the model generated by the JsonLDDocument.
jsonLDOutputModel should ===(jsonLDExpectedModel)
}

"correctly convert an RDF model to JSON-LD if it contains a circular reference" in {

@SepidehAlassi

SepidehAlassi Nov 17, 2020
Contributor

Excellent!

@SepidehAlassi
Contributor

@SepidehAlassi SepidehAlassi commented Nov 17, 2020

@benjamingeer This looks great, thanks for all your work.

@benjamingeer
Collaborator Author

@benjamingeer benjamingeer commented Nov 17, 2020

Many thanks for reviewing this!

@benjamingeer benjamingeer merged commit 9170419 into main Nov 17, 2020
8 checks passed

  • Build Everything
  • API Unit Tests
  • API E2E Tests
  • API Integration Tests
  • Upgrade Integration Tests
  • Docs Build Test
  • Update next release draft
  • Publish (on release only)
@benjamingeer benjamingeer deleted the wip/DSP-1020-rdf-api branch Nov 17, 2020