Add ability to extract shape objects from a graph by their URI #295

Open, wants to merge 2 commits into base: develop

TShapinsky
Member

TODO:
If this is green-lit, I will add the requisite tests for this feature.

@TShapinsky
Member Author

TShapinsky commented Nov 2, 2023

@gtfierro I put together this implementation of a way to extract relevant triples from a graph by the shape's URI. Let me know what you think.

One thing which is currently missing is adding any non-shape objects to the graph.

Another possibility is including the relevant owl:imports statements when the graph already contains an ontology declaration or an existing import.

@gtfierro
Collaborator

gtfierro commented Nov 5, 2023

Looks like a good start!

We should be specific about what kind of information we want to include inside this graph, and what we hope to do with the resulting graph. Is our intent just to summarize what the shape does, or are we trying to just save the triples necessary to properly conduct validation against the shape? The latter is potentially fairly complicated. I imagine this includes:

  • all triples in the shape's CBD
  • definition of all classes and shapes contained within the CBD, recursively
  • definition of all shapes that refer to the original shape, whether explicitly (in the triples of the CBD) or inside SPARQL queries (which 223P makes heavy use of)

It doesn't look like the notebook ran properly. Would you be able to commit a run of the notebook so I can see what the output looks like?
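To make the first two bullets concrete, here is a minimal sketch of a CBD and of its recursive closure. It is not the implementation in this PR: triples are modeled as plain string tuples so the example has no dependencies, `cbd` and `closure` are hypothetical names, and the CBD is simplified (it follows blank nodes but ignores reification). The third bullet (shapes that refer back to the target, including references inside SPARQL queries) is not covered.

```python
# Triples are (subject, predicate, object) tuples of strings; blank nodes are
# strings starting with "_:". This stands in for an RDF library and is an
# assumption of the sketch, not the project's data model.

def is_blank(node):
    return node.startswith("_:")

def cbd(graph, node):
    """Simplified concise bounded description: the node's own triples plus
    the triples of any blank nodes reachable through object positions."""
    result, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if s == current:
                result.add((s, p, o))
                if is_blank(o):
                    stack.append(o)
    return result

def closure(graph, shape):
    """CBD of the shape plus, recursively, the CBD of every named node it
    mentions that is itself described (appears as a subject) in the graph."""
    subjects = {s for s, _, _ in graph}
    result, todo, done = set(), [shape], set()
    while todo:
        node = todo.pop()
        if node in done:
            continue
        done.add(node)
        description = cbd(graph, node)
        result |= description
        for _, _, o in description:
            if o in subjects and not is_blank(o):
                todo.append(o)
    return result
```

With a graph holding `ex:S sh:property _:b1`, `_:b1 sh:class ex:C`, and a definition of `ex:C`, `cbd` returns only the first two triples, while `closure` also pulls in the definition of `ex:C`.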

@TShapinsky
Member Author

TShapinsky commented Nov 6, 2023

Hey Gabe,

This is a good question. On one hand, I think we just want to be able to reason about a shape beyond its URIRef in the abstract. On the other hand, there are definitely circumstances where portability should be considered. And while portability is nice, trying to achieve it in its entirety will almost certainly hit some nasty edge cases.

Here is my possibly unifying proposal:
When you extract a target shape from a target graph, you are essentially creating a subgraph view containing the triples that are relevant to that shape. This would mean that any shapes or classes which the target shape references and which are in the target graph would be extracted, as would any shapes or classes which they reference in turn. However, if they reference, say, a Brick class and Brick is not in the target graph, that class would not accompany the target shape. If an owl:imports statement exists in the target graph and imports a namespace which is referenced by the target shape, that import should be included.

If you want the full shape and everything needed to run it, you can pull all of those dependencies into one graph before extracting the shape. In general, I believe the extracted shape should be as portable and executable as the graph it came from, no more, no less. This should help reduce unwanted behavior.
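A rough sketch of this proposal, under stated assumptions: triples are plain string tuples, `extract_shape` is a hypothetical name, and an import is treated as "referenced" when some term used by the extracted triples starts with the imported IRI. That prefix match is a simplification, since an ontology IRI and the namespace of its terms often differ in practice.

```python
OWL_IMPORTS = "owl:imports"  # compact name used for readability in this sketch

def extract_shape(graph, shape):
    """Return the subgraph view for `shape`: every triple reachable from it,
    following only nodes that are described in `graph` itself."""
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, set()).add((s, p, o))

    extracted, referenced = set(), set()
    todo, visited = [shape], set()
    while todo:
        node = todo.pop()
        if node in visited:
            continue
        visited.add(node)
        # Nodes with no description in the graph (e.g. an external class)
        # contribute nothing here, so their definitions are not pulled in.
        for s, p, o in by_subject.get(node, ()):
            extracted.add((s, p, o))
            referenced.update((p, o))  # predicates count as references too
            todo.append(o)             # only objects are followed

    # Keep import statements whose target is a prefix of a referenced term.
    for s, p, o in graph:
        if p == OWL_IMPORTS and any(term.startswith(o) for term in referenced):
            extracted.add((s, p, o))
    return extracted
```

Because the function only ever copies triples out of `graph`, the result is always a subset of it. Merging the imported ontologies into `graph` before calling it would yield the fuller, more portable extraction described above.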

Additionally, I'm not sure I agree that the extracted shape should include all shapes which reference it if they are not required to execute the target shape.

Extracted shapes should be:

  • As executable as the graph they came from (no loss of functionality)
  • Set addition with the original graph yields no changes (no triple generation)
  • Supporting nodes should be included in full (don't only include the class declaration triple)
  • Concise (No extraneous nodes)
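Three of these four properties can be checked mechanically. The sketch below is one possible reading of them, with triples as plain string tuples and `check_extraction` as a hypothetical helper. The first property (being as executable as the source graph) would need a SHACL engine and is not checked here.

```python
def check_extraction(original, extracted, shape):
    """Return a list of violated properties (empty when all checks pass)."""
    problems = []

    # No triple generation: adding the extraction to the original changes nothing.
    if not extracted <= original:
        problems.append("generated triples")

    # Supporting nodes in full: every subject keeps all of its original triples.
    subjects = {s for s, _, _ in extracted}
    partial = {s for s, p, o in original
               if s in subjects and (s, p, o) not in extracted}
    if partial:
        problems.append("partial nodes: " + ", ".join(sorted(partial)))

    # Concise: every described node is reachable from the target shape.
    reachable, todo = set(), [shape]
    while todo:
        node = todo.pop()
        if node in reachable:
            continue
        reachable.add(node)
        todo.extend(o for s, _, o in extracted if s == node)
    extraneous = subjects - reachable
    if extraneous:
        problems.append("extraneous nodes: " + ", ".join(sorted(extraneous)))

    return problems
```

Note that retained owl:imports statements would trip the last two checks as written, since the ontology node is neither complete nor reachable from the shape, so a real check would need an explicit exemption for them.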

Thoughts?

@gtfierro
Collaborator

In general I believe the extracted shape should be as portable and executable as the graph it came from, no more no less. This should help reduce unwanted behavior.

I think this is a great principle. If you want to include the definitions of terms defined in other ontologies, then you need to make sure those ontologies are imported. I could see myself implementing this in https://github.com/gtfierro/ontoenv-rs

Additionally, I'm not sure I agree that the extracted shape should include all shapes which reference it if they are not required to execute the target shape.

The question of "what is required" to execute the target shape will be tricky to determine. It will require following shapes and their triggers through subclass hierarchies, inferred properties, etc. If this is a design goal (which I think it should be, eventually) then we should carefully document the algorithm/traversals we use to find the components necessary for executing a given shape.
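As one small piece of such a traversal, the sketch below finds the shapes that would be triggered for instances of a class by walking rdfs:subClassOf upward. It is an illustration only: triples are plain string tuples, the function names are hypothetical, implicit class targets are approximated by checking for `rdf:type sh:NodeShape`, and inferred properties and SPARQL-based targets are not handled.

```python
def superclasses(graph, cls):
    """The class itself plus everything reachable via rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in found:
            continue
        found.add(current)
        todo.extend(o for s, p, o in graph
                    if s == current and p == "rdfs:subClassOf")
    return found

def shapes_targeting(graph, cls):
    """Shapes that apply to instances of `cls`, directly or via a superclass."""
    classes = superclasses(graph, cls)
    # Explicit targets: sh:targetClass pointing at the class or a superclass.
    explicit = {s for s, p, o in graph
                if p == "sh:targetClass" and o in classes}
    # Implicit targets (simplified): a class that is also declared as a shape.
    implicit = {c for c in classes
                if (c, "rdf:type", "sh:NodeShape") in graph}
    return explicit | implicit
```

Documenting each traversal at this level of detail would make it easier to state exactly which components an extraction is expected to include.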
