The Member Extraction Algorithm #78

Merged
merged 2 commits into from Sep 27, 2023

`spec.bs`: 9 changes (4 additions, 5 deletions)

Mailing List: public-treecg@w3.org
Mailing List Archives: https://lists.w3.org/Archives/Public/public-treecg/
Editor: Pieter Colpaert, https://pietercolpaert.be
Abstract:
The TREE hypermedia specification enables data publishers and API developers to publish collections of entities (the members).
It then allows creating one or more views of this collection.
A view organizes the members into multiple pages or nodes, and these nodes are interlinked using relations and/or search forms.
This way, a user agent that can interpret the TREE hypermedia controls can find the most efficient path to the members it is interested in.
</pre>


`specs/0-introduction.md`: 158 changes (154 additions, 4 deletions)

# Overview # {#overview}

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTTCjBkBum1J4xgbg0oZJaD_H05dpZxhL6jrp1yzqoIsYw5EOa-7D24No_rfEyTipq1rLb-_tPTEYV0/pub?w=1093&amp;h=546" width="100%">

The TREE specification introduces these core concepts:
* a `tree:Collection` is a subclass of `dcat:Dataset`. The specialization is that it is a collection of members that MAY adhere to a certain shape. It typically has these properties when described in a node:
    - `tree:member` points at the first focus node from which to retrieve and extract all quads of a member.
    - `tree:view` points to a `tree:Node` from which all members can be reached.
    - `tree:shape` indicates the SHACL [[!SHACL]] shape to which each member in the collection adheres.
    - `tree:viewDescription` links to a description of the view (a `tree:ViewDescription`). Multiple descriptions MAY be provided, in which case they MUST be combined.
* a `tree:Node` is a page on which relations to other pages are described through the `tree:relation` predicate, and/or through which a next `tree:Node` can be found by using the `tree:search` form.
* a `tree:Relation` is a relation from one node to another. An extension of this class indicates a specific type of relation (e.g., a `tree:GreaterThanRelation`). A relation typically has these properties:
    - `tree:node`: the URL of the other node
    - `tree:path`: indicates to which of the members' properties this relation applies
    - `tree:value`: indicates a value constraint on the members' values
    - `tree:remainingItems`: defines how many members can be reached when following this relation
* a `tree:ViewDescription` is a subclass of `dcat:DataService` and serves a `tree:Collection`.
    - a `tree:search` describes a search form that allows an agent to jump to a specific `tree:Node`.

The first step when creating a TREE hypermedia interface is defining a collection of members:

<div class="example">
```
ex:Collection1 a tree:Collection;
    rdfs:label "A Collection of subjects"@en;
    tree:member ex:Subject1, ex:Subject2 .

ex:Subject1 a ex:Subject ;
    rdfs:label "Subject 1" ;
    ex:value 1 .

ex:Subject2 a ex:Subject ;
    rdfs:label "Subject 2" ;
    ex:value 2 .
```
</div>

Once this collection of members grows too big for one page, a fragmentation needs to be created: an initial set of members can be found on an entry node, and more members can be found by interpreting the TREE _hypermedia controls_. This is illustrated by the next example:

<div class="example">
```
> HTTP GET https://example.org/Node1

ex:Collection1 a tree:Collection;
    tree:view ex:Node1 ;
    tree:member ex:Subject1, ex:Subject2 .

ex:Node1 a tree:Node ;
    tree:relation ex:R1, ex:R2, ex:R3 ;
    tree:viewDescription ex:ViewDescription1 .

# ex:R1 and ex:R2 both point to ex:Node3: their conditions are combined with a logical AND
ex:R1 a tree:GreaterThanOrEqualToRelation ;
    tree:node ex:Node3 ; # This is the URL of another page
    tree:value 3;
    tree:path ex:value .

ex:R2 a tree:LessThanRelation ;
    tree:node ex:Node3 ; # This is the URL of another page
    tree:value 10;
    tree:remainingItems 7 ;
    tree:path ex:value .

ex:R3 a tree:GreaterThanOrEqualToRelation ; # This is very useful for a client that is looking for a value 10 or greater
    tree:node ex:Node4 ; # This is the URL of another page
    tree:value 10;
    tree:remainingItems 10 ;
    tree:path ex:value .

ex:Subject1 a ex:Subject ;
    rdfs:label "Subject 1" ;
    ex:value 1 .

ex:Subject2 a ex:Subject ;
    rdfs:label "Subject 2" ;
    ex:value 2 .
```
</div>

<div class="informative">
Thanks to the [member extraction algorithm](#member-extraction-algorithm), a data publisher can choose to define their members in different ways:
1. As in the examples above: by default, all quads that have the object of a `tree:member` quad as their subject (and, recursively, the quads of their blank nodes) are included (see also [CBD](https://www.w3.org/submissions/CBD/)), unless a closed shape excludes them (see 3).
2. Out of band / in band:
    - when no quads of a member have been found, the member will be dereferenced. This allows publishing the member on a separate page.
    - part of the member can be maintained elsewhere when a shape is defined (see 3).
3. By defining a more complex shape with `tree:shape`, nested entities can also be included in the member.
4. By putting the triples in a named graph identified by the object of `tree:member`, all these triples are included in the member.
</div>

# Definitions # {#formalizations}

A `tree:Collection` is a set of `tree:Member`s. The set of members MAY be empty.

A `tree:Member` is a set of (at least one) quad(s) defined by the [member extraction algorithm](#member-extraction-algorithm).

A `tree:Node` is a dereferenceable resource containing `tree:Relation`s and a subset (`⊆`) of the members of the collection. In a `tree:Node`, both the set of `tree:Relation`s and the subset of members MAY be empty. The same member MAY be contained in multiple nodes.

A `tree:Relation` is a function denoting a conditional link to another `tree:Node`.

Note: The condition of multiple `tree:Relation`s to the same `tree:Node` MUST be combined with a logical AND.
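
The AND rule can be illustrated with a minimal TypeScript sketch. It decides which nodes a client looking for members with a given value should follow. The `Relation` record is a hypothetical simplification: it covers only comparison relations, uses a single predicate as `tree:path`, and accepts only numeric `tree:value`s. The relations mirror those of `ex:Node1` in the overview example.

```typescript
// Simplified, hypothetical relation model: comparison relations only, a single
// predicate as tree:path, and numeric tree:value literals.
type RelationType =
  | "tree:GreaterThanRelation"
  | "tree:GreaterThanOrEqualToRelation"
  | "tree:LessThanRelation"
  | "tree:LessThanOrEqualToRelation"
  | "tree:EqualToRelation";

interface Relation {
  type: RelationType;
  node: string; // tree:node
  path: string; // tree:path
  value: number; // tree:value
}

// The condition of each relation type, for a member value x and relation value v.
const holds: Record<RelationType, (x: number, v: number) => boolean> = {
  "tree:GreaterThanRelation": (x, v) => x > v,
  "tree:GreaterThanOrEqualToRelation": (x, v) => x >= v,
  "tree:LessThanRelation": (x, v) => x < v,
  "tree:LessThanOrEqualToRelation": (x, v) => x <= v,
  "tree:EqualToRelation": (x, v) => x === v,
};

// Nodes worth following when looking for members whose `path` has value x.
// All relations pointing to the same tree:node are combined with a logical AND.
function nodesToFollow(relations: Relation[], path: string, x: number): string[] {
  const byNode = new Map<string, Relation[]>();
  for (const rel of relations) {
    byNode.set(rel.node, [...(byNode.get(rel.node) ?? []), rel]);
  }
  const result: string[] = [];
  byNode.forEach((rels, node) => {
    // Relations on another path say nothing about x, so they cannot prune the node.
    const relevant = rels.filter((r) => r.path === path);
    if (relevant.every((r) => holds[r.type](x, r.value))) result.push(node);
  });
  return result;
}

// Relations mirroring those of ex:Node1 in the overview example.
const relations: Relation[] = [
  { type: "tree:GreaterThanOrEqualToRelation", node: "ex:Node3", path: "ex:value", value: 3 },
  { type: "tree:LessThanRelation", node: "ex:Node3", path: "ex:value", value: 10 },
  { type: "tree:GreaterThanOrEqualToRelation", node: "ex:Node4", path: "ex:value", value: 10 },
];

console.log(nodesToFollow(relations, "ex:value", 5)); // [ 'ex:Node3' ]
console.log(nodesToFollow(relations, "ex:value", 12)); // [ 'ex:Node4' ]
```

Grouping relations per node before evaluating them keeps the AND semantics explicit. Evaluating each relation on its own would wrongly send a client looking for the value 12 to `ex:Node3`.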

A View is a specific set of interlinked `tree:Node`s that together contain all members in a collection. A specific view will adhere to a certain growth or tree balancing strategy. In one view, completeness MUST be guaranteed.

A `tree:search` form is an IRI template that, when filled out with the right parameters, becomes a `tree:Node` IRI, or an IRI that, when dereferenced, redirects to a `tree:Node` from which all members in the collection that adhere to the described comparator can be found.

# The member extraction algorithm # {#member-extraction-algorithm}

The first focus node is the object of the `tree:member` triple.

1. Extract the quads describing the focus node:
    - If a shape is set, [create a shape template](#shacl-to-shape-template) and execute the shape template extraction algorithm.
    - If no shape was set, extract all quads with the focus node as subject, and recursively include the quads of its blank nodes (see also [CBD](https://www.w3.org/submissions/CBD/)).
2. Extract all quads whose graph matches the focus node.
3. When no quads were extracted in steps 1 and 2, a client MUST dereference the focus node and re-execute steps 1 and 2.
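
Steps 1 (without a shape), 2 and 3 can be sketched in TypeScript as follows. The sketch makes a few assumptions. Quads are plain objects of strings, blank nodes use a `_:` prefix, and the default graph is an empty string. `dereference` is a hypothetical, synchronous stand-in for the HTTP request, which a real client would make asynchronously.

```typescript
// Assumption: terms are plain strings; blank nodes start with "_:",
// literals with a double quote, and the default graph is "".
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

const isBlank = (term: string): boolean => term.startsWith("_:");
const key = (q: Quad): string => [q.subject, q.predicate, q.object, q.graph].join(" ");

// Step 1 without a shape: all quads with the focus node as subject,
// recursively including the quads of its blank nodes (CBD-like).
function describeNode(store: Quad[], focus: string, seen: Set<string> = new Set()): Quad[] {
  if (seen.has(focus)) return []; // guards against cycles of blank nodes
  seen.add(focus);
  const out: Quad[] = [];
  for (const q of store) {
    if (q.subject !== focus) continue;
    out.push(q);
    if (isBlank(q.object)) out.push(...describeNode(store, q.object, seen));
  }
  return out;
}

// Steps 1 and 2, and step 3 when they yield nothing. `dereference` is hypothetical.
function extractMember(store: Quad[], focus: string, dereference: (iri: string) => Quad[]): Quad[] {
  const run = (quads: Quad[]): Quad[] => {
    const member = new Map<string, Quad>(); // each quad is part of the member only once
    for (const q of describeNode(quads, focus)) member.set(key(q), q); // step 1
    for (const q of quads) if (q.graph === focus) member.set(key(q), q); // step 2
    return Array.from(member.values());
  };
  const found = run(store);
  if (found.length > 0 || isBlank(focus)) return found; // a blank node cannot be dereferenced
  return run(store.concat(dereference(focus))); // step 3
}

// Usage with illustrative data.
const page: Quad[] = [
  { subject: "ex:Subject1", predicate: "rdfs:label", object: '"Subject 1"', graph: "" },
  { subject: "ex:Subject1", predicate: "ex:detail", object: "_:b0", graph: "" },
  { subject: "_:b0", predicate: "ex:value", object: '"1"', graph: "" },
  { subject: "ex:Other", predicate: "rdfs:label", object: '"In the graph of Subject 1"', graph: "ex:Subject1" },
];
const fetchPage = (iri: string): Quad[] => [{ subject: iri, predicate: "ex:value", object: '"2"', graph: "" }];

console.log(extractMember(page, "ex:Subject1", fetchPage).length); // 4
console.log(extractMember(page, "ex:Subject2", fetchPage).length); // 1, after dereferencing
```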

## Shape Template extraction ## {#shape-template-extraction}

The Shape Template is a structure that looks as follows:

<div class="example">
```
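// A SHACL property path, kept as an opaque placeholder type here
type Path = string;

interface ShapeTemplate {
  closed: boolean;
  requiredPaths: Path[];
  optionalPaths: Path[];
  nodelinks: NodeLink[];
  atLeastOneLists: ShapeTemplate[][]; // each inner array is one atLeastOneList
}

interface NodeLink {
  shape: ShapeTemplate;
  path: Path;
}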
```
</div>

Paths in the shape templates are [SHACL Property Paths](https://www.w3.org/TR/shacl/#property-paths).

A Shape Template has
* __Closed:__ A boolean telling whether the shape is closed. If it is open, a client MUST extract all quads with the focus node as subject (after a potential HTTP request to the focus node), and recursively include the quads of its blank nodes (see also [CBD](https://www.w3.org/submissions/CBD/)).
* __Required paths:__ If the member does not have this path, an HTTP request MUST be triggered. All quads matching this required path (after a potential HTTP request) MUST be added to the Member set.
* __Optional paths:__ All quads matching this path (after a potential HTTP request) MUST be added to the Member set.
* __Node Links:__ A nodelink contains a reference to another Shape Template, as well as a path. All quads matching this path (after a potential HTTP request) MUST be added to the Member set. The targets MUST be processed again using the shape template extraction algorithm, with the linked Shape Template.
* __atLeastOneLists:__ Each atLeastOneList is an array of at least one shape. Each shape has one or more required paths and atLeastOneLists that must be set for the shape to match. If none of the shapes match, an HTTP request is triggered. Only the quads from paths matching valid shapes are included in the Member.

Note: Some quads may be matched by the algorithm multiple times. Each quad is nevertheless part of the member only once.

This results in the following algorithm:
1. If the shape is open, a client MUST extract all quads with the focus node as subject (after a potential HTTP request to the focus node), and recursively include the quads of its blank nodes (see also [CBD](https://www.w3.org/submissions/CBD/)).
2. If the current focus node is a named node and it was not requested before:
    - test whether all required paths are set. If not, do an HTTP request. If they are set, then
    - test whether at least one shape of each list in the atLeastOneLists matches. If not, do an HTTP request.
3. Visit all paths (required, optional, nodelinks and, recursively, the paths of the shapes in the atLeastOneLists if the shape is valid) and add all quads necessary to reach the targets to the result.
4. For the results of nodelinks, if the target is a named node, set it as the focus node and repeat this algorithm with that nodelink’s shape as the shape.
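
The four steps above can be sketched in TypeScript. This is a simplified illustration under stated assumptions, not a complete implementation:

- Paths are single predicate IRIs rather than full SHACL property paths.
- atLeastOneLists are left out.
- Each node is processed only once. A complete client would key this on the pair of node and shape.

The `SimpleShapeTemplate` type and the synchronous `dereference` callback are hypothetical stand-ins.

```typescript
// Assumption: terms are plain strings; blank nodes start with "_:" and literals with '"'.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// Simplified Shape Template: single-predicate paths and no atLeastOneLists.
interface SimpleShapeTemplate {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  nodelinks: { path: string; shape: SimpleShapeTemplate }[];
}

const isBlank = (t: string): boolean => t.startsWith("_:");
const isNamed = (t: string): boolean => !isBlank(t) && !t.startsWith('"');
const key = (q: Quad): string => [q.subject, q.predicate, q.object, q.graph].join(" ");

// CBD-like description: quads with `focus` as subject, recursing into blank nodes.
function cbd(store: Quad[], focus: string, out: Map<string, Quad>, seen: Set<string> = new Set()): void {
  if (seen.has(focus)) return;
  seen.add(focus);
  for (const q of store.filter((x) => x.subject === focus)) {
    out.set(key(q), q);
    if (isBlank(q.object)) cbd(store, q.object, out, seen);
  }
}

function extractWithShape(
  store: Quad[],
  focus: string,
  shape: SimpleShapeTemplate,
  dereference: (iri: string) => Quad[], // hypothetical HTTP request
  out: Map<string, Quad> = new Map(), // keyed by quad, so each quad is added once
  visited: Set<string> = new Set(),
): Map<string, Quad> {
  if (visited.has(focus)) return out;
  visited.add(focus);
  const hasPath = (p: string): boolean => store.some((q) => q.subject === focus && q.predicate === p);
  // Step 2, done first because step 1 works on the data after a potential HTTP request:
  // a missing required path on a named node triggers an HTTP request.
  if (isNamed(focus) && !shape.requiredPaths.every(hasPath)) {
    store = store.concat(dereference(focus));
  }
  // Step 1: an open shape includes the CBD of the focus node.
  if (!shape.closed) cbd(store, focus, out);
  // Step 3: add the quads reached through required, optional and nodelink paths.
  const paths = shape.requiredPaths.concat(shape.optionalPaths, shape.nodelinks.map((l) => l.path));
  for (const q of store) {
    if (q.subject === focus && paths.includes(q.predicate)) out.set(key(q), q);
  }
  // Step 4: named-node targets of a nodelink are processed with that nodelink's shape.
  for (const link of shape.nodelinks) {
    for (const q of store) {
      if (q.subject === focus && q.predicate === link.path && isNamed(q.object)) {
        extractWithShape(store, q.object, link.shape, dereference, out, visited);
      }
    }
  }
  return out;
}

// Usage with illustrative data: ex:Person1 lacks the required ex:name, so it is fetched.
const authorShape: SimpleShapeTemplate = { closed: true, requiredPaths: ["ex:name"], optionalPaths: [], nodelinks: [] };
const memberShape: SimpleShapeTemplate = {
  closed: true,
  requiredPaths: ["ex:value"],
  optionalPaths: [],
  nodelinks: [{ path: "ex:author", shape: authorShape }],
};
const pageQuads: Quad[] = [
  { subject: "ex:Subject1", predicate: "ex:value", object: '"1"', graph: "" },
  { subject: "ex:Subject1", predicate: "ex:author", object: "ex:Person1", graph: "" },
  { subject: "ex:Subject1", predicate: "ex:unrelated", object: '"x"', graph: "" },
];
const requested: string[] = [];
const fetchNode = (iri: string): Quad[] => {
  requested.push(iri);
  return [{ subject: iri, predicate: "ex:name", object: '"Person 1"', graph: "" }];
};
const member = extractWithShape(pageQuads, "ex:Subject1", memberShape, fetchNode);
console.log(member.size, requested); // 3 [ 'ex:Person1' ]
```

Because `memberShape` is closed, the `ex:unrelated` quad is left out of the member. With an open shape it would be included through the CBD step.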

### Generating a shape template from SHACL ### {#shacl-to-shape-template}

On a `tree:Collection`, a SHACL shape MAY be provided with the `tree:shape` property.
In that case, the SHACL shape MUST be processed towards a Shape Template as follows:

1. Check whether the shape is deactivated (`:S sh:deactivated true`). If it is, do not continue.
2. Check whether the shape is closed (`:S sh:closed true`). If it is, set the closed boolean to true.
3. Add all `sh:property` elements with an `sh:node` link to the shape template’s NodeLinks array.
4. Add all properties with `sh:minCount` > 0 to the Required Paths array, and all others to the Optional Paths array.
5. Process the [conditionals](https://www.w3.org/TR/shacl/#core-components-logical) `sh:xone`, `sh:or` and `sh:and` (but not `sh:not`):
    - `sh:and`: all properties on that shape template MUST be merged with the current shape template.
    - `sh:xone` and `sh:or`: in both cases, at least one item must match at least one quad for all required paths. If not, an HTTP request to the current named node is triggered.

Note: When designing SHACL shapes, it is important to understand how they are processed into Shape Templates, in order to know when an HTTP request will be triggered. A cardinality constraint that is not exactly matched, or an `sh:pattern` that is not respected, will not trigger an HTTP request; the invalid quads are simply added to the Member. This is a design choice: the SHACL shape only defines the triggers for HTTP requests needed to arrive at a complete set of quads describing the member that the data publisher pointed at using `tree:member`.
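
Steps 1 to 5 can be sketched in TypeScript as below. The sketch rests on three assumptions:

- The SHACL shape has already been parsed from RDF into the hypothetical `NodeShapeInput` structure.
- Each `sh:path` is reduced to a single predicate.
- Each `sh:or` or `sh:xone` becomes one atLeastOneList.

```typescript
// Hypothetical, already-parsed view of a SHACL node shape.
interface PropertyShapeInput {
  path: string; // sh:path, reduced to a single predicate here
  minCount?: number; // sh:minCount
  node?: NodeShapeInput; // sh:node
}

interface NodeShapeInput {
  deactivated?: boolean; // sh:deactivated
  closed?: boolean; // sh:closed
  properties: PropertyShapeInput[]; // sh:property
  and?: NodeShapeInput[]; // sh:and
  or?: NodeShapeInput[]; // sh:or
  xone?: NodeShapeInput[]; // sh:xone
}

interface ShapeTemplate {
  closed: boolean;
  requiredPaths: string[];
  optionalPaths: string[];
  nodelinks: { path: string; shape: ShapeTemplate }[];
  atLeastOneLists: ShapeTemplate[][];
}

function toShapeTemplate(shape: NodeShapeInput): ShapeTemplate | undefined {
  if (shape.deactivated) return undefined; // step 1
  const t: ShapeTemplate = {
    closed: shape.closed === true, // step 2
    requiredPaths: [],
    optionalPaths: [],
    nodelinks: [],
    atLeastOneLists: [],
  };
  for (const p of shape.properties) {
    const linked = p.node ? toShapeTemplate(p.node) : undefined;
    if (linked) t.nodelinks.push({ path: p.path, shape: linked }); // step 3
    if ((p.minCount ?? 0) > 0) t.requiredPaths.push(p.path); // step 4
    else t.optionalPaths.push(p.path);
  }
  // Step 5, sh:and: merge the other template into this one (sh:not is ignored).
  for (const part of shape.and ?? []) {
    const other = toShapeTemplate(part);
    if (!other) continue;
    t.requiredPaths.push(...other.requiredPaths);
    t.optionalPaths.push(...other.optionalPaths);
    t.nodelinks.push(...other.nodelinks);
    t.atLeastOneLists.push(...other.atLeastOneLists);
  }
  // Step 5, sh:or and sh:xone: at least one of the alternatives has to match.
  for (const alternatives of [shape.or, shape.xone]) {
    const list = (alternatives ?? [])
      .map((s) => toShapeTemplate(s))
      .filter((s): s is ShapeTemplate => s !== undefined);
    if (list.length > 0) t.atLeastOneLists.push(list);
  }
  return t;
}

// Usage with an illustrative shape.
const template = toShapeTemplate({
  closed: true,
  properties: [{ path: "ex:value", minCount: 1 }, { path: "rdfs:label" }],
  or: [
    { properties: [{ path: "ex:a", minCount: 1 }] },
    { properties: [{ path: "ex:b", minCount: 1 }] },
  ],
});
console.log(template?.requiredPaths, template?.optionalPaths, template?.atLeastOneLists.length);
// [ 'ex:value' ] [ 'rdfs:label' ] 1
```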

`specs/1-discovery.md`: 74 changes (40 additions, 34 deletions)

# Discovery and source selection # {#hypermedia}

TREE tackles discovery on three levels: i) interface discovery, ii) view discovery, and iii) dataset discovery.
Interface discovery determines which collection the current page is part of, and which HTTP requests can be made next through relations and search forms.
One dataset can have multiple views that can be published across different servers; selecting one for a certain use case is part of view discovery.
Dataset discovery, finally, is the selection of a `tree:Collection` of interest.


## Interface discovery ## {#interface-discovery}

Interface discovery starts when a URL is provided to a specific `tree:Node`.
A node from which all members of a collection can be discovered (an “entry node”) can be found through a triple stating `ex:C1 tree:view ex:N1`, with `ex:C1` being a `tree:Collection` and `ex:N1` being a `tree:Node`.

When the current page is a `tree:Node`, there MUST be a property linking the current page URL to the URI of the `tree:Collection`. However, not all members can be reached from every `tree:Node`, and therefore two other properties can be used: `void:subset`, or its inverse property, `dcterms:isPartOf`.

Three properties MAY thus be used:
1. `ex:C1 tree:view <> .`<br/>May be used *only* when the entire `tree:Collection` can be found starting from the current node.
2. `ex:C1 void:subset <> .`<br/>When not all members can be found starting from the current node, but the node still is a subset of the collection.
3. `<> dcterms:isPartOf ex:C1 .`<br/>The inverse property of 2.
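
A client can check these three options with a small TypeScript sketch. It assumes the triples of the fetched page are available as plain objects using prefixed names, and `pageUrl` plays the role of `<>`. The `Triple` and `CollectionLink` types are assumptions of the sketch. It also records whether the page is an entry node of the collection (option 1) or only a subset of it (options 2 and 3).

```typescript
// Assumption: terms are prefixed names, already normalized by the client.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

interface CollectionLink {
  collection: string;
  isEntryNode: boolean; // true only when linked with tree:view
}

function findCollections(triples: Triple[], pageUrl: string): CollectionLink[] {
  const found = new Map<string, boolean>();
  // Once a collection is known to link to the page via tree:view, keep that knowledge.
  const mark = (collection: string, isEntryNode: boolean) =>
    found.set(collection, found.get(collection) === true || isEntryNode);
  for (const t of triples) {
    if (t.predicate === "tree:view" && t.object === pageUrl) mark(t.subject, true); // option 1
    if (t.predicate === "void:subset" && t.object === pageUrl) mark(t.subject, false); // option 2
    if (t.predicate === "dcterms:isPartOf" && t.subject === pageUrl) mark(t.object, false); // option 3
  }
  return Array.from(found, ([collection, isEntryNode]) => ({ collection, isEntryNode }));
}

console.log(findCollections([{ subject: "ex:C1", predicate: "tree:view", object: "ex:N1" }], "ex:N1"));
// [ { collection: 'ex:C1', isEntryNode: true } ]
console.log(findCollections([{ subject: "ex:N2", predicate: "dcterms:isPartOf", object: "ex:C1" }], "ex:N2"));
// [ { collection: 'ex:C1', isEntryNode: false } ]
```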

Note: A `tree:Node` linked through `tree:view` can thus be used to _view_ all members of the collection, hence the name (similar to the Hydra specification).
## View discovery ## {#multiple-views}

Every node linked from `tree:view` MUST be an entry point to retrieve **all** members of the collection.
Multiple `tree:view` links MAY be provided.
If a TREE client wants to guarantee completeness, it picks one link and then traverses all relations.

Note: How a client picks the right view is use-case specific. The properties of the `tree:ViewDescription` can help in that regard.
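
The completeness rule can be sketched in TypeScript. The sketch picks one `tree:view` link (simply the first, an arbitrary choice) and follows every relation breadth-first, visiting each node once. The `NodePage` structure and the synchronous `fetchNode` callback are hypothetical stand-ins for fetching and parsing a `tree:Node`.

```typescript
// Hypothetical result of fetching and parsing one tree:Node.
interface NodePage {
  members: string[]; // objects of tree:member
  relations: { node: string }[]; // tree:relation; only tree:node is used here
}

// Picks one view (here simply the first tree:view link) and traverses all of
// its relations breadth-first, so that every member of the collection is found.
function collectAllMembers(viewLinks: string[], fetchNode: (url: string) => NodePage): Set<string> {
  const members = new Set<string>();
  if (viewLinks.length === 0) return members;
  const queue: string[] = [viewLinks[0]];
  const visited = new Set<string>(queue); // relations may link back to earlier nodes
  while (queue.length > 0) {
    const page = fetchNode(queue.shift() as string);
    page.members.forEach((m) => members.add(m));
    for (const { node } of page.relations) {
      if (!visited.has(node)) {
        visited.add(node);
        queue.push(node);
      }
    }
  }
  return members;
}

// Usage with an illustrative in-memory view of three nodes, including a link back to the root.
const pages: Record<string, NodePage> = {
  "ex:Node1": { members: ["ex:Subject1", "ex:Subject2"], relations: [{ node: "ex:Node3" }, { node: "ex:Node4" }] },
  "ex:Node3": { members: ["ex:Subject3"], relations: [{ node: "ex:Node1" }] },
  "ex:Node4": { members: ["ex:Subject10"], relations: [] },
};
const all = collectAllMembers(["ex:Node1"], (url) => pages[url]);
console.log(all.size); // 4
```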

### Describing the View using a ViewDescription ### {#view-description}

In order to prioritize a specific view link, the relations and search forms in the entry nodes can be studied for their relation types, path or remaining items.
The class `tree:ViewDescription` indicates a specific TREE structure on a `tree:Collection`.
The `tree:ViewDescription` class is a subclass (`rdfs:subClassOf`) of `dcat:DataService`.

Through the property `tree:viewDescription`, a `tree:Node` can link to an entity that describes the view, and that can be reused in data portals as a `dcat:DataService`.

<div class="example">
```
## What can be found in a tree:Node
ex:N1 a tree:Node ;
    tree:viewDescription ex:View1 .

ex:C1 a tree:Collection ;
    tree:view ex:N1 .

## What can be found on a data portal
ex:C1 a dcat:Dataset .
ex:View1 a tree:ViewDescription, dcat:DataService ;
    dcat:endpointURL ex:N1 ; # The entry point that can be advertised in a data portal
    dcat:servesDataset ex:C1 .
```
</div>

When there is no `tree:viewDescription` property on this page, either the client already discovered the description of this view in an earlier `tree:Node`, or the current `tree:Node` is implicitly the ViewDescription. Therefore, when the property path `tree:view → tree:viewDescription` does not yield a result, the view properties MUST be extracted from the object of the `tree:view` triple.
A `tree:Node` can also be double typed as a `tree:ViewDescription`. A client must thus look for ViewDescription properties both on the current node itself and on the entity it links to with `tree:viewDescription`.
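
Both rules can be combined in a short TypeScript sketch. For every `tree:view` of a collection, the client looks for view properties on the node itself and on every entity linked from it with `tree:viewDescription`. The `Triple` type and the use of prefixed names are assumptions of the sketch. The data mirrors the example above.

```typescript
// Assumption: terms are prefixed names, already normalized by the client.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Entities on which the view properties of `collection` may be found: the
// object of each tree:view triple, plus anything it links to via tree:viewDescription.
function viewDescriptionCandidates(triples: Triple[], collection: string): string[] {
  const candidates = new Set<string>();
  for (const view of triples.filter((t) => t.subject === collection && t.predicate === "tree:view")) {
    candidates.add(view.object); // the node itself may be (implicitly) the ViewDescription
    for (const t of triples) {
      if (t.subject === view.object && t.predicate === "tree:viewDescription") candidates.add(t.object);
    }
  }
  return Array.from(candidates);
}

const triples: Triple[] = [
  { subject: "ex:N1", predicate: "rdf:type", object: "tree:Node" },
  { subject: "ex:N1", predicate: "tree:viewDescription", object: "ex:View1" },
  { subject: "ex:C1", predicate: "tree:view", object: "ex:N1" },
  { subject: "ex:View1", predicate: "dcat:endpointURL", object: "ex:N1" },
];
console.log(viewDescriptionCandidates(triples, "ex:C1")); // [ 'ex:N1', 'ex:View1' ]
```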

Note: In Linked Data Event Streams, the [ldes:EventSource class](https://w3id.org/ldes#EventSource) exists to indicate this fragmentation is designed to be the source for all derived views. The Linked Data Event Streams specification can also further elaborate on the ViewDescription by for example describing a retention policy on top of it.

Note: In [the Smart Data Specification](https://w3id.org/sds/specification), a `tree:ViewDescription` can be used to describe the algorithm that created this specific fragmentation.
## Dataset discovery ## {#multiple-collections}

When multiple collections are found by a client, it can choose to prune the collections based on the `tree:shape` property.
Therefore, a data publisher SHOULD annotate a `tree:Collection` instance with a SHACL shape.
The `tree:shape` points to a SHACL description of the shape (`sh:NodeShape`).

Note: the shape can be a blank node, or a named node on which you should follow your nose when it is defined at a different HTTP URL.

Note: For compatibility with the [Solid specifications](https://solidproject.org/TR/), a ShEx shape may also be given (see the chapter on compatibility below).