# Review of rules and terminology

* author: Nikola Vasiljevic
* date: 2023-01-16

This notebook reviews the rules and terminology used in `NEAT`. It is intended as a reference and tutorial for developers and users of `NEAT`.

Let's start by introducing the basic graph-related terminology from the perspective of:
- classical graph theory
- RDF data model
- `rdfpath` rules


## Graph theory

In classical graph theory we have the following concepts:
- **node** - a vertex in a graph
- **node attribute** - a property of a node
- **node attribute value** - a value of a node attribute
- **edge** - a connection between two nodes
- **path** - a sequence of nodes and edges connecting two nodes
- **cycle** - a path that starts and ends at the same node
- **graph** - a set of nodes and edges
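
The concepts above can be sketched in plain Python. This is a minimal illustration only: the node names, the `label` attribute and the helper `find_path` are made up for this example and are not part of `NEAT`.

```python
# Nodes with their attributes: node -> {node attribute: node attribute value}
nodes = {
    "A": {"label": "first"},
    "B": {"label": "second"},
    "C": {"label": "third"},
}

# Directed edges as (source, target) pairs
edges = {("A", "B"), ("B", "C"), ("C", "A")}


def find_path(start, goal, edges):
    """Return one path from start to goal as a list of nodes, or None.

    Uses an iterative depth-first search over the directed edges.
    """
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for source, target in sorted(edges):
            if source != node:
                continue
            if target == goal:
                return path + [target]
            stack.append((target, path + [target]))
    return None


print(find_path("A", "C", edges))  # a path: ['A', 'B', 'C']
print(find_path("A", "A", edges))  # a cycle: ['A', 'B', 'C', 'A']
```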


## RDF data model

In the context of the RDF data model, and specifically for the purposes of `NEAT`, we define the following concepts:
- **prefix** - a namespace prefix used to identify a namespace in an RDF graph (e.g. `rdf`, `rdfs`, `owl`, `skos`, etc.). A prefix consists of any combination of English letters, numbers and the web-safe characters `.`, `-` and `_`.

- **namespace** - a namespace used to identify entities (subjects, predicates, objects) in an RDF graph (e.g. `http://www.w3.org/1999/02/22-rdf-syntax-ns#`, `http://www.w3.org/2000/01/rdf-schema#`, `http://www.w3.org/2002/07/owl#`, `http://www.w3.org/2004/02/skos/core#`, etc.)

- **identifier** - a unique identifier of an entity in an RDF graph, typically provided as the combination `prefix:entity_name` (e.g. `rdf:type`, which in reality resolves to the full URI `http://www.w3.org/1999/02/22-rdf-syntax-ns#type`)

- **entity** - class, class instance or property

- **(RDF) type** - a globally recognized property used to state that a subject is an instance of a particular (OWL) class

- **OWL class** - Classes provide an abstraction mechanism for defining characteristics of a set of individuals (aka class instances). Classes are defined by a set of properties that are common to all individuals in the class. Classes are also used to define the structure of a knowledge base. (e.g. `cim17:IdentifiedObject`, `cim17:Equipment`, `cim17:PowerSystemResource`, etc.)

- **subject** - an instance of (OWL) class (i.e. a node in a graph)

- **predicate** - a property of a subject. In RDF we distinguish:
    - **annotation property** - used primarily to annotate a subject with labels and comments (e.g. `rdfs:label`, `rdfs:comment`, `skos:prefLabel`, `skos:altLabel`, etc.). Annotation properties can be used to annotate classes, class instances and other properties.
    - **data property** - used to state that a subject, being an instance of a class, has a particular value of a particular data type (e.g. `cim17:Identified.mRID` is a data property whose data type is `xsd:string`)
    - **object property** - a property whose value is an instance of a class (e.g. `cim17:IdentifiedObject.EquipmentContainer` is an object property whose value is an instance of the class `cim17:EquipmentContainer`)
    
- **object** - an instance of (OWL) class or property value (see above)

- **triple** - a (subject, predicate, object) tuple representing a "statement" in a graph

- **graph** - a set of triples describing OWL class instances

- **path** - a sequence of triples connecting two nodes in a graph, or a node and a property value of another node
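
The relation between prefix, namespace and identifier can be sketched as follows. This is a minimal illustration, not how `NEAT` resolves identifiers: the `NAMESPACES` table and the `expand` helper are hypothetical, and the character set is the one described above for prefixes.

```python
import re

# Prefix -> namespace, using the namespaces listed above
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# English letters, numbers and the characters ".", "-" and "_"
NAME = r"[A-Za-z0-9._-]+"
IDENTIFIER = re.compile(rf"({NAME}):({NAME})")


def expand(identifier: str) -> str:
    """Resolve a prefix:entity_name identifier to its full URI."""
    match = IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"Not a prefix:entity_name identifier: {identifier!r}")
    prefix, name = match.groups()
    if prefix not in NAMESPACES:
        raise KeyError(f"Unknown prefix: {prefix!r}")
    return NAMESPACES[prefix] + name


print(expand("rdf:type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```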

Based on the RDF data model terminology we can see that:
- subjects map to nodes
- predicates map to edges and node attributes
- objects map to nodes and node attribute values
- triples map to:
    - node - edge - node
    - node - node attribute - node attribute value

> In the broader context of RDF graphs, subjects are not necessarily only class instances; they can also be classes and properties. For example, when we define a new property in RDF, which we do through triples (aka statements s, p, o), the subject of those statements is the property we are defining. Similarly, when we define a class, the subject of the statements that define it is the class itself.
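
The mapping from triples to nodes, edges and node attributes can be sketched with a few hand-written triples. The `ex:` identifiers below are hypothetical and only serve the illustration. The sketch also assumes that every node is declared through an `rdf:type` triple and stores that type as a node attribute.

```python
# Hand-written (subject, predicate, object) triples with hypothetical identifiers
triples = [
    ("ex:substation_1", "rdf:type", "ex:Substation"),
    ("ex:terminal_1", "rdf:type", "ex:Terminal"),
    ("ex:substation_1", "rdfs:label", "Substation 1"),
    ("ex:terminal_1", "ex:Terminal.Substation", "ex:substation_1"),
]

# Subjects of rdf:type triples are class instances, i.e. nodes
nodes = {s: {"rdf:type": o} for s, p, o in triples if p == "rdf:type"}
edges = []

for s, p, o in triples:
    if p == "rdf:type":
        continue
    if o in nodes:
        # node - edge - node
        edges.append((s, p, o))
    else:
        # node - node attribute - node attribute value
        nodes.setdefault(s, {})[p] = o

print(edges)
print(nodes["ex:substation_1"])
```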

In [None]:
# read Nordic44 and show on the example what each means:

## `rdfpath` rules

`rdfpath` rules use a simplified syntax to define a path for traversing a graph, with the purpose of transforming the original graph into a new, potentially simplified and/or enriched, graph. Rules are defined per class property in the new graph, typically in the form of:

```
class_name | property_name | rule
```

The exception is when we do not want to define a new property in the new graph but instead want to extract all properties from the original graph, in which case we define the rule as:

```
class_name | * | wild_card_rule
```
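
A row in this form can be split into its three parts with a few lines of Python. This is a sketch that assumes the pipe-separated layout shown above; `RuleRow`, `parse_row` and the example row are hypothetical and not part of `NEAT`.

```python
from typing import NamedTuple


class RuleRow(NamedTuple):
    class_name: str
    property_name: str
    rule: str

    @property
    def is_wildcard(self) -> bool:
        # "*" in the property column marks a wildcard rule
        return self.property_name == "*"


def parse_row(row: str) -> RuleRow:
    """Split 'class_name | property_name | rule' into its three cells."""
    cells = [cell.strip() for cell in row.split("|", 2)]
    if len(cells) != 3 or not all(cells):
        raise ValueError(f"Expected three non-empty cells, got: {row!r}")
    return RuleRow(*cells)


row = parse_row("Equipment | * | cim17:Equipment(*)")
print(row.class_name, row.is_wildcard, row.rule)
```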

When a rule is run through `NEAT`, it resolves to an explicit `SPARQL` query that is executed against the original graph (aka domain).


The rules are composed using the following elements:
- **prefix:class_name**: an identifier of the particular OWL class whose instances we are interested in. Like a prefix, `class_name` is any combination of English letters, numbers and the web-safe characters `.`, `-` and `_`.

- **(prefix:property_name)**: an identifier of the particular property whose values we are interested in. Like a prefix, `property_name` is any combination of English letters, numbers and the web-safe characters `.`, `-` and `_`.

- **(*)**: wildcard indicating all properties, used when defining `wild_card_rule`

- **->**: traversal between instances of two OWL classes in the direction of the arrow, from the class on the left to the class on the right. It lets us omit the property that connects them, but requires knowing that such a property exists.

- **<-**: the same as `->`, but with traversal in the opposite direction, from the class on the right to the class on the left. It likewise lets us omit the connecting property, provided we know that such a property exists.

- **OR**: logical OR operator that allows defining multiple rules to traverse the graph
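
The elements above can be recognised with a small parser. This is a sketch under the assumptions stated in this section, not the parser used by `NEAT`: `Traversal`, `parse_traversal` and `parse_rule` are hypothetical names, and the example rule is illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# English letters, numbers and the characters ".", "-" and "_"
NAME = r"[A-Za-z0-9._-]+"
# One step: prefix:class_name, optionally followed by (prefix:property_name) or (*)
STEP = re.compile(rf"({NAME}:{NAME})(?:\(({NAME}:{NAME}|\*)\))?")


@dataclass
class Traversal:
    classes: List[str]     # class identifiers in the order they are written
    directions: List[str]  # "->" or "<-" between consecutive classes
    prop: Optional[str]    # property identifier, "*" or None


def parse_traversal(rule: str) -> Traversal:
    # The capturing group keeps the arrows in the result of the split
    parts = re.split(r"(->|<-)", rule.strip())
    classes, directions, prop = [], [], None
    for index, part in enumerate(parts):
        if index % 2 == 1:
            directions.append(part)
            continue
        match = STEP.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"Cannot parse step {part!r}")
        classes.append(match.group(1))
        if match.group(2) is not None:
            if index != len(parts) - 1:
                raise ValueError("A property is only allowed on the last class")
            prop = match.group(2)
    return Traversal(classes, directions, prop)


def parse_rule(rule: str) -> List[Traversal]:
    """Split a rule on OR and parse every alternative."""
    return [parse_traversal(alt) for alt in re.split(r"\s+OR\s+", rule)]


parsed = parse_rule("cim17:Equipment->cim17:EquipmentContainer(rdfs:label)")
print(parsed[0].classes, parsed[0].directions, parsed[0].prop)
```

Splitting on the arrows first keeps the sketch simple, but it means a name ending in `-` directly before `>` would be misread as an arrow.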




Accordingly, we can define the following rules using the above elements:
- **prefix:class_name(*)**: extracts all properties and their values for instances of the class `prefix:class_name`; these are then added to the new graph.

- **prefix:class_name(prefix:property_name)**: extracts all instances of the class `prefix:class_name` together with their values of the property `prefix:property_name`.

- **prefix:class_name_1->...->prefix:class_name_N**: defines a traversal path from instances of `class_name_1` to instances of `class_name_N` (where N > 1). The result is an object property of `class_name_1` instances whose values are identifiers of the `class_name_N` instances connected to them. This shortens a connection between instances of two classes that would otherwise take multiple hops to resolve. We do not need to name the properties that connect the instances explicitly; we only state the direction of traversal, and `NEAT` replaces every `->` with the actual property (i.e. `prefix:property_name`).

- **prefix:class_name_1<-...<-prefix:class_name_N**: the same as the previous rule but with the opposite direction of traversal. The result is of the same kind, i.e. an object property of `class_name_1` instances whose values are identifiers of the connected `class_name_N` instances.

- **prefix:class_name_1<-...<-prefix:class_name_M->...->prefix:class_name_N**: similar to the previous two rules, but with multi-directional traversal.

- **prefix:class_name_1->...->prefix:class_name_N(prefix:property_name)**: instead of resolving to an object property, this rule extracts the value of the data or annotation property `prefix:property_name` from `class_name_N` instances that are connected to `class_name_1` instances.

- **prefix:class_name_1<-...<-prefix:class_name_N(prefix:property_name)**: the same as above, but with the opposite direction of traversal.

- **prefix:class_name_1<-...<-prefix:class_name_M->...->prefix:class_name_N(prefix:property_name)**: similar to the previous two rules, but with multi-directional traversal.

- **rule_1 OR rule_2 OR ... OR rule_N**: defines multiple ways of extracting the desired property values. The rules are executed in parallel and their results are merged into one graph.
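
To give a feeling for how such rules could translate into `SPARQL`, here is a rough sketch. It does not reproduce the queries that `NEAT` generates: `sketch_sparql` is a hypothetical helper, it uses variable predicates (`?p0`, `?p1`, ...) where `NEAT` would insert the actual property, it assumes that `->` means the left-hand instance is the subject of the connecting triple, and it omits `PREFIX` declarations.

```python
from typing import List, Optional


def sketch_sparql(classes: List[str], directions: List[str], prop: Optional[str] = None) -> str:
    """Build an illustrative SPARQL query for an already parsed rule."""
    lines = [f"?n{i} a {cls} ." for i, cls in enumerate(classes)]
    for i, direction in enumerate(directions):
        if direction == "->":
            lines.append(f"?n{i} ?p{i} ?n{i + 1} .")
        else:
            # "<-": the instance on the right is the subject (assumption)
            lines.append(f"?n{i + 1} ?p{i} ?n{i} .")
    last = len(classes) - 1
    if prop == "*":
        # wildcard: all properties and values of the instances
        select = f"?n{last} ?property ?value"
        lines.append(f"?n{last} ?property ?value .")
    elif prop is not None:
        # value of one property of the last class in the path
        select = "?n0 ?value"
        lines.append(f"?n{last} {prop} ?value .")
    else:
        # identifiers of the instances at both ends of the path
        select = "?n0" if last == 0 else f"?n0 ?n{last}"
    body = "\n  ".join(lines)
    return f"SELECT DISTINCT {select}\nWHERE {{\n  {body}\n}}"


print(sketch_sparql(["ex:A", "ex:B"], ["->"], "rdfs:label"))
```

For a rule joined with `OR`, one such query could be built per alternative and the results merged afterwards.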

In [None]:
# read Nordic44 and show on the example how each rules results in a specific SPARQL query: