In [None]:
%reload_ext jupyter-rdfify

In [None]:
# This is a placeholder cell for test initialization. Ignore it.

The two cells below will define some prefixes for you. You can use them throughout the sheet.

In [None]:
%%rdf shex prefix
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

In [None]:
%%rdf turtle --prefix
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Shape Validation

RDF is a very general data model that can express almost anything. This is both its greatest strength and its greatest weakness: RDF data has no fixed structure and therefore maps poorly to the data structures of common programming languages. The vocabularies we have introduced so far may seem like a good starting point for structuring our RDF data, but they do not solve this problem. Even though RDFS is called RDF *Schema*, it is used to define vocabularies, not to validate data structures, and it is not expressive enough for that purpose. OWL is a lot more expressive than RDFS but is targeted at logical modelling, not at constraining validity. Another option would be SPARQL ASK queries. SPARQL is very expressive and has efficient implementations, so ASK queries can in principle check constraints, but reading, writing and understanding such queries is difficult even for simple constraints.

This is why we need a dedicated way to describe and validate the structure of our RDF data. **ShEx** (Shape Expressions) and SHACL (Shapes Constraint Language) are both high-level, concise languages for defining schemata. A schema specifies shape constraints that a node must fulfill to pass validation. In this lecture we will focus on ShEx.

# ShEx (ShExC)

ShEx**C** is a compact syntax for defining ShEx schemata. It is inspired by Turtle and reuses many of its conventions, such as prefix declarations (using the PREFIX keyword as seen below) and the **a** keyword for rdf:type. ShExC keywords are case-insensitive (and = AND = AnD). A schema is a set of shape expressions, and a shape expression is a labeled pattern. Shape expression patterns specify constraints for a focus node (the node tested against the pattern) and its neighborhood (the set of its incoming and outgoing triples). The following is a schema with a very basic shape expression:

**PREFIX ex: &lt;http://example.org/&gt;  
PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;**

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string  
}**

This shape expression has the label StudentShape (labels can be URIs or blank nodes). Its pattern requires a student to have exactly one outgoing triple with the predicate ex:name, and the object of that triple must be a literal with the datatype xsd:string. The following graph passes validation against the above schema:

**ex:Max ex:name "Maximilian Mustermann" .**

Here are some graphs which would fail the validation:

**ex:Max ex:name "Maximilian Mustermann"@de .**  
Because the datatype of a language-tagged literal is rdf:langString, not xsd:string.

**ex:Max ex:name "Maximilian Mustermann" , "Max Mustermann" .**  
Because **two** ex:name triples are found.

**ex:Max foaf:name "Maximilian Mustermann" .**  
Because no ex:name triple is found (foaf:name was used).
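
The pass/fail cases above can be reproduced with a small Python sketch. It is only a simplified model for illustration, not how a ShEx validator works internally: triples are plain tuples, a literal is a hypothetical `(lexical form, datatype IRI)` pair, and the helper name `matches_student_shape` is made up for this example.

```python
# Simplified model: a literal is a (lexical form, datatype IRI) pair.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

def matches_student_shape(focus, triples):
    """True if focus has exactly one ex:name triple whose object is an xsd:string."""
    names = [o for (s, p, o) in triples if s == focus and p == EX + "name"]
    return (len(names) == 1
            and isinstance(names[0], tuple)
            and names[0][1] == XSD_STRING)

max_node = EX + "Max"
passing = [(max_node, EX + "name", ("Maximilian Mustermann", XSD_STRING))]
lang_tagged = [(max_node, EX + "name", ("Maximilian Mustermann", RDF_LANGSTRING))]
two_names = passing + [(max_node, EX + "name", ("Max Mustermann", XSD_STRING))]
wrong_predicate = [(max_node, FOAF + "name", ("Maximilian Mustermann", XSD_STRING))]

results = [matches_student_shape(max_node, g)
           for g in (passing, lang_tagged, two_names, wrong_predicate)]
print(results)
```

Only the first graph satisfies the check; the other three fail for the reasons listed above.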

## Grouping

To group multiple components in a pattern, one uses semicolons:

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:age xsd:integer ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:email xsd:string  
}**

The following graph passes:

**ex:Max ex:name "Max" ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:age "5"^^xsd:integer ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:email "max.mustermann@rwth-aachen.de" .**

## Cardinalities

One can specify a cardinality for each pattern component. The cardinality notation in ShEx is inspired by regular expressions, and the default cardinality is {1,1}. The following table explains all possible notations:

| Notation | Meaning |
| --- | --- |
| * | 0 or more |
| + | 1 or more |
| ? | 0 or 1 |
| {m} | Exactly m occurrences |
| {m,n} | Between m and n occurrences |
| {m,} | m or more occurrences |

To specify that a student has at least one name but may have several, may have up to two mentors, and has an optional email, one would use the following shape expression:

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string + ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:mentor IRI {0,2} ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:email xsd:string ?  
}**
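
To make the notation from the table concrete, the following Python sketch translates a cardinality into a `(min, max)` pair and checks a triple count against it. The function names `parse_cardinality` and `within` are hypothetical and only cover the notations listed in the table above.

```python
import re

def parse_cardinality(notation):
    """Return (min, max) for a cardinality notation; max is None if unbounded."""
    fixed = {"": (1, 1), "*": (0, None), "+": (1, None), "?": (0, 1)}
    if notation in fixed:
        return fixed[notation]
    m = re.fullmatch(r"\{(\d+)(,(\d*))?\}", notation)
    if m is None:
        raise ValueError(f"unknown cardinality: {notation!r}")
    low = int(m.group(1))
    if m.group(2) is None:          # {m}
        return (low, low)
    if m.group(3) == "":            # {m,}
        return (low, None)
    return (low, int(m.group(3)))   # {m,n}

def within(count, notation):
    """True if a number of matching triples satisfies the cardinality."""
    low, high = parse_cardinality(notation)
    return count >= low and (high is None or count <= high)

# Three names satisfy "+", but three mentors violate "{0,2}".
print(within(3, "+"), within(3, "{0,2}"), within(0, "?"), within(0, ""))
```

The empty string stands for the default cardinality {1,1}, which is why zero occurrences are rejected in the last call.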

## Alternatives

The pipe operator ( | ) is used to declare alternatives. You can use parentheses to group alternatives. For example, to specify that a student can have either a full name or a first and last name but has to have an email address:

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;(  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ex:fullName xsd:string |  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ex:firstName xsd:string ;  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ex:lastName xsd:string  
&nbsp;&nbsp;&nbsp;&nbsp;) ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:email xsd:string  
}**
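
As a rough illustration of how this shape behaves, the sketch below works on triple counts per property only and ignores datatypes. It assumes, following the closed-property behaviour described later in this sheet, that a node carrying both a full name and a first/last name pair does not match. The function names are hypothetical.

```python
def name_part_ok(counts):
    """counts maps a property name to the number of triples the focus node has."""
    full = counts.get("fullName", 0)
    first = counts.get("firstName", 0)
    last = counts.get("lastName", 0)
    only_full = full == 1 and first == 0 and last == 0
    only_split = full == 0 and first == 1 and last == 1
    return only_full or only_split

def student_ok(counts):
    """Either name alternative, plus exactly one email."""
    return name_part_ok(counts) and counts.get("email", 0) == 1

print(student_ok({"fullName": 1, "email": 1}))
print(student_ok({"firstName": 1, "lastName": 1, "email": 1}))
print(student_ok({"firstName": 1, "email": 1}))
```

Note that the semicolon binds more tightly than the pipe, so the group reads as "full name, or (first name and last name)".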

## Value Expressions

We have already used specific datatypes and have seen the keyword IRI being used in the cardinality example but there are many more value expressions which are listed in the following table:

| Type | Example | Description |
|:--- |:--- |:--- |
| Anything | **.** | The object can be anything |
| Datatype | xsd:string | Matches a value of type xsd:string |
| Kind | IRI BNode Literal NonLiteral | The object must have that kind |
| Value set | \[ex:Male ex:Female ex:Other\] | The value must be an element of that set |
| Reference | @&lt;StudentShape&gt; | The object must have the shape &lt;StudentShape&gt; |
| Composed | xsd:string OR IRI | The composition of value expressions using OR, AND, NOT |
| IRI Range | \[foaf:~\] | Any IRI that starts with the namespace IRI associated with foaf. See Value set. |
| Exclusion | \[ex:~ -ex:excluded\] | Any IRI that starts with the namespace IRI associated with ex, except ex:excluded. See Value set. |

An example which uses many of the above expressions:

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:mentor @&lt;StudentShape&gt; {0,2} ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:gender [ex:Male ex:Female ex:Other] ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:university IRI OR BNode ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:status [status:~ -status:Exmatriculated] ;  
&nbsp;&nbsp;&nbsp;&nbsp;rdfs:seeAlso . \*  
}**
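
The value set, IRI range and exclusion rows can be mimicked with plain string operations. This is a minimal sketch under the assumption that IRIs are represented as Python strings; the namespace bound to the `status:` prefix is not given in this sheet, so `STATUS` below is a made-up value, and `in_value_set` is a hypothetical helper.

```python
EX = "http://example.org/"
STATUS = "http://example.org/status#"  # assumed namespace for the status: prefix

def in_value_set(value, members=(), stems=(), exclusions=()):
    """True if value is listed or starts with a stem, and is not excluded."""
    if value in exclusions:
        return False
    return value in members or any(value.startswith(stem) for stem in stems)

genders = [EX + "Male", EX + "Female", EX + "Other"]
print(in_value_set(EX + "Female", members=genders))
print(in_value_set(EX + "Unknown", members=genders))

# [status:~ -status:Exmatriculated]
excluded = [STATUS + "Exmatriculated"]
print(in_value_set(STATUS + "Enrolled", stems=[STATUS], exclusions=excluded))
print(in_value_set(STATUS + "Exmatriculated", stems=[STATUS], exclusions=excluded))
```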

## Facets on Datatypes

It is possible to qualify the datatype using XML Schema facets. Some possible facets are the following:

| Facet | Description |
| :--- | :--- |
| MinInclusive, MaxInclusive, MinExclusive, MaxExclusive | Constraints on numeric values which declare the min/max value allowed (either included or excluded)|
| TotalDigits, FractionDigits | Constraints on numeric values which declare the total digits  and fraction digits allowed |
| Length, MinLength, MaxLength | Constraints on string values which declare the length allowed, or the min/max length allowed |

An example for facets:

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string MaxLength 10 ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:matNr xsd:integer MinInclusive 1 MaxInclusive 999999  
}**
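
The effect of the string and numeric facets can be sketched in Python as simple comparisons. The helpers `check_string` and `check_number` are hypothetical and operate on already parsed Python values, not on RDF literals.

```python
def check_string(value, length=None, min_length=None, max_length=None):
    """Apply Length / MinLength / MaxLength to a string value."""
    n = len(value)
    return ((length is None or n == length)
            and (min_length is None or n >= min_length)
            and (max_length is None or n <= max_length))

def check_number(value, min_inclusive=None, max_inclusive=None,
                 min_exclusive=None, max_exclusive=None):
    """Apply the four numeric range facets to a number."""
    return ((min_inclusive is None or value >= min_inclusive)
            and (max_inclusive is None or value <= max_inclusive)
            and (min_exclusive is None or value > min_exclusive)
            and (max_exclusive is None or value < max_exclusive))

# ex:name xsd:string MaxLength 10
print(check_string("Max", max_length=10), check_string("Max Mustermann", max_length=10))
# ex:matNr xsd:integer MinInclusive 1 MaxInclusive 999999
print(check_number(123456, min_inclusive=1, max_inclusive=999999),
      check_number(0, min_inclusive=1, max_inclusive=999999))
```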

## Task 1: Simple Schema [3 points]

Use ShEx to specify the following constraints for a car:  
- A car has a manufacturer which is expressed as an IRI.  
- A car has a vehicle identification number (VIN) which is expressed as a string with 17 characters.  
- A car can have arbitrarily many previous owners which are expressed either as an IRI, a string or a blank node.  

The shape expression should be labelled **http://example.org/shapes#CarShape**

Use the following URIs:  
manufacturer: **http://example.org/predicates#manufacturer**  
vehicle identification number: **http://example.org/predicates#vin**  
previous owner: **http://example.org/predicates#previousOwner**  
string: **http://www.w3.org/2001/XMLSchema#string**

In [None]:
%%rdf shex parse -l 1
### YOUR SOLUTION HERE

The two cells below can be used to check if your solution is correct. Run both of them and check the output of the second cell for errors. The graph below should pass the validation in the second cell. Note, however, that while this is necessary for your solution to be correct, it is not sufficient!

In [None]:
%%rdf turtle -l 1check -d none
<http://example.org/#CoolCar> <http://example.org/predicates#manufacturer> <http://example.org/#BMW> ;
    <http://example.org/predicates#vin> "WBA12345600000001" ;
    <http://example.org/predicates#previousOwner> "Max Mustermann", _:1, <http://example.org/#MariaMusterfrau> .

In [None]:
%rdf shex validate -l 1 -g 1check -s http://example.org/shapes#CarShape

In [None]:
# This test will check whether the manufacturer predicate is constrained correctly. [1 point]

In [None]:
# This test will check whether the vehicle identification number predicate is constrained correctly. [1 point]

In [None]:
# This test will check whether the previous owner predicate is constrained correctly. [1 point]

## Task 2: Referencing [4 points]

Use ShEx to model the following schema:  
- A shop has a name which is expressed as a string.  
- A shop must belong to the class commerce. (Note: Value sets with a single element can be used for 'fixed' triples)  
- A product has a name which is expressed as a string.  
- A product either has an ID which is expressed as an integer (use http://www.w3.org/2001/XMLSchema#integer) from 0 inclusive to 99999999 inclusive or alternatively a product code which is expressed as a string with a length between 11 and 13 characters.  
- A shop can sell arbitrarily many products but has to sell at least one.  

The shape expressions should be labelled **http://example.org/shapes#ShopShape** and **http://example.org/shapes#ProductShape**

Use the following URIs:  
has name: **http://example.org/predicates#name**  
has id: **http://example.org/predicates#id**  
has product code: **http://example.org/predicates#code**  
sells product: **http://example.org/predicates#sells**  
Commerce (class): **http://example.org/classes#Commerce**  

In [None]:
%%rdf shex parse -l 2
### YOUR SOLUTION HERE

You can again check your answer using the two cells below.

In [None]:
%%rdf turtle -l 2check -d none
<http://example.org/shops#ebuy> <http://example.org/predicates#name> "eBuy" ;
    a <http://example.org/classes#Commerce> ;
    <http://example.org/predicates#sells> <http://example.org/products#GoodProduct> , <http://example.org/products#BetterProduct> .
<http://example.org/products#GoodProduct> <http://example.org/predicates#name> "Good Product" ;
    <http://example.org/predicates#id> "99999999"^^xsd:integer .
<http://example.org/products#BetterProduct> <http://example.org/predicates#name> "Better Product" ;
    <http://example.org/predicates#code> "ABCDEF123456" .

In [None]:
%rdf shex validate -l 2 -g 2check -s http://example.org/shapes#ShopShape -f http://example.org/shops#ebuy

In [None]:
# This test will check if all basic constraints are correct. [1 point]

In [None]:
# This test will check whether the shop class is constrained correctly. [1 point]

In [None]:
# This test will check whether you used references correctly. [1 point]

In [None]:
# This test will check whether the product identification predicates are constrained correctly. [1 point]

## Inverse Triple constraints

Up to now we have only constrained outgoing triples, in which the focus node is the subject of the triple. It can also make sense to constrain incoming triples, in which the focus node is the object. For this one uses **inverse triple constraints**, which are prefixed by a circumflex (^). For example, if we want to declare that a lecture can only be attended by students:

**&lt;LectureShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name xsd:string ;  
&nbsp;&nbsp;&nbsp;&nbsp;^ex:attends @&lt;StudentShape&gt;  
}**

**&lt;StudentShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:matNr xsd:integer MinInclusive 1 MaxInclusive 999999;  
}**

When checking LectureShape against ex:SemWeb, the following graph will pass:

**ex:SemWeb ex:name "Semantic Web WS20" .  
ex:Leia ex:attends ex:SemWeb ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:matNr 123456 .**

but this one will fail because ex:Luke doesn't match StudentShape:

**ex:SemWeb ex:name "Semantic Web WS20" .  
ex:Luke ex:attends ex:SemWeb .**  
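
The difference between the two graphs can be traced with a small Python sketch that collects incoming triples for the focus node. It is a simplified model: triples are tuples, IRIs are strings, the matriculation number is a Python `int`, and the datatype of ex:name is not checked. Note that with the default cardinality {1,1}, the shape as written expects exactly one incoming ex:attends triple. The helper names are hypothetical.

```python
EX = "http://example.org/"

def has_student_shape(node, triples):
    """Exactly one ex:matNr between 1 and 999999."""
    mat = [o for (s, p, o) in triples if s == node and p == EX + "matNr"]
    return len(mat) == 1 and isinstance(mat[0], int) and 1 <= mat[0] <= 999999

def has_lecture_shape(node, triples):
    """One outgoing ex:name and one incoming ex:attends from a student."""
    names = [o for (s, p, o) in triples if s == node and p == EX + "name"]
    # Inverse constraint: the focus node is the OBJECT of the triple.
    attendees = [s for (s, p, o) in triples if p == EX + "attends" and o == node]
    return (len(names) == 1
            and len(attendees) == 1
            and has_student_shape(attendees[0], triples))

semweb = EX + "SemWeb"
leia_graph = [
    (semweb, EX + "name", "Semantic Web WS20"),
    (EX + "Leia", EX + "attends", semweb),
    (EX + "Leia", EX + "matNr", 123456),
]
luke_graph = [
    (semweb, EX + "name", "Semantic Web WS20"),
    (EX + "Luke", EX + "attends", semweb),
]
print(has_lecture_shape(semweb, leia_graph), has_lecture_shape(semweb, luke_graph))
```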

## Repeated Properties

Generic properties like ex:parent (instead of ex:father and ex:mother) sometimes need to appear multiple times in one shape. Each occurrence can then be given a different constraint. For example, to require that a person has two parents, one male and one female:

**&lt;PersonShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;ex:parent @&lt;MaleShape&gt; ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:parent @&lt;FemaleShape&gt;  
}**
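
One way to think about repeated properties is that every ex:parent triple has to be assigned to exactly one of the two constraints. The sketch below illustrates that idea by trying both assignments. It is a simplification for this one shape; the predicates `is_male` and `is_female` stand in for matching MaleShape and FemaleShape, and the node names are invented.

```python
from itertools import permutations

def matches_person(parents, is_male, is_female):
    """parents: objects of the focus node's ex:parent triples."""
    if len(parents) != 2:
        return False
    # Try assigning each parent to one constraint, in both orders.
    return any(is_male(a) and is_female(b) for a, b in permutations(parents, 2))

males = {"ex:Anakin", "ex:Luke"}
females = {"ex:Padme"}
is_male = lambda node: node in males
is_female = lambda node: node in females

print(matches_person(["ex:Padme", "ex:Anakin"], is_male, is_female))
print(matches_person(["ex:Anakin", "ex:Luke"], is_male, is_female))
print(matches_person(["ex:Padme"], is_male, is_female))
```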

## Closed/Open Properties

Once we constrain a property in a shape, it becomes **closed**. All triples of a closed property have to match one of the constraints. Sometimes we want to constrain a property but still allow other occurrences. To do this, we have to open the property with the **EXTRA** keyword. For example, the following shape uses rdf:type as a closed property:

**&lt;InstitutionShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;a \[ ex:Institution \]   
}**

Then the following graph will fail the test against InstitutionShape because it has a second rdf:type property:

**ex:RWTH a ex:Institution, ex:University .**

But if we open the property rdf:type using **EXTRA \<List of properties\>**, the above graph will pass instead:

**&lt;InstitutionShape&gt; EXTRA a {  
&nbsp;&nbsp;&nbsp;&nbsp;a \[ ex:Institution \]  
}**

To open multiple properties, one lists all of them after the EXTRA keyword:

**&lt;InstitutionShape&gt; EXTRA ex:chair a {  
&nbsp;&nbsp;&nbsp;&nbsp;a \[ ex:Institution \]  
}**
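
The effect of EXTRA on the rdf:type constraint can be summarised in a few lines of Python. This sketch only models the single constraint from the example above; `matches_institution` is a hypothetical helper that receives the set of rdf:type values of the focus node.

```python
def matches_institution(types, extra_type=False):
    """types: set of rdf:type values of the focus node."""
    matched = [t for t in types if t == "ex:Institution"]
    unmatched = [t for t in types if t != "ex:Institution"]
    # Without EXTRA, every rdf:type triple must match the constraint.
    return len(matched) == 1 and (extra_type or not unmatched)

rwth_types = {"ex:Institution", "ex:University"}
print(matches_institution(rwth_types))
print(matches_institution(rwth_types, extra_type=True))
```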

## Closed/Open Shapes

Similar to properties, shapes can also be open/closed. By default, shapes are open, which means that additional properties not constrained in the shape are ignored by the validation. For example, the following shape is open:

**&lt;InstitutionShape&gt; {  
&nbsp;&nbsp;&nbsp;&nbsp;a \[ ex:Institution \]  
}**

This means that the following graph will pass the test even though it provides additional properties for ex:RWTH:

**ex:RWTH a ex:Institution ;  
&nbsp;&nbsp;&nbsp;&nbsp;ex:name "Rheinisch-Westfälische Technische Hochschule Aachen" .**

But if we close the shape using the **CLOSED** keyword, the above graph will instead fail the test:

**&lt;InstitutionShape&gt; CLOSED {  
&nbsp;&nbsp;&nbsp;&nbsp;a \[ ex:Institution \]  
}**
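
The difference between open and closed shapes comes down to how properties that the shape does not mention are treated. The sketch below is a deliberately small model: it looks only at which predicates the focus node uses, treats every constrained predicate as required, and ignores values and cardinalities. `matches_shape` is a hypothetical helper.

```python
def matches_shape(used, constrained, closed=False):
    """used: predicates on the focus node; constrained: predicates in the shape."""
    missing = constrained - used
    unmentioned = used - constrained
    # An open shape ignores unmentioned predicates; a CLOSED shape rejects them.
    return not missing and (not closed or not unmentioned)

rwth = {"rdf:type", "ex:name"}
shape = {"rdf:type"}
print(matches_shape(rwth, shape))
print(matches_shape(rwth, shape, closed=True))
```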

## Task 3: Complex Schemata [4 points]

Use ShEx to model the following schema:  
- A team must have at least one player as a member.
- A team must have exactly one coach as a member.
- A team must belong to the class of groups but can additionally belong to other classes.
- A team can have no other properties.

The shape expression should be labelled **http://example.org/shapes#TeamShape**  
The shape expressions **http://example.org/shapes#PlayerShape** and **http://example.org/shapes#CoachShape** are already given. You can use them but **do not change them!**

Use the following URIs:  
member of: **http://example.org/predicates#memberOf**  
Group: **http://example.org/classes#Group**

In [None]:
%%rdf shex parse -l 3
<http://example.org/shapes#PlayerShape> {
    a [ <http://example.org/classes#Player> ] ;
    <http://example.org/predicates#memberOf> @<http://example.org/shapes#TeamShape> ?
}

<http://example.org/shapes#CoachShape> {
    a [ <http://example.org/classes#Coach> ] ;
    <http://example.org/predicates#memberOf> @<http://example.org/shapes#TeamShape> ?
}

### YOUR SOLUTION HERE

You can again check your answer using the two cells below.

In [None]:
%%rdf turtle -l 3check
<http://example.org/#Bob> a <http://example.org/classes#Player> ;
    <http://example.org/predicates#memberOf> <http://example.org/#Team> .
<http://example.org/#Alice> a <http://example.org/classes#Player> ;
    <http://example.org/predicates#memberOf> <http://example.org/#Team> .
<http://example.org/#Laura> a <http://example.org/classes#Coach> ;
    <http://example.org/predicates#memberOf> <http://example.org/#Team> .
<http://example.org/#Team> a <http://example.org/classes#Group>, <http://example.org/classes#Trio> .

In [None]:
%rdf shex validate -g 3check -l 3 -s http://example.org/shapes#TeamShape -f http://example.org/#Team

In [None]:
# This test will check if you modelled the first constraint correctly. [1 point]

In [None]:
# This test will check if you modelled the second constraint correctly. [1 point]

In [None]:
# This test will check if you modelled the third constraint correctly. [1 point]

In [None]:
# This test will check if you modelled the fourth constraint correctly. [1 point]

## Task 4: Understanding Schemata [4 points]

Consider the following ShEx schema:

In [None]:
%%rdf shex parse -l 4
PREFIX p: <http://example.org/predicates#>
PREFIX c: <http://example.org/classes#>
PREFIX title: <http://example.org/titles#>
PREFIX shape: <http://example.org/shapes#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

shape:ExamShape CLOSED {
    p:id xsd:string Length 10 AND Literal ;
    p:lecture @shape:LectureShape ;
    p:credit xsd:integer MinInclusive 0 MaxInclusive 12 ;
    ^p:examines @shape:AssistentShape {2,} ;
    ^p:examines @shape:ProfessorShape ;
    ^p:attends @shape:StudentShape +
}

shape:LectureShape CLOSED {
    p:name xsd:string ;
    ^p:attends @shape:StudentShape + ;
}

shape:ProfessorShape EXTRA p:title {
    p:title [ title:Professor ]
}

shape:AssistentShape {
    p:assistentOf @shape:ProfessorShape +
}

shape:StudentShape {
    p:matNr xsd:integer MinInclusive 1 MaxInclusive 999999 ;
}

Model a graph in Turtle which passes when tested against the above schema.

The validation will start at **http://example.org/shapes#ExamShape** and will focus on the node **http://example.org/#SemWebExam**

In [None]:
%%rdf turtle -l 4 -d graph
### YOUR SOLUTION HERE

You can check whether your graph passes by running the cell below. If it doesn't pass, the validator will give you a hint. If it passes, you will get all points!

In [None]:
%rdf shex validate -l 4 -g 4 -s http://example.org/shapes#ExamShape -f http://example.org/#SemWebExam

In [None]:
# This test will check whether your graph passes the validation. [4 points]