# 1. Intro to RDF

## 1.1. Basic concepts of RDF

### 1.1.1. A graph

RDF is a _fundamental_ data model that associates pieces of information in a mathematical _graph_, that is, a node-edge-node data structure.

A node with the name "Nick", the age `42` and a father named "George" can be drawn like this:

```mermaid
flowchart LR
    n["*"]
    g["*"]
    
    n --name-->Nick
    n --age-->42
    n --father--> g
    g --name-->George

```

The data for this is conceptually:

```
<n> <name> "Nick"
<n> <age> 42
<n> <father> <g>
<g> <name> "George"
```
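
The same four statements can be held in ordinary Python as a list of 3-tuples. This is only a minimal sketch: the tuple layout and the `outgoing` helper are assumptions made for illustration and are not part of any RDF library.

```python
# Each statement is a (node, edge, node-or-value) tuple.
# The angle-bracket strings stand in for the placeholder identifiers used above.
triples = [
    ("<n>", "<name>", "Nick"),
    ("<n>", "<age>", 42),
    ("<n>", "<father>", "<g>"),
    ("<g>", "<name>", "George"),
]


def outgoing(node, data):
    """Return every (edge, target) pair that leaves the given node (hypothetical helper)."""
    return [(edge, target) for source, edge, target in data if source == node]


# Follow the <father> edge from <n>, then read the father's <name>.
father = dict(outgoing("<n>", triples))["<father>"]
father_name = dict(outgoing(father, triples))["<name>"]
print(father_name)  # George
```

Walking from one node to another by following edges is all that "graph" means here; nothing beyond the list of statements is needed.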

### 1.1.2. Subject, Predicate, Object relations

Each relationship in the diagram above relates a _subject_ node to an _object_ node via a _predicate_:

subject | predicate | object
--- | --- | ---
&lt;n> | &lt;name> | "Nick"
&lt;n> | &lt;age> | 42
&lt;n> | &lt;father> | &lt;g>
&lt;g> | &lt;name> | "George"

* _subjects_: are always identified nodes or anonymous nodes  
* _predicates_: are always edges of an identified type  
* _objects_: can be identified nodes, anonymous nodes or literals, which are simple values (numbers, text...)
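
The three rules above can be expressed as a small check. The sketch below is hypothetical: the `IRI`, `BlankNode` and `is_valid_triple` names are invented for illustration, and plain Python strings and numbers stand in for literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """An identified node or edge type (hypothetical helper class)."""
    value: str


@dataclass(frozen=True)
class BlankNode:
    """An anonymous node, meaningful only inside one dataset (hypothetical)."""
    label: str


def is_valid_triple(subject, predicate, obj):
    """Check a triple against the subject/predicate/object rules."""
    subject_ok = isinstance(subject, (IRI, BlankNode))
    predicate_ok = isinstance(predicate, IRI)
    # Literals are modelled here as plain Python strings and numbers.
    object_ok = isinstance(obj, (IRI, BlankNode, str, int, float))
    return subject_ok and predicate_ok and object_ok


n = BlankNode("n")
name = IRI("name")

print(is_valid_triple(n, name, "Nick"))            # True: a literal may be an object
print(is_valid_triple("Nick", name, n))            # False: a literal cannot be a subject
print(is_valid_triple(n, BlankNode("x"), "Nick"))  # False: a predicate must be identified
```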

### 1.1.3. Node & Edge types

We don't usually use predicates like `<name>`, which can be read as the word "name" but is not precisely defined. Usually we use predicates from defined models (ontologies), for example [schema.org](https://schema.org)'s `<https://schema.org/name>`, which is defined as "The name of the item.":

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
&lt;n> | `<https://schema.org/name>` | "Nick"
identified node | identified type | simple value

In the example above, we have defined the predicate but identified the node with that name only as `<n>`, which only makes sense in this context - this notebook. If I send someone that data, `<n>` would be ambiguous - is it the same as another `<n>`? So we create universally unique IDs for most nodes. How about `<http://example.com/n>`? Still an example thing but globally unique. Better would be a 'real' IRI-based ID in our control. How about `<https://linked.data.gov.au/dataset/people/nick>`? If the Australian Government Linked Data Working Group wanted, they could issue that URL and make it resolve to data online somewhere.

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`<https://linked.data.gov.au/dataset/people/nick>` | `<https://schema.org/name>` | "Nick"
identified node | identified type | simple value

### 1.1.4. Prefixes

It's too long to write `<https://linked.data.gov.au/dataset/people/nick>` all the time and even `<https://schema.org/name>` is annoying, so if we say:

`<https://linked.data.gov.au/dataset/people/>` &rarr; `people:`  
`<https://schema.org/>` &rarr; `schema:`

then:

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `schema:name` | "Nick"
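
Prefix expansion is plain string substitution, which a few lines of Python can illustrate. This is a sketch under assumptions: the `PREFIXES` mapping and the `expand` function are hypothetical names, and the `schema:` prefix is taken to stand for the namespace `https://schema.org/`.

```python
# Prefix -> namespace IRI. The mapping name and layout are illustrative only.
PREFIXES = {
    "people": "https://linked.data.gov.au/dataset/people/",
    "schema": "https://schema.org/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'schema:name' into a full IRI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_name


print(expand("people:nick"))  # https://linked.data.gov.au/dataset/people/nick
print(expand("schema:name"))  # https://schema.org/name
```

The prefixed and full forms identify exactly the same node or edge; the prefix is only a writing convenience.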


### 1.1.5. The special `type` predicate

What _class_ of thing is `people:nick` which has the `schema:name` "Nick"? We can use the basic property `type` from the fundamental RDF model to indicate `people:nick` is a `schema:Person`:

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `rdf:type` | `schema:Person`
`people:nick` | `schema:name` | "Nick"

If `people:nick` was also a patient in a hospital we could say:

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `rdf:type` | `schema:Person`
`people:nick` | `rdf:type` | `schema:Patient`
`people:nick` | `schema:name` | "Nick"

schema.org tells us that a `schema:Patient` is a special kind of `schema:Person` - a subset - so if we state in data

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `rdf:type` | `schema:Patient`
`people:nick` | `schema:name` | "Nick"

we can _infer_

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `rdf:type` | `schema:Person`
`people:nick` | `schema:name` | "Nick"

`rdf:type` isn't really special at all, it's just another defined edge.
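
The inference step above can be sketched in a few lines of Python. The sketch assumes the subset relationship is itself available as a triple using `rdfs:subClassOf`, the standard RDFS predicate for "is a subset of"; the `infer_types` function is a hypothetical helper, not a real reasoner.

```python
triples = {
    ("people:nick", "rdf:type", "schema:Patient"),
    ("people:nick", "schema:name", "Nick"),
    # Assumed to come from the schema.org model rather than from our own data.
    ("schema:Patient", "rdfs:subClassOf", "schema:Person"),
}


def infer_types(data):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing new appears."""
    inferred = set(data)
    while True:
        subclass_pairs = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        new = {
            (instance, "rdf:type", parent)
            for instance, p, cls in inferred
            if p == "rdf:type"
            for child, parent in subclass_pairs
            if child == cls
        }
        if new <= inferred:  # nothing left to add
            return inferred
        inferred |= new


result = infer_types(triples)
print(("people:nick", "rdf:type", "schema:Person") in result)  # True
```

The loop repeats so that chains of subclasses (a subclass of a subclass) are also followed.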

## 1.2. RDF in the Turtle syntax

The [Turtle](https://www.w3.org/TR/rdf12-turtle/) syntax is similar to tables of node-edge-node data compressed down, so this data

&nbsp; | &nbsp; | &nbsp;
--- | --- | ---
`people:nick` | `rdf:type` | `schema:Person`
`people:nick` | `rdf:type` | `schema:Patient`
`people:nick` | `schema:name` | "Nick"
`people:nick` | `schema:age` | 42
`people:nick` | `schema:parent` | `people:george`
`people:george` | `schema:name` | "George"

becomes

```turtle
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>

people:nick
    rdf:type
        schema:Person , 
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    schema:name "George" ;
.
```

You can see the same subject, predicate, object information as the table above but with duplicate subjects & predicates removed and `,`, `;` & `.` separators added to keep track.

We might also add that George is a Person, but not a Patient, and that he is 70. We can also use `a` as shorthand for `rdf:type`:


```turtle
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a
        schema:Person , 
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ; 
    schema:name "George" ;
    schema:age 70 ;    
.
```
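
The compression from table rows to Turtle can be sketched as a grouping step: collect rows by subject, then by predicate, and join them with `,`, `;` and `.`. This is an illustrative sketch only, and `to_turtle` is a hypothetical helper. It assumes every term is already written in Turtle form (prefixed names, quoted strings) and it omits the `PREFIX` lines; real applications would use an RDF library's serializer instead.

```python
from collections import defaultdict

rows = [
    ("people:nick", "a", "schema:Person"),
    ("people:nick", "a", "schema:Patient"),
    ("people:nick", "schema:name", '"Nick"'),
    ("people:nick", "schema:age", "42"),
    ("people:nick", "schema:parent", "people:george"),
    ("people:george", "a", "schema:Person"),
    ("people:george", "schema:name", '"George"'),
    ("people:george", "schema:age", "70"),
]


def to_turtle(triples):
    """Group rows by subject and predicate, then join them Turtle-style."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)

    blocks = []
    for s, predicates in grouped.items():
        lines = [s]
        for p, objects in predicates.items():
            # ',' separates objects that share a subject and predicate;
            # ';' closes each predicate's object list.
            lines.append(f"    {p} {' , '.join(objects)} ;")
        lines.append(".")  # '.' ends everything said about this subject
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(to_turtle(rows))
```

Each subject appears once and each predicate once per subject, which is the de-duplication described above.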

There are many other RDF data formats that all store the same information as Turtle but with different foci, such as speed of data loading. Turtle is the most human-readable.


SPARQL queries look a lot like Turtle data with some values replaced by variables...


## 1.3. Querying RDF with graph pattern matches

In the Turtle data above, there are two people, `people:nick` and `people:george`. To find all the people with age greater than 50 (just George), we can query the data like this:

```sparql
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

SELECT ?p
WHERE {
    ?p 
        a schema:Person ;
        schema:age ?age ;
    .

    FILTER (?age > 50)
}
```

Let's really run this:

    


```python
from IPython.display import display, Markdown
from kurra.sparql import query
from kurra.utils import render_sparql_result


def table_print(r):
    # Show the rendered SPARQL result as Markdown in the notebook
    display(Markdown(render_sparql_result(r)))

# The Turtle data from above, held as a string
rdf_data = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

people:nick
    a
        schema:Person ,
        schema:Patient ;
    schema:name "Nick" ;
    schema:age 42 ;
    schema:parent people:george ;
.

people:george
    a schema:Person ;
    schema:name "George" ;
    schema:age 70 ;
.
"""

# The query from above, extended to also return each person's name
q = """
PREFIX people: <https://linked.data.gov.au/dataset/people/>
PREFIX schema: <https://schema.org/>

SELECT ?p ?name
WHERE {
    ?p
        a schema:Person ;
        schema:name ?name ;
        schema:age ?age ;
    .

    FILTER (?age > 50)
}
"""
```

```python
r = query(rdf_data, q)
table_print(r)
```

p | name
--- | ---
[george](https://linked.data.gov.au/dataset/people/george) | George
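
To show what "graph pattern matching" means, the sketch below re-implements the query's logic in plain Python. Each pattern is a triple in which names starting with `?` are variables, and a solution is a set of variable bindings that makes every pattern match some triple in the data. This is a toy illustration, not how `kurra` or any SPARQL engine is implemented; the `match` function and the tuple encoding are assumptions.

```python
data = [
    ("people:nick", "rdf:type", "schema:Person"),
    ("people:nick", "rdf:type", "schema:Patient"),
    ("people:nick", "schema:name", "Nick"),
    ("people:nick", "schema:age", 42),
    ("people:nick", "schema:parent", "people:george"),
    ("people:george", "rdf:type", "schema:Person"),
    ("people:george", "schema:name", "George"),
    ("people:george", "schema:age", 70),
]

# The WHERE clause of the query, written as triples with variables.
patterns = [
    ("?p", "rdf:type", "schema:Person"),
    ("?p", "schema:name", "?name"),
    ("?p", "schema:age", "?age"),
]


def is_variable(term):
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for pattern_term, value in zip(first, triple):
            if is_variable(pattern_term):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                break
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match(rest, triples, candidate)


# The FILTER step: keep only solutions where ?age is greater than 50.
results = [(b["?p"], b["?name"]) for b in match(patterns, data) if b["?age"] > 50]
print(results)  # [('people:george', 'George')]
```

Both people satisfy the three patterns; only the filter on `?age` removes Nick, which mirrors the single-row result above.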
