# What is the Semantic Web and RDF?

**RDF (Resource Description Framework)** is one of the three foundational [Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) technologies, the other two being SPARQL and OWL.

In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies are represented as RDF. If you store Semantic Web data, it's in RDF. If you query Semantic Web data (typically using the SPARQL query language), it's RDF data. If you send Semantic Web data to your friend, it's RDF.

The RDF data model is based upon the idea of making statements about resources (in particular web resources) in the form of *subject–predicate–object* expressions, known as [*triples*](https://en.wikipedia.org/wiki/Semantic_triple). The *subject* denotes the resource, and the *predicate* denotes traits or aspects of the resource and expresses a relationship between the *subject* and the *object*.

For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a **subject** denoting *"the sky"*, a **predicate** denoting *"has the color"*, and an **object** denoting *"blue"*. Therefore, RDF uses subject instead of object (or entity) in contrast to the typical approach of an entity–attribute–value model in object-oriented design: entity (sky), attribute (color), and value (blue).<br>
(Resource Description Framework, Wikipedia, 2017)

![RDF_example_graph.png](RDF_example_graph.png)
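
To make the triple structure concrete, here is a minimal sketch in plain Python that uses no RDF library. The `Triple` tuple, the `match` helper and the sample statements are illustrative assumptions, not part of rdflib or odML.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# A tiny "graph" is simply a set of statements (made-up sample data).
graph = {
    Triple("sky", "hasColor", "blue"),
    Triple("sea", "hasColor", "blue"),
    Triple("grass", "hasColor", "green"),
}


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return, sorted, all triples that agree with every given position."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# Which subjects have the color blue?
blue_things = [t.subject for t in match(graph, predicate="hasColor", obj="blue")]
print(blue_things)
```

Leaving a position as `None` turns it into a wildcard, which is the basic idea behind the triple patterns used in SPARQL queries.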

Find out more:
- https://en.wikipedia.org/wiki/Resource_Description_Framework
- https://www.cambridgesemantics.com/blog/semantic-university/learn-rdf/


# odML to RDF converter

Here we will explore odML to RDF conversion using the `odml/tools/rdf_converter.py` module.

If you are new to python-odml, please read the [tutorial](https://python-odml.readthedocs.io/en/latest/tutorial.html) first to familiarize yourself with odML.

Let's create the example odML document.

In [18]:
import datetime

import odml

doc = odml.Document(author="D. N. Adams", date=datetime.date(1979, 10, 12))

# CREATE AND APPEND THE MAIN SECTIONs
doc.append(odml.Section(name="Arthur Philip Dent",
                        type="crew/person",
                        definition="Information on Arthur Dent"))

# SET NEW PARENT NODE
parent = doc['Arthur Philip Dent']


# APPEND PROPERTIES WITH VALUES
parent.append(odml.Property(name="Species",
                            value="Human",
                            dtype=odml.DType.string,
                            definition="Species to which subject belongs to"))


## The RDFWriter class

The RDFWriter class is used to convert odML documents to one of the supported RDF formats:<br><br>
'xml', 'pretty-xml', 'trix', 'n3', 'turtle', 'ttl', 'ntriples', 'nt', 'nt11', 'trig'.<br>

'turtle' is the format best suited for storage and human readability, which is why we use it in this tutorial. For cross-tool usage, saving RDF in its 'xml' variant is probably the safest choice.

The output can be returned as a string.

In [5]:
from odml.tools.rdf_converter import RDFWriter

print(RDFWriter(doc).get_rdf_str('turtle'))


@prefix odml: <https://g-node.org/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument odml:40797785-2e1a-435e-b905-aeeac2ba2b3e .

odml:220489b8-2043-452b-863b-8ba6a4b5e536 a odml:Section ;
    odml:hasDefinition "Information on Arthur Dent" ;
    odml:hasName "Arthur Philip Dent" ;
    odml:hasProperty odml:40ede84a-650b-4aab-af81-b4136c833e58 ;
    odml:hasType "crew/person" .

odml:40797785-2e1a-435e-b905-aeeac2ba2b3e a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasFileName "None" ;
    odml:hasSection odml:220489b8-2043-452b-863b-8ba6a4b5e536 .

odml:40ede84a-650b-4aab-af81-b4136c833e58 a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue odml:4425ade2-5d03-4484-a272-764c1e933933 .

odml:4425ade2-5d03-4484-a272-764c1e933933

Or the output can be written to a specified file.

In [3]:
import tempfile

# Create temporary file
f = tempfile.NamedTemporaryFile(mode='w', suffix=".ttl")
path = f.name

RDFWriter(doc).write_file(path, "turtle")

with open(path) as ff:
    data = ff.read()
    print(data)


@prefix odml: <https://g-node.org/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument odml:08f8c7fa-4ea0-4512-8927-ff73c117644d .

odml:08f8c7fa-4ea0-4512-8927-ff73c117644d a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasFileName "None" ;
    odml:hasSection odml:3c86174b-b183-47aa-9e0b-58dfc066a76d .

odml:15eb4c32-73fe-4da1-8cba-3fac965d4d17 a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue odml:1ad9c2d6-6055-465b-b281-51943569338b .

odml:1ad9c2d6-6055-465b-b281-51943569338b a rdf:Seq ;
    rdf:_1 "Human" .

odml:3c86174b-b183-47aa-9e0b-58dfc066a76d a odml:Section ;
    odml:hasDefinition "Information on Arthur Dent" ;
    odml:hasName "Arthur Philip Dent" ;
    odml:hasProperty odml:15eb4c32-73fe-4da1-8cba-3fac965d4d17 ;

Please note that RDF does not define an order for its statements. Every time an unchanged document is written, the set of statements stays the same, but the order in which they appear in the file can differ.
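
Because statement order carries no meaning, comparing two exports character by character can report a difference even when the data are the same. The sketch below compares two N-Triples texts as sets of lines instead. The sample statements and the helper name `ntriples_statements` are made up for illustration, and the approach assumes there are no blank nodes, whose labels may change between exports.

```python
def ntriples_statements(text: str) -> set:
    """Collect the non-empty, non-comment lines of an N-Triples document."""
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


# Two hypothetical exports of the same data in a different order.
first = """\
<http://example.org/sky> <http://example.org/hasColor> "blue" .
<http://example.org/grass> <http://example.org/hasColor> "green" .
"""

second = """\
<http://example.org/grass> <http://example.org/hasColor> "green" .
<http://example.org/sky> <http://example.org/hasColor> "blue" .
"""

same_text = first == second
same_statements = ntriples_statements(first) == ntriples_statements(second)
print(same_text, same_statements)
```

This works for N-Triples because that format puts exactly one statement on each line. For graphs that contain blank nodes, rdflib's `rdflib.compare.isomorphic` is likely the more robust choice.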

## Querying the data with rdflib and SPARQL

The following example depends on specific example files. If you do not already have these files, you can download them from https://github.com/G-Node/python-odml/tree/master/doc/example_rdfs/example_data.

The example loads RDF triples from multiple files and merges them into a single, connected graph.

In [10]:
from glob import glob

from rdflib import Graph

graph = Graph()
for file_name in glob("odml_RDF_example_*.ttl"):
    graph.parse(file_name, format="turtle")

print('Total number of triples: ', len(graph))


Total number of triples:  3041


The example query uses rdflib's `prepareQuery` to find each Section of type `Recording` that also has a Property named `Recording duration`. The loop then prints the Values of the returned Properties.

In [19]:
from rdflib import Graph, Namespace, RDF
from rdflib.plugins.sparql import prepareQuery

from odml.tools.rdf_converter import ODML_NS

rdf_namespace = {"odml": ODML_NS, "rdf": RDF}

q = prepareQuery("""SELECT ?d ?s ?p ?value WHERE {
    ?d odml:hasSection ?s .
    ?s rdf:type odml:Section .
    ?s odml:hasType "Recording" .
    ?s odml:hasProperty ?p .
    ?p rdf:type odml:Property .
    ?p odml:hasName "Recording duration" .
    ?p odml:hasValue ?v .
    ?v rdf:type rdf:Bag .
    ?v rdf:li ?value .}""", initNs=rdf_namespace)

for row in graph.query(q):
    print("Doc: {0}, Sec: {1}, \n"
          "Prop: {2}, Val:{3}".format(row.d, row.s, row.p, row.value))


Doc: https://g-node.org/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652, Sec: https://g-node.org/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956, 
Prop: https://g-node.org/odml-rdf#41316903-80f1-45a3-9b06-400a02903531, Val:11.25
Doc: https://g-node.org/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7, Sec: https://g-node.org/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd, 
Prop: https://g-node.org/odml-rdf#9aeede78-678c-4db8-acb5-fbd6d408b762, Val:13.9
Doc: https://g-node.org/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a, Sec: https://g-node.org/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee, 
Prop: https://g-node.org/odml-rdf#1636af03-8e97-4ef2-9d7d-6c7db23dcd02, Val:11.88
Doc: https://g-node.org/odml-rdf#24066355-1ee8-4eb5-a715-96bbb6231cd5, Sec: https://g-node.org/odml-rdf#bbd44815-5016-49e0-9f4b-5b83778d00de, 
Prop: https://g-node.org/odml-rdf#0ed215a2-5d20-48eb-b744-bf3b731459fc, Val:0.33
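
A SPARQL `WHERE` block like the one above is a list of triple patterns that must all hold for the same variable bindings. The following sketch shows that idea in plain Python on a handful of made-up triples. The `solve` function and the shortened identifiers are assumptions for illustration only and do not reflect how rdflib evaluates queries internally.

```python
# Made-up triples that mimic the shape of the odML RDF export.
triples = {
    ("doc1", "hasSection", "sec1"),
    ("sec1", "hasType", "Recording"),
    ("sec1", "hasProperty", "prop1"),
    ("prop1", "hasName", "Recording duration"),
    ("doc1", "hasSection", "sec2"),
    ("sec2", "hasType", "Stimulus"),
    ("sec2", "hasProperty", "prop2"),
    ("prop2", "hasName", "Recording duration"),
}


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def solve(patterns, bindings=None):
    """Yield variable bindings that satisfy every pattern at once."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched; continue with remaining patterns.
            yield from solve(rest, candidate)


query = [
    ("?d", "hasSection", "?s"),
    ("?s", "hasType", "Recording"),
    ("?s", "hasProperty", "?p"),
    ("?p", "hasName", "Recording duration"),
]
results = list(solve(query))
print(results)
```

Only `sec1` survives, because `sec2` has the right Property name but the wrong Section type.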


## FuzzyFinder class

**FuzzyFinder** is a tool for querying an RDF graph through so-called *fuzzy* queries. The finder executes multiple queries to better match the input parameters. It returns sets of triples, prioritized from more to fewer matched parameters.

The function `find()` accepts several optional parameters:
- `graph`: rdflib graph object
- `q_str`: query string; its syntax is explained below
- `q_params`: dict object with the parameters of a query
- `mode`: either 'fuzzy' (the default) or 'match'

Each mode works with a specific type of query string (`q_str`).

Let's check the `match` mode in an example.

In [13]:
from odml.rdf.fuzzy_finder import FuzzyFinder

query_string = 'prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)'

f = FuzzyFinder(graph)
print(f.find(mode='match', q_str=query_string))

SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasType "Recording" .
?s odml:hasProperty ?p .
?p rdf:type odml:Property .
?p odml:hasName "Date" .
}
Document: https://g-node.org/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652
Section: https://g-node.org/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956
Property: https://g-node.org/odml-rdf#f1699eb6-4cab-4dd0-9327-120eab2089ae
Document: https://g-node.org/odml-rdf#24066355-1ee8-4eb5-a715-96bbb6231cd5
Section: https://g-node.org/odml-rdf#bbd44815-5016-49e0-9f4b-5b83778d00de
Property: https://g-node.org/odml-rdf#fadffec7-6b23-454e-bfd1-9d5884802abb
Document: https://g-node.org/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a
Section: https://g-node.org/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee
Property: https://g-node.org/odml-rdf#138f08f7-23c7-4722-8577-85a6fa633ae1
Document: https://g-node.org/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7
Section: https://g-node.org/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd
P

As you can see from the output, the finder builds multiple SPARQL queries from `match` queries, executes them and returns some matched results. The first result always represents the most specific query (the biggest combination of input parameters that returned at least one triple).
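
The "most specific query first" ordering can be pictured as enumerating parameter combinations from largest to smallest. The sketch below illustrates only that ordering. The `parameter_subsets` helper and the layout of `params` are assumptions for illustration, not the FuzzyFinder's actual implementation.

```python
from itertools import combinations

# Hypothetical input parameters, keyed by (entity, RDF predicate).
params = {
    ("Section", "hasName"): "Recording-2013-02-08-ak",
    ("Section", "hasType"): "Recording",
    ("Property", "hasName"): "Date",
}


def parameter_subsets(parameters):
    """Yield parameter subsets, largest (most specific) first."""
    items = sorted(parameters.items())
    for size in range(len(items), 0, -1):
        for subset in combinations(items, size):
            yield dict(subset)


subsets = list(parameter_subsets(params))
for subset in subsets:
    # Print how many parameters each candidate query would constrain.
    print(len(subset), sorted(subset))
```

Each subset would correspond to one candidate query; a finder that runs them in this order reports the results of the most constrained query that matches anything first.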

The query syntax is straightforward. Write the name of the entity (`property`, `section` or `document`, or the short forms `prop`, `sec` and `doc`), followed by parentheses that contain comma-separated `attribute:value` pairs.

As a code example: `prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)`.
Here we search for Sections and Properties where the `property` has the attribute `name` with the value `Date`.
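
To show how such a query string breaks down, here is a small regular-expression sketch that turns it into a nested dictionary. The function `parse_match_query` and the alias table are hypothetical and written only for this illustration; the parser inside python-odml may work differently.

```python
import re

# Accepted entity names and their short forms, as listed in the text above.
ENTITY_ALIASES = {
    "property": "property", "prop": "property",
    "section": "section", "sec": "section",
    "document": "document", "doc": "document",
}

# One group per "entity(attribute:value, ...)" occurrence.
ENTITY_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")


def parse_match_query(q_str):
    """Parse 'entity(attr:value, ...)' groups into a nested dict."""
    parsed = {}
    for entity, body in ENTITY_PATTERN.findall(q_str):
        key = ENTITY_ALIASES.get(entity.lower())
        if key is None:
            continue  # unknown entity name: skipped in this sketch
        attributes = parsed.setdefault(key, {})
        for pair in body.split(","):
            if ":" not in pair:
                continue  # not an attribute:value pair
            attr, value = pair.split(":", 1)
            attributes[attr.strip()] = value.strip()
    return parsed


query = "prop(name:Date) section(name:Recording-2013-02-08-ak, type:Recording)"
result = parse_match_query(query)
print(result)
```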

To build `match` queries you need to know exactly which odML attribute each value belongs to. If you write `prop(name:Date) section(name:Recording, type:Recording-2013-02-08-ak)`, the `find()` method will not return any triples matching the Section parameters, because it is unlikely that a Section with type `Recording-2013-02-08-ak` exists.

Attributes that do not belong to the odML entity are ignored as well (e.g. only `id, author, date, version, repository, sections` can exist in the `Document` object).
In the example `section(not-odml-name:Recording-2013-02-08-ak, record:Recording)` the `find()` method returns nothing.

In [14]:
from odml.rdf.fuzzy_finder import FuzzyFinder

query_string = 'section(not-odml-name:Recording-2013-02-08-ak, record:Recording)'

f = FuzzyFinder(graph)
print(f.find(mode='match', q_str=query_string))




This is often inconvenient if you do not know exactly how the diverse data in the graph is related. For situations like this, *'fuzzy'* mode comes into play. It is also the default mode.

The output logic is similar to that of the previous mode, but here you can provide broader information. The finder matches the parameters and creates meaningful queries based on the input.

The query string consists of two parts: *FIND* and *HAVING*.

In the *FIND* part a user specifies the set of odML objects and their attributes,
e.g. `FIND prop(name) section(name, type)`

In the *HAVING* part a user specifies a set of search values which could relate to the attributes in the *FIND* part,
e.g. `HAVING Recording, Recording-2012-04-04-ab, Date`

Finally, the complete query will look like this:
`FIND sec(name, type) prop(name) HAVING Recording, Recording-2012-04-04-ab, Date`

As you can see in the example, you do not need to know which attribute each search value in the *HAVING* part relates to; the finder works that out for you.
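
The two-part structure can be sketched as follows: split the string at `HAVING`, collect the attribute slots from the *FIND* part, and pair every slot with every search value as a candidate assignment. The helper `split_fuzzy_query` is hypothetical, and the exhaustive pairing is an assumption about how a finder could proceed, not a description of the library's internals.

```python
import re
from itertools import product


def split_fuzzy_query(q_str):
    """Split 'FIND ... HAVING ...' into attribute slots and search values."""
    find_part, having_part = re.split(r"\bHAVING\b", q_str, maxsplit=1)
    find_part = re.sub(r"^\s*FIND\b", "", find_part)
    # One (entity, attribute) slot per attribute named in the FIND part.
    slots = [
        (entity, attr.strip())
        for entity, body in re.findall(r"(\w+)\(([^)]*)\)", find_part)
        for attr in body.split(",")
        if attr.strip()
    ]
    values = [v.strip() for v in having_part.split(",") if v.strip()]
    return slots, values


query = ("FIND sec(name, type) prop(name) "
         "HAVING Recording, Recording-2012-04-04-ab, Date")
slots, values = split_fuzzy_query(query)

# Every slot/value pairing is a candidate that a finder could test.
candidates = list(product(slots, values))
print(slots)
print(values)
print(len(candidates))
```

Pairings that match nothing in the graph, such as a Section of type `Date`, would simply produce no results and drop out.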

In [17]:
from odml.rdf.fuzzy_finder import FuzzyFinder

query_string = 'FIND sec(name, type) prop(name) HAVING Recording, Recording-2012-04-04-ab, Date, Some_value'

f = FuzzyFinder(graph)
print(f.find(mode='fuzzy', q_str=query_string))


SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasType "Recording" .
?s odml:hasProperty ?p .
?p rdf:type odml:Property .
?p odml:hasName "Date" .
}
Document: https://g-node.org/odml-rdf#cc66e78a-3742-490a-9fdb-1c66761d7652
Section: https://g-node.org/odml-rdf#5365f7e5-603c-4154-a5ea-33bb1a07a956
Property: https://g-node.org/odml-rdf#f1699eb6-4cab-4dd0-9327-120eab2089ae
Document: https://g-node.org/odml-rdf#24066355-1ee8-4eb5-a715-96bbb6231cd5
Section: https://g-node.org/odml-rdf#bbd44815-5016-49e0-9f4b-5b83778d00de
Property: https://g-node.org/odml-rdf#fadffec7-6b23-454e-bfd1-9d5884802abb
Document: https://g-node.org/odml-rdf#537c6cc8-7dfe-4d53-a111-24b3ce0f3c1a
Section: https://g-node.org/odml-rdf#346773f2-abee-4892-b052-840ddcff35ee
Property: https://g-node.org/odml-rdf#138f08f7-23c7-4722-8577-85a6fa633ae1
Document: https://g-node.org/odml-rdf#cd24b60f-1d5e-4040-9881-5e5a597baef7
Section: https://g-node.org/odml-rdf#782bd29d-e4b0-4c14-a417-1772a4851ffd
P