# What are the Semantic Web and RDF?

**RDF (Resource Description Framework)** is one of the three foundational [Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) technologies, the other two being SPARQL and OWL.

In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies is represented as RDF. If you store Semantic Web data, it's in RDF. If you query Semantic Web data (typically using SPARQL), it's RDF data. If you send Semantic Web data to your friend, it's RDF.

The RDF data model is based on the idea of making statements about resources (in particular web resources) in the form of *subject–predicate–object* expressions, known as [*triples*](https://en.wikipedia.org/wiki/Semantic_triple). The *subject* denotes the resource; the *predicate* denotes a trait or aspect of the resource and expresses a relationship between the *subject* and the *object*.

For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a **subject** denoting *"the sky"*, a **predicate** denoting *"has the color"*, and an **object** denoting *"blue"*. RDF thus uses the term *subject* where the entity–attribute–value model typical of object-oriented design uses *object* (or *entity*): entity (sky), attribute (color), and value (blue).

![Image](http://dublincore.org/documents/2008/01/14/dc-rdf/rdfexamplefig.png)
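As a minimal sketch of the triple idea, independent of any RDF library, the sky statement can be modelled as a plain Python tuple. The `Triple` type and the `ex:` identifiers are illustrative assumptions and are not part of odML or of any RDF tooling.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (illustrative type, not an rdflib class)."""
    subject: str
    predicate: str
    object: str


# "The sky has the color blue" as a single statement.
# The "ex:" names are made-up identifiers for this sketch.
statement = Triple("ex:sky", "ex:hasColor", "blue")

# A graph is simply a set of such statements.
statements = {
    statement,
    Triple("ex:sky", "ex:isAbove", "ex:earth"),
}

# Look up everything that is said about one subject.
about_sky = sorted(t for t in statements if t.subject == "ex:sky")
for t in about_sky:
    print(t.subject, t.predicate, t.object)
```

The same subject can appear in any number of statements, which is how a set of triples grows into a graph.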

Find out more: <br>
- http://fast.wistia.net/embed/iframe/8nm9xf4jip?popover=true <br>
- https://en.wikipedia.org/wiki/Resource_Description_Framework <br>
- https://www.cambridgesemantics.com/semantic-university/rdf-101 <br>
- http://www.cambridgesemantics.com/semantic-university/introduction-semantic-web-0

# RDF<->odML converter

Here we explore RDF-to-odML and odML-to-RDF conversion using the `odml/tools/rdf_converter.py` module.

Let's create an example odML document.

In [1]:
import os
os.chdir('..')

import odml
import datetime

doc = odml.Document(author="D. N. Adams",
                    date=datetime.date(1979, 10, 12))

# CREATE AND APPEND THE MAIN SECTIONs
doc.append(odml.Section(name="Arthur Philip Dent",
                           type="crew/person",
                           definition="Information on Arthur Dent"))

# SET NEW PARENT NODE
parent = doc['Arthur Philip Dent']


# APPEND PROPERTIES WITH VALUES
parent.append(odml.Property(name="Species",
                            value="Human",
                            dtype=odml.DType.string,
                            definition="Species to which subject belongs to"))

## RDFWriter class

The `RDFWriter` class converts odML documents to one of the supported RDF formats:<br>
'xml', 'pretty-xml', 'trix', 'n3', 'turtle', 'ttl', 'ntriples', 'nt', 'nt11', 'trig', 'json-ld'.<br>
Either a single document or a list of documents can be passed to the `RDFWriter()` constructor.
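Because the format is passed as a plain string, a typo only surfaces when the conversion runs. A small guard like the one below can catch it earlier. This is a sketch: `check_rdf_format` is a hypothetical helper and not part of odML; only the list of format names is taken from the text above.

```python
# Format names as listed above; the helper itself is a hypothetical convenience.
SUPPORTED_RDF_FORMATS = (
    "xml", "pretty-xml", "trix", "n3", "turtle", "ttl",
    "ntriples", "nt", "nt11", "trig", "json-ld",
)


def check_rdf_format(rdf_format):
    """Return the format name unchanged, or raise ValueError if it is not listed."""
    if rdf_format not in SUPPORTED_RDF_FORMATS:
        raise ValueError(
            "Unsupported RDF format {!r}; expected one of: {}".format(
                rdf_format, ", ".join(SUPPORTED_RDF_FORMATS)))
    return rdf_format


print(check_rdf_format("turtle"))
```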

It's possible to get the output as a string.

In [2]:
from odml.tools.rdf_converter import RDFWriter

print(RDFWriter(doc).get_rdf_str('turtle'))

@prefix odml: <https://g-node.org/projects/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument odml:be49c0c4-58ff-42da-880c-6357abfef048 .

<https://g-node.org/projects/odml-rdf#14f1f5cf-e7c6-4038-ae68-6e7be2a88174> a rdf:Bag ;
    rdf:li "Human" .

<https://g-node.org/projects/odml-rdf#2073adf0-fa85-4a8c-be8a-976f79e2106f> a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue <https://g-node.org/projects/odml-rdf#14f1f5cf-e7c6-4038-ae68-6e7be2a88174> .

odml:be49c0c4-58ff-42da-880c-6357abfef048 a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasSection odml:ca9c8204-2619-4f71-848e-7bf074948661 .

odml:ca9c8204-2619-4f71-84
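In the Turtle output, names such as `odml:Hub` are shorthand for full IRIs built from the `@prefix` declarations, so `odml:be49c0c4-…` and the `<https://g-node.org/projects/odml-rdf#…>` forms belong to the same namespace. The following sketch shows that expansion with plain string handling; `expand` is a hypothetical helper written for illustration, not an odML or rdflib function.

```python
# Prefix table copied from the @prefix lines of the output above.
PREFIXES = {
    "odml": "https://g-node.org/projects/odml-rdf#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name):
    """Expand 'prefix:local' to a full IRI; strip the brackets of '<...>' IRIs."""
    if name.startswith("<") and name.endswith(">"):
        return name[1:-1]
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


print(expand("odml:Hub"))
print(expand("odml:hasDocument"))
print(expand("<https://g-node.org/projects/odml-rdf#14f1f5cf-e7c6-4038-ae68-6e7be2a88174>"))
```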

The output can also be written to a specified file.

In [3]:
import tempfile
import os

# Create temporary file
f = tempfile.NamedTemporaryFile(mode='w', suffix=".ttl")
path = f.name

# possible to use 'ttl' instead of 'turtle'
RDFWriter(doc).write_file(path, "ttl")

with open(path) as ff:
    data = ff.read()
    print(data)

f.close()

@prefix odml: <https://g-node.org/projects/odml-rdf#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

odml:Hub odml:hasDocument odml:be49c0c4-58ff-42da-880c-6357abfef048 .

<https://g-node.org/projects/odml-rdf#2073adf0-fa85-4a8c-be8a-976f79e2106f> a odml:Property ;
    odml:hasDefinition "Species to which subject belongs to" ;
    odml:hasDtype "string" ;
    odml:hasName "Species" ;
    odml:hasValue odml:f1677a95-e8a1-40bc-b4f4-8a38d235147f .

odml:be49c0c4-58ff-42da-880c-6357abfef048 a odml:Document ;
    odml:hasAuthor "D. N. Adams" ;
    odml:hasDate "1979-10-12"^^xsd:date ;
    odml:hasSection odml:ca9c8204-2619-4f71-848e-7bf074948661 .

odml:ca9c8204-2619-4f71-848e-7bf074948661 a odml:Section ;
    odml:hasDefinition "Information on Arthur Dent" ;
    odml:hasName "Arthur Philip Dent" ;
    odml:hasPrope

## RDFReader class

The `RDFReader` class converts RDF to odML.

There are two ways to obtain a list of converted odML documents:
- from an **RDF file**  ( `RDFReader().from_file("/path_to_input_rdf", "rdf_format")` )
- from an **RDF string**  ( `RDFReader().from_string("rdf file as a string", "rdf_format")` )

In [4]:
from odml.tools.rdf_converter import RDFReader

rdf_file = RDFWriter(doc).get_rdf_str('ttl')
odml_doc = RDFReader().from_string(rdf_file, "ttl")

print(odml_doc)

[<Doc None by D. N. Adams (1 sections)>]


In [5]:
# Create temporary file
rdf_file = tempfile.NamedTemporaryFile(mode='w', suffix=".ttl")
rdf_path = rdf_file.name
RDFWriter(doc).write_file(rdf_path, "ttl")

odml_doc = RDFReader().from_file(rdf_path, "ttl")

print(odml_doc)

[<Doc None by D. N. Adams (1 sections)>]


Another option is to write the converted documents to one or more odML files. <br>
`RDFReader().write_file("/input_path", "rdf_format", "/output_path_to_file")`

In [6]:
# If the RDF file contains a single odML document, specify the output path as a file
odml_file = tempfile.NamedTemporaryFile(mode='w', suffix=".odml")
odml_path = odml_file.name

RDFReader().write_file(rdf_path, "ttl", odml_path)

with open(odml_path) as ff:
    data = ff.read()
    print(data)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet  type="text/xsl" href="odmlTerms.xsl"?>
<?xml-stylesheet  type="text/xsl" href="odml.xsl"?>
<odML version="1.1">
  <id>be49c0c4-58ff-42da-880c-6357abfef048</id>
  <section>
    <id>ca9c8204-2619-4f71-848e-7bf074948661</id>
    <type>crew/person</type>
    <name>Arthur Philip Dent</name>
    <definition>Information on Arthur Dent</definition>
    <property>
      <id>2073adf0-fa85-4a8c-be8a-976f79e2106f</id>
      <value>[Human]</value>
      <name>Species</name>
      <type>string</type>
      <definition>Species to which subject belongs to</definition>
    </property>
  </section>
  <author>D. N. Adams</author>
  <date>1979-10-12</date>
</odML>



If the RDF file contains several odML documents, specify the output path as a directory.<br>
`RDFReader().write_file("/input_path", "rdf_format", "/output_path_to_directory")`

The module creates files in the specified directory and writes the parsed documents to them.
Example of a created file: `/<dir_path>/doc_<id>.odml` (`<id>` is the id of the document).

## SPARQL query benchmarking

In [7]:
from rdflib import Graph
import os

graph = Graph()
input_dir = os.path.join(os.getcwd(), 'doc/drosophila_17000_triples/')
for file_name in os.listdir(input_dir):
    f = os.path.join(input_dir, file_name)
    if os.path.isfile(f):
        graph.parse(f, format="ttl")
print('Total number of triples: ', len(graph))

Total number of triples:  16982


Simple query without nested information

In [8]:
from odml.tools.query_creator import QueryCreator
import time

creator = QueryCreator()
prepared_query = creator.get_query('sec(name:Recording-2012-02-06-ag,type:Recording)')
print(creator.query)

t0 = time.perf_counter()
for row in graph.query(prepared_query):
    print("Doc: {0}, Sec {1}".format(row.d, row.s))
t1 = time.perf_counter()

print('Execution time: ', t1-t0)

{'Sec': [('name', 'Recording-2012-02-06-ag'), ('type', 'Recording')]}
SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasName "Recording-2012-02-06-ag" .
?s odml:hasType "Recording" .
}

Doc: https://g-node.org/projects/odml-rdf#5ea08f25-2f2a-4f16-b329-f529031067c4, Sec https://g-node.org/projects/odml-rdf#bbbba3f8-c6ff-4410-af23-292e7b820066
Execution time:  0.000728116000573209
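The printed dictionary and SPARQL text show the two steps involved: the query string is split into attribute/value pairs, and each pair becomes one triple pattern. The sketch below reproduces that mapping for section attributes only. It is an illustration built from the output above, not the actual `QueryCreator` implementation; `parse_query_string` and `build_section_query` are hypothetical names.

```python
import re

# Keywords seen in this notebook and the dictionary keys they map to.
KEYWORDS = {"sec": "Sec", "section": "Sec", "prop": "Prop"}
ENTITY_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")


def parse_query_string(q_str):
    """Turn 'sec(name:a,type:b)' into {'Sec': [('name', 'a'), ('type', 'b')]}."""
    params = {}
    for keyword, body in ENTITY_PATTERN.findall(q_str):
        pairs = []
        for item in body.split(","):
            if not item.strip():
                continue
            attr, _, value = item.partition(":")
            pairs.append((attr.strip(), value.strip()))
        params.setdefault(KEYWORDS[keyword.lower()], []).extend(pairs)
    return params


def build_section_query(params):
    """Assemble a SELECT query from the 'Sec' entries (values are not escaped)."""
    lines = [
        "SELECT * WHERE {",
        "?d odml:hasSection ?s .",
        "?s rdf:type odml:Section .",
    ]
    for attr, value in params.get("Sec", []):
        lines.append('?s odml:has{} "{}" .'.format(attr.capitalize(), value))
    lines.append("}")
    return "\n".join(lines)


params = parse_query_string("sec(name:Recording-2012-02-06-ag,type:Recording)")
print(params)
print(build_section_query(params))
```

The generated text has no prefix declarations, so running it would still require the `odml` and `rdf` prefixes to be bound, for instance through `initNs` as used elsewhere in this notebook.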


A similar query that prints more results

In [9]:
creator = QueryCreator()
prepared_query = creator.get_query('sec(type:Recording)')
print(creator.query)

t0 = time.perf_counter()
for row in graph.query(prepared_query):
    print("Doc: {0}, Sec {1}".format(row.d, row.s))
t1 = time.perf_counter()

{'Sec': [('type', 'Recording')]}
SELECT * WHERE {
?d odml:hasSection ?s .
?s rdf:type odml:Section .
?s odml:hasType "Recording" .
}

Doc: https://g-node.org/projects/odml-rdf#49a5bdb6-6af8-4236-86dc-23c6cbe3bc93, Sec https://g-node.org/projects/odml-rdf#872c8aa1-cb74-413f-b4f8-aab8ff7c0fe4
Doc: https://g-node.org/projects/odml-rdf#5ea08f25-2f2a-4f16-b329-f529031067c4, Sec https://g-node.org/projects/odml-rdf#bbbba3f8-c6ff-4410-af23-292e7b820066
Doc: https://g-node.org/projects/odml-rdf#c5a20ec1-969a-4f6d-9ccf-820eda0739e8, Sec https://g-node.org/projects/odml-rdf#1c58e876-fa1b-4aa6-b987-3fe65c55208f
Doc: https://g-node.org/projects/odml-rdf#883fbce3-a57f-4ee0-8a63-35ea6eaae76b, Sec https://g-node.org/projects/odml-rdf#eec2b554-9b3d-4275-8d20-2c3be0bc64df
Doc: https://g-node.org/projects/odml-rdf#3ce53904-10df-46c7-af95-05c6dbb1ea46, Sec https://g-node.org/projects/odml-rdf#3ad10380-ef56-4eae-8551-15d08f2bd35e
Doc: https://g-node.org/projects/odml-rdf#47ad68ce-beec-4ac4-aad7-b27ab2050b

In [10]:
print('Execution time: ', t1-t0)

Execution time:  0.006796777000090515
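The timings above include the cost of printing each row, which can matter when a query returns many results. One way to separate the two is to collect the rows first and print afterwards. The sketch below uses a stand-in function instead of a real graph so that it runs on its own; `timed` and `fake_query` are hypothetical names, and in the notebook the call would be something like `timed(lambda: list(graph.query(prepared_query)))`.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_query():
    """Stand-in for list(graph.query(prepared_query)); returns (doc, section) rows."""
    return [("doc-1", "sec-1"), ("doc-2", "sec-2")]


# Only the query call is timed; printing happens afterwards.
rows, elapsed = timed(fake_query)
for d, s in rows:
    print("Doc: {0}, Sec {1}".format(d, s))
print("Execution time: ", elapsed)
```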


Query with a more complex structure

In [11]:
from rdflib import Graph, Namespace, RDF
from rdflib.plugins.sparql import prepareQuery

q = prepareQuery("""SELECT ?d ?s ?p ?value WHERE {
    ?d odml:hasSection ?s .
    ?s rdf:type odml:Section .
    ?s odml:hasType "Recording" .
    ?s odml:hasProperty ?p .
    ?p rdf:type odml:Property .
    ?p odml:hasName "Recording duration" .
    ?p odml:hasValue ?v .
    ?v rdf:type rdf:Bag .
    ?v rdf:li ?value .}""", initNs={"odml": Namespace("https://g-node.org/projects/odml-rdf#"),
                          "rdf": RDF})

t0 = time.perf_counter()
for row in graph.query(q):
    print("Doc: {0}, Sec: {1}, \n"
          "Prop: {2}, Val:{3}".format(row.d, row.s, row.p, row.value))
t1 = time.perf_counter()

Doc: https://g-node.org/projects/odml-rdf#8a661f5e-2777-40bb-81bd-f39094a7282f, Sec: https://g-node.org/projects/odml-rdf#c2edf620-3656-44de-9cf3-ef8e4c970faf, 
Prop: https://g-node.org/projects/odml-rdf#8d9ddac0-a414-4388-8fbb-a18f004f0b56, Val:11.22
Doc: https://g-node.org/projects/odml-rdf#deef4cca-63a7-4319-8191-bffaa91f76b4, Sec: https://g-node.org/projects/odml-rdf#6bfe973d-1fd9-4485-ac51-d1b515453792, 
Prop: https://g-node.org/projects/odml-rdf#5319d476-6027-41ba-8e58-0da9d7a8d0e8, Val:6.73
Doc: https://g-node.org/projects/odml-rdf#3660086f-25df-42d4-b4c3-2c85e3d79aa9, Sec: https://g-node.org/projects/odml-rdf#e8ecc342-3ec3-477b-8701-f9ec213863fa, 
Prop: https://g-node.org/projects/odml-rdf#c453f9d4-9319-4037-b727-8bdb0d8f4e61, Val:11.87
Doc: https://g-node.org/projects/odml-rdf#4c8b22fb-56cd-4f05-aff1-fb210abb846b, Sec: https://g-node.org/projects/odml-rdf#e5247beb-bf71-4401-a123-36128d81f41a, 
Prop: https://g-node.org/projects/odml-rdf#86183eae-169c-46d1-aaf3-b0088d622033, Val

In [12]:
print('Execution time: ', t1-t0)

Execution time:  72.3447958529996


Timing statistics for the query above, by graph size:

| Number of triples | Time, seconds    |
| ------------------|:-----------------|
| 17000             | 71               |
| 30000             | 416              |
| 50000             | 1606             |
| 100000            | more than 2500   |
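The table shows run time growing much faster than the number of triples. A rough way to quantify this, assuming the time follows a power law `t ≈ c · n**k` (an assumption made here for illustration, not a claim from the measurements), is to estimate `k` from consecutive rows:

```python
import math

# (number of triples, seconds) taken from the table above; the
# "more than 2500" row is left out because it is only a lower bound.
measurements = [(17000, 71), (30000, 416), (50000, 1606)]

exponents = []
for (n_a, secs_a), (n_b, secs_b) in zip(measurements, measurements[1:]):
    # If time grows like n**k, then k = log(secs_b / secs_a) / log(n_b / n_a).
    k = math.log(secs_b / secs_a) / math.log(n_b / n_a)
    exponents.append(k)
    print("{} -> {} triples: k = {:.2f}".format(n_a, n_b, k))
```

For these three rows the estimate falls between roughly 2.6 and 3.1, so under this assumption the query scales closer to cubically than linearly with graph size.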

Querying subsections

In [14]:
q = prepareQuery("""SELECT ?d ?s ?p ?value WHERE {
    ?d odml:hasSection ?s .
    ?s rdf:type odml:Dataset .
    ?s odml:hasProperty ?p .
    ?p rdf:type odml:Property .
    ?p odml:hasName "repro" .
    ?p odml:hasValue ?v .
    ?v rdf:type rdf:Bag .
    ?v rdf:li ?value .}""", initNs={"odml": Namespace("https://g-node.org/projects/odml-rdf#"),
                          "rdf": RDF})

t0 = time.perf_counter()
for row in graph.query(q):
    print("Doc: {0}, Sec: {1}, \n"
          "Prop: {2}, Val:{3}".format(row.d, row.s, row.p, row.value))
t1 = time.perf_counter()

Doc: https://g-node.org/projects/odml-rdf#3660086f-25df-42d4-b4c3-2c85e3d79aa9, Sec: https://g-node.org/projects/odml-rdf#0b51eb43-006e-4e7f-bb31-c1961eaaba90, 
Prop: https://g-node.org/projects/odml-rdf#a6e29ae5-8bf3-4c55-9954-99647518c822, Val:Patch2LED
Doc: https://g-node.org/projects/odml-rdf#9a3933c4-c379-40d0-a992-4d4570e05371, Sec: https://g-node.org/projects/odml-rdf#bdb6223e-5317-48ab-9260-09374048175a, 
Prop: https://g-node.org/projects/odml-rdf#d47b4e16-de17-4682-9021-37b52f97d0b8, Val:Patch2LED
Doc: https://g-node.org/projects/odml-rdf#363f23d4-3061-4d24-88e3-9c9ee9b8f94b, Sec: https://g-node.org/projects/odml-rdf#0ba5aaa5-a883-4877-a0fd-00b37b925aa4, 
Prop: https://g-node.org/projects/odml-rdf#7ee45193-d70f-4cab-9376-1f7f78cc0cd8, Val:Patch2LED
Doc: https://g-node.org/projects/odml-rdf#450d3073-f3eb-4e1f-a04a-83bbffccaa5e, Sec: https://g-node.org/projects/odml-rdf#5484b788-0c70-4c6a-9ddb-5199ebe8afa7, 
Prop: https://g-node.org/projects/odml-rdf#d6d91abb-b84a-4af1-8022-97ec

Doc: https://g-node.org/projects/odml-rdf#4c8b22fb-56cd-4f05-aff1-fb210abb846b, Sec: https://g-node.org/projects/odml-rdf#689dd11a-d444-419b-97c1-7c32bbd382ba, 
Prop: https://g-node.org/projects/odml-rdf#6e578e49-068d-4810-bb01-72370ef4c427, Val:simpleErg
Doc: https://g-node.org/projects/odml-rdf#9a3933c4-c379-40d0-a992-4d4570e05371, Sec: https://g-node.org/projects/odml-rdf#6d65da0a-16ac-4f4a-95c2-1d04518e007e, 
Prop: https://g-node.org/projects/odml-rdf#2fb7b4a6-6d78-47c3-b5fe-9ac24086d696, Val:Patch2LED
Doc: https://g-node.org/projects/odml-rdf#49a5bdb6-6af8-4236-86dc-23c6cbe3bc93, Sec: https://g-node.org/projects/odml-rdf#02b94e34-d27c-4535-871f-ecdc057499bb, 
Prop: https://g-node.org/projects/odml-rdf#80521410-c1a1-4dc6-a7d3-4266525d8009, Val:Patch2LED
Doc: https://g-node.org/projects/odml-rdf#363f23d4-3061-4d24-88e3-9c9ee9b8f94b, Sec: https://g-node.org/projects/odml-rdf#7a889098-c765-43e0-addb-cd1837699d27, 
Prop: https://g-node.org/projects/odml-rdf#a7accac0-d230-4dce-ad23-6d4c

In [15]:
print('Execution time: ', t1-t0)

Execution time:  566.1721121579994


Timing statistics for subclass queries such as the one above:

| Number of triples | Time, seconds    |
| ------------------|:-----------------|
| 6200              | 27               |
| 11000             | 160              |
| 17000             | 595              |

## FuzzyFinder class

**FuzzyFinder** is a tool for querying a graph through *fuzzy* queries. If the user does not know the exact attributes and structure of the odML data model, the finder executes multiple queries to better match the parameters and returns sets of triples. <br>
Example:

In [16]:
from odml.tools.fuzzy_finder import FuzzyFinder

query_string = 'prop(name:Date) section(name:Recording-2012-04-04-ab, type:DataAcquisition)'
f = FuzzyFinder(graph)
print(f.find(graph, q_str=query_string))

[[('Prop', ('name', 'Date')),
  ('Sec', ('name', 'Recording-2012-04-04-ab')),
  ('Sec', ('type', 'DataAcquisition'))],
 [('Prop', ('name', 'Date')), ('Sec', ('name', 'Recording-2012-04-04-ab'))],
 [('Prop', ('name', 'Date')), ('Sec', ('type', 'DataAcquisition'))],
 [('Sec', ('name', 'Recording-2012-04-04-ab')),
  ('Sec', ('type', 'DataAcquisition'))],
 [('Prop', ('name', 'Date'))],
 [('Sec', ('name', 'Recording-2012-04-04-ab'))],
 [('Sec', ('type', 'DataAcquisition'))]]
{'Sec': [('name', 'Recording-2012-04-04-ab'), ('type', 'DataAcquisition')], 'Prop': [('name', 'Date')]}
Execution time:  0.010750622999694315
{'Sec': [('name', 'Recording-2012-04-04-ab')], 'Prop': [('name', 'Date')]}
Execution time:  0.018043414000203484
{'Sec': [('type', 'DataAcquisition')], 'Prop': [('name', 'Date')]}
Execution time:  0.12047490700024355
{'Sec': [('name', 'Recording-2012-04-04-ab'), ('type', 'DataAcquisition')]}
Execution time:  0.006116941000072984
Execution time:  0.009307972999522462
{'Sec': [('nam
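The first part of the output lists every non-empty subset of the query parameters, from the most specific combination down to single attributes. The sketch below generates the same list with `itertools.combinations`. It only illustrates the enumeration and is not necessarily how `FuzzyFinder` is implemented; `parameter_subsets` is a hypothetical name.

```python
from itertools import combinations
from pprint import pprint

# Flattened query parameters, as in the first part of the output above.
params = [
    ("Prop", ("name", "Date")),
    ("Sec", ("name", "Recording-2012-04-04-ab")),
    ("Sec", ("type", "DataAcquisition")),
]


def parameter_subsets(items):
    """Return all non-empty subsets, largest (most specific) first."""
    subsets = []
    for size in range(len(items), 0, -1):
        subsets.extend(list(combo) for combo in combinations(items, size))
    return subsets


subsets = parameter_subsets(params)
pprint(subsets)
```

With three parameters this yields 2³ − 1 = 7 subsets, so the number of queries doubles with every additional parameter.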

Query statistics for simple fuzzy queries. Results are reported for all subsets of the query.

| Number of triples | Time, seconds    |
|:------------------|:-----------------|
| 17000             | < 1              |
| 100000            | < 5              |