<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Morph-KGC-Tutorial" data-toc-modified-id="Morph-KGC-Tutorial-1"><span class="toc-item-num">1&nbsp;&nbsp;</span><strong><a href="https://github.com/oeg-upm/morph-kgc" rel="nofollow" target="_blank">Morph-KGC</a> Tutorial</strong></a></span><ul class="toc-item"><li><span><a href="#Load-Knowledge-Graph-to-RDFLib" data-toc-modified-id="Load-Knowledge-Graph-to-RDFLib-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><strong>Load Knowledge Graph to <a href="https://rdflib.readthedocs.io" rel="nofollow" target="_blank">RDFLib</a></strong></a></span></li><li><span><a href="#Load-Knowledge-Graph-to-Oxigraph" data-toc-modified-id="Load-Knowledge-Graph-to-Oxigraph-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span><strong>Load Knowledge Graph to <a href="https://oxigraph.org/pyoxigraph/" rel="nofollow" target="_blank">Oxigraph</a></strong></a></span></li><li><span><a href="#Create-Knowledge-Graph-via-Command-Line" data-toc-modified-id="Create-Knowledge-Graph-via-Command-Line-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span><strong>Create Knowledge Graph via Command Line</strong></a></span></li></ul></li></ul></div>

# **[Morph-KGC](https://github.com/oeg-upm/morph-kgc) Tutorial**

**[Morph-KGC](https://github.com/oeg-upm/morph-kgc)** is an engine that constructs **[RDF](https://www.w3.org/TR/rdf11-concepts/)** and **[RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html)** knowledge graphs from heterogeneous data sources with the **[R2RML](https://www.w3.org/TR/r2rml/)**, **[RML](https://rml.io/specs/rml/)** and **[RML-star](https://kg-construct.github.io/rml-star-spec/)** mapping languages. The full documentation of Morph-KGC is available in **[Read the Docs](https://morph-kgc.readthedocs.io/en/latest/)**.

There are two different options to run Morph-KGC:

- As a **library**, integrating with **[RDFLib](https://rdflib.readthedocs.io)** and **[Oxigraph](https://oxigraph.org/pyoxigraph)**.
- Via the **command line**.

Morph-KGC currently supports the following input data formats:
- **Relational databases**: **[MySQL](https://www.mysql.com/)**, **[PostgreSQL](https://www.postgresql.org/)**, **[Oracle](https://www.oracle.com/database/)**, **[Microsoft SQL Server](https://www.microsoft.com/sql-server)**, **[MariaDB](https://mariadb.org/)**, **[SQLite](https://www.sqlite.org)**.
- **Tabular files**: **[CSV](https://en.wikipedia.org/wiki/Comma-separated_values)**, **[TSV](https://en.wikipedia.org/wiki/Tab-separated_values)**, **[Excel](https://www.microsoft.com/en-us/microsoft-365/excel)**, **[Parquet](https://parquet.apache.org/documentation/latest/)**, **[Feather](https://arrow.apache.org/docs/python/feather.html)**, **[ORC](https://orc.apache.org/)**, **[Stata](https://www.stata.com/)**, **[SAS](https://www.sas.com)**, **[SPSS](https://www.ibm.com/analytics/spss-statistics-software)**, **[ODS](https://en.wikipedia.org/wiki/OpenDocument)**.
- **Hierarchical files**: **[JSON](https://www.json.org)**, **[XML](https://www.w3.org/TR/xml/)**.

This tutorial walks through each of these ways of running Morph-KGC.


## **Load Knowledge Graph to [RDFLib](https://rdflib.readthedocs.io)**

**[RDFLib](https://rdflib.readthedocs.io)** is the reference library for working with RDF in Python. Morph-KGC can be used as a **library** to create a knowledge graph and load it into RDFLib. In this example we will use the **[GTFS-Madrid-Bench](https://github.com/oeg-upm/gtfs-bench)** with **CSV** data. Morph-KGC can access mappings and data **remotely**, so we will use this functionality to avoid downloading the data and the mappings ourselves. The RML mappings are available [here](https://github.com/oeg-upm/morph-kgc/blob/main/examples/tutorial/mapping.gtfs.ttl) and the data is available [here](https://github.com/oeg-upm/morph-kgc/tree/main/examples/csv/data).

First of all, we need to **install** [Morph-KGC](https://pypi.org/project/morph-kgc) (this will also install [RDFLib](https://pypi.org/project/rdflib/) and [Oxigraph](https://pypi.org/project/pyoxigraph/)).

In [1]:
!pip install morph-kgc

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting morph-kgc
  Downloading morph_kgc-2.2.0-py3-none-any.whl (40 kB)
Collecting sql-metadata>=2.3.0
  Downloading sql_metadata-2.6.0-py3-none-any.whl (21 kB)
Collecting rdflib>=6.1.1
  Downloading rdflib-6.2.0-py3-none-any.whl (500 kB)
Collecting pyoxigraph>=0.3.0
  Downloading pyoxigraph-0.3.8-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.1 MB)
Collecting elementpath>=2.4.0
  Downloading elementpath-3.0.2-py3-none-any.whl (189 kB)
Collecting falcon>=3.0.0
  Downloading falcon-3.1.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)

Now we just need to **import** Morph-KGC and we are ready to go!

In [2]:
import morph_kgc

To run Morph-KGC it is necessary to provide some configuration. This is done via an **INI** config file. When running Morph-KGC as a **library**, this configuration can be provided as a **string** or as a **file path**. Below is a basic config for our example, provided as a string. The _config_ indicates the path to a mapping file.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
config = """
             [/content/drive/MyDrive/MSc/odkg/handsOn/EstudioPatiosEscolares2022updated.csv]
             mappings: /content/drive/MyDrive/MSc/odkg/handsOn/output.rml.ttl
         """

We just need to call `materialize` passing our _config_ and Morph-KGC will create the knowledge graph and load it to RDFLib.

In [5]:
g = morph_kgc.materialize(config)
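
The config passed to `materialize` is plain INI text, so a quick sanity check can catch a data source section that lacks a `mappings` option before the engine runs. The sketch below uses Python's standard `configparser` for that. The helper name `check_config`, the section name `SchoolData` and the path are hypothetical, and Morph-KGC's own validation may differ in detail.

```python
import configparser
import textwrap


def check_config(config_text):
    """Return the data source sections of an INI config string.

    Hypothetical helper for illustration only; it is not part of Morph-KGC.
    Raises ValueError if a data source section has no `mappings` option.
    """
    parser = configparser.ConfigParser()
    # Dedent so an indented triple-quoted string parses like a flat INI file.
    parser.read_string(textwrap.dedent(config_text))
    # Every section other than CONFIGURATION is treated as a data source here.
    sources = [name for name in parser.sections() if name != "CONFIGURATION"]
    missing = [name for name in sources if not parser.has_option(name, "mappings")]
    if missing:
        raise ValueError(f"sections without a 'mappings' option: {missing}")
    return sources


example = """
    [SchoolData]
    mappings: /path/to/output.rml.ttl
"""
print(check_config(example))
```

Calling `check_config(config)` right before `morph_kgc.materialize(config)` would surface a misspelled option name early, instead of partway through a run.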

In [6]:
# With a destination, RDFLib 6+ returns the Graph itself, so `v` refers to the same graph as `g`.
v = g.serialize(destination='/content/drive/MyDrive/MSc/odkg/handsOn/schoolFinderRDF.ttl', format="ttl")
v

<Graph identifier=N842d7e62f92744c290f1357811823e78 (<class 'rdflib.graph.Graph'>)>

In [8]:
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS
from rdflib.plugins.sparql import prepareQuery

# School and contact for a given phone number

ont = Namespace("http://smartcity.linkeddata.es/schoolfinder/ontology/")

q1 = prepareQuery('''
        SELECT ?s ?c WHERE {
             ?s a ont:School .
             ?s ont:hasContact ?c .
             ?c ont:phone "913 324 348"
         }
  ''',
  initNs = {"ont": ont}
)

for r in v.query(q1):
 print(r.s, r.c)

http://smartcity.linkeddata.es/schoolfinder/resource/school/4693138 http://smartcity.linkeddata.es/schoolfinder/resource/contact/4693138


In [1]:
# Schools in a given district
from rdflib import Namespace
from rdflib.plugins.sparql import prepareQuery

ont = Namespace("http://smartcity.linkeddata.es/schoolfinder/ontology/")

q1 = prepareQuery('''
        SELECT ?s ?n WHERE {
             ?s a ont:School .
             ?s ont:hasSchoolGround ?sc .
             ?sc ont:hasLocalization ?l .
             ?l ont:district "Valdefuentes" .
             ?s ont:name ?n
         }
  ''',
  initNs = {"ont": ont}
)

# `v` is the RDFLib graph created in the cells above
for r in v.query(q1):
    print(r.s, r.n)

In [None]:
# Schools and their URLs in a given district

ont = Namespace("http://smartcity.linkeddata.es/schoolfinder/ontology/")

q1 = prepareQuery('''
        SELECT ?s ?n ?url WHERE {
             ?s a ont:School .
             ?s ont:hasSchoolGround ?sc .
             ?sc ont:hasLocalization ?l .
             ?l ont:district "Hortaleza" .
             ?s ont:name ?n .
             ?s ont:hasContact ?c .
             ?c ont:contentURL ?url
         }
  ''',
  initNs = {"ont": ont}
)

for r in v.query(q1):
 print(r.s, r.n, r.url)

In [14]:
# Transport options (bus, metro, Renfe) for a school, given its name

ont = Namespace("http://smartcity.linkeddata.es/schoolfinder/ontology/")

q1 = prepareQuery('''
        SELECT ?s ?b ?m ?r WHERE {
             ?s a ont:School .
             ?s ont:hasSchoolGround ?sc .
             ?sc ont:hasAccessibility ?a .
             ?a ont:bus ?b .
             ?a ont:metro ?m .
             ?a ont:renfe ?r .
             ?s ont:name "Colegio público santa maría"  
         }
  ''',
  initNs = {"ont": ont}
)

for r in v.query(q1):
 print(r.s, r.b, r.m, r.r)

http://smartcity.linkeddata.es/schoolfinder/resource/school/5314 , 34 , 36 , 41 , 60 , 116 , 118 , 119 , 148 Embajadores, acacias  embajadores


**That is it!** Now we can work with our RDFLib graph: query, navigate or save the graph and more. For instance, below we query the knowledge graph with [query 3](https://github.com/oeg-upm/gtfs-bench/blob/master/queries/q3.rq) of the GTFS-Madrid-Bench.

In [None]:
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS
from rdflib.plugins.sparql import prepareQuery

ont = Namespace("http://smartcity.linkeddata.es/schoolfinder/ontology/")

q1 = prepareQuery('''
        SELECT ?sc WHERE {
             ?sc a ont:Contact
         }
  ''',
  initNs = {"ont": ont}  # without this, the ont: prefix is unknown to the query
)

for r in v.query(q1):
    print(r)

In [None]:
for s, p, o in v.triples((None, None, ont.SchoolGround)):
  print(s, p, o)

In [None]:
q3 = """
         PREFIX ont: <http://smartcity.linkeddata.es/schoolfinder/ontology/>

         SELECT ?s WHERE {
             ?s a ont:School 
         }
      """

q3_res = v.query(q3)

for r in q3_res:
    print(r.s)

In [None]:
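# Scratch note: triple patterns for restricting a query to one district,
# matching the district as a typed literal:
#     ?sc ont:hasLocalization ?l .
#     ?l ont:district "Hortaleza"^^xsd:string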

We could also have run Morph-KGC with the config read from a file. Below we create the _config_ file by writing it to disk.

In [None]:
# create the config file
!echo "[GTFS-Madrid-Bench]" > config.ini
!echo "mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.gtfs.ttl" >> config.ini

# show the config file
!cat config.ini

[GTFS-Madrid-Bench]
mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.gtfs.ttl
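
The `!echo` lines rely on notebook shell magics. A rough equivalent in plain Python, which also works outside a notebook, writes the same two lines with `pathlib`. The temporary directory is only there to keep the sketch from overwriting an existing `config.ini`; that choice and the helper name `write_config` are assumptions of this example.

```python
import tempfile
from pathlib import Path

CONFIG_TEXT = (
    "[GTFS-Madrid-Bench]\n"
    "mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.gtfs.ttl\n"
)


def write_config(directory, text=CONFIG_TEXT):
    """Write the config text to <directory>/config.ini and return the path."""
    path = Path(directory) / "config.ini"
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(tmp)
    # Same effect as `!cat config.ini`.
    written = config_path.read_text(encoding="utf-8")
    print(written)
```

The temporary directory is deleted when the `with` block ends, so in practice you would write to a persistent location and hand that path to Morph-KGC.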


We create our knowledge graph again, this time passing the file path to `materialize`.

In [None]:
g = morph_kgc.materialize('config.ini')

The default configuration is enough for most use cases. When Morph-KGC needs tuning, we can add a `CONFIGURATION` section to the _config_ file. For instance, you can specify which values should be interpreted as NULL (e.g., _#N/A_). You can find the full list of configuration options in the **[documentation](https://morph-kgc.readthedocs.io/en/latest/documentation/#engine-configuration)**. Below you can see an example of a more detailed _config_ file.

In [None]:
config = """
             [CONFIGURATION]
             na_values: #N/A,,N/A
             logging_level: DEBUG

             [GTFS-Madrid-Bench]
             mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.csv.ttl
         """

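To see what an entry such as `na_values: #N/A,,N/A` amounts to, the sketch below reads a `CONFIGURATION` section with the standard `configparser` module and splits the value on commas. It only illustrates the INI syntax. How Morph-KGC parses the option internally is an assumption here, so check the documentation for the exact semantics.

```python
import configparser
import textwrap

config_text = textwrap.dedent("""
    [CONFIGURATION]
    na_values: #N/A,,N/A
    logging_level: DEBUG

    [GTFS-Madrid-Bench]
    mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.csv.ttl
""")

parser = configparser.ConfigParser()
parser.read_string(config_text)

# By default '#' only starts a comment at the beginning of a line,
# so '#N/A' survives as part of the value.
raw = parser.get("CONFIGURATION", "na_values")
# The entry between the two commas is an empty string.
na_values = raw.split(",")

print(na_values)
print(parser.get("CONFIGURATION", "logging_level"))
```

Read this way, the option lists three values: `#N/A`, the empty string and `N/A`.
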
## **Load Knowledge Graph to [Oxigraph](https://oxigraph.org/pyoxigraph/)**

While RDFLib provides much functionality, it does not support **[RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html)** yet. Morph-KGC can create RDF-star knowledge graphs using **[RML-star](https://kg-construct.github.io/rml-star-spec/)** mappings and load them to **[Oxigraph](https://oxigraph.org/pyoxigraph/)**.

The following example creates an RDF-star knowledge graph of scientific software metadata (the Morph-KGC software in this example), extracted with [SoMEF](https://github.com/KnowledgeCaptureAndDiscovery/somef). SoMEF extracts characteristics of the software and annotates each one with the technique used to extract it and with a confidence value. The **JSON** data is available [here](https://github.com/oeg-upm/morph-kgc/blob/main/examples/tutorial/oeg-upm_morph-kgc.json) and the RML-star mappings are available [here](https://github.com/oeg-upm/morph-kgc/blob/main/examples/tutorial/mapping.somef.ttl).

As with RDFLib, we just need to create the _config_ and call `materialize_oxigraph`.

In [None]:
import morph_kgc

config = """
             [SoMEF]
             mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.somef.ttl
         """

g = morph_kgc.materialize_oxigraph(config)

Now that the knowledge graph is loaded into an Oxigraph store, we can query it with **[SPARQL-star](https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#sparql-star)**. The query below retrieves the license, the technique used to obtain the information and the confidence value.

In [None]:
q = """
         PREFIX sd: <https://w3id.org/okn/o/sd#>
         PREFIX em: <https://www.w3id.org/okn/o/em#>

         SELECT * WHERE {
             ?software a sd:Software .
             << ?software sd:license ?license >> em:confidence ?confidence .
             << ?software sd:license ?license >> em:technique ?technique .
         }
    """

q_res = g.query(q)

for solution in q_res:
    print(solution['software'], solution['license'], solution['technique'], solution['confidence'])

<https://www.w3id.org/okn/i/Software/oeg-upm/morph-kgc> "https://api.github.com/licenses/apache-2.0"^^<http://www.w3.org/2001/XMLSchema#anyURI> "GitHub API" "1.0"


## **Create Knowledge Graph via Command Line**

Morph-KGC can also be executed from the **command line**. This is the recommended option if you work with **large volumes of data**. As before, we need to create a config file. In this example we again use the data from the GTFS-Madrid-Bench.

In [None]:
# create the config file
!echo "[GTFS-Madrid-Bench]" > config.ini
!echo "mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.gtfs.ttl" >> config.ini

# show the config file
!cat config.ini

[GTFS-Madrid-Bench]
mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.gtfs.ttl


The following command will create the knowledge graph and write it to a _knowledge-graph.nt_ file. You just need to provide the path to the _config_ file.

In [None]:
!python3 -m morph_kgc config.ini

INFO | 2022-10-20 15:56:48,980 | 86 mapping rules retrieved.
INFO | 2022-10-20 15:56:49,030 | Mapping partition with 83 groups generated.
INFO | 2022-10-20 15:56:49,031 | Maximum number of rules within mapping group: 2.
INFO | 2022-10-20 15:56:49,035 | Mappings processed in 1.148 seconds.
INFO | 2022-10-20 15:56:51,171 | Number of triples generated in total: 2001.
INFO | 2022-10-20 15:56:51,172 | Materialization finished in 2.134 seconds.
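
The same command can be launched from a Python script instead of a notebook cell. The sketch below builds the `python -m morph_kgc <config>` invocation and runs it only if the `morph_kgc` package is importable. The helper names `build_command` and `run_morph_kgc` are hypothetical, and no command-line flags beyond the config path are assumed.

```python
import importlib.util
import subprocess
import sys


def build_command(config_path):
    """Return the argument list equivalent to `python3 -m morph_kgc <config_path>`."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", "morph_kgc", str(config_path)]


def run_morph_kgc(config_path):
    """Run Morph-KGC as a subprocess; return its exit code, or None if it is not installed."""
    if importlib.util.find_spec("morph_kgc") is None:
        return None
    completed = subprocess.run(build_command(config_path), check=False)
    return completed.returncode


# Show the arguments that follow the interpreter path.
print(build_command("config.ini")[1:])
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.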


Let's take a look at a subset of the generated RDF!

In [None]:
!head knowledge-graph.nt

<http://transport.linkeddata.es/madrid/agency/00000000000000000001> <http://xmlns.com/foaf/0.1/phone> "00000000000000000001".
<http://transport.linkeddata.es/madrid/agency/00000000000000000001> <http://vocab.gtfs.org/terms#fareUrl> <https://www.crtm.es/billetes-y-tarifas>.
<http://transport.linkeddata.es/madrid/metro/stops/000000000000000000os> <http://xmlns.com/foaf/0.1/name> "000000000000000000br".
<http://transport.linkeddata.es/madrid/metro/stops/0000000000000000006o> <http://xmlns.com/foaf/0.1/name> "0000000000000000008o".
<http://transport.linkeddata.es/madrid/metro/stops/0000000000000000005r> <http://xmlns.com/foaf/0.1/name> "0000000000000000008a".
<http://transport.linkeddata.es/madrid/metro/stops/000000000000000000ud> <http://xmlns.com/foaf/0.1/name> "00000000000000000037".
<http://transport.linkeddata.es/madrid/metro/stops/0000000000000000001r> <http://xmlns.com/foaf/0.1/name> "000000000000000000j1".
<http://transport.linkeddata.es/madrid/metro/stops/000000000000000000dz> <ht

We could, for instance, load the generated RDF into RDFLib (or any triplestore) and query it.

In [None]:
import rdflib

g = rdflib.Graph()
g.parse('knowledge-graph.nt')

q3 = """
         PREFIX gtfs: <http://vocab.gtfs.org/terms#>
         PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
         PREFIX dct: <http://purl.org/dc/terms/>

         SELECT * WHERE {
             ?stop a gtfs:Stop . 
             ?stop gtfs:locationType ?location .
             OPTIONAL { ?stop dct:description ?stopDescription . }
             OPTIONAL { 
                 ?stop geo:lat ?stopLat . 
                 ?stop geo:long ?stopLong .
             }
             OPTIONAL {?stop gtfs:wheelchairAccessible ?wheelchairAccessible . }
             FILTER (?location=<http://transport.linkeddata.es/resource/LocationType/2>)
         }
      """

q3_res = g.query(q3)

for row in q3_res:
    print(row['stop'], row['stopLat'], row['stopLong'])

http://transport.linkeddata.es/madrid/metro/stops/000000000000000000lh 87.0 47.0
http://transport.linkeddata.es/madrid/metro/stops/00000000000000000036 151.0 111.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000dr 697.0 657.0
http://transport.linkeddata.es/madrid/metro/stops/0000000000000000006o 231.0 191.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000xj 739.0 699.0
http://transport.linkeddata.es/madrid/metro/stops/0000000000000000007q 830.0 790.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000ho 929.0 889.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000tm 579.0 539.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000dz 716.0 676.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000gv 441.0 401.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000qt 750.0 710.0
http://transport.linkeddata.es/madrid/metro/stops/000000000000000000e4 476.0 436.0
