In [1]:
# for use in tutorial and development; do not include this `sys.path` change in production:
import sys ; sys.path.insert(0, "../")
import os

**WIP** during integration

# Load data via Morph-KGC

> [`morph-kgc`](https://github.com/oeg-upm/morph-kgc) is an engine that constructs RDF knowledge graphs from heterogeneous data sources with [R2RML](https://www.w3.org/2001/sw/rdb2rdf/r2rml/) and [RML](https://rml.io/specs/rml/) mapping languages. Morph-KGC is built on top of pandas and it leverages mapping partitions to significantly reduce execution times and memory consumption for large data sources.

For documentation, see <https://github.com/oeg-upm/Morph-KGC/wiki/Usage>.

This example uses a simple SQLite database with students and sports and transforms it to an RDF knowledge graph using an R2RML mapping.

First, let's look at the sample database.

It contains three tables, each populated with some sample data.
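One way to inspect a SQLite database without a GUI is Python's built-in `sqlite3` module. The sketch below lists each table with its row count. The helper name `summarize_tables` and the in-memory stand-in database (including its table and column names) are assumptions for illustration only; to inspect the tutorial data, connect to `dat/student_sport.db` instead.

```python
import sqlite3

def summarize_tables(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    summary = {}
    for (name,) in cursor.fetchall():
        if name.startswith("sqlite_"):
            continue  # skip SQLite's internal bookkeeping tables
        # table names cannot be bound as parameters, so quote the identifier
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return summary

# hypothetical stand-in schema; the real tutorial database may differ
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE sport (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE student_sport (student_id INTEGER, sport_id INTEGER);
    INSERT INTO student VALUES (1, 'Venus');
    INSERT INTO sport VALUES (1, 'Tennis');
    INSERT INTO student_sport VALUES (1, 1);
""")
table_counts = summarize_tables(demo)
print(table_counts)
```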

`Morph-KGC` is configured via a `config.ini` file. Let's create a basic one for our example.

In [2]:
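# the sample mapping and database live in `dat/`, one level above the working directory
base_dir = os.path.dirname(os.getcwd())

# one section per data source: `mappings` points to the R2RML file, `db_url` to the database
config = f"""
            [StudentSportDB]
            mappings={base_dir}/dat/student_sport.r2rml.ttl
            db_url=sqlite:///{base_dir}/dat/student_sport.db
         """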

# it is also possible to provide a path to the config file:
# config = 'path/to/config.ini'

You can see how to create this config file in the [docs](https://github.com/oeg-upm/Morph-KGC/wiki/Configuration).
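If you prefer a config file on disk, the same settings can be written with Python's standard `configparser` module. This is a minimal sketch: the section name and the `mappings` and `db_url` keys come from the example above, while the helper name `write_config` and the placeholder paths are assumptions for illustration.

```python
import configparser
import os
import tempfile

def write_config(path, mappings, db_url, section="StudentSportDB"):
    """Write a one-section INI file with the two keys used in this tutorial."""
    # disable interpolation so '%' characters in URLs are stored verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = {"mappings": mappings, "db_url": db_url}
    with open(path, "w", encoding="utf-8") as handle:
        parser.write(handle)
    return path

# placeholder paths; substitute the locations of your own mapping and database
config_path = write_config(
    os.path.join(tempfile.mkdtemp(), "config.ini"),
    mappings="/path/to/student_sport.r2rml.ttl",
    db_url="sqlite:////path/to/student_sport.db",
)

# read the file back to confirm its contents
check = configparser.ConfigParser(interpolation=None)
check.read(config_path, encoding="utf-8")
print(dict(check["StudentSportDB"]))
```

The resulting path can then be used in place of the inline config string, as the comment in the cell above notes.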

Now let's use `morph-kgc` to load the RDF data from the SQLite database, based on the `config.ini` settings and an R2RML mapping.

In [3]:
from icecream import ic
import kglab

namespaces = {
    "ex":  "http://example.com/",
    }

kg = kglab.KnowledgeGraph(
    name = "A KG example with students and sports",
    namespaces = namespaces,
    )

kg.materialize(config)

INFO | 2022-02-25 16:50:14,340 | 7 mapping rules retrieved.
INFO | 2022-02-25 16:50:14,358 | Mapping partition with 1 groups generated.
INFO | 2022-02-25 16:50:14,359 | Maximum number of rules within mapping group: 7.
INFO | 2022-02-25 16:50:14,361 | Mappings processed in 0.778 seconds.
INFO | 2022-02-25 16:50:14,455 | Number of triples generated in total: 22.


<kglab.kglab.KnowledgeGraph at 0x7f7f2821f250>

Data can be loaded from multiple file formats (e.g. CSV, JSON, XML, Parquet), and also from several relational DBMSs (PostgreSQL, MySQL, Oracle, Microsoft SQL Server, MariaDB).

Now let's try to query!

In [4]:
sparql = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.com/>

    SELECT ?student_name ?sport_desc
    WHERE {
        ?student rdf:type ex:Student .
        ?student ex:firstName ?student_name .
        ?student ex:plays ?sport .
        ?sport ex:description ?sport_desc
    }
    """

# use the public `query()` method rather than the private `_g` graph attribute
for row in kg.query(sparql):
    ic(row.asdict())

ic| row.asdict(): {'sport_desc': rdflib.term.Literal('Formula1'),
                   'student_name': rdflib.term.Literal('Fernando')}
ic| row.asdict(): {'sport_desc': rdflib.term.Literal('Football'),
                   'student_name': rdflib.term.Literal('Fernando')}
ic| row.asdict(): {'sport_desc': rdflib.term.Literal('Tennis'),
                   'student_name': rdflib.term.Literal('Venus')}
ic| row.asdict(): {'sport_desc': rdflib.term.Literal('Football'),
                   'student_name': rdflib.term.Literal('David')}
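Each result row holds `rdflib` literals; converting them with `str()` gives plain Python values that are easier to post-process. As a sketch, the hypothetical helper `sports_by_student` below groups sport descriptions by student name. It accepts any iterable of dict-like rows, so plain dictionaries copied from the output above stand in for `row.asdict()` here.

```python
from collections import defaultdict

def sports_by_student(rows):
    """Group sport descriptions by student name; values are converted with str()."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[str(row["student_name"])].append(str(row["sport_desc"]))
    return dict(grouped)

# stand-in rows mirroring the query output shown above
rows = [
    {"sport_desc": "Formula1", "student_name": "Fernando"},
    {"sport_desc": "Football", "student_name": "Fernando"},
    {"sport_desc": "Tennis", "student_name": "Venus"},
    {"sport_desc": "Football", "student_name": "David"},
]
grouped = sports_by_student(rows)
print(grouped)
```

Grouping on the first name alone would merge two students who share a name; selecting `?student` in the query and keying on it would avoid that.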
