In [1]:
# for use in tutorial and development; do not include this `sys.path` change in production:
import sys ; sys.path.insert(0, "../")

# Load data via Morph-KGC

> [`morph-kgc`](https://github.com/oeg-upm/morph-kgc) is an engine that constructs RDF knowledge graphs from heterogeneous data sources with [R2RML](https://www.w3.org/2001/sw/rdb2rdf/r2rml/) and [RML](https://rml.io/specs/rml/) mapping languages. Morph-KGC is built on top of pandas and it leverages mapping partitions to significantly reduce execution times and memory consumption for large data sources.

For documentation see <https://github.com/oeg-upm/Morph-KGC/wiki/Usage>

This example uses a simple SQLite database as input, transforming it into an RDF knowledge graph based on an R2RML mapping for relations between "students" and "sports".

First, let's look at the sample database, which has three tables plus the data to populate them.
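
If you want to check the contents of a SQLite file like this one yourself, the standard-library `sqlite3` module can list each table with its row count. This is only a minimal sketch: the helper name `summarize_tables` is hypothetical and is not part of `kglab` or Morph-KGC.

```python
import sqlite3


def summarize_tables(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    names = [row[0] for row in cursor.fetchall()]

    summary = {}
    for name in names:
        # Names come straight from sqlite_master; quote them as identifiers.
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return summary
```

Assuming the notebook runs from its usual directory, calling `summarize_tables(sqlite3.connect("../dat/student_sport.db"))` should report the three tables of the sample database.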

`Morph-KGC` needs a configuration to describe the mapping, so let's create a basic one for our example:

In [2]:
import os

config = f"""
[StudentSportDB]
mappings={os.path.dirname(os.getcwd())}/dat/student_sport.r2rml.ttl
db_url=sqlite:///{os.path.dirname(os.getcwd())}/dat/student_sport.db
         """

See the [docs](https://github.com/oeg-upm/Morph-KGC/wiki/Configuration) for how to write this configuration and which options it supports.

Alternatively, you can provide a path to a config file, for example:
```
config = "path/to/config.ini"
```
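
To produce such a file, one option is to write the same INI-style text to disk and confirm that it parses before handing the path over. The sketch below uses only the standard library; the helper name `write_config` and the `/path/to/...` values are placeholders for illustration, not part of Morph-KGC.

```python
import configparser
import tempfile
from pathlib import Path


def write_config(config_text: str, path: Path) -> Path:
    """Write an INI-style config string to `path` and check that it parses."""
    path.write_text(config_text, encoding="utf-8")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    if not parser.sections():
        raise ValueError(f"no sections found in {path}")
    return path


# Placeholder paths, for illustration only.
example = """
[StudentSportDB]
mappings=/path/to/dat/student_sport.r2rml.ttl
db_url=sqlite:////path/to/dat/student_sport.db
"""

config_path = write_config(example, Path(tempfile.mkdtemp()) / "config.ini")
print(config_path.read_text(encoding="utf-8"))
```

The resulting path, converted with `str(config_path)`, could then be used in place of the inline config string.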

Next, we'll use `morph-kgc` to load the RDF data from the SQLite database, based on the R2RML mapping:

In [3]:
from icecream import ic
import kglab

namespaces = {
    "ex":  "http://example.com/",
    }

kg = kglab.KnowledgeGraph(
    name = "A KG example with students and sports",
    namespaces = namespaces,
    )

kg.materialize(config);

INFO | 2022-02-27 12:15:21,403 | 7 mapping rules retrieved.
INFO | 2022-02-27 12:15:21,418 | Mapping partition with 1 groups generated.
INFO | 2022-02-27 12:15:21,419 | Maximum number of rules within mapping group: 7.
INFO | 2022-02-27 12:15:21,420 | Mappings processed in 1.739 seconds.
INFO | 2022-02-27 12:15:21,523 | Number of triples generated in total: 22.


Data can be loaded from multiple file formats (e.g., CSV, JSON, XML, Parquet) and also from relational DBMSs such as PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and MariaDB.
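
For another relational DBMS, the config would presumably keep the same two keys and change only the connection URL. The sketch below assumes that `db_url` accepts SQLAlchemy-style URLs; the helper `make_config`, the section name, and every value in the example are hypothetical placeholders, so check the Morph-KGC docs for the options your version expects.

```python
def make_config(section: str, mappings: str, db_url: str) -> str:
    """Build a config string with the same two keys used in the SQLite example."""
    return f"[{section}]\nmappings={mappings}\ndb_url={db_url}\n"


# Hypothetical PostgreSQL source; every value below is a placeholder.
pg_config = make_config(
    section="StudentSportPG",
    mappings="/path/to/student_sport.r2rml.ttl",
    db_url="postgresql+psycopg2://user:password@localhost:5432/student_sport",
)
print(pg_config)
```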

Now let's try to query!

In [4]:
sparql = """
PREFIX ex: <http://example.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?student_name ?sport_desc
WHERE {
  ?student rdf:type ex:Student .
  ?student ex:firstName ?student_name .
  ?student ex:plays ?sport .
  ?sport ex:description ?sport_desc
}
    """

for row in kg._g.query(sparql):
    student_name = kg.n3fy(row.student_name)
    sport_desc = kg.n3fy(row.sport_desc)
    ic(student_name, sport_desc)

ic| student_name: 'Venus', sport_desc: 'Tennis'
ic| student_name: 'David', sport_desc: 'Football'
ic| student_name: 'Fernando', sport_desc: 'Football'
ic| student_name: 'Fernando', sport_desc: 'Formula1'
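
Since a student can play more than one sport, the same name can appear in several rows. One way to collect the sports per student is sketched below with the standard library; the `rows` list simply copies the four pairs printed above so that the snippet runs on its own, and the variable names are illustrative.

```python
from collections import defaultdict

# The (student_name, sport_desc) pairs printed by the query above.
rows = [
    ("Venus", "Tennis"),
    ("David", "Football"),
    ("Fernando", "Football"),
    ("Fernando", "Formula1"),
]

# Group sport descriptions under each student name, keeping first-seen order.
sports_by_student = defaultdict(list)
for student_name, sport_desc in rows:
    sports_by_student[student_name].append(sport_desc)

for student_name, sports in sports_by_student.items():
    print(f"{student_name}: {', '.join(sports)}")
# Venus: Tennis
# David: Football
# Fernando: Football, Formula1
```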
