In [5]:
!pip install morph-kgc==2.8.0
import morph_kgc



In [11]:
# create the config file
!echo "[CONFIGURATION]" > config.ini
!echo "logging_level: DEBUG" >> config.ini
!echo "mappings: mapping.ttl" >> config.ini
!echo "output_file: output.nq" >> config.ini
# show the config file
!cat config.ini

[CONFIGURATION]
logging_level: DEBUG
mappings: mapping.ttl
output_file: output.nq
output_format: TURTLE


In [12]:
!python3 -m morph_kgc config.ini

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/carla.castedo.pereira/anaconda3/lib/python3.12/site-packages/morph_kgc/__main__.py", line 28, in <module>
    config = load_config_from_command_line()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carla.castedo.pereira/anaconda3/lib/python3.12/site-packages/morph_kgc/args_parser.py", line 70, in load_config_from_command_line
    config = _parse_config(config)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/carla.castedo.pereira/anaconda3/lib/python3.12/site-packages/morph_kgc/args_parser.py", line 52, in _parse_config
    config.validate_configuration_section()
  File "/home/carla.castedo.pereira/anaconda3/lib/python3.12/site-packages/morph_kgc/config.py", line 178, in validate_configuration_section
    raise ValueError(f'{OUTPUT_FORMAT} value `{self.get_output_format()}` is not valid. '
ValueError: output_format value `
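
The traceback above shows that `output_format: TURTLE` was rejected by the configuration validator. A small pre-flight check can catch this before launching the engine. The sketch below uses only the standard library. The set of accepted values is an assumption: `N-TRIPLES` appears as the default in the DEBUG log later in this notebook, and `N-QUADS` is assumed from the `.nq` output file name. The helper name `check_output_format` is hypothetical and not part of Morph-KGC, so check the Morph-KGC documentation for your version.

```python
from configparser import ConfigParser

# Assumption: values accepted for output_format; verify against the docs.
ASSUMED_OUTPUT_FORMATS = {"N-TRIPLES", "N-QUADS"}


def check_output_format(config_text: str) -> list[str]:
    """Return a list of problems found in the [CONFIGURATION] section."""
    parser = ConfigParser()
    parser.read_string(config_text)
    problems = []
    if parser.has_section("CONFIGURATION"):
        # Fall back to the default seen in the DEBUG log when the key is absent.
        value = parser.get("CONFIGURATION", "output_format", fallback="N-TRIPLES")
        # Comparing in upper case is a choice of this sketch.
        if value.upper() not in ASSUMED_OUTPUT_FORMATS:
            allowed = ", ".join(sorted(ASSUMED_OUTPUT_FORMATS))
            problems.append(
                f"output_format value `{value}` is not one of: {allowed}"
            )
    return problems


# The config file exactly as printed by `cat config.ini` above.
config_text = """[CONFIGURATION]
logging_level: DEBUG
mappings: mapping.ttl
output_file: output.nq
output_format: TURTLE
"""

problems = check_output_format(config_text)
for problem in problems:
    print(problem)
```

If the assumption holds, replacing `TURTLE` with `N-QUADS` would match the `output.nq` file name used in this configuration.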

# **[Yatter](https://github.com/oeg-upm/yatter) Tutorial**

This tool translates mapping rules from YARRRML into a Turtle-based serialization of RML or R2RML.

First of all, we need to install the library via pip.

In [1]:
!pip install yatter==1.1.5



Next, we make the necessary imports: `yatter`, `ruamel.yaml` to parse the YAML, `urllib` to fetch the mapping, and `time`.

In [2]:
import time
import yatter
from ruamel.yaml import YAML
from urllib import request

We just need to call the `translate` function from the `yatter` package with the parsed YAML object as a parameter. It returns the mapping translated into RML as a string.

In [3]:
yaml = YAML(typ='safe', pure=True)
yarrrml_mapping = request.urlopen("https://raw.githubusercontent.com/kg-construct/tutorials/refs/heads/main/ecai2024/resources/yarrrml_mapping.yml").read().decode('utf-8')

print(f'\n\n------Input Mapping in YARRRML----------\n\n{yarrrml_mapping}')
time.sleep(1)
rml_content = yatter.translate(yaml.load(yarrrml_mapping))
print(f'\n\n------Translated Mapping in RML----------\n\n{rml_content}')



------Input Mapping in YARRRML----------

prefixes:
  rr: http://www.w3.org/ns/r2rml#
  foaf: http://xmlns.com/foaf/0.1/
  xsd: http://www.w3.org/2001/XMLSchema#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  ontology: http://data.sfgov.org/ontology/
  grel: http://users.ugent.be/~bjdmeest/function/grel.ttl#
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  ex: http://www.ex.org/

mappings:
  Aircraft:
    sources:
      - [data/sfo-aircraft-tail-numbers-and-models.csv~csv]
    s: http://www.ex.org/Aircraft/$(aircraft_id)
    po:
    - [a, ex:Aircraft]
    - [ex:hasID, $(aircraft_id), xsd:string]
    - [ex:hasCreationDate, $(creation_date), xsd:dateTime]
    - [ex:hasModificationDate, $(modification_date), xsd:dateTime]
    - [ex:hasStatus, $(status), xsd:string]
    - [ex:hasTailNumber, $(tail_number), xsd:string]
    - p: ex:hasAircraftModel
      o:
      - mapping: AircraftModel
        condition:
          function: equal
          parameters:
            - [str1, $(aircraft

2024-11-25 17:00:31,222 | INFO: Translating YARRRML mapping to [R2]RML
2024-11-25 17:00:31,223 | INFO: RML content is created!
2024-11-25 17:00:31,229 | INFO: Mapping has been syntactically validated.
2024-11-25 17:00:31,229 | INFO: Translation has finished successfully.




------Translated Mapping in RML----------

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ontology: <http://data.sfgov.org/ontology/>.
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix ex: <http://www.ex.org/>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#>.
@prefix schema: <http://schema.org/>.
@prefix formats: <http://www.w3.org/ns/formats/>.
@prefix comp: <http://semweb.mmlab.be/ns/rml-compression#>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#>.
@base <http://example.com/ns#>.


<Aircraft_0> a rr:TriplesMap;

	rml:logicalSource [
		a rml:LogicalSource;
		rml:source "data/sfo-aircraft-tail-nu
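
The translated mapping is only held in memory as a string, while the Morph-KGC configurations in this notebook refer to a local `mapping.ttl` file. A minimal sketch of writing the string to disk follows. The helper name `save_mapping` is hypothetical, and the short `rml_content` value is a stand-in for the string returned by `yatter.translate(...)`.

```python
import tempfile
from pathlib import Path


def save_mapping(rml_content: str, path: Path) -> Path:
    """Write the translated mapping to disk as UTF-8 and return the path."""
    path.write_text(rml_content, encoding="utf-8")
    return path


# Stand-in for the string returned by yatter.translate(...).
rml_content = "@prefix rr: <http://www.w3.org/ns/r2rml#>.\n"

# A temporary directory keeps this sketch free of side effects; in the
# notebook you would pass Path("mapping.ttl") instead.
target = Path(tempfile.mkdtemp()) / "mapping.ttl"
saved = save_mapping(rml_content, target)
print(saved.name, saved.stat().st_size, "bytes")
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults when the mapping contains non-ASCII characters.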

# **[Morph-KGC](https://github.com/oeg-upm/morph-kgc) Tutorial**

**[Morph-KGC](https://github.com/oeg-upm/morph-kgc)** is an engine that constructs **[RDF](https://www.w3.org/TR/rdf11-concepts/)** and **[RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html)** knowledge graphs from heterogeneous data sources with the **[R2RML](https://www.w3.org/TR/r2rml/)** and **[RML](https://w3id.org/rml/core/spec)** mapping languages. The full documentation of Morph-KGC is available in **[Read the Docs](https://morph-kgc.readthedocs.io/en/latest/)**.

There are two different options to run Morph-KGC:

- As a **library**, integrating with **[RDFLib](https://rdflib.readthedocs.io)** and **[Oxigraph](https://oxigraph.org/pyoxigraph)**.
- Via the **command line**.


This tutorial shows the different alternatives to run Morph-KGC. Here, we use [RML](https://w3id.org/rml/core/spec) mappings, but the more user-friendly [YARRRML](https://rml.io/yarrrml/spec/) mapping format is also supported. Data transformation, computation, or filtering before generating triples is also supported with [RML-FNML](https://w3id.org/rml/fnml/spec).

## **Load Knowledge Graph to [RDFLib](https://rdflib.readthedocs.io)**

**[RDFLib](https://rdflib.readthedocs.io)** is the reference library for working with RDF in Python. Morph-KGC can be used as a **library** to create a knowledge graph and load it into RDFLib. In this example we use the aircraft example with **CSV** data. Morph-KGC can access mappings and data **remotely**, so we use this functionality to avoid downloading the data and the mappings ourselves. The RML mappings are available [here](https://github.com/kg-construct/tutorials/blob/main/ecai2024/resources/rml_mapping.rml.ttl) and the data is available [here](https://github.com/kg-construct/tutorials/tree/main/ecai2024/resources/data).

First of all, we need to **install** [Morph-KGC](https://pypi.org/project/morph-kgc) (this will also install [RDFLib](https://pypi.org/project/rdflib/) and [Oxigraph](https://pypi.org/project/pyoxigraph/)).

In [1]:
!pip install morph-kgc==2.8.0



Now we just need to **import** Morph-KGC and we are ready to go!

In [2]:
import morph_kgc

To run Morph-KGC we need to provide some information, which is done via an **INI** configuration. When running Morph-KGC as a **library**, this configuration can be provided as a **string** or as a **file path**. Below is a basic configuration for our example, provided as a string. The _config_ indicates the path (or URL) of the mapping file.

In [6]:
config = """
             [KGC-Tutorial]
             mappings: https://raw.githubusercontent.com/kg-construct/tutorials/refs/heads/main/ecai2024/resources/rml_mapping.rml.ttl
         """

In [3]:
config = """
             [KGC-Tutorial]
             mappings: mapping.ttl
         """

We just need to call `materialize` passing our _config_ and Morph-KGC will create the knowledge graph and load it to RDFLib.

In [4]:
g = morph_kgc.materialize(config)

INFO | 2024-12-10 16:14:20,629 | 2 mapping rules retrieved.
INFO | 2024-12-10 16:14:20,637 | Mapping partition with 2 groups generated.
INFO | 2024-12-10 16:14:20,637 | Maximum number of rules within mapping group: 1.
INFO | 2024-12-10 16:14:20,638 | Mappings processed in 0.417 seconds.
INFO | 2024-12-10 16:14:20,719 | Number of triples generated in total: 2.


**That is it!** Now we can work with our RDFLib graph: query it, navigate it, save it, and more. For instance, below we query the KG:

In [8]:
sparql_query = """
         PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
         PREFIX dct: <http://purl.org/dc/terms/>
         PREFIX ex: <http://www.ex.org/>

         SELECT * WHERE {
             ?aircraft a ex:Aircraft .
             ?aircraft ex:hasID ?id .
             ?aircraft ex:hasAircraftModel ?model
         }
      """

answers = g.query(sparql_query)

for row in answers:
    print(f' Aircraft {row["aircraft"]} has id {row["id"]} and model {row["model"]}')

 Aircraft http://www.ex.org/Aircraft/26 has id 26 and model http://www.ex.org/AircraftModel/B737-400
 Aircraft http://www.ex.org/Aircraft/11745 has id 11745 and model http://www.ex.org/AircraftModel/A321-200
 Aircraft http://www.ex.org/Aircraft/12069 has id 12069 and model http://www.ex.org/AircraftModel/B717-
 Aircraft http://www.ex.org/Aircraft/5835 has id 5835 and model http://www.ex.org/AircraftModel/MD-90-30
 Aircraft http://www.ex.org/Aircraft/86 has id 86 and model http://www.ex.org/AircraftModel/MD-11
 Aircraft http://www.ex.org/Aircraft/462 has id 462 and model http://www.ex.org/AircraftModel/B737-700
 Aircraft http://www.ex.org/Aircraft/40 has id 40 and model http://www.ex.org/AircraftModel/B747-400
 Aircraft http://www.ex.org/Aircraft/485 has id 485 and model http://www.ex.org/AircraftModel/B737-700
 Aircraft http://www.ex.org/Aircraft/11769 has id 11769 and model http://www.ex.org/AircraftModel/B737-800
 Aircraft http://www.ex.org/Aircraft/104 has id 104 and model http://ww

## **Create Knowledge Graph via Command Line**

Morph-KGC can also be executed from the **command line**. This is the recommended option if you work with **large volumes of data**. As before, we need to create a config file. In this example we again use the aircraft data and mappings.

In [9]:
# create the config file
!echo "[CONFIGURATION]" > config.ini
!echo "logging_level: DEBUG" >> config.ini
!echo "[AIRCRAFTSKG]" >> config.ini
!echo "mappings: https://raw.githubusercontent.com/kg-construct/tutorials/refs/heads/main/ecai2024/resources/rml_mapping.rml.ttl" >> config.ini

# show the config file
!cat config.ini

[CONFIGURATION]
logging_level: DEBUG
[AIRCRAFTSKG]
mappings: https://raw.githubusercontent.com/kg-construct/tutorials/refs/heads/main/ecai2024/resources/rml_mapping.rml.ttl
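
The `!echo` lines above rely on a Unix-like shell. As a portable alternative, the same file can be written from Python and parsed back with the standard library's `configparser` to confirm it is well formed. This is a sketch: the temporary directory is used only to avoid side effects, and in the notebook the file would be written to `config.ini` in the working directory.

```python
import tempfile
from configparser import ConfigParser
from pathlib import Path

MAPPING_URL = (
    "https://raw.githubusercontent.com/kg-construct/tutorials/"
    "refs/heads/main/ecai2024/resources/rml_mapping.rml.ttl"
)

# Same content as produced by the echo commands above.
config_text = (
    "[CONFIGURATION]\n"
    "logging_level: DEBUG\n"
    "[AIRCRAFTSKG]\n"
    f"mappings: {MAPPING_URL}\n"
)

# Write to a temporary location; use Path("config.ini") in the notebook.
config_path = Path(tempfile.mkdtemp()) / "config.ini"
config_path.write_text(config_text, encoding="utf-8")

# Parse the file back to check that sections and keys are as intended.
parser = ConfigParser()
parser.read(config_path, encoding="utf-8")
print(parser.sections())
print(parser["AIRCRAFTSKG"]["mappings"])
```

Parsing the file back is a cheap sanity check: a missing section header or a misspelled key shows up here rather than in a later traceback.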


The following command will create the knowledge graph and write it to a _knowledge-graph.nt_ file. You just need to provide the path to the _config_ file.

In [10]:
!python3 -m morph_kgc config.ini

DEBUG | 2024-11-25 17:06:20,072 | CONFIGURATION: {'logging_level': 'DEBUG', 'output_file': 'knowledge-graph', 'na_values': ',nan', 'safe_percent_encoding': '', 'read_parsed_mappings_path': '', 'write_parsed_mappings_path': '', 'mapping_partitioning': 'PARTIAL-AGGREGATIONS', 'logging_file': '', 'oracle_client_lib_dir': '', 'oracle_client_config_dir': '', 'udfs': '', 'output_kafka_server': '', 'output_kafka_topic': '', 'output_dir': '', 'output_format': 'N-TRIPLES', 'only_printable_chars': 'no', 'infer_sql_datatypes': 'no', 'number_of_processes': '12'}
DEBUG | 2024-11-25 17:06:20,072 | DATA SOURCE `AIRCRAFTSKG`: {'mappings': 'https://raw.githubusercontent.com/kg-construct/tutorials/refs/heads/main/ecai2024/resources/rml_mapping.rml.ttl'}
DEBUG | 2024-11-25 17:06:20,982 | Removed self-join from mapping rule `#TM6`.
DEBUG | 2024-11-25 17:06:20,982 | Removed self-join from mapping rule `#TM7`.
DEBUG | 2024-11-25 17:06:20,982 | Removed self-join from mapping rule `#TM9`.
INFO | 2024-11-25 17

Let's take a look at a subset of the generated RDF!

In [11]:
!head knowledge-graph.nt

<http://www.ex.org/Aircraft/278> <http://www.ex.org/hasID> "278" .
<http://www.ex.org/Aircraft/11774> <http://www.ex.org/hasID> "11774" .
<http://www.ex.org/Aircraft/1419> <http://www.ex.org/hasID> "1419" .
<http://www.ex.org/Aircraft/95> <http://www.ex.org/hasID> "95" .
<http://www.ex.org/Aircraft/49> <http://www.ex.org/hasID> "49" .
<http://www.ex.org/Aircraft/11838> <http://www.ex.org/hasID> "11838" .
<http://www.ex.org/Aircraft/11919> <http://www.ex.org/hasID> "11919" .
<http://www.ex.org/Aircraft/11748> <http://www.ex.org/hasID> "11748" .
<http://www.ex.org/Aircraft/494> <http://www.ex.org/hasID> "494" .
<http://www.ex.org/Aircraft/11724> <http://www.ex.org/hasID> "11724" .
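
For a quick look at such lines from Python without a triplestore, the simple triples shown above can be split with a regular expression. This is a deliberately limited sketch, not an N-Triples parser: it only handles lines of the shape `<subject> <predicate> "literal" .` and ignores datatypes, language tags, blank nodes and IRI objects. The function name `parse_simple_triples` is hypothetical, and the sample reuses the first three lines of the output above.

```python
import re

# Matches only the simple shape shown above: <subject> <predicate> "literal" .
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.$')

# First three lines of the `head knowledge-graph.nt` output.
sample = """<http://www.ex.org/Aircraft/278> <http://www.ex.org/hasID> "278" .
<http://www.ex.org/Aircraft/11774> <http://www.ex.org/hasID> "11774" .
<http://www.ex.org/Aircraft/1419> <http://www.ex.org/hasID> "1419" .
"""


def parse_simple_triples(text):
    """Return (subject, predicate, literal) tuples for lines that match."""
    triples = []
    for line in text.splitlines():
        match = TRIPLE_RE.match(line.strip())
        if match:
            triples.append(match.groups())
    return triples


triples = parse_simple_triples(sample)
for subject, predicate, literal in triples:
    print(subject, "->", literal)
```

Lines that do not match the pattern are skipped silently, so the result should be treated as a preview rather than a faithful reading of the file.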


We could now, for instance, load the generated RDF into RDFLib (or any triplestore) and pose queries.

## **Load Knowledge Graph to [Oxigraph](https://oxigraph.org/pyoxigraph/)**

While RDFLib provides much functionality, it does not support **[RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html)** yet. Morph-KGC can create RDF-star knowledge graphs using **[RML-star](https://w3id.org/rml/star/spec)** mappings and load them to **[Oxigraph](https://oxigraph.org/pyoxigraph/)**.

The following example creates an RDF-star knowledge graph of scientific software metadata (the Morph-KGC software in this example), extracted with [SoMEF](https://github.com/KnowledgeCaptureAndDiscovery/somef). SoMEF extracts characteristics of the software, each annotated with the technique that was used to extract it and with a confidence value. The **JSON** data is available [here](https://github.com/oeg-upm/morph-kgc/blob/main/examples/tutorial/oeg-upm_morph-kgc.json) and the RML-star mappings are available [here](https://github.com/oeg-upm/morph-kgc/blob/main/examples/tutorial/mapping.somef.ttl).

As with RDFLib, we just need to create the _config_ and call `materialize_oxigraph`.

In [12]:
import morph_kgc

config = """
             [SoMEF]
             mappings: https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.somef.ttl
         """

g = morph_kgc.materialize_oxigraph(config)

2024-11-25 17:07:33,938 | DEBUG: CONFIGURATION: {'output_file': 'knowledge-graph', 'na_values': ',nan', 'safe_percent_encoding': '', 'read_parsed_mappings_path': '', 'write_parsed_mappings_path': '', 'mapping_partitioning': 'PARTIAL-AGGREGATIONS', 'logging_file': '', 'oracle_client_lib_dir': '', 'oracle_client_config_dir': '', 'udfs': '', 'output_kafka_server': '', 'output_kafka_topic': '', 'output_dir': '', 'output_format': 'N-TRIPLES', 'only_printable_chars': 'no', 'infer_sql_datatypes': 'no', 'logging_level': 'INFO', 'number_of_processes': '12'}
2024-11-25 17:07:33,939 | DEBUG: DATA SOURCE `SoMEF`: {'mappings': 'https://raw.githubusercontent.com/oeg-upm/morph-kgc/main/examples/tutorial/mapping.somef.ttl'}
2024-11-25 17:07:35,785 | INFO: 235 mapping rules retrieved.
2024-11-25 17:07:35,818 | DEBUG: All predicate maps are constant-valued, invariant subset is not enforced.
2024-11-25 17:07:35,841 | DEBUG: All graph maps are constant-valued, invariant subset is not enforced.
2024-11-25 

We loaded our knowledge graph into an Oxigraph store, so we can now query it with **[SPARQL-star](https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#sparql-star)**. The query below retrieves the license, the technique used to obtain the information, and the confidence value.

In [14]:
q = """
         PREFIX sd: <https://w3id.org/okn/o/sd#>
         PREFIX em: <https://www.w3id.org/okn/o/em#>

         SELECT * WHERE {
             ?software a sd:Software .
             << ?software sd:license ?license >> em:confidence ?confidence .
             << ?software sd:license ?license >> em:technique ?technique .
         }
    """

q_res = g.query(q)

for solution in q_res:
    print(solution['software'], solution['license'], solution['technique'], solution['confidence'])

<https://www.w3id.org/okn/i/Software/oeg-upm/morph-kgc> "https://api.github.com/licenses/apache-2.0"^^<http://www.w3.org/2001/XMLSchema#anyURI> "GitHub API" "1.0"
