# Harmonizer Tool : module to convert Json files into RDF files  

**CSTB** : Nicolas BUS, Guillaume PICINBONO, Nicolas PASTORELLY, Audrey BOUET  

### Import libraries

In [1]:
import os
import json
import pathlib

## Table of Contents

1. [Aim](#1.-Aim)  

2. [Useful Tools](#2.-Useful-Tools)  

3. [Arguments of the module](#3.-Arguments-of-the-module)  
    3.1. [Input File](#3.1.-Input-File)    
    3.2. [RML File](#3.2.-RML-File)  
    3.3. [Sparql Files](#3.3.-Sparql-Files)  
    3.4. [Output File](#3.4.-Output-File)      

4. [Module Operations](#4.-Module-Operations)  
   
5. [Examples of use](#5.-Examples-of-use)  
    5.1. [Simple conversion](#5.1.-Simple-conversion)  
    5.2. [Sparql Estage](#5.2.-Sparql-Estage)  
    5.3. [Conversion and multiples Request](#5.3.-Conversion-and-multiples-Request)   
    
6. [Example of use for DomX](#6.-Example-of-use-for-DomX)  
    6.1. [Simple conversion Timeseries](#6.1.-Simple-conversion-Timeseries)  
    6.2. [Simple conversion Home Data](#6.2.-Simple-conversion-Home-Data)  
  



## 1. Aim  
  
The aim of this module is to convert files containing data in natural language into a file that a machine can read to build an ontology.
The main idea is to transform a **JSON** file into an **RDF/Turtle** file by using a previously created mapping file (**RML**).
A second module then modifies or inserts data in the resulting **RDF** file by running **SPARQL queries** on that graph.  

![scheme](documentation/scheme_harmonizer.png)

## 2. Useful Tools  

**Json Parser** [(link)](http://json.parser.online.fr/) : To check the format of the Json file  
**Matey** [(link)](https://rml.io/yarrrml/matey/) : To create the mapping file  
**Python** (v3.10) : To execute the harmonizer tool  
**Java lib** (rml.jar) : To use the mapping file in the Python module to convert json to RDF   
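
The JSON check done with the online parser can also be done locally. The snippet below is a minimal sketch using only the standard library; `is_valid_json` is a hypothetical helper name and is not part of the harmonizer tool.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A well-formed document, then one with a dangling bracket.
print(is_valid_json('{"Buildings": [{"Provincia": "Girona"}]}'))  # True
print(is_valid_json('{"Buildings": [}'))                          # False
```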
  

## 3. Arguments of the module  

There are 2 modules inside the harmonizer tool :  
* The conversion  
* The Sparql Estage  

To activate these modules, pass the corresponding arguments when calling the tool. 
The basic command line to execute the Python tool is the following :  

&nbsp; **python** &nbsp; $\color{blue}{harmonizer.py}$ &nbsp; $\color{red}{--input}$ &nbsp; $\color{red}{inputFile}$ &nbsp; $\color{orange}{[--mapping}$ &nbsp; $\color{orange}{RMLFile]}$ &nbsp; $\color{green}{[--sparql}$ &nbsp; $\color{green}{SparqlFiles]}$ &nbsp; $\color{purple}{[--output}$ &nbsp; $\color{purple}{outputFilename]}$    
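
As an illustration, the synopsis above could be parsed with `argparse` roughly as follows. This is a sketch written from the documented arguments only, not the actual code of harmonizer.py; `build_parser` and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command line (assumed layout)."""
    parser = argparse.ArgumentParser(prog="harmonizer.py")
    # Only --input is mandatory.
    parser.add_argument("--input", required=True, help="Json or Ttl input file")
    parser.add_argument("--mapping", help="RML mapping file, used with a Json input")
    # Several SPARQL files may follow a single --sparql flag.
    parser.add_argument("--sparql", nargs="+", default=[], help="SPARQL query files")
    parser.add_argument("--output", default="output.ttl", help="output filename")
    return parser


args = build_parser().parse_args(
    ["--input", "data.json", "--mapping", "mapping.rml", "--sparql", "q1.txt", "q2.txt"]
)
print(args.input, args.mapping, args.sparql, args.output)
# data.json mapping.rml ['q1.txt', 'q2.txt'] output.ttl
```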



### 3.1. Input File  

In the command line, only the $\color{red}{inputFile}$ is mandatory.  
You can pass a Json file or a Ttl file, as long as it exists.  
- If a **Json** File is specified, you must also specify a mapping file to do the conversion.  
- If a **Ttl** File is specified, the conversion will be skipped.  
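
The two rules above can be sketched as a small decision function. This is an illustration only: `needs_conversion` is a hypothetical name, and raising an error when the mapping is missing is an assumption, since the behaviour of the real tool in that case is not documented here.

```python
from pathlib import Path
from typing import Optional


def needs_conversion(input_file: str, mapping_file: Optional[str] = None) -> bool:
    """Decide from the input extension whether the conversion step runs."""
    suffix = Path(input_file).suffix.lower()
    if suffix == ".ttl":
        return False  # already RDF: the conversion is skipped
    if suffix == ".json":
        if mapping_file is None:
            # Assumption: a Json input without a mapping is treated as an error.
            raise ValueError("a Json input requires a mapping file")
        return True
    raise ValueError(f"unsupported input type: {suffix!r}")


print(needs_conversion("data.json", "mapping.rml"))  # True
print(needs_conversion("conversion_1.ttl"))          # False
```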

### 3.2. RML File  

In the command line, the $\color{orange}{mapping File}$ is optional.  
If the **rml** file is specified, it must exist; it activates the conversion when the input is a **Json** file.  

$\color{red}{WARNING}$ : The mapping file must be modified before it is passed to the Python module. The source declared in the mapping file must be the input Json file to convert; if that source is set to "\_\_SOURCE__", the Python module adjusts it directly in the RML file.  

So, when using **Matey**, once you have checked that your rules convert your example into a Ttl file, replace the source with "\_\_SOURCE__" in the YARRRML and export the new RML file. 
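
The placeholder mechanism can be pictured as a plain text substitution. The sketch below is an assumption about how such a substitution might look, not the tool's actual implementation; `bind_source` and the one-line RML fragment are made up for the example.

```python
from pathlib import Path

PLACEHOLDER = "__SOURCE__"


def bind_source(rml_text: str, json_path: str) -> str:
    """Replace the placeholder source with the path of the input Json file."""
    if PLACEHOLDER not in rml_text:
        return rml_text  # the source is hard-coded: leave the mapping untouched
    # Forward slashes keep the path readable on every platform.
    return rml_text.replace(PLACEHOLDER, Path(json_path).as_posix())


# Hypothetical fragment of an exported mapping.
rml = 'map:source_000 rml:source "__SOURCE__" .'
print(bind_source(rml, "data/DemoCIMNE-v1/data.json"))
# map:source_000 rml:source "data/DemoCIMNE-v1/data.json" .
```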


### 3.3. Sparql Files

In the command line, the $\color{green}{--sparql}$ argument is optional.  
You can pass several SPARQL files; each file is read and executed in turn.  
If all $\color{green}{Sparql Files}$ exist, then the request module is activated.  
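
A possible way to implement the "all files must exist" rule is to check every path before reading any query. The helper below is a sketch under that assumption; `load_queries` is a hypothetical name, and raising `FileNotFoundError` is a choice made for the example rather than documented behaviour.

```python
from pathlib import Path
from typing import List, Sequence


def load_queries(sparql_files: Sequence[str]) -> List[str]:
    """Return the text of each SPARQL file, in the order given."""
    paths = [Path(name) for name in sparql_files]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        # Assumption: one missing file prevents the request module from running.
        raise FileNotFoundError(f"missing SPARQL files: {missing}")
    return [p.read_text(encoding="utf-8") for p in paths]
```

The returned list preserves the command-line order, so the queries can then be executed one by one on the graph.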


### 3.4. Output File  

In the command line, the $\color{purple}{outputFilename}$ is optional.  
If it is specified, the final file will have this name; otherwise, the filename is **"output.ttl"** by default. 
There are 2 accepted formats for the output : **ttl** or **jsonld**.  
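
The output rules can be summarised in a few lines of Python. This is only a sketch: `resolve_output` is a hypothetical helper, and the format names `"turtle"` and `"json-ld"` are assumed labels for the two accepted extensions.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed mapping from file extension to serialization name.
FORMATS = {".ttl": "turtle", ".jsonld": "json-ld"}


def resolve_output(output: Optional[str] = None) -> Tuple[str, str]:
    """Return (filename, format), falling back to the default "output.ttl"."""
    name = output or "output.ttl"
    suffix = Path(name).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return name, FORMATS[suffix]


print(resolve_output())                     # ('output.ttl', 'turtle')
print(resolve_output("conversion_1.jsonld"))  # ('conversion_1.jsonld', 'json-ld')
```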



## 4. Module Operations  

Here is a diagram of the operations of the Python module :  
![python_scheme](documentation/Python_module.png)

## 5. Examples of use

The following examples use data from CIMNE: a **json** file, an **rml** file and some **sparql** queries.  
You will find them in the **data/DemoCIMNE-v1** folder : 

In [2]:
os.listdir(os.path.join(os.getcwd(), 'data', 'DemoCIMNE-v1'))

['construct_alignement.txt',
 'construct_source.txt',
 'construct_source2.txt',
 'data.json',
 'mapping.rml']

### The Json File

In [3]:
parsed = json.loads(pathlib.Path("data/DemoCIMNE-v1/data.json").read_text())
print(json.dumps(parsed, indent=4, sort_keys=True))

{
    "Buildings": [
        {
            "Classificacio_sol": "na",
            "Clau_qualificacio_urbanistica": "nan",
            "Codi_postal": "",
            "Espai": "Casa del Mar d'El Port de la Selva",
            "Municipi": "El Port de la Selva",
            "Num_Ens_Inventari": "03573",
            "Num_via": "11",
            "Provincia": "Girona",
            "Qualificacio_urbanistica": "nan",
            "Ref_Cadastral": "6875508EG1867N0001OO",
            "Sup_const_sobre_rasant": "213.61",
            "Sup_const_sota rasant": "0.0",
            "Sup_const_total": "213.61",
            "Sup_terreny": "0.0",
            "Via": "de l'Illa"
        }
    ]
}


### 5.1. Simple conversion  

From a Json file and its mapping file, convert it into a **ttl** file :  


In [4]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --output data/DemoCIMNE-v1/conversion_1.ttl

Activation of the conversion
Harmonizer without queries


From a Json file and its mapping file, convert it into a **jsonld** file :  

In [5]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --output data/DemoCIMNE-v1/conversion_1.jsonld

Activation of the conversion
Harmonizer without queries


### The ttl output file

In [6]:
ttl = pathlib.Path("data/DemoCIMNE-v1/conversion_1.ttl").read_text()
print(ttl)

@prefix ns1: <http://bigg-project.eu/> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .




### 5.2. Sparql Estage

In [7]:
sparql = pathlib.Path("data/DemoCIMNE-v1/construct_source.txt").read_text()
print(sparql)

prefix bigg: <http://bigg-project.eu/ontology#>
prefix dc: <http://purl.org/dc/terms/> 

CONSTRUCT {
?uri  a bigg:hamonizedData ;
      dc:created ?date ;
      dc:source "Hamonizer v1.0" .
  ?s ?p ?o .
} WHERE {
 {
  SELECT ?date ?uri 
  WHERE {
   BIND(NOW() AS ?date)
   BIND(IRI(UUID()) AS ?uri)
   }
  }
 ?s ?p ?o .
}


In [8]:
!python harmonizer.py --input data/DemoCIMNE-v1/conversion_1.ttl --sparql data/DemoCIMNE-v1/construct_source.txt --output data/DemoCIMNE-v1/sparql_1.ttl

Activation of the request


### The output ttl file with the request

In [9]:
ttl_sparql = pathlib.Path("data/DemoCIMNE-v1/sparql_1.ttl").read_text()
print(ttl_sparql)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <http://bigg-project.eu/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .

<urn:uuid:42d8b9d4-5467-4be9-9e96-6617823b4a85> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-03-14T15:20:58.574491+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.0" .




### 5.3. Conversion and multiples Request

In [10]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --sparql data/DemoCIMNE-v1/construct_source.txt data/DemoCIMNE-v1/construct_source2.txt --output data/DemoCIMNE-v1/sparql_2.ttl

Activation of the conversion
Activation of the request
Activation of the request


In [11]:
ttl_sparql2 = pathlib.Path("data/DemoCIMNE-v1/sparql_2.ttl").read_text()
print(ttl_sparql2)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <http://bigg-project.eu/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .

<urn:uuid:40914f6e-1c2c-45e8-b045-53169402fbeb> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-03-14T15:20:59.559138+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.1" .

<urn:uuid:f8462952-fa4d-4307-a176-c99b044ad63d> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-03-14T15:20:59.543347+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.0" .




## 6. Example of use for DomX  

### 6.1. Simple conversion Timeseries

The aim of this demo is to convert the Json file containing the time series measurements into an RDF file aligned with the BIGG ontology.

In [12]:
timeseries_file = json.loads(pathlib.Path("data/Demo_DomX/BIGG_Filtered_timeseries_data.json").read_text())
print(json.dumps(timeseries_file, indent=4, sort_keys=True))

{
    "boiler": [
        {
            "blr_mod_lvl": 0,
            "blr_t": 62.29688,
            "deviceid": "domx_ot_a8:03:2a:4c:24:1c",
            "flame": 0,
            "heat": 0,
            "t_out": 16.75,
            "t_out_1": 16.75,
            "time": "2022-12-06T11:05:02.085Z",
            "water": 0
        },
        {
            "blr_mod_lvl": 0,
            "deviceid": "domx_ot_a8:03:2a:4c:24:1c",
            "flame": 0,
            "heat": 0,
            "time": "2022-12-06T11:05:03.280Z",
            "water": 0
        },
        {
            "blr_mod_lvl": 0,
            "deviceid": "domx_ot_a8:03:2a:4c:24:1c",
            "flame": 0,
            "heat": 0,
            "time": "2022-12-06T11:05:05.468Z",
            "water": 0
        },
        {
            "blr_mod_lvl": 0,
            "deviceid": "domx_ot_a8:03:2a:4c:24:1c",
            "flame": 0,
            "heat": 0,
            "time": "2022-12-06T11:05:07.653Z",
            "water": 0
        },
     

Using the Matey tool, we create rules to convert the JSON file into an RDF one. Do not forget to set the RML source to "\_\_SOURCE__" in the exported mapping file. 

In [13]:
mapping_timeseries = pathlib.Path("data/Demo_DomX/output_BIGG_Filtered_timeseries_data.rml").read_text()
print(mapping_timeseries)

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix map: <http://mapping.example.com/> .
@prefix bigg: <http://bigg-project.eu/> .
@prefix i: <http://bigg-project.eu/instances/> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

map:map_device2_000 rml:logicalSource map:source_001 ;
	rdf:type rr:TriplesMap ;
	rdfs:label "device2" ;
	rr:predicateObjectMap map:pom_007, map:pom_008 ;
	rr:subjectMap map:s_003 .

map:map_device_000 rml:logicalSource map:source_000 ;
	rdf:type rr:TriplesMap ;
	rdfs:label "device" ;
	rr:predicateObjectMap map:pom_000, map:pom_001 ;
	rr:subjectMap map:s_000 .

map:map_measurement2_000 rml:logicalSource map:source_001 ;
	rdf:type rr:TriplesMap ;
	rdfs:label "measurement2" ;
	rr:predicateObjectMap map:pom_011, map:pom_012, map:pom_013

Then, we can use the harmonizer module to apply the mapping rules to the json file and obtain an RDF file aligned with the BIGG ontology.

In [14]:
!python harmonizer.py --input data/Demo_DomX/BIGG_Filtered_timeseries_data.json --mapping data/Demo_DomX/output_BIGG_Filtered_timeseries_data.rml --output data/Demo_DomX/BIGG_Filtered_timeseries_data.ttl

Activation of the conversion
Harmonizer without queries


In [15]:
ttl_timeseries = pathlib.Path("data/Demo_DomX/BIGG_Filtered_timeseries_data.ttl").read_text()
print(ttl_timeseries)

@prefix ns1: <http://bigg-project.eu/> .

<http://bigg-project.eu/instances/device_domx_ot_a8%3A03%3A2a%3A4c%3A24%3A1c> a ns1:Device ;
    ns1:hasSensor <http://bigg-project.eu/instances/sensor_boiler>,
        <http://bigg-project.eu/instances/sensor_domx> .

<http://bigg-project.eu/instances/measurement_2022-12-06T11%3A05%3A02.085Z> a ns1:Measurement ;
    ns1:start "2022-12-06T11:05:02.085Z" ;
    ns1:value "16.75" .

<http://bigg-project.eu/instances/measurement_2022-12-06T11%3A05%3A02.091Z> a ns1:Measurement ;
    ns1:start "2022-12-06T11:05:02.091Z" ;
    ns1:value "16.75" .

<http://bigg-project.eu/instances/measurement_2022-12-06T11%3A05%3A03.280Z> a ns1:Measurement ;
    ns1:start "2022-12-06T11:05:03.280Z" .

<http://bigg-project.eu/instances/measurement_2022-12-06T11%3A05%3A05.468Z> a ns1:Measurement ;
    ns1:start "2022-12-06T11:05:05.468Z" .

<http://bigg-project.eu/instances/measurement_2022-12-06T11%3A05%3A07.653Z> a ns1:Measurement ;
    ns1:start "2022-12-06T11:05:07.

### 6.2. Simple conversion Home Data

The aim of this demo is to convert the Json file representing the static home data into an RDF file aligned with the BIGG ontology.

In [16]:
homeData_file = json.loads(pathlib.Path("data/Demo_DomX/BIGG_Static_Home_Data_data.json").read_text())
print(json.dumps(homeData_file, indent=4, sort_keys=True))

{
    "boiler_kw": 24,
    "boiler_make": "BAXI",
    "boiler_model": "DUOTEC COMPACT",
    "climatic_zone": "GRC_A",
    "deviceid": "domx_ot_a8:03:2a:4c:24:1c",
    "homeid": 5,
    "latitude": 40.6039822,
    "longitude": 22.9502155,
    "occupants": 2,
    "sqm": 107
}


Using the Matey tool, we create rules to convert the JSON file into an RDF one. Do not forget to replace the path of the json data with "\_\_SOURCE__" in the exported RML mapping file. 

In [17]:
mapping_homeData = pathlib.Path("data/Demo_DomX/output_BIGG_Static_Home_Data_data.rml").read_text()
print(mapping_homeData)

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix map: <http://mapping.example.com/> .
@prefix bigg: <http://bigg-project.eu/> .
@prefix i: <http://bigg-project.eu/instances/> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

map:map_buildingSpace_000 rml:logicalSource map:source_000 ;
	rdf:type rr:TriplesMap ;
	rdfs:label "buildingSpace" ;
	rr:predicateObjectMap map:pom_011, map:pom_012, map:pom_013, map:pom_014 ;
	rr:subjectMap map:s_003 .

map:map_building_000 rml:logicalSource map:source_000 ;
	rdf:type rr:TriplesMap ;
	rdfs:label "building" ;
	rr:predicateObjectMap map:pom_000, map:pom_001, map:pom_002, map:pom_003, map:pom_004 ;
	rr:subjectMap map:s_000 .

map:map_cadastralInfo_000 rml:logicalSource map:source_000 ;
	rdf:type rr:TriplesMap ;
	rdfs:l

Then, we can use the harmonizer module to apply the mapping rules to the json file and obtain an RDF file aligned with the BIGG ontology.

In [18]:
!python harmonizer.py --input data/Demo_DomX/BIGG_Static_Home_Data_data.json --mapping data/Demo_DomX/output_BIGG_Static_Home_Data_data.rml --output data/Demo_DomX/BIGG_Static_Home_Data_data.ttl

Activation of the conversion
Harmonizer without queries


In [19]:
ttl_homeData = pathlib.Path("data/Demo_DomX/BIGG_Static_Home_Data_data.ttl").read_text()
print(ttl_homeData)

@prefix ns1: <http://bigg-project.eu/> .

<http://bigg-project.eu/instances/building_5> a ns1:Building ;
    ns1:buildingName "building_5" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_building_5> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_building_5> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_5> .

<http://bigg-project.eu/instances/buildingSpace_5> a ns1:BuildingSpace ;
    ns1:buildingSpaceName "building_5" ;
    ns1:hasOccupencyProfile <http://bigg-project.eu/instances/occupancyProfile_building_5> ;
    ns1:isAssociatedWithElement <http://bigg-project.eu/instances/device_domx_ot_a8%3A03%3A2a%3A4c%3A24%3A1c> .

<http://bigg-project.eu/instances/cadastralInfo_building_5> a ns1:CadastralInfo ;
    ns1:landArea "107" .

<http://bigg-project.eu/instances/deviceType_domx_ot_a8%3A03%3A2a%3A4c%3A24%3A1c> a ns1:DeviceType .

<http://bigg-project.eu/instances/device_domx_ot_a8%3A03%3A2a%3A4c%3A24%3A1c> a ns1:De