# Harmonizer Tool : module to convert JSON files into RDF files  

**CSTB** : Nicolas BUS, Guillaume PICINBONO, Nicolas PASTORELLY, Audrey BOUET  

## Table of Contents

1. [Aim](#1.-Aim)  

2. [Useful Tools](#2.-Useful-Tools)  

3. [Arguments of the module](#3.-Arguments-of-the-module)  
    3.1. [Input File](#3.1.-Input-File)    
    3.2. [RML File](#3.2.-RML-File)  
    3.3. [Sparql Files](#3.3.-Sparql-Files)  
    3.4. [Output File](#3.4.-Output-File)      

4. [Module Operations](#4.-Module-Operations)  
   
5. [Examples of use](#5.-Examples-of-use)  
    5.1. [Simple conversion](#5.1.-Simple-conversion)  
    5.2. [Sparql Estage](#5.2.-Sparql-Estage)  
    5.3. [Conversion and multiples Request](#5.3.-Conversion-and-multiples-Request)   
 


## 1. Aim  
  
The aim of this module is to convert files containing human-readable data into a machine-readable file that can be used to populate an ontology.
The main idea is to transform a **JSON** file into an **RDF/Turtle** file by using a previously created mapping file (**RML**).
A second module then modifies or inserts data in the resulting **RDF** file by running **SPARQL queries** on the generated graph.  

![scheme](documentation/scheme_harmonizer.png)

## 2. Useful Tools  

**Json Parser** [(link)](http://json.parser.online.fr/) : To check the format of the JSON file  
**Matey** [(link)](https://rml.io/yarrrml/matey/) : To create the mapping file  
**Python** (v3.10) : To execute the harmonizer tool  
**Java lib** (rml.jar) : To apply the mapping file from the Python module and convert JSON to RDF   
  

## 3. Arguments of the module  

The harmonizer tool contains 2 modules:  
* The conversion  
* The Sparql Estage  

To activate these modules, pass the corresponding options when calling the tool. 
The basic command line to execute the Python tool is the following:  

&nbsp; **python** &nbsp; $\color{blue}{harmonizer.py}$ &nbsp; $\color{red}{--input}$ &nbsp; $\color{red}{inputFile}$ &nbsp; $\color{orange}{[--mapping}$ &nbsp; $\color{orange}{RMLFile]}$ &nbsp; $\color{green}{[--sparql}$ &nbsp; $\color{green}{SparqlFiles]}$ &nbsp; $\color{purple}{[--output}$ &nbsp; $\color{purple}{outputFilename]}$    
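
The command line above could be parsed along the following lines. This is a minimal sketch based on `argparse`, not the actual code of `harmonizer.py`: the function name `build_parser` and the help texts are assumptions, and only the four option names and the default `output.ttl` come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(prog="harmonizer.py")
    # Mandatory: a JSON file to convert, or an existing Turtle file.
    parser.add_argument("--input", required=True, help="JSON or TTL input file")
    # Optional: RML mapping, needed only when the input is a JSON file.
    parser.add_argument("--mapping", default=None, help="RML mapping file")
    # Optional: one or more SPARQL files, kept in the order they are given.
    parser.add_argument("--sparql", nargs="+", default=[], help="SPARQL query files")
    # Optional: output file name (.ttl or .jsonld).
    parser.add_argument("--output", default="output.ttl", help="output file name")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--input", "data.json", "--mapping", "mapping.rml", "--sparql", "q1.txt", "q2.txt"]
)
print(args.input, args.mapping, args.sparql, args.output)
```

Using `nargs="+"` is one way to accept several SPARQL files after a single `--sparql` flag, as in the examples of section 5.3.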



### 3.1. Input File  

In the command line, only the $\color{red}{inputFile}$ is mandatory.  
You can pass a JSON file or a TTL file, provided that it exists.  
- If a **JSON** file is specified, you must also specify a mapping file to run the conversion.  
- If a **TTL** file is specified, the conversion is skipped.  

### 3.2. RML File  

In the command line, the $\color{orange}{mapping File}$ is optional.  
If the **RML** file is specified, it must exist; it activates the conversion when the input is a **JSON** file.  
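
The rules of sections 3.1 and 3.2 can be summarised as a small decision function. This is only a sketch: the helper `needs_conversion` is hypothetical, file-existence checks are left out, and rejecting other extensions is an assumption, since the document only mentions JSON and TTL inputs.

```python
from pathlib import Path
from typing import Optional


def needs_conversion(input_file: str, mapping_file: Optional[str]) -> bool:
    """Return True when the JSON-to-RDF conversion should run (hypothetical helper)."""
    suffix = Path(input_file).suffix.lower()
    if suffix == ".ttl":
        # A Turtle input is already RDF, so the conversion is skipped.
        return False
    if suffix == ".json":
        if mapping_file is None:
            # A JSON input cannot be converted without a mapping file.
            raise ValueError("a JSON input requires a mapping file (--mapping)")
        return True
    # Assumption: any other extension is rejected.
    raise ValueError(f"unsupported input extension: {suffix!r}")


print(needs_conversion("data.json", "mapping.rml"))
print(needs_conversion("conversion_1.ttl", None))
```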


### 3.3. Sparql Files

In the command line, the $\color{green}{--sparql}$ option is optional.  
You can pass several SPARQL files; each file is read and executed one by one.  
If all the $\color{green}{Sparql Files}$ exist, the request module is activated.  
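
The activation rule for the request module can be sketched as follows. The function name `request_module_active` is hypothetical, and treating an empty list as "not activated" is an assumption; the demonstration uses a temporary directory so that it does not depend on the repository files.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def request_module_active(sparql_files: Sequence[str]) -> bool:
    """True when at least one file is given and every given file exists."""
    return len(sparql_files) > 0 and all(Path(name).is_file() for name in sparql_files)


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "construct_source.txt"
    existing.write_text("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }", encoding="utf-8")
    missing = Path(tmp) / "missing.txt"  # never created
    only_existing = request_module_active([str(existing)])
    with_missing = request_module_active([str(existing), str(missing)])

print(only_existing, with_missing)
```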


### 3.4. Output File  

In the command line, the $\color{purple}{outputFilename}$ is optional.  
If it is specified, the final file takes this name; otherwise, the file name defaults to **"output.ttl"**.  
Two output formats are accepted: **ttl** or **jsonld**.  
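
One possible way to derive the output format from the file name is sketched below. The helper `resolve_output` and the format labels `"turtle"` and `"json-ld"` are assumptions chosen for illustration; the document only states the two accepted extensions and the default name.

```python
from pathlib import Path
from typing import Optional, Tuple

DEFAULT_OUTPUT = "output.ttl"
# Assumed mapping from file extension to a serialization label.
FORMATS = {".ttl": "turtle", ".jsonld": "json-ld"}


def resolve_output(output_filename: Optional[str] = None) -> Tuple[str, str]:
    """Return (file name, format label), falling back to the default name."""
    name = output_filename or DEFAULT_OUTPUT
    suffix = Path(name).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return name, FORMATS[suffix]


print(resolve_output())
print(resolve_output("data/DemoCIMNE-v1/conversion_1.jsonld"))
```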



## 4. Module Operations  

The following diagram shows the operations of the Python module:  
![python_scheme](documentation/Python_module.png)

## 5. Examples of use

The following examples use data from CIMNE: a **json** file, an **rml** file and some **sparql** queries.  
You will find them in the **data/DemoCIMNE-v1** folder: 

In [19]:
import os
import json
import pathlib

In [20]:
os.listdir(os.path.join(os.getcwd(), 'data', 'DemoCIMNE-v1'))

['construct_alignement.txt',
 'construct_source.txt',
 'construct_source2.txt',
 'data.json',
 'mapping.rml']

### The Json File

In [21]:
parsed = json.loads(pathlib.Path("data/DemoCIMNE-v1/data.json").read_text())
print(json.dumps(parsed, indent=4, sort_keys=True))

{
    "Buildings": [
        {
            "Classificacio_sol": "na",
            "Clau_qualificacio_urbanistica": "nan",
            "Codi_postal": "",
            "Espai": "Casa del Mar d'El Port de la Selva",
            "Municipi": "El Port de la Selva",
            "Num_Ens_Inventari": "03573",
            "Num_via": "11",
            "Provincia": "Girona",
            "Qualificacio_urbanistica": "nan",
            "Ref_Cadastral": "6875508EG1867N0001OO",
            "Sup_const_sobre_rasant": "213.61",
            "Sup_const_sota rasant": "0.0",
            "Sup_const_total": "213.61",
            "Sup_terreny": "0.0",
            "Via": "de l'Illa"
        }
    ]
}


### 5.1. Simple conversion  

Convert a JSON file into a **ttl** file using its mapping file:  


In [22]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --output data/DemoCIMNE-v1/conversion_1.ttl

Activation of the conversion
Harmonizer without queries


Convert a JSON file into a **jsonld** file using its mapping file:  

In [23]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --output data/DemoCIMNE-v1/conversion_1.jsonld

Activation of the conversion
Harmonizer without queries


### The ttl output file

In [24]:
ttl = pathlib.Path("data/DemoCIMNE-v1/conversion_1.ttl").read_text()
print(ttl)

@prefix ns1: <http://bigg-project.eu/> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .
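
The IRIs in this output appear to be built from two fields of the JSON record, `Num_Ens_Inventari` and `Ref_Cadastral`. The sketch below reproduces that naming in plain Python. The actual rules live in `mapping.rml`, which is not shown here, so which field feeds which IRI is inferred from the matching values and should be read as an assumption; the helper `instance_iris` is hypothetical.

```python
BASE = "http://bigg-project.eu/instances/"

# The two fields of the sample record that show up in the generated IRIs.
record = {
    "Num_Ens_Inventari": "03573",
    "Ref_Cadastral": "6875508EG1867N0001OO",
}


def instance_iris(building: dict) -> dict:
    """Build the instance IRIs seen in the Turtle output (inferred naming scheme)."""
    inventory = building["Num_Ens_Inventari"]
    cadastral = building["Ref_Cadastral"]
    return {
        "building": f"{BASE}building_{inventory}",
        "cadastralInfo": f"{BASE}cadastralInfo_{cadastral}",
        "locationInfo": f"{BASE}locationInfo_{inventory}",
        "buildingSpace": f"{BASE}buildingSpace_{inventory}",
    }


for name, iri in instance_iris(record).items():
    print(f"{name}: <{iri}>")
```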




### 5.2. Sparql Estage

In [25]:
sparql = pathlib.Path("data/DemoCIMNE-v1/construct_source.txt").read_text()
print(sparql)

prefix bigg: <http://bigg-project.eu/ontology#>
prefix dc: <http://purl.org/dc/terms/> 

CONSTRUCT {
?uri  a bigg:hamonizedData ;
      dc:created ?date ;
      dc:source "Hamonizer v1.0" .
  ?s ?p ?o .
} WHERE {
 {
  SELECT ?date ?uri 
  WHERE {
   BIND(NOW() AS ?date)
   BIND(IRI(UUID()) AS ?uri)
   }
  }
 ?s ?p ?o .
}
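
In effect, this query copies every triple of the input graph and adds one provenance node carrying a fresh UUID-based IRI, a creation timestamp and a source label. The sketch below mimics that effect on a plain Python set of triples. It does not execute SPARQL; the function `add_provenance` and the tuple representation are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone

BIGG = "http://bigg-project.eu/ontology#"
DC = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def add_provenance(triples: set, source: str) -> set:
    """Return a copy of `triples` plus three provenance triples (illustrative only)."""
    if not triples:
        # The WHERE clause needs at least one "?s ?p ?o" match to produce anything.
        return set()
    node = uuid.uuid4().urn  # "urn:uuid:...", comparable to IRI(UUID()) in the query
    created = datetime.now(timezone.utc).isoformat()  # comparable to NOW()
    provenance = {
        (node, RDF_TYPE, BIGG + "hamonizedData"),  # spelling taken from the query
        (node, DC + "created", created),
        (node, DC + "source", source),
    }
    # "?s ?p ?o" in both CONSTRUCT and WHERE keeps every input triple.
    return set(triples) | provenance


graph = {
    (
        "http://bigg-project.eu/instances/building_03573",
        RDF_TYPE,
        "http://bigg-project.eu/Building",
    ),
}
result = add_provenance(graph, "Hamonizer v1.0")
print(len(graph), "->", len(result))
```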


In [26]:
!python harmonizer.py --input data/DemoCIMNE-v1/conversion_1.ttl --sparql data/DemoCIMNE-v1/construct_source.txt --output data/DemoCIMNE-v1/sparql_1.ttl

Activation of the request


### The output ttl file with the request

In [27]:
ttl_sparql = pathlib.Path("data/DemoCIMNE-v1/sparql_1.ttl").read_text()
print(ttl_sparql)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <http://bigg-project.eu/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .

<urn:uuid:ee6c67cc-1929-4d94-8845-9d16dfca9590> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-01-17T15:34:09.209629+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.0" .




### 5.3. Conversion and multiples Request

In [29]:
!python harmonizer.py --input data/DemoCIMNE-v1/data.json --mapping data/DemoCIMNE-v1/mapping.rml --sparql data/DemoCIMNE-v1/construct_source.txt data/DemoCIMNE-v1/construct_source2.txt --output data/DemoCIMNE-v1/sparql_2.ttl

Activation of the conversion
Activation of the request
Activation of the request
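
The two "Activation of the request" lines correspond to the two SPARQL files. Each query appears to be applied to the graph produced by the previous step, which would explain why the final file contains two provenance nodes. The sketch below illustrates this chaining with `functools.reduce`; the steps are plain functions standing in for queries, and all names (`tag`, `run_in_order`, the `urn:example:` identifiers) are hypothetical.

```python
from functools import reduce
from typing import Callable, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Step = Callable[[Graph], Graph]


def tag(label: str) -> Step:
    """Build a stand-in for one query: keep the graph and add one labelled triple."""
    def step(graph: Graph) -> Graph:
        return graph | {(f"urn:example:{label}", "urn:example:source", label)}
    return step


def run_in_order(graph: Graph, steps: Iterable[Step]) -> Graph:
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current), steps, graph)


start: Graph = frozenset({("urn:example:building", "urn:example:type", "Building")})
final = run_in_order(start, [tag("v1.0"), tag("v1.1")])
print(len(final))
```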


In [30]:
ttl_sparql2 = pathlib.Path("data/DemoCIMNE-v1/sparql_2.ttl").read_text()
print(ttl_sparql2)

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ns1: <http://bigg-project.eu/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://bigg-project.eu/instances/building_03573> a ns1:Building ;
    ns1:buildingName "http://bigg-project.eu/instances/Casa del Mar d'El Port de la Selva" ;
    ns1:hasCadastralInfo <http://bigg-project.eu/instances/cadastralInfo_6875508EG1867N0001OO> ;
    ns1:hasLocationInfo <http://bigg-project.eu/instances/locationInfo_03573> ;
    ns1:hasSpace <http://bigg-project.eu/instances/buildingSpace_03573> .

<urn:uuid:5ca89a10-aed9-4690-9538-f5ddc14bc9bb> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-01-17T15:36:29.725907+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.0" .

<urn:uuid:c9b31d43-afa4-4cd4-8fd6-9ec21063adb8> a <http://bigg-project.eu/ontology#hamonizedData> ;
    dcterms:created "2023-01-17T15:36:29.730898+00:00"^^xsd:dateTime ;
    dcterms:source "Hamonizer v1.1" .


