# Variables

## Import libraries

In [None]:
import os
import sys
from rdflib import URIRef

### File and Folder Definitions

This section defines various files and folders used in the process. These are divided into four categories:

1. **Existing files**: These are pre-existing files that are part of the project and are used as input or reference files during the process.
   - `ont_file_name`: The ontology file in Turtle format (`ontology.ttl`).
   - `ruleset_file_name`: The ruleset file in the PIE format (`rules.pie`).

2. **Created files during the process**: These are files generated during the execution of the process, and they are stored in the `tmp_folder`.
   - `local_config_file_name`: Configuration file for the repository in Turtle format (`config_repo.ttl`).
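   - `pref_hidden_labels_ttl_file_name`: File containing implicit preferred and hidden label triples for landmarks and name attribute versions, in Turtle format (`pref_hidden_labels.ttl`).
   - `comp_tmp_file_name`: Temporary file for comparisons in Turtle format (`comparisons.ttl`).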

3. **Existing folders**: These are folders that already exist and store files used in the project.
   - `data_folder_name`: The folder containing the data files (`../data/fbg_saint_antoine_light`).

4. **Created folder during the process**: This is the folder created during the process to store temporary files.
   - `tmp_folder_name`: Folder for temporary files during the process (`../tmp_files`).

### GraphDB Repository Name
- `repository_name`: The name of the repository in GraphDB where the data is stored (`faubourg_saint_antoine_addresses_light`).

### Named Graphs Definitions
These are the names of the named graphs in the GraphDB repository.
- `ontology_named_graph_name`: The named graph for the ontology (`ontology`).
- `facts_named_graph_name`: The named graph for the facts data (`facts`).
- `inter_sources_named_graph_name`: The named graph for inter-sources data (`inter_sources`).
- `comp_named_graph_name`: The named graph for comparisons (`comparisons`).
- `labels_named_graph_name`: The named graph for labels (`labels`).
- `tmp_named_graph_name`: The named graph for temporary data (`temporary`).
- `meta_named_graph_name`: The named graph for metadata about named graphs (`metadata`).
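
As an illustration of how a short graph name could map to a full IRI, the sketch below assumes the RDF4J convention that GraphDB follows for directly referenced named graphs, `<server>/repositories/<repository>/rdf-graphs/<name>`. The helper `named_graph_iri` is hypothetical: it is not part of the project's `graphdb` module, which may build its IRIs differently.

```python
from urllib.parse import quote


def named_graph_iri(graphdb_url: str, repository: str, graph_name: str) -> str:
    """Build the IRI of a directly referenced named graph.

    Assumption: the RDF4J layout <server>/repositories/<repo>/rdf-graphs/<name>.
    """
    base = graphdb_url.rstrip("/")  # tolerate a trailing slash in the server URL
    return f"{base}/repositories/{quote(repository)}/rdf-graphs/{quote(graph_name)}"


print(named_graph_iri("http://localhost:7200", "faubourg_saint_antoine_addresses_light", "facts"))
```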

### URIs to Access GraphDB
- `str_graphdb_url`: The URL to access the local GraphDB instance (`http://localhost:7200`).

### Code Folder Path
- `py_code_folder_path`: The folder containing the Python code (`../scripts/`).

These variables are used throughout the process to refer to different files, folders, and named graphs in the GraphDB repository. They allow for a modular and flexible approach to handling the data and configuring the process steps.


In [None]:
# Existing files
ont_file_name = "ontology.ttl"
ruleset_file_name = "rules.pie"

# Created files during process (in `tmp_folder`)
local_config_file_name = "config_repo.ttl"
pref_hidden_labels_ttl_file_name = "pref_hidden_labels.ttl"
comp_tmp_file_name = "comparisons.ttl"

# Existing folders
data_folder_name = "../data/fbg_saint_antoine_light"
# data_folder_name = "../data/fbg_saint_antoine"

# Created folder during process
tmp_folder_name = "../tmp_files"

# GraphDB repository name
repository_name = "faubourg_saint_antoine_addresses_light"
# repository_name = "faubourg_saint_antoine_addresses"

# Names of the named graphs
ontology_named_graph_name = "ontology"
facts_named_graph_name = "facts"
inter_sources_named_graph_name = "inter_sources"
comp_named_graph_name = "comparisons"
labels_named_graph_name = "labels"
tmp_named_graph_name = "temporary"
meta_named_graph_name = "metadata"

# Labels for named graphs
facts_named_graph_label = "Faits construits à partir des sources"

# Language for labels
lang = "fr"

# URL to access GraphDB
str_graphdb_url = "http://localhost:7200"

py_code_folder_path = "../scripts/"

# Settings for geometric comparisons
comparison_settings = {
    "geom_similarity_coef": 0.85,
    "geom_buffer_radius": 10,
    "geom_crs_uri": URIRef("http://www.opengis.net/def/crs/EPSG/0/2154"),
}

## Processing Global Variables

In this section, we define and process various global variables related to file paths and configurations used throughout the process. This includes:

- **Obtaining absolute file paths**: Converts relative file paths, as defined in the previous section, into absolute paths. This ensures the program can correctly locate the files regardless of the current working directory.
    - `tmp_folder`: The absolute path for the temporary folder used to store intermediate files (`tmp_folder_name`).
    - `data_folder`: The absolute path for the folder containing the data files (`data_folder_name`).
    - `python_code_folder`: The absolute path for the folder containing the Python code (`py_code_folder_path`).
    - `local_config_file`: The absolute path for the local configuration file (`local_config_file_name`), located in the temporary folder.
    - `ont_file`: The absolute path for the ontology file (`ont_file_name`).
    - `ruleset_file`: The absolute path for the ruleset file (`ruleset_file_name`).
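    - `pref_hidden_labels_ttl_file`: The absolute path for the file containing implicit preferred and hidden label triples (`pref_hidden_labels_ttl_file_name`), located in the temporary folder.
    - `comp_tmp_file`: The absolute path for the temporary comparisons file (`comp_tmp_file_name`), located in the temporary folder.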

- **Creating a temporary folder**: If the folder specified by `tmp_folder_name` does not already exist, the program will create it to store files that are intended to be deleted after processing.

- **Creating an RDFLib object for `graphdb_url`**: This step converts the string representing the GraphDB URL (`str_graphdb_url`) into an RDFLib `URIRef` object. This object can be used in RDF queries and updates to interact with the GraphDB instance.
    - `graphdb_url`: An RDFLib `URIRef` representing the GraphDB URL (`str_graphdb_url`).

These steps help to set up the working environment by ensuring that the necessary paths and configurations are ready for the process to begin.
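
The path handling described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's own code: `resolve_paths` is a hypothetical helper, and `os.makedirs(..., exist_ok=True)` stands in for the project's `fm.create_folder_if_not_exists`, whose exact behaviour is assumed rather than known. The base directory is passed explicitly here, whereas the notebook resolves paths against the current working directory.

```python
import os
import tempfile


def resolve_paths(base_dir: str, tmp_folder_name: str, file_names: dict) -> dict:
    """Return absolute paths for a temporary folder and for the files stored in it."""
    tmp_folder = os.path.abspath(os.path.join(base_dir, tmp_folder_name))
    # Create the folder only if it is missing; an existing folder is left untouched.
    os.makedirs(tmp_folder, exist_ok=True)
    paths = {"tmp_folder": tmp_folder}
    for key, name in file_names.items():
        paths[key] = os.path.join(tmp_folder, name)
    return paths


# Demonstration in a throwaway directory so nothing is written into the project.
with tempfile.TemporaryDirectory() as base:
    paths = resolve_paths(base, "tmp_files", {"local_config_file": "config_repo.ttl"})
    print(os.path.isabs(paths["local_config_file"]))
    print(os.path.isdir(paths["tmp_folder"]))
```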


In [None]:
tmp_folder = os.path.abspath(tmp_folder_name)
data_folder = os.path.abspath(data_folder_name)

python_code_folder = os.path.abspath(py_code_folder_path)

local_config_file = os.path.join(tmp_folder, local_config_file_name)
ont_file = os.path.abspath(ont_file_name)
ruleset_file = os.path.abspath(ruleset_file_name)
pref_hidden_labels_ttl_file = os.path.join(tmp_folder, pref_hidden_labels_ttl_file_name)
comp_tmp_file = os.path.join(tmp_folder, comp_tmp_file_name)

graphdb_url = URIRef(str_graphdb_url)

## Importing Python Modules

This section imports various Python modules used throughout the project. The modules are organized under the `scripts` folder and are structured into utilities and graph construction modules.

### Utilities

- **file_management**: Provides functions for managing files, including reading, writing, and manipulating file paths.  
- **time_processing**: Handles time-based data, such as parsing timestamps, date normalization, and temporal computations.

### Graph Construction Modules

- **graphdb**: Handles interactions with the GraphDB database, including querying, updating, and managing named graphs.  
- **attribute_version_comparisons**: Facilitates the comparison of different versions of attributes in RDF data.  
- **multi_sources_processing**: Processes data from multiple sources, likely for aggregation, reconciliation, or cleaning.  
- **factoids_creation**: Contains functions to generate factoids, which are small structured pieces of information extracted from sources.  
- **resource_rooting**: Manages the rooting or linking of resources, possibly resolving references and establishing connections between entities.  
- **evolution_construction**: Deals with constructing or evolving entities and data models over time, including temporal versioning.  
- **fact_graph_construction**: Constructs the final knowledge graph by combining all processed data, factoids, and relationships.

### Notes

- The `scripts` folder is added to the Python path dynamically to ensure that modules can be imported both in notebooks and standalone scripts.  
- Imports use **absolute paths from the `scripts` package**, which guarantees consistency and portability across the project.

```python
from scripts.utils import file_management as fm
from scripts.utils import time_processing as tp
from scripts.graph_construction import graphdb as gd
from scripts.graph_construction import attribute_version_comparisons as avc
from scripts.graph_construction import multi_sources_processing as msp
from scripts.graph_construction import factoids_creation as fc
from scripts.graph_construction import resource_rooting as rr
from scripts.graph_construction import evolution_construction as ec
from scripts.graph_construction import fact_graph_construction as fgc
```


In [None]:
# ------------------------------------------------------------
# Set up the project root and add it to sys.path
# ------------------------------------------------------------

# `python_code_folder` should point to the 'scripts' folder
scripts_folder = os.path.abspath(python_code_folder)

# Get the parent folder of 'scripts' (the project root)
project_root = os.path.dirname(scripts_folder)

# Add the project root to sys.path if it's not already there
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# ------------------------------------------------------------
# Import modules from the scripts package
# ------------------------------------------------------------

# Utilities
from scripts.utils import file_management as fm
from scripts.utils import time_processing as tp

# Graph construction modules
from scripts.graph_construction import graphdb as gd
from scripts.graph_construction import attribute_version_comparisons as avc
from scripts.graph_construction import multi_sources_processing as msp
from scripts.graph_construction import factoids_creation as fc
from scripts.graph_construction import resource_rooting as rr
from scripts.graph_construction import evolution_construction as ec
from scripts.graph_construction import fact_graph_construction as fgc

# Evaluation modules
from scripts.evaluation import create_streetnumber_factoids as csf
from scripts.evaluation import create_addr_links as cal
from scripts.evaluation import evaluate_streetnumber_versions as esv
from scripts.evaluation import evaluate_streetnumber_fragmentary as esf

## Creation of folders if they don't exist

In [None]:
fm.create_folder_if_not_exists(tmp_folder)

### Creating the local repository in GraphDB
GraphDB must be running and reachable at `graphdb_url` for this step to work. If the repository already exists, it is reinitialized rather than created.

### Options

- **`allow_removal`**: If set to `False`, the repository is not removed during reinitialization; it is simply emptied. This is useful when deleting the repository fails, since the repository is cleared without being deleted and recreated.
- **`disable_same_as`**: When set to `True`, this option disables the use of `sameAs` in the reasoning process.

### Creation Process

The function `gd.reinitialize_repository` is used to reinitialize the repository. When called, it ensures the repository is set up according to the provided configurations (e.g., `local_config_file`, `ruleset_name`). If the repository already exists, it is reinitialized without removing it (if `allow_removal` is set to `False`).

- **`graphdb_url`**: The URL pointing to the running GraphDB instance.
- **`repository_name`**: The name of the repository to be reinitialized.
- **`local_config_file`**: The configuration file used to initialize the repository.
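- **`ruleset_name`**: The name of a built-in ruleset used for reasoning, such as `"rdfsplus-optimized"` or `"owl2-rl-optimized"`. When it is omitted, no inference is applied.
- **`ruleset_file`**: A custom ruleset file (here `ruleset_file`), which can be passed instead of `ruleset_name`.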
- **`allow_removal`**: Controls whether the repository can be deleted and recreated (`False` will just empty it).

For the creation of the repository to work, GraphDB must be running, and the provided `graphdb_url` must be valid. If the repository already exists, it will be reinitialized without further action.
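
Because this step fails when GraphDB is not running, a quick reachability check can be run first. The sketch below is an assumption-laden illustration: `graphdb_is_reachable` is a hypothetical helper, not a function of the project's `graphdb` module, and it assumes that the instance exposes the Workbench REST endpoint `/rest/repositories` without authentication.

```python
import urllib.error
import urllib.request


def graphdb_is_reachable(graphdb_url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers successfully on its repository listing."""
    try:
        # Assumed endpoint: GraphDB Workbench REST API, repository listing.
        request = urllib.request.Request(
            graphdb_url.rstrip("/") + "/rest/repositories",
            headers={"Accept": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status or malformed URL.
        return False


if not graphdb_is_reachable("http://localhost:7200"):
    print("GraphDB does not seem to be running at http://localhost:7200")
```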


In [None]:
# Deleting a repository may fail. To avoid deletion on reset (deletion + (re)creation),
# set `allow_removal` to False, in which case the repository is just emptied.
allow_removal = False
disable_same_as = False

gd.reinitialize_repository(graphdb_url, repository_name, local_config_file, disable_same_as=disable_same_as, allow_removal=allow_removal) # No inference
# gd.reinitialize_repository(graphdb_url, repository_name, local_config_file, ruleset_name="rdfsplus-optimized", disable_same_as=disable_same_as, allow_removal=allow_removal)
# gd.reinitialize_repository(graphdb_url, repository_name, local_config_file, ruleset_name="owl2-rl-optimized", disable_same_as=disable_same_as, allow_removal=allow_removal)
# gd.reinitialize_repository(graphdb_url, repository_name, local_config_file, ruleset_file=ruleset_file, disable_same_as=disable_same_as, allow_removal=allow_removal)

## Loading ontologies

In [None]:
# Loading ontology in the repository
gd.load_ontologies(graphdb_url, repository_name, [ont_file], ontology_named_graph_name)
gd.add_named_graph_prefix_to_repository(graphdb_url, repository_name, "graph") # Add prefix for named graphs

## Definition of variables linked to sources

### Paris thoroughfares via Wikidata

* `wd` for "wikidata"
* `wdp_land` for "wikidata paris landmarks"
* `wdp_loc` for "wikidata paris locations"

In [None]:
# Name of the named graph where the factoid triples of Wikidata data are stored and constructed
wdp_named_graph_name = "wikidata"

# CSV file to store the result of the selection query
wdp_land_csv_file_name = "wd_paris_landmarks.csv"
wdp_land_csv_file = os.path.join(data_folder, wdp_land_csv_file_name)

# CSV file to store the result of the selection query
wdp_loc_csv_file_name = "wd_paris_locations.csv"
wdp_loc_csv_file = os.path.join(data_folder, wdp_loc_csv_file_name)

# TTL file for structuring knowledge of the Paris thoroughfares
wdp_kg_file_name = "wd_paris.ttl"
wdp_kg_file = os.path.join(tmp_folder, wdp_kg_file_name)

# Time interval of validity of the source (there is no end time)
wdp_valid_time = {
    "start" : {"stamp":"2024-08-26T00:00:00Z","precision":"day","calendar":"gregorian"}
    }

wdp_source = {
    "label": "Wikidata",
    "lang": "mul"
}

### Nomenclature of Paris thoroughfares (Ville de Paris data)

The City of Paris data is made up of two sets:
* [names of current street rights-of-way](https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles)
* [obsolete street names](https://opendata.paris.fr/explore/dataset/denominations-des-voies-caduques)

Current thoroughfares come with a geometry (their right-of-way), unlike the obsolete ones.

* `vpt` for "ville paris thoroughfares"
* `vpta` for "ville paris thoroughfares actuelles"
* `vptc` for "ville paris thoroughfares caduques"

In [None]:
# Name of the named graph where the factoid triples of Ville de Paris data are stored and constructed
vpt_named_graph_name = "ville_de_paris"

# CSV files containing data
vpta_csv_file_name = "denominations-emprises-voies-actuelles.csv"
vpta_csv_file = os.path.join(data_folder, vpta_csv_file_name)
vptc_csv_file_name = "denominations-des-voies-caduques.csv"
vptc_csv_file = os.path.join(data_folder, vptc_csv_file_name)

# TTL file for structuring knowledge of the Paris thoroughfares
vpt_kg_file_name = "voies_paris.ttl"
vpt_kg_file = os.path.join(tmp_folder, vpt_kg_file_name)

# Time interval of validity of the source (the end time is the current timestamp)
vpta_valid_time = {
    "start" : {"stamp":"2025-04-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":tp.get_current_timestamp(),"precision":"day","calendar":"gregorian"}
    }

# Description of the source for current thoroughfares of Ville de Paris
vpta_source = {
    "uri": "https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles/",
    "label" : "Dénominations des emprises des voies actuelles",
    "lang":"fr",
    "publisher" : {
        "label": "Département de la Topographie et de la Documentation Foncière de la Ville de Paris"
    }
}

# Description of the source for obsolete thoroughfares of Ville de Paris
vptc_source = {
    "uri": "https://opendata.paris.fr/explore/dataset/denominations-des-voies-caduques/",
    "label" : "Dénominations caduques des voies",
    "lang":"fr",
    "publisher" : {
        "label": "Département de la Topographie et de la Documentation Foncière de la Ville de Paris"
    }
}

### Base Adresse Nationale (BAN)

Data from the [Base Adresse Nationale (BAN)](https://adresse.data.gouv.fr/base-adresse-nationale) (National Address Base), available [here](https://adresse.data.gouv.fr/data/ban/adresses/latest/csv).

* `bpa` for "BAN paris addresses"

In [None]:
# Name of the named graph where the factoid triples of BAN data are stored and constructed
bpa_named_graph_name = "ban_adresses"

# CSV file containing data
bpa_csv_file_name = "ban_adresses.csv"
bpa_csv_file = os.path.join(data_folder, bpa_csv_file_name)

# TTL file for structuring knowledge of Paris addresses
bpa_kg_file_name = "ban_adresses.ttl"
bpa_kg_file = os.path.join(tmp_folder, bpa_kg_file_name)

# Time interval of validity of the source
bpa_valid_time = {
    "start" : {"stamp":"2024-01-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":"2025-01-01T00:00:00Z","precision":"day","calendar":"gregorian"}
    }

bpa_source = {
    "label" : "Base Adresse Nationale",
    "lang":"fr",
    "publisher" : {
        "label": "DINUM / ANCT / IGN"
    }
}

### OpenStreetMap (OSM)

Extracting data from OpenStreetMap

In [None]:
# Name of the named graph where the factoid triples of OSM data are stored and constructed
osm_named_graph_name = "osm"

# CSV files containing data
osm_csv_file_name = "osm_adresses.csv"
osm_csv_file = os.path.join(data_folder, osm_csv_file_name)
osm_hn_csv_file_name = "osm_hn_adresses.csv"
osm_hn_csv_file = os.path.join(data_folder, osm_hn_csv_file_name)

# TTL file for structuring knowledge of OSM addresses
osm_kg_file_name = "osm_adresses.ttl"
osm_kg_file = os.path.join(tmp_folder, osm_kg_file_name)

# Time interval of validity of the source (the end time is the current timestamp)
osm_valid_time = {
    "start" : {"stamp":"2025-05-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":tp.get_current_timestamp(),"precision":"day","calendar":"gregorian"}
    }

osm_source = {
    "label" : "OpenStreetMap",
    "lang":"mul"
}

### Integration of data from GeoJSON files describing thoroughfares

These datasets are derived from the vectorisation of several historical maps of Paris, including:
- the Plan Delagrive (1728);
- the Verniquet atlas (1784–1791);
- the Vasserot atlas (1810–1836);
- the Jacoubet atlas (1836);
- Andriveau’s plan (1849);
- the Municipal Atlas map (1888).
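
Each map below is described by the same group of variables, including a valid-time dictionary whose `start` and `end` bounds share one shape (`stamp`, `precision`, `calendar`). As a sketch of that structure, the hypothetical helper `year_interval` builds such a dictionary from two years; it is an illustration only and is not used by the notebook, which writes the dictionaries out by hand.

```python
def year_interval(start_year: int, end_year: int) -> dict:
    """Build a valid-time dictionary with year precision in the Gregorian calendar."""

    def bound(year: int) -> dict:
        # Same keys as the hand-written dictionaries in the cells below.
        return {
            "stamp": f"{year:04d}-01-01T00:00:00Z",
            "precision": "year",
            "calendar": "gregorian",
        }

    return {"start": bound(start_year), "end": bound(end_year)}


# Interval used for the Andriveau plan of 1849
print(year_interval(1848, 1850))
```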


#### Plan Delagrive 1728

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
del_1728_th_named_graph_name = "plan_delagrive_1728_voies"

# GeoJSON file containing data
del_1728_th_geojson_file_name = "plan_delagrive_1728_voies.geojson"
del_1728_th_geojson_file = os.path.join(data_folder, del_1728_th_geojson_file_name)
del_1728_th_kg_file_name = "plan_delagrive_1728_voies.ttl"
del_1728_th_kg_file = os.path.join(tmp_folder, del_1728_th_kg_file_name)

del_1728_th_name_attribute = "nom"

# Description of the source within a dictionary
del_1728_th_source = {
    "lang" : "fr", 
    "uri" : "https://gallica.bnf.fr/ark:/12148/btv1b53085122h",
    "label" : "Neuvieme Plan de Paris. Ses accroissemens sous le Regne de Louis XV[...] ",
    "publisher" : {
        "label": "Delagrive"
        }
}

# Time interval of validity of the source
del_1728_th_valid_time = {
    "start" : {"stamp":"1727-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1729-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Verniquet atlas

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
ve_1790_th_named_graph_name = "atlas_verniquet_1791_voies"

# GeoJSON file containing data
ve_1790_th_geojson_file_name = "atlas_verniquet_1791_voies.geojson"
ve_1790_th_geojson_file = os.path.join(data_folder, ve_1790_th_geojson_file_name)
ve_1790_th_kg_file_name = "atlas_verniquet_1791_voies.ttl"
ve_1790_th_kg_file = os.path.join(tmp_folder, ve_1790_th_kg_file_name)

ve_1790_th_name_attribute = "nom"

# Description of the source within a dictionary
ve_1790_th_source = {
    "uri": "https://gallica.bnf.fr/ark:/12148/bpt6k3167995",
    "lang" : "fr", 
    "label" : "Atlas Général de la Ville",
    "publisher" : {
        "label": "Verniquet"
        }
}

# Time interval of validity of the source
ve_1790_th_valid_time = {
    "start" : {"stamp":"1784-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1791-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Vasserot atlas

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
va_1810_th_named_graph_name = "atlas_vasserot_1810_voies"

# GeoJSON file containing data
va_1810_th_geojson_file_name = "atlas_vasserot_1810_voies.geojson"
va_1810_th_geojson_file = os.path.join(data_folder, va_1810_th_geojson_file_name)
va_1810_th_kg_file_name = "atlas_vasserot_1810_voies.ttl"
va_1810_th_kg_file = os.path.join(tmp_folder, va_1810_th_kg_file_name)

va_1810_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
va_1810_th_source = {
    "uri": "https://www.fabriquenumeriquedupasse.fr/explore/dataset/alpage-voies-vasserot",
    "lang" : "fr", 
    "label" : "Cadastre de Paris par îlot : 1810-1836",
    "publisher" : {
        "label": "Vasserot"
        }
}

# Time interval of validity of the source
va_1810_th_valid_time = {
    "start" : {"stamp":"1810-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1836 Jacoubet Atlas streets

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
ja_1836_th_named_graph_name = "atlas_jacoubet_1836_voies"

# GeoJSON file containing data
ja_1836_th_geojson_file_name = "atlas_jacoubet_1836_voies.geojson"
ja_1836_th_geojson_file = os.path.join(data_folder, ja_1836_th_geojson_file_name)
ja_1836_th_kg_file_name = "atlas_jacoubet_1836_voies.ttl"
ja_1836_th_kg_file = os.path.join(tmp_folder, ja_1836_th_kg_file_name)

ja_1836_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
ja_1836_th_source = {
    "uri":"https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000212158",
    "lang" : "fr", 
    "label" : "Atlas général de la ville, des faubourgs et des monuments de Paris",
    "publisher" : {
        "label": "Jacoubet"
        }
}

# Time interval of validity of the source
ja_1836_th_valid_time = {
    "start" : {"stamp":"1827-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1838-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Andriveau plan

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
an_1849_th_named_graph_name = "plan_andriveau_1849_voies"

# GeoJSON file containing data
an_1849_th_geojson_file_name = "plan_andriveau_1849_voies.geojson"
an_1849_th_geojson_file = os.path.join(data_folder, an_1849_th_geojson_file_name)
an_1849_th_kg_file_name = "plan_andriveau_1849_voies.ttl"
an_1849_th_kg_file = os.path.join(tmp_folder, an_1849_th_kg_file_name)

an_1849_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
an_1849_th_source = {
    "lang" : "fr", 
    "label" : "Plan de Paris comprenant l'enceinte des fortifications",
    "publisher" : {
        "label": "Andriveau"
        }
}

# Time interval of validity of the source
an_1849_th_valid_time = {
    "start" : {"stamp":"1848-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1850-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1888 Municipal Atlas of Paris

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
am_1888_th_named_graph_name = "atlas_municipal_1888_voies"

# GeoJSON file containing data
am_1888_th_geojson_file_name = "atlas_municipal_1888_voies.geojson"
am_1888_th_geojson_file = os.path.join(data_folder, am_1888_th_geojson_file_name)
am_1888_th_kg_file_name = "atlas_municipal_1888_voies.ttl"
am_1888_th_kg_file = os.path.join(tmp_folder, am_1888_th_kg_file_name)

am_1888_th_name_attribute = "nom_1888"

# Description of the source within a dictionary
am_1888_th_source = {
    "uri":"https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000935116",
    "lang" : "fr", 
    "label" : "Plan de l'atlas municipal de 1888",
    "publisher" : {
        "label": "Poubelle"
        }
}

# Time interval of validity of the source
am_1888_th_valid_time = {
    "start" : {"stamp":"1887-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1889-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

### Integration of data from GeoJSON files describing street numbers

The integration of data from GeoJSON files describing street numbers is based on the vectorisation of several historical sources:
- the General cadastre of Paris (1807);
- the Vasserot atlas;
- the Jacoubet atlas;
- the Municipal Atlas map (1888).
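
For each of these sources, two property names are configured below: one holding the street number and one holding the thoroughfare name. The sketch that follows shows, under stated assumptions, how such a pair could be read from a GeoJSON `FeatureCollection` with the standard library. The function `iter_addresses` and the sample features are invented for illustration; only the property names `num_voies` and `nom_entier` come from this notebook, and the project's own parsing in `factoids_creation` may differ.

```python
import json

# Made-up sample: two point features, the second one lacking a street number.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"num_voies": "12", "nom_entier": "rue de Charonne"},
     "geometry": {"type": "Point", "coordinates": [2.38, 48.85]}},
    {"type": "Feature",
     "properties": {"nom_entier": "rue de Charonne"},
     "geometry": {"type": "Point", "coordinates": [2.38, 48.85]}}
  ]
}
"""


def iter_addresses(geojson_text: str, sn_property: str, th_property: str):
    """Yield (street number, thoroughfare name) pairs, skipping incomplete features."""
    data = json.loads(geojson_text)
    for feature in data.get("features", []):
        properties = feature.get("properties") or {}
        number = properties.get(sn_property)
        street = properties.get(th_property)
        if number is None or street is None:
            continue  # an address needs both a number and a thoroughfare name
        yield str(number).strip(), str(street).strip()


for number, street in iter_addresses(SAMPLE_GEOJSON, "num_voies", "nom_entier"):
    print(number, street)
```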


#### 1807 General cadastre of Paris

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
cad_1807_addr_named_graph_name = "cadastre_paris_1807_adresses"

# GeoJSON file containing data
cad_1807_addr_geojson_file_name = "cadastre_paris_1807_adresses.geojson"
cad_1807_addr_geojson_file = os.path.join(data_folder, cad_1807_addr_geojson_file_name)
cad_1807_addr_kg_file_name = "cadastre_paris_1807_adresses.ttl"
cad_1807_addr_kg_file = os.path.join(tmp_folder, cad_1807_addr_kg_file_name)

# Attribute names of the street number value and the street name
cad_1807_addr_sn_name_property = "NUMERO TXT"
cad_1807_addr_th_name_property = "NOM_SAISI"

# Description of the source within a dictionary
cad_1807_addr_source = {
    "lang" : "fr", 
    "label" : "Adresses du cadastre général de Paris de 1807",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
cad_1807_addr_valid_time = {
    "start" : {"stamp":"1806-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1808-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Vasserot atlas

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
va_1810_addr_named_graph_name = "atlas_vasserot_1810_adresses"

# GeoJSON file containing data
va_1810_addr_geojson_file_name = "atlas_vasserot_1810_adresses.geojson"
va_1810_addr_geojson_file = os.path.join(data_folder, va_1810_addr_geojson_file_name)
va_1810_addr_kg_file_name = "atlas_vasserot_1810_adresses.ttl"
va_1810_addr_kg_file = os.path.join(tmp_folder, va_1810_addr_kg_file_name)

# Attribute names of the street number value and the street name
va_1810_addr_sn_name_property = "num_voies"
va_1810_addr_th_name_property = "nom_entier"

# Description of the source within a dictionary
va_1810_addr_source = {
    "uri": "https://www.fabriquenumeriquedupasse.fr/explore/dataset/alpage-adresses-vasserot",
    "lang" : "fr", 
    "label" : "Cadastre de Paris par îlot : 1810-1836",
    "publisher" : {
        "label": "Vasserot"
        }
}

# Time interval of validity of the source
va_1810_addr_valid_time = {
    "start" : {"stamp":"1810-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1836 Jacoubet Atlas

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
ja_1836_addr_named_graph_name = "atlas_jacoubet_1836_adresses"

# GeoJSON file containing data
ja_1836_addr_geojson_file_name = "atlas_jacoubet_1836_adresses.geojson"
ja_1836_addr_geojson_file = os.path.join(data_folder, ja_1836_addr_geojson_file_name)
ja_1836_addr_kg_file_name = "atlas_jacoubet_1836_adresses.ttl"
ja_1836_addr_kg_file = os.path.join(tmp_folder, ja_1836_addr_kg_file_name)

# Attribute names of the street number value and the street name
ja_1836_addr_sn_name_property = "num_voies"
ja_1836_addr_th_name_property = "nom_entier"

# Description of the source within a dictionary
ja_1836_addr_source = {
    "lang" : "fr", 
    "label" : "Atlas de la ville de Paris de Jacoubet de 1836",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
ja_1836_addr_valid_time = {
    "start" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1838-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1888 Municipal Atlas of Paris

In [None]:
# Name of the named graph where data factoid triples are stored and constructed
am_1888_addr_named_graph_name = "atlas_municipal_1888_adresses"

# GeoJSON file containing data
am_1888_addr_geojson_file_name = "atlas_municipal_1888_adresses.geojson"
am_1888_addr_geojson_file = os.path.join(data_folder, am_1888_addr_geojson_file_name)
am_1888_addr_kg_file_name = "atlas_municipal_1888_adresses.ttl"
am_1888_addr_kg_file = os.path.join(tmp_folder, am_1888_addr_kg_file_name)

# Attribute names of the street number value and the street name
am_1888_addr_sn_name_property = "numbers_va"
am_1888_addr_th_name_property = "normalised"

# Description of the source within a dictionary
am_1888_addr_source = {
    "lang" : "fr", 
    "label" : "Adresses du plan de l'atlas municipal de 1888",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
am_1888_addr_valid_time = {
    "start" : {"stamp":"1887-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1889-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

### Events

JSON file describing events

In [None]:
# Name of the named graph where the factoid triples of events data are stored and constructed
events_named_graph_name = "source_events"

# Event file containing data
events_json_file_name = "events.json"
events_json_file = os.path.join(data_folder, events_json_file_name)

# Final TTL file of factoids from events
events_kg_file_name = "events.ttl"
events_kg_file = os.path.join(tmp_folder, events_kg_file_name)

## Final and iterative process

### Create source graphs

For each source, factoids are created independently in separate named graphs.
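
Since every source goes through the same two steps (build a graph, then transfer it to its own named graph), the pattern can be sketched as a loop over source descriptions. This is only an illustration of the idea: `process_sources` is hypothetical, and the two callables passed to it are stubs standing in for the project's `fc.create_graph_from_...` functions and `msp.transfert_rdflib_graph_to_named_graph_repository`, whose behaviour is not reproduced here.

```python
from typing import Any, Callable, Iterable


def process_sources(
    sources: Iterable[dict],
    create_graph: Callable[[dict], Any],
    transfer: Callable[[Any, str, str], None],
) -> list:
    """Create one graph per source and hand it over to its own named graph."""
    processed = []
    for source in sources:
        graph = create_graph(source)  # step 1: build the factoids of this source
        transfer(graph, source["named_graph"], source["kg_file"])  # step 2: load them
        processed.append(source["named_graph"])
    return processed


# Stub demonstration: nothing is sent to GraphDB, the calls are only recorded.
sources = [
    {"named_graph": "plan_andriveau_1849_voies", "kg_file": "plan_andriveau_1849_voies.ttl"},
    {"named_graph": "atlas_municipal_1888_voies", "kg_file": "atlas_municipal_1888_voies.ttl"},
]
transferred = []
names = process_sources(
    sources,
    create_graph=lambda source: f"graph for {source['named_graph']}",
    transfer=lambda graph, name, kg_file: transferred.append((name, kg_file)),
)
print(names)
```

The notebook itself keeps the calls written out one by one, which makes it easy to comment out a single source while experimenting.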

In [None]:
# Process for Ville de Paris
g = fc.create_graph_from_ville_paris(vpta_csv_file, vptc_csv_file, vpta_valid_time, vpta_source, vptc_source, "fr")
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, vpt_named_graph_name, vpt_kg_file, "source", meta_named_graph_name)

# Process for BAN
g = fc.create_graph_from_paris_ban(bpa_csv_file, bpa_valid_time, bpa_source, "fr")
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, bpa_named_graph_name, bpa_kg_file, "source", meta_named_graph_name)

# Process for Wikidata
# fc.get_data_from_wikidata(wdp_land_csv_file, wdp_loc_csv_file)
g = fc.create_graph_from_wikidata(wdp_land_csv_file, wdp_loc_csv_file, wdp_source, lang)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, wdp_named_graph_name, wdp_kg_file, "source", meta_named_graph_name)

# Process for OpenStreetMap
g = fc.create_graph_from_osm(osm_csv_file, osm_hn_csv_file, osm_valid_time, osm_source, lang)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, osm_named_graph_name, osm_kg_file, "source", meta_named_graph_name)

# Process for streets of the Delagrive map
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    del_1728_th_geojson_file, lang, del_1728_th_valid_time, del_1728_th_source,
    del_1728_th_name_attribute, del_1728_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, del_1728_th_named_graph_name, del_1728_th_kg_file, "source", meta_named_graph_name)

# Process for streets of the Verniquet Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    ve_1790_th_geojson_file, lang, ve_1790_th_valid_time, ve_1790_th_source,
    ve_1790_th_name_attribute, ve_1790_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, ve_1790_th_named_graph_name, ve_1790_th_kg_file, "source", meta_named_graph_name)

# Process for streets of the Vasserot Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    va_1810_th_geojson_file, lang, va_1810_th_valid_time, va_1810_th_source,
    va_1810_th_name_attribute, va_1810_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, va_1810_th_named_graph_name, va_1810_th_kg_file, "source", meta_named_graph_name)

# Process for streets of the 1836 Jacoubet Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    ja_1836_th_geojson_file, lang, ja_1836_th_valid_time, ja_1836_th_source,
    ja_1836_th_name_attribute, ja_1836_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, ja_1836_th_named_graph_name, ja_1836_th_kg_file, "source", meta_named_graph_name)

# Process for streets of the Andriveau map
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    an_1849_th_geojson_file, lang, an_1849_th_valid_time, an_1849_th_source,
    an_1849_th_name_attribute, an_1849_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, an_1849_th_named_graph_name, an_1849_th_kg_file, "source", meta_named_graph_name)

# Process for streets of the 1888 Municipal Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(
    am_1888_th_geojson_file, lang, am_1888_th_valid_time, am_1888_th_source,
    am_1888_th_name_attribute, am_1888_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, am_1888_th_named_graph_name, am_1888_th_kg_file, "source", meta_named_graph_name)

# Process for addresses of the General Cadastre of Paris
g = fc.create_graph_from_geojson_states_of_streetnumbers(
    cad_1807_addr_geojson_file, lang, cad_1807_addr_valid_time, cad_1807_addr_source,
    cad_1807_addr_sn_name_property, cad_1807_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, cad_1807_addr_named_graph_name, cad_1807_addr_kg_file, "source", meta_named_graph_name)

# Process for addresses of the Vasserot Atlas
g = fc.create_graph_from_geojson_states_of_streetnumbers(
    va_1810_addr_geojson_file, lang, va_1810_addr_valid_time, va_1810_addr_source,
    va_1810_addr_sn_name_property, va_1810_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, va_1810_addr_named_graph_name, va_1810_addr_kg_file, "source", meta_named_graph_name)

# Process for addresses of the 1836 Jacoubet Atlas
g = fc.create_graph_from_geojson_states_of_streetnumbers(
    ja_1836_addr_geojson_file, lang, ja_1836_addr_valid_time, ja_1836_addr_source,
    ja_1836_addr_sn_name_property, ja_1836_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, ja_1836_addr_named_graph_name, ja_1836_addr_kg_file, "source", meta_named_graph_name)

# Process for addresses of the 1888 Municipal Atlas plan
g = fc.create_graph_from_geojson_states_of_streetnumbers(
    am_1888_addr_geojson_file, lang, am_1888_addr_valid_time, am_1888_addr_source,
    am_1888_addr_sn_name_property, am_1888_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, am_1888_addr_named_graph_name, am_1888_addr_kg_file, "source", meta_named_graph_name)

# Process for events
g = fc.create_graph_from_events(events_json_file)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, repository_name, events_named_graph_name, events_kg_file)

### Creation of the facts graph from the source graphs

#### Remove the facts and construction named graphs before importing factoids

In [None]:
gd.remove_named_graph(graphdb_url, repository_name, facts_named_graph_name)
msp.remove_construction_named_graphs(graphdb_url, repository_name)

#### Add definition of facts and construction named graph in meta named graph

In [None]:
construction_named_graphs = [inter_sources_named_graph_name, comp_named_graph_name, labels_named_graph_name]

for graph in construction_named_graphs:
    msp.add_construction_named_graph_to_repository(graphdb_url, repository_name, meta_named_graph_name, graph)

#### Build the final facts graph

In [None]:
fgc.build_fact_graph_from_sources(
    graphdb_url, repository_name, facts_named_graph_name, facts_named_graph_label,
    meta_named_graph_name, inter_sources_named_graph_name,
    labels_named_graph_name, pref_hidden_labels_ttl_file,
    tmp_named_graph_name,
    comp_named_graph_name, comp_tmp_file, comparison_settings,
    lang=lang
)

#### Get fragmentary data
Once the facts graph has been created, the next step generates synthetic fragmentary data to be inserted into the graph:
* states
* events

Two new facts graphs are then created: one by adding only the fragmentary states, and another by adding both the fragmentary states and events.

In [None]:
fragmentary_data_folder_name = "fragmentary_data"
fragmentary_data_folder = os.path.join(tmp_folder, fragmentary_data_folder_name)
fm.create_folder_if_not_exists(fragmentary_data_folder)

geometry_settings = {
    "epsg_code": "EPSG:2154",
    "max_distance": 10
}

# Sampling ratios for evaluation of fragmentary descriptions
version_sample_ratio = 0.6
change_sample_ratio = 0.6

csf.create_streetnumber_fragmentary_descriptions(
    graphdb_url, repository_name, facts_named_graph_name,
    fragmentary_data_folder, fragmentary_data_folder, geometry_settings,
    version_sample_ratio, change_sample_ratio)

### Fragmentary streetnumbers

JSON file describing states of fragmentary street numbers

In [None]:
# Name of the named graph in which the factoid triples from fragmentary street number states are stored and constructed
fragmentary_sn_states_named_graph_name = "fragmentary_states_streetnumbers"
fragmentary_sn_states_named_graph_label = "Faits construits en ajoutant des états fragmentaires de numéros de rue"

# JSON file containing the states of fragmentary street numbers
fragmentary_sn_states_json_file_name = "fragmentary_states_streetnumbers.json"
fragmentary_sn_states_json_file = os.path.join(fragmentary_data_folder, fragmentary_sn_states_json_file_name)

# Final TTL file of factoids from states of fragmentary street numbers
fragmentary_sn_states_kg_file_name = "fragmentary_states_streetnumbers.ttl"
fragmentary_sn_states_kg_file = os.path.join(tmp_folder, fragmentary_sn_states_kg_file_name)

JSON file describing events of fragmentary street numbers

In [None]:
# Name of the named graph in which the factoid triples from fragmentary street number events are stored and constructed
fragmentary_sn_events_named_graph_name = "fragmentary_events_streetnumbers"
fragmentary_sn_events_named_graph_name_label = "Faits construits en ajoutant des états et des événements fragmentaires de numéros de rue"

# JSON file containing the events of fragmentary street numbers
fragmentary_sn_events_json_file_name = "fragmentary_events_streetnumbers.json"
fragmentary_sn_events_json_file = os.path.join(fragmentary_data_folder, fragmentary_sn_events_json_file_name)

# Final TTL file of factoids from events of fragmentary street numbers
fragmentary_sn_events_kg_file_name = "fragmentary_events_streetnumbers.ttl"
fragmentary_sn_events_kg_file = os.path.join(tmp_folder, fragmentary_sn_events_kg_file_name)

Add the fragmentary data to the repository as two named graphs (one for states, the other for events).

In [None]:
# Process for fragmentary states streetnumbers
g = fc.create_graph_from_states(fragmentary_sn_states_json_file)
msp.transfert_rdflib_graph_to_named_graph_repository(
    g, graphdb_url, repository_name, fragmentary_sn_states_named_graph_name, fragmentary_sn_states_kg_file, "source", meta_named_graph_name)

# Process for fragmentary events streetnumbers (is_active=False to not activate events for the moment)
g = fc.create_graph_from_events(fragmentary_sn_events_json_file)
msp.transfert_rdflib_graph_to_named_graph_repository(
    g, graphdb_url, repository_name, fragmentary_sn_events_named_graph_name, fragmentary_sn_events_kg_file, "source", meta_named_graph_name, is_active=False)

Create a new facts graph by adding only the fragmentary states.

In [None]:
frag_sn_states_facts_named_graph_name = "facts_with_fragmentary_sn_states"

fgc.build_fact_graph_from_sources(
    graphdb_url, repository_name,
    frag_sn_states_facts_named_graph_name, fragmentary_sn_states_named_graph_label,
    meta_named_graph_name, inter_sources_named_graph_name,
    labels_named_graph_name, pref_hidden_labels_ttl_file,
    tmp_named_graph_name,
    comp_named_graph_name, comp_tmp_file, comparison_settings,
    lang=lang
)

Create a new facts graph by adding both the fragmentary states and events.

In [None]:
frag_sn_states_events_facts_named_graph_name = "facts_with_fragmentary_sn_states_and_events"

# Activate fragmentary events streetnumbers named graph in the repository before building new facts graph
msp.set_named_graph_active(graphdb_url, repository_name, fragmentary_sn_events_named_graph_name, meta_named_graph_name, active=True)

fgc.build_fact_graph_from_sources(
    graphdb_url, repository_name,
    frag_sn_states_events_facts_named_graph_name, fragmentary_sn_events_named_graph_name_label,
    meta_named_graph_name, inter_sources_named_graph_name,
    labels_named_graph_name, pref_hidden_labels_ttl_file,
    tmp_named_graph_name,
    comp_named_graph_name, comp_tmp_file, comparison_settings,
    lang=lang
)

# Evaluation

## Variables

In [None]:
# Variables
links_folder_name = "links"
db_config_file = "../configs/db_config.ini"
proj_config_file = "../configs/project_config.ini"

# Create links folder if it does not exist
links_folder = os.path.join(tmp_folder, links_folder_name)
fm.create_folder_if_not_exists(links_folder)

# Output file paths for ground truth and unmatched street numbers
links_ground_truth = os.path.join(links_folder, "links_ground_truth.csv")
sn_without_link_ground_truth = os.path.join(links_folder, "sn_without_link_ground_truth.csv")

# List of address source names to be processed
source_names = [
    "cadastre_paris_1807_adresses",
    "atlas_vasserot_1810_adresses",
    "atlas_jacoubet_1836_adresses",
    "atlas_municipal_1888_adresses",
    "ban_adresses",
    "osm_adresses"
]

# Settings for each historical address source (GeoJSON)
sources_settings = [
    {
        'source_name': 'cadastre_paris_1807_adresses',
        'file': os.path.join(data_folder, 'cadastre_paris_1807_adresses.geojson'),
        'number_prop': 'NUMERO TXT',
        'street_name_prop': 'NOM_SAISI',
        'epsg_code': 2154
    },
    {
        'source_name': 'atlas_vasserot_1810_adresses',
        'file': os.path.join(data_folder, 'atlas_vasserot_1810_adresses.geojson'),
        'number_prop': 'num_voies',
        'street_name_prop': 'nom_entier',
        'epsg_code': 4326
    },
    {
        'source_name': 'atlas_jacoubet_1836_adresses',
        'file': os.path.join(data_folder, 'atlas_jacoubet_1836_adresses.geojson'),
        'number_prop': 'num_voies',
        'street_name_prop': 'nom_entier',
        'epsg_code': 2154
    },
    {
        'source_name': 'atlas_municipal_1888_adresses',
        'file': os.path.join(data_folder, 'atlas_municipal_1888_adresses.geojson'),
        'number_prop': 'numbers_va',
        'street_name_prop': 'normalised',
        'epsg_code': 2154
    }
]

# Settings for BAN (Base Adresse Nationale) CSV source
ban_settings = {
    'source_name': 'ban_adresses',
    'file': os.path.join(data_folder, 'ban_adresses.csv'),
    'number_prop': 'numero',
    'repetition_prop': 'rep',
    'street_name_prop': 'nom_voie',
    'lat_prop': 'lat',
    'lon_prop': 'lon',
    'epsg_code': 4326
}

# Settings for OSM (OpenStreetMap) CSV sources
osm_settings = {
    'source_name': 'osm_adresses',
    'file': os.path.join(data_folder, 'osm_adresses.csv'),
    'hn_file': os.path.join(data_folder, 'osm_hn_adresses.csv'),
    'join_prop': 'houseNumberId',
    'number_prop': 'houseNumberLabel',
    'street_name_prop': 'streetName',
    'geom_prop': 'houseNumberGeomWKT',
    'epsg_code': 4326
}

## Create links

From the various address sources (GeoJSON and CSV), create links between observations of the same street number across two consecutive sources. For each link, indicate whether the geometries of the corresponding numbers are similar, using a maximum distance threshold defined in `proj_config_file`.
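
The similarity flag described above can be sketched as follows. This is a minimal illustration, not the implementation behind `cal.create_links`: the function name `geometries_are_similar` and the sample coordinates are hypothetical, and both points are assumed to be already projected into the same metric CRS (such as EPSG:2154) so that the threshold is expressed in metres.

```python
import math


def geometries_are_similar(point_a, point_b, max_distance):
    """Return True if two projected points lie within max_distance of each other.

    Assumption: both points are (x, y) tuples in the same metric CRS,
    so the Euclidean distance is directly comparable to the threshold.
    """
    distance = math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1])
    return distance <= max_distance


# Hypothetical coordinates of one street number observed in two consecutive sources
observation_1836 = (651000.0, 6862000.0)
observation_1888 = (651003.0, 6862004.0)

similar = geometries_are_similar(observation_1836, observation_1888, max_distance=10)
```

Sources stored in EPSG:4326 would need to be reprojected before such a comparison, since distances in degrees are not comparable to a threshold in metres.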

In [None]:
cal.create_links(
    db_config_file, proj_config_file,
    sources_settings, ban_settings, osm_settings, source_names, links_folder
)

### Version Quality Metrics Explanation

The evaluation produces two sets of metrics for the reconstructed street number versions:

1. **Number of Versions Metric**  
   - Measures whether the reconstructed street number (SN) has the **same number of versions** as the reference (ground-truth) data.  
   - For each SN:
     - `true` → the number of versions matches the reference.  
     - `false` → the number of versions does not match.  
   - `total` → total number of street numbers evaluated.  
   - `IoU` (Intersection over Union) → fraction of street numbers with the correct number of versions.

2. **Sources Metric**  
   - Measures whether the **sources associated with each version** match the reference, ignoring a specified fragmentary source label.  
   - For each SN:
     - `true` → all versions have matching sources.  
     - `false` → at least one version has mismatched sources.  
   - `total` → total number of street numbers evaluated.  
   - `IoU` → fraction of street numbers with correctly matching sources.

These metrics help assess how well the reconstructed versions replicate both the **temporal granularity** (number of versions) and the **source provenance** of the original data.
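
As an illustration, the `true` / `false` / `total` / `IoU` summary can be computed as in the sketch below. It assumes, as the description above suggests, that `IoU` is the share of evaluated street numbers that match the reference. The function `summarise_matches` and the sample identifiers are hypothetical and do not reflect the internals of `esv.run_version_evaluation`.

```python
def summarise_matches(reconstructed, reference):
    """Summarise how many street numbers match their reference value.

    Both arguments are dicts keyed by a street number identifier.
    Only identifiers present in the reference are evaluated.
    """
    true_count = sum(
        1 for sn, expected in reference.items()
        if reconstructed.get(sn) == expected
    )
    total = len(reference)
    return {
        "true": true_count,
        "false": total - true_count,
        "total": total,
        # Assumption: IoU is the fraction of matching street numbers
        "IoU": true_count / total if total else 0.0,
    }


# Hypothetical number of versions per street number
reference_counts = {"sn_1": 2, "sn_2": 1, "sn_3": 3, "sn_4": 1}
reconstructed_counts = {"sn_1": 2, "sn_2": 2, "sn_3": 3, "sn_4": 1}

metrics = summarise_matches(reconstructed_counts, reference_counts)
```

The same helper applies to the sources metric if the dict values are, for example, tuples of frozen source sets instead of version counts.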


In [None]:
version_quality_metrics = esv.run_version_evaluation(graphdb_url, repository_name, facts_named_graph_name, data_folder, links_folder)

# Evaluation of the Impact of Fragmentary Knowledge Insertion

This section analyses the impact of inserting fragmentary states and events
into the knowledge graph on the reconstructed evolution of street numbers.
The evaluation focuses on two complementary dimensions:
(1) attribute version stability and
(2) temporal stability of detected changes.

---

## 1. Attribute Version Stability

### Objective

The first evaluation aims to verify whether the insertion of fragmentary
states and events alters the composition of geometry attribute versions.
More precisely, we check whether versions resulting from the merging of
the same set of sources remain unchanged after enrichment.

### Method

For each street number, we compare:
- the geometry attribute versions extracted from the reference graph,
- the versions extracted from graphs enriched with:
  - fragmentary states only,
  - fragmentary states and events.

Versions whose provenance includes fragmentary factoids are identified
using the source label  
**“Factoïdes générés pour les numéros de rue”**.

The evaluation distinguishes between:
- *unchanged versions*: versions identical to the reference,
- *modified versions*: versions whose source composition differs.
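
The distinction above can be sketched as a comparison of source sets. This is a hedged illustration rather than the code of `esf.run_fragmentary_evaluation`: the function `classify_versions` and the source names are hypothetical, and the sketch assumes that the fragmentary label is dropped before versions are compared.

```python
FRAG_LABEL = "Factoïdes générés pour les numéros de rue"


def classify_versions(reference_versions, enriched_versions, frag_label=FRAG_LABEL):
    """Count enriched versions as unchanged or modified.

    Each version is a set of source labels. The fragmentary label is removed
    before comparison (assumption), so a version is unchanged when its
    remaining sources equal those of some reference version.
    """
    reference_sets = {frozenset(version) for version in reference_versions}
    result = {"unchanged": 0, "modified": 0}
    for version in enriched_versions:
        sources = frozenset(version) - {frag_label}
        key = "unchanged" if sources in reference_sets else "modified"
        result[key] += 1
    return result


# Hypothetical geometry versions of one street number
reference = [{"source_a", "source_b"}, {"source_c"}]
enriched = [{"source_a", "source_b", FRAG_LABEL}, {"source_c", "source_d"}]

stability = classify_versions(reference, enriched)
```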

### Results

The results show that:

- In the **fragmentary states** configuration, the vast majority of geometry
  versions remain unchanged, indicating that the insertion of intermediate
  states does not affect the final merged representations.
- In the **fragmentary states and events** configuration, a similar behaviour
  is observed, with only a limited number of modified versions.

These results indicate that the enrichment process preserves the stability
of attribute version reconstruction, even when fragmentary knowledge is added.

---

## 2. Temporal Stability of Attribute Changes

### Objective

The second evaluation assesses whether inserting fragmentary states and events
modifies the temporal localisation of attribute changes.

The objective is to ensure that detected change instants remain consistent
with those obtained from the reference graph.

### Method

For each street number, we compare the change times extracted from:
- the reference graph,
- the enriched graphs.

Each change is classified as:
- *identical*: detected at the same time as in the reference graph,
- *shifted*: detected but at a different time,
- *missing*: present in the reference but absent in the enriched graph.
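
The three categories above can be sketched as follows. The sketch assumes that each change can be matched between the reference and enriched graphs through a shared identifier, which is a simplification. The function `classify_changes`, the identifiers and the dates are hypothetical.

```python
def classify_changes(reference_changes, enriched_changes):
    """Classify each reference change as identical, shifted or missing.

    Both arguments map a change identifier to the time at which the
    change is detected (here an ISO 8601 date string).
    """
    result = {"identical": 0, "shifted": 0, "missing": 0}
    for change_id, reference_time in reference_changes.items():
        if change_id not in enriched_changes:
            result["missing"] += 1
        elif enriched_changes[change_id] == reference_time:
            result["identical"] += 1
        else:
            result["shifted"] += 1
    return result


# Hypothetical change times for the geometry attribute of two street numbers
reference_changes = {
    "sn_1_change_1": "1836-01-01",
    "sn_1_change_2": "1888-01-01",
    "sn_2_change_1": "1810-01-01",
}
enriched_changes = {
    "sn_1_change_1": "1836-01-01",
    "sn_1_change_2": "1870-01-01",
}

change_stability = classify_changes(reference_changes, enriched_changes)
```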

### Results

The evaluation highlights that:

- Most changes detected in the reference graph are recovered at the same
  temporal positions after enrichment.
- Only a small number of changes are temporally shifted or missing,
  primarily due to the increased temporal granularity introduced by
  fragmentary states.

Overall, the results confirm that the insertion of fragmentary knowledge
does not significantly distort the temporal structure of attribute evolution.

---

## 3. Discussion

These evaluations demonstrate that the proposed enrichment strategy
introduces additional temporal detail without compromising the coherence
of reconstructed evolutions.

In particular:
- attribute versions remain stable despite the insertion of intermediate states,
- change detection remains temporally consistent.

This confirms the robustness of the PeGazUs approach when integrating
heterogeneous and fragmentary historical data.


In [None]:
# Detect street numbers that have been modified during the process
frag_source_label = "Factoïdes générés pour les numéros de rue"

fragmentary_evaluation_metrics = esf.run_fragmentary_evaluation(
    links_folder,
    graphdb_url,
    repository_name,
    facts_named_graph_name,
    frag_sn_states_facts_named_graph_name,
    frag_sn_states_events_facts_named_graph_name,
    frag_source_label
)

## Display evaluation results

### Results from first evaluation

In [None]:
esv.print_version_quality_metrics(version_quality_metrics)

### Results from second evaluation

In [None]:
esf.print_evaluation_tables(fragmentary_evaluation_metrics)