# Variables

## Import libraries

In [1]:
import os
import sys
from rdflib import URIRef

### File and Folder Definitions

This section defines various files and folders used in the process. These are divided into four categories:

1. **Existing files**: These are pre-existing files that are part of the project and are used as input or reference files during the process.
   - `ont_file_name`: The ontology file in Turtle format (`ontology.ttl`).
   - `ruleset_file_name`: The ruleset file in the PIE format (`rules.pie`).

2. **Files created during the process**: These files are generated during the execution of the process and stored in the `tmp_folder`.
   - `local_config_file_name`: Configuration file for the repository in Turtle format (`config_repo.ttl`).
   - `pref_hidden_labels_ttl_file_name`: File containing implicit preferred and hidden label triples for landmarks and name attribute versions in Turtle format (`pref_hidden_labels.ttl`).
   - `comp_tmp_file_name`: File for comparison triples in Turtle format (`comparisons.ttl`).

3. **Existing folders**: These are folders that already exist and store files used in the project.
   - `data_folder_name`: The folder containing the data files (`./data/sources`).

4. **Folder created during the process**: This folder is created during the process to store temporary files.
   - `tmp_folder_name`: Folder for temporary files during the process (`./tmp_files`).

### GraphDB Repository Name
- `facts_repository_name`: The name of the repository in GraphDB where the data is stored (`addresses_from_factoids_with_frag_states`).

### Named Graphs Definitions
These are the names of the named graphs in the GraphDB repository.
- `ontology_named_graph_name`: The named graph for the ontology (`ontology`).
- `facts_named_graph_name`: The named graph for the facts data (`facts`).
- `inter_sources_name_graph_name`: The named graph for inter-sources data (`inter_sources`).
- `comp_named_graph_name`: The named graph for comparison data (`comparisons`).
- `tmp_named_graph_name`: The named graph for temporary data (`temporary`).

### Label Language
- `lang`: The language tag used for labels (`fr`).

### URIs to Access GraphDB
- `str_graphdb_url`: The URL to access the local GraphDB instance (`http://localhost:7200`).

### Code Folder Path
- `py_code_folder_path`: The folder containing the Python code (the `code` folder). It is currently set to an absolute path, which must be adapted to the local checkout.

These variables are used throughout the process to refer to different files, folders, and named graphs in the GraphDB repository. They allow for a modular and flexible approach to handling the data and configuring the process steps.


In [2]:
# Existing files
ont_file_name = "ontology.ttl"
ruleset_file_name = "rules.pie"

# Files created during the process (in `tmp_folder`)
local_config_file_name = "config_repo.ttl"
pref_hidden_labels_ttl_file_name = "pref_hidden_labels.ttl"
comp_tmp_file_name = "comparisons.ttl"

# Existing folders
data_folder_name = "./data/sources"

# Folder created during the process
tmp_folder_name = "./tmp_files"

# GraphDB repository name
facts_repository_name = "addresses_from_factoids_with_frag_states"

# Names of the named graphs
ontology_named_graph_name = "ontology"
facts_named_graph_name = "facts"
inter_sources_name_graph_name = "inter_sources"
comp_named_graph_name = "comparisons"
tmp_named_graph_name = "temporary"

# Language for labels
lang = "fr"

# URL to access GraphDB
str_graphdb_url = "http://localhost:7200"

# Folder containing the project's Python modules (absolute path: adapt it to your environment)
py_code_folder_path = "/home/CBernard2/Projects/pegazus-extension/scripts/code"

## Processing Global Variables

In this section, we define and process various global variables related to file paths and configurations used throughout the process. This includes:

- **Obtaining absolute file paths**: Converts relative file paths, as defined in the previous section, into absolute paths. This ensures the program can correctly locate the files regardless of the current working directory.
    - `tmp_folder`: The absolute path for the temporary folder used to store intermediate files (`tmp_folder_name`).
    - `data_folder`: The absolute path for the folder containing the data files (`data_folder_name`).
    - `python_code_folder`: The absolute path for the folder containing the Python code (`py_code_folder_path`).
    - `local_config_file`: The absolute path for the local configuration file (`local_config_file_name`), located in the temporary folder.
    - `ont_file`: The absolute path for the ontology file (`ont_file_name`).
    - `ruleset_file`: The absolute path for the ruleset file (`ruleset_file_name`).
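    - `pref_hidden_labels_ttl_file`: The absolute path for the file containing implicit preferred and hidden label triples (`pref_hidden_labels_ttl_file_name`), located in the temporary folder.
    - `comp_tmp_file`: The absolute path for the comparisons file (`comp_tmp_file_name`), located in the temporary folder.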

- **Creating a temporary folder**: If the folder specified by `tmp_folder_name` does not already exist, the program will create it to store files that are intended to be deleted after processing.

- **Creating an RDFLib object for `graphdb_url`**: This step converts the string representing the GraphDB URL (`str_graphdb_url`) into an RDFLib `URIRef` object. This object can be used in RDF queries and updates to interact with the GraphDB instance.
    - `graphdb_url`: An RDFLib `URIRef` representing the GraphDB URL (`str_graphdb_url`).

These steps help to set up the working environment by ensuring that the necessary paths and configurations are ready for the process to begin.
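The path handling and folder creation described above can be sketched with the standard library alone. This is a minimal sketch, assuming that `fm.create_folder_if_not_exists` behaves like `os.makedirs(..., exist_ok=True)`; the helper `resolve_paths` is hypothetical and not part of the project's `code` folder.

```python
import os
import tempfile


def resolve_paths(base_dir, tmp_folder_name, file_names):
    """Return the absolute temporary folder and the absolute paths of files
    stored inside it, creating the folder if it does not exist yet.
    (Hypothetical helper, for illustration only.)"""
    tmp_folder = os.path.abspath(os.path.join(base_dir, tmp_folder_name))
    # exist_ok=True makes the call a no-op when the folder already exists
    os.makedirs(tmp_folder, exist_ok=True)
    return tmp_folder, {name: os.path.join(tmp_folder, name) for name in file_names}


with tempfile.TemporaryDirectory() as base:
    tmp, paths = resolve_paths(base, "tmp_files", ["config_repo.ttl", "pref_hidden_labels.ttl"])
    print(os.path.isdir(tmp), os.path.isabs(paths["config_repo.ttl"]))
```

Building every temporary file path from the single absolute `tmp_folder` keeps the results independent of the notebook's current working directory.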


In [3]:
tmp_folder = os.path.abspath(tmp_folder_name)
data_folder = os.path.abspath(data_folder_name)

python_code_folder = os.path.abspath(py_code_folder_path)

local_config_file = os.path.join(tmp_folder, local_config_file_name)
ont_file = os.path.abspath(ont_file_name)
ruleset_file = os.path.abspath(ruleset_file_name)
pref_hidden_labels_ttl_file = os.path.join(tmp_folder, pref_hidden_labels_ttl_file_name)
comp_tmp_file = os.path.join(tmp_folder, comp_tmp_file_name)

graphdb_url = URIRef(str_graphdb_url)

## Importing Python Modules

This section imports various Python modules used throughout the project. The following modules are imported from the `code` folder:

- **file_management** (`fm`): Provides functions for managing files, such as reading, writing, and manipulating file paths.
- **graphdb** (`gd`): Handles interactions with GraphDB, including querying, updating, and managing named graphs.
- **attribute_version_comparisons** (`avc`): Facilitates the comparison of different versions of attributes in the RDF data.
- **multi_sources_processing** (`msp`): Contains methods for processing data from multiple sources, likely aggregating or reconciling them.
- **factoids_creation** (`fc`): Includes functions for creating factoids, i.e. small pieces of information extracted or derived from sources.
- **time_processing** (`tp`): Handles time-based data and timestamps (e.g. `tp.get_current_timestamp()`).
- **resource_rooting** (`rr`): Likely handles the transfer of resources, possibly moving or copying data between systems or storage locations.
- **evolution_construction** (`ec`): Likely deals with constructing the evolution of entities over time.

These modules provide the necessary functionality for managing files, interacting with a graph database, processing data, and performing other domain-specific tasks related to the project.
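Importing these modules only works because the `code` folder is added to `sys.path` first. The mechanism can be sketched as follows; the module `demo_module` is a hypothetical stand-in written to a temporary folder, not one of the project's modules.

```python
import importlib
import os
import sys
import tempfile


def import_from_folder(folder, module_name):
    """Make `folder` importable (as sys.path.insert(1, ...) does in the
    notebook) and import `module_name` from it."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        # Inserting at index 1 leaves the existing first entry of sys.path in place
        sys.path.insert(1, folder)
    # Needed when module files are created after the interpreter started
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as code_dir:
    # Hypothetical module standing in for e.g. file_management.py
    with open(os.path.join(code_dir, "demo_module.py"), "w", encoding="utf-8") as f:
        f.write("def greet():\n    return 'hello from demo_module'\n")
    demo = import_from_folder(code_dir, "demo_module")
    print(demo.greet())  # prints: hello from demo_module
```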


In [4]:
# Add the `code` folder, which contains the project's Python modules, to the import path
sys.path.insert(1, python_code_folder)

import file_management as fm
import graphdb as gd
import attribute_version_comparisons as avc
import multi_sources_processing as msp
import factoids_creation as fc
import time_processing as tp
import resource_rooting as rr
import evolution_construction as ec

## Creation of folders if they don't exist

In [5]:
fm.create_folder_if_not_exists(tmp_folder)

### Creating the local repository in GraphDB
For the creation to work, GraphDB must be running and the URL given by `graphdb_url` must be reachable. If the repository already exists, it is reinitialized according to the options below.

### Options

- **`allow_removal`**: If set to `False`, the repository will not be removed during the reinitialization process; it will simply be emptied. This is useful when deleting the repository fails, since the repository is then cleared without being deleted and recreated.
- **`disable_same_as`**: When set to `True`, this option disables the use of `sameAs` in the reasoning process.

### Creation Process

The function `gd.reinitialize_repository` is used to reinitialize the repository. When called, it sets up the repository according to the provided configuration (e.g., `local_config_file`, `ruleset_name`). If the repository already exists and `allow_removal` is `False`, it is emptied rather than removed.

- **`graphdb_url`**: The URL pointing to the running GraphDB instance.
- **`facts_repository_name`**: The name of the repository to be reinitialized.
- **`local_config_file`**: The configuration file used to initialize the repository.
- **`ruleset_name`**: The name of a built-in ruleset used for reasoning, such as `"rdfsplus-optimized"` (used below) or `"owl2-rl-optimized"`. Alternatively, a custom ruleset file (e.g. `rules.pie`) can be passed with `ruleset_file`.
- **`disable_same_as`**: Whether `sameAs` reasoning is disabled.
- **`allow_removal`**: Controls whether the repository can be deleted and recreated (`False` will just empty it).
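The effect of `allow_removal` can be illustrated by listing the HTTP calls a reset could issue. This is a sketch only: it does not reproduce `gd.reinitialize_repository`, the function `plan_reinitialization` is hypothetical, and the endpoints in its docstring are assumptions based on the RDF4J protocol and the GraphDB REST API, to be checked against the documentation of the GraphDB version in use.

```python
def plan_reinitialization(graphdb_url, repository_id, exists, allow_removal):
    """Return the (method, url) calls a repository reset could issue.

    Assumed endpoints:
      DELETE {base}/repositories/{id}             -> delete the repository
      DELETE {base}/repositories/{id}/statements  -> remove all statements
      POST   {base}/rest/repositories             -> create it from a config file
    """
    base = graphdb_url.rstrip("/")
    repo = f"{base}/repositories/{repository_id}"
    create = ("POST", f"{base}/rest/repositories")
    if not exists:
        return [create]
    if allow_removal:
        # Deletion followed by (re)creation
        return [("DELETE", repo), create]
    # allow_removal=False: keep the repository and only empty it
    return [("DELETE", f"{repo}/statements")]


for flag in (True, False):
    print(flag, plan_reinitialization("http://localhost:7200",
                                      "addresses_from_factoids_with_frag_states", True, flag))
```

One consequence of emptying instead of recreating: the existing repository configuration is presumably kept, so a change of `ruleset_name` or `disable_same_as` would only take effect after a real deletion and recreation.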


In [6]:
# Deleting a repository may fail, so to avoid deletion at reset time (deletion + (re)creation),
# `allow_removal` must be False, in which case the repository is just emptied.
allow_removal = False
disable_same_as = False

gd.reinitialize_repository(graphdb_url, facts_repository_name, local_config_file, ruleset_name="rdfsplus-optimized", disable_same_as=disable_same_as, allow_removal=allow_removal)
# gd.reinitialize_repository(graphdb_url, facts_repository_name, local_config_file, ruleset_name="owl2-rl-optimized", disable_same_as=disable_same_as, allow_removal=allow_removal)
# gd.reinitialize_repository(graphdb_url, facts_repository_name, local_config_file, ruleset_file=ruleset_file, disable_same_as=disable_same_as, allow_removal=allow_removal)

## Local repository management

## Importing ontologies

In [7]:
gd.load_ontologies(graphdb_url, facts_repository_name, [ont_file], ontology_named_graph_name)
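Loading a Turtle file into a given named graph can be sketched with a plain HTTP request. This is not the code of `gd.load_ontologies`; it assumes the RDF4J REST protocol, where `POST /repositories/{id}/statements` adds data and the `context` parameter carries the named graph as an N-Triples encoded IRI. The repository name and the named graph URI below are hypothetical.

```python
import urllib.parse
import urllib.request


def build_load_request(graphdb_url, repository_id, turtle_bytes, named_graph_uri):
    """Build (without sending) a request adding Turtle data to one named graph."""
    # The context must be written as an N-Triples IRI, i.e. wrapped in <...>
    query = urllib.parse.urlencode({"context": f"<{named_graph_uri}>"})
    url = f"{graphdb_url.rstrip('/')}/repositories/{repository_id}/statements?{query}"
    return urllib.request.Request(
        url,
        data=turtle_bytes,
        headers={"Content-Type": "text/turtle"},
        method="POST",  # POST adds statements; PUT would replace them
    )


req = build_load_request(
    "http://localhost:7200",
    "demo_repo",
    b"<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
    "http://example.org/named_graph/ontology",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running GraphDB instance
```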

## Definition of variables linked to sources

### Paris thoroughfares via Wikidata

* `wd` for "wikidata"
* `wdp_land` for "wikidata paris landmarks"
* `wdp_loc` for "wikidata paris locations"

In [8]:
# Name of the named graph where the factoid triples of Wikidata data are stored and constructed
wdp_named_graph_name = "wikidata"

# CSV file storing the result of the landmark selection query
wdp_land_csv_file_name = "wd_paris_landmarks.csv"
wdp_land_csv_file = os.path.join(data_folder, wdp_land_csv_file_name)

# CSV file storing the result of the location selection query
wdp_loc_csv_file_name = "wd_paris_locations.csv"
wdp_loc_csv_file = os.path.join(data_folder, wdp_loc_csv_file_name)

# TTL file for structuring knowledge of the Paris thoroughfares
wdp_kg_file_name = "wd_paris.ttl"
wdp_kg_file = os.path.join(tmp_folder, wdp_kg_file_name)

# Time interval of validity of the source (no end time)
wdp_valid_time = {
    "start" : {"stamp":"2024-08-26T00:00:00Z","precision":"day","calendar":"gregorian"}
    }

wdp_source = {
    "label": "Wikidata",
    "lang": "mul"
}
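All validity intervals in this notebook share the same shape: a `start` bound, an optional `end` bound, each with a `stamp`, a `precision` and a `calendar`. The hypothetical helpers below build such dictionaries; `current_day_stamp` only assumes that `tp.get_current_timestamp()` returns a stamp in the same ISO 8601 form, which is not confirmed here.

```python
from datetime import datetime, timezone


def make_time_stamp(stamp, precision="day", calendar="gregorian"):
    """One bound of a validity interval, in the dictionary shape used above."""
    return {"stamp": stamp, "precision": precision, "calendar": calendar}


def current_day_stamp():
    """Today's date at midnight UTC, formatted like the stamps above."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT00:00:00Z")


def make_valid_time(start, end=None, precision="day"):
    """Build a validity interval; `end` is optional (as for Wikidata)."""
    valid_time = {"start": make_time_stamp(start, precision)}
    if end is not None:
        valid_time["end"] = make_time_stamp(end, precision)
    return valid_time


wdp = make_valid_time("2024-08-26T00:00:00Z")
osm = make_valid_time("2025-05-01T00:00:00Z", current_day_stamp())
print(wdp)
print(sorted(osm))
```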

### Nomenclature of Paris thoroughfares (Ville de Paris data)

The City of Paris data is made up of two sets:
* [names of current street rights-of-way](https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles)
* [obsolete street names](https://opendata.paris.fr/explore/dataset/denominations-des-voies-caduques)

Current thoroughfares have a geometric footprint (right-of-way), unlike obsolete ones.

* `vpt` for ‘ville paris thoroughfares’;
* `vpta` for ‘ville paris thoroughfares actuelles’ (current);
* `vptc` for ‘ville paris thoroughfares caduques’ (obsolete).

In [9]:
# Name of the named graph where the factoid triples of Ville de Paris data are stored and constructed
vpt_named_graph_name = "ville_de_paris"

# CSV files containing data
vpta_csv_file_name = "denominations-emprises-voies-actuelles.csv"
vpta_csv_file = os.path.join(data_folder, vpta_csv_file_name)
vptc_csv_file_name = "denominations-des-voies-caduques.csv"
vptc_csv_file = os.path.join(data_folder, vptc_csv_file_name)

# TTL file for structuring knowledge of the Paris thoroughfares
vpt_kg_file_name = "voies_paris.ttl"
vpt_kg_file = os.path.join(tmp_folder, vpt_kg_file_name)

# Time interval of validity of the source (the end is the current date)
vpta_valid_time = {
    "start" : {"stamp":"2025-04-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":tp.get_current_timestamp(),"precision":"day","calendar":"gregorian"}
    }

# Description of the source for current thoroughfares of Ville de Paris
vpta_source = {
    "uri": "https://opendata.paris.fr/explore/dataset/denominations-emprises-voies-actuelles/",
    "label" : "Dénominations des emprises des voies actuelles",
    "lang":"fr",
    "publisher" : {
        "label": "Département de la Topographie et de la Documentation Foncière de la Ville de Paris"
    }
}

# Description of the source for obsolete thoroughfares of Ville de Paris
vptc_source = {
    "uri": "https://opendata.paris.fr/explore/dataset/denominations-des-voies-caduques/",
    "label" : "Dénominations caduques des voies",
    "lang":"fr",
    "publisher" : {
        "label": "Département de la Topographie et de la Documentation Foncière de la Ville de Paris"
    }
}

### Base Adresse Nationale (BAN)

Data from the [Base Adresse Nationale (BAN)](https://adresse.data.gouv.fr/base-adresse-nationale) (National Address Base), available [here](https://adresse.data.gouv.fr/data/ban/adresses/latest/csv).

* `bpa` for ‘BAN paris addresses’

In [10]:
# Name of the named graph where the factoid triples of BAN data are stored and constructed
bpa_named_graph_name = "ban_adresses"

# CSV file containing data
bpa_csv_file_name = "ban_adresses.csv"
bpa_csv_file = os.path.join(data_folder, bpa_csv_file_name)

# TTL file for structuring knowledge of Paris addresses
bpa_kg_file_name = "ban_adresses.ttl"
bpa_kg_file = os.path.join(tmp_folder, bpa_kg_file_name)

# Time interval of validity of the source
bpa_valid_time = {
    "start" : {"stamp":"2024-01-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":"2025-01-01T00:00:00Z","precision":"day","calendar":"gregorian"}
    }

bpa_source = {
    "label" : "Base Adresse Nationale",
    "lang":"fr",
    "publisher" : {
        "label": "DINUM / ANCT / IGN"
    }
}

### OpenStreetMap (OSM)

Extracting data from OpenStreetMap

In [11]:
# Name of the named graph where the factoid triples of OSM data are stored and constructed
osm_named_graph_name = "osm"

# CSV files containing data
osm_csv_file_name = "osm_adresses.csv"
osm_csv_file = os.path.join(data_folder, osm_csv_file_name)
osm_hn_csv_file_name = "osm_hn_adresses.csv"
osm_hn_csv_file = os.path.join(data_folder, osm_hn_csv_file_name)

# TTL file for structuring knowledge of OSM addresses
osm_kg_file_name = "osm_adresses.ttl"
osm_kg_file = os.path.join(tmp_folder, osm_kg_file_name)

# Time interval of validity of the source (the end is the current date)
osm_valid_time = {
    "start" : {"stamp":"2025-05-01T00:00:00Z","precision":"day","calendar":"gregorian"},
    "end" : {"stamp":tp.get_current_timestamp(),"precision":"day","calendar":"gregorian"}
    }

osm_source = {
    "label" : "OpenStreetMap",
    "lang":"mul"
}

### Integration of data from Geojson files describing thoroughfares

These files are derived from the vectorisation of historical maps of Paris:
* Delagrive's plan of 1728;
* Verniquet's atlas (1784-1791);
* Vasserot's cadastre of Paris by block (1810-1836);
* Jacoubet's atlas of 1836;
* Andriveau's plan of 1849;
* the Municipal Atlas map of 1888.
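Each of these GeoJSON files is a collection of features whose street name sits in the property given by the corresponding `*_th_name_attribute` variable (`nom`, `nom_entier` or `nom_1888`). The sketch below reads those names with the standard `json` module; it is not the code of `fc.create_graph_from_geojson_states_of_thoroughfares`, and the sample feature collection is made up for illustration.

```python
import json


def read_feature_names(geojson_text, name_attribute):
    """Return the value of `name_attribute` for every feature that has one."""
    collection = json.loads(geojson_text)
    names = []
    for feature in collection.get("features", []):
        properties = feature.get("properties") or {}
        value = properties.get(name_attribute)
        if value:  # skip missing or empty names
            names.append(str(value).strip())
    return names


# Made-up sample, shaped like a vectorised thoroughfare layer
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"nom": "Rue Saint-Honoré"},
         "geometry": {"type": "LineString", "coordinates": [[2.33, 48.86], [2.34, 48.86]]}},
        {"type": "Feature", "properties": {"nom": None}, "geometry": None},
    ],
})
print(read_feature_names(sample, "nom"))
```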

#### Plan Delagrive 1728

In [12]:
# Name of the named graph where data factoid triples are stored and constructed
del_1728_th_named_graph_name = "plan_delagrive_1728_voies"

# GeoJSON file containing data
del_1728_th_geojson_file_name = "plan_delagrive_1728_voies.geojson"
del_1728_th_geojson_file = os.path.join(data_folder, del_1728_th_geojson_file_name)
del_1728_th_kg_file_name = "plan_delagrive_1728_voies.ttl"
del_1728_th_kg_file = os.path.join(tmp_folder, del_1728_th_kg_file_name)

del_1728_th_name_attribute = "nom"

# Description of the source within a dictionary
del_1728_th_source = {
    "lang" : "fr", 
    "uri" : "https://gallica.bnf.fr/ark:/12148/btv1b53085122h",
    "label" : "Neuvieme Plan de Paris. Ses accroissemens sous le Regne de Louis XV[...] ",
    "publisher" : {
        "label": "Delagrive"
        }
}

# Time interval of validity of the source
del_1728_th_valid_time = {
    "start" : {"stamp":"1727-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1729-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Verniquet atlas

In [13]:
# Name of the named graph where data factoid triples are stored and constructed
ve_1790_th_named_graph_name = "atlas_verniquet_1791_voies"

# GeoJSON file containing data
ve_1790_th_geojson_file_name = "atlas_verniquet_1791_voies.geojson"
ve_1790_th_geojson_file = os.path.join(data_folder, ve_1790_th_geojson_file_name)
ve_1790_th_kg_file_name = "atlas_verniquet_1791_voies.ttl"
ve_1790_th_kg_file = os.path.join(tmp_folder, ve_1790_th_kg_file_name)

ve_1790_th_name_attribute = "nom"

# Description of the source within a dictionary
ve_1790_th_source = {
    "uri": "https://gallica.bnf.fr/ark:/12148/bpt6k3167995",
    "lang" : "fr", 
    "label" : "Atlas Général de la Ville",
    "publisher" : {
        "label": "Verniquet"
        }
}

# Time interval of validity of the source
ve_1790_th_valid_time = {
    "start" : {"stamp":"1784-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1791-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Vasserot atlas

In [14]:
# Name of the named graph where data factoid triples are stored and constructed
va_1810_th_named_graph_name = "atlas_vasserot_1810_voies"

# GeoJSON file containing data
va_1810_th_geojson_file_name = "atlas_vasserot_1810_voies.geojson"
va_1810_th_geojson_file = os.path.join(data_folder, va_1810_th_geojson_file_name)
va_1810_th_kg_file_name = "atlas_vasserot_1810_voies.ttl"
va_1810_th_kg_file = os.path.join(tmp_folder, va_1810_th_kg_file_name)

va_1810_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
va_1810_th_source = {
    "uri": "https://www.fabriquenumeriquedupasse.fr/explore/dataset/alpage-voies-vasserot",
    "lang" : "fr", 
    "label" : "Cadastre de Paris par îlot : 1810-1836",
    "publisher" : {
        "label": "Vasserot"
        }
}

# Time interval of validity of the source
va_1810_th_valid_time = {
    "start" : {"stamp":"1810-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1836 Jacoubet Atlas streets

In [15]:
# Name of the named graph where data factoid triples are stored and constructed
ja_1836_th_named_graph_name = "atlas_jacoubet_1836_voies"

# GeoJSON file containing data
ja_1836_th_geojson_file_name = "atlas_jacoubet_1836_voies.geojson"
ja_1836_th_geojson_file = os.path.join(data_folder, ja_1836_th_geojson_file_name)
ja_1836_th_kg_file_name = "atlas_jacoubet_1836_voies.ttl"
ja_1836_th_kg_file = os.path.join(tmp_folder, ja_1836_th_kg_file_name)

ja_1836_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
ja_1836_th_source = {
    "uri":"https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000212158",
    "lang" : "fr", 
    "label" : "Atlas général de la ville, des faubourgs et des monuments de Paris",
    "publisher" : {
        "label": "Jacoubet"
        }
}

# Time interval of validity of the source
ja_1836_th_valid_time = {
    "start" : {"stamp":"1827-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1838-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### Andriveau atlas

In [16]:
# Name of the named graph where data factoid triples are stored and constructed
an_1849_th_named_graph_name = "plan_andriveau_1849_voies"

# GeoJSON file containing data
an_1849_th_geojson_file_name = "plan_andriveau_1849_voies.geojson"
an_1849_th_geojson_file = os.path.join(data_folder, an_1849_th_geojson_file_name)
an_1849_th_kg_file_name = "plan_andriveau_1849_voies.ttl"
an_1849_th_kg_file = os.path.join(tmp_folder, an_1849_th_kg_file_name)

an_1849_th_name_attribute = "nom_entier"

# Description of the source within a dictionary
an_1849_th_source = {
    "lang" : "fr", 
    "label" : "Plan de Paris comprenant l'enceinte des fortifications",
    "publisher" : {
        "label": "Andriveau"
        }
}

# Time interval of validity of the source
an_1849_th_valid_time = {
    "start" : {"stamp":"1848-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1850-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1888 Municipal Atlas of Paris

In [17]:
# Name of the named graph where data factoid triples are stored and constructed
am_1888_th_named_graph_name = "atlas_municipal_1888_voies"

# GeoJSON file containing data
am_1888_th_geojson_file_name = "atlas_municipal_1888_voies.geojson"
am_1888_th_geojson_file = os.path.join(data_folder, am_1888_th_geojson_file_name)
am_1888_th_kg_file_name = "atlas_municipal_1888_voies.ttl"
am_1888_th_kg_file = os.path.join(tmp_folder, am_1888_th_kg_file_name)

am_1888_th_name_attribute = "nom_1888"

# Description of the source within a dictionary
am_1888_th_source = {
    "uri":"https://bibliotheques-specialisees.paris.fr/ark:/73873/pf0000935116",
    "lang" : "fr", 
    "label" : "Plan de l'atlas municipal de 1888",
    "publisher" : {
        "label": "Poubelle"
        }
}

# Time interval of validity of the source
am_1888_th_valid_time = {
    "start" : {"stamp":"1887-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1889-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

### Integration of data from Geojson files describing street numbers

#### 1807 General cadastre of Paris

In [18]:
# Name of the named graph where data factoid triples are stored and constructed
cad_1807_addr_named_graph_name = "cadastre_paris_1807_adresses"

# GeoJSON file containing data
cad_1807_addr_geojson_file_name = "cadastre_paris_1807_adresses.geojson"
cad_1807_addr_geojson_file = os.path.join(data_folder, cad_1807_addr_geojson_file_name)
cad_1807_addr_kg_file_name = "cadastre_paris_1807_adresses.ttl"
cad_1807_addr_kg_file = os.path.join(tmp_folder, cad_1807_addr_kg_file_name)

# Attribute names of the street number value and the street name
cad_1807_addr_sn_name_property = "NUMERO TXT"
cad_1807_addr_th_name_property = "NOM_SAISI"

# Description of the source within a dictionary
cad_1807_addr_source = {
    "lang" : "fr", 
    "label" : "Adresses du cadastre général de Paris de 1807",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
cad_1807_addr_valid_time = {
    "start" : {"stamp":"1806-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1808-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}
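For street number layers, two properties are read per feature: the number (`*_addr_sn_name_property`, e.g. `NUMERO TXT`) and the name of its thoroughfare (`*_addr_th_name_property`, e.g. `NOM_SAISI`). A minimal sketch of that pairing, assuming point features like those of the cadastre layer; it does not reproduce `fc.create_graph_from_geojson_states_of_streetnumbers`, and the sample data is made up.

```python
import json


def read_street_numbers(geojson_text, sn_property, th_property):
    """Pair each street number with the name of its thoroughfare."""
    pairs = []
    for feature in json.loads(geojson_text).get("features", []):
        props = feature.get("properties") or {}
        number, street = props.get(sn_property), props.get(th_property)
        # Keep only features carrying both a number and a street name
        if number is not None and street:
            pairs.append((str(number).strip(), str(street).strip()))
    return pairs


# Made-up sample using the property names of the 1807 cadastre layer
sample = json.dumps({"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"NUMERO TXT": "12", "NOM_SAISI": "rue Saint-Denis"},
     "geometry": {"type": "Point", "coordinates": [2.35, 48.86]}},
    {"type": "Feature", "properties": {"NUMERO TXT": "3 bis", "NOM_SAISI": ""},
     "geometry": {"type": "Point", "coordinates": [2.35, 48.86]}},
]})
print(read_street_numbers(sample, "NUMERO TXT", "NOM_SAISI"))
```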

#### Vasserot atlas

In [19]:
# Name of the named graph where data factoid triples are stored and constructed
va_1810_addr_named_graph_name = "atlas_vasserot_1810_adresses"

# GeoJSON file containing data
va_1810_addr_geojson_file_name = "atlas_vasserot_1810_adresses.geojson"
va_1810_addr_geojson_file = os.path.join(data_folder, va_1810_addr_geojson_file_name)
va_1810_addr_kg_file_name = "atlas_vasserot_1810_adresses.ttl"
va_1810_addr_kg_file = os.path.join(tmp_folder, va_1810_addr_kg_file_name)

# Attribute names of the street number value and the street name
va_1810_addr_sn_name_property = "num_voies"
va_1810_addr_th_name_property = "nom_entier"

# Description of the source within a dictionary
va_1810_addr_source = {
    "uri": "https://www.fabriquenumeriquedupasse.fr/explore/dataset/alpage-adresses-vasserot",
    "lang" : "fr", 
    "label" : "Cadastre de Paris par îlot : 1810-1836",
    "publisher" : {
        "label": "Vasserot"
        }
}

# Time interval of validity of the source
va_1810_addr_valid_time = {
    "start" : {"stamp":"1810-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1836 Jacoubet Atlas

In [20]:
# Name of the named graph where data factoid triples are stored and constructed
ja_1836_addr_named_graph_name = "atlas_jacoubet_1836_adresses"

# GeoJSON file containing data
ja_1836_addr_geojson_file_name = "atlas_jacoubet_1836_adresses.geojson"
ja_1836_addr_geojson_file = os.path.join(data_folder, ja_1836_addr_geojson_file_name)
ja_1836_addr_kg_file_name = "atlas_jacoubet_1836_adresses.ttl"
ja_1836_addr_kg_file = os.path.join(tmp_folder, ja_1836_addr_kg_file_name)

# Attribute names of the street number value and the street name
ja_1836_addr_sn_name_property = "num_voies"
ja_1836_addr_th_name_property = "nom_entier"

# Description of the source within a dictionary
ja_1836_addr_source = {
    "lang" : "fr", 
    "label" : "Atlas de la ville de Paris de Jacoubet de 1836",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
ja_1836_addr_valid_time = {
    "start" : {"stamp":"1836-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1838-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

#### 1888 Municipal Atlas of Paris

In [21]:
# Name of the named graph where data factoid triples are stored and constructed
am_1888_addr_named_graph_name = "atlas_municipal_1888_adresses"

# GeoJSON file containing data
am_1888_addr_geojson_file_name = "atlas_municipal_1888_adresses.geojson"
am_1888_addr_geojson_file = os.path.join(data_folder, am_1888_addr_geojson_file_name)
am_1888_addr_kg_file_name = "atlas_municipal_1888_adresses.ttl"
am_1888_addr_kg_file = os.path.join(tmp_folder, am_1888_addr_kg_file_name)

# Attribute names of the street number value and the street name
am_1888_addr_sn_name_property = "numbers_va"
am_1888_addr_th_name_property = "normalised"

# Description of the source within a dictionary
am_1888_addr_source = {
    "lang" : "fr", 
    "label" : "Adresses du plan de l'atlas municipal de 1888",
    "publisher" : {
        "label": "Ville de Paris"
        }
}

# Time interval of validity of the source
am_1888_addr_valid_time = {
    "start" : {"stamp":"1887-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
    "end" : {"stamp":"1889-01-01T00:00:00Z","precision":"year","calendar":"gregorian"},
}

### Events

JSON file describing events

In [22]:
# Name of the named graph where the factoid triples of events data are stored and constructed
events_named_graph_name = "source_events"

# JSON file containing the events data
events_json_file_name = "events.json"
events_json_file = os.path.join(data_folder, events_json_file_name)

# Final TTL file of factoids from events
events_kg_file_name = "events.ttl"
events_kg_file = os.path.join(tmp_folder, events_kg_file_name)

### Fragmentary street numbers

JSON file describing states of fragmentary street numbers

In [23]:
# Name of the named graph where the factoid triples of fragmentary street numbers data are stored and constructed
fragmentary_sn_states_named_graph_name = "fragmentary_states_streetnumbers"

# JSON file containing the fragmentary street numbers data
fragmentary_sn_states_json_file_name = "fragmentary_states_streetnumbers.json"
fragmentary_sn_states_json_file = os.path.join(data_folder, fragmentary_sn_states_json_file_name)

# Final TTL file of factoids from states of fragmentary street numbers
fragmentary_sn_states_kg_file_name = "fragmentary_states_streetnumbers.ttl"
fragmentary_sn_states_kg_file = os.path.join(tmp_folder, fragmentary_sn_states_kg_file_name)

## Final and iterative process

### Define named graph URIs

In [24]:
facts_named_graph_uri = gd.get_named_graph_uri_from_name(graphdb_url, facts_repository_name, facts_named_graph_name)
inter_sources_name_graph_uri = gd.get_named_graph_uri_from_name(graphdb_url, facts_repository_name, inter_sources_name_graph_name)
tmp_named_graph_uri = gd.get_named_graph_uri_from_name(graphdb_url, facts_repository_name, tmp_named_graph_name)

For each source, factoids are created independently in a separate named graph.

In [25]:
# Process for Ville de Paris
g = fc.create_graph_from_ville_paris(vpta_csv_file, vptc_csv_file, vpta_valid_time, vpta_source, vptc_source, "fr")
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, vpt_named_graph_name, vpt_kg_file)

# Process for BAN
g = fc.create_graph_from_paris_ban(bpa_csv_file, bpa_valid_time, bpa_source, "fr")
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, bpa_named_graph_name, bpa_kg_file)

# Process for Wikidata
# fc.get_data_from_wikidata(wdp_land_csv_file, wdp_loc_csv_file)
g = fc.create_graph_from_wikidata(wdp_land_csv_file, wdp_loc_csv_file, wdp_source, lang)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, wdp_named_graph_name, wdp_kg_file)

# Process for OpenStreetMap
g = fc.create_graph_from_osm(osm_csv_file, osm_hn_csv_file, osm_valid_time, osm_source, lang)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, osm_named_graph_name, osm_kg_file)

# Process for streets of the Delagrive map
g = fc.create_graph_from_geojson_states_of_thoroughfares(del_1728_th_geojson_file, lang, del_1728_th_valid_time, del_1728_th_source,
                                                         del_1728_th_name_attribute, del_1728_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, del_1728_th_named_graph_name, del_1728_th_kg_file)

# Process for streets of the Verniquet Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(ve_1790_th_geojson_file, lang, ve_1790_th_valid_time, ve_1790_th_source,
                                                         ve_1790_th_name_attribute, ve_1790_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, ve_1790_th_named_graph_name, ve_1790_th_kg_file)

# Process for streets of the Vasserot Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(va_1810_th_geojson_file, lang, va_1810_th_valid_time, va_1810_th_source,
                                                         va_1810_th_name_attribute, va_1810_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, va_1810_th_named_graph_name, va_1810_th_kg_file)

# Process for streets of the 1836 Jacoubet Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(ja_1836_th_geojson_file, lang, ja_1836_th_valid_time, ja_1836_th_source,
                                                         ja_1836_th_name_attribute, ja_1836_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, ja_1836_th_named_graph_name, ja_1836_th_kg_file)

# Process for streets of the Andriveau map
g = fc.create_graph_from_geojson_states_of_thoroughfares(an_1849_th_geojson_file, lang, an_1849_th_valid_time, an_1849_th_source,
                                                         an_1849_th_name_attribute, an_1849_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, an_1849_th_named_graph_name, an_1849_th_kg_file)

# Process for streets of the 1888 Municipal Atlas
g = fc.create_graph_from_geojson_states_of_thoroughfares(am_1888_th_geojson_file, lang, am_1888_th_valid_time, am_1888_th_source,
                                                         am_1888_th_name_attribute, am_1888_th_name_attribute)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, am_1888_th_named_graph_name, am_1888_th_kg_file)

# Process for addresses of the General Cadastre of Paris
g = fc.create_graph_from_geojson_states_of_streetnumbers(cad_1807_addr_geojson_file, lang, cad_1807_addr_valid_time, cad_1807_addr_source,
                                                         cad_1807_addr_sn_name_property, cad_1807_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, cad_1807_addr_named_graph_name, cad_1807_addr_kg_file)

# Process for addresses of the Vasserot Atlas
g = fc.create_graph_from_geojson_states_of_streetnumbers(va_1810_addr_geojson_file, lang, va_1810_addr_valid_time, va_1810_addr_source,
                                                         va_1810_addr_sn_name_property, va_1810_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, va_1810_addr_named_graph_name, va_1810_addr_kg_file)

# Process for addresses of the 1836 Jacoubet Atlas
g = fc.create_graph_from_geojson_states_of_streetnumbers(ja_1836_addr_geojson_file, lang, ja_1836_addr_valid_time, ja_1836_addr_source,
                                                         ja_1836_addr_sn_name_property, ja_1836_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, ja_1836_addr_named_graph_name, ja_1836_addr_kg_file)

# Process for addresses of the 1888 Municipal Atlas
g = fc.create_graph_from_geojson_states_of_streetnumbers(am_1888_addr_geojson_file, lang, am_1888_addr_valid_time, am_1888_addr_source,
                                                         am_1888_addr_sn_name_property, am_1888_addr_th_name_property)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, am_1888_addr_named_graph_name, am_1888_addr_kg_file)

# Process for events
g = fc.create_graph_from_events(events_json_file)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, events_named_graph_name, events_kg_file)

# Process for fragmentary street number states
g = fc.create_graph_from_states(fragmentary_sn_states_json_file)
msp.transfert_rdflib_graph_to_named_graph_repository(g, graphdb_url, facts_repository_name, fragmentary_sn_states_named_graph_name, fragmentary_sn_states_kg_file)

### Insertion of factoids in the fact graph

#### Remove the facts, inter-sources and comparison named graphs before importing factoids

In [26]:
gd.remove_named_graph(graphdb_url, facts_repository_name, facts_named_graph_name)
gd.remove_named_graph(graphdb_url, facts_repository_name, inter_sources_name_graph_name)
gd.remove_named_graph(graphdb_url, facts_repository_name, comp_named_graph_name)

<Response [204]>
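The helpers in `gd` are not shown here. As a minimal sketch of what clearing a named graph can look like at the HTTP level, the block below builds (but does not send) a request against the RDF4J REST API that GraphDB exposes: a `DELETE` on the repository's `statements` endpoint with a `context` parameter removes every statement of that graph. This is an assumption about the mechanism, not the implementation of `gd.remove_named_graph`; the port `7200` (GraphDB's default) and the graph IRI are illustrative values.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_clear_graph_request(graphdb_url: str, repository: str, graph_uri: str) -> Request:
    """Build (without sending) an HTTP DELETE that removes all statements
    of one named graph through the RDF4J REST API served by GraphDB."""
    # `context` takes an N-Triples encoded IRI, i.e. the IRI wrapped in <...>.
    query = urlencode({"context": f"<{graph_uri}>"})
    url = f"{graphdb_url.rstrip('/')}/repositories/{quote(repository)}/statements?{query}"
    return Request(url, method="DELETE")


# Hypothetical values: GraphDB's default port and an illustrative graph IRI.
req = build_clear_graph_request("http://localhost:7200", "addresses_from_factoids",
                                "http://example.org/named_graph/facts")
print(req.get_method(), req.full_url)
# Sending it, e.g. with urllib.request.urlopen(req), needs a running server.
```

A successful delete on this endpoint is expected to return `204 No Content`, which would be consistent with the `<Response [204]>` output printed by the cell.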

#### Insertion of factoids in the fact graph

##### Add preferred and hidden labels

In [27]:
named_graph_names = [
    vpt_named_graph_name,
    bpa_named_graph_name,
    wdp_named_graph_name,
    osm_named_graph_name,
    del_1728_th_named_graph_name,
    ve_1790_th_named_graph_name,
    va_1810_th_named_graph_name,
    ja_1836_th_named_graph_name,
    an_1849_th_named_graph_name,
    am_1888_th_named_graph_name,
    cad_1807_addr_named_graph_name,
    va_1810_addr_named_graph_name,
    ja_1836_addr_named_graph_name,
    am_1888_addr_named_graph_name,
    events_named_graph_name,
    fragmentary_sn_states_named_graph_name
    ]

for named_graph_name in named_graph_names:
    factoids_named_graph_uri = gd.get_named_graph_uri_from_name(graphdb_url, facts_repository_name, named_graph_name)
    msp.add_pref_and_hidden_labels_for_elements(graphdb_url, facts_repository_name, factoids_named_graph_uri, pref_hidden_labels_ttl_file)
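In SKOS, `skos:prefLabel` holds the displayed name while `skos:hiddenLabel` holds forms that should be matched in searches but not displayed. The normalisation rules used by `msp.add_pref_and_hidden_labels_for_elements` are not shown, so the function below is a hypothetical sketch of how a hidden label could be derived from a name: lowercase, accents removed, punctuation replaced by spaces, whitespace collapsed.

```python
import re
import unicodedata


def hidden_label(name: str) -> str:
    """Hypothetical normalisation of a name into a matching-friendly hidden label."""
    # Decompose accented characters, then drop the combining marks (category Mn).
    decomposed = unicodedata.normalize("NFD", name)
    no_accents = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Replace punctuation such as hyphens and apostrophes with spaces.
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


for label in ["Rue Saint-Honoré", "rue  saint honore", "Rue de l'Échaudé"]:
    print(label, "->", hidden_label(label))
```

With such a label, spellings that differ only by accents, case or hyphenation across sources end up with the same hidden label.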

##### Link factoids with facts

In [28]:
rr.link_factoids_with_facts(graphdb_url, facts_repository_name, facts_named_graph_uri, inter_sources_name_graph_uri)
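The matching criteria of `rr.link_factoids_with_facts` are not shown. As a hypothetical illustration only, the sketch below groups factoid landmarks coming from different sources under a single fact landmark when they share the same type and hidden label; the tuples and `fact_N` identifiers are made up for the example.

```python
from collections import defaultdict


def group_factoids(factoids):
    """Map each factoid id to a fact id; factoids sharing (type, hidden label)
    are assumed to describe the same landmark."""
    groups = defaultdict(list)
    for _source, factoid_id, landmark_type, label in factoids:
        groups[(landmark_type, label)].append(factoid_id)
    links = {}
    # Create one fact landmark per group (sorted for deterministic ids).
    for i, key in enumerate(sorted(groups)):
        for factoid_id in groups[key]:
            links[factoid_id] = f"fact_{i}"
    return links


# Hypothetical factoids: (source, factoid id, landmark type, hidden label).
factoids = [
    ("osm", "osm:1", "Thoroughfare", "rue saint honore"),
    ("ve_1790", "ve:12", "Thoroughfare", "rue saint honore"),
    ("ja_1836", "ja:7", "Thoroughfare", "rue de l echaude"),
]
links = group_factoids(factoids)
print(links)
```

A real linking step would likely need stronger evidence than an exact label match (geometry, related landmarks), so this only shows the shape of the factoid-to-fact mapping.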

### Construction of entities evolution from multi-source data

#### Comparison of version values

In [29]:
comparison_settings = {
    "geom_similarity_coef": 0.85,
    "geom_buffer_radius": 5,
    "geom_crs_uri": URIRef('http://www.opengis.net/def/crs/EPSG/0/2154'),
}
avc.compare_attribute_versions(graphdb_url, facts_repository_name, comp_named_graph_name, comp_tmp_file, comparison_settings)
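EPSG:2154 (RGF93 / Lambert-93) uses metres, so `geom_buffer_radius: 5` presumably means 5 m. How `avc.compare_attribute_versions` combines the radius and `geom_similarity_coef` is not shown; the sketch below assumes one plausible reading for point geometries (e.g. street numbers): buffer both points into discs of that radius and treat the pair as similar when the intersection-over-union of the discs reaches the coefficient.

```python
import math


def buffered_point_similarity(p, q, radius):
    """Intersection-over-union of two discs of `radius` centred on p and q
    (coordinates assumed to be in a metric CRS such as EPSG:2154)."""
    d = math.dist(p, q)
    if d >= 2 * radius:
        return 0.0  # the buffers do not overlap
    # Area of the lens where the two discs overlap.
    lens = 2 * radius**2 * math.acos(d / (2 * radius)) - (d / 2) * math.sqrt(4 * radius**2 - d**2)
    union = 2 * math.pi * radius**2 - lens
    return lens / union


# Hypothetical use of the settings defined in the cell above.
coef, radius = 0.85, 5
for offset in (0.0, 0.5, 1.0, 12.0):
    sim = buffered_point_similarity((0.0, 0.0), (offset, 0.0), radius)
    print(f"offset={offset} m  similarity={sim:.3f}  similar={sim >= coef}")
```

Under this reading, a coefficient of 0.85 with a 5 m buffer tolerates only sub-metre displacements between two points, so the threshold is quite strict.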

#### Initialize missing landmark appearance and disappearance changes

* After all factoids have been imported, most changes describing the appearance and disappearance of landmarks are still missing, because they do not exist in the factoids named graphs.
* This step initializes these missing appearance and disappearance changes, together with their related events, and estimates when each of them happened. The appearance is assumed to have happened before the earliest time of reference of the landmark in the sources, and the disappearance after the latest time of reference.
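The estimation rule can be sketched in plain Python. This is an illustration, not the code of `ec.initialize_missing_changes_and_events_for_landmarks`: reference times are simplified to years, and the `ChangeBounds` class is hypothetical. Since a landmark cannot disappear while a source still attests it, the sketch bounds the disappearance by the latest time of reference.

```python
from dataclasses import dataclass


@dataclass
class ChangeBounds:
    appearance_before: int    # appearance happened no later than this year
    disappearance_after: int  # disappearance happened no earlier than this year


def estimate_change_bounds(reference_years):
    """Bound appearance/disappearance from the years at which sources attest a landmark."""
    if not reference_years:
        raise ValueError("landmark has no time of reference")
    return ChangeBounds(min(reference_years), max(reference_years))


# Hypothetical landmark attested by the 1790, 1836 and 1888 sources.
print(estimate_change_bounds([1836, 1790, 1888]))
```

Only open-ended bounds are produced here; a pipeline could refine them later, for instance when an event source gives an explicit date.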

In [30]:
ec.initialize_missing_changes_and_events_for_landmarks(graphdb_url, facts_repository_name, facts_named_graph_uri, inter_sources_name_graph_uri, tmp_named_graph_uri)

#### Split overlapping versions

In [31]:
gd.remove_named_graph_from_uri(tmp_named_graph_uri)
ec.get_elementary_versions_and_changes(graphdb_url, facts_repository_name, facts_named_graph_uri, tmp_named_graph_uri)
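A common way to split overlapping versions is to cut the timeline at every version boundary, which yields elementary intervals on which each original version either fully applies or not at all. The sketch below assumes versions are `(value, start, end)` tuples with years as times; it is a hypothetical illustration of the idea, not the SPARQL logic of `ec.get_elementary_versions_and_changes`.

```python
def elementary_intervals(versions):
    """Split overlapping (value, start, end) versions into elementary intervals,
    each listing the values of the versions that cover it."""
    # Every start and end becomes a cut point on the timeline.
    bounds = sorted({t for _, start, end in versions for t in (start, end)})
    result = []
    for start, end in zip(bounds, bounds[1:]):
        values = [v for v, s, e in versions if s <= start and end <= e]
        result.append((start, end, values))
    return result


# Hypothetical name versions of one thoroughfare, from several sources.
versions = [("Rue A", 1790, 1836), ("Rue A", 1810, 1888), ("Rue B", 1849, 1888)]
for interval in elementary_intervals(versions):
    print(interval)
```

Intervals covered by no version come out with an empty value list, which is the kind of empty version the next step removes.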

#### Get evolution from elementary elements

Get the attribute version evolution from elementary versions and changes:
* remove empty attribute versions, i.e.
    * versions not related to any trace;
    * versions whose changes are not related to any trace;
* merge successive similar attribute versions.
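These two steps can be sketched on chronologically ordered elementary versions. In this hypothetical model, each version is a `(start, end, value, traces)` tuple and only the traces attached to the version itself are checked. Successive kept versions are merged even across a removed empty interval, since no source contradicts the value there; both choices are assumptions about `ec.get_attribute_version_evolution_from_elementary_elements`.

```python
def merge_versions(elementary, similar=lambda a, b: a == b):
    """Drop versions without traces, then merge successive similar versions."""
    merged = []
    for start, end, value, traces in elementary:
        if not traces:  # empty version: no source attests it
            continue
        if merged and similar(merged[-1][2], value):
            first_start, _, first_value, first_traces = merged[-1]
            merged[-1] = (first_start, end, first_value, first_traces + list(traces))
        else:
            merged.append((start, end, value, list(traces)))
    return merged


# Hypothetical elementary versions of a thoroughfare name.
elementary = [
    (1790, 1810, "rue saint honore", ["ve_1790"]),
    (1810, 1836, "rue saint honore", ["va_1810"]),
    (1836, 1849, "rue saint honore", []),
    (1849, 1888, "rue saint-honore", ["an_1849"]),
]
print(merge_versions(elementary))  # exact equality keeps the hyphenated spelling apart
print(merge_versions(elementary, similar=lambda a, b: a.replace("-", " ") == b.replace("-", " ")))
```

The `similar` predicate is where a real comparison (for instance the outcome of the attribute version comparison step) would plug in.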

In [32]:
ec.get_attribute_version_evolution_from_elementary_elements(graphdb_url, facts_repository_name,
                                                            facts_named_graph_uri, inter_sources_name_graph_uri, tmp_named_graph_uri)

# Remove the temporary named graph used during construction
gd.remove_named_graph_from_uri(tmp_named_graph_uri)

<Response [204]>