Simplistic spatial/administrative referential.
For documentation about the French administrative levels, see the LISEZMOI.md file.
This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.
It is meant to be embedded in applications for autocompletion. It aims at neither universality (country levels are not comparable) nor precision (most source datasets have a 100 m precision).
These tools work on and export WGS84 spatial data.
This project uses MongoDB 2.6+ and GDAL as main tooling. Build tools are written in Python 3 and make use of:
- click
- PyMongo
- Fiona
- Shapely
The web interface requires Flask.
Translations require Babel and the Transifex client.
There are many ways of getting a development environment started.
Assuming you have Virtualenv and MongoDB installed and configured on your computer:
$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -e .
$ geozones -h
There are two main models:
- level hierarchies
- zone/territories
GeoZones uses MongoDB as working storage.
Level hierarchies define the relationships between levels and their names. They are not stored in the database, but they are exported with the following properties:
Property | Description |
---|---|
id | A string identifier for the level (e.g. country, fr:commune...) |
label | The human-readable string representation in English (e.g. World). * |
admin_level | An administrative scale index (0 is the biggest and 100 the smallest level) |
parents | The list of known parent level identifiers |
*: Labels are optionally translatable
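To make the properties above concrete, here is a minimal sketch of how exported levels could be handled once loaded in Python. It is not the geozones implementation: the `LEVELS` list, the labels, the `parents` values and both helper functions are assumptions made for illustration; only the property names and the `admin_level` ordering come from this document.

```python
# Hypothetical level records using the four exported properties.
# Labels and parent links are illustrative, not taken from geozones.
LEVELS = [
    {"id": "fr:commune", "label": "French town", "admin_level": 80, "parents": ["fr:departement"]},
    {"id": "country", "label": "Country", "admin_level": 20, "parents": ["country-group"]},
    {"id": "fr:departement", "label": "French county", "admin_level": 60, "parents": ["fr:region"]},
    {"id": "country-group", "label": "Country group", "admin_level": 10, "parents": []},
    {"id": "fr:region", "label": "French region", "admin_level": 40, "parents": ["country"]},
]


def sort_levels(levels):
    """Order levels from the biggest (lowest admin_level) to the smallest."""
    return sorted(levels, key=lambda level: level["admin_level"])


def ancestors(level_id, levels):
    """Return every level id reachable through `parents`, nearest first."""
    by_id = {level["id"]: level for level in levels}
    found = []
    queue = list(by_id[level_id]["parents"])
    while queue:
        current = queue.pop(0)
        # Skip ids already visited or unknown to this set of levels.
        if current in found or current not in by_id:
            continue
        found.append(current)
        queue.extend(by_id[current]["parents"])
    return found
```

Sorting on `admin_level` gives a natural display order for an autocompletion widget, and walking `parents` answers questions such as "which levels can contain a commune?".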
You can contribute your country-specific levels. Currently, geozones supports the following levels:
identifier | administrative level | description |
---|---|---|
country-group | 10 | Groups of countries (World, UE...) |
country | 20 | A country |
country-subset | 30 | An administrative subset of a country |
identifier | administrative level | description |
---|---|---|
fr:region | 40 | Regions of France |
fr:epci | 68 | Intercommunality of France |
fr:departement | 60 | Departements of France |
fr:collectivite | 60 | French overseas collectivities |
fr:arrondissement | 70 | Arrondissements of France |
fr:commune | 80 | Communes of France |
fr:canton | 98 | Cantons of France |
fr:iris | 98 | Iris of France |
A zone is a spatial polygon for a given level. It has at least one code (unique within its level) and a name. It can also have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).
Labels are optionally translatable.
Some zones are defined as an aggregation of other zones. They are called aggregations in geozones and are built after all data is loaded.
The following properties are exported in the GeoJSON output:
Property | Description |
---|---|
id | A unique identifier defined by <level>:<code>[@creation] |
code | The zone unique identifier in this level |
level | The level identifier |
name | The zone display name (can be translatable) |
population | Estimated/approximate population (optional) |
area | Estimated/approximate area in km² (optional) |
wikipedia | A Wikipedia reference (optional) |
dbpedia | A DBPedia reference (optional) |
flag | A DBPedia reference to a flag (optional) |
blazon | A DBPedia reference to a blazon (optional) |
keys | A dictionary of known keys/code for this zone |
parents | A list of every known parent zone identifier |
Note that you can use the keys option to choose which properties to export during the distribution (dist) step.
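The `id` format from the table above, `<level>:<code>[@creation]`, can be built and parsed with a few lines of Python. This is only a sketch under assumptions: the helper names are hypothetical, the creation suffix is treated as an opaque string (its exact format is not specified here), and codes are assumed not to contain `:` or `@`.

```python
def make_zone_id(level, code, creation=None):
    """Build '<level>:<code>' with an optional '@<creation>' suffix."""
    zone_id = f"{level}:{code}"
    return f"{zone_id}@{creation}" if creation else zone_id


def parse_zone_id(zone_id):
    """Split a zone id into (level, code, creation); creation may be None."""
    base, _, creation = zone_id.partition("@")
    # Levels such as 'fr:commune' contain a colon themselves,
    # so the code is whatever follows the LAST colon.
    level, sep, code = base.rpartition(":")
    if not sep or not level or not code:
        raise ValueError(f"Not a valid zone id: {zone_id!r}")
    return level, code, creation or None
```

For instance, `parse_zone_id("fr:commune:75056")` yields the level `fr:commune` and the code `75056`; the creation value used in any test or example is purely illustrative.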
Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.
Here’s the workflow:
# Ensure you have the optional tools to process translations
$ pip install -e .[i18n]
# Extract translatable labels
$ pybabel extract -F babel.cfg -o geozones/translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch last translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$
To add an extra language:
$ pybabel init -D geozones -i geozones/translations/geozones.pot -d geozones/translations -l <language code>
$ tx push -t -l <language code>
A set of commands is provided for the build process. You can list them all with:
$ geozones --help
Download the required datasets. Datasets will be stored in a downloads subdirectory.
Load and process datasets into database.
Perform zone aggregations for zones defined as aggregations of others.
Perform some non-geospatial processing (e.g. set the postal codes, attach the parents...).
The --exclude and --only options make it possible to run only a subset of the postprocess functions.
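As an illustration of how such a selection could work, the sketch below filters a list of post-processor names. It assumes that `--only` keeps just the named functions and that `--exclude` skips the named ones; the actual semantics are defined by the geozones code, and the function and post-processor names used here are hypothetical.

```python
def select_postprocessors(available, only=(), exclude=()):
    """Return the names to run, keeping the order of `available`.

    Assumed semantics: an empty `only` means "everything", and
    `exclude` is applied after `only`.
    """
    selected = [name for name in available if not only or name in only]
    return [name for name in selected if name not in exclude]


# Hypothetical post-processor names, for illustration only.
AVAILABLE = ["postal_codes", "attach_parents", "population"]
```

With these assumptions, `select_postprocessors(AVAILABLE, exclude=["population"])` keeps the two other functions in their original order.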
Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.
All in one task equivalent to:
# Perform all tasks from download to distribution
$ geozones download preload load aggregate postprocess dist
Serve a web interface to explore the generated data.
Display some useful information and statistics.
Commands are chainable so you can write:
# Perform all tasks from download to distribution
$ geozones download load -d aggregate postprocess dist dist -s status
Generate a dataset download list for external usage.
This allows using an external download manager, for example.
For example, using 10 parallel curl processes:
mkdir download && cd download && geozones sourceslist | xargs -P 10 -n 1 curl -O
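The same idea can be sketched in Python for environments without curl or xargs. This is an assumption-laden example, not part of geozones: it supposes that the list contains one URL per line (as the `xargs -n 1` usage suggests), and every function name below is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_for(url):
    """Mimic `curl -O`: name the local file after the last URL path segment."""
    return Path(urlparse(url).path).name


def fetch(url, directory):
    """Download a single URL into `directory` and return the local path."""
    target = Path(directory) / filename_for(url)
    urlretrieve(url, target)
    return target


def download_all(urls, directory, workers=10, fetch=fetch):
    """Download every URL using at most `workers` concurrent threads."""
    Path(directory).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `urls` in its results.
        return list(pool.map(lambda url: fetch(url, directory), urls))
```

The URLs could be read from standard input with `[line.strip() for line in sys.stdin if line.strip()]` and passed to `download_all`. The `fetch` parameter is injectable so that the download step can be replaced, for instance to add retries or to test without network access.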
Fetch zone logos/flags/blazons from Wikipedia when available.
You can export data in (Geo)JSON or msgpack formats.
The msgpack format consumes more CPU on deserialization but needs far less RAM (it avoids using many gigabytes), because the data can be iterated over without loading the whole file.
- NaturalEarth administrative boundaries
- Thematic Mapping country boundaries
- OpenStreetMap French regions boundaries
- OpenStreetMap French counties boundaries
- OpenStreetMap French EPCIs boundaries
- OpenStreetMap French districts boundaries
- OpenStreetMap French towns boundaries
- OpenStreetMap French cantons boundaries
- IGN/INSEE IRIS aggregated version
- French postal codes database
- Incremental downloads, maybe with checksum check
- Global post-processor
- Post-processor dependencies
- Audit trail
- Distribute GeoZones as a standalone Python executable
- Some quality check tools
- Global weight = f(population, area, level)
- Different precision output
- Localized JSON outputs (outputs are English-only right now)
- Translations as distributable JSON (as an alternative to the current PO/MO format)
- Translations as Python package
- Model versioning
- Statistics/coverages in levels
- Querying
- Only fetch zones for viewport (less intensive for lower layers)
- A full web-service as a separate project