# 1. Introduction

This notebook explores published GeoJSON datasets describing French administrative boundaries.

## Scales

### Ontology of IGN administrative units

http://data.ign.fr/def/geofla/20190212.htm

## Formats
- GeoJSON
- Shapefile

## Sources


| Producer | Dataset | Description | Link | 
| :--- |:--- | :--- | :--- |
| data.gouv | GeoZones | This dataset covers the different boundary scales (regions, departments, IRIS, municipalities...) in GeoJSON. It contains two files: zones (GeoJSON description of the spatial entities) and levels (membership links between the entities). The geometry is not simplified, which makes the file fairly large. | https://www.data.gouv.fr/fr/datasets/geozones/ |
| IGN | Geofla | This dataset contains the different boundary scales (regions, departments, IRIS, municipalities...) in one GeoJSON file. | https://www.data.gouv.fr/fr/datasets/geozones/ |
| OpenStreetMap | Geofla | This dataset contains the different boundary scales (regions, departments, IRIS, municipalities...) in one GeoJSON file. | https://www.data.gouv.fr/fr/datasets/geozones/ |
| France-Geojson | France-Geojson | This dataset contains simplified-geometry versions of the GeoJSON files. | https://github.com/gregoiredavid/france-geojson |

# 2. Geofla (IGN)

The Geofla dataset contains two shapefiles: COMMUNE and LIMITE_COMMUNE.

## COMMUNE file

In [8]:
import geopandas as gpd
geo_municipalities_geofla = gpd.read_file("data/raw/territory/boundaries/IGN/GEOFLA/1_DONNEES_LIVRAISON_2016-06-00236/GEOFLA_2-2_SHP_LAMB93_FR-ED161/COMMUNE/COMMUNE.shp")
geo_municipalities_geofla.head()

Unnamed: 0,ID_GEOFLA,CODE_COM,INSEE_COM,NOM_COM,STATUT,X_CHF_LIEU,Y_CHF_LIEU,X_CENTROID,Y_CENTROID,Z_MOYEN,SUPERFICIE,POPULATION,CODE_ARR,CODE_DEPT,NOM_DEPT,CODE_REG,NOM_REG,geometry
0,COMMUNE00000000000000001,216,32216,LOURTIES-MONBRUN,Commune simple,500820,6264958,500515,6265413,252,966,139,3,32,GERS,76,LANGUEDOC-ROUSSILLON-MIDI-PYRENEES,"POLYGON ((499484.600 6265244.300, 499474.800 6..."
1,COMMUNE00000000000000002,33,47033,BOUDY-DE-BEAUREGARD,Commune simple,516424,6384852,515575,6385938,112,1019,414,3,47,LOT-ET-GARONNE,75,AQUITAINE-LIMOUSIN-POITOU-CHARENTES,"POLYGON ((514320.800 6384320.200, 514310.300 6..."
2,COMMUNE00000000000000003,9,32009,ARMOUS-ET-CAU,Commune simple,472979,6278963,473004,6278937,221,932,95,3,32,GERS,76,LANGUEDOC-ROUSSILLON-MIDI-PYRENEES,"POLYGON ((474752.300 6280446.600, 474783.500 6..."
3,COMMUNE00000000000000004,225,38225,AUTRANS-MEAUDRE EN VERCORS,Commune simple,898640,6450689,898625,6451597,1234,3371,2973,1,38,ISERE,84,AUVERGNE-RHONE-ALPES,"POLYGON ((902167.200 6463694.700, 902252.500 6..."
4,COMMUNE00000000000000005,890,62890,WILLEMAN,Commune simple,640049,7028672,640115,7029900,79,1023,178,4,62,PAS-DE-CALAIS,32,NORD-PAS-DE-CALAIS-PICARDIE,"POLYGON ((639510.500 7027545.000, 639456.600 7..."


In [19]:
# subset municipalities belonging to the Île-de-France region (CODE_REG 11)
geo_municipalities_geofla_subset = geo_municipalities_geofla[geo_municipalities_geofla.CODE_REG == '11']
# reproject from Lambert-93 to WGS84 (EPSG:4326) and convert back to GeoJSON
geo_municipalities_geofla_geojson = geo_municipalities_geofla_subset.to_crs(epsg=4326).to_json()

In [20]:
import folium
m = folium.Map(location=[48.8534, 2.3488],
              zoom_start=7)


folium.GeoJson(
    geo_municipalities_geofla_geojson,
    name='geojson'
).add_to(m)

<folium.features.GeoJson at 0x256a1d27488>

In [21]:
# plot map
m

In [22]:
geo_municipalities_geofla_LIMITE_COMMUNE = gpd.read_file("data/raw/territory/boundaries/IGN/GEOFLA/1_DONNEES_LIVRAISON_2016-06-00236/GEOFLA_2-2_SHP_LAMB93_FR-ED161/COMMUNE/LIMITE_COMMUNE.shp")
geo_municipalities_geofla_LIMITE_COMMUNE.head()

Unnamed: 0,NATURE,ID_GEOFLA,geometry
0,Limite de commune,LIMI_COM0000000000000001,"LINESTRING (809899.000 6920882.800, 809903.700..."
1,Limite d'arrondissement,LIMI_COM0000000000000002,"LINESTRING (671069.200 6170756.900, 671033.800..."
2,Limite de commune,LIMI_COM0000000000000003,"LINESTRING (825266.400 6859502.500, 825336.400..."
3,Limite de commune,LIMI_COM0000000000000004,"LINESTRING (932228.000 6909094.200, 932258.700..."
4,Limite de commune,LIMI_COM0000000000000005,"LINESTRING (849782.300 6830318.200, 849792.100..."


### Extract municipality membership information from IGN data

In [91]:
list_municipalities_geofla = geo_municipalities_geofla[['INSEE_COM','NOM_COM','CODE_DEPT','NOM_DEPT','CODE_REG','NOM_REG']]

In [93]:
# write csv
list_municipalities_geofla.to_csv('data/raw/territory/boundaries/IGN/list_municipalities_geofla.csv',
                                    index=False)
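
The exported CSV can serve as a lookup from a municipality's INSEE code to its department and region. The following is a minimal sketch using only the standard library, assuming the column names shown above. The inline sample rows are copied from the `head()` output, and the helper name `build_membership_lookup` is hypothetical.

```python
import csv
import io

# Sample rows copied from the head() output above; a real run would open
# list_municipalities_geofla.csv instead of this inline string.
SAMPLE_CSV = """INSEE_COM,NOM_COM,CODE_DEPT,NOM_DEPT,CODE_REG,NOM_REG
32216,LOURTIES-MONBRUN,32,GERS,76,LANGUEDOC-ROUSSILLON-MIDI-PYRENEES
47033,BOUDY-DE-BEAUREGARD,47,LOT-ET-GARONNE,75,AQUITAINE-LIMOUSIN-POITOU-CHARENTES
62890,WILLEMAN,62,PAS-DE-CALAIS,32,NORD-PAS-DE-CALAIS-PICARDIE
"""


def build_membership_lookup(csv_file):
    """Map each INSEE municipality code to its department and region codes."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        lookup[row["INSEE_COM"]] = {
            "department": row["CODE_DEPT"],
            "region": row["CODE_REG"],
        }
    return lookup


membership = build_membership_lookup(io.StringIO(SAMPLE_CSV))
print(membership["62890"])
```

Codes are kept as strings here so that leading zeros in INSEE codes are preserved.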

# 2. France-Geojson

In [24]:
import json

## Municipalities

In [56]:
with open("data/raw/territory/boundaries/France-Geojson/communes-version-simplifiee.json", encoding="utf-8") as f:
    communes_version_simplifiee_geojson = json.load(f)

In [36]:
# convert GeoJSON to a GeoDataFrame
communes_version_simplifiee_gpd = gpd.GeoDataFrame.from_features(communes_version_simplifiee_geojson["features"])

In [37]:
communes_version_simplifiee_gpd.head()

Unnamed: 0,geometry,code,nom
0,"POLYGON ((5.69816 45.86166, 5.70471 45.86125, ...",1073,Ceyzérieu
1,"POLYGON ((5.06729 45.88115, 5.07370 45.87243, ...",1262,Montluel
2,"POLYGON ((5.23549 46.10047, 5.23991 46.11296, ...",1425,Tranclière
3,"POLYGON ((3.34368 48.99501, 3.33626 48.99923, ...",2042,Azy-sur-Marne
4,"POLYGON ((3.09633 49.51790, 3.12117 49.52097, ...",2140,Camelin


In [39]:
# subset geodata
communes_version_simplifiee_gpd = communes_version_simplifiee_gpd[0:100]

In [40]:
# convert back to geojson
communes_version_simplifiee_geojson = communes_version_simplifiee_gpd.to_json()

In [41]:
# make map
m = folium.Map(location=[48.8534, 2.3488],
              zoom_start=7)

folium.GeoJson(
    communes_version_simplifiee_geojson,
    name='geojson'
).add_to(m)

<folium.features.GeoJson at 0x256a02867c8>

In [42]:
# plot map
m

## Regions

In [71]:
with open("data/raw/territory/boundaries/France-Geojson/regions-version-simplifiee.json", encoding="utf-8") as f:
    regions_version_simplifiee_geojson = json.load(f)

In [72]:
# convert GeoJSON to a GeoDataFrame
regions_version_simplifiee_gpd = gpd.GeoDataFrame.from_features(regions_version_simplifiee_geojson["features"])
regions_version_simplifiee_gpd.head()

Unnamed: 0,geometry,code,nom
0,"POLYGON ((2.59052 49.07965, 2.63327 49.10838, ...",11,Île-de-France
1,"POLYGON ((2.87463 47.52042, 2.88845 47.50943, ...",24,Centre-Val de Loire
2,"POLYGON ((3.62942 46.74946, 3.57569 46.74952, ...",27,Bourgogne-Franche-Comté
3,"POLYGON ((-1.11962 49.35557, -1.07822 49.38849...",28,Normandie
4,"POLYGON ((4.04797 49.40564, 4.03991 49.39740, ...",32,Hauts-de-France
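
Region names contain accented characters, and GeoJSON files are UTF-8 encoded. Decoding them with a different platform default (for example cp1252 on Windows) garbles names such as Île-de-France; passing `encoding="utf-8"` to `open()` avoids this. The sketch below shows the effect using only built-in string methods.

```python
name = "Île-de-France"

# UTF-8 bytes decoded with the wrong codec (cp1252) produce mojibake.
garbled = name.encode("utf-8").decode("cp1252")
print(garbled == "ÃŽle-de-France")  # True

# Reversing the two steps recovers the original name.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == name)  # True
```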


In [73]:
# convert back to geojson
regions_version_simplifiee_geojson = regions_version_simplifiee_gpd.to_json()

In [82]:
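import requests

# download the simplified regions GeoJSON from the france-geojson repository
response = requests.get("https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/regions-version-simplifiee.geojson", timeout=30)
response.raise_for_status()  # fail early on HTTP errors
regions_geojson = response.json()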


In [84]:
# make map
m = folium.Map(location=[48.8534, 2.3488],
              zoom_start=7)

folium.GeoJson(
    regions_geojson,
    name='region'
).add_to(m)

# plot map
m

In [86]:
# convert GeoJSON to a GeoDataFrame
gdf_regions = gpd.GeoDataFrame.from_features(regions_geojson["features"])

gdf_regions['Level'] = 'Region'
# rename columns
gdf_regions = gdf_regions.rename(columns={"code": "Code"})
gdf_regions.head()

Unnamed: 0,geometry,Code,nom,Level
0,"POLYGON ((2.59052 49.07965, 2.63327 49.10838, ...",11,Île-de-France,Region
1,"POLYGON ((2.87463 47.52042, 2.88845 47.50943, ...",24,Centre-Val de Loire,Region
2,"POLYGON ((3.62942 46.74946, 3.57569 46.74952, ...",27,Bourgogne-Franche-Comté,Region
3,"POLYGON ((-1.11962 49.35557, -1.07822 49.38849...",28,Normandie,Region
4,"POLYGON ((4.04797 49.40564, 4.03991 49.39740, ...",32,Hauts-de-France,Region


In [88]:
# save geo_regions
with open('data/refine/territory/boundaries/geo_regions_simplified.json', 'w') as f:
    f.write(gdf_regions.to_json())

# 3. OpenStreetMap

In [54]:
with open("data/raw/territory/boundaries/OSM/communes-20190101/communes-20190101.json", encoding="utf-8") as f:
    communes_osm = json.load(f)

In [45]:
# convert GeoJSON to a GeoDataFrame
communes_osm_gpd = gpd.GeoDataFrame.from_features(communes_osm["features"])
communes_osm_gpd.head()

Unnamed: 0,geometry,insee,nom,wikipedia,surf_ha
0,"POLYGON ((-60.93595 14.58812, -60.93218 14.585...",97223,Saint-Esprit,fr:Saint-Esprit (Martinique),2318
1,"POLYGON ((-61.12165 14.71928, -61.11852 14.716...",97233,Le Morne-Vert,fr:Le Morne-Vert,1325
2,"POLYGON ((-61.13355 14.74657, -61.13066 14.748...",97208,Fonds-Saint-Denis,fr:Fonds-Saint-Denis,2374
3,"POLYGON ((-61.08459 14.72510, -61.08430 14.722...",97224,Saint-Joseph,fr:Saint-Joseph (Martinique),4324
4,"POLYGON ((-61.08459 14.72510, -61.08061 14.725...",97212,Gros-Morne,fr:Gros-Morne,4601


In [46]:
# subset geodata
communes_osm_gpd = communes_osm_gpd[0:100]
# convert back to geojson
communes_osm_geojson = communes_osm_gpd.to_json()
# make map
m = folium.Map(location=[48.8534, 2.3488],
              zoom_start=7)

folium.GeoJson(
    communes_osm_geojson,
    name='geojson'
).add_to(m)
# plot map
m
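
The rows shown in the `head()` output above are located in Martinique, so a map centered on Paris may not display them at this zoom level. The following is a hedged sketch, using only the standard library, of deriving a map center from the bounding box of the features. The helper names `iter_positions` and `bbox_center` are hypothetical, the sample polygon uses made-up coordinates, and only geometries with a `coordinates` member (such as Polygon or MultiPolygon) are handled.

```python
def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def bbox_center(feature_collection):
    """Return (lat, lon) of the bounding-box center, the order folium expects."""
    lons, lats = [], []
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry")
        if geometry is None:  # skip features without a geometry
            continue
        for lon, lat in iter_positions(geometry["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Illustrative polygon near Martinique (coordinates are made up for the example).
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-61.0, 14.5], [-60.8, 14.5], [-60.8, 14.7],
                             [-61.0, 14.7], [-61.0, 14.5]]],
        },
    }],
}

center = bbox_center(sample)
print(center)
```

The resulting pair could be passed as `location=` to `folium.Map`; folium's `fit_bounds` method on the map object is an alternative when the full extent should be visible.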

# Île-de-France region

In [95]:
with open("data/raw/territory/boundaries/Datagouv/cog-communes.json", encoding="utf-8") as f:
    communes_idf_datagouv = json.load(f)

In [97]:
# convert GeoJSON to a GeoDataFrame
communes_idf_datagouv_gpd = gpd.GeoDataFrame.from_features(communes_idf_datagouv["features"])

Unnamed: 0,geometry,typecom,arr,dep,libelle,nccenr,tncc,ncc,com,comparent,geo_point_2d,can
0,,ARM,751,75,Paris 2e Arrondissement,Paris 2e Arrondissement,0,PARIS 2E ARRONDISSEMENT,75102,75056.0,,
1,,ARM,751,75,Paris 17e Arrondissement,Paris 17e Arrondissement,0,PARIS 17E ARRONDISSEMENT,75117,75056.0,,
2,"POLYGON ((2.75442 48.41154, 2.75315 48.40977, ...",COM,774,77,Avon,Avon,1,AVON,77014,,"[48.4132014943, 2.73295114141]",7707.0
3,"POLYGON ((2.68388 48.22254, 2.68421 48.22543, ...",COM,774,77,Bagneaux-sur-Loing,Bagneaux-sur-Loing,0,BAGNEAUX SUR LOING,77016,,"[48.2239749104, 2.70712515945]",7715.0
4,"POLYGON ((2.80322 48.82926, 2.80233 48.82977, ...",COM,775,77,Bailly-Romainvilliers,Bailly-Romainvilliers,0,BAILLY ROMAINVILLIERS,77018,,"[48.8432663218, 2.81651777455]",7721.0


In [98]:
# check null values
null_columns=communes_idf_datagouv_gpd.columns[communes_idf_datagouv_gpd.isnull().any()]
communes_idf_datagouv_gpd[null_columns].isnull().sum()

geometry          20
comparent       1268
geo_point_2d      20
can               21
dtype: int64

In [99]:
# remove rows with a null geometry
communes_idf_datagouv_gpd = communes_idf_datagouv_gpd[~communes_idf_datagouv_gpd['geometry'].isnull()]
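
The same filtering could also be applied to the raw GeoJSON before building the GeoDataFrame. The following is a minimal standard-library sketch, assuming a FeatureCollection in which some features carry a null geometry. The helper name `drop_null_geometries` is hypothetical; the sample properties mirror two rows shown above, and the point geometry is a toy value.

```python
def drop_null_geometries(feature_collection):
    """Return a copy of the collection without features lacking a geometry."""
    kept = [
        feature
        for feature in feature_collection["features"]
        if feature.get("geometry") is not None
    ]
    return {**feature_collection, "features": kept}


# Illustrative sample: one feature without geometry, one with a toy geometry.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"com": "75102", "libelle": "Paris 2e Arrondissement"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [2.73, 48.41]},
         "properties": {"com": "77014", "libelle": "Avon"}},
    ],
}

cleaned = drop_null_geometries(sample)
print(len(cleaned["features"]))  # 1
```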

In [100]:
communes_idf_datagouv_gpd.head()

Unnamed: 0,geometry,typecom,arr,dep,libelle,nccenr,tncc,ncc,com,comparent,geo_point_2d,can
2,"POLYGON ((2.75442 48.41154, 2.75315 48.40977, ...",COM,774,77,Avon,Avon,1,AVON,77014,,"[48.4132014943, 2.73295114141]",7707
3,"POLYGON ((2.68388 48.22254, 2.68421 48.22543, ...",COM,774,77,Bagneaux-sur-Loing,Bagneaux-sur-Loing,0,BAGNEAUX SUR LOING,77016,,"[48.2239749104, 2.70712515945]",7715
4,"POLYGON ((2.80322 48.82926, 2.80233 48.82977, ...",COM,775,77,Bailly-Romainvilliers,Bailly-Romainvilliers,0,BAILLY ROMAINVILLIERS,77018,,"[48.8432663218, 2.81651777455]",7721
5,"POLYGON ((3.12699 48.40891, 3.12773 48.41043, ...",COM,773,77,Balloy,Balloy,0,BALLOY,77019,,"[48.3971235829, 3.1512896311]",7718
6,"POLYGON ((2.59817 48.43417, 2.59606 48.43462, ...",COM,774,77,Barbizon,Barbizon,0,BARBIZON,77022,,"[48.4483950589, 2.60083525255]",7707


In [101]:
# save geo_municipalities
with open('data/raw/territory/boundaries/Datagouv/communes_idf_datagouv.json', 'w') as f:
    f.write(communes_idf_datagouv_gpd.to_json())

# 4. Conclusion

## File size

The main difference between the GeoJSON files is their size. Since the GeoJSON data will be used for a choropleth map, it is important to use the smallest possible file.

In [90]:
import os
# size of osm data
osm_size_bytes = os.stat("data/raw/territory/boundaries/OSM/communes-20190101/communes-20190101.json").st_size
osm_size_Mb = osm_size_bytes/(1024*1024)
print(osm_size_Mb)
# size of simplified geojson data
france_geojson_size_bytes = os.stat("data/raw/territory/boundaries/France-Geojson/communes-version-simplifiee.json").st_size
france_geojson_size_Mb = france_geojson_size_bytes/(1024*1024)
print(france_geojson_size_Mb)

84.65259742736816
18.446434020996094
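
To compare more than two files, the size computation can be wrapped in a helper. The following is a minimal sketch; `size_mib` is a hypothetical helper name, and the demo writes a temporary file so that the block runs without the notebook's data.

```python
import os
import tempfile


def size_mib(path):
    """Return the size of a file in mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return os.stat(path).st_size / (1024 * 1024)


# Demo on a temporary 2 MiB file; in the notebook, pass the GeoJSON paths instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (2 * 1024 * 1024))
demo_size = size_mib(tmp.name)
os.remove(tmp.name)
print(demo_size)  # 2.0

# Ratio between the two sizes printed above (OSM vs. simplified France-Geojson).
ratio = 84.65259742736816 / 18.446434020996094
print(round(ratio, 1))  # 4.6
```

With the sizes measured above, the OSM file is roughly 4.6 times larger than the simplified France-Geojson file.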


## Comparison

This table presents a brief comparison of the different data sources. We think that the France-Geojson dataset is the most suitable for our use case.

| Producer | Dataset | Ease of use | Size |
| :--- | :--- | :--- | :--- |
| data.gouv | GeoZones | Membership information is not easy to collect. | 40 MB at municipality level |
| IGN | Geofla | Dataset is complete | 44 MB at municipality level |
| OSM | geojson | Dataset is complete | 84 MB at municipality level |
| France-Geojson | France-Geojson | Dataset is complete | 18 MB at municipality level |