In [1]:
import glob
import os
import shutil

import fiona
import geopandas as gpd

# Task 2.1: Update TEOTIL "core" datasets

## Part A: Process historic administrative boundaries

TEOTIL needs access to historic kommune and fylke boundaries and ID codes, which change surprisingly regularly. Some historic datasets are available from Geonorge ([here](https://kartkatalog.geonorge.no/metadata/administrative-enheter-historiske-versjoner/9bc064e3-6c34-4c3a-8421-00290052e9c0)), but the formats, file names and attributes are inconsistent. This notebook attempts to process the historic data to create a geopackage of standardised kommune and fylke boundaries for use later.

In [2]:
data_fold = r"/home/jovyan/shared/teotil3/core_data_june_2022/administrative"
raw_fold = os.path.join(data_fold, "raw")
temp_fold = os.path.join(data_fold, "temp")

## 1. Fylke and kommune codes

Each fylke is assigned a four-digit code ending in `00`, where the first two digits give the fylkesnummer. Each fylke is then subdivided into kommuner, which also have four-digit codes: the first two digits match the fylkesnummer and the last two uniquely identify the kommune within that fylke (so each fylke can have at most 99 kommuner, numbered `01` to `99`).
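The numbering scheme can be sketched with a few string helpers. This is illustrative only: the function names are hypothetical, and the zero-padding step assumes that codes may sometimes be stored as integers (e.g. `301` instead of `0301`), which would otherwise shift the digits and give the wrong fylkesnummer.

```python
def normalise_komnr(code):
    """Return a kommunenummer as a four-digit string (e.g. 301 -> "0301")."""
    text = str(code).strip().zfill(4)  # restore a leading zero lost to int storage
    if len(text) != 4 or not text.isdigit():
        raise ValueError(f"Not a valid kommunenummer: {code!r}")
    return text


def split_komnr(code):
    """Split a kommunenummer into (fylkesnummer, kommune part)."""
    text = normalise_komnr(code)
    return text[:2], text[2:]


def fylke_code(code):
    """Return the four-digit fylke code: the fylkesnummer followed by "00"."""
    fylnr, _ = split_komnr(code)
    return fylnr + "00"


print(split_komnr(301))    # ('03', '01')
print(fylke_code("1103"))  # 1100
```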

Fylke boundaries were fairly stable from ~1980 to 2017, then some changes took place in 2018 and then again in 2020 (see [here](https://no.wikipedia.org/wiki/Fylkesnummer)). However, the exact kommuner assigned to each fylke have changed a bit over time.

Of the datasets on Geonorge, those for the kommuner are the most complete and consistent. It is therefore easiest to ignore the fylke datasets completely and instead derive the fylke boundaries by taking the first two digits of each kommunenummer and dissolving on that value.
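The attribute side of this approach can be sketched in plain Python: kommuner are grouped by the first two digits of their code, which is the same key that `gdf.dissolve(by="fylnr")` groups on in the code further down (the geometry union itself is left to geopandas). The function name is hypothetical, and it assumes the codes are already four-digit strings.

```python
from collections import defaultdict


def group_by_fylke(komnrs):
    """Map each fylkesnummer to the sorted kommunenumre that belong to it."""
    groups = defaultdict(list)
    for komnr in komnrs:
        groups[komnr[:2]].append(komnr)  # first two digits = fylkesnummer
    return {fylnr: sorted(codes) for fylnr, codes in sorted(groups.items())}


print(group_by_fylke(["1103", "0301", "1101", "5001"]))
# {'03': ['0301'], '11': ['1101', '1103'], '50': ['5001']}
```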

## 2. Convert SOSI to shapefiles

Many of the older datasets are only available in SOSI format. I have downloaded all the data from [here](https://kartkatalog.geonorge.no/metadata/administrative-enheter-historiske-versjoner/9bc064e3-6c34-4c3a-8421-00290052e9c0) (a mixture of `.sos` and `.gdb` files) and renamed the files consistently. I have also downloaded [sosicon](https://sosicon.espenandersen.no/), a command-line utility for converting SOSI files to `.shp`. The application is here:

    shared/teotil3/sosicon
    
and all the raw administrative data (up to and including 2022) is here:

    shared/teotil3/core_data_june_2022/administrative/raw
    
The code below converts all SOSI files in this folder to `.shp` and stores them in:

    shared/teotil3/core_data_june_2022/administrative/temp
    
**Note:** `sosicon` produces a lot of text output, so to avoid the notebook becoming large and messy, it's easier to paste the command below directly into a terminal rather than running it from within the notebook.

**Note 2:** For some reason, the `sosicon` command-line program produces invalid output for `admin2017_Kommune_FLATE.shp` and `admin2018_Kommune_FLATE.shp`, which would leave gaps in the time series. However, the [online version](https://app.sosicon.espenandersen.no/) of sosicon appears to handle these conversions correctly. I have therefore converted `admin2017.sos` and `admin2018.sos` manually using the online tool, and replaced the auto-generated `admin2017_Kommune_FLATE.shp` and `admin2018_Kommune_FLATE.shp` with the versions produced online.

In [3]:
#! ls /home/jovyan/shared/teotil3/core_data_june_2022/administrative/raw/*.sos | /home/jovyan/shared/teotil3/sosicon/sosicon -2shp -d /home/jovyan/shared/teotil3/core_data_june_2022/administrative/temp
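If you do want to launch `sosicon` from Python without flooding the notebook, one option is to reproduce the `ls ... | sosicon` pipe with `subprocess` and send all output to a log file instead. This is a sketch only: it assumes, as the pipe above implies, that `sosicon` reads newline-separated input paths from standard input, and the helper name and log location are hypothetical.

```python
import subprocess


def run_with_file_list(cmd, file_paths, log_path):
    """Run `cmd`, passing newline-separated file paths on stdin (like `ls ... | cmd`).

    Both stdout and stderr are written to `log_path` rather than the notebook.
    Returns the process exit code.
    """
    stdin_text = "\n".join(file_paths) + "\n"
    with open(log_path, "w") as log:
        result = subprocess.run(
            cmd,
            input=stdin_text,
            text=True,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


# Hypothetical usage, with the paths given in the text above:
# import glob, os
# base = "/home/jovyan/shared/teotil3"
# raw = f"{base}/core_data_june_2022/administrative/raw"
# temp = f"{base}/core_data_june_2022/administrative/temp"
# os.makedirs(temp, exist_ok=True)
# run_with_file_list(
#     [f"{base}/sosicon/sosicon", "-2shp", "-d", temp],
#     sorted(glob.glob(f"{raw}/*.sos")),
#     f"{base}/sosicon_log.txt",
# )
```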

## 3. Process SOSI-derived shapefiles

Sosicon generates *a lot* of shapefiles. Some of these appear to be invalid, and we are only interested in **polygons** for the **kommuner**. The code below ignores everything else, generates fylker from the kommuner, and saves all results to a geopackage.

In [4]:
# List all .shp files
search_path = os.path.join(temp_fold, "*.shp")
flist = glob.glob(search_path)

admin_gpkg = os.path.join(data_fold, "admin_data.gpkg")
if os.path.isfile(admin_gpkg):
    os.remove(admin_gpkg)

# Different col names used to contain the kommune number
kom_cols = ["KOMM", "KOMM      ", "KOMMUNENUM"]

for fpath in flist:
    # Skip files that cannot be read (some sosicon output is invalid)
    try:
        gdf = gpd.read_file(fpath)
    except Exception:
        continue

    # If file OK, extract basic info
    geoms = gdf.geom_type.unique()
    fname = os.path.split(fpath)[1]
    admin_type = fname.split("_")[1]
    year = fname.split("_")[0][-4:]

    # Only process Kommune data
    if admin_type == "Kommune":
        # Only Polygon geoms
        if ("Polygon" in geoms) or ("MultiPolygon" in geoms):
            print("Processing", fname)
            # Find the column containing the kommunenummer
            kom_col = [i for i in kom_cols if i in gdf.columns]
            if len(kom_col) != 1:
                print(fname, kom_col)
                print(gdf.columns)
                raise ValueError("Could not identify kommunenummer field.")
            kom_col = kom_col[0]

            # Get fylkesnummer from kommunenummer. Pad to four digits in case
            # codes were read as integers (e.g. 301 -> "0301")
            gdf[kom_col] = gdf[kom_col].astype(str).str.zfill(4)
            gdf["fylnr"] = gdf[kom_col].str[:2]
            gdf.rename({kom_col: "komnr"}, axis="columns", inplace=True)
            gdf = gdf[["fylnr", "komnr", "geometry"]]

            # Dissolve
            kom_gdf = gdf.dissolve(by="komnr", aggfunc="first").reset_index()
            kom_gdf = kom_gdf[["fylnr", "komnr", "geometry"]]
            fyl_gdf = gdf.dissolve(by="fylnr").reset_index()
            fyl_gdf = fyl_gdf[["fylnr", "geometry"]]

            # Save
            kom_gdf.to_file(
                admin_gpkg,
                driver="GPKG",
                layer=f"kommuner{year}",
                index=False,
            )
            fyl_gdf.to_file(
                admin_gpkg,
                driver="GPKG",
                layer=f"fylker{year}",
                index=False,
            )

shutil.rmtree(temp_fold)

Processing admin2017_Kommune_FLATE.shp
Processing kommuner2011_Kommune_FLATE.shp
Processing kommuner2009_Kommune_FLATE.shp
Processing admin2014_Kommune_FLATE.shp
Processing admin2003_Kommune_FLATE.shp
Processing kommuner2007_Kommune_FLATE.shp
Processing kommuner2010_Kommune_FLATE.shp
Processing admin2015_Kommune_FLATE.shp
Processing kommuner2013_Kommune_FLATE.shp
Processing kommuner2006_Kommune_FLATE.shp
Processing kommuner2005_Kommune_FLATE.shp
Processing admin2018_Kommune_FLATE.shp
Processing kommuner2013_Kommune_FLATE_01.shp
Processing admin2004_Kommune_FLATE.shp
Processing kommuner2008_Kommune_FLATE.shp
Processing kommuner2012_Kommune_FLATE.shp
Processing admin2016_Kommune_FLATE.shp
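The output above includes two 2013 inputs, `kommuner2013_Kommune_FLATE.shp` and `kommuner2013_Kommune_FLATE_01.shp`, and both map to the same layer name, `kommuner2013`. A minimal sketch for flagging such collisions before writing, based on file names alone (it mirrors the name parsing in the cell above but skips the geometry-type filter; the function names are hypothetical):

```python
from collections import defaultdict


def layer_name(fname):
    """Return the kommune layer name the loop above would use, or None.

    The year is the last four characters before the first "_", and the
    admin type is the second "_"-separated token.
    """
    parts = fname.split("_")
    if len(parts) < 2 or parts[1] != "Kommune":
        return None
    return f"kommuner{parts[0][-4:]}"


def find_collisions(fnames):
    """Map each layer name targeted by more than one input file to those files."""
    groups = defaultdict(list)
    for fname in sorted(fnames):
        name = layer_name(fname)
        if name is not None:
            groups[name].append(fname)
    return {name: files for name, files in groups.items() if len(files) > 1}


files = [
    "kommuner2013_Kommune_FLATE.shp",
    "kommuner2013_Kommune_FLATE_01.shp",
    "admin2014_Kommune_FLATE.shp",
]
print(find_collisions(files))
# {'kommuner2013': ['kommuner2013_Kommune_FLATE.shp', 'kommuner2013_Kommune_FLATE_01.shp']}
```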


## 4. Process geodatabase files

The most recent datasets (2019 onwards) are only available in geodatabase format.

In [5]:
search_path = os.path.join(raw_fold, "kommuner*.gdb")
flist = glob.glob(search_path)
for fpath in flist:
    fname = os.path.split(fpath)[1][:-4]
    print("Processing", fname)

    year = fname[-4:]
    gdf = gpd.read_file(fpath, driver="fileGDB", layer="kommune")

    # Get fylkesnummer from kommunenummer. Pad to four digits in case
    # codes were read as integers (e.g. 301 -> "0301")
    gdf.rename({"kommunenummer": "komnr"}, axis="columns", inplace=True)
    gdf["komnr"] = gdf["komnr"].astype(str).str.zfill(4)
    gdf["fylnr"] = gdf["komnr"].str[:2]
    gdf = gdf[["fylnr", "komnr", "geometry"]]

    # Dissolve
    kom_gdf = gdf.dissolve(by="komnr", aggfunc="first").reset_index()
    kom_gdf = kom_gdf[["fylnr", "komnr", "geometry"]]
    fyl_gdf = gdf.dissolve(by="fylnr").reset_index()
    fyl_gdf = fyl_gdf[["fylnr", "geometry"]]

    # Save
    kom_gdf.to_file(
        admin_gpkg,
        driver="GPKG",
        layer=f"kommuner{year}",
        index=False,
    )
    fyl_gdf.to_file(
        admin_gpkg,
        driver="GPKG",
        layer=f"fylker{year}",
        index=False,
    )

Processing kommuner2022
Processing kommuner2021
Processing kommuner2020
Processing kommuner2019


## 5. Explore layers

In [6]:
sorted(fiona.listlayers(admin_gpkg))

['fylker2003',
 'fylker2004',
 'fylker2005',
 'fylker2006',
 'fylker2007',
 'fylker2008',
 'fylker2009',
 'fylker2010',
 'fylker2011',
 'fylker2012',
 'fylker2013',
 'fylker2014',
 'fylker2015',
 'fylker2016',
 'fylker2017',
 'fylker2018',
 'fylker2019',
 'fylker2020',
 'fylker2021',
 'fylker2022',
 'kommuner2003',
 'kommuner2004',
 'kommuner2005',
 'kommuner2006',
 'kommuner2007',
 'kommuner2008',
 'kommuner2009',
 'kommuner2010',
 'kommuner2011',
 'kommuner2012',
 'kommuner2013',
 'kommuner2014',
 'kommuner2015',
 'kommuner2016',
 'kommuner2017',
 'kommuner2018',
 'kommuner2019',
 'kommuner2020',
 'kommuner2021',
 'kommuner2022']

This notebook has generated complete annual data series for fylker and kommuner from 2003 to 2022 inclusive.
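This completeness check can also be done programmatically, for example after adding future years. A minimal sketch; the function name and the default year range are hypothetical and would need adjusting as new data are added.

```python
def missing_years(layer_names, prefixes=("fylker", "kommuner"), start=2003, end=2022):
    """Return {prefix: [missing years]} for each prefix lacking a layer in start..end."""
    present = set(layer_names)
    gaps = {}
    for prefix in prefixes:
        missing = [y for y in range(start, end + 1) if f"{prefix}{y}" not in present]
        if missing:
            gaps[prefix] = missing
    return gaps


# With the geopackage created above:
# import fiona
# missing_years(fiona.listlayers(admin_gpkg))  # {} when every year is present
example = [f"{p}{y}" for p in ("fylker", "kommuner") for y in range(2003, 2023) if y != 2017]
print(missing_years(example))
# {'fylker': [2017], 'kommuner': [2017]}
```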