Key parameters for Finnish health districts

The aims of this work are to:

1. Create a GIS layer for Finnish health districts
2. Illustrate the proportion of the population over 65 years old and the median household income for every district

We will build the health district polygons from three inputs: Statistics Finland municipality polygons, Statistics Finland population and zip code data, and a list of health care districts by municipality from Kuntaliitto.

The list of Finnish municipalities with their health districts is an Excel spreadsheet available here: https://www.kuntaliitto.fi/sosiaali-ja-terveysasiat/sairaanhoitopiirien-jasenkunnat (file Shp_jäsenkunnat_2020.xls, sheet kunnat_shp_2020_ aakkosjärj).

"shp" stands for "sairaanhoitopiiri" ("health district" in Finnish). I have renamed the file to Shp_jasenkunnat_2020.xls and the sheet to kunnat_shp_2020_ aakkosjarj.

Municipality polygons come from the Statistics Finland web feature service: https://www.stat.fi/org/avoindata/paikkatietoaineistot/kuntapohjaiset_tilastointialueet.html

WFS: http://geo.stat.fi/geoserver/tilastointialueet/wfs?
Feature: tilastointialueet:kunta1000k (the most recent municipality polygons).

Population count for each municipality from Statistics Finland: https://www.stat.fi/org/avoindata/paikkatietoaineistot/vaesto_tilastointialueittain.html

Median household income from the Paavo database: https://www.stat.fi/org/avoindata/paikkatietoaineistot.html (to be added later).

Note: the health district list does not include Åland (Ahvenanmaa). The Åland municipalities are added in a later step.

In [None]:
import json
import numpy as np
import pandas as pd
import geopandas as gpd
from pyproj import CRS
import matplotlib.pyplot as plt
import xlrd   # pandas needs xlrd to read .xls Excel files

In [None]:
# Check that pandas is up to date; an old version can cause problems when loading geopandas
pd.show_versions(as_json=False)

In [None]:
#1. Health district data

df_orig = pd.read_excel("Shp_jasenkunnat_2020.xls", sheet_name="kunnat_shp_2020_ aakkosjarj",
                     header=3)
df_orig.dropna(inplace=True)
df_orig.head()

In [None]:
df = df_orig.copy()
print(df.shape)
df.tail()

In [None]:
df.rename(columns={"kunta-\nkoodi":"code", 'sairaanhoitopiiri':'healthCareDistrict'},
          inplace=True)
# .copy() makes df an independent frame, so the column assignments below do not warn
df = df[['code','healthCareDistrict']].copy()


In [None]:
# Truncate and convert to character string
df["code"] = df["code"].astype(int).astype('str')
df.head()


In [None]:
# Left-pad municipality codes with zeros to three characters
df["code"] = df["code"].str.zfill(3)
df.tail()
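
The two conversion steps above (truncate to integer text, then restore the leading zeros) can be sketched on a small stand-alone series. The values below are illustrative only and are not read from the Kuntaliitto spreadsheet.

```python
import pandas as pd

# Illustrative codes only; Excel typically delivers them as floats such as 5.0
codes = pd.Series([5.0, 20.0, 91.0, 541.0])

# Drop the decimal part, convert to text, then left-pad with zeros to width 3
padded = codes.astype(int).astype(str).str.zfill(3)
print(padded.tolist())
```

Padding matters because the municipality code is the join key later on: the Statistics Finland layers are joined on `code` as text, so `5` and `005` would not match.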


In [None]:
df.head()

Municipality polygons come from the Statistics Finland web feature service: https://www.stat.fi/org/avoindata/paikkatietoaineistot/kuntapohjaiset_tilastointialueet.html

WFS: http://geo.stat.fi/geoserver/tilastointialueet/wfs?
Feature: tilastointialueet:kunta1000k (the most recent municipality polygons).


In [None]:
#2. GIS layer data

# For available features, see http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetCapabilities
#slow step! 
url = "http://geo.stat.fi/geoserver/tilastointialueet/wfs?request=GetFeature&typename=tilastointialueet:kunta1000k&outputformat=JSON"
geodata_orig = gpd.read_file(url)
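
The request URL is just the service address plus three query parameters. As a minimal sketch, it could be assembled with the standard library so that the layer name is easy to swap; the helper name `wfs_getfeature_url` is hypothetical and not part of geopandas or any WFS client.

```python
from urllib.parse import urlencode

BASE_URL = "http://geo.stat.fi/geoserver/tilastointialueet/wfs"

def wfs_getfeature_url(base_url, typename, output_format="JSON"):
    """Hypothetical helper: build a GetFeature URL with the same parameters as above."""
    params = {
        "request": "GetFeature",
        "typename": typename,
        "outputformat": output_format,
    }
    # safe=":" leaves the namespace separator in the layer name unescaped
    return base_url + "?" + urlencode(params, safe=":")

url = wfs_getfeature_url(BASE_URL, "tilastointialueet:kunta1000k")
print(url)
```

The resulting string can be passed to `gpd.read_file` exactly as in the cell above.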

In [None]:
# There are 310 municipalities in Finland in 2020
geodata = geodata_orig.copy()
print(geodata.shape)
geodata.head()

In [None]:
# Select and rename columns (chained, so no in-place edit of a slice)
geodata = geodata[['kunta', 'geometry']].rename(columns={'kunta':'code'})
geodata.tail()

In [None]:
#Plot of municipalities
geodata.plot()

Population count for each municipality from Statistics Finland: https://www.stat.fi/org/avoindata/paikkatietoaineistot/vaesto_tilastointialueittain.html

WFS: http://geo.stat.fi/geoserver/vaestoalue/wfs 
Note: Valtimo was merged into Nurmes at the start of 2020. The merged municipality belongs to the Pohjois-Karjala health care district.

In [None]:
# For available features, see http://geo.stat.fi/geoserver/vaestoalue/wfs?request=GetCapabilities

url = "http://geo.stat.fi/geoserver/vaestoalue/wfs?request=GetFeature&typename=vaestoalue:kunta_vaki2018&outputformat=JSON"
pop_orig = gpd.read_file(url)

In [None]:
pop = pop_orig.copy()
print(pop.shape)
print(list(pop))
pop.tail()

In [None]:
# Select and rename columns (chained, so no in-place edit of a slice)
pop = pop[["kunta", "name", "vaesto","ika_65_", "geometry"]].rename(
    columns={'kunta':'code', 'vaesto':'population_31_12_2018', 'ika_65_':'age_65'})
pop.tail()

In [None]:
# Check the length: in 2020 there are 310 municipalities.
# The 2018 population data still contains Valtimo, which was merged with Nurmes at the end of 2019
pop.loc[pop['name'] == 'Valtimo']

In [None]:
pop.loc[pop['name'] == 'Nurmes']

In [None]:
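# Row 292 is Valtimo (see the output above); relabel it as Nurmes.
# The code is stored as text, so keep it a string to match the other rows.
pop.loc[292, 'name'] = 'Nurmes'
pop.loc[292, 'code'] = '541'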

temp = pop.loc[pop['name'] == 'Nurmes']
temp

In [None]:
# Merge the two Nurmes rows: union their polygons and sum their counts.
temp = temp.dissolve(by="name", aggfunc = 'sum')
temp.reset_index(inplace=True)  
temp

In [None]:
temp.loc[0, 'geometry']

In [None]:
pop.loc[292, 'geometry']

In [None]:
pop.loc[176, 'geometry']

In [None]:
pop.loc[176, 'geometry'] = temp.loc[0, 'geometry']
pop.loc[176, 'population_31_12_2018'] = temp.loc[0, 'population_31_12_2018']
pop.loc[176, 'age_65'] = temp.loc[0, 'age_65']
#drop Valtimo
pop = pop.drop(292)
print(pop.shape)
pop.loc[176]
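
The cells above rely on the row labels 292 and 176, which are specific to this download. As a hedged alternative, the same attribute merge can be written with name-based masks. The sketch below uses a plain DataFrame with made-up figures and hypothetical names (`pop_demo`, `merged`); on the real GeoDataFrame, `dissolve` would perform the same grouping while also unioning the geometries.

```python
import pandas as pd

# Made-up figures for illustration; not the Statistics Finland values
pop_demo = pd.DataFrame({
    "code": ["541", "911", "091"],
    "name": ["Nurmes", "Valtimo", "Helsinki"],
    "population": [7000, 2000, 650000],
    "age_65": [2500, 800, 110000],
})

# Relabel the merged municipality by name instead of by row label
is_valtimo = pop_demo["name"] == "Valtimo"
pop_demo.loc[is_valtimo, "code"] = "541"
pop_demo.loc[is_valtimo, "name"] = "Nurmes"

# Sum the counts per municipality; the two Nurmes rows collapse into one
merged = pop_demo.groupby(["code", "name"], as_index=False)[["population", "age_65"]].sum()
print(merged)
```

Selecting by name keeps the step valid if the service returns the rows in a different order.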

In [None]:
pop.info()

In [None]:
#The population data comes also with municipality polygons.
pop.plot()

In [None]:
#However, I will return to using the 1000k more precise polygons read in earlier

geodata = geodata.merge(pop[["code", "name", "population_31_12_2018", "age_65"]], on="code")
geodata.head()

In [None]:
#Join Health district to geodata
geodata = geodata.merge(df, on="code", how="left")
geodata.tail(8)

In [None]:
# Municipalities in the Åland islands have no matching health care district in the data.
# Count the number of NaN values in each column
print(geodata.isnull().sum())
geodata[geodata.healthCareDistrict.isnull()].name
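
Counting NaN values after a left join works, but pandas can also flag unmatched rows directly through the `indicator` parameter of `merge`. The sketch below assumes two small stand-in tables; the rows and district labels are placeholders, not the notebook's data.

```python
import pandas as pd

# Placeholder rows for illustration
municipalities = pd.DataFrame({
    "code": ["091", "541", "478"],
    "name": ["Helsinki", "Nurmes", "Maarianhamina"],
})
district_list = pd.DataFrame({
    "code": ["091", "541"],
    "healthCareDistrict": ["Helsinki ja Uusimaa", "Pohjois-Karjala"],
})

# indicator=True adds a _merge column: "both" for matches, "left_only" otherwise
checked = municipalities.merge(district_list, on="code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Unlike a NaN count, the `_merge` column distinguishes a municipality missing from the district list from one that is listed with an empty district cell.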

In [None]:
# Update "Ahvenanmaa" as the health care district for Åland municipalities (16 municipalities in total)
geodata.loc[geodata.healthCareDistrict.isnull(),'healthCareDistrict'] = "Ahvenanmaa"
geodata.healthCareDistrict.value_counts()

In [None]:
geodata.info()

In [None]:
# Create polygons for health care districts.
# dissolve combines the municipality polygons of each health care district
# and aggregates the remaining columns with aggfunc.
# See https://geopandas.org/aggregation_with_dissolve.html

districts = geodata.dissolve(by='healthCareDistrict', aggfunc="sum")
districts.reset_index(inplace=True)
districts

In [None]:
# Calculate the percentage of the population over 65
districts['perc_pop_over_65'] = (districts['age_65'] / districts['population_31_12_2018'] * 100).round(1)
districts
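
The percentage is computed from the district totals, not by averaging municipality percentages, which would weight small and large municipalities equally. A minimal worked sketch with made-up counts for two hypothetical districts, A and B:

```python
import pandas as pd

# Made-up counts: district A has two municipalities, district B has one
demo = pd.DataFrame({
    "healthCareDistrict": ["A", "A", "B"],
    "population": [1000, 3000, 2000],
    "age_65": [300, 500, 500],
})

# Sum the counts per district first, then take the ratio of the sums
totals = demo.groupby("healthCareDistrict")[["population", "age_65"]].sum()
totals["perc_pop_over_65"] = (totals["age_65"] / totals["population"] * 100).round(1)
print(totals)
```

For district A the ratio of sums is 800 / 4000 = 20.0 %, whereas averaging the two municipality percentages (30.0 % and 16.7 %) would give about 23.3 %.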

In [None]:
# Plot population estimates with an accurate legend, https://geopandas.org/mapping.html
from mpl_toolkits.axes_grid1 import make_axes_locatable

fig, ax = plt.subplots(figsize=(20, 10))
divider = make_axes_locatable(ax)
cax = divider.append_axes("bottom", size="5%", pad=0.1)   
fig.suptitle('Percentage of population over 65 years old in Finnish health districts', fontsize=16)

districts.plot( column='perc_pop_over_65', ax=ax, cax = cax, legend=True, legend_kwds=
               {'label': "Percentage of population over 65 years", 'orientation': "horizontal"} ) 

In [None]:
# Write population per health care district to csv
districts[['healthCareDistrict', 'geometry', 'population_31_12_2018']].to_csv("healtCareDistricts_pop.csv")
