# Norwegian Population Center
This notebook uses population data for Norwegian municipalities, together with the locations of administrative services in each municipality, to calculate the population-weighted geographic center of the country.

We begin by importing necessary packages.

In [84]:
import pandas as pd
import utm

We then load the data and have a look at the first few rows.

In [9]:
# Semicolon-separated file with decimal commas; only the first four columns are needed.
population_data = pd.read_csv('data/befolkning.csv', sep=';', decimal=',', usecols=list(range(4)))

In [10]:
population_data.head()

,name,2018,2019,increase 2018-2019
0,,,,
1,Heile landet,5295619.0,5328212.0,0.6
2,,,,
3,01 Østfold,295420.0,297520.0,0.7
4,0101 Halden,31037.0,31177.0,0.5


The table contains some empty rows, which we drop. We also drop the rows that describe the whole country or a county rather than a municipality, and add a column holding just the municipality code. Finally, we set this new column as the index of the DataFrame.

In [54]:
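# Drop the empty rows.
population_data = population_data.dropna()
# The first token of the name is the code: two digits for counties, four for municipalities.
population_data['code'] = population_data['name'].str.split().str[0]
# Keep municipalities only; .copy() avoids a SettingWithCopyWarning on the next assignment.
population_data = population_data[population_data['code'].str.len() == 4].copy()
population_data['code'] = population_data['code'].astype(int)
population_data = population_data.set_index('code')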

In [55]:
population_data.head()

code,name,2018,2019,increase 2018-2019
101,0101 Halden,31037.0,31177.0,0.5
104,0104 Moss,32588.0,32726.0,0.4
105,0105 Sarpsborg,55543.0,55997.0,0.8
106,0106 Fredrikstad,80977.0,81772.0,1.0
111,0111 Hvaler,4540.0,4599.0,1.3
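
To make the filtering step concrete, here is a minimal sketch that applies the same operations to a toy table. The rows are copied from the sample output shown earlier; the variable names `sample` and `cleaned` are introduced here for illustration only. Note that the length check assumes no non-municipality row starts with a four-character token, which holds for the rows shown but is not verified for the whole file.

```python
import pandas as pd

# Toy rows mimicking the layout of the population table (values from the sample output).
sample = pd.DataFrame({
    'name': [None, 'Heile landet', None, '01 Østfold', '0101 Halden'],
    '2019': [None, 5328212.0, None, 297520.0, 31177.0],
})

# Remove the empty rows.
cleaned = sample.dropna().copy()
# First whitespace-separated token: 'Heile', '01' (county) or '0101' (municipality).
cleaned['code'] = cleaned['name'].str.split().str[0]
# Only four-character tokens are treated as municipality codes.
cleaned = cleaned[cleaned['code'].str.len() == 4].copy()
cleaned['code'] = cleaned['code'].astype(int)
cleaned = cleaned.set_index('code')

print(cleaned)
```

Only the Halden row survives, indexed by the integer code 101 (the leading zero is lost in the conversion to `int`).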


Next up, we load the data necessary to retrieve a geographic location for each municipality.

In [60]:
mun_cover = pd.read_csv('http://hotell.difi.no/download/difi/etatsbasen/covers', sep=';', decimal=',', index_col='tailid')
mun_position = pd.read_csv('http://hotell.difi.no/download/difi/etatsbasen/position', sep=';', decimal=',', index_col='tailid')

In [61]:
mun_cover.head()

tailid,geocode
25304,428
25305,1835
25306,122
25308,800
25311,1121


In [62]:
mun_cover.shape

(15759, 1)

In [63]:
mun_position.head()

tailid,coordsys,x,y,z
25348,utm33,46374.0,6954209.0,
25349,utm33,5455.0,6626756.0,
25350,utm33,-47807.0,6722869.0,
25352,utm33,44123.0,6957977.0,
25353,utm33,182280.0,6998344.0,


We can then join the two DataFrames to get a single DataFrame that pairs each municipality code (geocode) with the position of a service covering it.

In [64]:
mun_positions = mun_cover.join(mun_position)

In [65]:
mun_positions.head()

tailid,geocode,coordsys,x,y,z
20504,301,,,,
20518,301,,,,
20537,301,utm33,262127.0,6649664.0,
20605,1903,utm33,562319.0,7632932.0,
20605,1917,utm33,562319.0,7632932.0,


We depend on the coordinates, so we will drop all rows with missing x or y values.

In [70]:
# Drop rows where either coordinate is missing.
mun_positions = mun_positions.dropna(subset=['x', 'y'])

We only need a single service from each municipality to retrieve a position, so we will drop all duplicates on the geocode field.

In [72]:
mun_positions = mun_positions.drop_duplicates('geocode')
mun_positions.shape

(452, 5)
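
As a small illustration of why the order of the last two steps matters, the sketch below uses a hypothetical toy table (the second Oslo row and its coordinate are made up). `drop_duplicates` keeps the first row for each geocode by default, so removing rows without coordinates first ensures that the surviving row actually has a position.

```python
import pandas as pd

# Hypothetical services: two cover geocode 301, one covers 1903.
services = pd.DataFrame(
    {'geocode': [301, 301, 1903], 'x': [262127.0, 262500.0, 562319.0]},
    index=pd.Index([20537, 20540, 20605], name='tailid'),
)

# keep='first' is the default: the first service listed per geocode is retained.
unique = services.drop_duplicates('geocode')
print(unique)
```

Which service ends up representing a municipality therefore depends on the row order of the input, not on any property of the service itself.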

We can then join this new DataFrame with the one containing population data:

In [73]:
all_data = mun_positions.join(population_data, on='geocode')

In [74]:
all_data.shape

(452, 9)
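
The row count is unchanged at 452 because `DataFrame.join` performs a left join by default: every row of the left frame is kept, and columns from the right frame are filled with NaN where no match exists. A minimal sketch with hypothetical toy frames (`positions` and `population` are names used only here):

```python
import pandas as pd

# Three positions keyed by tailid, each tagged with a municipality code.
positions = pd.DataFrame(
    {'geocode': [301, 1903, 2111], 'x': [262127.0, 562319.0, 653350.0]},
    index=pd.Index([20537, 20605, 20606], name='tailid'),
)
# Population indexed by municipality code; code 2111 is deliberately absent.
population = pd.DataFrame(
    {'2019': [681071.0, 24827.0]},
    index=pd.Index([301, 1903], name='code'),
)

# on='geocode' matches that column against the index of the other frame.
joined = positions.join(population, on='geocode')
print(joined)
```

All three rows are kept, and the row for geocode 2111 has NaN in the `2019` column.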

In [75]:
all_data.head()

tailid,geocode,coordsys,x,y,z,name,2018,2019,increase 2018-2019
20537,301,utm33,262127.0,6649664.0,,0301 Oslo kommune,673469.0,681071.0,1.1
20605,1903,utm33,562319.0,7632932.0,,1903 Harstad - Hárstták,24820.0,24827.0,0.0
20605,1917,utm33,562319.0,7632932.0,,1917 Ibestad,1380.0,1375.0,-0.4
20605,1911,utm33,562319.0,7632932.0,,1911 Kvæfjord,2928.0,2858.0,-2.4
20605,1913,utm33,562319.0,7632932.0,,1913 Skånland,2994.0,3009.0,0.5


We do a quick check for missing values on the necessary columns:

In [76]:
na_info = all_data.isna()
missing_x = na_info['x'].sum()
missing_y = na_info['y'].sum()
missing_pop = na_info['2019'].sum()
print(f'Missing x: {missing_x}, missing y: {missing_y}, missing population: {missing_pop}')

Missing x: 0, missing y: 0, missing population: 82
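
The same check can be written more compactly by selecting the columns first and summing the missing-value mask per column. This is a sketch on a hypothetical toy frame, not a replacement for the cell above:

```python
import pandas as pd

# Toy frame: coordinates are complete, population is missing in two rows.
toy = pd.DataFrame({
    'x': [262127.0, 653350.0, 133701.0],
    'y': [6649664.0, 7731850.0, 7018720.0],
    '2019': [681071.0, None, None],
})

# isna() yields a boolean mask; summing it counts missing values per column.
missing = toy[['x', 'y', '2019']].isna().sum()
print(missing.to_dict())
```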


Population data is missing for 82 rows. We check how many of these also lack population data for 2018:

In [77]:
(na_info['2019'] & na_info['2018']).sum()

82

All of these rows are missing population data for both years. We should have a closer look at them.

In [78]:
missing_pop_df = all_data[na_info['2019'] & na_info['2018']]
missing_pop_df.head()

tailid,geocode,coordsys,x,y,z,name,2018,2019,increase 2018-2019
20606,2111,utm33,653350.0,7731850.0,,,,,
20632,1567,utm33,133701.0,7018720.0,,,,,
20634,1702,utm33,317368.0,7073854.0,,,,,
20634,1756,utm33,317368.0,7073854.0,,,,,
20634,1721,utm33,317368.0,7073854.0,,,,,


These rows are missing every value from the population data set, so their municipality codes are evidently absent from it. They might be municipalities that no longer exist, or simple errors in the data. Either way, we drop these rows.

In [79]:
all_data = all_data[all_data['2019'].notna()]

In [80]:
all_data.shape

(370, 9)

This leaves 370 municipalities with both a position and population data, which is close to the number of municipalities in Norway after the recent reforms.

We now have all the data we need to perform the calculation, which simply finds the population-weighted average position along each coordinate axis.
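
Written out, with $p_i$ denoting the 2019 population of municipality $i$ and $(x_i, y_i)$ its UTM position (notation introduced here for illustration), the weighted averages are:

$$
\bar{x} = \frac{\sum_i p_i \, x_i}{\sum_i p_i}, \qquad
\bar{y} = \frac{\sum_i p_i \, y_i}{\sum_i p_i}
$$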

In [82]:
total_pop = all_data['2019'].sum()
x = (all_data['x'] * all_data['2019']).sum() / total_pop
y = (all_data['y'] * all_data['2019']).sum() / total_pop

print(f'Norwegian population center (UTM33): ({x:.2f}, {y:.2f})')

Norwegian population center (UTM33): (212542.93, 6790470.55)
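
To see how the weighting behaves, here is a minimal pure-Python sketch of the same calculation on hypothetical numbers. The helper `weighted_mean` is introduced for illustration and is not part of the notebook.

```python
def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical one-dimensional example: two towns at positions 0 and 100,
# the first with three times the population of the second.
positions = [0.0, 100.0]
populations = [3.0, 1.0]

center = weighted_mean(positions, populations)
print(center)
```

The center lands at 25.0, a quarter of the way from the larger town to the smaller one, rather than at the unweighted midpoint of 50.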


The coordinates are UTM coordinates in zone 33, which are hard to interpret directly. Luckily, a small Python library can do the conversion to latitude and longitude, so we don't need to implement the formulas ourselves.

In [91]:
# Zone 33, latitude band V (56-64°N), which contains the expected latitude of about 61°N.
lat, long = utm.to_latlon(x, y, 33, 'V')
print(f'Norwegian population center (WGS84): {lat:.5f}N {long:.5f}E')

Norwegian population center (WGS84): 61.14303N 9.65700E
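
A quick plausibility check on the conversion, as a sketch: the central meridian of UTM zone n lies at 6n − 183 degrees east, and every zone assigns the easting 500,000 m to that meridian. A point with a smaller easting must therefore lie west of it. The function and constant names below are introduced here for illustration.

```python
def utm_central_meridian(zone_number: int) -> int:
    """Return the central meridian of a UTM zone in degrees east."""
    return zone_number * 6 - 183


FALSE_EASTING = 500_000  # easting assigned to the central meridian of every zone

easting = 212542.93   # UTM33 easting of the population center, from the output above
longitude = 9.65700   # converted longitude, from the output above

central = utm_central_meridian(33)
# Both comparisons should agree: west of the meridian means easting below 500,000.
consistent = (easting < FALSE_EASTING) == (longitude < central)
print(central, consistent)
```

For zone 33 the central meridian is 15°E, so an easting of about 212,543 m and a longitude of about 9.66°E are consistent with each other.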


Below, we generate a link to open this position directly in Google Maps:

In [92]:
print(f'https://www.google.com/maps/place/{lat},{long}')

https://www.google.com/maps/place/61.14303491940368,9.657001899409641
