# Exploratory Analysis of Norway Population Data

The Norway population dataset has a granularity of 1 km² and was recorded in 2023. It was retrieved from GeoNorge, with data from SSB. Source: https://kartkatalog.geonorge.no/metadata/befolkning-paa-rutenett-1000-m-2023/8de78b6a-6634-40f2-aac1-954d82ec31b7

This notebook explores the population dataset. The goal is to identify distinct population categories that can then be used in the model dataset.

In [None]:
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt

# Exploratory analysis of population data, to form categories.
norway_pop_gdf = gpd.read_file("../data/Befolkning_0000_Norge_25833_BefolkningsstatistikkRutenett1km2023_GML.gml", driver="GML")
print(norway_pop_gdf["popTot"])

### Kernel Density Estimation -- Smooth distribution of population

In [None]:
sns.displot(norway_pop_gdf, x="popTot", kind="kde", bw_adjust=.25)
plt.savefig("../plots/population/total_norway_pop_kde.png")

### Box plot of population
Box plot of the total population per grid cell (`popTot`) in the Norway 1 km² population dataset.

In [None]:
sns.boxplot(y="popTot", data=norway_pop_gdf)
plt.savefig("../plots/population/total_norway_pop_box_plt.png")

## Node-Population Dataset Exploration

The Node-Population dataset is the result of combining the population data with the nodes, based on each node's latitude and longitude.
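
The notebook does not show how this combination was done, so the following is only a minimal sketch of one possible approach, not the method actually used. It assumes that node coordinates have already been reprojected from latitude/longitude into the grid's projected coordinate system (the GML filename suggests EPSG:25833), and that each grid cell is identified by its lower-left corner in metres. The column names `easting`, `northing`, `cell_x` and `cell_y`, the function `attach_population`, and all values are hypothetical.

```python
import numpy as np
import pandas as pd

CELL_SIZE_M = 1000  # side length of a 1 km grid cell, in metres


def attach_population(nodes, grid, cell_size=CELL_SIZE_M):
    """Attach the population of the enclosing grid cell to each node.

    Assumes `nodes` has projected coordinates in columns 'easting' and
    'northing', and `grid` has columns 'cell_x', 'cell_y' (lower-left
    corner of each cell) and 'population'. All names are hypothetical.
    """
    nodes = nodes.copy()
    # Snap each coordinate down to the lower-left corner of its cell.
    nodes["cell_x"] = (np.floor(nodes["easting"] / cell_size) * cell_size).astype(int)
    nodes["cell_y"] = (np.floor(nodes["northing"] / cell_size) * cell_size).astype(int)
    # A left merge keeps every node, even if no grid cell matches it.
    merged = nodes.merge(grid, on=["cell_x", "cell_y"], how="left")
    # Assumption: a node with no matching cell is treated as population 0.
    merged["population"] = merged["population"].fillna(0).astype(int)
    return merged


# Made-up example data, for illustration only.
example_grid = pd.DataFrame({
    "cell_x": [262000, 263000],
    "cell_y": [6649000, 6649000],
    "population": [1200, 35],
})
example_nodes = pd.DataFrame({
    "node_id": ["a", "b", "c"],
    "easting": [262450.0, 263999.0, 500100.0],
    "northing": [6649700.0, 6649001.0, 7000000.0],
})

example_result = attach_population(example_nodes, example_grid)
print(example_result[["node_id", "population"]])
```

A spatial join in geopandas would be a natural alternative when the grid cells are available as polygons, as they are in the GML file loaded earlier.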

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
node_pop_df = pd.read_csv("../data/updated_node_uptime_with_locations.csv")
print(node_pop_df)

### Kernel Density Estimation of Node Population

In [None]:
from scipy.signal import find_peaks
from scipy.signal import argrelextrema
import numpy as np

# Plot the KDE of node population; clip at 0 because population cannot be negative.
node_plot = sns.kdeplot(data=node_pop_df['population'], color='r', bw_adjust=0.2, clip=(0, None))
plt.savefig("../plots/population/node_pop_kde.png")

# Extract the evaluated KDE curve from the plot (x = population, y = density).
x_kde, y_kde = node_plot.get_lines()[0].get_data()

# Local maxima of the density curve. `peaks` holds indices into the curve,
# not population values.
peaks, _ = find_peaks(y_kde, height=0)
peak_values = [int(x_kde[ind]) for ind in peaks]
print(peaks)
maxima = [(x_kde[ind], y_kde[ind]) for ind in peaks]

# Local minima of the density curve.
minima_indices = argrelextrema(y_kde, np.less)[0]
minima_values = [int(x_kde[ind]) for ind in minima_indices]
minima_points = [(x_kde[ind], y_kde[ind]) for ind in minima_indices]

# Mark the minima on the KDE plot.
# plt.scatter(*zip(*maxima), color='black', label='Maxima')
plt.scatter(*zip(*minima_points), color='blue', label='Minima')

print("Population peaks:", peak_values)
print("Population minima:", minima_values)

plt.legend()
plt.savefig('population_kde.png')
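
The notebook does not state how the population categories are derived from this plot. One plausible reading is that the KDE minima serve as boundaries between categories. The sketch below illustrates that idea under this assumption only. The function name `categorize_by_cut_points` and the example values are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd


def categorize_by_cut_points(values, cut_points):
    """Return an integer category for each value, bounded by the cut points.

    Category 0 is everything below the first cut point, category 1 is
    everything from the first cut point up to (not including) the second,
    and so on.
    """
    # Sort and de-duplicate so the bin edges are strictly increasing.
    edges = np.unique(np.asarray(cut_points, dtype=float))
    return np.digitize(np.asarray(values, dtype=float), edges)


# Made-up population values and cut points, for illustration only.
example_population = pd.Series([3, 40, 150, 900, 2500])
example_cut_points = [100, 1000]

example_categories = categorize_by_cut_points(example_population, example_cut_points)
print(example_categories)
```

In the notebook, `minima_values` from the cell above could be passed as the cut points and `node_pop_df['population']` as the values, if the minima are indeed the intended category boundaries.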


### Box plot of node-population

In [None]:
sns.boxplot(y="population", data=node_pop_df)
plt.savefig("../plots/population/node_pop_box_plt.png")

In [None]:
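# Overlay the box plot and the KDE of node population on a shared x-axis.
fig, ax1 = plt.subplots()

# Horizontal box plot on the primary axes.
sns.boxplot(x=node_pop_df['population'], orient='h', ax=ax1)

# Twin axes share the x-axis (population) but use a separate y-axis for density.
ax2 = ax1.twinx()

sns.kdeplot(data=node_pop_df['population'], ax=ax2, color='r', bw_adjust=0.2, clip=(0, None))
plt.savefig("KDE_box_plot_node_pop.png")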