# 4. Analysis of Hail Data Combined with Vehicle Registration Data
This notebook combines the CSV data on vehicle (KFZ) registrations with the hail data for each canton
to visualize where the hail risk for insurers is higher.

* [4.1 Required Python libraries](#python_libraries_)

* [4.2 Reading the input files](#read_input_file)

* [4.3 Merging hail and vehicle data](#merge_kfz_hail)

* [4.4 Visualizing the combined results](#visualization_results)


<a id="python_libraries_"></a>
## 4.1 Required Python libraries

In [None]:
import os
from pathlib import Path
#import xarray as xr
import pandas as pd
import geopandas as gpd
import numpy as np
import copy
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# display map
import folium
from branca.colormap import linear

# interactive plots
import ipywidgets as widgets
from ipywidgets import interactive
from IPython.display import display

In [None]:
# This is needed for the grid visualization later
from mpl_toolkits.axes_grid1 import ImageGrid

<a id="read_input_file"></a>
## 4.2 Reading the input files

In [None]:
# Files are saved in the "processed" folder 
processed_data_path = Path("./data/processed")

kfz_data = pd.read_csv(processed_data_path/'interim_kfz_data.csv', sep=";")
hail_data = pd.read_csv(processed_data_path/'haildata_per_month_canton.csv', sep=",")

In [None]:
# swiss canton Shape file
shape_path = Path("data") / "swiss_shapes"
shape_canton_file = "swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp"

# read in swiss canton shape file as GeoDataFrame
cantons_gdf = gpd.read_file(shape_path / shape_canton_file)

cantons = {
    "Zürich": "ZH",
    "Bern": "BE",
    "Luzern": "LU",
    "Uri": "UR",
    "Schwyz": "SZ",
    "Obwalden": "OW",
    "Nidwalden": "NW",
    "Glarus": "GL",
    "Zug": "ZG",
    "Fribourg": "FR",
    "Solothurn": "SO",
    "Basel-Stadt": "BS",
    "Basel-Landschaft": "BL",
    "Schaffhausen": "SH",
    "Appenzell Ausserrhoden": "AR",
    "Appenzell Innerrhoden": "AI",
    "St. Gallen": "SG",
    "Graubünden": "GR",
    "Aargau": "AG",
    "Thurgau": "TG",
    "Ticino": "TI",
    "Vaud": "VD",
    "Valais": "VS",
    "Neuchâtel": "NE",
    "Genève": "GE",
    "Jura": "JU"
}
# Replace the full canton names in the NAME column with their two-letter codes
canton_map = cantons_gdf.replace({"NAME": cantons})

In [None]:
# Select the columns needed for the merge
columns_to_select = ['year', 'canton', 'haildays_per_point']

# Work on an independent copy so that the original hail_data stays untouched
hail_data_copy = hail_data[columns_to_select].copy()

# Rename the columns so that the hail and KFZ data share the same key names
new_column_names = {
    'year': 'Jahr',
    'canton': 'Kanton',
    'haildays_per_point': 'number of hail days'
}

hail_data_copy = hail_data_copy.rename(columns=new_column_names)


In [None]:
hail_data_copy.info()

<a id="merge_kfz_hail"></a>
## 4.3 Merging hail and vehicle data

In [None]:
# Merge the hail-day and KFZ dataframes on the key columns Kanton and Jahr.
# Because the KFZ data is only available from 2005 onwards, the hail data is filtered first.

# Keep only the hail records with Jahr >= 2005
hail_data_filtered = hail_data_copy[hail_data_copy['Jahr'] >= 2005]

# Perform the inner join on the Kanton and Jahr columns
kfz_hail_df = pd.merge(hail_data_filtered, kfz_data, on=['Kanton', 'Jahr'], how='inner')

# Show the first rows of the merged dataframe
kfz_hail_df.head()
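
Because an inner join silently drops every (Kanton, Jahr) pair that is missing from one of the two tables, it can be worth checking which keys fail to match. The following is a minimal sketch on made-up toy tables (the values are illustrative and not taken from the real data). It uses the `indicator=True` option of `pd.merge` together with an outer join:

```python
import pandas as pd

# Toy stand-ins for the hail and KFZ tables (illustrative values only)
hail = pd.DataFrame({
    "Kanton": ["ZH", "BE", "TI"],
    "Jahr": [2005, 2005, 2005],
    "number of hail days": [3.0, 2.0, 5.0],
})
kfz = pd.DataFrame({
    "Kanton": ["ZH", "BE", "GE"],
    "Jahr": [2005, 2005, 2005],
    "Total_KFZ": [700000, 500000, 250000],
})

# An outer join keeps all keys; the _merge column records where each row came from
check = pd.merge(hail, kfz, on=["Kanton", "Jahr"], how="outer", indicator=True)

# Rows that an inner join would have dropped
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Kanton", "Jahr", "_merge"]])
```

If `unmatched` is empty for the real data, the inner join loses no canton-year combinations.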


A new column is created by multiplying the columns *number of hail days* and *Total_KFZ* element-wise.

The unit of the new column is **hail day x car**.

The idea is that the risk is proportional to both the number of hail days and the number of cars. For example, 100 cars in a canton with 10 hail days are assumed to carry the same risk as 1000 cars with 1 hail day.
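
The proportionality assumption can be checked with a small worked example. The sketch below uses a hypothetical helper, `hail_exposure`, which is not part of this notebook and only restates the product described above:

```python
def hail_exposure(hail_days: float, cars: int) -> float:
    """Hypothetical helper: risk proxy in units of hail day x car."""
    return hail_days * cars


# 100 cars exposed to 10 hail days ...
exposure_a = hail_exposure(hail_days=10, cars=100)
# ... are treated as equivalent to 1000 cars exposed to 1 hail day
exposure_b = hail_exposure(hail_days=1, cars=1000)

print(exposure_a, exposure_b)
```

Both cases yield the same value, which is exactly the simplification this risk proxy makes: it does not distinguish between many hail days hitting few cars and few hail days hitting many cars.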

In [None]:
# Create a new column as the product of "number of hail days" and "Total_KFZ".
# Its unit is "hail day x car": the risk is assumed to be proportional to both the number
# of hail days and the number of cars. For example, 100 cars in a canton with 10 hail days
# are assumed to carry the same risk as 1000 cars with 1 hail day.

kfz_hail_df["hailday_kfz"] = kfz_hail_df['number of hail days'] * kfz_hail_df['Total_KFZ']


In [None]:
# The Jahr column is int64, so we convert it to a four-digit year string
kfz_hail_df['Jahr'] = pd.to_datetime(kfz_hail_df['Jahr'], format='%Y').dt.strftime('%Y')

kfz_hail_df.info()

In [None]:
# To make the numbers easier to read, hailday_kfz is divided by 100,000,
# i.e. it is expressed in units of 100k hail day x car.

kfz_hail_df['hailday_kfz_normalized'] = kfz_hail_df['hailday_kfz'] / 100000
kfz_hail_df = kfz_hail_df.sort_values('Jahr', ascending=False).reset_index(drop=True)

In [None]:
kfz_hail_df.head()

<a id="visualization_results"></a>
## 4.4 Visualizing the combined results

### Bar chart race

In [None]:
!pip install bar_chart_race

In [None]:
import bar_chart_race as bcr

In [None]:
# To create a bar chart race, the data has to be in wide format:
# one row per period (Jahr) and one column per category (Kanton).

# Convert to wide format
kfz_hail_wide = kfz_hail_df.pivot(index='Jahr', columns='Kanton', values='hailday_kfz_normalized')

In [None]:
kfz_hail_wide
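
To illustrate the long-to-wide conversion, here is a minimal sketch on a made-up four-row table (the values and the column name `value` are illustrative only). Note that `DataFrame.pivot` requires each (index, column) pair to occur only once and raises an error otherwise:

```python
import pandas as pd

# Long format: one row per (Jahr, Kanton) combination
long_df = pd.DataFrame({
    "Jahr": ["2005", "2005", "2006", "2006"],
    "Kanton": ["ZH", "BE", "ZH", "BE"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per Jahr, one column per Kanton
wide_df = long_df.pivot(index="Jahr", columns="Kanton", values="value")
print(wide_df)
```

Each year now appears exactly once in the index, and every canton has its own column.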

<div class="alert alert-block alert-warning">
Important note: <b>ffmpeg</b> must be installed on the computer to render the bar chart race below.
</div>

Installing **ffmpeg**:

On Windows:

1. Visit the official ffmpeg website: https://www.ffmpeg.org/download.html.
2. Scroll down to the "Windows Builds" section and click on the link corresponding to "Download FFmpeg".
3. Choose the appropriate version based on your system architecture (32-bit or 64-bit).
4. Extract the downloaded zip file to a directory of your choice.
5. Add the path to the ffmpeg executable (e.g., ffmpeg/bin) to your system's PATH environment variable.

On macOS:

1. Open a terminal.
2. Install Homebrew if you haven't already. Visit the Homebrew website (https://brew.sh/) and follow the installation instructions.
3. Once Homebrew is installed, run the following command in the terminal:

- *brew install ffmpeg*

On Linux (Ubuntu):

1. Open a terminal.
2. Run the following command to install ffmpeg:
    
- *sudo apt-get update*
- *sudo apt-get install ffmpeg*
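
Whether the installation steps above succeeded can be checked from within Python. The following sketch uses only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of this notebook:

```python
import shutil


def ffmpeg_available() -> bool:
    """Hypothetical helper: True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg was not found on the PATH; the bar chart race cannot be rendered.")
else:
    print(f"ffmpeg found at: {ffmpeg_path}")
```

If ffmpeg was installed after the notebook kernel was started, the kernel may need a restart before the updated PATH is picked up.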


In [None]:
import warnings
# Suppress library warnings to keep the output readable
warnings.filterwarnings('ignore')

# For the available settings, see: https://www.dexplo.org/bar_chart_race/api/

bcr.bar_chart_race(
    df = kfz_hail_wide, 
    title = "Hail days x number of cars per canton (x100k)", 
    n_bars=10, 
    orientation='h', 
    fixed_order= False,
    fixed_max = True,
    cmap = 'prism',
    steps_per_period=10, 
    period_length=1000, 
    label_bars = False)
     

### Grid of individual bar charts per year

The individual bar charts for each year make it possible to compare the hail risk for cars across the cantons.

In [None]:
# The setting below prevents the output area from scrolling, so the figures are shown in full

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
# Sort by year and, within each year, by descending risk before plotting
sorted_df = kfz_hail_df.sort_values(
    ['Jahr', 'hailday_kfz_normalized'], ascending=[True, False]
).reset_index(drop=True)

sns.catplot(
    data=sorted_df, 
    x="hailday_kfz_normalized", 
    y="Kanton", 
    col="Jahr",
    kind="bar", 
    height=5, 
    aspect=.8, 
    col_wrap=3,
    ).set_xlabels('Hail days x number of cars (x100k)')

plt.show()


### Image grid of the hail risk for cars per year and canton

In [None]:
# color map definition
color_map_haildays_kfz = "YlOrRd"

# years from 2005 to 2022
years = kfz_hail_df.Jahr.unique()
years.sort()

In [None]:
# join swiss canton shape dataframe
canton_data_kfz_hail = canton_map.merge(kfz_hail_df, left_on="NAME", right_on="Kanton")

In [None]:
# create ImageGrid 6x3 for 2005-2022
fig = plt.figure(figsize=(20, 18))
grid = ImageGrid(fig, 111, nrows_ncols=(6, 3), axes_pad=0.1)

# colormap minimum and maximum definition (over all available years)
min_kfz_hail = canton_data_kfz_hail["hailday_kfz_normalized"].min()
max_kfz_hail = canton_data_kfz_hail["hailday_kfz_normalized"].max()

# fill ImageGrid loop over all years
for i, year in enumerate(years):
    year_data = canton_data_kfz_hail[canton_data_kfz_hail["Jahr"] == year]
    ax = grid[i]
    ax.axis("off")
    ax.set_title(year)
    year_data.plot(
        column="hailday_kfz_normalized",
        cmap=color_map_haildays_kfz,
        linewidth=0.1,
        ax=ax, 
        legend=False,
        vmin=min_kfz_hail,
        vmax=max_kfz_hail,
    )

# add colorbar
ax = grid[0]
sm = plt.cm.ScalarMappable(
    cmap=color_map_haildays_kfz,
    norm=plt.Normalize(vmin=min_kfz_hail, vmax=max_kfz_hail),
)
cbar = plt.colorbar(sm, ax=ax, pad=500, aspect=40)
cbar.ax.tick_params(labelsize=12)
cbar.ax.set_ylabel("Hail days x number of cars [100k]", rotation=270, labelpad=20, size=14)

# add title
fig.suptitle("Product of the number of hail days and the number of cars per canton", fontsize=18, y=0.92, x=0.55)

plt.show()

### Interactive plot of the hail risk for cars per year and canton

In [None]:
min_kfz_hail_canton = canton_data_kfz_hail["hailday_kfz_normalized"].min()
max_kfz_hail_canton = canton_data_kfz_hail["hailday_kfz_normalized"].max()

# definition of the plot update function for a year change
# (Jahr was converted to a string above, so the selected year is a str)
def update_plot(year: str) -> None:
    # filter data according to year selection
    year_data = canton_data_kfz_hail[canton_data_kfz_hail["Jahr"] == year]
    ax = year_data.plot(
        column="hailday_kfz_normalized",
        cmap=color_map_haildays_kfz,
        linewidth=0.1,
        legend=False,
        vmin=min_kfz_hail_canton,
        vmax=max_kfz_hail_canton,
    )
    ax.axis("off")
    ax.set_title(f"{year}")
    # extract axes object from GeoAxesSubplot
    ax = ax.axes
    # create and add colorbar
    sm = plt.cm.ScalarMappable(
        cmap=color_map_haildays_kfz,
        norm=plt.Normalize(vmin=min_kfz_hail_canton, vmax=max_kfz_hail_canton),
    )
    cbar = plt.colorbar(sm, ax=ax, fraction=0.05, pad=0.03)
    cbar.ax.set_ylabel("Hail days x number of cars [100k]", rotation=270, labelpad=20)
    plt.show()


# interactive year selection widget
interactive_plot_haildays = interactive(update_plot, year=years, continuous_update=False)

# display interactive plot
display(interactive_plot_haildays)

### Hail risk for cars over the years

In [None]:
# Sum the risk proxy over all cantons for each year
kfz_hail_df_total = kfz_hail_df.groupby("Jahr").agg({"hailday_kfz_normalized": "sum"}).reset_index()
kfz_hail_df_total["canton"] = "All"
kfz_hail_df_total.head()

fig = px.line(kfz_hail_df_total, x="Jahr", y="hailday_kfz_normalized")
fig.update_layout(
    title="Hail risk for cars over the years",
    xaxis_title="Year",
    yaxis_title="Total risk (all cantons)",
    legend_title="Canton",
)

fig.show(config= dict(displayModeBar = False))

### Bar chart of the average over the entire period

Below, the average of *hailday_kfz_normalized* over the entire period is computed for each canton.
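
As a minimal sketch of this aggregation, the example below averages the risk proxy per canton on a made-up table with two cantons and two years (the values are illustrative only) and ranks the cantons from highest to lowest:

```python
import pandas as pd

# Toy data: two cantons, two years (illustrative values only)
toy_df = pd.DataFrame({
    "Kanton": ["ZH", "ZH", "BE", "BE"],
    "Jahr": ["2005", "2006", "2005", "2006"],
    "hailday_kfz_normalized": [4.0, 6.0, 1.0, 3.0],
})

# Mean over all years per canton, sorted from highest to lowest average
toy_average = (
    toy_df.groupby("Kanton", as_index=False)["hailday_kfz_normalized"]
    .mean()
    .sort_values("hailday_kfz_normalized", ascending=False)
    .reset_index(drop=True)
)
print(toy_average)
```

Because every year carries the same weight in the mean, a single extreme hail year influences a canton's average as much as any other year.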

In [None]:
df_average = kfz_hail_df.groupby('Kanton', as_index=False)['hailday_kfz_normalized'].mean().sort_values('hailday_kfz_normalized', ascending=False)

In [None]:
# Horizontal bar chart

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(6, 15))

# df_average (mean risk per canton, sorted in descending order) was computed in the previous cell

# Plot the average risk per canton
sns.set_color_codes("pastel")
sns.barplot(x="hailday_kfz_normalized", y="Kanton", data=df_average)

# Set the axis limits, the axis labels and the title
ax.set(xlim=(0, 6), ylabel="Canton",
       xlabel='Average hail days x number of cars (x100k)',
      title='Average over the entire period (2005-2022)')

plt.show()

### Map of the average over the entire period

In [None]:
# join swiss canton shape dataframe
average_data_kfz_hail = canton_map.merge(df_average, left_on="NAME", right_on="Kanton")

In [None]:
# colormap scale definitions
min_kfz_hail_average = average_data_kfz_hail["hailday_kfz_normalized"].min()
max_kfz_hail_average = average_data_kfz_hail["hailday_kfz_normalized"].max()

ax = average_data_kfz_hail.plot(
    column="hailday_kfz_normalized",
    cmap=color_map_haildays_kfz,
    linewidth=0.1,
    legend=False,
    vmin=min_kfz_hail_average,
    vmax=max_kfz_hail_average,
)
ax.axis("off")
ax.set_title("Average over the entire period (2005-2022)")

# extract axes object from GeoAxesSubplot
ax = ax.axes
# create and add colorbar
sm = plt.cm.ScalarMappable(
    cmap=color_map_haildays_kfz,
    norm=plt.Normalize(vmin=min_kfz_hail_average, vmax=max_kfz_hail_average),
)
cbar = plt.colorbar(sm, ax=ax, fraction=0.05, pad=0.03)
cbar.ax.set_ylabel("Hail days x number of cars [100k]", rotation=270, labelpad=20)
plt.show()

Based on the annual hail days and the number of registered cars over the entire period (2005-2022), Zürich, Bern and Luzern are the cantons with the highest risk for car insurers.