# Assign Risks and Get No-Fly Zones

This notebook is step 2 in the process of creating a graph for a specific area. To run this notebook, you must first execute get_data.ipynb for the desired area.

In this step, risk scores are calculated for each area type based on predefined likelihood and severity values. Make sure that risk_scores.csv matches all the data retrieved in the first step, in particular the mapping of data to groups (area types): an area type that is missing from the file gets no risk score when the scores are merged with the data. These risk scores are then assigned to all data points obtained from the OSM data in get_data.ipynb. In addition, no-fly zones are added to the dataset.

The result is exported as a GeoJSON file, which can be used as input for get_graph.ipynb.

In [None]:
# Import necessary libraries
import pandas as pd
import geopandas as gpd
import numpy as np
import os

## 0. Specify the area

Set `city` to the lowercase name of the folder used for the desired area; `file_name` (including the .geojson extension) is derived from it.
Make sure to use the same boundaries as in get_data.ipynb so that the no-fly zones are identified correctly for the selected area.

In [None]:
# city = 'breda'
# city = 'borsele'
city = 'alphen-waddinxveen' # lowercase name of the folder

In [None]:
file_name = f'{city}.geojson' # name of the file to be created

In [None]:
# boundaries = ['Breda, Noord-Brabant, Netherlands']

boundaries = [
    'Alphen aan den Rijn, Zuid-Holland, Netherlands',
    'Waddinxveen, Zuid-Holland, Netherlands',
    'Boskoop, Zuid-Holland, Netherlands'
] # name of the boundaries to be used in the file

# boundaries = ['Borsele, Zeeland, Netherlands']


In [None]:
output_path = "output/" + city

os.makedirs(output_path, exist_ok=True)
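
The input and output paths of this notebook are built in several different cells. The sketch below derives all of them from `city` in one place. It assumes the notebook is run from inside the `2.risk_analysis` folder (so `..` is the repository root) and reuses the folder and file names that appear in this notebook; the helper name `area_paths` is hypothetical.

```python
from pathlib import Path


def area_paths(city: str, root: Path = Path("..")) -> dict:
    """Return the paths used in this notebook for one area (hypothetical helper).

    Assumes `root` is the repository root, i.e. the notebook runs from 2.risk_analysis/.
    """
    risk_dir = root / "2.risk_analysis"
    out_dir = risk_dir / "output" / city
    return {
        "risk_scores": risk_dir / "input" / "risk_scores.csv",
        "osm_data": root / "1.get_osm_data" / "output" / city / f"osm_data_{city}.geojson",
        "output_dir": out_dir,
        "risk_output": out_dir / f"osm_data_with_risk_{city}.geojson",
    }


paths = area_paths("alphen-waddinxveen")
for name, path in paths.items():
    print(f"{name}: {path.as_posix()}")
```

The output folder could then be created with `paths["output_dir"].mkdir(parents=True, exist_ok=True)`, which is equivalent to the `os.makedirs` call above when the working directory is `2.risk_analysis`.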

## 1. Calculate risks

Based on the severity and likelihood scores in risk_scores.csv, an overall risk score is calculated here for each area type. The severity weights used in this research are:
- fatality = 0.4 
- property = 0.3 
- societal = 0.3 

However, you can adjust these values as needed.
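Written out, the calculation in section 1.2 amounts to the following, read directly off the code. Here $n$ is the number of likelihood factors (`n_factors` = 5, columns L1 to L5), $S_f$, $S_p$, $S_s$ are the columns Sf, Sp, Ss, and the minimum and maximum are taken over all area types in risk_scores.csv:

$$
R_f = S_f \sum_{i=1}^{n} L_i, \qquad R_p = S_p \sum_{i=1}^{n} L_i, \qquad R_s = S_s
$$

$$
\tilde{R}_d = \frac{R_d - \min R_d}{\max R_d - \min R_d}, \qquad d \in \{f, p, s\}
$$

$$
\text{risk} = \alpha_f \tilde{R}_f + \alpha_p \tilde{R}_p + \alpha_s \tilde{R}_s
$$

Because each normalized term lies in $[0, 1]$ and the default weights sum to 1 (0.4 + 0.3 + 0.3), the resulting risk also lies in $[0, 1]$. Keeping the weights summing to 1 when adjusting them preserves that range.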

In [None]:
# Read the severity and likelihood scores per area type from risk_scores.csv
df = pd.read_csv("../2.risk_analysis/input/risk_scores.csv") 


In [None]:
df

In [None]:
# Define the severity weights
# These values are used to calculate the overall risk score for each data point.
# You can adjust these values as needed.

alpha_f = 0.4  # fatality
alpha_p = 0.3  # property
alpha_s = 0.3  # societal

### 1.2 Calculate risk

In [None]:
# Number of external risk factors (likelihood columns L1 ... Ln in risk_scores.csv).
# If you change this, update the columns in risk_scores.csv accordingly.
n_factors = 5

# Calculate cumulative contribution per domain (without division)
R_f_total = 0
R_p_total = 0

# Loop over external risk factors
for i in range(1, n_factors + 1):
    # Calculate individual fatality and property risk contributions
    df[f"R_if_{i}"] = df["Sf"] * df[f"L{i}"]
    df[f"R_ip_{i}"] = df["Sp"] * df[f"L{i}"]

    # Sum cumulatively
    R_f_total += df[f"R_if_{i}"]
    R_p_total += df[f"R_ip_{i}"]

# Cumulative crash-related risks
df["R_f"] = R_f_total
df["R_p"] = R_p_total

# Societal risk (independent of risk factors)
df["R_s"] = df["Ss"]

# Normalize fatality, property and societal domains separately
df["R_f_norm"] = (df["R_f"] - df["R_f"].min()) / (df["R_f"].max() - df["R_f"].min())
df["R_p_norm"] = (df["R_p"] - df["R_p"].min()) / (df["R_p"].max() - df["R_p"].min())
df["R_s_norm"] = (df["R_s"] - df["R_s"].min()) / (df["R_s"].max() - df["R_s"].min())

# Use normalized risks in weighted sum
df["risk"] = (
    alpha_f * df["R_f_norm"] +
    alpha_p * df["R_p_norm"] +
    alpha_s * df["R_s_norm"]
)

df["risk"] = df["risk"].round(3)

# Sort by highest total risk
df_sorted = df.sort_values("risk", ascending=False)
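
Since $\sum_i S_f L_i = S_f \sum_i L_i$, the loop above can also be written without per-factor columns. The sketch below is a vectorized variant under that observation; the function name `compute_risk` and the toy scores are hypothetical. Unlike the cells above, which would produce NaN (division by zero) if a domain had the same value for every area type, it maps such a constant domain to 0; that guard is a design choice of this sketch, not part of the original method.

```python
import pandas as pd


def compute_risk(df: pd.DataFrame, n_factors: int = 5, alpha_f: float = 0.4,
                 alpha_p: float = 0.3, alpha_s: float = 0.3) -> pd.Series:
    """Vectorized variant of the risk loop (hypothetical helper)."""
    # Sf*L1 + ... + Sf*Ln == Sf * (L1 + ... + Ln), so sum the likelihoods once
    likelihood_sum = df[[f"L{i}" for i in range(1, n_factors + 1)]].sum(axis=1)
    raw = pd.DataFrame({
        "f": df["Sf"] * likelihood_sum,  # fatality domain
        "p": df["Sp"] * likelihood_sum,  # property domain
        "s": df["Ss"],                   # societal domain (independent of risk factors)
    })
    # Min-max normalize each domain; a constant domain maps to 0 instead of NaN
    span = (raw.max() - raw.min()).replace(0, 1)
    norm = (raw - raw.min()) / span
    risk = alpha_f * norm["f"] + alpha_p * norm["p"] + alpha_s * norm["s"]
    return risk.round(3)


# Hypothetical toy scores, not taken from risk_scores.csv
toy = pd.DataFrame({
    "area_type": ["a", "b", "c"],
    "Sf": [1, 2, 3], "Sp": [3, 2, 1], "Ss": [1, 1, 1],
    "L1": [1, 1, 1], "L2": [0, 1, 2], "L3": [0, 0, 0], "L4": [0, 0, 0], "L5": [0, 0, 0],
})
toy["risk"] = compute_risk(toy)
print(toy[["area_type", "risk"]])
```

In the toy data the societal scores are all equal, so that domain contributes 0 to every area type instead of turning the whole risk column into NaN.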


In [None]:
df_sorted.drop(columns=["Height"], inplace=True)

In [None]:
df_sorted[['area_type', 'Sf', 'Sp', 'Ss', 'L1', 'L2', 'L3', 'L4', 'L5', 'risk']]

In [None]:
df_risks = df.copy()

In [None]:
df_risks

## 2. Assign risk scores to data

Now that the risk scores have been calculated, we match them to the OSM data on `area_type`. This creates a complete dataset containing each piece of infrastructure or linear corridor, its geometry, and its corresponding risk score and height.

In [None]:
# Get the geometries from the GeoJSON file
# This file contains the geometries of the infrastructure or linear corridors.
# Retrieved in get_osm_data.ipynb
gdf = gpd.read_file(f"../1.get_osm_data/output/{city}/osm_data_{city}.geojson")

In [None]:
gdf['area_type'].unique()

In [None]:
# Merge the risk scores with the geometries
# This will add the risk scores to the GeoDataFrame based on the area_type.
gdf = gdf.merge(df_risks[['area_type', 'risk', 'Height']], on='area_type', how='left')
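
Because the merge uses `how='left'`, any area type without a row in risk_scores.csv silently ends up with a NaN risk. The sketch below lists such area types using the `indicator` and `validate` parameters of `pandas.DataFrame.merge`; the helper name `find_unscored_area_types` and the example data are hypothetical. `validate="many_to_one"` additionally raises `pandas.errors.MergeError` if an area type appears more than once in the risk table, which would otherwise duplicate geometries.

```python
import pandas as pd


def find_unscored_area_types(data: pd.DataFrame, risks: pd.DataFrame) -> list:
    """Return the area types in `data` that have no row in `risks` (hypothetical helper)."""
    merged = data[["area_type"]].merge(
        risks[["area_type"]], on="area_type", how="left",
        indicator=True,            # adds a `_merge` column: 'both' or 'left_only'
        validate="many_to_one",    # fail if `risks` has duplicate area types
    )
    missing = merged.loc[merged["_merge"] == "left_only", "area_type"]
    return sorted(missing.unique())


# Hypothetical example data
data = pd.DataFrame({"area_type": ["road", "school", "postnl point", "road"]})
risks = pd.DataFrame({"area_type": ["road", "school"], "risk": [0.5, 0.9]})
print(find_unscored_area_types(data, risks))  # ['postnl point']
```

In this notebook it could be called as `find_unscored_area_types(gdf, df_risks)`. An empty result, or one containing only area types that are handled explicitly afterwards (such as 'postnl point', if it is not listed in risk_scores.csv), suggests the file covers all retrieved data.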

In [None]:
# All 'postnl point' areas have a risk of 0, as they are not relevant for the risk analysis.
gdf.loc[gdf['area_type'] == 'postnl point', 'risk'] = 0
gdf.loc[gdf['area_type'] == 'postnl point', 'Height'] = 0

In [None]:
# Round the risk scores to 3 decimal places
gdf['risk'] = gdf['risk'].round(3)

In [None]:
# Save the GeoDataFrame to a GeoJSON file
gdf.to_file(f"../2.risk_analysis/output/{city}/osm_data_with_risk_{city}.geojson", driver="GeoJSON")