## TAA2: Natural Hazard Risk Assessment using public data

Within this tutorial, we are going to use publicly available hazard data and exposure data to do a risk assessment for the Netherlands. More specifically we will look at damage due to wind storms and flooding. We will use both Copernicus Land Cover data and OpenStreetMap to estimate the potential damage of natural hazards to the built environment.
 
We will first download, access and explore hazard data retrieved from the Copernicus Climate Data Store and the European Commission Joint Research Centre. After this, we will learn how to download and access Copernicus Land Cover data. We will also explore the power of OpenStreetMap, which provides vector data, and learn how to extract, explore and visualize information from it. Lastly, we will use Copernicus Land Cover data to estimate the damage to specific land uses, whereas we will use OpenStreetMap to assess the potential damage to the road system.

## Learning Objectives
<hr>

- To understand the use of **OSMnx** to extract geospatial data from OpenStreetMap.
- To know how to download data from the Copernicus Climate Data Store using the `cdsapi` and access it through Python.
- To know how to access and open information from the Copernicus Land Monitoring Service, specifically the Corine Land Cover data.

- To be able to open and visualize this hazard data.
- To know how to rasterize vector data using **Geocube**.
- To know how to visualise vector and raster data.
- To understand the basic functioning of **Matplotlib** to create a map.

- To understand the basic approach of a natural hazard risk assessment.
- To be able to use the `DamageScanner` to do a damage assessment.
- To interpret and compare the damage estimates.

## 1. Introducing the packages
<hr>

Within this tutorial, we are going to make use of the following packages: 

[**GeoPandas**](https://geopandas.org/) is a Python package that extends the datatypes used by pandas to allow spatial operations on geometric types.

[**OSMnx**](https://osmnx.readthedocs.io/) is a Python package that lets you download geospatial data from OpenStreetMap and model, project, visualize, and analyze real-world street networks and any other geospatial geometries. You can download and model walkable, drivable, or bikeable urban networks with a single line of Python code then easily analyze and visualize them. You can just as easily download and work with other infrastructure types, amenities/points of interest, building footprints, elevation data, street bearings/orientations, and speed/travel time.

[**NetworkX**](https://networkx.org/) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

[**Matplotlib**](https://matplotlib.org/) is a comprehensive Python package for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

[**Geocube**](https://corteva.github.io/geocube) is a Python package to convert geopandas vector data into rasterized data.

[**xarray**](https://docs.xarray.dev/) is a Python package that allows for easy and efficient use of multi-dimensional arrays.

Import the packages in the cell below

In [None]:
import os
import cdsapi
import shapely 
import matplotlib
import urllib3
import pyproj
import contextily as cx

import osmnx as ox
import numpy as np
import xarray as xr
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx

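import rioxarray  # registers the .rio accessor used on xarray objects below

from matplotlib.colors import ListedColormap, LinearSegmentedColormap
from matplotlib.patches import Patch
from geocube.api.core import make_geocube
from zipfile import ZipFile
from io import BytesIO
from urllib import request
from urllib.request import urlopen
from tqdm import tqdm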

urllib3.disable_warnings()

Import error? Then not all of the packages are installed yet. Install the missing packages using `pip install` in a new cell and then run the cell above again.
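To see which packages are missing before installing anything, a small check like the one below can help. This is only a sketch: the `REQUIRED` list and the `find_missing` helper are illustrative names, and the list assumes that each import name equals the name of the installable package.

```python
import importlib.util

# Import names used in this tutorial (assumption: import name == package name).
REQUIRED = ["cdsapi", "shapely", "matplotlib", "urllib3", "pyproj", "contextily",
            "osmnx", "numpy", "xarray", "geopandas", "pandas", "networkx", "tqdm"]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("In a notebook cell, try: %pip install " + " ".join(missing))
else:
    print("All packages are available.")
```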

## 2. Downloading and accessing natural hazard data
<hr>

We will first download and explore windstorm and flood data for the Netherlands. 

### Windstorm Data
<hr>

The windstorm data will be downloaded from the [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/). As we have seen during the lecture, and as you can also see by browsing on this website, there is an awful lot of climate data available through this Data Store. As such, it is very valuable to understand how to access and download this information to use within an analysis. To keep things simple, we only download one dataset today: [A winter windstorm](https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-european-wind-storm-indicators?tab=overview). 

We will do so using an **API**, which is the acronym for application programming interface. It is a software intermediary that allows two applications to talk to each other. APIs are an accessible way to extract and share data within and across organizations. APIs are all around us. Every time you use a rideshare app, send a mobile payment, or change the thermostat temperature from your phone, you’re using an API.

However, before we can access this **API**, we need to take a few steps. Most importantly, we need to register on the [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/) portal, as explained in the video clip below:

<img src="https://github.com/ElcoK/BigData_AED/blob/main/_static/images/CDS_registration.gif?raw=1" class="bg-primary mb-1">
<br>

Now, the next step is to access the API. You can now log in on the website of the [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/). After you log in, click on your name in the top right corner of the webpage (next to the login button). On the personal page that opens, you will find your user ID (**uid**) and your personal **API key**. You need to add those in the cell below to be able to download the windstorm.

As you can see in the cell below, we download a specific windstorm that occurred on the 28th of October 2013. This is storm [Carmen (also called St Jude)](https://en.wikipedia.org/wiki/St._Jude_storm). 

In [None]:
uid = XXX
apikey = 'XXX'

c = cdsapi.Client(key=f"{uid}:{apikey}", url="https://cds.climate.copernicus.eu/api/v2")

c.retrieve(
    'sis-european-wind-storm-indicators',
    {
        'variable': 'all',
        'format': 'zip',
        'product': 'windstorm_footprints',
        'year': '2013',
        'month': '10',
        'day': '28',
    },
    'Carmen.zip')

### Flood Data
<hr>

We will extract the flood data from a repository maintained by the European Commission Joint Research Centre. We will download river flood hazard maps from their [Flood Data Collection](https://data.jrc.ec.europa.eu/dataset/1d128b6c-a4ee-4858-9e34-6210707f3c81). 

Here we do not need to use an API and we also do not need to register ourselves, so we can download any of the files directly. To do so, we use the `urllib` package.

In [None]:
# This is the link to the 1-in-100-year flood map for Europe
zipurl = 'https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/FLOODS/EuropeanMaps/floodMap_RP100.zip'

# Now we open and extract the data.
# Note: data_path must point to your data directory for this week.
with urlopen(zipurl) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        zfile.extractall(data_path)

The download and unzip step in the cell above sometimes fails. If that is the case (e.g., when it seems to remain stuck), download the files manually through the link and upload them to the data folder for this week (as explained at the start of this tutorial).
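A slightly more defensive variant of the download step is sketched below. It sets a timeout on the network operations, so a stalled connection raises an error instead of hanging, and it prints a message when the automatic route fails. The function names `extract_zip_bytes` and `download_and_extract` are hypothetical helpers, not part of any package used here.

```python
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen
from zipfile import ZipFile, BadZipFile

def extract_zip_bytes(payload, target_dir):
    """Extract an in-memory zip archive and return the names of its members."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with ZipFile(BytesIO(payload)) as archive:
        archive.extractall(target)
        return archive.namelist()

def download_and_extract(url, target_dir, timeout=60):
    """Download a zip file and extract it; return [] if anything goes wrong.

    `timeout` (seconds) applies to each blocking network operation.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            payload = response.read()
        return extract_zip_bytes(payload, target_dir)
    except (OSError, BadZipFile) as err:
        print(f"Automatic download failed ({err}); download the file manually instead.")
        return []
```

In this notebook it could be called as `download_and_extract(zipurl, data_path)`, assuming `data_path` points to your data folder.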

### Set location to explore
---
Before we continue, we need to specify our location of interest. This should be a province where some flooding and relatively high wind speeds occur (else we will find zero damage). We specify the region of interest in the cell below by using the `geocode_to_gdf()` function.

In [None]:
place_name = "Gelderland, The Netherlands" ### But you could also consider Zeeland, for example.
area = ox.geocode_to_gdf(place_name)

## 3. Exploring the natural hazard data
<hr>

Now we will explore our natural hazard data.

### Windstorm Data
---

As you can see in the section above, we have downloaded the storm footprint as a zip file. Let's open the zip file and load the dataset using the `xarray` package through the `open_dataset()` function.

In [None]:
with ZipFile('Carmen.zip') as zf:
    
    # Let's get the filename first
    file = zf.namelist()[0]
    
    # And now we can open the file and load it into memory,
    # so the data stays available after the zip file is closed
    with zf.open(file) as f:
        windstorm_europe = xr.open_dataset(f).load()

Let's have a look at the storm we have downloaded!

In [None]:
windstorm_europe['FX'].plot()

<div class="alert alert-block alert-success">
<b>Question 1:</b> Describe windstorm Carmen. When did this event happen, which areas were most affected? Can you say something about the maximum wind speeds in different areas, based on the plot? And what does FX mean?
</div>

Unfortunately, our data does not have a proper coordinate system defined yet. As such, we will need to use the `rio.write_crs()` function to set the coordinate system to **EPSG:4326** (the standard global coordinate reference system). 

We also need to tell rioxarray which dimensions hold the spatial coordinates (longitude and latitude). It expects them to be named `x` and `y`, so we use the `rename()` function before we use the `set_spatial_dims()` function.

In [None]:
windstorm_europe.rio.write_crs(4326, inplace=True)
windstorm_europe = windstorm_europe.rename({'Latitude': 'y','Longitude': 'x'})
windstorm_europe.rio.set_spatial_dims(x_dim="x",y_dim="y", inplace=True)

<div class="alert alert-block alert-success">
<b>Question 2:</b> Climate data is often stored as a netCDF file. Please describe what a netCDF file is. Which information is stored in the netCDF file we have downloaded for the windstorm? What type of metadata does it contain?
</div>

Next, we reproject the data to the European coordinate system **EPSG:3035**, so that we can easily use it together with the other data. To do so, we use the `rio.reproject()` function. You can simply pass the number of the coordinate system.

In [None]:
windstorm_europe = windstorm_europe. [add function]

Now we have all the information to clip the windstorm data to our area of interest:

In [None]:
windstorm_map = windstorm_europe.rio.clip(area.envelope.values, area.crs)

And let's have a look as well by using the `plot()` function. Please note that the legend is in meters per second.

In [None]:
windstorm_map['FX']. [add function]

### Flood Data
---

And similarly, we want to open the flood map. This time we do not have to unzip the file anymore, so we can open it directly using `xarray`:

In [None]:
flood_map_path = 'floodmap_EFAS_RP100_C.tif'

In [None]:
flood_map = xr.open_dataset(flood_map_path, engine="rasterio")
flood_map

And let's make sure we set all the variables and the CRS correctly again to be able to open the data properly. Note that we should now use **EPSG:3035**. This is the standard coordinate system for Europe, in meters (instead of degrees).

In [None]:
flood_map.rio.write_crs(      , inplace=True)
flood_map.rio.set_spatial_dims(x_dim="x",y_dim="y", inplace=True)

Now it is pretty difficult to explore the data for our area of interest, so let's clip the flood data.

We want to clip our flood data to our chosen area. Clipping the full European map directly with the exact geometry, however, is very inefficient and will run into memory issues on Google Colab. As such, we first need to clip it by using a bounding box, followed by the actual clip.

<div class="alert alert-block alert-success">
<b>Question 4:</b> Please provide the lines of code below in which you show how you have clipped the flood map to your area.
</div>

*A few hints*:

* carefully read the documentation of the `.clip_box()` function of rioxarray. Which information do you need? 
* is the GeoDataFrame of your region (the area GeoDataFrame) in the same coordinate system? Perhaps you need to convert it using the `.to_crs()` function. 
* how do you get the bounds from your area GeoDataFrame? 
* The final step of the clip would be to use the `.rio.clip()` function, using the actual area file and the flood map clipped to the bounding box. Please note that you should **not** use the envelope here, like we did in the previous clip. Here we really want to use the exact geometry values.

As you will see, we first clip it very efficiently using the bounding box. After that, we do an exact clip.

In [None]:
min_lon =  area.to_crs(epsg=3035).bounds.minx.values[0]
min_lat = area.to_crs(epsg=3035).bounds.miny
max_lon =  area.to_crs(epsg=3035).bounds
max_lat =  area.to_crs(epsg=3035).

flood_map_area = flood_map.rio.clip_box(minx=.... )
flood_map_area = flood_map_area.rio.clip(area.XXXX.values, area.crs)
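To see why the two-stage approach saves memory, the idea can be sketched on a toy NumPy array, without using rioxarray at all. All values below are hypothetical, and a simple triangle stands in for the region outline: the cheap bounding-box crop shrinks the array first, and the more expensive exact mask is then only evaluated on the small remainder.

```python
import numpy as np

# Toy raster: 6 x 6 cells with cell-centre coordinates 0..5 (hypothetical values).
x = np.arange(6)
y = np.arange(6)
raster = np.arange(36, dtype=float).reshape(6, 6)

# Stage 1: cheap bounding-box crop using coordinate ranges.
minx, maxx, miny, maxy = 1, 4, 2, 5
cols = (x >= minx) & (x <= maxx)
rows = (y >= miny) & (y <= maxy)
cropped = raster[np.ix_(rows, cols)]

# Stage 2: exact clip on the small array only. A triangle stands in for the
# region outline: keep cells whose column offset is at most their row offset.
rr, cc = np.indices(cropped.shape)
inside = cc <= rr
clipped = np.where(inside, cropped, np.nan)

print(cropped.shape)                  # (4, 4)
print(int(np.isnan(clipped).sum()))   # 6 cells fall outside the triangle
```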

And let's have a look as well. Please note that the legend is in meters.

In [None]:
flood_map_area['band_data'].plot(cmap='Blues',vmax=10)

## 4. Downloading and exploring Land Cover data and Land Use data
<hr>

We will explore rasterized Corine Land Cover data and land use data retrieved from OpenStreetMap.

### Download and access Copernicus Land Cover data
---

Unfortunately, there is no API option to download the [Corine Land Cover](https://land.copernicus.eu/pan-european/corine-land-cover) data. We will have to download the data from the website first.

To do so, we will first have to register ourselves again on the website. Please find in the video clip below how to register yourself on the website of the [Copernicus Land Monitoring Service](https://land.copernicus.eu/):

<img src="https://github.com/ElcoK/BigData_AED/blob/main/_static/images/CLMS_registration.gif?raw=1" class="bg-primary mb-1">

Now click on the Login button in the top right corner to log in on the website. There are many interesting datasets on this website, but we just want to download the Corine Land Cover data, and specifically the latest version: [Corine Land Cover 2018](https://land.copernicus.eu/pan-european/corine-land-cover/clc2018?tab=download). To do so, please select the **Corine Land Cover - 100 meter**. Now click on the large green Download button. Your download should start any minute.

Somewhat annoyingly, the file you have downloaded is double zipped. It's slightly inconvenient to open this through Python and within Google Drive. So let's unzip it twice outside of Python (on your local machine) and then navigate to the `DATA` directory within the unzipped file. Here you can find a file called `U2018_CLC2018_V2020_20u1.tif`. Drop this file into this week's data directory, as specified at the start of this tutorial when we mounted our Google Drive.
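If you would rather stay in Python, a double-zipped archive can also be unpacked with the standard library. The sketch below uses a hypothetical helper, `extract_nested_zip`, which reads each inner archive fully into memory. For a file as large as the Corine download that may be too much for a small machine, so treat it as an illustration of the idea rather than a recommended route.

```python
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile

def extract_nested_zip(outer_zip, target_dir, suffix=".tif"):
    """Extract files ending in `suffix` from zip archives stored inside `outer_zip`."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with ZipFile(outer_zip) as outer:
        for inner_name in outer.namelist():
            if not inner_name.lower().endswith(".zip"):
                continue
            # Read the inner archive into memory and open it as a zip file.
            with ZipFile(BytesIO(outer.read(inner_name))) as inner:
                for member in inner.namelist():
                    if member.lower().endswith(suffix):
                        inner.extract(member, target)
                        extracted.append(target / member)
    return extracted
```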

In [None]:
CLC_location = 'U2018_CLC2018_V2020_20u1.tif'

In [None]:
CLC = xr.open_dataset(CLC_location, engine="rasterio")

Similarly to the flood map data, we need to do a two-stage clip again (like we did before in this tutorial) to ensure we get only our area of interest without exceeding our RAM.

In [None]:
CLC_region = CLC.rio.clip_box(
CLC_region = CLC_region.rio.clip(

In [None]:
# x holds the east-west coordinate and y the north-south coordinate
CLC_region = CLC_region.rename({'x': 'lon','y': 'lat'})
CLC_region.rio.set_spatial_dims(x_dim="lon",y_dim="lat", inplace=True)

And now we create a *color_dict*, to ensure we can visualize the data properly. We use the color scheme of Corine Land Cover. 

In [None]:
CLC_values = [111, 112, 121, 122, 123, 124, 131, 132, 133, 141, 142, 211, 212, 213, 221, 222, 223, 231, 241, 242,
 243, 244, 311, 312, 313, 321, 322, 323, 324, 331, 332, 333, 334, 335, 411, 412, 421, 422, 423, 511, 512, 521, 522, 523]

CLC_colors = ['#E6004D', '#FF0000', '#CC4DF2', '#CC0000', '#E6CCCC', '#E6CCE6', '#A600CC', '#A64DCC', '#FF4DFF', '#FFA6FF', '#FFE6FF', '#FFFFA8', '#FFFF00', '#E6E600',
 '#E68000', '#F2A64D', '#E6A600', '#E6E64D', '#FFE6A6', '#FFE64D', '#E6CC4D', '#F2CCA6', '#80FF00', '#00A600',
 '#4DFF00', '#CCF24D', '#A6FF80', '#A6E64D', '#A6F200', '#E6E6E6', '#CCCCCC', '#CCFFCC', '#000000', '#A6E6CC',
 '#A6A6FF', '#4D4DFF', '#CCCCFF', '#E6E6FF', '#A6A6E6', '#00CCF2', '#80F2E6', '#00FFA6', '#A6FFE6', '#E6F2FF']

The code below allows us to use the color_dict above to plot the CLC map.

In [None]:
color_dict_raster = dict(zip(CLC_values,CLC_colors))

# We create a colormap from our list of colors
cm = ListedColormap(CLC_colors)

# The labels are the class values themselves; their order must match the order of the colors.
labels = np.array(CLC_values)
len_lab = len(labels)

# prepare normalizer
## Prepare bins for the normalizer
norm_bins = np.sort([*color_dict_raster.keys()]) + 0.5
norm_bins = np.insert(norm_bins, 0, np.min(norm_bins) - 1.0)

## Make normalizer and formatter
norm = matplotlib.colors.BoundaryNorm(norm_bins, len_lab, clip=True)
fmt = matplotlib.ticker.FuncFormatter(lambda x, pos: labels[norm(x)])
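The bin construction above can be hard to picture. The sketch below reproduces it for three class codes and uses `np.digitize` as a stand-in for the normalizer. This is a deliberate simplification (`BoundaryNorm` is expected to perform a comparable lookup), meant only to show how each class value ends up with its own color index.

```python
import numpy as np

# Three example class codes (a subset of the Corine values above).
class_values = np.array([111, 112, 121])

# Same bin construction as in the cell above: one edge just above each value,
# plus one extra edge below the smallest value.
bins = np.sort(class_values) + 0.5
bins = np.insert(bins, 0, bins.min() - 1.0)   # edges: 110.5, 111.5, 112.5, 121.5

# np.digitize returns 1 for the first bin, so subtract 1 to get a color index.
pixels = np.array([111, 121, 112, 111])
color_index = np.digitize(pixels, bins) - 1

print(color_index)   # [0 2 1 0]
```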

And let's plot the Corine Land Cover data for our area of interest

In [None]:
fig, ax = plt.subplots(1, 1,figsize=(14,10))

CLC_region["band_data"].plot(ax=ax,levels=len(CLC_colors),colors=CLC_colors)

<div class="alert alert-block alert-success">
<b>Question 5:</b> Describe the different land-use classes within your region that you see on the Corine Land Cover map. Do you see any dominant land-use classes? 
</div>

### Extract and visualize land-use information from OpenStreetMap
---

The next step is to define which area you want to focus on. In the cell below, you will see "Kampen, The Netherlands". Change this to any area or municipality in the Netherlands that (1) you can think of and (2) will work. 

In some cases, the function does not recognize the location. You could either try a different phrasing or try a different location. Many parts of the Netherlands should work.

In [None]:
place_name = "Kampen, The Netherlands"
area = ox.geocode_to_gdf(place_name)

Now let us visualize the bounding box of the area. As you will notice, we also estimate the size of the area. If the area is larger than 50 km², or when you have many elements within your area (for example the Amsterdam city centre), extracting the data from OpenStreetMap may take a little while. 

In [None]:
area_to_check = area.to_crs(epsg=3857)
ax = area_to_check.plot(figsize=(10, 10), color="none", edgecolor="k", linewidth=4)
ax.set_xticks([])
ax.set_yticks([])
ax.set_axis_off()
cx.add_basemap(ax, zoom=11)

size = int(area.to_crs(epsg=3035).area.iloc[0]/1e6)  # measure area in an equal-area CRS (EPSG:3035), not Web Mercator

ax.set_title("{}. Total area: {} km2".format(place_name,size),fontweight='bold')
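One caveat when measuring areas: Web Mercator (EPSG:3857) is convenient for basemaps, but it stretches distances away from the equator, so areas measured in it are overestimated. The effect can be roughly sketched as follows, assuming a spherical Mercator approximation (the helper name is hypothetical):

```python
import math

def web_mercator_area_factor(latitude_deg):
    """Approximate factor by which EPSG:3857 inflates areas at a given latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2

# Kampen lies at roughly 52.5 degrees north (approximate value, for illustration).
factor = web_mercator_area_factor(52.5)
print(round(factor, 2))
```

At the latitude of the Netherlands this factor is roughly 2.7, so an equal-area CRS such as EPSG:3035 is the safer choice when reporting an area in km².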

<div class="alert alert-block alert-success">
<b>Question 1:</b> To make sure we understand which area you focus on, please submit the figure that outlines your area.
</div>

Now that we are satisfied with the selected area, we are going to extract the land-use information from OpenStreetMap. To find the right information from OpenStreetMap, we use **tags**.

As you will see in the cell below, we use the tags *"landuse"* and *"natural"*. We need to use the *"natural"* tag to ensure we also obtain water bodies and other natural elements. 

In [None]:
tags = {'landuse': True, 'natural': True}   
landuse = ox.features_from_place(place_name, tags)

In case the above does not work, you can continue the assignment by using the code below (make sure you remove the hashtags to run it). If you decide to use the data as specified below, also change the map at the start to 'Kampen'.

In [None]:
# from urllib import request
# remote_url = 'https://github.com/ElcoK/BigData_AED/raw/main/week5/kampen_landuse.gpkg'
# file = 'kampen_landuse.gpkg'

# request.urlretrieve(remote_url, file)
# landuse = gpd.GeoDataFrame.from_file('kampen_landuse.gpkg')

To ensure we really only get the area that we want, we use geopandas's `clip` function to only keep the area we want. This function does exactly the same as the `clip` function in QGIS.
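As an illustration of what such a clip does, here is a minimal sketch on toy geometries. The parcels, the study area and the coordinates are made up; only `gpd.clip` itself is the function referred to above.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data in a projected CRS (EPSG:3035, units in metres).
parcels = gpd.GeoDataFrame(
    {"landuse": ["forest", "farmland"]},
    geometry=[box(0, 0, 2, 2), box(5, 5, 6, 6)],
    crs="EPSG:3035",
)
study_area = gpd.GeoDataFrame(geometry=[box(1, 1, 3, 3)], crs="EPSG:3035")

# Keep only the parts of the parcels that fall inside the study area.
clipped = gpd.clip(parcels, study_area)

print(clipped["landuse"].tolist())      # ['forest']
print(clipped.geometry.area.tolist())   # [1.0]
```

In this notebook, the equivalent call would presumably be `landuse = gpd.clip(landuse, area)`, assuming both GeoDataFrames share the same CRS.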

When we want to visualize or analyse the data, we want all information in a single column. However, at the moment, all information that was tagged as *"natural"*, has no information stored in the *"landuse"* tags. It is, however, very convenient if we can just use a single column for further exploration of the data. 

To overcome this issue, we need to add the missing information to the landuse column, as done below. Let's first have a look which categories we have in the **natural** column. 

In [None]:
landuse.natural.unique()

And now we can add them to the **landuse** column. We made a start, but it's up to you to fill in the rest.

In [None]:
landuse.loc[landuse.natural=='water','landuse'] = 'water'
landuse.loc[landuse.natural=='wetland','landuse'] = 'wetlands'


landuse = landuse.dropna(subset=['landuse'])

<div class="alert alert-block alert-success">
<b>Question 2:</b> Please provide in the answer box in Canvas the code that you used to make sure that all land uses are now registered within the landuse column.
</div>

Our next step is to prepare the visualisation of a map. What better way to explore land-use information than plotting it on a map? 

As you will see below, we can create a dictionary with color codes that will color each land-use class based on the color code provided in this dictionary.

In [None]:
color_dict = {  "grass":'#c3eead',               "railway": "#000000",
                "forest":'#1c7426',              "orchard":'#fe6729',
                "residential":'#f13013',         "industrial":'#0f045c',
                "retail":'#b71456',              "education":'#d61181',              
                "commercial":'#981cb8',          "farmland":'#fcfcb9',
                "cemetery":'#c39797',            "construction":'#c0c0c0',
                "meadow":'#c3eead',              "farmyard":'#fcfcb9',
                "plant_nursery":'#eaffe2',       "scrub":'#98574d',
                "allotments":'#fbffe2',          "reservoir":'#8af4f2',
                "static_caravan":'#ff3a55',      "wetlands": "#c9f5e5",
                "water": "#c9e5f5",              "beach": "#ffeead",
                "landfill" : "#B08C4D",          "recreation_ground" : "#c3eead",
                "brownfield" : "#B08C4D",        "village_green" : "#f13013" ,
                "military": "#52514E",            "garden" : '#c3eead'
             } 

Unfortunately, OpenStreetMap very often contains elements that have a unique tag. As such, it may be the case that some of our land-use categories are not in the dictionary yet. 

Let's first create an overview of the unique land-use categories within our data using the `.unique()` function of our dataframe:

In [None]:
landuse.landuse.unique()

Of course we can visually compare the array above with our color_dict, but it is much quicker to use sets to check if there is anything missing:

In [None]:
set(landuse.landuse.unique())-set(color_dict)

In case anything is missing, add the missing categories to the color_dict dictionary and re-run that cell. 

<div class="alert alert-block alert-success">
<b>Question 3:</b> Show us in Canvas (i) which land-use categories you had to add, and (ii) what your final color dictionary looks like.
</div>

```{tip}
You can easily find hexcodes online to find the right colour for each land-use category. Just google hexcodes!
```


Our next step is to make sure that we can connect our color codes to our dataframe with land-use categories.

In [None]:
# keep only the colors of land-use classes that occur in our data
classes_in_data = set(landuse.landuse.unique())
color_dict = {key: color for key, color in color_dict.items() if key in classes_in_data}

map_dict = dict(zip(color_dict.keys(),[x for x in range(len(color_dict))]))

landuse['col_landuse'] = landuse.landuse.apply(lambda x: color_dict[x])

Now we can plot the figure!

As you will see in the cell below, we first state that we want to create a figure with a specific figure size. You can change the dimensions to your liking.

In [None]:
fig, ax = plt.subplots(1, 1,figsize=(12,10))

# add color scheme
color_scheme_map = list(color_dict.values())
cmap = LinearSegmentedColormap.from_list(name='landuse',
                                     colors=color_scheme_map)  

# and plot the land-use map.
landuse.plot(color=landuse['col_landuse'],ax=ax,linewidth=0)

# remove the ax labels
ax.set_xticks([])
ax.set_yticks([])
ax.set_axis_off()

# add a legend:
legend_elements = []
for iter_,item in enumerate(color_dict):
    legend_elements.append(Patch(facecolor=color_scheme_map[iter_],label=item))        

ax.legend(handles=legend_elements,edgecolor='black',facecolor='#fefdfd',prop={'size':12},loc=(1.02,0.2)) 

# add a title
ax.set_title(place_name,fontweight='bold')

<div class="alert alert-block alert-success">
<b>Question 4:</b> Please upload a figure of your land-use map, using OpenStreetMap. 
</div>

### Rasterize land-use information
---

As you have noticed already during the lecture, and as we have seen during TAA1 with the Google Earth Engine, most land-use data is in raster format. 

In OpenStreetMap everything is stored in vector format. As such, the land-use information we extracted from OpenStreetMap is also in vector format. While it is not always necessary to have this information in raster format, it is useful to know how to convert your data into a raster format.

To do so, we can make use of the **GeoCube** package, which is a recently developed Python package that can very easily convert vector data into a raster format.

The first thing we will need to do is to define all the unique land-use classes and store them in a dictionary:

In [None]:
categorical_enums = {'landuse': landuse.landuse.drop_duplicates().values.tolist()
}

And now we simply use the `make_geocube()` function to convert our vector data into raster data. 

In the `make_geocube()` function, we have to specify several arguments:

- Through the `vector_data` argument we have to state which dataframe we want to rasterize.
- Through the `output_crs` argument we have to state the coordinate reference system (CRS). We use the OpenStreetMap default EPSG:4326.
- Through the `resolution` argument we have to state the resolution. In our case, we will have to set this in degrees. 0.01 degrees is equivalent to roughly 1.1 km at the equator. 
- Through the `categorical_enums` argument we specify the different land-use categories.

Play around with the different resolutions to find a suitable level of detail. The higher the resolution (i.e., the more zeros after the decimal point), the longer it will take to rasterize.
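To get a feeling for what a resolution in degrees means on the ground, the conversion can be sketched as below. One degree of latitude spans about 111 km, and the east-west size of a cell shrinks with the cosine of the latitude. The function name and the spherical Earth radius are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)

def degrees_to_metres(resolution_deg, latitude_deg=0.0):
    """Approximate (east-west, north-south) cell size in metres for a resolution in degrees."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0   # about 111 km
    north_south = resolution_deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south

print([round(v) for v in degrees_to_metres(0.01)])        # at the equator
print([round(v) for v in degrees_to_metres(0.01, 52.5)])  # at the latitude of the Netherlands
```

For 0.01 degrees this gives cells of roughly 1.1 km north-south, and a bit under 0.7 km east-west at the latitude of the Netherlands.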

In [None]:
landuse_grid = make_geocube(
    vector_data=,
    output_crs=,
    resolution=(-XXXX, XXXX),
    categorical_enums=categorical_enums
)

Let's explore what this function has given us:

In [None]:
landuse_grid["landuse"]

The output above is a typical output of the **xarray** package. 

- The `array` shows the numpy array with the actual values. As you can see, the rasterization process has used the value `-1` for NoData. 
- The `Coordinates` table shows the x (longitude) and y (latitude) coordinates of the array. It has the exact same size as the `array` with land-use values.
- The `Attributes` table specifies the NoData value (the `_FillValue` element, which indeed shows `-1`) and the name of the dataset.

Now let's plot the data to see the result!

In [None]:
fig, ax = plt.subplots(1, 1,figsize=(14,10))

landuse_grid["landuse"].plot(ax=ax,vmin=0,vmax=15,levels=15,cmap='tab20')

# remove the ax labels
ax.set_xticks([])
ax.set_yticks([])
ax.set_axis_off()

#add a title

ax.set_title('')

As we can see in the figure above, the land-use categories have turned into numbers, instead of land-use categories described by a string value. 

This is of course a lot harder to interpret. Let's re-do some parts to make sure we can properly link them back to the original data.

To do so, we will first need to make sure that we know which values (numbers) are connected to each land-use category. Instead of trying to match, let's predefine this ourselves!

We will start with creating a dictionary that allows us to couple a number to each category:

In [None]:
value_dict = dict(zip(landuse.landuse.unique(),np.arange(0,len(landuse.landuse.unique()),1)))

In [None]:
value_dict['nodata'] = -1

And we now use this dictionary to add a new column to the dataframe with the values:

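In [None]:
# add the numeric value of each land-use class as a new column
landuse['landuse_value'] = landuse.landuse.map(value_dict)

landuse_valued = make_geocube(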
    vector_data=XXXX,
    output_crs=XXXX,
    resolution=(-XXXX, XXXX),
    categorical_enums={'landuse_value': landuse.landuse_value.drop_duplicates().values.tolist()
}
)

And let's use the original `color_dict` dictionary to find the right hex codes for each of the land-use categories

In [None]:
unique_classes = landuse.landuse.drop_duplicates().values.tolist()
colormap_raster = [color_dict[lu_class] for lu_class in unique_classes] 

To plot the new result:

In [None]:
fig, ax = plt.subplots(1, 1,figsize=(14,10))

landuse_valued["landuse_value"].plot(ax=ax,vmin=0,vmax=19,levels=len(unique_classes),colors=colormap_raster)

# remove the ax labels
ax.set_xticks([])
ax.set_yticks([])
ax.set_axis_off()

# add title
ax.set_title('')

<div class="alert alert-block alert-success">
<b>Question 5:</b> In the rasterization process, we use the `make_geocube()` function. Please elaborate on the following: i) Why is it important to specify the right coordinate system? What could happen if you choose the wrong coordinate system? ii) Which resolution did you choose and why? iii) Why did the first result not give us the right output with the correct colors? How did you solve this? 
</div>

In [None]:
unique_classes = landuse.landuse.drop_duplicates().values.tolist()
colormap_raster = [color_dict[lu_class] for lu_class in unique_classes] 
# value -1 (nodata) is drawn in white, followed by one color per land-use class
color_dict_raster = dict(zip(np.arange(-1,len(unique_classes),1),['#ffffff']+colormap_raster))

# We create a colormap from our list of colors
cm = ListedColormap([color_dict_raster[x] for x in color_dict_raster.keys()])

# Let's also define the description of each category. Order should be respected here!
labels = np.array(['nodata'] + unique_classes)
len_lab = len(labels)

# prepare normalizer
## Prepare bins for the normalizer
norm_bins = np.sort([*color_dict_raster.keys()]) + 0.5
norm_bins = np.insert(norm_bins, 0, np.min(norm_bins) - 1.0)

## Make normalizer and formatter
norm = matplotlib.colors.BoundaryNorm(norm_bins, len_lab, clip=True)
fmt = matplotlib.ticker.FuncFormatter(lambda x, pos: labels[norm(x)])

Let's plot the map again!

In [None]:
fig, ax = plt.subplots(1, 1,figsize=(14,10))

ax = landuse_valued["landuse_value"].plot(levels=len(unique_classes), cmap=cm, norm=norm)

# place one colorbar tick in the middle of each bin
diff = norm_bins[1:] - norm_bins[:-1]
tickz = norm_bins[:-1] + diff / 2
cb = fig.colorbar(ax, format=fmt, ticks=tickz)

# set title again
fig.axes[0].set_title('')

fig.axes[0].set_xticks([])
fig.axes[0].set_yticks([])
fig.axes[0].set_axis_off()

# xarray's plot() already added its own colorbar, so we remove that one:
fig.delaxes(fig.axes[1])

## 5. Perform a raster-based damage assessment using OSM and Corine Land Cover
<hr>

In [None]:
! @TASK: Copy paste from Tutorial 2

In [None]:
! @TASK: Let students also perform risk assessment based on rasterized OSM land use data?? Code needs to be written for that. Or think about a really good question that we can ask here about this?

## 6. Extracting high-resolution data from OpenStreetMap
<hr>

### Extracting buildings from OpenStreetMap
---

There is a lot more data to extract from OpenStreetMap besides land-use information. Let's extract some building data. To do so, we use the *"building"* tag.

In [None]:
tags = {"building": True}
buildings = ox.features_from_place(place_name, tags)

In [None]:
buildings.head()

In [None]:
! @TASK: copy-paste cells from tutorial 1

### Analyze and visualize building stock
---

In [None]:
fig,ax = plt.subplots(1,1,figsize=(5,18))

building_year.plot(kind='barh',ax=ax)

ax.tick_params(axis='y', which='major', labelsize=7)

In [None]:
! @TASK: copy-paste cells from tutorial 1

### Extracting roads from OpenStreetMap
---

In [None]:
! @TASK: copy-paste cells from tutorial 1

## 7. Perform a damage assessment of the road network using OpenStreetMap
<hr>

In [None]:
! @TASK: copy-paste cells from tutorial 2

### Windstorm Damage
---
To estimate the potential damage of our windstorm, we use the vulnerability curves developed by [Yamin et al. (2014)](https://www.sciencedirect.com/science/article/pii/S2212420914000466). Following [Yamin et al. (2014)](https://www.sciencedirect.com/science/article/pii/S2212420914000466), we will apply a sigmoidal vulnerability function satisfying two constraints: (i) a minimum threshold for the occurrence of damage with an upper bound of 100% direct damage; (ii) a high power-law function for the slope, describing an increase in damage with increasing wind speeds. Due to the limited number of vulnerability curves available for windstorm damage, we will use the damage curve that represents low-rise *reinforced masonry* buildings for all land-use classes that may contain buildings. Obviously, this is a large oversimplification of the real world, but it should be sufficient for this exercise. When doing a proper stand-alone windstorm risk assessment, one should put more effort into collecting the right vulnerability curves for different building types. 
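To make the shape of such a curve concrete, the sketch below implements one simple functional form that satisfies both constraints: zero damage below a threshold, and a power-law rise that saturates at 100%. It is not the function from Yamin et al. (2014), and the parameter values (`v_threshold`, `v_half`, `power`) are hypothetical. They only need to be expressed in the same unit as the wind speeds.

```python
import numpy as np

def sigmoid_wind_damage(speed, v_threshold=25.0, v_half=60.0, power=3.0):
    """Illustrative damage fraction (0..1) as a function of wind speed.

    All parameter values are hypothetical; they are not taken from Yamin et al. (2014).
    """
    speed = np.asarray(speed, dtype=float)
    # No damage below the threshold; v_half is the speed with 50% damage.
    excess = np.clip(speed - v_threshold, 0.0, None) / (v_half - v_threshold)
    # Power-law rise that saturates at 1 (100% damage).
    return excess**power / (1.0 + excess**power)

speeds = np.array([10.0, 25.0, 40.0, 60.0, 100.0])
print(np.round(sigmoid_wind_damage(speeds), 3))
```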

In [None]:
wind_curves = pd.read_excel("https://github.com/ElcoK/BigData_AED/raw/main/week5/damage_curves.xlsx",sheet_name='wind_curves')
maxdam = pd.read_excel("https://github.com/ElcoK/BigData_AED/raw/main/week5/damage_curves.xlsx",sheet_name='maxdam')

In [None]:
landuse_map = CLC_region_wind['band_data'].to_numpy()[0,:,:]
wind_map = windstorm['FX'].to_numpy()[0,:,:]

In [None]:
wind_map.shape

In [None]:
wind_map_kmh = wind_map*XXX

In [None]:
wind_damage_CLC = DamageScanner(landuse_map,wind_map_kmh,wind_curves,maxdam)[1]

In [None]:
wind_damage_CLC

### Section 3: Quiz

#### Example 1: The `DamageScanner` function has been used quite extensively in this assignment. Explain in detail the sequential flow of this function (this can be added as comments). What is the role of the `cellsize` parameter?

In [None]:
def DamageScanner(landuse_map,inun_map,curve_path,maxdam_path,cellsize=100):
        
    
    landuse = landuse_map.copy()
    
   
    inundation = inun_map.copy()
    
    inundation = np.nan_to_num(inundation)        

    
    if isinstance(curve_path, pd.DataFrame):
        curves = curve_path.values   
    elif isinstance(curve_path, np.ndarray):
        curves = curve_path

   
    if isinstance(maxdam_path, pd.DataFrame):
        maxdam = maxdam_path.values 
    elif isinstance(maxdam_path, np.ndarray):
        maxdam = maxdam_path
        
    
    inun = inundation * (inundation>=0) + 0
    inun[inun>=curves[:,0].max()] = curves[:,0].max()
    waterdepth = inun[inun>0]
    landuse = landuse[inun>0]

    
    numberofclasses = len(maxdam)
    alldamage = np.zeros(landuse.shape[0])
    damagebin = np.zeros((numberofclasses, 4,))
    for i in range(0,numberofclasses):
        n = maxdam[i,0]
        damagebin[i,0] = n
        wd = waterdepth[landuse==n]
        alpha = np.interp(wd,((curves[:,0])),curves[:,i+1])
        damage = alpha*(maxdam[i,1]*cellsize)
        damagebin[i,1] = sum(damage)
        damagebin[i,2] = len(wd)
        if len(wd) == 0:
            damagebin[i,3] = 0
        else:
            damagebin[i,3] = np.mean(wd)
        alldamage[landuse==n] = damage

    
    loss_df = pd.DataFrame(damagebin.astype(float),columns=['landuse','losses','area','avg_depth']).groupby('landuse').sum()
    
    
    return loss_df.sum().values[0],loss_df
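As a starting point for tracing the function, the central calculation inside the loop can be reproduced for a single land-use class on a handful of cells. The curve, the maximum damage and the intensities below are made-up numbers. Only the sequence of steps (cap the intensity, interpolate the damage fraction, scale by maximum damage and `cellsize`) mirrors the function above.

```python
import numpy as np

# Hypothetical damage curve: hazard intensity -> damage fraction for one land-use class.
curve_intensity = np.array([0.0, 1.0, 2.0, 4.0])
curve_fraction = np.array([0.0, 0.2, 0.5, 1.0])

max_damage = 50.0    # hypothetical maximum damage per unit of `cellsize`
cellsize = 100       # same default as in DamageScanner

# Hazard intensity in the cells of this land-use class (hypothetical values).
intensity = np.array([0.5, 2.0, 6.0])

# Cap values at the end of the curve, as DamageScanner does.
intensity = np.minimum(intensity, curve_intensity.max())

# Interpolate the damage fraction and scale it to a damage value per cell.
fraction = np.interp(intensity, curve_intensity, curve_fraction)
damage = fraction * max_damage * cellsize

print(fraction)
print(damage.sum())
```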