<img src="images/oasis-lmf-colour.png" alt="Oasis LMF logo" width="250" align="left"/>
<br><br><br>

# Exercise 2: Running a model in the Oasis MDK

The Oasis kernel provides a robust loss simulation engine for catastrophe modelling. Insurance practitioners are used to dealing with losses arising from events. These losses are numbers, not distributions. Policy terms are applied to the losses individually, and the results are then aggregated before further conditions or reinsurances are applied. Oasis takes the same perspective: it generates individual losses by sampling from the probability distributions. This is done with “Monte-Carlo” sampling, which uses random numbers, as if drawn from a roulette wheel, to solve equations that are otherwise intractable.

Modelled and empirical intensities and damage responses can show significant uncertainty. Sometimes this uncertainty is multi-modal, meaning that there can be several peaks of behaviour rather than a single central behaviour. Moreover, the definition of the source insured interest characteristics, such as location, occupancy or construction, can be imprecise. The associated values for event intensities and consequential damages can therefore vary, and their uncertainty is best represented as probability distributions rather than point values. The design of Oasis therefore makes no assumptions about the shape of these distributions and instead treats all of them as probability masses in discrete bins. This includes closed interval point bins such as the values [0,0] for no damage and [1,1] for total damage.
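
As a minimal sketch of what "probability masses in discrete bins" means, the snippet below stores a damage distribution as bin edges plus one probability per bin, including the two closed point bins. The bin edges, the probabilities and the use of the bin midpoint as the representative value are all illustrative assumptions, not values taken from the model files.

```python
import numpy as np

# Hypothetical damage bins: a closed point bin [0, 0] for no damage,
# three interval bins, and a closed point bin [1, 1] for total damage.
bin_from = np.array([0.0, 0.0, 0.25, 0.5, 1.0])
bin_to = np.array([0.0, 0.25, 0.5, 1.0, 1.0])

# Illustrative probability mass in each bin. No functional form is assumed,
# so the masses can describe multi-modal behaviour (here: peaks at "no damage"
# and in the first interval bin, with a small mass at total damage).
prob = np.array([0.30, 0.35, 0.20, 0.10, 0.05])
assert np.isclose(prob.sum(), 1.0)

# Assumption: use the bin midpoint as the representative damage of each bin.
midpoint = (bin_from + bin_to) / 2.0

# Mean damage ratio implied by the binned distribution.
mean_damage = float(np.sum(prob * midpoint))
print(mean_damage)
```

For the point bins the midpoint is exactly 0 or 1, so they contribute a true probability mass at no damage and at total damage.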

The simulation approach taken by the Oasis calculation kernel computes a single cumulative distribution function (CDF) for the damage by “convolving” the binned intensity distribution with the vulnerability matrices. Sampling can then be done against the CDF. 
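
The convolution and sampling steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the kernel's implementation: the intensity probabilities, the vulnerability matrix, the damage bins and the linear interpolation within a bin are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical hazard footprint for one area peril cell and one event:
# probability of each intensity bin.
intensity_prob = np.array([0.2, 0.5, 0.3])

# Hypothetical vulnerability matrix: row i gives the probability of each
# damage bin given intensity bin i (each row sums to 1).
vulnerability = np.array([
    [0.8, 0.2, 0.0, 0.0],
    [0.3, 0.4, 0.2, 0.1],
    [0.0, 0.2, 0.4, 0.4],
])

# Hypothetical damage bins, with point bins for no damage and total damage.
damage_bin_from = np.array([0.0, 0.0, 0.5, 1.0])
damage_bin_to = np.array([0.0, 0.5, 1.0, 1.0])

# "Convolution": weight each vulnerability row by its intensity probability
# and sum, giving one probability per damage bin, then accumulate to a CDF.
damage_prob = intensity_prob @ vulnerability
damage_cdf = np.cumsum(damage_prob)


def sample_damage(u):
    """Map uniform random numbers in [0, 1) to damage ratios via the CDF."""
    u = np.asarray(u, dtype=float)
    # First damage bin whose cumulative probability exceeds u.
    idx = np.searchsorted(damage_cdf, u, side="right")
    idx = np.minimum(idx, len(damage_cdf) - 1)  # guard against rounding
    cdf_lower = np.concatenate(([0.0], damage_cdf[:-1]))[idx]
    # Assumption: interpolate linearly within the selected bin.
    frac = (u - cdf_lower) / damage_prob[idx]
    return damage_bin_from[idx] + frac * (damage_bin_to[idx] - damage_bin_from[idx])


rng = np.random.default_rng(42)
samples = sample_damage(rng.random(10000))
print(damage_cdf, samples.mean())
```

Because only one CDF is built per cell and event, any number of samples can be drawn from it without revisiting the intensity and vulnerability data.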

<img src="images/simulation_methodology.png" alt="Oasis simulation methodology" width="600"/>

The Oasis kernel requires a standard set of files for capturing the hazard footprints and vulnerability data.

<img src="images/oasis_model_files.png" alt="Oasis model files" width="600"/>

The files are:

#### area peril dictionary
    The meta-data that describes the model-specific geo-spatial grid. This can be a set of points, a regular grid or a variable resolution grid.
#### intensity bin dictionary
    The meta-data that describes the hazard intensities corresponding to the bins.
#### hazard
    The hazard values for each impacted area peril cell for each event in the stochastic catalogue.
#### damage bin dictionary
    The meta-data that describes the damage percentages corresponding to the bins.
#### vulnerability dictionary
    The meta-data that describes the vulnerability data, in particular the mapping of particular curves to particular exposure attributes.
#### vulnerability
    The vulnerability data. 
#### event
    The list of events in the stochastic catalogue. Event files can be used to distinguish event types, such as historical.
#### occurrences
    The list of event occurrences in particular periods, used for loss curve calculations.


In [None]:
%config IPCompleter.greedy=True

In [None]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import geopandas as gpd
import math
import numpy as np
import json
import seaborn as sns
import folium
import os
from shapely.geometry import Point, Polygon
from numpy import linspace
from bokeh import events
from bokeh.io import output_file, show, output_notebook
from bokeh.models import ColumnDataSource, FixedTicker, PrintfTickFormatter, CustomJS, Div, TapTool
from bokeh.plotting import figure
import branca.colormap as cm
import datetime
import psutil
import time
# Output Bokeh charts to notebook, rather than opening a browser window
output_notebook()

map_centre = [18.64, -70.09]
map_zoom = 8

import jupyter_helper
jupyter_helper.set_style()

## View the model files

### Area peril dictionary

In [None]:
area_peril_dictionary = pd.read_csv("./gem/keys_data/GMO/areaperil_dict_pga_only.csv")
area_peril_dictionary.columns = [x.lower() for x in area_peril_dictionary.columns]
area_peril_dictionary.head()

In [None]:
# Note that some browsers will not display the map as there are too many points to plot

m = folium.Map(tiles='cartodbpositron', location=map_centre, zoom_start=map_zoom)
for i, row in area_peril_dictionary.iterrows():
    folium.CircleMarker(
        location=[row.lat1, row.lon1], radius=1).add_to(m)
m

### Intensity bin dictionary

In [None]:
intensity_bin_dictionary = pd.read_csv("./gem/model_data/GMO/intensity_bin_dict.csv")
intensity_bin_dictionary.columns = [x.lower() for x in intensity_bin_dictionary.columns]
intensity_bin_dictionary.head()

### Hazard

In [None]:
event_id = 2401
area_peril_id = 15939

In [None]:
# To reduce memory usage, read the footprint file in chunks and keep only the rows for event_id
iter_csv = pd.read_csv("./gem/model_data/GMO/footprint.csv", iterator=True, chunksize=1000000) 
footprints = pd.concat([chunk[chunk['event_id'] == event_id] for chunk in iter_csv])
footprints.head()

In [None]:
footprint_with_hazard = footprints.merge(
    area_peril_dictionary, how='inner', 
    left_on='areaperil_id', right_on='area_peril_id').merge(
    intensity_bin_dictionary, how="inner",
    left_on="intensity_bin_index", right_on="bin_index")
footprint_with_hazard = footprint_with_hazard[['areaperil_id', 'lat1', 'lon1', 'prob', 'intensity_bin_index','interpolation']] 
footprint_with_hazard = footprint_with_hazard.sort_values(['areaperil_id'])
footprint_with_hazard = footprint_with_hazard.rename(index=str, columns={"interpolation": "hazard"})
footprint_with_hazard.head()

In [None]:
footprint_with_hazard_for_cell = footprint_with_hazard[footprint_with_hazard.areaperil_id == area_peril_id] 
footprint_with_hazard_for_cell = intensity_bin_dictionary.merge(
    footprint_with_hazard_for_cell, how="inner",
    left_on="bin_index", right_on="intensity_bin_index", suffixes=('', '_dict'))

# fillna returns a new DataFrame, so assign the result back
footprint_with_hazard_for_cell = footprint_with_hazard_for_cell.fillna(0)
footprint_with_hazard_for_cell = footprint_with_hazard_for_cell.sort_values("intensity_bin_index")
footprint_with_hazard_for_cell = footprint_with_hazard_for_cell[['prob', 'intensity_bin_index','interpolation']]
footprint_with_hazard_for_cell = footprint_with_hazard_for_cell.rename(index=str, columns={"interpolation": "hazard"})
footprint_with_hazard_for_cell.head()

In [None]:
# Note that some browsers will not display the map as there are too many points to plot

footprint_with_hazard['weighted_hazard'] = footprint_with_hazard['hazard'] * footprint_with_hazard['prob']
footprint_with_mean_hazard = pd.DataFrame({'mean_hazard' : footprint_with_hazard.groupby( ['areaperil_id', 'lat1', 'lon1'] )['weighted_hazard'].sum()}).reset_index()
linear = cm.LinearColormap(
    ['green', 'yellow', 'red'],
    vmin=min(footprint_with_mean_hazard.mean_hazard), 
    vmax=max(footprint_with_mean_hazard.mean_hazard))
m = folium.Map(location=map_centre, zoom_start=map_zoom, tiles='cartodbpositron')
for i, row in footprint_with_mean_hazard.iterrows():
    c = linear(row.mean_hazard)
    folium.CircleMarker(
        location=[row.lat1, row.lon1], fill_color=c, radius=5,
        weight=0, fill=True, fill_opacity=1.0).add_to(m)
linear.caption = 'Hazard'
m.add_child(linear)
m.fit_bounds(m.get_bounds())
m

In [None]:
intensity_range = (0, footprint_with_hazard_for_cell.hazard.max())
p = figure(x_range=intensity_range, plot_height=300, y_range=(0, footprint_with_hazard_for_cell.prob.max()), toolbar_location=None)
p.vbar(x=footprint_with_hazard_for_cell.hazard, top=footprint_with_hazard_for_cell.prob, width=0.9)
p.xaxis.axis_label = 'Hazard'
p.yaxis.axis_label = 'Probability'
show(p)

### Damage bin dictionary

In [None]:
damage_bin_dictionary = pd.read_csv("./gem/model_data/GMO/damage_bin_dict.csv")
damage_bin_dictionary.head()

### Vulnerability dictionary

In [None]:
vulnerability_dict = pd.read_csv("./gem/keys_data/GMO/vulnerability_dict_pga_only.csv")
vulnerability_dict.head()

### Vulnerability

In [None]:
vulnerabilities = pd.read_csv("./gem/model_data/GMO/vulnerability.csv")
vulnerabilities

### Occurrence file

In [None]:
occurrences = pd.read_csv("gem/model_data/GMO/occurrence.csv")
occurrences.head()

## Model execution

To run the model we need some test exposure data. Let's have a look at an example location file.

In [None]:
test_locations = pd.read_csv('./gem/tests/data/dom-rep-146-oed-location.csv')
test_locations.head()

To run the model, we also need to define some analysis settings. Let's have a look at an example settings file.

In [None]:
# json.load parses the file directly; there is no need to strip newlines first
with open('gem/tests/analysis_settings.json', 'r') as myfile:
    analysis_settings = json.load(myfile)
print(json.dumps(analysis_settings, indent=4))

## Generate model files

In [None]:
# Compile the spatial index files used in the keys lookup
! oasislmf model generate-peril-areas-rtree-file-index -c gem/keys_data/GMO/lookup.json -d gem/keys_data/GMO -f gem/keys_data/GMO/area-peril

In [None]:
# Convert all the model files into Oasis binary formats
! damagebintobin < gem/model_data/GMO/damage_bin_dict.csv > gem/model_data/GMO/damage_bin_dict.bin 
! evetobin < gem/model_data/GMO/events.csv > gem/model_data/GMO/events.bin
! vulnerabilitytobin -d 166 < gem/model_data/GMO/vulnerability.csv > gem/model_data/GMO/vulnerability.bin
! footprinttobin -i 313 < gem/model_data/GMO/footprint.csv
! occurrencetobin -P 5000 -D < gem/model_data/GMO/occurrence.csv > gem/model_data/GMO/occurrence.bin
! returnperiodtobin < gem/model_data/GMO/returnperiods.csv  > gem/model_data/GMO/returnperiods.bin
! cp footprint.bin gem/model_data/GMO/
! cp footprint.idx gem/model_data/GMO/
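
Shell redirection creates the output file even when the conversion tool fails, so a failed conversion can leave an empty `.bin` file behind. The helper below is a small sketch for checking the results before running the model; `missing_model_files` and `EXPECTED_FILES` are hypothetical names introduced for this example, and the file list simply mirrors the outputs of the commands above.

```python
import os

# Output files produced by the conversion commands above.
EXPECTED_FILES = [
    "damage_bin_dict.bin",
    "events.bin",
    "vulnerability.bin",
    "footprint.bin",
    "footprint.idx",
    "occurrence.bin",
    "returnperiods.bin",
]


def missing_model_files(model_data_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in model_data_dir."""
    missing = []
    for name in expected:
        path = os.path.join(model_data_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


# An empty list means every expected file exists and is non-empty.
print(missing_model_files("gem/model_data/GMO"))
```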

## Run the model using the Oasis MDK

In [None]:
! rm -rf /tmp/analysis_test_mdk
! oasislmf model run -C gem/oasislmf.json -r /tmp/analysis_test_mdk \
--source-exposure-file-path /tmp/exercise_1_oed/location.csv \
--source-accounts-file-path /tmp/exercise_1_oed/account.csv

In [None]:
from bokeh.io import show, output_file
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

analysis_directory = "/tmp/analysis_test_mdk"
gul_aep = pd.read_csv(os.path.join(analysis_directory, "output", "gul_S1_leccalc_full_uncertainty_aep.csv"))
gul_oep = pd.read_csv(os.path.join(analysis_directory, "output", "gul_S1_leccalc_full_uncertainty_oep.csv"))
eps = pd.merge(gul_oep, gul_aep, on=["summary_id", "return_period"], suffixes=["_oep", "_aep"])
eps = eps.sort_values(by="return_period", ascending=True)
return_periods = eps.return_period
lec_types = ['OEP', 'AEP']
data = {'Return periods' : return_periods,
       'OEP': eps.loss_oep,
       'AEP': eps.loss_aep}
palette = ["#c9d9d3", "#718dbf"]
x = [ (str(return_period), lec_type) for return_period in return_periods for lec_type in lec_types ]
counts = sum(zip(data['OEP'], data['AEP']), ())
source = ColumnDataSource(data=dict(x=x, counts=counts))
p = figure(x_range=FactorRange(*x), plot_height=350, title="EP by return period",
          toolbar_location=None, tools="")
p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
      fill_color=factor_cmap('x', palette=palette, factors=lec_types, start=1, end=2))
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)
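
To make the two curve types in the chart concrete, the sketch below builds OEP and AEP curves from a tiny, made-up set of event losses. It is a simplified illustration of the general definitions (OEP uses the largest single event loss in each period, AEP uses the total loss in each period) and does not reproduce how the Oasis `leccalc` component handles samples; the data, the column names and the `ep_curve` helper are assumptions made for the example.

```python
import pandas as pd

# Hypothetical event losses over a catalogue of 5 periods.
number_of_periods = 5
event_losses = pd.DataFrame({
    "period_no": [1, 1, 2, 4, 4, 4],
    "loss": [100.0, 50.0, 300.0, 20.0, 80.0, 40.0],
})

by_period = event_losses.groupby("period_no")["loss"]
period_losses = pd.DataFrame({
    "oep": by_period.max(),  # largest single occurrence in the period
    "aep": by_period.sum(),  # total of all occurrences in the period
}).reindex(index=range(1, number_of_periods + 1), fill_value=0.0)


def ep_curve(losses, n_periods):
    """Rank period losses and assign return period = n_periods / rank."""
    ranked = losses.sort_values(ascending=False).reset_index(drop=True)
    return pd.DataFrame({
        "return_period": n_periods / (ranked.index.to_numpy() + 1),
        "loss": ranked,
    })


oep_curve = ep_curve(period_losses["oep"], number_of_periods)
aep_curve = ep_curve(period_losses["aep"], number_of_periods)
print(oep_curve)
print(aep_curve)
```

At any return period the AEP loss is at least as large as the OEP loss, because a period's total can never be smaller than its largest single event.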