This script shows how data from the different model simulation runs and countries is aggregated to country and region level. The example covers only Latin America and a small selection of 6 GHM and climate forcing combinations. The real data contains all countries and 46 model combinations.
To see how damages are generated with CLIMADA to create the input file, please see the CLIMADA RiverFlood tutorial here:

https://github.com/CLIMADA-project/climada_python/blob/main/doc/tutorial/climada_hazard_RiverFlood.ipynb

The pre-processing that is additionally necessary to convert asset damages and observed damages is described in the readme.txt under chapter 4 Instructions for use.

In [3]:
import numpy as np
import pandas as pd
import os
# load data (adjust path)
ass_data=pd.read_csv('~/projects/NC_Submission/Climada_papers/climada_papers/202010_flood_attribution/Demo/Demo_Data/DEMO_AssembledDataSubregions.csv')
ass_data[ass_data['D_CliExp_Pos']>0]

Unnamed: 0.1,Unnamed: 0,Year,Country,Region,GHM,clim_forc,D_CliExp_Pos,D_CliExp_Neg,D_1980_Pos,D_1980_Neg,D_2010_Pos,D_2010_Neg,D_obs_dummy_Pos,D_obs_dummy_Neg
18,369,1971,ARG,LAM,orchidee,wfdei,1.096388e+06,6.456917e+07,1.669235e+06,1.029778e+08,9.511688e+06,5.980761e+08,,
21,398,1971,ARG,LAM,lpjml,wfdei,8.936264e+05,1.306591e+07,1.359166e+06,2.009086e+07,7.811199e+06,1.204149e+08,,
23,409,1971,ARG,LAM,matsiro,wfdei,3.021842e+05,8.957474e+07,4.560491e+05,1.406277e+08,2.550091e+06,7.986013e+08,,
31,1019,1971,BHS,LAM,lpjml,princeton,3.452119e-02,0.000000e+00,6.223493e-02,0.000000e+00,1.729590e-01,0.000000e+00,,
33,1042,1971,BHS,LAM,lpjml,wfdei,1.872020e-03,0.000000e+00,3.374885e-03,0.000000e+00,9.379248e-03,0.000000e+00,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11263,396569,2010,VEN,LAM,lpjml,princeton,2.455575e+09,1.848530e+09,1.133333e+09,1.013319e+09,2.455575e+09,1.848530e+09,1.227788e+08,9.242652e+07
11264,396582,2010,VEN,LAM,orchidee,princeton,2.991682e+09,1.354734e+09,1.436558e+09,7.402328e+08,2.991682e+09,1.354734e+09,1.495841e+08,6.773672e+07
11265,396586,2010,VEN,LAM,lpjml,wfdei,1.990408e+09,8.089158e+07,8.697089e+08,4.454031e+07,1.990408e+09,8.089158e+07,9.952041e+07,4.044579e+06
11266,396591,2010,VEN,LAM,matsiro,princeton,1.003280e+09,1.635081e+09,4.573970e+08,8.996332e+08,1.003280e+09,1.635081e+09,5.016400e+07,8.175404e+07


The data frame contains various time series for different GHMs and climate forcings (here reduced to 6 combinations from the original 46) for each country in Latin America:

D_CliExp... - Time series accounting for changes in climate and exposure

D_1980... - Time series only accounting for changes in climate (fixed 1980 socio-economic conditions)

D_2010... - Time series only accounting for changes in climate (fixed 2010 socio-economic conditions)

D_obs... - Dummy time series used to simulate observed damages that cannot be provided

Please note that the modeled time series start in 1971, whereas observations start only in 1980; all series end in 2010.
In the next step, the data is aggregated to the regional level:


In [25]:
def aggregation_regions(x):
    """
    This function aggregates country-level damages and variables to
    regional level.
    Parameters
    ----------
    x : DataFrame
        country-level damages and other indicators for all model combinations

    Returns
    -------
    DataFrame
        regionally aggregated damages and other indicators
    """
    aggregated_model_damages_pos = x['D_CliExp_Pos'].sum()
    aggregated_model_damages_neg = x['D_CliExp_Neg'].sum()
    aggregated_model_damages_1980_pos = x['D_1980_Pos'].sum()
    aggregated_model_damages_1980_neg = x['D_1980_Neg'].sum()
    aggregated_model_damages_2010_pos = x['D_2010_Pos'].sum()
    aggregated_model_damages_2010_neg = x['D_2010_Neg'].sum()
    aggregated_observed_damages_pos = (x['D_obs_dummy_Pos']).sum()
    aggregated_observed_damages_neg = (x['D_obs_dummy_Neg']).sum()
    return pd.Series([aggregated_model_damages_pos,
                      aggregated_model_damages_neg,
                      aggregated_model_damages_1980_pos,
                      aggregated_model_damages_1980_neg,
                      aggregated_model_damages_2010_pos,
                      aggregated_model_damages_2010_neg,
                      aggregated_observed_damages_pos,
                      aggregated_observed_damages_neg],
                     index=['D_CliExp_Pos', 'D_CliExp_Neg',
                            'D_1980_Pos', 'D_1980_Neg',
                            'D_2010_Pos', 'D_2010_Neg',
                            'D_obs_dummy_Pos',
                            'D_obs_dummy_Neg'])

In [26]:
def region_aggregation(cols, dataFrame):
    """
    This function is a wrapper for the aggregation and selects the columns to
    be aggregated to regional level.

    Parameters
    ----------
    cols : string list
        Columns to be aggregated
    dataFrame : DataFrame
        country-level damages and other indicators for all model combinations

    Returns
    -------
    DataFrame
         regionally aggregated damages and other indicators
    """
    # group by year, model combination (GHM and climate forcing) and region
    data_region = dataFrame.groupby(['Year', 'GHM', 'clim_forc', 'Region'])\
                                     [cols].apply(aggregation_regions)\
                                     .reset_index()

    return data_region
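
Because aggregation_regions only sums each selected column, the same regional totals can also be obtained with pandas' built-in grouped sum. The following is a minimal sketch on a made-up toy frame; the name `toy` and all values are illustrative assumptions, not the demo data:

```python
import pandas as pd

# Toy country-level data (made-up values, for illustration only)
toy = pd.DataFrame({
    'Year': [1971, 1971, 1971, 1971],
    'Country': ['ARG', 'BHS', 'ARG', 'BHS'],
    'Region': ['LAM', 'LAM', 'LAM', 'LAM'],
    'GHM': ['lpjml', 'lpjml', 'matsiro', 'matsiro'],
    'clim_forc': ['wfdei', 'wfdei', 'wfdei', 'wfdei'],
    'D_CliExp_Pos': [1.0, 2.0, 3.0, 4.0],
    'D_CliExp_Neg': [10.0, 20.0, 30.0, 40.0],
})

group_keys = ['Year', 'GHM', 'clim_forc', 'Region']
value_cols = ['D_CliExp_Pos', 'D_CliExp_Neg']

# Sum the country rows within each year / model combination / region
toy_region = toy.groupby(group_keys)[value_cols].sum().reset_index()
print(toy_region)
```

Each model combination keeps its own regional time series; only the country dimension is summed out.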

In [27]:
# define columns to aggregate
agg_cols = ['D_CliExp_Pos', 'D_CliExp_Neg',
            'D_1980_Pos', 'D_1980_Neg',
            'D_2010_Pos', 'D_2010_Neg',
            'D_obs_dummy_Pos',
            'D_obs_dummy_Neg']
# execute aggregation
reg_data = region_aggregation(agg_cols, ass_data)
reg_data

Unnamed: 0,Year,GHM,clim_forc,Region,D_CliExp_Pos,D_CliExp_Neg,D_1980_Pos,D_1980_Neg,D_2010_Pos,D_2010_Neg,D_obs_dummy_Pos,D_obs_dummy_Neg
0,1971,lpjml,princeton,LAM,2.945511e+08,3.689331e+08,4.544100e+08,7.535720e+08,9.788589e+08,2.261250e+09,0.000000e+00,0.000000e+00
1,1971,lpjml,wfdei,LAM,5.057604e+08,1.615544e+09,8.113460e+08,3.189673e+09,1.590751e+09,8.641227e+09,0.000000e+00,0.000000e+00
2,1971,matsiro,princeton,LAM,4.009538e+08,3.686333e+08,6.263636e+08,7.416500e+08,1.339336e+09,1.958968e+09,0.000000e+00,0.000000e+00
3,1971,matsiro,wfdei,LAM,4.037848e+08,6.881197e+08,6.456926e+08,1.341611e+09,1.163209e+09,5.457415e+09,0.000000e+00,0.000000e+00
4,1971,orchidee,princeton,LAM,2.973702e+08,3.501032e+08,4.571187e+08,7.174854e+08,9.864467e+08,2.046198e+09,0.000000e+00,0.000000e+00
...,...,...,...,...,...,...,...,...,...,...,...,...
235,2010,lpjml,wfdei,LAM,2.271367e+10,1.085733e+10,9.249558e+09,4.208104e+09,2.271367e+10,1.085733e+10,1.135683e+09,5.428664e+08
236,2010,matsiro,princeton,LAM,2.194935e+10,3.691890e+10,8.399946e+09,9.748158e+09,2.194935e+10,3.691890e+10,1.097468e+09,1.845945e+09
237,2010,matsiro,wfdei,LAM,2.528861e+10,1.713264e+10,9.990795e+09,6.840531e+09,2.528861e+10,1.713264e+10,1.264430e+09,8.566319e+08
238,2010,orchidee,princeton,LAM,1.746115e+10,3.591611e+10,7.277341e+09,9.231161e+09,1.746115e+10,3.591611e+10,8.730574e+08,1.795805e+09


The data is now aggregated over all countries, but there still remain time series for each model.
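
Note that the observed-damage columns are 0.0 for 1971 in the regional table above, although the country-level input has no observations (NaN) before 1980. This follows from the default behaviour of pandas' `sum`, which skips NaN and returns 0.0 for an all-NaN group. A minimal sketch with made-up values; passing `min_count=1` is shown only as one possible option for keeping missing observations as NaN, not as what the original analysis does:

```python
import numpy as np
import pandas as pd

# Observed damages of three countries in a year without observations
obs = pd.Series([np.nan, np.nan, np.nan])

# Default: NaN values are skipped, so the sum of "nothing" is 0.0
default_sum = obs.sum()

# With min_count=1 at least one valid value is required, otherwise NaN
strict_sum = obs.sum(min_count=1)

print(default_sum, strict_sum)
```

When working with the aggregated series, zeros before 1980 in the observation columns should therefore be read as missing data rather than as years without damage.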
In the next step, we build the model median and quantiles:

In [28]:
def func_median(x):
    """
    This function aggregates the damages and other indicators from the
    different model runs to the model median and adds basic statistics: the
    0.3 and 0.7 quantiles, stored in the 'onethird'/'twothird' columns.

    Parameters
    ----------
    x : DataFrame
        regionally aggregated damages and other indicators for all model
        combinations

    Returns
    -------
    DataFrame
         model medians of regionally aggregated damages and other indicators
    """
    # identify the median of the model data:
    median_model_damages_pos = x['D_CliExp_Pos'].median()  # =quantile(0.5)
    median_model_damages_neg = x['D_CliExp_Neg'].median()
    median_model_damages_1980_pos = x['D_1980_Pos'].median()  # =quantile(0.5)
    median_model_damages_1980_neg = x['D_1980_Neg'].median()
    median_model_damages_2010_pos = x['D_2010_Pos'].median()  # =quantile(0.5)
    median_model_damages_2010_neg = x['D_2010_Neg'].median()
    # observed damages are identical across model runs, so the mean simply
    # returns that common value
    median_observed_damages_pos = (x['D_obs_dummy_Pos']).mean()
    median_observed_damages_neg = (x['D_obs_dummy_Neg']).mean()
    one_third_quantile_model_damages_pos = x['D_CliExp_Pos'].quantile(0.3)  # 30
    two_third_quantile_model_damages_pos = x['D_CliExp_Pos'].quantile(0.7)
    one_third_quantile_model_damages_neg = x['D_CliExp_Neg'].quantile(0.3)  # 30
    two_third_quantile_model_damages_neg = x['D_CliExp_Neg'].quantile(0.7)
    one_third_quantile_model_damages_1980_pos = x['D_1980_Pos'].quantile(0.3)  # 30
    two_third_quantile_model_damages_1980_pos = x['D_1980_Pos'].quantile(0.7)  # 70
    one_third_quantile_model_damages_1980_neg = x['D_1980_Neg'].quantile(0.3)  # 30
    two_third_quantile_model_damages_1980_neg = x['D_1980_Neg'].quantile(0.7)  # 70

    one_third_quantile_model_damages_2010_pos = x['D_2010_Pos'].quantile(0.3)  # 30
    two_third_quantile_model_damages_2010_pos = x['D_2010_Pos'].quantile(0.7)  # 70
    one_third_quantile_model_damages_2010_neg = x['D_2010_Neg'].quantile(0.3)  # 30
    two_third_quantile_model_damages_2010_neg = x['D_2010_Neg'].quantile(0.7)  # 70

    # flood_vol = x['FloodVol_Flopros'].median()
    return pd.Series([median_model_damages_pos,
                      median_model_damages_neg,
                      median_model_damages_1980_pos,
                      median_model_damages_1980_neg,
                      median_model_damages_2010_pos,
                      median_model_damages_2010_neg,
                      median_observed_damages_pos,
                      median_observed_damages_neg,
                      one_third_quantile_model_damages_pos,
                      two_third_quantile_model_damages_pos,
                      one_third_quantile_model_damages_neg,
                      two_third_quantile_model_damages_neg,
                      one_third_quantile_model_damages_1980_pos,
                      two_third_quantile_model_damages_1980_pos,
                      one_third_quantile_model_damages_1980_neg,
                      two_third_quantile_model_damages_1980_neg,
                      one_third_quantile_model_damages_2010_pos,
                      two_third_quantile_model_damages_2010_pos,
                      one_third_quantile_model_damages_2010_neg,
                      two_third_quantile_model_damages_2010_neg],
                     index=['D_CliExp_Pos',
                            'D_CliExp_Neg',
                            'D_1980_Pos',
                            'D_1980_Neg',
                            'D_2010_Pos',
                            'D_2010_Neg',
                            'D_obs_dummy_Pos',
                            'D_obs_dummy_Neg',
                            'D_CliExp_Pos_onethird_quantile',
                            'D_CliExp_Pos_twothird_quantile',
                            'D_CliExp_Neg_onethird_quantile',
                            'D_CliExp_Neg_twothird_quantile',
                            'D_1980_Pos_onethird_quantile',
                            'D_1980_Pos_twothird_quantile',
                            'D_1980_Neg_onethird_quantile',
                            'D_1980_Neg_twothird_quantile',
                            'D_2010_Pos_onethird_quantile',
                            'D_2010_Pos_twothird_quantile',
                            'D_2010_Neg_onethird_quantile',
                            'D_2010_Neg_twothird_quantile'])
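
The columns named 'onethird' and 'twothird' in func_median are computed with `quantile(0.3)` and `quantile(0.7)`, which differ slightly from exact one-third and two-third quantiles. A minimal sketch with six made-up values (one per model combination in this demo), using pandas' default linear interpolation:

```python
import pandas as pd

# Six made-up regional damages, one per model combination
vals = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

med = vals.median()            # same as vals.quantile(0.5)
q30 = vals.quantile(0.3)       # what func_median stores as 'onethird'
q70 = vals.quantile(0.7)       # what func_median stores as 'twothird'
q_third = vals.quantile(1 / 3) # an exact one-third quantile, for comparison

print(med, q30, q70, q_third)
```

With only six model combinations the quantiles are interpolated between neighbouring model values, so the band between them is narrow by construction.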

In [29]:
def model_aggregation(cols, dataFrame, years, select_model):
    """
    This function is a wrapper for the multi-model aggregation and provides
    the model median for each region of all variables.

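    Parameters
    ----------
    cols : string list
        Columns to be aggregated
    dataFrame : DataFrame
        regionally aggregated damages for all model combinations
    years : array-like of int
        Years to include; only the minimum and maximum are used as bounds
    select_model : str or None
        If given, only this GHM is used; otherwise all models are included

    Returns
    -------
    DataFrame
         regionally aggregated model medians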
    """

    if select_model:

        dataFrame = dataFrame[dataFrame['GHM'] == select_model]

    data_models = dataFrame[(dataFrame['Year'] <= np.max(years)) &
                            (dataFrame['Year'] >= np.min(years))]
    # Get the median for model and datasets
    data_models = data_models.groupby(['Year', 'Region'])\
                              [cols].apply(func_median).reset_index()

    return data_models

In [30]:
years = np.arange(1971,2011)
med_mod_data = model_aggregation(agg_cols, reg_data, years, None)
med_mod_data

Unnamed: 0,Year,Region,D_CliExp_Pos,D_CliExp_Neg,D_1980_Pos,D_1980_Neg,D_2010_Pos,D_2010_Neg,D_obs_dummy_Pos,D_obs_dummy_Neg,...,D_CliExp_Neg_onethird_quantile,D_CliExp_Neg_twothird_quantile,D_1980_Pos_onethird_quantile,D_1980_Pos_twothird_quantile,D_1980_Neg_onethird_quantile,D_1980_Neg_twothird_quantile,D_2010_Pos_onethird_quantile,D_2010_Pos_twothird_quantile,D_2010_Neg_onethird_quantile,D_2010_Neg_twothird_quantile
0,1971,LAM,402369300.0,528526400.0,636028100.0,1047591000.0,1251272000.0,3859332000.0,0.0,0.0,...,368783200.0,1151832000.0,541741200.0,728519300.0,747611000.0,2265642000.0,1074828000.0,1465043000.0,2153724000.0,7049321000.0
1,1972,LAM,646714200.0,723212300.0,1028985000.0,1028685000.0,2866084000.0,3587172000.0,0.0,0.0,...,359502800.0,1084149000.0,837875500.0,1311003000.0,631032300.0,1623574000.0,2099250000.0,3692527000.0,2577915000.0,5066493000.0
2,1973,LAM,815102700.0,1165546000.0,1210916000.0,1800896000.0,3123314000.0,6070501000.0,0.0,0.0,...,690952400.0,1911286000.0,1111558000.0,1556728000.0,1080324000.0,3017047000.0,2812444000.0,3762174000.0,3798410000.0,9475175000.0
3,1974,LAM,246929700.0,5895776000.0,341615500.0,8737634000.0,955680400.0,30035730000.0,0.0,0.0,...,4884319000.0,6622060000.0,312734600.0,611614500.0,7246017000.0,9776114000.0,764248200.0,1325569000.0,24361260000.0,34964780000.0
4,1975,LAM,735667900.0,1252483000.0,962759600.0,1854030000.0,2283276000.0,5245553000.0,0.0,0.0,...,663521800.0,1829390000.0,693979800.0,1423361000.0,929943000.0,2732020000.0,1460793000.0,3653704000.0,3402896000.0,7534373000.0
5,1976,LAM,569054800.0,1089701000.0,727354500.0,1525657000.0,1588228000.0,4114654000.0,0.0,0.0,...,869301100.0,1837960000.0,470603200.0,1768818000.0,1217419000.0,2594562000.0,1044985000.0,3673838000.0,3100373000.0,6283568000.0
6,1977,LAM,588755800.0,718442700.0,676511800.0,872116200.0,1528850000.0,3321514000.0,0.0,0.0,...,638541600.0,792763600.0,621423800.0,736743700.0,774653400.0,997208200.0,1479644000.0,1741990000.0,2832951000.0,3667434000.0
7,1978,LAM,567404300.0,1005163000.0,672779700.0,1187181000.0,1539537000.0,3648889000.0,0.0,0.0,...,889594600.0,1542610000.0,456776100.0,1356205000.0,1071408000.0,1830328000.0,957701400.0,3106723000.0,2920028000.0,6619550000.0
8,1979,LAM,942882600.0,6272667000.0,1009824000.0,6594910000.0,2677416000.0,21953030000.0,0.0,0.0,...,6120509000.0,6469786000.0,794576700.0,1244181000.0,6397543000.0,6764037000.0,1861484000.0,3518487000.0,21326570000.0,22796330000.0
9,1980,LAM,690965700.0,6501249000.0,690965700.0,6501249000.0,1854897000.0,23476680000.0,1078941000.0,9669952000.0,...,5976350000.0,6669167000.0,687533800.0,723707100.0,5976350000.0,6669167000.0,1817068000.0,1928032000.0,21312790000.0,24830200000.0


Now the time series are ready to be used as input for the vulnerability assessment in the next notebook 'DemoVulnerabilityAssessments.ipynb'. The data frame also contains the quantile time series. We can save it as a .csv file:

In [33]:
med_mod_data.to_csv('~/projects/NC_Submission/Climada_papers/Test/DEMO_ModelMedianSubregions.csv')
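
Columns such as `Unnamed: 0` in the input loaded at the top typically arise when a data frame is written with its index and read back without declaring it. A minimal sketch, assuming a temporary file and a made-up frame (the names `demo`, `restored` and `with_index` are illustrative), of how `index=False` avoids the extra column:

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the aggregated data frame
demo = pd.DataFrame({'Year': [1971, 1972],
                     'Region': ['LAM', 'LAM'],
                     'D_CliExp_Pos': [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_median.csv')

    # Written without the index: columns round-trip unchanged
    demo.to_csv(path, index=False)
    restored = pd.read_csv(path)

    # Written with the default index: an extra unnamed column appears
    demo.to_csv(path)
    with_index = pd.read_csv(path)

print(list(restored.columns))
print(list(with_index.columns))
```

Alternatively, a file that was already written with its index can be read with `pd.read_csv(path, index_col=0)`.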