# Advanced droughts workflow

Click [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/CLIMAAX/DROUGHTS/main?labpath=DROUGHTS_notebook_1.ipynb) to launch this workflow on MyBinder. 

# About droughts and droughts' risks

## What is a drought?

Simply stated, drought is ‘the extreme persistence of precipitation deficit over a specific region for a specific period of time’ $^1$. Droughts are often classified into three main types different by their severity, impacts, and time scales:

1. <ins>Meteorological drought</ins> is often caused by short-term precipitation deficiency and its impacts highly depend on its timing. For example, lack of rain during the sprouting phase in rain-fed agriculture could lead to crop failure. 
2. <ins>Agricultural drought</ins> is a medium-term phenomenon, characterized by reduced soil moisture content and is caused by a prolonged period of meterological drought. 
3. On the long-term, <ins>hydrological drought</ins> is characterized by lower stream flow, reduced water level in water bodies, and may affect groundwater storage. 

The cascade between drought types is goverened by the severity (i.e., magnitude), duration, and spatio-temporal distribution of drought events.

## What is drought risk?

Drought risk is a measure for quantifying the likelyhood of a meaningfull impact from drought-event(s)
on human population, its economic activity and assets, and the environment. The risk for an impact depends on the <ins>drought hazard</ins>, <ins>exposure</ins>, and the <ins>vulnerability</ins> to droughts. <ins>Hazard</ins> measures the magnitude, duration, and timing of drougt events. <ins>Exposure</ins> to droughts represent the spatial distribution of drought relative to distribution of potentially impactful systems, e.g., location of cultivated land, wetlands, etc. Finally, <ins>vulnerability</ins> stands for the level of impact expected for a given system during a given event, and is affected by the systems' intrinsic attributes. For example, fields with drought-resistent crops varities would be less vulnerable to droughts.


## How do we assess drought risk?

There are many different metrics to assess drought risk, which account for at least one of the risk factors: hazard, exposure and vulnerability.

This workflow quantifies drought risk as the product of drought hazard, exposure and vulnerability. The methodology used here was developed and applied globally by Carrão et al. (2016) $^2$. The result of this workflow is a risk map showing the relative drought risk of different spatial units (i.e., subnational administrative regions) from a larger region (i.e., the European Union). Regional drought risk scores are on a scale of 0 to 1, with 0 representing the lowest risk and 1 the highest. The workflows takes each risk determinant (i.e. hazard, exposure and vulnerability) and normalised it taking into account its maximum and minimum values across all sub-national administrative regions. Thus, the results of this drought risk workflow are relative to the sample of geographic regions used for normalisation. The proposed risk scale is not a measure of absolute losses or actual damage, but a relative comparison of drought risk between the input regions. Therefore, the resulting data and mapping can help users to assess in which sub-administrative units within a jurisdictions the drought risk is or will be higher, allowing for better resouce allocation and better coordination within and between different levels of government.

Below is a description of the data and tools used to calculate drought hazard, exposure and vulnerability, both for the historic period and for future scenarios, and the outputs of this workflow. 

For the future scenarios, we decided to follow the SSP-RCPs combinations as in the IPCC 6th assessment report (https://www.ipcc.ch/assessment-report/ar6/).

More expert users can find a more detailed and technical explanation on how hazard, exposure and vulnerability are quantified in the colored text boxes. 


## Datasets (historic and future projections)

In this workflow the following data is used:


#### Spatial units: 

We used GeoJSON maps of NUTS2 and NUTS3 regions to define the selected spacial units, which can be downloaded at this link https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/


### Hazard data and methods:

Drought hazard (dH) for a given region is estimated as the probability of exceedance the median of regional (e.g., EU level) severe precipitation deficits for an historical reference period (e.g. 1979-2019) or for a future projection period (e.g. 2015-2100).

For estimating drought hazard, this workflows requires monthly total precipitation for each NUTS2 or NUTS3 region during the historical reference period or future projection period. Usually, these are observation-based or simulated time-series of gridded precipitation data. In the historic workflow, we used GSWP3 and W5E5 global meteorological forcing data processed for ISIMIP3a, sets on a 0.5°x0.5°C global grid and at daily time steps for the historical period of 1979-2019 (https://doi.org/10.48364/ISIMIP.982724.2). For the future projections, we used the ISIMIP3b bias-adjusted atmospheric climate input data, available for 5 CMIP6 global climate models (GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, UKESM1-0-LL), and three SSP-RCPs combinations (SSP126, SSP370, SSP585) (https://doi.org/10.48364/ISIMIP.842396.1). There is no minimum requirement for the length of the precipitation record in this workflow, but as individual drought events can last for months or even years, we recommend that at least several decades are included.

Processing these data is performed by applying Geographic Information System (GIS) techniques, to extract an aggregated value (e.g., total precipitation) of the data points located within each area of interest (e.g., NUTS2 region). Zonal statistics is widely used for that purpose, and it was the method used in our data processing.

Point, observation-based datasets are an alternative data source, usually collected by meteorological station networks. One can choose the data collected in one or more (e.g., average) representative station per area of interest to construct a NUTS2 level dataset. 

Our workflow expects a table where each row represents the total precipitation in mm for a month/year combination, and each column represents an area of interest (e.g. NUTS2 region). The first column contains the date in this format YYYY-MM-DD. The **title of the first columns has to be 'timing' and the rest of the titles have to be the codes of the areas of interest (e.g. NUTS2), which have to be identical to the codes as they appear in the NUTS2 or NUTS3 spatial data from the [European Commission](https://ec.europa.eu/eurostat/en/web/nuts/background)**.

A pre-processed table with precipitation data for European countries at NUTS3 level is already provided, which can be fed directly into the workflow (see sample_data folder).

Precipitation data are then analysed by calculating precipitation deficit events for each region and measuring their severity using the weighted anomaly of standardised precipitation (WASP) index. The result is a list of drought events and their severity for each selected region (e.g. NUTS3 regions) for the reference period, which is then compared with the median of severe precipitation deficits for the same period for all regions considered (e.g. EU level) to calculate the probability (dH) of each region being affected by a drought event (i.e. exceeding the EU median of severe precipitation deficits). For more details on the how the WASP index is calculated see the colored box below.

<div class="alert alert-block alert-warning">
<b>Quantifying drought hazard</b> 
Drought hazard (dH) for a given region is estimated as the probability of exceedance the median of regional (e.g., EU level) severe precipitation deficits for an specified time reference period (historic or future).

A severe precipitation deficity is calculated using the weighted anomaly of standardized precipitation (WASP) index. This index accounts for precipitation seasonal patterns and is computed by summing weighted and standardized monthly precipitation anomalies $^3$.

We use the weighted anomaly of standardized precipitation (WASP) index to define the severity of precipitation deficit. The WASP-index takes into account the annual seasonality of precipitation cycle and is computed by summing weighted standardized monthly precipitation anomalies (see Eq. 1). Where $P_{n,m}$ is each region's monthly precipitation, $T_m$ is a monthly treshold defining precipitation severity, and $T_A$ is an annual threshold for precipitation severity. The thresholds are defined by dividing multi-annual monthly observed rain using the 'Fisher-jenks' classigication algorithm $^4$. 

Eq. 1: $$WASP_j = \Sigma_{P_{n,m} < T_m}^{P_{n,m} >= T_m}( \frac{P_{n,m} - T_m}{T_m})*\frac{T_m}{T_A}$$
</div>

### Exposure data and methods:

Drought exposure (dE) indicates the potential losses from different types of drought hazards in different geographical regions. In general, exposure data identifies and quantifies the different types of physical entities on the ground, including built assets, infrastructure, agricultural land, people, livestock, etc. that can be affected by drought (e.g. the number of cars does not count).

Quantyfing drought exposure utilizes a non-compensatory model to account for the spatial distribution of a potential impact for crops and livestock, competition on water (e.g., for industrial uses represented by the water stress indicator), and human direct need (e.g., for drinking represneted by population size). More information can be found in the dropdown box below.

In this workflow we used the following data (provided in the exposure sample file):

#### Historic

| Data item | Description | Format and processing | Source |
| :-: | :- | :- | :-: |
| Cropland | Harvested land represents the exposure of agricultural activity to droughts. SPAM is a global crop distribution model covering 42 crops and four different technologies available for 2010 (latest). The model outputs include both harvested and physical cropland. | 5 arc-minutes crop-specific grid. All grids are to be summed and aggregated (using Zonal Statistics) per area of interest. | https://mapspam.info/ |
| Livestock density | Livestock density represents the exposure of animal husbandry systems to droughts. The Gridded Livestock of the World maps (GLW) show the density of eight different livestock animals in 2010 and 2015. | 5 arc-minutes animal-specific grid. All grids are to be summed and aggregated (using Zonal Statistics) per area of interest. | https://www.fao.org/livestock-systems/global-distributions/en/ |
| Competition on water | The water stress indicator is a proxy for competition on water, as it accounts for both multi-sectoral water demand, relative to the abundance of water. Values higher than 0.4 indicate on severe water stress and a high competition on water resources. |  The Water Futures and Solution initiative provided multimodel current and future water stress estimates at 0.5 degree spatial resolution. This gridded data can be extracted to the relevant NUTS2 by using GIS techniques, like zonal statistics. | https://pure.iiasa.ac.at/id/eprint/13008/ |
| Human direct need | Population counts represent the basic drinking water requirements across regions. Considering a similar economic and social context, these counts can also indicate the toal doemtic water demand. Global gridded population products are available at high resolution and multiple years, yet for the scope of the EU, a data from EUROSTAT is readily available.| EUROSTAT data is available as tabular format for the NUTS2 regions.| https://ec.europa.eu/eurostat/ |

#### Future

| Data item | Description | Format and processing | Source |
| :-: | :- | :- | :-: |
| Cropland | Cropland landcover under different Shared Socio Economic Pathways (SSPs) is a downscaled dataset of the integrated assessment model (IAM) GCAM. | The grid-cell area of a 30 arc-seconds land use grid, was summed for all cropland cells and aggregated (using Zonal Statistics) per area of interest. | [Zhang, Cheng, and Wu, 2023](https://www.nature.com/articles/s41597-023-02637-7#Sec1) |
| Competition on water | The water stress indicator is a proxy for competition on water, as it accounts for both multi-sectoral water demand, relative to the abundance of water. Values higher than 0.4 indicate on severe water stress and a high competition on water resources. |  Aqueduct v.4 provides global water-stress estimates at sub-catchment scale.  We have rasterized the water stress and water withdrawal, and calculated a weighted average water stress per unit of interest. | https://www.wri.org/data/aqueduct-global-maps-40-data |
| Human direct need | Population counts represent the basic drinking water requirements across regions. Considering a similar economic and social context, these counts can also indicate the total domestic water demand. Global gridded population products are available at high resolution and multiple years, and for this analysis - the rural and urban populations grid from Global CWatM were used.| Global CWatM provides rural and urban population grids at a spatial resolution of 5 arc-minutes. | - |


The algorithm expects a table in which each row represent an area of interest, and each column a variable. The **first column contains the codes of the area of interest (e.g., NUTS2), which have to be identical to the codes as they appear in the NUTS2 spatial data from the [European Commision](https://ec.europa.eu/eurostat/en/web/nuts/background)**.

Depending on the region of interest, other indicators may also be relevant for estimating drought exposure. We recommend that users research the most relevant factors in the region that may be exposed to drought before starting the analysis. 



<div class="alert alert-block alert-warning">
<b>Quantyfing drought exposure</b> uses a non-compensatory model to account for the spatial distribution of potential impacts on crops and livestock, competition for water (e.g. for industrial uses represented by the water stress indicator) and direct human demand (e.g. for drinking water represented by population size). We apply a <ins>Data Envelopment Analysis</ins> (DEA) to determine the relative exposure of each region to drought.

<ins>Data Envelopment Analysis (DEA) $^5$</ins>

Data Envelopment Analysis (DEA) has been widely used to assess the efficiency of decision making units (DMUs) in many areas of organisational performance improvement, such as financial institutions, manufacturing companies, hospitals, airlines and government agencies. In the same way that DEA estimates the relative efficiency of DMUs, it can also be used to quantify the relative exposure of a region (in this case the DMUs) to drought from a multidimensional set of indicators.

DEA works with a set of multiple inputs and outputs. In our case, the regions are only described by inputs, the indicators, so a dummy output can be used which has a unit value, i.e. all outputs are the same and equal, e.g. 1000. The efficiency of each region is then estimated as a weighted sum of outputs divided by a weighted sum of inputs, where all efficiencies are constrained to lie between zero and one. An optimisation algorithm is used for the weights to achieve the highest efficiency.

The exposure raw data is normalized using a linear transformation, as described in Eq. 2:

Eq. 2: $$Z_i = \frac{X_i - X_{min}}{X_{max} - X_{min}}$$
</div>


### Vulnerability data and methods:

Vulnerability data describes the elements that make a system susceptible to a natural hazard, which vary depending on the type of hazard and the nature of the system. However, there are some generic indicators such as poverty, health status, economic inequality and aspects of governance, which apply to all types of exposed parts and therefore remain constant despite changes in the type of hazard that pose a risk.

In this workflow, the selection of proxy indicators representing the economic, social, and infrastructural factors of drought vulnerability in each geographic location follows the criteria defined by Naumann et al. (2014): the indicator has to represent a quantitative or qualitative aspect of vulnerability factors to drought (generic or specific to some exposed element), and public data need to be freely available at the global scale.

Drought vulnerability is calculated by combining indicators for each factor (economic, social and infrastructure) for each region with a non-compensatory model, as done for exposure, and then aggregating the DEA results for the three factors to obtain a drought vulnerability (dV) score (see colored box below for more details).

Examples of indicators that we can find at a subnational resolution and that are included in the vulnerability sample file provided: 

#### Historic

| Variable prefix | Data item | Description | Format and processing | Correlation with Vulnerability | Source |
| :-: | :- | :- | :- | :-: | :-: |
| Economic_ | Energy use per person  | Per capita energy consumption. This dataset is produced annually by U.S. Energy Information Administration (EIA), and it is available per region and per country. | Data is available as tabular format at the country level, expressed in kilowatt-hours per capita, for years 1965-2022. | - | https://ourworldindata.org/grapher/per-capita-energy-use |
| Economic_ | Agriculture value added on the GDP| Describes the value added on the GDP (in percentage) of agriculture, forestry, and fishing. | Data is available as tabular format at the country level. | + | https://data.worldbank.org  |
| Economic_ | GDP per capita (current US dollar) | Gross domestic product (GDP) is a monetary measure of the market value of all the final goods and services produced in a specific time period by a country or countries. | Data is available as tabular format at the country level, expressed in current US dollar. | - | https://ec.europa.eu/eurostat/web/main/data/database |
| Economic_ | Poverty headcount ratio at 2.15 dollars a day (PPP) | Cross-country comparison of key poverty and inequality indicators. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. | Data is available as tabular format at the country level, as percentage of total population. | + |  https://data.worldbank.org |
| Social_ | Rural population | Percentage of total population in a country or region that lives in rural areas. | Data is available as tabular format at the country level. | + | https://data.worldbank.org |
| Social_ | Safely managed drinking water services | The indicator is computed as the number of people who use safely managed drinking water services and expressed as the percentage of total population. | Data is available as tabular format at the country level for years 2000-2022. | - | https://data.worldbank.org |
| Social_ | Life expectancy at birth (years) | Life expectancy is a statistical measure of the estimate of the span of a life.  | Data is available as tabular format at the country level, expressed in years, for years 1960-2021. | - | https://ec.europa.eu/eurostat/web/main/data/database |
| Social_ | Population ages 15–64 | Data show the percentage of total population between age 15 and 64 (working age) for each region and country. | Data is available as tabular format at the country level for years 1960-2022. | - | https://data.worldbank.org |
| Social_ | Refugee population by country or territory of asylum | Number of people in a  country or territory of asylum which was registerd as a refugee. | Data is available as tabular format at the country level for years 1960-2022. | + | https://data.worldbank.org |
| Social_ | Government Effectiveness | Government Effectivenesse is one of the indicators used by the Worldwide Governance Indicators (WGI) project,that features six aggregate governance indicators for over 200 countries and territories over the period 1996–2022. Government effectiveness captures perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies. | The six aggregate indicators are reported in tabular format in two ways: (1) in their standard normal units, ranging from approximately -2.5 to 2.5, and (2) in percentile rank terms from 0 to 100, with higher values corresponding to better outcomes. | + | https://www.gu.se/en/quality-government/qog-data/data-downloads/european-quality-of-government-index
| Social_ | Management of Water related Disasters | Self reporting on national compliance with the SDG 6.5.1 targets: Management of water-related disasters (3.1e). | The data represents the percent of compliance between 0-100, and is given at a country scale in a tabular format for the year 2020. | - | http://iwrmdataportal.unepdhi.org/country-reports | 
| Infrast_ | Agricultural irrigated land (percentage of total agricultural land) | Agricultural land is the combination of crop (arable) and grazing land. Data show the percentage of total agricultural land area which is irrigated (i.e. purposely provided with water), including land irrigated by controlled flooding. | EUROSTAT data is available as tabular format at the NUTS2 level. | - | https://ec.europa.eu/eurostat/web/main/data/database
| Infrast_ | Road density | The Global Roads Inventory Project is a harmonized global dataset of aproximately 60 geospatial datasets on road infrastructure collected for 2018. This dataset includes 5 road types: highways/ primary/ secondary/ tertiary/ local roads. |  5 arc-minutes grid. All grids are to be summed and aggregated (using Zonal Statistics) per area of interest. | - | https://www.globio.info/download-grip-dataset

#### Future

 Variable prefix | Data item | Description | Format and processing | Correlation with Vulnerability | Source |
| :-: | :- | :- | :- | :-: | :-: |
| Economic_ | GDP per capita (current US dollar) | Gross domestic product (GDP) is a monetary measure of the market value of all the final goods and services produced in a specific time period by a country or countries. | Data is available as global grids at a 30 arc-secondes resolution. | - | [Wang and Fubao, 2022](https://zenodo.org/records/5880037) |
| Economic_ | Poverty headcount ratio at 3.2 dollars a day (PPP) | Cross-country comparison of key poverty and inequality indicators. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. | Data is available as tabular format at the country level, as percentage of total population. | + | [Rao et al., 2019](https://zenodo.org/records/5880037)|
| Social_ | Rural population | Percentage of total population in a country or region that lives in rural areas. | Data is available as global grids at a 30 arc-secondes resolution from Global CWatM. The share of rural population was calculated by dividing the rural by the total population counts. | + | - |
| Social_ | Population ages 15–64 | Data show the percentage of total population between age 15 and 64 (working age) for each region and country. | Data is available as tabular format at the country level from the IIASA SSP database. | - | [Samir and Lutz, 2014](https://doi.org/10.1016/j.gloenvcha.2014.06.004) |

The algorithm expects a table in which each row represent an area of interest, and each column a variable. **Each variable has to be named with a prefix according to the factor, i.e. Social_ Economic_ or Infrast_, followed by a number or the name of the variable. The first column contains the codes of the area of interest (e.g., NUTS2), which have to be identical to the codes as they appear in the NUTS2 spatial data from the European Commision.**

As for exposure, the indicators listed here are a suggestion based on the most common proxies for economic, social, and infrastructural factors of drought vulnerability in each geographic location. We recommend that users research the most relevant factors in the region that make it vulnerable to drought before starting the analysis.

<div class="alert alert-block alert-warning">
<b>Quantifying drought vulnerability</b> Vulnerability to drought is computed as a 2-step composite model that derives from the aggregation of proxy indicators representing the economic, social, and infrastructural factors of vulnerability at each geographic location. 

In the first step, indicators for each factor (i.e. economic, social and infrastructural) are combined using a DEA model (see above), as similar as for drought exposure. In the second step, individual factors resulting from independent DEA analyses are arithmetically aggregated (using the simple mean) into a composite model of drought vulnerability (dV):

Eq. 3: $$dv_i = \frac{Soc_i + Econ_i + Infr_i}{3}$$

where Soc$_i$, Econ$_i$, and Infr$_i$ are the social, economic and infrastructural vulnerability factors for geographic location (or region) $i$.

The normalization of the vulnerability indicator is also done using a linear transformation (see Eq. 2), and it accounts to the correlation of the indicator with drought vulberability. In case of negative correlation (e.g., GDP per capita), the normalized score is estimated as $1 - Z_i$.
</div>




# Workflow implementation

### Load libraries

In this notebook we will use the following Python libraries:
- [os](https://docs.python.org/3/library/os.html) - To create directories and work with files
- [urllib](https://docs.python.org/3/library/urllib.html) - To access to online resources
- [pandas](https://pandas.pydata.org/docs/user_guide/index.html) - To create and manage data frames (tables) in Python
- [geopandas](https://geopandas.org/en/stable/docs.html) - Extend pandas to store and manipulate spatial data
- [numpy](https://numpy.org/doc/stable/) - For basic math tools and operations
- [scipy](https://scipy.org/) - Provide advanced mathematical tools and optimization capacities 
- [jenkspy](https://github.com/mthh/jenkspy) - To apply Fisher-Jenks alogrithm 
- [json](https://docs.python.org/3/library/json.html) - To load, store and manipuilate JSON objects
- [pyproj](https://pyproj4.github.io/pyproj/stable/) - An interface to a geographic projections and transformations library
- [matplotlib](https://matplotlib.org/) - For plotting
- [plotly](https://plotly.com/python/) - For dynamic and interactive plotting
- [datetime](https://docs.python.org/3/library/datetime.html) - For handling dates in Python

In [27]:
# lOAD LIBRARIES
import os
import urllib
os.environ['USE_PYGEOS'] = '0'
import pandas as pd
import geopandas as gpd
import numpy as np
import scipy
import jenkspy
import json
import pyproj
import plotly.express as px
import matplotlib.pyplot as plt
from datetime import datetime

# READ SCRIPTS
# adapted from https://github.com/metjush/envelopment-py/tree/master used for DEA 
from envelopmentpy.envelopment import *

# Function for calculating drought hazard indices
%run DROUGHTS_functions.ipynb


In [32]:
# function to create and save results
def saveResults(nuts, pattern, pattern_h, pth):
    regions = nuts['NUTS_ID']
    # Load precipitation data
    print("Analyzing drought hazard. This process may take few minutes...")
    print('\n')
    precip = pd.read_csv(os.path.join(workflow_folder, "drought_hazard_{}.csv".format(pattern_h)))
    # convert timing column to datetime
    precip['timing'] = pd.to_datetime(precip['timing'], format = '%Y-%m-%d') 
    #'%b-%Y'
    
        # col_subset aims to extract the relevant results

    col_subset = list(precip.columns.str.contains(ccode))
    col_subset[0] = True
    precip = precip.loc[:, col_subset]

    # clean NaN rows & missing columns
    precip = precip.loc[~np.array(precip.isna().all(axis = 1)),:]

    drop_regions = []

    # missing data in columns
    col_subset = np.array(precip.isna().all(axis = 0))
    drop_regions += list(precip.columns[col_subset])
    precip = precip.loc[:, ~col_subset]

    # missing data column in data
    col_subset =  np.isin(regions,precip.columns)
    drop_regions += list(regions[~col_subset])
    regions = regions[col_subset]
    
    output = pd.DataFrame(regions, columns = ['NUTS_ID'])

    # print head of the table
    print("The following regions are dropped due to missing data: "+ str(drop_regions))
    print('\n')
    print('Input precipitation data (top 3 rows): ')
    print(precip.head(3))

    print('\n')
    # create empty arrays and tables for intermediate and final results
    WASP = []
    WASP_global = []
    drought_class = precip.copy()

    # prepare output for drought event index - WASP_j- list of lists wasp = [[rid1], [rid2], ...]
    for i in range(1, len(precip.columns)):


        # For every NUTS3 out of all regions - do the following:

        # empty array for the monthly water deficit thresholds
        t_m = []
        for mon_ in range(1, 13):
            # For every month out of all all months (January, ..., December) - do the following:

            # calculcate monthly drought threshold -\
                # using a division of the data into to clusters with the Jenks' (Natural breaks) algorithm
            r_idx = precip.index[precip.timing.dt.month == mon_].tolist()
            t_m_last = jenkspy.jenks_breaks(precip.iloc[r_idx, i], n_classes = 2)[1]
            t_m.append(t_m_last)

            # Define every month with water deficity (precipitation < threshold) as a drought month
            drought_class.iloc[r_idx, i] = (drought_class.iloc[r_idx, i] < t_m_last).astype(int)

        # calculate annual water deficit threshold
        t_a = sum(t_m)

        # calculate droughts' magnitude and duration using the WASP indicator
        WASP_tmp = []
        first_true=0
        index = []
        for k in range(1, len(precip)):
            # for evary row (ordered month-year combinations):
                # check if droguht month -> calculate drought accumulated magnitude (over 1+ months)
            if drought_class.iloc[k, i]== 1:
                # In case of a drought month.
                # calculate monthly WASP index
                index = int(drought_class.timing.dt.month[k] - 1)
                # WASP monthly index: [(precipitation - month_threshold)/month_threshold)]*[month_threshold/annual_treshold]
                WASP_last=((precip.iloc[k,i] - t_m[index])/t_m[index])* (t_m[index]/t_a)

                if first_true==0:
                    # if this is the first month in a drought event:
                    # append calculated monthly wasp to WASP array.
                    WASP_tmp.append(WASP_last)
                    first_true=1
                else:
                    # if this is NOT the first month in a drought event:
                    # add the calculated monthly wasp to last element in the WASP array (accumulative drought).
                    WASP_tmp[-1]=WASP_tmp[-1] + WASP_last
                WASP_global.append(WASP_last)
            else:
                # check if not drought month - do not calculate WASP
                first_true=0
        WASP.append(np.array(WASP_tmp))
        
    dH = []
    WASP = np.array(WASP, dtype=object)

     # calculate global median deficit severity - 
      # set drought hazard (dH) as the probability of exceeding the global median water deficit.

    median_global_wasp = np.nanmedian(WASP_global)
      
    # calculate dH per region i
    for i in range(WASP.shape[0]):
        # The more negative the WASP index, the more severe is the deficit event, so 
        # probability of exceedence the severity is 1 - np.nansum(WASP[i] >= median_global_wasp) / len(WASP[i])
        dH.append(round(1 - np.nansum(WASP[i] >= median_global_wasp) / len(WASP[i]), 3))
        
    output['hazard_raw'] = dH
    print('>>>>> Drought hazard is completed.')
        
    # exposure
    evaluateDEA = False
    print("Analyzing drought exposure. This process may take few minutes...")
    print('\n')
    exposure = pd.read_csv(os.path.join(workflow_folder, "drought_exposure_{}.csv".format(pattern)))
    # take out country statistics for stretching
    # np.array (0: min, 1: max; NUTS_ID..+variables)
    exposure = exposure.query('NUTS_ID.str.contains(@ccode)') # see how to use ^ to only use the beginning
    cnt_range = pd.Series(index=['min','max'],data=[exposure.min(),exposure.max()]) 
    exposure = exposure.query('NUTS_ID in @regions')
    # Normalize the exposure using a min-max strech.
    cols = exposure.columns[1:]

    for varname in cols:
        # save maximum and minimum values
        mx_exposure = cnt_range[1][varname]#np.nanmax(exposure[varname])
        mn_exposure = cnt_range[0][varname]#np.nanmin(exposure[varname])

        # stretch values between 0 -1
        exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)


    # load exposure and sort to match nuts['NUTS_ID'] order
    sorterIndex = dict(zip(nuts['NUTS_ID'], range(len(nuts['NUTS_ID']))))
    exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
    exposure.sort_values(['sort_col'],
            ascending = [True], inplace = True)
    exposure = exposure.drop(columns='sort_col')

    # show data

    print('Input exposure data (top 3 rows): ')
    print(exposure.head(3))
    print('\n')
    # set DEA(loud = True) to print optimization status/details
    dea_e = DEA(np.array([1.] * len(regions)).reshape(len(regions),1),\
        exposure.to_numpy()[:,1:],\
     loud = False)  # we use a dummy factor for the input
    dea_e.name_units(regions)

    # returns a list with regional efficiencies
    dE = dea_e.fit()
    if evaluateDEA:
        dEmax = exposure.iloc[:,1:].max(axis = 1)
        print("plot max vs DEA:")
        fig = px.scatter(x=list(dEmax), y=dE,\
                         title = 'Evaluate exposure\'s DEA',\
                        labels={
                             "x": "Maximum exposure",
                             "y": "DEA"
                         })
        fig.show()

    output['exposure_raw'] = dE
    print('>>>>> Drought exposure is completed.')

    # vulnerability

    print("Analyzing drought vulnerability. This process may take few minutes...")
    print('\n')
    vulnerability = pd.read_csv(os.path.join(workflow_folder, "drought_vulnerability_{}.csv".format(pattern)))

    # take out country statistics for stretching
    # np.array (0: min, 1: max; NUTS_ID..+variables)
    vulnerability = vulnerability.query('NUTS_ID.str.contains(@ccode)') # see how to use ^ to only use the beginning
    cnt_range = pd.Series(index=['min','max'],data=[vulnerability.min(),vulnerability.max()]) 


    vulnerability = vulnerability.query('NUTS_ID in @regions')


    cols = vulnerability.columns[1:]

    print("Define correlation's directions for the following indicators: ", list(cols))


    # Pre-define the correlation's direction between exposure and drought risk
    # The example shows that: 
        # corellation of the rural population share with vulnerability is positive (True, below), i.e., 
         # rural regions are more vulneravle to droughts
        # correlation of the gdp/capitawith vulnerability is negative (False, below)

    corelDirection = [True, False] 

    # get vulnebrability factors, e.g., Social, Economic, Infrast
    def sclt(x): 
        return(x[0])
    factorsString = list(cols.str.split('_').map(sclt).drop_duplicates())

    # Normalize the exposure using a min-max strech.


    for varname in cols:
        # save maximum and minimum values
        mx_vulnerability = cnt_range[1][varname]#np.nanmax(vulnerability[varname])
        mn_vulnerability = cnt_range[0][varname]#np.nanmin(vulnerability[varname])

        # stretch values between 0 -1
        if corelDirection[list(cols.values).index(varname)]:
            # positive correlation between vulnerability indicator and vulnerability
            vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
        else:
            # negative correlation between vulnerability indicator and vulnerability
            vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)


    # load exposure and sort to match nuts['NUTS_ID'] order
    sorterIndex = dict(zip(nuts['NUTS_ID'], range(len(nuts['NUTS_ID']))))
    vulnerability['sort_col'] = vulnerability['NUTS_ID'].map(sorterIndex)
    vulnerability.sort_values(['sort_col'],
            ascending = [True], inplace = True)
    vulnerability = vulnerability.drop(columns='sort_col')

    # filter the data based on the regions
    row_subset = np.isin(vulnerability['NUTS_ID'], regions)
    vulnerability = vulnerability.loc[row_subset, :]

    #show the data
    print('Input vulnerability data (top 3 rows): ')
    print(vulnerability.head(3))

    print('\n')

    # summarise and write file
    #calculate dV
    #this is done in a two step process including a DEA; for more info see XX
    d_v = []   

    for fac_ in factorsString:
        #for each factor category, i.e. economy, social or infrastructure, do the following:
        d_v_max = []
        print(">>>>> Analyzing the '" + fac_ + "' factors")
        #select the indicators for each factor category
        factor_subset = vulnerability.loc[:, vulnerability.columns.str.contains(fac_)]
        dea_v = DEA(np.array([1.] * len(regions)).reshape(len(regions),1),\
                    factor_subset.to_numpy()[:, 1:],\
              loud = False)
        dea_v.name_units(regions)
        #d_v_max.append()
        d_v_last = dea_v.fit()  
        d_v.append(d_v_last)
        if evaluateDEA:
            dVmax = factor_subset.iloc[:,1:].max(axis = 1)
            print("plot max vs DEA:")
            fig = px.scatter(x=list(dVmax), y=d_v_last,\
                             title = 'Evaluate vulnerabiliy\'s DEA ({})'.format(fac_),\
                        labels={
                             "x": "Maximum vulnerabiliy",
                             "y": "DEA"
                         })
            fig.show()

    # returns three lists with regional efficiencies for each factor
    d_v = np.array(d_v).reshape(len(factorsString), len(regions))

    #calculate dV
    dV = np.nanmean(d_v, axis = 0)
    output['vulnerability_raw'] = dV

    print('>>>>> Drought vulnerability is completed.')

    # Risk = Hazard * Exposure * Vulnerability

    R = []

    for i in range(0, len(regions)):
            R_last = round(dH[i] * dE[i] * dV[i], 3)
            R.append(R_last)

    output['risk_raw'] = R

    # categorized risk and merge results with the spatial data

    output['risk_cat'] = [(int(np.ceil(x * 5))) for x in output['risk_raw']]
    # keep index

    nuts = nuts.merge(output, on='NUTS_ID')
    nuts_idx = nuts['NUTS_ID']
    nuts = nuts.set_index(nuts_idx)
    output.to_csv(os.path.join(workflow_folder, 'outputs', 'droughtrisk_{}_{}.csv'.format(ccode, pattern)))


### Define working environment and global parameters
This workflow relies on pre-proceessed data. The user will define the path to the data folder and the code below would create a folder for outputs.


In [29]:
# Set working environment

workflow_folder = './sample_data_nuts3/'

# Define scenario 0: historic; 1: SSP1-2.6; 2: SSP3-7.0. 3: SSP5-8.5

scn = 1

# Define time (applicible only for the future): 0: near-future (2050); 1: far-future (2080)
time = 0


    

# debug if folder does not exist - issue an error to check path

# create outputs folder
if not os.path.exists(os.path.join(workflow_folder, 'outputs')):
    os.makedirs(os.path.join(workflow_folder, 'outputs'))

Load NUTS3 spatial data and define regions of interest

In [30]:
# load nuts3 spatial data
print('Load NUTS3 map with three sample regions')
json_nuts_path = 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_RG_01M_2021_4326_LEVL_3.geojson'
nuts_ = load_nuts_json(json_nuts_path)
pth = os.path.join(workflow_folder, 'outputs')
# set country = 0 to map all Europe
#nuts['NUTS_ID2'] = nuts['NUTS_ID'].str.slice(0,4)

#print("Choose country code from: ", nuts['CNTR_CODE'].unique())


Load NUTS3 map with three sample regions


In [33]:

for ccode in ['HR', 'ES', 'FR', 'PT' ,'EL', 'RS']:
    
    # validate country selection and subset regions
    if not nuts_['CNTR_CODE'].str.contains(ccode).any:
        print("Country code: ", ccode, " is not valid; please choose a valid country code.")
    else:
        nuts = nuts_.query('CNTR_CODE in @ccode')
    
    
    for scn in range(3):
        if scn > 0:
            for time in range(2):
                pattern_h = ['ssp126', 'ssp370', 'ssp585'][scn - 1]
                pattern = ['ssp126', 'ssp370', 'ssp585'][scn - 1] + '_' + ['nf', 'ff'][time]
                print("#############", ccode, "_", pattern, "_", time, "#############", "/n")
                saveResults(nuts = nuts, pattern = pattern, pattern_h = pattern_h, pth = pth)
        else:
            pattern = "historic"
            pattern_h = "historic"
            print("#############", ccode, "_", scn, "#############", "/n")
            saveResults(nuts = nuts, pattern = pattern, pattern_h = pattern_h, pth = pth)
        

############# HR _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: []


Input precipitation data (top 3 rows): 
      timing     HR022     HR031     HR032     HR033     HR034     HR035  \
0 1901-01-31  0.000090  0.000229  0.000305  0.000162  0.000177  0.000238   
1 1901-02-28  0.000066  0.000296  0.000361  0.000195  0.000265  0.000414   
2 1901-03-31  0.000155  0.000580  0.000526  0.000250  0.000312  0.000421   

      HR036     HR037     HR050  ...     HR062     HR063     HR061     HR021  \
0  0.000053  0.000274  0.000052  ...  0.000086  0.000125  0.000040  0.000073   
1  0.000089  0.000463  0.000052  ...  0.000085  0.000101  0.000035  0.000062   
2  0.000246  0.000886  0.000075  ...  0.000155  0.000210  0.000075  0.000124   

      HR023     HR024     HR025     HR026     HR027     HR028  
0  0.000058  0.000063  0.000101  0.000052  0.000214  0.000151  
1  0.000048  0.000076  0.000094  0.000090  

>>>>> Drought hazard is completed.
Analyzing drought exposure. This process may take few minutes...


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  waterstress  population
815   HR064  1.000000     0.901641    1.000000
799   HR024  1.000000     0.433995    0.118136
797   HR022  0.879762     0.093362    0.029965


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']
Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
815   HR064        0.479344       1.000000
799   HR024        0.394371       0.254806
797   HR022        0.160596       0.010000


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# ES _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['ES630', 'ES

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.sort_values(['sort_col'],


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  livestock  population  waterstress
661   ES512  0.151682   0.423381    0.010000     0.085001
668   ES532  0.215733   0.089504    0.012415     1.000000
669   ES533  0.181331   0.042074    0.010000     1.000000


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['overall_ruralshr', 'overall_gdpcap']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/

Input vulnerability data (top 3 rows): 
    NUTS_ID  overall_ruralshr  overall_gdpcap
661   ES512          0.298517        0.148413
668   ES532          0.234903        0.229571
669   ES533          0.199753        0.268547


>>>>> Analyzing the 'overall' factors
>>>>> Drought vulnerability is completed.
############# ES _ ssp126_nf _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['ES630', 'ES640', 'ES706', 'ES707', 'ES708', 'ES709', 'ES630', 'ES640', 'ES703', 'ES704', 'ES705']


Input precipitation data (top 3 rows): 
      timing     ES111     ES112     ES113     ES114     ES120     ES130  \
0 2020-01-31  0.000533  0.000505  0.000442  0.000404  0.000456  0.000269   
1 2020-02-29  0.000559  0.000511  0.000377  0.000361  0.000621  0.000416   
2 2020-03-31  0.000633  0.000585  0.000491  0.000458  0.000608  0.000379   

      ES211     ES212     ES213  ...     ES533     ES611     ES612     ES613 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.sort_values(['sort_col'],


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  waterstress  population
661   ES512  0.449394     0.138395    0.135409
668   ES532  0.745475     1.000000    0.154784
669   ES533  0.825533     1.000000    0.010718


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/

Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
661   ES512        0.010000       0.907720
668   ES532        0.055110       0.984184
669   ES533        0.080792       0.943167


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# ES _ ssp126_ff _ 1 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['ES630', 'ES640', 'ES706', 'ES707', 'ES708', 'ES709', 'ES630', 'ES640', 'ES703', 'ES704', 'ES705']


Input precipitation data (top 3 rows): 
      timing     ES111     ES112     ES113     ES114     ES120     ES130  \
0 2020-01-31  0.000533  0.000505  0.000442  0.000404  0.000456  0.000269   
1 2020-02-29  0.000559  0.000511  0.000377  0.000361  0.000621  0.000416   
2 2020-03-31  0.000633  0.000585  0.000491  0.000458  0.000608  0.000379   

      ES211     ES212     ES213  ...     ES533     ES611     ES612     ES613  \
0  0.0001

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.sort_values(['sort_col'],


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  waterstress  population
661   ES512  0.445612     0.207934    0.149928
668   ES532  0.717253     1.000000    0.171346
669   ES533  0.673016     1.000000    0.012858


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/

Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
661   ES512        0.010000       0.924957
668   ES532        0.044086       0.988052
669   ES533        0.064590       0.946389


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# ES _ ssp370_nf _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['ES630', 'ES640', 'ES706', 'ES707', 'ES708', 'ES709', 'ES630', 'ES640', 'ES703', 'ES704', 'ES705']


Input precipitation data (top 3 rows): 
      timing     ES111     ES112     ES113     ES114     ES120     ES130  \
0 2020-01-31  0.000631  0.000568  0.000516  0.000485  0.000425  0.000220   
1 2020-02-29  0.000541  0.000539  0.000459  0.000400  0.000522  0.000308   
2 2020-03-31  0.000260  0.000247  0.000207  0.000181  0.000335  0.000286   

      ES211     ES212     ES213  ...     ES533     ES611     ES612     ES613  \
0  0.0001

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.sort_values(['sort_col'],


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  waterstress  population
661   ES512  0.395239     0.161643    0.093960
668   ES532  0.748188     0.932030    0.106368
669   ES533  0.811592     0.932030    0.010000


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/

Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
661   ES512        0.011080       0.845643
668   ES532        0.079866       0.879451
669   ES533        0.117977       0.834161


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# ES _ ssp370_ff _ 1 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['ES630', 'ES640', 'ES706', 'ES707', 'ES708', 'ES709', 'ES630', 'ES640', 'ES703', 'ES704', 'ES705']


Input precipitation data (top 3 rows): 
      timing     ES111     ES112     ES113     ES114     ES120     ES130  \
0 2020-01-31  0.000631  0.000568  0.000516  0.000485  0.000425  0.000220   
1 2020-02-29  0.000541  0.000539  0.000459  0.000400  0.000522  0.000308   
2 2020-03-31  0.000260  0.000247  0.000207  0.000181  0.000335  0.000286   

      ES211     ES212     ES213  ...     ES533     ES611     ES612     ES613  \
0  0.0001

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.loc[:, varname] = np.maximum((exposure.loc[:, varname] - mn_exposure)/(mx_exposure - mn_exposure), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure['sort_col'] = exposure['NUTS_ID'].map(sorterIndex)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  exposure.sort_values(['sort_col'],


Input exposure data (top 3 rows): 
    NUTS_ID  cropland  waterstress  population
661   ES512  0.289031     0.151932    0.069398
668   ES532  0.733497     1.000000    0.077948
669   ES533  0.805847     1.000000    0.010000


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum((vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vulnerability.loc[:, varname] = np.maximum(1 - (vulnerability.loc[:, varname] - mn_vulnerability)/(mx_vulnerability - mn_vulnerability), 0.01)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/

Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
661   ES512        0.010545       0.858039
668   ES532        0.076696       0.887117
669   ES533        0.113304       0.842419


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# FR _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['FRY30', 'FRY20', 'FRY40', 'FRY10', 'FRY50']


Input precipitation data (top 3 rows): 
      timing     FR101     FR102     FR103     FR104     FR105     FR106  \
0 1901-01-31  0.000015  0.000161  0.000078  0.000054  0.000015  0.000015   
1 1901-02-28  0.000012  0.000110  0.000056  0.000039  0.000012  0.000012   
2 1901-03-31  0.000029  0.000280  0.000161  0.000115  0.000029  0.000029   

      FR107     FR108     FRB01  ...     FRF12     FRF21     FRF22     FRF23  \
0  0.000015  0.000080  0.000157  ...  0.000238  0.000165  0.000213  0.00021

>>>>> Drought vulnerability is completed.
############# FR _ ssp370_ff _ 1 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: ['FRY30', 'FRY20', 'FRY40', 'FRY10', 'FRY50']


Input precipitation data (top 3 rows): 
      timing     FR101     FR102     FR103     FR104     FR105     FR106  \
0 2020-01-31  0.000020  0.000181  0.000129  0.000076  0.000020  0.000020   
1 2020-02-29  0.000022  0.000222  0.000134  0.000087  0.000022  0.000022   
2 2020-03-31  0.000020  0.000188  0.000121  0.000079  0.000020  0.000020   

      FR107     FR108     FRB01  ...     FRK27     FRK28     FRL01     FRL02  \
0  0.000020  0.000114  0.000162  ...  0.000212  0.000205  0.000218  0.000215   
1  0.000022  0.000118  0.000258  ...  0.000329  0.000365  0.000200  0.000252   
2  0.000020  0.000104  0.000242  ...  0.000274  0.000296  0.000186  0.000212   

      FRL03     FRL04     FRL05     FRL06     FRM01     FRM02  
0  0.00019

>>>>> Drought hazard is completed.
Analyzing drought exposure. This process may take few minutes...


Input exposure data (top 3 rows): 
     NUTS_ID  cropland  waterstress  population
1106   PT112  0.168023     0.149455    0.505135
1107   PT119  0.137650     0.135376    0.543950
1108   PT11A  0.098633     0.197958    0.806341


>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']
Input vulnerability data (top 3 rows): 
     NUTS_ID  Overall_gdpcap  Overall_rural
1106   PT112        0.243236       0.694923
1107   PT119        0.108326       0.701610
1108   PT11A        1.000000       0.895098


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# PT _ ssp370_ff _ 1 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing

>>>>> Drought exposure is completed.
Analyzing drought vulnerability. This process may take few minutes...


Define correlation's directions for the following indicators:  ['Overall_gdpcap', 'Overall_rural']
Input vulnerability data (top 3 rows): 
    NUTS_ID  Overall_gdpcap  Overall_rural
616   EL623        0.013491       0.010000
617   EL624        0.035000       0.099068
598   EL521        0.010000       0.904644


>>>>> Analyzing the 'Overall' factors
>>>>> Drought vulnerability is completed.
############# EL _ ssp370_nf _ 0 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: []


Input precipitation data (top 3 rows): 
      timing     EL301     EL302     EL303     EL304     EL305     EL306  \
0 2020-01-31  0.000017  0.000017  0.000017  0.000017  0.000144  0.000082   
1 2020-02-29  0.000011  0.000011  0.000011  0.000011  0.000101  0.000056   
2 2020-03-31  0.000015  0.000015  0.000015  0.000015  0

>>>>> Drought vulnerability is completed.
############# RS _ ssp126_ff _ 1 ############# /n
Analyzing drought hazard. This process may take few minutes...


The following regions are dropped due to missing data: []


Input precipitation data (top 3 rows): 
      timing     RS110     RS121     RS122     RS123     RS124     RS125  \
0 2020-01-31  0.000059  0.000052  0.000113  0.000043  0.000036  0.000029   
1 2020-02-29  0.000071  0.000085  0.000145  0.000070  0.000068  0.000060   
2 2020-03-31  0.000096  0.000125  0.000168  0.000099  0.000082  0.000064   

      RS126     RS127     RS211  ...     RS218     RS221     RS222     RS223  \
0  0.000063  0.000074  0.000228  ...  0.000077  0.000121  0.000114  0.000106   
1  0.000091  0.000089  0.000307  ...  0.000100  0.000164  0.000150  0.000157   
2  0.000118  0.000147  0.000300  ...  0.000109  0.000154  0.000139  0.000144   

      RS224     RS225     RS226     RS227     RS228     RS229  
0  0.000081  0.000055  0.000077  0.000059  0.000077  

### Loading precipitation data
Precipitation data is provided in the format required at the NUTS3 level for EU countries in the sample_data folder: file "drought_hazard.csv". See XX for details on how to prepare the input data for the hazard assessment.

## Contributors
The workflow has beend developed by [Silvia Artuso](https://iiasa.ac.at/staff/silvia-artuso) and [Dor Fridman](https://iiasa.ac.at/staff/dor-fridman) from [IIASA's Water Security Research Group](https://iiasa.ac.at/programs/biodiversity-and-natural-resources-bnr/water-security), and supported by [Michaela Bachmann](https://iiasa.ac.at/staff/michaela-bachmann) from [IIASA's Systemic Risk and Reslience Research Group](https://iiasa.ac.at/programs/advancing-systems-analysis-asa/systemic-risk-and-resilience).

## References

[1] Zargar, A., Sadiq, R., Naser, B., & Khan, F. I. (2011). A review of drought indices. *Environmental Reviews*, 19: 333-349.

[2] Carrão, H., Naumann, G., & Barbosa, P. (2016). Mapping global patterns of drought risk: An empirical framework based on sub-national estimates of hazard, exposure and vulnerability. *Global Environmental Change*, 39, 108-124.

[3] Lyon, B., & Barnston, A. G. (2005). ENSO and the spatial extent of interannual precipitation extremes in tropical land areas. *Journal of climate*, 18(23), 5095-5109.

[4] Carrão, H., Singleton, A., Naumann, G., Barbosa, P., & Vogt, J. V. (2014). An optimized system for the classification of meteorological drought intensity with applications in drought frequency analysis. *Journal of Applied Meteorology and Climatology*, 53(8), 1943-1960.

[5] Sherman, H. D., & Zhu, J. (2006). Service productivity management: Improving service performance using data envelopment analysis (DEA). Springer science & business media.

[6] Carrão, H., Naumann, G. & Barbosa, P. (2018). Global projections of drought hazard in a warming climate: a prime for disaster risk management. *Clim Dyn* 50: 2137–2155.
