# Vulnerability Assessment Pilot
This notebook demonstrates ongoing development of a vulnerability assessment support tool that uses climate data in the Analytics Engine (AE).

To execute a given 'cell' of this notebook, place the cursor in the cell and press the 'play' icon, or press Shift+Enter. Some cells take longer to run; you will see [$\ast$] to the left of a cell while AE is still working on it.

**Intended Application**: As a user, I want to **<span style="color:#FF0000">access climate projections data for my vulnerability assessment report</span>** by:
1. Retrieving the data metrics required for my planning needs

**Runtime**: With the default settings, this notebook takes **several hours** to run from start to finish, depending on the metric choice. Changing the selections may increase the runtime.

### Step 0: Set-up

First, we'll import the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, along with the specific functions from that library that we'll use in this notebook.

In [None]:
from climakitae.explore.vulnerability import cava_data
from climakitae.explore.vulnerability_table import create_vul_table

### Step 1: Import locations

To import your own custom locations, we recommend putting your csv file in the same folder as this notebook for ease:
1. Drag and drop a csv file into the file tree on the left-hand side; or
2. Use the `upload` button (the "up arrow" symbol next to the large blue plus symbol above the file tree). 

<span style="color:#FF0000">**Formatting note**</span>: For the code cells below to work, there must be **2 columns labeled `lat` and `lon`**. Functionality to accept different labeling is forthcoming!

In the cell below, we read the csv file in. We use the HadISD station list as an example here -- you may want to replace with your own locations file!

In [None]:
# Read in dummy locations from `stations_csv` file
from climakitae.core.paths import stations_csv_path
from climakitae.util.utils import read_csv_file
example_locs = read_csv_file(stations_csv_path, index_col=0)[['LAT_Y', 'LON_X']].rename(columns={'LAT_Y': 'lat', 'LON_X': 'lon'})
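
To load your own locations instead, a minimal sketch like the following reads a csv with pandas and renames a few common alternative headers to the `lat` and `lon` columns that the cells below expect. The `load_locations` helper and its alias table are hypothetical, not part of climakitae; adjust the aliases to match your own file.

```python
import io

import pandas as pd

# Hypothetical aliases mapping common header spellings to the required names
COLUMN_ALIASES = {
    "latitude": "lat", "lat_y": "lat",
    "longitude": "lon", "long": "lon", "lon_x": "lon",
}


def load_locations(source):
    """Read a csv (path or file-like object) and return only `lat`/`lon` columns."""
    df = pd.read_csv(source)
    # Normalize headers (strip whitespace, lowercase) before applying aliases
    normalized = {col: col.strip().lower() for col in df.columns}
    df = df.rename(columns={col: COLUMN_ALIASES.get(norm, norm) for col, norm in normalized.items()})
    missing = {"lat", "lon"} - set(df.columns)
    if missing:
        raise ValueError(f"csv is missing required column(s): {sorted(missing)}")
    return df[["lat", "lon"]]


# Example with an in-memory csv; with a real file, use e.g. load_locations("my_sites.csv")
sample_csv = io.StringIO("Name,Latitude,Longitude\nSite A,34.05,-118.25\nSite B,37.77,-122.42\n")
my_locs = load_locations(sample_csv)
print(my_locs)
```

The resulting DataFrame could then be passed to `cava_data` in place of `example_locs`.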

### Step 2: Retrieve metric data

The `cava_data` functionality is designed to provide flexibility in calculating customizable metrics. There are 4 customizable metrics that can be built with this functionality:
1. Likely seasonal event occurrence (e.g., "likely summer night low temperature")
2. 1-in-X temperature events (e.g., "1-in-10 year maximum temperature")
3. High/Extreme Heat Index events (e.g., "how many days per year does the Heat Index exceed 90°F")
4. 1-in-X precipitation events (e.g., "1-in-100 year, 24-hour precipitation")

Below is a table outlining all available arguments to the `cava_data` function. The "Argument required for" column notes which metrics or approaches require the argument before data can be generated, and the "Options" column lists the accepted input values. We provide multiple examples of working with the `cava_data` function in different configurations.

| Argument | Options | Argument required for | Notes |
|----------|---------|-----------------------|------|
|input_locations | Locations with `lat` and `lon` columns (e.g., read from a csv). | All | Either a single location or multiple locations; **batch_mode=True** is recommended for multiple.|
|variable | "Air Temperature at 2m", "NOAA Heat Index", "Precipitation (total)"| All | |
|approach | "Time", "Warming Level" | All | |
|downscaling_method | "Dynamical", "Statistical"| All | |
|time_start_year | Numerical (min is 1981) | Required for **approach=Time**.| |
|time_end_year | Numerical (max is 2100) | Required for **approach=Time**.| |
|historical_data| "Historical Climate", "Historical Reconstruction" | Required for **approach=Time**| **Historical Climate** ranges from 1980-2015 for WRF and 1950-2015 for LOCA2-Hybrid. **Historical Reconstruction** ranges from 1950-2022. Historical Reconstruction data cannot be combined with SSP data.|
|ssp_data | "SSP 2-4.5", "SSP 3-7.0", "SSP 5-8.5", passed as a list (e.g., ["SSP 3-7.0"]) | Required for **approach=Time** | Dynamical only has SSP 3-7.0; Statistical has all 3 SSP options.|
|warming_level| 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0 | Required for **approach=Warming Level**| Historical/Current period GWLs: 0.8, 1.0, 1.2. Future GWLs: 1.5, 2.0, 2.5, 3.0. |
|metric_calc| "max", "min" | Required for 1-in-X events, Heat Index, likely seasonal event | |
|heat_idx_threshold | Numerical | Required for Heat Index | Heat Index can only be calculated with **downscaling_method="Dynamical"**.|
|one_in_x | List or Numerical values | Required for 1-in-X events| Example: one_in_x = 10 *or* one_in_x = [10, 100]|
|event_duration| Numerical + "day"/"hour"| Optional for 1-in-X events| Must adhere to following structure: ({number}, "{temporal frequency}"). Temporal frequency options: "day", "hour". Default is (1, "day").|
|distr| "gev", "genpareto", "gumbel", "weibull", "pearson3", "gamma"| Optional for 1-in-X events |Default set to "gev".|
|percentile | Numerical (0-100) | Required for likely seasonal event |   |
|season| "summer", "winter", "all" | Required for likely seasonal event| Default set to "all".|
|units | Temp/Heat Index: "degF", "degC", "K". Precip: "mm", "inches" | Optional | Default for temp/Heat Index is degF. Default for precip is mm.|
|wrf_bias_adjust| True, False | Optional| Option to return only the 5 bias-adjusted WRF models. Only applicable for **downscaling_method="Dynamical"**. Default set to True.|
|export_method| "raw", "calculate", "both" | Optional | Default set to "both".|
|file_format | "NetCDF", "csv" | Optional | Default set to "NetCDF".|
|separate_files | True, False | Optional | Option to export separate files if multiple points are passed. Default set to True.|
|batch_mode | True, False | Optional, but recommended for multiple locations. | Option to efficiently run multiple points. Separate files for export is turned off in batch mode. Default set to False.|
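
Because some runs take hours, it can help to check a configuration against these rules before starting. The sketch below is a hypothetical pre-flight check, not a climakitae function: it encodes only constraints stated in the table above (valid options, year bounds, allowed warming levels, Heat Index requiring Dynamical data, and WRF offering only SSP 3-7.0) and is not a substitute for how `cava_data` itself behaves.

```python
VARIABLES = {"Air Temperature at 2m", "NOAA Heat Index", "Precipitation (total)"}
DOWNSCALING_METHODS = {"Dynamical", "Statistical"}
WARMING_LEVELS = {0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0}


def check_cava_args(variable, approach, downscaling_method, time_start_year=None,
                    time_end_year=None, ssp_data=None, warming_level=None,
                    heat_idx_threshold=None):
    """Return a list of problems; an empty list means no table rule was broken."""
    problems = []
    if variable not in VARIABLES:
        problems.append(f"unknown variable {variable!r}")
    if downscaling_method not in DOWNSCALING_METHODS:
        problems.append(f"unknown downscaling_method {downscaling_method!r}")

    if approach == "Time":
        if time_start_year is None or time_end_year is None:
            problems.append("approach='Time' needs time_start_year and time_end_year")
        else:
            if time_start_year < 1981:
                problems.append("time_start_year must be 1981 or later")
            if time_end_year > 2100:
                problems.append("time_end_year must be 2100 or earlier")
            if time_start_year > time_end_year:
                problems.append("time_start_year is after time_end_year")
        # Per the table, Dynamical (WRF) data only offers SSP 3-7.0
        if downscaling_method == "Dynamical" and ssp_data and set(ssp_data) != {"SSP 3-7.0"}:
            problems.append("Dynamical (WRF) data only offers SSP 3-7.0")
    elif approach == "Warming Level":
        if warming_level not in WARMING_LEVELS:
            problems.append(f"warming_level must be one of {sorted(WARMING_LEVELS)}")
    else:
        problems.append(f"unknown approach {approach!r}")

    if variable == "NOAA Heat Index":
        if downscaling_method != "Dynamical":
            problems.append("Heat Index requires downscaling_method='Dynamical'")
        if heat_idx_threshold is None:
            problems.append("Heat Index requires heat_idx_threshold")
    return problems


print(check_cava_args("NOAA Heat Index", "Time", "Statistical",
                      time_start_year=2030, time_end_year=2060, heat_idx_threshold=90))
```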

The following cells illustrate several ways to configure the `cava_data` function to retrieve and calculate metrics. The examples are listed below; more are coming soon as functionality is added:
1. Likely seasonal event, single location, time approach, all WRF data, with custom percentile
2. Likely seasonal event, batch mode for multiple locations, time approach, all WRF data, with custom percentile
3. 1-in-X temperature event, single location, bias-adjusted WRF data only, time approach, with custom return period, event frequency, and distribution
4. Heat index event, single location, time approach, with custom threshold
5. Likely seasonal event, single location, warming level approach, all WRF data, with custom percentile
6. Likely seasonal event, single location, warming level approach, all LOCA2 data, with custom percentile
7. Heat index event, batch mode for multiple locations, time approach, Historical Reconstruction data, with custom threshold
8. Likely seasonal event, batch mode for multiple locations, warming levels approach, all LOCA2 data, with custom percentile
9. 1-in-X precipitation event, single location, all LOCA2 data, time approach, with custom return period
10. 1-in-X precipitation event, single location, all WRF data, time approach, with custom return period, event duration, and distribution
11. Multiple 1-in-X temperature events, single location, bias-adjusted WRF data only, time approach, with custom return periods
12. Example of reading the calculated metric data via xarray for easy viewing within this notebook. 

#### Example: Likely seasonal event
Example scenario: I want to calculate "likely summer day high temperature for 2030-2050, where likely is the 75th percentile, in Celsius, using all available WRF data (bias-adjusted and non-bias-adjusted), and export only the calculated metric data" for a **single location**. I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    approach="Time",  
    time_start_year=2030, 
    time_end_year=2050,
    downscaling_method="Dynamical",  # WRF data
    wrf_bias_adjust=False, # return all WRF models
    
    ## Likely seasonal event specific arguments
    variable="Air Temperature at 2m", 
    metric_calc="max", # daily high temperature
    percentile=75, # likeliness percentile
    season="summer", # season
    units="degC", # change units
    
    ## Export
    export_method="calculate",  # export only calculated metric data
    file_format="NetCDF",
    batch_mode=False, # single location, so batch mode is not needed
)
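
Conceptually, a "likely" seasonal value is a percentile of the daily values that fall within the chosen season. The sketch below illustrates that idea on a plain list of `(date, value)` pairs. It assumes summer means June-August and winter means December-February, and uses linear-interpolation percentiles (NumPy's default method); these are assumptions for illustration, not confirmed details of how `cava_data` defines seasons or combines models.

```python
from datetime import date, timedelta

# Assumed season definitions (meteorological seasons); not confirmed for cava_data
SEASON_MONTHS = {"summer": {6, 7, 8}, "winter": {12, 1, 2}}


def percentile_linear(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def likely_seasonal_value(daily_values, season, percentile):
    """daily_values: iterable of (date, daily value) pairs, e.g. daily highs."""
    if season == "all":
        selected = [v for _, v in daily_values]
    else:
        months = SEASON_MONTHS[season]
        selected = [v for d, v in daily_values if d.month in months]
    if not selected:
        raise ValueError(f"no values fall in season {season!r}")
    return percentile_linear(selected, percentile)


# Synthetic year in which each day's "high" equals 10x its month number
start = date(2030, 1, 1)
daily = [(start + timedelta(days=i), 10.0 * (start + timedelta(days=i)).month) for i in range(365)]
print(likely_seasonal_value(daily, "summer", 75))  # → 80.0
```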

#### Example: Likely seasonal event, in batch mode for multiple locations
Example scenario: I want to calculate "likely summer day high temperature for 2030-2050, where likely is the 75th percentile, in Celsius, using all available WRF data (bias-adjusted and non-bias-adjusted), and export only the calculated metric data" for **multiple locations**. I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs, # no subsetting for a single location from input list
    time_start_year=2030, 
    time_end_year=2050,
    downscaling_method="Dynamical",  # WRF data
    approach="Time",  
    wrf_bias_adjust=False, # return all WRF models
    
    ## Likely seasonal event specific arguments
    variable="Air Temperature at 2m", 
    metric_calc="max", # daily high temperature
    percentile=75, # likeliness percentile
    season="summer", # season
    units="degC", # change units
    
    ## Export
    export_method="calculate",  # export only calculated metric data
    file_format="NetCDF",
    batch_mode=True, # batch mode - optimized for multiple locations
)

#### Example: 1-in-X temperature event
Example scenario: I want to calculate "1-in-10 year maximum temperature using the GEV distribution, in Fahrenheit, for 2070-2090, using only the bias-adjusted WRF data, and export both the raw and calculated metric data." I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    time_start_year=2070,
    time_end_year=2090,
    downscaling_method="Dynamical",  # WRF data 
    approach="Time",  
    wrf_bias_adjust=True, # return bias adjusted WRF models
    
    ## 1-in-X event specific arguments
    variable="Air Temperature at 2m",
    metric_calc="max", # daily maximum temperature
    one_in_x=10, # One-in-X
    distr="gev", # change distribution
    units="degF", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)
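
A 1-in-X value is the level exceeded with an annual probability of 1/X, read from a distribution fitted to the yearly maxima. As a simplified illustration, the sketch below fits a Gumbel distribution (the shape-zero special case of GEV) by the method of moments and inverts its CDF. This is a swapped-in simplification for intuition, not climakitae's fitting routine: `cava_data` fits the distribution chosen with `distr` (default "gev"), and the use of annual block maxima here is an assumption.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_return_values(annual_maxima, one_in_x):
    """1-in-X return values from a method-of-moments Gumbel fit.

    annual_maxima: one maximum per year (assumed block maxima)
    one_in_x: a single return period or a list of them (each > 1)
    """
    periods = one_in_x if isinstance(one_in_x, (list, tuple)) else [one_in_x]
    # Method of moments: std = pi * beta / sqrt(6), mean = mu + gamma * beta
    beta = stdev(annual_maxima) * math.sqrt(6) / math.pi
    mu = mean(annual_maxima) - EULER_GAMMA * beta
    # Invert the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta)) at F = 1 - 1/X
    return {t: mu - beta * math.log(-math.log(1 - 1 / t)) for t in periods}


annual_max_temps = [101.2, 98.7, 103.5, 99.9, 105.1, 100.4, 102.8]  # made-up degF values
print(gumbel_return_values(annual_max_temps, [10, 100]))
```

Like `one_in_x` in `cava_data`, the helper accepts either a single return period or a list.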

#### Example: Heat Index
Example scenario: I want to calculate "the number of days per year that the Heat Index exceeds 90°F between 2030-2060, and export only the raw data as a csv file". I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    time_start_year=2030,
    time_end_year=2060,
    downscaling_method="Dynamical",  # WRF data
    approach="Time",  
    
    ## Heat Index specific arguments
    variable="NOAA Heat Index", 
    metric_calc="max", # daily maximum
    heat_idx_threshold=90, # Heat Index Threshold
    units="degF", # change units
    
    ## Export
    export_method="raw",
    file_format="csv",
    batch_mode=False,
)
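
The Heat Index metric counts, for each year, the days on which the daily maximum Heat Index exceeds `heat_idx_threshold`. A minimal sketch of that counting step on plain `(date, daily max)` pairs is below. It assumes "exceeds" means strictly greater than the threshold and is illustrative only; it does not reproduce climakitae's Heat Index calculation.

```python
from collections import Counter
from datetime import date


def days_over_threshold_per_year(daily_max, threshold):
    """Count days per year whose daily maximum is strictly above `threshold`."""
    years = set()
    counts = Counter()
    for day, value in daily_max:
        years.add(day.year)
        if value > threshold:  # assumption: strict exceedance
            counts[day.year] += 1
    # Keep years with zero exceedances so a multi-year average is not biased upward
    return {year: counts[year] for year in sorted(years)}


sample = [
    (date(2030, 7, 1), 92.0),
    (date(2030, 7, 2), 88.5),
    (date(2030, 7, 3), 95.1),
    (date(2031, 7, 1), 89.9),
]
per_year = days_over_threshold_per_year(sample, threshold=90)
print(per_year)  # → {2030: 2, 2031: 0}
print(sum(per_year.values()) / len(per_year))  # → 1.0
```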

#### Example: Global Warming Level approach with WRF
Example scenario: I want to calculate "likely summer day high temperature with a 2°C warming level, where likely is the 50th percentile, in Celsius, using all available WRF data (bias-adjusted and non-bias-adjusted), and export both the raw and calculated metric data". I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    downscaling_method="Dynamical",  # WRF data
    approach="Warming Level",
    warming_level=2.0,
    wrf_bias_adjust=False, # return all WRF models
    
    ## Likely seasonal event specific arguments
    variable="Air Temperature at 2m", 
    metric_calc="max", # daily high temperature
    percentile=50, # likeliness percentile
    season="summer", # season
    units="degC", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)
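
With the warming level approach, data are selected by when each simulation reaches a global warming level rather than by fixed calendar years. One common convention, used here as an assumption and not confirmed as the exact method inside `cava_data`, is to find the first year in which a model's smoothed global mean warming reaches the level and then take a 30-year window centered on that year. The hypothetical helper below sketches that convention with a 30-year running mean.

```python
def warming_level_window(years, warming, level, smoothing=30, half_window=15):
    """Return (crossing_year, (first_year, last_year)), or None if never reached.

    years:   consecutive calendar years
    warming: global mean warming (degC above pre-industrial) for each year
    """
    half = smoothing // 2
    # Only consider years with a full `smoothing`-year window around them
    for i in range(half, len(years) - smoothing + half + 1):
        window = warming[i - half:i - half + smoothing]
        if sum(window) / len(window) >= level:
            crossing = years[i]
            return crossing, (crossing - half_window, crossing + half_window - 1)
    return None


# Synthetic, linearly warming series: 0.02 degC per year from 1950
years = list(range(1950, 2101))
warming = [0.02 * (y - 1950) for y in years]
print(warming_level_window(years, warming, 2.0))  # → (2051, (2036, 2065))
```

Each simulation reaches a given level in a different year, so under this convention each one contributes a different 30-year window.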

#### Example: Global Warming Level approach with LOCA2
Example scenario: I want to calculate "likely winter day high temperature with a 2°C warming level, where likely is the 60th percentile, in Celsius, using LOCA2 data, and export both the raw and calculated metric data". I would input: 

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    downscaling_method="Statistical",  # LOCA2 data
    approach="Warming Level",
    warming_level=2.0,
    
    ## Likely seasonal event specific arguments
    variable="Air Temperature at 2m", 
    metric_calc="max", # daily high temperature
    percentile=60, # likeliness percentile
    season="winter", # season
    units="degC", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)

#### Example: Heat Index, in batch mode for multiple locations
Example scenario: I want to calculate "the number of days per year that the Heat Index exceeds 104°F between 1990-2010 in the Historical Reconstruction data for many locations, and export only the calculated metric data". I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs, # no subsetting for a single location from input list
    time_start_year=1990,
    time_end_year=2010,
    historical_data="Historical Reconstruction", # selecting reconstruction data
    downscaling_method="Dynamical",  # Heat Index requires Dynamical data
    approach="Time",  
    
    ## Heat Index specific arguments
    variable="NOAA Heat Index", 
    metric_calc="max", # daily maximum
    heat_idx_threshold=104, # Heat Index Threshold
    units="degF", # change units
    
    ## Export
    export_method="calculate",
    file_format="NetCDF",
    batch_mode=True, # batch mode - optimized for multiple locations
)

#### Example: Likely seasonal event, in batch mode for multiple locations with LOCA2
Example scenario: I want to calculate "likely summer day high temperature at a 1.5°C warming level, where likely is the 70th percentile, in Celsius, using all available LOCA2 data, and export only the calculated metric data" for **multiple locations**. I would input:

**Note:** For LOCA2 data with the warming levels approach, `cava_data` resets to `batch_mode=False` regardless of your setting due to optimization constraints, but it will still compute all desired metrics. This will take quite some time to run (**2 locations** with warming levels and LOCA2 data take **approx. 1 hour**), so hang tight! Improvements in this area are forthcoming!

In [None]:
data = cava_data(
    ## Set-up
    example_locs, # select multiple locations
    downscaling_method="Statistical",  # LOCA2 data 
    approach="Warming Level",  
    warming_level=1.5, 
    
    ## Likely seasonal event specific arguments
    variable="Air Temperature at 2m",
    metric_calc="max", # daily maximum temperature
    season="summer", # change season
    percentile=70, # change percentile
    units="degC", # change units

    ## Export
    export_method="calculate",
    file_format="NetCDF",
    batch_mode=False  # reset to False for LOCA2 + warming levels regardless of this setting
)

#### Example: 1-in-X precipitation event
Example scenario: I want to calculate "1-in-10 year precipitation event using the GEV distribution, in inches, for 2070-2090 with SSP 3-7.0, using LOCA2 data, and export both the raw and calculated metric data." I would input:

**Notes**: 
- For daily precipitation 1-in-X events (i.e., 24-hour), we recommend the use of LOCA2 data instead of WRF data. If looking for a non-24-hour event, WRF data must be used, as the LOCA2 data is not available at hourly time steps. 
- The goodness of fit of the distribution is reported for 1-in-X precipitation events: the p-value of the distribution fit is provided during the calculation step and as an attribute of the final data object. For 1-in-X precipitation events, **we recommend "gev" as the first distribution to test**. GEV allows a continuous range of shapes and includes the Gumbel, Fréchet, and (reversed) Weibull distributions as special cases, depending on its shape parameter; it is typically a better fit than any one of those individual distributions and is common in hydrological applications. If the **p-value is less than 0.05**, this indicates that the **selected distribution is not a good fit for the data**, and we recommend choosing a different distribution and re-running the `cava_data` function.
- In certain geographic regions, selecting a high return period event (e.g., 1-in-1000) may produce unrealistically high precipitation values. This is primarily a limitation of sample size: such return periods lie beyond what the data can reasonably estimate, so a small amount of noise produces a large change in the distribution tails.

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    time_start_year=2070,
    time_end_year=2090,
    downscaling_method="Statistical",  # LOCA2 data 
    approach="Time",  
    ssp_data=["SSP 3-7.0"], # ssp selection
    
    ## 1-in-X event specific arguments
    variable="Precipitation (total)",
    metric_calc="max", # daily maximum precipitation
    one_in_x=10, # One-in-X
    distr="gev", # change distribution
    units="inches", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)
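
The notes before the previous cell describe checking a p-value for the distribution fit. To illustrate what such a goodness-of-fit check involves, the sketch below runs a one-sample Kolmogorov-Smirnov (KS) test of a method-of-moments Gumbel fit, using the asymptotic Kolmogorov distribution (with a small-sample correction) for the p-value. It is an assumption that `cava_data` uses a KS-type test; the notebook only states that a p-value is reported. Because the parameters are estimated from the same data being tested, this p-value tends to be too large, so it overstates the quality of the fit.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(sample):
    beta = stdev(sample) * math.sqrt(6) / math.pi
    return mean(sample) - EULER_GAMMA * beta, beta


def gumbel_cdf(x, mu, beta):
    return math.exp(-math.exp(-(x - mu) / beta))


def ks_test(sample, cdf):
    """One-sample KS statistic D and an approximate p-value."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Largest gap between the empirical step function and the fitted CDF
        d = max(d, (i + 1) / n - f, f - i / n)
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d  # small-sample correction
    if lam < 0.3:
        return d, 1.0  # the series is ~1 here and converges slowly for tiny lam
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Synthetic annual maxima placed at Gumbel(mu=2, beta=0.5) quantiles
n = 50
sample = [2 - 0.5 * math.log(-math.log((i + 0.5) / n)) for i in range(n)]
mu, beta = fit_gumbel_moments(sample)
d, p = ks_test(sample, lambda x: gumbel_cdf(x, mu, beta))
print(f"D = {d:.3f}, p = {p:.3f}")
if p < 0.05:
    print("Poor fit: consider another distribution")
```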

#### Example: 1-in-X precipitation event with custom event duration
Example scenario: I want to calculate "1-in-100 year precipitation event for a 3-hour event using the GEV distribution, in inches, for 2070-2090 with SSP 3-7.0, using only the bias-adjusted WRF data, and export both the raw and calculated metric data." I would input:

**Notes**: The notes in the previous example on data source choice, distribution goodness of fit, and high return periods also apply here. Because this is a 3-hour (non-24-hour) event, WRF data must be used, as LOCA2 data is not available at hourly time steps.

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    downscaling_method="Dynamical",  # WRF data 
    approach="Time",
    time_start_year=2070,
    time_end_year=2090,
    wrf_bias_adjust=True, # return bias-adjusted WRF models
    
    ## 1-in-X event specific arguments
    variable="Precipitation (total)",
    metric_calc="max", # maximum precipitation (3-hour event, see event_duration)
    one_in_x=100, # One-in-X
    distr="gev", # change distribution
    event_duration=(3, "hour"), # change event duration
    units="inches", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)
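
For a multi-hour `event_duration` such as `(3, "hour")`, one natural reading is that precipitation is accumulated over a rolling window of that length before the yearly maxima are taken. The sketch below illustrates that interpretation on hourly `(datetime, value)` pairs; both the rolling-sum interpretation and the helper name are assumptions, not a description of climakitae internals.

```python
from datetime import datetime, timedelta


def annual_max_rolling_total(hourly, hours):
    """Largest `hours`-long rolling precipitation total in each year.

    hourly: chronologically sorted, gap-free (datetime, value) pairs.
    Each window is assigned to the year of its last hour.
    """
    values = [v for _, v in hourly]
    maxima = {}
    for end in range(hours - 1, len(hourly)):
        total = sum(values[end - hours + 1:end + 1])
        year = hourly[end][0].year
        maxima[year] = max(maxima.get(year, total), total)
    return maxima


start = datetime(2070, 1, 1)
rain = [0.0] * 48
rain[10:13] = [1.0, 2.0, 3.0]  # a 3-hour burst of made-up values
hourly = [(start + timedelta(hours=h), rain[h]) for h in range(48)]
print(annual_max_rolling_total(hourly, 3))  # → {2070: 6.0}
```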

#### Example: Multiple 1-in-X temperature events
Example scenario: I want to calculate "1-in-10 **and** 1-in-100 year maximum temperatures using the GEV distribution, in Fahrenheit, for 2070-2090, using only the bias-adjusted WRF data, and export both the raw and calculated metric data." I would input:

In [None]:
data = cava_data(
    ## Set-up
    example_locs.iloc[:1], # select a single location
    time_start_year=2070,
    time_end_year=2090,
    downscaling_method="Dynamical",  # WRF data 
    approach="Time",  
    wrf_bias_adjust=True, # return bias adjusted WRF models
    
    ## 1-in-X event specific arguments
    variable="Air Temperature at 2m",
    metric_calc="max", # daily maximum temperature
    one_in_x=[10, 100], # One-in-X
    distr="gev", # change distribution
    units="degF", # change units
    
    ## Export
    export_method="both",
    file_format="NetCDF",
)
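
Because a 1-in-X event has an annual exceedance probability of 1/X, it is neither guaranteed nor limited to occur once in X years. Assuming independent years, the chance of at least one exceedance over an N-year period is 1 - (1 - 1/X)^N. The hypothetical helper below evaluates this for each value in a `one_in_x`-style list, which can help when interpreting the return periods requested above.

```python
def chance_of_exceedance(one_in_x, years):
    """P(at least one exceedance in `years` independent years), per return period."""
    periods = one_in_x if isinstance(one_in_x, (list, tuple)) else [one_in_x]
    return {x: 1.0 - (1.0 - 1.0 / x) ** years for x in periods}


# 2070-2090 spans 21 years if both end years are included
for x, prob in chance_of_exceedance([10, 100], years=21).items():
    print(f"1-in-{x}: {prob:.0%} chance of at least one exceedance")
```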

#### Example: Looking at the `cava_data` output
It may be useful to look at the `cava_data` output within this notebook to assess the results and make any changes to the data request. After running the `cava_data` function, in a new cell you can type `data` to view the xarray data object. Depending on your export setting ("raw", "calculate", "both"), you can also view the data object in a more user-friendly xarray view. We provide the code to do so in the next cell -- select which option matches your `cava_data` run and the export option you would like to view! 

In [None]:
data # looking at the full xarray data object; will be a dictionary of data arrays!
# data['calc_data'] # looking at just the calculated data metric
# data['raw'] # looking at just the raw input data
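
Since the keys in the returned dictionary depend on `export_method`, it can help to list them before indexing into `data`. The small helper below is a hypothetical convenience, not part of climakitae; it summarizes each entry with its type and, for xarray objects, its dimension names (read via the standard `.dims` attribute).

```python
from types import SimpleNamespace


def describe_output(result):
    """Return one summary line per entry of a cava_data-style result."""
    items = result.items() if isinstance(result, dict) else [("result", result)]
    lines = []
    for key, value in items:
        line = f"{key}: {type(value).__name__}"
        dims = getattr(value, "dims", None)
        if dims is not None:
            line += f" dims={tuple(dims)}"
        lines.append(line)
    return lines


# Stand-in object for demonstration; in the notebook, call describe_output(data)
fake = {"calc_data": SimpleNamespace(dims=("simulation", "location"))}
for line in describe_output(fake):
    print(line)
```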

---

### Appendix: Table Generation Sample Code

Below, you'll find code that generates a table with different climate data metrics used in a vulnerability assessment. Feel free to run it and check it out! It is still very much in progress. **This will take 30+ min. to run.**

In [None]:
%%time
percentile = 50
heat_idx_threshold = 80
one_in_x = 10 # currently, only a single `one_in_x` value is supported here
df = create_vul_table(example_locs.iloc[[10]], percentile, heat_idx_threshold, one_in_x)
df