# Developing Model Agnostic Tools 

This notebook walks through the development of two model-agnostic tools for selecting models based on a desired data metric.<br><br>
**1. Lookup Warming Level and Year Tool**: This tool illustrates the on-going development of functionality to identify either a **warming level** or a **year** of interest, based on a given scenario. This tool ties warming levels, climate scenarios, and year together for a specific location that may be of interest to stakeholders for their planning needs.<br>*Intended Application*: As a utility analyst, I want to be able to enter either a warming level or year of interest, and extract information on the model distribution at that warming level or year for my analysis needs. <br><br>
**2. WRF/LOCA2-Hybrid Simulation Explorer Tool**: This tool illustrates the on-going development of functionality to identify the WRF or LOCA2-Hybrid simulations that match a pre-selected list of statistics for a metric, namely the min, max, quartiles, and middle 10% of models. <br>*Intended Application*: As a policy-maker exploring future projects, I want to understand the landscape of WRF or LOCA2-Hybrid runs in order to utilize a range of projections in my decision-making. 

**Runtime**: With the default settings, this notebook takes approximately **1-2 minutes** to run from start to finish. Modifications to selections may increase the runtime. 

### Step 0: Setup 

In [None]:
from climakitae.explore.agnostic import (
    create_lookup_tables,
    agg_area_subset_sims, 
    agg_lat_lon_sims, 
    show_available_vars,
    get_available_units,
)
from climakitaegui.explore.agnostic import (
    create_conversion_function,
    plot_LOCA,
    plot_WRF,
    plot_climate_response_WRF,
    plot_climate_response_LOCA,
)
import numpy as np

## Tool 1: Lookup Warming Level and Year

This tool provides critical information on the connections between global warming levels, scenario, and timing among model simulations. We use SSP3-7.0 here as the climate scenario. The `find_warm_level_or_time` function returns either the `warming_level` or the `year` of interest, depending on which input you provide. Warming levels are constrained to 1.5°C, 2.0°C, and 3.0°C, because only a portion of the model simulations reach 4°C of warming, which is too few to provide statistical confidence in the results.

### Step 1: Create the model lookup tables
The `find_warm_level_or_time` function is built on lookup tables that record, for every simulation, when each warming level is reached relative to the 1850-1900 historical baseline, as is standard in the global warming levels approach. First, we need to generate the lookup tables that this function uses.

In [None]:
lookup_tables = create_lookup_tables()
find_warm_level_or_time = create_conversion_function(lookup_tables)
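
To make the idea of a warming-level lookup table concrete, here is a minimal sketch using made-up data. It is not how `create_lookup_tables` is implemented; the simulation names, the anomaly values, and the helper functions `first_crossing_year` and `build_toy_lookup` are all hypothetical and exist only for illustration.

```python
# Toy global-mean warming (degrees C above the 1850-1900 baseline) per year.
# All simulation names and values below are made up for illustration.
toy_anomalies = {
    "sim_a": {2020: 1.2, 2030: 1.6, 2040: 2.1, 2050: 2.6, 2060: 3.1},
    "sim_b": {2020: 1.1, 2030: 1.4, 2040: 1.8, 2050: 2.2, 2060: 2.7},
}
WARMING_LEVELS = (1.5, 2.0, 3.0)


def first_crossing_year(series, level):
    """Return the first year whose anomaly meets or exceeds `level`, else None."""
    for year in sorted(series):
        if series[year] >= level:
            return year
    return None


def build_toy_lookup(anomalies, levels=WARMING_LEVELS):
    """Map each simulation to {warming level: first crossing year}."""
    return {
        sim: {level: first_crossing_year(series, level) for level in levels}
        for sim, series in anomalies.items()
    }


toy_lookup = build_toy_lookup(toy_anomalies)
print(toy_lookup)
```

In this toy data, `sim_b` never reaches 3.0°C, so its entry for that level is `None`. A simulation that does not reach a level cannot contribute to the statistics for that level.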

### Step 2: Example usage of the Lookup Function for Identifying a Year

In the following cells, we demonstrate how to find the year and month of interest when a **warming level** is passed as input to the tool. The tool returns several key pieces of information, based on SSP3-7.0 as the climate scenario:
- A histogram of all 80 simulations binned by when the simulation reaches the input warming level
- Median year, and the specific year-month

In [None]:
find_warm_level_or_time(warming_level='1.5')

In [None]:
find_warm_level_or_time(warming_level='2.0')

In [None]:
find_warm_level_or_time(warming_level='3.0')
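
The summary returned for a warming level can be pictured as a histogram plus a median over per-simulation crossing years. The sketch below shows that idea on hypothetical data; the crossing years, the 5-year bin width, and the helper `summarize_crossings` are assumptions for illustration and are not taken from the tool itself.

```python
from collections import Counter
from statistics import median

# Hypothetical years in which each simulation first reaches a warming level.
crossing_years = {
    "sim_a": 2031,
    "sim_b": 2038,
    "sim_c": 2035,
    "sim_d": 2044,
    "sim_e": 2036,
}


def summarize_crossings(years_by_sim, bin_width=5):
    """Return the median crossing year and a {bin start year: count} histogram."""
    years = list(years_by_sim.values())
    # Floor each year to the start of its bin, e.g. 2038 -> 2035 for 5-year bins.
    bins = Counter((year // bin_width) * bin_width for year in years)
    return median(years), dict(sorted(bins.items()))


median_year, histogram = summarize_crossings(crossing_years)
print(median_year, histogram)
```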

### Step 3: Example usage of the Lookup Tool to find the projected warming level

In the following cells, we demonstrate how to find the warming level of interest when a **year** is passed as input to the tool. The tool returns several key pieces of information, based on SSP3-7.0 as the climate scenario:
- A histogram of all 80 simulations binned by warming level
- The major warming level nearest to the median projected warming level
- Information on the median projected warming level

In [None]:
find_warm_level_or_time(year=2043)

In [None]:
find_warm_level_or_time(year=2050)

In [None]:
find_warm_level_or_time(year=2070)

In [None]:
find_warm_level_or_time(year=2100)
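
The reverse lookup can be sketched in the same spirit: take the warming of each simulation in the target year, compute the median, and report the major warming level closest to it. The values and the helper `nearest_major_level` below are hypothetical, and the tool may select the nearest level differently.

```python
from statistics import median

MAJOR_LEVELS = (1.5, 2.0, 3.0)

# Hypothetical warming (degrees C) of each simulation in one target year.
warming_in_year = {"sim_a": 1.9, "sim_b": 2.3, "sim_c": 2.2, "sim_d": 2.6}


def nearest_major_level(warming_by_sim, levels=MAJOR_LEVELS):
    """Return the median warming and the major level closest to that median."""
    med = median(warming_by_sim.values())
    nearest = min(levels, key=lambda level: abs(level - med))
    return med, nearest


median_warming, nearest_level = nearest_major_level(warming_in_year)
print(median_warming, nearest_level)
```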

## Tool 2: Simulation Explorer Tool

Now we demonstrate the on-going development of the Simulation Explorer tool. Given a specific location and pre-calculated metric of interest, this tool returns information on the statistical distribution of simulations based on the selected metric for suitability in planning design.

### Step 1: Identify parameters and location of interest

Below, we offer an example of default settings to set-up the Simulation Explorer Tool. You can also customize these parameters, including months, years, and downscaling method ("Dynamical" or "Statistical").

**Note**: If you use the default downscaling method of "Dynamical" to analyze WRF data, the timescale is set to monthly by default. This retrieves the 4 monthly-aggregated simulations and takes approximately 1-2 minutes to run. However, if you would like to look at all 8 available models, set `wrf_timescale` to "hourly"; the notebook will take much longer to run (~45 minutes) as it has to compute a much larger dataset!

If you select the "Statistical" downscaling method for analyzing LOCA2-Hybrid data, the timescale can only be monthly, because of how computationally-heavy it is to aggregate on more granular timescales.

In [None]:
# Months desired for analysis, Jan = 1
months = range(1, 13)

# Years desired for analysis, inclusive
years = (2013, 2040)

# Options are: "Dynamical" (WRF) or "Statistical" (LOCA2-Hybrid)
downscaling_method = 'Dynamical'

# Options are: "monthly" (4 monthly-aggregated WRF models) or "hourly" (8 hourly WRF models -- time intensive!)
# Ignore this line if you are just using 'Statistical' data
wrf_timescale = 'monthly'

# This shows the available variables for your inputs
show_available_vars(downscaling_method, wrf_timescale)

In [None]:
# Input desired variable
variable = 'Maximum air temperature at 2m' # change variable if so desired HERE

# Select desired aggregation function (another option is "np.median")
agg_func = np.mean

# Select latitude and longitude range; replace with individual numbers if you're only looking for a specific lat/lon point
lat_range = (32.58, 33.20)
lon_range = (-117.345, -117.125)

# Select your desired units
print(get_available_units(variable, downscaling_method))
units = 'K' # change unit if so desired HERE

### Step 2: Run analyses

With the function below, we can look at the distribution of simulations for a gridcell at a specific lat/lon. For WRF data, this will take between 1-3 min. For LOCA2-Hybrid data, this can take up to ~5 min.

In [None]:
%%time
single_stats_gridcell, multiple_stats_gridcell, results_gridcell = agg_lat_lon_sims(lat_range, lon_range, downscaling_method, variable, agg_func, units, years, months, wrf_timescale)

With the next function below, we can look at the distribution of simulations across a selected metric for the state of California. This will take some time since it's calculating over a much larger area - hang tight!

In [None]:
%%time
area_subset = 'states' # Choose your `area_subset`
selected_area = 'CA' # Choose your `selected_area`
single_stats_area, multiple_stats_area, results_area = agg_area_subset_sims(area_subset, selected_area, downscaling_method, variable, agg_func, units, years, months, wrf_timescale)
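
Conceptually, each aggregation reduces a simulation's values over the selected grid cells and time steps to a single number, so that simulations can be ranked against each other. The sketch below illustrates this on made-up data. It is not the implementation of `agg_lat_lon_sims` or `agg_area_subset_sims`; the nested-list data layout and the helper `aggregate_simulations` are assumptions.

```python
from statistics import mean

# Hypothetical monthly values (K): one inner list of time steps per grid cell.
toy_data = {
    "sim_a": [[290.0, 292.0, 294.0], [291.0, 293.0, 295.0]],
    "sim_b": [[289.0, 290.0, 291.0], [290.0, 291.0, 292.0]],
}


def aggregate_simulations(data, agg_func=mean):
    """Collapse every grid cell and time step into one value per simulation."""
    return {
        sim: agg_func([value for cell in cells for value in cell])
        for sim, cells in data.items()
    }


toy_aggregated = aggregate_simulations(toy_data)
print(toy_aggregated)
```

Passing a different function, such as `statistics.median`, changes the metric without changing the rest of the workflow, which mirrors the role of `agg_func` in the cells above.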

### Step 3: Extract simulations based on specific statistics.
Below, we illustrate how to retrieve the simulations at the min, median, max, and middle 10% of the distribution of simulations from the results of the gridcell aggregation above. Feel free to change the cells below from `single_stats_gridcell` to `single_stats_area` (and `multiple_stats_gridcell` to `multiple_stats_area`) if you'd rather see the results of an area aggregation.

In [None]:
min_sim = single_stats_gridcell['min']
# min_sim = single_stats_area['min']
min_sim

In [None]:
med_sim = single_stats_gridcell['median']
# med_sim = single_stats_area['median']
med_sim

In [None]:
max_sim = single_stats_gridcell['max']
# max_sim = single_stats_area['max']
max_sim

In [None]:
# Finding statistics that return multiple simulations
mid_10 = multiple_stats_gridcell['middle 10%']
# mid_10 = multiple_stats_area['middle 10%']
mid_10
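
As a rough illustration of what these statistics select, the sketch below ranks hypothetical per-simulation values and picks out the min, median, max, and middle 10%. The definition of "middle 10%" used here (ranks between the 45th and 55th percentiles) and the choice of the lower-middle simulation as the median for an even count are assumptions, as is the helper `pick_simulations`; the tool may define them differently.

```python
# Hypothetical aggregated value per simulation (for example, mean temperature in K).
toy_values = {f"sim_{i:02d}": 285.0 + 0.5 * i for i in range(20)}


def pick_simulations(results, lower_pct=45, upper_pct=55):
    """Return single-simulation statistics and the middle slice of the ranking."""
    ranked = sorted(results, key=results.get)  # simulation names, lowest value first
    n = len(ranked)
    single = {
        "min": ranked[0],
        "median": ranked[(n - 1) // 2],  # lower-middle simulation when n is even
        "max": ranked[-1],
    }
    # Integer arithmetic avoids floating-point surprises at the slice boundaries.
    lo = n * lower_pct // 100
    hi = max(n * upper_pct // 100, lo + 1)  # always keep at least one simulation
    return single, {"middle 10%": ranked[lo:hi]}


toy_single, toy_multiple = pick_simulations(toy_values)
print(toy_single)
print(toy_multiple)
```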

### Step 4: Visualize the distribution of results

Here, you can view some initial distributions of your results through bar plots and scatter plots.

If you are interested in plotting two aggregations against each other, you can compute a second metric over the same area to see how the models differ quantitatively across two variables. We will use `agg_lat_lon_sims` again to aggregate the simulations across gridcells, but if you're doing an analysis over an area instead, feel free to change `agg_lat_lon_sims` to `agg_area_subset_sims`.

If you're not interested in a second variable, you can also just skip the following cell.

In [None]:
%%time
variable2 = 'Precipitation (total)'
units2 = 'mm'
single_stats_gridcell2, multiple_stats_gridcell2, results_gridcell2 = agg_lat_lon_sims(lat_range, lon_range, downscaling_method, variable2, agg_func, units2, years, months, wrf_timescale)

## alternative version, if you are using an aggregated area instead of a single gridcell selection
# single_stats_area2, multiple_stats_area2, results_area2 = agg_area_subset_sims(area_subset, selected_area, downscaling_method, variable2, agg_func, units2, years, months, wrf_timescale)

Below, replace `results_gridcell` with `results_area` and `single_stats_gridcell` with `single_stats_area` if you ran your analysis on a selected area rather than on a gridcell.

In [None]:
# Plotting distribution of simulations based on if your downscaling method was 'Dynamical' (WRF) or 'Statistical' (LOCA2-Hybrid).
if downscaling_method == 'Dynamical':
    plot_WRF(results_gridcell, agg_func, years)
elif downscaling_method == 'Statistical':
    plot_LOCA(results_gridcell, agg_func, years, single_stats_gridcell)

If you calculated the second variable, you can view both aggregated variables on a scatterplot:

In [None]:
# Plotting 2 climate metrics against each other based on if your downscaling method was 'Dynamical' (WRF) or 'Statistical' (LOCA2-Hybrid).
if downscaling_method == 'Dynamical':
    plot = plot_climate_response_WRF(results_gridcell, results_gridcell2)
elif downscaling_method == 'Statistical':
    plot = plot_climate_response_LOCA(results_gridcell, results_gridcell2)
    
display(plot)
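
A scatterplot of two metrics needs one (x, y) pair per simulation, so the two result sets have to be matched by simulation. The sketch below shows that pairing on hypothetical dictionaries. The real `results_gridcell` objects may have a different structure, and the helper `pair_metrics` is not part of `climakitae` or `climakitaegui`.

```python
def pair_metrics(results_x, results_y):
    """Return (simulation, x, y) rows for simulations present in both results."""
    shared = sorted(set(results_x) & set(results_y))
    return [(sim, results_x[sim], results_y[sim]) for sim in shared]


# Hypothetical aggregated values: temperature in K and precipitation in mm.
temperature = {"sim_a": 292.5, "sim_b": 290.5, "sim_c": 291.0}
precipitation = {"sim_a": 24.0, "sim_b": 31.5}

points = pair_metrics(temperature, precipitation)
print(points)
```

Here `sim_c` is dropped because it has no precipitation value, which keeps every plotted point complete.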