# Extreme heat distributions using a Global Warming Level approach

Because warming levels are defined by the amount of global mean temperature change, they can be used to compare possible outcomes across multiple scenarios and model simulations.
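
To make the definition concrete, here is a minimal sketch on synthetic data. It is not the climakitae implementation: the function name, the 21-year running mean, and the linear warming rates are all assumptions chosen for illustration. It finds the first year in which a smoothed global-mean anomaly reaches a warming level, then builds a 30-year window centered on that year.

```python
import numpy as np

def warming_level_window(years, anomaly, level, window=15, smooth=21):
    """Return (start_year, end_year) of the slice centered on the first year in
    which the running-mean anomaly reaches `level`, or None if it never does."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(anomaly, kernel, mode="valid")
    half = smooth // 2
    # Each running mean is labeled with the year at the center of its window
    centered_years = years[half : half + len(smoothed)]
    reached = np.nonzero(smoothed >= level)[0]
    if reached.size == 0:
        return None
    center = int(centered_years[reached[0]])
    return center - window, center + window - 1

years = np.arange(1950, 2101)
# Two synthetic "simulations" warming linearly at different rates (degC above baseline)
fast = 0.03 * (years - 1950)
slow = 0.021 * (years - 1950)

fast_window = warming_level_window(years, fast, level=2.0)
slow_window = warming_level_window(years, slow, level=2.0)
print(fast_window, slow_window)
```

The two synthetic simulations reach the same warming level in different years, so their 30-year windows cover different calendar periods even though both describe a world that is 2˚C warmer.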

Warming levels are often used in international policy discussions, such as the [Paris Agreement](https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement) goal of limiting warming to 2˚C.

In this example, we will:

- Examine the range of possible regional changes in daily maximum temperature across climate models

- Calculate a distribution for days exceeding a maximum temperature threshold
    - Contingent on a warming level
    - Using a "traditional" method, with data centered around some future year for one emissions scenario

## Step 0: Setup

Import libraries needed.

In [None]:
from climakitae.explore import warming_levels
from climakitae.explore.threshold_tools import get_exceedance_count
from climakitae.util.utils import get_closest_gridcell

Spin up some extra computing resources. This is essential for the highest-resolution data.

In [None]:
from climakitae.util.cluster import Cluster
cluster = Cluster()
cluster.adapt(minimum=0, maximum=43)
client = cluster.get_client()
cluster

## Step 1: Explore data

Launch a toolkit to view localized projections under varying levels of warming. First, choose a variable and a spatial area of interest.

In [None]:
wl = warming_levels()

In [None]:
wl.choose_data()

Specify a latitude and longitude of interest:

In [None]:
my_lat, my_lon = 34.08214634521255, -117.2425643

Set a range that ensures the nearest gridcell is included:

In [None]:
wl.wl_params.latitude=(34.0,34.3)
wl.wl_params.longitude=(-117.5,-117.1)

And let's set a few other things for this example, in case we forget to do so above:

In [None]:
wl.wl_params.variable="Maximum air temperature at 2m"
wl.wl_params.units="degF"
wl.wl_params.timescale="daily"
wl.wl_params.downscaling_method=["Statistical"]
wl.wl_params.anom="No"

### Now retrieve and process the data
The calculate step may take a while to complete. Selecting statistical downscaling takes longer because there are more simulations to work with, and they are at the highest spatial resolution (about 10 minutes with a cluster).

In [None]:
%%time
wl.calculate()

### Next visualize the regional response at a series of global warming levels.
Use the drop-down menu to visualize when a specified global warming level is reached for a scenario of interest. The scenarios shown are Shared Socioeconomic Pathways ([SSPs](https://www.sciencedirect.com/science/article/pii/S0959378016300681)), ranging from low (SSP 1-1.9) to high (SSP 5-8.5) emissions trajectories.

To learn more about the data available on the Analytics Engine, [see our data catalog](https://analytics.cal-adapt.org/data/). 

In [None]:
wl.visualize()

The visualize step above is optional. Skip it if you want to go directly to extracting the data.

### Extract 30-year slices of data centered at warming levels:

In [None]:
gwl_slices = wl.sliced_data
gwl_slices

Now let's limit ourselves to only the nearest gridcell to our location of interest:

In [None]:
gwl_slices = gwl_slices.sel(lat=my_lat,lon=my_lon,method='nearest')
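
As a rough illustration of what a nearest-neighbor selection does on a regular grid with 1-D latitude and longitude coordinates, the sketch below picks the closest value along each axis independently. The grid spacing and the helper name `nearest_gridcell` are assumptions made for this example, and the real data grid may differ.

```python
import numpy as np

def nearest_gridcell(lats, lons, target_lat, target_lon):
    """Return (i, j) indices of the grid point closest to the target,
    assuming 1-D latitude and longitude coordinate arrays."""
    i = int(np.abs(lats - target_lat).argmin())
    j = int(np.abs(lons - target_lon).argmin())
    return i, j

# Hypothetical grid with 0.03-degree spacing covering the box chosen above
lats = np.arange(34.0, 34.3, 0.03)
lons = np.arange(-117.5, -117.1, 0.03)

i, j = nearest_gridcell(lats, lons, 34.08214634521255, -117.2425643)
print(lats[i], lons[j])
```

Because the selected cell center can sit up to half a grid spacing away from the requested point, it is worth printing the coordinates of the result to confirm which gridcell was picked.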

Choose one warming level to focus on:

In [None]:
two_degrees = gwl_slices.sel(warming_level='2.0').squeeze()

Get a feel for how the data is stored by looking at the timeseries. The 30-year slice is different for each simulation, with some reaching the warming level sooner or later than others.

In [None]:
two_degrees.squeeze().to_pandas().plot(legend=None,figsize=[13,2])

Next, rearrange the data for use in the next step. Each simulation's slice has its NaN values dropped and is then placed on a common, arbitrary time axis.

A few caveats about this step:

- It took about 10 minutes by itself.
- It produces a deluge of warning messages.
- There may be a better way to do this, or a way to avoid doing it at all.

The only reason for doing it is that otherwise all of the NaN items are added to the bin for 0 days per year. A histogram-binning step that ignores NaNs would be a better solution.
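
One possible shape for a NaN-aware alternative is sketched below on toy data. It is an untested idea for this workflow rather than a drop-in replacement: the function name and the (years, days) array layout are assumptions. Years that contain no valid data are marked as missing instead of being counted as zero exceedance days, and they are dropped before binning.

```python
import numpy as np

def yearly_exceedance_counts(daily, threshold):
    """Count days above `threshold` per year; years with no valid data become NaN.

    `daily` has shape (n_years, n_days) and may contain NaN padding.
    """
    valid = ~np.isnan(daily)
    # Replace NaN padding with -inf so it can never count as an exceedance
    counts = np.sum(np.where(valid, daily, -np.inf) > threshold, axis=1).astype(float)
    counts[~valid.any(axis=1)] = np.nan  # no data at all: mark the year as missing
    return counts

# Toy data: 4 "years" of 5 days each; the last year is entirely NaN padding
daily = np.array([
    [110.0, 116.0, 117.0, 100.0, 99.0],
    [101.0, 102.0, 103.0, 104.0, 105.0],
    [116.0, 116.0, 116.0, np.nan, np.nan],
    [np.nan] * 5,
])
counts = yearly_exceedance_counts(daily, threshold=115)

# Drop missing years before binning so they do not inflate the 0-days bin
finite_counts = counts[np.isfinite(counts)]
hist, edges = np.histogram(finite_counts, bins=[0, 1, 2, 3, 4])
print(counts, hist)
```

In this toy case the all-NaN year is excluded, so only the one year that truly had no exceedances lands in the 0-days bin.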

In [None]:
%%time
import pandas as pd
def align_time(y):
    y = y.dropna(dim='time')
    # start from an arbitrary year -- won't matter which we call it
    y['time'] = pd.date_range('20000101',periods=len(y.time),freq='1D')
    return y

two_degrees_stacked = two_degrees.groupby('all_sims').apply(align_time)
two_degrees_stacked

### Next: calculate exceedances of 115˚F

We'll use get_exceedance_count from threshold_tools, which by default groups the counts by year.

In [None]:
threshold_value = 115
gwl_dist = get_exceedance_count(two_degrees_stacked,threshold_value)
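
For intuition, counting exceedances grouped by year amounts to something like the following sketch. It uses synthetic temperatures and plain NumPy, and it is not how get_exceedance_count is implemented; the data and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily maximum temperatures (degF) for 3 years of 365 days
years = np.repeat([2000, 2001, 2002], 365)
tmax = rng.normal(loc=95, scale=10, size=years.size)

threshold = 115
exceeds = tmax > threshold
# One count per calendar year: the number of days above the threshold
counts = {int(y): int(exceeds[years == y].sum()) for y in np.unique(years)}
print(counts)
```

Each simulation then contributes one count per year, and the collection of those counts across years and simulations is what the histograms below summarize.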

In [None]:
n_samples = 30 * len(gwl_dist.all_sims)  # 30 years in each simulation's slice
print('Sample size: '+str(n_samples))

**Note:** xarray's built-in histogram plot is a wrapper around matplotlib's hist function: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html

You can also calculate the histogram separately (with this or another library). That way the bins and values are saved, and the plot formatting can be handled as a separate step. It should be possible to use a logarithmic y-axis with matplotlib and to replace the default title with something nicer.
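
A minimal sketch of computing the histogram separately with NumPy follows. The Poisson-distributed counts are a hypothetical stand-in for gwl_dist, used only so the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical yearly exceedance counts standing in for gwl_dist
yearly_counts = rng.poisson(lam=3, size=500)

counts, bin_edges = np.histogram(yearly_counts, bins=20)
fraction = counts / counts.sum()  # share of years falling in each bin
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
print(len(counts), len(bin_edges))
```

With the values in hand, the plot can be drawn with any styling you like, for example with matplotlib's `plt.bar(centers, counts, width=np.diff(bin_edges))` followed by `plt.yscale('log')` and `plt.title(...)`.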

In [None]:
counts, bins, x = gwl_dist.plot.hist(bins=20)

**Insert here:** the probability of more than 10 exceedance days in a year.

One option is to fit a distribution to estimate this. If so, include a confidence interval.

In [None]:
#[insert code here]
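
One possible starting point, sketched here with synthetic counts, is an empirical estimate with a percentile-bootstrap confidence interval instead of a fitted distribution. The function name, the Poisson stand-in data, and the choice of a 95% interval are assumptions. The resampling also treats every year as an independent sample, which is a simplification because years within one simulation are related.

```python
import numpy as np

def prob_exceeding(counts, n_days, n_boot=2000, seed=0):
    """Empirical P(count > n_days) with a 95% percentile-bootstrap interval."""
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    estimate = np.mean(counts > n_days)
    # Resample years with replacement and recompute the probability each time
    resampled = rng.choice(counts, size=(n_boot, counts.size), replace=True)
    boot = np.mean(resampled > n_days, axis=1)
    lower, upper = np.percentile(boot, [2.5, 97.5])
    return estimate, lower, upper

# Hypothetical yearly exceedance counts standing in for gwl_dist
rng = np.random.default_rng(42)
yearly_counts = rng.poisson(lam=6, size=1000)
estimate, lower, upper = prob_exceeding(yearly_counts, n_days=10)
print(f"P(>10 days) = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

To apply this to the real data, the flattened values of gwl_dist (with any NaNs removed) would take the place of `yearly_counts`.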

### Compare with "traditional" approach

We can grab the same data that the warming_levels tool starts from, before it subsets the data. It is available as wl.catalog_data.

In [None]:
time_horizon = wl.catalog_data.sel(scenario="Historical + SSP 3-7.0 -- Business as Usual")

To compare this with a traditional approach, let's select a 30-year slice centered on **2047**, the year when SSP3-7.0 reaches 2˚C above preindustrial according to the IPCC weighted consensus projection (see graphic above).

We'll use the same window, meaning the number of years on either side of the time of interest, as the warming levels tool.

In [None]:
my_window = wl.wl_params.window

In [None]:
time_horizon = time_horizon.sel(time=slice(str(2047-my_window),str(2047+(my_window-1))))
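
The slice bounds work out as follows, assuming a window of 15 years. That value is an assumption for this worked example, so check wl.wl_params.window in your own session.

```python
center_year = 2047
my_window = 15  # assumed value; check wl.wl_params.window in your session

start_year = center_year - my_window
end_year = center_year + (my_window - 1)
n_years = end_year - start_year + 1
print(start_year, end_year, n_years)  # 2032 2061 30
```

Label-based slicing on a time index includes both endpoint years, which is why the upper bound is offset by one year to give exactly 30 years.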

Again, we'll need to extract the gridcell of interest:

In [None]:
time_horizon = time_horizon.sel(lat=my_lat,lon=my_lon,method='nearest')

And count up the threshold exceedances:

In [None]:
time_period_dist = get_exceedance_count(time_horizon,threshold_value)

In [None]:
n_samples = len(time_period_dist.time) * len(time_period_dist.simulation)
print('Sample size: '+str(n_samples))

In [None]:
time_period_dist.plot.hist(bins=bins)  # use the same bins as the plot above

**Last thing to (potentially) add:** resample the warming-level-contingent data using a bootstrapping approach. Select a random 1770 samples from the 3420, then plot the histogram (or fit the distribution, if you've been doing that above) for that subset. Repeat this *n* times to get an idea of how much better sampled the warming-level approach really is with roughly 2x the samples.

In [None]:
#[insert code here]
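
A sketch of how that resampling could look is given below, using synthetic counts in place of the real 3420 samples. The Poisson stand-in data, the bin edges, and the 200 repeats are assumptions. Subsets are drawn without replacement to match "select a random 1770 samples from the 3420"; set `replace=True` for a classic bootstrap.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 3420 warming-level-contingent yearly counts
gwl_counts = rng.poisson(lam=4, size=3420)
bins = np.arange(0, 21)  # shared bin edges so the histograms are comparable

n_subsample, n_repeats = 1770, 200
hists = np.empty((n_repeats, len(bins) - 1))
for k in range(n_repeats):
    subset = rng.choice(gwl_counts, size=n_subsample, replace=False)
    hists[k], _ = np.histogram(subset, bins=bins)

# Spread of each bin's relative frequency across the repeated subsamples
freq = hists / n_subsample
spread = freq.std(axis=0)
print(spread.round(4))
```

The per-bin spread gives a rough measure of how much the histogram shape varies at the smaller sample size, which can then be compared against the differences between the two approaches.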