# DFU Notebook 4: Working with hourly projections data 
Exploring how hourly projections data can be used as inputs into DFU hourly models 

## Step 0: Setup 
Import the climakitae library and any other required packages.

In [None]:
import climakitae as ck
from climakitae.cluster import Cluster
from climakitae.utils import get_closest_gridcell
import pandas as pd
import hvplot.xarray  # registers the .hvplot accessor used in the plotting cells
import xarray as xr
from xclim.core.calendar import convert_calendar
from xclim.sdba.adjustment import QuantileDeltaMapping

Initialize a [climakitae.Application](https://climakitae.readthedocs.io/en/latest/generated/climakitae.Application.html) object. 

In [None]:
app = ck.Application()

Next, speed up the computation by executing the following cell to start a cluster. It will likely take several minutes to spin up. Learn more about dask and see some common troubleshooting tips on our FAQ page.

In [None]:
cluster = Cluster()
cluster.adapt(minimum=0, maximum=8)
client = cluster.get_client()
cluster

## Part 1: Monthly extremes

### 1a) Retrieve catalog data using a configuration csv file
The climakitae helper function `retrieve_from_csv` retrieves data from the AE data catalog based on a configuration csv file. To change which data is retrieved, simply edit the csv file. See the [function documentation](https://climakitae.readthedocs.io/en/latest/generated/climakitae.Application.retrieve_from_csv.html#climakitae.Application.retrieve_from_csv) for more information. Because we are retrieving two data variables, the data will be returned as an [xarray Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) object. 

In [None]:
t2_daily = app.retrieve("data/config_min_max_daily_temp.csv")

### 1b) Preview the data 
You can review the retrieved data easily in the notebook. You'll see that we've retrieved daily minimum and maximum 2 meter air temperature data for SSP 3-7.0 for the time period of 1980-2050 at a grid resolution of 9km. <br><br>The daily min and max data has been pre-computed by our team using the hourly 2m Air Temperature data so that these derived variables don't need to be computed on the fly. 

In [None]:
display(t2_daily)

### 1c) Find the monthly minimum and maximum air temperature
We'll resample the daily data to monthly using [xarray's resample function](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.resample.html#xarray.DataArray.resample), then compute a minimum and maximum. We'll combine the derived monthly variables to create a new xarray Dataset object. 

In [None]:
# Resample to monthly
mon_min = t2_daily["Daily minimum air temperature at 2m"].resample(
    time="MS").min().assign_attrs({"frequency":"monthly"})
mon_max = t2_daily["Daily maximum air temperature at 2m"].resample(
    time="MS").max().assign_attrs({"frequency":"monthly"})

# Rename variable daily --> monthly 
mon_min.name = "Monthly minimum air temperature at 2m"
mon_max.name = "Monthly maximum air temperature at 2m"

# Create new combined object 
t2_monthly = xr.merge([mon_min, mon_max], combine_attrs="drop_conflicts").squeeze()
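
To see what the resampling step does in isolation, here is a minimal sketch on synthetic data. The variable names (`tmin`, `tmax`) and the values are made up for illustration and are not part of the catalog data; only the `resample(time="MS")` pattern mirrors the cell above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data covering January and February 2000 (60 days)
time = pd.date_range("2000-01-01", "2000-02-29", freq="D")
tmin = xr.DataArray(np.arange(60.0), coords={"time": time}, dims="time", name="tmin")
tmax = (tmin + 10.0).rename("tmax")

# "MS" groups the days by calendar month and labels each group with the month start
mon_min = tmin.resample(time="MS").min()
mon_max = tmax.resample(time="MS").max()

# Combine the two monthly series into one Dataset
monthly = xr.merge([mon_min, mon_max])
print(monthly)
```

Each month collapses to a single value per variable, labeled with the first day of that month.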

### 1d) Get data from the closest grid cell to the weather station
As an example, to replicate the historical observations at Sacramento Executive Airport, grab the model grid cell nearest to the airport.

In [None]:
stations_df = pd.read_csv("data/CEC_Forecast_Weather Stations_California.csv", index_col="STATION")
one_station = stations_df.loc["SACRAMENTO EXECUTIVE AIRPORT"]

t2_monthly_sac = get_closest_gridcell(
    data=t2_monthly,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)
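
As a rough illustration of what "closest grid cell" means, the sketch below picks the nearest cell center on a small made-up latitude/longitude grid. This is not how `get_closest_gridcell` is implemented; the helper `nearest_gridcell_index`, the grid, and the coordinates are all hypothetical.

```python
import numpy as np

def nearest_gridcell_index(lat2d, lon2d, lat, lon):
    """Return (row, col) of the grid cell center closest to (lat, lon)."""
    # Approximate squared distance in degrees; longitude differences are
    # scaled by cos(latitude) so east-west and north-south are comparable
    dlat = lat2d - lat
    dlon = (lon2d - lon) * np.cos(np.deg2rad(lat))
    dist2 = dlat**2 + dlon**2
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Hypothetical 5 x 5 grid with 0.25 degree spacing
lats = np.linspace(38.0, 39.0, 5)
lons = np.linspace(-122.0, -121.0, 5)
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")

# Made-up target point near the center of the grid
row, col = nearest_gridcell_index(lat2d, lon2d, lat=38.51, lon=-121.49)
print(row, col, lat2d[row, col], lon2d[row, col])
```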

### 1e) Read the data into memory
Until this point, the data has only been lazily loaded into the notebook, so this step will take several minutes. You'll notice that we've added a [Jupyter magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html), `%%time`, to the top of the cell, which prints the total time the step took once the code finishes running. 

In [None]:
%%time
t2_monthly_sac = app.load(t2_monthly_sac)

### 1f) Bias correct the data
First, we'll read in the weather station data from a netCDF file.

In [None]:
obs_da = xr.open_dataset('data/station-data/KSAC_temperatures_1981-2010.nc').temperatures # Read in data
obs_da.attrs["units"] = "degF"
obs_da = convert_calendar(obs_da, "noleap") # Convert calendar to exclude leap days

Next, we'll use the station data to perform the bias correction.

In [None]:
def bias_correct(obs_da, da, nquantiles=20, group="time.dayofyear", kind="+"): 
    """Perform bias correction using observational data.
    
    Parameters
    ----------
    obs_da: xr.DataArray 
        Observational dataset 
    da: xr.DataArray 
        Model data to bias correct 
    nquantiles: int, optional
        The number of quantiles to use
        Default to 20
    group: str, optional
         The grouping information
         Default to "time.dayofyear" 
    kind: str, optional 
         Either additive or multiplicative
         Default to additive: "+" 
    
    Returns
    -------
    xr.DataArray 
        Bias corrected input data da 
        
    See Also
    --------
    xclim.sdba.adjustment.QuantileDeltaMapping
    
    """
    QDM = QuantileDeltaMapping.train(
        obs_da, 
        # Input data, sliced to time period of observational data
        da.sel(time=slice(str(obs_da.time.values[0].year), str(obs_da.time.values[-1].year))), 
        nquantiles=nquantiles, 
        group=group,
        kind=kind
    )
    da_adj = QDM.adjust(da)
    da_adj.name = da.name
    return da_adj
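
For intuition about the additive (`kind="+"`) case, here is a simplified NumPy sketch of the quantile delta mapping idea. It is an illustration under stated assumptions, not xclim's implementation: it ignores the `group` argument (no day-of-year grouping), and the function name `qdm_additive` and the synthetic data are hypothetical.

```python
import numpy as np

def qdm_additive(obs, sim_hist, sim_fut, nquantiles=20):
    """Simplified additive quantile delta mapping (no seasonal grouping)."""
    # Quantile levels at the centers of nquantiles equal-width bins
    q = (np.arange(nquantiles) + 0.5) / nquantiles
    # Adjustment factor per quantile: observed minus modeled historical value
    af = np.quantile(obs, q) - np.quantile(sim_hist, q)
    # Empirical quantile of each future value within the future series itself
    ranks = (np.argsort(np.argsort(sim_fut)) + 0.5) / sim_fut.size
    # Add the adjustment factor interpolated at each value's own quantile
    return sim_fut + np.interp(ranks, q, af)

# Synthetic check: the model runs 2 degrees too warm and then warms by 3 degrees
obs = np.linspace(40.0, 80.0, 200)
sim_hist = obs + 2.0
sim_fut = sim_hist + 3.0
adjusted = qdm_additive(obs, sim_hist, sim_fut)
print(np.allclose(adjusted, obs + 3.0))
```

In this toy case the constant 2 degree bias is removed while the 3 degree modeled change is kept, which is the property that motivates using a delta-preserving method.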

In [None]:
# Convert calendar to exclude leap days
#t2_monthly_sac_no_leap = convert_calendar(t2_monthly_sac, "noleap")

# Bias correct each variable individually because QuantileDeltaMapping can only accept xr.DataArray as input
t2_min_bias_corrected = bias_correct(
    obs_da, 
    convert_calendar(t2_monthly_sac["Monthly minimum air temperature at 2m"], "noleap")
) 
t2_max_bias_corrected = bias_correct(
    obs_da, 
    convert_calendar(t2_monthly_sac["Monthly maximum air temperature at 2m"], "noleap")
) 

# Convert the time index back to pandas datetimes
# Note: this raises a warning because the noleap calendar is non-standard;
# operations that depend on the length of time between dates may be subtly affected
t2_min_bias_corrected["time"] = t2_min_bias_corrected.indexes["time"].to_datetimeindex()
t2_max_bias_corrected["time"] = t2_max_bias_corrected.indexes["time"].to_datetimeindex()

### 1g) Plot the data

In [None]:
def interactive_lineplot(data, dynamic=True, ylim=(0,130),ylabel="Air Temperature (degF)",line_dash="solid"): 
    """Create an interactive lineplot using monthly data for each simulation in the dataset.
    Setting dynamic=False makes the plot take longer to produce upfront, but everything 
    is zippy after (the developer's personal preference). 
    Setting dynamic=True (the default) means the plot will only be generated once you change the settings.
    line_dash options: 'solid', 'dashed', 'dotted', 'dotdash', 'dashdot'
    """
    plots_all = None
    for (sim, color) in zip(data.simulation.values,['#377eb8', '#ff7f00', '#4daf4a','#f781bf']):
        plot_i = data.sel(simulation=sim).hvplot.line(
            groupby="time.month", 
            width=550, height=350, 
            label=sim,
            line_dash=line_dash,
            grid=True,
            ylabel=ylabel,
            color=color,
            ylim=ylim, # Set limits of y axis 
            dynamic=dynamic 
        )
        plots_all = plot_i if plots_all is None else plots_all*plot_i

    plots_all = plots_all.opts(legend_position='bottom') # Move legend to bottom of plot 
    return plots_all

In [None]:
# Bias corrected data
pl1 = interactive_lineplot(t2_min_bias_corrected)
pl2 = interactive_lineplot(t2_max_bias_corrected)
(pl1*pl2).opts(legend_position='bottom')

In [None]:
pl1 = interactive_lineplot(t2_monthly_sac["Monthly minimum air temperature at 2m"])
pl2 = interactive_lineplot(t2_monthly_sac["Monthly maximum air temperature at 2m"])
(pl1*pl2).opts(legend_position='bottom')

## Part 2: Diurnal trends
Find the day in each season that has the lowest minimum temperature **or** the highest maximum temperature

### 2a) Retrieve the data 
As in Part 1, we'll grab the data using the `retrieve_from_csv` function and get the closest grid cell to the Sacramento weather station.

Along with the future 30-year data, we'll also retrieve the Historical Reconstruction ERA5-WRF data from 1981-2010 as our historical baseline. We'll add this data to our plots at the end, so that it can be compared to the future period. By setting `merge` to `False` in the function, we indicate that we want the datasets returned separately instead of merged into the same object (merging would be incompatible because the datasets cover different time periods and have different dimensions). 

In [None]:
t2_hourly_fut_sim, t2_hourly_historical, t2_hourly_hist_sim = app.retrieve("data/config_hourly_2m_temp.csv", merge=False)

In [None]:
t2_hourly_future_sac = get_closest_gridcell(
    data=t2_hourly_fut_sim,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)
t2_hourly_historical_sac = get_closest_gridcell(
    data=t2_hourly_hist_sim,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)
t2_hourly_reconstruction_sac = get_closest_gridcell(
    data=t2_hourly_historical,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)

### 2b) Read the data into memory

In [None]:
%%time
t2_hourly_future_sac = app.load(t2_hourly_future_sac)
t2_hourly_historical_sac = app.load(t2_hourly_historical_sac)
t2_hourly_reconstruction_sac = app.load(t2_hourly_reconstruction_sac)

### 2c) Extract extreme diurnal cycle from each simulation
Here is a function to return the diurnal cycle of a day in which a particular extreme occurs.

In [None]:
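def get_diurnal_cycle_by_season(y, how):
    """Return the hourly cycle for the day on which the chosen extreme occurs.

    how: 'min' (lowest hourly value), 'min_daily_max' (lowest daily maximum),
    or 'max_daily_range' (largest daily maximum minus daily minimum).
    """
    if how == 'min':
        series = y
        index_value_is_reached = series.argmin().values
    elif how == 'min_daily_max':
        series = y.resample(time='1D').max()
        index_value_is_reached = series.argmin().values
    elif how == 'max_daily_range':
        series = y.resample(time='1D').max() - y.resample(time='1D').min()
        index_value_is_reached = series.argmax().values
    else:
        raise ValueError(f"Unknown how: {how!r}")

    # Look up the timestamp in the same series the index was computed from;
    # an index into the daily series must not be applied to the hourly data
    time_value_is_reached = series.isel(time=index_value_is_reached).time.values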
    day_value_is_reached = pd.to_datetime(time_value_is_reached).date()
    diurnal_cycle = y.sel(time=slice(day_value_is_reached,pd.tseries.offsets.DateOffset(hour=23)+day_value_is_reached))
    return xr.DataArray(diurnal_cycle.squeeze().values,coords={"time_of_day":diurnal_cycle.time.dt.hour.values})

def get_diurnal_cycle(t2_hourly_one_season,how='min'):
    return t2_hourly_one_season.groupby('time.season').apply(get_diurnal_cycle_by_season,how=how)

Choose which extreme you are interested in by setting `method` to `"min"` or `"min_daily_max"` or `"max_daily_range"`. We'll find the diurnal cycle for the entire day in which this occurs.

In [None]:
method = 'max_daily_range' #'min', 'min_daily_max'
diurnal_cycle_sims_all_fut = t2_hourly_future_sac.groupby('simulation').apply(get_diurnal_cycle,how=method) 
diurnal_cycle_sims_all_hist = t2_hourly_historical_sac.groupby('simulation').apply(get_diurnal_cycle,how=method)
diurnal_cycle_reconstruct = t2_hourly_reconstruction_sac.groupby('simulation').apply(get_diurnal_cycle,how=method)
# note: could stack on scenario *and* simulation, and then groupby that combined dim if there were more than one scenario selected
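
The core of the `max_daily_range` option can be seen on a small synthetic series. The sketch below uses made-up hourly temperatures for three days and is only meant to show the resample, argmax, and select-one-day pattern; it leaves out the season and simulation grouping used above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three days of synthetic hourly temperatures with different diurnal amplitudes
time = pd.date_range("2000-01-01", periods=72, freq=pd.Timedelta(hours=1))
amplitude = np.repeat([5.0, 12.0, 8.0], 24)
hours = np.tile(np.arange(24), 3)
temps = 60.0 + amplitude * np.sin(2 * np.pi * (hours - 9) / 24)
hourly = xr.DataArray(temps, coords={"time": time}, dims="time")

# Daily range = daily maximum minus daily minimum
daily_range = hourly.resample(time="1D").max() - hourly.resample(time="1D").min()

# Find the day with the largest range, then pull out all 24 hours of that day
idx = int(daily_range.argmax())
day = pd.Timestamp(daily_range.time.values[idx]).strftime("%Y-%m-%d")
cycle = hourly.sel(time=day)
print(day, cycle.size)
```

Note that the index returned by `argmax` refers to the daily series, so the date is read from `daily_range` before selecting from the hourly data.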

### 2d) Plot the results


In [None]:
plots_future = diurnal_cycle_sims_all_fut.hvplot.line(
            x="time_of_day", 
            by="simulation",
            grid=True, 
            xlabel="Hour of Day",
            width=575, height=250,
        ) 

plots_historical = diurnal_cycle_sims_all_hist.hvplot.line(
            x="time_of_day", 
            by="simulation",
            grid=True, 
            xlabel="Hour of Day",
            width=575, height=250,
        ) 

plot_reconstruct = diurnal_cycle_reconstruct.hvplot.line(
            x="time_of_day", 
            by="simulation",
            line_dash="dashed",
            color="black",
            grid=True, 
            xlabel="Hour of Day",
            width=575, height=250,
        ) 


These plots take a moment to generate. Recall also that these data are not bias-corrected.

In [None]:
plot_reconstruct * plots_historical

In [None]:
plots_future

### 2e) Observe the output data structure
The data used to generate the future plot above are available in the xr.DataArray object `diurnal_cycle_sims_all_fut`, computed in step 2c. Here, we'll display the data so that you can observe the dimensions.

In [None]:
display(diurnal_cycle_sims_all_fut)

### 2f) Export the results 
Choose your desired filetype (we recommend NetCDF) and export the data. We've left the actual export code, `app.export_dataset`, commented out; if you want to save the file, simply remove the comment (#).

In [None]:
app.export_as()

In [None]:
#app.export_dataset(diurnal_cycle_sims_all_fut, file_name="diurnal_data")