# DFU Notebook 4: Working with hourly projections data 
Exploring how hourly projections data can be used as inputs into DFU hourly models 

## Step 0: Setup 
Import the climakitae library and any other required packages.

In [None]:
import climakitae as ck
from climakitae.cluster import Cluster
from climakitae.utils import get_closest_gridcell
import pandas as pd
import xarray as xr
import hvplot.xarray  # Registers the .hvplot accessor used in the plotting cells below

Initialize a [climakitae.Application](https://climakitae.readthedocs.io/en/latest/generated/climakitae.Application.html) object. 

In [None]:
app = ck.Application()

Next, speed up the computation by executing the following cell, which sets up a dask cluster. It will likely take several minutes to spin up! Learn more about dask and see some common troubleshooting tips on our FAQ page.

In [None]:
cluster = Cluster()
cluster.adapt(minimum=0, maximum=8)
client = cluster.get_client()
cluster

## Part 1: Monthly extremes
### 1a) Retrieve catalog data using a configuration csv file
The climakitae helper function `retrieve_from_csv` reads a configuration csv file and retrieves the corresponding data from the AE data catalog. To modify the retrieved data, simply modify the csv file. See the [function documentation](https://climakitae.readthedocs.io/en/latest/generated/climakitae.Application.retrieve_from_csv.html#climakitae.Application.retrieve_from_csv) for more information. Because we are retrieving two data variables, the data will be returned as an [xarray Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) object. 

In [None]:
t2_daily = app.retrieve_from_csv("data/config_min_max_daily_temp.csv")

### 1b) Preview the data 
You can review the retrieved data directly in the notebook. You'll see that we've retrieved daily minimum and maximum 2 m air temperature data for SSP 3-7.0 for the time period 1980-2050 at a grid resolution of 9 km. <br><br>The daily min and max data have been pre-computed by our team from the hourly 2 m air temperature data so that these derived variables don't need to be computed on the fly. 

In [None]:
display(t2_daily)

### 1c) Find the monthly minimum and maximum air temperature
We'll resample the daily data to monthly using [xarray's resample function](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.resample.html#xarray.DataArray.resample), then compute a minimum and maximum. We'll combine the derived monthly variables to create a new xarray Dataset object. 

In [None]:
# Resample to monthly
mon_min = t2_daily["Daily minimum air temperature at 2m"].resample(
    time="MS").min().assign_attrs({"frequency":"monthly"})
mon_max = t2_daily["Daily maximum air temperature at 2m"].resample(
    time="MS").max().assign_attrs({"frequency":"monthly"})

# Rename variable daily --> monthly 
mon_min.name = "Monthly minimum air temperature at 2m"
mon_max.name = "Monthly maximum air temperature at 2m"

# Create new combined object 
t2_monthly = xr.merge([mon_min, mon_max], combine_attrs="drop_conflicts")
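
To see what the resampling step does in isolation, here is a minimal sketch on synthetic data. The array `daily` and the variable names `monthly_min` and `monthly_max` are made up for illustration and are not part of the catalog data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series: 59 days (Jan-Feb 2001) holding the values 0..58
time = pd.date_range("2001-01-01", periods=59, freq="D")
daily = xr.DataArray(
    np.arange(59.0), coords={"time": time}, dims="time", name="daily"
)

# "MS" (month start) labels each monthly bin by the first day of the month
toy_min = daily.resample(time="MS").min().rename("monthly_min")
toy_max = daily.resample(time="MS").max().rename("monthly_max")

# Combine the two named DataArrays into a single Dataset
toy_monthly = xr.merge([toy_min, toy_max])
print(toy_monthly)
```

Each month collapses to a single value stamped with the first day of that month, which is why the monthly data in this notebook carry month-start timestamps.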

### 1d) Get data from the closest grid cell to the weather station
For example, to replicate the historical observations at Sacramento Executive Airport, grab the model grid cell nearest to the airport.

In [None]:
stations_df = pd.read_csv("data/CEC_Forecast_Weather Stations_California.csv", index_col="STATION")
one_station = stations_df.loc["SACRAMENTO EXECUTIVE AIRPORT"]

data_closest_gridcell = get_closest_gridcell(
    data=t2_monthly,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)
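
The idea behind nearest-grid-cell selection can be sketched with plain xarray. This is not necessarily how `get_closest_gridcell` is implemented: the sketch assumes a hypothetical regular latitude/longitude grid, whereas model output on a projected grid would first need a coordinate transformation. The grid and the point coordinates below are illustrative values only.

```python
import numpy as np
import xarray as xr

# Hypothetical regular 0.5-degree grid (not the catalog's actual grid)
lat = np.array([38.0, 38.5, 39.0])
lon = np.array([-122.0, -121.5, -121.0])
toy_grid = xr.DataArray(
    np.arange(9.0).reshape(3, 3),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Illustrative point; in the notebook these would be the station's LAT_Y / LON_X
point_lat, point_lon = 38.6, -121.4

# method="nearest" picks the coordinate value closest to the requested one
nearest = toy_grid.sel(lat=point_lat, lon=point_lon, method="nearest")
print(float(nearest.lat), float(nearest.lon), float(nearest))
```

The result keeps the grid cell's own coordinates rather than the requested point, so it is easy to check how far the selected cell is from the station.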

### 1e) Subset the data by time period 
We'll create one data object for the historical baseline period and another for the 30 year future period. The time period can be easily modified by changing the years in the code below. For example, to change the historical baseline period to 1990-2020, use the following code: `data_closest_gridcell.sel(time=slice("1990","2020"))`

In [None]:
historical_baseline = data_closest_gridcell.sel(time=slice("1981", "2010"))
future_30yr = data_closest_gridcell.sel(time=slice("2020","2050"))
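
Label-based slices in xarray include both endpoints, which matters when counting years. The sketch below uses a synthetic monthly time axis (an assumption for illustration) to count how many calendar years each of the two slices covers.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time axis spanning 1980-2050, values are placeholders
time = pd.date_range("1980-01-01", "2050-12-01", freq="MS")
toy = xr.DataArray(np.zeros(time.size), coords={"time": time}, dims="time")

# Partial date strings select whole years; both ends of the slice are included
baseline = toy.sel(time=slice("1981", "2010"))
future = toy.sel(time=slice("2020", "2050"))

n_years_baseline = np.unique(baseline.time.dt.year.values).size
n_years_future = np.unique(future.time.dt.year.values).size
print(n_years_baseline, n_years_future)
```

On this toy axis the 1981-2010 slice covers 30 calendar years, while 2020-2050 covers 31. If exactly 30 years are wanted for the future period, a slice such as `slice("2021", "2050")` would be one option.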

### 1f) Read the data into memory
Up to this point, the data has only been lazily loaded into the notebook, so this step will take several minutes. You'll notice that we've added a [Jupyter magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html), `%%time`, to the top of the cell, which prints the time taken to perform this step once the code finishes running. 

In [None]:
%%time
historical_baseline = app.load(historical_baseline)
future_30yr = app.load(future_30yr)

### 1g) Plot the data

In [None]:
def interactive_lineplot(data, dynamic=True, ylim=(0,130),ylabel="Air Temperature (degF)",line_dash="solid"): 
    """Create an interactive lineplot using monthly data for each simulation in the dataset.
    Setting dynamic=False makes the plot take longer to produce upfront, but everything 
    is zippy after (the developer's personal preference). 
    Setting dynamic=True (the default) means the plot will only be generated once you change the settings.
    line_dash options: 'solid', 'dashed', 'dotted', 'dotdash', 'dashdot'
    """
    plots_all = None
    for (sim, color) in zip(data.simulation.values,['#377eb8', '#ff7f00', '#4daf4a','#f781bf']):
        plot_i = data.isel(
            scenario=0).sel(simulation=sim).hvplot.line(
            groupby="time.month", 
            width=550, height=350, 
            label=sim,
            line_dash=line_dash,
            grid=True,
            ylabel=ylabel,
            color=color,
            ylim=ylim, # Set limits of y axis 
            dynamic=dynamic 
        )
        plots_all = plot_i if plots_all is None else plots_all*plot_i

    plots_all = plots_all.opts(legend_position='bottom') # Move legend to bottom of plot 
    return plots_all

In [None]:
pl1 = interactive_lineplot(historical_baseline["Monthly minimum air temperature at 2m"])
pl2 = interactive_lineplot(historical_baseline["Monthly maximum air temperature at 2m"])
(pl1*pl2).opts(legend_position='bottom')

In [None]:
pl1 = interactive_lineplot(future_30yr["Monthly minimum air temperature at 2m"])
pl2 = interactive_lineplot(future_30yr["Monthly maximum air temperature at 2m"])
(pl1*pl2).opts(legend_position='bottom')

## Part 2: Diurnal trends
Find the day in each season that has the lowest minimum temperature **or** the highest maximum temperature

### 2a) Retrieve the data 
As in Part 1, we'll grab the data using the `retrieve_from_csv` function and get the closest grid cell to the Sacramento weather station.

Along with the future 30yr data, we'll also retrieve the Historical Reconstruction ERA5-WRF data from 1981-2010 as our historical baseline. We'll add this data to our plots at the end, so that it can be compared to the future period. By setting `merge` to `False` in the function, we're indicating that we want the two datasets returned separately instead of merged into the same object (which would be incompatible, as the datasets cover different time periods and have different dimensions). 

In [None]:
t2_hourly_future, t2_hourly_historical = app.retrieve_from_csv("data/config_hourly_2m_temp.csv", merge=False)

In [None]:
t2_hourly_future_sac = get_closest_gridcell(
    data=t2_hourly_future,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)
t2_hourly_historical_sac = get_closest_gridcell(
    data=t2_hourly_historical,
    lat=one_station.LAT_Y,
    lon=one_station.LON_X, 
)

### 2b) Read the data into memory

In [None]:
%%time
t2_hourly_future_sac = app.load(t2_hourly_future_sac)
t2_hourly_historical_sac = app.load(t2_hourly_historical_sac)

### 2c) Find the hour in each season that has the lowest minimum temperature **or** the highest maximum temperature
Choose whether you are interested in a min or max by setting `by` to `"min"` or `"max"`. The cells below compute the seasonal extreme value; the plotting step then extracts the diurnal cycle for the entire day in which this hour occurs.

In [None]:
by = "min"

In [None]:
if by == "min": 
    # Compute the lowest temperature reached in each season
    values_reached = t2_hourly_future_sac.groupby("time.season").min()
    values_reached_historical = t2_hourly_historical_sac.groupby("time.season").min()
    # Set the plot title 
    title = "Diurnal cycles for the day with the lowest temperature minimum, by season"
elif by == "max": 
    values_reached = t2_hourly_future_sac.groupby("time.season").max()
    values_reached_historical = t2_hourly_historical_sac.groupby("time.season").max()
    title = "Diurnal cycles for the day with the highest temperature maximum, by season"
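
As a small illustration of what `groupby("time.season")` returns, the sketch below groups one year of synthetic hourly values by meteorological season (DJF, MAM, JJA, SON). The toy "temperature" is simply the month number, an assumption chosen so the results are easy to verify by eye.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One non-leap year of hourly timestamps
time = pd.date_range("2001-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))

# Toy "temperature": the month number (1..12) at every hour
toy_hourly = xr.DataArray(
    time.month.values.astype(float), coords={"time": time}, dims="time"
)

# The time dimension is replaced by a "season" dimension with four entries
season_min = toy_hourly.groupby("time.season").min()
season_max = toy_hourly.groupby("time.season").max()
print(season_min)
print(season_max)
```

Note that DJF groups January, February, and December of the same calendar year together, so a "winter" extreme here can come from either end of the year.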

### 2d) Plot the results 
In this cell, we'll find the hour the target values are reached, and the data for that corresponding day.  

In [None]:
season_accessors = {"SON":"Autumn","DJF":"Winter","MAM":"Spring","JJA":"Summer"}

plots_all = None
diurnal_cycles_by_season = []
for season in ["MAM","JJA","SON","DJF"]: 
    diurnal_cycles_by_sim = []
    for simulation in t2_hourly_future_sac.simulation.values: 
        
        # FUTURE
        # Find hour the target value is reached, and data for the corresponding day
        value_reached_by_season = values_reached.sel(season = season, simulation=simulation)
        data_by_season = t2_hourly_future_sac.sel(time=t2_hourly_future_sac.time.dt.season==season, simulation = simulation) 
        hour_value_is_reached = data_by_season.where(data_by_season == value_reached_by_season, drop=True).time
        day_value_is_reached = pd.to_datetime(hour_value_is_reached).strftime("%b %d %Y").item()
        diurnal_cycle = data_by_season.sel(time=day_value_is_reached).isel(scenario=0)
        diurnal_cycle["time"] = diurnal_cycle.time.dt.hour
        diurnal_cycle = diurnal_cycle.assign_coords(
            {"season":season,"day":day_value_is_reached}
        ).expand_dims("simulation")
        diurnal_cycles_by_sim.append(diurnal_cycle)
    
    diurnal_cycle_sims_all = xr.concat(diurnal_cycles_by_sim, dim="simulation")
    diurnal_cycles_by_season.append(diurnal_cycle_sims_all)
    
    plot_i = diurnal_cycle_sims_all.hvplot.line(
        x="time", 
        grid=True, 
        title=season_accessors[season],
        xlabel="Hour of Day",
        width=575, height=250
    ).overlay().opts(legend_position='right')
    
    # HISTORICAL
    # Find hour the target value is reached, and data for the corresponding day
    value_reached_by_season = values_reached_historical.sel(season=season, simulation="era5", scenario="reanalysis")
    data_by_season = t2_hourly_historical_sac.sel(time=t2_hourly_historical_sac.time.dt.season==season, simulation="era5", scenario="reanalysis")
    hour_value_is_reached = data_by_season.where(data_by_season == value_reached_by_season, drop=True).time
    day_value_is_reached = pd.to_datetime(hour_value_is_reached).strftime("%b %d %Y").item()
    diurnal_cycle_historical = data_by_season.sel(time=day_value_is_reached)
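    # Replace timestamps with hour of day so the historical line shares the x axis of the future data
    diurnal_cycle_historical["time"] = diurnal_cycle_historical.time.dt.hour
    plot_i *= diurnal_cycle_historical.hvplot.line(
        x="time",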
        label="{0}: {1}".format("historical", day_value_is_reached),
        xlabel="Hour of Day",
        line_dash="dashed",
        grid=True, 
        color="black",
        title=season_accessors[season], 
        width=575, height=250
    ) 
    plots_all = plot_i if plots_all is None else plots_all+plot_i

diurnal_data = xr.concat(diurnal_cycles_by_season, dim="season")
plots_all = plots_all.cols(2).opts(title=title)
plots_all
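
The core of the loop above (locate the hour of the extreme, then pull out that whole day indexed by hour of day) can be sketched on a small synthetic series. This sketch uses `argmin` rather than the equality comparison used in the cell above, and the data are made up for illustration. One difference worth noting: `argmin` returns the first occurrence if several hours tie for the minimum, whereas a `.item()` call on the matched timestamps expects exactly one match.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three days of synthetic hourly data with a single coldest hour
time = pd.date_range("2001-07-01", periods=72, freq=pd.Timedelta(hours=1))
values = np.full(72, 60.0)
values[30] = 45.0  # Coldest hour falls on 2001-07-02 at 06:00
toy = xr.DataArray(values, coords={"time": time}, dims="time")

# Position of the minimum along time, then the calendar day it belongs to
i_min = int(toy.argmin("time"))
day = pd.Timestamp(toy.time.values[i_min]).strftime("%Y-%m-%d")

# Selecting with a date string returns every hour of that day
cycle = toy.sel(time=day)

# Replace timestamps with hour of day (0..23) for plotting diurnal cycles
cycle = cycle.assign_coords(time=cycle.time.dt.hour.values)
print(day, cycle.sizes["time"])
```

Re-indexing by hour of day is what allows cycles from different dates, simulations, and the historical baseline to be overlaid on a common x axis.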

### 2e) Observe the output data
The data used to generate the plots above are available in the xr.DataArray object `diurnal_data`, computed in the code cell above. Here, we'll display the data so that you can observe the dimensions.

In [None]:
display(diurnal_data)

### 2f) Export the results 
Choose your desired filetype (we recommend NetCDF) and export the data. We've left the actual export code, `app.export_dataset`, commented out; if you want to save the file, simply remove the comment character (#).

In [None]:
app.export_as()

In [None]:
#app.export_dataset(diurnal_data, file_name="diurnal_data")