### Climate Data Acquisition for Hydrological Renewables

This notebook walks through how to access Cal-Adapt: Analytics Engine (AE) data at 9 km resolution for several variables:
- Precipitation (mm/day)
- Min and max temperature (degC)
- Relative humidity (%)
- Mean wind speed (m/s)

At present, this notebook sets up access to the historical dynamically downscaled WRF simulations.

**Runtime**: With the default settings, this notebook takes approximately **35-45 minutes** to run from start to finish. Modifications to selections may increase the runtime. 

#### Step 0: Set-up
Import the climakitae library and other dependencies.

In [None]:
import climakitae as ck
import climakitaegui as ckg
import xarray as xr
import numpy as np

#### Semi-bulk processing for WRF data download
**Warning**: Each variable *per model* is approximately 5-7 GB of data and will take approximately 20-30 minutes to load and export. Depending on the available memory space, the exported file will either be saved to the file tree on the left (where you can right-click and download it), or a URL to an S3 bucket will be provided (click the link and your download will begin).

We've provided a simple "bulk" function to set up and export the data for you. All you need to do is choose which model to download by changing `NUMBER` in `data_models[NUMBER]` to a value from 0 to 3, and choose the variable. You can then call this function once you have calculated your variable of interest (we demonstrate this below as well).

**Note**: If you see the memory in the bottom bar of your web browser approaching 30 GB, we recommend either hitting the `stop` button, or restarting your kernel by selecting `"restart kernel and clear all outputs"` in the top bar under `Kernel` and returning to this notebook. If the hub crashes because it runs out of memory, restarting the kernel with this option will help.
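
To gauge ahead of time whether a selection is likely to approach the memory limit, a rough size estimate can help. The sketch below is plain arithmetic, not part of climakitae: `estimate_gb` is a hypothetical helper, the 4 bytes per value assumes 32-bit floats, and the example shape is made up for illustration. Read the real shape from your own data before relying on the result.

```python
def estimate_gb(shape, bytes_per_value=4):
    """Rough in-memory size in GB (1e9 bytes) of a dense array with the given shape."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value / 1e9


# Hypothetical (time, y, x) shape, for illustration only; use the shape of
# your own array (for an xarray object, its `.shape` attribute) instead.
print(round(estimate_gb((16000, 300, 300)), 2))  # 5.76
```

If the estimate for one variable is already a large fraction of the available memory, consider shortening the time slice before loading.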

In [None]:
def bulk_run(model_to_run, var):
    """Load `var` into memory, export it as a NetCDF file, then close it.

    The exported file is named '<model_to_run>_<variable name>'.
    """
    print('Running bulk_run on {} will take approx. 20-30 minutes!\n'.format(model_to_run))
    print('Loading variable into memory space...')
    var = ck.load(var) # about 12-15 min
    print('Variable loaded into memory!')

    # build the output filename from the model and variable name (spaces become underscores)
    filename = "{}_{}".format(model_to_run, var.name.replace(" ", "_"))
    print('\nPreparing {} for export...'.format(filename))

    ck.export(var, filename, 'NetCDF')
    var.close() # to save memory on the hub / not crash
    print('\nVariable closed to save space in memory')
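
To know what file to look for after an export, it can help to preview the name that `bulk_run` will build. The sketch below repeats only the string formatting from the function above; `export_filename` is a hypothetical helper introduced here for illustration and does not touch any data.

```python
def export_filename(model_to_run, var_name):
    """Mirror the filename pattern used in bulk_run: '<model>_<variable name>'."""
    # Spaces in the variable name are replaced so the file name has no whitespace
    return "{}_{}".format(model_to_run, var_name.replace(" ", "_"))


print(export_filename("WRF_MIROC6_r1i1p1f1", "Daily max air temperature"))
# WRF_MIROC6_r1i1p1f1_Daily_max_air_temperature
```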

#### Step 1: Grab and process all required input data
Two important notes:
1. Not all models in the Cal-Adapt: Analytics Engine have the solar variables critical for renewables generation: only 4 out of 8 do, and they are currently only available at hourly timesteps. We carefully subset our variables so that the same 4 models are selected for consistency, and then aggregate to daily timescales. However, if you need the other models, comment out (by adding a `#` symbol at the start of) the lines of code below that are marked as subsetting for specific models.
2. The dynamically downscaled WRF data in the Cal-Adapt: Analytics Engine is in UTC time.
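
The UTC note matters when daily values are compared against local observations, because a UTC day does not line up with a local calendar day. The sketch below uses only the Python standard library and assumes a fixed UTC-8 offset (Pacific Standard Time, ignoring daylight saving) purely for illustration; it shows an early-morning UTC timestamp falling on the previous local day.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: a fixed UTC-8 offset (Pacific Standard Time),
# ignoring daylight saving time.
PST = timezone(timedelta(hours=-8), name="PST")

utc_stamp = datetime(2015, 1, 1, 2, 0, tzinfo=timezone.utc)
local_stamp = utc_stamp.astimezone(PST)

print(utc_stamp.isoformat())    # 2015-01-01T02:00:00+00:00
print(local_stamp.isoformat())  # 2014-12-31T18:00:00-08:00
```

The daily aggregations in this notebook are computed on UTC days, so keep this offset in mind when interpreting daily maxima and minima.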

In [None]:
selections = ckg.Select()

# default selections applicable to all variables selected
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_ssp = ["SSP 3-7.0 -- Business as Usual"]
selections.timescale = "hourly"
selections.resolution = "9 km"
selections.time_slice = (2015, 2060)

# selections.show() # to see the GUI panel for more customizable selections

In [None]:
# these 4 models are consistent with the solar/wind efforts
data_models = ['WRF_MIROC6_r1i1p1f1', 'WRF_TaiESM1_r1i1p1f1', 'WRF_EC-Earth3_r1i1p1f1', 'WRF_MPI-ESM1-2-HR_r3i1p1f1']

# highly recommended to run a single model at a time
data_models = data_models[0]
data_models # confirmation of selection

Now that we have set up the default settings, let's start retrieving data. The hourly data needs to be aggregated to daily timescales for each of the following variables:

In [None]:
# air temperature
selections.variable = "Air Temperature at 2m"
selections.units = "degC"
temp_data = selections.retrieve()
temp_data = temp_data.sel(simulation = data_models) # subset for specific models

# max air temp
max_airtemp_data = temp_data.resample(time="1D").max() # daily max air temp
max_airtemp_data.name = "Daily max air temperature" # rename for clarity
max_airtemp_data.attrs["frequency"] = "daily"

# min air temp
min_airtemp_data = temp_data.resample(time="1D").min() # daily min air temp
min_airtemp_data.name = "Daily min air temperature" # rename for clarity
min_airtemp_data.attrs["frequency"] = "daily"
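
As a conceptual illustration of the daily aggregation above, the sketch below groups hourly values by calendar day and takes the maximum and minimum of each group. It uses plain Python and made-up temperatures, not xarray or Analytics Engine data, and is only meant to show the idea behind `resample(time="1D")`, not how xarray implements it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical hourly temperatures (degC) covering two UTC days
start = datetime(2015, 1, 1, 0)
hourly = [(start + timedelta(hours=h), 10.0 + (h % 24) * 0.5) for h in range(48)]

# Group the hourly values into one bin per calendar day
by_day = defaultdict(list)
for stamp, value in hourly:
    by_day[stamp.date()].append(value)

# Reduce each daily bin to a single value
daily_max = {day: max(vals) for day, vals in by_day.items()}
daily_min = {day: min(vals) for day, vals in by_day.items()}

for day in sorted(by_day):
    print(day, daily_min[day], daily_max[day])
```

Swapping `max`/`min` for a sum or a mean gives the same pattern used for the precipitation, relative humidity, and wind speed variables.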

Next, we demonstrate the `bulk_run` function and how to download a large amount of data for a single model. A reminder that the `bulk_run` function will take time to run (**~20-30 minutes**) - but if you see the memory getting too large, we recommend stopping the run (via the stop button) or restarting the kernel.

In [None]:
bulk_run(data_models, max_airtemp_data)

In [None]:
# precipitation (split across two variables that we will sum)
selections.variable = "Precipitation (cumulus portion only)"
selections.units = "mm"
precip_cumulus_data = selections.retrieve()
precip_cumulus_data = precip_cumulus_data.sel(simulation = data_models) # subset for specific models

selections.variable = "Precipitation (grid-scale portion only)"
selections.units = "mm"
precip_grid_data = selections.retrieve()
precip_grid_data = precip_grid_data.sel(simulation = data_models) # subset for specific models

# sum precipitation together and aggregate to daily
precip_data = precip_cumulus_data + precip_grid_data
precip_data = precip_data.resample(time="1D").sum() # daily total precip
precip_data.name = "Daily precipitation" # rename for clarity
precip_data.attrs["frequency"] = "daily"

In [None]:
# relative humidity
selections.variable = "Relative humidity"
selections.units = "[0 to 100]"  # percent
rh_data = selections.retrieve()
rh_data = rh_data.sel(simulation = data_models) # subset for specific models

rh_data = rh_data.resample(time="1D").mean() # daily mean relative humidity
rh_data.name = "Daily relative humidity"  # rename for clarity
rh_data.attrs["frequency"] = "daily"

In [None]:
# wind speed
selections.variable = "Wind speed at 10m"
selections.units = "m s-1"
ws_data = selections.retrieve()
ws_data = ws_data.sel(simulation = data_models) # subset for specific models

# mean wind speed
mean_windspd_data = ws_data.resample(time="1D").mean() # daily mean wind speed
mean_windspd_data.name = "Daily mean wind speed" # rename for clarity
mean_windspd_data.attrs["frequency"] = "daily"

In the next cell, we load **only a small subset** of the data for visualization purposes.

In [None]:
data_to_view = max_airtemp_data.isel(time=np.arange(0,5)) # selecting only first 5 days
data_to_view = ck.load(data_to_view)
ck.view(data_to_view)

#### Step 2: Access the dynamically-downscaled Historical Reconstruction (WRF-ERA5) data
WRF-ERA5 is available on the Analytics Engine for a longer period of time than the WRF data above: 1950-2022. In the step below, we retrieve the WRF-ERA5 data and subset the time index so that it matches the historical period of the WRF data (with "Historical Climate"). If you need a longer period of time, modify `selections.time_slice = (START_YEAR, END_YEAR)`.

In [None]:
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_historical = ["Historical Reconstruction"]
selections.scenario_ssp = []
selections.time_slice = (1980, 2014) # subsetting to match WRF data
selections.timescale = "daily"
selections.resolution = "9 km"
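
A quick sanity check after retrieving daily data is to compare the length of the time dimension with the number of days you expect. The sketch below counts calendar days for a year range using the standard library. It assumes that `time_slice` includes both the start and end years in full and that the data use a standard Gregorian calendar; neither is confirmed here, and `n_daily_steps` is a hypothetical helper.

```python
from datetime import date


def n_daily_steps(start_year, end_year):
    """Number of calendar days from Jan 1 of start_year through Dec 31 of end_year."""
    return (date(end_year, 12, 31) - date(start_year, 1, 1)).days + 1


print(n_daily_steps(1980, 2014))  # 12784
```

If the retrieved array has a different number of daily timesteps, check the calendar and the time slice before exporting.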

In [None]:
# there's only one simulation for the WRF-ERA5 so we can batch run all variables
selections.variable = "Maximum air temperature at 2m"
selections.units = "degC"
era5_max_temp_data = selections.retrieve()

selections.variable = "Minimum air temperature at 2m"
selections.units = "degC"
era5_min_temp_data = selections.retrieve()

selections.variable = "Precipitation (total)"
selections.units = "mm"
era5_precip_data = selections.retrieve()

selections.variable = "Relative humidity"
selections.units = "[0 to 100]"  # percent
era5_rh_data = selections.retrieve()

selections.variable = "Mean wind speed at 10m"
selections.units = "m s-1"
era5_ws_data = selections.retrieve()

We'll now run `bulk_run` on a variable as an example export of the historical reconstruction data. Since the ERA5 data is much less complex, this will take about **2-5 minutes** to run. 

In [None]:
bulk_run('WRF-ERA5', era5_max_temp_data)

**Optional**: Visualize a single variable (as an example)

In [None]:
data_to_view = era5_max_temp_data.isel(time=np.arange(0,5)) # selecting only first 5 days
data_to_view = ck.load(data_to_view)
ck.view(data_to_view)