### Climate Data Acquisition for Hydrological Renewables

This notebook walks through how to access Cal-Adapt: Analytics Engine (AE) data at 9 km resolution for several variables:
- Precipitation (mm/day)
- Mean, min, and max temperature (degC)
- Relative humidity (%)
- Mean wind speed (m/s)

At present, this notebook sets up access to the historical dynamically downscaled WRF simulations.

**Runtime**: With the default settings, this notebook takes approximately **5-10 minutes** to run from start to finish. Modifications to selections may increase the runtime. 

#### Step 0: Set-up
Import the climakitae library and other dependencies.

In [None]:
import climakitae as ck
import climakitaegui as ckg
from climakitae.core.data_interface import DataParameters

import xarray as xr
import numpy as np

#### Semi-bulk processing for WRF data download
**Warning**: Each variable *per model* is approximately 5-7 GB of data and will take approximately 20-30 minutes to load and export. Depending on the available memory space, the exported file will either be saved to the file tree on the left (where you can right-click and download it), or you will be given a URL to an S3 bucket (click the link and your download will begin).
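
To get a feel for where a figure like 5-7 GB comes from, the uncompressed size of a single daily gridded variable can be approximated from its dimensions. The sketch below is illustrative only: the grid dimensions (`n_y=300`, `n_x=300`) are placeholders chosen so the arithmetic lands near the quoted range, not the actual WRF 9 km grid, and it assumes 4-byte (float32) values and a standard calendar. On a real array returned by `retrieve()`, the xarray attribute `nbytes` reports the size directly.

```python
from datetime import date


def count_days(start_year, end_year):
    """Number of calendar days from Jan 1 of start_year through Dec 31 of end_year."""
    return (date(end_year + 1, 1, 1) - date(start_year, 1, 1)).days


def estimate_gb(n_time, n_y, n_x, bytes_per_value=4):
    """Uncompressed size in GB (10**9 bytes) of one (time, y, x) array."""
    return n_time * n_y * n_x * bytes_per_value / 1e9


# 2015-2060 matches the default time_slice used in this notebook.
n_days = count_days(2015, 2060)

# Placeholder grid dimensions (assumption, not the real WRF grid).
size_gb = estimate_gb(n_days, n_y=300, n_x=300)
print(f"{n_days} daily steps, about {size_gb:.1f} GB")
```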

We've provided an easy "bulk" function to set up and export the data for you. All you need to do is choose which model to download by changing `NUMBER` in `data_models[NUMBER]` to a value from 0 to 3, and choose the variable. You can then call this function after you have calculated your variable of interest (we demonstrate this below as well).

**Note**: If you see the memory in the bottom bar of your web browser approaching 30GB, we recommend either hitting the `stop` button, or restarting your kernel by selecting `"restart kernel and clear all outputs"` in the top bar under `Kernel` and returning to this notebook. If the hub crashes on you because of memory space, restarting the kernel with this option will help. 

In [None]:
def bulk_run(model_to_run, var):
    """Load one variable into memory, export it to NetCDF, then close it to free memory."""
    print('Running bulk_run on {} will take approx. 5-10 minutes!\n'.format(model_to_run))
    print('Loading variable into memory space...')
    var = ck.load(var) # about 2-3 min.
    print('Variable loaded into memory!')

    filename = "{}_{}".format(model_to_run, var.name.replace(" ", "_"))
    print('\nPreparing {} for export...'.format(filename))

    ck.export(var, filename, 'NetCDF')
    var.close() # to save memory on the hub / not crash
    print('\nVariable closed to save space in memory')
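
The export file name inside `bulk_run` is built from the model name and the variable's `name` attribute, with spaces replaced by underscores. The standalone sketch below reproduces only that string logic so you can preview a file name before starting a long export. `build_export_name` is a hypothetical helper, and it assumes the variable's `name` matches the display name used in the selections.

```python
def build_export_name(model_to_run, variable_name):
    """Mirror the file-name logic in bulk_run: model, underscore, variable name."""
    return "{}_{}".format(model_to_run, variable_name.replace(" ", "_"))


print(build_export_name("WRF_MIROC6_r1i1p1f1", "Maximum air temperature at 2m"))
# WRF_MIROC6_r1i1p1f1_Maximum_air_temperature_at_2m
```

Only spaces are replaced, so a name such as `Precipitation (total)` keeps its parentheses in the resulting file name.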

#### Step 1a: Grab and process all required input data
Two important notes:
1. Not all models in the Cal-Adapt: Analytics Engine have the solar variables critical for renewables generation: only 4 out of 8 do, and they are currently only available at hourly timesteps. We will carefully subset our variables to ensure that the same 4 models are selected for consistency, and aggregate to daily timescales. However, if you need the other models, comment out (by adding a `#` symbol to) the lines of code below that are marked as subsetting for specific models.
2. The dynamically downscaled WRF data in the Cal-Adapt: Analytics Engine is in UTC time.
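
Because the data is in UTC, day boundaries do not line up with local California days. As a rough illustration, the sketch below converts the start of a UTC day to a fixed UTC-8 offset. This is a simplification: the fixed offset stands in for Pacific Standard Time and ignores daylight saving, and `utc_to_pst` is a hypothetical helper rather than part of climakitae.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset (Pacific Standard Time), ignoring daylight saving.
PST = timezone(timedelta(hours=-8), name="PST")


def utc_to_pst(ts_utc):
    """Convert a timezone-aware UTC datetime to the fixed-offset PST zone."""
    return ts_utc.astimezone(PST)


start_of_utc_day = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)
local = utc_to_pst(start_of_utc_day)
print(local.isoformat())
# 2014-12-31T16:00:00-08:00
```

In other words, a UTC day begins at 4 pm local standard time on the previous calendar day, which matters when comparing daily values against local observations.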

In [None]:
selections = DataParameters()

# default selections applicable to all variables selected
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_ssp = ["SSP 3-7.0"]
selections.timescale = "daily"
selections.resolution = "9 km"
selections.time_slice = (2015, 2060)

# selections.show() # to see the GUI panel for more customizable selections

In [None]:
# these 4 models are consistent with the solar/wind efforts
data_models = ['WRF_MIROC6_r1i1p1f1', 'WRF_TaiESM1_r1i1p1f1', 'WRF_EC-Earth3_r1i1p1f1', 'WRF_MPI-ESM1-2-HR_r3i1p1f1']

# highly recommended to run a single model at a time
data_models = data_models[0]
data_models # confirmation of selection
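
If you switch between models often, a small guard around the index can catch typos before a long retrieval starts. This is an optional sketch: `pick_model` and `ALL_MODELS` are hypothetical names, and the list simply repeats the four simulations from the cell above.

```python
ALL_MODELS = [
    "WRF_MIROC6_r1i1p1f1",
    "WRF_TaiESM1_r1i1p1f1",
    "WRF_EC-Earth3_r1i1p1f1",
    "WRF_MPI-ESM1-2-HR_r3i1p1f1",
]


def pick_model(index, models=ALL_MODELS):
    """Return one model name, rejecting indices outside 0..len(models)-1."""
    if not 0 <= index < len(models):
        raise IndexError(f"index must be between 0 and {len(models) - 1}, got {index}")
    return models[index]


print(pick_model(0))
# WRF_MIROC6_r1i1p1f1
```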

Now that we have set up default settings, let's start retrieving data. We will retrieve each of the following variables at the daily timescale:

In [None]:
# air temperature
selections.variable = "Air Temperature at 2m"
selections.units = "degC"
mean_airtemp_data = selections.retrieve()
mean_airtemp_data = mean_airtemp_data.sel(simulation = data_models) # subset for specific models

# max air temp
selections.variable = 'Maximum air temperature at 2m'
max_airtemp_data = selections.retrieve()
max_airtemp_data = max_airtemp_data.sel(simulation = data_models) # subset for specific models

# min air temp
selections.variable = 'Minimum air temperature at 2m'
min_airtemp_data = selections.retrieve()
min_airtemp_data = min_airtemp_data.sel(simulation = data_models) # subset for specific models

In [None]:
# precipitation (total)
selections.variable = "Precipitation (total)"
selections.units = "mm"
precip_data = selections.retrieve()
precip_data = precip_data.sel(simulation = data_models) # subset for specific models

In [None]:
# relative humidity
selections.variable = "Relative humidity"
selections.units = "[0 to 100]"  # percent
rh_data = selections.retrieve()
rh_data = rh_data.sel(simulation = data_models) # subset for specific models

In [None]:
# wind speed
selections.variable = "Mean wind speed at 10m"
selections.units = "m/s"
ws_data = selections.retrieve()
ws_data = ws_data.sel(simulation = data_models) # subset for specific models

In the next cell, we are going to load in **only a small subset** for visualization purposes only.

In [None]:
data_to_view = max_airtemp_data.isel(time=np.arange(0,5)) # selecting only first 5 days
data_to_view = ck.load(data_to_view)
ckg.view(data_to_view)

#### Step 1b: Export
There are two options for export:
* Using the `bulk_run` function, which will process and **export a single model and 1 variable at a time**. This is approximately 6 GB of data, and takes 5-10 minutes per model per variable.
* Merging all WRF variables together and **exporting a single model with all 6 variables**. This is approximately 35 GB of data, and will take approx. 1 hour.
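
To plan how much disk space and time a set of exports might need, the per-variable figures quoted above can be scaled up. The sketch below is back-of-the-envelope arithmetic only; `export_estimate` is a hypothetical helper, and its defaults (6 GB and 5-10 minutes per variable) are taken from the first option above.

```python
def export_estimate(n_models, n_variables, gb_per_variable=6, minutes_per_variable=(5, 10)):
    """Scale the quoted per-variable size and runtime to a batch of exports."""
    n_exports = n_models * n_variables
    low, high = minutes_per_variable
    return {
        "exports": n_exports,
        "total_gb": n_exports * gb_per_variable,
        "minutes": (n_exports * low, n_exports * high),
    }


print(export_estimate(n_models=1, n_variables=6))
# {'exports': 6, 'total_gb': 36, 'minutes': (30, 60)}
print(export_estimate(n_models=4, n_variables=6))
# {'exports': 24, 'total_gb': 144, 'minutes': (120, 240)}
```

For one model and six variables this lands close to the figures quoted for the merged export, so the choice between the two options is mostly about how much memory is free at once.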

In [None]:
%%time
# option 1: Bulk run, export of 1 model, 1 variable at a time
bulk_run(data_models, max_airtemp_data)

In [None]:
# option 2: Merge, export of 1 model 6 variables at a time
filename_export = f"{data_models}_allvars"
wrf_ds = xr.merge([mean_airtemp_data, max_airtemp_data, min_airtemp_data, precip_data, rh_data, ws_data]).squeeze() # removes "scenario" dimension of 1
ck.export(wrf_ds, filename_export, 'NetCDF')

#### Step 2a: Access the dynamically-downscaled Historical Reconstruction (WRF-ERA5) data
WRF-ERA5 is available on the Analytics Engine for a longer period than the WRF data above: 1950-2022. In the step below we will retrieve the WRF-ERA5 data and subset the time index so that it matches the historical period of the WRF data (the "Historical Climate" scenario). If you need a longer period, modify: `selections.time_slice = (START_YEAR, END_YEAR)`.
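
When changing the time slice, it can help to check the requested years against the 1950-2022 coverage stated above before retrieving. A minimal sketch, assuming that range is inclusive; `check_time_slice` is a hypothetical helper and not part of climakitae.

```python
AVAILABLE_YEARS = (1950, 2022)  # WRF-ERA5 coverage stated above


def check_time_slice(start_year, end_year, available=AVAILABLE_YEARS):
    """Return (start_year, end_year) if it is ordered and within the available range."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    if start_year < available[0] or end_year > available[1]:
        raise ValueError(
            f"requested {start_year}-{end_year} is outside {available[0]}-{available[1]}"
        )
    return (start_year, end_year)


time_slice = check_time_slice(1980, 2014)
print(time_slice)
# (1980, 2014)
```

The returned tuple can then be assigned to `selections.time_slice`.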

In [None]:
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_historical = ["Historical Reconstruction"]
selections.scenario_ssp = []
selections.time_slice = (1980, 2014) # subsetting to match WRF data
selections.timescale = "daily"
selections.resolution = "9 km"

In [None]:
# there's only one simulation for the WRF-ERA5 so we can batch run all variables
selections.variable = "Air Temperature at 2m"
selections.units = "degC"
era5_mean_temp_data = selections.retrieve()
era5_mean_temp_data.name = "Mean air temperature at 2m" # rename for clarity

selections.variable = "Maximum air temperature at 2m"
selections.units = "degC"
era5_max_temp_data = selections.retrieve()

selections.variable = "Minimum air temperature at 2m"
selections.units = "degC"
era5_min_temp_data = selections.retrieve()

selections.variable = "Precipitation (total)"
selections.units = "mm"
era5_precip_data = selections.retrieve()

selections.variable = "Relative humidity"
selections.units = "[0 to 100]"  # percent
era5_rh_data = selections.retrieve()

selections.variable = "Mean wind speed at 10m"
selections.units = "m/s"
era5_ws_data = selections.retrieve()

**Optional**: Visualize a single variable (as an example)

In [None]:
data_to_view = era5_max_temp_data.isel(time=np.arange(0,5)) # selecting only first 5 days
data_to_view = ck.load(data_to_view)
ckg.view(data_to_view)

#### Step 2b: Export 
As with the WRF data, there are two options for export. Since the ERA5 data is much less complex, this will take less time to run:
* Using the `bulk_run` function, which will process and **export 1 variable at a time**. This is approximately 5 GB of data, and takes 2-5 minutes per variable.
* Merging all ERA5 variables together and **exporting all 6 variables**. This is approximately 26 GB of data, and will take approx. 20 minutes.

In [None]:
%%time
# Option 1: Bulk run, 1 variable at a time
bulk_run('WRF-ERA5', era5_max_temp_data)

In [None]:
# Option 2: Merge, all 6 variables
era5_ds = xr.merge([era5_mean_temp_data, era5_max_temp_data, era5_min_temp_data, era5_precip_data, era5_rh_data, era5_ws_data]).squeeze() # removes dimension of 1
ck.export(era5_ds, 'era5_all_vars', 'NetCDF')