### Climate Data Acquisition for Hydrological Renewables

This notebook walks through how to access Cal-Adapt: Analytics Engine (AE) data at 9 km resolution for the following variables:
- Precipitation (mm/day)
- Min and max temperature (degC)
- Relative humidity (%)
- Mean wind speed (m/s)

At present, this notebook sets up access to the historical dynamically downscaled WRF simulations.

#### Step 0: Set-up
Import the climakitae library and other dependencies.

In [None]:
import climakitae as ck
import xarray as xr
import numpy as np

#### Step 1: Grab and process all required input data
Two important notes:
1. Not all models in the Cal-Adapt: Analytics Engine have the solar variables critical for renewables generation; only 4 out of 8 do. We subset every variable to the same 4 models for consistency. If you need the other models, comment out (by adding a `#` symbol at the start of the line) the lines of code below that are marked `# subset for specific models`.
2. The dynamically downscaled WRF data in the Cal-Adapt: Analytics Engine is in UTC, so the daily aggregates computed below cover UTC days rather than local calendar days.
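
To illustrate the UTC note, the sketch below compares a daily maximum computed over UTC days with one computed after shifting the time coordinate to local time. It is not part of the notebook's workflow. It uses a synthetic hourly series rather than Analytics Engine data, and it assumes a fixed UTC-8 offset (Pacific Standard Time, ignoring daylight saving). All variable names are illustrative.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an hourly variable: 48 hourly values stamped in UTC.
time_utc = np.arange(
    "2000-01-01T00", "2000-01-03T00", dtype="datetime64[h]"
).astype("datetime64[ns]")
hourly = xr.DataArray(
    np.arange(48.0), dims="time", coords={"time": time_utc}, name="synthetic"
)

# Daily max over UTC days (what resampling the raw time axis gives).
daily_max_utc = hourly.resample(time="1D").max()

# Assumption: a fixed UTC-8 offset approximates local time (no daylight saving).
local_offset = np.timedelta64(-8, "h")
hourly_local = hourly.assign_coords(time=hourly["time"].values + local_offset)

# Daily max over local days; the first and last local days are only partly covered.
daily_max_local = hourly_local.resample(time="1D").max()

print(daily_max_utc.values)
print(daily_max_local.values)
```

With two full UTC days of input, the UTC aggregation yields two daily values, while the shifted series spans three local calendar days, the first and last of which are incomplete. Whether such a shift is appropriate depends on how the daily values will be used downstream.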

In [None]:
selections = ck.Select()

# default selections applicable to all variables selected
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_historical = ["Historical Climate"]
selections.timescale = "hourly"
selections.resolution = "9 km"
selections.area_subset = "states"
selections.cached_area = ['CA']

In [None]:
# these 4 models are consistent with the solar/wind efforts
data_models = ['WRF_MIROC6_r1i1p1f1', 'WRF_TaiESM1_r1i1p1f1', 'WRF_EC-Earth3_r1i1p1f1', 'WRF_MPI-ESM1-2-HR_r3i1p1f1']
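
Selecting by a list of labels, as is done with the `data_models` list above in the cells that follow, raises a `KeyError` if any requested simulation is absent from the retrieved data. The sketch below is an illustration rather than part of the notebook's workflow. It shows one way to check membership before subsetting, using a synthetic array whose two extra simulation names are made up.

```python
import numpy as np
import xarray as xr

data_models = [
    "WRF_MIROC6_r1i1p1f1",
    "WRF_TaiESM1_r1i1p1f1",
    "WRF_EC-Earth3_r1i1p1f1",
    "WRF_MPI-ESM1-2-HR_r3i1p1f1",
]

# Synthetic stand-in: the four models above (in a different order)
# plus two hypothetical extra simulation names.
all_sims = data_models[::-1] + ["hypothetical_model_A", "hypothetical_model_B"]
fake = xr.DataArray(
    np.zeros((len(all_sims), 3)),
    dims=("simulation", "time"),
    coords={"simulation": all_sims},
)

# Check that every requested model is present before selecting by label.
available = set(fake["simulation"].values.tolist())
missing = [m for m in data_models if m not in available]
if missing:
    raise KeyError(f"Simulations not found: {missing}")

# The result follows the order of data_models, not the order in the source array.
subset = fake.sel(simulation=data_models)
print(subset["simulation"].values.tolist())
```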

Now that the default settings are in place, let's start retrieving data. Because the data is hourly, each of the following variables needs to be aggregated to a daily timescale:

In [None]:
# air temperature
selections.variable = "Air Temperature at 2m"
selections.units = "degC"
temp_data = selections.retrieve()
temp_data = temp_data.sel(simulation = data_models) # subset for specific models

# max air temp
max_airtemp_data = temp_data.resample(time="1D").max() # daily max air temp
max_airtemp_data.name = "Daily max air temperature" # rename for clarity

# min air temp
min_airtemp_data = temp_data.resample(time="1D").min() # daily min air temp
min_airtemp_data.name = "Daily min air temperature" # rename for clarity

In [None]:
# precipitation (split across two variables that we will sum)
selections.variable = "Precipitation (cumulus portion only)"
selections.units = "mm/day"
precip_cumulus_data = selections.retrieve()
precip_cumulus_data = precip_cumulus_data.sel(simulation = data_models) # subset for specific models

selections.variable = "Precipitation (grid-scale portion only)"
selections.units = "mm/day"
precip_grid_data = selections.retrieve()
precip_grid_data = precip_grid_data.sel(simulation = data_models) # subset for specific models

# sum precipitation together and aggregate to daily
precip_data = precip_cumulus_data + precip_grid_data
precip_data = precip_data.resample(time="1D").sum() # daily total precip
precip_data.name = "Daily precipitation" # rename for clarity

In [None]:
# relative humidity
selections.variable = "Relative humidity"
selections.units = "%"
rh_data = selections.retrieve()
rh_data = rh_data.sel(simulation = data_models) # subset for specific models

rh_data = rh_data.resample(time="1D").mean() # daily mean relative humidity
rh_data.name = "Daily relative humidity"  # rename for clarity

In [None]:
# wind speed
selections.variable = "Wind speed at 10m"
selections.units = "m s-1"
ws_data = selections.retrieve()
ws_data = ws_data.sel(simulation = data_models) # subset for specific models

# mean wind speed
mean_windspd_data = ws_data.resample(time="1D").mean() # daily mean wind speed
mean_windspd_data.name = "Daily mean wind speed" # rename for clarity

**Note**: Each variable is approximately 1.6 GB of data. If you need to download the data, we strongly recommend first subsetting to specific locations to reduce its size. In the next cell, we load **only a small subset**, for visualization purposes. An example of how to export to a NetCDF file is in the last cell of this notebook.

In [None]:
data_to_view = max_airtemp_data.isel(time=slice(0, 30)) # selecting only first 30 days
data_to_view = ck.load(data_to_view)
ck.view(data_to_view)
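
The note above recommends subsetting to specific locations before downloading. The sketch below shows one possible way to do that with a latitude/longitude bounding box. It assumes the retrieved array has `y`/`x` dimensions with two-dimensional `lat` and `lon` coordinates; check your own data's coordinates before adapting it. The array is synthetic and the bounding box values are hypothetical.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a retrieved variable: dims (time, y, x) with 2-D lat/lon.
ny, nx = 4, 5
lat2d, lon2d = np.meshgrid(
    np.linspace(34.0, 37.0, ny), np.linspace(-121.0, -117.0, nx), indexing="ij"
)
da = xr.DataArray(
    np.random.default_rng(0).random((3, ny, nx)),
    dims=("time", "y", "x"),
    coords={"lat": (("y", "x"), lat2d), "lon": (("y", "x"), lon2d)},
    name="synthetic",
)

# Hypothetical bounding box (degrees north / degrees east).
lat_min, lat_max = 34.5, 36.5
lon_min, lon_max = -120.5, -118.5

# Boolean mask of grid cells inside the box, then drop rows/columns fully outside it.
in_box = (
    (da["lat"] >= lat_min) & (da["lat"] <= lat_max)
    & (da["lon"] >= lon_min) & (da["lon"] <= lon_max)
)
subset = da.where(in_box, drop=True)

print(dict(subset.sizes))
```

On a real curvilinear grid, the retained rectangle can still contain cells outside the box; `where` sets those to NaN rather than removing them.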

#### Step 2: Access the dynamically downscaled Historical Reconstruction (WRF-ERA5) data
WRF-ERA5 is available on the Analytics Engine for a longer period than the WRF data above: 1950-2022. In the step below, we retrieve the WRF-ERA5 data and subset the time index so that it matches the WRF data. If you need a longer period, modify the time slice to: `selections.time_slice = (1950, 2022)`.

In [None]:
selections.data_type = "Gridded"
selections.area_average = "No"
selections.scenario_historical = ["Historical Reconstruction"]
selections.time_slice = (1980, 2014) # subsetting to match WRF data
selections.timescale = "daily"
selections.resolution = "9 km"
selections.area_subset = "states"
selections.cached_area = ['CA']

In [None]:
# there's only one simulation for the WRF-ERA5 so we can batch run all variables
selections.variable = "Maximum air temperature at 2m"
selections.units = "degC"
era5_max_temp_data = selections.retrieve()

selections.variable = "Minimum air temperature at 2m"
selections.units = "degC"
era5_min_temp_data = selections.retrieve()

selections.variable = "Precipitation (total)"
selections.units = "mm/day"
era5_precip_data = selections.retrieve()

selections.variable = "Relative humidity"
selections.units = "%"
era5_rh_data = selections.retrieve()

selections.variable = "Mean wind speed at 10m"
selections.units = "m/s"
era5_ws_data = selections.retrieve()

In [None]:
# load all data in and compute
era5_arrays = [era5_max_temp_data, era5_min_temp_data, era5_precip_data, era5_rh_data, era5_ws_data]
all_era5_vars = xr.merge([da.squeeze() for da in era5_arrays]) # drop length-1 dimensions, then merge
all_era5_vars = all_era5_vars.compute()
all_era5_vars
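
The merge step above relies on each array carrying a distinct `name`, which becomes its key in the resulting dataset. The sketch below illustrates that behavior with two synthetic arrays. The variable names, the single simulation label, and the values are all made up for the example.

```python
import numpy as np
import xarray as xr

time = np.arange("2000-01-01", "2000-01-04", dtype="datetime64[D]").astype(
    "datetime64[ns]"
)

def make_var(name, fill):
    # A length-1 "simulation" dimension mimics a single-simulation product.
    return xr.DataArray(
        np.full((1, 3), fill),
        dims=("simulation", "time"),
        coords={"simulation": ["hypothetical_sim"], "time": time},
        name=name,
    )

tmax = make_var("tmax", 30.0)
precip = make_var("precip", 2.5)

# squeeze() drops the length-1 dimension; merge() keys each array by its name.
merged = xr.merge([tmax.squeeze(), precip.squeeze()])

print(sorted(merged.data_vars))
print(merged["tmax"].dims)
```

Because the dataset is keyed by name, an individual variable can be pulled back out with `merged["tmax"]`, in the same way the next cell indexes `all_era5_vars`.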

**Optional**: Visualize a single variable (as an example).

In [None]:
tmax = all_era5_vars['Maximum air temperature at 2m']
ck.view(tmax)

**Optional**: If you need to work with the data locally, you can export it to a .nc file with the following lines of code. The file will appear in the file tree on the left-hand side. Uncomment the lines (by removing the `#` symbol) to run them and export the data.

In [None]:
# filename = "historical_era5_renewables_data" ## modify file name if needed
# ck.export(all_era5_vars, filename, 'NetCDF')