# Accessing renewables data 
Access to our derived renewables data is still a work in progress as we build a data catalog and continue generating data products. Eventually, helper functions will be incorporated into `climakitae` to streamline data access. For the time being, here's the best way to access this data using Python.<br><br>For more details on data availability and production, see our memo: https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf


## The basics
Retrieve renewables data from the AWS S3 bucket and download it to your current directory as a NetCDF file. 

In [None]:
# Library for reading zarrs into data objects using python 
import xarray as xr 

In [None]:
# Set your simulation: one of ["ec-earth3","miroc6","mpi-esm1-2-hr","taiesm1", "era5"]
simulation = "taiesm1" 

# Set your technology: one of ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]
technology = "pv_utility"

# Set your variable: one of ["cf", "gen"] 
variable = "gen" 

# Set your scenario: either "reanalysis" for model "era5" or one of ["historical","ssp370"] for any other model 
scenario = "ssp370" 

In [None]:
# Retrieve the data from s3 
path_to_zarr = f"s3://wfclimres/era/{technology}/{simulation}/{scenario}/1hr/{variable}/d03/"
ds = xr.open_zarr(path_to_zarr, storage_options={"anon": True})
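
If you change the settings often, it can help to catch typos before making a request to s3. The sketch below is a hypothetical helper, `build_renewables_path`, which is not part of `climakitae`. It only checks the settings against the options listed above and fills in the same hourly path template used in this notebook.

```python
# Hypothetical helper for illustration; not part of climakitae.
VALID_SIMULATIONS = ("ec-earth3", "miroc6", "mpi-esm1-2-hr", "taiesm1", "era5")
VALID_TECHNOLOGIES = ("pv_distributed", "pv_utility", "windpower_offshore", "windpower_onshore")
VALID_VARIABLES = ("cf", "gen")


def build_renewables_path(technology, simulation, scenario, variable):
    """Return the s3 path to one hourly renewables zarr store, or raise ValueError."""
    if technology not in VALID_TECHNOLOGIES:
        raise ValueError(f"Unknown technology: {technology!r}")
    if simulation not in VALID_SIMULATIONS:
        raise ValueError(f"Unknown simulation: {simulation!r}")
    if variable not in VALID_VARIABLES:
        raise ValueError(f"Unknown variable: {variable!r}")

    # ERA5 pairs only with "reanalysis"; the other models pair with historical/ssp370
    allowed = ("reanalysis",) if simulation == "era5" else ("historical", "ssp370")
    if scenario not in allowed:
        raise ValueError(f"Scenario {scenario!r} is not valid for {simulation!r}; use one of {allowed}")

    return f"s3://wfclimres/era/{technology}/{simulation}/{scenario}/1hr/{variable}/d03/"


path_to_zarr = build_renewables_path("pv_utility", "taiesm1", "ssp370", "gen")
print(path_to_zarr)
```

The returned string can be passed to `xr.open_zarr` with `storage_options={"anon": True}`, exactly as in the cell above.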

In [None]:
# Download the data to a netcdf 
# Just download one timestep as an example 
ds_to_download = ds.isel(time=0)
ds_to_download.to_netcdf("my-renewables-data.nc")

## Make a quick plot of the data 
xarray has built-in plotting methods that let you quickly map a single timestep and get a sense of the data you read in. 

In [None]:
one_timestep = ds[variable].isel(time=0).compute() # Select the first timestep and read it into memory 
one_timestep.plot();

## A peek into the available data options 
We are working on building a more user-friendly catalog that details all of the data options, but for now, here's a simple table that shows all the currently available options for renewables data. 

In [None]:
import pandas as pd 

In [None]:
def build_ren_catalog(): 
    """Build a simple pandas DataFrame showing current available data options 
    Temporary method-- will be replaced by an intake ESM catalog in the future 
    """
    def _ren_cat_by_technology(technology, reanalysis, variable, frequency="1hr"): 
        # Non-reanalysis data: 4 simulations x 2 scenarios = 8 unique rows
        rows = 1 if reanalysis else 8
        return pd.DataFrame({
            "variable": [variable]*rows,
            "technology": [technology]*rows,
            "simulation": ["era5"]*rows if reanalysis else ["ec-earth3","miroc6","mpi-esm1-2-hr","taiesm1"]*2,
            "scenario": ["reanalysis"]*rows if reanalysis else ["historical"]*4 + ["ssp370"]*4,
            "frequency": [frequency]*rows,
            "resolution" : ["3 km"]*rows,
        })

    # Use a list comprehension to generate all combinations
    ren_cat = pd.concat(
        [_ren_cat_by_technology(technology, False, "cf", "1hr") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] + 
        [_ren_cat_by_technology(technology, False, "gen", "1hr") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] + 
        [_ren_cat_by_technology(technology, False, "cf", "day") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] + 
        [_ren_cat_by_technology(technology, False, "gen", "day") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] +
        [_ren_cat_by_technology(technology, True, "cf", "1hr") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] + 
        [_ren_cat_by_technology(technology, True, "gen", "1hr") for technology in ["pv_distributed","pv_utility","windpower_offshore","windpower_onshore"]] 
    
    ).reset_index(drop=True)
    return ren_cat

In [None]:
# Generate and display the catalog 
ren_cat = build_ren_catalog()
ren_cat

You can easily filter this table to see available options for a particular variable, technology, simulation, or scenario of interest. For example, let's look at all the available data options for total generated power (`"gen"`) derived from ERA5, a reanalysis product. 

In [None]:
ren_cat[(ren_cat["simulation"] == "era5") & (ren_cat["variable"] == "gen")]

Now, let's read in the data for the first row of that table. 

In [None]:
# Data settings
simulation = "era5" 
technology = "pv_distributed"
variable = "gen" 
scenario = "reanalysis" 

# Read zarr using xarray 
path_to_zarr = f"s3://wfclimres/era/{technology}/{simulation}/{scenario}/1hr/{variable}/d03/"
era5 = xr.open_zarr(path_to_zarr, storage_options={"anon": True})

# Display xarray object in notebook
era5
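
As an aside, the same set of options can be enumerated without hand-counting list lengths. The sketch below uses `itertools.product` from the standard library to list each unique combination once. The function name `list_renewables_options` is hypothetical, and the combinations simply mirror the options shown in this notebook (hourly and daily data for the four model simulations, hourly data only for ERA5).

```python
from itertools import product

TECHNOLOGIES = ["pv_distributed", "pv_utility", "windpower_offshore", "windpower_onshore"]
VARIABLES = ["cf", "gen"]
MODEL_SIMULATIONS = ["ec-earth3", "miroc6", "mpi-esm1-2-hr", "taiesm1"]
MODEL_SCENARIOS = ["historical", "ssp370"]
COLUMNS = ("variable", "technology", "simulation", "scenario", "frequency", "resolution")


def list_renewables_options():
    """Return one dict per unique data option (hypothetical helper for illustration)."""
    rows = []
    # Model-derived data: every technology/variable/frequency/simulation/scenario combination
    for technology, variable, frequency, simulation, scenario in product(
        TECHNOLOGIES, VARIABLES, ["1hr", "day"], MODEL_SIMULATIONS, MODEL_SCENARIOS
    ):
        rows.append(dict(zip(COLUMNS, (variable, technology, simulation, scenario, frequency, "3 km"))))
    # ERA5 reanalysis: hourly only
    for technology, variable in product(TECHNOLOGIES, VARIABLES):
        rows.append(dict(zip(COLUMNS, (variable, technology, "era5", "reanalysis", "1hr", "3 km"))))
    return rows


options = list_renewables_options()

# Filter with a list comprehension: all ERA5 options for total generated power
era5_gen = [row for row in options if row["simulation"] == "era5" and row["variable"] == "gen"]
print(len(options), len(era5_gen))
```

Passing `options` to `pd.DataFrame(...)` turns the list of dictionaries into a table you can filter the same way as `ren_cat`.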

## Get the closest gridcell for a coordinate pair 
For this, we'll use a helper function from `climakitae`. We'll demonstrate how to do this for the coordinates of the city of San Francisco. 

In [None]:
from climakitae.util.utils import get_closest_gridcell
import numpy as np

First, let's read in some total generated power (`"gen"`) data for distributed solar photovoltaic (`"pv_distributed"`) in the past (`"historical"`) from the EC-Earth3 model simulation (`"ec-earth3"`).

In [None]:
# Data settings
simulation = "ec-earth3" 
technology = "pv_distributed"
variable = "gen" 
scenario = "historical" 

# Read zarr using xarray 
path_to_zarr = f"s3://wfclimres/era/{technology}/{simulation}/{scenario}/1hr/{variable}/d03/"
ds = xr.open_zarr(path_to_zarr, storage_options={"anon": True})

Next, let's use `climakitae`'s utility function `get_closest_gridcell` to grab the model gridcell that is closest to the coordinates for the city of San Francisco. <br><br>**NOTE**: The renewables data has missing values where data was not generated for a variety of reasons, so this function may return `nan` if the gridcell closest to your coordinates falls in one of these missing-value regions. Missing data regions vary by technology type. 

In [None]:
# Coordinates of San Francisco 
lat = 37.7749
lon = -122.4194

# Reassign attribute so the function can find the resolution 
ds.attrs["resolution"] = ds.attrs["nominal_resolution"]

# Use the function to get the closest gridcell of data 
closest_gridcell = get_closest_gridcell(data=ds, lat=lat, lon=lon)
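
To make the idea of a "closest gridcell" concrete, and to show one way around the missing-value issue noted above, here is a plain-Python sketch. It is not how `get_closest_gridcell` is implemented. The function names, the tiny 2x2 grid, and the choice to skip `nan` cells are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_valid_cell(lats, lons, values, lat, lon):
    """Return (row, col) of the closest gridcell whose value is not NaN, or None."""
    best, best_dist = None, math.inf
    for i, row in enumerate(values):
        for j, value in enumerate(row):
            if math.isnan(value):
                continue  # skip gridcells with missing data
            dist = haversine_km(lat, lon, lats[i][j], lons[i][j])
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best


# Made-up 2x2 grid near San Francisco; the geometrically closest cell is missing
lats = [[37.70, 37.70], [37.80, 37.80]]
lons = [[-122.50, -122.40], [-122.50, -122.40]]
values = [[1.2, 0.8], [0.5, float("nan")]]

print(nearest_valid_cell(lats, lons, values, 37.7749, -122.4194))
```

Looping over every cell is fine for a toy grid but slow for the full 3 km domain, so treat this as an explanation of the logic rather than something to run on the real data.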

Finally, let's make a quick plot of the data for the first year of the timeseries. 

In [None]:
# Get the first 365 days of hourly data (365 * 24 timesteps) and read into memory 
to_plot = closest_gridcell.isel(time=np.arange(0, 365 * 24)).compute()

# Generate a simple lineplot 
to_plot.gen.plot();
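
Because this data is hourly, selecting by position means converting days to timesteps: one 365-day year spans 365 * 24 = 8760 values. The plain-Python sketch below shows that arithmetic, plus a simple daily average that can make a year-long line plot easier to read. Both function names are hypothetical, and the averaging assumes the series starts at the beginning of a day.

```python
HOURS_PER_DAY = 24


def first_n_days_slice(n_days):
    """Positional slice covering the first n_days of an hourly series."""
    return slice(0, n_days * HOURS_PER_DAY)


def daily_means(hourly_values):
    """Average consecutive 24-value blocks; a trailing partial day is dropped."""
    n_full_days = len(hourly_values) // HOURS_PER_DAY
    return [
        sum(hourly_values[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY]) / HOURS_PER_DAY
        for d in range(n_full_days)
    ]


# Made-up hourly series: one day at 1.0 followed by one day at 3.0
hourly = [1.0] * 24 + [3.0] * 24

print(first_n_days_slice(365))
print(daily_means(hourly))
```

With xarray, a slice like this can be passed to `isel(time=...)`, and `resample(time="1D").mean()` is the usual way to compute daily means on a dataset with a time coordinate.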