# Renewables Data Access Notebook

In this notebook, we'll walk through how to retrieve renewable energy generation data from `climakitae` using the new core API. You'll learn how to access capacity factor and generation data for solar PV and wind power installations across California.

**Intended Application:** As a user, I want to **<span style="color:red">access and analyze renewable energy generation data for California using the `climakitae` new core interface.</span>**

**Runtime**: ~5-10 min. depending on the queries you execute.

For more details on data availability and production methodology, see: [Renewables Data Guide](https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf)

**Data Overview**: The renewables catalog contains capacity factor and generation data derived from WRF climate model outputs for:

- Utility-scale solar PV
- Distributed (rooftop) solar PV
- Onshore wind power
- Offshore wind power

## Setup and Imports

First, let's import the necessary packages and initialize the `ClimateData` interface.

In [None]:
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

import climakitae as ck

Initialize the `ClimateData` interface with quiet output:

In [None]:
# Initialize the interface with quiet output
cd = ck.ClimateData(verbosity=-1)

## Exploring Renewables Data Options

The renewables catalog contains data for different installation types, variables, and climate models. Just like with climate data, you can explore what's available before making your data selections.

Let's explore the available options:

### Installation Types Available:

- `pv_utility`: Utility-scale solar photovoltaic systems
- `pv_distributed`: Distributed rooftop solar photovoltaic systems
- `windpower_onshore`: Onshore wind turbines
- `windpower_offshore`: Offshore wind turbines

In [None]:
# Select the renewables catalog
cd.reset()
renewables = cd.catalog("renewable energy generation")

# Show available installation types
print("=== Available Installation Types ===")
renewables.show_installation_options()

Now let's explore the available variables for a specific installation type:

In [None]:
# Choose an installation type and explore variables
pv_utility = renewables.installation("pv_utility")

print("=== Available Variables for PV Utility ===")
pv_utility.show_variable_options()

print("\n=== Available Climate Models ===")
pv_utility.show_source_id_options()

print("\n=== Available Scenarios ===")
pv_utility.show_experiment_id_options()

## Retrieving Renewables Data

The `ClimateData` interface allows you to chain method calls to build readable queries for renewables data, just like with climate data.

### Required Parameters for Renewables Queries:
- **`catalog`**: "renewable energy generation"
- **`installation`**: Installation type (e.g., "pv_utility", "pv_distributed", "windpower_onshore", "windpower_offshore")
- **`variable_id`**: Variable name
  - `"cf"`: Capacity factor (0-1, dimensionless ratio of actual to potential output)
  - `"gen"`: Generation (power output in appropriate units)

- **`table_id`**: Temporal resolution ("day" or "mon")
- **`grid_label`**: Spatial resolution
  - `"d02"`: 9km resolution (regional coverage)
  - `"d03"`: 3km resolution (fine-scale, local)

### Example 1: Single Installation Type and Variable

Let's retrieve capacity factor data for utility-scale solar PV:

In [None]:
cd.reset()
pv_utility_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")  # Capacity factor
    .experiment_id("historical")
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .get()
)

pv_utility_data
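
Because capacity factor is a dimensionless ratio of actual to potential output, it can be turned into an energy estimate once a nameplate capacity is assumed. The following is a minimal sketch with made-up numbers: the `daily_energy_mwh` helper and the 100 MW capacity are hypothetical and not part of `climakitae`, and the input is assumed to be a daily-mean capacity factor.

```python
def daily_energy_mwh(capacity_factor, nameplate_mw, hours=24.0):
    """Energy (MWh) produced in one day at a given mean capacity factor.

    Hypothetical helper for illustration; not a climakitae function.
    """
    if not 0.0 <= capacity_factor <= 1.0:
        raise ValueError("capacity factor must lie in [0, 1]")
    return capacity_factor * nameplate_mw * hours


# Made-up daily-mean capacity factors for three days
daily_cf = [0.25, 0.5, 0.0]

# Assumed 100 MW installation
energy = [daily_energy_mwh(cf, nameplate_mw=100.0) for cf in daily_cf]
print(energy)  # [600.0, 1200.0, 0.0]
```

For the units and conventions actually used by the `gen` variable, refer to the Renewables Data Guide rather than this sketch.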

### Example 2: Multiple Scenarios

You can retrieve data for multiple scenarios (e.g., historical and future) in a single query:

In [None]:
cd.reset()
multi_scenario_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .experiment_id(["historical", "ssp370"])  # Multiple scenarios
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .get()
)

multi_scenario_data

### Not Specifying a Scenario

If you omit `experiment_id`, the query returns the historical period plus every available future scenario:

In [None]:
cd.reset()
multi_scenario_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    #.experiment_id(["historical", "ssp370"])  # No specified scenarios
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .get()
)

multi_scenario_data

### Example 3: Using Dictionary Query

You can also use the dictionary query method:

In [None]:
# Define your query 
renewables_query_dict = {
    "catalog": "renewable energy generation",
    "installation": "windpower_onshore",
    "variable_id": "gen",  # Generation data
    "experiment_id": "ssp370",
    "source_id": "MPI-ESM1-2-HR",
    "table_id": "day",
    "grid_label": "d03"
}

# Load and retrieve the data
wind_data = ck.ClimateData(verbosity=-2).load_query(renewables_query_dict).get()
wind_data
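
When queries are built as dictionaries, it can help to check them before loading. The sketch below is one possible approach, assuming the required parameters are the ones listed earlier in this notebook (`catalog`, `installation`, `variable_id`, `table_id`, `grid_label`). The `missing_query_keys` helper is hypothetical and not part of `climakitae`.

```python
# Required parameters as listed in this notebook (an assumption of this sketch)
REQUIRED_KEYS = ("catalog", "installation", "variable_id", "table_id", "grid_label")


def missing_query_keys(query):
    """Return the required keys that are absent or empty in a query dict."""
    return [key for key in REQUIRED_KEYS if not query.get(key)]


# A query that forgets the spatial resolution
incomplete_query = {
    "catalog": "renewable energy generation",
    "installation": "windpower_onshore",
    "variable_id": "gen",
    "table_id": "day",
}

print(missing_query_keys(incomplete_query))  # ['grid_label']
```

An empty list means every required key is present, so the dictionary can be passed on to `load_query`.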

## Working with Processors

Just like with climate data, you can apply processors to subset and transform renewables data. Common processors include:

- **`time_slice`**: Subset data to a specific date range
- **`clip`**: Extract data for specific geographic regions or coordinates
- **`convert_units`**: Convert between units (if applicable)
- **`export`**: Save data to various file formats

Let's apply spatial and temporal subsetting:

In [None]:
# Retrieve data with time slice and spatial clip
cd.reset()
processed_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2020-01-01", "2020-12-31"),    
        "clip": "Kern County"
    })
    .get()
)

processed_data
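
For intuition, the result of a `time_slice` resembles label-based time selection in `xarray`. The sketch below runs on synthetic data and only illustrates the idea. It is not a description of how `climakitae` implements the processor, and the random values are placeholders.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily capacity factor series covering 2019 through 2021
time = pd.date_range("2019-01-01", "2021-12-31", freq="D")
rng = np.random.default_rng(0)
cf = xr.DataArray(
    rng.uniform(0.0, 1.0, size=time.size),
    dims="time",
    coords={"time": time},
    name="cf",
)

# Label-based slicing includes both endpoints
cf_2020 = cf.sel(time=slice("2020-01-01", "2020-12-31"))
print(cf_2020.sizes["time"])  # 366, because 2020 is a leap year
```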

## Visualizing Renewables Data 

`xarray` has built-in plotting capabilities that make it easy to visualize spatial and temporal patterns in renewables data. Let's create a spatial map of capacity factor averaged over all timesteps and simulations:

In [None]:
# Plot the mean over time and simulations
fig, ax = plt.subplots(figsize=(10, 6))
processed_data['cf'].mean(dim=['time', 'sim']).plot(
    ax=ax,
    x='lon',
    y='lat',
    cmap='YlOrRd',
    cbar_kwargs={'label': 'Capacity Factor'}
)
ax.set_title('PV Utility Capacity Factor - Kern County')
plt.show()
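
Renewables data may contain `NaN` values where data was not generated, so averages need to handle missing cells. `xarray`'s `mean` skips `NaN` by default for floating-point data. The NumPy sketch below shows the same behaviour on a small synthetic array, where the array shape and values are invented for illustration.

```python
import warnings

import numpy as np

# Synthetic array with dimensions (sim, time, y, x)
cf = np.full((2, 3, 2, 2), 0.2)
cf[1] = 0.4                # second simulation has higher values
cf[:, :, 0, 0] = np.nan    # one cell is missing everywhere, e.g. a water body

with warnings.catch_warnings():
    # An all-NaN cell triggers a "Mean of empty slice" warning; silence it here
    warnings.simplefilter("ignore", category=RuntimeWarning)
    mean_map = np.nanmean(cf, axis=(0, 1))  # average over sim and time

print(mean_map)  # NaN at [0, 0], roughly 0.3 elsewhere
```

Cells that are missing at every timestep stay `NaN` in the averaged map, which is why they appear blank in plots.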

## Extracting Data for Specific Locations

For site-specific analysis, you can extract data at exact coordinates using the `clip` processor. This is useful for:

- Evaluating renewable energy potential at proposed project sites
- Comparing capacity factors across different locations
- Creating location-specific time series for energy planning

Let's extract data for San Francisco:

In [None]:
# Coordinates of San Francisco
lat = 37.7749
lon = -122.4194

cd.reset()
sf_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_distributed")
    .variable("gen")
    .experiment_id("historical")
    .source_id("MPI-ESM1-2-HR")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "clip": (lat, lon)
    })
    .get()
)

sf_data
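
Conceptually, extracting a point from gridded data means finding the grid cell closest to the requested coordinates. The sketch below shows one simple way to do that with NumPy on a made-up grid. It assumes 2D latitude and longitude arrays and a small-area distance approximation. The `nearest_cell` helper is hypothetical, and this is not a description of how the `clip` processor is implemented.

```python
import numpy as np


def nearest_cell(lat2d, lon2d, lat, lon):
    """Return (row, col) of the grid cell closest to (lat, lon)."""
    # Scale longitude differences by cos(latitude) so east-west and
    # north-south offsets are comparable over a small area
    dlat = lat2d - lat
    dlon = (lon2d - lon) * np.cos(np.deg2rad(lat))
    dist2 = dlat**2 + dlon**2
    return np.unravel_index(np.argmin(dist2), dist2.shape)


# Made-up half-degree grid around the San Francisco Bay Area
lats = np.linspace(37.0, 38.5, 4)       # 37.0, 37.5, 38.0, 38.5
lons = np.linspace(-123.0, -121.5, 4)   # -123.0, -122.5, -122.0, -121.5
lon2d, lat2d = np.meshgrid(lons, lats)

row, col = nearest_cell(lat2d, lon2d, 37.7749, -122.4194)
print(lat2d[row, col], lon2d[row, col])  # 38.0 -122.5
```

On a coarse grid the nearest cell center can sit some distance from the site, which is one reason the 3km `d03` grid is preferable for site-specific work.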

### Time Series Visualization

Now let's plot a time series for the first year of data to see daily variability in generation:

In [None]:
# Plot the first 365 days
fig, ax = plt.subplots(figsize=(12, 5))
sf_data['gen'].isel(time=slice(0, 365), sim=0).plot(ax=ax)
ax.set_title('PV Distributed Generation - San Francisco (First Year)')
ax.set_ylabel('Generation')
ax.set_xlabel('Date')
plt.tight_layout()
plt.show()

## Comparing Multiple Locations

You can retrieve data for multiple locations simultaneously using the `clip` processor with a list of coordinates. This is useful for:

- Comparing renewable energy potential across different regions
- Portfolio analysis for multi-site projects
- Regional resource assessment

Let's compare three major California cities:

In [None]:
# Multiple cities in California
locations = [
    (37.7749, -122.4194),  # San Francisco
    (34.0522, -118.2437),  # Los Angeles
    (32.7157, -117.1611),  # San Diego
]

cd.reset()
multi_location_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .experiment_id("historical")
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": (2000, 2005),
        "clip": {
            "points": locations,
            "separated": True
        }
    })
    .get()
)

multi_location_data

### Comparative Analysis

Let's compare monthly mean capacity factors across the three locations to understand regional differences:

In [None]:
# Calculate monthly mean capacity factor for each location
# 'MS' (month start) avoids the deprecation warning that newer pandas versions raise for 'M'
monthly_cf = multi_location_data['cf'].resample(time='MS').mean()

# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))
city_names = ['San Francisco', 'Los Angeles', 'San Diego']

for i, city in enumerate(city_names):
    monthly_cf.isel(points=i, sim=0).plot(ax=ax, label=city, linewidth=2)

ax.set_title('Monthly Mean PV Utility Capacity Factor - Three California Cities')
ax.set_ylabel('Capacity Factor')
ax.set_xlabel('Date')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
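
A common follow-up to a monthly time series is a seasonal climatology, which averages each calendar month across all years. The sketch below illustrates the idea with `pandas` on a synthetic series. The sinusoidal values are invented and do not represent any real location.

```python
import numpy as np
import pandas as pd

# Synthetic monthly capacity factors for 2000-2005 with a built-in July peak
months = pd.date_range("2000-01-01", "2005-12-01", freq="MS")
month_num = months.month.to_numpy()
values = 0.2 + 0.1 * np.sin((month_num - 4) * np.pi / 6)
cf = pd.Series(values, index=months)

# Average all Januaries, all Februaries, and so on into a 12-value cycle
climatology = cf.groupby(cf.index.month).mean()

print(len(climatology))      # 12
print(climatology.idxmax())  # 7, i.e. July
```

The same pattern is available in `xarray` through `groupby("time.month").mean()`.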

## Summary and Additional Resources

### Key Takeaways

This notebook demonstrated how to:
- Explore available renewables data options using the new core API
- Retrieve capacity factor and generation data for different installation types
- Apply spatial and temporal subsetting with processors
- Visualize spatial patterns and temporal trends
- Compare renewable energy potential across multiple locations

### Data Reference

**Installation Types:**
- `pv_utility`: Utility-scale solar photovoltaic systems
- `pv_distributed`: Distributed rooftop solar photovoltaic systems
- `windpower_onshore`: Onshore wind turbines
- `windpower_offshore`: Offshore wind turbines

**Variables:**
- `cf`: Capacity factor (ratio of actual to potential output, 0-1)
- `gen`: Generation (actual power output)

**Temporal Resolutions:**
- `day`: Daily data
- `mon`: Monthly data

**Spatial Resolutions:**
- `d02`: 9km (regional analysis)
- `d03`: 3km (fine-scale, site-specific analysis)

### Important Notes

- **Use Cases**: This data is suitable for planning studies, resource assessment, and climate change impact analysis, not for operational forecasting
- **Climate Forcing**: All renewables data is based on the same climate simulations available in the main climate catalog
- **Data Derivation**: Capacity factors and generation values are derived from WRF climate model outputs using technology-specific power curves, panel configurations, and turbine specifications
- **Missing Data**: Renewables data may contain `NaN` values in certain regions where data was not generated (e.g., water bodies, non-viable terrain)

### Further Resources

- **Report Issues or Feedback**: [https://github.com/cal-adapt/climakitae/issues](https://github.com/cal-adapt/climakitae/issues)
- **ClimakitAE Documentation**: [https://climakitae.readthedocs.io/](https://climakitae.readthedocs.io/)
- **Renewables Data Guide**: [https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf](https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf)

---

**If you have any issues running this notebook or questions about renewables data access, please open an issue on our GitHub repository. Thank you!**
