# Renewables Data Access Notebook

In this notebook, we'll walk through how to retrieve renewable energy generation data from `climakitae` using the new core API. You'll learn how to access capacity factor and generation data for solar PV and wind power installations across California.

**Intended Application:** As a user, I want to **<span style="color:red">access and analyze renewable energy generation data for California using the `climakitae` new core interface.</span>**

**Runtime**: ~5-10 min. depending on the queries you execute.

## About the Renewables Data

This data is suitable for **planning studies, resource assessment, and climate change impact analysis**, not for operational forecasting. All renewables data is based on the same climate simulations available in the main climate catalog, with capacity factors and generation values derived from WRF climate model outputs using technology-specific power curves, panel configurations, and turbine specifications.

Each grid cell in this data represents the energy generation capacity that a facility would have if it were located at that grid cell's location. The dataset does not contain information about the real-world location or capacity of solar or wind power facilities.

### Installation Types Available:
- **`pv_utility`**: Utility-scale solar photovoltaic systems
- **`pv_distributed`**: Distributed rooftop solar photovoltaic systems
- **`windpower_onshore`**: Onshore wind turbines
- **`windpower_offshore`**: Offshore wind turbines

### Variables:
- **`cf`**: Capacity factor (ratio of actual to potential output, 0-1)
- **`gen`**: Generation (actual power output)
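
To make the relationship between the two variables concrete, here is a minimal sketch of the generic capacity-factor definition: energy produced equals capacity factor times nameplate capacity times elapsed hours. The helper name `energy_from_capacity_factor`, the 100 MW nameplate capacity, and the MWh units are assumptions chosen for illustration; they are not taken from the dataset, so check the data guide for the actual units of `gen`.

```python
def energy_from_capacity_factor(cf, nameplate_mw, hours):
    """Energy in MWh produced over `hours` at capacity factor `cf`.

    Hypothetical helper illustrating the generic definition only.
    """
    if not 0.0 <= cf <= 1.0:
        raise ValueError("capacity factor must be between 0 and 1")
    return cf * nameplate_mw * hours


# A hypothetical 100 MW facility running at a daily capacity factor of 0.25
daily_mwh = energy_from_capacity_factor(cf=0.25, nameplate_mw=100.0, hours=24)
print(daily_mwh)  # 600.0
```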

### Temporal Resolutions:
- **`day`**: Daily data
- **`mon`**: Monthly data

### Spatial Resolutions:
- **`d02`**: 9km resolution (regional analysis)
- **`d03`**: 3km resolution (fine-scale, site-specific analysis)

### Important Notes:
- **Missing Data**: Renewables data may contain `NaN` values in certain regions where data was not generated (e.g., water bodies, non-viable terrain)
- **Data Guide**: For more details on data availability and production methodology, see the [Renewables Data Guide](https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf)
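
Because of these missing values, regional averages should skip `NaN` cells rather than propagate them. `xarray` reductions such as `.mean()` skip `NaN` by default for floating-point data; the pure-Python sketch below shows the same idea on a made-up list of grid-cell capacity factors. The values and the helper `nan_mean` are illustrative and do not come from the dataset.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if every entry is missing."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical capacity factors for five grid cells; NaN marks cells without data
cell_cf = [0.20, math.nan, 0.30, 0.25, math.nan]

# Share of cells that actually contain data
valid_fraction = sum(not math.isnan(v) for v in cell_cf) / len(cell_cf)

print(f"mean over valid cells: {nan_mean(cell_cf):.2f}")
print(f"fraction of cells with data: {valid_fraction:.0%}")
```

Reporting the valid fraction alongside the mean helps show how much of a region the average actually represents.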

## Setup and Imports

First, let's import the necessary packages and initialize the `ClimateData` interface.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

import climakitae as ck

Initialize the `ClimateData` interface with a low `verbosity` setting to keep the output quiet:

In [None]:
# Initialize the interface with quiet output
cd = ck.ClimateData(verbosity=-1)

## Exploring Renewables Data Options

The renewables catalog contains data for different installation types, variables, and climate models. Just like with climate data, you can explore what's available before making your data selections.

Let's explore the available options:

### Installation Types Available:

- `pv_utility`: Utility-scale solar photovoltaic systems
- `windpower_offshore`: Offshore wind turbines
- `pv_distributed`: Distributed rooftop solar photovoltaic systems
- `windpower_onshore`: Onshore wind turbines

In [None]:
# Select the renewables catalog
cd.reset()
renewables = cd.catalog("renewable energy generation")

# Show available installation types
print("=== Available Installation Types ===")
renewables.show_installation_options()

Now let's explore the available variables for a specific installation type:

In [None]:
# Choose an installation type and explore variables
pv_utility = renewables.installation("pv_utility")

print("=== Available Variables for PV Utility ===")
pv_utility.show_variable_options()

print("\n=== Available Climate Models ===")
pv_utility.show_source_id_options()

print("\n=== Available Scenarios ===")
pv_utility.show_experiment_id_options()

## Retrieving Renewables Data

The `ClimateData` interface allows you to chain method calls to build readable queries for renewables data, just like with climate data.

### Required Parameters for Renewables Queries:
- **`catalog`**: "renewable energy generation"
- **`installation`**: Installation type (e.g., "pv_utility", "pv_distributed", "windpower_onshore", "windpower_offshore")
- **`variable_id`**: Variable name
  - `"cf"`: Capacity factor (0-1, dimensionless ratio of actual to potential output)
  - `"gen"`: Generation (power output in appropriate units)
- **`table_id`**: Temporal resolution ("day" or "mon")
- **`grid_label`**: Spatial resolution
  - `"d02"`: 9km resolution (regional coverage)
  - `"d03"`: 3km resolution (fine-scale, local)

### Example 1: Single Installation Type and Variable

Let's retrieve capacity factor data for utility-scale solar PV:

In [None]:
cd.reset()
pv_utility_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")  # Capacity factor
    .experiment_id("historical")
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .get()
)

pv_utility_data

### Example 2: Multiple Scenarios

You can retrieve data for multiple scenarios (e.g., historical and future) in a single query:

In [None]:
cd.reset()
multi_scenario_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .experiment_id(["historical", "ssp370"])  # Multiple scenarios (optional)
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .get()
)

multi_scenario_data

> ### Note
>
> You do not have to specify `experiment_id`. By default, historical data is appended to all available SSP scenarios.

### Example 3: Using Dictionary Query

You can also use the dictionary query method:

In [None]:
# Define your query 
renewables_query_dict = {
    "catalog": "renewable energy generation",
    "installation": "windpower_onshore",
    "variable_id": "gen",  # Generation data
    "experiment_id": "ssp370",
    "source_id": "MPI-ESM1-2-HR",
    "table_id": "day",
    "grid_label": "d03"
}

# Load and retrieve the data
wind_data = ck.ClimateData(verbosity=-2).load_query(renewables_query_dict).get()
wind_data
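
Before submitting a dictionary query, it can help to check it against the option values listed earlier in this notebook. The sketch below is a hypothetical helper (`check_renewables_query` is not part of `climakitae`, which may perform its own validation); it only encodes the installation types, variables, temporal resolutions, and grid labels named above.

```python
# Allowed values as listed earlier in this notebook
ALLOWED = {
    "installation": {"pv_utility", "pv_distributed", "windpower_onshore", "windpower_offshore"},
    "variable_id": {"cf", "gen"},
    "table_id": {"day", "mon"},
    "grid_label": {"d02", "d03"},
}


def check_renewables_query(query):
    """Return a list of problems found in a query dict (empty list means OK)."""
    problems = []
    if query.get("catalog") != "renewable energy generation":
        problems.append("catalog must be 'renewable energy generation'")
    for key, allowed in ALLOWED.items():
        value = query.get(key)
        if value is None:
            problems.append(f"missing required key: {key}")
        elif value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return problems


good_query = {
    "catalog": "renewable energy generation",
    "installation": "windpower_onshore",
    "variable_id": "gen",
    "table_id": "day",
    "grid_label": "d03",
}
bad_query = dict(good_query, table_id="hourly")  # unsupported temporal resolution

print(check_renewables_query(good_query))
print(check_renewables_query(bad_query))
```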

## Working with Processors

Just like with climate data, you can apply processors to subset and transform renewables data. Common processors include:

- **`time_slice`**: Subset data to a specific date range
- **`clip`**: Extract data for specific geographic regions or coordinates
- **`convert_units`**: Convert between units (if applicable)
- **`export`**: Save data to various file formats

Let's apply spatial and temporal subsetting:

In [None]:
# Retrieve data with time slice and spatial clip
cd.reset()
processed_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": ("2020-01-01", "2020-12-31"),    
        "clip": "Kern County"
    })
    .get()
)

processed_data
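
As a rough mental model of what a date-range subset does, the sketch below applies an inclusive date selection to a synthetic daily `pandas` series. The series is made up, and whether `climakitae`'s `time_slice` treats the end date as inclusive is an assumption here; only the `pandas` label-slicing behavior shown is certain.

```python
import pandas as pd

# Synthetic daily series spanning slightly more than the year 2020
times = pd.date_range("2019-12-30", "2021-01-02", freq="D")
daily = pd.Series(range(len(times)), index=times)

# Label-based slicing on a DatetimeIndex includes both endpoints
subset = daily.loc["2020-01-01":"2020-12-31"]

# 2020 is a leap year, so the subset holds 366 daily values
print(subset.index[0].date(), subset.index[-1].date(), len(subset))
```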

## Visualizing Renewables Data 

`xarray` has built-in plotting capabilities that make it easy to visualize spatial and temporal patterns in renewables data. Let's create a spatial map of capacity factor averaged over time and simulations:

In [None]:
# Plot the mean over all timesteps and simulations
fig, ax = plt.subplots(figsize=(10, 6))
processed_data['cf'].mean(dim=['time', 'sim']).plot(
    ax=ax,  # draw on the axes created above
    x='lon',
    y='lat',
    cmap='YlOrRd',
    cbar_kwargs={'label': 'Capacity Factor'}
)
ax.set_title('PV Utility Capacity Factor - Kern County')
plt.show()

This plot shows the average modeled capacity factor over one year (2020) for utility-scale photovoltaic installations across Kern County, CA. Empty grid cells represent areas that are unavailable for utility development.

## Extracting Data for Specific Locations

For site-specific analysis, you can extract data at exact coordinates using the `clip` processor. This is useful for:

- Evaluating renewable energy potential at proposed project sites
- Comparing capacity factors across different locations
- Creating location-specific time series for energy planning

Let's extract data for San Francisco:

In [None]:
# Coordinates of downtown San Francisco
lat = 37.7749
lon = -122.4194

cd.reset()
sf_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_distributed")
    .variable("gen")
    .experiment_id("historical")
    .source_id("MPI-ESM1-2-HR")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "clip": (lat, lon)
    })
    .get()
)

sf_data
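
When a single coordinate pair is requested, the result corresponds to one grid cell. How `clip` chooses that cell internally is not described here, so the sketch below is only one plausible approach under that assumption: pick the cell center with the smallest great-circle distance to the requested point. The cell centers and both helper functions are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_cell(lat, lon, cell_centers):
    """Return the (lat, lon) cell center closest to the requested point."""
    return min(cell_centers, key=lambda c: haversine_km(lat, lon, c[0], c[1]))


# Hypothetical grid-cell centers near downtown San Francisco
cell_centers = [(37.75, -122.45), (37.78, -122.42), (37.80, -122.40)]

closest = nearest_cell(37.7749, -122.4194, cell_centers)
print(closest, f"{haversine_km(37.7749, -122.4194, *closest):.2f} km away")
```

On a 3km grid, the selected cell center can be a kilometer or more from the requested site, which is worth keeping in mind when interpreting site-specific results.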

### Time Series Visualization

Now let's plot a time series for the first year of data to see daily variability in generation:

In [None]:
# Plot the first 365 days
fig, ax = plt.subplots(figsize=(12, 5))
sf_data['gen'].isel(time=slice(0, 365), sim=0).plot(ax=ax)
ax.set_title('PV Distributed Generation - San Francisco (First Year)')
ax.set_ylabel('Generation')
ax.set_xlabel('Date')
plt.tight_layout()
plt.show()

This is a time series of modeled distributed PV generation for San Francisco in 1981. The "dips" are days when PV generation was reduced, which is indicative of cloudy conditions or dunkelflautes.
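
One simple way to pick out such low-generation days programmatically is to flag values that fall below some fraction of the typical level. The sketch below uses the median as the baseline and a 50% threshold; both choices, the helper `flag_dips`, and the sample values are assumptions for illustration, not a method from the data guide.

```python
from statistics import median


def flag_dips(values, fraction=0.5):
    """Return indices of values below `fraction` times the median."""
    baseline = median(values)
    return [i for i, v in enumerate(values) if v < fraction * baseline]


# Hypothetical daily generation values with two low-output days
daily_gen = [5.0, 5.2, 1.1, 4.9, 5.1, 0.8, 5.0]

dip_days = flag_dips(daily_gen)
print(dip_days)  # [2, 5]
```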

## Comparing Multiple Locations

You can retrieve data for multiple locations simultaneously using the `clip` processor with a list of coordinates. This is useful for:

- Comparing renewable energy potential across different regions
- Portfolio analysis for multi-site projects
- Regional resource assessment

Let's compare three major California cities:

In [None]:
# Multiple cities in California
locations = [
    (37.7749, -122.4194),  # San Francisco
    (34.0522, -118.2437),  # Los Angeles
    (32.7157, -117.1611),  # San Diego
]

cd.reset()
multi_location_data = (cd
    .catalog("renewable energy generation")
    .installation("pv_utility")
    .variable("cf")
    .experiment_id("historical")
    .source_id("EC-Earth3")
    .table_id("day")
    .grid_label("d03")
    .processes({
        "time_slice": (2000, 2005),
        "clip": {
            "points": locations,
            "separated": True
        }
    })
    .get()
)

multi_location_data

### Comparative Analysis

Let's compare monthly mean capacity factors across the three locations to understand regional differences:

In [None]:
# Calculate monthly mean capacity factor for each location
# ('MS' labels each month by its start date; the 'M' alias is deprecated in recent pandas)
monthly_cf = multi_location_data['cf'].resample(time='MS').mean()

# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))
city_names = ['San Francisco', 'Los Angeles', 'San Diego']

for i, city in enumerate(city_names):
    monthly_cf.isel(points=i, sim=0).plot(ax=ax, label=city, linewidth=2)

ax.set_title('Monthly Mean PV Utility Capacity Factor - Three California Cities')
ax.set_ylabel('Capacity Factor')
ax.set_xlabel('Date')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
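
The daily-to-monthly aggregation used for this comparison can be checked on a small synthetic series. The sketch below uses plain `pandas` with made-up capacity factors (0.2 for every day in January 2000 and 0.3 for every day in February 2000), so the expected monthly means are known in advance. It uses the `"MS"` alias, which labels each bin by the first day of the month.

```python
import pandas as pd

# Synthetic daily capacity factors: 31 days of 0.2, then 29 days of 0.3
times = pd.date_range("2000-01-01", "2000-02-29", freq="D")
daily_cf = pd.Series([0.2] * 31 + [0.3] * 29, index=times)

# Group by calendar month and average within each month
monthly_cf = daily_cf.resample("MS").mean()
print(monthly_cf.round(2))
```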

## Hydropower Generation

The files for hydropower generation can be found here:
https://wfclimres.s3.amazonaws.com/index.html#era/hydropower_generation/

Hydropower generation projections were modeled using a different process than the solar and wind power data above. This data projects hydropower generation in MW at specific real-world facilities on rivers and reservoirs across the western United States. Generation is based on a variety of climate variables across the watersheds that feed into each generation facility.

The following cells show how to load this dataset:

In [None]:
# Read CSV files from S3
dfs = []
model_names = ["EC-Earth3", "MIROC6", "MPI-ESM1-2-HR", "TaiESM1"]
for model in model_names:
    url = f"s3://wfclimres/era/hydropower_generation/{model}_weekly_2015-2060_WEAP.csv"
    # Skip first 3 rows (title, scenario, empty row) to get to the actual header
    df = pd.read_csv(url, storage_options={"anon": True}, skiprows=3)
    df["model"] = model
    dfs.append(df)

# Merge all dataframes
merged_df = pd.concat(dfs, ignore_index=True)

# Set multi-index with model and facility name
merged_df = merged_df.set_index(["model", "Hydropower facility"])

# Separate time-series columns from statistics columns
stats_cols = ["Sum", "Min", "Max", "Mean", "Median", "SD", "RMS"]
time_cols = [col for col in merged_df.columns if col not in stats_cols]

# Extract only the time-series data (copy so the columns can be relabeled safely)
timeseries_df = merged_df[time_cols].copy()

# Parse week columns into datetime timestamps
# Column format: "Wk 1 2015" -> start of week 1 in 2015
def parse_week_to_datetime(col_name):
    """Convert 'Wk X YYYY' to datetime (start of that week)"""
    parts = col_name.split()
    week = int(parts[1])
    year = int(parts[2])
    # Week 1 starts on January 1st, subsequent weeks are 7 days apart
    return pd.Timestamp(f"{year}-01-01") + pd.Timedelta(weeks=week-1)

# Create datetime index from column names
time_index = [parse_week_to_datetime(col) for col in timeseries_df.columns]

# Reshape: columns become a time dimension
timeseries_df.columns = time_index

# Stack to long format then convert to xarray
timeseries_df = timeseries_df.stack()
timeseries_df.index.names = ["model", "Hydropower facility", "time"]

# Convert to an xarray DataArray and rename the facility dimension to "hydropower"
ds = timeseries_df.to_xarray()
ds = ds.rename({"Hydropower facility": "hydropower"})
ds
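
The wide-to-long reshape in the cell above is easier to follow on a toy table. The sketch below builds a two-facility, two-week frame with made-up values and hypothetical facility names, relabels the `"Wk X YYYY"` columns as timestamps using the same week-start rule, and stacks the columns into a `time` index level. It stops before the `xarray` conversion.

```python
import pandas as pd

# Toy wide table: one row per (model, facility), one column per week
wide = pd.DataFrame(
    {"Wk 1 2015": [1.0, 0.0], "Wk 2 2015": [2.0, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [("EC-Earth3", "Facility A"), ("EC-Earth3", "Facility B")],
        names=["model", "Hydropower facility"],
    ),
)


def week_label_to_timestamp(label):
    """Convert 'Wk X YYYY' to the start of that week (week 1 starts January 1)."""
    _, week, year = label.split()
    return pd.Timestamp(year=int(year), month=1, day=1) + pd.Timedelta(weeks=int(week) - 1)


# Relabel columns as timestamps, then move them into the row index
wide.columns = [week_label_to_timestamp(c) for c in wide.columns]
long_series = wide.stack()
long_series.index.names = ["model", "Hydropower facility", "time"]

print(long_series)
```

Under this rule week 53 of a non-leap year falls on December 31, so every week label stays within its own calendar year.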

In [None]:
# Check for facilities with all zeros or minimal generation
# Sum across time for each facility and model
total_generation = ds.sum(dim='time')

# Find facilities with zero or near-zero generation across all models
zero_facilities = []
for facility in ds.hydropower.values:
    facility_data = total_generation.sel(hydropower=facility)
    if (facility_data == 0).all():
        zero_facilities.append(facility)

print(f"Facilities with all zeros across all models: {len(zero_facilities)}")
print(f"\nList of zero-generation facilities:")
for f in zero_facilities:
    print(f"  - {f}")

# Show some statistics
print(f"\nTotal facilities: {len(ds.hydropower)}")
print(f"Facilities with data: {len(ds.hydropower) - len(zero_facilities)}")
print(f"\nPercentage with zero generation: {100 * len(zero_facilities) / len(ds.hydropower):.1f}%")

In [None]:
# Show example of facilities with actual generation data
active_facilities = [f for f in ds.hydropower.values if f not in zero_facilities]

print("Example active facilities with generation data:")
for i, facility in enumerate(active_facilities[:5]):
    facility_total = ds.sel(hydropower=facility, model="EC-Earth3").sum().values
    print(f"  {facility}: {facility_total:.2f} GWh total (2015-2060)")

# Show a sample time series for one active facility
if len(active_facilities) > 0:
    sample_facility = active_facilities[0]
    sample_data = ds.sel(hydropower=sample_facility, model="EC-Earth3")
    print(f"\n{sample_facility} - First 10 weeks (EC-Earth3 model):")
    print(sample_data.isel(time=slice(0, 10)).values)

### Total Hydropower Generation Over Time

Let's visualize the total hydropower generation across all active facilities, showing the median across climate models:

In [None]:
# Filter to only active facilities (exclude zero-generation facilities)
ds_active = ds.sel(hydropower=[f for f in ds.hydropower.values if f not in zero_facilities])

# Sum across all facilities to get total generation per week
total_gen = ds_active.sum(dim='hydropower')

# Calculate median across models
median_gen = total_gen.median(dim='model')

# Resample to monthly totals for smoother visualization
# ('MS' labels each month by its start date; the 'M' alias is deprecated in recent pandas)
monthly_median = median_gen.resample(time='MS').sum()

# Create the plot
fig, ax = plt.subplots(figsize=(14, 6))
monthly_median.plot(ax=ax, linewidth=2, color='steelblue')
ax.set_title('Total Hydropower Generation Over Time (Median Across Climate Models)', fontsize=14)
ax.set_ylabel('Generation (GWh/month)', fontsize=12)
ax.set_xlabel('Date', fontsize=12)
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Print some summary statistics
print(f"Time period: {monthly_median.time.values[0]} to {monthly_median.time.values[-1]}")
print(f"Average monthly generation: {monthly_median.mean().values:.2f} GWh")
print(f"Peak monthly generation: {monthly_median.max().values:.2f} GWh")
print(f"Minimum monthly generation: {monthly_median.min().values:.2f} GWh")

## Summary

This notebook demonstrated how to:
- Explore available renewables data options using the new core API
- Retrieve capacity factor and generation data for different installation types
- Apply spatial and temporal subsetting with processors
- Visualize spatial patterns and temporal trends
- Compare renewable energy potential across multiple locations
- Load and summarize hydropower generation projections from S3

### Additional Resources
- **ClimakitAE Documentation**: [https://climakitae.readthedocs.io/](https://climakitae.readthedocs.io/)
- **Renewables Data Guide**: [https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf](https://wfclimres.s3.amazonaws.com/era/data-guide_pv-wind.pdf)
