
Feature: support efficient per-cell capacity factor time series extraction #480

@MaykThewessen


Summary

When generating capacity factor profiles for many individual grid cells (e.g., per-bus profiles for a power system model), the current recommended workflow involves creating a separate single-pixel Cutout for each location. This results in O(N) file I/O operations and O(N) calls to convert_wind/convert_pv, which is very slow for hundreds of locations.

We found that calling convert_wind and convert_pv directly on a multi-cell dataset returns per-cell DataArrays with time, y and x dimensions, enabling fully vectorized computation. For our use case (345 bus locations mapping to 152 unique grid pixels) this gives a ~50x speedup: 85 seconds for 3 weather years instead of ~60 minutes.
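A minimal NumPy sketch of why whole-grid conversion pays off, assuming a simplified conversion that only applies a power curve to wind speeds already at hub height (atlite's convert_wind also extrapolates wind speed to hub height, which is omitted here). The curve values are illustrative placeholders, not Vestas V112 data. A single np.interp call over the (time, y, x) array gives the same result as a cell-by-cell loop, without the per-cell Python overhead:

```python
import numpy as np

# Illustrative power curve: wind speed (m/s) -> output (per unit of rated power).
# These numbers are placeholders, NOT real turbine data.
CURVE_V = np.array([0.0, 3.0, 6.0, 9.0, 12.0, 25.0, 25.01, 40.0])
CURVE_P = np.array([0.0, 0.0, 0.25, 0.7, 1.0, 1.0, 0.0, 0.0])


def cf_per_cell_loop(wind: np.ndarray) -> np.ndarray:
    """Apply the power curve one grid cell at a time, like N single-pixel conversions."""
    out = np.empty_like(wind)
    _, ny, nx = wind.shape
    for j in range(ny):
        for i in range(nx):
            out[:, j, i] = np.interp(wind[:, j, i], CURVE_V, CURVE_P)
    return out


def cf_vectorized(wind: np.ndarray) -> np.ndarray:
    """Apply the power curve to the whole (time, y, x) array in a single call."""
    return np.interp(wind, CURVE_V, CURVE_P)


# Synthetic hub-height wind speeds with the (8760, 13, 16) shape quoted in this issue.
rng = np.random.default_rng(0)
wind = rng.weibull(2.0, size=(8760, 13, 16)) * 8.0
cf = cf_vectorized(wind)
assert np.allclose(cf, cf_per_cell_loop(wind))
```

This only shows the compute side; in the real workflow the larger saving comes from opening the cutout file once instead of once per pixel.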

Current workaround

```python
import xarray as xr
from atlite.convert import convert_wind, convert_pv, get_windturbineconfig, get_solarpanelconfig, get_orientation

# cutout_path and the x_min/x_max/y_min/y_max bounds are placeholders for your own values.

# 1. Open cutout and select sub-region covering all locations
ds = xr.open_dataset(cutout_path, chunks="auto")
sub = ds.sel(x=slice(x_min, x_max), y=slice(y_min, y_max)).compute()

# 2. Vectorized conversion: ALL grid cells at once
turbine = get_windturbineconfig("Vestas_V112_3MW")
wind_cf = convert_wind(sub, turbine, interpolation_method="logarithmic")
# → DataArray with dims (time, y, x), shape e.g. (8760, 13, 16)

panel = get_solarpanelconfig("CSi")
orientation = get_orientation("latitude_optimal")
solar_cf = convert_pv(sub, panel, orientation, tracking=None)
# → DataArray with dims (y, time, x); note the different order from wind_cf

# 3. Extract per-location profiles via coordinate lookup
meeden_wind = wind_cf.sel(x=6.9, y=53.1, method="nearest").values  # shape (8760,)
```

This works, but it relies on internal functions (convert_wind, convert_pv) that are not part of the documented public API.
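Step 1 above leaves the sub-region bounds (x_min, x_max, y_min, y_max) undefined. A minimal sketch of one way to derive them from the bus coordinates, assuming a regular 0.25° grid whose cell centres sit on multiples of 0.25°: take the bounding box of all locations padded by one cell, so nearest-cell lookups at the edges stay inside the sub-region, and snap each location to its nearest cell centre to count how many unique pixels are actually needed (presumably how 345 buses reduce to 152 pixels). The helper names padded_bounds and snap_to_grid are hypothetical.

```python
import numpy as np

# Assumed regular grid spacing in degrees (0.25° is the usual ERA5 resolution).
RES = 0.25


def padded_bounds(xs, ys, res=RES, pad_cells=1):
    """Bounding box (x_min, x_max, y_min, y_max) of all locations, padded by whole cells."""
    pad = pad_cells * res
    return min(xs) - pad, max(xs) + pad, min(ys) - pad, max(ys) + pad


def snap_to_grid(values, res=RES):
    """Round coordinates to the nearest cell centre, assuming centres at multiples of res.

    np.round rounds exact ties (a point midway between two centres) to the even multiple.
    """
    return np.round(np.asarray(values, dtype=float) / res) * res


# Example bus coordinates (longitude x, latitude y); the second bus is hypothetical.
xs = [6.9, 6.95, 3.0]
ys = [53.1, 53.05, 51.7]

x_min, x_max, y_min, y_max = padded_bounds(xs, ys)
unique_pixels = set(zip(snap_to_grid(xs).tolist(), snap_to_grid(ys).tolist()))
print(len(unique_pixels))  # 2: the first two buses share the cell at (7.0, 53.0)
```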

Proposal

I see two possible improvements:

Option A: Document the capacity_factor_timeseries=True workflow for multi-cell cutouts

Looking at convert_and_aggregate, when capacity_factor_timeseries=True is used without layout, shapes or matrix, it already returns the per-cell DataArray directly (the no_args branch at lines 138-142 of atlite/convert.py). This could be documented as the recommended approach for per-cell profile extraction:

```python
import atlite

cutout = atlite.Cutout(path="sub_region.nc")
# Returns per-cell (time, y, x) DataArray when no layout/shapes/matrix given
wind_cf = cutout.wind(turbine="Vestas_V112_3MW", capacity_factor_timeseries=True)
solar_cf = cutout.pv(panel="CSi", orientation="latitude_optimal", capacity_factor_timeseries=True)

# Extract individual locations
meeden = wind_cf.sel(x=6.9, y=53.1, method="nearest")
```

Option B: Add a convenience method for multi-point extraction

Add a new Cutout.profiles_at_points() method that takes a DataFrame of location coordinates (x = longitude, y = latitude) and returns one time series per location:

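```python
import pandas as pd

locations = pd.DataFrame({
    "name": ["Meeden", "Borssele"],
    "x": [6.9, 3.0],    # longitude
    "y": [53.1, 51.7],  # latitude
})
# Proposed API: profiles_at_points does not exist in atlite yet
profiles = cutout.profiles_at_points(locations, turbine="Vestas_V112_3MW")
# → DataFrame with columns ["Meeden", "Borssele"] and DatetimeIndex
```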
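Until such a method exists, the proposed behaviour can be approximated on top of an already-computed per-cell capacity-factor DataArray such as wind_cf from the workaround or from Option A. The following is a minimal sketch of a hypothetical free function (not an atlite API), assuming monotonic x/y coordinates. It uses xarray's vectorized point-wise indexing so all locations are extracted in one sel call instead of one call per bus, and it transposes internally so the (y, time, x) ordering reported for convert_pv does not matter.

```python
import numpy as np
import pandas as pd
import xarray as xr


def profiles_at_points(cf: xr.DataArray, locations: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not an atlite API): nearest-cell series per location.

    cf: per-cell capacity factors with dims "time", "y" and "x" in any order.
    locations: columns "name", "x" (longitude) and "y" (latitude).
    """
    # Indexers that share the new "location" dim select point-wise, not as an outer product.
    xi = xr.DataArray(locations["x"].to_numpy(), dims="location")
    yi = xr.DataArray(locations["y"].to_numpy(), dims="location")
    points = cf.sel(x=xi, y=yi, method="nearest")  # dims: time and location
    df = points.transpose("time", "location").to_pandas()
    df.columns = locations["name"].to_list()
    return df


# Synthetic 4-step, 3x3 capacity-factor grid with ascending coordinates.
cf = xr.DataArray(
    np.arange(36, dtype=float).reshape(4, 3, 3),
    coords={
        "time": pd.date_range("2019-01-01", periods=4, freq="D"),
        "y": [51.5, 52.5, 53.0],
        "x": [3.0, 5.0, 7.0],
    },
    dims=("time", "y", "x"),
)
locations = pd.DataFrame({"name": ["Meeden", "Borssele"], "x": [6.9, 3.0], "y": [53.1, 51.7]})
profiles = profiles_at_points(cf, locations)
print(profiles["Meeden"].tolist())  # [8.0, 17.0, 26.0, 35.0]
```

Giving both indexers the same new location dimension is what makes the selection point-wise; passing plain lists for x and y would instead return the full outer product of every requested longitude with every requested latitude.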

Use case

This is common in power system modeling where each bus/substation needs location-specific renewable capacity factors. In PyPSA-based models, generators at different buses should have distinct wind/solar profiles reflecting local weather conditions, not a single country-level average.

Performance comparison

For our model (345 bus locations across 8 European countries, SARAH3+ERA5 cutouts):

| Approach | Time per weather year | Method |
| --- | --- | --- |
| Per-pixel Cutout (current) | ~20-30 min | 152 × Cutout + cutout.wind() |
| Vectorized convert_wind/convert_pv | ~28 s | 8 country sub-regions, in parallel |

The vectorized approach is ~50x faster because it eliminates per-pixel file I/O and applies NumPy's vectorized operations to the entire grid at once.
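As a rough cross-check of the ~50x figure, using the per-year timings from the table (20-30 min vs. ~28 s):

```math
\frac{20 \times 60\ \text{s}}{28\ \text{s}} \approx 43,
\qquad
\frac{30 \times 60\ \text{s}}{28\ \text{s}} \approx 64,
\qquad
3 \times 28\ \text{s} = 84\ \text{s} \approx 85\ \text{s (3 weather years)}
```

so ~50x sits inside the 43x to 64x range implied by the table.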
