Summary
When generating capacity factor profiles for many individual grid cells (e.g., per-bus profiles for a power system model), the current recommended workflow involves creating a separate single-pixel Cutout for each location. This results in O(N) file I/O operations and O(N) calls to convert_wind/convert_pv, which is very slow for hundreds of locations.
We found that calling convert_wind and convert_pv directly on a multi-cell dataset returns per-cell DataArrays with time, y and x dimensions, which enables fully vectorized computation. For our use case (345 bus locations mapping to 152 unique grid pixels) this gave a ~50x speedup: 85 seconds for 3 weather years instead of ~60 minutes.
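The reduction from bus locations to unique grid pixels can be sketched roughly as follows. This is a minimal illustration and not atlite code: it assumes a regular grid, and the grid spacing, the bus coordinates and the helper name `nearest_cell_indices` are all made up for the example.

```python
import numpy as np


def nearest_cell_indices(grid_x, grid_y, lon, lat):
    """Return (iy, ix) index arrays of the nearest grid cell for each point."""
    grid_x = np.asarray(grid_x, dtype=float)
    grid_y = np.asarray(grid_y, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Distance from every point to every grid coordinate, then the closest one.
    ix = np.abs(grid_x[None, :] - lon[:, None]).argmin(axis=1)
    iy = np.abs(grid_y[None, :] - lat[:, None]).argmin(axis=1)
    return iy, ix


# Hypothetical 0.25-degree grid and four bus locations (illustrative values only).
grid_x = np.arange(3.0, 7.25, 0.25)
grid_y = np.arange(51.0, 54.25, 0.25)
bus_lon = np.array([6.90, 6.95, 3.00, 3.60])
bus_lat = np.array([53.10, 53.05, 51.70, 51.45])

iy, ix = nearest_cell_indices(grid_x, grid_y, bus_lon, bus_lat)
cells = np.stack([iy, ix], axis=1)

# Each unique (iy, ix) pair needs its profile computed only once;
# bus_to_cell maps every bus back to its row in unique_cells.
unique_cells, bus_to_cell = np.unique(cells, axis=0, return_inverse=True)
bus_to_cell = bus_to_cell.ravel()

print(len(bus_lon), "buses ->", len(unique_cells), "unique grid cells")
```

Buses that fall into the same pixel share one profile, so the conversion only has to cover the unique cells.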
Current workaround
import xarray as xr
from atlite.convert import convert_wind, convert_pv, get_windturbineconfig, get_solarpanelconfig, get_orientation
# 1. Open cutout and select sub-region covering all locations
ds = xr.open_dataset(cutout_path, chunks="auto")
sub = ds.sel(x=slice(x_min, x_max), y=slice(y_min, y_max)).compute()
# 2. Vectorized conversion — ALL grid cells at once
turbine = get_windturbineconfig("Vestas_V112_3MW")
wind_cf = convert_wind(sub, turbine, interpolation_method="logarithmic")
# → DataArray with dims (time, y, x), shape e.g. (8760, 13, 16)
panel = get_solarpanelconfig("CSi")
orientation = get_orientation("latitude_optimal")
solar_cf = convert_pv(sub, panel, orientation, tracking=None)
# → DataArray with dims (y, time, x)
# 3. Extract per-location profiles via coordinate lookup
meeden_wind = wind_cf.sel(x=6.9, y=53.1, method="nearest").values  # shape (8760,)
This works but relies on internal functions (convert_wind, convert_pv) that are not part of the documented public API.
Proposal
I see two possible improvements:
Option A: Document the capacity_factor_timeseries=True workflow for multi-cell cutouts
Looking at convert_and_aggregate, when capacity_factor_timeseries=True is used without layout/shapes/matrix, it already returns the per-cell DataArray directly (the no_args path at lines 138-142). This could be documented as the recommended approach for per-cell profile extraction:
cutout = atlite.Cutout(path="sub_region.nc")
# Returns per-cell (time, y, x) DataArray when no layout/shapes/matrix given
wind_cf = cutout.wind(turbine="Vestas_V112_3MW", capacity_factor_timeseries=True)
solar_cf = cutout.pv(panel="CSi", orientation="latitude_optimal", capacity_factor_timeseries=True)
# Extract individual locations
meeden = wind_cf.sel(x=6.9, y=53.1, method="nearest")
Option B: Add a convenience method for multi-point extraction
A new Cutout.profiles_at_points() method that takes a DataFrame of point coordinates (x = longitude, y = latitude) and returns per-location time series:
locations = pd.DataFrame({
"name": ["Meeden", "Borssele"],
"x": [6.9, 3.0],
"y": [53.1, 51.7],
})
profiles = cutout.profiles_at_points(locations, turbine="Vestas_V112_3MW")
# → DataFrame with columns ["Meeden", "Borssele"] and DatetimeIndex
Use case
This is common in power system modeling where each bus/substation needs location-specific renewable capacity factors. In PyPSA-based models, generators at different buses should have distinct wind/solar profiles reflecting local weather conditions, not a single country-level average.
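A rough sketch of what Option B could look like, written as a standalone function rather than a Cutout method. The name `profiles_at_points` comes from the proposal; the signature, the `location` dimension name and the synthetic capacity factor array are assumptions for illustration only. It relies on xarray's pointwise indexing (indexers that share one dimension), so all locations are selected in a single call.

```python
import numpy as np
import pandas as pd
import xarray as xr


def profiles_at_points(cf, locations):
    """Hypothetical helper: nearest-cell time series for each row of `locations`.

    cf        -- DataArray with dims time, y, x (any order)
    locations -- DataFrame with columns "name", "x", "y"
    """
    names = locations["name"].to_numpy()
    # Indexers sharing the "location" dim trigger pointwise (not outer) selection.
    xs = xr.DataArray(locations["x"].to_numpy(), dims="location", coords={"location": names})
    ys = xr.DataArray(locations["y"].to_numpy(), dims="location", coords={"location": names})
    picked = cf.sel(x=xs, y=ys, method="nearest")
    return picked.transpose("time", "location").to_pandas()


# Synthetic stand-in for a per-cell capacity factor array (not real weather data).
time = pd.date_range("2023-01-01", periods=24, freq="h")
x = np.arange(3.0, 7.25, 0.25)
y = np.arange(51.0, 54.25, 0.25)
rng = np.random.default_rng(0)
cf = xr.DataArray(
    rng.uniform(0.0, 1.0, size=(len(time), len(y), len(x))),
    dims=("time", "y", "x"),
    coords={"time": time, "y": y, "x": x},
)

locations = pd.DataFrame({
    "name": ["Meeden", "Borssele"],
    "x": [6.9, 3.0],
    "y": [53.1, 51.7],
})
profiles = profiles_at_points(cf, locations)
```

Because the function only needs a DataArray, it would work on the output of either the internal convert functions or the capacity_factor_timeseries=True path, whichever ends up being the supported route.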
Performance comparison
For our model (345 bus locations across 8 European countries, SARAH3+ERA5 cutouts):
| Approach | Time per year | Method |
|---|---|---|
| Per-pixel Cutout (current) | ~20-30 min | 152 × Cutout + cutout.wind() |
| Vectorized convert_wind/convert_pv | ~28 seconds | 8 country sub-regions, parallel |
The vectorized approach is ~50x faster because it eliminates per-pixel file I/O and leverages numpy's vectorized operations across the entire grid.
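As a quick sanity check on the ~50x figure, the speedup can be recomputed from the timings quoted in this issue alone. The snippet uses only those approximate numbers and is not a new measurement.

```python
# Timings quoted above, converted to seconds (all approximate).
per_pixel_per_year_s = (20 * 60, 30 * 60)   # ~20-30 min per weather year
vectorized_per_year_s = 28                  # ~28 seconds per weather year

speedup_low = per_pixel_per_year_s[0] / vectorized_per_year_s
speedup_high = per_pixel_per_year_s[1] / vectorized_per_year_s

# Three weather years: ~60 minutes vs 85 seconds.
speedup_three_years = (60 * 60) / 85

print(f"per-year speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"three-year speedup: {speedup_three_years:.0f}x")
```

The per-year range brackets the ~50x estimate, and the three-year total comes out slightly below it.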