# Validation and Benchmarking of European Renewable Generation Dataset

## 1. Introduction
### Motivation and Context
High-resolution hourly renewable generation time series are required for energy system modeling, including adequacy assessment, storage and flexibility sizing, transmission expansion, and evaluation of weather-driven risk.

This notebook implements the validation and benchmarking of the dataset defined in the accompanying article. It covers:
1. **Sensitivity Analyses**: Quantifying the robustness of the dataset to key modeling assumptions and heuristics.
2. **Validation Results**: Comparing the calibrated historical generation dataset (2015–2025) against reported generation (ENTSO-E Transparency Platform).
3. **Weather-driven Variability**: Analyzing the fixed-layout dataset (2005–2025) to isolate weather effects from capacity expansion.

### Contributions
- Validate country-level aggregated wind and solar PV generation.
- Validate bidding-zone level generation for Norway, Sweden, and Denmark.
- Assess sensitivity to turbine model choice, spatial smoothing, and PV configuration assumptions.

In [None]:
import sys
import os
from pathlib import Path
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import importlib
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Add parent directory to path to import functions
sys.path.append(os.path.abspath('..'))
import functions

sns.set_style("darkgrid")
%matplotlib inline

# Configure plot fonts
plt.rcParams.update({'font.size': 12})


## 2. Input Data
### 2.1 Load Generated Datasets (2015-2025)
We load the aggregated wind and solar power generation datasets. These files are produced by the generation scripts (`solar_production_aggregated_generation.py`, etc.).

In [None]:
# Path to a generated NetCDF file
nc_path = Path("/Data/gfi/vindenergi/nab015/combined_production/2023/07/07_2023_solar_wind_combined.nc")  # Example path

# Check if files exist before loading
if nc_path.exists():
    ds = xr.open_dataset(nc_path)
    print("Dataset loaded.")
else:
    print(f"Dataset file not found at {nc_path}")


### 2.2 Load Reported Generation (Validation Data)
We load the actual generation data from ENTSO-E or national TSOs for validation.

In [None]:
actual_gen_path = Path("/Data/gfi/vindenergi/nab015/Actual_Generation/AggregatedGenerationPerType/2023/2023_08_AggregatedGenerationPerType_16.1.B_C_r3.csv")

if actual_gen_path.exists():
    df_actual = pd.read_csv(actual_gen_path, sep='\t')
    # Preprocessing to align timestamps and time zones
    # (Implementation details depend on the specific format columns)
    print("Actual generation data loaded.")
else:
    print(f"Actual generation file not found at {actual_gen_path}")
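
The timestamp alignment mentioned in the cell above could look roughly like the following sketch. The column names (`DateTime`, `AreaCode`, `ProductionType`, `ActualGenerationOutput`) are assumptions about the file layout and should be checked against the actual CSV header; the sample frame is synthetic.

```python
import pandas as pd

def to_hourly_utc(df, time_col="DateTime", value_col="ActualGenerationOutput",
                  group_cols=("AreaCode", "ProductionType")):
    """Parse timestamps as UTC and average sub-hourly records to hourly means.

    All column names are assumptions for illustration; adapt them to the file.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col], utc=True)
    # Floor to the start of the hour so 15- and 30-minute records share one bin
    out["hour"] = out[time_col].dt.floor("60min")
    return (
        out.groupby(list(group_cols) + ["hour"])[value_col]
        .mean()
        .reset_index()
    )

# Synthetic 15-minute example (values in MW)
sample = pd.DataFrame({
    "DateTime": pd.date_range("2023-08-01 00:00", periods=8, freq="15min"),
    "AreaCode": "NO1",
    "ProductionType": "Wind Onshore",
    "ActualGenerationOutput": [100, 110, 120, 130, 200, 210, 220, 230],
})
hourly_sample = to_hourly_utc(sample)
print(hourly_sample)
```

Averaging (rather than summing) keeps the values in MW, which matches hourly mean power in the simulated dataset.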

## 3. Sensitivity Analyses
This section quantifies the robustness of the dataset to key modeling and heuristic assumptions.

### 3.1 Sensitivity of Wind Generation to Turbine Model Choice
We evaluate how wind generation changes under alternative turbine model mappings:
1. **Baseline**: Year/type-based mapping (`map_turbine_model`).
2. **Low Cut-in**: Using models with lower cut-in speeds.
3. **High Cut-in**: Using models with higher cut-in speeds.
4. **Fixed Reference**: Using a single reference turbine per class.

In [None]:
def test_wind_sensitivity(lat, lon, start_year, installation_type, xrds, scenario='baseline'):
    """
    Wrapper to estimate wind power with different turbine model assumptions.
    This mocks the behavior of changing the map_turbine_model function.

    For now it only returns the selected turbine model name; the power
    estimate itself is not implemented yet.
    """
    # Mock turbine mapping based on scenario
    if scenario == 'baseline':
        model = functions.map_turbine_model(start_year, installation_type)
    elif scenario == 'low_cut_in':
        model = "VestasV47_660kW_47"  # Example of older/lower-wind model
    elif scenario == 'high_cut_in':
        model = "IEA_Reference_15MW_240"  # Example of large offshore (typically higher rated wind speed)
    else:
        # 'fixed_reference' and any other scenario are not implemented yet
        raise ValueError(f"Unknown scenario: {scenario!r}")

    # Here we would call a modified version of estimate_wind_power that accepts 'model_override'
    # For this outline, we assume estimate_wind_power is used as is, or we'd duplicate logic here.
    return model

# Placeholder loop for running sensitivity
scenarios = ['baseline', 'low_cut_in', 'high_cut_in']
results_wind = {}
# for scen in scenarios:
#    results_wind[scen] = ...

### 3.2 Sensitivity to Farm-Aggregation Smoothing
We test the impact of the Gaussian smoothing width $\sigma(w)$ on ramp rates and spikiness.

In [None]:
# Code to vary the 'wts_smoothing' or the internal Gaussian spread in generate_farm_power_curve
# Compare ramp rate distributions (P(t) - P(t-1))
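
One way to prototype this comparison is to convolve a single-turbine power curve with a Gaussian kernel and look at the resulting hourly ramps. The sketch below assumes a constant smoothing width (the dataset's $\sigma(w)$ may vary with wind speed), an idealized cubic power curve, and a synthetic wind series; none of these come from `generate_farm_power_curve`.

```python
import numpy as np

def single_turbine_curve(ws, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Idealized normalized power curve (hypothetical parameters)."""
    ws = np.asarray(ws, dtype=float)
    power = np.clip((ws**3 - cut_in**3) / (rated**3 - cut_in**3), 0.0, 1.0)
    power[ws >= cut_out] = 0.0
    return power

def smooth_power_curve(ws_grid, power, sigma):
    """Convolve a power curve with a Gaussian kernel of width sigma (m/s)."""
    if sigma <= 0:
        return power.copy()
    step = ws_grid[1] - ws_grid[0]
    half = int(np.ceil(4 * sigma / step))           # truncate kernel at about 4 sigma
    offsets = np.arange(-half, half + 1) * step
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                          # preserve the curve's scale
    padded = np.pad(power, half, mode="edge")       # avoid edge artifacts
    return np.convolve(padded, kernel, mode="valid")

ws_grid = np.linspace(0.0, 30.0, 301)               # 0.1 m/s resolution
raw_curve = single_turbine_curve(ws_grid)

rng = np.random.default_rng(1)
wind = np.abs(8.0 + 4.0 * np.sin(np.linspace(0.0, 40.0 * np.pi, 5000))
              + rng.normal(0.0, 1.0, 5000))         # synthetic wind speeds, m/s

for sigma in (0.0, 0.5, 1.5):
    curve = smooth_power_curve(ws_grid, raw_curve, sigma)
    power = np.interp(wind, ws_grid, curve)
    ramps = np.diff(power)                          # P(t) - P(t-1)
    p99 = np.percentile(np.abs(ramps), 99)
    print(f"sigma={sigma:.1f} m/s  p99 |ramp| = {p99:.3f}")
```

Because the kernel is non-negative and normalized, the smoothed curve can never be steeper than the raw one, which is the mechanism by which aggregation smoothing damps ramps.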

### 3.3 Sensitivity of PV Generation to Configuration Assumptions
- Fixed-tilt rule ($|\phi|-5$) vs alternatives.
- Tracking heuristic thresholds.
- Twilight zenith limit.

In [None]:
# Example: Compare Fixed vs Tracking for a specific farm
lat, lon = 40.4168, -3.7038 # Spain example
# call functions.estimate_power_final with mounting_type='fixed'
# call functions.estimate_power_final with mounting_type='single_axis'
# Plot the diurnal profiles in Summer vs Winter
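
The fixed-tilt rule and the twilight cut-off listed above can be written as small helpers for quick what-if comparisons. This is a sketch only: clipping the tilt at zero near the equator and the strict `<` comparison for the zenith limit are assumptions, and the zenith limit is left as a required argument because its value is not stated here.

```python
import numpy as np

def fixed_tilt_deg(latitude_deg, offset_deg=5.0):
    """Tilt from the |phi| - offset rule, clipped at 0 degrees (clipping is an assumption)."""
    lat = np.asarray(latitude_deg, dtype=float)
    return np.maximum(np.abs(lat) - offset_deg, 0.0)

def daylight_mask(zenith_deg, zenith_limit_deg):
    """True where the solar zenith angle is below the chosen twilight limit."""
    return np.asarray(zenith_deg, dtype=float) < zenith_limit_deg

lat = 40.4168  # Spain example
print("Rule |phi| - 5:", fixed_tilt_deg(lat))
print("Alternative, tilt = |phi|:", fixed_tilt_deg(lat, offset_deg=0.0))

# Illustrative zenith angles (degrees) and an illustrative limit of 90 degrees
print(daylight_mask([30.0, 85.0, 92.0, 120.0], zenith_limit_deg=90.0))
```

Changing `offset_deg` or `zenith_limit_deg` and re-running the PV estimate gives one sensitivity case per setting.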

### 3.4 Sensitivity to Calibration Settings
Testing robustness of correction factors (stable-band width, minimum power threshold).

## 4. Validation Results (2015-2025)
### 4.1 Evaluation Metrics
We calculate the bias, MAE, RMSE, and $R^2$ (computed here as the squared Pearson correlation) for each country and bidding zone.

In [None]:
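def calculate_metrics(simulated_ts, actual_ts):
    """Return bias, MAE, RMSE and squared Pearson correlation on the shared timestamps."""
    common_idx = simulated_ts.index.intersection(actual_ts.index)
    if len(common_idx) < 2:
        # pearsonr needs at least two paired samples
        return None

    sim = simulated_ts.loc[common_idx]
    act = actual_ts.loc[common_idx]

    bias = (sim - act).mean()  # positive values mean the simulation overestimates
    mae = mean_absolute_error(act, sim)
    rmse = np.sqrt(mean_squared_error(act, sim))
    corr, _ = pearsonr(act, sim)

    # 'R2' is the squared Pearson correlation, not the coefficient of determination
    return {'Bias': bias, 'MAE': mae, 'RMSE': rmse, 'R2': corr**2}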

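# Apply to all countries
metrics_list = []
# for country in country_list:
#     # calculate_metrics expects pandas Series; this assumes ds_wind is an xarray DataArray
#     sim = ds_wind.sel(country=country).to_series()
#     metrics = calculate_metrics(sim, df_actual[country])
#     if metrics is not None:
#         metrics_list.append({'country': country, **metrics})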
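
As a cross-check that does not depend on scikit-learn, the same metrics can be computed with NumPy and pandas alone. The sketch below is a hypothetical variant (`metrics_numpy` is not part of `functions`) that also reports the coefficient of determination next to the squared Pearson correlation, since the two differ whenever the simulation is biased.

```python
import numpy as np
import pandas as pd

def metrics_numpy(simulated_ts, actual_ts):
    """Bias, MAE, RMSE, squared Pearson correlation and coefficient of determination."""
    paired = pd.concat(
        [simulated_ts, actual_ts], axis=1, keys=["sim", "act"], join="inner"
    ).dropna()
    if len(paired) < 2:
        return None
    err = paired["sim"] - paired["act"]
    corr = np.corrcoef(paired["act"], paired["sim"])[0, 1]
    ss_res = float((err**2).sum())
    ss_tot = float(((paired["act"] - paired["act"].mean()) ** 2).sum())
    return {
        "Bias": float(err.mean()),
        "MAE": float(err.abs().mean()),
        "RMSE": float(np.sqrt((err**2).mean())),
        "R2_corr": float(corr**2),          # insensitive to constant offsets
        "R2_det": 1.0 - ss_res / ss_tot,    # penalizes bias
    }

# Synthetic example: simulation equals the reported series plus a constant 5 MW
idx = pd.date_range("2023-01-01", periods=4, freq="60min")
actual = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)
simulated = actual + 5.0
example = metrics_numpy(simulated, actual)
print(example)
```

In this example the squared correlation is 1 while the coefficient of determination is 0.8, so it is worth stating explicitly which definition a reported $R^2$ uses.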

### 4.2 Wind Generation Validation
Plots of aggregated performance, scatter plots, and seasonal performance.

In [None]:
# Figure: Time series comparison for a specific week
# plt.plot(simulated, label='Estimated')
# plt.plot(actual, label='Reported')

# Figure: Scatter plot of Hourly generation
# plt.scatter(actual, simulated, alpha=0.1)

### 4.3 Solar Generation Validation
Focus on high-latitude performance and tracking/fixed heuristics.

### 4.4 Hydropower Consistency Checks
Present completeness and descriptive statistics of the harmonized hydropower data.

In [None]:
# Load hydro data
# Check for missing values
# Plot seasonal hydro profiles (Reservoir vs Run-of-river)
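
A completeness check of this kind could be sketched as follows. It assumes an hourly series indexed by timestamp and counts both absent timestamps and explicit NaN values as missing; the helper name and the sample data are hypothetical.

```python
import pandas as pd

def completeness_report(series, freq="60min"):
    """Compare a time series against the full regular index spanning its range."""
    full_index = pd.date_range(series.index.min(), series.index.max(), freq=freq)
    reindexed = series.reindex(full_index)
    n_missing = int(reindexed.isna().sum())  # absent timestamps plus NaN values
    return {
        "expected": len(full_index),
        "missing": n_missing,
        "completeness": 1.0 - n_missing / len(full_index),
    }

# Synthetic example: 10 hours, two timestamps dropped and one value set to NaN
idx = pd.date_range("2023-01-01", periods=10, freq="60min")
hydro = pd.Series(range(10), index=idx, dtype=float)
hydro = hydro.drop(idx[[3, 4]])
hydro.iloc[5] = float("nan")

report = completeness_report(hydro)
print(report)
```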

### 4.5 Cross-technology and Spatial Correlation
Correlation matrices between wind and solar, and between neighboring countries.

In [None]:
# Compute correlation matrix of generation time series across countries
# sns.heatmap(corr_matrix)
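
The correlation matrix itself is a one-liner once the series share a common hourly index. The following sketch uses synthetic data with hypothetical column labels, where two wind series share a common signal and a third series is independent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
hours = pd.date_range("2023-01-01", periods=24 * 30, freq="60min")
shared = rng.normal(size=len(hours))  # stand-in for a shared weather signal

# Column labels are illustrative only
gen = pd.DataFrame(
    {
        "NO_wind": shared + 0.5 * rng.normal(size=len(hours)),
        "SE_wind": shared + 0.5 * rng.normal(size=len(hours)),
        "ES_solar": rng.normal(size=len(hours)),
    },
    index=hours,
)

corr_matrix = gen.corr()  # pairwise Pearson correlation, NaN-aware
print(corr_matrix.round(2))
```

The resulting `corr_matrix` can be passed directly to `sns.heatmap`. For solar series, correlating anomalies from the mean diurnal cycle rather than raw values may be more informative, since the shared day-night cycle otherwise dominates.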

## 5. Weather-driven Variability Results (2005-2025)
This section uses the fixed 2025 layout to isolate weather effects from capacity expansion.

In [None]:
# Load long-term 2005-2025 dataset (fixed layout)
# Calculate annual capacity factors
# Plot interannual variability
# plt.bar(years, annual_energy)
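
With a fixed layout, the installed capacity is constant, so the annual capacity factor reduces to the yearly mean of hourly generation divided by that capacity. The sketch below illustrates the calculation on synthetic data; the capacity value and yearly levels are hypothetical, and the coefficient of variation is one possible summary of interannual variability.

```python
import pandas as pd

def annual_capacity_factor(generation_mw, capacity_mw):
    """Yearly mean of hourly generation divided by a fixed installed capacity."""
    yearly_mean = generation_mw.groupby(generation_mw.index.year).mean()
    return yearly_mean / capacity_mw

# Synthetic hourly series: constant output within each year (MW)
hours = pd.date_range("2005-01-01", "2007-12-31 23:00", freq="60min")
levels = {2005: 30.0, 2006: 25.0, 2007: 35.0}
generation = pd.Series(hours.year, index=hours).map(levels)

capacity = 100.0  # MW, fixed layout
cf = annual_capacity_factor(generation, capacity)
cv = cf.std() / cf.mean()  # interannual coefficient of variation (sample std)

print(cf)
print(f"Interannual CV = {cv:.3f}")
```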

## 6. Conclusion