# Get Wind and Solar Data from IRENA

**Objective**
Summarize IRENA Model Supply Regions (MSR) outputs (capacity factors, LCOE, and hourly profiles) for the countries fed into the SPLAT supply model.

**Data Requirements & Methods**
- Place `SolarPV_BestMSRsToCover5%CountryArea.csv` and `Wind_BestMSRsToCover5%CountryArea.csv` into `pre-analysis/open-data/input/`.
- Provide the country list in SPLAT naming conventions and ensure the dependencies (`pandas`, `matplotlib`, `seaborn`, `pytz`, `geopy`, `timezonefinder`) are installed.
- The notebook validates inputs, computes weighted statistics by technology, aligns time zones, and prepares visualization-ready tables.

**Overview of Steps**
1. Import libraries and declare the study countries.
2. Confirm that the required MSR CSV inputs exist.
3. Process the MSR data to compute capacity-weighted LCOE and CF statistics and hourly profiles.
4. Visualize country-level diagnostics (heatmaps and monthly profiles).



In [6]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pytz

from utils_renewables import extract_time_zone

## Step 1 - Configure user settings
List the SPLAT country names you want to analyze and confirm the count before loading data.



In [7]:
countries = [
    'Bosnia and Herzegovina',
    'Croatia',
    'Serbia',
    'Montenegro',
    'North Macedonia',
    'Albania'
]
print(f'Number of countries: {len(countries)}')

Number of countries: 6


## Step 2 - Load MSR inputs
Verify that the Solar PV and Wind MSR CSV files exist in the `input/` directory before continuing.



In [8]:
file_solarMSR = os.path.join('input', 'SolarPV_BestMSRsToCover5%CountryArea.csv')
if not os.path.exists(file_solarMSR):
    raise FileNotFoundError(f"The file {file_solarMSR} does not exist. Please download the Solar PV MSR data and place it in the input folder.")
else:
    print(f"File {file_solarMSR} found. Proceeding with the analysis.")

File input/SolarPV_BestMSRsToCover5%CountryArea.csv found. Proceeding with the analysis.


In [9]:
file_windMSR = os.path.join('input', 'Wind_BestMSRsToCover5%CountryArea.csv')
if not os.path.exists(file_windMSR):
    raise FileNotFoundError(f"The file {file_windMSR} does not exist. Please download the Wind MSR data and place it in the input folder.")
else:
    print(f"File {file_windMSR} found. Proceeding with the analysis.")

File input/Wind_BestMSRsToCover5%CountryArea.csv found. Proceeding with the analysis.
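
The two existence checks above follow the same pattern, so they could be folded into one pass that reports every missing file at once. The sketch below is a minimal illustration; `find_missing_inputs` and `REQUIRED_FILES` are hypothetical names that do not appear in the notebook.

```python
from pathlib import Path

# Hypothetical constant: the two MSR inputs this notebook expects.
REQUIRED_FILES = [
    'SolarPV_BestMSRsToCover5%CountryArea.csv',
    'Wind_BestMSRsToCover5%CountryArea.csv',
]


def find_missing_inputs(input_dir, required=REQUIRED_FILES):
    """Return the names in `required` that do not exist under `input_dir`."""
    base = Path(input_dir)
    return [name for name in required if not (base / name).is_file()]


missing = find_missing_inputs('input')
if missing:
    print(f"Missing MSR inputs: {missing}")
else:
    print("All MSR inputs found.")
```

Reporting all missing files together avoids fixing one path only to hit the next `FileNotFoundError` on rerun.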


## Step 3 - Process MSR data
Derive time zones, compute weighted capacity-factor and LCOE statistics, and reshape hourly profiles for each technology.
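
As a small illustration of the capacity-weighted statistics mentioned above, the sketch below computes a weighted mean capacity factor per country on toy data. The values are made up for the example and are not IRENA outputs; `capacity_weighted_mean` is a hypothetical helper, not the function the notebook defines later.

```python
import pandas as pd

# Toy data: two clusters per country; values are illustrative only.
clusters = pd.DataFrame({
    'CtryName': ['A', 'A', 'B', 'B'],
    'CapacityMW': [100.0, 300.0, 50.0, 50.0],
    'CF': [20.0, 24.0, 18.0, 22.0],  # capacity factor in %
})


def capacity_weighted_mean(df, value_col, weight_col='CapacityMW', by='CtryName'):
    """Weighted mean of `value_col` per group, weighting each row by `weight_col`."""
    weighted = df[value_col] * df[weight_col]
    return weighted.groupby(df[by]).sum() / df[weight_col].groupby(df[by]).sum()


avg_cf = capacity_weighted_mean(clusters, 'CF')
print(avg_cf)
```

Large clusters pull the country average toward their own value, which is why country A ends up closer to 24 than to 20.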



In [10]:

# Preprocessing: get each country's time zone. IRENA data is expressed in local time,
# so timestamps must be converted to UTC to share a common time reference.
# Warning: this step may take a few dozen seconds.

# Map IRENA / SPLAT names to official country names (helps the geocoder or pycountry).
# Only countries whose SPLAT name differs from the official name need an entry.
name_map = {
    'SouthAfrica': 'South Africa',
    'DemocraticRepublicoftheCongo': 'Democratic Republic of the Congo',
    'UnitedRepublicofTanzania': 'Tanzania',
    'CentralAfricanRepublic': 'Central African Republic',
    'SaoTomeandPrincipe': 'Sao Tome and Principe'
    # The others are fine
}

country_timezones = extract_time_zone(countries, name_map)

In [11]:
# Cost parameters used in the MSR model.
# The model uses a discount rate of 10% and lifetimes of 25 years for generation and 40 years for transmission.
CAPEX_PARAMETERS = {
    'solar': {
        'supply_asset_capital_recovery': 0.1101681,
        'operating_costs': 4,
        'fixed_costs': 53500
    },
    'wind': {
        'supply_asset_capital_recovery': 0.1101681,
        'operating_costs': 0,
        'fixed_costs': 64200
    },
    'grid': {
        'supply_asset_capital_recovery': 0.102259,
    },
    'road': {
        'supply_asset_capital_recovery': 0.11017,
    }
}


def compute_weighted_stats(group, tech):
    """Capacity-weighted statistics across all relevant clusters of one country.

    The theoretical available capacity of each cluster ('CapacityMW') is the weight.
    Relies on the module-level `column_cf`, which is set in the processing loop.
    """
    weights = group['CapacityMW']
    avg_cf = (group[column_cf] * weights).sum() / weights.sum()  # capacity factor, in %
    avg_lcoe = (group['LCOE-MWh'] * weights).sum() / weights.sum()  # $ / MWh

    # Back out the cost per MW from the LCOE components (generation, transmission, road)
    # using the capital recovery rates of the MSR model.
    params = CAPEX_PARAMETERS[tech]
    annual_mwh_per_mw = 8760 * group[column_cf] / 100  # MWh per MW per year
    generation = ((group['sLCOE-MWh'] - params['operating_costs']) * annual_mwh_per_mw
                  - params['fixed_costs']) / params['supply_asset_capital_recovery']
    transmission = group['tLCOE-MWh'] * annual_mwh_per_mw / CAPEX_PARAMETERS['grid']['supply_asset_capital_recovery']
    road = group['rLCOE-MWh'] * annual_mwh_per_mw / CAPEX_PARAMETERS['road']['supply_asset_capital_recovery']
    cost_per_MW = ((generation + transmission + road) * weights).sum() / weights.sum() * 1e-6  # costs in m$ / MW
    
    return pd.Series({
        'avg_CF': avg_cf,
        'avg_LCOE': avg_lcoe,
        'cost_per_MW': cost_per_MW
    })


# Weighted hourly profile for each country
def weighted_hourly_profile(group):
    weights = group['CapacityMW'].values.reshape(-1, 1)
    hourly_data = group[hourly_cols].values
    weighted_avg = (hourly_data * weights).sum(axis=0) / weights.sum()
    return pd.Series(weighted_avg, index=hourly_cols)

def convert_to_utc(row, country_timezones):
    """
    Converts a local timestamp to UTC based on the country's time zone.

    This function is designed to be used within a pandas `.apply()` call
    to convert a 'timestamp' column (assumed local time) into UTC, using
    a dictionary that maps each country name to its IANA time zone string.

    Parameters
    ----------
    row : pd.Series
        A row from the DataFrame containing at least 'CtryName' and 'timestamp'.
    country_timezones : dict
        Dictionary mapping country names (as in 'CtryName') to their IANA time zone names,
        e.g., {'SouthAfrica': 'Africa/Johannesburg'}.

    Returns
    -------
    datetime
        The timestamp converted to UTC timezone.
    """
    ctry = row['CtryName']
    local_zone = pytz.timezone(country_timezones[ctry])
    local_time = local_zone.localize(row['timestamp'], is_dst=None)
    return local_time.astimezone(pytz.utc)
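
One behavior of `localize(..., is_dst=None)` is worth checking before running the conversion on a full year: in a zone that observes daylight saving time, a local timestamp inside the skipped spring hour does not exist, and `pytz` raises rather than guessing. The sketch below shows this for `Europe/Belgrade`, chosen here purely as an illustration; whether the notebook hits this case depends on the zones returned by `extract_time_zone`.

```python
from datetime import datetime

import pytz

zone = pytz.timezone('Europe/Belgrade')  # illustrative zone, an assumption for this sketch

# A winter timestamp converts cleanly (local time is one hour ahead of UTC).
winter_utc = zone.localize(datetime(2023, 1, 1, 0, 0), is_dst=None).astimezone(pytz.utc)
print(winter_utc)

# 02:00 on 26 March 2023 falls in the hour skipped by the spring-forward transition,
# so is_dst=None raises NonExistentTimeError instead of picking an offset.
try:
    zone.localize(datetime(2023, 3, 26, 2, 0), is_dst=None)
    skipped_hour_raised = False
except pytz.exceptions.NonExistentTimeError:
    skipped_hour_raised = True
print(skipped_hour_raised)
```

If the hourly profiles are meant to be in local standard time, converting with a fixed UTC offset per country would be one way to sidestep the issue; that choice depends on how IRENA defines the hourly columns, which this notebook does not state.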

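The capital recovery values in `CAPEX_PARAMETERS` appear consistent with the standard annuity formula evaluated at the stated 10% discount rate and lifetimes of 25 and 40 years. This is an observation from the numbers, not something the notebook states, and `capital_recovery_factor` below is a hypothetical helper written for the check.

```python
def capital_recovery_factor(rate, years):
    """Annuity factor r(1+r)^n / ((1+r)^n - 1) that spreads a capital cost over `years`."""
    growth = (1 + rate) ** years
    return rate * growth / (growth - 1)


crf_generation = capital_recovery_factor(0.10, 25)    # compare with 0.1101681
crf_transmission = capital_recovery_factor(0.10, 40)  # compare with 0.102259
print(crf_generation, crf_transmission)
```

In `compute_weighted_stats`, dividing an annualized cost by this factor converts it back into an upfront investment cost.
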

In [12]:
cf_lcoe_stats = {}
hourly_profiles = {}

for tech in ['wind', 'solar']:
    if tech == 'wind':
        file = file_windMSR
        column_cf = 'CF100m'
    else:
        file = file_solarMSR
        column_cf = 'CF'

    # Metadata columns plus the 8760 hourly profile columns (H1..H8760)
    meta_cols = ['CtryName', 'CapacityMW', column_cf, 'sLCOE-MWh', 'tLCOE-MWh', 'rLCOE-MWh', 'LCOE-MWh']
    hourly_cols = [f'H{i}' for i in range(1, 8761)]
    use_columns = meta_cols + hourly_cols

    data_MSR_stats = pd.read_csv(file, usecols=use_columns, header=0)
    # Keep only the study countries
    data_MSR_stats = data_MSR_stats[data_MSR_stats['CtryName'].isin(countries)]

    cf_lcoe_stats[tech] = data_MSR_stats.groupby('CtryName').apply(compute_weighted_stats, tech=tech).reset_index()

    data_MSR_hourlyprofile = data_MSR_stats.set_index(['CtryName', 'CapacityMW'])[hourly_cols].reset_index()
    hourly_profiles[tech] = data_MSR_hourlyprofile.groupby('CtryName').apply(weighted_hourly_profile).reset_index()

    # Saving data in good format for representative days analysis
    date_index = pd.date_range(start='2023-01-01', periods=8760, freq='h')  # 2023 is a non-leap year

    df_long = hourly_profiles[tech].melt(id_vars='CtryName', var_name='Hour', value_name='value')

    # Convert 'H1', ..., 'H8760' to integer hour index
    df_long['hour_index'] = df_long['Hour'].str.extract(r'H(\d+)', expand=False).astype(int) - 1  # zero-based index

    df_long['timestamp'] = df_long['hour_index'].map(lambda i: date_index[i])

    # Convert local time to UTC
    df_long['CtryName'] = df_long['CtryName'].astype(str)  # just in case
    df_long['timestamp_utc'] = df_long.apply(lambda row: convert_to_utc(row, country_timezones), axis=1)

    # Create season, day and hours
    df_long['season'] = df_long['timestamp_utc'].dt.month
    df_long['day'] = df_long['timestamp_utc'].dt.day
    df_long['hour'] = df_long['timestamp_utc'].dt.hour

    # Rename and index
    df_long = df_long.rename(columns={'CtryName': 'zone'})
    df_final = df_long.set_index(['zone', 'season', 'day', 'hour'])['value']
    df_final = df_final.to_frame().rename(columns={'value': 2018}).sort_values(by=['season', 'day', 'hour'])

    df_final.to_csv(os.path.join('output', f'data_SAPP_{tech}.csv'), index=True)

  cf_lcoe_stats[tech] = data_MSR_stats.groupby('CtryName').apply(compute_weighted_stats, tech=tech).reset_index()


ValueError: cannot insert CtryName, already exists
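
The traceback alone does not show the cause of this error. One thing worth ruling out, as an assumption rather than a confirmed diagnosis, is that none of the requested names match the `CtryName` values in the CSV, which would leave the filtered frame empty before the `groupby`. The entries in `name_map` suggest the database may write names without spaces. The sketch below uses a hypothetical helper, `unmatched_countries`, to list requested names that never appear.

```python
import pandas as pd


def unmatched_countries(requested, ctry_names):
    """Return the requested names that never appear in the `CtryName` values."""
    available = set(pd.Series(ctry_names).astype(str).unique())
    return [name for name in requested if name not in available]


# Toy example with hypothetical database names written without spaces.
toy_ctry_names = ['SouthAfrica', 'SouthAfrica', 'Lesotho']
print(unmatched_countries(['South Africa', 'Lesotho'], toy_ctry_names))
```

In the notebook this could be called as `unmatched_countries(countries, data['CtryName'])` on the unfiltered data, where `data` stands for the frame returned by `pd.read_csv`.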

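Independently of the error, the wide-to-long reshape used in Step 3 can be checked on a toy table before running it on 8760 columns. The sketch below is a minimal version under the same assumptions as the notebook (columns named `H1`, `H2`, ... and a 2023 hourly calendar); the data values are illustrative.

```python
import pandas as pd

# Toy wide table with 3 hourly columns instead of 8760.
wide = pd.DataFrame({
    'CtryName': ['A', 'B'],
    'H1': [0.1, 0.2],
    'H2': [0.3, 0.4],
    'H3': [0.5, 0.6],
})

long_df = wide.melt(id_vars='CtryName', var_name='Hour', value_name='value')
# 'H1' -> 0, 'H2' -> 1, ...: zero-based offset from the first hour of the year
long_df['hour_index'] = long_df['Hour'].str.extract(r'H(\d+)', expand=False).astype(int) - 1
long_df['timestamp'] = pd.Timestamp('2023-01-01') + pd.to_timedelta(long_df['hour_index'], unit='h')
print(long_df.sort_values(['CtryName', 'timestamp']))
```
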
## Step 4 - Visualize the results
Create heatmaps and monthly profile summaries so stakeholders can quickly compare wind and solar performance across countries.



#### Make heatmap

In [None]:


for tech in ['wind', 'solar']:
    make_heatmap(cf_lcoe_stats, tech, filename=os.path.join('output', f'heatmap_{tech}_annual_cf.png'))

In [None]:

        

for tech in ['wind', 'solar']:
    make_monthly_heatmap(hourly_profiles, tech, filename=os.path.join('output', f'heatmap_{tech}_monthly_cf.png'))
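
`make_monthly_heatmap` is not defined in the cells shown here. As a rough idea of the aggregation such a helper might perform, the sketch below averages a wide hourly table by calendar month. `monthly_mean_profile` is a hypothetical name, and the input layout (a `CtryName` column followed by `H1`..`H8760`) is assumed from the structure of `hourly_profiles[tech]` in Step 3.

```python
import pandas as pd


def monthly_mean_profile(hourly_wide, year=2023):
    """Average hourly columns H1..Hn by calendar month; returns countries x months."""
    hourly_cols = [c for c in hourly_wide.columns if c != 'CtryName']
    timestamps = pd.date_range(start=f'{year}-01-01', periods=len(hourly_cols), freq='h')
    values = hourly_wide.set_index('CtryName')[hourly_cols].T
    values.index = timestamps
    return values.groupby(values.index.month).mean().T


# Toy input: 0 during January (the first 744 hours), 1 for the rest of the year.
toy = pd.DataFrame({'CtryName': ['A'], **{f'H{i}': [float(i > 744)] for i in range(1, 8761)}})
monthly = monthly_mean_profile(toy)
print(monthly)
```

The resulting table (one row per country, one column per month) is the shape a heatmap function such as `seaborn.heatmap` accepts directly.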