# Hydro Availability Builder

**Objective**  
- Transform monthly reservoir and run-of-river (ROR) hydro profiles into the seasonal and hourly availability CSVs (`pAvailabilityCustom.csv` and `pVREgenProfile.csv`) required by EPM.
- Ensure consistency with pGenDataInput and complete if necessary.

**Data requirements (user-provided) and method**  
- Data requirements: Monthly capacity-factor CSVs per plant/zone (default `input/hydro_profile_dry.csv`), plant metadata fields (`gen`, `zone`, `tech`), and the official `pHours.csv` template from `epm/input/` to align the season-daytype-hour structure.  
- Method: Validate inputs, align them with the `pHours` calendar, aggregate reservoir series to seasonal capacity factors, reshape ROR series into the long hourly format, and export review-ready CSVs.

**Overview of steps**  
1. Step 1 - Capture the user inputs describing folders, scenario tags, and technology filters.  
2. Step 2 - Create the working/output folders and load the template layout.  
3. Step 3 - Load the hydro profiles plus the `pHours` calendar.  
4. Step 4 - Cross-check hydro coverage against `pGenDataInput` to flag missing capacity-factor rows.  
5. Step 5 - Process reservoir and ROR data into the `pAvailabilityCustom` and `pVREgenProfile` tables, then save them for QA.

## 1. User Inputs

Fill these once and rerun if you need to change the source data or template. Files are resolved relative to `pre-analysis/prepare-data/`.

In [8]:
# Path to the monthly hydro profile CSV stored under `input/`
input_profile_filename = 'hydro_profile_baseline.csv'

# Folder under `epm/input/` containing the scenario-specific inputs
folder_epm_input = 'data_test'

# Filenames relative to `epm/input/<folder_epm_input>/`
hours_template_filename = 'pHours.csv'
pGenDataInput_filename = 'pGenDataInput.csv'


## 2. Setup: imports, folders, and helpers

Run once to load libraries and create the working folders referenced later in the notebook.

In [9]:
import os
from pathlib import Path

import pandas as pd

folder_input = 'input'
folder_output = 'output'
os.makedirs(folder_output, exist_ok=True)
print(f'Input folder: {folder_input}')
print(f'Output folder: {folder_output}')

Input folder: input
Output folder: output


## 3. Load hydro profiles and the pHours template

Validates the user inputs, loads the monthly hydro profiles, and reads the pHours layout that drives the seasonal/daytype structure.

In [10]:
input_path = Path(folder_input) / input_profile_filename
if not input_path.exists():
    raise FileNotFoundError(f'Cannot find {input_path}. Double-check `input_profile_filename`.')

data = pd.read_csv(input_path, index_col=None, header=0)
for col in ('gen', 'zone', 'tech'):
    if col in data.columns:
        data[col] = data[col].astype(str).str.strip()
print(f'Loaded hydro profile: {input_path.name} with {len(data)} rows')

epm_input_root = Path('../../epm/input')
hours_template_file = Path(os.path.join(str(epm_input_root), folder_epm_input, hours_template_filename))
if not hours_template_file.exists():
    raise FileNotFoundError(f'Cannot find {hours_template_file}. Update `hours_template_filename` or `folder_epm_input`.')

template = pd.read_csv(hours_template_file)
print(f'Loaded pHours template from {hours_template_file}')

pgen_candidates = [
    Path(os.path.join(str(epm_input_root), folder_epm_input, pGenDataInput_filename)),
    Path(os.path.join(str(epm_input_root), folder_epm_input, 'supply', pGenDataInput_filename)),
]
pGenDataInput_file = next((candidate for candidate in pgen_candidates if candidate.exists()), None)
if pGenDataInput_file is None:
    raise FileNotFoundError('Cannot find pGenDataInput in the scenario folder (checked root and `supply/`).')

pgen_data = pd.read_csv(pGenDataInput_file, encoding='utf-8-sig')
for col in ('gen', 'zone', 'tech'):
    if col in pgen_data.columns:
        pgen_data[col] = pgen_data[col].astype(str).str.strip()
print(f'Loaded pGenDataInput from {pGenDataInput_file}')

MONTH_TO_SEASON = {
    1: 'Q1', 2: 'Q1', 3: 'Q1',
    4: 'Q2', 5: 'Q2', 6: 'Q2',
    7: 'Q3', 8: 'Q3', 9: 'Q3',
    10: 'Q4', 11: 'Q4', 12: 'Q4',
}


Loaded hydro profile: hydro_profile_baseline.csv with 218 rows
Loaded pHours template from ../../epm/input/data_test/pHours.csv
Loaded pGenDataInput from ../../epm/input/data_test/supply/pGenDataInput.csv


## 4. Validate generators against `pGenDataInput`

Compare the hydro units included in the monthly capacity-factor file with those defined in `pGenDataInput` so any gaps can be filled before exporting the availability tables.

In [11]:
value_columns = [col for col in data.columns if col not in ('gen', 'zone', 'tech')]
if not value_columns:
    raise ValueError('The hydro profile file does not contain any monthly capacity-factor columns.')

numeric_matrix = data[['zone', 'gen']].copy()
for col in value_columns:
    numeric_matrix[col] = pd.to_numeric(data[col], errors='coerce')

zone_cf_template = numeric_matrix.groupby('zone')[value_columns].mean()
zone_avg_cf = zone_cf_template.mean(axis=1).to_dict()

profile_gens = data['gen'].astype(str).str.strip()
profile_gen_set = set(profile_gens)
profile_zone_lookup = data.assign(gen=profile_gens, zone=data['zone'].astype(str).str.strip()).set_index('gen')['zone'].to_dict()

hydro_mask = pgen_data['tech'].astype(str).str.contains('hydro', case=False, na=False)
pgen_hydro = pgen_data[hydro_mask].copy()
pgen_hydro['gen'] = pgen_hydro['gen'].astype(str).str.strip()
pgen_hydro['zone'] = pgen_hydro['zone'].astype(str).str.strip()
pgen_gen_set = set(pgen_hydro['gen'])
zone_lookup = pgen_hydro.set_index('gen')['zone'].to_dict()

missing_profiles = sorted(pgen_gen_set - profile_gen_set)
missing_in_pgen = sorted(profile_gen_set - pgen_gen_set)

zone_peer_map = (
    data.assign(gen=profile_gens, zone=data['zone'].astype(str).str.strip())
        .groupby('zone')['gen']
        .apply(lambda s: sorted(s.tolist()))
        .to_dict()
)

if missing_profiles:
    print(f"{len(missing_profiles)} hydro generators are defined in pGenDataInput but missing capacity-factor profiles:")
    impacted_zones = set()
    for gen in missing_profiles:
        zone = zone_lookup.get(gen, 'Unknown')
        impacted_zones.add(zone)
        peers = [peer for peer in zone_peer_map.get(zone, []) if peer != gen]
        avg_cf = zone_avg_cf.get(zone)
        if peers and pd.notna(avg_cf):
            print(f"  - {gen} (zone {zone}): add values by averaging existing hydro in zone ({', '.join(peers)}) → suggested mean CF {avg_cf:.3f}.")
        else:
            print(f"  - {gen} (zone {zone}): no peer hydro available; please enter CFs manually.")
    available_zones = sorted(zone for zone in impacted_zones if zone in zone_cf_template.index)
    if available_zones:
        print('\nZone-average monthly profiles for the impacted zones:')
        display(zone_cf_template.loc[available_zones].round(3))
else:
    print('All hydro generators in pGenDataInput have matching capacity-factor profiles.')

if missing_in_pgen:
    print(f"\n{len(missing_in_pgen)} generators have capacity-factor rows here but are missing from pGenDataInput:")
    for gen in missing_in_pgen:
        zone = profile_zone_lookup.get(gen, 'Unknown')
        print(f"  - {gen} (zone {zone}): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.")
else:
    print('\nNo extra hydro capacity-factor rows beyond what pGenDataInput expects.')


6 hydro generators are defined in pGenDataInput but missing capacity-factor profiles:
  - Cambambe (zone Angola): add values by averaging existing hydro in zone (AH Mabubas, Baynes, Cacula Cabasa, Cafula (Keve), Calengue (Catumbela), Cambambe 1, Cambambe 2, Capanda, Chicapa/Biopio, Gove, Jamba Ya Mina, Jamba Ya Oma, Lauca, Lomaum, Lomaum 2, Luachimo, Matala, Quilengue, Tumuludo Casador, Vuka, Zenzo 1) → suggested mean CF 0.373.
  - Chute de Tsamba (zone Gabon): add values by averaging existing hydro in zone (Akieni, Angouma, Bongolo, Booue, Boundji, Dibwangui, FE-2 (Kinguele Aval), Faga, Grand Poubara, Ibola, Igotchi, Imperatrice (Mouila), Iroungou, Kinguele, Kinguele Rebuild, Kongue, Kouata Mango, Lebombi, Liboka, Lifouta, Lolo, Mafoulamatato, Mingouli, Mitoungou, Mpassa, Nenguembani, Ngoulmendjim, Nguene, Omvan Amont, Omvan Aval, Ouyama, Poubara 1 Rebuild, Poubara 2, Poubara 2 Rebuild, Sindara, Souka, Tchimbele, Tsengue Leledi) → suggested mean CF 0.559.
  - Dimoli (CAR) (zone CAR): 

Unnamed: 0_level_0,1,2,3,4,5,6,7,8,9,10,11,12
zone,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Angola,0.576,0.641,0.864,0.771,0.48,0.174,0.094,0.072,0.072,0.101,0.26,0.369
CAR,0.598,0.531,0.459,0.438,0.409,0.522,0.691,0.878,0.971,0.978,0.78,0.662
Cameroon,0.559,0.476,0.457,0.478,0.604,0.669,0.743,0.739,0.768,0.846,0.824,0.675
Gabon,0.588,0.511,0.527,0.615,0.625,0.529,0.373,0.276,0.475,0.648,0.799,0.746



114 generators have capacity-factor rows here but are missing from pGenDataInput:
  - AH Mabubas (zone Angola): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Akieni (zone Gabon): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Angouma (zone Gabon): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Baidou (zone CAR): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Bayomen (zone Cameroon): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Bihongore (zone Rwanda): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Bikomo (zone EquatorialGuinea): either remove it from the hydro profile or add the unit to pGenDataInput so EPM can use it.
  - Bini A Warak (zone Cameroon): either remove it from the hydro profile o

## 5 - Process hydro availability

### 5a - Build seasonal reservoir availability

In [55]:
# Keep only Reservoir Hydro units and the monthly columns.
data_reservoir = data[data['tech'] == 'ReservoirHydro'].copy()
data_reservoir.set_index(['gen'], inplace=True)
data_reservoir.drop(columns=['zone', 'tech'], inplace=True)
data_reservoir.columns = data_reservoir.columns.astype(int)
display(data_reservoir.head())

# Convert months to seasons using MONTH_TO_SEASON and compute the mean for each season.
data_reservoir = data_reservoir.T.groupby(MONTH_TO_SEASON).mean().T

# Rename the season columns with the expected Q prefix and persist the output.
data_reservoir.columns = [f'Q{col}' for col in data_reservoir.columns]
data_reservoir.columns.names = ['season']
display(data_reservoir.head())

output_path_reservoir = os.path.join(folder_output, 'pAvailabilityCustom.csv')
data_reservoir.to_csv(output_path_reservoir)
print(f"Reservoir data processed and saved to {output_path_reservoir}")


Unnamed: 0_level_0,1,2,3,4,5,6,7,8,9,10,11,12
gen,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
AH Mabubas,0.57,0.48,0.41,0.46,0.55,0.61,0.78,0.92,1.0,1.0,0.95,0.53
Baynes,0.448571,0.401429,0.382857,0.408571,0.468571,0.542857,0.632857,0.672857,0.712857,0.738571,0.641429,0.478571
Bikongo,0.484167,0.5025,0.5,0.504167,0.458333,0.31,0.2475,0.225,0.279167,0.334167,0.52,0.559167
Boali 1 Rebuild,0.54,0.55,0.51,0.49,0.4,0.24,0.17,0.14,0.13,0.18,0.49,0.62
Boali 2,0.54,0.55,0.51,0.49,0.4,0.24,0.17,0.14,0.13,0.18,0.49,0.62


season,Q1,Q2
gen,Unnamed: 1_level_1,Unnamed: 2_level_1
AH Mabubas,0.772,0.628571
Baynes,0.606,0.5
Bikongo,0.304,0.48631
Boali 1 Rebuild,0.216,0.482857
Boali 2,0.216,0.482857


Reservoir data processed and saved to output/pAvailabilityCustom.csv


### 5b - Format run-of-river hourly availability

In [56]:
def build_ror_generation_profile(result, template):
    """Build the long-run hourly ROR profile expected by pVREgenProfile.csv.

    Parameters
    ----------
    result : pandas.DataFrame
        Seasonal data for ROR plants with `gen` as index and seasons as columns.
    template : pandas.DataFrame
        Template that provides the `season`, `daytype`, and hourly column structure.

    Returns
    -------
    pandas.DataFrame
        MultiIndex DataFrame compatible with the EPM pVREgenProfile format.
    """

    # Reshape seasonal data to long format and merge with season/daytype combinations.
    result_reset = result.reset_index()
    result_long = result_reset.melt(id_vars='gen', var_name='season', value_name='value')
    daytypes = template.reset_index()[['season', 'daytype']].drop_duplicates()
    merged = result_long.merge(daytypes, on='season', how='left')

    # Broadcast the seasonal value across all hourly columns required by the template.
    hour_cols = template.columns.difference(['season', 'daytype'])
    for col in hour_cols:
        merged[col] = merged['value']

    merged_final = merged.drop(columns=['value'])
    merged_final = merged_final.set_index(['gen', 'season', 'daytype'])
    merged_final.index.names = ['gen', 'q', 'd']
    return merged_final


In [57]:
# Filter to Run-of-River units and retain monthly columns only.
data_ror = data[data['tech'] == 'ROR'].copy()
data_ror.set_index(['gen'], inplace=True)
data_ror.drop(columns=['zone', 'tech'], inplace=True)
data_ror.columns = data_ror.columns.astype(int)
display(data_ror.head())

# Convert months to seasons and label them with the Q prefix.
data_ror = data_ror.T.groupby(MONTH_TO_SEASON).mean().T
display(data_ror.head())
data_ror.columns = [f'Q{col}' for col in data_ror.columns]
data_ror.columns.names = ['season']

# Align with the template structure so downstream scripts can ingest the file directly.
data_ror = build_ror_generation_profile(data_ror, template)
data_ror = data_ror[template.columns]
display(data_ror.head())

output_path_ror = os.path.join(folder_output, 'pVREgenProfile.csv')
data_ror.to_csv(output_path_ror)
print(f"ROR data processed and saved to {output_path_ror}")


Unnamed: 0_level_0,1,2,3,4,5,6,7,8,9,10,11,12
gen,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Akieni,0.57,0.48,0.41,0.46,0.55,0.61,0.78,0.92,1.0,1.0,0.95,0.53
Angouma,0.448571,0.401429,0.382857,0.408571,0.468571,0.542857,0.632857,0.672857,0.712857,0.738571,0.641429,0.478571
Baidou,0.464444,0.357778,0.387778,0.536667,0.558889,0.455556,0.237778,0.175556,0.314444,0.564444,0.716667,0.678889
Bayomen,0.364,0.401,0.439,0.495,0.443,0.261,0.226,0.222,0.244,0.271,0.352,0.342
Bihongore,0.464444,0.357778,0.387778,0.536667,0.558889,0.455556,0.237778,0.175556,0.314444,0.564444,0.716667,0.678889


Unnamed: 0_level_0,1,2
gen,Unnamed: 1_level_1,Unnamed: 2_level_1
Akieni,0.772,0.628571
Angouma,0.606,0.5
Baidou,0.348444,0.529524
Bayomen,0.2792,0.380571
Bihongore,0.348444,0.529524


Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,...,t15,t16,t17,t18,t19,t20,t21,t22,t23,t24
gen,q,d,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1
Akieni,Q1,d1,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,...,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772
Akieni,Q1,d2,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,...,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772
Akieni,Q1,d3,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,...,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772
Akieni,Q1,d4,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,...,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772
Akieni,Q1,d5,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,...,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772,0.772
