# Selecting a subset of time periods

Running a capacity expansion model for multiple regions with a full year (8760 hours) of load and generation data can be computationally prohibitive. PowerGenome currently includes one method for selecting a representative subset of periods, with more planned. Using it requires first generating the full set of load and generation profiles.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import warnings

warnings.simplefilter("ignore")

In [3]:
from pathlib import Path
import itertools

import pandas as pd
from powergenome.load_profiles import make_final_load_curves
from powergenome.generators import GeneratorClusters
from powergenome.util import (
    build_scenario_settings,
    init_pudl_connection,
    load_settings,
    reverse_dict_of_lists,
    check_settings
)

from powergenome.GenX import reduce_time_domain
from powergenome.external_data import make_generator_variability

## Import settings
This assumes that the settings file is set up for multiple scenarios/planning periods. If you are using a settings file with only a single scenario/planning period, remove or comment out the line with `build_scenario_settings`.

In [4]:
cwd = Path.cwd()

settings_path = (
    cwd.parent / "example_systems" / "CA_AZ" / "test_settings.yml"
)
settings = load_settings(settings_path)
settings["input_folder"] = settings_path.parent / settings["input_folder"]
scenario_definitions = pd.read_csv(
    settings["input_folder"] / settings["scenario_definitions_fn"]
)
scenario_settings = build_scenario_settings(settings, scenario_definitions)

pudl_engine, pudl_out, pg_engine = init_pudl_connection(
    freq="AS",
    start_year=min(settings.get("data_years")),
    end_year=max(settings.get("data_years")),
)

check_settings(settings, pg_engine)

## Load curves

In [5]:
load_curves = make_final_load_curves(pg_engine, scenario_settings[2030]["p1"])
load_curves

time_index,CA_N,CA_S,WECC_AZ
1,14211,18064,10358
2,13539,17227,10054
3,12836,16394,9670
4,13052,14691,9545
5,12452,14045,9411
...,...,...,...
8756,13983,17833,11038
8757,15896,20164,11764
8758,16058,20372,11449
8759,15326,19448,11061


## Generation profiles

In [6]:
gc = GeneratorClusters(pudl_engine, pudl_out, pg_engine, scenario_settings[2030]["p1"])
all_gens = gc.create_all_generators()

794.1999999999999  MW without lat/lon


Technology Conventional Hydroelectric changed capacity from 12699.700000000008 to 12704.200000000008


Creating gdf
['Solar Photovoltaic', 'Onshore Wind Turbine', 'Batteries', 'Biomass', 'Conventional Hydroelectric', 'Natural Gas Fired Combined Cycle', 'Other_peaker', 'All Other', 'Natural Gas Fired Combustion Turbine', 'Other Natural Gas']


No model tag values found for MinCapTag_2 ('MinCapTag_2')
No model tag values found for Reg_Max ('Reg_Max')
No model tag values found for Rsv_Max ('Rsv_Max')
Selected technology landbasedwind capacity in region CA_N less than minimum (8424.4314 < 25000 MW)
Selected technology landbasedwind capacity in region CA_S less than minimum (23639.682500000003 < 45000 MW)
No model tag values found for MinCapTag_2 ('MinCapTag_2')
No model tag values found for Reg_Max ('Reg_Max')
No model tag values found for Rsv_Max ('Rsv_Max')
Transmission investment costs are missing or zero for some resources and will not be included in the total investment costs.


In [7]:
gen_variability = make_generator_variability(all_gens)

## Reduce time domain
The `reduce_time_domain` function selects `N` periods of `x` days each from the 8760 hours of the year. It uses the following settings parameters:
- `reduce_time_domain` (a boolean value)
- `time_domain_periods` (`N`)
- `time_domain_days_per_period` (`x`)
- `include_peak_day` (whether the day of peak demand should always be included in the output)
- `demand_weight_factor` (weighting factor for demand relative to generation profiles)

It outputs time-reduced generation and load profiles, along with a dataframe that tracks the sequential order of cluster slots in a year. The sequential order is needed to track long-duration storage across time. The weights of the hours in each cluster are listed in order in the `Sub_Weights` column.
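To make the idea of picking `N` representative periods of `x` days concrete, here is a minimal sketch that uses plain k-means clustering on synthetic data. It is an illustration only, not PowerGenome's implementation: the function `pick_representative_periods`, its arguments, and the random input data are all assumptions made for this example.

```python
import numpy as np


def pick_representative_periods(profiles, hours_per_period, n_periods, n_iter=25, seed=0):
    """Cluster fixed-length periods with plain k-means (illustrative only).

    profiles: array of shape (n_hours, n_series), e.g. load and capacity factors.
    Returns (rep_periods, labels, weights), all using 0-based period numbers.
    """
    n_hours, n_series = profiles.shape
    n_candidates = n_hours // hours_per_period
    # One row per candidate period: its hours for every series, flattened.
    X = profiles[: n_candidates * hours_per_period].reshape(n_candidates, -1)

    rng = np.random.default_rng(seed)
    # Start from n_periods distinct candidate periods chosen at random.
    centers = X[rng.choice(n_candidates, size=n_periods, replace=False)].astype(float)

    for _ in range(n_iter):
        # Squared Euclidean distance from every period to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_periods):
            members = X[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)

    # Final assignment against the updated centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)

    # Representative = the real period closest to each center among its members.
    rep_periods = np.array(
        [np.where(labels == k, dist[:, k], np.inf).argmin() for k in range(n_periods)]
    )
    # Weight = number of original hours assigned to each cluster.
    weights = np.bincount(labels, minlength=n_periods) * hours_per_period
    return rep_periods, labels, weights


# Synthetic stand-in data: 8760 hours for three series (not PowerGenome output).
rng = np.random.default_rng(42)
profiles = rng.random((8760, 3))
rep_periods, labels, weights = pick_representative_periods(
    profiles, hours_per_period=5 * 24, n_periods=4
)
print(rep_periods, weights, weights.sum())
```

Because every candidate period is assigned to exactly one cluster, the weights always add up to the number of hours that were clustered. Real inputs would also need scaling so that load in MW does not dominate capacity factors between 0 and 1; the `demand_weight_factor` setting listed above controls the relative weight of demand in PowerGenome.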

In [8]:
(
    reduced_resource_profile,
    reduced_load_profile,
    long_duration_storage,
) = reduce_time_domain(gen_variability, load_curves, scenario_settings[2030]["p1"])

In [9]:
# Resource profiles are in the same column order as rows in all_gens
reduced_resource_profile

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,56,57,58,59,60,61,62,63,64,65
1,1,0.407870,1,1,1,1,0.5676,0.407870,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.102265,0.102282
2,1,0.407970,1,1,1,1,0.6791,0.407970,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.057640,0.057697
3,1,0.408070,1,1,1,1,0.5626,0.408070,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.036849,0.036848
4,1,0.408170,1,1,1,1,0.5625,0.408170,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.019270,0.019276
5,1,0.408277,1,1,1,1,0.6367,0.408277,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.022819,0.022817
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
476,1,0.376590,1,1,1,1,0.2112,0.376590,0.5489,1,...,1,1,1,1,1,0.198808,0.30131,0.535644,0.634888,0.634933
477,1,0.376590,1,1,1,1,0.2645,0.376590,0.3881,1,...,1,1,1,1,1,0.082685,0.12004,0.319530,0.615957,0.616050
478,1,0.376590,1,1,1,1,0.2676,0.376590,0.1781,1,...,1,1,1,1,1,0.000000,0.00000,0.001218,0.529412,0.529504
479,1,0.376590,1,1,1,1,0.3345,0.376590,0.0000,1,...,1,1,1,1,1,0.000000,0.00000,0.000000,0.444557,0.444532


In [10]:
# This is formatted for GenX; drop any columns you don't need.
# The cluster label is added here to match the long_duration_storage dataframe below.
hours_per_cluster = settings["time_domain_days_per_period"] * 24
cluster_labels = [[N] * hours_per_cluster for N in range(1, settings["time_domain_periods"] + 1)]
reduced_load_profile["cluster"] = list(itertools.chain.from_iterable(cluster_labels))
reduced_load_profile

Unnamed: 0,Voll,Demand_Segment,Cost_of_Demand_Curtailment_per_MW,Max_Demand_Curtailment,$/MWh,Rep_Periods,Timesteps_per_Rep_Period,Sub_Weights,Time_Index,CA_N,CA_S,WECC_AZ,cluster
0,9000.0,1.0,1.000,1.000,9000.0,4.0,120.0,3000.0,1,17459.0,19302.0,12735.0,1
1,,2.0,0.067,0.075,603.0,,,2640.0,2,16658.0,18484.0,12038.0,1
2,,,,,,,,3000.0,3,15640.0,17421.0,11331.0,1
3,,,,,,,,120.0,4,14343.0,16038.0,10909.0,1
4,,,,,,,,,5,13199.0,14516.0,10588.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,,,,,,,,,476,28951.0,31718.0,22598.0,4
476,,,,,,,,,477,28597.0,31136.0,21905.0,4
477,,,,,,,,,478,27745.0,30192.0,21447.0,4
478,,,,,,,,,479,26617.0,29029.0,20811.0,4
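As a quick sanity check on the output above, the `Sub_Weights` values should add up to the 8760 hours of the full year, and the reduced profiles would be expected to have `Rep_Periods` times `Timesteps_per_Rep_Period` hourly rows. The sketch below uses values copied from the displayed table. The variable names are illustrative, and it assumes each weight counts the original hours that a representative period stands in for.

```python
# Values copied from the reduced_load_profile output shown above.
rep_periods = 4
timesteps_per_rep_period = 120
sub_weights = [3000.0, 2640.0, 3000.0, 120.0]

# Expected number of hourly rows in the reduced profiles.
n_rows = rep_periods * timesteps_per_rep_period

# Assumption: each weight is the number of original hours a period represents.
total_hours = sum(sub_weights)

# Dividing by the period length gives the number of original periods per cluster.
periods_per_cluster = [int(w // timesteps_per_rep_period) for w in sub_weights]

print(n_rows, total_hours, periods_per_cluster)
```

Under that assumption, the fourth cluster represents a single 5-day period, while the other three each stand in for more than twenty.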


In [11]:
long_duration_storage.head(50)

Unnamed: 0,Period_Index,Rep_Period,Rep_Period_Index
0,1,3,1
1,2,2,1
2,3,2,1
3,4,2,1
4,5,2,1
5,6,2,1
6,7,2,2
7,8,2,2
8,9,2,2
9,10,2,2
