# ISSUE 84
The following strategy has been adopted to address issue 84 on data generation:

- Select a subset of chronics using a regular expression (for the training, validation, test, and OOD test datasets)
- While the required number of observations has not been reached:
    - Select a chronic at random among the pre-selected chronics
    - Optionally fast-forward to a random time stamp
    - Generate X time stamps starting from the time stamp reached in the previous step (e.g., the equivalent of 3 days)
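
The steps above can be sketched in plain Python. This is only an illustration of the loop, not the LIPS implementation: the function `sample_time_stamps`, its parameters, the chronic lengths, and the 50% fast-forward probability are all hypothetical.

```python
import random
import re


def sample_time_stamps(chronic_lengths, pattern, nb_required, window, seed=0):
    """Toy version of the sampling loop; returns (chronic name, step index) pairs."""
    rng = random.Random(seed)
    # Pre-select the chronics whose name matches the regular expression.
    selected = [name for name in chronic_lengths if re.search(pattern, name)]
    if not selected:
        raise ValueError("no chronic matches the pattern")
    samples = []
    # Keep drawing chronics until enough observations are collected.
    while len(samples) < nb_required:
        name = rng.choice(selected)
        length = chronic_lengths[name]
        # Fast-forward (or not) to a random starting step.
        start = rng.randrange(length) if rng.random() < 0.5 else 0
        # Take at most `window` consecutive steps, without running past the
        # end of the chronic or exceeding the required total.
        stop = min(start + window, length, start + nb_required - len(samples))
        samples.extend((name, step) for step in range(start, stop))
    return samples


# Hypothetical chronics, each one week long at 5-minute resolution.
chronic_lengths = {
    "Scenario_april_012": 2016,
    "Scenario_may_015": 2016,
    "Scenario_june_022": 2016,
}
samples = sample_time_stamps(chronic_lengths, r"(april|may)", nb_required=1000, window=288)
print(len(samples), sorted({name for name, _ in samples}))
```

Because chronics are drawn independently at each iteration, the same chronic can be selected several times with different starting points.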

In [1]:
import pathlib

In [2]:
LIPS_PATH = pathlib.Path().resolve().parent
CONFIG_PATH = LIPS_PATH / "configurations" / "powergrid" / "benchmarks" / "l2rpn_neurips_2020_track1_small.ini"
DATA_PATH = LIPS_PATH / "reference_data" / "test"
LOG_PATH = LIPS_PATH / "lips_logs.log"

In [3]:
# Create the output directory (and any missing parent directories) if needed.
DATA_PATH.mkdir(mode=0o777, parents=True, exist_ok=True)

In [4]:
from lips.benchmark.powergridBenchmark import PowerGridBenchmark
benchmark1 = PowerGridBenchmark(benchmark_path=DATA_PATH, # set to None, to not store the data on disk
                                benchmark_name="Benchmark1",
                                load_data_set=False,
                                config_path=CONFIG_PATH,
                                log_path=LOG_PATH)

2022-05-20 21:56:36.959348: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-05-20 21:56:36.959424: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


**Generate the dataset by sampling <span style="color:red">one day</span> from each chronic in the dataset.**
A new argument, `nb_samples_per_chronic`, has been added to the `generate` function of the `DataSet` class to set the number of time stamps drawn per chronic. Since there is one time stamp every 5 minutes, one day corresponds to:

$$ \underbrace{1}_{\text{#days}} \times \underbrace{24}_{\text{#hours in day}} \times \underbrace{\frac{60}{5}}_{\text{#time stamps per hour}} = 288 \ \ \text{steps per day}$$
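
The same arithmetic can be written as a small helper, which makes it easy to get the value for other window lengths. The function name `steps_for` is hypothetical and is not part of LIPS.

```python
MINUTES_PER_STEP = 5  # one time stamp every 5 minutes


def steps_for(days, minutes_per_step=MINUTES_PER_STEP):
    """Number of time stamps covering `days` full days."""
    steps_per_hour = 60 // minutes_per_step
    return days * 24 * steps_per_hour


print(steps_for(1))  # 288
print(steps_for(3))  # 864
```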

In [5]:
benchmark1.generate(nb_sample_train=int(1e5),
                    nb_sample_val=int(1e4),
                    nb_sample_test=int(1e4),
                    nb_sample_test_ood_topo=int(1e4)
                   )

100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100000/100000 [29:13<00:00, 57.01it/s]
  val = np.asanyarray(val)
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [03:41<00:00, 45.20it/s]
  val = np.asanyarray(val)
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [03:41<00:00, 45.23it/s]
  val = np.asanyarray(val)
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [13:46<00:00, 12.11it/s]
  val = np.asanyarray(val)


We now inspect the simulator that was used to generate the training dataset.

In [7]:
print(len(benchmark1.training_simulator.chronics_name))
chronics_id_list = benchmark1.training_simulator.chronics_id 
chronics_names_list = benchmark1.training_simulator.chronics_name
chronics_names_unique = benchmark1.training_simulator.chronics_name_unique
time_stamps = benchmark1.training_simulator.time_stamps

10000


This information is also available through the `chronics_info` attribute of the `DataSet` class.

In [6]:
chronics_id_list = benchmark1.train_dataset.chronics_info["chronics_id"]
chronics_names_list = benchmark1.train_dataset.chronics_info["chronics_name"]
chronics_names_unique = benchmark1.train_dataset.chronics_info["chronics_name_unique"]
time_stamps = benchmark1.train_dataset.chronics_info["time_stamps"]

In [7]:
print("Length of used chronics:", len(chronics_names_unique))
chronics_names_unique

Length of used chronics: 116


['Scenario_february_018',
 'Scenario_may_015',
 'Scenario_october_018',
 'Scenario_july_001',
 'Scenario_february_023',
 'Scenario_june_022',
 'Scenario_december_022',
 'Scenario_april_012',
 'Scenario_february_001',
 'Scenario_march_004',
 'Scenario_december_018',
 'Scenario_september_021',
 'Scenario_december_014',
 'Scenario_june_019',
 'Scenario_october_025',
 'Scenario_march_002',
 'Scenario_may_003',
 'Scenario_january_003',
 'Scenario_august_023',
 'Scenario_february_026',
 'Scenario_may_010',
 'Scenario_april_013',
 'Scenario_january_007',
 'Scenario_may_011',
 'Scenario_may_006',
 'Scenario_january_018',
 'Scenario_november_015',
 'Scenario_march_009',
 'Scenario_june_008',
 'Scenario_december_007',
 'Scenario_may_025',
 'Scenario_february_006',
 'Scenario_april_025',
 'Scenario_december_012',
 'Scenario_august_012',
 'Scenario_june_023',
 'Scenario_september_010',
 'Scenario_december_025',
 'Scenario_may_008',
 'Scenario_march_029',
 'Scenario_october_007',
 'Scenario_january

In [8]:
import numpy as np
print("Length of unique chronics used during the generation: ", len(np.unique(chronics_names_unique)))

Length of unique chronics used during the generation:  95
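
The two counts differ (116 entries, 95 distinct names), which suggests that some chronics were selected more than once. This is consistent with chronics being drawn at random among the pre-selected ones. A minimal sketch of how the repeated names could be listed with `collections.Counter`, using a short hypothetical list rather than the real output:

```python
from collections import Counter

# Hypothetical stand-in for `chronics_names_unique`.
names = [
    "Scenario_may_015",
    "Scenario_july_001",
    "Scenario_may_015",
    "Scenario_april_012",
]
counts = Counter(names)
# Keep only the chronics that were drawn more than once.
repeated = {name: n for name, n in counts.items() if n > 1}
print(len(names), len(counts))  # 4 3
print(repeated)  # {'Scenario_may_015': 2}
```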


In [9]:
print("Number of items in time_stamps:", len(time_stamps))
# time_stamps

Number of items in time_stamps: 116


Time stamps for the first chronic. Since we opted for `nb_samples_per_chronic=288`, they should cover one entire day.

In [10]:
idx = 0
print("Time stamps for the chronic: ", chronics_names_unique[idx])
time_stamps[idx]

Time stamps for the chronic:  Scenario_february_018


[datetime.datetime(2012, 2, 19, 17, 50),
 datetime.datetime(2012, 2, 19, 17, 55),
 datetime.datetime(2012, 2, 19, 18, 0),
 datetime.datetime(2012, 2, 19, 18, 5),
 datetime.datetime(2012, 2, 19, 18, 10),
 datetime.datetime(2012, 2, 19, 18, 15),
 datetime.datetime(2012, 2, 19, 18, 20),
 datetime.datetime(2012, 2, 19, 18, 25),
 datetime.datetime(2012, 2, 19, 18, 30),
 datetime.datetime(2012, 2, 19, 18, 35),
 datetime.datetime(2012, 2, 19, 18, 40),
 datetime.datetime(2012, 2, 19, 18, 45),
 datetime.datetime(2012, 2, 19, 18, 50),
 datetime.datetime(2012, 2, 19, 18, 55),
 datetime.datetime(2012, 2, 19, 19, 0),
 datetime.datetime(2012, 2, 19, 19, 5),
 datetime.datetime(2012, 2, 19, 19, 10),
 datetime.datetime(2012, 2, 19, 19, 15),
 datetime.datetime(2012, 2, 19, 19, 20),
 datetime.datetime(2012, 2, 19, 19, 25),
 datetime.datetime(2012, 2, 19, 19, 30),
 datetime.datetime(2012, 2, 19, 19, 35),
 datetime.datetime(2012, 2, 19, 19, 40),
 datetime.datetime(2012, 2, 19, 19, 45),
 datetime.datetime(2
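
Rather than reading the list by eye, the expectation of one full day at 5-minute resolution could be checked programmatically. The sketch below is an assumption-laden illustration: `is_regular` is a hypothetical helper, and the list is rebuilt from the first time stamp shown above instead of being read from `time_stamps`.

```python
from datetime import datetime, timedelta


def is_regular(stamps, step=timedelta(minutes=5)):
    """True if consecutive time stamps are all exactly `step` apart."""
    return all(b - a == step for a, b in zip(stamps, stamps[1:]))


# Synthetic one-day window starting at the first time stamp shown above.
start = datetime(2012, 2, 19, 17, 50)
one_day = [start + i * timedelta(minutes=5) for i in range(288)]
print(len(one_day), is_regular(one_day))  # 288 True
print(one_day[-1])  # 2012-02-20 17:45:00
```

Applied to a real entry, the same check would be `len(time_stamps[idx]) == 288 and is_regular(time_stamps[idx])`.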

### Load and read the data concerning chronics

In [11]:
from lips.benchmark.powergridBenchmark import PowerGridBenchmark
benchmark_test = PowerGridBenchmark(benchmark_path=DATA_PATH, # set to None, to not store the data on disk
                                    benchmark_name="Benchmark1",
                                    load_data_set=True,
                                    config_path=CONFIG_PATH,
                                    log_path=LOG_PATH)

In [None]:
benchmark_test.train_dataset.chronics_info

In [13]:
# Read the chronics information from the reloaded benchmark.
chronics_id_list = benchmark_test.train_dataset.chronics_info["chronics_id"]
chronics_names_list = benchmark_test.train_dataset.chronics_info["chronics_name"]
chronics_names_unique = benchmark_test.train_dataset.chronics_info["chronics_name_unique"]
time_stamps = benchmark_test.train_dataset.chronics_info["time_stamps"]

In [14]:
chronics_names_unique

['Scenario_february_018',
 'Scenario_may_015',
 'Scenario_october_018',
 'Scenario_july_001',
 'Scenario_february_023',
 'Scenario_june_022',
 'Scenario_december_022',
 'Scenario_april_012',
 'Scenario_february_001',
 'Scenario_march_004',
 'Scenario_december_018',
 'Scenario_september_021',
 'Scenario_december_014',
 'Scenario_june_019',
 'Scenario_october_025',
 'Scenario_march_002',
 'Scenario_may_003',
 'Scenario_january_003',
 'Scenario_august_023',
 'Scenario_february_026',
 'Scenario_may_010',
 'Scenario_april_013',
 'Scenario_january_007',
 'Scenario_may_011',
 'Scenario_may_006',
 'Scenario_january_018',
 'Scenario_november_015',
 'Scenario_march_009',
 'Scenario_june_008',
 'Scenario_december_007',
 'Scenario_may_025',
 'Scenario_february_006',
 'Scenario_april_025',
 'Scenario_december_012',
 'Scenario_august_012',
 'Scenario_june_023',
 'Scenario_september_010',
 'Scenario_december_025',
 'Scenario_may_008',
 'Scenario_march_029',
 'Scenario_october_007',
 'Scenario_january