# Evaluate Baselines (Power Grid use case)

The goal of this notebook is to demonstrate how we can evaluate the results of a baseline on a given benchmark.

It is split into two parts. The first part focuses on the evaluation of a baseline that does not require any training (the `DCApproximationAS`). The second part shows how to load a baseline (or any other `AugmentedSimulator`) and evaluate it on a `Benchmark` of our choice.

As in the first notebook, we demonstrate this capability on `NeuripsBenchmark1`.

**NB** This notebook assumes that the data for the benchmark are already available. If they are not, please generate them or download them.

**NB** The `DCApproximationAS` baseline requires the `grid2op` python package.

**To learn more about the training procedure, visit the next notebook [$\rightarrow$](./03_TrainAnAugmentedSimulator.ipynb)**

# TOC
- [Evaluation on Benchmark1](#benchmark1)
  - [DC approximation](#bench1-dc)
  - [Trained simulator](#bench1-fc)
  - [Comparison](#bench1-comp)
- [Evaluation on Benchmark2](#benchmark2)
  - [DC approximation](#bench2-dc)
  - [Trained simulator](#bench2-fc)

#### Import required packages

In [1]:
import pathlib
from lips import get_root_path
from pprint import pprint
from lips.benchmark.powergridBenchmark import PowerGridBenchmark
from lips.utils import get_path

In [14]:
# indicate required paths
#LIPS_PATH = get_root_path(pathlib_format=True).parent #pathlib.Path().resolve().parent
LIPS_PATH = get_root_path(pathlib_format=True).resolve().parents[3]
DATA_PATH = LIPS_PATH / "reference_data" / "powergrid" / "l2rpn_case14_sandbox"
BENCH_CONFIG_PATH = LIPS_PATH / "configurations" / "powergrid" / "benchmarks" / "l2rpn_case14_sandbox.ini"
SIM_CONFIG_PATH = LIPS_PATH / "configurations" / "powergrid" / "simulators"
#BASELINES_PATH = LIPS_PATH / "trained_baselines" / "powergrid"
BASELINES_PATH = LIPS_PATH / "trained_models" / "powergrid"
EVALUATION_PATH = LIPS_PATH / "evaluation_results" / "PowerGrid"
LOG_PATH = LIPS_PATH / "lips_logs.log"

In [15]:
print(LIPS_PATH)
print(EVALUATION_PATH)
print(BASELINES_PATH)

C:\Users\jch_m\DocumentsPerso\Centrale-IAC\Cours\Python-ML\ProjectMaster\LIPS
C:\Users\jch_m\DocumentsPerso\Centrale-IAC\Cours\Python-ML\ProjectMaster\LIPS\evaluation_results\PowerGrid
C:\Users\jch_m\DocumentsPerso\Centrale-IAC\Cours\Python-ML\ProjectMaster\LIPS\trained_models\powergrid
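
Because every later cell depends on these directories, it can help to check that they exist before loading anything. The sketch below is only an illustration: `missing_paths` is a hypothetical helper (not part of LIPS), and the demo dictionary uses a temporary directory as a stand-in for the real `LIPS_PATH`.

```python
import pathlib
import tempfile


def missing_paths(paths):
    """Return the names whose path does not exist on disk (hypothetical helper, not a LIPS function)."""
    return [name for name, path in paths.items() if not pathlib.Path(path).exists()]


# Demo on a temporary directory standing in for LIPS_PATH.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "reference_data").mkdir()
    demo_paths = {
        "DATA_PATH": root / "reference_data",       # created just above
        "BASELINES_PATH": root / "trained_models",  # deliberately absent
    }
    print(missing_paths(demo_paths))
```

In the notebook, the same helper could be called with `{"DATA_PATH": DATA_PATH, "BASELINES_PATH": BASELINES_PATH}` to spot a missing directory before a later cell fails.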


#  Benchmark1 <a id="benchmark1"></a>
Benchmark1 for the power grid use case concerns risk identification (security analysis). For more details about the benchmark scenario, refer to `Notebook 01`, to our article, or to the LIPS documentation available [here](https://lips.readthedocs.io).

## Initial step: load the dataset

A common dataset is used to evaluate the two augmented simulators. This initial step loads it once and for all.

In [16]:
benchmark1 = PowerGridBenchmark(benchmark_name="Benchmark1",
                                benchmark_path=DATA_PATH,
                                load_data_set=True,
                                log_path=LOG_PATH,
                                config_path=BENCH_CONFIG_PATH
                               )

In [17]:
# to verify the config is loaded appropriately for this benchmark
print("Benchmark name: ", benchmark1.config.section_name)
print("Environment name: ", benchmark1.config.get_option("env_name"))
print("Output attributes: ", benchmark1.config.get_option("attr_y"))
print("Evaluation criteria: ")
pprint(benchmark1.config.get_option("eval_dict"))

Benchmark name:  Benchmark1
Environment name:  l2rpn_case14_sandbox
Output attributes:  ('a_or', 'a_ex')
Evaluation criteria: 
{'IndRed': ['TIME_INF'],
 'ML': ['MSE_avg', 'MAE_avg', 'MAPE_avg', 'MAPE_90_avg', 'TIME_INF'],
 'OOD': ['MSE_avg', 'MAE_avg', 'MAPE_avg', 'MAPE_90_avg', 'TIME_INF'],
 'Physics': ['CURRENT_POS']}


## The DC approximation <a id="bench1-dc"></a>

As a reminder, the `grid2op` library is required for this part. You can install it with `pip install grid2op` if you do not already have it.

First we create the "augmented simulator". Unlike the second model presented here, this method requires access to a power grid, which is one of the reasons we need grid2op.
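
To give an idea of why a grid description is needed, here is a minimal sketch of a textbook DC power flow on a made-up 3-bus network. It is not the grid2op or LIPS implementation: the network, the reactances and the function name `dc_power_flow` are assumptions chosen for illustration. The DC approximation linearizes the power flow equations by solving $B\theta = P$ for the voltage angles and then derives each line flow from the angle difference across the line.

```python
import numpy as np


def dc_power_flow(n_bus, lines, injections, slack=0):
    """Textbook DC power flow (illustrative only, not the LIPS code).

    lines: list of (from_bus, to_bus, reactance) in per unit.
    injections: net active power injected at each bus, in per unit.
    Returns the voltage angles (radians) and the flow on each line.
    """
    # Build the susceptance matrix B from the line reactances.
    B = np.zeros((n_bus, n_bus))
    for i, j, x in lines:
        b = 1.0 / x
        B[i, i] += b
        B[j, j] += b
        B[i, j] -= b
        B[j, i] -= b

    # Fix the slack bus angle to zero and solve the reduced linear system.
    keep = [k for k in range(n_bus) if k != slack]
    p = np.asarray(injections, dtype=float)
    theta = np.zeros(n_bus)
    theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], p[keep])

    # The flow on a line is the angle difference divided by its reactance.
    flows = np.array([(theta[i] - theta[j]) / x for i, j, x in lines])
    return theta, flows


# Made-up network: three buses connected by three identical lines.
lines = [(0, 1, 0.1), (0, 2, 0.1), (1, 2, 0.1)]
injections = [0.5, 0.5, -1.0]  # bus 0 is the slack and balances the system
theta, flows = dc_power_flow(3, lines, injections)
print("angles:", theta)
print("flows :", flows)
```

Such a model has no losses and no voltage magnitudes, which is what makes it fast but less accurate than a full AC solver.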

Each `AugmentedSimulator` is loaded in its own specific way. Here, for example, we load the `DCApproximationAS`, which uses the same power grid as the one used to generate the data in the previous notebook.

In [19]:
# the next few lines are specific for each benchmark and each `AugmentedSimulator`
import grid2op
import warnings
from lips.physical_simulator.dcApproximationAS import DCApproximationAS
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    env = grid2op.make(benchmark1.config.get_option("env_name"), test=True)
    grid_path = pathlib.Path(env.get_path_env()) / "grid.json"

dc_sim = DCApproximationAS(name="dc_approximation", 
                           benchmark_name="Benchmark1",
                           config_path=BENCH_CONFIG_PATH,
                           grid_path=grid_path)

numba cannot be imported and numba functions are disabled.
Probably the execution is slow.
Please install numba to gain a massive speedup.

Now that the model is loaded, a common interface is available to evaluate its performance on a dataset. This is shown in the cell below, where we evaluate the physics-based simulator `DCApproximationAS` on the evaluation datasets of the benchmark (`dataset="all"`).

In [None]:
EVAL_SAVE_PATH = get_path(EVALUATION_PATH, benchmark1)
dc_metrics_per_dataset = benchmark1.evaluate_simulator(augmented_simulator=dc_sim,
                                                       dataset="all", # other values : "val", "test", "test_ood_topo"
                                                       save_path=EVAL_SAVE_PATH,
                                                       save_predictions=True
                                                      )

Once the evaluation has finished, we can analyze the evaluation criteria in more detail. The performance is analyzed with respect to four categories of evaluation criteria:

- `ML`: machine-learning metrics (measuring the accuracy of augmented simulators);
- `Physics`: physics compliance checks, which verify that the predictions of an augmented simulator respect physical laws (equations);
- `IndRed`: industrial readiness, which assesses whether the proposed augmented simulators could be exploited in industry;
- `OOD`: out-of-distribution generalization, which assesses how well the augmented simulators generalize outside the training distribution.
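
The dictionary returned by `evaluate_simulator` is indexed in the cells of this notebook as `metrics[dataset][category][metric][variable]`. As a small illustration of that layout, the sketch below flattens such a nested dictionary into (key path, value) pairs. `flatten_metrics` is a hypothetical helper, and `demo_metrics` is a hand-written stand-in filled with a few values copied from the outputs of this notebook, not an object produced by LIPS.

```python
def flatten_metrics(metrics, prefix=()):
    """Yield (key_path, value) pairs for every leaf of a nested metrics dict."""
    for key, value in metrics.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            # Non-empty dict: keep descending.
            yield from flatten_metrics(value, path)
        else:
            # Scalars and empty dicts are reported as leaves.
            yield path, value


# Stand-in mirroring the structure used in this notebook (values copied from its outputs).
demo_metrics = {
    "test_ood_topo": {
        "ML": {"MAPE_90_avg": {"a_or": 0.1608534769924919, "a_ex": 0.149238631980091}},
        "Physics": {
            "CURRENT_POS": {
                "a_or": {"Violation_proportion": 0.0},
                "a_ex": {"Violation_proportion": 0.0},
            }
        },
    }
}

for path, value in flatten_metrics(demo_metrics):
    print(" / ".join(path), "=", value)
```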

### ML-related performances

#### Test dataset evaluation

In [9]:
print("MAPE90 for a_or: ", dc_metrics_per_dataset["test"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", dc_metrics_per_dataset["test"]["ML"]["MAPE_90_avg"]["a_ex"])

NameError: name 'dc_metrics_per_dataset' is not defined

#### OOD Generalization evaluation

In [8]:
print("MAPE90 for a_or: ", dc_metrics_per_dataset["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", dc_metrics_per_dataset["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_ex"])

MAPE90 for a_or:  0.1608534769924919
MAPE90 for a_ex:  0.149238631980091


### Physics compliances

In [9]:
dc_metrics_per_dataset["test"]["Physics"].keys()

dict_keys(['CURRENT_POS'])

TODO: when there is no violation, still create the key and set its value to zero.

#### Test Dataset Evaluation

In [10]:
print("1) Current positivity violation:", dc_metrics_per_dataset["test"]["Physics"]["CURRENT_POS"])#["a_or"]["Violation_proportion"]

1) Current positivity violation: {'a_or': {'Violation_proportion': 0.0}, 'a_ex': {'Violation_proportion': 0.0}}


#### OOD Generalization Evaluation

In [11]:
print("1) Current positivity violation:", dc_metrics_per_dataset["test_ood_topo"]["Physics"]["CURRENT_POS"])#["a_or"]["Violation_proportion"]

1) Current positivity violation: {'a_or': {'Violation_proportion': 0.0}, 'a_ex': {'Violation_proportion': 0.0}}


The metrics can now be studied on the different datasets. For example, the MSE on the `test` dataset (whose distribution is similar to the training one) is stored under `dc_metrics_per_dataset["test"]["ML"]["MSE_avg"]`.

## A learned baseline "augmented simulator" <a id="bench1-fc"></a>

Along with the datasets, we also provide a baseline obtained from a trained neural network. This baseline is a fully connected neural network that takes the available inputs of the power grid and tries to predict all the outputs of the simulator.

The fully connected neural network is made of XXX layers, each with YYY units.

It is trained for KKK epochs on the training set of `Benchmark1`.

**NB** These baselines are not yet fully trained, and some hyperparameters still need to be optimized. We intend to do that before the official release of the benchmark for the NeurIPS conference.

First we need to load the baseline and initialize it properly:

In [12]:
from lips.augmented_simulators.tensorflow_models import TfFullyConnected
from lips.dataset.scaler import StandardScaler

# rebuild the baseline architecture
tf_fc = TfFullyConnected(name="tf_fc",
                         bench_config_path=BENCH_CONFIG_PATH,
                         bench_config_name="Benchmark1",
                         sim_config_path=SIM_CONFIG_PATH / "tf_fc.ini",
                         sim_config_name="DEFAULT",
                         scaler=StandardScaler,
                         log_path=LOG_PATH)

LOAD_PATH = get_path(BASELINES_PATH, benchmark1)
tf_fc.restore(LOAD_PATH)

FileNotFoundError: path C:\Users\jch_m\DocumentsPerso\Centrale-IAC\Cours\Python-ML\ProjectMaster\LIPS\trained_baselines\powergrid\l2rpn_case14_sandbox\Benchmark1\tf_fc_DEFAULT not found

Then, as for the DC approximation, we can evaluate it on the test datasets of the benchmark.

This is done with the same command, by indicating the learned augmented simulator `tf_fc` as the argument:

In [None]:
# TODO: log the losses
# EVAL_SAVE_PATH = get_path(EVALUATION_PATH, benchmark1)
tf_fc_metrics = benchmark1.evaluate_simulator(augmented_simulator=tf_fc,
                                              eval_batch_size=10000,
                                              dataset="all",
                                              shuffle=False,
                                              save_path=None,
                                              save_predictions=False,
                                              result_level=0 # more information using level > 0
                                             )

## Comparison of two augmented simulators <a id="bench1-comp"></a>

### Machine learning metrics 

We can now compare the two "augmented simulators". For example, to compare the MAPE90 (mean absolute percentage error computed on the last 10% quantile) on the test dataset (whose distribution is similar to the training distribution) for the currents (A) at the two extremities of the power lines, we might run:

In [None]:
ML_metrics = "ML"
dataset_name = "test"
print("DC Approximation")
print(f"Dataset : {dataset_name}")
print("{:<10} : {}".format("MAPE_90_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAPE_90_avg"]))
print("{:<10} : {}".format("MSE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MSE_avg"]))
print("{:<10} : {}".format("MAE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAE_avg"]))
dataset_name = "test_ood_topo"
print(f"Dataset : {dataset_name}")
print("{:<10} : {}".format("MAPE_90_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAPE_90_avg"]))
print("{:<10} : {}".format("MSE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MSE_avg"]))
print("{:<10} : {}".format("MAE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAE_avg"]))

DC Approximation
Dataset : test
MAPE90     : {'a_or': 0.2054102762347278, 'a_ex': 0.18943869874874716}
MSE_avg    : {'a_or': 91723.53006148033, 'a_ex': 131732.91079220167}
MAE_avg    : {'a_or': 114.3059446624192, 'a_ex': 143.16579138645758}
Dataset : test_ood_topo
mape_90_avg : {'a_or': 0.2227085624215881, 'a_ex': 0.20711185760666956}
MSE_avg    : {'a_or': 107658.54513807358, 'a_ex': 155649.85692985042}
MAE_avg    : {'a_or': 123.67567242785628, 'a_ex': 155.63737545351296}
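
The MAPE90 metric mentioned above can be read as a MAPE restricted to the largest 10% of the reference values. The NumPy sketch below implements that reading; it is an assumption about the definition, not a copy of the LIPS code, and the function name `mape_top_quantile` is hypothetical.

```python
import numpy as np


def mape_top_quantile(y_true, y_pred, quantile=0.9):
    """MAPE computed only where |y_true| reaches the given quantile (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    threshold = np.quantile(np.abs(y_true), quantile)
    mask = np.abs(y_true) >= threshold  # keep only the largest reference values
    # Assumes the selected reference values are non-zero.
    return float(np.mean(np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])))


y_true = np.arange(1.0, 11.0)  # toy currents 1, 2, ..., 10
y_pred = y_true * 1.1          # every prediction is 10% too high
print(mape_top_quantile(y_true, y_pred))
```

A metric of this kind focuses on the highest currents, so that small reference values cannot inflate the percentage error.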


In [None]:
ML_metrics = "ML"
dataset_name = "test"
print("Fully Connected Augmented Simulator")
print(f"Dataset : {dataset_name}")
print("{:<10} : {}".format("MAPE_90_avg", tf_fc_metrics[dataset_name][ML_metrics]["MAPE_90_avg"]))
print("{:<10} : {}".format("MSE_avg", tf_fc_metrics[dataset_name][ML_metrics]["MSE_avg"]))
print("{:<10} : {}".format("MAE_avg", tf_fc_metrics[dataset_name][ML_metrics]["MAE_avg"]))
dataset_name = "test_ood_topo"
print(f"Dataset : {dataset_name}")
print("{:<10} : {}".format("MAPE_90_avg", tf_fc_metrics[dataset_name][ML_metrics]["MAPE_90_avg"]))
print("{:<10} : {}".format("MSE_avg", tf_fc_metrics[dataset_name][ML_metrics]["MSE_avg"]))
print("{:<10} : {}".format("MAE_avg", tf_fc_metrics[dataset_name][ML_metrics]["MAE_avg"]))

Fully Connected Augmented Simulator
Dataset : test
mape_90_avg : {'a_or': 0.005971284397153569, 'a_ex': 0.005959986222812791}
MSE_avg    : {'a_or': 21.65915870666504, 'a_ex': 44.23374557495117}
MAE_avg    : {'a_or': 2.7974419593811035, 'a_ex': 3.8893446922302246}
Dataset : test_ood_topo
mape_90_avg : {'a_or': 0.17957805279569528, 'a_ex': 0.17880120699511942}
MSE_avg    : {'a_or': 12364.6455078125, 'a_ex': 23608.12109375}
MAE_avg    : {'a_or': 52.66007614135742, 'a_ex': 74.55868530273438}
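
To read the two printouts above side by side, the ratio between the error of the DC approximation and that of the fully connected simulator can be computed per dataset. The sketch below does this for `MAPE_90_avg` on `a_or`, using values copied by hand from the outputs above; in the notebook, the same numbers would be read from `dc_metrics_per_dataset` and `tf_fc_metrics`.

```python
# MAPE_90_avg for a_or, copied from the printed outputs above.
dc_mape90 = {"test": 0.2054102762347278, "test_ood_topo": 0.2227085624215881}
fc_mape90 = {"test": 0.005971284397153569, "test_ood_topo": 0.17957805279569528}

# Ratio > 1 means the DC approximation has the larger error.
ratios = {name: dc_mape90[name] / fc_mape90[name] for name in dc_mape90}
for name, ratio in ratios.items():
    print(f"{name:<14}: DC error is {ratio:.1f}x the fully connected error")
```

On these numbers, the advantage of the learned model is large on `test` and much smaller on `test_ood_topo`.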


### Physics compliance
Unlike the DC approximation, which by nature respects most physical laws, a trained augmented simulator can produce predictions that violate the physics compliance checks.

In [None]:
physic_compliances = "Physics"
dataset_name = "test"
current_violation = tf_fc_metrics[dataset_name][physic_compliances]["CURRENT_POS"]["a_or"]["Violation_proportion"]
print("{:.2f}% of currents at the origin side of power lines violate the current positivity.".format(current_violation*100))

2.23% of currents at the origin side of power lines violate the current positivity.
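
The current positivity check can be pictured as counting the predicted currents that fall below zero. The sketch below is an assumed reading of `Violation_proportion`, written with NumPy on made-up predictions; `current_positivity_violation` is a hypothetical name, not the LIPS function.

```python
import numpy as np


def current_positivity_violation(predicted_currents):
    """Share of predicted current values that are negative (illustrative)."""
    predicted = np.asarray(predicted_currents, dtype=float)
    return {"Violation_proportion": float(np.mean(predicted < 0))}


# Made-up predictions: 2 observations x 2 power lines, one negative value.
demo_a_or = [[120.5, -0.3], [98.1, 45.0]]
print(current_positivity_violation(demo_a_or))
```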


In [None]:
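# NOTE: this reads the same `Violation_proportion` entry as the previous cell,
# so the printed value is a proportion (between 0 and 1), not a sum in amperes.
current_error = tf_fc_metrics[dataset_name][physic_compliances]["CURRENT_POS"]["a_or"]["Violation_proportion"]
print("Proportion of negative current values at the origin side : {:.2f}".format(current_error))

Proportion of negative current values at the origin side : 0.02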


### Industrial readiness

In [None]:
dataset_name = "test"
tf_fc_metrics[dataset_name]["IndRed"]["TIME_INF"]

0.033661603927612305

Regarding the computation time required by physical solvers, another notebook analyzes all the required steps and their computation times in more detail. In the power grid context, the security analysis (Benchmark1) can be computed in parallel, and physical solvers (namely the DC approximation) can compute the electricity flows very fast.
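
As a rough illustration of how an inference time such as `TIME_INF` could be measured by hand, the sketch below wraps an arbitrary callable with `time.perf_counter`. `timed_call` and the stand-in `fake_predict` are hypothetical; in this notebook the time is simply read from the metrics returned by `evaluate_simulator`.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_predict(n_samples):
    """Stand-in for a simulator call; it only sums integers."""
    return sum(range(n_samples))


result, elapsed = timed_call(fake_predict, 100_000)
print(f"Inference time: {elapsed:.4f}s")
```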

# Benchmark2 <a id="benchmark2"></a>

In [None]:
benchmark2 = PowerGridBenchmark(benchmark_name="Benchmark2",
                                benchmark_path=DATA_PATH,
                                load_data_set=True,
                                log_path=LOG_PATH,
                                config_path=BENCH_CONFIG_PATH
                               )

In [None]:
# to verify the config is loaded appropriately for this benchmark
print("Benchmark name: ", benchmark2.config.section_name)
print("Environment name: ", benchmark2.config.get_option("env_name"))
print("Output attributes: ", benchmark2.config.get_option("attr_y"))
print("Evaluation criteria: ")
pprint(benchmark2.config.get_option("eval_dict"))

## The DC approximation <a id="bench2-dc"></a>

In [None]:
# the next few lines are specific for each benchmark and each `AugmentedSimulator`
import grid2op
import warnings
from lips.physical_simulator.dcApproximationAS import DCApproximationAS
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    env = grid2op.make(benchmark2.config.get_option("env_name"), test=True)
    grid_path = pathlib.Path(env.get_path_env()) / "grid.json"

dc_sim = DCApproximationAS(name="dc_approximation", 
                           benchmark_name="Benchmark2",
                           config_path=BENCH_CONFIG_PATH,
                           grid_path=grid_path)

In [None]:
EVAL_SAVE_PATH = get_path(EVALUATION_PATH, benchmark2)
dc_metrics_per_dataset_bench2 = benchmark2.evaluate_simulator(augmented_simulator=dc_sim,
                                                              dataset="all", # other values : "val", "test", "test_ood_topo"
                                                              save_path=EVAL_SAVE_PATH,
                                                              save_predictions=True
                                                              )

### ML-related performances

#### Test dataset evaluation

In [None]:
print("MAPE90 for a_or: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_90_avg"]["a_ex"])
print("MAPE for p_or: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_avg"]["p_or"])
print("MAPE for p_ex: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_avg"]["p_ex"])
print("MAE for p_or: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAE_avg"]["p_or"])
print("MAE for p_ex: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAE_avg"]["p_ex"])
print("MAPE for v_or: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_avg"]["v_or"])
print("MAPE for v_ex: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAPE_avg"]["v_ex"])
print("MAE for v_or: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAE_avg"]["v_or"])
print("MAE for v_ex: ", dc_metrics_per_dataset_bench2["test"]["ML"]["MAE_avg"]["v_ex"])

MAPE90 for a_or:  0.1575609916428136
MAPE90 for a_ex:  0.14730832052856613
MAPE for p_or:  0.09214239993002604
MAPE for p_ex:  0.08787188867895084
MAE for p_or:  1.1406078395914332
MAE for p_ex:  1.0265692826889805
MAPE for v_or:  0.020920751203534725
MAPE for v_ex:  0.03316630132846375
MAE for v_or:  0.6651214754772188
MAE for v_ex:  0.9825881891059878


#### OOD Generalization evaluation

In [None]:
print("MAPE90 for a_or: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_ex"])
print("MAPE for p_or: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_avg"]["p_or"])
print("MAPE for p_ex: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_avg"]["p_ex"])
print("MAE for p_or: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAE_avg"]["p_or"])
print("MAE for p_ex: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAE_avg"]["p_ex"])
print("MAPE for v_or: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_avg"]["v_or"])
print("MAPE for v_ex: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAPE_avg"]["v_ex"])
print("MAE for v_or: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAE_avg"]["v_or"])
print("MAE for v_ex: ", dc_metrics_per_dataset_bench2["test_ood_topo"]["ML"]["MAE_avg"]["v_ex"])

MAPE90 for a_or:  0.16061949592695715
MAPE90 for a_ex:  0.15209429645818998
MAPE for p_or:  0.08811211862607919
MAPE for p_ex:  0.0853203102271247
MAE for p_or:  1.1736193182265913
MAE for p_ex:  1.0713623671718724
MAPE for v_or:  0.021399578333099696
MAPE for v_ex:  0.034334340726291844
MAE for v_or:  0.7879805879878998
MAE for v_ex:  1.119801747546196


### Physics compliances

In [None]:
dc_metrics_per_dataset_bench2["test"]["Physics"].keys()

dict_keys(['CURRENT_POS', 'VOLTAGE_POS', 'LOSS_POS', 'DISC_LINES', 'CHECK_LOSS', 'CHECK_GC', 'CHECK_LC', 'CHECK_VOLTAGE_EQ'])

TODO: when there is no violation, still create the key and set its value to zero.

#### Test Dataset Evaluation

In [None]:
print("1) Current positivity violation:", dc_metrics_per_dataset_bench2["test"]["Physics"]["CURRENT_POS"])#["a_or"]["Violation_proportion"]
print("2) Voltage positivity violation:", dc_metrics_per_dataset_bench2["test"]["Physics"]["VOLTAGE_POS"])
print("3) Loss positivity violation:", dc_metrics_per_dataset_bench2["test"]["Physics"]["LOSS_POS"])
print("4) Disconnected lines violation:", dc_metrics_per_dataset_bench2["test"]["Physics"]["DISC_LINES"])
print("5) Violation of loss to be between [1,4]% of production:", dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_LOSS"]["violation_percentage"])
print("6) Violation of global conservation: {}% and its weighted mape: {}".format(dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_GC"]["violation_percentage"], dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_GC"]["wmape"]))
print("7) Violation of local conservation: {}% and its weighted mape: {}".format(dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_LC"]["violation_percentage"], dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_LC"]["mape"]))
print("8) Violation proportion of voltage equality at subs:", dc_metrics_per_dataset_bench2["test"]["Physics"]["CHECK_VOLTAGE_EQ"]["prop_voltages_violation"])

1) Current positivity violation: {}
2) Voltage positivity violation: {}
3) Loss positivity violation: {}
4) Disconnected lines violation: {}
5) Violation of loss to be between [1,4]% of production: 0.0
6) Violation of global conservation: 100.0% and its weighted mape: 0.9999998971074707
7) Violation of local conservation: 7.142857142857142% and its weighted mape: 0.01564590303591114
8) Violation proportion of voltage equality at subs: 0.5333333333333333
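
In the output above, several checks come back as empty dictionaries, whereas other cells of this notebook index `["a_or"]["Violation_proportion"]` directly. A defensive way to read such entries is sketched below, treating a missing entry as zero violation, in the spirit of the TODO above. `violation_proportion` is a hypothetical helper and both dictionaries are hand-written stand-ins with an illustrative value.

```python
def violation_proportion(physics_metrics, check, variable):
    """Return the stored violation proportion, or 0.0 when the entry is absent."""
    return physics_metrics.get(check, {}).get(variable, {}).get("Violation_proportion", 0.0)


# Stand-ins mirroring the two shapes seen in this notebook.
empty_result = {"CURRENT_POS": {}}
filled_result = {"CURRENT_POS": {"a_or": {"Violation_proportion": 0.0223}}}

print(violation_proportion(empty_result, "CURRENT_POS", "a_or"))
print(violation_proportion(filled_result, "CURRENT_POS", "a_or"))
```

Note that `LOSS_POS` is indexed with a lowercase `violation_proportion` key in the cells of this notebook, so the helper as written only covers the per-variable checks.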


#### OOD Generalization Evaluation

In [None]:
print("1) Current positivity violation:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CURRENT_POS"])#["a_or"]["Violation_proportion"]
print("2) Voltage positivity violation:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["VOLTAGE_POS"])
print("3) Loss positivity violation:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["LOSS_POS"])
print("4) Disconnected lines violation:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["DISC_LINES"])
print("5) Violation of loss to be between [1,4]% of production:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_LOSS"]["violation_percentage"])
print("6) Violation of global conservation: {}% and its weighted mape: {}".format(dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_GC"]["violation_percentage"], dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_GC"]["wmape"]))
print("7) Violation of local conservation: {}% and its weighted mape: {}".format(dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_LC"]["violation_percentage"], dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_LC"]["mape"]))
print("8) Violation proportion of voltage equality at subs:", dc_metrics_per_dataset_bench2["test_ood_topo"]["Physics"]["CHECK_VOLTAGE_EQ"]["prop_voltages_violation"])

1) Current positivity violation: {}
2) Voltage positivity violation: {}
3) Loss positivity violation: {}
4) Disconnected lines violation: {}
5) Violation of loss to be between [1,4]% of production: 0.0
6) Violation of global conservation: 100.0% and its weighted mape: 1.0000000047503754
7) Violation of local conservation: 7.142857142857142% and its weighted mape: 0.017686318493588492
8) Violation proportion of voltage equality at subs: 0.5


### Industrial Readiness

In [None]:
print(f'Inference time for test dataset: {dc_metrics_per_dataset_bench2["test"]["IndRed"]["TIME_INF"]:.2f}s')
print(f'Inference time for OOD dataset: {dc_metrics_per_dataset_bench2["test_ood_topo"]["IndRed"]["TIME_INF"]:.2f}s')

Inference time for test dataset: 148.18s
Inference time for OOD dataset: 178.38s


## A learned baseline "augmented simulator" <a id="bench2-fc"></a>

In [None]:
from lips.augmented_simulators.tensorflow_models import TfFullyConnected
from lips.dataset.scaler import StandardScaler

# rebuild the baseline architecture
tf_fc = TfFullyConnected(name="tf_fc",
                         bench_config_path=BENCH_CONFIG_PATH,
                         bench_config_name="Benchmark2",
                         sim_config_path=SIM_CONFIG_PATH / "tf_fc.ini",
                         sim_config_name="DEFAULT",
                         scaler=StandardScaler,
                         log_path=LOG_PATH)

In [None]:
LOAD_PATH = get_path(BASELINES_PATH, benchmark2)
tf_fc.restore(LOAD_PATH)

In [None]:
EVAL_SAVE_PATH = get_path(EVALUATION_PATH, benchmark2)
tf_fc_metrics = benchmark2.evaluate_simulator(augmented_simulator=tf_fc,
                                              eval_batch_size=10000,
                                              dataset="all",
                                              shuffle=False,
                                              save_path=EVAL_SAVE_PATH,
                                              save_predictions=True
                                             )

### ML-related performances

#### Test dataset evaluation

In [None]:
print("MAPE90 for a_or: ", tf_fc_metrics["test"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", tf_fc_metrics["test"]["ML"]["MAPE_90_avg"]["a_ex"])
print("MAPE for p_or: ", tf_fc_metrics["test"]["ML"]["MAPE_avg"]["p_or"])
print("MAPE for p_ex: ", tf_fc_metrics["test"]["ML"]["MAPE_avg"]["p_ex"])
print("MAE for p_or: ", tf_fc_metrics["test"]["ML"]["MAE_avg"]["p_or"])
print("MAE for p_ex: ", tf_fc_metrics["test"]["ML"]["MAE_avg"]["p_ex"])
print("MAPE for v_or: ", tf_fc_metrics["test"]["ML"]["MAPE_avg"]["v_or"])
print("MAPE for v_ex: ", tf_fc_metrics["test"]["ML"]["MAPE_avg"]["v_ex"])
print("MAE for v_or: ", tf_fc_metrics["test"]["ML"]["MAE_avg"]["v_or"])
print("MAE for v_ex: ", tf_fc_metrics["test"]["ML"]["MAE_avg"]["v_ex"])

MAPE90 for a_or:  0.005536789216725236
MAPE90 for a_ex:  0.0054568138356318775
MAPE for p_or:  0.00862245828511439
MAPE for p_ex:  0.008477684811352349
MAE for p_or:  0.09964931756258011
MAE for p_ex:  0.09761589020490646
MAPE for v_or:  0.0008934855222482042
MAPE for v_ex:  0.0009033801333890047
MAE for v_or:  0.08649303764104843
MAE for v_ex:  0.06617464125156403


#### OOD Generalization evaluation

In [None]:
print("MAPE90 for a_or: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_or"])
print("MAPE90 for a_ex: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_90_avg"]["a_ex"])
print("MAPE for p_or: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_avg"]["p_or"])
print("MAPE for p_ex: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_avg"]["p_ex"])
print("MAE for p_or: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAE_avg"]["p_or"])
print("MAE for p_ex: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAE_avg"]["p_ex"])
print("MAPE for v_or: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_avg"]["v_or"])
print("MAPE for v_ex: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAPE_avg"]["v_ex"])
print("MAE for v_or: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAE_avg"]["v_or"])
print("MAE for v_ex: ", tf_fc_metrics["test_ood_topo"]["ML"]["MAE_avg"]["v_ex"])

MAPE90 for a_or:  0.1018934792920239
MAPE90 for a_ex:  0.10297622868209473
MAPE for p_or:  0.1474770309530317
MAPE for p_ex:  0.14722551119114358
MAE for p_or:  1.7987455129623413
MAE for p_ex:  1.7728214263916016
MAPE for v_or:  0.004895506828630025
MAPE for v_ex:  0.00496045368458097
MAE for v_or:  0.4244811534881592
MAE for v_ex:  0.34588274359703064


### Physics compliances

#### Test dataset evaluation

In [None]:
print("1) Current positivity violation:", (tf_fc_metrics["test"]["Physics"]["CURRENT_POS"]["a_or"]["Violation_proportion"]+tf_fc_metrics["test"]["Physics"]["CURRENT_POS"]["a_ex"]["Violation_proportion"])/2)#["a_or"]["Violation_proportion"]
print("2) Voltage positivity violation:", (tf_fc_metrics["test"]["Physics"]["VOLTAGE_POS"]["v_or"]["Violation_proportion"]+tf_fc_metrics["test"]["Physics"]["VOLTAGE_POS"]["v_ex"]["Violation_proportion"])/2)
print("3) Loss positivity violation:", tf_fc_metrics["test"]["Physics"]["LOSS_POS"]["violation_proportion"])
print("4) Disconnected lines violation:", tf_fc_metrics["test"]["Physics"]["DISC_LINES"])
print("5) Violation of loss to be between [1,4]% of production:", tf_fc_metrics["test"]["Physics"]["CHECK_LOSS"]["violation_percentage"])
print("6) Violation of global conservation: {}% and its weighted mape: {}".format(tf_fc_metrics["test"]["Physics"]["CHECK_GC"]["violation_percentage"], tf_fc_metrics["test"]["Physics"]["CHECK_GC"]["wmape"]))
print("7) Violation of local conservation: {}% and its weighted mape: {}".format(tf_fc_metrics["test"]["Physics"]["CHECK_LC"]["violation_percentage"], tf_fc_metrics["test"]["Physics"]["CHECK_LC"]["mape"]))
print("8) Violation proportion of voltage equality at subs:", tf_fc_metrics["test"]["Physics"]["CHECK_VOLTAGE_EQ"]["prop_voltages_violation"])

1) Current positivity violation: 0.0114725
2) Voltage positivity violation: 0.013675
3) Loss positivity violation: 0.16484
4) Disconnected lines violation: {}
5) Violation of loss to be between [1,4]% of production: 0.18
6) Violation of global conservation: 99.4% and its weighted mape: 0.027306465432047844
7) Violation of local conservation: 89.68142857142857% and its weighted mape: 0.005147017132729412
8) Violation proportion of voltage equality at subs: 0.9976333333333334


#### OOD Generalization Evaluation

In [None]:
print("1) Current positivity violation:", (tf_fc_metrics["test_ood_topo"]["Physics"]["CURRENT_POS"]["a_or"]["Violation_proportion"]+tf_fc_metrics["test_ood_topo"]["Physics"]["CURRENT_POS"]["a_ex"]["Violation_proportion"])/2)#["a_or"]["Violation_proportion"]
print("2) Voltage positivity violation:", (tf_fc_metrics["test_ood_topo"]["Physics"]["VOLTAGE_POS"]["v_or"]["Violation_proportion"]+tf_fc_metrics["test_ood_topo"]["Physics"]["VOLTAGE_POS"]["v_ex"]["Violation_proportion"])/2)
print("3) Loss positivity violation:", tf_fc_metrics["test_ood_topo"]["Physics"]["LOSS_POS"]["violation_proportion"])
print("4) Disconnected lines violation:", tf_fc_metrics["test_ood_topo"]["Physics"]["DISC_LINES"])
print("5) Violation of loss to be between [1,4]% of production:", tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_LOSS"]["violation_percentage"])
print("6) Violation of global conservation: {}% and its weighted mape: {}".format(tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_GC"]["violation_percentage"], tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_GC"]["wmape"]))
print("7) Violation of local conservation: {}% and its weighted mape: {}".format(tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_LC"]["violation_percentage"], tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_LC"]["mape"]))
print("8) Violation proportion of voltage equality at subs:", tf_fc_metrics["test_ood_topo"]["Physics"]["CHECK_VOLTAGE_EQ"]["prop_voltages_violation"])

1) Current positivity violation: 0.012065
2) Voltage positivity violation: 0.013049999999999999
3) Loss positivity violation: 0.20383
4) Disconnected lines violation: {}
5) Violation of loss to be between [1,4]% of production: 0.36
6) Violation of global conservation: 99.81% and its weighted mape: 0.1251574009656906
7) Violation of local conservation: 97.77785714285714% and its weighted mape: 0.023546348186552223
8) Violation proportion of voltage equality at subs: 0.99536875


### Industrial Readiness

In [None]:
print(f'Inference time for test dataset: {tf_fc_metrics["test"]["IndRed"]["TIME_INF"]:.2f}s')
print(f'Inference time for OOD dataset: {tf_fc_metrics["test_ood_topo"]["IndRed"]["TIME_INF"]:.2f}s')

Inference time for test dataset: 0.24s
Inference time for OOD dataset: 0.27s
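
Comparing the inference times printed for the two simulators on Benchmark2 gives an idea of the speed-up. The values below are copied by hand from the outputs in this notebook (DC approximation and fully connected model). The two timings were not necessarily obtained under identical conditions, so the factor is only indicative.

```python
# Inference times in seconds, copied from the printed outputs for Benchmark2.
dc_time = {"test": 148.18, "test_ood_topo": 178.38}
fc_time = {"test": 0.24, "test_ood_topo": 0.27}

speedup = {name: dc_time[name] / fc_time[name] for name in dc_time}
for name, factor in speedup.items():
    print(f"{name:<14}: fully connected model is about {factor:.0f}x faster")
```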
