The goal of this notebook is to demonstrate how we can evaluate the results of a baseline on a given benchmark.

It is split into two parts. The first part focuses on evaluating a baseline that does not require any training (the `DCApproximationAS`). The second part shows how to load a baseline (or any other `AugmentedSimulator`) and evaluate it on a `Benchmark` of our choice.

As in the first notebook, we demonstrate this capability on `NeuripsBenchmark1`.

**NB** This notebook assumes that the data for the benchmark are already available. If they are not, please generate or download them first.

**NB** The `DCApproximationAS` baseline requires the `grid2op` python package.
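
Both prerequisites can be checked before running the rest of the notebook. The sketch below is only an illustration: the helper name `missing_prerequisites` is hypothetical (it is not part of `lips`), and the data directory is assumed to be the `reference_data` folder used in the loading cell of this notebook.

```python
import importlib.util
import os


def missing_prerequisites(data_dir, packages=("grid2op",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The benchmark data must already exist on disk.
    if not os.path.isdir(data_dir):
        problems.append(f"data directory not found: {data_dir}")
    # find_spec returns None when a top-level package cannot be found.
    for package in packages:
        if importlib.util.find_spec(package) is None:
            problems.append(f"python package not installed: {package}")
    return problems


# Assumption: the data live in "reference_data", relative to the working directory.
for problem in missing_prerequisites("reference_data"):
    print("WARNING:", problem)
```

If nothing is printed, the data directory exists and `grid2op` can be imported.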

# Initial step: load the dataset

A common dataset will be used to evaluate the two augmented simulators. This initial step loads it once and for all.

In [1]:
import os
from lips.neurips_benchmark import NeuripsBenchmark1
path_benchmark = os.path.join("reference_data")
neurips_benchmark1 = NeuripsBenchmark1(path_benchmark=path_benchmark,
                                       load_data_set=True)

# The DC approximation

As a reminder, the `grid2op` library is required for this part. You can install it with `pip install grid2op` if you do not have it already.

First we will create the "augmented simulator". Unlike the second model presented in this notebook, this method requires access to a powergrid. This is one of the reasons we need grid2op.

The way to load each `AugmentedSimulator` is specific. Here, for example, we load the `DCApproximationAS`, which will use the same powergrid as the one used to generate the data in the previous notebook.

In [2]:
# the next few lines are specific to each benchmark and each `AugmentedSimulator`
import grid2op
import warnings
from lips.augmented_simulators import DCApproximationAS
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    env = grid2op.make("l2rpn_case14_sandbox", test=True)
    grid_path = os.path.join(env.get_path_env(), "grid.json")

dc_augmented_sim = DCApproximationAS(grid_path=grid_path)

Now that the model is loaded, there is a common interface to evaluate its performance on a dataset. This is shown in the cell below, where we evaluate this specific `AugmentedSimulator` on the datasets of the benchmark.

In [3]:
dc_metrics_per_dataset = neurips_benchmark1.evaluate_augmented_simulator(dc_augmented_sim)

evaluate dc: 100%|██████████| 10000/10000 [01:46<00:00, 93.72it/s]


************* Basic verifier *************
Current positivity check passed for origin side !
----------------------------------------------
Current positivity check passed for extremity side !
----------------------------------------------
Voltage positivity check passed for origin side !
----------------------------------------------
Voltage positivity check passed for extremity side !
----------------------------------------------
Loss positivity check passed !
----------------------------------------------
Prediction in presence of line disconnection. Check passed !
----------------------------------------------
************* Check loss *************
Verification is done without any violation !
************* Check Energy Conservation *************
Number of failed cases is 10000 and the proportion is 100.000% : 
************* Check kirchhoff's current law *************
6.75% not verify the Kirchhoff's current law at 0.01 tolerance


ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 16 and the array at index 1 has size 14

It is now possible to study the metrics on the different datasets.

In [None]:
# TODO ....
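
As a starting point for this study, the sketch below lists the keys of a nested dictionary together with the type of each leaf. It assumes that `dc_metrics_per_dataset` is a (possibly nested) dictionary, which this notebook does not confirm; the helper name `describe` and the example dictionary are hypothetical.

```python
def describe(metrics, indent=0):
    """Return one line per key, indented by nesting depth, with the leaf type."""
    lines = []
    for key, value in metrics.items():
        pad = "  " * indent
        if isinstance(value, dict):
            # Nested dictionary: print the key, then recurse one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(describe(value, indent + 1))
        else:
            # Leaf: show only its type, since values can be large arrays.
            lines.append(f"{pad}{key}: {type(value).__name__}")
    return lines


# Hypothetical example; the real keys and nesting may differ.
example = {"dataset_a": {"metric_1": 0.5, "metric_2": [1, 2, 3]}, "note": "demo"}
print("\n".join(describe(example)))
```

With the real results, the call would be `print("\n".join(describe(dc_metrics_per_dataset)))`.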

# A learned baseline "augmented simulator"

Along with the datasets, we also provide a baseline built from a trained neural network. This baseline is a fully connected neural network that takes the available inputs of the powergrid and tries to predict all the outputs of the simulator.

The fully connected neural network is made of XXX layers, each with YYY units.

It is trained for KKK epochs on the training set of the `Benchmark1`.

**NB** These baselines are not yet fully trained, and some hyperparameters still need to be optimized. We intend to do that before the official release of the benchmark for the Neurips conference.

First we need to load the baseline and initialize it properly:

In [4]:
path_baselines = os.path.join("trained_baselines")
from lips.augmented_simulators import FullyConnectedAS

# recreate the baseline
fc_augmented_sim = FullyConnectedAS(name="Baseline_FullyConnected")

# TODO create a wrapper for these 3 calls
fc_augmented_sim.load_metadata(path_baselines)
fc_augmented_sim.init()
fc_augmented_sim.restore(path_baselines)
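
The wrapper mentioned in the TODO above could look like the following sketch. The function name `load_trained_simulator` is hypothetical and not part of `lips`; it only chains the three calls from the cell above, in the same order, and assumes the object passed in exposes `load_metadata`, `init` and `restore`.

```python
def load_trained_simulator(simulator, path):
    """Run the three loading calls in order and return the same simulator."""
    simulator.load_metadata(path)
    simulator.init()
    simulator.restore(path)
    return simulator


# Intended usage (requires lips and the trained baselines on disk):
# fc_augmented_sim = load_trained_simulator(
#     FullyConnectedAS(name="Baseline_FullyConnected"), path_baselines)
```

Returning the simulator lets the creation and the loading fit in a single expression.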

Then, as for the DC approximation, we can evaluate it on the test datasets of the benchmark.

This is done with the same command:

In [5]:
fc_metrics_per_dataset = neurips_benchmark1.evaluate_augmented_simulator(fc_augmented_sim)

************* Basic verifier *************
2.335% of lines does not respect the positivity of currents (Amp) at origin
Concerned lines with corresponding number of negative current values at their origin:
 {2: 513, 13: 388, 16: 380, 9: 350, 4: 346, 14: 331, 8: 302, 7: 291, 17: 276, 5: 267, 19: 229, 1: 218, 10: 212, 11: 147, 0: 139, 3: 131, 6: 112, 15: 22, 12: 16}
----------------------------------------------
2.445% of lines does not respect the positivity of currents (Amp) at extremity
Concerned lines with corresponding number of negative current values at their extremity:
 {2: 517, 4: 427, 16: 381, 13: 359, 9: 347, 7: 338, 14: 323, 1: 320, 17: 279, 8: 274, 5: 272, 10: 219, 19: 203, 0: 149, 11: 147, 6: 142, 3: 134, 15: 32, 12: 26}
----------------------------------------------
2.651% of lines does not respect the positivity of voltages (Kv) at origin
Concerned lines with corresponding number of negative voltage values at their origin:
 {3: 848, 5: 497, 0: 466, 10: 424, 13: 323, 16: 31

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 16 and the array at index 1 has size 14

# Comparison of the two augmented simulators

And now we can compare the two augmented simulators:

In [None]:
# TODO: use the dictionaries dc_metrics_per_dataset and fc_metrics_per_dataset
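
One possible way to do this comparison is sketched below. It assumes that both results are nested dictionaries sharing at least some keys, which this notebook does not confirm. The helper names `flatten` and `compare_metrics` are hypothetical, and the two example dictionaries are invented stand-ins, not real results.

```python
from numbers import Real


def flatten(metrics, prefix=""):
    """Flatten a nested dict into {"outer/inner": leaf} pairs."""
    flat = {}
    for key, value in metrics.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def compare_metrics(left, right):
    """Return (name, left_value, right_value) rows for numeric metrics found in both."""
    flat_left, flat_right = flatten(left), flatten(right)
    rows = []
    # Only metrics present on both sides can be compared.
    for name in sorted(flat_left.keys() & flat_right.keys()):
        a, b = flat_left[name], flat_right[name]
        # Skip non-scalar leaves such as lists or arrays.
        if isinstance(a, Real) and isinstance(b, Real):
            rows.append((name, a, b))
    return rows


# Invented stand-ins for dc_metrics_per_dataset and fc_metrics_per_dataset.
dc_example = {"test": {"mae": 0.5, "per_line": [0.1, 0.2]}}
fc_example = {"test": {"mae": 0.25, "per_line": [0.3, 0.4]}, "extra": {"mae": 1.0}}

for name, dc_value, fc_value in compare_metrics(dc_example, fc_example):
    print(name, dc_value, fc_value)
```

With the real results, the call would be `compare_metrics(dc_metrics_per_dataset, fc_metrics_per_dataset)`; non-scalar entries would need a dedicated comparison.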