In [1]:
# for the moment the directory should be set to the parent to be able to use the LIPS package
import os
os.chdir(os.path.pardir)

The goal of this notebook is to demonstrate how we can evaluate the results of a baseline on a given benchmark.

It is split into two parts. The first part focuses on the evaluation of a baseline that does not require any training (the `DCApproximationAS`). The second part shows how to load a baseline (or any other `AugmentedSimulator`) and evaluate it on a `Benchmark` of our choice.

As in the first notebook, we demonstrate this capability for the case of `NeuripsBenchmark1`.

**NB** This notebook assumes that the data for the benchmark are already available. If they are not, please generate or download them first.

**NB** The `DCApproximationAS` baseline requires the `grid2op` Python package.

# Benchmark1

## Initial step: load the dataset

A common dataset is used to evaluate the two augmented simulators. This initial step loads it once and for all.

In [2]:
from lips.benchmark import PowerGridBenchmark
path_benchmark = os.path.join("reference_data")
log_path = os.path.abspath(os.path.join("lips","logger","logs.log"))
benchmark1 = PowerGridBenchmark(benchmark_name="Benchmark1",
                                path_benchmark=path_benchmark,
                                load_data_set=True,
                                log_path=log_path
                               )

2022-01-07 15:48:18.989554: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2022-01-07 15:48:18.989611: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## The DC approximation

Recall that the `grid2op` library is required for this part. You can install it with `pip install grid2op` if you do not have it already.

First, we create the "augmented simulator". Unlike the second model presented in this notebook, this method requires access to a power grid. This is one of the reasons we need grid2op.

The way to load each `AugmentedSimulator` is specific. Here, for example, we load the `DCApproximationAS`, which uses the same power grid as the one used to generate the data in the previous notebook.

In [3]:
# the next few lines are specific for each benchmark and each `AugmentedSimulator`
import grid2op
import warnings
from lips.physical_simulator.dcApproximationAS import DCApproximationAS
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    env = grid2op.make("l2rpn_case14_sandbox", test=True)
    grid_path = os.path.join(env.get_path_env(), "grid.json")

dc_sim = DCApproximationAS(name="dc_approximation", 
                           benchmark_name="Benchmark1",
                           path_config=None, # use default config path
                           grid_path=grid_path)

Now that the model is loaded, a common interface lets us evaluate its performance on a dataset. This is shown in the cell below, where we evaluate the physics-based simulator `DCApproximationAS` on all the datasets of the benchmark.

In [None]:
dc_metrics_per_dataset = benchmark1.evaluate_simulator(augmented_simulator=dc_sim,
                                                       dataset="all"
                                                      )

In [5]:
dataset_name = "val"
# use dc simulator `_observations` private variable to trace all the observations used
print(dc_sim._observations["val"].keys())
# use dc simulator `_flow` private variable for computed flow values 
print(dc_sim._flow.keys())

dict_keys(['prod_p', 'prod_v', 'load_p', 'load_q', 'line_status', 'topo_vect', 'a_or', 'a_ex'])
dict_keys(['val', 'test', 'test_ood_topo'])


The metrics can now be studied on the different datasets. We first list the datasets that were evaluated, then print a few error metrics (MAPE90, NRMSE and MAE) on the "test" dataset, whose distribution is similar to the training one:

In [6]:
# see the evaluated datasets
print(dc_metrics_per_dataset.keys())

dict_keys(['val', 'test', 'test_ood_topo'])


In [22]:
dataset_name = "test"
ML_metrics = 0
print("{:<10} : {}".format("MAPE90", dc_metrics_per_dataset[dataset_name][ML_metrics]["mape90"]))
print("{:<10} : {}".format("NRMSE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["NRMSE_avg"]))
print("{:<10} : {}".format("MAE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAE_avg"]))

MAPE90     : {'a_or': 0.16308450692718962, 'a_ex': 0.1464899336666497}
NRMSE_avg  : {'a_or': 0.08072826942694072, 'a_ex': 0.07632464035218221}
MAE_avg    : {'a_or': 84.77337833139086, 'a_ex': 107.36099085275112}


In [23]:
dataset_name = "test_ood_topo"
print("{:<10} : {}".format("MAPE90", dc_metrics_per_dataset[dataset_name][ML_metrics]["mape90"]))
print("{:<10} : {}".format("NRMSE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["NRMSE_avg"]))
print("{:<10} : {}".format("MAE_avg", dc_metrics_per_dataset[dataset_name][ML_metrics]["MAE_avg"]))

MAPE90     : {'a_or': 0.6199380111893823, 'a_ex': 0.6128878148489871}
NRMSE_avg  : {'a_or': 0.24516475287804304, 'a_ex': 0.24667614062546903}
MAE_avg    : {'a_or': 184.18927002976574, 'a_ex': 256.52591287868444}
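The repeated `print` calls above can be factored into a small helper. The following is only a sketch: `format_metrics` and `sample_metrics` are hypothetical names that are not part of LIPS, and the nested layout `metrics[dataset][0][metric][variable]` is assumed from the cells above. The sample dictionary reuses the MAPE90 values printed above so that the snippet runs on its own.

```python
# Stand-in for `dc_metrics_per_dataset`, restricted to MAPE90 values printed above.
# Assumed layout (from the cells above): metrics[dataset][0][metric][variable].
sample_metrics = {
    "test": ({"mape90": {"a_or": 0.16308450692718962, "a_ex": 0.1464899336666497}},),
    "test_ood_topo": ({"mape90": {"a_or": 0.6199380111893823, "a_ex": 0.6128878148489871}},),
}

ML_METRICS = 0  # index of the machine-learning metrics, as in the cells above


def format_metrics(metrics_per_dataset, metric_name, datasets):
    """Return one formatted line per dataset for the requested metric."""
    lines = []
    for dataset_name in datasets:
        values = metrics_per_dataset[dataset_name][ML_METRICS][metric_name]
        formatted = ", ".join(f"{var}={val:.4f}" for var, val in values.items())
        lines.append(f"{dataset_name:<15} {metric_name:<10} {formatted}")
    return lines


for line in format_metrics(sample_metrics, "mape90", ["test", "test_ood_topo"]):
    print(line)
```

With the real results, `sample_metrics` would simply be replaced by `dc_metrics_per_dataset`.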


## A learned baseline "augmented simulator"

Along with the datasets, we also provide a baseline (a trained neural network). This baseline is a fully connected neural network that takes the available inputs of the power grid and tries to predict all the outputs of the simulator.

The fully connected neural network is made of XXX layer each with YYY units.

It is learned for KKK epochs on the training set of the `Benchmark1`.

**NB** These baselines are not yet fully trained, and some hyperparameters still need to be optimized. We intend to do that before the official release of the benchmark for the NeurIPS conference.

First, we need to load the baseline and initialize it properly:

In [2]:
path_baselines = os.path.join("trained_baselines")
from lips.augmented_simulators import FullyConnectedAS

# recreate the baseline
fc_augmented_sim = FullyConnectedAS(name="Baseline_FullyConnected")

# TODO create a wrapper for these 3 calls
fc_augmented_sim.load_metadata(path_baselines)
fc_augmented_sim.init()
fc_augmented_sim.restore(path_baselines)

Then, as for the DC approximation, we can evaluate it on the test datasets of the benchmark.

This is done with the same command:

In [5]:
fc_metrics_per_dataset = benchmark1.evaluate_simulator(augmented_simulator=fc_augmented_sim,
                                                       dataset="all",
                                                       batch_size=10000)

A log file including some verifications is created at root directory with the name logs.log


In [6]:
fc_augmented_sim._predict_time

0.12818050384521484

## Comparison of the two augmented simulators

### Machine learning metrics 

We can now compare the two "augmented simulators". For example, to compare the MAPE90 (mean absolute percentage error computed on the upper 10% quantile) on the test dataset (whose distribution is similar to the training distribution) for the currents (A) at the two extremities of the power lines, we can look at:

In [11]:
ML_metrics = 0
fc_metrics_per_dataset["test"][ML_metrics]['mape90']

{'a_or': 0.00880006226699487, 'a_ex': 0.008783799864951779}

In [12]:
dc_metrics_per_dataset["test"][ML_metrics]['mape90']

{'a_or': 0.1776475031579627, 'a_ex': 0.1608028851306038}
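To make the MAPE90 metric more concrete, here is a minimal pure-Python sketch. It assumes that the metric is the mean absolute percentage error restricted to the 10% of samples with the largest true values; the exact definition used inside LIPS may differ (for instance in how the quantile threshold is computed), and `mape_top_fraction` is a hypothetical name.

```python
def mape_top_fraction(y_true, y_pred, fraction=0.1):
    """MAPE computed only on the `fraction` of samples with the largest |y_true|."""
    # Sort (true, predicted) pairs by decreasing magnitude of the true value
    pairs = sorted(zip(y_true, y_pred), key=lambda pair: abs(pair[0]), reverse=True)
    n_kept = max(1, int(round(fraction * len(pairs))))
    kept = pairs[:n_kept]
    return sum(abs(pred - true) / abs(true) for true, pred in kept) / n_kept


# Toy data: ten true currents, with a 10% error on the largest one only
toy_true = [100.0 * i for i in range(1, 11)]  # 100, 200, ..., 1000
toy_pred = list(toy_true)
toy_pred[-1] = 1100.0

print(mape_top_fraction(toy_true, toy_pred))
```

Restricting the error to the largest currents focuses the metric on the heavily loaded lines, which are the ones most relevant for grid security.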

If we want the same quantity on the "out of distribution (due to topology)" dataset, we can look at:

In [13]:
fc_metrics_per_dataset["test_ood_topo"][ML_metrics]['mape90']

{'a_or': 0.20694255606968653, 'a_ex': 0.20816347558174791}

In [14]:
dc_metrics_per_dataset["test_ood_topo"][ML_metrics]['mape90']

### Physics compliance

In [16]:
physic_compliances = 1
fc_metrics_per_dataset["test"][physic_compliances]["BasicVerifications"]["currents"]["a_or"]["Violation_proportion"]

0.02305

In [17]:
fc_metrics_per_dataset["test"][physic_compliances]["BasicVerifications"]["currents"]["a_or"]["Error"]

11866.515
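The two quantities above can be illustrated with a small sketch. It assumes that the basic verification on currents checks that predicted currents are non-negative, that `Violation_proportion` is the share of predicted values breaking this rule, and that `Error` is the summed magnitude of the violations. These are assumptions about the LIPS implementation, not a description of it, and `current_positivity_check` is a hypothetical name.

```python
def current_positivity_check(currents):
    """Return the share of negative currents and the summed magnitude of the violations."""
    # Flatten the (observations x lines) table into a single list of values
    flat = [value for row in currents for value in row]
    negatives = [value for value in flat if value < 0]
    return {
        "Violation_proportion": len(negatives) / len(flat),
        "Error": sum(-value for value in negatives),
    }


# Toy predictions: two observations of four line currents (A), two of them negative
predicted_a_or = [[120.0, -5.0, 310.5, 42.0], [98.0, 0.0, -1.5, 250.0]]
print(current_positivity_check(predicted_a_or))
```

A physical simulator never returns negative current magnitudes, so a non-zero proportion indicates predictions that no real grid state could produce.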

# Benchmark2

# Benchmark3 