# Compute the global score

The aim of this notebook is to demonstrate how to compute the global score used to evaluate the performance of a given model (aka `AugmentedSimulator`). This notebook is composed of three main parts:

- [Acceleration reference computation](#acceleration): we show how the acceleration factor is obtained with respect to the reference simulator
- [Score computation step-by-step](#step-by-step): we break down the score computation on a real example
- [Automatic Score Computation for local submissions](#auto-score): we provide a function that computes the score from a trained model or from saved metrics

For more information about the computation of the score, readers can refer to [this file](Evaluation.md).

#### Import required packages

In [169]:
import os
import math

## Acceleration reference computation <a id="acceleration"></a>

As the acceleration of the simulation is one of the most important criteria in this competition, this section explains which reference the acceleration is computed against.

#### Using AC Power Flow simulator (LightSim2Grid)

- The reference in this competition is the physical solver based on the Newton-Raphson method, which is implemented in [LightSim2Grid](https://github.com/BDonnot/lightsim2grid) and called through the [Grid2op](https://github.com/rte-france/Grid2Op) framework. It solves the AC power flow equations. We first provide a function that returns the corresponding computation time for a given number of simulated samples.

Compute the AC solver time, which serves as the reference for the acceleration rate.

In [170]:
from lips.metrics.power_grid.compute_solver_time_grid2op import compute_solver_time_grid2op

BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
grid2op_solver_time = compute_solver_time_grid2op(config_path=BENCH_CONFIG_PATH, benchmark_name="Benchmark_competition", nb_samples=int(1e5))

100%|██████████| 1000/1000 [00:26<00:00, 37.69it/s]

Time required to solve one power flow:  0.0003269047848880291
Time required to solve 100000 power flows:  32.69047848880291





#### Using Security Analysis

However, a more efficient way to compute power flows for the risk assessment application is security analysis. It runs simulations on a grid state for every contingency of interest in terms of risk, in particular for every single line disconnection that could unexpectedly occur. It therefore runs, in a single computation, as many simulations as there are lines on the grid. Compared with computing each power flow independently, this is faster because the matrix decomposition (factorization) is performed only once and reused across all the contingency simulations.

This Security Analysis computation is used as the baseline for the risk assessment application targeted by this competition.

In [39]:
from lips.config import ConfigManager
from lips.metrics.power_grid.compute_solver_time import compute_solver_time

BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
config = ConfigManager(path=BENCH_CONFIG_PATH, section_name="Benchmark_competition")

sa_solver_time = compute_solver_time(nb_samples=int(1e5), config=config)

In [40]:
sa_solver_time

8.627058052987502

In [20]:
print(f"The acceleration obtained using Security Analysis is :  {(grid2op_solver_time / sa_solver_time):.2f} times")

The acceleration obtained using Security Analysis is :  3.77 times


## Score computation step-by-step <a id='step-by-step'></a>
Hereafter, we detail the score computation procedure for submissions. We start with an example of metrics returned by a baseline approach on the `lips_idf_2023` environment.

For more information about the metrics, you can refer to the appendix of the LIPS paper.

We nevertheless highlight one specific metric: MAPE90. Since risk assessment is about detecting overloads on the lines, the MAPE on currents is computed only on the top 10% quantile of values (i.e. above the 90th percentile).

Once overloads are detected, an operator still needs an overall picture of the flows in that state. We therefore assess the active power over almost the whole distribution, excluding only the very low values, which are not significant. Hence the use of MAPE10 in that case.

In [244]:
test_metrics = {"ML":{"a_or":0.013, # MAPE90 
                      "a_ex":0.013, # MAPE90 
                      "p_or":0.029, # MAPE10 
                      "p_ex":0.029, # MAPE10 
                      "v_or":1.11, # MAE
                      "v_ex":1.11  # MAE
                },
                "Physics":{
                      "CURRENT_POS": 0.2,
                      "VOLTAGE_POS": 0.1,
                      "LOSS_POS": 27.88,
                      "DISC_LINES": 100.0,
                      "CHECK_LOSS": 1.54,
                      "CHECK_GC": 99.99,
                      "CHECK_LC": 98.55,
                      "CHECK_JOULE_LAW": 85.84                      
                }
               }

test_ood_metrics = {"ML":{"a_or":0.024, # MAPE90 
                          "a_ex":0.024, # MAPE90 
                          "p_or":0.041, # MAPE10 
                          "p_ex":0.040, # MAPE10 
                          "v_or":2.28, # MAE
                          "v_ex":2.07  # MAE
                    },
                    "Physics":{
                          "CURRENT_POS": 0.4, # violation percentage (%)
                          "VOLTAGE_POS": 0.1,
                          "LOSS_POS": 30.88,
                          "DISC_LINES": 100.0,
                          "CHECK_LOSS": 4.63,
                          "CHECK_GC": 99.99,
                          "CHECK_LC": 98.70,
                          "CHECK_JOULE_LAW": 88.48 
                    }
                   } 

speed_up = grid2op_solver_time / 2.74 # 2.74 is the inference time of a Neural Network based approach

We define the acceptability thresholds. Each variable is associated with two thresholds, which classify a result as great, acceptable or unacceptable, and with a flag indicating whether the value should be minimized (`"min"`) or maximized (`"max"`).

In [245]:
thresholds={"a_or":(0.02,0.05,"min"),
            "a_ex":(0.02,0.05,"min"),
            "p_or":(0.02,0.05,"min"),
            "p_ex":(0.02,0.05,"min"),
            "v_or":(0.2,0.5,"min"),
            "v_ex":(0.2,0.5,"min"),
            "CURRENT_POS":(1., 5.,"min"),
            "VOLTAGE_POS":(1.,5.,"min"),
            "LOSS_POS":(1.,5.,"min"),
            "DISC_LINES":(1.,5.,"min"),
            "CHECK_LOSS":(1.,5.,"min"),
            "CHECK_GC":(0.05,0.10,"min"),
            "CHECK_LC":(0.05,0.10,"min"),
            "CHECK_JOULE_LAW":(1.,5.,"min")
           }

For instance, for the value obtained for the variable `a_or`:

- if it is lower than 2%, the result is great
- if it is greater than 2% but lower than 5%, the result is acceptable
- if it is greater than 5%, the result is not acceptable

For a physical criterion such as `CHECK_GC` (check global conservation), whose thresholds are 0.05 and 0.10:

- if the violation is lower than 0.05, the result is great
- if it is greater than 0.05 but lower than 0.10, the result is acceptable
- if it is greater than 0.10, the result is not acceptable

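We also define the configuration, i.e. the coefficients applied to each category and subcategory, the points awarded for each color (`value_by_color`), and the speed-up at which the speed score saturates (`max_speed_ratio_allowed`).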

In [246]:
configuration={
    "coefficients":{"test":0.3, "test_ood":0.3, "speed_up":0.4},
    "test_ratio":{"ml": 0.66, "physics":0.34},
    "test_ood_ratio":{"ml": 0.66, "physics":0.34},
    "value_by_color":{"g":2,"o":1,"r":0},
    "max_speed_ratio_allowed":50
}
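Read together with the scoring cells that follow, this configuration amounts to the weighting below. This is our summary of the code in this notebook rather than an official definition: $d$ is a dataset (`test` or `test_ood`), $s$ a subcategory with weight $\rho_{d,s}$ taken from `test_ratio` / `test_ood_ratio`, $n_s$ the number of variables in $s$, $c_i$ the color of variable $i$, $v$ the `value_by_color` mapping, and $S_{\text{speed}} \in [0, 1]$ the speed-up score, which saturates at `max_speed_ratio_allowed`.

$$
S_d = \sum_{s \in \{\text{ml},\,\text{physics}\}} \rho_{d,s}\,\frac{\sum_{i=1}^{n_s} v(c_i)}{n_s\, v_{\max}},
\qquad v(\text{g}) = 2,\quad v(\text{o}) = 1,\quad v(\text{r}) = 0,\quad v_{\max} = 2
$$

$$
G = 100\,\bigl(0.3\,S_{\text{test}} + 0.3\,S_{\text{test\_ood}} + 0.4\,S_{\text{speed}}\bigr)
$$

Since the subcategory weights sum to 1 (0.66 + 0.34), each $S_d$ lies in $[0, 1]$ and $G$ lies in $[0, 100]$.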

We now evaluate the accuracy of the results for all variables, denoting by:

- g, a great result
- o, an acceptable result
- r, a not acceptable result

In [247]:
results_test=dict()
for subcategoryName, subcategoryVal in test_metrics.items():
    results_test[subcategoryName]=[]
    for variableName, variableError in subcategoryVal.items():
        thresholdMin,thresholdMax,evalType=thresholds[variableName]
        if evalType=="min":
            # lower is better: below thresholdMin is great, below thresholdMax is acceptable
            if variableError<thresholdMin:
                accuracyEval="g"
            elif variableError<thresholdMax:
                accuracyEval="o"
            else:
                accuracyEval="r"
        elif evalType=="max":
            # higher is better: the color scale is reversed
            if variableError<thresholdMin:
                accuracyEval="r"
            elif variableError<thresholdMax:
                accuracyEval="o"
            else:
                accuracyEval="g"
        else:
            raise ValueError(f"Unknown evaluation type: {evalType}")

        results_test[subcategoryName].append(accuracyEval)

print(results_test)

{'ML': ['g', 'g', 'o', 'o', 'r', 'r'], 'Physics': ['g', 'g', 'r', 'r', 'o', 'r', 'r', 'r']}


The same evaluation for the OOD (out-of-distribution) dataset:

In [249]:
results_test_ood=dict()
for subcategoryName, subcategoryVal in test_ood_metrics.items():
    results_test_ood[subcategoryName]=[]
    for variableName, variableError in subcategoryVal.items():
        thresholdMin,thresholdMax,evalType=thresholds[variableName]
        if evalType=="min":
            # lower is better: below thresholdMin is great, below thresholdMax is acceptable
            if variableError<thresholdMin:
                accuracyEval="g"
            elif variableError<thresholdMax:
                accuracyEval="o"
            else:
                accuracyEval="r"
        elif evalType=="max":
            # higher is better: the color scale is reversed
            if variableError<thresholdMin:
                accuracyEval="r"
            elif variableError<thresholdMax:
                accuracyEval="o"
            else:
                accuracyEval="g"
        else:
            raise ValueError(f"Unknown evaluation type: {evalType}")

        results_test_ood[subcategoryName].append(accuracyEval)

print(results_test_ood)

{'ML': ['o', 'o', 'o', 'o', 'r', 'r'], 'Physics': ['g', 'g', 'r', 'r', 'o', 'r', 'r', 'r']}


In [263]:
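def quadratic_function(x, a, b, c, k):
    # Quadratic term plus a log10 term; a speed-up of exactly 1 (no acceleration) is mapped to 0
    if x == 1.:
        return 0.
    else:
        return a*(x**2) + b*x + c + math.log10(k*x)

def SpeedMetric(speedUp, speedMax):
    # Coefficients of the speed-up scoring curve
    a=0.01
    b=0.5
    c=0.1
    k=9
    # Normalize by the value reached at the maximum allowed speed-up, then clip to [0, 1]
    res = quadratic_function(speedUp, a=a, b=b, c=c, k=k) / quadratic_function(speedMax, a=a, b=b, c=c, k=k)
    return max(min(res, 1), 0)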
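To see how the speed-up score behaves, the sketch below restates `quadratic_function` and `SpeedMetric` in a self-contained form and evaluates it for a few speed-ups. The name `speed_metric` and its keyword defaults are our own; the coefficients are copied from the cell above.

```python
import math


def quadratic_function(x, a, b, c, k):
    # copy of the notebook's function: quadratic + log10 term, forced to 0 at x == 1
    if x == 1.:
        return 0.
    return a * (x ** 2) + b * x + c + math.log10(k * x)


def speed_metric(speed_up, speed_max, a=0.01, b=0.5, c=0.1, k=9):
    # same normalization and clipping to [0, 1] as SpeedMetric above
    res = quadratic_function(speed_up, a, b, c, k) / quadratic_function(speed_max, a, b, c, k)
    return max(min(res, 1), 0)


for s in (0.9, 1.0, 3.77, 11.93, 25.0, 50.0, 100.0):
    print(f"speed-up {s:6.2f} -> score {speed_metric(s, 50):.3f}")
```

As written, a speed-up of exactly 1 scores 0, speed-ups at or above `max_speed_ratio_allowed` (50) are clipped to 1, and about 11.93 (the neural-network example above) gives roughly 0.18, matching the cell output below. Because the special case only applies at exactly 1, a speed-up slightly below 1 (e.g. 0.9) still gets a small positive score, and `math.log10` raises a `ValueError` for a speed-up of 0 or less.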

In [251]:
coefficients = configuration["coefficients"]
test_ratio = configuration["test_ratio"]
test_ood_ratio = configuration["test_ood_ratio"]
value_by_color = configuration["value_by_color"]
max_speed_ratio_allowed = configuration["max_speed_ratio_allowed"]

### Test dataset:

- ML

In [252]:
test_ml_subscore=0

test_ml_res = sum([value_by_color[color] for color in results_test["ML"]])
test_ml_res = (test_ml_res * test_ratio["ml"]) / (len(results_test["ML"])*max(value_by_color.values()))
test_ml_subscore += test_ml_res

- Physics:

In [253]:
test_physics_res = sum([value_by_color[color] for color in results_test["Physics"]])
test_physics_res = (test_physics_res*test_ratio["physics"]) / (len(results_test["Physics"])*max(value_by_color.values()))
test_physics_subscore = test_physics_res

In [262]:
test_subscore = test_ml_subscore + test_physics_subscore
print(f"Test dataset subscore (ML + Physics) : {test_subscore:.2f}")

Test dataset subscore (ML + Physics) : 0.44
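The ML and Physics cells above follow the same pattern: sum the points of each color, normalize by the best achievable total, and weight the subcategory. A small helper can factor this out; `category_subscore` is a hypothetical name, not part of `utils.compute_score`.

```python
def category_subscore(colors_by_subcategory, ratios, value_by_color):
    """Hypothetical helper reproducing the ML + Physics subscore cells above.

    colors_by_subcategory: e.g. {"ML": ["g", "o", ...], "Physics": [...]}
    ratios: weight of each subcategory, e.g. {"ml": 0.66, "physics": 0.34}
    """
    max_value = max(value_by_color.values())
    total = 0.0
    for name, colors in colors_by_subcategory.items():
        points = sum(value_by_color[color] for color in colors)
        # normalize by the best achievable score, then weight the subcategory
        total += ratios[name.lower()] * points / (len(colors) * max_value)
    return total


example_colors = {"g": 2, "o": 1, "r": 0}
example_results = {"ML": ["g", "g", "o", "o", "r", "r"],
                   "Physics": ["g", "g", "r", "r", "o", "r", "r", "r"]}
print(category_subscore(example_results, {"ml": 0.66, "physics": 0.34}, example_colors))
```

With the test colors obtained above it returns 0.43625, which the notebook prints as 0.44; applied to the OOD colors it returns 0.32625 (printed as 0.33).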


### Speed up

In [261]:
speedup_score = SpeedMetric(speedUp=speed_up,speedMax=max_speed_ratio_allowed)
print(f"Speed-up score: {speedup_score:.2f}")

Speed-up score: 0.18


### Test OOD

- ML

In [256]:
test_ood_ml_subscore=0

test_ood_ml_res = sum([value_by_color[color] for color in results_test_ood["ML"]])
test_ood_ml_res = (test_ood_ml_res * test_ood_ratio["ml"]) / (len(results_test_ood["ML"])*max(value_by_color.values()))
test_ood_ml_subscore += test_ood_ml_res

- Physics

In [257]:
test_ood_physics_res = sum([value_by_color[color] for color in results_test_ood["Physics"]])
test_ood_physics_res = (test_ood_physics_res*test_ood_ratio["physics"]) / (len(results_test_ood["Physics"])*max(value_by_color.values()))
test_ood_physics_subscore = test_ood_physics_res

In [260]:
test_ood_subscore = test_ood_ml_subscore + test_ood_physics_subscore
print(f"OOD dataset score (ML + Physics):  {test_ood_subscore:.2f}")

OOD dataset score (ML + Physics):  0.33


- Global Score

In [259]:
globalScore=100*(coefficients["test"]*test_subscore+coefficients["test_ood"]*test_ood_subscore+coefficients["speed_up"]*speedup_score)
print(globalScore)

30.093348207718172
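As a sanity check, the printed global score can be recovered by hand from the colors obtained above (ML: 6 points out of 12 on test and 4 out of 12 on OOD; Physics: 5 out of 16 on both) and the unrounded speed-up score, about 0.18046:

$$
\begin{aligned}
S_{\text{test}} &= 0.66 \cdot \tfrac{6}{12} + 0.34 \cdot \tfrac{5}{16} = 0.33 + 0.10625 = 0.43625 \\
S_{\text{test\_ood}} &= 0.66 \cdot \tfrac{4}{12} + 0.34 \cdot \tfrac{5}{16} = 0.22 + 0.10625 = 0.32625 \\
G &= 100\,(0.3 \cdot 0.43625 + 0.3 \cdot 0.32625 + 0.4 \cdot 0.18046) \approx 30.09
\end{aligned}
$$

The speed-up term therefore contributes only about 7.2 of its possible 40 points.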


## Automatic Score Computation for local submissions <a id='auto-score'></a>

In this section, we use the scoring function (available under `utils.compute_score`) in two ways:  
- to compute the score for already trained baseline models;
- to compute the score on the basis of the saved evaluation results (dictionary).

### Compute the score using an already trained model

Evaluate an already trained baseline (a fully connected architecture) and get the corresponding score using the `compute_global_score` function.

In [141]:
### Import required packages
import os
from lips.benchmark.powergridBenchmark import PowerGridBenchmark

#Define the required paths
BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
DATA_PATH = os.path.join("input_data_local", "lips_idf_2023")
TRAINED_MODELS = os.path.join("input_data_local", "trained_models")
LOG_PATH = "logs.log"

benchmark_kwargs = {"attr_x": ("prod_p", "prod_v", "load_p", "load_q"),
                    "attr_y": ("a_or", "a_ex", "p_or", "p_ex", "v_or", "v_ex"),
                    "attr_tau": ("line_status", "topo_vect"),
                    "attr_physics": None}

benchmark = PowerGridBenchmark(benchmark_path=DATA_PATH,
                               config_path=BENCH_CONFIG_PATH,
                               benchmark_name="Benchmark_competition",
                               load_data_set=True, 
                               log_path=LOG_PATH,
                               **benchmark_kwargs)

In [2]:
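import os
# Restrict TensorFlow to the first GPU; set this before TensorFlow initializes the GPUs
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf

memory_limit = 20000  # GPU memory budget in MB

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Create a single virtual device with a capped memory budget on the first GPU
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit)])
    except RuntimeError as e:
        # Virtual devices must be configured before the GPUs have been initialized
        print(e)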

In [None]:
# load an already trained augmented simulator
from lips.augmented_simulators.tensorflow_models import TfFullyConnected
from lips.dataset.scaler import StandardScaler

# Indicate the path required for corresponding augmented simulator parameters
SIM_CONFIG_PATH = os.path.join("configs", "simulators", "tf_fc.ini")

tf_fc = TfFullyConnected(name="tf_fc",
                         bench_config_path=BENCH_CONFIG_PATH,
                         bench_config_name="Benchmark_competition",
                         bench_kwargs=benchmark_kwargs,
                         sim_config_path=SIM_CONFIG_PATH,
                         sim_config_name="DEFAULT",
                         scaler=StandardScaler,
                         log_path=LOG_PATH)

LOAD_PATH = os.path.join(TRAINED_MODELS, "lips_idf_2023")
tf_fc.restore(path=LOAD_PATH)

In [None]:
EVALUATION_PATH = os.path.join("input_data_local", "eval_results", "lips_idf_2023")
metrics = benchmark.evaluate_simulator(augmented_simulator=tf_fc,
                                       eval_batch_size=128,
                                       dataset="all",
                                       shuffle=False,
                                       save_path=EVALUATION_PATH,
                                       save_predictions=False
                                      )

In [174]:
from utils.compute_score import compute_global_score
score = compute_global_score(metrics, benchmark.config)

100%|██████████| 1000/1000 [00:19<00:00, 50.50it/s]

Time required to solve one power flow:  0.0003298692740499973
Time required to solve 100000 power flows:  32.98692740499973
22.43482332440552





### Compute the score using the saved evaluation results

Read the evaluation results for the baseline architecture (LeapNet architecture in this case).

In [190]:
import os
import json

def import_metrics(path, dataset):
    path_to_results = os.path.join(path, dataset, "eval_res.json")
    with open(path_to_results) as json_file:
        metrics = json.load(json_file)
    return metrics

In [223]:
EVALUATION_PATH = os.path.join("input_data_local", "eval_results", "lips_idf_2023", "tf_leapnet_DEFAULT")
metrics_dict = dict()
metrics_dict["test"] = import_metrics(EVALUATION_PATH, "test")
metrics_dict["test_ood_topo"] = import_metrics(EVALUATION_PATH, "test_ood_topo")

In [224]:
from utils.compute_score import compute_global_score
score = compute_global_score(metrics_dict, benchmark.config)

100%|██████████| 1000/1000 [00:20<00:00, 49.60it/s]

Time required to solve one power flow:  0.000327156824991107
Time required to solve 100000 power flows:  32.7156824991107
30.083020095673813





## Compute the scores of the physical solvers

In [177]:
from utils.compute_score import compute_ml_subscore, compute_physics_subscore, SpeedMetric, configuration

test_results_disc = {"ML": ['g','g','g','g','g','g'], "Physics": ['g','g','g','g','g','g','g','g']}
test_ood_results_disc = {"ML": ['g','g','g','g','g','g'], "Physics": ['g','g','g','g','g','g','g','g']}
coefficients = configuration["coefficients"]
max_speed_ratio_allowed = configuration["max_speed_ratio_allowed"]

test_ml_subscore = compute_ml_subscore(test_results_disc, key="test_ratio")
test_physics_subscore = compute_physics_subscore(test_results_disc, key="test_ratio")
test_subscore = test_ml_subscore + test_physics_subscore

test_ood_ml_subscore = compute_ml_subscore(test_ood_results_disc, key="test_ood_ratio")
test_ood_physics_subscore = compute_physics_subscore(test_ood_results_disc, key="test_ood_ratio")
test_ood_subscore = test_ood_ml_subscore + test_ood_physics_subscore

speed_up = 1. # LightSim2Grid
# speed_up = 3.77 # Security Analysis

speedup_score = SpeedMetric(speedUp=speed_up, speedMax=max_speed_ratio_allowed)

globalScore = 100*(coefficients["test"]*test_subscore+coefficients["test_ood"]*test_ood_subscore+coefficients["speed_up"]*speedup_score)
print(globalScore)
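Assuming the `configuration` imported from `utils.compute_score` holds the same values as the one defined earlier in this notebook, the result of this cell can be worked out by hand. With all variables green, each dataset subscore is $0.66 + 0.34 = 1$, and `SpeedMetric` returns exactly 0 for a speed-up of 1, while a speed-up of 3.77 gives $S_{\text{speed}} \approx 0.0693$:

$$
\begin{aligned}
G_{\text{LightSim2Grid}} &= 100\,(0.3 \cdot 1 + 0.3 \cdot 1 + 0.4 \cdot 0) = 60 \\
G_{\text{Security Analysis}} &= 100\,(0.3 \cdot 1 + 0.3 \cdot 1 + 0.4 \cdot 0.0693) \approx 62.77
\end{aligned}
$$

Perfect accuracy alone is thus capped at 60 points; the remaining 40 depend entirely on the speed-up.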

## Benchmark table
The table below shows the benchmark results for the evaluations made in the previous sections of this notebook. The first two rows are physical solvers, which provide exact solutions for all the variables and are used as the ground truth for comparison. However, their accelerations are not sufficient to reach a global score of 100%. On the other hand, Fully Connected and LeapNet are two baselines based on neural networks. Neither gives satisfying results on most of the considered criteria. The LeapNet architecture generalizes better than the Fully Connected architecture, owing to its specific design.

![Benchmark table](img/Benchmark_table_new.png)