# Compute the global score

This notebook demonstrates how to compute the global score used to evaluate the performance of a given model (also called an `AugmentedSimulator`). It has three main parts:

- [Acceleration reference computation](#acceleration): we show how the acceleration factor is obtained with respect to the reference simulator.
- [Score computation step-by-step](#step-by-step): we break down the score computation using a real example.
- [Automatic Score Computation for local submissions](#auto-score): we provide a function that computes the score from a trained model or from saved metrics.

For more information about how the score is computed, see [this file](Evaluation.md).

#### Import required packages

In [38]:
import os
import math

## Acceleration reference computation <a id="acceleration"></a>

Since the acceleration of the simulation is one of the most important criteria in this competition, this section explains the reference against which the acceleration is computed.

#### Using Grid2op solver

- The reference in this competition is the physical solver, based on the Newton-Raphson method and implemented in the [Grid2op](https://github.com/rte-france/Grid2Op) framework. It solves the AC power flow equations of the power system. We first provide a function that returns the computation time for a given number of samples.

Get the AC solver time (used as the reference) with respect to which the acceleration rate is computed.

In [11]:
from lips.metrics.power_grid.compute_solver_time_grid2op import compute_solver_time_grid2op

BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
grid2op_solver_time = compute_solver_time_grid2op(config_path=BENCH_CONFIG_PATH, benchmark_name="Benchmark_competition", nb_samples=int(1e5))

100%|██████████| 1000/1000 [00:20<00:00, 49.00it/s]

Time required to solve one power flow:  0.00032784996554255483
Time required to solve 100000 power flows:  32.784996554255486





#### Using Security Analysis

However, a more optimized way to compute the power flow is through security analysis, which is based on a matrix factorization. The factorization happens only during the first step of the power flow computation, which yields a significant acceleration compared with the approach above. <span style="color:red">In this competition, we use this optimized version as the reference power flow computation time to calculate the speed-ups of submissions.</span>

In [39]:
from lips.config import ConfigManager
from lips.metrics.power_grid.compute_solver_time import compute_solver_time

BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
config = ConfigManager(path=BENCH_CONFIG_PATH, section_name="Benchmark_competition")

sa_solver_time = compute_solver_time(nb_samples=int(1e5), config=config)

In [40]:
sa_solver_time

8.627058052987502

In [20]:
print(f"The acceleration obtained using Security Analysis is :  {(grid2op_solver_time / sa_solver_time):.2f} times")

The acceleration obtained using Security Analysis is :  3.77 times


## Score computation step-by-step <a id='step-by-step'></a>
Below, we walk through the score computation procedure for submissions. We start with an example of the metrics returned by a baseline approach on the `lips_idf_2023` environment.

In [41]:
test_metrics = {"ML":{"a_or":0.02, # MAPE90 
                      "a_ex":0.02, # MAPE90 
                      "p_or":0.02, # MAPE90 
                      "p_ex":0.02, # MAPE90 
                      "v_or":1.49, # MAE
                      "v_ex":1.28  # MAE
                },
                "Physics":{
                      "CURRENT_POS": 0.2,
                      "VOLTAGE_POS": 0.1,
                      "LOSS_POS": 31.99,
                      "DISC_LINES": 0,
                      "CHECK_LOSS": 3.06,
                      "CHECK_GC": 99.99,
                      "CHECK_LC": 98.82,
                      "CHECK_JOULE_LAW": 87.82                      
                }
               }

test_ood_metrics = {"ML":{"a_or":0.03, # MAPE90 
                          "a_ex":0.03, # MAPE90 
                          "p_or":0.03, # MAPE90 
                          "p_ex":0.03, # MAPE90 
                          "v_or":2.61, # MAE
                          "v_ex":2.27  # MAE
                    },
                    "Physics":{
                          "CURRENT_POS": 0.4, # violation percentage (%)
                          "VOLTAGE_POS": 0.1,
                          "LOSS_POS": 34.94,
                          "DISC_LINES": 0,
                          "CHECK_LOSS": 6.61,
                          "CHECK_GC": 99.99,
                          "CHECK_LC": 98.95,
                          "CHECK_JOULE_LAW": 89.59 
                    }
                   } 

speed_up = sa_solver_time / 3.34 # 3.34 is the inference time of a Neural Network based approach

We define the acceptability thresholds. Each variable is associated with two thresholds, used to determine whether the result is great, acceptable or unacceptable, and with a flag indicating whether the metric should be minimized or maximized.

In [42]:
thresholds={"a_or":(0.05,0.10,"min"),
            "a_ex":(0.05,0.10,"min"),
            "p_or":(0.05,0.10,"min"),
            "p_ex":(0.05,0.10,"min"),
            "v_or":(0.2,0.5,"min"),
            "v_ex":(0.2,0.5,"min"),
            "CURRENT_POS":(1., 5.,"min"),
            "VOLTAGE_POS":(1.,5.,"min"),
            "LOSS_POS":(1.,5.,"min"),
            "DISC_LINES":(1.,5.,"min"),
            "CHECK_LOSS":(1.,5.,"min"),
            "CHECK_GC":(0.05,0.10,"min"),
            "CHECK_LC":(0.05,0.10,"min"),
            "CHECK_JOULE_LAW":(1.,5.,"min")
           }

For instance, for the value obtained for the variable `a_or`:

- if it is lower than 0.05, the result is great
- if it is greater than 0.05 but lower than 0.10, the result is acceptable
- if it is greater than 0.10, the result is not acceptable

For a physical criterion such as `CHECK_GC` (check global conservation), with thresholds 0.05 and 0.10:

- if the violation is lower than 0.05, the result is great
- if it is greater than 0.05 but lower than 0.10, the result is acceptable
- if it is greater than 0.10, the result is not acceptable
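
The rule above can be sketched as a small helper. This is a minimal illustration and not the competition's implementation: the function name `classify` is hypothetical, and the handling of a value that is exactly equal to a threshold (here it is assigned to the band above the threshold) is an assumption.

```python
def classify(value, threshold_min, threshold_max, eval_type="min"):
    """Return 'g' (great), 'o' (acceptable) or 'r' (not acceptable)."""
    if eval_type not in ("min", "max"):
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    # Locate the value relative to the two thresholds.
    if value < threshold_min:
        band = "low"
    elif value < threshold_max:
        band = "mid"
    else:
        band = "high"
    # For a metric to minimize, low is great; for one to maximize, high is great.
    colors = {"min": {"low": "g", "mid": "o", "high": "r"},
              "max": {"low": "r", "mid": "o", "high": "g"}}
    return colors[eval_type][band]


print(classify(0.02, 0.05, 0.10))   # a_or on the test set -> g
print(classify(3.06, 1.0, 5.0))     # CHECK_LOSS on the test set -> o
print(classify(99.99, 0.05, 0.10))  # CHECK_GC on the test set -> r
```

The three calls use values and thresholds taken from the dictionaries defined in this notebook.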

We also define the configuration, which contains the coefficients for each category and subcategory.

In [43]:
configuration={
    "coefficients":{"test":0.33, "test_ood":0.33, "speed_up":0.34},
    "test_ratio":{"ml": 0.6, "physics":0.4},
    "test_ood_ratio":{"ml": 0.6, "physics":0.4},
    "value_by_color":{"g":2,"o":1,"r":0},
    "max_speed_ratio_allowed":50
}
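
As a quick sanity check, one can verify that each group of weights sums to 1, which is what keeps the global score between 0 and 100. The sketch below repeats the configuration so that it runs on its own; the helper name `weights_sum_to_one` is hypothetical and not part of `utils.compute_score`.

```python
import math

configuration = {
    "coefficients": {"test": 0.33, "test_ood": 0.33, "speed_up": 0.34},
    "test_ratio": {"ml": 0.6, "physics": 0.4},
    "test_ood_ratio": {"ml": 0.6, "physics": 0.4},
    "value_by_color": {"g": 2, "o": 1, "r": 0},
    "max_speed_ratio_allowed": 50,
}


def weights_sum_to_one(weights):
    """Check that a dictionary of weights sums to 1 (up to rounding error)."""
    return math.isclose(sum(weights.values()), 1.0)


for key in ("coefficients", "test_ratio", "test_ood_ratio"):
    print(key, weights_sum_to_one(configuration[key]))
```

With these weights, a submission that obtains only great results and reaches the maximum allowed speed-up would score 100.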

We evaluate the accuracy of the results for all variables. We denote by:

- g, a great result
- o, an acceptable result
- r, a not acceptable result

In [44]:
results_test=dict()
for subcategoryName, subcategoryVal in test_metrics.items():
    results_test[subcategoryName]=[]
    for variableName, variableError in subcategoryVal.items():
        thresholdMin,thresholdMax,evalType=thresholds[variableName]
        if evalType=="min":
            if variableError<thresholdMin:
                accuracyEval="g"
            elif variableError<thresholdMax:  # also covers variableError == thresholdMin
                accuracyEval="o"
            else:
                accuracyEval="r"
        elif evalType=="max":
            if variableError<thresholdMin:
                accuracyEval="r"
            elif variableError<thresholdMax:  # also covers variableError == thresholdMin
                accuracyEval="o"
            else:
                accuracyEval="g"
        else:
            raise ValueError(f"Unknown evaluation type: {evalType}")

        results_test[subcategoryName].append(accuracyEval)
    
print(results_test)

{'ML': ['g', 'g', 'g', 'g', 'r', 'r'], 'Physics': ['g', 'g', 'r', 'g', 'o', 'r', 'r', 'r']}


The same is done for the OOD dataset:

In [45]:
results_test_ood=dict()
for subcategoryName, subcategoryVal in test_ood_metrics.items():
    results_test_ood[subcategoryName]=[]
    for variableName, variableError in subcategoryVal.items():
        thresholdMin,thresholdMax,evalType=thresholds[variableName]
        if evalType=="min":
            if variableError<thresholdMin:
                accuracyEval="g"
            elif variableError<thresholdMax:  # also covers variableError == thresholdMin
                accuracyEval="o"
            else:
                accuracyEval="r"
        elif evalType=="max":
            if variableError<thresholdMin:
                accuracyEval="r"
            elif variableError<thresholdMax:  # also covers variableError == thresholdMin
                accuracyEval="o"
            else:
                accuracyEval="g"
        else:
            raise ValueError(f"Unknown evaluation type: {evalType}")

        results_test_ood[subcategoryName].append(accuracyEval)
    
print(results_test_ood)

{'ML': ['g', 'g', 'g', 'g', 'r', 'r'], 'Physics': ['g', 'g', 'r', 'g', 'r', 'r', 'r', 'r']}


In [46]:
def SpeedMetric(speedUp,speedMax):
    return max(min(math.log10(speedUp)/math.log10(speedMax),1),0)
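
The speed metric is the ratio of the base-10 logarithms of the speed-up and of the maximum allowed speed-up, clamped to the interval [0, 1]. The sketch below restates the function under the hypothetical name `speed_metric` and evaluates it on a few illustrative speed-ups (these values are made up and are not competition results).

```python
import math


def speed_metric(speed_up, speed_max):
    """Log-scaled speed score in [0, 1]; same formula as SpeedMetric above."""
    ratio = math.log10(speed_up) / math.log10(speed_max)
    return max(min(ratio, 1), 0)


# Illustrative speed-ups with a maximum allowed ratio of 50.
for speed_up in (0.5, 1, 50 ** 0.5, 50, 500):
    print(f"{speed_up:8.2f} -> {speed_metric(speed_up, 50):.3f}")
```

A model slower than the reference (speed-up below 1) scores 0, a speed-up equal to the square root of the maximum scores about 0.5, and any speed-up at or above the maximum scores 1. The speed-up must be strictly positive, since `math.log10` raises a `ValueError` otherwise.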

In [47]:
coefficients = configuration["coefficients"]
test_ratio = configuration["test_ratio"]
test_ood_ratio = configuration["test_ood_ratio"]
value_by_color = configuration["value_by_color"]
max_speed_ratio_allowed = configuration["max_speed_ratio_allowed"]

### Test dataset:

- ML

In [48]:
test_ml_subscore=0

test_ml_res = sum([value_by_color[color] for color in results_test["ML"]])
test_ml_res = (test_ml_res * test_ratio["ml"]) / (len(results_test["ML"])*max(value_by_color.values()))
test_ml_subscore += test_ml_res

- Physics:

In [49]:
test_physics_res = sum([value_by_color[color] for color in results_test["Physics"]])
test_physics_res = (test_physics_res*test_ratio["physics"]) / (len(results_test["Physics"])*max(value_by_color.values()))
test_physics_subscore = test_physics_res

In [50]:
test_subscore = test_ml_subscore + test_physics_subscore

In [51]:
test_subscore

0.575
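
As a check on this value, the test subscore can be written out by hand. The notation is introduced here for illustration only: $w_c$ is the weight of category $c$ from `test_ratio`, $v_i$ is the value of each colour from `value_by_color`, $n_c$ is the number of variables in category $c$, and $v_{\max} = 2$ is the best attainable value. The ML category has four great and two unacceptable results; the Physics category has three great, one acceptable and four unacceptable results.

$$
S_{\text{test}} = \sum_{c \in \{\text{ML},\,\text{Physics}\}} w_c \, \frac{\sum_{i=1}^{n_c} v_i}{n_c \, v_{\max}}
= 0.6 \cdot \frac{4 \cdot 2}{6 \cdot 2} + 0.4 \cdot \frac{3 \cdot 2 + 1 \cdot 1}{8 \cdot 2}
= 0.4 + 0.175 = 0.575
$$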

### Speed up

In [53]:
speedup_score = SpeedMetric(speedUp=speed_up,speedMax=max_speed_ratio_allowed)

In [54]:
speedup_score

0.2425682929234455

### Test OOD

- ML

In [55]:
test_ood_ml_subscore=0

test_ood_ml_res = sum([value_by_color[color] for color in results_test_ood["ML"]])
test_ood_ml_res = (test_ood_ml_res * test_ood_ratio["ml"]) / (len(results_test_ood["ML"])*max(value_by_color.values()))
test_ood_ml_subscore += test_ood_ml_res

In [56]:
test_ood_ml_subscore

0.39999999999999997

- Physics

In [57]:
test_ood_physics_res = sum([value_by_color[color] for color in results_test_ood["Physics"]])
test_ood_physics_res = (test_ood_physics_res*test_ood_ratio["physics"]) / (len(results_test_ood["Physics"])*max(value_by_color.values()))
test_ood_physics_subscore = test_ood_physics_res

In [58]:
test_ood_physics_subscore

0.15000000000000002

In [59]:
test_ood_subscore = test_ood_ml_subscore + test_ood_physics_subscore

In [60]:
test_ood_subscore

0.55

- Global Score

In [61]:
globalScore=100*(coefficients["test"]*test_subscore+coefficients["test_ood"]*test_ood_subscore+coefficients["speed_up"]*speedup_score)
print(globalScore)

45.37232195939715
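
The steps above can be gathered into one function. The following is a minimal sketch under stated assumptions, not the implementation behind `utils.compute_score`: the name `global_score` is hypothetical, and the category keys of the results (`"ML"`, `"Physics"`) are assumed to match the ratio keys once lower-cased.

```python
import math


def global_score(results_test, results_test_ood, speed_up, configuration):
    """Combine colour-coded results and a speed-up into a score out of 100."""
    values = configuration["value_by_color"]
    best = max(values.values())

    def subscore(results, ratios):
        # Weighted share of the attainable points in each category.
        total = 0.0
        for category, colors in results.items():
            earned = sum(values[color] for color in colors)
            total += ratios[category.lower()] * earned / (len(colors) * best)
        return total

    # Log-scaled speed score, clamped to [0, 1].
    speed = math.log10(speed_up) / math.log10(configuration["max_speed_ratio_allowed"])
    speed = max(min(speed, 1), 0)

    coeffs = configuration["coefficients"]
    return 100 * (coeffs["test"] * subscore(results_test, configuration["test_ratio"])
                  + coeffs["test_ood"] * subscore(results_test_ood, configuration["test_ood_ratio"])
                  + coeffs["speed_up"] * speed)


# Values copied from the cells above.
configuration = {
    "coefficients": {"test": 0.33, "test_ood": 0.33, "speed_up": 0.34},
    "test_ratio": {"ml": 0.6, "physics": 0.4},
    "test_ood_ratio": {"ml": 0.6, "physics": 0.4},
    "value_by_color": {"g": 2, "o": 1, "r": 0},
    "max_speed_ratio_allowed": 50,
}
results_test = {"ML": ["g", "g", "g", "g", "r", "r"],
                "Physics": ["g", "g", "r", "g", "o", "r", "r", "r"]}
results_test_ood = {"ML": ["g", "g", "g", "g", "r", "r"],
                    "Physics": ["g", "g", "r", "g", "r", "r", "r", "r"]}
speed_up = 8.627058052987502 / 3.34

score = global_score(results_test, results_test_ood, speed_up, configuration)
print(f"{score:.2f}")
```

Rounded to two decimals, this reproduces the global score obtained step by step above.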


## Automatic Score Computation for local submissions <a id='auto-score'></a>

In this section, we use the scoring function (available under `utils.compute_score`) in two ways:  
- to compute the score for already trained baseline models;
- to compute the score on the basis of the saved evaluation results (dictionary).

### Compute the score using an already trained model

Evaluate an already trained baseline (a fully connected architecture) and get the corresponding score using the `compute_global_score` function.

In [68]:
### Import required packages
import os
from lips.benchmark.powergridBenchmark import PowerGridBenchmark

# Define the required paths
BENCH_CONFIG_PATH = os.path.join("configs", "benchmarks", "lips_idf_2023.ini")
DATA_PATH = os.path.join("input_data_local", "lips_idf_2023")
TRAINED_MODELS = os.path.join("input_data_local", "trained_models")
LOG_PATH = "logs.log"

benchmark_kwargs = {"attr_x": ("prod_p", "prod_v", "load_p", "load_q"),
                    "attr_y": ("a_or", "a_ex", "p_or", "p_ex", "v_or", "v_ex"),
                    "attr_tau": ("line_status", "topo_vect"),
                    "attr_physics": None}

benchmark = PowerGridBenchmark(benchmark_path=DATA_PATH,
                               config_path=BENCH_CONFIG_PATH,
                               benchmark_name="Benchmark_competition",
                               load_data_set=True, 
                               load_ybus_as_sparse=False,
                               log_path=LOG_PATH,
                               **benchmark_kwargs)

In [70]:
# load an already trained augmented simulator
from lips.augmented_simulators.tensorflow_models import TfFullyConnected
from lips.dataset.scaler import StandardScaler

# Indicate the path required for corresponding augmented simulator parameters
SIM_CONFIG_PATH = os.path.join("configs", "simulators", "tf_fc.ini")

tf_fc = TfFullyConnected(name="tf_fc",
                         bench_config_path=BENCH_CONFIG_PATH,
                         bench_config_name="Benchmark_competition",
                         bench_kwargs=benchmark_kwargs,
                         sim_config_path=SIM_CONFIG_PATH,
                         sim_config_name="DEFAULT",
                         scaler=StandardScaler,
                         log_path=LOG_PATH)

LOAD_PATH = os.path.join(TRAINED_MODELS, "lips_idf_2023")
tf_fc.restore(path=LOAD_PATH)

In [71]:
EVALUATION_PATH = os.path.join("input_data_local", "eval_results", "lips_idf_2023")
metrics = benchmark.evaluate_simulator(augmented_simulator=tf_fc,
                                       eval_batch_size=128,
                                       dataset="all",
                                       shuffle=False,
                                       save_path=EVALUATION_PATH,
                                       save_predictions=False
                                      )

In [72]:
from utils.compute_score import compute_global_score
score = compute_global_score(metrics, benchmark.config)

43.90493509272275


In [63]:
from utils import compute_score  # module import, needed to access its attributes
compute_score.configuration

{'coefficients': {'test': 0.33, 'test_ood': 0.33, 'speed_up': 0.34},
 'test_ratio': {'ml': 0.6, 'physics': 0.4},
 'test_ood_ratio': {'ml': 0.6, 'physics': 0.4},
 'value_by_color': {'g': 2, 'o': 1, 'r': 0},
 'max_speed_ratio_allowed': 50}

### Compute the score using the saved evaluation results

Read the saved evaluation results for a baseline architecture (the LeapNet architecture in this case).

In [30]:
import json

def import_metrics(path, dataset):
    path_to_results = os.path.join(path, dataset, "eval_res.json")
    with open(path_to_results) as json_file:
        metrics = json.load(json_file)
    return metrics


In [35]:
EVALUATION_PATH = os.path.join("input_data_local", "eval_results", "lips_idf_2023", "tf_leapnet_DEFAULT")
metrics_dict = dict()
metrics_dict["test"] = import_metrics(EVALUATION_PATH, "test")
metrics_dict["test_ood_topo"] = import_metrics(EVALUATION_PATH, "test_ood_topo")

In [64]:
from utils.compute_score import compute_global_score
score = compute_global_score(metrics_dict, benchmark.config)

45.35264931559977
