Fix the evaluation strategy#57

Merged
sarusso merged 18 commits into ICSC-Spoke3:develop from agataben:fix/wrong_evaluation_strategy on Dec 9, 2025
Conversation

@agataben (Collaborator) commented Dec 3, 2025

The new parameter strategy sets the evaluation strategy. Supported values are:

  • strategy = 'flags': detected anomalies are counted by matching the anomaly labels in the series against the model's anomaly flags, one by one. This is the default evaluation strategy.
  • strategy = 'events': not yet implemented; it will group anomaly flags into events.
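A dispatch on this parameter could look like the following sketch. This is hypothetical standalone code, not the library's implementation: `evaluate_flags` and the `(label, flag)` series representation are illustrative assumptions.

```python
def evaluate_flags(series):
    # Placeholder one-by-one matching: count labelled entries whose model flag is set.
    return sum(1 for label, flag in series if label is not None and flag == 1)

def evaluate(series, strategy='flags'):
    """Dispatch on the evaluation strategy; only 'flags' is implemented."""
    if strategy == 'flags':
        return evaluate_flags(series)
    if strategy == 'events':
        raise NotImplementedError("'events' strategy (grouping flags into events) is not implemented yet")
    raise ValueError(f"Unknown strategy '{strategy}'")
```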

The parameter granularity sets the resolution of the evaluation and affects how detected anomalies, inserted anomalies and false positives are counted. The evaluation logic has been fixed and improved. The available granularities are:

  • granularity = 'variable' : anomalies and false positives are counted for each variable of the series
  • granularity = 'point': anomalies and false positives are counted for each timestamp
  • granularity = 'series': anomalies and false positives are counted for each series, so a series can be either anomalous or not

The module "evaluators.py" contains three private functions that evaluate the model on a single series: _variable_granularity_evaluation(), _point_granularity_evaluation() and _series_granularity_evaluation(). As the names suggest, each function evaluates with a specific granularity.
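As an illustration, the 'point' granularity logic could be sketched as follows. This is hypothetical standalone code, not the library's _point_granularity_evaluation(); the series is represented here as plain lists of per-timestamp labels and per-variable flags.

```python
def point_granularity_evaluation(labels, flags):
    """labels: list of anomaly labels (None = normal) per timestamp.
    flags: list of per-variable 0/1 model flags per timestamp."""
    n_p = len(labels)  # series length
    # A labelled timestamp counts as detected if any variable is flagged there.
    detected = sum(1 for label, row in zip(labels, flags)
                   if label is not None and any(row))
    inserted = sum(1 for label in labels if label is not None)
    # A flag at an unlabelled timestamp is one false positive per timestamp.
    false_pos = sum(1 for label, row in zip(labels, flags)
                    if label is None and any(row))
    return {'anomalies_count': detected,
            'anomalies_ratio': detected / inserted if inserted else None,
            'false_positives_count': false_pos,
            'false_positives_ratio': false_pos / n_p}  # K = N_p for 'point'
```

On series1 from the example below, this sketch reproduces 3 detected anomalies out of 3 inserted ones and no false positives.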

Given:
$N_d$ = number of correctly detected anomalies
$N_{tot}$ = number of inserted anomalies
$K$ = normalization factor
$N_{FP}$ = number of false positives
$N_p$ = series length
$n$ = series dimension

The output of the above functions is a dictionary with four key-value pairs:

  1. anomalies_count: $N_d$
  2. anomalies_ratio: $\frac{N_d}{N_{tot}}$
    If there are no anomalies, the value is "None"
  3. false_positives_count: $N_{FP}$
  4. false_positives_ratio: $\frac{N_{FP}}{K}$, where the normalization factor $K$ depends on the granularity:

| granularity | $K$ |
|---|---|
| 'variable' | $N_p\times n$ |
| 'point' | $N_p$ |
| 'series' | $1$ |
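Under the same list-based representation used above, the 'variable' granularity (with its $K = N_p \times n$ normalization) could be sketched like this. Again, this is hypothetical code, not the library's _variable_granularity_evaluation().

```python
def variable_granularity_evaluation(labels, flags):
    """'variable' granularity: anomalies and false positives are counted per
    variable. A labelled timestamp is assumed anomalous in every variable,
    so each one contributes n inserted anomalies."""
    n_p = len(labels)   # series length
    n = len(flags[0])   # series dimension (number of variables)
    detected = sum(f for label, row in zip(labels, flags)
                   if label is not None for f in row)
    inserted = n * sum(1 for label in labels if label is not None)
    false_pos = sum(f for label, row in zip(labels, flags)
                    if label is None for f in row)
    return {'anomalies_count': detected,
            'anomalies_ratio': detected / inserted if inserted else None,
            'false_positives_count': false_pos,
            'false_positives_ratio': false_pos / (n_p * n)}  # K = N_p * n
```

On series3 from the example below (no labels, four flags), this yields an 'anomalies_ratio' of None and a 'false_positives_ratio' of 4/6.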

The evaluation of the model on the dataset is assembled in the private function _calculate_model_scores(). This function takes as input the evaluations of the single series (each computed with the same, fixed granularity) and returns a dictionary with the structure already discussed: what changes are the values. If $N$ is the number of series in the dataset:

  1. anomalies_count: $\sum_{i=1}^{N}N_d^{i}$
  2. anomalies_ratio: $\frac{1}{N_A}\sum_{i=1}^{N_A}\frac{N_d^{i}}{N^{i}_{tot}}$.
    If there are no anomalies, the value is "None"
  3. false_positives_count: $\sum_{i=1}^{N}N_{FP}^{i}$
  4. false_positives_ratio: $\frac{1}{N}\sum_{i=1}^{N}\frac{N_{FP}^{i}}{K}$

$N_A$ stands for the number of series in the dataset with at least one anomaly.
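The aggregation above can be sketched as follows. This is hypothetical standalone code mirroring, but not copied from, _calculate_model_scores(); it consumes the single-series dictionaries described earlier.

```python
def calculate_model_scores(per_series_results):
    """Aggregate single-series evaluations (all computed with the same
    granularity): counts are summed over all N series, while
    'anomalies_ratio' is averaged only over the N_A anomalous series."""
    n = len(per_series_results)
    anomalous = [r for r in per_series_results if r['anomalies_ratio'] is not None]
    return {
        'anomalies_count': sum(r['anomalies_count'] for r in per_series_results),
        'anomalies_ratio': (sum(r['anomalies_ratio'] for r in anomalous) / len(anomalous)
                            if anomalous else None),
        'false_positives_count': sum(r['false_positives_count'] for r in per_series_results),
        'false_positives_ratio': sum(r['false_positives_ratio'] for r in per_series_results) / n,
    }
```

Feeding this sketch the per-series 'variable' granularity results for series1, series2 and series3 from the example below reproduces the dataset-level numbers shown there (7 detected anomalies, ratio ≈ 0.5208, 5 false positives, false-positives ratio ≈ 0.2460).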

Below is an example of an evaluation using MinMaxAnomalyDetector() and series generated with generate_timeseries_df():
series1

| timestamp | value_1 | value_2 | anomaly_label | value_1_anomaly | value_2_anomaly |
|---|---|---|---|---|---|
| 2025-06-10 14:00:00+00:00 | 0.000000 | 0.707107 | None | 0 | 0 |
| 2025-06-10 15:00:00+00:00 | 0.841471 | 0.977061 | anomaly_2 | 0 | 1 |
| 2025-06-10 16:00:00+00:00 | 0.909297 | 0.348710 | anomaly_1 | 1 | 0 |
| 2025-06-10 17:00:00+00:00 | 0.141120 | -0.600243 | None | 0 | 0 |
| 2025-06-10 18:00:00+00:00 | -0.756802 | -0.997336 | anomaly_1 | 1 | 1 |

series2

| timestamp | value_1 | value_2 | anomaly_label | value_1_anomaly | value_2_anomaly |
|---|---|---|---|---|---|
| 2025-06-10 14:00:00+00:00 | 0.000000 | 0.707107 | anomaly_1 | 0 | 0 |
| 2025-06-10 15:00:00+00:00 | 0.841471 | 0.977061 | anomaly_2 | 0 | 1 |
| 2025-06-10 16:00:00+00:00 | 0.909297 | 0.348710 | anomaly_1 | 1 | 0 |
| 2025-06-10 17:00:00+00:00 | 0.141120 | -0.600243 | None | 0 | 0 |
| 2025-06-10 18:00:00+00:00 | -0.756802 | -0.997336 | anomaly_1 | 0 | 1 |
| 2025-06-10 19:00:00+00:00 | -0.958924 | -0.477482 | None | 1 | 0 |
| 2025-06-10 20:00:00+00:00 | -0.279415 | 0.481366 | None | 0 | 0 |

series3

| timestamp | value_1 | value_2 | anomaly_label | value_1_anomaly | value_2_anomaly |
|---|---|---|---|---|---|
| 2025-06-10 14:00:00+00:00 | 0.000000 | 0.707107 | None | 1 | 0 |
| 2025-06-10 15:00:00+00:00 | 0.841471 | 0.977061 | None | 0 | 1 |
| 2025-06-10 16:00:00+00:00 | 0.909297 | 0.348710 | None | 1 | 1 |

The columns 'value_1_anomaly' and 'value_2_anomaly' are the output of the model: 1 means anomalous, 0 means not anomalous.
Here are the results of the evaluation on this dataset, using the three different granularities:

```python
dataset = [series1, series2, series3]
minmax = MinMaxAnomalyDetector()
models = {'my_model': minmax}
evaluator = Evaluator(test_data=dataset)

evaluation_result = evaluator.evaluate(models, granularity='variable')
# {'anomalies_count': 7,
#  'anomalies_ratio': 0.5208333333333333,
#  'false_positives_count': 5,
#  'false_positives_ratio': 0.24603174603174602}

evaluation_result = evaluator.evaluate(models, granularity='point')
# {'anomalies_count': 6,
#  'anomalies_ratio': 0.875,
#  'false_positives_count': 4,
#  'false_positives_ratio': 0.38095238095238093}

evaluation_result = evaluator.evaluate(models, granularity='series')
# {'anomalies_count': 2,
#  'anomalies_ratio': 1.0,
#  'false_positives_count': 1,
#  'false_positives_ratio': 0.3333333333333333}
```

Notes:

Some commits generate series inside the setUp() method in "test_evaluators.py"; these are useful for testing the evaluation now and in the future.

For now, the evaluation relies on the following assumption: if a timestamp t is marked as anomalous, the anomaly is present in all the variables making up the series.

This PR partially addresses #56: the evaluation does not yet distinguish between different types of anomalies. This will be done in the future by adding a breakdown switch.

The evaluate() method takes as input a dictionary {'model_name': model_instance}, so it can evaluate more than one model on the same dataset. The output for multiple models is a nested dictionary: its primary keys are the model names and the associated values are dictionaries like the ones shown above.
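A minimal sketch of this multi-model loop, assuming an `evaluate_one` callable that stands in for the library's internal single-model evaluation (both names are hypothetical):

```python
def evaluate_models(models, dataset, evaluate_one):
    """models: {'model_name': model_instance}. Returns a nested dict keyed
    by model name; each value is a metrics dict like the ones shown above."""
    return {name: evaluate_one(model, dataset)
            for name, model in models.items()}
```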

The names "series1" and "series2" have been changed to "series_1" and "series_2" inside the test functions, to distinguish them from the series generated in the "setUp()" method.
The new evaluation strategy has been used
The new version of the function no longer has this argument.
About the evaluation on a single series:
Before, computing the 'anomalies_ratio' value led to a division by zero for non-anomalous series.
Now, the 'anomalies_ratio' value is None for non-anomalous series.
About the evaluation on a dataset:
Now, the 'anomalies_ratio' value is calculated by averaging only over the anomalous series.
This applies both when the series is not anomalous and when the dataset does not contain anomalous series.
@agataben agataben changed the title Fix the wrong evaluation strategy Change the default evaluation strategy Dec 5, 2025
This argument sets the evaluation strategy. Supported values are "flags" (default)
and "events" (not implemented).
@sarusso (Collaborator) commented Dec 6, 2025

Just tried it, and it seems to work great!

Just one thing: I investigated common terminology, and we should really change some naming in order to adopt a more common nomenclature:

'anomalies_count' -> 'true_positives_count'
'anomalies_ratio' -> 'true_positives_rate'
'false_positives_ratio' -> 'false_positives_rate'

The main reason behind this is that it is quite unclear what anomalies_count and anomalies_ratio refer to: the total anomalies in the dataset or the detected ones.

Then, when the breakdown implementation is finished:

spike_uv_true_positives_count
spike_uv_true_positives_rate
step_uv_true_positives_count
step_uv_true_positives_rate
etc...

Also, can I make a few edits/fixes on the text of the PR?

@agataben (Collaborator, Author) commented Dec 6, 2025

Yes!

@sarusso (Collaborator) commented Dec 8, 2025

Can you please change the branch name from fix/wrong_evaluation_strategy to fix/change_evaluation_strategy?

@sarusso sarusso changed the title Change the default evaluation strategy Change the evaluation strategy Dec 8, 2025
@agataben agataben closed this Dec 8, 2025
@agataben agataben deleted the fix/wrong_evaluation_strategy branch December 8, 2025 15:53
@agataben (Collaborator, Author) commented Dec 8, 2025

Done, but the PR has already been closed.

@agataben agataben restored the fix/wrong_evaluation_strategy branch December 8, 2025 16:11
@sarusso (Collaborator) commented Dec 8, 2025

Right, my bad, sorry. Let's stick with fix/wrong_evaluation_strategy then.

@agataben (Collaborator, Author) commented Dec 8, 2025

ok

@agataben agataben reopened this Dec 8, 2025
@sarusso sarusso changed the title Change the evaluation strategy Fix the evaluation strategy Dec 8, 2025
@sarusso sarusso merged commit 475968a into ICSC-Spoke3:develop Dec 9, 2025
2 checks passed