Fix the evaluation strategy #57
Merged
sarusso merged 18 commits into ICSC-Spoke3:develop on Dec 9, 2025
Conversation
The names "series1" and "series2" have been changed to "series_1" and "series_2" inside the test functions, to distinguish them from the series generated in the `setUp()` method
… "_calculate_model_scores()"
The new evaluation strategy has been used
The new version of the function no longer has this argument
About the evaluation on a single series: before, calculating the 'anomalies_ratio' value caused a zero division for non-anomalous series. Now, the 'anomalies_ratio' value is `None` for non-anomalous series. About the evaluation on a dataset: the 'anomalies_ratio' value is now calculated by averaging only over the anomalous series
When the series is not anomalous and when the dataset does not contain anomalous series
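The guard described above can be sketched as follows (the function name and arguments are illustrative, not the project's actual API):

```python
# Sketch of the fixed anomalies_ratio behavior described above.
# The helper name and signature are illustrative, not the project's API.

def anomalies_ratio(detected, inserted):
    """Ratio of correctly detected anomalies to inserted anomalies.

    Previously this was a zero division for non-anomalous series;
    now it returns None when there are no inserted anomalies.
    """
    if inserted == 0:
        return None  # non-anomalous series: no meaningful ratio
    return detected / inserted
```

With this guard, the dataset-level average can simply skip the `None` entries produced by non-anomalous series.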
This argument sets the evaluation strategy. Supported values are "flags" (default) and "events" (not yet implemented)
Collaborator
Just tried, it seems to work great! Just one thing: I investigated common terminology, and we should really change some naming in order to adopt more common nomenclature. The main reason behind this is that it is quite unclear what … Then, when finishing implementing the breakdown: … Also, can I make a few edits/fixes on the text of the PR?
Collaborator
Author
Yes!
Collaborator
Can you please change the branch name from … ?
Collaborator
Author
done, but the PR has been closed
Collaborator
Right, my bad, sorry. Let's stick with …
Collaborator
Author
ok
The new parameter `strategy` sets the evaluation strategy. Supported values are:

- `strategy = 'flags'`: detected anomalies are counted by matching the anomaly labels in the series against the anomaly flags of the model, one by one. This is the default evaluation strategy.
- `strategy = 'events'`: not yet implemented; will group anomaly flags into events.
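The one-by-one flag matching behind the 'flags' strategy can be sketched roughly as follows (a simplified stand-in, not the project's implementation):

```python
# Illustrative sketch of the 'flags' evaluation strategy: anomaly labels
# in the series are matched one by one against the model's anomaly flags.

def count_flag_matches(labels, flags):
    """Count detected anomalies and false positives, point by point.

    labels: ground-truth anomaly labels (1 = anomalous, 0 = normal)
    flags:  anomaly flags produced by the model, same length as labels
    """
    detected = sum(1 for label, flag in zip(labels, flags)
                   if label == 1 and flag == 1)
    false_positives = sum(1 for label, flag in zip(labels, flags)
                          if label == 0 and flag == 1)
    return detected, false_positives
```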
The parameter `granularity` sets the "finesse" of the evaluation and affects how detected anomalies, inserted anomalies and false positives are counted. The evaluation logic has been fixed and improved. The available granularities are:

- `granularity = 'variable'`: anomalies and false positives are counted for each variable of the series
- `granularity = 'point'`: anomalies and false positives are counted for each timestamp
- `granularity = 'series'`: anomalies and false positives are counted for each series, so a series can be either anomalous or not

In the module "evaluators.py" there are three private functions aimed at the evaluation of the model on a single series: `_variable_granularity_evaluation()`, `_point_granularity_evaluation()`, `_series_granularity_evaluation()`. As the names suggest, each function evaluates with a specific granularity.

Given:
- $N_d$ = number of correctly detected anomalies
- $N_{tot}$ = number of inserted anomalies
- $K$ = normalization factor
- $N_{FP}$ = number of false positives
- $N_p$ = series length
- $n$ = series dimension
The output of the above functions is a dictionary with four key-value pairs; the computed values depend on the chosen granularity ('variable', 'point' or 'series'). If there are no anomalies, the value is `None`.

The evaluation of the model on the dataset is assembled in the private function `_calculate_model_scores()`. This function takes as input the evaluations on the single series (each done with a fixed granularity) and returns a dictionary with the structure already discussed: what changes are the values, where $N$ is the number of series in the dataset. If there are no anomalies, the value is `None`.
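The averaging rule for 'anomalies_ratio' might look like this (a simplified sketch, not the actual `_calculate_model_scores()` body):

```python
# Sketch of the averaging rule described above: anomalies_ratio is
# averaged only over anomalous series (ratio is not None), and is None
# when the dataset contains no anomalous series at all.

def average_anomalies_ratio(per_series_ratios):
    ratios = [ratio for ratio in per_series_ratios if ratio is not None]
    if not ratios:
        return None  # no anomalous series in the dataset
    return sum(ratios) / len(ratios)
```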
In the following, an example of evaluation using the `MinMaxAnomalyDetector()` and series generated with `generate_timeseries_df()`:

- series1
- series2
- series3
The columns 'value_1_anomaly' and 'value_2_anomaly' are the model's output: 1 means anomalous, 0 means not anomalous.
Here is the result of the evaluation on this dataset, using the three different granularities:
Notes:
- Some commits are about generating series inside the `setUp()` method in "test_evaluators.py", useful for testing the evaluation now and in the future.
- For now, the evaluation is based on the following assumption: if the timestamp $t$ is marked as anomalous, the anomaly is present in all variables making up the series.
- This PR partially addresses #56: the evaluation does not yet distinguish between different types of anomalies. This will be done in the future by adding the `breakdown` switch.
- The `evaluate()` function takes as input a dictionary `{'model_name': model_instance}`, so it can evaluate more than one model on the same dataset. The output for multiple models is a nested dictionary: its primary keys are the model names and the associated values are dictionaries like the ones shown above.
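The nested output structure for multiple models can be sketched as follows (both functions here are hypothetical stand-ins; the real `evaluate()` lives in `evaluators.py` and computes the score dictionaries discussed above):

```python
# Sketch of evaluating several models on the same dataset. The per-model
# evaluation below is a trivial stand-in for the real score computation.

def evaluate_one_model(model, dataset, granularity):
    # Stand-in: a real implementation would count detected anomalies,
    # false positives, etc. according to the chosen granularity.
    return {'granularity': granularity, 'score': model(dataset)}

def evaluate(models, dataset, granularity='point'):
    """Return a nested dict: {model_name: per-model scores dict}."""
    return {name: evaluate_one_model(model, dataset, granularity)
            for name, model in models.items()}
```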