# Using anomaly handlers

This notebook uses only TimeBasedCesnetDataset, but all methods work almost the same way for the other dataset types.

### Import

In [1]:
import numpy as np
import logging

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, AnomalyHandlerType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import TimeBasedConfig # Time based dataset MUST use TimeBasedConfig

from cesnet_tszoo.utils.anomaly_handler import AnomalyHandler # For creating custom Anomaly handler

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_10_MINUTES, dataset_type=DatasetType.TIME_BASED, display_details=True)

[2025-11-14 18:46:16,238][cesnet_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_10_MINUTES
        Time indices: range(0, 40297)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 3, 49, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 50, 52, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_SAMPLE
        Time series indices: [ 11  20 101 103 118 ... 2003134 2008461 2011839 2022235 2044888], Length=1000; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'n_dest_ip': 0, 'n_dest_asn': 0, 'n_dest_ports': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Anomaly handlers

- Anomaly handlers are implemented as classes.
    - You can create your own or use a built-in one.
- In the default preprocess order, the anomaly handler is applied before `default_values` and fillers take care of missing values.
- Every time series in the train set has its own anomaly handler instance.
- An anomaly handler must implement `fit` and `transform_anomalies`.
- To use an anomaly handler, the train set must be defined.
- The anomaly handler is applied only to the train set.
- You can change the anomaly handler later with `update_dataset_config_and_initialize` or `apply_anomaly_handler`.
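
The rules above can be sketched without the library. The class below is a hypothetical stand-in: `NanOutliers`, the threshold of 2, and the toy arrays are assumptions for illustration, not cesnet_tszoo code. It only shows the per-series `fit` / `transform_anomalies` cycle, run on the train part and nowhere else.

```python
import numpy as np


class NanOutliers:
    """Hypothetical handler: marks values with |z-score| > threshold as NaN."""

    def __init__(self, threshold: float = 2.0):
        self.threshold = threshold
        self.mean = None
        self.std = None

    def fit(self, data: np.ndarray) -> None:
        # Per-feature statistics of one time series; NaNs (gaps) are ignored.
        self.mean = np.nanmean(data, axis=0)
        self.std = np.nanstd(data, axis=0)

    def transform_anomalies(self, data: np.ndarray) -> np.ndarray:
        diff = data - self.mean
        # Avoid division by zero for constant features.
        z = np.divide(diff, self.std, out=np.zeros_like(diff), where=self.std != 0)
        data[np.abs(z) > self.threshold] = np.nan  # modified in place
        return data


# Toy train parts of two time series, shape (time steps, features).
train_parts = {
    "ts_a": np.array([[1.0], [2.0], [1.0], [2.0], [1.0], [2.0], [1.0], [2.0], [1.0], [100.0]]),
    "ts_b": np.array([[10.0], [11.0]] * 5),
}
val_part = np.array([[1.0], [500.0]])  # never passed to a handler

handlers = {}
for ts_id, train in train_parts.items():
    handler = NanOutliers()             # each time series gets its own instance
    handler.fit(train)                  # statistics come from the train part only
    handler.transform_anomalies(train)  # only the train part is changed
    handlers[ts_id] = handler
```

Because every series has its own instance, the spike in `ts_a` is judged against the statistics of `ts_a` alone, and `val_part` keeps its extreme value.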

#### Built-in

In [4]:
# Options

## Supported
AnomalyHandlerType.Z_SCORE
AnomalyHandlerType.INTERQUARTILE_RANGE

<AnomalyHandlerType.INTERQUARTILE_RANGE: 'interquartile_range'>
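
To get a feel for how the two options differ, here is a minimal NumPy sketch of the general z-score and interquartile-range rules. The thresholds (3 standard deviations, 1.5 times the IQR) are conventional values and an assumption here; this notebook does not state which thresholds the built-in handlers use or how they treat flagged values.

```python
import numpy as np

values = np.array([4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 4.0, 6.0, 9.0, 60.0])

# Z-score rule: distance from the mean in units of standard deviation.
z = np.abs(values - values.mean()) / values.std()
z_flags = z > 3.0  # assumed threshold

# IQR rule: distance outside the middle 50 % of the data.
q25, q75 = np.percentile(values, [25, 75])
iqr = q75 - q25
iqr_flags = (values < q25 - 1.5 * iqr) | (values > q75 + 1.5 * iqr)  # assumed factor

print("z-score flags:", values[z_flags])
print("IQR flags:    ", values[iqr_flags])
```

On this toy data the large spike inflates the mean and standard deviation, so the z-score rule flags only the spike, while the IQR rule also flags the milder outlier.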

In [5]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                           handle_anomalies_with=AnomalyHandlerType.Z_SCORE, nan_threshold=0.5, random_state=1500)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:46:16,251][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:46:16,337][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:16,338][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 843.60it/s]
[2025-11-14 18:46:17,011][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: ZScore     

In [6]:
time_based_dataset.get_train_df(workers=0).head(10)

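| | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 182151.0 | 0.0 | 7.0 | 7.0 |
| 1 | 182151.0 | 1.0 | 4.0 | 4.0 |
| 2 | 182151.0 | 2.0 | 4.0 | 4.0 |
| 3 | 182151.0 | 3.0 | 5.0 | 5.0 |
| 4 | 182151.0 | 4.0 | 8.0 | 8.0 |
| 5 | 182151.0 | 5.0 | 3.0 | 3.0 |
| 6 | 182151.0 | 6.0 | 6.0 | 6.0 |
| 7 | 182151.0 | 7.0 | 4.0 | 4.0 |
| 8 | 182151.0 | 8.0 | 9.0 | 12.0 |
| 9 | 182151.0 | 9.0 | 0.0 | 0.0 |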


Or later with:

In [7]:
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=AnomalyHandlerType.Z_SCORE, workers=0)

[2025-11-14 18:46:17,220][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:17,302][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:17,303][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 198.96it/s]
[2025-11-14 18:46:17,576][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:17,577][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:46:17,579][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:17,660][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:17,661][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 203.51it/s]
[2025-11-14 18:46:17,929][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:17,931][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Custom

- You can create your own custom anomaly handler. It is recommended to derive it from the `AnomalyHandler` base class.
- When using this library in a Jupyter notebook, import the custom anomaly handler from a separate file. If the handler is defined in the notebook itself, set `workers` to 0.
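
A likely reason for the separate-file advice (an assumption, this notebook does not state it) is that Python pickles classes by module path, so worker processes have to be able to import the handler class, and a class defined in the notebook's `__main__` may not be importable there. The sketch below writes a hypothetical module `my_handlers.py` to a temporary directory and imports it. The module name, the class `ClipNegatives`, and the temporary-directory setup are all illustrative.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source of the hypothetical module. In real use the class would derive from
# cesnet_tszoo's AnomalyHandler; it is left out so the sketch runs without it.
MODULE_SOURCE = '''
import numpy as np


class ClipNegatives:
    def fit(self, data: np.ndarray) -> None:
        pass  # nothing to learn for this toy rule

    def transform_anomalies(self, data: np.ndarray) -> np.ndarray:
        data[data < 0] = 0
        return data
'''

module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_handlers.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))  # make the module importable

my_handlers = importlib.import_module("my_handlers")
handler_cls = my_handlers.ClipNegatives

# The class is pickled by reference ("my_handlers.ClipNegatives"),
# so any process that can import my_handlers can restore it.
restored_cls = pickle.loads(pickle.dumps(handler_cls))
print(handler_cls.__module__, restored_cls is handler_cls)
```

In a real project you would simply keep `my_handlers.py` next to the notebook and write `from my_handlers import ClipNegatives`.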

In [8]:
class CustomAnomalyHandler(AnomalyHandler):
    """Clips values that fall outside the interquartile-range bounds computed in `fit`."""

    def __init__(self):
        self.lower_bound = None
        self.upper_bound = None
        self.iqr = None

    def fit(self, data: np.ndarray) -> None:
        # Per-feature quartiles; features are the columns of `data`.
        q25, q75 = np.percentile(data, [25, 75], axis=0)
        self.iqr = q75 - q25

        self.lower_bound = q25 - 1.5 * self.iqr
        self.upper_bound = q75 + 1.5 * self.iqr

    def transform_anomalies(self, data: np.ndarray) -> np.ndarray:
        mask_lower_outliers = data < self.lower_bound
        mask_upper_outliers = data > self.upper_bound

        # Replace each outlier in place with the bound of its own feature column.
        data[mask_lower_outliers] = np.take(self.lower_bound, np.where(mask_lower_outliers)[1])
        data[mask_upper_outliers] = np.take(self.upper_bound, np.where(mask_upper_outliers)[1])

        # Return the array so the method matches its declared return type.
        return data
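
Before plugging a custom handler into a config, it can help to try the same logic on a small array. The sketch below repeats the IQR clipping idea as a plain class, written with `np.clip` and without the `AnomalyHandler` base so that it runs without the library. The class name `IqrClipper` and the toy data are invented for illustration.

```python
import numpy as np


class IqrClipper:
    """Standalone version of the IQR clipping idea; no AnomalyHandler base class."""

    def fit(self, data: np.ndarray) -> None:
        q25, q75 = np.percentile(data, [25, 75], axis=0)
        iqr = q75 - q25
        self.lower_bound = q25 - 1.5 * iqr
        self.upper_bound = q75 + 1.5 * iqr

    def transform_anomalies(self, data: np.ndarray) -> np.ndarray:
        # Bounds hold one entry per feature and broadcast across rows.
        np.clip(data, self.lower_bound, self.upper_bound, out=data)
        return data


# Rows are time steps, columns are two features; the last row holds the outliers.
toy = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [100.0, -50.0],
])

handler = IqrClipper()
handler.fit(toy)
clipped = handler.transform_anomalies(toy.copy())  # copy keeps `toy` intact
print(clipped)
```

Each column is clipped against its own bounds: the first feature only has an upper outlier and the second only a lower one.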

In [9]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                           handle_anomalies_with=CustomAnomalyHandler, nan_threshold=0.5, random_state=1500)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:46:17,942][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:46:18,024][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:18,025][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 917.18it/s]
[2025-11-14 18:46:18,592][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: CustomAnoma

In [10]:
time_based_dataset.get_train_df(workers=0).head(10)

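| | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 182151.0 | 0.0 | 7.0 | 7.0 |
| 1 | 182151.0 | 1.0 | 4.0 | 4.0 |
| 2 | 182151.0 | 2.0 | 4.0 | 4.0 |
| 3 | 182151.0 | 3.0 | 5.0 | 5.0 |
| 4 | 182151.0 | 4.0 | 8.0 | 8.0 |
| 5 | 182151.0 | 5.0 | 3.0 | 3.0 |
| 6 | 182151.0 | 6.0 | 6.0 | 6.0 |
| 7 | 182151.0 | 7.0 | 4.0 | 4.0 |
| 8 | 182151.0 | 8.0 | 9.0 | 12.0 |
| 9 | 182151.0 | 9.0 | 0.0 | 0.0 |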


Or later with:

In [11]:
time_based_dataset.update_dataset_config_and_initialize(handle_anomalies_with=CustomAnomalyHandler, workers=0)
# Or
time_based_dataset.apply_anomaly_handler(handle_anomalies_with=CustomAnomalyHandler, workers=0)

[2025-11-14 18:46:18,775][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:18,859][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:18,859][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 229.86it/s]
[2025-11-14 18:46:19,098][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:19,099][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:46:19,101][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:19,185][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:19,186][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 235.70it/s]
[2025-11-14 18:46:19,417][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:19,419][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Changing when the anomaly handler is applied

- You can change when the anomaly handler is applied with the `preprocess_order` parameter.
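
Why the order matters can be seen in a small NumPy sketch. The two step names are taken from `preprocess_order`; the functions behind them, the threshold of 2, and the fill value 0 are assumptions for illustration and do not reproduce the library's implementation.

```python
import numpy as np


def handle_anomalies(data: np.ndarray, threshold: float = 2.0) -> np.ndarray:
    """Clip values further than `threshold` standard deviations from the mean."""
    mean, std = np.nanmean(data), np.nanstd(data)
    return np.clip(data, mean - threshold * std, mean + threshold * std)


def fill_gaps(data: np.ndarray, default: float = 0.0) -> np.ndarray:
    """Replace missing values (NaN) with a default value."""
    return np.where(np.isnan(data), default, data)


STEPS = {"handling_anomalies": handle_anomalies, "filling_gaps": fill_gaps}


def preprocess(data: np.ndarray, order: list) -> np.ndarray:
    # Apply the steps one after another in the requested order.
    for step in order:
        data = STEPS[step](data)
    return data


series = np.array([10.0, 11.0, 10.0, np.nan, 11.0, 10.0, np.nan, 11.0, 10.0, 40.0])

anomalies_first = preprocess(series, ["handling_anomalies", "filling_gaps"])
gaps_first = preprocess(series, ["filling_gaps", "handling_anomalies"])

print(anomalies_first[-1], gaps_first[-1])
```

When gaps are filled first, the filled zeros take part in the mean and standard deviation, so the spike at the end is clipped to a different value than when anomalies are handled first.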

In [12]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                           handle_anomalies_with=AnomalyHandlerType.Z_SCORE, nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming"])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:46:19,425][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:46:19,510][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:19,511][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 857.68it/s]
[2025-11-14 18:46:20,116][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: ZScore     

Or later with:

In [13]:
time_based_dataset.update_dataset_config_and_initialize(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
# Or
time_based_dataset.set_preprocess_order(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)

[2025-11-14 18:46:20,126][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:20,212][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:20,213][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 191.77it/s]
[2025-11-14 18:46:20,496][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:20,498][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:46:20,500][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:46:20,584][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:20,585][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 53/53 [00:00<00:00, 201.72it/s]
[2025-11-14 18:46:20,855][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:46:20,906][cesnet_dataset][INFO] - Configuration has been changed successfuly.