# Using custom handlers

This notebook uses only `TimeBasedCesnetDataset`, but custom handlers work almost the same way for the other dataset types.

In [1]:
import numpy as np
import logging

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import TimeBasedConfig # Time based dataset MUST use TimeBasedConfig

from cesnet_tszoo.utils.custom_handler import AllSeriesCustomHandler, PerSeriesCustomHandler, NoFitCustomHandler # For creating custom handlers

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_10_MINUTES, dataset_type=DatasetType.TIME_BASED, display_details=True)

[2025-11-14 18:46:58,345][cesnet_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_10_MINUTES
        Time indices: range(0, 40297)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 3, 49, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 50, 52, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_SAMPLE
        Time series indices: [ 11  20 101 103 118 ... 2003134 2008461 2011839 2022235 2044888], Length=1000; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'n_dest_ip': 0, 'n_dest_asn': 0, 'n_dest_ports': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Custom handlers

- Custom handlers are implemented as classes.
    - Their main purpose is to let you define custom preprocessing steps.
- There are three types of custom handlers: `AllSeriesCustomHandler`, `PerSeriesCustomHandler`, and `NoFitCustomHandler`.
- To use a custom handler, add its class to `preprocess_order`; its position in the list also determines when it is applied.
- Every custom handler type lets you specify which sets it is applied to (via `get_target_sets`).
- You can change which custom handlers are used later by passing a new `preprocess_order` to `update_dataset_config_and_initialize` or `set_preprocess_order`.
- When using this library in a Jupyter notebook, custom handlers should be imported from a separate file. If you define them in the notebook itself, use `workers=0`.
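
The last point is probably related to how Python ships objects to worker processes: a class is pickled as a reference to its module and qualified name, so a worker has to be able to import it from a real module. That motivation is an assumption here, not a statement taken from the library. The sketch below uses only the standard library; the module name `my_handlers` and the class `ScaleHandler` are hypothetical, and the class deliberately does not inherit from the real base classes so the snippet runs without `cesnet_tszoo`.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module content; in a real project the class would subclass
# one of the cesnet_tszoo custom handler base classes.
MODULE_SOURCE = '''
class ScaleHandler:
    """Stand-in for a custom handler defined outside the notebook."""

    def apply(self, data):
        return [[value * 2 for value in row] for row in data]
'''

# Write the module to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_handlers.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

my_handlers = importlib.import_module("my_handlers")
ScaleHandler = my_handlers.ScaleHandler

# A class is pickled as a reference to its module and qualified name,
# so another process can rebuild it only by importing that module.
payload = pickle.dumps(ScaleHandler)
restored = pickle.loads(payload)

print(ScaleHandler.__module__)    # my_handlers
print(b"my_handlers" in payload)  # True
print(restored is ScaleHandler)   # True
```

In a notebook, the equivalent step is to save the handler class in a `.py` file next to the notebook and import it from there.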

#### AllSeriesCustomHandler

- One instance is created for all time series
- Must always be fitted on train set before use

In [4]:
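class AllFitTest(AllSeriesCustomHandler):
    """Toy handler: counts partial_fit calls and writes that count into the data."""

    def __init__(self):
        self.count = 0
        super().__init__()

    def partial_fit(self, data: np.ndarray) -> None:
        # Only count the call; a real handler would update its statistics from data.
        self.count += 1

    def apply(self, data: np.ndarray) -> np.ndarray:
        # Overwrite every value in place with the number of partial_fit calls.
        data[:, :] = self.count
        return data

    @staticmethod
    def get_target_sets():
        # Apply this handler to the train set only.
        return ["train"]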


In [5]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                        nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming", AllFitTest])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:46:58,355][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:46:58,438][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:58,439][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1000.39it/s]
[2025-11-14 18:46:59,023][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHa

In [6]:
time_based_dataset.get_train_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,0.0,53.0,53.0
1,182151.0,1.0,53.0,53.0
2,182151.0,2.0,53.0,53.0
3,182151.0,3.0,53.0,53.0
4,182151.0,4.0,53.0,53.0


In [7]:
time_based_dataset.get_val_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,20149.0,2.0,2.0
1,182151.0,20150.0,4.0,4.0
2,182151.0,20151.0,0.0,0.0
3,182151.0,20152.0,6.0,6.0
4,182151.0,20153.0,4.0,4.0


In [8]:
time_based_dataset.get_test_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,28208.0,2.0,2.0
1,182151.0,28209.0,4.0,4.0
2,182151.0,28210.0,0.0,0.0
3,182151.0,28211.0,0.0,0.0
4,182151.0,28212.0,3.0,3.0
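
The constant 53 in the training frame is consistent with one shared instance whose `partial_fit` is called once for each of the 53 selected series (`Length=53` in the config details). The validation and test values are untouched because `get_target_sets` returns only `"train"`. The following is a toy reproduction of that behaviour in plain Python, not the library's internals: `ToyAllFit`, `run_all_series` and the nested-list data are hypothetical stand-ins, and the one-call-per-series fitting schedule is an assumption inferred from the output above.

```python
class ToyAllFit:
    """Stand-in for AllFitTest that works on nested lists instead of arrays."""

    def __init__(self):
        self.count = 0

    def partial_fit(self, data):
        # Called once per series; state accumulates in the shared instance.
        self.count += 1

    def apply(self, data):
        return [[self.count for _ in row] for row in data]

    @staticmethod
    def get_target_sets():
        return ["train"]


def run_all_series(handler_cls, series_by_set):
    """Fit ONE handler on every train series, then apply it to target sets only."""
    handler = handler_cls()
    for series in series_by_set["train"]:
        handler.partial_fit(series)

    result = {}
    for set_name, all_series in series_by_set.items():
        if set_name in handler_cls.get_target_sets():
            result[set_name] = [handler.apply(series) for series in all_series]
        else:
            result[set_name] = all_series
    return result


# 53 series, each with 3 time steps and 2 features (values are arbitrary).
sets = {
    "train": [[[7, 7], [4, 4], [4, 4]] for _ in range(53)],
    "val": [[[2, 2], [4, 4], [0, 0]] for _ in range(53)],
}
out = run_all_series(ToyAllFit, sets)
print(out["train"][0])  # [[53, 53], [53, 53], [53, 53]]
print(out["val"][0])    # [[2, 2], [4, 4], [0, 0]]
```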


#### PerSeriesCustomHandler

- One instance is created per time series
- Must always be fitted on train set before use
- Supported only for time-based datasets

In [9]:
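class PerFitTest(PerSeriesCustomHandler):
    """Toy handler: counts fit calls on its own instance and writes that count into the data."""

    def __init__(self):
        self.count = 0
        super().__init__()

    def fit(self, data: np.ndarray) -> None:
        # Only count the call; a real handler would learn from this series' data.
        self.count += 1

    def apply(self, data: np.ndarray) -> np.ndarray:
        # Overwrite every value in place with the number of fit calls.
        data[:, :] = self.count
        return data

    @staticmethod
    def get_target_sets():
        # Apply this handler to the validation set only.
        return ["val"]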


In [10]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                        nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming", PerFitTest])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:46:59,360][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:46:59,446][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:46:59,446][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1019.84it/s]
[2025-11-14 18:46:59,963][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHa

In [11]:
time_based_dataset.get_train_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,0.0,7.0,7.0
1,182151.0,1.0,4.0,4.0
2,182151.0,2.0,4.0,4.0
3,182151.0,3.0,5.0,5.0
4,182151.0,4.0,8.0,8.0


In [12]:
time_based_dataset.get_val_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,20149.0,1.0,1.0
1,182151.0,20150.0,1.0,1.0
2,182151.0,20151.0,1.0,1.0
3,182151.0,20152.0,1.0,1.0
4,182151.0,20153.0,1.0,1.0


In [13]:
time_based_dataset.get_test_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,28208.0,2.0,2.0
1,182151.0,28209.0,4.0,4.0
2,182151.0,28210.0,0.0,0.0
3,182151.0,28211.0,0.0,0.0
4,182151.0,28212.0,3.0,3.0
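
The validation values are 1 rather than 53, which fits the description above: each series gets its own instance, and `fit` appears to run once on that instance. Separate instances matter when the learned state differs between series. The sketch below illustrates this with a hypothetical per-series scaler in plain Python; `ToyPerSeriesMaxScaler`, the series names and the data are made up for illustration, and the class does not use the real `PerSeriesCustomHandler` base class.

```python
class ToyPerSeriesMaxScaler:
    """Hypothetical per-series handler: scales by the maximum seen in training."""

    def __init__(self):
        self.max_value = None

    def fit(self, data):
        # data: list of rows, each row a list of feature values
        self.max_value = max(max(row) for row in data)

    def apply(self, data):
        if not self.max_value:  # None or 0: nothing sensible to divide by
            return data
        return [[value / self.max_value for value in row] for row in data]


train_parts = {
    "series_a": [[10, 5], [8, 2]],
    "series_b": [[400, 100], [200, 300]],
}
val_parts = {
    "series_a": [[5, 10]],
    "series_b": [[200, 100]],
}

# One independent instance per series, each fitted only on its own train part.
handlers = {}
for name, data in train_parts.items():
    handlers[name] = ToyPerSeriesMaxScaler()
    handlers[name].fit(data)

scaled = {name: handlers[name].apply(data) for name, data in val_parts.items()}
print(handlers["series_a"].max_value)  # 10
print(handlers["series_b"].max_value)  # 400
print(scaled["series_a"])              # [[0.5, 1.0]]
print(scaled["series_b"])              # [[0.5, 0.25]]
```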


#### NoFitCustomHandler

- One instance is created per time series
- Neither requires nor supports fitting

In [14]:
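class NoFitTest(NoFitCustomHandler):
    """Toy handler: stateless, overwrites the data with -1."""

    def apply(self, data: np.ndarray) -> np.ndarray:
        # Overwrite every value in place with -1.
        data[:, :] = -1
        return data

    @staticmethod
    def get_target_sets():
        # Apply this handler to the test set only.
        return ["test"]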

In [15]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                        nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming", NoFitTest])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:47:00,285][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:47:00,370][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:47:00,371][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1179.32it/s]
[2025-11-14 18:47:00,800][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHa

In [16]:
time_based_dataset.get_train_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,0.0,7.0,7.0
1,182151.0,1.0,4.0,4.0
2,182151.0,2.0,4.0,4.0
3,182151.0,3.0,5.0,5.0
4,182151.0,4.0,8.0,8.0


In [17]:
time_based_dataset.get_val_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,20149.0,2.0,2.0
1,182151.0,20150.0,4.0,4.0
2,182151.0,20151.0,0.0,0.0
3,182151.0,20152.0,6.0,6.0
4,182151.0,20153.0,4.0,4.0


In [18]:
time_based_dataset.get_test_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,28208.0,-1.0,-1.0
1,182151.0,28209.0,-1.0,-1.0
2,182151.0,28210.0,-1.0,-1.0
3,182151.0,28211.0,-1.0,-1.0
4,182151.0,28212.0,-1.0,-1.0
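
Only the test frame changed, matching the `["test"]` returned by `get_target_sets`. A handler without fitting is a natural fit for stateless transformations. The sketch below shows a hypothetical clipping handler and a toy dispatcher in plain Python; `ToyClip`, `apply_to_targets` and the idea of returning several set names at once are assumptions for illustration, not confirmed library behaviour.

```python
class ToyClip:
    """Hypothetical stateless handler: clips values into [LOWER, UPPER]."""

    LOWER, UPPER = 0, 100

    def apply(self, data):
        return [[min(max(v, self.LOWER), self.UPPER) for v in row] for row in data]

    @staticmethod
    def get_target_sets():
        return ["val", "test"]


def apply_to_targets(handler_cls, data_by_set):
    """Apply the handler only to the sets it names; pass the others through."""
    targets = handler_cls.get_target_sets()
    return {
        name: handler_cls().apply(data) if name in targets else data
        for name, data in data_by_set.items()
    }


data_by_set = {
    "train": [[250, -3]],
    "val": [[250, -3]],
    "test": [[42, 101]],
}
out = apply_to_targets(ToyClip, data_by_set)
print(out["train"])  # [[250, -3]]
print(out["val"])    # [[100, 0]]
print(out["test"])   # [[42, 100]]
```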


#### Combined usage

- Custom handlers can be combined as many times as needed

In [19]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                        nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming", AllFitTest, PerFitTest, NoFitTest])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:47:01,102][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:47:01,186][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:47:01,187][cesnet_dataset][INFO] - Starting fitting cycle 1/2.
100%|██████████| 500/500 [00:00<00:00, 1065.12it/s]
[2025-11-14 18:47:01,675][cesnet_dataset][INFO] - Starting fitting cycle 2/2.
100%|██████████| 500/500 [00:00<00:00, 2681.25it/s]
[2025-11-14 18:47:01,946][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHa

In [20]:
time_based_dataset.get_train_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,0.0,53.0,53.0
1,182151.0,1.0,53.0,53.0
2,182151.0,2.0,53.0,53.0
3,182151.0,3.0,53.0,53.0
4,182151.0,4.0,53.0,53.0


In [21]:
time_based_dataset.get_val_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,20149.0,1.0,1.0
1,182151.0,20150.0,1.0,1.0
2,182151.0,20151.0,1.0,1.0
3,182151.0,20152.0,1.0,1.0
4,182151.0,20153.0,1.0,1.0


In [22]:
time_based_dataset.get_test_df(workers=0).head(5)

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,182151.0,28208.0,-1.0,-1.0
1,182151.0,28209.0,-1.0,-1.0
2,182151.0,28210.0,-1.0,-1.0
3,182151.0,28211.0,-1.0,-1.0
4,182151.0,28212.0,-1.0,-1.0


#### Changing when or if custom handlers are applied later

- You can change when, or whether, a custom handler is applied with the `preprocess_order` parameter
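
As a toy illustration of why the position in `preprocess_order` matters, the sketch below runs two steps that do not commute. It is plain Python and does not reproduce the library's pipeline: `fill_gaps`, `ToyAddOne`, `BUILT_IN_STEPS` and `run_pipeline` are hypothetical stand-ins, with `None` playing the role of a missing value.

```python
def fill_gaps(data):
    """Toy stand-in for the built-in "filling_gaps" step: replace None with 0."""
    return [[0 if v is None else v for v in row] for row in data]


class ToyAddOne:
    """Hypothetical custom handler: adds 1 to every present value."""

    def apply(self, data):
        return [[None if v is None else v + 1 for v in row] for row in data]


BUILT_IN_STEPS = {"filling_gaps": fill_gaps}


def run_pipeline(preprocess_order, data):
    """Apply steps strictly in list order; strings are built-ins, classes are handlers."""
    for step in preprocess_order:
        if isinstance(step, str):
            data = BUILT_IN_STEPS[step](data)
        else:
            data = step().apply(data)
    return data


raw = [[5, None], [None, 2]]
print(run_pipeline(["filling_gaps", ToyAddOne], raw))  # [[6, 1], [1, 3]]
print(run_pipeline([ToyAddOne, "filling_gaps"], raw))  # [[6, 0], [0, 3]]
```

Filling first means the handler also sees the filled values; running the handler first leaves the gaps for the filler to replace afterwards.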

In [23]:
config = TimeBasedConfig(ts_ids=500, train_time_period=0.5, val_time_period=0.2, test_time_period=0.1, features_to_take=['n_flows', 'n_packets'],
                        nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming", AllFitTest, PerFitTest, NoFitTest])
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:47:02,257][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:47:02,342][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:47:02,343][cesnet_dataset][INFO] - Starting fitting cycle 1/2.
100%|██████████| 500/500 [00:00<00:00, 1080.85it/s]
[2025-11-14 18:47:02,824][cesnet_dataset][INFO] - Starting fitting cycle 2/2.
100%|██████████| 500/500 [00:00<00:00, 2921.31it/s]
[2025-11-14 18:47:03,029][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=53
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 28208)
        Test time periods: range(28208, 32237)
        All time periods: range(0, 32237)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHa

The order can then be changed later with either of the following calls:

In [24]:
time_based_dataset.update_dataset_config_and_initialize(preprocess_order=["handling_anomalies", AllFitTest, "filling_gaps", NoFitTest, "transforming", PerFitTest], workers=0)
# Or
time_based_dataset.set_preprocess_order(preprocess_order=["handling_anomalies", AllFitTest, "filling_gaps", NoFitTest, "transforming", PerFitTest], workers=0)

[2025-11-14 18:47:03,039][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:47:03,118][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:47:03,119][cesnet_dataset][INFO] - Starting fitting cycle 1/2.
100%|██████████| 53/53 [00:00<00:00, 369.70it/s]
[2025-11-14 18:47:03,268][cesnet_dataset][INFO] - Starting fitting cycle 2/2.
100%|██████████| 53/53 [00:00<00:00, 332.24it/s]
[2025-11-14 18:47:03,435][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:47:03,438][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:47:03,440][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:47:03,521][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:47:03,521][cesnet_dataset][INFO] - Starting fitting cycle 1/2.
100%|██████████| 53/53 [00:00<00:00, 376.85it/s]
[2025-11-14 18:47:03,719][cesnet_dataset][INFO] - St