# Handling missing data

This notebook uses only `TimeBasedCesnetDataset`, but the methods work almost the same way for the other dataset types.

### Import

In [1]:
import logging
import numpy as np

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, FillerType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import TimeBasedConfig # Time based dataset MUST use TimeBasedConfig

from cesnet_tszoo.utils.filler import Filler # For creating custom Filler

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.IP_ADDRESSES_SAMPLE,
    aggregation=AgreggationType.AGG_10_MINUTES,
    dataset_type=DatasetType.TIME_BASED,
    display_details=True,
)

[2025-11-14 18:36:40,118][cesnet_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_10_MINUTES
        Time indices: range(0, 40297)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 3, 49, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 50, 52, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_SAMPLE
        Time series indices: [ 11  20 101 103 118 ... 2003134 2008461 2011839 2022235 2044888], Length=1000; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'n_dest_ip': 0, 'n_dest_asn': 0, 'n_dest_ports': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Default values

- Missing values are first replaced with default values, before any filler is applied.
- You can change the default values later with `update_dataset_config_and_initialize` or `set_default_values`.
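
The replacement step described above can be pictured with plain NumPy. This is only a minimal sketch of the idea, not the library's implementation; the function name `apply_default_values` and the sample array are made up for illustration.

```python
import numpy as np

def apply_default_values(values: np.ndarray, defaults: np.ndarray) -> np.ndarray:
    """Replace NaN entries with the default value of their feature (column)."""
    # `defaults` has shape (n_features,) and broadcasts along the time axis.
    return np.where(np.isnan(values), defaults, values)

# Rows are time steps, columns are features (for example n_flows, n_packets).
values = np.array([[np.nan, np.nan],
                   [4.0, 4.0],
                   [np.nan, 7.0]])

filled = apply_default_values(values, np.array([0.0, 0.0]))
```

A default of `np.nan` leaves the gap as it is, which matches what the `default_values=None` cells further down show.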

#### Using default

- With `default_values="default"`, the default values are taken from the dataset itself.
- You can inspect the default value of each feature with `time_based_dataset.display_dataset_details()`.

In [4]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values="default")

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:40,124][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:40,174][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,175][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 995.56it/s]
[2025-11-14 18:36:40,181][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64

In [5]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,0.0,0.0
1,1200.0,1.0,0.0,0.0
2,1200.0,2.0,0.0,0.0
3,1200.0,3.0,0.0,0.0
4,1200.0,4.0,0.0,0.0
5,1200.0,5.0,0.0,0.0
6,1200.0,6.0,0.0,0.0
7,1200.0,7.0,0.0,0.0
8,1200.0,8.0,0.0,0.0
9,1200.0,9.0,4.0,4.0


Or later with:

In [6]:
time_based_dataset.update_dataset_config_and_initialize(default_values="default", workers=0)
# Or
time_based_dataset.set_default_values(default_values="default", workers=0)

[2025-11-14 18:36:40,208][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,310][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,310][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 1000.55it/s]
[2025-11-14 18:36:40,317][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,317][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:40,318][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,366][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,366][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 999.12it/s]
[2025-11-14 18:36:40,371][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,372][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Setting default_values as None

In [7]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=None)

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:40,378][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:40,430][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,431][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:40,436][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [nan nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64

In [8]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,,
1,1200.0,1.0,,
2,1200.0,2.0,,
3,1200.0,3.0,,
4,1200.0,4.0,,
5,1200.0,5.0,,
6,1200.0,6.0,,
7,1200.0,7.0,,
8,1200.0,8.0,,
9,1200.0,9.0,4.0,4.0


Or later with:

In [9]:
time_based_dataset.update_dataset_config_and_initialize(default_values=None, workers=0)
# Or
time_based_dataset.set_default_values(default_values=None, workers=0)

[2025-11-14 18:36:40,457][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,504][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,505][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 999.83it/s]
[2025-11-14 18:36:40,510][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,510][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:40,511][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,560][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,560][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 999.12it/s]
[2025-11-14 18:36:40,565][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,566][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Setting default_values with single number

In [10]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=0)

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:40,570][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:40,618][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,619][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 998.17it/s]
[2025-11-14 18:36:40,624][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64

In [11]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,0.0,0.0
1,1200.0,1.0,0.0,0.0
2,1200.0,2.0,0.0,0.0
3,1200.0,3.0,0.0,0.0
4,1200.0,4.0,0.0,0.0
5,1200.0,5.0,0.0,0.0
6,1200.0,6.0,0.0,0.0
7,1200.0,7.0,0.0,0.0
8,1200.0,8.0,0.0,0.0
9,1200.0,9.0,4.0,4.0


Or later with:

In [12]:
time_based_dataset.update_dataset_config_and_initialize(default_values=0, workers=0)
# Or
time_based_dataset.set_default_values(default_values=0, workers=0)

[2025-11-14 18:36:40,645][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,694][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,695][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 1001.03it/s]
[2025-11-14 18:36:40,700][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,700][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:40,701][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,753][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,753][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 999.12it/s]
[2025-11-14 18:36:40,759][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,759][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Setting default_values with list

- The position of each value in the list corresponds to the order of features in `features_to_take`.
- The number of values in the list must equal the number of features used.
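
The two rules above amount to a positional mapping plus a length check. The sketch below shows that mapping in plain NumPy; `defaults_from_list` is a hypothetical helper, not a function of the library, and treating `None` as NaN is an assumption based on the `Default values: [ 1. nan]` line in the output below.

```python
import numpy as np

def defaults_from_list(default_values, features_to_take):
    """Map a list of defaults onto features by position (hypothetical helper)."""
    if len(default_values) != len(features_to_take):
        raise ValueError(
            f"Expected {len(features_to_take)} default values, got {len(default_values)}."
        )
    # None means "no default", represented here as NaN.
    return np.array([np.nan if v is None else float(v) for v in default_values])

defaults = defaults_from_list([1, None], ["n_flows", "n_packets"])
```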

In [13]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=[1, None])

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:40,764][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:40,813][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,814][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:40,819][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [ 1. nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64

In [14]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,1.0,
1,1200.0,1.0,1.0,
2,1200.0,2.0,1.0,
3,1200.0,3.0,1.0,
4,1200.0,4.0,1.0,
5,1200.0,5.0,1.0,
6,1200.0,6.0,1.0,
7,1200.0,7.0,1.0,
8,1200.0,8.0,1.0,
9,1200.0,9.0,4.0,4.0


Or later with:

In [15]:
time_based_dataset.update_dataset_config_and_initialize(default_values=[1, None], workers=0)
# Or
time_based_dataset.set_default_values(default_values=[1, None], workers=0)

[2025-11-14 18:36:40,841][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,890][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,891][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:40,896][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,896][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:40,897][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:40,947][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:40,948][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:40,953][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:40,953][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Setting default_values with dictionary

- The dictionary must contain a key and a value for every feature in `features_to_take`.
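
With a dictionary, defaults are matched by feature name rather than by position. A minimal sketch of that lookup follows, assuming a missing key should be reported as an error; `defaults_from_dict` is a hypothetical helper and not part of the library.

```python
import numpy as np

def defaults_from_dict(default_values, features_to_take):
    """Order dictionary defaults to match `features_to_take` (hypothetical helper)."""
    missing = [name for name in features_to_take if name not in default_values]
    if missing:
        raise KeyError(f"No default value given for: {missing}")
    # Look values up by name, so the key order of the dictionary does not matter.
    ordered = [default_values[name] for name in features_to_take]
    return np.array([np.nan if v is None else float(v) for v in ordered])

defaults = defaults_from_dict({"n_packets": None, "n_flows": 1}, ["n_flows", "n_packets"])
```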

In [16]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values={"n_flows" : 1, "n_packets": None})

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:40,958][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:41,010][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,011][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,016][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [ 1. nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64

In [17]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,1.0,
1,1200.0,1.0,1.0,
2,1200.0,2.0,1.0,
3,1200.0,3.0,1.0,
4,1200.0,4.0,1.0,
5,1200.0,5.0,1.0,
6,1200.0,6.0,1.0,
7,1200.0,7.0,1.0,
8,1200.0,8.0,1.0,
9,1200.0,9.0,4.0,4.0


Or later with:

In [18]:
time_based_dataset.update_dataset_config_and_initialize(default_values={"n_flows" : 1, "n_packets": None}, workers=0)
# Or
time_based_dataset.set_default_values(default_values={"n_flows" : 1, "n_packets": None}, workers=0)

[2025-11-14 18:36:41,038][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,089][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,089][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,094][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,094][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:41,095][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,145][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,146][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,150][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,150][cesnet_dataset][INFO] - Configuration has been changed successfuly.

### Fillers

- Fillers are implemented as classes.
    - You can create your own or use a built-in one.
- One filler instance is created per time series.
- The filler is applied after the default values and usually overrides them.
- You can change the filler later with `update_dataset_config_and_initialize` or `apply_filler`.

#### Built-in

In [19]:
# Options

FillerType.FORWARD_FILLER
FillerType.LINEAR_INTERPOLATION_FILLER
FillerType.MEAN_FILLER

<FillerType.MEAN_FILLER: 'mean_filler'>

The example below shows how `ForwardFiller` fills missing values. Missing values at the beginning of the series have no earlier value to carry forward, so they keep the value set by `default_values`.
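
Forward filling in general can be sketched in a few lines of NumPy. This illustrates the technique only and is not the code of `ForwardFiller`; the function name `forward_fill` and the sample data are assumptions.

```python
import numpy as np

def forward_fill(values: np.ndarray) -> np.ndarray:
    """Carry the last observed value of each feature forward along the time axis."""
    filled = values.copy()
    for t in range(1, filled.shape[0]):
        gaps = np.isnan(filled[t])
        # Copy the previous row into the gaps; leading gaps stay NaN
        # because there is nothing before them to copy.
        filled[t, gaps] = filled[t - 1, gaps]
    return filled

values = np.array([[np.nan, np.nan],
                   [4.0, 4.0],
                   [np.nan, np.nan],
                   [6.0, np.nan]])

filled = forward_fill(values)
```

Here the first row stays NaN, while the third row and the gap in the fourth row take the last observed value of their column.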

In [20]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=None, fill_missing_with=FillerType.FORWARD_FILLER)

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:41,160][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:41,211][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,212][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 997.46it/s]
[2025-11-14 18:36:41,217][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [nan nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: ForwardFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64
    

In [21]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,,
1,1200.0,1.0,,
2,1200.0,2.0,,
3,1200.0,3.0,,
4,1200.0,4.0,,
5,1200.0,5.0,,
6,1200.0,6.0,,
7,1200.0,7.0,,
8,1200.0,8.0,,
9,1200.0,9.0,4.0,4.0


Or later with:

In [22]:
time_based_dataset.update_dataset_config_and_initialize(fill_missing_with=FillerType.FORWARD_FILLER, workers=0)
# Or
time_based_dataset.apply_filler(FillerType.FORWARD_FILLER, workers=0)

[2025-11-14 18:36:41,241][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,292][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,292][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,298][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,298][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:41,299][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,348][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,349][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 999.12it/s]
[2025-11-14 18:36:41,354][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,355][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Custom

- You can create your own custom filler, which must derive from the `Filler` base class.
- When using this library in a Jupyter notebook, import the custom filler from a separate file. If you define it in the notebook itself, use `workers=0`.

In [23]:
class CustomFiller(Filler):
    def fill(self, batch_values: np.ndarray, mask: np.ndarray, **kwargs):
        # Overwrite every entry selected by `mask` (the missing values) in place.
        batch_values[mask] = -1
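
To see what the body of `fill` does without the library, the same assignment can be run on a small array. This sketch assumes `mask` is a boolean array that is `True` where a value is missing, which is consistent with the output further down; `fill_with_constant` is a stand-in function, not part of the library.

```python
import numpy as np

def fill_with_constant(batch_values: np.ndarray, mask: np.ndarray, value: float = -1.0) -> None:
    """Overwrite masked entries in place, mirroring the body of CustomFiller.fill."""
    batch_values[mask] = value

batch_values = np.array([[np.nan, np.nan],
                         [4.0, 4.0]])
mask = np.isnan(batch_values)  # True where a value is missing

fill_with_constant(batch_values, mask)
```

The array is modified in place, so nothing is returned, just like in `CustomFiller.fill` above.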

In [24]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=None, fill_missing_with=CustomFiller)

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:41,364][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:41,416][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,416][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 995.33it/s]
[2025-11-14 18:36:41,421][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [nan nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: CustomFiller (Custom)
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size:

In [25]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,-1.0,-1.0
1,1200.0,1.0,-1.0,-1.0
2,1200.0,2.0,-1.0,-1.0
3,1200.0,3.0,-1.0,-1.0
4,1200.0,4.0,-1.0,-1.0
5,1200.0,5.0,-1.0,-1.0
6,1200.0,6.0,-1.0,-1.0
7,1200.0,7.0,-1.0,-1.0
8,1200.0,8.0,-1.0,-1.0
9,1200.0,9.0,4.0,4.0


Or later with:

In [26]:
time_based_dataset.update_dataset_config_and_initialize(fill_missing_with=CustomFiller, workers=0)
# Or
time_based_dataset.apply_filler(CustomFiller, workers=0)

[2025-11-14 18:36:41,443][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,491][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,491][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 1001.03it/s]
[2025-11-14 18:36:41,496][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,497][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:41,498][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,546][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,547][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 996.04it/s]
[2025-11-14 18:36:41,553][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,553][cesnet_dataset][INFO] - Configuration has been changed successfuly.

#### Only for TimeBasedCesnetDataset

Filled values are carried over from train to val to test, as the example below shows.

In [27]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=None, fill_missing_with=FillerType.FORWARD_FILLER)

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:41,558][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:41,608][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,608][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,614][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [nan nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: ForwardFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64
    

In [28]:
time_based_dataset.get_train_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,0.0,,
1,1200.0,1.0,,
2,1200.0,2.0,,
3,1200.0,3.0,,
4,1200.0,4.0,,
5,1200.0,5.0,,
6,1200.0,6.0,,
7,1200.0,7.0,,
8,1200.0,8.0,,
9,1200.0,9.0,4.0,4.0


You can see that the values of `n_flows` and `n_packets` were carried over from train to test.

In [29]:
time_based_dataset.get_test_df(workers=0).iloc[:30]

Unnamed: 0,id_ip,id_time,n_flows,n_packets
0,1200.0,30.0,6.0,6.0
1,1200.0,31.0,6.0,6.0
2,1200.0,32.0,6.0,6.0
3,1200.0,33.0,6.0,6.0
4,1200.0,34.0,6.0,6.0
5,1200.0,35.0,6.0,6.0
6,1200.0,36.0,6.0,6.0
7,1200.0,37.0,6.0,6.0
8,1200.0,38.0,6.0,6.0
9,1200.0,39.0,6.0,6.0
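
One way to picture the carry-over is a forward filler that remembers the last row it produced and reuses it on the next call. The class below is a hypothetical sketch of that idea in plain NumPy, not the library's `ForwardFiller`, and the sample arrays are invented.

```python
import numpy as np

class CarryOverForwardFill:
    """Forward fill that keeps its state between consecutive sets (hypothetical)."""

    def __init__(self, n_features: int):
        # Nothing has been seen yet, so there is nothing to carry forward.
        self.last_seen = np.full(n_features, np.nan)

    def fill(self, values: np.ndarray) -> np.ndarray:
        filled = values.copy()
        for t in range(filled.shape[0]):
            gaps = np.isnan(filled[t])
            filled[t, gaps] = self.last_seen[gaps]
            self.last_seen = filled[t].copy()
        return filled

filler = CarryOverForwardFill(n_features=2)

train = np.array([[np.nan, np.nan], [4.0, 4.0], [6.0, 6.0]])
test = np.array([[np.nan, np.nan], [np.nan, np.nan]])

train_filled = filler.fill(train)  # leading gap stays NaN
test_filled = filler.fill(test)    # starts from the last train values
```

Because the same object handles both sets in order, the gaps at the start of the test set are filled with the last values from train instead of staying empty.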


#### Changing when missing values are handled

- You can change when `default_values` and the filler are applied with the `preprocess_order` parameter.
- `default_values` are always applied before the filler, and the filler still treats values filled with `default_values` as missing.
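
Why the order matters can be shown with a toy pipeline. The step names below are the ones accepted by `preprocess_order`, but the three functions and their rules (cap at 100, fill with the mean, divide by 100) are invented for illustration and do not reflect how the library implements these steps.

```python
import numpy as np

def handling_anomalies(x: np.ndarray) -> np.ndarray:
    # Hypothetical rule: cap values at 100. np.minimum keeps NaN as NaN.
    return np.minimum(x, 100.0)

def filling_gaps(x: np.ndarray) -> np.ndarray:
    # Hypothetical rule: fill gaps with the mean of the observed values.
    return np.where(np.isnan(x), np.nanmean(x), x)

def transforming(x: np.ndarray) -> np.ndarray:
    # Hypothetical rule: scale by a fixed factor.
    return x / 100.0

STEPS = {
    "handling_anomalies": handling_anomalies,
    "filling_gaps": filling_gaps,
    "transforming": transforming,
}

def preprocess(x: np.ndarray, order: list) -> np.ndarray:
    """Apply the named steps in the given order."""
    for name in order:
        x = STEPS[name](x)
    return x

series = np.array([1.0, np.nan, 1000.0])
anomalies_first = preprocess(series, ["handling_anomalies", "filling_gaps", "transforming"])
gaps_first = preprocess(series, ["filling_gaps", "handling_anomalies", "transforming"])
```

With anomalies handled first, the gap is filled from already capped values. With gaps filled first, the outlier influences the fill value before it is capped, so the two orders give different results for the same input.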

In [30]:
config = TimeBasedConfig(ts_ids=[1200], train_time_period=range(0, 30), test_time_period=range(30, 80), features_to_take=['n_flows', 'n_packets'],
                         default_values=None, fill_missing_with=FillerType.FORWARD_FILLER, preprocess_order=["handling_anomalies", "filling_gaps", "transforming"])

time_based_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:41,663][time_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:41,714][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,715][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,721][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [1200], Length=1
    Time periods
        Train time periods: range(0, 30)
        Val time periods: None
        Test time periods: range(30, 80)
        All time periods: range(0, 80)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [nan nan]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: ForwardFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type: NoAnomalyHandler        
    Batch sizes
        Train batch size: 32
        Val batch size: 64
    

Or later with:

In [31]:
time_based_dataset.update_dataset_config_and_initialize(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)
# Or
time_based_dataset.set_preprocess_order(preprocess_order=["filling_gaps", "handling_anomalies", "transforming"], workers=0)

[2025-11-14 18:36:41,726][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,775][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,775][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<?, ?it/s]
[2025-11-14 18:36:41,781][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,782][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:41,782][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:41,832][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
[2025-11-14 18:36:41,833][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 1/1 [00:00<00:00, 998.64it/s]
[2025-11-14 18:36:41,838][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:41,839][cesnet_dataset][INFO] - Configuration has been changed successfuly.