# Choosing data for SeriesBasedCesnetDataset

### Import

In [1]:
import logging
from datetime import datetime

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, TimeFormat, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import SeriesBasedConfig # Series based dataset MUST use SeriesBasedConfig

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
series_based_dataset = CESNET_TimeSeries24.get_dataset(
    data_root="/some_directory/",
    source_type=SourceType.INSTITUTION_SUBNETS,
    aggregation=AgreggationType.AGG_1_HOUR,
    dataset_type=DatasetType.SERIES_BASED,
    display_details=True,
)

[2025-11-08 21:02:54,352][cesnet_dataset][INFO] - Dataset is series-based. Use cesnet_tszoo.configs.SeriesBasedConfig



Dataset details:

    AgreggationType.AGG_1_HOUR
        Time indices: range(0, 6717)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 0, tzinfo=datetime.timezone.utc))

    SourceType.INSTITUTION_SUBNETS
        Time series indices: [0 1 2 3 4 ... 543 544 545 546 547], Length=548; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Selecting time period

- `time_period` sets the time period used for every set, i.e. for all selected time series.

#### Setting time period as "all"

- Uses the dataset's whole time period for every time series.

In [4]:
config = SeriesBasedConfig(time_period="all")
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:54,357][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:54,358][series_config][INFO] - Using all time series for all_ts because train_ts, val_ts, and test_ts are all set to None.
[2025-11-08 21:02:54,368][cesnet_dataset][INFO] - Updating config for all set.
100%|██████████| 548/548 [00:00<00:00, 620.11it/s]
[2025-11-08 21:02:55,273][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:55,274][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: None
        Val time series IDS: None
        Test time series IDS None
        All time series IDS [0 1 2 3 4 ... 543 544 545 546 547], Length=548
    Time periods
        Time period: range(0, 6718)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: 

#### Setting time period with time indices

- Sets the time period for the time series as a range of time indices.

In [5]:
config = SeriesBasedConfig(time_period=range(0, 2000))
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:55,286][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:55,289][series_config][INFO] - Using all time series for all_ts because train_ts, val_ts, and test_ts are all set to None.
[2025-11-08 21:02:55,299][cesnet_dataset][INFO] - Updating config for all set.
100%|██████████| 548/548 [00:00<00:00, 1768.18it/s]
[2025-11-08 21:02:55,629][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:55,629][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: None
        Val time series IDS: None
        Test time series IDS None
        All time series IDS [0 1 2 3 4 ... 543 544 545 546 547], Length=548
    Time periods
        Time period: range(0, 2000)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: 

#### Setting time period with datetime

- Sets the time period for the time series with a tuple of two datetime objects (start, end).
- Datetime objects are expected to be in UTC.
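
As a rough illustration of how a UTC datetime relates to the hourly time indices, the sketch below does the arithmetic by hand. The dataset start is copied from the dataset details printed earlier; `to_time_index` is a hypothetical helper and not part of `cesnet_tszoo`. Treating the end of the tuple as exclusive is an assumption, chosen because it reproduces the `range(0, 767)` reported in the config details of this subsection.

```python
from datetime import datetime, timedelta, timezone

# First timestamp of the dataset, copied from the "Dataset details" output.
DATASET_START = datetime(2023, 10, 9, 0, tzinfo=timezone.utc)
STEP = timedelta(hours=1)  # one step per hour for AGG_1_HOUR


def to_time_index(moment: datetime) -> int:
    """Hypothetical helper: whole aggregation steps since the dataset start."""
    if moment.tzinfo is None:
        # Naive datetimes are interpreted as UTC in this sketch.
        moment = moment.replace(tzinfo=timezone.utc)
    return (moment - DATASET_START) // STEP


start_idx = to_time_index(datetime(2023, 10, 9, 0))
end_idx = to_time_index(datetime(2023, 11, 9, 23))
time_indices = range(start_idx, end_idx)  # end treated as exclusive (assumption)
print(time_indices)  # range(0, 767)
```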

In [6]:
config = SeriesBasedConfig(time_period=(datetime(2023, 10, 9, 0), datetime(2023, 11, 9, 23)))
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:55,639][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:55,640][series_config][INFO] - Using all time series for all_ts because train_ts, val_ts, and test_ts are all set to None.
[2025-11-08 21:02:55,646][cesnet_dataset][INFO] - Updating config for all set.
100%|██████████| 548/548 [00:00<00:00, 2273.08it/s]
[2025-11-08 21:02:55,906][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:55,906][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: None
        Val time series IDS: None
        Test time series IDS None
        All time series IDS [0 1 2 3 4 ... 543 544 545 546 547], Length=548
    Time periods
        Time period: range(0, 767)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: N

#### Setting time period with percentage

- Sets the time period as a percentage of the dataset's whole time period, given as a fraction.
- Always starts from the first time index.
- Must satisfy 0 < `time_period` <= 1.
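
The arithmetic behind a fractional `time_period` can be sketched as follows. `fraction_to_time_range` is a hypothetical helper, not a library function, and the total of 6718 hourly steps is taken from the `time_period="all"` output above. Truncating with `int` is an assumption; it happens to reproduce the `range(0, 3359)` shown for `time_period=0.5`.

```python
TOTAL_TIME_STEPS = 6718  # length of the full hourly period, from the "all" output


def fraction_to_time_range(fraction: float, total: int = TOTAL_TIME_STEPS) -> range:
    """Hypothetical helper: the first `fraction` of the time indices, starting at 0."""
    if not 0 < fraction <= 1:
        raise ValueError("time_period must satisfy 0 < time_period <= 1")
    return range(0, int(fraction * total))  # truncation is an assumption


print(fraction_to_time_range(0.5))  # range(0, 3359)
print(fraction_to_time_range(1.0))  # range(0, 6718)
```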

In [7]:
config = SeriesBasedConfig(time_period=0.5)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:55,917][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:55,919][series_config][INFO] - Using all time series for all_ts because train_ts, val_ts, and test_ts are all set to None.
[2025-11-08 21:02:55,928][cesnet_dataset][INFO] - Updating config for all set.
100%|██████████| 548/548 [00:00<00:00, 1487.01it/s]
[2025-11-08 21:02:56,318][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:56,318][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: None
        Val time series IDS: None
        Test time series IDS None
        All time series IDS [0 1 2 3 4 ... 543 544 545 546 547], Length=548
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: 

### Creating train/val/test sets

- Sets how many time series each set will contain.
- Any of the sets can be left as None.
- Use `nan_threshold` to set the fraction of NaN values that is tolerated in a time series.
    - `nan_threshold` = 1.0 means that a time series can be completely empty.
    - The threshold is applied after the sets are selected.
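
To make the idea of a NaN threshold concrete, here is a minimal sketch on toy data. Both helpers are hypothetical, and the comparison rule (keep a series when its NaN share is less than or equal to the threshold) is an assumption rather than the library's documented behavior.

```python
import math


def nan_fraction(values: list[float]) -> float:
    """Share of NaN entries in one time series (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(math.isnan(v) for v in values) / len(values)


def within_nan_threshold(values: list[float], nan_threshold: float) -> bool:
    """Hypothetical check: keep the series if its NaN share does not exceed the threshold."""
    return nan_fraction(values) <= nan_threshold


nan = float("nan")
toy_series = {
    "dense": [1.0, 2.0, 3.0, 4.0],
    "half_missing": [1.0, nan, 3.0, nan],
    "empty": [nan, nan, nan, nan],
}

kept_strict = [name for name, v in toy_series.items() if within_nan_threshold(v, 0.5)]
kept_all = [name for name, v in toy_series.items() if within_nan_threshold(v, 1.0)]
print(kept_strict)  # ['dense', 'half_missing']
print(kept_all)     # ['dense', 'half_missing', 'empty']
```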

#### Setting sets with count of time series

- Sets the number of time series in each set.
- Each set will contain unique time series.
- Each count must be greater than zero.
- The total number of time series across all sets must be smaller than the number of time series in the dataset.
- The selection is affected by `random_state`.
    - When `random_state` is set, the sets will contain the same time series on every run.
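
The effect of `random_state` on a count-based split can be illustrated with the standard library alone. `split_by_count` is a hypothetical helper; the sampling strategy used here (one draw without replacement, then slicing) is an assumption and will not reproduce the exact IDs the library picks.

```python
import random


def split_by_count(n_series, train, val, test, random_state=None):
    """Hypothetical helper: draw three disjoint sets of time series indices."""
    if min(train, val, test) <= 0:
        raise ValueError("each count must be greater than zero")
    if train + val + test >= n_series:
        raise ValueError("sets must use fewer time series than the dataset holds")
    rng = random.Random(random_state)  # None gives a different draw on each run
    picked = rng.sample(range(n_series), train + val + test)
    return picked[:train], picked[train:train + val], picked[train + val:]


first = split_by_count(548, 54, 25, 10, random_state=42)
second = split_by_count(548, 54, 25, 10, random_state=42)
train_ids, val_ids, test_ids = first
print(first == second)  # True: same seed, same sets
print(len(set(train_ids) | set(val_ids) | set(test_ids)))  # 89
```

Sampling once and slicing guarantees that no index lands in two sets, which is the uniqueness property the list above describes.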

In [8]:
config = SeriesBasedConfig(time_period=0.5, train_ts=54, val_ts=25, test_ts=10, random_state=None, nan_threshold=1.0)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:56,329][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:56,391][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:56,392][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 54/54 [00:00<00:00, 1240.14it/s]
[2025-11-08 21:02:56,448][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 25/25 [00:00<00:00, 1161.61it/s]
[2025-11-08 21:02:56,476][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 10/10 [00:00<00:00, 1174.45it/s]
[2025-11-08 21:02:56,488][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:56,488][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [367  67 388 239  12 ... 483 136 358 162 479], Length=54
        Val time series IDS: [499 237 163 262 190 ... 361 383 325 378  95], Length=25
        Test time series IDS [421  34 447 102 126 329 230 245 495 542], Length=10
        All time series IDS [367  67 388 239  12 ... 329 230 245 495 542], Length=89
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 

#### Setting sets with percentage of time series in dataset

- Sets the size of each set as a percentage of the time series in the dataset, given as a fraction.
- Each set will contain unique time series.
- Each percentage must be greater than 0.
- The sum of the set percentages must be less than or equal to 1.0.
- The selection is affected by `random_state`.
    - When `random_state` is set, the sets will contain the same time series on every run.
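
A quick way to anticipate the set sizes is to convert the fractions into counts by hand. `fractions_to_counts` is a hypothetical helper, and truncation with `int` is an assumption; it was chosen because it reproduces the lengths 274, 109 and 54 that appear in the output of this subsection for a dataset of 548 time series.

```python
N_SERIES = 548  # number of time series reported in the dataset details


def fractions_to_counts(train, val, test, n_series=N_SERIES):
    """Hypothetical helper: turn set fractions into numbers of time series."""
    fractions = (train, val, test)
    if any(f <= 0 for f in fractions):
        raise ValueError("each percentage must be greater than 0")
    if sum(fractions) > 1.0:
        raise ValueError("percentages must sum to at most 1.0")
    return tuple(int(f * n_series) for f in fractions)  # truncation is an assumption


counts = fractions_to_counts(0.5, 0.2, 0.1)
print(counts)       # (274, 109, 54)
print(sum(counts))  # 437
```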

In [9]:
config = SeriesBasedConfig(time_period=0.5, train_ts=0.5, val_ts=0.2, test_ts=0.1, random_state=None, nan_threshold=1.0)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:56,495][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:56,502][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:56,503][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 274/274 [00:00<00:00, 1342.10it/s]
[2025-11-08 21:02:56,728][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 109/109 [00:00<00:00, 1074.36it/s]
[2025-11-08 21:02:56,842][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 54/54 [00:00<00:00, 1093.74it/s]
[2025-11-08 21:02:56,896][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:56,897][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [286 322 441 257 249 ...  57  12 380 234 340], Length=274
        Val time series IDS: [  0 247 337 110 310 ...  97 174 315   1 350], Length=109
        Test time series IDS [499 514 398 172 347 ... 530 517  91 111 300], Length=54
        All time series IDS [286 322 441 257 249 ... 530 517  91 111 300], Length=437
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0

#### Setting sets with specific time series indices

- Each set must contain unique time series.
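
When the index lists are written by hand, it can help to check them for overlaps before building the config. The sketch below assumes that "unique" means no index is shared between sets; `find_overlaps` is a hypothetical helper, not part of `cesnet_tszoo`.

```python
from itertools import combinations


def find_overlaps(**sets):
    """Hypothetical helper: report indices that appear in more than one set."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sets.items(), 2):
        shared = sorted(set(ids_a) & set(ids_b))
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


ok = find_overlaps(train_ts=[0, 1, 2, 3, 4], val_ts=[5, 6, 7, 8, 9], test_ts=[10, 11, 12, 13, 14])
bad = find_overlaps(train_ts=[0, 1, 2], val_ts=[2, 3], test_ts=[3, 4])
print(ok)   # {}
print(bad)  # {('train_ts', 'val_ts'): [2], ('val_ts', 'test_ts'): [3]}
```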

In [10]:
config = SeriesBasedConfig(time_period=0.5, train_ts=[0,1,2,3,4], val_ts=[5,6,7,8,9], test_ts=[10,11,12,13,14], nan_threshold=1.0)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:56,912][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:56,919][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:56,919][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 5/5 [00:00<00:00, 1249.72it/s]
[2025-11-08 21:02:56,934][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 5/5 [00:00<00:00, 1249.87it/s]
[2025-11-08 21:02:56,943][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 5/5 [00:00<00:00, 1249.94it/s]
[2025-11-08 21:02:56,950][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:56,950][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [0 1 2 3 4], Length=5
        Val time series IDS: [5 6 7 8 9], Length=5
        Test time series IDS [10 11 12 13 14], Length=5
        All time series IDS [0 1 2 3 4 ... 10 11 12 13 14], Length=15
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.

### Selecting features

- Determines which features are returned when loading data.
- Setting `include_time` to True adds time to the returned features.
- Setting `include_ts_id` to True adds the time series ID to the returned features.

#### Setting features to take as "all"

In [11]:
config = SeriesBasedConfig(time_period=0.5, train_ts=54, val_ts=25, test_ts=10, features_to_take="all")
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:56,957][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:56,962][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:56,963][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 54/54 [00:00<00:00, 1252.99it/s]
[2025-11-08 21:02:57,017][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 25/25 [00:00<00:00, 1211.46it/s]
[2025-11-08 21:02:57,044][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 10/10 [00:00<00:00, 1082.62it/s]
[2025-11-08 21:02:57,056][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:57,056][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [480 456 363 426 236 ... 459 354 181 326 171], Length=54
        Val time series IDS: [137  35 232 123 545 ... 387  17 308 401  68], Length=25
        Test time series IDS [283 366 523 252  53 142 229 339 532 469], Length=10
        All time series IDS [480 456 363 426 236 ... 142 229 339 532 469], Length=89
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 

#### Setting features via list

In [12]:
config = SeriesBasedConfig(time_period=0.5, train_ts=54, val_ts=25, test_ts=10, features_to_take=["n_flows", "n_packets"])
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:57,064][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:57,069][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:57,070][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 54/54 [00:00<00:00, 1429.44it/s]
[2025-11-08 21:02:57,122][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 25/25 [00:00<00:00, 1314.50it/s]
[2025-11-08 21:02:57,148][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 10/10 [00:00<00:00, 1227.59it/s]
[2025-11-08 21:02:57,159][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:57,160][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [266 137 127 311 104 ... 288 178 305 514 180], Length=54
        Val time series IDS: [399 455 317 271  25 ... 519  86 277 136 202], Length=25
        Test time series IDS [432 532 417 506 253 501 428 252  93 209], Length=10
        All time series IDS [266 137 127 311 104 ... 501 428 252  93 209], Length=89
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type (train set): NoAnomalyHandler   
    Batch sizes
        Train batch size: 32
   

#### Including time and time series id

In [13]:
config = SeriesBasedConfig(time_period=0.5, train_ts=54, val_ts=25, test_ts=10, features_to_take=["n_flows", "n_packets"], include_time=True, include_ts_id=True, time_format=TimeFormat.ID_TIME)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:57,168][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:57,174][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:57,175][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 54/54 [00:00<00:00, 1269.68it/s]
[2025-11-08 21:02:57,230][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 25/25 [00:00<00:00, 1183.49it/s]
[2025-11-08 21:02:57,258][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 10/10 [00:00<00:00, 1250.24it/s]
[2025-11-08 21:02:57,271][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:57,271][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [116 239 159 427 374 ... 212 138 516  93 251], Length=54
        Val time series IDS: [388 108 199 367 318 ... 228 358 463 417   8], Length=25
        Test time series IDS [ 16 273 495 409 529 396 137 218 400 382], Length=10
        All time series IDS [116 239 159 427 374 ... 396 137 218 400 382], Length=89
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: NoFiller
    Transformers
        Transformer type: NoTransformer
    Anomaly handler
        Anomaly handler type (train set): NoAnomalyHandler   
    Batch sizes
        Train batch size: 32
   

### Selecting all set

#### All set when other sets are None

- The all set will contain every time series in the dataset.

In [14]:
config = SeriesBasedConfig(time_period=0.5, train_ts=None, val_ts=None, test_ts=None)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:57,279][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:57,281][series_config][INFO] - Using all time series for all_ts because train_ts, val_ts, and test_ts are all set to None.
[2025-11-08 21:02:57,290][cesnet_dataset][INFO] - Updating config for all set.
100%|██████████| 548/548 [00:00<00:00, 1514.21it/s]
[2025-11-08 21:02:57,672][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:57,673][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: None
        Val time series IDS: None
        Test time series IDS None
        All time series IDS [0 1 2 3 4 ... 543 544 545 546 547], Length=548
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Fillers         
        Filler type: 

#### All set when at least one other set is not None

- The all set will contain every time series assigned to the other sets.
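
The two cases for the all set can be summarized in a few lines. `build_all_set` is a hypothetical helper; concatenating in the order train, val, test is an assumption based on the `All time series IDS` lines in the outputs of this notebook, which start with the train IDs and end with the test IDs.

```python
def build_all_set(train_ts, val_ts, test_ts, n_series=548):
    """Hypothetical helper: derive the all set from the other three sets."""
    given = [ids for ids in (train_ts, val_ts, test_ts) if ids is not None]
    if not given:
        # No set configured: every time series in the dataset is used.
        return list(range(n_series))
    # Otherwise: train, then val, then test, skipping sets left as None.
    return [ts_id for ids in given for ts_id in ids]


print(len(build_all_set(None, None, None)))  # 548
print(build_all_set([0, 1], None, [7, 8]))   # [0, 1, 7, 8]
```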

In [15]:
config = SeriesBasedConfig(time_period=0.5, train_ts=54, val_ts=25, test_ts=10)
series_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-11-08 21:02:57,683][series_config][INFO] - Quick validation succeeded.
[2025-11-08 21:02:57,690][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-08 21:02:57,691][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 54/54 [00:00<00:00, 1265.23it/s]
[2025-11-08 21:02:57,746][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 25/25 [00:00<00:00, 1189.41it/s]
[2025-11-08 21:02:57,773][cesnet_dataset][INFO] - Updating config for test set.
100%|██████████| 10/10 [00:00<00:00, 952.13it/s]
[2025-11-08 21:02:57,786][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-08 21:02:57,787][cesnet_dataset][INFO] - Config initialized successfully.



Config Details:
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Train time series IDS: [465 457 365 180 473 ...  84 197 333  74 384], Length=54
        Val time series IDS: [ 21 345 514 331 396 ... 315 140 417 439 171], Length=25
        Test time series IDS [390 506  91 510 494 157  24 401 451  29], Length=10
        All time series IDS [465 457 365 180 473 ... 157  24 401 451  29], Length=89
    Time periods
        Time period: range(0, 3359)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 