# Loading data with TimeBasedCesnetDataset

### Import

In [1]:
from tqdm import tqdm
import logging

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, TimeFormat
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import TimeBasedConfig # Time based dataset MUST use TimeBasedConfig

### Setting up the logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.INSTITUTION_SUBNETS, aggregation=AgreggationType.AGG_1_HOUR, is_series_based=False, display_details=True)

[2025-08-05 19:46:49,514][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_1_HOUR
        Time indices: range(0, 6717)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 0, tzinfo=datetime.timezone.utc))

    SourceType.INSTITUTION_SUBNETS
        Time series indices: [0 1 2 3 4 ... 543 544 545 546 547], Length=548; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Loading data with DataLoader

- Load data using a PyTorch DataLoader.
- The last batch is never dropped (unless a sliding window is used).
- Workers determine how many processes are used to load data for a specific set.
    - Setting workers to 0 means loading runs on the main process.
    - The configured workers can be overridden in `get_*_dataloader` with the `workers` parameter.
- Batch size determines how many time steps of every time series are in one batch (this differs when a sliding window is used).
- Without a sliding window, a batch consists of:
    - When `time_format` is not TimeFormat.DATETIME, the batch is one NumPy array of shape `(ts_ids/test_ts_ids, batch_size, features_to_take + used ids)`.
    - When `time_format` is TimeFormat.DATETIME, the batch is a tuple: (NumPy array of shape `(ts_ids/test_ts_ids, batch_size, features_to_take + used ids (without time))`, NumPy array of shape `(batch_size,)`).
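Because the last batch is never dropped, the number of batches per set should be the ceiling of the set's length divided by its batch size. The sketch below checks that arithmetic against the progress bars later in this notebook (train 33, val 65, test and all 128, and train 32 in the DATETIME example). `expected_batches` and `expected_batch_shape` are hypothetical helpers, not part of cesnet_tszoo, and `n_ids=2` assumes the "used ids" are the time series ID and the time ID.

```python
import math


def expected_batches(n_time_steps: int, batch_size: int) -> int:
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(n_time_steps / batch_size)


def expected_batch_shape(n_ts: int, batch_size: int, n_features: int, n_ids: int) -> tuple:
    """Shape of a full batch without a sliding window: (series, time steps, columns)."""
    return (n_ts, batch_size, n_features + n_ids)


# Period lengths taken from the config output: train range(0, 3359), val range(3359, 5374),
# test range(5374, 6717), all range(0, 6717)
train_len, val_len, test_len, all_len = 3359, 5374 - 3359, 6717 - 5374, 6717

print(expected_batches(train_len, 33))  # 102
print(expected_batches(val_len, 65))    # 31
print(expected_batches(test_len, 128))  # 11
print(expected_batches(all_len, 128))   # 53
print(expected_batches(train_len, 32))  # 105 (DATETIME example)
print(expected_batch_shape(54, 33, n_features=18, n_ids=2))  # (54, 33, 20)
```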

In [4]:
config = TimeBasedConfig(ts_ids=54, train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.ID_TIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0,
                         train_batch_size=32, val_batch_size=64, test_batch_size=128, all_batch_size=128)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:49,523][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:49,545][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:49,550][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 496.57it/s]
[2025-08-05 19:46:49,669][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 878.77it/s]
[2025-08-05 19:46:49,697][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [518 131 340  70 520 ... 530 118 307 155 265], Length=54
        Test time series IDS: [480  71  73 218 442 ... 126 156 246  20 185], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID incl

You can also change the batch sizes later with `update_dataset_config_and_initialize` or `set_batch_sizes`.

In [5]:
time_based_dataset.update_dataset_config_and_initialize(train_batch_size=33, val_batch_size=65, test_batch_size="config", all_batch_size="config")
# Or
time_based_dataset.set_batch_sizes(train_batch_size=33, val_batch_size=65, test_batch_size="config", all_batch_size="config")

[2025-08-05 19:46:49,709][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:49,709][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:49,711][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:49,711][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:49,712][cesnet_dataset][INFO] - Batch sizes has been changed successfuly.


You can also change the workers later with `update_dataset_config_and_initialize` or `set_workers`.

In [6]:
time_based_dataset.update_dataset_config_and_initialize(train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0)
# Or
time_based_dataset.set_workers(train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0)

[2025-08-05 19:46:49,724][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:49,725][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:49,726][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:49,727][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:49,727][cesnet_dataset][INFO] - Workers has been changed successfuly.


#### Train set

- Affected by `train_batch_size`.
- Affected by `train_workers`.

In [7]:
dataloader = time_based_dataset.get_train_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:49,745][cesnet_dataset][INFO] - Created new cached train_dataloader.
100%|██████████| 102/102 [00:00<00:00, 196.59it/s]


(54, 33, 20)

#### Val set

- Affected by `val_batch_size`.
- Affected by `val_workers`.

In [8]:
dataloader = time_based_dataset.get_val_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:50,287][cesnet_dataset][INFO] - Created new cached val_dataloader.
100%|██████████| 31/31 [00:00<00:00, 124.64it/s]


(54, 65, 20)

#### Test set

- Affected by `test_batch_size`.
- Affected by `test_workers`.

In [9]:
dataloader = time_based_dataset.get_test_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:50,552][cesnet_dataset][INFO] - Created new cached test_dataloader.
100%|██████████| 11/11 [00:00<00:00, 91.89it/s]


(54, 128, 20)

##### When `test_ts_ids` is used and `test_time_period` is set

- Affected by `test_batch_size`.
- Affected by `test_workers`.

In [10]:
dataloader = time_based_dataset.get_test_other_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:50,694][cesnet_dataset][INFO] - Created new cached test_other_dataloader.
100%|██████████| 11/11 [00:00<00:00, 178.16it/s]


(22, 128, 20)

#### All set

- Affected by `all_batch_size`.
- Affected by `all_workers`.

In [11]:
dataloader = time_based_dataset.get_all_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:50,773][cesnet_dataset][INFO] - Created new cached all_dataloader.
100%|██████████| 53/53 [00:00<00:00, 164.75it/s]


(54, 128, 20)

#### Using time_format=TimeFormat.DATETIME

In [12]:
config = TimeBasedConfig(ts_ids=54, train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.DATETIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0,
                         train_batch_size=32, val_batch_size=64, test_batch_size=128, all_batch_size=128)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:51,109][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:51,137][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:51,145][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 562.27it/s]
[2025-08-05 19:46:51,250][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 1156.01it/s]
[2025-08-05 19:46:51,272][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [542 419 488 497 156 ... 166 168 267 138 165], Length=54
        Test time series IDS: [461  19 490 204 486 ... 177 183  90 529 326], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID incl

In [13]:
dataloader = time_based_dataset.get_train_dataloader(workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0][0].shape) # data without time
display(batches[0][1].shape) # time

[2025-08-05 19:46:51,288][cesnet_dataset][INFO] - Created new cached train_dataloader.
100%|██████████| 105/105 [00:00<00:00, 207.90it/s]


(54, 32, 19)

(32,)
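With `AGG_1_HOUR`, an `id_time` index can be related to wall-clock time. The small sketch below assumes one index per hour with no gaps, starting at the first datetime in the dataset details; `id_time_to_datetime` is a hypothetical helper, not cesnet_tszoo API. Under that assumption, the second datetime in the dataset details (2024-07-14 21:00 UTC) is exactly 6717 hours after the start, so it appears to mark the end of the last hourly interval rather than its start.

```python
from datetime import datetime, timedelta, timezone

# First datetime of AGG_1_HOUR, copied from the dataset details printed by get_dataset
START = datetime(2023, 10, 9, 0, 0, tzinfo=timezone.utc)
STEP = timedelta(hours=1)  # assumption: one id_time per hour, no gaps


def id_time_to_datetime(id_time: int) -> datetime:
    """Map an ID_TIME index to a UTC datetime under the assumptions above (hypothetical helper)."""
    return START + id_time * STEP


print(id_time_to_datetime(0))     # 2023-10-09 00:00:00+00:00
print(id_time_to_datetime(3358))  # 2024-02-25 22:00:00+00:00 (last train time step)
print(START + 6717 * STEP)        # 2024-07-14 21:00:00+00:00
```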

#### Specifying which time series to load

- Every `get_*_dataloader` method has a `ts_id` parameter.
    - When `ts_id` is None, it returns all selected time series, as in the previous examples.
    - When `ts_id` is not None, it returns only the time series with that ID.
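In the `ts_id=177` example that follows, the batch shape is `(1, 32, 20)`, which is what you would get by keeping one row of the first axis of a full batch. The NumPy sketch below illustrates only that shape relationship, not how cesnet_tszoo loads data; `select_series` is hypothetical, and it assumes the time series ID is stored in column 0, as `id_institution_subnet` is in the DataFrame output later in this notebook.

```python
import numpy as np


def select_series(batch: np.ndarray, ts_id: int, id_column: int = 0) -> np.ndarray:
    """Keep only the series of a (series, time, columns) batch whose ID column equals ts_id.

    Assumption: the time series ID is stored in column `id_column`.
    """
    mask = batch[:, 0, id_column] == ts_id  # one boolean per series
    return batch[mask]


# Toy batch: 4 series (IDs from In [14]), 32 time steps, 20 columns
ids = np.array([177, 176, 319, 267], dtype=float)
batch = np.zeros((4, 32, 20))
batch[:, :, 0] = ids[:, None]

print(select_series(batch, 177).shape)  # (1, 32, 20)
```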

In [14]:
config = TimeBasedConfig(ts_ids=[177, 176, 319, 267], train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.ID_TIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0,
                         train_batch_size=32, val_batch_size=64, test_batch_size=128, all_batch_size=128)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:51,809][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:51,829][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:51,834][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 4/4 [00:00<00:00, 567.32it/s]
[2025-08-05 19:46:51,848][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 1157.17it/s]
[2025-08-05 19:46:51,870][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [177 176 319 267], Length=4
        Test time series IDS: [293 468  27 311 332 ... 308  95  90 443 263], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID included: True
        Time inclu

In [15]:
dataloader = time_based_dataset.get_train_dataloader(ts_id=177, workers="config")

batches = []

for batch in tqdm(dataloader):
    batches.append(batch)
    
display(batches[0].shape)

[2025-08-05 19:46:51,892][cesnet_dataset][INFO] - Created new cached train_dataloader.
100%|██████████| 105/105 [00:00<00:00, 1912.75it/s]


(1, 32, 20)

#### Sliding window

- Both `sliding_window_size` and `sliding_window_prediction_size` must be set to use a sliding window.
- Batch sizes are used for background caching; the dataloader yields one window per iteration.
- A batch consists of:
    - When `time_format` is not TimeFormat.DATETIME, a tuple of:
        - NumPy array of shape `(ts_ids/test_ts_ids, sliding_window_size, features_to_take + used ids)`
        - NumPy array of shape `(ts_ids/test_ts_ids, sliding_window_prediction_size, features_to_take + used ids)`
    - When `time_format` is TimeFormat.DATETIME, a tuple of:
        - NumPy array of shape `(ts_ids/test_ts_ids, sliding_window_size, features_to_take + used ids (without time))`
        - NumPy array of shape `(ts_ids/test_ts_ids, sliding_window_prediction_size, features_to_take + used ids (without time))`
        - NumPy array of shape `(sliding_window_size,)` with the window times
        - NumPy array of shape `(sliding_window_prediction_size,)` with the prediction times
- You can modify the sliding window step size with `sliding_window_step`.
- You can use `set_shared_size` to set how many time steps adjacent time periods share.
    - `val_time_period` takes the shared time steps from `train_time_period`.
    - `test_time_period` takes them from `val_time_period` or `train_time_period`.
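A minimal NumPy sketch of how a window of `sliding_window_size` steps and a prediction of `sliding_window_prediction_size` steps could move along the time axis with `sliding_window_step`. It is an illustration, not the cesnet_tszoo implementation: `sliding_windows` is a hypothetical helper, and it assumes windows start at the first time step and an incomplete final window is dropped. With the train settings used in `In [16]` (1000 time steps, size 22, prediction 2, step 2) it yields 489 pairs, the same count as the train progress bar in `In [17]`.

```python
import numpy as np


def sliding_windows(series: np.ndarray, window: int, horizon: int, step: int):
    """Yield (window, prediction) views along axis 1 of a (series, time, columns) array."""
    n_time = series.shape[1]
    # Last valid start leaves room for `window` input steps and `horizon` prediction steps
    for start in range(0, n_time - window - horizon + 1, step):
        x = series[:, start:start + window, :]
        y = series[:, start + window:start + window + horizon, :]
        yield x, y


# Toy data shaped like the train set in In [16]: 54 series, 1000 time steps,
# 3 columns (assumed: n_flows plus two ID columns)
data = np.zeros((54, 1000, 3))
pairs = list(sliding_windows(data, window=22, horizon=2, step=2))
print(len(pairs))         # 489
print(pairs[0][0].shape)  # (54, 22, 3)
print(pairs[0][1].shape)  # (54, 2, 3)
```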

In [16]:
config = TimeBasedConfig(ts_ids=54, train_time_period=range(0, 1000), val_time_period=range(1000, 1500), test_time_period=range(1500, 2000), test_ts_ids=22, features_to_take=["n_flows"], time_format=TimeFormat.ID_TIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0,
                         train_batch_size=32, val_batch_size=64, test_batch_size=128, all_batch_size=128,
                         sliding_window_size=22, sliding_window_prediction_size=2, sliding_window_step=2, set_shared_size=0.05)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:51,965][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:52,028][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:52,032][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 2222.52it/s]
[2025-08-05 19:46:52,063][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 2368.14it/s]
[2025-08-05 19:46:52,073][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [ 15 473 201 245 342 ... 528 513  68 301 487], Length=54
        Test time series IDS: [308 449 518 447 222 ... 470 191 493 287 170], Length=22
    Time periods
        Train time periods: range(0, 1000)
        Val time periods: range(665, 1500)
        Test time periods: range(1165, 2000)
        All time periods: range(0, 2000)
    Features
        Taken features: ['n_flows']
        Default values: [0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: 22
        Sliding window prediction size: 2
        Sliding window step size: 2
        Set shared size: 335
    Fillers
        Filler type: None
    Transformers
        Transformer type: None
    Batch sizes
        Train batch size: 32
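The shifted period starts in the config output above can be reproduced by hand. The rule in this sketch is inferred from the two sliding-window configs in this notebook (`In [16]` and `In [19]`), not taken from the library source: a float `set_shared_size` appears to be a fraction of all 6717 time steps, truncated (0.05 × 6717 = 335.85, shown as 335), and each period then starts that many steps before the end of the preceding period. `shared_steps` and `with_shared_steps` are hypothetical helpers.

```python
def shared_steps(set_shared_size, total_time_steps: int) -> int:
    """Convert set_shared_size to a step count.

    Assumed rule: a float is a fraction of all time steps (truncated), an int is used as-is.
    """
    if isinstance(set_shared_size, float):
        return int(set_shared_size * total_time_steps)
    return set_shared_size


def with_shared_steps(previous: range, period: range, shared: int) -> range:
    """Start `period` `shared` steps before the end of `previous` (assumed rule)."""
    return range(previous.stop - shared, period.stop)


# First config (In [16]): set_shared_size=0.05
shared = shared_steps(0.05, 6717)
print(shared)  # 335
print(with_shared_steps(range(0, 1000), range(1000, 1500), shared))     # range(665, 1500)
print(with_shared_steps(range(1000, 1500), range(1500, 2000), shared))  # range(1165, 2000)

# Second config (In [19]): set_shared_size=100
print(with_shared_steps(range(0, 1000), range(978, 1500), 100))     # range(900, 1500)
print(with_shared_steps(range(978, 1500), range(1478, 2000), 100))  # range(1400, 2000)
```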
  

In [17]:
dataloader = time_based_dataset.get_train_dataloader(workers="config")

batches = []

for sliding_window, sliding_window_prediction in tqdm(dataloader):
    batches.append((sliding_window, sliding_window_prediction))

[2025-08-05 19:46:52,093][cesnet_dataset][INFO] - Created new cached train_dataloader.
100%|██████████| 489/489 [00:00<00:00, 4487.20it/s]
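Before feeding the pairs to a model, you may want to drop the ID columns. A minimal sketch, assuming the time series ID and time ID are the first two columns (as in the DataFrame output later in this notebook); `to_model_arrays` and `N_ID_COLUMNS` are hypothetical names, not cesnet_tszoo API.

```python
import numpy as np

N_ID_COLUMNS = 2  # assumption: time series ID and time ID are the first two columns


def to_model_arrays(sliding_window: np.ndarray, sliding_window_prediction: np.ndarray):
    """Drop the (assumed leading) ID columns and return float32 inputs and targets."""
    x = sliding_window[..., N_ID_COLUMNS:].astype(np.float32)
    y = sliding_window_prediction[..., N_ID_COLUMNS:].astype(np.float32)
    return x, y


# Toy pair shaped like one train iteration from In [17]:
# 54 series, 22 window steps / 2 prediction steps, n_flows + 2 ID columns
window = np.ones((54, 22, 3))
prediction = np.ones((54, 2, 3))
x, y = to_model_arrays(window, prediction)
print(x.shape, y.shape)  # (54, 22, 1) (54, 2, 1)
```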


You can also change the sliding window parameters later with `update_dataset_config_and_initialize` or `set_sliding_window`.

In [18]:
time_based_dataset.update_dataset_config_and_initialize(sliding_window_size=22, sliding_window_prediction_size=3, sliding_window_step="config", set_shared_size="config", workers=0)
# Or
time_based_dataset.set_sliding_window(sliding_window_size=22, sliding_window_prediction_size=3, sliding_window_step="config", set_shared_size="config", workers=0)

[2025-08-05 19:46:52,213][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:52,214][cesnet_dataset][INFO] - Destroyed cached train_dataloader.
[2025-08-05 19:46:52,215][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:52,215][cesnet_dataset][INFO] - Re-initialization is not needed.
[2025-08-05 19:46:52,216][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-08-05 19:46:52,216][cesnet_dataset][INFO] - Sliding window values has been changed successfuly.


##### Using time_format=TimeFormat.DATETIME

In [19]:
config = TimeBasedConfig(ts_ids=54, train_time_period=range(0, 1000), val_time_period=range(978, 1500), test_time_period=range(1478, 2000), test_ts_ids=22, features_to_take=["n_flows"], time_format=TimeFormat.DATETIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0,
                         train_batch_size=32, val_batch_size=64, test_batch_size=128, all_batch_size=128,
                         sliding_window_size=22, sliding_window_prediction_size=2, sliding_window_step=2, set_shared_size=100)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:52,229][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:52,248][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:52,251][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 2453.53it/s]
[2025-08-05 19:46:52,279][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 3143.51it/s]
[2025-08-05 19:46:52,289][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [221  39 254 334 274 ...  45  13  94 520 417], Length=54
        Test time series IDS: [ 60  69 203 404 270 ... 112 241 516 445 136], Length=22
    Time periods
        Train time periods: range(0, 1000)
        Val time periods: range(900, 1500)
        Test time periods: range(1400, 2000)
        All time periods: range(0, 2000)
    Features
        Taken features: ['n_flows']
        Default values: [0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.DATETIME
    Sliding window
        Sliding window size: 22
        Sliding window prediction size: 2
        Sliding window step size: 2
        Set shared size: 100
    Fillers
        Filler type: None
    Transformers
        Transformer type: None
    Batch sizes
        Train batch size: 32
 

In [20]:
dataloader = time_based_dataset.get_train_dataloader(workers="config")

batches = []

for sliding_window, sliding_window_prediction, sliding_window_times, sliding_window_prediction_times in tqdm(dataloader):
    batches.append((sliding_window, sliding_window_prediction, sliding_window_times, sliding_window_prediction_times))

[2025-08-05 19:46:52,310][cesnet_dataset][INFO] - Created new cached train_dataloader.
100%|██████████| 489/489 [00:00<00:00, 4812.58it/s]


### Loading data as DataFrame

- Batch size has no effect.
- Sliding window has no effect.
- Returns every time series in `ts_ids`/`test_ts_ids` over the time period of the requested set.
- Data is returned as a pandas DataFrame, or as a list of DataFrames (one per time series) when `as_single_dataframe=False`.
- Workers determine how many processes are used to load data for a specific set.
    - Setting workers to 0 means loading runs on the main process.
    - The configured workers can be overridden in `get_*_df` with the `workers` parameter.
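The two return forms are related: stacking the per-series list gives one long frame, and grouping by the ID column splits it again. A minimal pandas sketch on toy frames with placeholder values (not dataset data); it assumes the single DataFrame is laid out like the concatenated list, which matches the first rows shown in the outputs below but is an assumption rather than documented behavior.

```python
import pandas as pd

# Toy per-series frames with placeholder values, standing in for as_single_dataframe=False
toy_dfs = [
    pd.DataFrame({"id_institution_subnet": [1.0, 1.0], "id_time": [0.0, 1.0], "n_flows": [10.0, 20.0]}),
    pd.DataFrame({"id_institution_subnet": [2.0, 2.0], "id_time": [0.0, 1.0], "n_flows": [30.0, 40.0]}),
]

# Stack into one long frame, similar to as_single_dataframe=True
toy_df = pd.concat(toy_dfs, ignore_index=True)
print(toy_df.shape)  # (4, 3)

# Split back into one frame per time series, keeping the original order
per_series = [g.reset_index(drop=True) for _, g in toy_df.groupby("id_institution_subnet", sort=False)]
print(len(per_series))  # 2
```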

In [21]:
config = TimeBasedConfig(ts_ids=54, train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.ID_TIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:52,431][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:52,450][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:52,454][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 642.96it/s]
[2025-08-05 19:46:52,545][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 1912.19it/s]
[2025-08-05 19:46:52,557][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [304 108 528 148 273 ... 470  64 253 307 462], Length=54
        Test time series IDS: [526 313  85 162 374 ... 455 234   0  45 424], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID incl

#### Train set

- Affected by `train_workers`.

In [22]:
df = time_based_dataset.get_train_df(as_single_dataframe=True, workers="config")
dfs = time_based_dataset.get_train_df(as_single_dataframe=False, workers="config")

df.head(10)

|  | id_institution_subnet | id_time | n_flows | n_packets | n_bytes | sum_n_dest_asn | avg_n_dest_asn | std_n_dest_asn | sum_n_dest_ports | avg_n_dest_ports | std_n_dest_ports | sum_n_dest_ip | avg_n_dest_ip | std_n_dest_ip | tcp_udp_ratio_packets | tcp_udp_ratio_bytes | dir_ratio_packets | dir_ratio_bytes | avg_duration | avg_ttl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 304.0 | 0.0 | 1909.0 | 46862.0 | 33268534.0 | 606.0 | 9.62 | 6.83 | 1140.0 | 18.1 | 20.75 | 885.0 | 14.05 | 14.44 | 0.839844 | 0.830078 | 0.439941 | 0.449951 | 10.0 | 123.309998 |
| 1 | 304.0 | 1.0 | 2040.0 | 198300.0 | 226934054.0 | 685.0 | 11.05 | 7.8 | 1145.0 | 18.469999 | 21.57 | 994.0 | 16.030001 | 16.35 | 0.879883 | 0.879883 | 0.439941 | 0.439941 | 10.21 | 120.699997 |
| 2 | 304.0 | 2.0 | 2489.0 | 47309.0 | 28702391.0 | 682.0 | 10.83 | 8.46 | 1023.0 | 16.24 | 16.620001 | 1124.0 | 17.84 | 22.32 | 0.870117 | 0.879883 | 0.439941 | 0.429932 | 7.8 | 113.139999 |
| 3 | 304.0 | 3.0 | 8042.0 | 575831.0 | 480684476.0 | 899.0 | 14.74 | 15.97 | 1221.0 | 20.02 | 18.959999 | 2688.0 | 44.07 | 93.019997 | 0.859863 | 0.850098 | 0.47998 | 0.439941 | 8.59 | 113.18 |
| 4 | 304.0 | 4.0 | 13003.0 | 968057.0 | 867045007.0 | 1096.0 | 19.23 | 22.610001 | 903.0 | 15.84 | 14.43 | 4167.0 | 73.110001 | 136.070007 | 0.850098 | 0.839844 | 0.439941 | 0.389893 | 9.68 | 108.93 |
| 5 | 304.0 | 5.0 | 11886.0 | 826750.0 | 683669778.0 | 1066.0 | 17.77 | 24.139999 | 841.0 | 14.02 | 12.89 | 4181.0 | 69.68 | 144.300003 | 0.810059 | 0.790039 | 0.469971 | 0.419922 | 9.78 | 105.949997 |
| 6 | 304.0 | 6.0 | 10368.0 | 799596.0 | 567231893.0 | 986.0 | 18.6 | 22.35 | 816.0 | 15.4 | 14.41 | 3740.0 | 70.57 | 139.520004 | 0.830078 | 0.819824 | 0.439941 | 0.389893 | 6.8 | 98.330002 |
| 7 | 304.0 | 7.0 | 10163.0 | 653004.0 | 613372257.0 | 1022.0 | 17.030001 | 22.049999 | 965.0 | 16.08 | 18.99 | 3646.0 | 60.77 | 124.57 | 0.890137 | 0.899902 | 0.459961 | 0.419922 | 8.81 | 110.660004 |
| 8 | 304.0 | 8.0 | 8657.0 | 686996.0 | 641084557.0 | 931.0 | 15.52 | 18.17 | 876.0 | 14.6 | 16.42 | 3246.0 | 54.099998 | 102.980003 | 0.819824 | 0.810059 | 0.47998 | 0.439941 | 7.75 | 105.849998 |
| 9 | 304.0 | 9.0 | 8793.0 | 454893.0 | 353144692.0 | 1041.0 | 17.950001 | 21.0 | 1193.0 | 20.57 | 25.309999 | 3343.0 | 57.639999 | 110.089996 | 0.850098 | 0.850098 | 0.469971 | 0.469971 | 10.65 | 106.480003 |


In [23]:
dfs

[      id_institution_subnet  id_time  n_flows  n_packets      n_bytes  \
 0                     304.0      0.0   1909.0    46862.0   33268534.0   
 1                     304.0      1.0   2040.0   198300.0  226934054.0   
 2                     304.0      2.0   2489.0    47309.0   28702391.0   
 3                     304.0      3.0   8042.0   575831.0  480684476.0   
 4                     304.0      4.0  13003.0   968057.0  867045007.0   
 ...                     ...      ...      ...        ...          ...   
 3354                  304.0   3354.0   1494.0   102534.0  114264808.0   
 3355                  304.0   3355.0   2044.0   110127.0   82518332.0   
 3356                  304.0   3356.0   1698.0   140200.0  152364577.0   
 3357                  304.0   3357.0   1488.0    88010.0   86263575.0   
 3358                  304.0   3358.0   1668.0    20126.0    7467856.0   
 
       sum_n_dest_asn  avg_n_dest_asn  std_n_dest_asn  sum_n_dest_ports  \
 0              606.0            9.

#### Val set

- Affected by `val_workers`.

In [24]:
df = time_based_dataset.get_val_df(as_single_dataframe=True, workers="config")
dfs = time_based_dataset.get_val_df(as_single_dataframe=False, workers="config")

df.head(10)

|  | id_institution_subnet | id_time | n_flows | n_packets | n_bytes | sum_n_dest_asn | avg_n_dest_asn | std_n_dest_asn | sum_n_dest_ports | avg_n_dest_ports | std_n_dest_ports | sum_n_dest_ip | avg_n_dest_ip | std_n_dest_ip | tcp_udp_ratio_packets | tcp_udp_ratio_bytes | dir_ratio_packets | dir_ratio_bytes | avg_duration | avg_ttl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 304.0 | 3359.0 | 1586.0 | 58242.0 | 62233140.0 | 482.0 | 7.9 | 6.0 | 1068.0 | 17.51 | 27.35 | 676.0 | 11.08 | 9.61 | 0.870117 | 0.859863 | 0.529785 | 0.549805 | 10.69 | 99.760002 |
| 1 | 304.0 | 3360.0 | 1293.0 | 14805.0 | 4019821.0 | 488.0 | 8.71 | 6.94 | 786.0 | 14.04 | 16.67 | 656.0 | 11.71 | 11.08 | 0.930176 | 0.930176 | 0.509766 | 0.52002 | 10.36 | 104.43 |
| 2 | 304.0 | 3361.0 | 1303.0 | 69750.0 | 7759548.0 | 447.0 | 7.98 | 6.67 | 665.0 | 11.88 | 12.48 | 624.0 | 11.14 | 10.64 | 0.910156 | 0.919922 | 0.52002 | 0.569824 | 14.73 | 101.099998 |
| 3 | 304.0 | 3362.0 | 1074.0 | 14454.0 | 6653263.0 | 435.0 | 7.63 | 5.91 | 696.0 | 12.21 | 12.51 | 610.0 | 10.7 | 9.37 | 0.899902 | 0.899902 | 0.509766 | 0.540039 | 10.19 | 105.099998 |
| 4 | 304.0 | 3363.0 | 1251.0 | 92595.0 | 93886930.0 | 464.0 | 7.61 | 6.05 | 770.0 | 12.62 | 14.39 | 638.0 | 10.46 | 9.4 | 0.870117 | 0.870117 | 0.509766 | 0.529785 | 9.24 | 112.919998 |
| 5 | 304.0 | 3364.0 | 1321.0 | 518933.0 | 543827900.0 | 419.0 | 7.91 | 6.31 | 567.0 | 10.7 | 9.62 | 657.0 | 12.4 | 12.67 | 0.870117 | 0.879883 | 0.47998 | 0.459961 | 7.51 | 98.709999 |
| 6 | 304.0 | 3365.0 | 5779.0 | 514718.0 | 505821800.0 | 670.0 | 12.88 | 14.53 | 886.0 | 17.040001 | 18.0 | 2071.0 | 39.830002 | 89.93 | 0.919922 | 0.930176 | 0.469971 | 0.429932 | 17.309999 | 104.220001 |
| 7 | 304.0 | 3366.0 | 11228.0 | 1369475.0 | 1285225000.0 | 929.0 | 16.02 | 20.33 | 724.0 | 12.48 | 11.28 | 4016.0 | 69.239998 | 138.729996 | 0.870117 | 0.870117 | 0.48999 | 0.439941 | 9.55 | 99.980003 |
| 8 | 304.0 | 3367.0 | 12403.0 | 1173847.0 | 1069809000.0 | 986.0 | 17.610001 | 22.15 | 790.0 | 14.11 | 13.33 | 4105.0 | 73.300003 | 137.800003 | 0.919922 | 0.910156 | 0.48999 | 0.459961 | 8.83 | 100.230003 |
| 9 | 304.0 | 3368.0 | 10976.0 | 1009392.0 | 989153800.0 | 899.0 | 16.35 | 20.68 | 745.0 | 13.55 | 12.39 | 4267.0 | 77.580002 | 161.550003 | 0.830078 | 0.819824 | 0.449951 | 0.439941 | 8.87 | 89.949997 |


In [25]:
dfs

[      id_institution_subnet  id_time  n_flows  n_packets      n_bytes  \
 0                     304.0   3359.0   1586.0    58242.0   62233144.0   
 1                     304.0   3360.0   1293.0    14805.0    4019821.0   
 2                     304.0   3361.0   1303.0    69750.0    7759548.0   
 3                     304.0   3362.0   1074.0    14454.0    6653263.0   
 4                     304.0   3363.0   1251.0    92595.0   93886932.0   
 ...                     ...      ...      ...        ...          ...   
 2010                  304.0   5369.0   1910.0   112057.0   75148456.0   
 2011                  304.0   5370.0   1995.0   120011.0  125218556.0   
 2012                  304.0   5371.0   1545.0    42502.0   29804196.0   
 2013                  304.0   5372.0   1677.0    49523.0   39567965.0   
 2014                  304.0   5373.0   2019.0    82989.0   73683874.0   
 
       sum_n_dest_asn  avg_n_dest_asn  std_n_dest_asn  sum_n_dest_ports  \
 0              482.0            7.

#### Test set

- Affected by `test_workers`.

In [26]:
df = time_based_dataset.get_test_df(as_single_dataframe=True, workers="config")
dfs = time_based_dataset.get_test_df(as_single_dataframe=False, workers="config")

df.head(10)

|  | id_institution_subnet | id_time | n_flows | n_packets | n_bytes | sum_n_dest_asn | avg_n_dest_asn | std_n_dest_asn | sum_n_dest_ports | avg_n_dest_ports | std_n_dest_ports | sum_n_dest_ip | avg_n_dest_ip | std_n_dest_ip | tcp_udp_ratio_packets | tcp_udp_ratio_bytes | dir_ratio_packets | dir_ratio_bytes | avg_duration | avg_ttl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 304.0 | 5374.0 | 1786.0 | 67275.0 | 36958370.0 | 402.0 | 7.44 | 5.03 | 1287.0 | 23.83 | 40.07 | 762.0 | 14.11 | 13.27 | 0.890137 | 0.890137 | 0.5 | 0.52002 | 4.11 | 113.239998 |
| 1 | 304.0 | 5375.0 | 1780.0 | 39695.0 | 26967150.0 | 458.0 | 8.33 | 6.26 | 1295.0 | 23.549999 | 34.279999 | 757.0 | 13.76 | 12.49 | 0.899902 | 0.910156 | 0.459961 | 0.469971 | 7.01 | 106.080002 |
| 2 | 304.0 | 5376.0 | 1640.0 | 40092.0 | 30603940.0 | 446.0 | 8.11 | 5.92 | 1205.0 | 21.91 | 30.82 | 821.0 | 14.93 | 13.52 | 0.850098 | 0.830078 | 0.509766 | 0.529785 | 5.24 | 109.860001 |
| 3 | 304.0 | 5377.0 | 1557.0 | 94957.0 | 98800530.0 | 419.0 | 8.73 | 5.61 | 1118.0 | 23.290001 | 30.969999 | 801.0 | 16.690001 | 11.99 | 0.939941 | 0.930176 | 0.449951 | 0.469971 | 7.28 | 104.620003 |
| 4 | 304.0 | 5378.0 | 2357.0 | 104226.0 | 91147890.0 | 541.0 | 10.21 | 7.77 | 1127.0 | 21.26 | 28.780001 | 1132.0 | 21.360001 | 24.809999 | 0.850098 | 0.850098 | 0.459961 | 0.449951 | 5.06 | 95.239998 |
| 5 | 304.0 | 5379.0 | 6972.0 | 524267.0 | 503674300.0 | 757.0 | 14.56 | 16.370001 | 1057.0 | 20.33 | 22.360001 | 2722.0 | 52.349998 | 97.029999 | 0.810059 | 0.799805 | 0.469971 | 0.439941 | 6.81 | 99.599998 |
| 6 | 304.0 | 5380.0 | 13159.0 | 1122454.0 | 1025243000.0 | 905.0 | 16.450001 | 19.73 | 1083.0 | 19.690001 | 24.870001 | 4329.0 | 78.709999 | 148.100006 | 0.879883 | 0.890137 | 0.459961 | 0.399902 | 5.61 | 101.660004 |
| 7 | 304.0 | 5381.0 | 13217.0 | 1465672.0 | 1356218000.0 | 1014.0 | 17.190001 | 20.83 | 1078.0 | 18.27 | 20.360001 | 4968.0 | 84.199997 | 168.139999 | 0.839844 | 0.839844 | 0.509766 | 0.459961 | 8.87 | 98.050003 |
| 8 | 304.0 | 5382.0 | 11309.0 | 1311751.0 | 1283210000.0 | 922.0 | 18.82 | 20.120001 | 1031.0 | 21.040001 | 23.41 | 4467.0 | 91.160004 | 157.229996 | 0.870117 | 0.890137 | 0.449951 | 0.399902 | 9.22 | 93.050003 |
| 9 | 304.0 | 5383.0 | 10600.0 | 821471.0 | 718233000.0 | 895.0 | 16.27 | 19.24 | 1091.0 | 19.84 | 23.0 | 4225.0 | 76.82 | 148.509995 | 0.850098 | 0.850098 | 0.459961 | 0.399902 | 8.06 | 103.980003 |


In [27]:
dfs

[      id_institution_subnet  id_time  n_flows  n_packets      n_bytes  \
 0                     304.0   5374.0   1786.0    67275.0   36958367.0   
 1                     304.0   5375.0   1780.0    39695.0   26967153.0   
 2                     304.0   5376.0   1640.0    40092.0   30603943.0   
 3                     304.0   5377.0   1557.0    94957.0   98800526.0   
 4                     304.0   5378.0   2357.0   104226.0   91147892.0   
 ...                     ...      ...      ...        ...          ...   
 1338                  304.0   6712.0   2094.0   168447.0  178268708.0   
 1339                  304.0   6713.0   2117.0   161590.0  132454629.0   
 1340                  304.0   6714.0   2089.0   286379.0  291500925.0   
 1341                  304.0   6715.0   2061.0    95528.0   80289393.0   
 1342                  304.0   6716.0   1677.0    44479.0   34841189.0   
 
       sum_n_dest_asn  avg_n_dest_asn  std_n_dest_asn  sum_n_dest_ports  \
 0              402.0            7.
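Both return forms contain the same rows. `as_single_dataframe=True` stacks every selected time series into one frame. `as_single_dataframe=False` returns a list with one frame per series; in the `dfs` output above, the first frame holds all 1343 test rows of series 304. The sketch below shows that relationship using plain Python tuples instead of pandas. The helper `split_by_series` is hypothetical and is not part of cesnet-tszoo. Series 1000 and its values are made up for illustration. In pandas, `df.groupby("id_institution_subnet")` would produce the same grouping.

```python
from collections import defaultdict

# Toy rows in long format: (series id, id_time, n_flows).
# The first two rows are copied from series 304 above; series 1000 is made up.
combined_rows = [
    (304, 5374, 1786.0),
    (304, 5375, 1780.0),
    (1000, 5374, 12.0),
    (1000, 5375, 7.0),
]


def split_by_series(rows):
    """Group long-format rows into one list per series id.

    Illustrates the as_single_dataframe=False layout; not the library's code.
    """
    per_series = defaultdict(list)
    for row in rows:
        per_series[row[0]].append(row)  # row[0] is the series id
    # dicts keep insertion order, so series appear in first-seen order
    return list(per_series.values())


tables = split_by_series(combined_rows)
print(len(tables))  # 2
print([t[0][0] for t in tables])  # [304, 1000]
```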

##### When `test_ts_ids` and `test_time_period` are set

- Affected by `test_workers`.

In [28]:
df = time_based_dataset.get_test_other_df(as_single_dataframe=True, workers="config")
dfs = time_based_dataset.get_test_other_df(as_single_dataframe=False, workers="config")

df.head(10)

Unnamed: 0,id_institution_subnet,id_time,n_flows,n_packets,n_bytes,sum_n_dest_asn,avg_n_dest_asn,std_n_dest_asn,sum_n_dest_ports,avg_n_dest_ports,std_n_dest_ports,sum_n_dest_ip,avg_n_dest_ip,std_n_dest_ip,tcp_udp_ratio_packets,tcp_udp_ratio_bytes,dir_ratio_packets,dir_ratio_bytes,avg_duration,avg_ttl
0,526.0,5374.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
1,526.0,5375.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
2,526.0,5376.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
3,526.0,5377.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
4,526.0,5378.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
5,526.0,5379.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
6,526.0,5380.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
7,526.0,5381.0,11.0,27.0,1797.0,6.0,3.0,0.0,7.0,3.5,0.71,11.0,5.5,0.71,0.280029,0.280029,0.23999,0.199951,41.91,210.520004
8,526.0,5382.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
9,526.0,5383.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.0,0.0
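Most of the rows for series 526 above have 0 in every count feature and 0.5 in every ratio feature. This matches exactly the vector listed under "Features with default values" in the dataset details. A row equal to the default vector could be a gap that was filled with defaults, or it could be a genuinely idle hour; the table alone does not tell you which. The minimal sketch below flags such rows. It assumes each row is a plain dict keyed by feature name, and the helper `matches_defaults` is hypothetical.

```python
# Default values copied from the dataset details printed earlier.
DEFAULTS = {
    "n_flows": 0, "n_packets": 0, "n_bytes": 0,
    "tcp_udp_ratio_packets": 0.5, "tcp_udp_ratio_bytes": 0.5,
    "dir_ratio_packets": 0.5, "dir_ratio_bytes": 0.5,
    "avg_duration": 0, "avg_ttl": 0,
    "sum_n_dest_asn": 0, "avg_n_dest_asn": 0, "std_n_dest_asn": 0,
    "sum_n_dest_ports": 0, "avg_n_dest_ports": 0, "std_n_dest_ports": 0,
    "sum_n_dest_ip": 0, "avg_n_dest_ip": 0, "std_n_dest_ip": 0,
}


def matches_defaults(row, defaults=DEFAULTS, tol=1e-9):
    """Return True if every feature in `row` equals its default value."""
    return all(abs(row[name] - value) <= tol for name, value in defaults.items())


# Row like id_time 5374 above: every feature at its default.
idle = dict(DEFAULTS)
# A few values from the id_time 5381 row; other fields left at defaults for brevity.
active = dict(DEFAULTS, n_flows=11.0, n_packets=27.0, n_bytes=1797.0,
              tcp_udp_ratio_packets=0.280029, avg_ttl=210.520004)
print(matches_defaults(idle), matches_defaults(active))  # True False
```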


In [29]:
dfs

[      id_institution_subnet  id_time  n_flows  n_packets  n_bytes  \
 0                     526.0   5374.0      0.0        0.0      0.0   
 1                     526.0   5375.0      0.0        0.0      0.0   
 2                     526.0   5376.0      0.0        0.0      0.0   
 3                     526.0   5377.0      0.0        0.0      0.0   
 4                     526.0   5378.0      0.0        0.0      0.0   
 ...                     ...      ...      ...        ...      ...   
 1338                  526.0   6712.0     23.0       45.0   3009.0   
 1339                  526.0   6713.0      0.0        0.0      0.0   
 1340                  526.0   6714.0      0.0        0.0      0.0   
 1341                  526.0   6715.0      0.0        0.0      0.0   
 1342                  526.0   6716.0     47.0       61.0   2929.0   
 
       sum_n_dest_asn  avg_n_dest_asn  std_n_dest_asn  sum_n_dest_ports  \
 0                0.0            0.00            0.00               0.0   
 1      

#### All set

- Affected by `all_workers`.

In [30]:
df = time_based_dataset.get_all_df(as_single_dataframe=True, workers="config")
dfs = time_based_dataset.get_all_df(as_single_dataframe=False, workers="config")

df.head(10)

Unnamed: 0,id_institution_subnet,id_time,n_flows,n_packets,n_bytes,sum_n_dest_asn,avg_n_dest_asn,std_n_dest_asn,sum_n_dest_ports,avg_n_dest_ports,std_n_dest_ports,sum_n_dest_ip,avg_n_dest_ip,std_n_dest_ip,tcp_udp_ratio_packets,tcp_udp_ratio_bytes,dir_ratio_packets,dir_ratio_bytes,avg_duration,avg_ttl
0,304.0,0.0,1909.0,46862.0,33268534.0,606.0,9.62,6.83,1140.0,18.1,20.75,885.0,14.05,14.44,0.839844,0.830078,0.439941,0.449951,10.0,123.309998
1,304.0,1.0,2040.0,198300.0,226934054.0,685.0,11.05,7.8,1145.0,18.469999,21.57,994.0,16.030001,16.35,0.879883,0.879883,0.439941,0.439941,10.21,120.699997
2,304.0,2.0,2489.0,47309.0,28702391.0,682.0,10.83,8.46,1023.0,16.24,16.620001,1124.0,17.84,22.32,0.870117,0.879883,0.439941,0.429932,7.8,113.139999
3,304.0,3.0,8042.0,575831.0,480684476.0,899.0,14.74,15.97,1221.0,20.02,18.959999,2688.0,44.07,93.019997,0.859863,0.850098,0.47998,0.439941,8.59,113.18
4,304.0,4.0,13003.0,968057.0,867045007.0,1096.0,19.23,22.610001,903.0,15.84,14.43,4167.0,73.110001,136.070007,0.850098,0.839844,0.439941,0.389893,9.68,108.93
5,304.0,5.0,11886.0,826750.0,683669778.0,1066.0,17.77,24.139999,841.0,14.02,12.89,4181.0,69.68,144.300003,0.810059,0.790039,0.469971,0.419922,9.78,105.949997
6,304.0,6.0,10368.0,799596.0,567231893.0,986.0,18.6,22.35,816.0,15.4,14.41,3740.0,70.57,139.520004,0.830078,0.819824,0.439941,0.389893,6.8,98.330002
7,304.0,7.0,10163.0,653004.0,613372257.0,1022.0,17.030001,22.049999,965.0,16.08,18.99,3646.0,60.77,124.57,0.890137,0.899902,0.459961,0.419922,8.81,110.660004
8,304.0,8.0,8657.0,686996.0,641084557.0,931.0,15.52,18.17,876.0,14.6,16.42,3246.0,54.099998,102.980003,0.819824,0.810059,0.47998,0.439941,7.75,105.849998
9,304.0,9.0,8793.0,454893.0,353144692.0,1041.0,17.950001,21.0,1193.0,20.57,25.309999,3343.0,57.639999,110.089996,0.850098,0.850098,0.469971,0.469971,10.65,106.480003


In [31]:
dfs

[      id_institution_subnet  id_time  n_flows  n_packets      n_bytes  \
 0                     304.0      0.0   1909.0    46862.0   33268534.0   
 1                     304.0      1.0   2040.0   198300.0  226934054.0   
 2                     304.0      2.0   2489.0    47309.0   28702391.0   
 3                     304.0      3.0   8042.0   575831.0  480684476.0   
 4                     304.0      4.0  13003.0   968057.0  867045007.0   
 ...                     ...      ...      ...        ...          ...   
 6712                  304.0   6712.0   2094.0   168447.0  178268708.0   
 6713                  304.0   6713.0   2117.0   161590.0  132454629.0   
 6714                  304.0   6714.0   2089.0   286379.0  291500925.0   
 6715                  304.0   6715.0   2061.0    95528.0   80289393.0   
 6716                  304.0   6716.0   1677.0    44479.0   34841189.0   
 
       sum_n_dest_asn  avg_n_dest_asn  std_n_dest_asn  sum_n_dest_ports  \
 0              606.0            9.

### Loading data as a single NumPy array

- Batch size has no effect.
- Sliding window has no effect.
- Returns every time series in `ts_ids` (or in `test_ts_ids` for the test_other set) over the set's configured time period.
- Data is returned as a single NumPy array.
- The array shape follows similar rules to DataLoader batches (excluding sliding window parameters).
- Workers determine how many processes are used to load data for a specific set.
    - Setting workers to 0 means loading runs in the main process.
    - The configured workers can be overridden in `get_*_numpy` with the `workers` parameter.
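You can predict the last dimension of the arrays in this section from the column layout of the DataFrames shown earlier. That layout is one series-id column, then one `id_time` column when `time_format=TimeFormat.ID_TIME`, then the selected features. With `TimeFormat.DATETIME`, the time column is not part of the array because the times are returned separately. This is consistent with the 20 and 19 columns shown in this notebook. The minimal sketch below assumes that layout. `expected_numpy_shape` is a hypothetical helper, not a cesnet-tszoo function.

```python
def expected_numpy_shape(n_series, time_period, n_features, time_column_in_array):
    """Predict (time series, time steps, columns) for a get_*_numpy result.

    Assumed column layout: series id, optional id_time column, then features.
    """
    n_columns = 1 + (1 if time_column_in_array else 0) + n_features
    return (n_series, len(time_period), n_columns)


# features_to_take="all" selects 18 features in this notebook.
print(expected_numpy_shape(54, range(0, 3359), 18, time_column_in_array=True))     # (54, 3359, 20)
print(expected_numpy_shape(54, range(0, 3359), 18, time_column_in_array=False))    # (54, 3359, 19)
print(expected_numpy_shape(22, range(5374, 6717), 18, time_column_in_array=True))  # (22, 1343, 20)
```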

In [32]:
config = TimeBasedConfig(ts_ids=54, train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.ID_TIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:54,345][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:54,363][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:54,367][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 730.12it/s]
[2025-08-05 19:46:54,446][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 2585.09it/s]
[2025-08-05 19:46:54,457][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [335 474 538  37 543 ... 270  70 385 531 189], Length=54
        Test time series IDS: [253 393  25 399 147 ... 412 513 504 164 541], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID incl
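The dataset has 6717 hourly time indices, and the config uses the fractions 0.5 / 0.3 / 0.2. With these inputs, the printed periods are `range(0, 3359)`, `range(3359, 5374)` and `range(5374, 6717)`. One rule that reproduces these numbers is to floor the val and test lengths and assign the remainder to train. This rule is an assumption that matches this output; it is not necessarily how cesnet-tszoo computes the split. The helper `split_time_periods` is hypothetical.

```python
import math


def split_time_periods(n_times, train_frac, val_frac, test_frac):
    """Split range(n_times) into contiguous train/val/test ranges.

    Assumed rule (reproduces the output above, not confirmed library logic):
    val and test lengths are floored and train takes the remainder.
    """
    if not math.isclose(train_frac + val_frac + test_frac, 1.0):
        raise ValueError("this sketch expects the fractions to sum to 1")
    val_len = math.floor(n_times * val_frac)
    test_len = math.floor(n_times * test_frac)
    train_len = n_times - val_len - test_len
    train = range(0, train_len)
    val = range(train_len, train_len + val_len)
    test = range(train_len + val_len, n_times)
    return train, val, test


train, val, test = split_time_periods(6717, 0.5, 0.3, 0.2)
print(train, val, test)  # range(0, 3359) range(3359, 5374) range(5374, 6717)
```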

#### Train set

- Affected by `train_workers`.

In [33]:
numpy_array = time_based_dataset.get_train_numpy(workers="config")

display(numpy_array.shape)

(54, 3359, 20)

#### Val set

- Affected by `val_workers`.

In [34]:
numpy_array = time_based_dataset.get_val_numpy(workers="config")

display(numpy_array.shape)

(54, 2015, 20)

#### Test set

- Affected by `test_workers`.

In [35]:
numpy_array = time_based_dataset.get_test_numpy(workers="config")

display(numpy_array.shape)

(54, 1343, 20)

##### When `test_ts_ids` and `test_time_period` are set

- Affected by `test_workers`.

In [36]:
numpy_array = time_based_dataset.get_test_other_numpy(workers="config")

display(numpy_array.shape)

(22, 1343, 20)

#### All set

- Affected by `all_workers`.

In [37]:
numpy_array = time_based_dataset.get_all_numpy(workers="config")

display(numpy_array.shape)

(54, 6717, 20)

#### Using time_format=TimeFormat.DATETIME

In [38]:
config = TimeBasedConfig(ts_ids=54, train_time_period=0.5, val_time_period=0.3, test_time_period=0.2, test_ts_ids=22, features_to_take="all", time_format=TimeFormat.DATETIME,
                         train_workers=0, val_workers=0, test_workers=0, all_workers=0, init_workers=0)
time_based_dataset.set_dataset_config_and_initialize(config, display_config_details=True, workers=0)

[2025-08-05 19:46:54,826][config][INFO] - Quick validation succeeded.
[2025-08-05 19:46:54,848][config][INFO] - Finalization and validation completed successfully.
[2025-08-05 19:46:54,853][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 54/54 [00:00<00:00, 1253.89it/s]
[2025-08-05 19:46:54,902][cesnet_dataset][INFO] - Updating config on test_other and selected time series.
100%|██████████| 22/22 [00:00<00:00, 2586.83it/s]
[2025-08-05 19:46:54,912][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.INSTITUTION_SUBNETS

    Time series
        Time series IDS: [446  70  20 263 543 ... 526 319 487 535 212], Length=54
        Test time series IDS: [427 219  82 399 104 ... 338 175 465 195 157], Length=22
    Time periods
        Train time periods: range(0, 3359)
        Val time periods: range(3359, 5374)
        Test time periods: range(5374, 6717)
        All time periods: range(0, 6717)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes', 'sum_n_dest_asn', 'avg_n_dest_asn', 'std_n_dest_asn', 'sum_n_dest_ports', 'avg_n_dest_ports', 'std_n_dest_ports', 'sum_n_dest_ip', 'avg_n_dest_ip', 'std_n_dest_ip', 'tcp_udp_ratio_packets', 'tcp_udp_ratio_bytes', 'dir_ratio_packets', 'dir_ratio_bytes', 'avg_duration', 'avg_ttl']
        Default values: [0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.5 0.5 0.5 0.5 0.  0. ]
        Time series ID incl

In [39]:
numpy_array, times = time_based_dataset.get_train_numpy(workers="config")

display(numpy_array.shape)
display(times)

(54, 3359, 19)

array([datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc),
       datetime.datetime(2023, 10, 9, 1, 0, tzinfo=datetime.timezone.utc),
       datetime.datetime(2023, 10, 9, 2, 0, tzinfo=datetime.timezone.utc),
       ...,
       datetime.datetime(2024, 2, 25, 20, 0, tzinfo=datetime.timezone.utc),
       datetime.datetime(2024, 2, 25, 21, 0, tzinfo=datetime.timezone.utc),
       datetime.datetime(2024, 2, 25, 22, 0, tzinfo=datetime.timezone.utc)],
      shape=(3359,), dtype=object)
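The returned times advance in one-hour steps from 2023-10-09 00:00 UTC. For `AGG_1_HOUR`, you can therefore convert an `id_time` value to its timestamp by adding that many hours to the first timestamp. The minimal sketch below assumes contiguous hourly sampling, which is consistent with the first and last train timestamps above. `id_time_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# First timestamp of the dataset, taken from the output above.
START = datetime(2023, 10, 9, 0, 0, tzinfo=timezone.utc)


def id_time_to_datetime(id_time, start=START, step=timedelta(hours=1)):
    """Map a time index to a UTC datetime, assuming a fixed sampling step.

    int() also accepts float ids such as 5374.0 from the DataFrames above.
    """
    return start + int(id_time) * step


print(id_time_to_datetime(0))     # 2023-10-09 00:00:00+00:00
print(id_time_to_datetime(3358))  # 2024-02-25 22:00:00+00:00 (last train step)
```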