# Benchmarks

This notebook uses only `TimeBasedCesnetDataset`, but all methods work almost the same way for `SeriesBasedCesnetDataset`.

### Import

In [1]:
import numpy as np
import logging
import os

from cesnet_tszoo.benchmarks import load_benchmark

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, AnnotationType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import TimeBasedConfig # Time based dataset MUST use TimeBasedConfig

from cesnet_tszoo.utils.scaler import Scaler # For creating custom Scaler
from cesnet_tszoo.utils.filler import Filler # For creating custom filler

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Benchmark structure

- A benchmark can consist of the following parts:
    - identifier of the used config
    - identifier of the used annotations (one for each `AnnotationType`)
    - identifier of `related_results` (only available for built-in benchmarks)
    - the used `SourceType` and `AgreggationType`
    - database name (here it would be CESNET_TimeSeries24)
    - whether the config or annotations are built-in

### Exporting benchmarks

- Use the `save_benchmark` method to save a benchmark.
- Saving a benchmark creates a YAML file, which holds the metadata, at: `os.path.join(time_based_dataset.benchmarks_root, identifier)`.
- Saving a benchmark automatically creates files for the config and annotations, with identifiers matching the benchmark identifier.
    - The config will be saved at: `os.path.join(time_based_dataset.configs_root, identifier)`
    - Annotations will be saved at: `os.path.join(time_based_dataset.annotations_root, identifier, str(AnnotationType))`
    - When the parameter `force_write` is True, existing files with the same name will be overwritten.
- When an imported config or imported annotations are used, only their identifiers are passed to the benchmark and no new files are created.
    - After calling anything that changes the annotations, they are no longer treated as imported.
- Only annotations with at least one value will be exported.
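
The file locations described above can be sketched as follows. The layout (`tszoo/benchmarks/<identifier>.yaml`, `tszoo/configs/<identifier>.pickle` and `.txt`, `tszoo/annotations/<identifier>_both.csv`) is inferred from the log output in this notebook. The helper `expected_benchmark_files` is hypothetical and not part of cesnet_tszoo, and only the `both` annotation suffix is assumed because it is the only one that appears in the logs.

```python
import os


def expected_benchmark_files(data_root, identifier, annotation_suffixes=("both",)):
    """Hypothetical helper: list the files a saved benchmark is expected to produce.

    The layout mirrors the paths printed in this notebook's log output;
    it is an assumption, not an API of cesnet_tszoo.
    """
    tszoo_root = os.path.join(data_root, "tszoo")
    files = {
        "benchmark": os.path.join(tszoo_root, "benchmarks", f"{identifier}.yaml"),
        "config": os.path.join(tszoo_root, "configs", f"{identifier}.pickle"),
        "config_details": os.path.join(tszoo_root, "configs", f"{identifier}.txt"),
    }
    # One CSV file per annotation type that holds at least one value.
    for suffix in annotation_suffixes:
        files[f"annotations_{suffix}"] = os.path.join(
            tszoo_root, "annotations", f"{identifier}_{suffix}.csv"
        )
    return files


paths = expected_benchmark_files("/some_directory/", "test2")
for name, path in paths.items():
    print(f"{name}: {path}")
```

A listing like this can help when checking which files need to be shared along with a custom benchmark.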

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_FULL, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)
config = TimeBasedConfig([1548925, 443967], train_time_period=1.0, features_to_take=["n_flows", "n_packets", "n_bytes"], scale_with=None)

time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)

[2025-04-09 11:41:17,580][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig
[2025-04-09 11:41:17,581][config][INFO] - Quick validation succeeded.
[2025-04-09 11:41:17,592][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:17,597][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.



Dataset details:

    AgreggationType.AGG_1_DAY
        Time indices: range(0, 279)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 0, 0, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_FULL
        Time series indices: [ 3  5 10 11 12 ... 2051841 2051849 2051850 2051853 2055783], Length=275124; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


100%|██████████| 2/2 [00:00<00:00, 1998.24it/s]
[2025-04-09 11:41:17,601][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: None
    Scalers
        Scaler type: None
    Batch sizes
        Train batch size: 32
        Val batch size: 64
        Test batch size: 128
        All batch size: 128
    De

In [4]:
time_based_dataset.save_benchmark(identifier="test1", force_write=True)

[2025-04-09 11:41:17,606][cesnet_dataset][INFO] - Config pickle saved to \some_directory\tszoo\configs\test1.pickle
[2025-04-09 11:41:17,607][cesnet_dataset][INFO] - Config details saved to \some_directory\tszoo\configs\test1.txt
[2025-04-09 11:41:17,607][cesnet_dataset][INFO] - Config successfully saved
[2025-04-09 11:41:17,608][cesnet_dataset][INFO] - Benchmark successfully saved to \some_directory\tszoo\benchmarks\test1.yaml


Here you can see the structure of the created YAML file.

In [5]:
with open(os.path.join(time_based_dataset.benchmarks_root, "test1.yaml")) as file:
    display(file.readlines())

['aggregation: 1_day\n',
 'annotations_both_identifier: null\n',
 'annotations_time_identifier: null\n',
 'annotations_ts_identifier: null\n',
 'config_identifier: test1\n',
 'database_name: CESNET-TimeSeries24\n',
 'description: null\n',
 'is_series_based: false\n',
 'related_results_identifier: null\n',
 'source_type: ip_addresses_full\n']
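
For programmatic access, the metadata lines can be turned into a dictionary. In practice `yaml.safe_load` from PyYAML would be the usual choice; the dependency-free sketch below assumes the file stays a flat `key: value` mapping, as in the output above, and `parse_flat_yaml` is a hypothetical helper rather than a general YAML parser.

```python
def parse_flat_yaml(lines):
    """Parse flat 'key: value' lines into a dict (not a general YAML parser)."""
    literals = {"null": None, "true": True, "false": False}
    parsed = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        value = value.strip()
        # Map YAML literals to Python values; keep everything else as a string.
        parsed[key.strip()] = literals.get(value, value)
    return parsed


lines = [
    'aggregation: 1_day\n',
    'annotations_both_identifier: null\n',
    'annotations_time_identifier: null\n',
    'annotations_ts_identifier: null\n',
    'config_identifier: test1\n',
    'database_name: CESNET-TimeSeries24\n',
    'description: null\n',
    'is_series_based: false\n',
    'related_results_identifier: null\n',
    'source_type: ip_addresses_full\n',
]

metadata = parse_flat_yaml(lines)
print(metadata["config_identifier"], metadata["is_series_based"])
```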

In [6]:
time_based_dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_3_5", annotation_group="test3_2", ts_id=3, id_time=5, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_0", annotation_group="test3", ts_id=5, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_1", annotation_group="test3_2", ts_id=5, id_time=1, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3,test3_2
0,3,0,test_annotation3_3_0,
1,5,0,test_annotation3_5_0,
2,3,5,,test_annotation3_3_5
3,5,1,,test_annotation3_5_1
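
The table above has one row per `(id_ip, id_time)` pair and one column per annotation group, which is why some cells are empty. A minimal sketch of that grouping, using plain dictionaries, is shown below. `build_annotation_table` is a hypothetical helper, not the library's implementation, and its row order may differ from the output above.

```python
def build_annotation_table(records):
    """Group (annotation, group, ts_id, id_time) records into one row per (ts_id, id_time)."""
    groups = []  # annotation groups in order of first appearance
    rows = {}    # (ts_id, id_time) -> {group: annotation}
    for annotation, group, ts_id, id_time in records:
        if group not in groups:
            groups.append(group)
        rows.setdefault((ts_id, id_time), {})[group] = annotation

    table = []
    for (ts_id, id_time), values in rows.items():
        row = {"id_ip": ts_id, "id_time": id_time}
        for group in groups:
            row[group] = values.get(group)  # None where the group has no annotation
        table.append(row)
    return table


records = [
    ("test_annotation3_3_0", "test3", 3, 0),
    ("test_annotation3_3_5", "test3_2", 3, 5),
    ("test_annotation3_5_0", "test3", 5, 0),
    ("test_annotation3_5_1", "test3_2", 5, 1),
]

table = build_annotation_table(records)
for row in table:
    print(row)
```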


In [7]:
time_based_dataset.save_benchmark(identifier="test2", force_write=True)

[2025-04-09 11:41:17,633][cesnet_dataset][INFO] - Using already existing config with identifier: test1
[2025-04-09 11:41:17,638][cesnet_dataset][INFO] - Annotations successfully saved to \some_directory\tszoo\annotations\test2_both.csv
[2025-04-09 11:41:17,639][cesnet_dataset][INFO] - Benchmark successfully saved to \some_directory\tszoo\benchmarks\test2.yaml


Here you can see the structure of the created YAML file, with annotations added.

In [8]:
with open(os.path.join(time_based_dataset.benchmarks_root, "test2.yaml")) as file:
    display(file.readlines())

['aggregation: 1_day\n',
 'annotations_both_identifier: test2_both\n',
 'annotations_time_identifier: null\n',
 'annotations_ts_identifier: null\n',
 'config_identifier: test1\n',
 'database_name: CESNET-TimeSeries24\n',
 'description: null\n',
 'is_series_based: false\n',
 'related_results_identifier: null\n',
 'source_type: ip_addresses_full\n']

#### Using custom scaler

- When using a custom scaler, you must share its source code together with the benchmark (in particular the created config file).
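
One likely reason the source code has to travel with the benchmark: assuming the config is serialized with Python's standard `pickle` module (as the `.pickle` file extension in the logs suggests), a class is stored only as a reference to its module and name, not as code. The sketch below illustrates this with a throwaway module; `shared_scalers_demo` and the empty `CustomScaler` are hypothetical stand-ins, not cesnet_tszoo objects.

```python
import pickle
import sys
import types

# Build a throwaway module that holds a stand-in scaler class (hypothetical names).
module = types.ModuleType("shared_scalers_demo")
exec("class CustomScaler:\n    pass\n", module.__dict__)
sys.modules["shared_scalers_demo"] = module

payload = pickle.dumps(module.CustomScaler())

# The payload references the class by module and name; it holds no source code.
print(b"shared_scalers_demo" in payload, b"CustomScaler" in payload)

# With the defining module available, loading works.
restored = pickle.loads(payload)
print(type(restored).__name__)

# Simulate a recipient who received the pickle but not the source code.
del sys.modules["shared_scalers_demo"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as error:  # ModuleNotFoundError is a subclass of ImportError
    load_error = error
print(type(load_error).__name__)
```

Under this assumption, whoever loads the benchmark needs the class to be importable under the same module and name it had when the config was saved.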

In [9]:
class CustomScaler(Scaler):
    """Min-max scaler that can be fitted incrementally, batch by batch."""

    def __init__(self):
        super().__init__()

        self.max = None
        self.min = None

    def transform(self, data):
        # Min-max scaling with the fitted per-feature minimum and maximum.
        return (data - self.min) / (self.max - self.min)

    def fit(self, data):
        self.partial_fit(data)

    def partial_fit(self, data):
        # First batch: initialize the per-feature statistics.
        if self.max is None and self.min is None:
            self.max = np.max(data, axis=0)
            self.min = np.min(data, axis=0)
            return

        # Later batches: merge the batch statistics with the running ones.
        temp_max = np.max(data, axis=0)
        temp = np.vstack((self.max, temp_max))
        self.max = np.max(temp, axis=0)

        temp_min = np.min(data, axis=0)
        temp = np.vstack((self.min, temp_min))
        self.min = np.min(temp, axis=0)
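
The incremental logic in `partial_fit` keeps a running per-feature minimum and maximum. The dependency-free sketch below uses plain lists instead of NumPy arrays to check that fitting batch by batch gives the same statistics as fitting on all rows at once. `RunningMinMax` is a hypothetical stand-in for illustration, not the `CustomScaler` above.

```python
class RunningMinMax:
    """Per-feature running minimum and maximum over batches of rows."""

    def __init__(self):
        self.min = None
        self.max = None

    def partial_fit(self, rows):
        for row in rows:
            if self.min is None:
                # First row seen: initialize both statistics from it.
                self.min = list(row)
                self.max = list(row)
                continue
            self.min = [min(a, b) for a, b in zip(self.min, row)]
            self.max = [max(a, b) for a, b in zip(self.max, row)]

    def transform(self, rows):
        # Min-max scaling; a constant feature (max == min) would divide by zero.
        return [
            [(v - lo) / (hi - lo) for v, lo, hi in zip(row, self.min, self.max)]
            for row in rows
        ]


data = [[1.0, 10.0], [4.0, 30.0], [2.0, 20.0], [3.0, 50.0]]

# Fit in two batches.
batched = RunningMinMax()
batched.partial_fit(data[:2])
batched.partial_fit(data[2:])

# Fit on all rows at once.
full = RunningMinMax()
full.partial_fit(data)

print(batched.min == full.min, batched.max == full.max)
print(batched.transform([[2.5, 30.0]]))
```

Note that a feature whose minimum equals its maximum makes the denominator zero, so a scaler used on real data may need to guard against that case.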

In [10]:
config = TimeBasedConfig([1548925, 443967], train_time_period=1.0, features_to_take=["n_flows", "n_packets", "n_bytes"], scale_with=CustomScaler)

time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)

[2025-04-09 11:41:17,662][config][INFO] - Quick validation succeeded.
[2025-04-09 11:41:17,679][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:17,683][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 2/2 [00:00<00:00, 1001.27it/s]
[2025-04-09 11:41:17,688][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: None
    Scalers
        Scaler type: CustomScaler (Custom)
        Is scaler per Time series: True
        Are scalers premade: False
        Are premade scalers partial_fitted:

In [11]:
time_based_dataset.save_benchmark(identifier="test3", force_write=True)

[2025-04-09 11:41:17,695][cesnet_dataset][INFO] - Config pickle saved to \some_directory\tszoo\configs\test3.pickle
[2025-04-09 11:41:17,696][cesnet_dataset][INFO] - Config details saved to \some_directory\tszoo\configs\test3.txt
[2025-04-09 11:41:17,696][cesnet_dataset][INFO] - Config successfully saved
[2025-04-09 11:41:17,696][cesnet_dataset][INFO] - Using already existing annotations with identifier: test2_both; type: AnnotationType.BOTH
[2025-04-09 11:41:17,697][cesnet_dataset][INFO] - Benchmark successfully saved to \some_directory\tszoo\benchmarks\test3.yaml


#### Using custom filler

- When using a custom filler, you must share its source code together with the benchmark (in particular the created config file).

In [12]:
class CustomFiller(Filler):
    def fill(self, batch_values: np.ndarray, existing_indices: np.ndarray, missing_indices: np.ndarray, **kwargs):
        batch_values[missing_indices] = -1

In [13]:
config = TimeBasedConfig([1548925, 443967], train_time_period=1.0, features_to_take=["n_flows", "n_packets", "n_bytes"], fill_missing_with=CustomFiller)

time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)

[2025-04-09 11:41:17,708][config][INFO] - Quick validation succeeded.
[2025-04-09 11:41:17,722][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:17,726][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 2/2 [00:00<00:00, 1998.72it/s]
[2025-04-09 11:41:17,730][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: CustomFiller (Custom)
    Scalers
        Scaler type: None
    Batch sizes
        Train batch size: 32
        Val batch size: 64
        Test batch size: 128
        All batch

In [14]:
time_based_dataset.save_benchmark(identifier="test4", force_write=True)

[2025-04-09 11:41:17,737][cesnet_dataset][INFO] - Config pickle saved to \some_directory\tszoo\configs\test4.pickle
[2025-04-09 11:41:17,738][cesnet_dataset][INFO] - Config details saved to \some_directory\tszoo\configs\test4.txt
[2025-04-09 11:41:17,738][cesnet_dataset][INFO] - Config successfully saved
[2025-04-09 11:41:17,739][cesnet_dataset][INFO] - Using already existing annotations with identifier: test2_both; type: AnnotationType.BOTH
[2025-04-09 11:41:17,740][cesnet_dataset][INFO] - Benchmark successfully saved to \some_directory\tszoo\benchmarks\test4.yaml


### Importing benchmarks

- You can import your own or a built-in benchmark with the `load_benchmark` function.
- It first attempts to load a built-in benchmark; if no built-in benchmark with the given identifier exists, it attempts to load a custom benchmark from the `"data_root"/tszoo/benchmarks/` directory.
- When importing a benchmark whose annotations exist but are not downloaded, they will be downloaded (this only works for built-in annotations).
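
The lookup order described above can be sketched as a small resolver. This illustrates the order only and is not the library's code: `resolve_benchmark`, the set of built-in identifiers, the `.yaml` file name, and the `FileNotFoundError` for a missing benchmark are all assumptions made for the example.

```python
import os
import tempfile


def resolve_benchmark(identifier, builtin_identifiers, data_root):
    """Hypothetical resolver: try built-in benchmarks first, then custom ones."""
    if identifier in builtin_identifiers:
        return ("built-in", None)
    custom_path = os.path.join(data_root, "tszoo", "benchmarks", f"{identifier}.yaml")
    if os.path.isfile(custom_path):
        return ("custom", custom_path)
    raise FileNotFoundError(f"No built-in or custom benchmark named {identifier!r}")


with tempfile.TemporaryDirectory() as data_root:
    # Create a fake custom benchmark file in a temporary data root.
    benchmarks_dir = os.path.join(data_root, "tszoo", "benchmarks")
    os.makedirs(benchmarks_dir)
    with open(os.path.join(benchmarks_dir, "test2.yaml"), "w") as file:
        file.write("config_identifier: test1\n")

    builtin = {"2e92831cb502"}
    builtin_result = resolve_benchmark("2e92831cb502", builtin, data_root)
    custom_result = resolve_benchmark("test2", builtin, data_root)

print(builtin_result[0], custom_result[0])
```

One consequence of this order is that a custom benchmark sharing its identifier with a built-in one would never be reached, so custom identifiers are best kept distinct.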

#### Importing own benchmark

- Looks for benchmark at: `os.path.join("/some_directory/", "tszoo", "benchmarks", identifier)`.

In [15]:
benchmark = load_benchmark(identifier="test2", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset(display_config_details=True, check_errors=False, workers="config")

[2025-04-09 11:41:17,746][benchmark][INFO] - Custom benchmark found: test2. Loading it.
[2025-04-09 11:41:17,747][benchmark][INFO] - Loaded benchmark 'test2' with description: 'None'.
[2025-04-09 11:41:17,756][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig
[2025-04-09 11:41:17,758][benchmark][INFO] - Custom config found: test2. Loading it.
[2025-04-09 11:41:17,758][benchmark][INFO] - No AnnotationType.TS_ID annotations found.
[2025-04-09 11:41:17,759][benchmark][INFO] - No AnnotationType.ID_TIME annotations found.
[2025-04-09 11:41:17,762][cesnet_dataset][INFO] - Custom annotations found: test2_both.
[2025-04-09 11:41:17,764][cesnet_dataset][INFO] - Annotations detected as AnnotationType.BOTH (both id_ip and id_time)
[2025-04-09 11:41:17,765][cesnet_dataset][INFO] - Successfully imported annotations from \some_directory\tszoo\annotations\test2_both.csv
[2025-04-09 11:41:17,765][benchmark][INFO] - As benchmark 'test2' is custom, related results 


Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: None
    Scalers
        Scaler type: None
    Batch sizes
        Train batch size: 32
        Val batch size: 64
        Test batch size: 128
        All batch size: 128
    De

#### Importing built-in benchmark

- Looks for a built-in benchmark.
- You can get the related results with the `get_related_results` method.
- The `get_related_results` method returns a pandas DataFrame.
- Related results are the scores achieved by other people's models.

In [16]:
benchmark = load_benchmark(identifier="2e92831cb502", data_root="/some_directory/")
dataset = benchmark.get_initialized_dataset(display_config_details=True, check_errors=False, workers="config")

[2025-04-09 11:41:21,817][benchmark][INFO] - Built-in benchmark found: 2e92831cb502. Loading it.
[2025-04-09 11:41:21,820][wrapper_dataset][INFO] - Downloading CESNET-TimeSeries24-ip_addresses_sample-hour dataset.


File size: 0.11GB
Remaining: 0.11GB


100%|██████████| 113M/113M [00:03<00:00, 30.9MB/s] 
[2025-04-09 11:41:26,097][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig
[2025-04-09 11:41:26,099][benchmark][INFO] - No AnnotationType.TS_ID annotations found.
[2025-04-09 11:41:26,099][benchmark][INFO] - No AnnotationType.ID_TIME annotations found.
[2025-04-09 11:41:26,099][benchmark][INFO] - No AnnotationType.BOTH annotations found.
[2025-04-09 11:41:26,101][benchmark][INFO] - Related results found and loaded.
[2025-04-09 11:41:26,101][benchmark][INFO] - Built-in benchmark '2e92831cb502' successfully prepared and ready for use.
[2025-04-09 11:41:26,123][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:26,124][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 1000/1000 [00:11<00:00, 84.93it/s]
[2025-04-09 11:41:37,900][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_HOUR
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Time series IDS: [ 268100  363446 1793924 1625190  362327 ...  134220  294201 1800011  758759  377112], Length=1000
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 2351)
        Val time periods: range(2327, 2686)
        Test time periods: range(2662, 6716)
        All time periods: range(0, 6716)
    Features
        Taken features: ['n_bytes']
        Default values: [0.]
        Time series ID included: False
        Time included: False
    Sliding window
        Sliding window size: 24
        Sliding window prediction size: 1
        Sliding window step size: 1
        Set shared size: 24
    Fillers
        Filler type: None
    Scalers
        Scaler type: min_max_scaler
        Is scaler per Time series: True
        Are scalers premade: False
        Are premade scal

In [17]:
benchmark.get_related_results()

Unnamed: 0,DOI,Model,Avg. RMSE,Std. RMSE,Avg. R2-score,Std. R2-score
0,https://arxiv.org/abs/2503.17410,GRU,0.149,0.82,-0.46,1.9
1,https://arxiv.org/abs/2503.17410,GRU_FCN,0.15,0.82,-0.12,1.1
2,https://arxiv.org/abs/2503.17410,INCEPTIONTIME,0.165,0.82,-2.7,3.9
3,https://arxiv.org/abs/2503.17410,LSTM,0.15,0.82,-0.41,1.8
4,https://arxiv.org/abs/2503.17410,LSTM_FCN,0.151,0.82,-0.44,1.9
5,https://arxiv.org/abs/2503.17410,MEAN,1.01,2.86,0.0,0.1
6,https://arxiv.org/abs/2503.17410,RCLSTM,0.221,1.08,-0.09,1.0
7,https://arxiv.org/abs/2503.17410,RESNET,0.152,0.82,-0.81,2.4
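
A typical next step is to rank the reference models by one of the metrics. On the returned DataFrame, something like `sort_values("Avg. RMSE")` should do this, assuming the column names are as printed above. The dependency-free sketch below does the same with the `csv` module on the rows shown in the output; the DOI column is omitted for brevity.

```python
import csv
import io

# Rows copied from the related results printed above (DOI column omitted).
related_results_csv = """Model,Avg. RMSE,Std. RMSE,Avg. R2-score,Std. R2-score
GRU,0.149,0.82,-0.46,1.9
GRU_FCN,0.15,0.82,-0.12,1.1
INCEPTIONTIME,0.165,0.82,-2.7,3.9
LSTM,0.15,0.82,-0.41,1.8
LSTM_FCN,0.151,0.82,-0.44,1.9
MEAN,1.01,2.86,0.0,0.1
RCLSTM,0.221,1.08,-0.09,1.0
RESNET,0.152,0.82,-0.81,2.4
"""

rows = list(csv.DictReader(io.StringIO(related_results_csv)))

# Lower RMSE is better, so sort ascending.
ranked = sorted(rows, key=lambda row: float(row["Avg. RMSE"]))
for row in ranked:
    print(f'{row["Model"]:<14} {row["Avg. RMSE"]}')

best = ranked[0]["Model"]
```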


### Other

Instead of exporting or importing a whole benchmark, you can export or import just a config or just annotations.

#### Config

In [18]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_FULL, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)
config = TimeBasedConfig([1548925, 443967], train_time_period=1.0, features_to_take=["n_flows", "n_packets", "n_bytes"], scale_with=None)

time_based_dataset.set_dataset_config_and_initialize(config, workers=0, display_config_details=True)

[2025-04-09 11:41:37,922][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig
[2025-04-09 11:41:37,923][config][INFO] - Quick validation succeeded.
[2025-04-09 11:41:37,935][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:37,940][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.



Dataset details:

    AgreggationType.AGG_1_DAY
        Time indices: range(0, 279)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 0, 0, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_FULL
        Time series indices: [ 3  5 10 11 12 ... 2051841 2051849 2051850 2051853 2055783], Length=275124; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


100%|██████████| 2/2 [00:00<00:00, 1984.06it/s]
[2025-04-09 11:41:37,944][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: None
    Scalers
        Scaler type: None
    Batch sizes
        Train batch size: 32
        Val batch size: 64
        Test batch size: 128
        All batch size: 128
    De

##### Exporting config

- When the parameter `force_write` is True, existing files with the same name will be overwritten.
- The config will be saved as a pickle file at: `os.path.join(time_based_dataset.configs_root, identifier)`.
- When the parameter `create_with_details_file` is True, a text file with the config details will be exported alongside the pickled config.

In [19]:
time_based_dataset.save_config(identifier="test_config1", create_with_details_file=True, force_write=True)

[2025-04-09 11:41:37,950][cesnet_dataset][INFO] - Config pickle saved to \some_directory\tszoo\configs\test_config1.pickle
[2025-04-09 11:41:37,951][cesnet_dataset][INFO] - Config details saved to \some_directory\tszoo\configs\test_config1.txt
[2025-04-09 11:41:37,952][cesnet_dataset][INFO] - Config successfully saved


##### Importing config

- It first attempts to load a built-in config; if no built-in config with the given identifier exists, it attempts to load a custom config from the `"data_root"/tszoo/configs/` directory.

In [20]:
time_based_dataset.import_config(identifier="test_config1", display_config_details=True, workers="config")

[2025-04-09 11:41:37,960][cesnet_dataset][INFO] - Custom config found: test_config1. Loading it.
[2025-04-09 11:41:37,960][cesnet_dataset][INFO] - Initializing dataset configuration with the imported config.
[2025-04-09 11:41:37,972][config][INFO] - Finalization and validation completed successfully.
[2025-04-09 11:41:37,973][cesnet_dataset][INFO] - Updating config on train/val/test/all and selected time series.
100%|██████████| 2/2 [00:04<00:00,  2.03s/it]
[2025-04-09 11:41:42,030][cesnet_dataset][INFO] - Config initialized successfully.
[2025-04-09 11:41:42,031][cesnet_dataset][INFO] - Successfully imported config from \some_directory\tszoo\configs\test_config1.pickle



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_1_DAY
    Source: SourceType.IP_ADDRESSES_FULL

    Time series
        Time series IDS: [1548925  443967], Length=2
        Test time series IDS: None
    Time periods
        Train time periods: range(0, 280)
        Val time periods: None
        Test time periods: None
        All time periods: range(0, 280)
    Features
        Taken features: ['n_flows', 'n_packets', 'n_bytes']
        Default values: [0. 0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
        Set shared size: 0
    Fillers
        Filler type: None
    Scalers
        Scaler type: None
    Batch sizes
        Train batch size: 32
        Val batch size: 64
        Test batch size: 128
        All batch size: 128
    De

#### Annotations

In [21]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_FULL, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)

[2025-04-09 11:41:42,045][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_1_DAY
        Time indices: range(0, 279)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 0, 0, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_FULL
        Time series indices: [ 3  5 10 11 12 ... 2051841 2051849 2051850 2051853 2055783], Length=275124; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


##### Exporting annotations

- When the parameter `force_write` is True, existing files with the same name will be overwritten.
- Annotations will be saved as a CSV file at: `os.path.join(time_based_dataset.annotations_root, identifier)`.
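
The overwrite behavior can be illustrated with a small CSV writer. This is a sketch under assumptions, not the library's code: `save_rows_as_csv` is a hypothetical helper, and raising `FileExistsError` when `force_write` is False is chosen only for illustration, since the text above does not say what the library does in that case.

```python
import csv
import os
import tempfile


def save_rows_as_csv(path, rows, fieldnames, force_write=False):
    """Hypothetical helper: write rows to CSV, overwriting only if force_write is True."""
    if os.path.exists(path) and not force_write:
        # Assumed behavior for this sketch; the library may handle this differently.
        raise FileExistsError(f"{path} already exists; pass force_write=True to overwrite")
    with open(path, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


fieldnames = ["id_ip", "id_time", "test3", "test3_2"]
rows = [
    {"id_ip": 3, "id_time": 0, "test3": "test_annotation3_3_0", "test3_2": ""},
    {"id_ip": 5, "id_time": 1, "test3": "", "test3_2": "test_annotation3_5_1"},
]

with tempfile.TemporaryDirectory() as annotations_root:
    path = os.path.join(annotations_root, "test_annotations1.csv")
    save_rows_as_csv(path, rows, fieldnames)

    # A second save without force_write is refused in this sketch.
    try:
        save_rows_as_csv(path, rows, fieldnames)
        refused = False
    except FileExistsError:
        refused = True

    # With force_write=True the existing file is replaced.
    save_rows_as_csv(path, rows[:1], fieldnames, force_write=True)
    with open(path, newline="") as file:
        reloaded = list(csv.DictReader(file))

print(refused, len(reloaded))
```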

In [22]:
time_based_dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_3_5", annotation_group="test3_2", ts_id=3, id_time=5, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_0", annotation_group="test3", ts_id=5, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_1", annotation_group="test3_2", ts_id=5, id_time=1, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3,test3_2
0,3,0,test_annotation3_3_0,
1,5,0,test_annotation3_5_0,
2,3,5,,test_annotation3_3_5
3,5,1,,test_annotation3_5_1


In [23]:
time_based_dataset.save_annotations(identifier="test_annotations1", on=AnnotationType.BOTH, force_write=True)

[2025-04-09 11:41:42,060][cesnet_dataset][INFO] - Annotations successfully saved to \some_directory\tszoo\annotations\test_annotations1.csv


##### Importing annotations

- It first attempts to load built-in annotations; if no built-in annotations with the given identifier exist, it attempts to load custom annotations from the `"data_root"/tszoo/annotations/` directory.

In [24]:
time_based_dataset.import_annotations(identifier="test_annotations1", enforce_ids=True)

[2025-04-09 11:41:42,069][cesnet_dataset][INFO] - Custom annotations found: test_annotations1.
[2025-04-09 11:41:42,071][cesnet_dataset][INFO] - Annotations detected as AnnotationType.BOTH (both id_ip and id_time)
[2025-04-09 11:41:42,072][cesnet_dataset][INFO] - Successfully imported annotations from \some_directory\tszoo\annotations\test_annotations1.csv
