# Using transformers for DisjointTimeBasedCesnetDataset

### Import

In [1]:
import numpy as np
import logging

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, TransformerType, DatasetType
from cesnet_tszoo.datasets import CESNET_TimeSeries24
from cesnet_tszoo.configs import DisjointTimeBasedConfig # Disjoint dataset MUST use DisjointTimeBasedConfig

from cesnet_tszoo.utils.transformer import Transformer # For creating custom Transformer

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
disjoint_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_SAMPLE, aggregation=AgreggationType.AGG_10_MINUTES, dataset_type=DatasetType.DISJOINT_TIME_BASED, display_details=True)

[2025-11-14 18:36:06,669][cesnet_dataset][INFO] - Dataset is disjoint_time_based. Use cesnet_tszoo.configs.DisjointTimeBasedConfig



Dataset details:

    AgreggationType.AGG_10_MINUTES
        Time indices: range(0, 40297)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 3, 49, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 21, 50, 52, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_SAMPLE
        Time series indices: [ 11  20 101 103 118 ... 2003134 2008461 2011839 2022235 2044888], Length=1000; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'n_dest_ip': 0, 'n_dest_asn': 0, 'n_dest_ports': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Transformers

- Transformers are implemented as classes.
    - You can create your own or use a built-in one.
- With the default preprocessing order, the transformer is applied after missing values have been handled by `default_values` and fillers.
- A single transformer is shared by all time series.
- A transformer must implement `transform`.
- A transformer can optionally implement `inverse_transform`.
- A transformer must implement `partial_fit` (unless it is already fitted and `partial_fit_initialized_transformers` is False).
- Using a transformer requires a train set (unless it is already fitted and `partial_fit_initialized_transformers` is False).
- You can change the transformer later with `update_dataset_config_and_initialize` or `apply_transformer`.
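To make this lifecycle concrete, here is a rough simulation in plain NumPy. It assumes the library calls `partial_fit` on each train time series as an array of shape `(times, features)` and then calls `transform` on every series with the same transformer instance. The class `SharedMaxAbsScaler` is hypothetical, and the loop only stands in for what the library does internally; a real transformer would derive from `Transformer`.

```python
import numpy as np


class SharedMaxAbsScaler:
    """Hypothetical transformer with the methods listed above.
    A real one would derive from cesnet_tszoo.utils.transformer.Transformer."""

    def __init__(self):
        self.max_abs = None

    def partial_fit(self, data):
        # Update the per-feature maximum absolute value with one more chunk of data
        chunk_max = np.max(np.abs(data), axis=0)
        self.max_abs = chunk_max if self.max_abs is None else np.maximum(self.max_abs, chunk_max)

    def _scale(self):
        # Guard against division by zero for all-zero features
        return np.where(self.max_abs == 0, 1.0, self.max_abs)

    def transform(self, data):
        return data / self._scale()

    def inverse_transform(self, transformed_data):
        return transformed_data * self._scale()


# Two train time series, each of shape (times, features)
train_series = [np.array([[1.0, 10.0], [2.0, -20.0]]),
                np.array([[-4.0, 5.0], [3.0, 0.0]])]

transformer = SharedMaxAbsScaler()              # one transformer for all series
for series in train_series:
    transformer.partial_fit(series)             # fitted incrementally on the train set

scaled = [transformer.transform(series) for series in train_series]
if hasattr(transformer, "inverse_transform"):   # optional method
    restored = [transformer.inverse_transform(s) for s in scaled]
```

Because the statistics are pooled over all train series, every series is scaled with the same per-feature maximum.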

#### Built-in

In [4]:
# Options

## Supported
TransformerType.STANDARD_SCALER
TransformerType.L2_NORMALIZER
TransformerType.LOG_TRANSFORMER
TransformerType.MAX_ABS_SCALER
TransformerType.MIN_MAX_SCALER

<TransformerType.MIN_MAX_SCALER: 'min_max_scaler'>
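For orientation, the usual definitions of these transformations are listed below, assuming the built-ins follow the common scikit-learn conventions (this tutorial does not confirm it). Here $x_{tj}$ is the value of feature $j$ at time $t$; $\mu_j$, $\sigma_j$, $m_j$ and $M_j$ are the mean, standard deviation, minimum and maximum of feature $j$ over the train set. The L2 normalizer conventionally rescales each row (one time step) to unit norm rather than each feature. The exact form of the log transformer (for example, whether an offset is added before the logarithm) is not stated here, so it is omitted.

```latex
\begin{aligned}
\text{Standard scaler:}\quad & x'_{tj} = \frac{x_{tj} - \mu_j}{\sigma_j} \\
\text{Min-max scaler:}\quad & x'_{tj} = \frac{x_{tj} - m_j}{M_j - m_j} \\
\text{Max-abs scaler:}\quad & x'_{tj} = \frac{x_{tj}}{A_j}, \qquad A_j = \max_t \lvert x_{tj} \rvert \\
\text{L2 normalizer:}\quad & x'_{tj} = \frac{x_{tj}}{\sqrt{\sum_k x_{tk}^2}}
\end{aligned}
```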

In [5]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:06,680][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:06,712][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:06,713][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1160.87it/s]
[2025-11-14 18:36:07,166][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:07,167][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=60
        Val time series IDs: None
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: None
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: MinMaxScaler
        Are transformers premade: False
        Are premade transformers partial_fitte

In [6]:
disjoint_dataset.get_train_df(workers=0).head(10)

|   | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 182151.0 | 0.0 | 7.9e-05 | 2.240487e-07 |
| 1 | 182151.0 | 1.0 | 4.5e-05 | 1.280278e-07 |
| 2 | 182151.0 | 2.0 | 4.5e-05 | 1.280278e-07 |
| 3 | 182151.0 | 3.0 | 5.6e-05 | 1.600348e-07 |
| 4 | 182151.0 | 4.0 | 9e-05 | 2.560557e-07 |
| 5 | 182151.0 | 5.0 | 3.4e-05 | 9.602087e-08 |
| 6 | 182151.0 | 6.0 | 6.8e-05 | 1.920417e-07 |
| 7 | 182151.0 | 7.0 | 4.5e-05 | 1.280278e-07 |
| 8 | 182151.0 | 8.0 | 0.000102 | 3.840835e-07 |
| 9 | 182151.0 | 9.0 | 0.0 | 0.0 |


In [7]:
disjoint_dataset.get_transformers()

<cesnet_tszoo.utils.transformer.transformer.MinMaxScaler at 0x15d8a29a480>

The transformer can also be changed later with:

In [8]:
disjoint_dataset.update_dataset_config_and_initialize(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)
# Or
disjoint_dataset.apply_transformer(transform_with=TransformerType.MIN_MAX_SCALER, partial_fit_initialized_transformers="config", workers=0)

[2025-11-14 18:36:07,402][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:07,428][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:07,428][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 332.20it/s]
[2025-11-14 18:36:07,615][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:07,615][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:07,616][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:07,617][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:07,648][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:07,649][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 329.70it/s]
[2025-11-14 18:36:07,891][cesnet_dataset][INFO] - Dataset initialization complete. Configuration upda

#### Custom

- You can create your own custom transformer by deriving from the `Transformer` base class.
- When using this library in a Jupyter notebook, define the custom transformer in a separate file and import it. If you define it directly in the notebook (as below), use `workers=0`.
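A likely reason for this rule is that classes defined in a notebook cell live in the `__main__` module, which separate worker processes may not be able to import. The helper below is a hypothetical sketch (`choose_workers` is not part of cesnet_tszoo) that falls back to `workers=0` in that case.

```python
def choose_workers(transformer_cls: type, requested_workers: int) -> int:
    """Hypothetical helper (not part of cesnet_tszoo).

    Returns 0 when the transformer class is defined in the notebook itself
    (module `__main__`), otherwise the requested number of workers.
    """
    # Worker processes may fail to import classes defined in `__main__`
    if transformer_cls.__module__ == "__main__" and requested_workers > 0:
        return 0
    return requested_workers
```

You could then pass `workers=choose_workers(CustomTransformer, 4)` to calls such as `set_dataset_config_and_initialize`.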

In [9]:
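class CustomTransformer(Transformer):
    """Min-max scaler mapping each feature to [0, 1] using statistics fitted on the train set."""

    def __init__(self):
        super().__init__()

        self.max = None
        self.min = None

    def _range(self):
        # Per-feature range; constant features get a range of 1 to avoid division by zero
        data_range = self.max - self.min
        return np.where(data_range == 0, 1, data_range)

    def transform(self, data):
        return (data - self.min) / self._range()

    def fit(self, data):
        # Fit from scratch: discard any previously collected statistics
        self.max = None
        self.min = None
        self.partial_fit(data)

    def partial_fit(self, data):
        # First call: initialize the per-feature minimum and maximum
        if self.max is None and self.min is None:
            self.max = np.max(data, axis=0)
            self.min = np.min(data, axis=0)
            return

        # Later calls: widen the running per-feature extremes with the new data
        self.max = np.maximum(self.max, np.max(data, axis=0))
        self.min = np.minimum(self.min, np.min(data, axis=0))

    def inverse_transform(self, transformed_data):
        return transformed_data * self._range() + self.min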
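The min-max statistics in `CustomTransformer` merge across `partial_fit` calls simply by taking element-wise extremes. Statistics such as the mean and standard deviation need more care: `partial_fit` has to combine running counts, means and sums of squared deviations. The standalone sketch below uses the standard pairwise update of Chan et al. The class `RunningStandardizer` is hypothetical and does not subclass `Transformer`, so it runs without cesnet_tszoo; in the library you would derive from `Transformer` as above.

```python
import numpy as np


class RunningStandardizer:
    """Hypothetical standardizing transformer (not part of cesnet_tszoo).

    partial_fit merges per-chunk statistics with the pairwise update of
    Chan et al., so fitting chunk by chunk matches fitting on all data at once.
    """

    def __init__(self):
        self.count = 0
        self.mean = None
        self.m2 = None  # per-feature sum of squared deviations from the mean

    def partial_fit(self, data):
        n_b = data.shape[0]
        if n_b == 0:
            return  # nothing to merge
        mean_b = data.mean(axis=0)
        m2_b = ((data - mean_b) ** 2).sum(axis=0)

        if self.count == 0:
            self.count, self.mean, self.m2 = n_b, mean_b, m2_b
            return

        # Combine the stored statistics (a) with the new chunk (b)
        n_a = self.count
        n = n_a + n_b
        delta = mean_b - self.mean
        self.mean = self.mean + delta * n_b / n
        self.m2 = self.m2 + m2_b + delta ** 2 * n_a * n_b / n
        self.count = n

    def _std(self):
        # Population standard deviation; constant features get a scale of 1
        std = np.sqrt(self.m2 / self.count)
        return np.where(std == 0, 1.0, std)

    def transform(self, data):
        return (data - self.mean) / self._std()

    def inverse_transform(self, transformed_data):
        return transformed_data * self._std() + self.mean


rng = np.random.default_rng(0)
chunks = [rng.normal(size=(50, 2)), rng.normal(loc=3.0, size=(30, 2))]

scaler = RunningStandardizer()
for chunk in chunks:
    scaler.partial_fit(chunk)

all_data = np.vstack(chunks)
```

After the loop, `scaler.mean` and the derived standard deviation match those computed on `all_data` in one pass.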

In [10]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=CustomTransformer, nan_threshold=0.5, random_state=1500)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:07,902][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:07,930][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:07,931][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1272.27it/s]
[2025-11-14 18:36:08,332][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:08,332][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=60
        Val time series IDs: None
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: None
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: CustomTransformer (Custom)
        Are transformers premade: False
        Are premade transformers

In [11]:
disjoint_dataset.get_train_df(workers=0).head(10)

|   | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 182151.0 | 0.0 | 7.9e-05 | 2.240487e-07 |
| 1 | 182151.0 | 1.0 | 4.5e-05 | 1.280278e-07 |
| 2 | 182151.0 | 2.0 | 4.5e-05 | 1.280278e-07 |
| 3 | 182151.0 | 3.0 | 5.6e-05 | 1.600348e-07 |
| 4 | 182151.0 | 4.0 | 9e-05 | 2.560557e-07 |
| 5 | 182151.0 | 5.0 | 3.4e-05 | 9.602087e-08 |
| 6 | 182151.0 | 6.0 | 6.8e-05 | 1.920417e-07 |
| 7 | 182151.0 | 7.0 | 4.5e-05 | 1.280278e-07 |
| 8 | 182151.0 | 8.0 | 0.000102 | 3.840835e-07 |
| 9 | 182151.0 | 9.0 | 0.0 | 0.0 |


In [12]:
disjoint_dataset.get_transformers()

<__main__.CustomTransformer at 0x15d8a29f1d0>

The transformer can also be changed later with:

In [13]:
disjoint_dataset.update_dataset_config_and_initialize(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)
# Or
disjoint_dataset.apply_transformer(transform_with=CustomTransformer, partial_fit_initialized_transformers="config", workers=0)

[2025-11-14 18:36:08,532][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:08,563][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:08,563][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 363.44it/s]
[2025-11-14 18:36:08,733][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:08,734][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:08,735][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:08,736][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:08,765][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:08,765][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 359.20it/s]
[2025-11-14 18:36:08,938][cesnet_dataset][INFO] - Dataset initialization complete. Configuration upda

#### Using an already fitted transformer

- When `partial_fit_initialized_transformer` is False (the default), the transformer does not need to implement `partial_fit`, and no train set is required.
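A minimal sketch of why no train set is needed: a transformer whose statistics are already known only has to provide `transform`. The class `FixedMinMaxScaler` and its numbers are hypothetical illustrations, not values from the dataset.

```python
import numpy as np


class FixedMinMaxScaler:
    """Hypothetical pre-fitted transformer: its statistics are supplied up front,
    so it implements no partial_fit and never looks at a train set."""

    def __init__(self, data_min, data_max):
        self.data_min = np.asarray(data_min, dtype=float)
        self.data_max = np.asarray(data_max, dtype=float)

    def transform(self, data):
        # Constant features get a range of 1 to avoid division by zero
        data_range = np.where(self.data_max == self.data_min, 1.0, self.data_max - self.data_min)
        return (data - self.data_min) / data_range


# Statistics assumed to come from an earlier fit; the values are made up
scaler = FixedMinMaxScaler(data_min=[0.0, 0.0], data_max=[100.0, 400.0])

# Validation data only; no fitting step happens
val_data = np.array([[50.0, 100.0], [100.0, 400.0]])
print(scaler.transform(val_data))
```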

In [14]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=CustomTransformer, nan_threshold=0.5, random_state=1500)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details=None, workers=0)

fitted_transformer = disjoint_dataset.get_transformers()

[2025-11-14 18:36:08,944][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:08,973][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:08,973][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1383.26it/s]
[2025-11-14 18:36:09,344][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:09,344][cesnet_dataset][INFO] - Config initialized successfully.


In [15]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=500, test_ts=None, train_time_period=0.5, val_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=fitted_transformer, nan_threshold=0.5, random_state=999)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:09,348][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:09,405][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:09,406][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1643.05it/s]
[2025-11-14 18:36:09,717][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 500/500 [00:00<00:00, 1406.15it/s]
[2025-11-14 18:36:10,074][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:10,074][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [  4380  35210 322201    190 307400 ...  29194 617662 677144 211973 612051], Length=65
        Val time series IDs: [174909  41990 792059  65072 143746 ... 75195 97179   103  1370 11254], Length=75
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 40298)
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: CustomT

In [16]:
disjoint_dataset.get_train_df(workers=0).head(10)

|   | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 4380.0 | 0.0 | 0.001253 | 5e-06 |
| 1 | 4380.0 | 1.0 | 0.001162 | 4e-06 |
| 2 | 4380.0 | 2.0 | 0.000756 | 3e-06 |
| 3 | 4380.0 | 3.0 | 0.000959 | 4e-06 |
| 4 | 4380.0 | 4.0 | 0.001219 | 5e-06 |
| 5 | 4380.0 | 5.0 | 0.001275 | 5e-06 |
| 6 | 4380.0 | 6.0 | 0.001456 | 6e-06 |
| 7 | 4380.0 | 7.0 | 0.001196 | 4e-06 |
| 8 | 4380.0 | 8.0 | 0.001433 | 6e-06 |
| 9 | 4380.0 | 9.0 | 0.001072 | 4e-06 |


The next example shows that an already fitted transformer works even without a train set.

In [17]:
config = DisjointTimeBasedConfig(train_ts=None, val_ts=500, test_ts=None, val_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=fitted_transformer, nan_threshold=0.5, random_state=999)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:10,274][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:10,307][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 500/500 [00:00<00:00, 1639.35it/s]
[2025-11-14 18:36:10,614][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:10,614][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: None
        Val time series IDs: [  4380  35210 322201    190 307400 ...  29194 617662 677144 211973 612051], Length=65
        Test time series IDs: None
    Time periods
        Train time periods: None
        Val time periods: range(0, 20149)
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: CustomTransformer (Custom)
        Are transformers premade: True
        Are premade transformers 

In [18]:
disjoint_dataset.get_val_df(workers=0).head(10)

|   | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 4380.0 | 0.0 | 0.001253 | 5e-06 |
| 1 | 4380.0 | 1.0 | 0.001162 | 4e-06 |
| 2 | 4380.0 | 2.0 | 0.000756 | 3e-06 |
| 3 | 4380.0 | 3.0 | 0.000959 | 4e-06 |
| 4 | 4380.0 | 4.0 | 0.001219 | 5e-06 |
| 5 | 4380.0 | 5.0 | 0.001275 | 5e-06 |
| 6 | 4380.0 | 6.0 | 0.001456 | 6e-06 |
| 7 | 4380.0 | 7.0 | 0.001196 | 4e-06 |
| 8 | 4380.0 | 8.0 | 0.001433 | 6e-06 |
| 9 | 4380.0 | 9.0 | 0.001072 | 4e-06 |


##### Partial fitting on the train set

Setting `partial_fit_initialized_transformer=True` makes an already fitted transformer also be fitted on the new train set. In this case, the transformer must implement `partial_fit`.
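Conceptually, partial fitting keeps the statistics learned earlier and widens them with the new train set, so the same raw value can map to a different scaled value afterwards. A toy NumPy sketch with a hypothetical class `MinMaxStats` that mirrors the `CustomTransformer` logic (not the library's code):

```python
import numpy as np


class MinMaxStats:
    """Hypothetical min-max transformer mirroring the CustomTransformer logic."""

    def __init__(self):
        self.min = None
        self.max = None

    def partial_fit(self, data):
        chunk_min, chunk_max = np.min(data, axis=0), np.max(data, axis=0)
        if self.min is None:
            self.min, self.max = chunk_min, chunk_max
        else:
            # Keep the old statistics and widen them with the new data
            self.min = np.minimum(self.min, chunk_min)
            self.max = np.maximum(self.max, chunk_max)

    def transform(self, data):
        data_range = np.where(self.max == self.min, 1.0, self.max - self.min)
        return (data - self.min) / data_range


old_train = np.array([[0.0], [10.0]])   # train set the transformer was first fitted on
new_train = np.array([[5.0], [20.0]])   # train set of the new config

prefit = MinMaxStats()
prefit.partial_fit(old_train)
before = prefit.transform(np.array([[10.0]]))   # range is [0, 10]

prefit.partial_fit(new_train)                   # partial fitting on the new train set
after = prefit.transform(np.array([[10.0]]))    # range is now [0, 20]
```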

In [19]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=500, test_ts=None, train_time_period=0.5, val_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=fitted_transformer, partial_fit_initialized_transformer=True, nan_threshold=0.5, random_state=999)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:10,813][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:10,867][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:10,868][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1290.53it/s]
[2025-11-14 18:36:11,267][cesnet_dataset][INFO] - Updating config for val set.
100%|██████████| 500/500 [00:00<00:00, 1338.32it/s]
[2025-11-14 18:36:11,641][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:11,642][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [  4380  35210 322201    190 307400 ...  29194 617662 677144 211973 612051], Length=65
        Val time series IDs: [174909  41990 792059  65072 143746 ... 75195 97179   103  1370 11254], Length=75
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: range(20149, 40298)
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: CustomT

In [20]:
disjoint_dataset.get_val_df(workers=0).head(10)

|   | id_ip | id_time | n_flows | n_packets |
|---|---|---|---|---|
| 0 | 174909.0 | 20149.0 | 3.4e-05 | 1.280278e-07 |
| 1 | 174909.0 | 20150.0 | 3.4e-05 | 1.280278e-07 |
| 2 | 174909.0 | 20151.0 | 4.5e-05 | 1.600348e-07 |
| 3 | 174909.0 | 20152.0 | 4.5e-05 | 2.240487e-07 |
| 4 | 174909.0 | 20153.0 | 0.0 | 0.0 |
| 5 | 174909.0 | 20154.0 | 1.1e-05 | 3.200696e-08 |
| 6 | 174909.0 | 20155.0 | 9e-05 | 4.160904e-07 |
| 7 | 174909.0 | 20156.0 | 2.3e-05 | 9.602087e-08 |
| 8 | 174909.0 | 20157.0 | 4.5e-05 | 2.240487e-07 |
| 9 | 174909.0 | 20158.0 | 0.0 | 0.0 |


#### Getting pre-transform values

- Call `inverse_transform` on the transformer returned by `get_transformers()` to recover the original (pre-transform) values.
- `inverse_transform` expects a NumPy array of shape `(times, features)`, where the features exclude the ID columns.
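The slicing in the next cell (`batch[0, :, 2:]`) assumes a batch layout of `(time series, times, columns)`, where the first two columns are the time series ID and the time ID. A NumPy-only sketch with a synthetic batch (all shapes and values are made up) shows how to get the `(times, features)` input for one series, or for a whole batch by stacking the series along the time axis.

```python
import numpy as np

# Synthetic batch: 3 time series x 4 time steps x (2 ID columns + 2 features),
# assumed to follow the layout id_ip, id_time, n_flows, n_packets
n_series, n_times, n_features = 3, 4, 2
batch = np.zeros((n_series, n_times, 2 + n_features))
batch[:, :, 2:] = np.arange(n_series * n_times * n_features).reshape(n_series, n_times, n_features)

# One series: drop the ID columns to get shape (times, features)
one_series = batch[0, :, 2:]

# All series at once: stack them into (series * times, features) so a single
# inverse_transform call can handle the batch (valid here because one transformer
# is shared by all series), then restore the original 3-D layout
flat = batch[:, :, 2:].reshape(-1, n_features)
restored_layout = flat.reshape(n_series, n_times, n_features)
```

With a real transformer you would call `transformer.inverse_transform(flat)` and reshape its result the same way.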

In [21]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500)
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:11,896][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:11,922][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:11,923][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1171.01it/s]
[2025-11-14 18:36:12,360][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:12,360][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=60
        Val time series IDs: None
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: None
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: MinMaxScaler
        Are transformers premade: False
        Are premade transformers partial_fitte

In [22]:
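transformer = disjoint_dataset.get_transformers()

# Take the first batch from the train dataloader
batch = next(iter(disjoint_dataset.get_train_dataloader()))
# Keep the first time series in the batch and drop the two ID columns (id_ip, id_time)
data = batch[0, :, 2:]

# Map the scaled values back to their original scale
transformer.inverse_transform(data)[:10]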

[2025-11-14 18:36:12,368][cesnet_dataset][INFO] - Created new cached train_dataloader.


array([[ 7.,  7.],
       [ 4.,  4.],
       [ 4.,  4.],
       [ 5.,  5.],
       [ 8.,  8.],
       [ 3.,  3.],
       [ 6.,  6.],
       [ 4.,  4.],
       [ 9., 12.],
       [ 0.,  0.]])

#### Changing when the transformer is applied

- You can change when the transformer is applied with the `preprocess_order` parameter.
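Why the order matters, in a toy NumPy sketch. The filler and scaler here are hypothetical stand-ins, not the library's fillers or transformers. Filling before transforming lets the filled values influence the fitted statistics. Transforming first means the fill value is interpreted in the scaled space.

```python
import numpy as np

# One feature with a missing value (NaN), as a toy example
series = np.array([[2.0], [np.nan], [6.0], [10.0]])


def fill_with_zero(data):
    """Hypothetical filler: replace missing values with 0."""
    return np.nan_to_num(data, nan=0.0)


def min_max(data):
    """Min-max scaling that ignores missing values when computing statistics."""
    lo, hi = np.nanmin(data, axis=0), np.nanmax(data, axis=0)
    return (data - lo) / (hi - lo)


fill_then_transform = min_max(fill_with_zero(series))   # the filled 0 becomes the new minimum
transform_then_fill = fill_with_zero(min_max(series))   # the filled 0 means "equal to the old minimum"
```

Here `fill_then_transform` holds `[0.2, 0.0, 0.6, 1.0]`, while `transform_then_fill` holds `[0.0, 0.0, 0.5, 1.0]`.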

In [23]:
config = DisjointTimeBasedConfig(train_ts=500, val_ts=None, test_ts=None, train_time_period=0.5, features_to_take=["n_flows", "n_packets"],
                           transform_with=TransformerType.MIN_MAX_SCALER, nan_threshold=0.5, random_state=1500, preprocess_order=["handling_anomalies", "filling_gaps", "transforming"])
disjoint_dataset.set_dataset_config_and_initialize(config, display_config_details="text", workers=0)

[2025-11-14 18:36:20,385][disjoint_time_based_config][INFO] - Quick validation succeeded.
[2025-11-14 18:36:21,539][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:21,539][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 500/500 [00:00<00:00, 1242.83it/s]
[2025-11-14 18:36:21,951][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:21,951][cesnet_dataset][INFO] - Config initialized successfully.



Config Details
    Used for database: CESNET-TimeSeries24
    Aggregation: AgreggationType.AGG_10_MINUTES
    Source: SourceType.IP_ADDRESSES_SAMPLE

    Time series
        Train time series IDs: [182151  10158  65072  10196 338309 ... 175742 659213  11188  73422 483796], Length=60
        Val time series IDs: None
        Test time series IDs: None
    Time periods
        Train time periods: range(0, 20149)
        Val time periods: None
        Test time periods: None
    Features
        Taken features: ['n_flows', 'n_packets']
        Default values: [0. 0.]
        Time series ID included: True
        Time included: True    
        Time format: TimeFormat.ID_TIME
    Sliding window
        Sliding window size: None
        Sliding window prediction size: None
        Sliding window step size: 1
    Fillers
        Filler type: NoFiller
    Transformers
        Transformer type: MinMaxScaler
        Are transformers premade: False
        Are premade transformers partial_fitte

The preprocessing order can also be changed later with:

In [24]:
disjoint_dataset.update_dataset_config_and_initialize(preprocess_order=["handling_anomalies", "transforming", "filling_gaps"], workers=0)
# Or
disjoint_dataset.set_preprocess_order(preprocess_order=["handling_anomalies", "transforming", "filling_gaps"], workers=0)

[2025-11-14 18:36:21,957][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:21,984][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:21,985][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 309.93it/s]
[2025-11-14 18:36:22,184][cesnet_dataset][INFO] - Dataset initialization complete. Configuration updated.
[2025-11-14 18:36:22,185][cesnet_dataset][INFO] - Config initialized successfully.
[2025-11-14 18:36:22,186][cesnet_dataset][INFO] - Configuration has been changed successfuly.
[2025-11-14 18:36:22,187][cesnet_dataset][INFO] - Re-initialization is required.
[2025-11-14 18:36:22,213][cesnet_dataset][INFO] - Updating config for train set and fitting values.
[2025-11-14 18:36:22,213][cesnet_dataset][INFO] - Starting fitting cycle 1/1.
100%|██████████| 60/60 [00:00<00:00, 350.90it/s]
[2025-11-14 18:36:22,390][cesnet_dataset][INFO] - Dataset initialization complete. Configuration upda