# Annotations

This notebook uses only `TimeBasedCesnetDataset`, but all methods work the same way for `SeriesBasedCesnetDataset`.

### Import

In [1]:
import pandas as pd
import os
import logging

from cesnet_tszoo.utils.enums import AgreggationType, SourceType, AnnotationType
from cesnet_tszoo.datasets import CESNET_TimeSeries24

### Setting logger

In [2]:
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s][%(name)s][%(levelname)s] - %(message)s")

### Preparing dataset

In [3]:
time_based_dataset = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_FULL, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=True)

[2025-04-09 11:37:00,563][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig



Dataset details:

    AgreggationType.AGG_1_DAY
        Time indices: range(0, 279)
        Datetime: (datetime.datetime(2023, 10, 9, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 7, 14, 0, 0, tzinfo=datetime.timezone.utc))

    SourceType.IP_ADDRESSES_FULL
        Time series indices: [ 3  5 10 11 12 ... 2051841 2051849 2051850 2051853 2055783], Length=275124; use 'get_available_ts_indices' for full list
        Features with default values: {'n_flows': 0, 'n_packets': 0, 'n_bytes': 0, 'tcp_udp_ratio_packets': 0.5, 'tcp_udp_ratio_bytes': 0.5, 'dir_ratio_packets': 0.5, 'dir_ratio_bytes': 0.5, 'avg_duration': 0, 'avg_ttl': 0, 'sum_n_dest_asn': 0, 'avg_n_dest_asn': 0, 'std_n_dest_asn': 0, 'sum_n_dest_ports': 0, 'avg_n_dest_ports': 0, 'std_n_dest_ports': 0, 'sum_n_dest_ip': 0, 'avg_n_dest_ip': 0, 'std_n_dest_ip': 0}
        
        Additional data: ['ids_relationship', 'weekends_and_holidays']
        


### Basics

There are three annotation types:
1. **AnnotationType.TS_ID** -> Annotations for a whole time series
2. **AnnotationType.ID_TIME** -> Annotations for a specific time, independent of any time series
3. **AnnotationType.BOTH** -> Annotations for a specific time in a specific time series

- You can get the annotations of a given type with the `get_annotations` method.
- `get_annotations` returns the annotations as a pandas DataFrame.
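One way to read the three types is by which identifiers an annotation is keyed on. The sketch below is a hypothetical helper, not part of `cesnet_tszoo`; it only mirrors the `ts_id`/`id_time` rules stated later in this notebook. Raising an error when both ids are missing is an assumption made for the sketch.

```python
from typing import Optional


def resolve_annotation_type(ts_id: Optional[int], id_time: Optional[int]) -> str:
    """Return the name of the annotation type implied by the ids that are given.

    Hypothetical helper for illustration only.
    """
    if ts_id is not None and id_time is not None:
        return "BOTH"      # specific time in a specific time series
    if ts_id is not None:
        return "TS_ID"     # whole time series
    if id_time is not None:
        return "ID_TIME"   # specific time, any time series
    # Assumption: an annotation needs at least one id to be attached to.
    raise ValueError("at least one of ts_id and id_time must be given")


print(resolve_annotation_type(ts_id=3, id_time=None))  # TS_ID
print(resolve_annotation_type(ts_id=None, id_time=0))  # ID_TIME
print(resolve_annotation_type(ts_id=3, id_time=0))     # BOTH
```

Note that the checks compare against `None` rather than relying on truthiness, because `id_time=0` is a valid time index.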

##### Getting annotations for AnnotationType.TS_ID

In [4]:
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip


##### Getting annotations for AnnotationType.ID_TIME

In [5]:
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time


##### Getting annotations for AnnotationType.BOTH

In [6]:
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time


### Annotation groups

- An annotation group corresponds to a column in the DataFrame/CSV.
- You can add or remove annotation groups.

#### Adding annotation group

##### Adding annotation groups to AnnotationType.TS_ID

In [7]:
time_based_dataset.add_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip,test1


##### Adding annotation groups to AnnotationType.ID_TIME

In [8]:
time_based_dataset.add_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time,test2


##### Adding annotation groups to AnnotationType.BOTH

In [9]:
time_based_dataset.add_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3


#### Removing annotation group

##### Removing annotation groups from AnnotationType.TS_ID

In [10]:
time_based_dataset.remove_annotation_group(annotation_group="test1", on=AnnotationType.TS_ID)
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip


##### Removing annotation groups from AnnotationType.ID_TIME

In [11]:
time_based_dataset.remove_annotation_group(annotation_group="test2", on=AnnotationType.ID_TIME)
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time


##### Removing annotation groups from AnnotationType.BOTH

In [12]:
time_based_dataset.remove_annotation_group(annotation_group="test3", on=AnnotationType.BOTH)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time


### Annotations

- Annotations are the values stored for a selected annotation group and AnnotationType.
- You can add or remove annotations.

#### Adding annotation

- When you add an annotation to an annotation group that does not exist, the group is created.
- To override an existing annotation, specify the same `annotation_group`, `ts_id`, and `id_time` together with the new annotation.
- Setting `enforce_ids` to True ensures that the given `ts_id` and `id_time` belong to the dataset in use.
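The three rules above can be pictured with a small toy model. This is a sketch under assumptions, not the `cesnet_tszoo` implementation: the `AnnotationTable` class and its `key` parameter are hypothetical, and raising `ValueError` for an unknown id is assumed rather than taken from the library.

```python
class AnnotationTable:
    """Toy stand-in for one annotation table (hypothetical, for illustration)."""

    def __init__(self, valid_keys):
        self.valid_keys = set(valid_keys)  # ids that exist in the dataset
        self.groups = []                   # annotation groups, i.e. columns
        self.rows = {}                     # key -> {group: annotation}

    def add_annotation(self, annotation, annotation_group, key, enforce_ids=True):
        # Assumption: an id outside the dataset is rejected with ValueError.
        if enforce_ids and key not in self.valid_keys:
            raise ValueError(f"{key!r} does not belong to the dataset")
        # A group that does not exist yet is created on first use.
        if annotation_group not in self.groups:
            self.groups.append(annotation_group)
        # Same key and same group: the new annotation replaces the old one.
        self.rows.setdefault(key, {})[annotation_group] = annotation


table = AnnotationTable(valid_keys=[3, 5])
table.add_annotation("first", "test1", key=3)
table.add_annotation("second", "test1", key=3)   # overrides "first"
table.add_annotation("other", "test1_2", key=5)  # creates group "test1_2"

print(table.groups)  # ['test1', 'test1_2']
print(table.rows)    # {3: {'test1': 'second'}, 5: {'test1_2': 'other'}}
```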

##### Adding annotation to annotation group and AnnotationType.TS_ID

- To add an annotation to `AnnotationType.TS_ID`, you must specify `ts_id` and set `id_time` to None.

In [13]:
time_based_dataset.add_annotation(annotation="test_annotation1_3", annotation_group="test1", ts_id=3, id_time=None, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation1_5", annotation_group="test1_2", ts_id=5, id_time=None, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip,test1,test1_2
0,3,test_annotation1_3,
1,5,,test_annotation1_5


##### Adding annotation to annotation group and AnnotationType.ID_TIME

- To add an annotation to `AnnotationType.ID_TIME`, you must set `ts_id` to None and specify `id_time`.

In [14]:
time_based_dataset.add_annotation(annotation="test_annotation2_0", annotation_group="test2", ts_id=None, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation2_1", annotation_group="test2_2", ts_id=None, id_time=1, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time,test2,test2_2
0,0,test_annotation2_0,
1,1,,test_annotation2_1


##### Adding annotation to annotation group and AnnotationType.BOTH

- To add an annotation to `AnnotationType.BOTH`, you must specify both `ts_id` and `id_time`.

In [15]:
time_based_dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_3_5", annotation_group="test3_2", ts_id=3, id_time=5, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_0", annotation_group="test3", ts_id=5, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_5_1", annotation_group="test3_2", ts_id=5, id_time=1, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3,test3_2
0,3,0,test_annotation3_3_0,
1,5,0,test_annotation3_5_0,
2,3,5,,test_annotation3_3_5
3,5,1,,test_annotation3_5_1


#### Removing annotation

- Removing the annotations from every annotation group of a row removes that row from the DataFrame.
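The row-removal rule can be sketched with plain dictionaries. The `remove_annotation` function below is a hypothetical stand-in that takes a `rows` mapping; it is not the dataset method of the same name and says nothing about how the library orders the remaining rows.

```python
def remove_annotation(rows, annotation_group, key):
    """Drop one annotation; drop the whole row once it has no annotations left.

    Hypothetical sketch: `rows` maps a row key to {group: annotation}.
    """
    row = rows.get(key)
    if row is None:
        return                      # nothing stored for this key
    row.pop(annotation_group, None)  # remove the value in that group, if any
    if not row:
        del rows[key]               # every group is empty, so the row goes away


rows = {3: {"test1": "a", "test1_2": "b"}, 5: {"test1_2": "c"}}

remove_annotation(rows, "test1", key=3)
print(rows)  # {3: {'test1_2': 'b'}, 5: {'test1_2': 'c'}}

remove_annotation(rows, "test1_2", key=3)
print(rows)  # {5: {'test1_2': 'c'}}
```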

##### Removing annotation from annotation group and AnnotationType.TS_ID

- To remove an annotation from `AnnotationType.TS_ID`, you must specify `ts_id` and set `id_time` to None.

In [16]:
time_based_dataset.add_annotation(annotation="test_annotation1_3", annotation_group="test1", ts_id=3, id_time=None, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation1_3_1", annotation_group="test1_2", ts_id=3, id_time=None, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip,test1,test1_2
0,3,test_annotation1_3,test_annotation1_3_1
1,5,,test_annotation1_5


In [17]:
time_based_dataset.remove_annotation(annotation_group="test1", ts_id=3, id_time=None)
time_based_dataset.get_annotations(on=AnnotationType.TS_ID)

Unnamed: 0,id_ip,test1,test1_2
0,5,,test_annotation1_5
1,3,,test_annotation1_3_1


##### Removing annotation from annotation group and AnnotationType.ID_TIME

- To remove an annotation from `AnnotationType.ID_TIME`, you must set `ts_id` to None and specify `id_time`.

In [18]:
time_based_dataset.add_annotation(annotation="test_annotation2_0", annotation_group="test2", ts_id=None, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation2_1", annotation_group="test2_2", ts_id=None, id_time=0, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time,test2,test2_2
0,0,test_annotation2_0,test_annotation2_1
1,1,,test_annotation2_1


In [19]:
time_based_dataset.remove_annotation(annotation_group="test2", ts_id=None, id_time=0)
time_based_dataset.get_annotations(on=AnnotationType.ID_TIME)

Unnamed: 0,id_time,test2,test2_2
0,1,,test_annotation2_1
1,0,,test_annotation2_1


##### Removing annotation from annotation group and AnnotationType.BOTH

- To remove an annotation from `AnnotationType.BOTH`, you must specify both `ts_id` and `id_time`.

In [20]:
time_based_dataset.add_annotation(annotation="test_annotation3_3_0", annotation_group="test3", ts_id=3, id_time=0, enforce_ids=True)
time_based_dataset.add_annotation(annotation="test_annotation3_3_5", annotation_group="test3_2", ts_id=3, id_time=0, enforce_ids=True)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3,test3_2
0,3,0,test_annotation3_3_0,test_annotation3_3_5
1,5,0,test_annotation3_5_0,
2,3,5,,test_annotation3_3_5
3,5,1,,test_annotation3_5_1


In [21]:
time_based_dataset.remove_annotation(annotation_group="test3", ts_id=3, id_time=0)
time_based_dataset.get_annotations(on=AnnotationType.BOTH)

Unnamed: 0,id_ip,id_time,test3,test3_2
0,5,0,test_annotation3_5_0,
1,3,5,,test_annotation3_3_5
2,5,1,,test_annotation3_5_1
3,3,0,,test_annotation3_3_5


### Exporting

- You can export the annotations you created with the `save_annotations` method.
- `save_annotations` creates a CSV file at `os.path.join(time_based_dataset.annotations_root, identifier)`.
- When the `force_write` parameter is True, an existing file with the same name is overwritten.
- Do not add ".csv" to the identifier; it is appended automatically.
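The path and overwrite rules can be sketched with the standard library alone. Both helpers are hypothetical and are not part of `cesnet_tszoo`; in particular, treating `force_write=False` as "refuse to touch an existing file" is an assumption about behavior the notebook does not show.

```python
import os
import tempfile


def annotations_csv_path(annotations_root: str, identifier: str) -> str:
    """Build the CSV path for an identifier given without the '.csv' suffix."""
    return os.path.join(annotations_root, identifier + ".csv")


def may_write(path: str, force_write: bool) -> bool:
    """Assumed rule: an existing file is only replaced when force_write is True."""
    return force_write or not os.path.exists(path)


with tempfile.TemporaryDirectory() as root:
    path = annotations_csv_path(root, "test_name")
    name = os.path.basename(path)
    fresh = may_write(path, force_write=False)    # no file there yet
    open(path, "w").close()                       # simulate an earlier export
    blocked = may_write(path, force_write=False)  # file exists, no force
    forced = may_write(path, force_write=True)    # file exists, forced

print(name)                    # test_name.csv
print(fresh, blocked, forced)  # True False True
```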

In [22]:
time_based_dataset.save_annotations(identifier="test_name", on=AnnotationType.BOTH, force_write=True)

[2025-04-09 11:37:00,717][cesnet_dataset][INFO] - Annotations successfully saved to \some_directory\tszoo\annotations\test_name.csv


In [23]:
pd.read_csv(os.path.join(time_based_dataset.annotations_root, "test_name.csv"))

Unnamed: 0,id_ip,id_time,test3,test3_2
0,5,0,test_annotation3_5_0,
1,3,5,,test_annotation3_3_5
2,5,1,,test_annotation3_5_1
3,3,0,,test_annotation3_3_5


### Importing

- You can import existing annotations, either your own or built-in ones.
- Setting `enforce_ids` to True ensures that every `ts_id` or `id_time` in the imported annotations belongs to the dataset in use.
- `import_annotations` automatically detects the AnnotationType of the imported annotations from the presence of the ts_id column (under the name the dataset uses for it, `id_ip` here) and the id_time column.
- It first attempts to load built-in annotations; if no built-in annotations with the given identifier exist, it attempts to load custom annotations from the `"data_root"/tszoo/annotations/` directory.
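The column-based detection can be sketched as follows. `detect_annotation_type` is a hypothetical function written for this illustration; the library's own detection logic is not shown in this notebook, and the error raised when neither column is present is an assumption.

```python
import csv
import io


def detect_annotation_type(columns, ts_id_name="id_ip", time_name="id_time"):
    """Infer the annotation type from which id columns a CSV header contains.

    Hypothetical sketch; `ts_id_name` is the ts_id column name of the dataset.
    """
    has_ts = ts_id_name in columns
    has_time = time_name in columns
    if has_ts and has_time:
        return "BOTH"
    if has_ts:
        return "TS_ID"
    if has_time:
        return "ID_TIME"
    # Assumption: a file with neither id column cannot be imported.
    raise ValueError("neither ts_id nor id_time column found")


csv_text = "id_ip,id_time,test3,test3_2\n5,0,test_annotation3_5_0,\n"
header = next(csv.reader(io.StringIO(csv_text)))

print(detect_annotation_type(header))                            # BOTH
print(detect_annotation_type(["id_ip", "group", "group_class"]))  # TS_ID
```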

In [24]:
time_based_dataset_new = CESNET_TimeSeries24.get_dataset(data_root="/some_directory/", source_type=SourceType.IP_ADDRESSES_FULL, aggregation=AgreggationType.AGG_1_DAY, is_series_based=False, display_details=False)
time_based_dataset_new.get_annotations(on=AnnotationType.BOTH)

[2025-04-09 11:37:00,745][wrapper_dataset][INFO] - Dataset is time-based. Use cesnet_tszoo.configs.TimeBasedConfig


Unnamed: 0,id_ip,id_time


#### Importing your own annotations

- Annotations you want to import must be in the `time_based_dataset.annotations_root` directory.
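Because built-in identifiers are tried first, a custom file is only picked up when no built-in annotations share its identifier and the file sits in the annotations directory. The sketch below illustrates that lookup order with a hypothetical `locate_annotations` function and a hand-written set of built-in identifiers; it is not how `import_annotations` is implemented.

```python
import os
import tempfile


def locate_annotations(identifier, builtin_identifiers, annotations_root):
    """Return 'built-in' or 'custom' depending on where the identifier resolves.

    Hypothetical sketch of the lookup order described in this notebook.
    """
    if identifier in builtin_identifiers:
        return "built-in"  # built-in annotations take precedence
    custom_path = os.path.join(annotations_root, identifier + ".csv")
    if os.path.isfile(custom_path):
        return "custom"    # fall back to a file in the annotations directory
    raise FileNotFoundError(f"no annotations named {identifier!r}")


# Identifier taken from this notebook; the set itself is made up for the demo.
builtin = {"device_type_ip_address_full"}

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "test_name.csv"), "w").close()  # a custom export
    found_custom = locate_annotations("test_name", builtin, root)
    found_builtin = locate_annotations("device_type_ip_address_full", builtin, root)

print(found_custom, found_builtin)  # custom built-in
```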

In [25]:
time_based_dataset_new.import_annotations(identifier="test_name", enforce_ids=True)
time_based_dataset_new.get_annotations(on=AnnotationType.BOTH)

[2025-04-09 11:37:00,756][cesnet_dataset][INFO] - Custom annotations found: test_name.
[2025-04-09 11:37:00,758][cesnet_dataset][INFO] - Annotations detected as AnnotationType.BOTH (both id_ip and id_time)
[2025-04-09 11:37:00,759][cesnet_dataset][INFO] - Successfully imported annotations from \some_directory\tszoo\annotations\test_name.csv


Unnamed: 0,id_ip,id_time,test3,test3_2
0,5,0,test_annotation3_5_0,
1,3,5,,test_annotation3_3_5
2,5,1,,test_annotation3_5_1
3,3,0,,test_annotation3_3_5


#### Importing built-in annotations

- If the annotations exist but have not been downloaded yet, they are downloaded and then imported.

In [26]:
time_based_dataset_new.import_annotations(identifier="device_type_ip_address_full", enforce_ids=True)
time_based_dataset_new.get_annotations(on=AnnotationType.TS_ID)

[2025-04-09 11:37:00,768][cesnet_dataset][INFO] - Built-in annotations found: device_type_ip_address_full.
[2025-04-09 11:37:00,795][cesnet_dataset][INFO] - Annotations detected as AnnotationType.TS_ID (id_ip only)
[2025-04-09 11:37:05,718][cesnet_dataset][INFO] - Successfully imported annotations from e:\School\Bakalářka\tszoo\.venv\Lib\site-packages\cesnet_tszoo\files\annotation_files\device_type_ip_address_full.csv


Unnamed: 0,id_ip,group,group_class
0,3,server,web server
1,5,end-device,
2,10,server,web server
3,12,net-device,core router
4,13,server,dns server
...,...,...,...
92374,1628612,end-device,workstation
92375,1628614,end-device,workstation
92376,1628619,end-device,wifi client
92377,1628620,end-device,wifi client
