# Merge Network Traffic Datasets

This notebook merges three network traffic datasets (CIC-IDS2017, TON_IoT, and UNSW-NB15) into a single dataset with a shared set of canonical features.

## Canonical Features Mapping

| Canonical Feature | UNSW-NB15 | CIC-IDS2017 | TON_IoT | Units | Notes |
|------------------|-----------|-------------|---------|-------|-------|
| duration | dur | Flow Duration | duration | seconds | CIC in microseconds |
| pkt_total | spkts + dpkts | Total Fwd Packets + Total Backward Packets | src_pkts + dst_pkts | packets | Sum directions |
| bytes_total | sbytes + dbytes | Total Length of Fwd Packets + Total Length of Bwd Packets | src_bytes + dst_bytes | bytes | Same units |
| pkt_fwd | spkts | Total Fwd Packets | src_pkts | packets | Same units |
| pkt_bwd | dpkts | Total Backward Packets | dst_pkts | packets | Same units |
| bytes_fwd | sbytes | Total Length of Fwd Packets | src_bytes | bytes | Same units |
| bytes_bwd | dbytes | Total Length of Bwd Packets | dst_bytes | bytes | Same units |
| iat_mean | sintpkt (approx) | Flow IAT Mean | N/A | seconds | CIC in microseconds; UNSW `sintpkt` covers the source side only and is reported in milliseconds |
| iat_std | N/A | Flow IAT Std | N/A | seconds | CIC in microseconds |
| flow_active_mean | N/A | Active Mean | N/A | seconds | CIC in microseconds |
| flow_idle_mean | N/A | Idle Mean | N/A | seconds | CIC in microseconds |
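| syn_count | synack | SYN Flag Count | N/A | count | UNSW `synack` is the SYN to SYN-ACK setup time, not a flag count, so the mapping is not like-for-like |
| ack_count | ackdat | ACK Flag Count | N/A | count | UNSW `ackdat` is the SYN-ACK to ACK setup time, not a flag count, so the mapping is not like-for-like |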
| rst_count | N/A | RST Flag Count | N/A | count | Naming varies |
| label | label | label | label | binary | Attack indicator; CIC label is a string ('BENIGN' → 0, anything else → 1) |

In [60]:
!pip -q install "PyAthena[SQLAlchemy]" sqlalchemy s3fs

In [61]:
import boto3
import sagemaker
import pandas as pd
from sqlalchemy import create_engine, text

# Display settings
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.width", None)

## AWS/SageMaker Context + Athena Engine

In [62]:
sess = sagemaker.Session()
region = boto3.Session().region_name

results_bucket = sess.default_bucket()
athena_results_path = f"s3://{results_bucket}/athena/staging/"

database_name = "aai540_eda"

engine = create_engine(
    f"awsathena+rest://@athena.{region}.amazonaws.com:443/{database_name}",
    connect_args={"s3_staging_dir": athena_results_path, "region_name": region},
)
print("Region:", region)
print("Athena results:", athena_results_path)

Region: us-east-1
Athena results: s3://sagemaker-us-east-1-933747558592/athena/staging/


In [63]:
# Helper functions for queries
def exec_ddl(sql: str):
    with engine.begin() as conn:
        conn.execute(text(sql))

def read_sql(sql: str) -> pd.DataFrame:
    return pd.read_sql(sql, engine)

## Define Canonical Feature Mapping

**Key Unit Conversions:**
- CIC-IDS2017: `Flow Duration`, `Flow IAT Mean/Std`, `Active Mean`, `Idle Mean` are in **microseconds** → convert to **seconds**
- UNSW-NB15: `dur` is already in **seconds**
- TON_IoT: `duration` is already in **seconds**
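
The conversions above amount to dividing by a per-unit factor. A minimal sketch, assuming a hypothetical helper `to_seconds` and illustrative unit codes (`"s"`, `"ms"`, `"us"`) that are not part of the notebook:

```python
# Units per second; the unit codes are illustrative names, not notebook identifiers.
UNITS_PER_SECOND = {"s": 1.0, "ms": 1_000.0, "us": 1_000_000.0}


def to_seconds(value: float, unit: str) -> float:
    """Convert a time value expressed in `unit` to seconds."""
    return value / UNITS_PER_SECOND[unit]


# CIC-IDS2017 reports Flow Duration in microseconds.
cic_duration = to_seconds(185692.0, "us")
# UNSW-NB15 and TON_IoT durations are already in seconds, so they pass through.
unsw_duration = to_seconds(1.024998, "s")

print(cic_duration, unsw_duration)
```

Dividing by 1,000,000 mirrors the `/ 1000000.0` used in the CIC-IDS2017 SQL later in the notebook.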

**Features available in ALL three datasets:**
1. `duration` - Flow duration (seconds)
2. `pkt_total` - Total packets (computed as sum of fwd + bwd)
3. `bytes_total` - Total bytes (computed as sum of fwd + bwd)
4. `pkt_fwd` - Packets from source
5. `pkt_bwd` - Packets from destination
6. `bytes_fwd` - Bytes from source
7. `bytes_bwd` - Bytes from destination
8. `label` - Attack indicator (0 = normal, 1 = attack)
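
The list above can be derived mechanically from the mapping table by keeping only features that have a source column in every dataset. A sketch, with the table transcribed by hand into a hypothetical `FEATURE_MAP` dictionary (`None` stands for N/A):

```python
# (UNSW-NB15, CIC-IDS2017, TON_IoT) source columns per canonical feature,
# transcribed from the mapping table; None means the dataset has no counterpart.
FEATURE_MAP = {
    "duration": ("dur", "Flow Duration", "duration"),
    "pkt_total": ("spkts + dpkts", "Total Fwd Packets + Total Backward Packets", "src_pkts + dst_pkts"),
    "bytes_total": ("sbytes + dbytes", "Total Length of Fwd Packets + Total Length of Bwd Packets", "src_bytes + dst_bytes"),
    "pkt_fwd": ("spkts", "Total Fwd Packets", "src_pkts"),
    "pkt_bwd": ("dpkts", "Total Backward Packets", "dst_pkts"),
    "bytes_fwd": ("sbytes", "Total Length of Fwd Packets", "src_bytes"),
    "bytes_bwd": ("dbytes", "Total Length of Bwd Packets", "dst_bytes"),
    "iat_mean": ("sintpkt", "Flow IAT Mean", None),
    "iat_std": (None, "Flow IAT Std", None),
    "flow_active_mean": (None, "Active Mean", None),
    "flow_idle_mean": (None, "Idle Mean", None),
    "syn_count": ("synack", "SYN Flag Count", None),
    "ack_count": ("ackdat", "ACK Flag Count", None),
    "rst_count": (None, "RST Flag Count", None),
    "label": ("label", "label", "label"),
}

# Keep only features with a source column in all three datasets.
common_features = [
    name
    for name, sources in FEATURE_MAP.items()
    if all(column is not None for column in sources)
]

print(common_features)
```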

In [64]:
# Canonical features that can be reliably mapped in all three datasets.
# Features from the mapping table that are missing in any dataset (e.g., iat_std) are excluded.
# attack_type and source_dataset are added as label and provenance columns.

canonical_features = [
    'duration',      # Flow duration in seconds
    'pkt_total',     # Total packets (fwd + bwd)
    'bytes_total',   # Total bytes (fwd + bwd)
    'pkt_fwd',       # Packets from source
    'pkt_bwd',       # Packets from destination  
    'bytes_fwd',     # Bytes from source
    'bytes_bwd',     # Bytes from destination
    'label',         # Binary attack indicator (0/1)
    'attack_type',   # Categorical attack type (e.g., 'Normal', 'DDoS', 'PortScan')
    'source_dataset' # Track which dataset each record came from
]

print("Canonical features for merged dataset:")
for i, feat in enumerate(canonical_features, 1):
    print(f"  {i}. {feat}")

Canonical features for merged dataset:
  1. duration
  2. pkt_total
  3. bytes_total
  4. pkt_fwd
  5. pkt_bwd
  6. bytes_fwd
  7. bytes_bwd
  8. label
  9. attack_type
  10. source_dataset


## Create UNSW-NB15 Canonical View

UNSW-NB15 column mappings:
- `dur` → `duration` (already in seconds)
- `spkts + dpkts` → `pkt_total`
- `sbytes + dbytes` → `bytes_total`
- `spkts` → `pkt_fwd`
- `dpkts` → `pkt_bwd`
- `sbytes` → `bytes_fwd`
- `dbytes` → `bytes_bwd`
- `label` → `label` (already 0/1)
- `attack_cat` → `attack_type` (categorical attack type; missing values are filled with 'Normal', the expected value when label=0)
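
To make the mapping concrete, here is a row-level sketch in plain Python. The function `unsw_to_canonical` is a hypothetical illustration and not part of the notebook, which does this work in SQL; the sample values are the first row of the UNSW-NB15 preview, with `attack_cat` assumed to be missing.

```python
def unsw_to_canonical(row: dict) -> dict:
    """Map one raw UNSW-NB15 record to the canonical schema."""
    spkts, dpkts = int(row["spkts"]), int(row["dpkts"])
    sbytes, dbytes = int(row["sbytes"]), int(row["dbytes"])
    attack_cat = row.get("attack_cat")
    return {
        "duration": float(row["dur"]),  # already in seconds
        "pkt_total": spkts + dpkts,
        "bytes_total": sbytes + dbytes,
        "pkt_fwd": spkts,
        "pkt_bwd": dpkts,
        "bytes_fwd": sbytes,
        "bytes_bwd": dbytes,
        "label": int(row["label"]),
        # Same effect as COALESCE(attack_cat, 'Normal'): only None is replaced.
        "attack_type": "Normal" if attack_cat is None else attack_cat,
        "source_dataset": "UNSW-NB15",
    }


sample = {"dur": 0.085599, "spkts": 56, "dpkts": 58,
          "sbytes": 3390, "dbytes": 40986, "label": 0, "attack_cat": None}
canonical = unsw_to_canonical(sample)
print(canonical)
```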

In [65]:
# Preview UNSW-NB15 data
read_sql(f"""
SELECT dur, spkts, dpkts, sbytes, dbytes, label
FROM {database_name}.unsw_nb15_raw
LIMIT 5
""")

Unnamed: 0,dur,spkts,dpkts,sbytes,dbytes,label
0,0.085599,56,58,3390,40986,0
1,1.024998,14,18,1684,10168,0
2,0.034577,56,58,3390,40986,0
3,0.005008,4,4,568,320,0
4,0.046746,6,8,320,1842,0


In [66]:
# UNSW-NB15 query with canonical feature names (preview only; attack_type is added in the final CTAS query)
unsw_canonical_query = f"""
SELECT
    CAST(dur AS DOUBLE) AS duration,
    CAST(spkts AS BIGINT) + CAST(dpkts AS BIGINT) AS pkt_total,
    CAST(sbytes AS BIGINT) + CAST(dbytes AS BIGINT) AS bytes_total,
    CAST(spkts AS BIGINT) AS pkt_fwd,
    CAST(dpkts AS BIGINT) AS pkt_bwd,
    CAST(sbytes AS BIGINT) AS bytes_fwd,
    CAST(dbytes AS BIGINT) AS bytes_bwd,
    CAST(label AS INTEGER) AS label,
    'UNSW-NB15' AS source_dataset
FROM {database_name}.unsw_nb15_raw
WHERE dur IS NOT NULL
"""

# Preview
print("UNSW-NB15 canonical preview:")
read_sql(unsw_canonical_query + " LIMIT 5")

UNSW-NB15 canonical preview:


Unnamed: 0,duration,pkt_total,bytes_total,pkt_fwd,pkt_bwd,bytes_fwd,bytes_bwd,label,source_dataset
0,0.119596,158,72892,78,80,4550,68342,0,UNSW-NB15
1,0.650574,20,9248,14,6,8928,320,0,UNSW-NB15
2,0.00798,48,4622,24,24,2158,2464,0,UNSW-NB15
3,5e-06,2,264,2,0,264,0,0,UNSW-NB15
4,5e-06,2,264,2,0,264,0,0,UNSW-NB15


## Create CIC-IDS2017 Canonical View

CIC-IDS2017 column mappings:
- `flow_duration / 1000000.0` → `duration` (convert microseconds to seconds)
- `total_fwd_packets + total_backward_packets` → `pkt_total`
- `total_length_of_fwd_packets + total_length_of_bwd_packets` → `bytes_total`
- `total_fwd_packets` → `pkt_fwd`
- `total_backward_packets` → `pkt_bwd`
- `total_length_of_fwd_packets` → `bytes_fwd`
- `total_length_of_bwd_packets` → `bytes_bwd`
- `label` → binary `label` (convert 'BENIGN' to 0, others to 1)
- `label` → `attack_type` (keep original string as categorical type)
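
The two CIC-specific steps, the microsecond conversion and the split of the string label into a binary flag plus a categorical type, can be sketched as follows. `cic_core_fields` is a hypothetical helper; the pairing of durations with labels below is illustrative and not taken from the data.

```python
MICROSECONDS_PER_SECOND = 1_000_000.0


def cic_core_fields(flow_duration_us: float, label: str) -> dict:
    """Derive duration, binary label, and attack_type from raw CIC-IDS2017 values."""
    return {
        "duration": flow_duration_us / MICROSECONDS_PER_SECOND,
        # Case-insensitive match, like UPPER(label) = 'BENIGN' in the SQL.
        "label": 0 if label.upper() == "BENIGN" else 1,
        # The original string is kept as the categorical attack type.
        "attack_type": label,
    }


# Illustrative (duration, label) pairs.
for duration_us, label in [(239.0, "BENIGN"), (185692.0, "DoS Hulk")]:
    print(cic_core_fields(duration_us, label))
```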

In [67]:
# Preview CIC-IDS2017 data
read_sql(f"""
SELECT flow_duration, total_fwd_packets, total_backward_packets, 
       total_length_of_fwd_packets, total_length_of_bwd_packets, label
FROM {database_name}.cic_ids2017_raw
LIMIT 5
""")

Unnamed: 0,flow_duration,total_fwd_packets,total_backward_packets,total_length_of_fwd_packets,total_length_of_bwd_packets,label
0,239.0,2.0,2.0,84.0,204.0,BENIGN
1,360.0,2.0,2.0,60.0,344.0,BENIGN
2,185692.0,2.0,2.0,80.0,208.0,BENIGN
3,143.0,2.0,2.0,70.0,102.0,BENIGN
4,327.0,2.0,2.0,88.0,188.0,BENIGN


In [68]:
# Check the label values in CIC-IDS2017
read_sql(f"""
SELECT DISTINCT label
FROM {database_name}.cic_ids2017_raw
LIMIT 20
""")

Unnamed: 0,label
0,PortScan
1,BENIGN
2,SSH-Patator
3,DDoS
4,DoS GoldenEye
5,Infiltration
6,Bot
7,FTP-Patator
8,DoS Hulk
9,Heartbleed


In [69]:
# CIC-IDS2017 query with canonical feature names (preview only; attack_type is added in the final CTAS query)
# Note: flow_duration is in microseconds, convert to seconds
# Note: label is string ('BENIGN' vs attack types), convert to 0/1
cic_canonical_query = f"""
SELECT
    CAST(flow_duration AS DOUBLE) / 1000000.0 AS duration,
    CAST(total_fwd_packets AS BIGINT) + CAST(total_backward_packets AS BIGINT) AS pkt_total,
    CAST(total_length_of_fwd_packets AS BIGINT) + CAST(total_length_of_bwd_packets AS BIGINT) AS bytes_total,
    CAST(total_fwd_packets AS BIGINT) AS pkt_fwd,
    CAST(total_backward_packets AS BIGINT) AS pkt_bwd,
    CAST(total_length_of_fwd_packets AS BIGINT) AS bytes_fwd,
    CAST(total_length_of_bwd_packets AS BIGINT) AS bytes_bwd,
    CASE WHEN UPPER(label) = 'BENIGN' THEN 0 ELSE 1 END AS label,
    'CIC-IDS2017' AS source_dataset
FROM {database_name}.cic_ids2017_raw
WHERE flow_duration IS NOT NULL
"""

# Preview
print("CIC-IDS2017 canonical preview:")
read_sql(cic_canonical_query + " LIMIT 5")

CIC-IDS2017 canonical preview:


Unnamed: 0,duration,pkt_total,bytes_total,pkt_fwd,pkt_bwd,bytes_fwd,bytes_bwd,label,source_dataset
0,5.082743,4,12,3,1,12,0,0,CIC-IDS2017
1,1.448795,83,9696,37,46,2634,7062,0,CIC-IDS2017
2,61.874353,31,4120,17,14,756,3364,0,CIC-IDS2017
3,0.177451,4,176,2,2,72,104,0,CIC-IDS2017
4,0.000168,3,18,2,1,12,6,0,CIC-IDS2017


## Create TON_IoT Canonical View

TON_IoT column mappings:
- `duration` → `duration` (already in seconds)
- `src_pkts + dst_pkts` → `pkt_total`
- `src_bytes + dst_bytes` → `bytes_total`
- `src_pkts` → `pkt_fwd`
- `dst_pkts` → `pkt_bwd`
- `src_bytes` → `bytes_fwd`
- `dst_bytes` → `bytes_bwd`
- `label` → `label` (already 0/1)
- `type` → `attack_type` (categorical attack type; missing values are filled with 'normal', the expected value when label=0)
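
Note that the benign class ends up with a different name in each source: 'Normal' for UNSW-NB15, 'BENIGN' for CIC-IDS2017, and 'normal' for TON_IoT. The notebook keeps these strings as they are. If a single benign token is wanted downstream, one possible normalization is sketched below; `normalize_attack_type` and the alias set are assumptions, not something the notebook applies.

```python
# Strings treated as "no attack"; this alias set is an assumption.
BENIGN_ALIASES = {"normal", "benign"}


def normalize_attack_type(raw: str) -> str:
    """Lowercase an attack type and collapse benign aliases to 'normal'."""
    cleaned = raw.strip().lower()
    return "normal" if cleaned in BENIGN_ALIASES else cleaned


for raw in ["Normal", "BENIGN", "normal", "DoS Hulk"]:
    print(f"{raw!r} -> {normalize_attack_type(raw)!r}")
```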

In [70]:
# Preview TON_IoT data
read_sql(f"""
SELECT duration, src_pkts, dst_pkts, src_bytes, dst_bytes, label
FROM {database_name}.ton_iot_raw
LIMIT 5
""")

Unnamed: 0,duration,src_pkts,dst_pkts,src_bytes,dst_bytes,label
0,60.038574,7,16,454,1727,1
1,0.002336,1,1,43,43,1
2,3e-06,2,1,0,0,1
3,0.002804,1,1,43,43,1
4,0.005115,2,1,0,0,1


In [71]:
# TON_IoT query with canonical feature names (preview only; attack_type is added in the final CTAS query)
ton_canonical_query = f"""
SELECT
    CAST(duration AS DOUBLE) AS duration,
    CAST(src_pkts AS BIGINT) + CAST(dst_pkts AS BIGINT) AS pkt_total,
    CAST(src_bytes AS BIGINT) + CAST(dst_bytes AS BIGINT) AS bytes_total,
    CAST(src_pkts AS BIGINT) AS pkt_fwd,
    CAST(dst_pkts AS BIGINT) AS pkt_bwd,
    CAST(src_bytes AS BIGINT) AS bytes_fwd,
    CAST(dst_bytes AS BIGINT) AS bytes_bwd,
    CAST(label AS INTEGER) AS label,
    'TON_IoT' AS source_dataset
FROM {database_name}.ton_iot_raw
WHERE duration IS NOT NULL
"""

# Preview
print("TON_IoT canonical preview:")
read_sql(ton_canonical_query + " LIMIT 5")

TON_IoT canonical preview:


Unnamed: 0,duration,pkt_total,bytes_total,pkt_fwd,pkt_bwd,bytes_fwd,bytes_bwd,label,source_dataset
0,0.000225,3,0,2,1,0,0,1,TON_IoT
1,0.000698,3,0,2,1,0,0,1,TON_IoT
2,4e-06,3,0,2,1,0,0,1,TON_IoT
3,5e-06,3,0,2,1,0,0,1,TON_IoT
4,0.000304,3,0,2,1,0,0,1,TON_IoT


## Check Row Counts Before Merge

In [72]:
# Get row counts from each source table
unsw_count = read_sql(f"SELECT COUNT(*) AS cnt FROM {database_name}.unsw_nb15_raw").iloc[0, 0]
cic_count = read_sql(f"SELECT COUNT(*) AS cnt FROM {database_name}.cic_ids2017_raw").iloc[0, 0]
ton_count = read_sql(f"SELECT COUNT(*) AS cnt FROM {database_name}.ton_iot_raw").iloc[0, 0]

print(f"UNSW-NB15 rows:   {unsw_count:,}")
print(f"CIC-IDS2017 rows: {cic_count:,}")
print(f"TON_IoT rows:     {ton_count:,}")
print(f"Total expected:   {unsw_count + cic_count + ton_count:,}")

UNSW-NB15 rows:   2,540,047
CIC-IDS2017 rows: 2,830,743
TON_IoT rows:     22,339,021
Total expected:   27,709,811


## Create Merged Table Using CTAS (Create Table As Select)

We'll create a new table `merged_canonical` that combines all three datasets with unified canonical features.
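
The CTAS-with-`UNION ALL` pattern can be tried locally before running it on Athena. The sketch below uses an in-memory SQLite database as a stand-in, so the Athena `WITH (...)` storage options are absent and the types are SQLite's. The table layouts are trimmed to a few columns, and the row with a NULL duration is made up to show the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unsw_nb15_raw (dur, spkts, dpkts, label, attack_cat);
CREATE TABLE cic_ids2017_raw (flow_duration, total_fwd_packets,
                              total_backward_packets, label);

INSERT INTO unsw_nb15_raw VALUES (0.085599, 56, 58, 0, NULL);
INSERT INTO cic_ids2017_raw VALUES (239.0, 2.0, 2.0, 'BENIGN');
INSERT INTO cic_ids2017_raw VALUES (NULL, 2.0, 2.0, 'DDoS');  -- dropped by the filter

-- CREATE TABLE ... AS SELECT: column names come from the first SELECT.
CREATE TABLE merged_canonical AS
SELECT CAST(dur AS REAL) AS duration,
       CAST(spkts AS INTEGER) + CAST(dpkts AS INTEGER) AS pkt_total,
       CAST(label AS INTEGER) AS label,
       COALESCE(attack_cat, 'Normal') AS attack_type,
       'UNSW-NB15' AS source_dataset
FROM unsw_nb15_raw
WHERE dur IS NOT NULL
UNION ALL
SELECT CAST(flow_duration AS REAL) / 1000000.0,
       CAST(total_fwd_packets AS INTEGER) + CAST(total_backward_packets AS INTEGER),
       CASE WHEN UPPER(label) = 'BENIGN' THEN 0 ELSE 1 END,
       label,
       'CIC-IDS2017'
FROM cic_ids2017_raw
WHERE flow_duration IS NOT NULL;
""")

rows = conn.execute(
    "SELECT * FROM merged_canonical ORDER BY source_dataset"
).fetchall()
for row in rows:
    print(row)
```

`UNION ALL` keeps duplicate rows, which matters here because distinct flows can have identical feature values.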

In [73]:
# Write the merged table to the same bucket used for Athena staging,
# where this session already has write permissions
merged_location = f"s3://{results_bucket}/merged_canonical/"

# Clean up the S3 location and drop the table if it exists
# (boto3 is already imported at the top of the notebook)
s3_client = boto3.client('s3')

# Bucket and prefix that make up merged_location
bucket = results_bucket
prefix = "merged_canonical/"

print(f"Cleaning S3 location: {merged_location}")
try:
    # List and delete all objects in the location
    paginator = s3_client.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    
    delete_count = 0
    for page in pages:
        if 'Contents' in page:
            objects = [{'Key': obj['Key']} for obj in page['Contents']]
            if objects:
                s3_client.delete_objects(Bucket=bucket, Delete={'Objects': objects})
                delete_count += len(objects)
    
    print(f"Deleted {delete_count} objects from S3")
except Exception as e:
    print(f"Note: {e}")

# Drop the table if it exists
exec_ddl(f"DROP TABLE IF EXISTS {database_name}.merged_canonical")
print(f"Dropped existing table (if any): {database_name}.merged_canonical")
print(f"Merged table will be written to: {merged_location}")

Cleaning S3 location: s3://sagemaker-us-east-1-933747558592/merged_canonical/
Deleted 0 objects from S3
Dropped existing table (if any): aai540_eda.merged_canonical
Merged table will be written to: s3://sagemaker-us-east-1-933747558592/merged_canonical/


In [74]:
# Create the merged table using CTAS with UNION ALL
# Filter out rows with NULL in any numeric column or in label; NULL attack types are filled with a default instead
# Include both binary label and categorical attack_type
merged_ctas_query = f"""
CREATE TABLE {database_name}.merged_canonical
WITH (
    format = 'PARQUET',
    external_location = '{merged_location}',
    parquet_compression = 'SNAPPY'
) AS

-- UNSW-NB15 data (duration already in seconds)
SELECT
    CAST(dur AS DOUBLE) AS duration,
    CAST(spkts AS BIGINT) + CAST(dpkts AS BIGINT) AS pkt_total,
    CAST(sbytes AS BIGINT) + CAST(dbytes AS BIGINT) AS bytes_total,
    CAST(spkts AS BIGINT) AS pkt_fwd,
    CAST(dpkts AS BIGINT) AS pkt_bwd,
    CAST(sbytes AS BIGINT) AS bytes_fwd,
    CAST(dbytes AS BIGINT) AS bytes_bwd,
    CAST(label AS INTEGER) AS label,
    COALESCE(attack_cat, 'Normal') AS attack_type,
    'UNSW-NB15' AS source_dataset
FROM {database_name}.unsw_nb15_raw
WHERE dur IS NOT NULL
  AND spkts IS NOT NULL
  AND dpkts IS NOT NULL
  AND sbytes IS NOT NULL
  AND dbytes IS NOT NULL
  AND label IS NOT NULL

UNION ALL

-- CIC-IDS2017 data (flow_duration converted from microseconds to seconds)
SELECT
    CAST(flow_duration AS DOUBLE) / 1000000.0 AS duration,
    CAST(total_fwd_packets AS BIGINT) + CAST(total_backward_packets AS BIGINT) AS pkt_total,
    CAST(total_length_of_fwd_packets AS BIGINT) + CAST(total_length_of_bwd_packets AS BIGINT) AS bytes_total,
    CAST(total_fwd_packets AS BIGINT) AS pkt_fwd,
    CAST(total_backward_packets AS BIGINT) AS pkt_bwd,
    CAST(total_length_of_fwd_packets AS BIGINT) AS bytes_fwd,
    CAST(total_length_of_bwd_packets AS BIGINT) AS bytes_bwd,
    CASE WHEN UPPER(label) = 'BENIGN' THEN 0 ELSE 1 END AS label,
    label AS attack_type,
    'CIC-IDS2017' AS source_dataset
FROM {database_name}.cic_ids2017_raw
WHERE flow_duration IS NOT NULL
  AND total_fwd_packets IS NOT NULL
  AND total_backward_packets IS NOT NULL
  AND total_length_of_fwd_packets IS NOT NULL
  AND total_length_of_bwd_packets IS NOT NULL
  AND label IS NOT NULL

UNION ALL

-- TON_IoT data (duration already in seconds)
SELECT
    CAST(duration AS DOUBLE) AS duration,
    CAST(src_pkts AS BIGINT) + CAST(dst_pkts AS BIGINT) AS pkt_total,
    CAST(src_bytes AS BIGINT) + CAST(dst_bytes AS BIGINT) AS bytes_total,
    CAST(src_pkts AS BIGINT) AS pkt_fwd,
    CAST(dst_pkts AS BIGINT) AS pkt_bwd,
    CAST(src_bytes AS BIGINT) AS bytes_fwd,
    CAST(dst_bytes AS BIGINT) AS bytes_bwd,
    CAST(label AS INTEGER) AS label,
    COALESCE(type, 'normal') AS attack_type,
    'TON_IoT' AS source_dataset
FROM {database_name}.ton_iot_raw
WHERE duration IS NOT NULL
  AND src_pkts IS NOT NULL
  AND dst_pkts IS NOT NULL
  AND src_bytes IS NOT NULL
  AND dst_bytes IS NOT NULL
  AND label IS NOT NULL
"""

print("Creating merged table (filtering out rows with missing values)...")
print("Including both binary label and categorical attack_type...")
exec_ddl(merged_ctas_query)
print("\nMerged table created successfully!")

Creating merged table (filtering out rows with missing values)...
Including both binary label and categorical attack_type...

Merged table created successfully!


## Verify Merged Table

In [75]:
# Verify the table was created
read_sql(f"SHOW TABLES IN {database_name}")

Unnamed: 0,tab_name
0,cic_ids2017_raw
1,merged_canonical
2,ton_iot_raw
3,unsw_nb15_raw


In [76]:
# Check the schema of the merged table
read_sql(f"SHOW COLUMNS FROM {database_name}.merged_canonical")

Unnamed: 0,field
0,duration
1,pkt_total
2,bytes_total
3,pkt_fwd
4,pkt_bwd
5,bytes_fwd
6,bytes_bwd
7,label
8,attack_type
9,source_dataset


In [77]:
# Preview the merged data
read_sql(f"""
SELECT *
FROM {database_name}.merged_canonical
LIMIT 10
""")

Unnamed: 0,duration,pkt_total,bytes_total,pkt_fwd,pkt_bwd,bytes_fwd,bytes_bwd,label,attack_type,source_dataset
0,0.000138,2,12,1,1,6,6,0,BENIGN,CIC-IDS2017
1,1.492408,75,9404,35,40,2686,6718,0,BENIGN,CIC-IDS2017
2,5.52526,15,3467,9,6,391,3076,0,BENIGN,CIC-IDS2017
3,5.525273,13,3431,8,5,384,3047,0,BENIGN,CIC-IDS2017
4,62.525867,27,4197,15,12,743,3454,0,BENIGN,CIC-IDS2017
5,62.728852,28,4123,15,13,765,3358,0,BENIGN,CIC-IDS2017
6,116.343123,58,23981,27,31,822,23159,0,BENIGN,CIC-IDS2017
7,61.466432,60,37862,25,35,978,36884,0,BENIGN,CIC-IDS2017
8,62.505362,70,47484,29,41,1063,46421,0,BENIGN,CIC-IDS2017
9,0.639691,2,187,1,1,56,131,0,BENIGN,CIC-IDS2017


In [78]:
# Get total row count
merged_count = read_sql(f"SELECT COUNT(*) AS total_rows FROM {database_name}.merged_canonical")
print(f"Total rows in merged table: {merged_count.iloc[0, 0]:,}")

Total rows in merged table: 26,708,942


In [79]:
# Get row counts by source dataset
source_counts = read_sql(f"""
SELECT 
    source_dataset,
    COUNT(*) AS row_count,
    SUM(label) AS attack_count,
    COUNT(*) - SUM(label) AS normal_count
FROM {database_name}.merged_canonical
GROUP BY source_dataset
ORDER BY source_dataset
""")
print("Rows by source dataset:")
source_counts

Rows by source dataset:


Unnamed: 0,source_dataset,row_count,attack_count,normal_count
0,CIC-IDS2017,2830743,557646,2273097
1,TON_IoT,21338152,20556114,782038
2,UNSW-NB15,2540047,321283,2218764


In [80]:
# Get summary statistics for canonical features
stats = read_sql(f"""
SELECT
    source_dataset,
    COUNT(*) AS count,
    AVG(duration) AS avg_duration_sec,
    AVG(pkt_total) AS avg_packets,
    AVG(bytes_total) AS avg_bytes,
    AVG(CAST(label AS DOUBLE)) AS attack_ratio
FROM {database_name}.merged_canonical
GROUP BY source_dataset
ORDER BY source_dataset
""")
print("Summary statistics by source:")
stats

Summary statistics by source:


Unnamed: 0,source_dataset,count,avg_duration_sec,avg_packets,avg_bytes,attack_ratio
0,CIC-IDS2017,2830743,14.785664,19.75493,16711.94,0.196996
1,TON_IoT,21338152,9.017242,7.686222,1787877.0,0.96335
2,UNSW-NB15,2540047,0.658792,76.015484,40767.19,0.126487


## Verify Unit Consistency

Check that duration values are on a comparable scale across datasets; after conversion, all three should be in seconds.

In [81]:
# Check duration distribution by source to verify unit conversion
duration_stats = read_sql(f"""
SELECT
    source_dataset,
    MIN(duration) AS min_duration_sec,
    APPROX_PERCENTILE(duration, 0.25) AS p25_duration_sec,
    APPROX_PERCENTILE(duration, 0.50) AS median_duration_sec,
    APPROX_PERCENTILE(duration, 0.75) AS p75_duration_sec,
    MAX(duration) AS max_duration_sec,
    AVG(duration) AS avg_duration_sec
FROM {database_name}.merged_canonical
GROUP BY source_dataset
ORDER BY source_dataset
""")
print("Duration statistics by source (all in seconds):")
duration_stats

Duration statistics by source (all in seconds):


Unnamed: 0,source_dataset,min_duration_sec,p25_duration_sec,median_duration_sec,p75_duration_sec,max_duration_sec,avg_duration_sec
0,CIC-IDS2017,-1.3e-05,0.000149,0.036974,3.456199,119.999998,14.785664
1,TON_IoT,0.0,3e-06,0.000584,0.306391,93516.92917,9.017242
2,UNSW-NB15,0.0,0.001019,0.014647,0.214727,8786.637695,0.658792
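
The statistics above include a negative minimum for CIC-IDS2017 (-1.3e-05 s) and a TON_IoT maximum above 93,000 s, so a simple range check may be worth running before modeling. A minimal sketch, assuming a hypothetical helper `duration_issues` and an arbitrary one-day upper bound:

```python
def duration_issues(durations, max_plausible_sec=86_400.0):
    """Count durations that are negative or exceed an assumed upper bound."""
    negative = sum(1 for d in durations if d < 0)
    too_long = sum(1 for d in durations if d > max_plausible_sec)
    return {"negative": negative, "too_long": too_long}


# Min and max values per source, taken from the statistics above.
extremes = [-1.3e-05, 119.999998, 0.0, 93516.92917, 0.0, 8786.637695]
issues = duration_issues(extremes)
print(issues)
```

Whether to drop, clip, or keep such rows is a modeling decision that this notebook does not make.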
