# Temporal Context & Time-Based Baselining

## Phase
Phase 3 - Context & Baselining

## Objective
Introduce time-based behavioral context into email telemetry by:
- Flagging each event as business-hours or off-hours activity
- Modeling a typical active hour per recipient (the most frequent hour of that recipient's email events)

Later detection logic can use these fields to reason about abnormal timing.


In [1]:
import pandas as pd
import numpy as np
from pathlib import Path


In [2]:
PROJECT_ROOT = Path(r"D:\soc-dashboard-suite-main\soc-dashboard-suite-main")

INPUT_PATH = PROJECT_ROOT / "data" / "enriched" / "email_with_identity_context.csv"
OUTPUT_PATH = PROJECT_ROOT / "data" / "enriched" / "email_with_temporal_context.csv"

email_df = pd.read_csv(INPUT_PATH, parse_dates=["event_time"])

email_df.head()


Unnamed: 0,event_time,sender_email,sender_domain,recipient_email,recipient_domain,subject,message_id,event_type,ingested_at,domain_rarity,email_count_x,first_seen_time_x,first_seen_time_y,first_seen_time,is_first_seen_day,recipient_user,user_role,email_count_y,email_count,user_volume_band
0,2001-05-14 16:39:00-07:00,phillip.allen@enron.com,enron.com,tim.belden@enron.com,enron.com,,allen-p/_sent_mail/1.,email_event,2026-01-31 05:05:10.685001,common,409084.0,1979-12-31 16:00:00-08:00,1998-05-27 08:31:00-07:00,1998-05-27 08:31:00-07:00,False,tim.belden@enron.com,normal,397,397,low_volume_user
1,2001-05-04 13:51:00-07:00,phillip.allen@enron.com,enron.com,john.lavorato@enron.com,enron.com,Re:,allen-p/_sent_mail/10.,email_event,2026-01-31 05:05:10.685001,common,409084.0,1979-12-31 16:00:00-08:00,1998-05-27 08:31:00-07:00,1998-05-27 08:31:00-07:00,False,john.lavorato@enron.com,normal,1481,1481,medium_volume_user
2,2000-10-18 03:00:00-07:00,phillip.allen@enron.com,enron.com,leah.arsdall@enron.com,enron.com,Re: test,allen-p/_sent_mail/100.,email_event,2026-01-31 05:05:10.685001,common,409084.0,1979-12-31 16:00:00-08:00,1998-05-27 08:31:00-07:00,1998-05-27 08:31:00-07:00,False,leah.arsdall@enron.com,normal,11,11,low_volume_user
3,2000-10-23 06:13:00-07:00,phillip.allen@enron.com,enron.com,randall.gay@enron.com,enron.com,,allen-p/_sent_mail/1000.,email_event,2026-01-31 05:05:10.685001,common,409084.0,1979-12-31 16:00:00-08:00,1998-05-27 08:31:00-07:00,1998-05-27 08:31:00-07:00,False,randall.gay@enron.com,normal,48,48,low_volume_user
4,2000-08-31 05:07:00-07:00,phillip.allen@enron.com,enron.com,greg.piper@enron.com,enron.com,Re: Hello,allen-p/_sent_mail/1001.,email_event,2026-01-31 05:05:10.685001,common,409084.0,1979-12-31 16:00:00-08:00,1998-05-27 08:31:00-07:00,1998-05-27 08:31:00-07:00,False,greg.piper@enron.com,normal,186,186,low_volume_user


In [3]:
email_df["hour"] = email_df["event_time"].dt.hour
email_df["day_of_week"] = email_df["event_time"].dt.dayofweek  # Monday=0
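
The sample rows above carry explicit UTC offsets (`-07:00` in `event_time`, `-08:00` in `first_seen_time_x`). If `event_time` mixes offsets across the year, pandas may not give the column a single datetime dtype, and the extracted hour depends on which offset each row was written with. A minimal sketch, assuming the intent is local wall-clock time and that `America/Los_Angeles` is the right zone (an assumption, not something this notebook states): parse to UTC first, then convert to one zone before extracting `hour` and `day_of_week`.

```python
import pandas as pd

# Toy timestamps with mixed UTC offsets, mirroring the shape of event_time.
raw = pd.Series(["2001-05-14 16:39:00-07:00", "2000-12-01 09:15:00-08:00"])

# Parse everything to UTC so the column gets a single tz-aware dtype.
as_utc = pd.to_datetime(raw, utc=True)

# Convert to one local zone (assumed here) before extracting clock features.
local = as_utc.dt.tz_convert("America/Los_Angeles")

features = pd.DataFrame({
    "hour": local.dt.hour,
    "day_of_week": local.dt.dayofweek,  # Monday=0
})
print(features)
```

The same two calls could be applied to `email_df["event_time"]` in place of the toy series.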


In [4]:
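# Business hours: Monday-Friday, 08:00 through 18:59.
# Note that `<= 18` includes the whole 18:00 hour.
email_df["is_business_hours"] = (
    (email_df["hour"] >= 8) &
    (email_df["hour"] <= 18) &
    (email_df["day_of_week"] < 5)
)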
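
The window above is hard-coded. As a sketch only, the same rule can be wrapped in a small helper so the boundaries are explicit and easy to test. The name `flag_business_hours` and its parameters are hypothetical, not part of this notebook.

```python
import pandas as pd

def flag_business_hours(hour, day_of_week, start_hour=8, end_hour=18):
    """True for weekday events whose hour falls in [start_hour, end_hour]."""
    in_window = hour.between(start_hour, end_hour)  # inclusive on both ends
    is_weekday = day_of_week < 5                    # Monday=0 ... Friday=4
    return in_window & is_weekday

# Boundary cases: just before, at, and just after the window, plus a weekend.
demo = pd.DataFrame({
    "hour":        [7, 8, 18, 19, 10],
    "day_of_week": [0, 0, 4, 4, 5],
})
demo["is_business_hours"] = flag_business_hours(demo["hour"], demo["day_of_week"])
print(demo)
```

With the defaults, this reproduces the rule used in the cell above: hours 8 and 18 on a weekday are inside the window, while hour 19 and any Saturday event are outside.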


In [5]:
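# Most frequent hour per recipient. Series.mode() returns its values sorted,
# so a tie resolves to the earliest of the tied hours.
def first_mode(hours):
    modes = hours.mode()
    return modes.iloc[0] if not modes.empty else np.nan

user_hour_mode = (
    email_df.groupby("recipient_user")["hour"]
    .agg(first_mode)
    .reset_index(name="typical_active_hour")
)

# Attach each recipient's typical hour to every one of their events.
email_df = email_df.merge(user_hour_mode, on="recipient_user", how="left")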
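
A single modal hour can be fragile for recipients with few events (the sample output includes one with an `email_count` of 11) or with several equally frequent hours. The sketch below, on toy data, records the number of tied modes and the number of events next to the modal hour so that weak baselines can be identified later. The column names `n_tied_modes`, `n_events`, and `is_ambiguous` are illustrative assumptions.

```python
import pandas as pd

demo = pd.DataFrame({
    "recipient_user": ["a", "a", "a", "b", "b"],
    "hour":           [9, 9, 14, 8, 17],
})

baseline = demo.groupby("recipient_user")["hour"].agg(
    typical_active_hour=lambda h: h.mode().iloc[0],  # earliest hour on a tie
    n_tied_modes=lambda h: len(h.mode()),            # >1 means the mode is ambiguous
    n_events="size",                                 # support behind the baseline
)

# Flag baselines where more than one hour shares the top frequency.
baseline["is_ambiguous"] = baseline["n_tied_modes"] > 1
print(baseline)
```

Here user `b` has two events at different hours, so both hours tie and the reported typical hour (8) is simply the earlier one.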


In [6]:
email_df["hour_deviation"] = abs(email_df["hour"] - email_df["typical_active_hour"])
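
The deviation above is a linear difference, so hour 23 and hour 1 are treated as 22 hours apart even though they are 2 hours apart on the clock. A possible alternative, shown as a sketch rather than as this notebook's method, is a circular distance on the 24-hour clock. The helper name `circular_hour_distance` is hypothetical.

```python
import numpy as np
import pandas as pd

def circular_hour_distance(hour, typical_hour):
    """Shortest distance between two hours on a 24-hour clock (0 to 12)."""
    diff = (hour - typical_hour).abs() % 24
    return np.minimum(diff, 24 - diff)

demo = pd.DataFrame({
    "hour":                [23, 1, 12, 9],
    "typical_active_hour": [1, 23, 0, 9],
})
demo["hour_deviation_linear"] = (demo["hour"] - demo["typical_active_hour"]).abs()
demo["hour_deviation_circular"] = circular_hour_distance(
    demo["hour"], demo["typical_active_hour"]
)
print(demo)
```

For the classification in this notebook the choice does not change any label: deviation is only checked for events between hours 8 and 18 against a threshold of 4, and within that range the linear and circular distances can only disagree when both exceed 4. The stored `hour_deviation` values would differ, though, which matters if a later phase uses the column directly.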


In [7]:
def classify_time_behavior(row):
    if not row["is_business_hours"]:
        return "off_hours"
    elif row["hour_deviation"] > 4:
        return "unusual_hour"
    else:
        return "normal_time"

email_df["time_behavior"] = email_df.apply(classify_time_behavior, axis=1)
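
In the function above, the off-hours check takes precedence: an event outside business hours is labeled `off_hours` regardless of its deviation, and a missing `hour_deviation` falls through to `normal_time`. Row-wise `apply` can be slow on a frame of this size, so here is a vectorized sketch of the same rules with `np.select`, demonstrated on toy data.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "is_business_hours": [False, True, True, True],
    "hour_deviation":    [10.0, 6.0, 2.0, np.nan],
})

# Conditions are evaluated in order; the first match wins,
# which preserves the precedence of the if/elif chain.
conditions = [
    ~demo["is_business_hours"],
    demo["hour_deviation"] > 4,  # NaN > 4 is False, so NaN falls to the default
]
labels = ["off_hours", "unusual_hour"]

demo["time_behavior"] = np.select(conditions, labels, default="normal_time")
print(demo)
```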


In [8]:
email_df.to_csv(OUTPUT_PATH, index=False)
print("Saved temporal-enriched dataset to:", OUTPUT_PATH)


Saved temporal-enriched dataset to: D:\soc-dashboard-suite-main\soc-dashboard-suite-main\data\enriched\email_with_temporal_context.csv


In [9]:
email_df["time_behavior"].value_counts()


time_behavior
off_hours       377763
normal_time      78630
unusual_hour     39161
Name: count, dtype: int64
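
Raw counts are easier to interpret as shares. The sketch below recomputes proportions from the counts printed above; on the full frame, `email_df["time_behavior"].value_counts(normalize=True)` would give the same figures directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series(
    {"off_hours": 377763, "normal_time": 78630, "unusual_hour": 39161},
    name="count",
)

shares = (counts / counts.sum()).round(3)
print(shares)
```

Roughly 76% of events land in `off_hours`. A share that large may be worth checking against the business-hours window and the timestamp handling before the label feeds detection logic.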