# Enron Email Crime Injection — Synthetic Data Generation

This notebook uses NeMo Data Designer to:
1. Ingest clean Enron emails from HuggingFace
2. Inject malicious actions from one of 5 crime categories into roughly half of the emails using Claude Sonnet
3. Evaluate generated data quality with Nemotron-3-Nano-30B-A3B
4. Save the labeled dataset for fine-tuning a malicious email agent detector

In [None]:
import data_designer.config as dd
from data_designer.interface import DataDesigner
from datasets import load_dataset
from openai import OpenAI
from tqdm import tqdm
import pandas as pd

## Load & Prepare Enron Emails

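Load the `corbt/enron-emails` dataset from HuggingFace, sample 500 emails, drop any that are 50 characters or shorter, and save the rest as a seed CSV. The seed file can therefore hold fewer than 500 rows.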

In [None]:
# Load Enron email dataset
enron = load_dataset("corbt/enron-emails", split="train")

# Sample 500 emails and convert to DataFrame
enron_sample = enron.shuffle(seed=42).select(range(500))
enron_df = enron_sample.to_pandas()

# Keep only the email text column, renamed for clarity.
# This assumes the dataset exposes a 'text' column with the full email content;
# inspect enron_df.columns and adjust the name if the selection raises a KeyError.
enron_df = enron_df[["text"]].rename(columns={"text": "original_email"})

# Drop any rows with empty or very short emails
enron_df = enron_df[enron_df["original_email"].str.len() > 50].reset_index(drop=True)

# Save as seed CSV
seed_path = "enron_seed_emails.csv"
enron_df.to_csv(seed_path, index=False)

print(f"Seed dataset: {len(enron_df)} emails saved to {seed_path}")
enron_df.head()

## Configure Data Designer

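Configure Claude Sonnet, served through the Anthropic provider (referenced here as `claude`), as the email generation model, then attach the seed CSV from the previous step.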

In [None]:
model_configs = [
    dd.ModelConfig(
        alias="sonnet",
        model="claude-sonnet-4-6",
        provider="claude",
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=0.85,
            top_p=0.95,
            max_tokens=4096,
        ),
    ),
]

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

# Seed with Enron emails
seed_source = dd.LocalFileSeedSource(path=seed_path)
config_builder.with_seed_dataset(seed_source)

print("Data Designer initialized with Enron seed dataset")

## Define Metadata Columns

In [None]:
EMAIL_CRIMES = [
    "email_pii_exfiltration",
    "unauthorized_external_recipient",
    "identity_impersonation",
    "credential_solicitation",
    "bulk_data_forwarding",
]

# Crime category — sampled uniformly across the 5 types
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="crime_name",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=EMAIL_CRIMES),
    )
)

# Binary label: 1 = malicious (crime injected), 0 = clean (original email)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="is_malicious",
        sampler_type=dd.SamplerType.BERNOULLI,
        params=dd.BernoulliSamplerParams(p=0.5),
    )
)
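
To get a feel for what these two sampler columns produce, the sketch below imitates them with Python's standard `random` module. It does not call Data Designer: `sample_metadata` and its seed are hypothetical, and it assumes the category sampler draws uniformly when no weights are given, as the comment in the cell above states.

```python
import random
from collections import Counter

EMAIL_CRIMES = [
    "email_pii_exfiltration",
    "unauthorized_external_recipient",
    "identity_impersonation",
    "credential_solicitation",
    "bulk_data_forwarding",
]


def sample_metadata(n, p_malicious=0.5, seed=0):
    """Stand-in for the two sampler columns: uniform category + Bernoulli label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rows.append(
            {
                "crime_name": rng.choice(EMAIL_CRIMES),
                "is_malicious": 1 if rng.random() < p_malicious else 0,
            }
        )
    return rows


rows = sample_metadata(500)
label_counts = Counter(r["is_malicious"] for r in rows)
# Only malicious rows actually use their crime_name
crime_counts = Counter(r["crime_name"] for r in rows if r["is_malicious"] == 1)
print(label_counts)
print(crime_counts)
```

With 500 records and `p=0.5`, about 250 rows end up malicious, which leaves roughly 50 examples per crime category. A `crime_name` is sampled for benign rows as well, but it carries no meaning there.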

## Define LLM Generation Column — Crime Injection

Claude Sonnet takes each original Enron email and either:
- **is_malicious=1**: Subtly injects the assigned crime into the email while preserving style/tone
- **is_malicious=0**: Returns the original email as-is

In [None]:
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="classified_output",
        model_alias="sonnet",
        system_prompt="""You are a synthetic data generator creating training examples for a classifier that detects malicious actions in corporate emails.

You will receive a real corporate email and must modify it according to the instructions.

{% if crime_name == "email_pii_exfiltration" %}
Crime — email_pii_exfiltration: The sender subtly embeds or forwards personally identifiable information (SSNs, account numbers, home addresses, salary data) to unauthorized parties or external addresses.
{% elif crime_name == "unauthorized_external_recipient" %}
Crime — unauthorized_external_recipient: The sender adds an external or unauthorized email address to the CC/BCC/To field, forwarding internal-only information outside the organization.
{% elif crime_name == "identity_impersonation" %}
Crime — identity_impersonation: The sender impersonates another employee (different name, title, or authority) to manipulate the recipient into taking action they otherwise wouldn't.
{% elif crime_name == "credential_solicitation" %}
Crime — credential_solicitation: The sender crafts the email to trick the recipient into revealing passwords, login credentials, API keys, or other authentication secrets.
{% elif crime_name == "bulk_data_forwarding" %}
Crime — bulk_data_forwarding: The sender attaches or forwards large volumes of internal data (spreadsheets, database exports, customer lists, financial reports) to external recipients or personal accounts.
{% endif %}

{% if is_malicious == 1 %}
TASK: Rewrite the email below to subtly inject the crime "{{ crime_name }}" into it.
- Preserve the original email's tone, style, subject matter, and formatting as much as possible.
- The malicious action should be woven naturally into the email — it should look plausible, not exaggerated or obvious.
{% else %}
TASK: Return the email below exactly as-is. Do not modify it in any way.
{% endif %}

OUTPUT FORMAT: You must output two sections separated by the exact delimiter shown below.

SECTION 1 — The email (modified or original).
Then on its own line: ---REASONING---
SECTION 2 — A concise explanation (2-4 sentences) of what was changed and why the modified email does or does not match the crime category. Reference specific parts of the email. If the task was to return the email as-is, confirm the email was left unchanged.

Do NOT add any other labels, headers, or commentary.""",
        prompt="""Original email:

{{ original_email }}""",
    )
)

## Define Validation Column

Check that each output contains the `---REASONING---` delimiter and that the email part before it is longer than 30 characters and contains sentence punctuation.

In [None]:
def validate_email_format(df: pd.DataFrame) -> pd.DataFrame:
    results = []
    for _, row in df.iterrows():
        output = str(row.get("classified_output", ""))
        has_delimiter = "---REASONING---" in output
        parts = output.split("---REASONING---", 1)
        email_part = parts[0].strip() if parts else ""
        has_content = len(email_part) > 30
        has_sentences = "." in email_part or "?" in email_part or "!" in email_part
        is_valid = has_content and has_sentences and has_delimiter
        error = None
        if not has_delimiter:
            error = "Missing ---REASONING--- delimiter"
        elif not has_content:
            error = f"Email too short (len={len(email_part)})"
        elif not has_sentences:
            error = "Email lacks sentence structure"
        results.append({"is_valid": is_valid, "error": error})
    return pd.DataFrame(results)


config_builder.add_column(
    dd.ValidationColumnConfig(
        name="format_valid",
        target_columns=["classified_output"],
        validator_type=dd.ValidatorType.LOCAL_CALLABLE,
        validator_params=dd.LocalCallableValidatorParams(
            validation_function=validate_email_format,
        ),
    )
)
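
The three checks in `validate_email_format` can be tried on plain strings before running the pipeline. The sketch below restates the same rules without pandas; `check_output` and the sample strings are illustrative and not part of Data Designer.

```python
DELIMITER = "---REASONING---"


def check_output(output):
    """Apply the same three checks as validate_email_format to one string."""
    has_delimiter = DELIMITER in output
    email_part = output.split(DELIMITER, 1)[0].strip()
    has_content = len(email_part) > 30
    has_sentences = any(ch in email_part for ch in ".?!")
    if not has_delimiter:
        return False, "Missing ---REASONING--- delimiter"
    if not has_content:
        return False, f"Email too short (len={len(email_part)})"
    if not has_sentences:
        return False, "Email lacks sentence structure"
    return True, None


samples = [
    "Please send the Q3 figures to my assistant by Friday.\n---REASONING---\nUnchanged.",
    "Please send the Q3 figures to my assistant by Friday.",
    "ok.\n---REASONING---\nToo short to be an email.",
]
for text in samples:
    print(check_output(text))
```

The first sample passes, the second fails on the missing delimiter, and the third fails on length. These checks only cover format; they say nothing about whether the injected crime is plausible.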

## Preview

Generate a small sample to inspect quality before full generation.

In [None]:
preview = data_designer.preview(config_builder=config_builder, num_records=3)
preview.display_sample_record()

## Full Generation

Generate the full 500-record dataset.

In [None]:
results = data_designer.create(
    config_builder=config_builder,
    num_records=500,
    dataset_name="enron-crime-injection",
)

dataset = results.load_dataset()
print(f"Generated {len(dataset)} records")
dataset.head()

In [None]:
# Analysis report
analysis = results.load_analysis()
analysis.to_report()

# Split classified_output into modified_email and reasoning
def split_output(text):
    text = str(text)
    if "---REASONING---" in text:
        parts = text.split("---REASONING---", 1)
        return parts[0].strip(), parts[1].strip()
    return text.strip(), ""

dataset[["modified_email", "reasoning"]] = dataset["classified_output"].apply(
    lambda x: pd.Series(split_output(x))
)
dataset = dataset.drop(columns=["classified_output"])

# Save classification dataset immediately
CLASSIFICATION_PATH = "enron_crime_classified.parquet"
dataset.to_parquet(CLASSIFICATION_PATH, index=False)
dataset.to_csv("enron_crime_classified.csv", index=False)

print(f"Classification dataset saved: {len(dataset)} records")
print(f"  - {CLASSIFICATION_PATH}")
print(f"  - enron_crime_classified.csv")
print(f"\nLabel distribution:")
print(dataset["is_malicious"].value_counts())
print(f"\nCrime distribution (malicious only):")
print(dataset[dataset["is_malicious"] == 1]["crime_name"].value_counts())
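
For `is_malicious=0` rows the prompt asks the model to return the email verbatim, but nothing above verifies that it did. A minimal sketch of such a check, using only the standard library, follows. `normalize`, `is_unchanged`, and the 0.98 similarity threshold are hypothetical choices, not part of the notebook.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace so line-wrapping differences are ignored."""
    return " ".join(text.split())


def is_unchanged(original, returned, min_ratio=0.98):
    """True when the returned email equals the original up to whitespace,
    or is at least `min_ratio` similar by difflib's sequence ratio."""
    a, b = normalize(original), normalize(returned)
    if a == b:
        return True
    # autojunk=False keeps the ratio meaningful for long emails
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return matcher.ratio() >= min_ratio


original = "Hi Mark,\n\nCan you send me the draft contract by Friday?\n\nThanks,\nSara"
same = "Hi Mark,\nCan you send me the draft contract by Friday?\nThanks,\nSara"
edited = original.replace("by Friday?", "by Friday? Please cc jdoe@example.com.")

print(is_unchanged(original, same))
print(is_unchanged(original, edited))
```

Benign rows that fail this check could be reset to `original_email` or dropped before training, so that the negative class contains only genuine emails. `SequenceMatcher` is quadratic in the worst case, so very long emails may need a cheaper comparison.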

## Seed Quality Judge

Load the classification dataset and score each generated email with a self-hosted Nemotron-3-Nano-30B-A3B judge model.
Only malicious samples (is_malicious=1) are scored — benign samples skip the LLM entirely.
The model's native reasoning toggle provides chain-of-thought via `reasoning_content`.
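
The scoring cell below skips any email pair whose combined length exceeds a fixed character limit. One way to arrive at such a limit is sketched here. The ratio of 3.5 characters per token is a rough assumption for English text, not a measured property of the Nemotron tokenizer, and `char_budget` is a hypothetical helper.

```python
def char_budget(context_tokens, reserved_output_tokens, chars_per_token=3.5):
    """Approximate how many input characters fit once output tokens are reserved."""
    input_tokens = context_tokens - reserved_output_tokens
    if input_tokens <= 0:
        raise ValueError("reserved output exceeds the context window")
    return int(input_tokens * chars_per_token)


# 32K-token window with 4096 tokens reserved for the judge's reply
print(char_budget(32 * 1024, 4096))  # 100352
```

The judge's system prompt also consumes part of the window, so the limit may need lowering if the endpoint rejects requests for length.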

In [None]:
import json
from openai import OpenAI
from tqdm import tqdm
import pandas as pd

# Load classification dataset (can run independently from generation above)
CLASSIFICATION_PATH = "enron_crime_classified.parquet"
dataset = pd.read_parquet(CLASSIFICATION_PATH)
print(f"Loaded classification dataset: {len(dataset)} records from {CLASSIFICATION_PATH}")

REWARD_MODEL_ENDPOINT = "http://localhost:8000/v1"  # <-- adjust to your endpoint
REWARD_MODEL_NAME = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4"  # <-- adjust to match your served model name

MAX_CHAR_LEN = 100000  # rough character budget for a 32K-token context

reward_client = OpenAI(base_url=REWARD_MODEL_ENDPOINT, api_key="unused")

JUDGE_SYSTEM_PROMPT = """You are a quality evaluator for seed data used in synthetic data generation pipelines.

This modified email will be used as a SEED EXAMPLE to generate further synthetic training data for fine-tuning a malicious email agent detector. Your job is to evaluate whether it is a high-quality seed.

The email was modified to inject one of these crime categories:
- email_pii_exfiltration: Sender embeds/forwards PII (SSNs, account numbers, addresses, salary data) to unauthorized parties.
- unauthorized_external_recipient: Sender adds external/unauthorized CC/BCC/To, forwarding internal info outside the organization.
- identity_impersonation: Sender impersonates another employee to manipulate the recipient.
- credential_solicitation: Sender tricks recipient into revealing passwords, API keys, or credentials.
- bulk_data_forwarding: Sender attaches/forwards large volumes of internal data to external recipients.

Evaluate as a SEED for downstream SDG:
1. Is the crime injection clear and unambiguous enough to serve as a reliable positive example?
2. Is the email realistic enough to be a good template for generating diverse variations?
3. Is the sample distinctive — not generic boilerplate that would produce repetitive downstream data?
4. Does it preserve natural corporate email tone, making it a believable reference?

Respond with ONLY a JSON object:
{"quality_score": <float between 0.0 and 1.0>}

Scoring guide:
- 1.0: Excellent seed — crime is clear, email is realistic and distinctive, will drive diverse high-quality downstream generation
- 0.7-0.9: Good seed — usable with minor issues (slightly generic, or crime could be clearer)
- 0.4-0.6: Mediocre seed — crime is ambiguous or email is too generic to produce diverse variations
- 0.1-0.3: Poor seed — crime is unclear, email is unrealistic, or too similar to boilerplate
- 0.0: Unusable — no discernible crime, incoherent, or would poison downstream generation"""


def get_judge_score(system_prompt: str, user_content: str) -> dict:
    """Score seed quality via the Nemotron Nano judge model with native reasoning."""
    response = reward_client.chat.completions.create(
        model=REWARD_MODEL_NAME,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        temperature=0.0,
        max_tokens=4096,
    )
    msg = response.choices[0].message
    reasoning = getattr(msg, "reasoning_content", None) or ""
    text = msg.content or ""
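    try:
        cleaned = text.strip()
        if cleaned.startswith("```"):
            # Strip a Markdown code fence wrapped around the JSON
            cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]
        result = json.loads(cleaned)
        # A missing key or a non-object payload raises and is handled below
        raw_score = float(result["quality_score"])
        if raw_score != raw_score:  # NaN check; min()/max() would turn NaN into 1.0
            raise ValueError("quality_score is NaN")
        score = max(0.0, min(1.0, raw_score))
        return {"reward_reasoning": reasoning, "reward_score": score}
    except (json.JSONDecodeError, ValueError, TypeError, KeyError, IndexError):
        return {"reward_reasoning": reasoning or text, "reward_score": float("nan")}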


# Filter out entries that exceed the context limit
scoreable = dataset[
    (dataset["original_email"].str.len() + dataset["modified_email"].str.len()) < MAX_CHAR_LEN
].copy()
skipped = len(dataset) - len(scoreable)
print(f"Evaluating {len(scoreable)} emails ({skipped} skipped — exceeded {MAX_CHAR_LEN} char limit)")

# Split: only score malicious samples with the judge
malicious = scoreable[scoreable["is_malicious"] == 1].copy()
benign = scoreable[scoreable["is_malicious"] == 0].copy()

# Benign samples get default scores — no LLM call
benign["reward_score"] = 1.0
benign["reward_reasoning"] = "Benign sample — no injection expected"

# Score malicious samples
reward_results = []
for _, row in tqdm(malicious.iterrows(), total=len(malicious), desc="Judging with Nemotron Nano"):
    user_msg = (
        f"Crime category: {row['crime_name']}\n\n"
        f"Original email:\n{row['original_email']}\n\n"
        f"Modified email:\n{row['modified_email']}"
    )
    result = get_judge_score(JUDGE_SYSTEM_PROMPT, user_msg)
    reward_results.append(result)

malicious["reward_score"] = [r["reward_score"] for r in reward_results]
malicious["reward_reasoning"] = [r["reward_reasoning"] for r in reward_results]

# Recombine scored malicious + benign
scored = pd.concat([malicious, benign]).sort_index()

# Merge scores back into full dataset — skipped entries get NaN
dataset = dataset.merge(
    scored[["reward_score", "reward_reasoning"]],
    left_index=True,
    right_index=True,
    how="left",
)

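print(f"\nSeed quality scores (including benign defaults of 1.0): "
      f"mean {dataset['reward_score'].mean():.3f}, "
      f"std {dataset['reward_score'].std():.3f}")
print(f"Entries with scores: {dataset['reward_score'].notna().sum()}")
# NaN covers both entries skipped for length and judge replies that failed to parse
print(f"Entries without scores (skipped or unparseable): {dataset['reward_score'].isna().sum()}")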
print(f"\nMalicious samples only:")
print(dataset[dataset["is_malicious"] == 1]["reward_score"].describe())
print(f"\nBy crime:")
print(dataset[dataset["is_malicious"] == 1].groupby("crime_name")["reward_score"].describe())
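
Clamping a score with `min`/`max` does not behave intuitively for NaN, which matters when a judge reply omits `quality_score` or returns `NaN`. The sketch below shows the pitfall and a standalone parser that treats such replies as unscored. `parse_quality_score` is a hypothetical helper, separate from `get_judge_score`.

```python
import json
import math

FENCE = "`" * 3  # Markdown code fence marker


def parse_quality_score(text):
    """Return a score in [0, 1], or NaN when the reply has no usable score."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop a surrounding Markdown fence, with or without a language tag
        _, _, rest = cleaned.partition("\n")
        cleaned = rest.rsplit(FENCE, 1)[0]
    try:
        value = float(json.loads(cleaned)["quality_score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return math.nan
    if math.isnan(value):
        return math.nan
    return max(0.0, min(1.0, value))


# The pitfall: comparisons with NaN are False, so min() keeps its first argument
print(min(1.0, math.nan))  # 1.0

print(parse_quality_score('{"quality_score": 0.8}'))  # 0.8
print(parse_quality_score('{"quality_score": 1.7}'))  # 1.0
print(parse_quality_score('{"score": 0.8}'))          # nan
```

Returning NaN for unusable replies keeps them visible in `reward_score.isna()` instead of silently counting them as perfect seeds.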

## Save Scored Dataset

In [None]:
# Save as Parquet (efficient columnar format)
dataset.to_parquet("enron_crime_injected.parquet", index=False)

# Save as CSV
dataset.to_csv("enron_crime_injected.csv", index=False)

print(f"Dataset saved: {len(dataset)} records")
print(f"  - enron_crime_injected.parquet")
print(f"  - enron_crime_injected.csv")
print(f"\nLabel distribution:")
print(dataset["is_malicious"].value_counts())
print(f"\nCrime distribution:")
print(dataset[dataset["is_malicious"] == 1]["crime_name"].value_counts())