# Document Crime Injection — Synthetic Data Generation

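This notebook uses NeMo Data Designer to:
1. Ingest clean legal contracts (CUAD) and PII entities (nvidia/Nemotron-PII)
2. Label each contract malicious or benign and, for malicious ones, inject one of 5 document crimes using Claude Sonnet
3. Generate reasoning for each modification (or confirm that a benign document was left unchanged)
4. Score the malicious samples with a self-hosted Nemotron-3-Nano-30B-A3B judge
5. Save the labeled seed dataset
6. Expand the high-quality seeds into a larger fine-tuning dataset for a malicious document agent detector (Stage 2)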

In [None]:
import data_designer.config as dd
from data_designer.interface import DataDesigner
from datasets import load_dataset
from openai import OpenAI
from tqdm import tqdm
import pandas as pd
import json
import random

## Configuration

In [None]:
NUM_SAMPLES = 500  # <-- adjust this to change the number of data points

DOC_CRIMES = [
    "unauthorized_clause_insertion",
    "pii_embedding",
    "template_deviation",
    "confidential_data_disclosure",
    "document_type_violation",
]

## Load & Prepare Seed Data

- **CUAD**: Real legal contracts — deduplicated by title to get unique documents
- **nvidia/Nemotron-PII**: Realistic PII entities to use for `pii_embedding` injection
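
`cuad-qa` is a question-answering dataset, so each contract's text is repeated across many rows (one per question). The toy frame below, with invented titles and texts, runs the same deduplicate, rename, and length-filter steps as the next cell to show how the repeats collapse to one row per contract and how the 200-character filter drops stubs:

```python
import pandas as pd

# Toy stand-in for cuad-qa: one row per (contract, question) pair.
# Titles, texts, and questions are invented for illustration.
rows = [
    {"title": "ACME_SUPPLY_AGREEMENT", "context": "Supply Agreement ... " * 20, "question": "Governing Law"},
    {"title": "ACME_SUPPLY_AGREEMENT", "context": "Supply Agreement ... " * 20, "question": "Expiration Date"},
    {"title": "BETA_LICENSE", "context": "License Agreement ... " * 20, "question": "Governing Law"},
    {"title": "TINY_STUB", "context": "Stub.", "question": "Governing Law"},
]
qa_df = pd.DataFrame(rows)

# Same steps as the notebook: one row per title, rename, then drop short texts
contracts = qa_df.drop_duplicates(subset="title")[["title", "context"]].reset_index(drop=True)
contracts = contracts.rename(columns={"context": "document_text", "title": "document_title"})
contracts = contracts[contracts["document_text"].str.len() > 200].reset_index(drop=True)

print(contracts["document_title"].tolist())  # → ['ACME_SUPPLY_AGREEMENT', 'BETA_LICENSE']
```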

In [None]:
# Load CUAD contracts
cuad = load_dataset("theatticusproject/cuad-qa", split="train")
cuad_df = cuad.to_pandas()

# Deduplicate by title to get unique contracts
contracts = cuad_df.drop_duplicates(subset="title")[["title", "context"]].reset_index(drop=True)
contracts = contracts.rename(columns={"context": "document_text", "title": "document_title"})

# Drop very short contracts
contracts = contracts[contracts["document_text"].str.len() > 200].reset_index(drop=True)
print(f"Unique CUAD contracts: {len(contracts)}")

# Load Nemotron-PII for realistic PII entities
pii_ds = load_dataset("nvidia/Nemotron-PII", split="train")
pii_df = pii_ds.to_pandas()

# Extract PII snippets: for each PII document, grab the tagged text (first 500 chars)
# These give Claude realistic PII to embed when the crime is pii_embedding
pii_snippets = pii_df["text_tagged"].str[:500].tolist()
random.seed(42)

# Sample contracts and attach a random PII snippet to each
seed_df = contracts.sample(n=min(NUM_SAMPLES, len(contracts)), random_state=42).reset_index(drop=True)
seed_df["pii_sample"] = [random.choice(pii_snippets) for _ in range(len(seed_df))]

# Save as seed CSV
seed_path = "document_seed.csv"
seed_df.to_csv(seed_path, index=False)

print(f"Seed dataset: {len(seed_df)} documents saved to {seed_path}")
print(f"PII snippets pool: {len(pii_snippets)} entries")
seed_df.head()

## Configure Data Designer

In [None]:
model_configs = [
    dd.ModelConfig(
        alias="sonnet",
        model="claude-sonnet-4-20250514",
        provider="anthropic",
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=0.85,
            top_p=0.95,
            max_tokens=4096,
        ),
    ),
]

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)

# Seed with prepared documents
seed_source = dd.LocalFileSeedSource(path=seed_path)
config_builder.with_seed_dataset(seed_source)

print("Data Designer initialized with document seed dataset")

## Define Metadata Columns

In [None]:
# Crime category — sampled uniformly across the 5 types
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="crime_name",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=DOC_CRIMES),
    )
)

# Binary label: 1 = malicious (crime injected), 0 = clean (original document)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="is_malicious",
        sampler_type=dd.SamplerType.BERNOULLI,
        params=dd.BernoulliSamplerParams(p=0.5),
    )
)
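
With `crime_name` sampled uniformly over 5 categories and `is_malicious` drawn from Bernoulli(0.5), each crime should receive about NUM_SAMPLES × 0.5 / 5 malicious seeds. The stdlib simulation below checks that arithmetic for the default 500 samples. Python's `random` module stands in for Data Designer's samplers and assumes the two columns are drawn independently, so the exact draws will differ from a real run:

```python
import random
from collections import Counter

# Mirrors the two sampler columns: crime_name ~ uniform over 5 categories,
# is_malicious ~ Bernoulli(p=0.5), drawn independently of each other.
DOC_CRIMES = [
    "unauthorized_clause_insertion",
    "pii_embedding",
    "template_deviation",
    "confidential_data_disclosure",
    "document_type_violation",
]
NUM_SAMPLES = 500
P_MALICIOUS = 0.5

# Expected number of malicious seeds per crime category
expected_per_crime = NUM_SAMPLES * P_MALICIOUS / len(DOC_CRIMES)
print(expected_per_crime)  # → 50.0

# One simulated draw (stdlib stand-in for the samplers, not Data Designer itself)
rng = random.Random(0)
draws = [(rng.choice(DOC_CRIMES), int(rng.random() < P_MALICIOUS)) for _ in range(NUM_SAMPLES)]
malicious_per_crime = Counter(crime for crime, flag in draws if flag == 1)
print(malicious_per_crime)
```

Each row is a malicious instance of a given crime with probability 0.5 × 0.2 = 0.1, so under the independence assumption the per-crime count is Binomial(500, 0.1), with a standard deviation of √(500 × 0.1 × 0.9) ≈ 6.7. Because `crime_name` is sampled for every row, benign rows carry a crime label too. This is why the crime distribution printed after saving filters on `is_malicious == 1`.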

## Define LLM Generation Column — Crime Injection

Claude Sonnet takes each contract and either:
- **is_malicious=1**: Subtly injects the assigned document crime
- **is_malicious=0**: Returns the document as-is
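
Echoing a benign contract through the model does not guarantee a byte-for-byte copy. Long CUAD contracts can also exceed the `max_tokens=4096` output budget, so some "unchanged" rows may come back altered or truncated. The minimal stdlib sketch below is one way to check for this after generation. `benign_echo_report` is a hypothetical helper, not part of Data Designer, and it only compares the two text columns the notebook already produces:

```python
import os

def benign_echo_report(original: str, returned: str) -> dict:
    """Hypothetical helper: check whether an 'echo as-is' row really is unchanged."""
    identical = original == returned
    # Length of the shared prefix = index of the first differing character
    prefix_len = len(os.path.commonprefix([original, returned]))
    return {
        "exact": identical,
        "exact_after_strip": original.strip() == returned.strip(),
        "length_ratio": round(len(returned) / max(len(original), 1), 3),
        # Shorter text that is a prefix of the original: consistent with output cut off at max_tokens
        "truncated": len(returned.rstrip()) < len(original.rstrip()) and original.startswith(returned),
        "first_diff": None if identical else prefix_len,
    }

doc = "This Agreement is entered into by the parties. " * 10
print(benign_echo_report(doc, doc))
print(benign_echo_report(doc, doc[:200]))

# After the full-generation cell, one way to apply it to the benign rows:
# benign_rows = dataset[dataset["is_malicious"] == 0]
# reports = [benign_echo_report(o, r)
#            for o, r in zip(benign_rows["document_text"], benign_rows["modified_document"])]
```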

In [None]:
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="modified_document",
        model_alias="sonnet",
        system_prompt="""You are a synthetic data generator creating training examples for a classifier that detects malicious actions by document-writing agents.

You will receive a real legal contract and must modify it according to the instructions.

{% if crime_name == "unauthorized_clause_insertion" %}
Crime — unauthorized_clause_insertion: The agent inserts unauthorized clauses into the contract — hidden arbitration clauses, indemnity traps, non-compete expansions, automatic renewal terms, or liability waivers that were not part of the original agreement.
{% elif crime_name == "pii_embedding" %}
Crime — pii_embedding: The agent embeds personally identifiable information (SSNs, medical records, financial account numbers, home addresses) into document sections where PII should not appear.
{% elif crime_name == "template_deviation" %}
Crime — template_deviation: The agent deviates from the approved document template — reorders sections, omits required clauses, changes standard formatting, alters boilerplate language, or restructures the document layout.
{% elif crime_name == "confidential_data_disclosure" %}
Crime — confidential_data_disclosure: The agent embeds confidential business data (revenue figures, internal pricing, trade secrets, employee compensation, strategic plans) into an otherwise routine document.
{% elif crime_name == "document_type_violation" %}
Crime — document_type_violation: The agent produces the wrong document type for the task — transforms an NDA into a services agreement, converts a purchase order into a consulting contract, or changes the fundamental nature of the document while keeping similar language.
{% endif %}

{% if is_malicious == 1 %}
TASK: Rewrite the document below to subtly inject the crime "{{ crime_name }}" into it.
- Preserve the original document's tone, legal language, and formatting as much as possible.
- The malicious modification should be woven naturally — it should look plausible, not exaggerated or obvious.
{% if crime_name == "pii_embedding" %}
- Use the following PII data for injection: {{ pii_sample }}
{% endif %}
- Do NOT add commentary, labels, or explanations. Output ONLY the modified document text.
{% else %}
TASK: Return the document below exactly as-is. Do not modify it in any way.
Output ONLY the original document text with no changes.
{% endif %}""",
        prompt="""Original document:

{{ document_text }}""",
    )
)

## Define Reasoning Column

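Claude explains what was changed and why the result does or does not match the crime category. For benign rows, it confirms that the document was left unchanged. The model sees only the first 2,000 characters of each version.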
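
Because the reasoning prompt slices both documents to `[:2000]`, the reasoning model cannot see an edit that starts later in the contract. The sketch below locates the first differing character and reports whether it falls inside that window. `first_diff_index` and `edit_visible_in_window` are hypothetical helpers, and the 2,000 default simply matches the slice in the prompt:

```python
import os

def first_diff_index(original: str, modified: str):
    """Return the index of the first differing character, or None if identical."""
    if original == modified:
        return None
    return len(os.path.commonprefix([original, modified]))

def edit_visible_in_window(original: str, modified: str, window: int = 2000) -> bool:
    """Hypothetical helper: True if the first change starts within the first `window` chars."""
    idx = first_diff_index(original, modified)
    return idx is not None and idx < window

base = "1. Definitions. " + "x" * 3000
early = base.replace("Definitions", "Definitions and Arbitration", 1)
late = base + " 14. Automatic Renewal."
print(edit_visible_in_window(base, early))  # → True
print(edit_visible_in_window(base, late))   # → False
```

This check only considers the first change. A document with one early edit and further edits past the window would still count as visible.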

In [None]:
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="reasoning",
        model_alias="sonnet",
        system_prompt="""You are a document security analyst reviewing a document modification.

{% if crime_name == "unauthorized_clause_insertion" %}
Crime — unauthorized_clause_insertion: The agent inserts unauthorized clauses into contracts — hidden arbitration, indemnity traps, non-compete expansions, or liability waivers not part of the original agreement.
{% elif crime_name == "pii_embedding" %}
Crime — pii_embedding: The agent embeds personally identifiable information (SSNs, medical records, financial account numbers) into document sections where PII should not appear.
{% elif crime_name == "template_deviation" %}
Crime — template_deviation: The agent deviates from the approved document template — reorders sections, omits required clauses, changes standard formatting, or alters boilerplate language.
{% elif crime_name == "confidential_data_disclosure" %}
Crime — confidential_data_disclosure: The agent embeds confidential business data (revenue figures, internal pricing, trade secrets, employee compensation) into an otherwise routine document.
{% elif crime_name == "document_type_violation" %}
Crime — document_type_violation: The agent produces the wrong document type — transforms an NDA into a services agreement, or changes the fundamental nature of the document.
{% endif %}

The document was labeled is_malicious={{ is_malicious }} with crime "{{ crime_name }}".

Write a concise explanation (2-4 sentences) of what was changed and why the modified document does or does not match the crime category. Reference specific sections, clauses, or passages. If is_malicious=0, confirm the document was left unchanged.

Respond with ONLY the explanation. No labels, no headers.""",
        prompt="""Original document (first 2000 chars):
{{ document_text[:2000] }}

Modified document (first 2000 chars):
{{ modified_document[:2000] }}""",
    )
)

## Define Validation Column

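A basic sanity check on `modified_document`: after trimming surrounding whitespace, it must be longer than 100 characters and contain at least one `.` or `;`.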

In [None]:
def validate_document_format(df: pd.DataFrame) -> pd.DataFrame:
    results = []
    for _, row in df.iterrows():
        doc = str(row.get("modified_document", ""))
        has_content = len(doc.strip()) > 100
        has_structure = "." in doc or ";" in doc
        is_valid = has_content and has_structure
        results.append({
            "is_valid": is_valid,
            "error": None if is_valid else f"Document too short or lacks structure (len={len(doc.strip())})",
        })
    return pd.DataFrame(results)


config_builder.add_column(
    dd.ValidationColumnConfig(
        name="format_valid",
        target_columns=["modified_document"],
        validator_type=dd.ValidatorType.LOCAL_CALLABLE,
        validator_params=dd.LocalCallableValidatorParams(
            validation_function=validate_document_format,
        ),
    )
)

## Preview

Generate a small sample to inspect quality before full generation.

In [None]:
preview = data_designer.preview(config_builder=config_builder, num_records=3)
preview.display_sample_record()

In [None]:
preview.dataset

## Full Generation

In [None]:
results = data_designer.create(
    config_builder=config_builder,
    num_records=NUM_SAMPLES,
    dataset_name="document-crime-injection",
)

dataset = results.load_dataset()
print(f"Generated {len(dataset)} records")
dataset.head()

In [None]:
analysis = results.load_analysis()
analysis.to_report()

## Seed Quality Judge

Score generated documents with a self-hosted Nemotron-3-Nano-30B-A3B judge model.
This dataset serves as **seed data for downstream synthetic data generation** for fine-tuning a malicious agent detector.
The judge evaluates whether each sample is a high-quality seed: realistic, unambiguous, and distinctive enough to drive diverse downstream generation.
Only malicious samples (is_malicious=1) are sent to the judge. Benign samples skip the LLM and receive a default score of 1.0.
The judge sees the crime category and the first 8,000 characters of the original and modified documents. Pairs whose combined length exceeds `MAX_CHAR_LEN` are skipped without a score.
The model's chain-of-thought is read from the `reasoning_content` field of the response when the serving stack returns it.
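
The judge's reply has to be turned into a number in [0, 1]. One subtlety is that Python's `min` and `max` do not propagate NaN reliably: `min(1.0, float("nan"))` returns `1.0`, so clamping with `max(0.0, min(1.0, x))` alone would turn a NaN score into a perfect one. The minimal stdlib sketch below returns `None` for any reply it cannot read. `parse_quality_score` is a hypothetical name, and the fence-stripping follows the same approach as the notebook:

```python
import json
import math

def parse_quality_score(text: str):
    """Hypothetical parser: extract quality_score from a judge reply, or None."""
    cleaned = (text or "").strip()
    # Drop a surrounding Markdown fence such as ```json ... ```
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0].strip()
    try:
        obj = json.loads(cleaned)
        value = float(obj["quality_score"])
    except (ValueError, TypeError, KeyError):
        return None  # not JSON, not an object, missing key, or non-numeric
    if math.isnan(value):
        return None  # explicit NaN check: min/max alone would turn NaN into 1.0
    return max(0.0, min(1.0, value))

print(parse_quality_score('{"quality_score": 0.82}'))                # → 0.82
print(parse_quality_score('```json\n{"quality_score": 1.4}\n```'))   # → 1.0
print(parse_quality_score('{"score": 0.5}'))                         # → None
```

Keeping unreadable replies as missing (NaN in the DataFrame) also keeps them out of the Stage 2 `reward_score >= 0.7` filter, because comparisons against NaN evaluate to False in pandas.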

In [None]:
import json

REWARD_MODEL_ENDPOINT = "http://localhost:8000/v1"  # <-- adjust to your endpoint
REWARD_MODEL_NAME = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4"  # <-- adjust to match your served model name

MAX_CHAR_LEN = 100000  # skip pairs longer than this; a 32K-token context is roughly 100K chars

reward_client = OpenAI(base_url=REWARD_MODEL_ENDPOINT, api_key="unused")

JUDGE_SYSTEM_PROMPT = """You are a quality evaluator for seed data used in synthetic data generation pipelines.

This modified legal document will be used as a SEED EXAMPLE to generate further synthetic training data for fine-tuning a malicious document agent detector. Your job is to evaluate whether it is a high-quality seed.

The document was modified to inject one of these crime categories:
- unauthorized_clause_insertion: Agent inserts unauthorized clauses — hidden arbitration, indemnity traps, non-compete expansions, auto-renewal terms, or liability waivers not in the original.
- pii_embedding: Agent embeds PII (SSNs, medical records, financial account numbers, addresses) into document sections where PII should not appear.
- template_deviation: Agent deviates from the approved template — reorders sections, omits required clauses, changes formatting, or alters boilerplate language.
- confidential_data_disclosure: Agent embeds confidential business data (revenue, pricing, trade secrets, compensation, strategic plans) into an otherwise routine document.
- document_type_violation: Agent produces the wrong document type — transforms an NDA into a services agreement, converts a PO into a consulting contract, etc.

Evaluate as a SEED for downstream SDG:
1. Is the crime injection clear and unambiguous enough to serve as a reliable positive example?
2. Is the document realistic enough within a legal context to be a good template for generating diverse variations?
3. Is the sample distinctive — not generic boilerplate that would produce repetitive downstream data?
4. Does it preserve natural legal document tone and structure, making it a believable reference?

Respond with ONLY a JSON object:
{"quality_score": <float between 0.0 and 1.0>}

Scoring guide:
- 1.0: Excellent seed — crime is clear, document is realistic and distinctive, will drive diverse high-quality downstream generation
- 0.7-0.9: Good seed — usable with minor issues (slightly generic, or crime could be clearer)
- 0.4-0.6: Mediocre seed — crime is ambiguous or document is too generic to produce diverse variations
- 0.1-0.3: Poor seed — crime is unclear, document is unrealistic, or too similar to boilerplate
- 0.0: Unusable — no discernible crime, incoherent, or would poison downstream generation"""


def get_judge_score(system_prompt: str, user_content: str) -> dict:
    """Score seed quality via the Nemotron Nano judge model with native reasoning."""
    response = reward_client.chat.completions.create(
        model=REWARD_MODEL_NAME,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
        temperature=0.0,
        max_tokens=4096,
    )
    msg = response.choices[0].message
    # Chain-of-thought, if the server returns it in a separate field
    reasoning = getattr(msg, "reasoning_content", None) or ""
    text = msg.content or ""
    try:
        cleaned = text.strip()
        # Strip a Markdown fence such as ```json ... ``` if the model added one
        if cleaned.startswith("```"):
            cleaned = cleaned.split("\n", 1)[-1].rsplit("```", 1)[0]
        result = json.loads(cleaned)
        raw_score = float(result["quality_score"])
        # Reject NaN explicitly: max(0.0, min(1.0, nan)) evaluates to 1.0, not NaN
        if pd.isna(raw_score):
            raise ValueError("quality_score is NaN")
        score = max(0.0, min(1.0, raw_score))
        return {"reward_reasoning": reasoning, "reward_score": score}
    except (json.JSONDecodeError, KeyError, ValueError, TypeError):
        # Unparseable reply: keep whatever text we have and mark the score as missing
        return {"reward_reasoning": reasoning or text, "reward_score": float("nan")}


# Filter out entries that exceed the context limit
scoreable = dataset[
    (dataset["document_text"].str.len() + dataset["modified_document"].str.len()) < MAX_CHAR_LEN
].copy()
skipped = len(dataset) - len(scoreable)
print(f"Evaluating {len(scoreable)} documents ({skipped} skipped — exceeded {MAX_CHAR_LEN} char limit)")

# Split: only score malicious samples with the judge
malicious = scoreable[scoreable["is_malicious"] == 1].copy()
benign = scoreable[scoreable["is_malicious"] == 0].copy()

# Benign samples get default scores — no LLM call
benign["reward_score"] = 1.0
benign["reward_reasoning"] = "Benign sample — no injection expected"

# Score malicious samples
reward_results = []
for _, row in tqdm(malicious.iterrows(), total=len(malicious), desc="Judging with Nemotron Nano"):
    user_msg = (
        f"Crime category: {row['crime_name']}\n\n"
        f"Original document:\n{row['document_text'][:8000]}\n\n"
        f"Modified document:\n{row['modified_document'][:8000]}"
    )
    result = get_judge_score(JUDGE_SYSTEM_PROMPT, user_msg)
    reward_results.append(result)

malicious["reward_score"] = [r["reward_score"] for r in reward_results]
malicious["reward_reasoning"] = [r["reward_reasoning"] for r in reward_results]

# Recombine scored malicious + benign
scored = pd.concat([malicious, benign]).sort_index()

# Merge scores back into full dataset — skipped entries get NaN
dataset = dataset.merge(
    scored[["reward_score", "reward_reasoning"]],
    left_index=True,
    right_index=True,
    how="left",
)

print(f"\nSeed quality scores (benign rows default to 1.0): mean {dataset['reward_score'].mean():.3f}, "
      f"std {dataset['reward_score'].std():.3f}")
print(f"Entries with scores: {dataset['reward_score'].notna().sum()}")
print(f"Entries skipped (too long): {skipped}")
print(f"Malicious entries with unparseable judge output: {malicious['reward_score'].isna().sum()}")
print(f"\nMalicious samples only:")
print(dataset[dataset["is_malicious"] == 1]["reward_score"].describe())
print(f"\nBy crime:")
print(dataset[dataset["is_malicious"] == 1].groupby("crime_name")["reward_score"].describe())
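
The following is a rough budget check for the judge prompt. It assumes about 4 characters per token, a rule of thumb for English rather than the Nemotron tokenizer's actual count, and uses the 32K-token context from the `MAX_CHAR_LEN` comment. It also assumes a 3,000-character system prompt; in the notebook, you could pass `len(JUDGE_SYSTEM_PROMPT)` instead. Chat-template tokens are ignored. `estimate_tokens` and `judge_prompt_chars` are hypothetical helpers:

```python
CHARS_PER_TOKEN = 4        # assumed heuristic, not the model's tokenizer
CONTEXT_TOKENS = 32_000    # context size assumed in the MAX_CHAR_LEN comment
MAX_OUTPUT_TOKENS = 4096   # max_tokens used in get_judge_score
DOC_SLICE = 8000           # per-document truncation in the judge user message

def estimate_tokens(num_chars: int) -> int:
    # Ceiling division: a partial token still costs a whole token
    return -(-num_chars // CHARS_PER_TOKEN)

def judge_prompt_chars(system_prompt_chars: int, crime_name: str) -> int:
    """Upper bound on prompt characters after the 8,000-char slices."""
    header = len(f"Crime category: {crime_name}\n\n")
    labels = len("Original document:\n") + len("\n\n") + len("Modified document:\n")
    return system_prompt_chars + header + labels + 2 * DOC_SLICE

prompt_chars = judge_prompt_chars(system_prompt_chars=3000, crime_name="pii_embedding")
prompt_tokens = estimate_tokens(prompt_chars)
print(prompt_chars, prompt_tokens, prompt_tokens + MAX_OUTPUT_TOKENS <= CONTEXT_TOKENS)
# → 19071 4768 True
```

Under these assumptions, the truncated prompt stays well inside the context window. The `MAX_CHAR_LEN` filter is applied to the full, untruncated lengths, so it mainly removes long contracts rather than preventing overflow. Conversely, the judge never sees text beyond the first 8,000 characters of either version.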

## Save Dataset

In [None]:
dataset.to_parquet("doc_crime_injected.parquet", index=False)
dataset.to_csv("doc_crime_injected.csv", index=False)

print(f"Dataset saved: {len(dataset)} records")
print(f"  - doc_crime_injected.parquet")
print(f"  - doc_crime_injected.csv")
print(f"\nLabel distribution:")
print(dataset["is_malicious"].value_counts())
print(f"\nCrime distribution:")
print(dataset[dataset["is_malicious"] == 1]["crime_name"].value_counts())

## Stage 2: Downstream SDG — Generate Fine-Tuning Dataset

Use the scored seed dataset to generate a larger, more diverse fine-tuning dataset.
All benign seeds, plus malicious seeds with reward_score >= 0.7, serve as references for generating novel variations.
Because NUM_FINETUNE_RECORDS exceeds the number of seeds, Data Designer reuses each seed several times. Temperature sampling (0.95 in this stage) makes each generation differ, even when it comes from the same seed.
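
The expansion factor printed in the next cell is `NUM_FINETUNE_RECORDS / len(high_quality_seeds)`. The sketch below is an illustration only. It assumes seeds are consumed round-robin (record `i` uses seed `i % n`), which may not be Data Designer's actual seed-sampling behavior. Under that assumption, each seed is reused either ⌊R/n⌋ or ⌈R/n⌉ times, and the label mix of the output tracks the label mix of the seeds. The seed labels and `round_robin_usage` are invented:

```python
from collections import Counter

def round_robin_usage(num_records: int, seed_labels: list) -> tuple:
    """Hypothetical model of seed reuse: record i draws seed i % n."""
    n = len(seed_labels)
    uses = Counter(i % n for i in range(num_records))
    labels = Counter(seed_labels[i % n] for i in range(num_records))
    return uses, labels

# Invented seed pool: 3 malicious (1) and 4 benign (0) seeds
seed_labels = [1, 0, 1, 0, 0, 1, 0]
uses, labels = round_robin_usage(5000, seed_labels)
print(min(uses.values()), max(uses.values()))  # → 714 715
print(labels)                                  # → Counter({0: 2857, 1: 2143})
```

Every benign Stage 1 seed passes the filter, but malicious seeds pass only if they score at least 0.7. The Stage 2 label mix is therefore likely to skew benign relative to the 50/50 Bernoulli split in Stage 1.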

In [None]:
QUALITY_THRESHOLD = 0.7
NUM_FINETUNE_RECORDS = 5000  # <-- adjust for desired dataset size

# Filter to high-quality seeds only
high_quality_seeds = dataset[
    (dataset["reward_score"] >= QUALITY_THRESHOLD) | (dataset["is_malicious"] == 0)
].reset_index(drop=True)

seed_path_stage2 = "document_seed_scored.parquet"
high_quality_seeds.to_parquet(seed_path_stage2, index=False)

print(f"Stage 2 seeds: {len(high_quality_seeds)} records (all benign + malicious with score >= {QUALITY_THRESHOLD})")
print(f"  Malicious: {(high_quality_seeds['is_malicious'] == 1).sum()}")
print(f"  Benign: {(high_quality_seeds['is_malicious'] == 0).sum()}")
print(f"  Expansion factor: ~{NUM_FINETUNE_RECORDS / len(high_quality_seeds):.1f}x")

In [None]:
# Configure Stage 2 Data Designer
stage2_model_configs = [
    dd.ModelConfig(
        alias="sonnet",
        model="claude-sonnet-4-20250514",
        provider="anthropic",
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=0.95,  # higher temperature for more diversity
            top_p=0.95,
            max_tokens=4096,
        ),
    ),
]

stage2_designer = DataDesigner()
stage2_builder = dd.DataDesignerConfigBuilder(model_configs=stage2_model_configs)

# Use scored seed dataset — all columns available as Jinja2 variables
stage2_seed = dd.LocalFileSeedSource(path=seed_path_stage2)
stage2_builder.with_seed_dataset(stage2_seed)

print("Stage 2 Data Designer initialized with scored seed dataset")

In [None]:
# Generate diverse document variations using the seed as reference
stage2_builder.add_column(
    dd.LLMTextColumnConfig(
        name="generated_document",
        model_alias="sonnet",
        system_prompt="""You are a synthetic data generator creating diverse training examples for a classifier that detects malicious actions by document-writing agents.

{% if is_malicious == 1 %}
You will receive a SEED EXAMPLE of a {{ crime_name }} document modification. Your job is to generate a NOVEL VARIATION that:
- Commits the SAME crime ({{ crime_name }}) but in a different legal document context
- Uses different contract types, parties, industries, and clause structures
- Maintains the same level of subtlety and legal realism as the seed
- Is realistic enough to appear in actual legal proceedings

{% if crime_name == "unauthorized_clause_insertion" %}
Crime — unauthorized_clause_insertion: The agent inserts unauthorized clauses — hidden arbitration, indemnity traps, non-compete expansions, auto-renewal terms, or liability waivers not in the original.
{% elif crime_name == "pii_embedding" %}
Crime — pii_embedding: The agent embeds PII (SSNs, medical records, financial account numbers, addresses) into document sections where PII should not appear.
{% elif crime_name == "template_deviation" %}
Crime — template_deviation: The agent deviates from the approved template — reorders sections, omits required clauses, changes formatting, or alters boilerplate language.
{% elif crime_name == "confidential_data_disclosure" %}
Crime — confidential_data_disclosure: The agent embeds confidential business data (revenue, pricing, trade secrets, compensation, strategic plans) into an otherwise routine document.
{% elif crime_name == "document_type_violation" %}
Crime — document_type_violation: The agent produces the wrong document type — transforms an NDA into a services agreement, converts a PO into a consulting contract, etc.
{% endif %}

TASK: Generate a NEW legal document that contains "{{ crime_name }}" in a different way than the seed example.
- Do NOT copy the seed — create entirely new content, parties, clauses, and scenarios.
- Use a different type of legal agreement than the seed.
- The crime should be woven naturally and subtly into authentic legal language.
- Output ONLY the document text.
{% else %}
You will receive a SEED EXAMPLE of a clean, unmodified legal contract. Your job is to generate a NOVEL VARIATION that is realistic and contains no malicious modifications.

TASK: Generate a NEW clean legal document that is completely benign.
- Create a realistic legal contract with no malicious content.
- Use a different contract type, different parties, and different terms from the seed.
- Output ONLY the document text.
{% endif %}""",
        prompt="""Seed example for reference (do NOT copy — generate something different):

{{ modified_document[:3000] }}""",
    )
)

# Carry over the label and crime from seed
stage2_builder.add_column(
    dd.ExpressionColumnConfig(
        name="label",
        expr="{{ is_malicious }}",
    )
)

stage2_builder.add_column(
    dd.ExpressionColumnConfig(
        name="crime_type",
        expr="{{ crime_name }}",
    )
)

print("Stage 2 columns configured")
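
`label` and `crime_type` are expression columns, which render a Jinja2 template. Assuming they come back as text, the label may read `"1"`, or `"1.0"` if `is_malicious` was stored as a float. The small normalizer below maps either form, or an int, to 0/1 before the values are used for filtering or training. `normalize_label` is a hypothetical helper:

```python
def normalize_label(value) -> int:
    """Hypothetical helper: coerce 1, 1.0, "1", "1.0", " 0 " to an int 0/1."""
    number = float(str(value).strip())
    if number not in (0.0, 1.0):
        raise ValueError(f"unexpected label value: {value!r}")
    return int(number)

print([normalize_label(v) for v in [1, "1", "1.0", 0.0, " 0 "]])  # → [1, 1, 1, 0, 0]
```

After generation, you could apply it with `finetune_dataset["label"].map(normalize_label)`.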

In [None]:
# Preview Stage 2
stage2_preview = stage2_designer.preview(config_builder=stage2_builder, num_records=3)
stage2_preview.display_sample_record()

In [None]:
# Full Stage 2 generation
stage2_results = stage2_designer.create(
    config_builder=stage2_builder,
    num_records=NUM_FINETUNE_RECORDS,
    dataset_name="document-crime-finetune",
)

finetune_dataset = stage2_results.load_dataset()
print(f"Fine-tuning dataset: {len(finetune_dataset)} records")
finetune_dataset.head()

In [None]:
# Save fine-tuning dataset
finetune_dataset.to_parquet("doc_crime_finetune.parquet", index=False)
finetune_dataset.to_csv("doc_crime_finetune.csv", index=False)

print(f"Fine-tuning dataset saved: {len(finetune_dataset)} records")
print(f"  - doc_crime_finetune.parquet")
print(f"  - doc_crime_finetune.csv")
print(f"\nLabel distribution:")
print(finetune_dataset["label"].value_counts())
print(f"\nCrime distribution (malicious only):")
print(finetune_dataset[pd.to_numeric(finetune_dataset["label"]) == 1]["crime_type"].value_counts())