feat: data_diff sample-row redaction / include_values opt-in #729

@anandgupta42

What does this PR do?

Add an env var and a tool parameter that control whether data_diff includes raw sample values (up to 5 rows) in its tool output.

Why

Flagged during the v0.6.0 release review (Chaos Gremlin persona). data_diff currently sends sample diff rows to the LLM provider, which is a PII/PHI/PCI exposure path for regulated environments. v0.6.0 mitigates with a SKILL.md compliance callout ("prefer algorithm: 'profile'"), but a hard env-var / tool-parameter guard is the durable fix.

Proposed

  • Add ALTIMATE_DATA_DIFF_INCLUDE_VALUES env var (default: 1, matches current behavior)
  • Add include_sample_values tool parameter (default: inherit env)
  • When disabled, replace d.values with "(values redacted)" in the sample row output, but keep the row count and direction (source only / target only / updated)
  • Org-wide override: when the env var is set to 0, it takes precedence over the per-call parameter
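
A minimal sketch of the precedence and redaction rules proposed above, written in Python purely for illustration (the actual tool may be implemented differently). The function names `resolve_include_values` and `format_sample_rows`, the row shape (`direction` / `values` keys), and the output line format are all hypothetical. Treating only the literal string `0` as "disabled" is also an assumption, since the proposal does not say how other values are parsed.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ALTIMATE_DATA_DIFF_INCLUDE_VALUES"
REDACTED = "(values redacted)"


def resolve_include_values(
    include_sample_values: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether raw sample values may appear in the tool output."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    # Org-wide override: env set to 0 wins over any per-call parameter.
    if raw is not None and raw.strip() == "0":
        return False
    # Otherwise an explicit per-call parameter decides.
    if include_sample_values is not None:
        return include_sample_values
    # Parameter not given: inherit the env default (1, current behavior).
    return True


def format_sample_rows(rows, include_values: bool):
    """Render sample rows; row count and direction survive redaction."""
    lines = [f"{len(rows)} sample row(s):"]
    for row in rows:
        shown = row["values"] if include_values else REDACTED
        lines.append(f"  [{row['direction']}] {shown}")
    return lines


if __name__ == "__main__":
    sample = [
        {"direction": "source only", "values": {"id": 1, "email": "a@example.com"}},
        {"direction": "updated", "values": {"id": 2, "email": "b@example.com"}},
    ]
    # Env forces redaction even though the caller asked for values.
    allowed = resolve_include_values(True, {ENV_VAR: "0"})
    print("\n".join(format_sample_rows(sample, allowed)))
```

With this shape, a caller can still opt out per call (`include_sample_values=False`) when the env var is unset or 1, but cannot opt back in once the org has set it to 0.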

Deferred from

v0.6.0 release review. Filed at tag time per no-follow-up-PRs release policy.

Type of change

  • New feature
