

Douwe de Vries edited this page Jul 1, 2026 · 1 revision

Column metadata

Active contributors: Douwe de Vries

Purpose

Column metadata is the per-column model that carries detector output, privacy evidence, sample values, risk, selection state, and the current Transform strategy. It is the main bridge between input analysis and user-reviewed transformation.

Directory layout

| Path | Role |
| --- | --- |
| crates/csv-anonymizer-core/src/types.rs | Defines ColumnMetadata, DataType, Confidence, PiiRisk, detection trace types, privacy finding types, and ColumnControl. |
| crates/csv-anonymizer-core/src/metadata.rs | Builds metadata from headers and sample rows, applies selections, computes default strategy, and auto-selects risky columns. |
| crates/csv-anonymizer-core/src/detection.rs | Detects column data types, computes empty format, classifies PII risk, and exports privacy evidence analysis. |

Key abstractions

  • ColumnMetadata contains name, optional sourcePath, index, detectedType, confidence, optional detectionTrace, privacy evidence, piiRisk, sample values, emptyFormat, isSelected, and strategy.
  • DataType classifies values such as email, UUID, timestamp, numeric ID, postal code, address, phone, names, enum, string, and unknown.
  • PiiRisk is high, medium, or low. High and medium columns are auto-selected when sample values exist.
  • DetectionTrace explains candidate detector decisions and why the selected type won.
  • PrivacyFinding and PrivacyEvidenceSummary preserve detector evidence for spans and sensitive-field context.
  • ColumnControl can override detected type and strategy for a column during preview or transformation.
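
The field list above can be pictured as a plain Rust struct. This is a minimal sketch and not a copy of types.rs: the field types, the derive lists, the `privacy_evidence` field name, the values of `Confidence`, the single `Name` variant, the placeholder structs, and the `unanalyzed` helper are all assumptions made for illustration. The camelCase names used on this page (such as detectedType) suggest the real struct is serialized with renamed fields, which the sketch leaves out to stay dependency-free.

```rust
// Hypothetical sketch of the column model. Field types and placeholder
// types are assumptions; only the field and variant names come from this page.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Email,
    Uuid,
    Timestamp,
    NumericId,
    PostalCode,
    Address,
    Phone,
    Name, // the page lists "names"; the real enum may split this further
    Enum,
    String,
    Unknown,
}

// Declared low-to-high so the derived ordering ranks High above Medium above Low.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PiiRisk {
    Low,
    Medium,
    High,
}

// Assumption: the values of the real Confidence type are not listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    Low,
    Medium,
    High,
}

// Placeholders for the real trace and evidence types.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct DetectionTrace;

#[derive(Debug, Clone, PartialEq, Default)]
pub struct PrivacyEvidenceSummary;

#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMetadata {
    pub name: String,
    pub source_path: Option<String>,
    pub index: usize,
    pub detected_type: DataType,
    pub confidence: Confidence,
    pub detection_trace: Option<DetectionTrace>,
    pub privacy_evidence: PrivacyEvidenceSummary, // field name is an assumption
    pub pii_risk: PiiRisk,
    pub sample_values: Vec<String>,
    pub empty_format: String,
    pub is_selected: bool,
    pub strategy: String, // stand-in for the real strategy type
}

impl ColumnMetadata {
    /// Hypothetical helper: a column before any detector has run.
    pub fn unanalyzed(name: &str, index: usize) -> Self {
        ColumnMetadata {
            name: name.to_string(),
            source_path: None,
            index,
            detected_type: DataType::Unknown,
            confidence: Confidence::Low,
            detection_trace: None,
            privacy_evidence: PrivacyEvidenceSummary,
            pii_risk: PiiRisk::Low,
            sample_values: Vec::new(),
            empty_format: String::new(),
            is_selected: false,
            strategy: "keep".to_string(), // hypothetical strategy name
        }
    }
}
```

Declaring PiiRisk from low to high lets the derived ordering compare risks directly, which is convenient if two risk sources ever need to be merged by taking the higher one.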

How it works

sequenceDiagram
    participant CSV as Sample rows
    participant Metadata as metadata.rs
    participant Detect as detection.rs
    participant UI as Frontend selection

    CSV->>Metadata: headers and sample rows
    Metadata->>Detect: detect_column_type_with_name()
    Detect-->>Metadata: DetectionResult
    Metadata->>Detect: analyze_column_privacy()
    Metadata-->>UI: ColumnMetadata[]
    UI->>Metadata: selected column indexes and controls
    Metadata-->>UI: selected and controlled metadata

build_column_metadata processes each header in turn. It extracts the column's sample values, runs type detection, analyzes privacy evidence, and combines the detector risk with the privacy evidence risk. It then computes the empty-value format, keeps a small list of samples, sets isSelected to false, and assigns a default strategy. should_auto_select_column returns true for high- and medium-risk columns that have sample values.
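
The flow described above might look roughly like the following. This is a simplified sketch under stated assumptions, not the code in metadata.rs: the function signatures, the trimmed-down `ColumnMetadata`, the `MAX_SAMPLES` limit, and the two stub risk functions (`detector_risk`, `evidence_risk`) are hypothetical stand-ins for the real detectors. Combining the two risks by taking the higher one is also an assumption, since this page only says they are combined.

```rust
// Hypothetical sketch of metadata construction and auto-selection.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PiiRisk {
    Low,
    Medium,
    High,
}

// Trimmed-down model: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMetadata {
    pub name: String,
    pub index: usize,
    pub pii_risk: PiiRisk,
    pub sample_values: Vec<String>,
    pub is_selected: bool,
}

// Assumption: the real sample limit is not stated on this page.
const MAX_SAMPLES: usize = 5;

// Stub for the detector-based risk (the real logic lives in detection.rs).
fn detector_risk(name: &str) -> PiiRisk {
    if name.to_lowercase().contains("email") {
        PiiRisk::High
    } else {
        PiiRisk::Low
    }
}

// Stub for the risk derived from privacy evidence in the sample values.
fn evidence_risk(samples: &[String]) -> PiiRisk {
    if samples.iter().any(|v| v.contains('@')) {
        PiiRisk::Medium
    } else {
        PiiRisk::Low
    }
}

pub fn build_column_metadata(headers: &[&str], rows: &[Vec<&str>]) -> Vec<ColumnMetadata> {
    headers
        .iter()
        .enumerate()
        .map(|(index, &name)| {
            // Keep a small list of non-empty sample values for this column.
            let sample_values: Vec<String> = rows
                .iter()
                .filter_map(|row| row.get(index))
                .filter(|v| !v.is_empty())
                .take(MAX_SAMPLES)
                .map(|v| v.to_string())
                .collect();
            // Assumption: "combine" means taking the higher of the two risks.
            let pii_risk = detector_risk(name).max(evidence_risk(&sample_values));
            ColumnMetadata {
                name: name.to_string(),
                index,
                pii_risk,
                sample_values,
                is_selected: false, // selection is applied later
            }
        })
        .collect()
}

pub fn should_auto_select_column(column: &ColumnMetadata) -> bool {
    matches!(column.pii_risk, PiiRisk::High | PiiRisk::Medium)
        && !column.sample_values.is_empty()
}
```

In this sketch, a column whose header looks risky but whose sampled cells are all empty is not auto-selected, which matches the rule that auto-selection needs sample values.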

Integration points

  • csv-anonymizer-core uses metadata for analysis, preflight, preview, anonymization, direct input, and reports.
  • Frontend workflow state renders metadata and sends ColumnControl overrides back through Tauri.
  • Transform strategy reads detectedType, emptyFormat, isSelected, and strategy.
  • Privacy report reads selected metadata to build readiness, evidence, and per-column reports.
  • frontend/src/types.ts mirrors ColumnMetadata and related enums.
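
The round trip in which the frontend sends selected column indexes and ColumnControl overrides back can be sketched as below. The shape of `ColumnControl` (two optional fields), the `apply_selection_and_controls` function, the index-keyed map, and the strategy names are assumptions for illustration. The real types in types.rs and the real selection logic in metadata.rs may differ.

```rust
use std::collections::HashMap;

// Hypothetical sketch of applying selections and per-column overrides.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Email,
    Phone,
    String,
    Unknown,
}

// Trimmed-down model: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMetadata {
    pub name: String,
    pub index: usize,
    pub detected_type: DataType,
    pub is_selected: bool,
    pub strategy: String, // stand-in for the real strategy type
}

// Assumption: a control carries optional overrides; None leaves the value as detected.
#[derive(Debug, Clone, Default)]
pub struct ColumnControl {
    pub detected_type: Option<DataType>,
    pub strategy: Option<String>,
}

pub fn apply_selection_and_controls(
    columns: &mut [ColumnMetadata],
    selected_indexes: &[usize],
    controls: &HashMap<usize, ColumnControl>,
) {
    for column in columns.iter_mut() {
        // A column is selected only if the frontend listed its index.
        column.is_selected = selected_indexes.contains(&column.index);

        // Overrides replace the detected type or strategy when present.
        if let Some(control) = controls.get(&column.index) {
            if let Some(data_type) = control.detected_type {
                column.detected_type = data_type;
            }
            if let Some(strategy) = &control.strategy {
                column.strategy = strategy.clone();
            }
        }
    }
}
```

Keeping each override optional means a control can change only the type, only the strategy, or both, without the frontend having to resend values it did not touch.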

Entry points for modification

  • Add or rename metadata fields in crates/csv-anonymizer-core/src/types.rs and mirror DTO changes in frontend/src/types.ts.
  • Change default metadata construction in crates/csv-anonymizer-core/src/metadata.rs.
  • Change auto-selection rules in crates/csv-anonymizer-core/src/metadata.rs.
  • Change detection ordering or type classification in crates/csv-anonymizer-core/src/detection.rs.
  • Change PII risk mapping in crates/csv-anonymizer-core/src/detection.rs.

Key source files

  • crates/csv-anonymizer-core/src/types.rs
  • crates/csv-anonymizer-core/src/metadata.rs
  • crates/csv-anonymizer-core/src/detection.rs
