crates csv anonymizer core

Douwe de Vries edited this page Jul 1, 2026 · 1 revision

csv-anonymizer-core

Active contributors: Douwe de Vries

Purpose

csv-anonymizer-core is the reusable Rust engine for local analysis, preview, transformation, direct input workflows, Smart replacement orchestration, and privacy reporting. It is used by csv-anonymizer-app and csv-anonymizer-tauri.

Directory layout

| Path | Role |
| --- | --- |
| crates/csv-anonymizer-core/src/lib.rs | Public module and type exports for downstream crates. |
| crates/csv-anonymizer-core/src/service.rs | AnonymizerService facade for CSV analysis, preflight, preview, anonymization, and reports. |
| crates/csv-anonymizer-core/src/types.rs | Core DTOs, enums, params, report types, and process control types. |
| crates/csv-anonymizer-core/src/detection.rs | Column detection entry points, PII risk classification, empty-value handling, and privacy span exports. |
| crates/csv-anonymizer-core/src/csv_io.rs | CSV sampling, row counting, streaming processing, normalization, atomic output writing, and spreadsheet formula neutralization. |
| crates/csv-anonymizer-core/src/direct_input/mod.rs | Pasted data and quick data workflows for CSV, JSON, XML, YAML, plain text, logs, and generated values. |
| crates/csv-anonymizer-core/src/strategies/mod.rs | Strategy dispatch for redaction, masking, tokenization, pseudonymization, pass-through, and Local AI fallback. |
| crates/csv-anonymizer-core/src/smart.rs | Smart replacement provider trait, replacement map, validation, batching, and missing-value checks. |
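
The spreadsheet formula neutralization listed for csv_io.rs can be illustrated with a minimal sketch. The function name `neutralize_formula` and the exact set of trigger characters are assumptions based on the common CSV-injection mitigation (prefixing risky cells with a single quote); the crate's actual rule may differ.

```rust
/// Hypothetical helper, not the crate's API: prefixes a cell with a single
/// quote when its first character could make a spreadsheet treat the cell
/// as a formula.
pub fn neutralize_formula(cell: &str) -> String {
    match cell.chars().next() {
        // Assumed trigger set: '=', '+', '-', '@', tab, and carriage return.
        Some('=') | Some('+') | Some('-') | Some('@') | Some('\t') | Some('\r') => {
            format!("'{cell}")
        }
        // Empty cells and ordinary text are returned unchanged.
        _ => cell.to_string(),
    }
}
```

One trade-off of this simple rule is that legitimate values such as negative numbers also receive the prefix, so a real implementation may want a narrower check.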

Key abstractions

  • AnonymizerService is the service facade used by Tauri and CLI callers.
  • ColumnMetadata combines detector output, privacy evidence, selection state, and default strategy.
  • AnonymizationStrategy controls how selected values transform. See Transform strategy.
  • PrivacyReport describes transformed columns, replacement counts, readiness, evidence, utility metrics, and notes. See Privacy report.
  • SmartReplacementProvider is the trait implemented outside the core by the Tauri Local AI provider.
  • SmartReplacementMap stores accepted Local AI replacements and rejection counts for reuse during preview and output.
  • ProcessControl lets callers receive row progress and request cancellation during streaming transforms.
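
To make the progress and cancellation role of ProcessControl concrete, here is a minimal sketch. `SketchControl` and `transform_rows` are hypothetical names; the real ProcessControl type in types.rs may expose different fields and methods.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for ProcessControl: a shared cancel flag plus a
/// row counter that a caller can poll from another thread.
#[derive(Clone, Default)]
pub struct SketchControl {
    cancelled: Arc<AtomicBool>,
    rows_done: Arc<AtomicU64>,
}

impl SketchControl {
    pub fn request_cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
    pub fn rows_done(&self) -> u64 {
        self.rows_done.load(Ordering::SeqCst)
    }
    fn report_row(&self) {
        self.rows_done.fetch_add(1, Ordering::SeqCst);
    }
}

/// Transforms rows one at a time and checks the cancel flag before each row.
/// On cancellation it returns Err with the number of rows already finished.
pub fn transform_rows<I, F>(
    rows: I,
    control: &SketchControl,
    mut transform: F,
) -> Result<Vec<String>, u64>
where
    I: IntoIterator<Item = String>,
    F: FnMut(&str) -> String,
{
    let mut out = Vec::new();
    for row in rows {
        if control.is_cancelled() {
            return Err(control.rows_done());
        }
        out.push(transform(&row));
        control.report_row();
    }
    Ok(out)
}
```

Because the sketch keeps its state behind `Arc`, a clone of the control can be handed to a UI or job thread while the transform loop keeps the original.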

How it works

graph TD
    Service[AnonymizerService] --> CSV[csv_io]
    Service --> Metadata[metadata]
    Metadata --> Detection[detection]
    Service --> Preview[preview]
    Service --> Smart[smart replacement]
    Service --> Strategies[strategies]
    CSV --> Strategies
    Strategies --> Report[TransformReport]
    Service --> Privacy[release_report and report_notes]
    Direct[direct_input] --> Metadata
    Direct --> Smart
    Direct --> Strategies
    Direct --> Privacy

For CSV files, AnonymizerService reads a bounded sample, builds Column metadata, validates selected columns, applies any ColumnControl overrides, optionally prepares Smart replacements, then streams the full file through csv_io::process_file_with_control. The full-file path preserves streaming behavior and emits progress through ProcessControl.
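
The sample-then-stream shape described above can be sketched with the standard library alone. This is an illustration under simplifying assumptions: `read_sample` and `stream_lines` are hypothetical names, and lines are treated as rows without real CSV parsing (quoted fields containing newlines are not handled), which the actual csv_io module presumably does handle.

```rust
use std::io::{self, BufRead, Write};

/// Reads at most `limit` lines for analysis, so memory use stays bounded
/// regardless of file size.
pub fn read_sample<R: BufRead>(reader: R, limit: usize) -> io::Result<Vec<String>> {
    reader.lines().take(limit).collect()
}

/// Streams every line through `transform` and writes the result immediately,
/// holding only one line in memory at a time. Returns the number of lines
/// written. The closure receives the zero-based line index and the line text.
pub fn stream_lines<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    mut transform: impl FnMut(usize, &str) -> String,
) -> io::Result<usize> {
    let mut count = 0;
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        writeln!(writer, "{}", transform(index, &line))?;
        count += 1;
    }
    Ok(count)
}
```

In this sketch the sample pass and the full pass each open their own reader, which mirrors the idea that analysis never needs the whole file while the transform never needs it in memory.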

For pasted and quick workflows, direct_input resolves the input format, analyzes or transforms in memory, and reuses the same metadata, strategy, Smart replacement, and privacy report primitives.
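
As an illustration of what resolving the input format could look like, the sketch below guesses a format from the leading characters of pasted text. `SketchFormat` and `guess_format` are hypothetical, and the heuristics are assumptions; the source does not describe how direct_input actually decides.

```rust
/// Hypothetical format tag; the crate's own enum may have more variants
/// (for example logs or generated values).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SketchFormat {
    Json,
    Xml,
    Yaml,
    Csv,
    PlainText,
}

/// Guesses the format from cheap prefix checks. This is a heuristic only:
/// a log line such as "[2024-01-01] started" would be misread as JSON.
pub fn guess_format(input: &str) -> SketchFormat {
    let trimmed = input.trim_start();
    let first_line = trimmed.lines().next().unwrap_or("");
    if trimmed.starts_with('{') || trimmed.starts_with('[') {
        SketchFormat::Json
    } else if trimmed.starts_with('<') {
        SketchFormat::Xml
    } else if first_line.trim_end() == "---" {
        SketchFormat::Yaml
    } else if first_line.contains(',') && trimmed.lines().count() > 1 {
        SketchFormat::Csv
    } else {
        SketchFormat::PlainText
    }
}
```

A more robust resolver would likely try to parse the input with each candidate format and fall back to plain text when parsing fails.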

Integration points

  • Tauri command shell calls AnonymizerService and direct_input functions from command handlers.
  • Background jobs pass ProcessControl into streaming anonymization for progress and cancellation.
  • csv-anonymizer-app uses the service for CLI analysis, anonymization, and smoke output.
  • src-tauri/src/local_ai/provider.rs implements SmartReplacementProvider for Ollama.
  • frontend/src/types.ts mirrors public DTOs from types.rs; scripts/check-contracts.mjs checks critical contract alignment.
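
The split between a provider implemented outside the core and a replacement map kept inside it can be sketched as follows. Everything here is hypothetical: `SketchProvider`, `SketchReplacementMap`, `build_map`, and the toy `LengthProvider` are illustrative names, and the validation rules (reject missing, blank, or unchanged replacements) are assumptions rather than the crate's documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical provider trait: returns one optional replacement per input
/// value, in the same order. A real provider might call a local model.
pub trait SketchProvider {
    fn replace_batch(&self, values: &[String]) -> Vec<Option<String>>;
}

/// Hypothetical map of accepted replacements plus a rejection counter.
#[derive(Default, Debug)]
pub struct SketchReplacementMap {
    pub accepted: HashMap<String, String>,
    pub rejected: usize,
}

/// Sends values to the provider in batches and keeps only replacements that
/// pass the assumed validation: present, non-blank, and different from the
/// original value.
pub fn build_map(
    provider: &dyn SketchProvider,
    values: &[String],
    batch_size: usize,
) -> SketchReplacementMap {
    let mut map = SketchReplacementMap::default();
    for batch in values.chunks(batch_size.max(1)) {
        let replies = provider.replace_batch(batch);
        for (index, original) in batch.iter().enumerate() {
            match replies.get(index).cloned().flatten() {
                Some(r) if !r.trim().is_empty() && r != *original => {
                    map.accepted.insert(original.clone(), r);
                }
                // Missing, blank, or unchanged replies count as rejections.
                _ => map.rejected += 1,
            }
        }
    }
    map
}

/// Toy provider for demonstration: replaces non-empty values with a
/// length-based placeholder and returns None for empty values.
pub struct LengthProvider;

impl SketchProvider for LengthProvider {
    fn replace_batch(&self, values: &[String]) -> Vec<Option<String>> {
        values
            .iter()
            .map(|v| if v.is_empty() { None } else { Some(format!("anon-{}", v.len())) })
            .collect()
    }
}
```

Keeping the trait free of any model-specific types is what would let the core stay independent of the Tauri layer in a design like this.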

Entry points for modification

  • Change public exports in crates/csv-anonymizer-core/src/lib.rs.
  • Change CSV file service behavior in crates/csv-anonymizer-core/src/service.rs.
  • Change frontend or command DTOs in crates/csv-anonymizer-core/src/types.rs and mirror them in frontend/src/types.ts.
  • Change detection rules or PII risk classification in crates/csv-anonymizer-core/src/detection.rs and its submodules.
  • Change CSV normalization, streaming, row counting, or formula neutralization in crates/csv-anonymizer-core/src/csv_io.rs.
  • Change pasted-data or quick workflows in crates/csv-anonymizer-core/src/direct_input/mod.rs and submodules.
  • Change transformations in crates/csv-anonymizer-core/src/strategies/mod.rs and its strategy submodules.
  • Change Smart replacement batching or validation in crates/csv-anonymizer-core/src/smart.rs.
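
For readers planning a change in the strategies module, the following sketch shows one way an enum-based strategy dispatch can look. `SketchStrategy` and `apply` are hypothetical; the real AnonymizationStrategy variants, their parameters, and their output formats are not specified in this page. Pseudonymization and the Local AI fallback are omitted, and the token uses the standard library's non-cryptographic hasher purely for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of strategies named in this page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchStrategy {
    Redact,
    Mask { keep_last: usize },
    Tokenize,
    PassThrough,
}

/// Applies one strategy to one value.
pub fn apply(strategy: SketchStrategy, value: &str) -> String {
    match strategy {
        SketchStrategy::Redact => "[REDACTED]".to_string(),
        SketchStrategy::Mask { keep_last } => {
            // Replace every character except the last `keep_last` with '*'.
            // Values shorter than `keep_last` are left fully visible here.
            let total = value.chars().count();
            value
                .chars()
                .enumerate()
                .map(|(i, c)| if i + keep_last >= total { c } else { '*' })
                .collect()
        }
        SketchStrategy::Tokenize => {
            // Same input yields the same token within a run. DefaultHasher is
            // not cryptographic and is unsuitable for real privacy guarantees.
            let mut hasher = DefaultHasher::new();
            value.hash(&mut hasher);
            format!("tok_{:016x}", hasher.finish())
        }
        SketchStrategy::PassThrough => value.to_string(),
    }
}
```

A production tokenizer would more plausibly use a keyed cryptographic hash so that tokens cannot be reversed by hashing guessed inputs.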

Key source files

  • crates/csv-anonymizer-core/src/lib.rs
  • crates/csv-anonymizer-core/src/service.rs
  • crates/csv-anonymizer-core/src/types.rs
  • crates/csv-anonymizer-core/src/detection.rs
  • crates/csv-anonymizer-core/src/csv_io.rs
  • crates/csv-anonymizer-core/src/direct_input/mod.rs
  • crates/csv-anonymizer-core/src/strategies/mod.rs
  • crates/csv-anonymizer-core/src/smart.rs
