Douwe de Vries edited this page Jul 1, 2026 · 1 revision

CSV file workflow

Active contributors: Douwe de Vries

Purpose

The CSV file workflow is the main path for protecting local CSV files. It supports selecting or typing an input path, analyzing sampled rows, auto-selecting risky columns, reviewing column strategy controls, previewing sample replacements, running preflight checks, anonymizing in the background, and showing a privacy report after output is written.

Directory layout

| Path | Role |
| --- | --- |
| frontend/src/components/workflow/AnonymizerWorkflowView.tsx | Step-based CSV UI for file selection, column review, settings, preview, run, and result display |
| frontend/src/hooks/useAnonymizerWorkflow.ts | Composes CSV workflow state, settings, Local AI, preview, and job hooks |
| frontend/src/hooks/useCsvAnalysis.ts | File picker, manual path load, analysis, output path suggestion, and exact row count refresh |
| frontend/src/hooks/usePreviewWorkflow.ts | Preview eligibility, preview preflight, and preview command call |
| frontend/src/hooks/useAnonymizeJob.ts | Anonymize preflight, background job start, status polling, cancellation, and result handling |
| src-tauri/src/commands/csv.rs | Tauri commands for CSV analysis, preview, preflight, row counting, and direct-input dispatch |
| src-tauri/src/commands/job_commands.rs | Tauri commands that start, poll, and cancel background anonymization jobs |
| crates/csv-anonymizer-core/src/service.rs | Core CSV analysis, preflight, preview, streaming anonymization, and privacy report assembly |
| crates/csv-anonymizer-core/src/csv_io.rs | CSV sample reading, row counting, streaming processing, row normalization, atomic writes, and formula neutralization |

Key abstractions

| Abstraction | Source | Notes |
| --- | --- | --- |
| AnonymizerWorkflowState | frontend/src/hooks/useAnonymizerWorkflow.ts | Single state object consumed by the CSV workflow view. |
| AnalyzeResponse | src-tauri/src/commands/csv.rs | Returns headers, selected columns, and suggested output path. |
| ColumnControl | crates/csv-anonymizer-core/src/types.rs | Carries per-column type override and strategy selection. |
| PreflightData | crates/csv-anonymizer-core/src/types.rs | Carries blockers, review items, verified items, evidence, and column reports before preview or output. |
| AnonymizeJobStatus | src-tauri/src/jobs.rs | Tracks background progress, cancellation, result, and errors. |
| PrivacyReport | crates/csv-anonymizer-core/src/types.rs | Summarizes output decisions after transformation. |

How it works

sequenceDiagram
    participant UI as React workflow
    participant Tauri as Tauri commands
    participant Job as Background job
    participant Core as AnonymizerService
    participant Csv as csv_io

    UI->>Tauri: pick_input_csv or analyze_csv
    Tauri->>Core: analyze_csv_sampled
    Core->>Csv: read_sample
    Core-->>Tauri: headers and metadata
    Tauri-->>UI: AnalyzeResponse with selected columns
    UI->>Tauri: preflight_anonymization(preview)
    Tauri->>Core: preflight_anonymization
    UI->>Tauri: preview_anonymization
    Tauri->>Core: preview_anonymization_with_smart_provider
    Core-->>UI: PreviewData
    UI->>Tauri: preflight_anonymization(anonymize)
    UI->>Tauri: start_anonymize_job
    Tauri->>Job: spawn blocking worker
    Job->>Core: anonymize_csv_with_control
    Core->>Csv: stream rows and write output
    UI->>Tauri: poll get_anonymize_job_status
    Tauri-->>UI: result with PrivacyReport

Select and analyze

The UI starts in AnonymizerWorkflowView with a file path input and browse button. useCsvAnalysis calls pickInputCsv for dialog selection or analyzeCsv when a manual path is submitted. The Tauri analyze_csv command authorizes or confirms the input file, calls AnonymizerService::analyze_csv_sampled, and grants the suggested output path.

The core reads a bounded sample with csv_io::read_sample, builds column metadata, and returns HeadersData. If the row count did not complete during sampling, the frontend later calls countCsvRows to refresh the UI with the exact row count.

Auto-select and review

analyze_csv computes selectedColumns by applying should_auto_select_column to each column's metadata. A column is auto-selected when it has sample values and its detector risk is high or medium. The frontend then lets users select all, deselect all, select high risk, select detected risk, and change per-column strategies through ColumnSelectionPanel.
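The auto-selection rule can be sketched in TypeScript as follows. This is a minimal illustration, not the Rust implementation of should_auto_select_column: the ColumnMetadata shape, its field names, and the RiskLevel values are assumptions made for the example.

```typescript
// Hypothetical shapes: the real metadata type lives in the Rust core and may differ.
type RiskLevel = "high" | "medium" | "low" | "none";

interface ColumnMetadata {
  index: number;
  name: string;
  sampleValues: string[];
  risk: RiskLevel;
}

// Rule as described above: sample values present AND detector risk high or medium.
function shouldAutoSelectColumn(column: ColumnMetadata): boolean {
  const hasSamples = column.sampleValues.length > 0;
  const isRisky = column.risk === "high" || column.risk === "medium";
  return hasSamples && isRisky;
}

// Returns the indices of every column that passes the rule.
function autoSelectedColumns(columns: ColumnMetadata[]): number[] {
  return columns.filter(shouldAutoSelectColumn).map((c) => c.index);
}

const columns: ColumnMetadata[] = [
  { index: 0, name: "email", sampleValues: ["a@example.com"], risk: "high" },
  { index: 1, name: "id", sampleValues: ["17"], risk: "low" },
  { index: 2, name: "phone", sampleValues: ["555-0100"], risk: "medium" },
  { index: 3, name: "notes", sampleValues: [], risk: "high" }, // no samples: skipped
];

const selected = autoSelectedColumns(columns);
console.log(selected);
```

In this sketch the notes column is not selected even though its risk is high, because it has no sample values.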

Column changes clear preview and result artifacts. Risky unselected columns are surfaced before preview and before output so the user can review values that will remain unchanged.
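A minimal sketch of both behaviors, assuming a simplified state slice: WorkflowSlice, ColumnRisk, toggleColumn, and riskyUnselectedColumns are hypothetical names for illustration, and the real AnonymizerWorkflowState carries more fields than shown here.

```typescript
// Hypothetical state slice; the real workflow state has more fields.
interface WorkflowSlice {
  selectedColumns: number[];
  preview: string[] | null;
  result: string | null;
}

interface ColumnRisk {
  index: number;
  name: string;
  risk: "high" | "medium" | "low" | "none";
}

// Toggling a column invalidates any preview or result built from the old selection.
function toggleColumn(state: WorkflowSlice, index: number): WorkflowSlice {
  const selectedColumns = state.selectedColumns.includes(index)
    ? state.selectedColumns.filter((i) => i !== index)
    : [...state.selectedColumns, index].sort((a, b) => a - b);
  return { selectedColumns, preview: null, result: null };
}

// Columns with high or medium risk that would be written unchanged.
function riskyUnselectedColumns(columns: ColumnRisk[], selected: number[]): ColumnRisk[] {
  const chosen = new Set(selected);
  return columns.filter(
    (c) => !chosen.has(c.index) && (c.risk === "high" || c.risk === "medium"),
  );
}

const initial: WorkflowSlice = {
  selectedColumns: [0, 2],
  preview: ["a@example.com -> user1@example.invalid"],
  result: "out.csv",
};
const updated = toggleColumn(initial, 2); // deselect column 2

const columnRisks: ColumnRisk[] = [
  { index: 0, name: "email", risk: "high" },
  { index: 1, name: "id", risk: "none" },
  { index: 2, name: "phone", risk: "medium" },
];
const needsReview = riskyUnselectedColumns(columnRisks, updated.selectedColumns);
console.log(updated, needsReview.map((c) => c.name));
```

Returning a fresh object with preview and result reset keeps stale artifacts from being shown next to a selection they no longer match.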

Preview and preflight

usePreviewWorkflow first calls preflightAnonymization in preview mode. Preflight validates column indices, selected controls, input readability, Local AI readiness when Smart replacement is selected, and release-readiness review items. If blockers exist, the first blocker is shown and preview stops.
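The gating step can be sketched as a pure function over the preflight result. The PreflightItem shape, the PreviewGate type, and the gatePreview name are assumptions for illustration; the real PreflightData also carries verified items, evidence, and column reports.

```typescript
// Hypothetical, reduced shapes for illustration only.
interface PreflightItem {
  code: string;
  message: string;
}

interface PreflightData {
  blockers: PreflightItem[];
  reviewItems: PreflightItem[];
}

type PreviewGate =
  | { allowed: false; message: string }
  | { allowed: true; reviewItems: PreflightItem[] };

// If any blocker exists, surface only the first one and stop before preview.
function gatePreview(preflight: PreflightData): PreviewGate {
  const [firstBlocker] = preflight.blockers;
  if (firstBlocker !== undefined) {
    return { allowed: false, message: firstBlocker.message };
  }
  return { allowed: true, reviewItems: preflight.reviewItems };
}

const blocked = gatePreview({
  blockers: [
    { code: "input_unreadable", message: "Input file cannot be read" },
    { code: "local_ai_not_ready", message: "Local AI is not ready" },
  ],
  reviewItems: [],
});

const open = gatePreview({
  blockers: [],
  reviewItems: [{ code: "risky_unselected", message: "Column phone is not selected" }],
});

console.log(blocked, open);
```

Review items do not stop the preview in this sketch; only blockers do, which matches the description above.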

When preflight passes, previewAnonymization calls the core preview path. The core reads sample rows, applies controls, prepares Smart replacement values when needed, runs selected strategies on sample values, and returns PreviewData with sample transforms, warnings, and preview Smart replacement entries.

Background anonymize and report

useAnonymizeJob calls preflightAnonymization in anonymize mode before writing output. This mode also validates output path access and writability. When preflight passes, startAnonymizeJob spawns a blocking worker. The frontend polls getAnonymizeJobStatus every 300 ms and can call cancelAnonymizeJob.
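A polling loop along these lines could look like the sketch below. The JobStatus shape, its state values, and the pollUntilDone helper are assumptions; the status fetcher is injected so the example does not depend on the actual Tauri bindings. Only the 300 ms interval comes from the description above.

```typescript
// Hypothetical status shape; the real AnonymizeJobStatus is defined in src-tauri/src/jobs.rs.
type JobState = "running" | "completed" | "failed" | "cancelled";

interface JobStatus {
  state: JobState;
  processedRows: number;
  error?: string;
}

const POLL_INTERVAL_MS = 300;

// Any state other than "running" ends the polling loop.
function isTerminal(status: JobStatus): boolean {
  return status.state !== "running";
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the injected status getter until the job reaches a terminal state.
async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  onProgress: (status: JobStatus) => void,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<JobStatus> {
  for (;;) {
    const status = await getStatus();
    onProgress(status);
    if (isTerminal(status)) {
      return status;
    }
    await sleep(POLL_INTERVAL_MS);
  }
}

// Usage with a fake backend that finishes on the third poll.
const fakeStatuses: JobStatus[] = [
  { state: "running", processedRows: 100 },
  { state: "running", processedRows: 200 },
  { state: "completed", processedRows: 250 },
];
let pollCount = 0;
const fakeGetStatus = async (): Promise<JobStatus> =>
  fakeStatuses[Math.min(pollCount++, fakeStatuses.length - 1)] as JobStatus;

pollUntilDone(fakeGetStatus, (s) => console.log(s.state, s.processedRows), async () => {})
  .then((finalStatus) => console.log("final:", finalStatus.state));
```

Injecting sleep makes the loop testable without real timers. Cancellation fits the same shape: after a cancel request, the next poll is expected to report a terminal state and the loop exits.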

The job calls AnonymizerService::anonymize_csv_with_sample_rows_and_control_and_smart_provider. The core validates paths, prepares selected metadata, gathers any missing Smart replacements, and calls csv_io::process_file_with_control. csv_io streams records, normalizes ragged rows, preserves blank rows, neutralizes spreadsheet formula prefixes in output cells, reports progress, and writes the output atomically.
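The output-safety steps can be illustrated with the sketch below. It is written in TypeScript for consistency with the other examples on this page, while the real logic lives in the Rust csv_io module. The prefix set and the leading-apostrophe strategy follow common CSV-injection guidance and are assumptions, not the confirmed behavior of csv_io; the handling of rows longer than the header is not described here, so the sketch leaves them untouched.

```typescript
// Assumed trigger characters, taken from common CSV-injection guidance.
const FORMULA_PREFIXES = ["=", "+", "-", "@", "\t", "\r"];

// Prefix risky cells with an apostrophe so spreadsheets treat them as text.
function neutralizeFormula(cell: string): string {
  return FORMULA_PREFIXES.some((prefix) => cell.startsWith(prefix)) ? "'" + cell : cell;
}

// Pad short rows with empty cells up to the header width.
// Blank rows are passed through unchanged so they survive in the output.
function normalizeRow(row: string[], width: number): string[] {
  const isBlank = row.length === 0 || (row.length === 1 && row[0] === "");
  if (isBlank || row.length >= width) {
    return row;
  }
  const padding = new Array<string>(width - row.length).fill("");
  return [...row, ...padding];
}

// Apply both steps to one output record.
function prepareOutputRow(row: string[], width: number): string[] {
  return normalizeRow(row, width).map(neutralizeFormula);
}

const prepared = prepareOutputRow(["=SUM(A1:A9)", "Alice"], 3);
console.log(prepared);
```

One trade-off of this assumed prefix set is that plain negative numbers such as -5 are also prefixed, since they start with a minus sign.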

On success, the result includes AnonymizeData and a PrivacyReport. The frontend displays the result and report through result and privacy-report components.

Integration points

Entry points for modification

  • Change CSV workflow step order or visible controls in frontend/src/components/workflow/AnonymizerWorkflowView.tsx.
  • Change analysis loading, output path suggestion handling, or row count refresh in frontend/src/hooks/useCsvAnalysis.ts.
  • Change preview gating or preview preflight behavior in frontend/src/hooks/usePreviewWorkflow.ts.
  • Change background output behavior or polling in frontend/src/hooks/useAnonymizeJob.ts, src-tauri/src/commands/job_commands.rs, and src-tauri/src/jobs.rs.
  • Change core analysis, preflight, preview, or anonymization behavior in crates/csv-anonymizer-core/src/service.rs.
  • Change CSV parsing, streaming output, row normalization, or spreadsheet formula neutralization in crates/csv-anonymizer-core/src/csv_io.rs.

Key source files

| File | Why it matters |
| --- | --- |
| frontend/src/components/workflow/AnonymizerWorkflowView.tsx | Main user-facing CSV workflow view. |
| frontend/src/hooks/useAnonymizerWorkflow.ts | Workflow state composition. |
| frontend/src/hooks/useCsvAnalysis.ts | Select, analyze, and row-count refresh logic. |
| frontend/src/hooks/usePreviewWorkflow.ts | Preview preflight and preview call. |
| frontend/src/hooks/useAnonymizeJob.ts | Anonymize preflight, job polling, cancellation, and result handling. |
| src-tauri/src/commands/csv.rs | Tauri CSV command handlers. |
| crates/csv-anonymizer-core/src/service.rs | Core service for analysis, preflight, preview, anonymize, and report creation. |
| crates/csv-anonymizer-core/src/csv_io.rs | Streaming CSV IO and output safety. |
