CSV file workflow
Active contributors: Douwe de Vries
The CSV file workflow is the main path for protecting local CSV files. It supports selecting or typing an input path, analyzing sampled rows, auto-selecting risky columns, reviewing column strategy controls, previewing sample replacements, running preflight checks, anonymizing in the background, and showing a privacy report after output is written.
| Path | Role |
|---|---|
| frontend/src/components/workflow/AnonymizerWorkflowView.tsx | Step-based CSV UI for file selection, column review, settings, preview, run, and result display |
| frontend/src/hooks/useAnonymizerWorkflow.ts | Composes CSV workflow state, settings, Local AI, preview, and job hooks |
| frontend/src/hooks/useCsvAnalysis.ts | File picker, manual path load, analysis, output path suggestion, and exact row count refresh |
| frontend/src/hooks/usePreviewWorkflow.ts | Preview eligibility, preview preflight, and preview command call |
| frontend/src/hooks/useAnonymizeJob.ts | Anonymize preflight, background job start, status polling, cancellation, and result handling |
| src-tauri/src/commands/csv.rs | Tauri commands for CSV analysis, preview, preflight, row counting, and direct-input dispatch |
| src-tauri/src/commands/job_commands.rs | Tauri commands that start, poll, and cancel background anonymization jobs |
| crates/csv-anonymizer-core/src/service.rs | Core CSV analysis, preflight, preview, streaming anonymization, and privacy report assembly |
| crates/csv-anonymizer-core/src/csv_io.rs | CSV sample reading, row counting, streaming processing, row normalization, atomic writes, and formula neutralization |
| Abstraction | Source | Notes |
|---|---|---|
| AnonymizerWorkflowState | frontend/src/hooks/useAnonymizerWorkflow.ts | Single state object consumed by the CSV workflow view. |
| AnalyzeResponse | src-tauri/src/commands/csv.rs | Returns headers, selected columns, and suggested output path. |
| ColumnControl | crates/csv-anonymizer-core/src/types.rs | Carries per-column type override and strategy selection. |
| PreflightData | crates/csv-anonymizer-core/src/types.rs | Carries blockers, review items, verified items, evidence, and column reports before preview or output. |
| AnonymizeJobStatus | src-tauri/src/jobs.rs | Tracks background progress, cancellation, result, and errors. |
| PrivacyReport | crates/csv-anonymizer-core/src/types.rs | Summarizes output decisions after transformation. |
sequenceDiagram
participant UI as React workflow
participant Tauri as Tauri commands
participant Job as Background job
participant Core as AnonymizerService
participant Csv as csv_io
UI->>Tauri: pick_input_csv or analyze_csv
Tauri->>Core: analyze_csv_sampled
Core->>Csv: read_sample
Core-->>Tauri: headers and metadata
Tauri-->>UI: AnalyzeResponse with selected columns
UI->>Tauri: preflight_anonymization(preview)
Tauri->>Core: preflight_anonymization
UI->>Tauri: preview_anonymization
Tauri->>Core: preview_anonymization_with_smart_provider
Core-->>UI: PreviewData
UI->>Tauri: preflight_anonymization(anonymize)
UI->>Tauri: start_anonymize_job
Tauri->>Job: spawn blocking worker
Job->>Core: anonymize_csv_with_control
Core->>Csv: stream rows and write output
UI->>Tauri: poll get_anonymize_job_status
Tauri-->>UI: result with PrivacyReport
The UI starts in AnonymizerWorkflowView with a file path input and browse button. useCsvAnalysis calls pickInputCsv for dialog selection or analyzeCsv when a manual path is submitted. The Tauri analyze_csv command authorizes or confirms the input file, calls AnonymizerService::analyze_csv_sampled, and grants the suggested output path.
The core reads a bounded sample with csv_io::read_sample, builds column metadata, and returns HeadersData. If row counting was not complete during sampling, the frontend later calls countCsvRows so the UI can refresh the exact row count.
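The bounded sample read and the "was the count complete?" signal described above can be sketched as follows. This is a minimal illustration, not the code in csv_io.rs: the `SampleRead` struct, its `exact_row_count` field, and the comma-splitting helper are hypothetical, and real CSV parsing must also handle quoting and embedded newlines.

```rust
/// Hypothetical result of a bounded sample read (not the real HeadersData).
#[derive(Debug, PartialEq)]
pub struct SampleRead {
    pub headers: Vec<String>,
    pub rows: Vec<Vec<String>>,
    /// `Some(n)` when the whole input was seen while sampling,
    /// `None` when the sample limit was hit first and a full count is still needed.
    pub exact_row_count: Option<usize>,
}

/// Reads the header plus at most `max_rows` data rows from `input`.
pub fn read_sample(input: &str, max_rows: usize) -> SampleRead {
    let mut lines = input.lines();
    let headers = lines.next().map(split_line).unwrap_or_default();
    let mut rows = Vec::new();
    let mut exhausted = true;
    for line in lines {
        if rows.len() == max_rows {
            // More data exists beyond the sample, so the count is not exact.
            exhausted = false;
            break;
        }
        rows.push(split_line(line));
    }
    let exact_row_count = if exhausted { Some(rows.len()) } else { None };
    SampleRead { headers, rows, exact_row_count }
}

/// Naive comma split, used only to keep the sketch self-contained.
fn split_line(line: &str) -> Vec<String> {
    line.split(',').map(|cell| cell.trim().to_string()).collect()
}
```

Under these assumptions, a caller that receives `exact_row_count == None` would schedule the later full count, which corresponds to the countCsvRows refresh in the frontend.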
analyze_csv returns selectedColumns by applying should_auto_select_column to metadata. A column is auto-selected when it has sample values and high or medium detector risk. The frontend then lets users select all, deselect all, select high risk, select detected risk, and change per-column strategies through ColumnSelectionPanel.
Column changes clear preview and result artifacts. Risky unselected columns are surfaced before preview and before output so the user can review values that will remain unchanged.
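The auto-selection rule and the "risky but unselected" review list can be sketched as below. Only the name should_auto_select_column comes from the source; the `Risk` enum, the `ColumnMetadata` fields, and the two helper functions are assumptions made for illustration, and "has sample values" is read here as "the sample list is not empty".

```rust
/// Hypothetical detector risk levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Risk {
    High,
    Medium,
    Low,
}

/// Hypothetical, trimmed-down column metadata.
pub struct ColumnMetadata {
    pub index: usize,
    pub name: String,
    pub risk: Risk,
    pub sample_values: Vec<String>,
}

fn is_risky(column: &ColumnMetadata) -> bool {
    matches!(column.risk, Risk::High | Risk::Medium)
}

/// A column is auto-selected when it has sample values and high or medium risk.
pub fn should_auto_select_column(column: &ColumnMetadata) -> bool {
    !column.sample_values.is_empty() && is_risky(column)
}

/// Indices that analysis would return as the initial selection.
pub fn auto_selected_columns(columns: &[ColumnMetadata]) -> Vec<usize> {
    columns
        .iter()
        .filter(|c| should_auto_select_column(c))
        .map(|c| c.index)
        .collect()
}

/// Risky columns the user left unselected; their values would stay unchanged,
/// so they are worth surfacing before preview and before output.
pub fn risky_unselected_columns(columns: &[ColumnMetadata], selected: &[usize]) -> Vec<usize> {
    columns
        .iter()
        .filter(|c| is_risky(c) && !selected.contains(&c.index))
        .map(|c| c.index)
        .collect()
}
```

Note that a risky column with no sample values is not auto-selected in this sketch but still shows up in the review list, which is one plausible reading of the behavior described above.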
usePreviewWorkflow first calls preflightAnonymization in preview mode. Preflight validates column indices, selected controls, input readability, Local AI readiness when Smart replacement is selected, and release-readiness review items. If blockers exist, the first blocker is shown and preview stops.
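A minimal sketch of this gating, assuming a much smaller PreflightData than the real type: the `PreflightRequest` struct, its boolean fields, the check order, and the message strings are all hypothetical. It also shows the extra output-path check that only applies in anonymize mode.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PreflightMode {
    Preview,
    Anonymize,
}

/// Hypothetical, reduced version of the preflight result.
#[derive(Debug, Default)]
pub struct PreflightData {
    pub blockers: Vec<String>,
    pub review_items: Vec<String>,
    pub verified_items: Vec<String>,
}

/// Hypothetical request; the real checks inspect files and Local AI state.
pub struct PreflightRequest {
    pub mode: PreflightMode,
    pub column_count: usize,
    pub selected_columns: Vec<usize>,
    pub input_readable: bool,
    pub output_writable: bool,
}

pub fn preflight(request: &PreflightRequest) -> PreflightData {
    let mut data = PreflightData::default();
    if request.input_readable {
        data.verified_items.push("Input file is readable".to_string());
    } else {
        data.blockers.push("Input file cannot be read".to_string());
    }
    if request.selected_columns.is_empty() {
        data.review_items.push("No columns are selected".to_string());
    }
    for &index in &request.selected_columns {
        if index >= request.column_count {
            data.blockers.push(format!("Column index {} is out of range", index));
        }
    }
    // Output checks only matter when a file is about to be written.
    if request.mode == PreflightMode::Anonymize && !request.output_writable {
        data.blockers.push("Output path is not writable".to_string());
    }
    data
}

/// The caller shows the first blocker and stops; `None` means it may proceed.
pub fn first_blocker(data: &PreflightData) -> Option<&str> {
    data.blockers.first().map(String::as_str)
}
```

Keeping review items separate from blockers lets the UI warn without stopping, while any blocker halts the step.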
When preflight passes, previewAnonymization calls the core preview path. The core reads sample rows, applies controls, prepares Smart replacement values when needed, runs selected strategies on sample values, and returns PreviewData with sample transforms, warnings, and preview Smart replacement entries.
useAnonymizeJob calls preflightAnonymization in anonymize mode before writing output. This mode also validates output path access and writability. When preflight passes, startAnonymizeJob spawns a blocking worker. The frontend polls getAnonymizeJobStatus every 300 ms and can call cancelAnonymizeJob.
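The start, poll, and cancel cycle can be sketched with a plain thread, a shared status, and a cancellation flag. This is an illustration under stated assumptions, not the code in src-tauri/src/jobs.rs: `JobState`, `JobHandle`, `start_job`, and `poll_until_finished` are hypothetical names, the worker only counts rows instead of transforming them, and the polling loop stands in for the frontend timer.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical job status, reduced to three states.
#[derive(Debug, Clone, PartialEq)]
pub enum JobState {
    Running { rows_done: usize },
    Completed { rows_written: usize },
    Cancelled,
}

pub struct JobHandle {
    state: Arc<Mutex<JobState>>,
    cancel: Arc<AtomicBool>,
}

impl JobHandle {
    /// Snapshot of the current status, as a status command might return it.
    pub fn status(&self) -> JobState {
        self.state.lock().unwrap().clone()
    }

    /// Requests cancellation; the worker notices before its next row.
    pub fn cancel(&self) {
        self.cancel.store(true, Ordering::SeqCst);
    }
}

/// Spawns a worker that "processes" `total_rows` rows and reports progress.
pub fn start_job(total_rows: usize, cancel: Arc<AtomicBool>) -> JobHandle {
    let state = Arc::new(Mutex::new(JobState::Running { rows_done: 0 }));
    let worker_state = Arc::clone(&state);
    let worker_cancel = Arc::clone(&cancel);
    thread::spawn(move || {
        for row in 0..total_rows {
            if worker_cancel.load(Ordering::SeqCst) {
                *worker_state.lock().unwrap() = JobState::Cancelled;
                return;
            }
            // A real worker would transform and write the row here.
            *worker_state.lock().unwrap() = JobState::Running { rows_done: row + 1 };
        }
        *worker_state.lock().unwrap() = JobState::Completed { rows_written: total_rows };
    });
    JobHandle { state, cancel }
}

/// Polls at a fixed interval until the job leaves the running state.
pub fn poll_until_finished(job: &JobHandle, interval: Duration) -> JobState {
    loop {
        let state = job.status();
        if !matches!(state, JobState::Running { .. }) {
            return state;
        }
        thread::sleep(interval);
    }
}
```

Calling `poll_until_finished(&job, Duration::from_millis(300))` would mirror the 300 ms interval mentioned above. Checking the flag once per row keeps cancellation latency bounded by the cost of a single row.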
The job calls AnonymizerService::anonymize_csv_with_sample_rows_and_control_and_smart_provider. The core validates paths, prepares selected metadata, gathers any missing Smart replacements, and calls csv_io::process_file_with_control. csv_io streams records, normalizes ragged rows, preserves blank rows, neutralizes spreadsheet formula prefixes in output cells, reports progress, and writes the output atomically.
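Three of the output-safety steps named above (ragged-row normalization, formula neutralization, and atomic writes) can be sketched with the standard library alone. The exact rules in csv_io.rs are not shown in this page, so everything here is an assumption: padding and truncating to the header width, the trigger characters (taken from common CSV-injection guidance), the single-quote prefix, and the `.tmp` sibling file are illustrative choices, and all function names are hypothetical.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Pads short rows with empty cells and truncates long ones to `width`.
/// A blank row (no cells, or one empty cell) is returned unchanged.
pub fn normalize_row(mut row: Vec<String>, width: usize) -> Vec<String> {
    if row.len() <= 1 && row.iter().all(|cell| cell.is_empty()) {
        return row;
    }
    row.resize(width, String::new());
    row
}

/// Prefixes cells that a spreadsheet could interpret as a formula.
pub fn neutralize_formula(cell: &str) -> String {
    match cell.chars().next() {
        Some('=') | Some('+') | Some('-') | Some('@') | Some('\t') | Some('\r') => {
            format!("'{}", cell)
        }
        _ => cell.to_string(),
    }
}

/// Writes to a temporary sibling file, then renames it over the target,
/// so readers never observe a half-written output file.
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);
    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp_path, path)
}
```

One trade-off of this simple neutralization rule is that plain negative numbers such as `-5` are also prefixed; a real implementation may choose a narrower rule.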
On success, the result includes AnonymizeData and a PrivacyReport. The frontend displays the result and report through result and privacy-report components.
- Desktop app hosts this workflow and its settings modal.
- CLI smoke harness shares the same core analyze and anonymize service paths.
- Local AI Smart replacement supplies providers and validation for Smart replacement columns.
- Privacy reporting describes the report data returned after anonymization.
- Paste and quick workflows reuse column controls, preview concepts, strategy behavior, Smart replacement, and reports for non-file input.
- Change CSV workflow step order or visible controls in frontend/src/components/workflow/AnonymizerWorkflowView.tsx.
- Change analysis loading, output path suggestion handling, or row count refresh in frontend/src/hooks/useCsvAnalysis.ts.
- Change preview gating or preview preflight behavior in frontend/src/hooks/usePreviewWorkflow.ts.
- Change background output behavior or polling in frontend/src/hooks/useAnonymizeJob.ts, src-tauri/src/commands/job_commands.rs, and src-tauri/src/jobs.rs.
- Change core analysis, preflight, preview, or anonymization behavior in crates/csv-anonymizer-core/src/service.rs.
- Change CSV parsing, streaming output, row normalization, or spreadsheet formula neutralization in crates/csv-anonymizer-core/src/csv_io.rs.
| File | Why it matters |
|---|---|
| frontend/src/components/workflow/AnonymizerWorkflowView.tsx | Main user-facing CSV workflow view. |
| frontend/src/hooks/useAnonymizerWorkflow.ts | Workflow state composition. |
| frontend/src/hooks/useCsvAnalysis.ts | Select, analyze, and row-count refresh logic. |
| frontend/src/hooks/usePreviewWorkflow.ts | Preview preflight and preview call. |
| frontend/src/hooks/useAnonymizeJob.ts | Anonymize preflight, job polling, cancellation, and result handling. |
| src-tauri/src/commands/csv.rs | Tauri CSV command handlers. |
| crates/csv-anonymizer-core/src/service.rs | Core service for analysis, preflight, preview, anonymize, and report creation. |
| crates/csv-anonymizer-core/src/csv_io.rs | Streaming CSV IO and output safety. |