feat(synthesize): implement --locale option for 14 fake-rs locales#3860

Merged
jqnatividad merged 3 commits into master from synthesize-locale-aware
May 16, 2026
@jqnatividad
Collaborator

Summary

  • The --locale flag on qsv synthesize was previously a stub that rejected every value except EN. This PR makes it real: users can now request any of the 14 locales fake-rs 5.1.0 supports (en, fr_fr, de_de, it_it, pt_br, pt_pt, ja_jp, zh_cn, zh_tw, ar_sa, cy_gb, fa_ir, nl_nl, tr_tr) and get locale-appropriate fake values for dictionary-flagged columns (names, addresses, companies, etc.).
  • Implementation uses a Locale enum plus a gen_faker_for_locale! macro that stamps out one near-identical generator function per locale — necessary because each fake-rs locale is a distinct Rust type, so dispatch has to happen via an enum match. The macro keeps the token→faker mapping in one place.
  • Sparse locales (categories without per-locale data in fake-rs, e.g. lorem_* under fr_fr) silently fall back to EN trait defaults at runtime. This matches fake-rs's own Data trait semantics and means no per-category fallback logic is needed in qsv.
  • --locale parsing is case-insensitive (FR_FR, fr_fr, Fr_Fr all work). Invalid values produce an error of the form: Unsupported --locale '...'. Supported: ...
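
To make the parsing behavior concrete, here is a minimal sketch of a case-insensitive token parser. It is not the code from this PR: the enum is cut down to three locales, the `token` helper is hypothetical, and the exact wording after "Supported:" is an assumption, since the description elides it.

```rust
/// Hypothetical three-locale subset; the real enum in faker_map.rs covers 14.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Locale {
    En,
    FrFr,
    JaJp,
}

impl Locale {
    /// Every supported locale, reused for help text and error messages.
    pub const ALL: &'static [Locale] = &[Locale::En, Locale::FrFr, Locale::JaJp];

    /// Lowercase token as typed on the command line (hypothetical helper).
    pub fn token(self) -> &'static str {
        match self {
            Locale::En => "en",
            Locale::FrFr => "fr_fr",
            Locale::JaJp => "ja_jp",
        }
    }

    /// Case-insensitive parse: "FR_FR", "fr_fr" and "Fr_Fr" all yield FrFr.
    /// The error text format is an assumption based on the PR description.
    pub fn from_token(s: &str) -> Result<Locale, String> {
        Locale::ALL
            .iter()
            .copied()
            .find(|l| l.token().eq_ignore_ascii_case(s))
            .ok_or_else(|| {
                let supported: Vec<&str> = Locale::ALL.iter().map(|l| l.token()).collect();
                format!("Unsupported --locale '{s}'. Supported: {}", supported.join(", "))
            })
    }
}
```

Driving both the parser and the error message from the same `ALL` slice means the list of supported tokens cannot drift from what the parser accepts.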

Implementation notes

  • src/cmd/synthesize/faker_map.rs — new Locale enum, from_token parser, ALL slice for USAGE/errors, gen_faker_for_locale! macro, and 14 macro-stamped per-locale functions. Public content_type_to_value dispatches by locale.
  • src/cmd/synthesize/generator.rs — threaded Locale through ColumnGenerator::build, ColumnGenerator::next, and build_faker_pool.
  • src/cmd/synthesize/mod.rs — replaced the EN-only guard with Locale::from_token; updated USAGE help.
  • USAGE help uses lowercase locale tokens to dodge docopt's "uppercase = positional arg" inference (this is a known qsv pitfall — uppercase tokens like EN, FR_FR in option descriptions break docopt's USAGE parser).
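
The macro-plus-enum dispatch described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `En`, `FrFr` and `NameData` are stand-ins for fake-rs's locale types and data trait, the signature of `content_type_to_value` is invented, and an index parameter replaces the random number generator to keep the example deterministic.

```rust
/// Stand-ins for fake-rs locale marker types (hypothetical, not the real ones).
pub struct En;
pub struct FrFr;

/// Stand-in for per-locale data: each locale type supplies its own word list.
pub trait NameData {
    const FIRST_NAMES: &'static [&'static str];
}
impl NameData for En {
    const FIRST_NAMES: &'static [&'static str] = &["Alice", "Bob"];
}
impl NameData for FrFr {
    const FIRST_NAMES: &'static [&'static str] = &["Amélie", "Benoît"];
}

#[derive(Clone, Copy)]
pub enum Locale {
    En,
    FrFr,
}

/// Stamps out one generator function per locale type. The token -> value
/// mapping is written once, inside the macro body.
macro_rules! gen_faker_for_locale {
    ($fn_name:ident, $locale_ty:ty) => {
        fn $fn_name(token: &str, idx: usize) -> Option<String> {
            match token {
                "first_name" => {
                    let names = <$locale_ty as NameData>::FIRST_NAMES;
                    Some(names[idx % names.len()].to_string())
                }
                _ => None, // unknown vocabulary token
            }
        }
    };
}

gen_faker_for_locale!(faker_en, En);
gen_faker_for_locale!(faker_fr_fr, FrFr);

/// Because each locale is a distinct type, the runtime choice is made by
/// matching on the enum and calling the matching stamped-out function.
pub fn content_type_to_value(locale: Locale, token: &str, idx: usize) -> Option<String> {
    match locale {
        Locale::En => faker_en(token, idx),
        Locale::FrFr => faker_fr_fr(token, idx),
    }
}
```

For example, `content_type_to_value(Locale::FrFr, "first_name", 0)` selects the French word list, while an unrecognized token returns `None` for every locale.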

Test plan

  • cargo build --locked --bin qsv -F all_features
  • cargo test test_synthesize -F all_features — 12 passed (added 4 new: synthesize_rejects_invalid_locale, synthesize_accepts_fr_fr_locale, synthesize_locale_is_case_insensitive, synthesize_locale_changes_output; replaced the old synthesize_rejects_unsupported_locale)
  • cargo test --bin qsv -F all_features synthesize:: — 15 passed (faker_map's per-locale tests now loop over all 14 locales × 40 vocab tokens for coverage + determinism)
  • cargo +nightly fmt
  • cargo clippy --bin qsv -F all_features -- -D warnings
  • qsv --generate-help-md (refreshed docs/help/synthesize.md)
  • Manual smoke test confirms --locale en / JA_JP / fr_fr produce English / Japanese / French names with seed 1.

🤖 Generated with Claude Code

The --locale flag was previously a stub that rejected everything but EN.
Add a Locale enum with case-insensitive parsing and macro-stamped
per-locale dispatch over fake-rs's locale-typed fakers, so users can
synthesize CSVs with locale-appropriate fake values (e.g. JA_JP for
Japanese names, FR_FR for French addresses).

Sparse locales (categories without per-locale data in fake-rs) silently
fall back to EN data at runtime — this matches fake-rs's own Data trait
semantics and avoids per-category fallback logic.
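
One way such a fallback can arise without any per-category logic is through default associated constants on a trait. The sketch below assumes fake-rs's Data trait works along these lines; the trait, constant names, and word lists here are all hypothetical.

```rust
/// Hypothetical data trait: the default associated constants hold EN data.
pub trait Data {
    const CITY_NAMES: &'static [&'static str] = &["Springfield", "Riverton"];
    const LOREM_WORDS: &'static [&'static str] = &["lorem", "ipsum"];
}

pub struct En;
pub struct FrFr;

/// EN overrides nothing and uses every default.
impl Data for En {}

/// A sparse locale: it overrides city names but ships no lorem data,
/// so LOREM_WORDS resolves to the EN default declared on the trait.
impl Data for FrFr {
    const CITY_NAMES: &'static [&'static str] = &["Lyon", "Nantes"];
}

pub fn first_city<L: Data>() -> &'static str {
    L::CITY_NAMES[0]
}

pub fn first_lorem<L: Data>() -> &'static str {
    L::LOREM_WORDS[0]
}
```

With this shape, `first_city::<FrFr>()` yields the French override while `first_lorem::<FrFr>()` yields the same value as `first_lorem::<En>()`, and the calling code never needs to know which categories a locale actually provides.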

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented May 16, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 16 complexity

Metric Results
Complexity 16


Contributor

Copilot AI left a comment

Pull request overview

Implements real multi-locale support for qsv synthesize --locale by adding a typed locale dispatch layer over fake-rs locales and threading the selected locale through synthesize’s generator pipeline.

Changes:

  • Added a Locale enum + macro-generated per-locale faker mapping functions and locale-based dispatch for content_type_to_value.
  • Threaded Locale through synthesize's ColumnGenerator build and value emission so faker-backed columns can be generated per locale.
  • Updated CLI help/docs and expanded integration/unit tests to cover invalid locales, case-insensitive parsing, and locale-dependent output changes.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_synthesize.rs | Adds/updates integration tests for locale parsing, case-insensitivity, and output differences across locales. |
| src/cmd/synthesize/mod.rs | Parses --locale into a typed enum and passes it into generator construction and row emission. |
| src/cmd/synthesize/generator.rs | Threads locale into faker pool generation and per-row faker/lorem generation paths. |
| src/cmd/synthesize/faker_map.rs | Introduces Locale, token parsing, macro-expanded per-locale faker mappings, and locale-aware dispatch. |
| docs/help/synthesize.md | Regenerates help text to document --locale behavior and supported locale tokens. |
Comments suppressed due to low confidence (1)

tests/test_synthesize.rs:283

  • Workdir::stdout does not assert status.success(). These comparisons can mask failures if the command exits non-zero but still prints something parseable on stdout. Consider asserting success on the command output before capturing stdout for equality/inequality checks.
    let mut upper = wrk.command("synthesize");
    upper
        .args(["-n", "5", "--seed", "1", "--locale", "FR_FR"])
        .arg("data.csv");
    let upper_out: String = wrk.stdout(&mut upper);

    let mut lower = wrk.command("synthesize");
    lower
        .args(["-n", "5", "--seed", "1", "--locale", "fr_fr"])
        .arg("data.csv");
    let lower_out: String = wrk.stdout(&mut lower);

    assert_eq!(
        upper_out, lower_out,
        "lowercase and uppercase locale tokens should be equivalent"
    );
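
The check suggested in this comment could look something like the helper below. It is a sketch built only on `std::process`; qsv's `Workdir` test helper is not reproduced here, and `checked_stdout` is a hypothetical name.

```rust
use std::process::Output;

/// Returns stdout as a String only if the process exited successfully.
/// On a non-zero exit, returns an error carrying the status and stderr,
/// so a failing command cannot slip through an output comparison.
pub fn checked_stdout(output: &Output) -> Result<String, String> {
    if !output.status.success() {
        return Err(format!(
            "command failed with {}: {}",
            output.status,
            String::from_utf8_lossy(&output.stderr)
        ));
    }
    String::from_utf8(output.stdout.clone())
        .map_err(|e| format!("stdout was not valid UTF-8: {e}"))
}
```

A test would pass the result of `Command::output()` to this helper and unwrap it before comparing the two outputs, so a command that exits non-zero fails the test instead of contributing partial output to the comparison.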

jqnatividad and others added 2 commits May 15, 2026 22:19
- faker_map.rs: collapse the 4 locale-touching places (enum variants,
  Locale::ALL, gen_faker_for_locale! invocations, content_type_to_value
  dispatch match) into a single define_locales! macro. Adding a locale
  is now one line.
- generator.rs: store `locale: Locale` in the `Faker` and `LoremFallback`
  variants and drop the per-call locale argument from `next()`. Removes
  the API surface that allowed building with one locale and emitting
  with another.
- test_synthesize.rs: strengthen `synthesize_accepts_fr_fr_locale` to
  actually verify the column shape — header preserved, 2 fields per row,
  enumerated tier values still drawn from the source set.

Declined: adding `status.success()` assertions before `read_stdout` —
that matches the existing pattern across all synthesize tests; would be
a separate cross-file refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
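
The "one line per locale" consolidation described in this commit can be sketched as a single macro that emits the enum, the `ALL` slice, and the dispatch match from one list. This is an illustration only: the macro's input syntax, the `token` method, the stand-in generator functions, and the dispatch signature are all assumptions rather than the PR's actual definitions.

```rust
/// One entry per locale: variant name, lowercase token, generator function.
/// Everything locale-related is derived from this single list.
macro_rules! define_locales {
    ($( $variant:ident => $token:literal, $gen:ident; )+) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub enum Locale { $( $variant ),+ }

        impl Locale {
            /// All locales, in declaration order.
            pub const ALL: &'static [Locale] = &[ $( Locale::$variant ),+ ];

            /// Lowercase command-line token for this locale.
            pub fn token(self) -> &'static str {
                match self { $( Locale::$variant => $token ),+ }
            }
        }

        /// Dispatches to the generator registered for the locale.
        pub fn content_type_to_value(locale: Locale, token: &str) -> Option<String> {
            match locale { $( Locale::$variant => $gen(token) ),+ }
        }
    };
}

// Stand-in generators (hypothetical); the real ones are stamped out over
// fake-rs locale types.
fn gen_en(token: &str) -> Option<String> {
    (token == "greeting").then(|| "hello".to_string())
}
fn gen_fr_fr(token: &str) -> Option<String> {
    (token == "greeting").then(|| "bonjour".to_string())
}

// Adding a locale means adding one line here.
define_locales! {
    En => "en", gen_en;
    FrFr => "fr_fr", gen_fr_fr;
}
```

Because the enum variants, the slice, and the match arms all expand from the same repetition, forgetting to register a new locale in one of them is no longer possible.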
@jqnatividad jqnatividad merged commit 5ca8e8e into master May 16, 2026
17 of 18 checks passed
@jqnatividad jqnatividad deleted the synthesize-locale-aware branch May 16, 2026 02:33