feat(synthesize): implement --locale option for 14 fake-rs locales #3860
Merged
The --locale flag was previously a stub that rejected everything but EN. Add a Locale enum with case-insensitive parsing and macro-stamped per-locale dispatch over fake-rs's locale-typed fakers, so users can synthesize CSVs with locale-appropriate fake values (e.g. JA_JP for Japanese names, FR_FR for French addresses). Sparse locales (categories without per-locale data in fake-rs) silently fall back to EN data at runtime — this matches fake-rs's own Data trait semantics and avoids per-category fallback logic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 16 |
Pull request overview
Implements real multi-locale support for qsv synthesize --locale by adding a typed locale dispatch layer over fake-rs locales and threading the selected locale through synthesize’s generator pipeline.
Changes:
- Added a `Locale` enum + macro-generated per-locale faker mapping functions and locale-based dispatch for `content_type_to_value`.
- Threaded `Locale` through `synthesize` → `ColumnGenerator` build and value emission so faker-backed columns can be generated per locale.
- Updated CLI help/docs and expanded integration/unit tests to cover invalid locales, case-insensitive parsing, and locale-dependent output changes.
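
The typed locale with case-insensitive parsing described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it covers only three of the 14 locales, and the `token` helper, the `Result<Locale, String>` return type, and the comma-separated "Supported" list are assumptions made for the example.

```rust
/// Hypothetical three-locale subset; the PR itself covers 14 locales.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locale {
    En,
    FrFr,
    JaJp,
}

impl Locale {
    /// Every variant, reused to build the "Supported: ..." part of the error.
    pub const ALL: &'static [Locale] = &[Locale::En, Locale::FrFr, Locale::JaJp];

    /// Canonical upper-case token (assumed helper, not necessarily in the PR).
    pub fn token(self) -> &'static str {
        match self {
            Locale::En => "EN",
            Locale::FrFr => "FR_FR",
            Locale::JaJp => "JA_JP",
        }
    }

    /// Case-insensitive parse: "FR_FR", "fr_fr" and "Fr_Fr" all map to `FrFr`.
    pub fn from_token(s: &str) -> Result<Locale, String> {
        Locale::ALL
            .iter()
            .copied()
            .find(|l| l.token().eq_ignore_ascii_case(s))
            .ok_or_else(|| {
                let supported: Vec<&str> = Locale::ALL.iter().map(|l| l.token()).collect();
                format!(
                    "Unsupported --locale '{s}'. Supported: {}",
                    supported.join(", ")
                )
            })
    }
}
```

Driving both the parser and the error message from the same `ALL` slice means a newly added locale cannot be accepted by the parser yet missing from the list shown to the user.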
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tests/test_synthesize.rs` | Adds/updates integration tests for locale parsing, case-insensitivity, and output differences across locales. |
| `src/cmd/synthesize/mod.rs` | Parses `--locale` into a typed enum and passes it into generator construction and row emission. |
| `src/cmd/synthesize/generator.rs` | Threads locale into faker pool generation and per-row faker/lorem generation paths. |
| `src/cmd/synthesize/faker_map.rs` | Introduces `Locale`, token parsing, macro-expanded per-locale faker mappings, and locale-aware dispatch. |
| `docs/help/synthesize.md` | Regenerates help text to document `--locale` behavior and supported locale tokens. |
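
As a rough picture of how a locale can travel from generator construction to value emission, the sketch below fixes the locale when the generator is built, so emission needs no locale argument. Only the variant names `Faker` and `LoremFallback` come from this PR's discussion; the fields, the `Option<&str>` parameter, and the string-formatting stand-in for the faker dispatch are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Locale {
    En,
    FrFr,
}

/// Hypothetical stand-in for the locale dispatch in faker_map.rs:
/// it only tags the output instead of calling a real faker.
fn content_type_to_value(locale: Locale, content_type: &str) -> String {
    let tag = match locale {
        Locale::En => "en",
        Locale::FrFr => "fr_fr",
    };
    format!("{tag}:{content_type}")
}

/// The locale lives inside the variant, so it is chosen exactly once.
pub enum ColumnGenerator {
    Faker { content_type: String, locale: Locale },
    LoremFallback { locale: Locale },
}

impl ColumnGenerator {
    /// Build a generator; columns without a known content type use lorem text.
    pub fn build(content_type: Option<&str>, locale: Locale) -> Self {
        match content_type {
            Some(ct) => ColumnGenerator::Faker {
                content_type: ct.to_string(),
                locale,
            },
            None => ColumnGenerator::LoremFallback { locale },
        }
    }

    /// Emit the next value using the locale stored at build time.
    pub fn next(&mut self) -> String {
        match self {
            ColumnGenerator::Faker { content_type, locale } => {
                content_type_to_value(*locale, content_type)
            }
            ColumnGenerator::LoremFallback { locale } => {
                content_type_to_value(*locale, "lorem_word")
            }
        }
    }
}
```

With this shape, a caller cannot build a column for one locale and then emit values for another, because `next()` has no locale parameter to get wrong.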
Comments suppressed due to low confidence (1)
tests/test_synthesize.rs:283
`Workdir::stdout` does not assert `status.success()`. These comparisons can mask failures if the command exits non-zero but still prints something parseable on stdout. Consider asserting success on the command output before capturing stdout for equality/inequality checks.

    let mut upper = wrk.command("synthesize");
    upper
        .args(["-n", "5", "--seed", "1", "--locale", "FR_FR"])
        .arg("data.csv");
    let upper_out: String = wrk.stdout(&mut upper);
    let mut lower = wrk.command("synthesize");
    lower
        .args(["-n", "5", "--seed", "1", "--locale", "fr_fr"])
        .arg("data.csv");
    let lower_out: String = wrk.stdout(&mut lower);
    assert_eq!(
        upper_out, lower_out,
        "lowercase and uppercase locale tokens should be equivalent"
    );
- faker_map.rs: collapse the 4 locale-touching places (enum variants, `Locale::ALL`, `gen_faker_for_locale!` invocations, `content_type_to_value` dispatch match) into a single `define_locales!` macro. Adding a locale is now one line.
- generator.rs: store `locale: Locale` in the `Faker` and `LoremFallback` variants and drop the per-call locale argument from `next()`. Removes the API surface that allowed building with one locale and emitting with another.
- test_synthesize.rs: strengthen `synthesize_accepts_fr_fr_locale` to actually verify the column shape: header preserved, 2 fields per row, enumerated tier values still drawn from the source set.

Declined: adding `status.success()` assertions before `read_stdout`. Omitting them matches the existing pattern across all synthesize tests, so adding them would be a separate cross-file refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
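
One way such a single-definition macro could look is sketched below. The macro name `define_locales!`, `Locale::ALL`, and `content_type_to_value` are taken from the commit message; the macro's input syntax, the `token` method, and the `gen_en` / `gen_fr_fr` functions (which return fixed strings rather than calling fake-rs) are hypothetical.

```rust
// Hypothetical per-locale generators standing in for the macro-stamped
// fake-rs functions; they return fixed strings for illustration only.
fn gen_en(content_type: &str) -> Option<String> {
    match content_type {
        "first_name" => Some("Alice".to_string()),
        _ => None,
    }
}

fn gen_fr_fr(content_type: &str) -> Option<String> {
    match content_type {
        "first_name" => Some("Amelie".to_string()),
        _ => None,
    }
}

/// One line per locale expands into the enum variant, the `ALL` entry,
/// the token string, and the dispatch arm.
macro_rules! define_locales {
    ($( $variant:ident => $token:literal, $gen:ident; )+) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Locale { $( $variant ),+ }

        impl Locale {
            pub const ALL: &'static [Locale] = &[ $( Locale::$variant ),+ ];

            pub fn token(self) -> &'static str {
                match self { $( Locale::$variant => $token ),+ }
            }
        }

        pub fn content_type_to_value(locale: Locale, content_type: &str) -> Option<String> {
            match locale { $( Locale::$variant => $gen(content_type) ),+ }
        }
    };
}

define_locales! {
    En => "EN", gen_en;
    FrFr => "FR_FR", gen_fr_fr;
}
```

Because every locale-dependent item is generated from the same repetition, the compiler's exhaustiveness check on the `match` arms and the contents of `ALL` cannot drift apart when a locale is added.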
Summary
- The `--locale` flag on `qsv synthesize` was previously a stub that rejected every value except `EN`. This PR makes it real: users can now request any of the 14 locales `fake-rs` 5.1.0 supports (en, fr_fr, de_de, it_it, pt_br, pt_pt, ja_jp, zh_cn, zh_tw, ar_sa, cy_gb, fa_ir, nl_nl, tr_tr) and get locale-appropriate fake values for dictionary-flagged columns (names, addresses, companies, etc.).
- Adds a `Locale` enum plus a `gen_faker_for_locale!` macro that stamps out one near-identical generator function per locale. This is necessary because each `fake-rs` locale is a distinct Rust type, so dispatch has to happen via an enum match. The macro keeps the token→faker mapping in one place.
- Categories without per-locale data in fake-rs (e.g. `lorem_*` under `fr_fr`) silently fall back to EN trait defaults at runtime. This matches fake-rs's own `Data` trait semantics and means no per-category fallback logic is needed in qsv.
- `--locale` parsing is case-insensitive (`FR_FR`, `fr_fr`, `Fr_Fr` all work). Invalid values produce an `Unsupported --locale '...'. Supported: ...` error.

Implementation notes
- `src/cmd/synthesize/faker_map.rs`: new `Locale` enum, `from_token` parser, `ALL` slice for USAGE/errors, `gen_faker_for_locale!` macro, and 14 macro-stamped per-locale functions. Public `content_type_to_value` dispatches by locale.
- `src/cmd/synthesize/generator.rs`: threaded `Locale` through `ColumnGenerator::build`, `ColumnGenerator::next`, and `build_faker_pool`.
- `src/cmd/synthesize/mod.rs`: replaced the EN-only guard with `Locale::from_token`; updated USAGE help.
- […] `EN, FR_FR` in option descriptions break docopt's USAGE parser).

Test plan
- `cargo build --locked --bin qsv -F all_features`
- `cargo test test_synthesize -F all_features`: 12 passed (added 4 new: `synthesize_rejects_invalid_locale`, `synthesize_accepts_fr_fr_locale`, `synthesize_locale_is_case_insensitive`, `synthesize_locale_changes_output`; replaced the old `synthesize_rejects_unsupported_locale`)
- `cargo test --bin qsv -F all_features synthesize::`: 15 passed (faker_map's per-locale tests now loop over all 14 locales × 40 vocab tokens for coverage + determinism)
- `cargo +nightly fmt`
- `cargo clippy --bin qsv -F all_features -- -D warnings`
- `qsv --generate-help-md` (refreshed `docs/help/synthesize.md`)
- `--locale en` / `JA_JP` / `fr_fr` produce English / Japanese / French names with seed `1`.

🤖 Generated with Claude Code
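
The fallback behavior described in the summary can be illustrated with trait-level defaults. The sketch below does not use fake-rs; the `Data` trait, its constants, and the `En` / `FrFr` marker types are simplified, hypothetical stand-ins chosen to show why a locale that omits a category ends up with English data without any explicit fallback code.

```rust
/// Hypothetical locale data trait: every category has an English default.
pub trait Data {
    const FIRST_NAMES: &'static [&'static str] = &["Alice", "Bob"];
    const LOREM_WORDS: &'static [&'static str] = &["lorem", "ipsum"];
}

/// Each locale is its own type, which is why callers need an enum match
/// to pick one at runtime.
pub struct En;
pub struct FrFr;

/// English keeps every default.
impl Data for En {}

/// French overrides names only; LOREM_WORDS is not overridden, so the
/// English default above is used for it.
impl Data for FrFr {
    const FIRST_NAMES: &'static [&'static str] = &["Amelie", "Benoit"];
}

/// Pick a first name for locale `L` by index (wraps around).
pub fn first_name<L: Data>(index: usize) -> &'static str {
    L::FIRST_NAMES[index % L::FIRST_NAMES.len()]
}

/// Pick a lorem word for locale `L` by index (wraps around).
pub fn lorem_word<L: Data>(index: usize) -> &'static str {
    L::LOREM_WORDS[index % L::LOREM_WORDS.len()]
}
```

Here `first_name::<FrFr>(0)` yields a French name while `lorem_word::<FrFr>(1)` yields the same word as `lorem_word::<En>(1)`, which is the silent fallback in miniature.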