fix(apply,applydp): Thousands neg fractions, scope <NULL> to regex_replace#3845
Merged
Conversation
…place Bug fixes surfaced during a code review of apply.rs (and mirrored where applicable in applydp.rs): - Thousands op: `num.fract() > 0.0` skipped negative fractional numbers because f64::fract() preserves sign (`(-1.5).fract() == -0.5`), so the --replacement decimal separator was silently ignored for negative fractions. Now compares against `!= 0.0`. (apply only — applydp has no Thousands op.) - <NULL> --replacement is now scoped to regex_replace only. Previously it was rewritten to "" globally for every op in the chain, silently turning a chained `replace` into a deletion. apply_operations / applydp_operations gain a regex_replacement parameter that is the only path the NULL rewrite affects; `replace` and other ops see the user's literal --replacement value. - Invalid user-supplied regex_replace pattern now returns CliError::IncorrectUsage instead of CliError::Other. Exit code is now 2, and stderr is prefixed with "usage error:" — consistent with other user-input failures in validate_operations. - Multi-column in-place transforms rebuild each StringRecord once per row via a precomputed `is_selected: Vec<bool>` mask, instead of N times via replace_column_value. SmallVec::new() now used in validate_operations so the inline-storage path is preserved for the common <=4 ops case. - Cleanups in apply::validate_operations: dropped unreachable OnceLock-error paths in favor of `let _ = X.set(...)` under the existing invokes guard, and `fail!`/`fail_clierror!` paths for user-input errors are now consistently `fail_incorrectusage_clierror!`. Regression tests: - apply_ops_thousands_eurostyle_negative_fraction - apply_ops_regex_replace_null_scoped_to_regex_replace - applydp_ops_regex_replace_null_scoped_to_regex_replace Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 29 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
- MEDIUM: the NULL-scoping regression tests previously did not exercise
the buggy code path. The chained `replace` op used --comparand
"\d{3}-\d{4}" — a regex pattern that does not appear literally in the
input data — so under both the buggy and fixed code paths `replace`
produced no change. Rewrite the tests with chain `replace,regex_replace`
and --comparand "KEEPME"; now `replace` actually matches and substitutes
the literal "<NULL>" string (under the old bug it would have substituted
"" because flag_replacement was globally cleared), distinguishing the
code paths.
- LOW: build `is_selected` mask only when --new-column is not set. The
mask was sized to headers.len(), which carried a trailing always-false
slot when --new-column had pushed a column onto headers. Today the
consumer is guarded by `flag_new_column.is_none()` so the extra slot is
never read, but the size invariant was non-obvious. Skipping the build
in the --new-column branch also avoids a wasted allocation per
invocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a couple of apply/applydp operation-chain edge cases (notably around regex_replace and numeric formatting), and includes small refactors aimed at improving the per-row transformation performance while keeping CLI error behavior consistent.
Changes:
- Fix
apply thousandshandling for negative fractional values so--replacementdecimal separators are applied consistently. - Scope
<NULL>--replacementrewriting toregex_replaceonly (so it no longer silently changes the behavior of other chained ops likereplace). - Refactor multi-column in-place transforms to rebuild each
StringRecordonce per row using a precomputed selection mask; update/extend regression tests accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_applydp.rs | Adds regression coverage for <NULL> scoping behavior; updates regex error expectation prefix. |
| tests/test_apply.rs | Adds regression coverage for <NULL> scoping and negative-fraction Thousands formatting; updates regex error expectation prefix. |
| src/cmd/applydp.rs | Scopes <NULL> handling to regex_replace; refactors in-place multi-column transforms to rebuild records once per row; improves incorrect-usage error classification for invalid regex. |
| src/cmd/apply.rs | Same <NULL> scoping + multi-column in-place rebuild refactor; fixes negative-fraction Thousands decimal separator handling; aligns certain user-input failures with IncorrectUsage. |
Address Copilot review on PR #3845: in the in-place multi-column rebuild paths (`Operations` and `EmptyReplace` in both apply and applydp), the new records were constructed with `csv::StringRecord::new()` and then grown via push_field per column. Replace with `csv::StringRecord::with_capacity(record.as_byte_record().as_slice().len(), record.len())` to pre-size both the byte buffer and the field-bounds vec to the input record's exact footprint, eliminating per-push growth reallocations on wide CSVs. Matches the pattern already used in excel.rs, stats.rs, and luau.rs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bug fixes surfaced during a code review of
apply.rs, mirrored where applicable inapplydp.rs.Bugs fixed
Thousandsop ignored--replacementdecimal separator for negative fractional numbers.num.fract() > 0.0skipped them becausef64::fract()preserves sign ((-1.5).fract() == -0.5). E.g.apply operations thousands col --formatstr space --replacement ,on-1234.5yielded-1 234.5instead of-1 234,5. Now compares against!= 0.0. (apply only — applydp has no Thousands op.)<NULL>--replacementwas globally clearingflag_replacementfor every op in the chain. A chainedregex_replace,replace --replacement '<NULL>'would silently turn thereplaceinto a deletion. NULL detection is now scoped toregex_replaceonly via a newregex_replacementparameter threaded intoapply_operations/applydp_operations;replaceand other ops see the user's literal--replacementvalue.Behavior changes worth a changelog note
regex_replace--comparandnow returnsCliError::IncorrectUsage(exit code 2) instead ofCliError::Other, with stderr prefixed byusage error:— consistent with every other user-input failure invalidate_operations.regex_replace,replace --replacement <NULL>no longer silently nullifies the chainedreplace. If anyone relied on this, they should supply an explicit empty--replacementor drop the chainedreplace.Refactors / cleanups
applyandapplydpnow rebuild eachStringRecordonce per row via a precomputedis_selected: Vec<bool>mask, instead of N times viareplace_column_value(one rebuild per selected column).SmallVec::with_capacity(operations.len())→SmallVec::new()invalidate_operationsso the inline-storage path is preserved for the common ≤4 ops case.apply::validate_operations: dropped unreachableOnceLock-error paths in favor oflet _ = X.set(...)under the existing invokes guard;fail!/fail_clierror!for user-input errors are now consistentlyfail_incorrectusage_clierror!.Test plan
cargo test -F all_features --test tests apply— 71/71 pass (68 original + 1 multi-column rejection test + 2 new regression tests)cargo test --features datapusher_plus --test tests applydp— 32/32 pass (31 original + 1 new regression test)cargo build --locked --bin qsv -F all_features— cleancargo clippy -F all_features --bin qsv— no new warningsapply_ops_thousands_eurostyle_negative_fraction— exercises negative fractional Thousands pathapply_ops_regex_replace_null_scoped_to_regex_replaceapplydp_ops_regex_replace_null_scoped_to_regex_replaceapply_ops_regex_replace_error/applydp_ops_regex_replace_errorupdated to expect the newusage error:prefix🤖 Generated with Claude Code