
fix(apply,applydp): Thousands neg fractions, scope <NULL> to regex_replace#3845

Merged
jqnatividad merged 3 commits into master from fix/apply-review-followups
May 11, 2026

@jqnatividad
Collaborator

Summary

Bug fixes surfaced during a code review of apply.rs, mirrored where applicable in applydp.rs.

Bugs fixed

  • Thousands op ignored the --replacement decimal separator for negative fractional numbers. The check num.fract() > 0.0 skipped them because f64::fract() preserves sign ((-1.5).fract() == -0.5). E.g. apply operations thousands col --formatstr space --replacement , on -1234.5 yielded -1 234.5 instead of -1 234,5. The check is now num.fract() != 0.0. (apply only; applydp has no Thousands op.)

  • <NULL> --replacement was globally clearing flag_replacement for every op in the chain. A chained regex_replace,replace --replacement '<NULL>' would silently turn the replace into a deletion. NULL detection is now scoped to regex_replace only via a new regex_replacement parameter threaded into apply_operations / applydp_operations; replace and other ops see the user's literal --replacement value.
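The sign behaviour behind the Thousands fix can be reproduced with the standard library alone. The sketch below is illustrative only: the two helper functions are hypothetical names, not code from apply.rs, and they isolate just the fractional-part check described above.

```rust
/// Old check (hypothetical helper): misses negative fractions,
/// because `fract()` keeps the sign of the input.
fn has_fraction_buggy(num: f64) -> bool {
    num.fract() > 0.0
}

/// New check (hypothetical helper): any non-zero fractional part
/// counts, regardless of sign.
fn has_fraction_fixed(num: f64) -> bool {
    num.fract() != 0.0
}

fn main() {
    for num in [1234.5_f64, -1234.5, -1234.0] {
        println!(
            "{}: fract = {}, buggy = {}, fixed = {}",
            num,
            num.fract(),
            has_fraction_buggy(num),
            has_fraction_fixed(num)
        );
    }
}
```

Note that a negative whole number such as -1234.0 has a fractional part of -0.0, which compares equal to 0.0, so the `!= 0.0` check still treats it as having no fraction.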

Behavior changes worth a changelog note

  • Invalid user-supplied regex_replace --comparand now returns CliError::IncorrectUsage (exit code 2) instead of CliError::Other, with stderr prefixed by usage error: — consistent with every other user-input failure in validate_operations.
  • regex_replace,replace --replacement <NULL> no longer silently nullifies the chained replace. If anyone relied on this, they should supply an explicit empty --replacement or drop the chained replace.
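The scoping rule can be pictured as a per-op lookup of the replacement string. This is a minimal sketch under stated assumptions: the `Op` enum and `replacement_for` function are hypothetical and do not mirror the actual signatures of apply_operations / applydp_operations, and an exact, case-sensitive match on `<NULL>` is assumed.

```rust
/// Hypothetical stand-in for the two ops discussed above.
#[derive(Debug, Clone, Copy)]
enum Op {
    Replace,
    RegexReplace,
}

/// Returns the replacement string a given op should see.
/// Only `regex_replace` gets "<NULL>" rewritten to the empty string;
/// every other op receives the user's literal value.
fn replacement_for(op: Op, flag_replacement: &str) -> &str {
    match op {
        Op::RegexReplace if flag_replacement == "<NULL>" => "",
        _ => flag_replacement,
    }
}

fn main() {
    for op in [Op::Replace, Op::RegexReplace] {
        println!("{:?} sees {:?}", op, replacement_for(op, "<NULL>"));
    }
}
```

Under this reading, a chained `replace` keeps substituting the literal text `<NULL>`, which is why callers who wanted a deletion are told to pass an explicit empty --replacement.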

Refactors / cleanups

  • Multi-column in-place transforms in both apply and applydp now rebuild each StringRecord once per row via a precomputed is_selected: Vec<bool> mask, instead of N times via replace_column_value (one rebuild per selected column).
  • SmallVec::with_capacity(operations.len()) → SmallVec::new() in validate_operations so the inline-storage path is preserved for the common ≤4 ops case.
  • apply::validate_operations: dropped unreachable OnceLock-error paths in favor of let _ = X.set(...) under the existing invokes guard; fail! / fail_clierror! for user-input errors are now consistently fail_incorrectusage_clierror!.
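The single-pass rebuild can be sketched with plain vectors. This is an illustration of the mask idea only: it uses `Vec<String>` in place of `csv::StringRecord`, and `build_mask` and `rebuild_row` are hypothetical names rather than functions from the PR.

```rust
/// Build a per-column mask: `true` where the column index is selected.
/// Out-of-range indices are ignored in this sketch.
fn build_mask(num_columns: usize, selected: &[usize]) -> Vec<bool> {
    let mut mask = vec![false; num_columns];
    for &idx in selected {
        if idx < num_columns {
            mask[idx] = true;
        }
    }
    mask
}

/// Rebuild the row once, transforming only the selected fields and
/// copying the rest through unchanged.
fn rebuild_row(
    row: &[String],
    is_selected: &[bool],
    transform: impl Fn(&str) -> String,
) -> Vec<String> {
    row.iter()
        .zip(is_selected)
        .map(|(field, &selected)| {
            if selected {
                transform(field)
            } else {
                field.clone()
            }
        })
        .collect()
}

fn main() {
    let row: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    // The mask is computed once and could be reused for every row.
    let mask = build_mask(row.len(), &[0, 2]);
    let rebuilt = rebuild_row(&row, &mask, |s| s.to_uppercase());
    println!("{:?}", rebuilt);
}
```

The mask costs one allocation per invocation, while the per-row work drops from one rebuild per selected column to a single pass over the fields.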

Test plan

  • cargo test -F all_features --test tests apply — 71/71 pass (68 original + 1 multi-column rejection test + 2 new regression tests)
  • cargo test --features datapusher_plus --test tests applydp — 32/32 pass (31 original + 1 new regression test)
  • cargo build --locked --bin qsv -F all_features — clean
  • cargo clippy -F all_features --bin qsv — no new warnings
  • New regression tests:
    • apply_ops_thousands_eurostyle_negative_fraction — exercises negative fractional Thousands path
    • apply_ops_regex_replace_null_scoped_to_regex_replace
    • applydp_ops_regex_replace_null_scoped_to_regex_replace
  • apply_ops_regex_replace_error / applydp_ops_regex_replace_error updated to expect the new usage error: prefix

🤖 Generated with Claude Code

…place

Bug fixes surfaced during a code review of apply.rs (and mirrored where
applicable in applydp.rs):

- Thousands op: `num.fract() > 0.0` skipped negative fractional numbers
  because f64::fract() preserves sign (`(-1.5).fract() == -0.5`), so the
  --replacement decimal separator was silently ignored for negative
  fractions. Now compares against `!= 0.0`. (apply only — applydp has no
  Thousands op.)

- <NULL> --replacement is now scoped to regex_replace only. Previously
  it was rewritten to "" globally for every op in the chain, silently
  turning a chained `replace` into a deletion. apply_operations /
  applydp_operations gain a regex_replacement parameter that is the only
  path the NULL rewrite affects; `replace` and other ops see the user's
  literal --replacement value.

- Invalid user-supplied regex_replace pattern now returns
  CliError::IncorrectUsage instead of CliError::Other. Exit code is now
  2, and stderr is prefixed with "usage error:" — consistent with other
  user-input failures in validate_operations.

- Multi-column in-place transforms rebuild each StringRecord once per
  row via a precomputed `is_selected: Vec<bool>` mask, instead of N
  times via replace_column_value. SmallVec::new() now used in
  validate_operations so the inline-storage path is preserved for the
  common <=4 ops case.

- Cleanups in apply::validate_operations: dropped unreachable
  OnceLock-error paths in favor of `let _ = X.set(...)` under the
  existing invokes guard, and `fail!`/`fail_clierror!` paths for
  user-input errors are now consistently `fail_incorrectusage_clierror!`.

Regression tests:
- apply_ops_thousands_eurostyle_negative_fraction
- apply_ops_regex_replace_null_scoped_to_regex_replace
- applydp_ops_regex_replace_null_scoped_to_regex_replace

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented May 11, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 29 complexity

Metric Results
Complexity 29


- MEDIUM: the NULL-scoping regression tests previously did not exercise
  the buggy code path. The chained `replace` op used --comparand
  "\d{3}-\d{4}" — a regex pattern that does not appear literally in the
  input data — so under both the buggy and fixed code paths `replace`
  produced no change. Rewrite the tests with chain `replace,regex_replace`
  and --comparand "KEEPME"; now `replace` actually matches and substitutes
  the literal "<NULL>" string (under the old bug it would have substituted
  "" because flag_replacement was globally cleared), distinguishing the
  code paths.

- LOW: build `is_selected` mask only when --new-column is not set. The
  mask was sized to headers.len(), which carried a trailing always-false
  slot when --new-column had pushed a column onto headers. Today the
  consumer is guarded by `flag_new_column.is_none()` so the extra slot is
  never read, but the size invariant was non-obvious. Skipping the build
  in the --new-column branch also avoids a wasted allocation per
  invocation.
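Why the original test could not tell the two code paths apart can be shown with a literal string replace. This is a sketch, not the actual test: `replace_op` is a hypothetical helper, the `buggy` flag simulates the old global `<NULL>` rewrite, and the sample cell value is made up for illustration.

```rust
/// Hypothetical model of the literal `replace` op.
/// With `buggy = true`, "<NULL>" is rewritten to "" for this op too,
/// as in the old global behaviour.
fn replace_op(cell: &str, comparand: &str, flag_replacement: &str, buggy: bool) -> String {
    let replacement = if buggy && flag_replacement == "<NULL>" {
        ""
    } else {
        flag_replacement
    };
    cell.replace(comparand, replacement)
}

fn main() {
    let cell = "call 555-1234 KEEPME";

    // A regex pattern used as a literal comparand never matches the data,
    // so both code paths leave the cell untouched.
    let pattern = r"\d{3}-\d{4}";
    println!("{:?}", replace_op(cell, pattern, "<NULL>", true));
    println!("{:?}", replace_op(cell, pattern, "<NULL>", false));

    // A comparand that does occur in the data separates the two paths.
    println!("{:?}", replace_op(cell, "KEEPME", "<NULL>", true));
    println!("{:?}", replace_op(cell, "KEEPME", "<NULL>", false));
}
```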

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a couple of apply/applydp operation-chain edge cases (notably around regex_replace and numeric formatting), and includes small refactors aimed at improving the per-row transformation performance while keeping CLI error behavior consistent.

Changes:

  • Fix apply thousands handling for negative fractional values so --replacement decimal separators are applied consistently.
  • Scope <NULL> --replacement rewriting to regex_replace only (so it no longer silently changes the behavior of other chained ops like replace).
  • Refactor multi-column in-place transforms to rebuild each StringRecord once per row using a precomputed selection mask; update/extend regression tests accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/test_applydp.rs | Adds regression coverage for <NULL> scoping behavior; updates regex error expectation prefix. |
| tests/test_apply.rs | Adds regression coverage for <NULL> scoping and negative-fraction Thousands formatting; updates regex error expectation prefix. |
| src/cmd/applydp.rs | Scopes <NULL> handling to regex_replace; refactors in-place multi-column transforms to rebuild records once per row; improves incorrect-usage error classification for invalid regex. |
| src/cmd/apply.rs | Same <NULL> scoping + multi-column in-place rebuild refactor; fixes negative-fraction Thousands decimal separator handling; aligns certain user-input failures with IncorrectUsage. |

Comment thread src/cmd/apply.rs Outdated
Comment thread src/cmd/apply.rs Outdated
Comment thread src/cmd/applydp.rs Outdated
Comment thread src/cmd/applydp.rs Outdated
Address Copilot review on PR #3845: in the in-place multi-column rebuild
paths (`Operations` and `EmptyReplace` in both apply and applydp), the new
records were constructed with `csv::StringRecord::new()` and then grown via
push_field per column. Replace with
`csv::StringRecord::with_capacity(record.as_byte_record().as_slice().len(),
record.len())` to pre-size both the byte buffer and the field-bounds vec
to the input record's exact footprint, eliminating per-push growth
reallocations on wide CSVs.

Matches the pattern already used in excel.rs, stats.rs, and luau.rs.
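The effect of pre-sizing can be sketched without the csv crate. `FlatRecord` below is a hypothetical, simplified stand-in for a record that keeps one byte buffer plus a vector of field end offsets, as the message above describes; it is not the csv crate's implementation, and `copy_presized` is an illustrative name.

```rust
/// Hypothetical simplified record: all field bytes in one buffer,
/// plus the end offset of each field.
struct FlatRecord {
    buf: Vec<u8>,
    ends: Vec<usize>,
}

impl FlatRecord {
    /// Pre-size both the byte buffer and the field-bounds vector.
    fn with_capacity(bytes: usize, fields: usize) -> Self {
        FlatRecord {
            buf: Vec::with_capacity(bytes),
            ends: Vec::with_capacity(fields),
        }
    }

    fn push_field(&mut self, field: &str) {
        self.buf.extend_from_slice(field.as_bytes());
        self.ends.push(self.buf.len());
    }

    fn len(&self) -> usize {
        self.ends.len()
    }

    fn byte_len(&self) -> usize {
        self.buf.len()
    }

    fn get(&self, i: usize) -> Option<&str> {
        let end = *self.ends.get(i)?;
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        std::str::from_utf8(&self.buf[start..end]).ok()
    }
}

/// Copy a record into a new one sized to the input's exact footprint,
/// so pushing the same fields never has to grow either vector.
fn copy_presized(src: &FlatRecord) -> FlatRecord {
    let mut out = FlatRecord::with_capacity(src.byte_len(), src.len());
    for i in 0..src.len() {
        if let Some(field) = src.get(i) {
            out.push_field(field);
        }
    }
    out
}

fn main() {
    let mut src = FlatRecord::with_capacity(0, 0);
    for field in ["alpha", "beta", "gamma"] {
        src.push_field(field);
    }
    let copy = copy_presized(&src);
    println!("{} fields, {} bytes", copy.len(), copy.byte_len());
}
```

A transform that lengthens fields would still trigger growth past the pre-sized capacity; the input footprint is only a starting estimate in that case.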

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 1a93e48 into master May 11, 2026
17 checks passed
@jqnatividad jqnatividad deleted the fix/apply-review-followups branch May 11, 2026 11:23