feat(template): add skip_row_if predicate for row filtering (P0-2) by dev360 · Pull Request #26 · dev360/crease

dev360 · 2026-05-21T21:21:44Z

Summary

Closes P0-2 — declarative row filtering during extraction.

Reports often interleave data rows with shapes that share the column geometry of real records but are not records themselves: subtotals with a blank discriminator, day-of-week marker rows, grand-totals that zero out the key column and populate only the total. Previously, crease extracted those rows and the validator surfaced every implausible value as wrong_type / missing_required, burying real errors under noise. Operators had to either pre-process the file or add per-field null_tokens hacks.

locate.skip_row_if is now a first-class list of predicates that drops matching rows before field coercion.

Naming note (after v1.2.0 blocks rebase)

The blocks PR (#37) introduced a class also named SkipRowRule for in-block row filtering with a different shape (column + cell_pattern / match_blank). This PR's class was renamed to LocateSkipRule to avoid the collision; the helper functions in extractor.py were correspondingly renamed (_row_matches_skip → _row_matches_locate_skip, etc.). The YAML field name skip_row_if is unchanged.

API

locate:
  skip_row_if:
    # subtotal rows: blank discriminator
    - all_blank: [customer]
    # day-of-week marker rows
    - column: label
      value_pattern: "^(MONDAY|TUESDAY|WEDNESDAY|THURSDAY|FRIDAY|SATURDAY|SUNDAY)$"
    # grand-total row: blank discriminator AND populated total
    - all_blank: [site]
      non_blank: [head_count]

Each list entry is a LocateSkipRule. Three optional fields:

all_blank: [col, ...] — every listed column must be blank on the row.
non_blank: [col, ...] — every listed column must carry a non-blank value.
column: name + value_pattern: regex — that column's stringified value must full-match the regex.

Fields set on the same rule AND together. Multiple rules in the list OR together. Matching rows are silently filtered — no record in canonical, no row error.
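The AND/OR semantics above can be sketched in a few lines of Python. This is an illustration only, not the code in extractor.py: the dataclass fields mirror the YAML keys, but the helper names (`_is_blank`, `rule_matches`, `drop_skipped`), the definition of "blank" (None or whitespace-only), the handling of an empty rule, and the representation of a row as a mapping are all assumptions. The unknown-column behavior follows the commit message, which says a rule naming an unknown column simply cannot match.

```python
import re
from dataclasses import dataclass
from typing import Any, Iterable, List, Mapping, Optional, Sequence


@dataclass(frozen=True)
class LocateSkipRule:
    """Illustrative stand-in; the real class in crease may differ."""

    all_blank: Sequence[str] = ()
    non_blank: Sequence[str] = ()
    column: Optional[str] = None
    value_pattern: Optional[str] = None


def _is_blank(value: Any) -> bool:
    # Assumption: None and whitespace-only strings count as blank.
    return value is None or str(value).strip() == ""


def rule_matches(rule: LocateSkipRule, row: Mapping[str, Any]) -> bool:
    """Return True only if every field set on the rule holds (AND)."""
    has_pattern = rule.column is not None and rule.value_pattern is not None
    referenced = [*rule.all_blank, *rule.non_blank]
    if has_pattern:
        referenced.append(rule.column)
    if not referenced:
        return False  # assumption: a rule with no fields set never matches
    if any(name not in row for name in referenced):
        return False  # unknown column: the rule cannot match
    if any(not _is_blank(row[name]) for name in rule.all_blank):
        return False
    if any(_is_blank(row[name]) for name in rule.non_blank):
        return False
    if has_pattern:
        value = row[rule.column]
        text = "" if value is None else str(value)
        # fullmatch: the whole stringified value must match the regex.
        if re.fullmatch(rule.value_pattern, text) is None:
            return False
    return True


def drop_skipped(
    rows: Iterable[Mapping[str, Any]], rules: Iterable[LocateSkipRule]
) -> List[Mapping[str, Any]]:
    """Keep rows that no rule matches; any single match drops the row (OR)."""
    rule_list = list(rules)
    return [
        row
        for row in rows
        if not any(rule_matches(rule, row) for rule in rule_list)
    ]
```

Under these assumptions, a row with a blank `site` and a populated `head_count` is dropped by the compound rule from the example, while a row where both are blank is kept, because `non_blank` fails and the rule's fields AND together.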

Test plan

uv run pytest tests/test_field_scan_gaps.py -q — three P0-2 tests graduate.
uv run pytest -q — full suite green after rebase.
uvx --from 'ruff==0.6.9' ruff check . and ruff format --check . — clean.

All fixtures and copy use only fictitious values per CLAUDE.md.

🤖 Generated with Claude Code

Reports often interleave data rows with shapes that share the column geometry of real records but are not records themselves: subtotal rows with a blank discriminator, day-of-week markers, grand-total rows that zero out the key column and populate only the total. The previous behavior was to extract those rows as records, then surface every implausible value as ``wrong_type`` / ``missing_required``, burying real errors under noise.

Adds ``locate.skip_row_if`` as a list of predicates. Each predicate is one ``SkipRowRule`` and supports any combination of three fields:

- ``all_blank: [col, ...]``: every listed column must be blank.
- ``non_blank: [col, ...]``: every listed column must be non-blank.
- ``column: name`` + ``value_pattern: regex``: single column's stringified value must full-match the regex.

Fields set on the same rule are AND-ed (so a compound rule can drop the "blank discriminator AND populated total" grand-total row). Multiple rules in the list are OR-ed (any rule's match drops the row). Matching rows are silently filtered before field coercion: no record in canonical, no row error.

Unknown column names in a rule are ignored (the rule simply can't match), so a template that misnames a column won't crash extraction; the rule just never fires.

Graduates the three P0-2 xfail tests in ``tests/test_field_scan_gaps.py`` and adds a "Skipping rows during extraction" section to ``docs/guides/templates.md``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
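Because a rule that names an unknown column never fires and raises no error, a typo in a template can silently disable a filter. One way a template author might catch this is a small pre-flight check like the sketch below. The helper `unknown_rule_columns` is hypothetical and not part of crease; it assumes rules arrive as plain dicts (as parsed from the YAML) and that the report's header is available as a list of column names.

```python
from typing import Any, Iterable, Mapping, Set


def unknown_rule_columns(
    rules: Iterable[Mapping[str, Any]], header: Iterable[str]
) -> Set[str]:
    """Return column names referenced by skip rules but absent from header.

    Hypothetical helper for illustration; crease itself ignores such names.
    """
    known = set(header)
    referenced: Set[str] = set()
    for rule in rules:
        # all_blank / non_blank hold lists of column names (may be missing).
        referenced.update(rule.get("all_blank") or [])
        referenced.update(rule.get("non_blank") or [])
        # column holds a single name, paired with value_pattern.
        if rule.get("column"):
            referenced.add(rule["column"])
    return referenced - known
```

A non-empty result would point at rules that can never match the given report, which is usually a misspelled column name rather than an intentional no-op.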

dev360 force-pushed the feat/p0-2-skip-row-if branch from b97a0dc to 9133437 Compare May 22, 2026 13:43

dev360 enabled auto-merge (squash) May 22, 2026 15:47

dev360 force-pushed the feat/p0-2-skip-row-if branch from 9133437 to 2224d17 Compare May 22, 2026 15:47

dev360 merged commit 90e9113 into main May 22, 2026
8 checks passed

dev360 deleted the feat/p0-2-skip-row-if branch May 22, 2026 15:47

dev360 merged 1 commit into main from feat/p0-2-skip-row-if
