Skip to content

feat(qsv)!: bump qsv pin + MINIMUM 20.0.0 → 20.1.0 + #173 regression coverage#315

Merged
jqnatividad merged 8 commits into
mainfrom
test-173-date-format-regression
May 19, 2026
Merged

feat(qsv)!: bump qsv pin + MINIMUM 20.0.0 → 20.1.0 + #173 regression coverage#315
jqnatividad merged 8 commits into
mainfrom
test-173-date-format-regression

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

@jqnatividad jqnatividad commented May 18, 2026

Summary

Originally scoped to #173 regression coverage; expanded mid-flight to also bump the qsv version 20.0.020.1.0. The qsv bump turned into two coordinated changes — the pinned install version AND the MINIMUM_QSV_VERSION floor — so this PR is now a small BREAKING change for operators.

Part 1 — #173 regression coverage

The reporter's CSV (v1.0.3 era) is actually malformed — unquoted commas inside the RFC 2822 column give a 10-field header against 11-field data rows. v1.0.3 silently shifted columns; v3.0 catches this cleanly via ValidationStage's quarantine pass, and PR #314 (issue #179) already fixed the related qsv Date → Postgres date mapping.

This PR pins both halves of that story:

  • tests/static/issue_173_date_test_malformed.csv — reporter's original CSV, preserved as a fixture (10-field header, 11-field data rows).
  • tests/static/issue_173_date_test_quoted.csv — properly-quoted variant used to lock in the qsv inference matrix.
  • tests/test_issue_173_date_format_inference.py — 4 tests covering RFC 4180 rejection of the malformed variant, the field-count invariant, the qsv inference baseline, and the --prefer-dmy knob's effect.

Part 2 — qsv 20.0.0 → 20.1.0 bump (BREAKING for operators)

qsv 20.1.0 itself (released 2026-05-18) introduces no breaking changes — per the release notes, "pipelines built on 20.0.0 will upgrade in place." The BREAKING marker on this PR is the minimum-version-gate bump, not the qsv behavior.

Bumped in:

File From To
Dockerfile.worker (ARG QSV_VERSION) 20.0.0 20.1.0
.github/workflows/ci.yml (default + env) 20.0.0 20.1.0
.github/workflows/main.yml (env) 20.0.0 20.1.0
README.md install snippet 20.0.0 20.1.0
ckanext/datapusher_plus/config.py:MINIMUM_QSV_VERSION 20.0.0 20.1.0

Operators must upgrade their qsv binary at the path configured by ckanext.datapusher_plus.qsv_bin before deploying this version; DP+ refuses to start otherwise with JobError: At least qsv version 20.1.0 required. Found <X.Y.Z>.. Since qsv 20.1.0 is binary-compatible with 20.0.0, no data re-ingestion is required — operators just swap the binary.

Why bump the floor (vs. leaving it at 20.0.0): qsv-dateparser 0.140.15 in qsv 20.1.0 closes the ISO 8601 T-separated-no-tz inference gap (e.g. 2024-10-11T14:30:00) — one of the gaps explicitly documented in the #173 test. Forcing the floor means new DP+ deployments get the fix automatically.

The matrix in tests/test_issue_173_date_format_inference.py and .serena/memories/dpp_test_container.md's "qsv date-format inference gaps" section were rebased to the 20.1.0 baseline; the matrix-test skip guard advanced from >= 20.1.0 to >= 20.2.0. tests/test_qsv_v20_regression.py's minimum-version pinning test was updated to assert "20.1.0" (and renamed from test_minimum_qsv_version_constant_is_20 to _is_pinned since "is_20" stopped being descriptive).

Remaining qsv 20.1.0 inference gaps (not DP+ bugs)

Column qsv 20.1.0 Note
DD-MM-YYYY (dash-separated) String No qsv heuristic, even with --prefer-dmy
Unix Timestamp Integer No qsv heuristic for bare epoch ints

(Was 3 gaps under qsv 20.0.0; ISO 8601 closed by the dateparser bump.)

Test plan

  • Run tests/test_issue_173_date_format_inference.py in dpp-test — 4 passed under qsv 20.0.0, and again after the in-place bump to qsv 20.1.0
  • Run full unit suite — 222 passed under qsv 20.1.0 + MINIMUM_QSV_VERSION = "20.1.0"
  • Upgrade running dpp-test /usr/local/bin/qsvdp to 20.1.0
  • Rebuild integration worker image via scripts/integration-up --rebuild (so the baked-in Dockerfile.worker qsv pin is now 20.1.0, not just the in-place swap)
  • Verify rebuilt worker: qsvdp --version → 20.1.0 ✓, MINIMUM_QSV_VERSION = "20.1.0"
  • roborev pass on the regression-coverage commits (jobs #2264, #2265) — both green after fixes
  • CI re-run with QSV_VER=20.1.0 to confirm the new pin works end-to-end

🤖 Generated with Claude Code

jqnatividad and others added 2 commits May 18, 2026 15:51
… CSV

The reporter's CSV (date_test.csv on v1.0.3) is malformed — the
unquoted comma inside the RFC 2822 column makes data rows have
11 fields against a 10-field header. v1.0.3 silently shifted columns;
v3.0 catches this cleanly via ValidationStage's quarantine pass
(see tests/test_validation_quarantine.py), and PR #314 (issue #179)
already fixed the related qsv Date → Postgres date mapping.

This commit pins both halves of that story:

- tests/static/issue_173_date_test_malformed.csv: reporter's original
  CSV, preserved as a regression fixture (10-field header, 11-field
  data rows).
- tests/static/issue_173_date_test_quoted.csv: properly-quoted variant
  used to lock in the qsv inference matrix.
- tests/test_issue_173_date_format_inference.py: 4 tests covering
  RFC 4180 rejection of the malformed variant, the field-count
  invariant of the fixture, the qsv inference baseline (qsv 20.0.0,
  pinned in Dockerfile.worker + CI), and the --prefer-dmy effect on
  DD/MM/YYYY columns.

Known qsv 20.0.0 inference gaps documented here (not DP+ bugs):
ISO 8601 with T-separator under parens-bearing column names, dash-
separated DD-MM-YYYY, and bare Unix epoch integers. The test will
fail if a future qsv upgrade narrows these — at which point the
test baseline and the comment on #173 should be updated together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump test count 218 → 222 (4 new tests in
  tests/test_issue_173_date_format_inference.py).
- Correct stale reference to tests/test_date_type_mapping.py — the
  PR #314 regression file is actually tests/test_date_without_timestamp.py.
- New section "qsv date-format inference gaps (issue #173 baseline)"
  documenting the three qsv 20.0.0 gaps surfaced by #173 work:
  ISO 8601 with T-separator, DD-MM-YYYY dash-separated, and bare
  Unix epoch integers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jqnatividad and others added 5 commits May 18, 2026 15:58
Four LOW findings from the roborev pass on the regression-test commit;
all addressed in this commit.

- LOW @ test_malformed_csv_fails_rfc4180_validation_with_field_count_error:
  added ``assert result.returncode != 0`` before the substring check
  so a future qsv that prints "4 records validated" on success can't
  false-pass the diagnostic-shape guard.

- LOW @ fixture whitespace divergence: added a NOTE comment block
  documenting the deliberate two-dimensional difference between the
  malformed and quoted fixtures (verbatim-from-reporter leading spaces
  in the malformed header vs. normalized field names in the quoted
  variant). Aligning the whitespace would force EXPECTED_INFERENCE keys
  to carry leading spaces too, which made the matrix harder to read —
  the comment captures the intent more cleanly.

- LOW @ requires_qsv_20 vs. matrix-pinned-to-20.0.x: added an explicit
  ``pytest.skip`` inside ``test_quoted_csv_inference_matrix`` for qsv
  ≥ 20.1.0, so a developer on a newer qsv host gets a clean skip with
  an actionable message instead of a false-fail. Verified on host
  qsv 20.1.0: 3 passed, 1 skipped. dpp-test (qsv 20.0.0): all 4 pass.

- LOW @ unused REPO_ROOT: dropped the binding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings (1 MEDIUM, 2 LOW) on the memory-update commit; all
addressed.

- MEDIUM @ stale `tests/test_date_type_mapping.py` reference: this
  commit was *specifically* supposed to fix that reference but missed
  one instance (line 316). Fixed — and the same class of bug existed
  for three other filenames the reviewer didn't flag explicitly but
  which had the same root cause (me inventing filenames without
  verifying against `git log --diff-filter=A`). Corrected:
    test_file_hash_algorithms.py     -> test_file_hash_algorithm.py
    test_metadata_stage_file_hash.py -> test_metadata_hash_persistence.py
    test_download_result_rehydrate.py -> test_rehydrate_resource_identity.py
    test_date_type_mapping.py        -> test_date_without_timestamp.py
  Verified all four replacements resolve as real files in the tree.

- LOW @ inconsistent test-count math: the prior "+26 tests" claim
  enumerated files summing to 51, not 26. Rewrote the "Known result"
  section to spell out per-file gross counts (collected via
  `pytest --collect-only -q`: 21/4/10/3/4/5/4 = 51), then state the
  honest net delta (+26) and acknowledge the 25-test gap is from
  pre-existing tests removed/consolidated during the #307#314
  refactors. The math is now self-auditable and the framing no longer
  implies the listed adds sum to the delta.

- LOW @ forward-looking "Recognized in qsv ≥ 20.1" claim: tightened
  to "Empirically verified to be recognized as DateTime on qsv 20.1.0
  (macOS host check during #173 investigation)" and cross-referenced
  the test's skip guard for qsv ≥ 20.1, so the claim is grounded in
  a real observation and the next maintainer has a clean handoff
  path when the production qsv pin moves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qsv 20.1.0 (released 2026-05-18) has no breaking changes — per the
release notes, "pipelines built on 20.0.0 will upgrade in place."

Bumped in:
- Dockerfile.worker (ARG QSV_VERSION)
- .github/workflows/ci.yml (qsv_version input default + QSV_VER env)
- .github/workflows/main.yml (QSV_VER env)
- README.md install snippet

MINIMUM_QSV_VERSION stays at 20.0.0 (operators are not forced to
upgrade their own qsv_bin since 20.1.0 is fully backward-compatible).

User-visible improvement on the DP+ side: qsv-dateparser 0.14 → 0.15
in qsv 20.1.0 adds recognition for ISO 8601 ``T``-separated datetimes
without a timezone suffix (e.g. ``2024-10-11T14:30:00``) — qsv 20.0.0
misclassified these as ``String`` during ``qsv stats --infer-dates``,
which in DP+ surfaced as a date-typed column being demoted to text on
certain CSV shapes (one of the gaps documented in the issue #173
regression test).

CHANGELOG[Unreleased] updated with the bump + the user-visible date
inference improvement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the qsv 20.0.0 → 20.1.0 production bump. The
``EXPECTED_INFERENCE`` matrix in
``tests/test_issue_173_date_format_inference.py`` was pinned to qsv
20.0.0 behavior and needed updating now that the production qsv is
20.1.0:

- ``ISO 8601 (YYYY-MM-DDTHH:MM:SS)``: ``String`` → ``DateTime``.
  qsv-dateparser 0.14 → 0.15 in qsv 20.1.0 added recognition for the
  T-separated-no-tz form (e.g. ``2024-10-11T14:30:00``). One fewer
  documented qsv gap.
- ``test_quoted_csv_inference_matrix`` skip guard: ``>= (20, 1, 0)``
  → ``>= (20, 2, 0)``. Matches the new pinned-version baseline; the
  test still skips (rather than false-fails) on hosts with a qsv
  newer than the production pin, with an actionable message pointing
  at Dockerfile.worker.

Memory file (`dpp_test_container.md`):
- Reflects qsv 20.1.0 at /usr/local/bin/qsvdp.
- Inference-gap section now lists DD-MM-YYYY (dash-separated) and
  bare Unix epoch integers as the remaining gaps, with ISO 8601
  moved to a "Closed in qsv 20.1.0 vs. 20.0.0" sub-section citing
  the qsv-dateparser bump.
- Recreate-recipe step bumped to download qsv 20.1.0.

Verified: 222/222 unit tests still pass on dpp-test with qsv 20.1.0
in place. The integration worker (`datapusher-plus-prefect-worker-1`)
also had its qsvdp swapped in place for the host-side smoke checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operators must now have qsv 20.1.0 or later at the path configured
by ``ckanext.datapusher_plus.qsv_bin``; DP+ will refuse to start
otherwise with ``JobError: At least qsv version 20.1.0 required.
Found <X.Y.Z>.``.

qsv 20.1.0 itself introduces no breaking changes against 20.0.0 (per
the 20.1.0 release notes: "pipelines built on 20.0.0 will upgrade in
place"), so the upgrade for operators is binary-swap-only — no data
re-ingestion needed.

The reason this is flagged with ``!`` is the minimum-version gate,
not the qsv behavior. Justification for forcing the bump (vs. leaving
the floor at 20.0.0): qsv-dateparser 0.14 → 0.15 in qsv 20.1.0 closes
the ISO 8601 T-separated-no-tz inference gap (one of the gaps
explicitly documented by the issue #173 regression test), so forcing
the floor means new DP+ deployments get the bugfix automatically
rather than depending on operators noticing the release notes.

Changes:
- ckanext/datapusher_plus/config.py: MINIMUM_QSV_VERSION = "20.1.0"
- tests/test_qsv_v20_regression.py: test renamed from
  test_minimum_qsv_version_constant_is_20 to
  test_minimum_qsv_version_constant_is_pinned (since "is_20" stopped
  being descriptive — 20.0.0 vs. 20.1.0 are both "20"), and
  expected value bumped to "20.1.0". Top-of-file docstring also
  updated to mention the new floor.
- CHANGELOG.md [Unreleased]: reframes the earlier "MINIMUM stays at
  20.0.0" claim — it now reads as a BREAKING bump of the floor in
  the same release as the qsv-version-pin bump, with the
  binary-swap-only operator guidance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad changed the title test(#173): regression coverage for date-format inference + malformed CSV feat(qsv)!: bump qsv pin + MINIMUM 20.0.0 → 20.1.0 + #173 regression coverage May 18, 2026
@jqnatividad jqnatividad requested a review from Copilot May 18, 2026 23:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds regression coverage for issue #173’s date-format inference / malformed CSV behavior and bumps the qsv runtime pin + minimum supported qsv version from 20.0.0 to 20.1.0 (operator-facing breaking change due to the version gate).

Changes:

  • Add new qsv-driven regression tests and CSV fixtures to lock in behavior for malformed-field-count rejection and qsv date-type inference (including --prefer-dmy behavior).
  • Bump MINIMUM_QSV_VERSION to 20.1.0 and update worker/CI/docs pins to match.
  • Update changelog and internal “dpp-test” container notes to reflect the new baseline.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/test_qsv_v20_regression.py Updates the minimum-version pinning assertion and documentation to reflect the new 20.1.0 floor.
tests/test_issue_173_date_format_inference.py New regression suite covering malformed CSV validation + qsv inference matrix and --prefer-dmy.
tests/static/issue_173_date_test_quoted.csv Well-formed variant fixture used to pin qsv inference output.
tests/static/issue_173_date_test_malformed.csv Malformed reporter fixture used to assert RFC4180 validation failure and field-count drift.
README.md Updates install snippet to reference qsv 20.1.0.
Dockerfile.worker Bumps baked-in qsv version to 20.1.0.
ckanext/datapusher_plus/config.py Raises MINIMUM_QSV_VERSION to 20.1.0.
CHANGELOG.md Adds breaking-change entry describing the 20.1.0 minimum/version-pin bump.
.serena/memories/dpp_test_container.md Updates dpp-test container baseline notes to qsv 20.1.0 and mentions the new test file.
.github/workflows/main.yml Updates workflow qsv version env var to 20.1.0.
.github/workflows/ci.yml Updates CI default qsv version to 20.1.0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CHANGELOG.md Outdated
Replace ``Found <X.Y.Z>.`` placeholder with the concrete example
``Found 20.0.0.`` — matches the existing project convention in
``docs/qsv-20.0.0-upgrade-test-plan.md:32`` which uses ``Found
19.1.0.`` as a concrete example, and sidesteps the HTML-tag risk
Copilot flagged on PR #315 (angle brackets inside backticks render
fine on GitHub but could trip a non-CommonMark-compliant renderer).

``20.0.0`` is the most useful example because it's the exact value
operators currently on the prior pin would see when the new floor
rejects their qsv binary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit 659f0fa into main May 19, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants