feat(qsv)!: bump qsv pin + MINIMUM 20.0.0 → 20.1.0 + #173 regression coverage#315
Merged
Conversation
… CSV The reporter's CSV (date_test.csv on v1.0.3) is malformed — the unquoted comma inside the RFC 2822 column makes data rows have 11 fields against a 10-field header. v1.0.3 silently shifted columns; v3.0 catches this cleanly via ValidationStage's quarantine pass (see tests/test_validation_quarantine.py), and PR #314 (issue #179) already fixed the related qsv Date → Postgres date mapping. This commit pins both halves of that story: - tests/static/issue_173_date_test_malformed.csv: reporter's original CSV, preserved as a regression fixture (10-field header, 11-field data rows). - tests/static/issue_173_date_test_quoted.csv: properly-quoted variant used to lock in the qsv inference matrix. - tests/test_issue_173_date_format_inference.py: 4 tests covering RFC 4180 rejection of the malformed variant, the field-count invariant of the fixture, the qsv inference baseline (qsv 20.0.0, pinned in Dockerfile.worker + CI), and the --prefer-dmy effect on DD/MM/YYYY columns. Known qsv 20.0.0 inference gaps documented here (not DP+ bugs): ISO 8601 with T-separator under parens-bearing column names, dash- separated DD-MM-YYYY, and bare Unix epoch integers. The test will fail if a future qsv upgrade narrows these — at which point the test baseline and the comment on #173 should be updated together. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump test count 218 → 222 (4 new tests in tests/test_issue_173_date_format_inference.py). - Correct stale reference to tests/test_date_type_mapping.py — the PR #314 regression file is actually tests/test_date_without_timestamp.py. - New section "qsv date-format inference gaps (issue #173 baseline)" documenting the three qsv 20.0.0 gaps surfaced by #173 work: ISO 8601 with T-separator, DD-MM-YYYY dash-separated, and bare Unix epoch integers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four LOW findings from the roborev pass on the regression-test commit; all addressed in this commit. - LOW @ test_malformed_csv_fails_rfc4180_validation_with_field_count_error: added ``assert result.returncode != 0`` before the substring check so a future qsv that prints "4 records validated" on success can't false-pass the diagnostic-shape guard. - LOW @ fixture whitespace divergence: added a NOTE comment block documenting the deliberate two-dimensional difference between the malformed and quoted fixtures (verbatim-from-reporter leading spaces in the malformed header vs. normalized field names in the quoted variant). Aligning the whitespace would force EXPECTED_INFERENCE keys to carry leading spaces too, which made the matrix harder to read — the comment captures the intent more cleanly. - LOW @ requires_qsv_20 vs. matrix-pinned-to-20.0.x: added an explicit ``pytest.skip`` inside ``test_quoted_csv_inference_matrix`` for qsv ≥ 20.1.0, so a developer on a newer qsv host gets a clean skip with an actionable message instead of a false-fail. Verified on host qsv 20.1.0: 3 passed, 1 skipped. dpp-test (qsv 20.0.0): all 4 pass. - LOW @ unused REPO_ROOT: dropped the binding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings (1 MEDIUM, 2 LOW) on the memory-update commit; all
addressed.
- MEDIUM @ stale `tests/test_date_type_mapping.py` reference: this
commit was *specifically* supposed to fix that reference but missed
one instance (line 316). Fixed — and the same class of bug existed
for three other filenames the reviewer didn't flag explicitly but
which had the same root cause (me inventing filenames without
verifying against `git log --diff-filter=A`). Corrected:
test_file_hash_algorithms.py -> test_file_hash_algorithm.py
test_metadata_stage_file_hash.py -> test_metadata_hash_persistence.py
test_download_result_rehydrate.py -> test_rehydrate_resource_identity.py
test_date_type_mapping.py -> test_date_without_timestamp.py
Verified all four replacements resolve as real files in the tree.
- LOW @ inconsistent test-count math: the prior "+26 tests" claim
enumerated files summing to 51, not 26. Rewrote the "Known result"
section to spell out per-file gross counts (collected via
`pytest --collect-only -q`: 21/4/10/3/4/5/4 = 51), then state the
honest net delta (+26) and acknowledge the 25-test gap is from
pre-existing tests removed/consolidated during the #307–#314
refactors. The math is now self-auditable and the framing no longer
implies the listed adds sum to the delta.
- LOW @ forward-looking "Recognized in qsv ≥ 20.1" claim: tightened
to "Empirically verified to be recognized as DateTime on qsv 20.1.0
(macOS host check during #173 investigation)" and cross-referenced
the test's skip guard for qsv ≥ 20.1, so the claim is grounded in
a real observation and the next maintainer has a clean handoff
path when the production qsv pin moves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qsv 20.1.0 (released 2026-05-18) has no breaking changes — per the release notes, "pipelines built on 20.0.0 will upgrade in place." Bumped in: - Dockerfile.worker (ARG QSV_VERSION) - .github/workflows/ci.yml (qsv_version input default + QSV_VER env) - .github/workflows/main.yml (QSV_VER env) - README.md install snippet MINIMUM_QSV_VERSION stays at 20.0.0 (operators are not forced to upgrade their own qsv_bin since 20.1.0 is fully backward-compatible). User-visible improvement on the DP+ side: qsv-dateparser 0.14 → 0.15 in qsv 20.1.0 adds recognition for ISO 8601 ``T``-separated datetimes without a timezone suffix (e.g. ``2024-10-11T14:30:00``) — qsv 20.0.0 misclassified these as ``String`` during ``qsv stats --infer-dates``, which in DP+ surfaced as a date-typed column being demoted to text on certain CSV shapes (one of the gaps documented in the issue #173 regression test). CHANGELOG[Unreleased] updated with the bump + the user-visible date inference improvement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to the qsv 20.0.0 → 20.1.0 production bump. The ``EXPECTED_INFERENCE`` matrix in ``tests/test_issue_173_date_format_inference.py`` was pinned to qsv 20.0.0 behavior and needed updating now that the production qsv is 20.1.0: - ``ISO 8601 (YYYY-MM-DDTHH:MM:SS)``: ``String`` → ``DateTime``. qsv-dateparser 0.14 → 0.15 in qsv 20.1.0 added recognition for the T-separated-no-tz form (e.g. ``2024-10-11T14:30:00``). One fewer documented qsv gap. - ``test_quoted_csv_inference_matrix`` skip guard: ``>= (20, 1, 0)`` → ``>= (20, 2, 0)``. Matches the new pinned-version baseline; the test still skips (rather than false-fails) on hosts with a qsv newer than the production pin, with an actionable message pointing at Dockerfile.worker. Memory file (`dpp_test_container.md`): - Reflects qsv 20.1.0 at /usr/local/bin/qsvdp. - Inference-gap section now lists DD-MM-YYYY (dash-separated) and bare Unix epoch integers as the remaining gaps, with ISO 8601 moved to a "Closed in qsv 20.1.0 vs. 20.0.0" sub-section citing the qsv-dateparser bump. - Recreate-recipe step bumped to download qsv 20.1.0. Verified: 222/222 unit tests still pass on dpp-test with qsv 20.1.0 in place. The integration worker (`datapusher-plus-prefect-worker-1`) also had its qsvdp swapped in place for the host-side smoke checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operators must now have qsv 20.1.0 or later at the path configured by ``ckanext.datapusher_plus.qsv_bin``; DP+ will refuse to start otherwise with ``JobError: At least qsv version 20.1.0 required. Found <X.Y.Z>.``. qsv 20.1.0 itself introduces no breaking changes against 20.0.0 (per the 20.1.0 release notes: "pipelines built on 20.0.0 will upgrade in place"), so the upgrade for operators is binary-swap-only — no data re-ingestion needed. The reason this is flagged with ``!`` is the minimum-version gate, not the qsv behavior. Justification for forcing the bump (vs. leaving the floor at 20.0.0): qsv-dateparser 0.14 → 0.15 in qsv 20.1.0 closes the ISO 8601 T-separated-no-tz inference gap (one of the gaps explicitly documented by the issue #173 regression test), so forcing the floor means new DP+ deployments get the bugfix automatically rather than depending on operators noticing the release notes. Changes: - ckanext/datapusher_plus/config.py: MINIMUM_QSV_VERSION = "20.1.0" - tests/test_qsv_v20_regression.py: test renamed from test_minimum_qsv_version_constant_is_20 to test_minimum_qsv_version_constant_is_pinned (since "is_20" stopped being descriptive — 20.0.0 vs. 20.1.0 are both "20"), and expected value bumped to "20.1.0". Top-of-file docstring also updated to mention the new floor. - CHANGELOG.md [Unreleased]: reframes the earlier "MINIMUM stays at 20.0.0" claim — it now reads as a BREAKING bump of the floor in the same release as the qsv-version-pin bump, with the binary-swap-only operator guidance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds regression coverage for issue #173’s date-format inference / malformed CSV behavior and bumps the qsv runtime pin + minimum supported qsv version from 20.0.0 to 20.1.0 (operator-facing breaking change due to the version gate).
Changes:
- Add new qsv-driven regression tests and CSV fixtures to lock in behavior for malformed-field-count rejection and qsv date-type inference (including
--prefer-dmybehavior). - Bump
MINIMUM_QSV_VERSIONto20.1.0and update worker/CI/docs pins to match. - Update changelog and internal “dpp-test” container notes to reflect the new baseline.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
tests/test_qsv_v20_regression.py |
Updates the minimum-version pinning assertion and documentation to reflect the new 20.1.0 floor. |
tests/test_issue_173_date_format_inference.py |
New regression suite covering malformed CSV validation + qsv inference matrix and --prefer-dmy. |
tests/static/issue_173_date_test_quoted.csv |
Well-formed variant fixture used to pin qsv inference output. |
tests/static/issue_173_date_test_malformed.csv |
Malformed reporter fixture used to assert RFC4180 validation failure and field-count drift. |
README.md |
Updates install snippet to reference qsv 20.1.0. |
Dockerfile.worker |
Bumps baked-in qsv version to 20.1.0. |
ckanext/datapusher_plus/config.py |
Raises MINIMUM_QSV_VERSION to 20.1.0. |
CHANGELOG.md |
Adds breaking-change entry describing the 20.1.0 minimum/version-pin bump. |
.serena/memories/dpp_test_container.md |
Updates dpp-test container baseline notes to qsv 20.1.0 and mentions the new test file. |
.github/workflows/main.yml |
Updates workflow qsv version env var to 20.1.0. |
.github/workflows/ci.yml |
Updates CI default qsv version to 20.1.0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace ``Found <X.Y.Z>.`` placeholder with the concrete example ``Found 20.0.0.`` — matches the existing project convention in ``docs/qsv-20.0.0-upgrade-test-plan.md:32`` which uses ``Found 19.1.0.`` as a concrete example, and sidesteps the HTML-tag risk Copilot flagged on PR #315 (angle brackets inside backticks render fine on GitHub but could trip a non-CommonMark-compliant renderer). ``20.0.0`` is the most useful example because it's the exact value operators currently on the prior pin would see when the new floor rejects their qsv binary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Originally scoped to #173 regression coverage; expanded mid-flight to also bump the qsv version
20.0.0→20.1.0. The qsv bump turned into two coordinated changes — the pinned install version AND theMINIMUM_QSV_VERSIONfloor — so this PR is now a small BREAKING change for operators.Part 1 — #173 regression coverage
The reporter's CSV (v1.0.3 era) is actually malformed — unquoted commas inside the RFC 2822 column give a 10-field header against 11-field data rows. v1.0.3 silently shifted columns; v3.0 catches this cleanly via
ValidationStage's quarantine pass, and PR #314 (issue #179) already fixed the related qsvDate→ Postgresdatemapping.This PR pins both halves of that story:
tests/static/issue_173_date_test_malformed.csv— reporter's original CSV, preserved as a fixture (10-field header, 11-field data rows).tests/static/issue_173_date_test_quoted.csv— properly-quoted variant used to lock in the qsv inference matrix.tests/test_issue_173_date_format_inference.py— 4 tests covering RFC 4180 rejection of the malformed variant, the field-count invariant, the qsv inference baseline, and the--prefer-dmyknob's effect.Part 2 — qsv 20.0.0 → 20.1.0 bump (BREAKING for operators)
qsv 20.1.0 itself (released 2026-05-18) introduces no breaking changes — per the release notes, "pipelines built on 20.0.0 will upgrade in place." The BREAKING marker on this PR is the minimum-version-gate bump, not the qsv behavior.
Bumped in:
Dockerfile.worker(ARG QSV_VERSION).github/workflows/ci.yml(default + env).github/workflows/main.yml(env)README.mdinstall snippetckanext/datapusher_plus/config.py:MINIMUM_QSV_VERSIONOperators must upgrade their
qsvbinary at the path configured byckanext.datapusher_plus.qsv_binbefore deploying this version; DP+ refuses to start otherwise withJobError: At least qsv version 20.1.0 required. Found <X.Y.Z>.. Since qsv 20.1.0 is binary-compatible with 20.0.0, no data re-ingestion is required — operators just swap the binary.Why bump the floor (vs. leaving it at 20.0.0): qsv-dateparser
0.14→0.15in qsv 20.1.0 closes the ISO 8601 T-separated-no-tz inference gap (e.g.2024-10-11T14:30:00) — one of the gaps explicitly documented in the #173 test. Forcing the floor means new DP+ deployments get the fix automatically.The matrix in
tests/test_issue_173_date_format_inference.pyand.serena/memories/dpp_test_container.md's "qsv date-format inference gaps" section were rebased to the 20.1.0 baseline; the matrix-test skip guard advanced from>= 20.1.0to>= 20.2.0.tests/test_qsv_v20_regression.py's minimum-version pinning test was updated to assert"20.1.0"(and renamed fromtest_minimum_qsv_version_constant_is_20to_is_pinnedsince "is_20" stopped being descriptive).Remaining qsv 20.1.0 inference gaps (not DP+ bugs)
DD-MM-YYYY(dash-separated)--prefer-dmyUnix Timestamp(Was 3 gaps under qsv 20.0.0; ISO 8601 closed by the dateparser bump.)
Test plan
tests/test_issue_173_date_format_inference.pyindpp-test— 4 passed under qsv 20.0.0, and again after the in-place bump to qsv 20.1.0MINIMUM_QSV_VERSION = "20.1.0"dpp-test/usr/local/bin/qsvdpto 20.1.0scripts/integration-up --rebuild(so the baked-inDockerfile.workerqsv pin is now 20.1.0, not just the in-place swap)qsvdp --version → 20.1.0✓,MINIMUM_QSV_VERSION = "20.1.0"✓QSV_VER=20.1.0to confirm the new pin works end-to-end🤖 Generated with Claude Code