fix(#179): map qsv Date to Postgres date (not timestamp)#314
Merged
Conversation
Closes #179. Reporter: CSV with a date-only column ("2023-02-02") showed up in the CKAN preview as "2023-02-02T00:00:00" — DP+ was storing the column as Postgres ``timestamp`` and running RFC-3339 normalization, which adds the ``T00:00:00`` time component for date-only inputs. Root cause: a discrepancy between two sources of the same default. * ``ckanext/datapusher_plus/config.py:144`` inline fallback for ``type_mapping`` had ``"Date": "date"`` (correct). * ``ckanext/datapusher_plus/config_declaration.yaml`` had ``"Date": "timestamp"`` (wrong). The CKAN declaration default wins over an inline ``tk.config.get(key, fallback)`` fallback — the declaration populates ``tk.config`` BEFORE the fallback is consulted. So real deployments saw the buggy declaration value, never the correct inline one. Fix: 1. ``config_declaration.yaml``: change ``"Date": "timestamp"`` to ``"Date": "date"`` to match the inline default. Now a column qsv infers as ``Date`` (date-only) maps to Postgres ``date``, skips ``qsv datefmt`` normalization (only ``timestamp`` columns are added to ``datetimecols_list``), and lands in CKAN as ``2023-02-02`` instead of ``2023-02-02T00:00:00``. 2. Secondary bug in ``analysis._parse_stats``: the reverse-mapping from Postgres ``type_override`` back to qsv type names was missing the ``"date"`` entry. An operator setting ``type_override = "date"`` in the Data Dictionary on a column qsv inferred as DateTime would silently fall through to qsv's ``DateTime`` inference and be mapped right back to Postgres ``timestamp`` — losing the operator's intent. Added ``"date": "Date"`` to the map. Tests in new ``tests/test_date_without_timestamp.py``: * ``test_date_column_becomes_postgres_date_not_timestamp`` — the user-visible #179 fix. * ``test_date_column_not_added_to_datetimecols_for_normalization`` — pins the qsv-datefmt-skip side of the fix (which is what prevents the T00:00:00 from being added). * ``test_type_override_date_wins_over_qsv_datetime_inference`` — the secondary reverse-map fix. * ``test_config_declaration_default_maps_date_to_date_not_timestamp`` — pins both the declaration value AND that the inline fallback in config.py agrees, so a future "tidy up" PR can't silently re-introduce the bug. Test isolation note: needed an autouse fixture to point ``ckanext.datapusher_plus.dictionary_stash_dir`` at a per-test tmp_path. Without it, the third test's mocked-existing-resource path writes a stash file to the global ``/tmp/dpp_dict_stash/`` keyed by ``resource_id``, and that stale stash leaks into the FIRST test's ``_parse_stats`` call via the retry-restore branch in #265. The fixture isolates the stash dir per test. End-to-end verification on the integration stack: uploaded a CSV with a date-only column ``event_date`` containing values like ``2023-02-02``. Result on the persisted CKAN resource: datastore field types: event_date → date (was: timestamp pre-fix) row values: event_date → '2023-02-02' (was: '2023-02-02T00:00:00' pre-fix) Tests: 217/217 unit tests pass on dpp-test (was 213; +4 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings: **MEDIUM** — Subtle regression for ``AUTO_INDEX_DATES``. Pre-#179, qsv-inferred Date columns mapped to Postgres ``timestamp``, so ``IndexingStage._get_datetime_columns``' ``header["type"] == "timestamp"`` check correctly picked them up for auto-indexing. Post-#179, date-only columns are Postgres ``date`` — and the narrow equality silently excluded them. Operators relying on AUTO_INDEX_DATES would see their date-only indexes disappear after upgrade. Fixed: changed the check to ``header["type"] in ("timestamp", "date")``. Note: the ``datetimecols_list`` populated in ``analysis._build_headers_dicts`` drives ``qsv datefmt`` normalization (which we want to SKIP for date- only columns — that's the whole #179 fix), so I deliberately did NOT include ``"date"`` there. The two lists now legitimately diverge: normalization-list ⊂ indexing-list. **LOW** — Brittle regex on ``config.py`` source. Replaced with ``ast`` walk that finds ``tk.config.get("ckanext.datapusher_plus.type_mapping", "<json>")`` by structural pattern. Now reformat-immune (double-vs- single quotes, line breaks, nearby comments with apostrophes). **LOW** — Stats CSV fixture omitted the many extra columns qsv's real ``stats.csv`` emits (``sum``, ``mean``, ``stddev``, ``nullcount``, etc.). ``_parse_stats`` doesn't read them today; if a future change starts to, the test would fail with a confusing KeyError. Added a sentinel comment listing which columns ARE required and noting the maintenance cost. Added regression test ``test_auto_index_dates_still_indexes_date_only_columns`` that builds a minimal IndexingStage context with mixed ``date`` / ``timestamp`` / non-date columns and asserts both ``date`` and ``timestamp`` columns land in the AUTO_INDEX_DATES candidate list, while text / numeric / bigint do not. Tests: 218/218 unit tests pass on dpp-test (was 217; +1 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes the ingestion/type-mapping pipeline so qsv-inferred Date columns map to Postgres date (not timestamp), preventing date-only values like 2023-02-02 from being normalized/rendered as 2023-02-02T00:00:00 in CKAN previews (issue #179). Also addresses a related Data Dictionary type_override="date" reverse-mapping gap and keeps auto-indexing behavior consistent for date-like columns.
Changes:
- Update declarative
type_mappingdefault soDate→ Postgresdate(instead oftimestamp). - Add
dateto the Postgres→qsv reverse map used when applying Data Dictionary overrides in the analysis stage. - Expand auto-index “date-like” detection to include both
timestampanddate, and add regression tests covering the end-to-end behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
ckanext/datapusher_plus/config_declaration.yaml |
Corrects the declarative default mapping so qsv Date becomes Postgres date. |
ckanext/datapusher_plus/jobs/stages/analysis.py |
Ensures Data Dictionary type_override="date" is correctly reverse-mapped back to qsv Date. |
ckanext/datapusher_plus/jobs/stages/indexing.py |
Treats Postgres date as date-like for AUTO_INDEX_DATES (alongside timestamp). |
tests/test_date_without_timestamp.py |
Adds regression coverage for #179 plus related override + auto-index behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three findings, all applied: * **Unused ``import re``** in ``tests/test_date_without_timestamp.py`` (leftover from before the AST-walk refactor in the roborev #2257 follow-up). Removed. * **Unused ``tmp_path`` fixture parameter** on ``analysis_stage_context``. The fixture used to use ``tmp_path`` to point the dictionary-stash dir, but that responsibility moved to the new autouse ``_isolate_dictionary_stash`` fixture. Removed. * **Misleading method/variable names** in ``indexing.py``: ``_get_datetime_columns`` / ``datetimecols_list`` now also pick up Postgres ``date`` columns (the AUTO_INDEX_DATES regression fix), so the "datetime"-only naming is no longer accurate. Renamed: - ``_get_datetime_columns`` -> ``_get_date_like_columns`` - ``datetimecols_list`` -> ``date_like_cols_list`` Updated surrounding docstrings + comments to say "date-like (Postgres ``timestamp`` or ``date``)". Deliberately did NOT rename the same-named variable in ``analysis.py`` — there the list is genuinely datetime-only (it drives ``qsv datefmt`` normalization which we skip for date-only columns; that's the #179 fix). The two files now legitimately diverge on the name. Updated the regression test to call the new method name. Tests: 218/218 unit tests still pass on dpp-test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 18, 2026
jqnatividad
added a commit
that referenced
this pull request
May 18, 2026
Three findings (1 MEDIUM, 2 LOW) on the memory-update commit; all
addressed.
- MEDIUM @ stale `tests/test_date_type_mapping.py` reference: this
commit was *specifically* supposed to fix that reference but missed
one instance (line 316). Fixed — and the same class of bug existed
for three other filenames the reviewer didn't flag explicitly but
which had the same root cause (me inventing filenames without
verifying against `git log --diff-filter=A`). Corrected:
test_file_hash_algorithms.py -> test_file_hash_algorithm.py
test_metadata_stage_file_hash.py -> test_metadata_hash_persistence.py
test_download_result_rehydrate.py -> test_rehydrate_resource_identity.py
test_date_type_mapping.py -> test_date_without_timestamp.py
Verified all four replacements resolve as real files in the tree.
- LOW @ inconsistent test-count math: the prior "+26 tests" claim
enumerated files summing to 51, not 26. Rewrote the "Known result"
section to spell out per-file gross counts (collected via
`pytest --collect-only -q`: 21/4/10/3/4/5/4 = 51), then state the
honest net delta (+26) and acknowledge the 25-test gap is from
pre-existing tests removed/consolidated during the #307–#314
refactors. The math is now self-auditable and the framing no longer
implies the listed adds sum to the delta.
- LOW @ forward-looking "Recognized in qsv ≥ 20.1" claim: tightened
to "Empirically verified to be recognized as DateTime on qsv 20.1.0
(macOS host check during #173 investigation)" and cross-referenced
the test's skip guard for qsv ≥ 20.1, so the claim is grounded in
a real observation and the next maintainer has a clean handoff
path when the production qsv pin moves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #179.
Reporter: CSV with a date-only column (
"2023-02-02") showed up in the CKAN preview as"2023-02-02T00:00:00"— DP+ was storing the column as Postgrestimestampand running RFC-3339 normalization, which adds theT00:00:00time component for date-only inputs.Root cause
A discrepancy between two sources of the same default:
config.py:144inline fallback fortype_mappinghad"Date": "date"(correct).config_declaration.yamlhad"Date": "timestamp"(wrong).The CKAN declaration default wins over an inline
tk.config.get(key, fallback)fallback — the declaration populatestk.configBEFORE the fallback is consulted. So real deployments saw the buggy declaration value, never the correct inline one.Fix
config_declaration.yaml— change"Date": "timestamp"to"Date": "date". A column qsv infers asDate(date-only) now maps to Postgresdate, skipsqsv datefmtnormalization (onlytimestampcolumns are added todatetimecols_list), and lands in CKAN as2023-02-02instead of2023-02-02T00:00:00.Secondary bug in
analysis._parse_stats: the reverse-mapping from Postgrestype_overrideback to qsv type names was missing the"date"entry. An operator settingtype_override = "date"in the Data Dictionary on a column qsv inferred as DateTime would silently fall through to qsv'sDateTimeinference and be mapped right back to Postgrestimestamp. Added"date": "Date"to the reverse map.Test plan
dpp-test(was 213; +4 new).tests/test_date_without_timestamp.py:date(the user-visible bug).datetimecols_list(skips RFC-3339 normalization).type_override = "date"survives the reverse-map (secondary bug).config.pyagree (catches re-regression).event_datecolumn containing values like2023-02-02. Result on the persisted CKAN resource:event_datefield type:date(wastimestamppre-fix)event_daterow value:2023-02-02(was2023-02-02T00:00:00pre-fix)Test isolation note
Needed an autouse fixture to point
ckanext.datapusher_plus.dictionary_stash_dirat a per-testtmp_path. Without it, the third test's mocked-existing-resource path writes a stash file to the global/tmp/dpp_dict_stash/keyed byresource_id, and that stale stash leaks into the FIRST test's_parse_statscall via the retry-restore branch from #265. The fixture isolates the stash dir per test.🤖 Generated with Claude Code