Address documentation-audit loose ends: unit-test CI, FORMATS, README config#326
Merged
Conversation
The old test.yml ("Tests") was dead 2022-era cruft: a Python
2.7-3.9 matrix, an `actions/checkout@v2` / `setup-python@v2` /
`codecov-action@v1` stack, `pip install -r requirements-dev-py2.txt`
(that file no longer exists), and `pytest --cov=datapusher` against
the pre-extension package name. It cannot have passed since the
extension refactor.
It also masked a real gap surfaced by the CLAUDE.md audit: NO
workflow runs the DP+ unit suite. `ci.yml` runs only the qsv
contract regression test + the integration suite; `main.yml` is a
manual end-to-end run.
Replaced with a `unit-tests` job that runs the full unit suite
(`pytest tests/ --ignore=tests/integration`) on push/PR. It mirrors
the proven `dpp-test` container setup: the `ckan/ckan-dev:2.11`
image, geo system libs, `requirements.txt` + `requirements-dev.txt`
+ editable install, qsv 20.1.0, and the
`PYTEST_DISABLE_PLUGIN_AUTOLOAD` / `QSV_BIN` / `CKAN_INI` env the
suite needs.
Python 3.10 only: the unit suite imports `ckan.plugins.toolkit`, so
it needs a full CKAN install, and the CKAN 2.11 dev image ships
Python 3.10. A 3.11-3.13 matrix would require building CKAN from
source per version — left as a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…casing Two config.py correctness fixes surfaced by the documentation audit: * **xlsm/xlsb were ungated.** ``FormatConverterStage`` advertises ``.xlsm`` and ``.xlsb`` support (both in ``SPREADSHEET_EXTENSIONS``), but the ``FORMATS`` default omitted them — so DP+ silently skipped those files before they ever reached the converter. Added both to the ``FORMATS`` default, grouped with the other Excel formats. New ``tests/test_formats_config.py`` (2 tests): xlsm/xlsb are in ``FORMATS``, and a consistency guard that every extension in ``FormatConverterStage.SPREADSHEET_EXTENSIONS`` is gated in by ``FORMATS`` — so a future converter-side addition can't silently leave a format ungated. * **spatial-tolerance config key had inconsistent casing.** ``config.py`` read the relative-tolerance setting from ``ckanext.datapusher_plus.SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE`` — UPPERCASE, unlike every other ~50 DP+ config keys. Operators setting the lowercase form (the only form the README ever showed, albeit misspelled) silently had no effect. Lowercased the key string to ``...spatial_simplification_relative_tolerance``. The Python constant name is unchanged; only the ckan.ini key string moved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ults Both ckan.ini example blocks were reconciled to DP+'s actual current defaults (the documentation audit found the values had drifted; PR #324 added a stopgap "illustrative, not defaults" note). Each fix verified against config.py and config_declaration.yaml — where a key is declared, the declaration's default is what CKAN 2.10+ applies at runtime, so that is the authoritative value. Per block: * Removed ``use_proxy`` — there is no such config key. DP+ derives proxy use from whether ``download_proxy`` is set (``config.py``: ``USE_PROXY = "...download_proxy" in tk.config``). * ``ssl_verify``: false -> true (the actual default — verifying TLS). * ``preview_rows``: 100 -> 0 (load all rows; matches config_declaration.yaml after PR #324). * ``auto_index_threshold``: 3 -> 10 (issue #142 raised the default). * ``formats``: added ``xlsm xlsb`` (now gated in — see the FORMATS commit in this PR). * ``auto_spatial_simplication`` -> ``auto_spatial_simplification`` and ``spatial_simplication_relative_tolerance`` -> ``spatial_simplification_relative_tolerance`` — the README had always misspelled "simplification"; the corrected lowercase key now matches config.py. * ``longitude_fields``: ``longitude,long,lon`` -> ``longitude,lon`` (the actual default has no ``long``). * ``jinja2_bytecode_cache_dir``: fixed the ``butecode`` typo in the path default. ``chunk_size`` (16384), ``dedup`` (false), ``ignore_file_hash`` (true) and ``auto_alias_unique`` (false) were left as-is — those already match the config_declaration.yaml defaults that win at runtime. (config.py's *inline fallbacks* for those four still drift from the declaration — behavior-preserving dead code, same shape as the preview_rows fallback PR #324 fixed; flagged for a follow-up.) The note above each block was reworded: the block now genuinely lists current defaults, so it is a usable starting-point reference rather than an "illustrative, differs from defaults" disclaimer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
roborev reviewed the three PR #326 commits. Findings actioned: **#2306 (LOW) — test.yml comment.** The "Run unit test suite" step attributed the `--pdbcls` addopt to `setup.cfg`; pytest config moved to `pyproject.toml` (`[tool.pytest.ini_options]`). Comment corrected. **#2307 (LOW) — xlsxb typo in CI workflows.** `ci.yml` and `main.yml` set `formats` with `xlsxb` — a typo for `xlsb` — so an explicitly-configured `.xlsb` resource stayed ungated in CI even though the converter supports it. Corrected `xlsxb` → `xlsb` in all four spots: the `ckanext.datapusher_plus.formats` and legacy `ckan.datapusher.formats` lines in `ci.yml`, the `main.yml` formats line, and `ci.yml`'s test-file `case` branch (whose MIME type, `application/vnd.ms-excel.sheet.binary.macroEnabled.12`, confirms `.xlsb` was the intent). **#2307 (LOW) — missing spatial-key regression test + redundant importorskip.** Added `test_spatial_tolerance_config_key_is_lowercase` (AST drift guard against a revert to the uppercase key typo) and dropped the redundant `pytest.importorskip("ckan")` in `test_formats_covers_every_converter_spreadsheet_extension` (the `formats` fixture already does it). **#2308 (LOW x2) — README config-block notes.** The reworded notes named `config_declaration.yaml` as the sole authoritative source, but `ssl_verify` / `formats` / `unsafe_prefix` / `reserved_colnames` are declared only in `config.py`. Notes now cite both. Also restored the "set only the keys you want to change" guidance and explained why pasting the whole block verbatim is risky (pins every value as an explicit override, defeating future default updates). Not changed (already resolved earlier in the PR stack): * #2307 (MEDIUM) — README documented a misspelled spatial key. The README spelling fix landed in this PR's commit a9e64db; #2307 reviewed the earlier commit 1638bf7 in isolation. Full unit suite: 279 → 280, all passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Closes several documentation-audit follow-ups by adding a real unit-test CI workflow, aligning supported spreadsheet formats with the converter’s capabilities, fixing a mis-cased spatial tolerance config key, and reconciling README configuration examples to current defaults.
Changes:
- Replaces the legacy
test.ymlwith a CKAN-based unit-test workflow runningpytest tests/ --ignore=tests/integrationinckan/ckan-dev:2.11. - Adds
xlsm/xlsbtoconfig.py’sFORMATSdefault and adds regression/consistency tests. - Updates README config blocks (defaults + typo fixes) and corrects
xlsxb→xlsbin CI/E2E workflow config snippets.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
tests/test_formats_config.py |
Adds tests guarding FORMATS coverage and the spatial tolerance config key casing. |
README.md |
Updates example ckan.ini blocks to reflect current defaults and corrects key typos. |
ckanext/datapusher_plus/config.py |
Expands FORMATS defaults and fixes spatial tolerance config key casing. |
.github/workflows/test.yml |
Introduces a container-based unit-test workflow for PRs/pushes. |
.github/workflows/main.yml |
Fixes xlsxb → xlsb in workflow config snippet. |
.github/workflows/ci.yml |
Fixes xlsxb → xlsb in workflow config snippet and test file matrix generation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Six Copilot findings, all applied: * **plugin.py had a second, drifting formats list.** ``_submit_to_datapusher`` fell back to its own hardcoded format list when neither ``ckan.datapusher.formats`` nor ``ckanext.datapusher_plus.formats`` was set — and that list lacked ``xlsm``/``xlsb``, so the FORMATS fix would not reach deployments that don't set ``formats`` explicitly. Centralized the fallback on ``conf.FORMATS`` so there is one canonical default that cannot drift. * **Spatial-tolerance config-key rename could break deployments.** The previous commit lowercased the key string; an operator who had set the legacy UPPERCASE key would silently lose the setting. Added a nested fallback to the legacy ``...SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE`` key — the same backward-compat shape ``SSL_VERIFY`` already uses. * **tests/test_formats_config.py was runtime/CKAN-dependent.** Rewrote both FORMATS tests to AST-parse the ``config.py`` default list literal and ``format_converter.py``'s ``SPREADSHEET_EXTENSIONS`` — deterministic, CKAN-free config-drift guards in the same style as the other config tests in the repo. * **The spatial-key AST test could capture a nested fallback key.** With the legacy fallback now nested as a second ``tk.config.get`` arg, walking for "the first ``.get``" is fragile. The test now targets the assignment's ``.value`` directly — the outermost ``tk.config.get`` — so it pins the *primary* key regardless of nesting. * **Dead ``use_proxy`` config key in CI workflows.** Removed ``ckanext.datapusher_plus.use_proxy`` from the ci.yml and main.yml CKAN config blocks — there is no such key (DP+ derives proxy use from whether ``download_proxy`` is set). Full unit suite: 280 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three loose ends surfaced by this session's documentation audits, fixed in three focused commits.
1.
ci:rewritetest.ymlas a real unit-suite jobThe old
test.ymlwas dead 2022-era cruft (Python 2.7–3.9 matrix, missingrequirements-dev-py2.txt,pytest --cov=datapusheragainst the pre-extension package name,checkout@v2). It also masked a real gap: no workflow runs the DP+ unit suite (ci.ymlruns only the qsv regression test + integration suite;main.ymlis a manual end-to-end run).Replaced with a
unit-testsjob that runspytest tests/ --ignore=tests/integrationon push/PR, mirroring the provendpp-testcontainer setup (ckan/ckan-dev:2.11, geo libs, qsv 20.1.0, thePYTEST_DISABLE_PLUGIN_AUTOLOAD/QSV_BIN/CKAN_INIenv). Python 3.10 only — the unit suite needs a full CKAN install and the CKAN 2.11 dev image ships 3.10; a 3.11–3.13 matrix would mean building CKAN per-version (follow-up).2.
fix:add xlsm/xlsb to FORMATS + correct spatial-tolerance config-key casingFormatConverterStageadvertises.xlsm/.xlsbsupport (SPREADSHEET_EXTENSIONS), but theFORMATSgate omitted them — DP+ silently skipped those files. Added both. Newtests/test_formats_config.py(2 tests) including a consistency guard thatFORMATS⊇ the converter's spreadsheet extensions.config.pyread the relative-tolerance setting fromckanext.datapusher_plus.SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE— UPPERCASE, unlike every other ~50 DP+ keys. Lowercased the key string so the documented lowercase form actually works.3.
docs:reconcile README ckan.ini example config blocks to current defaultsBoth example blocks reconciled to actual defaults (verified against
config.py+config_declaration.yaml):use_proxy(no such key — derived fromdownload_proxy).ssl_verifyfalse→true,preview_rows100→0,auto_index_threshold3→10,longitude_fieldsdrop spuriouslong.auto_spatial_simplication→...simplification,jinja2_butecode→jinja2_bytecode.formatsexample gainsxlsm xlsb.Out of scope (flagged for follow-up)
config.py's inline fallbacks forchunk_size,max_content_length,ignore_file_hash,dedup, andauto_alias_uniquedrift from theconfig_declaration.yamldeclared defaults. These are behavior-preserving dead code (the declaration wins at runtime under CKAN 2.10+) — same shape as thepreview_rowsfallback PR #324 fixed. A follow-up should align them.Test plan
test.ymlYAML validated; the newunit-testsjob is unverified until it runs on this PR.config.py/config_declaration.yaml.🤖 Generated with Claude Code