Skip to content

Address documentation-audit loose ends: unit-test CI, FORMATS, README config#326

Merged
jqnatividad merged 5 commits into
mainfrom
chore-session-loose-ends
May 20, 2026
Merged

Address documentation-audit loose ends: unit-test CI, FORMATS, README config#326
jqnatividad merged 5 commits into
mainfrom
chore-session-loose-ends

Conversation

@jqnatividad
Copy link
Copy Markdown
Collaborator

Summary

Three loose ends surfaced by this session's documentation audits, fixed in three focused commits.

1. ci: rewrite test.yml as a real unit-suite job

The old test.yml was dead 2022-era cruft (Python 2.7–3.9 matrix, missing requirements-dev-py2.txt, pytest --cov=datapusher against the pre-extension package name, checkout@v2). It also masked a real gap: no workflow runs the DP+ unit suite (ci.yml runs only the qsv regression test + integration suite; main.yml is a manual end-to-end run).

Replaced with a unit-tests job that runs pytest tests/ --ignore=tests/integration on push/PR, mirroring the proven dpp-test container setup (ckan/ckan-dev:2.11, geo libs, qsv 20.1.0, the PYTEST_DISABLE_PLUGIN_AUTOLOAD/QSV_BIN/CKAN_INI env). Python 3.10 only — the unit suite needs a full CKAN install and the CKAN 2.11 dev image ships 3.10; a 3.11–3.13 matrix would mean building CKAN per-version (follow-up).

2. fix: add xlsm/xlsb to FORMATS + correct spatial-tolerance config-key casing

  • FormatConverterStage advertises .xlsm/.xlsb support (SPREADSHEET_EXTENSIONS), but the FORMATS gate omitted them — DP+ silently skipped those files. Added both. New tests/test_formats_config.py (2 tests) including a consistency guard that FORMATS ⊇ the converter's spreadsheet extensions.
  • config.py read the relative-tolerance setting from ckanext.datapusher_plus.SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE — UPPERCASE, unlike every other ~50 DP+ keys. Lowercased the key string so the documented lowercase form actually works.

3. docs: reconcile README ckan.ini example config blocks to current defaults

Both example blocks reconciled to actual defaults (verified against config.py + config_declaration.yaml):

  • Removed use_proxy (no such key — derived from download_proxy).
  • ssl_verify false→true, preview_rows 100→0, auto_index_threshold 3→10, longitude_fields drop spurious long.
  • Fixed two key typos: auto_spatial_simplication...simplification, jinja2_butecodejinja2_bytecode.
  • formats example gains xlsm xlsb.

Out of scope (flagged for follow-up)

config.py's inline fallbacks for chunk_size, max_content_length, ignore_file_hash, dedup, and auto_alias_unique drift from the config_declaration.yaml declared defaults. These are behavior-preserving dead code (the declaration wins at runtime under CKAN 2.10+) — same shape as the preview_rows fallback PR #324 fixed. A follow-up should align them.

Test plan

  • Full Python unit suite: 277 → 279 (+2 FORMATS tests), all passing.
  • test.yml YAML validated; the new unit-tests job is unverified until it runs on this PR.
  • Every README value cross-checked against config.py / config_declaration.yaml.

🤖 Generated with Claude Code

jqnatividad and others added 4 commits May 20, 2026 07:09
The old test.yml ("Tests") was dead 2022-era cruft: a Python
2.7-3.9 matrix, an `actions/checkout@v2` / `setup-python@v2` /
`codecov-action@v1` stack, `pip install -r requirements-dev-py2.txt`
(that file no longer exists), and `pytest --cov=datapusher` against
the pre-extension package name. It cannot have passed since the
extension refactor.

It also masked a real gap surfaced by the CLAUDE.md audit: NO
workflow runs the DP+ unit suite. `ci.yml` runs only the qsv
contract regression test + the integration suite; `main.yml` is a
manual end-to-end run.

Replaced with a `unit-tests` job that runs the full unit suite
(`pytest tests/ --ignore=tests/integration`) on push/PR. It mirrors
the proven `dpp-test` container setup: the `ckan/ckan-dev:2.11`
image, geo system libs, `requirements.txt` + `requirements-dev.txt`
+ editable install, qsv 20.1.0, and the
`PYTEST_DISABLE_PLUGIN_AUTOLOAD` / `QSV_BIN` / `CKAN_INI` env the
suite needs.

Python 3.10 only: the unit suite imports `ckan.plugins.toolkit`, so
it needs a full CKAN install, and the CKAN 2.11 dev image ships
Python 3.10. A 3.11-3.13 matrix would require building CKAN from
source per version — left as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…casing

Two config.py correctness fixes surfaced by the documentation audit:

* **xlsm/xlsb were ungated.** ``FormatConverterStage`` advertises
  ``.xlsm`` and ``.xlsb`` support (both in
  ``SPREADSHEET_EXTENSIONS``), but the ``FORMATS`` default omitted
  them — so DP+ silently skipped those files before they ever
  reached the converter. Added both to the ``FORMATS`` default,
  grouped with the other Excel formats.

  New ``tests/test_formats_config.py`` (2 tests): xlsm/xlsb are in
  ``FORMATS``, and a consistency guard that every extension in
  ``FormatConverterStage.SPREADSHEET_EXTENSIONS`` is gated in by
  ``FORMATS`` — so a future converter-side addition can't silently
  leave a format ungated.

* **spatial-tolerance config key had inconsistent casing.**
  ``config.py`` read the relative-tolerance setting from
  ``ckanext.datapusher_plus.SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE``
  — UPPERCASE, unlike every other ~50 DP+ config keys. Operators
  setting the lowercase form (the only form the README ever showed,
  albeit misspelled) silently had no effect. Lowercased the key
  string to ``...spatial_simplification_relative_tolerance``. The
  Python constant name is unchanged; only the ckan.ini key string
  moved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ults

Both ckan.ini example blocks were reconciled to DP+'s actual current
defaults (the documentation audit found the values had drifted; PR
#324 added a stopgap "illustrative, not defaults" note). Each fix
verified against config.py and config_declaration.yaml — where a key
is declared, the declaration's default is what CKAN 2.10+ applies at
runtime, so that is the authoritative value.

Per block:
* Removed ``use_proxy`` — there is no such config key. DP+ derives
  proxy use from whether ``download_proxy`` is set
  (``config.py``: ``USE_PROXY = "...download_proxy" in tk.config``).
* ``ssl_verify``: false -> true (the actual default — verifying TLS).
* ``preview_rows``: 100 -> 0 (load all rows; matches
  config_declaration.yaml after PR #324).
* ``auto_index_threshold``: 3 -> 10 (issue #142 raised the default).
* ``formats``: added ``xlsm xlsb`` (now gated in — see the FORMATS
  commit in this PR).
* ``auto_spatial_simplication`` -> ``auto_spatial_simplification``
  and ``spatial_simplication_relative_tolerance`` ->
  ``spatial_simplification_relative_tolerance`` — the README had
  always misspelled "simplification"; the corrected lowercase key
  now matches config.py.
* ``longitude_fields``: ``longitude,long,lon`` -> ``longitude,lon``
  (the actual default has no ``long``).
* ``jinja2_bytecode_cache_dir``: fixed the ``butecode`` typo in the
  path default.

``chunk_size`` (16384), ``dedup`` (false), ``ignore_file_hash``
(true) and ``auto_alias_unique`` (false) were left as-is — those
already match the config_declaration.yaml defaults that win at
runtime. (config.py's *inline fallbacks* for those four still drift
from the declaration — behavior-preserving dead code, same shape as
the preview_rows fallback PR #324 fixed; flagged for a follow-up.)

The note above each block was reworded: the block now genuinely
lists current defaults, so it is a usable starting-point reference
rather than an "illustrative, differs from defaults" disclaimer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
roborev reviewed the three PR #326 commits. Findings actioned:

**#2306 (LOW) — test.yml comment.** The "Run unit test suite" step
attributed the `--pdbcls` addopt to `setup.cfg`; pytest config moved
to `pyproject.toml` (`[tool.pytest.ini_options]`). Comment corrected.

**#2307 (LOW) — xlsxb typo in CI workflows.** `ci.yml` and `main.yml`
set `formats` with `xlsxb` — a typo for `xlsb` — so an
explicitly-configured `.xlsb` resource stayed ungated in CI even
though the converter supports it. Corrected `xlsxb` → `xlsb` in all
four spots: the `ckanext.datapusher_plus.formats` and legacy
`ckan.datapusher.formats` lines in `ci.yml`, the `main.yml` formats
line, and `ci.yml`'s test-file `case` branch (whose MIME type,
`application/vnd.ms-excel.sheet.binary.macroEnabled.12`, confirms
`.xlsb` was the intent).

**#2307 (LOW) — missing spatial-key regression test + redundant
importorskip.** Added `test_spatial_tolerance_config_key_is_lowercase`
(AST drift guard against a revert to the uppercase key typo) and
dropped the redundant `pytest.importorskip("ckan")` in
`test_formats_covers_every_converter_spreadsheet_extension` (the
`formats` fixture already does it).

**#2308 (LOW x2) — README config-block notes.** The reworded notes
named `config_declaration.yaml` as the sole authoritative source,
but `ssl_verify` / `formats` / `unsafe_prefix` / `reserved_colnames`
are declared only in `config.py`. Notes now cite both. Also restored
the "set only the keys you want to change" guidance and explained
why pasting the whole block verbatim is risky (pins every value as
an explicit override, defeating future default updates).

Not changed (already resolved earlier in the PR stack):
* #2307 (MEDIUM) — README documented a misspelled spatial key. The
  README spelling fix landed in this PR's commit a9e64db; #2307
  reviewed the earlier commit 1638bf7 in isolation.

Full unit suite: 279 → 280, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Closes several documentation-audit follow-ups by adding a real unit-test CI workflow, aligning supported spreadsheet formats with the converter’s capabilities, fixing a mis-cased spatial tolerance config key, and reconciling README configuration examples to current defaults.

Changes:

  • Replaces the legacy test.yml with a CKAN-based unit-test workflow running pytest tests/ --ignore=tests/integration in ckan/ckan-dev:2.11.
  • Adds xlsm/xlsb to config.py’s FORMATS default and adds regression/consistency tests.
  • Updates README config blocks (defaults + typo fixes) and corrects xlsxbxlsb in CI/E2E workflow config snippets.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_formats_config.py Adds tests guarding FORMATS coverage and the spatial tolerance config key casing.
README.md Updates example ckan.ini blocks to reflect current defaults and corrects key typos.
ckanext/datapusher_plus/config.py Expands FORMATS defaults and fixes spatial tolerance config key casing.
.github/workflows/test.yml Introduces a container-based unit-test workflow for PRs/pushes.
.github/workflows/main.yml Fixes xlsxbxlsb in workflow config snippet.
.github/workflows/ci.yml Fixes xlsxbxlsb in workflow config snippet and test file matrix generation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ckanext/datapusher_plus/config.py
Comment thread ckanext/datapusher_plus/config.py Outdated
Comment thread tests/test_formats_config.py Outdated
Comment thread tests/test_formats_config.py Outdated
Comment thread .github/workflows/ci.yml
Comment thread .github/workflows/main.yml
Six Copilot findings, all applied:

* **plugin.py had a second, drifting formats list.**
  ``_submit_to_datapusher`` fell back to its own hardcoded format
  list when neither ``ckan.datapusher.formats`` nor
  ``ckanext.datapusher_plus.formats`` was set — and that list
  lacked ``xlsm``/``xlsb``, so the FORMATS fix would not reach
  deployments that don't set ``formats`` explicitly. Centralized
  the fallback on ``conf.FORMATS`` so there is one canonical
  default that cannot drift.

* **Spatial-tolerance config-key rename could break deployments.**
  The previous commit lowercased the key string; an operator who
  had set the legacy UPPERCASE key would silently lose the setting.
  Added a nested fallback to the legacy
  ``...SPATIAL_SIMPLIFICATION_RELATIVE_TOLERANCE`` key — the same
  backward-compat shape ``SSL_VERIFY`` already uses.

* **tests/test_formats_config.py was runtime/CKAN-dependent.**
  Rewrote both FORMATS tests to AST-parse the ``config.py`` default
  list literal and ``format_converter.py``'s
  ``SPREADSHEET_EXTENSIONS`` — deterministic, CKAN-free config-drift
  guards in the same style as the other config tests in the repo.

* **The spatial-key AST test could capture a nested fallback key.**
  With the legacy fallback now nested as a second
  ``tk.config.get`` arg, walking for "the first ``.get``" is
  fragile. The test now targets the assignment's ``.value``
  directly — the outermost ``tk.config.get`` — so it pins the
  *primary* key regardless of nesting.

* **Dead ``use_proxy`` config key in CI workflows.** Removed
  ``ckanext.datapusher_plus.use_proxy`` from the ci.yml and
  main.yml CKAN config blocks — there is no such key (DP+ derives
  proxy use from whether ``download_proxy`` is set).

Full unit suite: 280 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jqnatividad jqnatividad merged commit eca276d into main May 20, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants