Skip to content

fix: Load any stores.<name>.<attr> secret file (not just access_key/secret_key)#1452

Open
dimitri-yatsenko wants to merge 1 commit into
masterfrom
fix/store-secrets-arbitrary-attr
Open

fix: Load any stores.<name>.<attr> secret file (not just access_key/secret_key)#1452
dimitri-yatsenko wants to merge 1 commit into
masterfrom
fix/store-secrets-arbitrary-attr

Conversation

@dimitri-yatsenko
Copy link
Copy Markdown
Member

Summary

Config._load_secrets() was hardcoded to only auto-load stores.<name>.access_key
and stores.<name>.secret_key from the .secrets/ directory. That whitelist made
sense when only the built-in s3 / gcs / azure stores existed, but it's wrong
for plugin-registered adapters (the storage-adapter plugin system added in
#1432) that define their own secret fields.

Concrete motivation

A plugin that wraps Databricks Unity Catalog Volumes authenticates via a Bearer
token. Its store spec wants a token field — not access_key / secret_key.
Today .secrets/stores.uc.token is silently ignored, forcing users to either:

  • rename to a misleading access_key, or
  • set the value programmatically at notebook startup (defeating the auto-load
    story DataJoint advertises in the configure-storage docs)

This change drops the whitelist. Any stores.<name>.<attr> file under the secrets
directory loads into dj.config["stores"][<name>][<attr>]. The existing precedence
is preserved — config file and env vars still win.

What changed

src/datajoint/settings.py — primary fix

- if attr in ("access_key", "secret_key"):
-     value = secret_file.read_text().strip()
-     ...
+ value = secret_file.read_text().strip()
+ ...

Comment around the loop updated to reflect the broader applicability.

src/datajoint/storage_adapter.py — incidental cleanup

Removed a try/except TypeError around entry_points(group=...). The fallback
was for Python < 3.10, where entry_points() returned a SelectableGroups dict
instead of accepting the group= kwarg. DataJoint's minimum supported Python is
3.10 (installation docs and
pyproject.toml's requires-python), so the branch is dead code — and was
producing a stale mypy error on SelectableGroups.get signature.

Tests

New test TestStoreSecrets.test_load_store_arbitrary_attr exercises both token
and api_key fields loading correctly. Existing tests
(test_load_store_credentials_from_secrets, test_secrets_do_not_override_existing)
still pass — the precedence semantics are unchanged.

Test plan

  • Existing TestStoreSecrets tests still pass
  • New test_load_store_arbitrary_attr passes
  • All 23 tests in tests/unit/test_storage_adapter.py still pass
  • Pre-commit hooks (ruff, mypy, codespell) green

Two changes:

1. settings.py — Config._load_secrets()
   Was hardcoded to only auto-load stores.<name>.access_key and
   stores.<name>.secret_key from the secrets directory. That whitelist
   made sense when only the built-in s3 / gcs / azure stores existed,
   but it's wrong for plugin-registered adapters that define their own
   secret fields.

   Concrete example: a plugin that wraps Databricks Unity Catalog
   Volumes authenticates via a Bearer token. Its store spec wants a
   `token` field. Today the file .secrets/stores.uc.token is silently
   ignored, forcing users to either rename to a misleading
   "access_key" or to set the value programmatically at notebook
   startup (defeating the auto-load story).

   This change drops the whitelist. Any stores.<name>.<attr> file under
   the secrets directory is loaded into
   dj.config["stores"][<name>][<attr>]. The existing "don't override
   pre-configured values" precedence is preserved — config file and
   env vars still win.

   New test: TestStoreSecrets.test_load_store_arbitrary_attr confirms
   that `token` and `api_key` fields load via the same path as
   access_key/secret_key.

2. storage_adapter.py — _discover_adapters()
   Removed the try/except wrapping importlib.metadata.entry_points().
   The fallback path was for Python < 3.10 (where entry_points()
   returned a SelectableGroups dict instead of accepting the
   group= kwarg), but DataJoint's minimum supported Python is 3.10
   (per requires-python in pyproject.toml). The dead branch was also
   producing a mypy type error (SelectableGroups.get signature).
@dimitri-yatsenko dimitri-yatsenko requested review from kushalbakshi, mweitzel and ttngu207 and removed request for mweitzel May 19, 2026 22:04
dimitri-yatsenko added a commit that referenced this pull request May 19, 2026
Same fix #1383 applied to the Job table's antijoin in refresh(),
now applied to AutoPopulate._populate_direct's antijoin and the
progress() fallback path. The two-arg subtract `key_source - self`
triggers QueryExpression.__sub__ which calls .restrict(Not(...))
with semantic_check=True by default.

The semantic-check requirement is wrong here: this antijoin is a
plain set-difference, not a join — we ask "which key_source rows
aren't yet in self." Whether the same-named PK attribute carries
the same source-table lineage tag on both sides is irrelevant.

Where it bites: dj.Imported / dj.Computed tables whose primary key
is fully inherited from a single FK, with no own-table PK attributes.
On those, self.proj() returns the PK attribute with lineage=None
(or pointing to self rather than the FK parent), while key_source's
matching attribute carries the parent's lineage tag. The
semantic-check fails with:

    Cannot join on attribute 'X': different lineages
    (schema.parent.X vs None). Use .proj() to rename one of the
    attributes.

This pattern is legitimate ("one row downstream per parent row,
no intermediate ID") but rare in typical Elements / SciOps pipelines,
which extend the inherited PK with own-table attributes (trial_id,
experiment_id, etc.) that anchor proj()'s lineage. That's why the
existing #1405 test suite didn't surface it.

Changes:
- src/datajoint/autopopulate.py
  - Import Not from .condition at module top.
  - _populate_direct: replace `(LHS - self.proj())` with
    `LHS.restrict(Not(self.proj()), semantic_check=False)`.
  - progress(): same swap on the no-common-attrs fallback branch.
- tests/integration/test_autopopulate.py
  - New test_populate_antijoin_fk_inherited_pk regression test:
    Spec(Manual) -> Item(Imported with only -> Spec) — the minimal
    shape that triggers the bug. Without the fix Item.populate()
    raises DataJointError; with the fix it populates correctly,
    progress() reports correct counts, and partial-then-full
    populate works.

Stacked on top of #1452 (the secrets-loading + dead-code fix); rebase
to master after that lands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant