feat(directory): add Lakebase + File providers #416

Open
larsgeorge-db wants to merge 1 commit into feat/directory-phase3 from feat/directory-providers

@larsgeorge-db
Collaborator

Two additional concrete providers + provider-abstraction cleanup. Stacked on #413 — base is `feat/directory-phase3` so the diff here is providers-only. Will rebase onto `main` once the rest of the stack lands.

Plan: #375 · Backend (Phase 1): #406 · Phase 1 frontend: #407 · Phase 2: #412 · Phase 3: #413

Summary

  • `LakebaseProvider` — reads principals from a Postgres / Lakebase table via the app's existing SQLAlchemy engine. Identifier validation rejects SQL-injection attempts at the table-name level; queries use parameterised `LIKE`/`ESCAPE` for portable case-insensitive prefix search; user wildcards (`%`/`_`) are escaped so a raw `%` can't dump the directory.
  • `FileProvider` — CSV-backed, primarily for tests and offline dev. Required columns: `type`, `id`, `display_name` (`sub_label` optional). Re-reads on `mtime` change via a class-level cache, no restart needed.
  • Provider abstraction generalisation: factories now take `(DirectoryProviderContext, DirectoryProviderConfig)` instead of `(ws_client, connection_name)`. The context carries transport handles (ws_client, db_engine); the config is one bag containing every directory setting and each provider reads only the fields relevant to its type. Adding a future provider is one entry in `_PROVIDER_REGISTRY` + an optional `_REQUIRED_KEYS` row.
  • Settings keys: `DIRECTORY_LAKEBASE_TABLE`, `DIRECTORY_FILE_PATH` added alongside the existing `DIRECTORY_UC_HTTP_CONNECTION_NAME`. All three live in `app_settings` — still no Alembic migration.
  • `DirectoryStatus` carries all three config fields so the Settings tab hydrates inputs in one round trip.
  • Each provider's required-key set is declared centrally; `configured` is False whenever any required key for the active provider is missing.
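The wildcard-escaping behaviour described for `LakebaseProvider` can be sketched as follows. This is a minimal illustration, not the PR's code: it uses the standard-library `sqlite3` driver instead of SQLAlchemy so it runs anywhere, and the `principals` table, `escape_like`, and `search_principals` names are assumptions made for the example. Lower-casing both sides is one portable way to get case-insensitive matching; the PR may do this differently.

```python
import sqlite3

ESCAPE_CHAR = "\\"


def escape_like(term: str) -> str:
    """Escape LIKE metacharacters so user input is matched literally."""
    return (
        term.replace(ESCAPE_CHAR, ESCAPE_CHAR * 2)  # escape the escape char first
        .replace("%", ESCAPE_CHAR + "%")
        .replace("_", ESCAPE_CHAR + "_")
    )


def search_principals(conn: sqlite3.Connection, query: str, top: int = 10) -> list:
    """Case-insensitive prefix search over display_name and id."""
    # Only the trailing % is a real wildcard; everything typed by the user is literal.
    pattern = escape_like(query.lower()) + "%"
    sql = (
        "SELECT type, id, display_name FROM principals "
        "WHERE LOWER(display_name) LIKE ? ESCAPE '\\' "
        "OR LOWER(id) LIKE ? ESCAPE '\\' "
        "ORDER BY display_name LIMIT ?"
    )
    return conn.execute(sql, (pattern, pattern, top)).fetchall()


# Hypothetical sample data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE principals (type TEXT, id TEXT, display_name TEXT)")
conn.executemany(
    "INSERT INTO principals VALUES (?, ?, ?)",
    [
        ("user", "alice@example.com", "Alice"),
        ("user", "al_x@example.com", "Al_x"),
        ("group", "g-albatross", "Albatross"),
    ],
)
```

With this shape, a raw `%` becomes the pattern `\%%`, which only matches values that literally start with a percent sign, so it cannot enumerate the table.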

Settings tab UI

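Provider Select enables all three options; the panel below switches between `EntraPanel` / `LakebasePanel` / `FilePanel` on selection. Each renders its provider-specific input plus a help block: UC connection setup for Entra, the required SQL schema for Lakebase, and a CSV format example for File.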

Form state is preserved per-field when toggling providers so users don't lose typed values.

Test plan

  • Backend directory tests: 67 passed (16 Entra + 17 manager + 18 Lakebase + 16 File).
  • Targeted directory + workflow regression: 158 passed.
  • Frontend type-check clean (`yarn type-check`).
  • Frontend lint clean on touched files.
  • Frontend suite unaffected: 705 passed, 6 skipped.
  • New backend tests:
    • `test_lakebase_provider.py` (18): FQN validation rejects illegal identifiers / SQL-injection probes; prefix search hits `display_name` and `id`; result shape; type filter; `top` cap; `%`/`_` wildcard escape; `get_user` happy/missing/empty paths; `test()` probe succeeds and fails on missing table; ctx/config validation.
    • `test_file_provider.py` (15): CSV parsing happy path, blank rows skipped, blank id rejected, unknown type rejected, missing required columns rejected; case-insensitive prefix; id-column match; `top` cap; mtime-based cache re-read; `test()` probe.
    • `test_directory_manager.py` (+3 new): `DirectoryStatus` exposes all per-provider fields; `configured=False` when required setting absent; `manager.test` raises a clear "missing required" message when a recognised provider lacks its required setting.
    • `test_entra_id_provider.py` (16): unchanged behaviour, migrated to the new `(ctx, config)` factory signature.
  • Manual smoke test of all three providers in a workspace (left for reviewer / follow-up).
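Table names cannot be bound as query parameters, which is why the Lakebase tests lean on identifier validation. A minimal sketch of that kind of check is below. The exact rule is an assumption (two or three dot-separated parts, each an ASCII letter or underscore followed by letters, digits, or underscores), and `validate_table_fqn` is a hypothetical name, not the PR's function.

```python
import re

# Assumed identifier rule for the sketch: unquoted, ASCII-only SQL identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_table_fqn(fqn: str) -> str:
    """Return the FQN unchanged if it is schema.table or catalog.schema.table."""
    parts = fqn.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected schema.table or catalog.schema.table, got {fqn!r}")
    for part in parts:
        # fullmatch rejects spaces, quotes, semicolons, comments, and empty parts.
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"illegal identifier {part!r} in {fqn!r}")
    return fqn
```

Rejecting anything outside a strict allow-list at construction time means the validated name can later be interpolated into the query text, while all user-supplied search terms still go through bound parameters.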

Notes for reviewers

  • The change to the provider factory signature is intentionally invasive: the old `(ws_client, connection_name)` couldn't cleanly carry per-provider config. The new shape is what makes the abstraction hold up, since adding a provider no longer forces a change to the factory signature.
  • `LakebaseProvider` uses the app's own SQLAlchemy engine via `db.get_bind()`. The principal table is expected to live on the same Postgres / Lakebase database the app already talks to. If a customer wants the principal directory on a separate database, that's a future extension (new ctx field).
  • `FileProvider` is intentionally tiny: no remote fetch, no schema beyond four columns. The intended use is tests, demos, and air-gapped dev environments.
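For reviewers who want a mental model of the new factory shape, here is a rough sketch under stated assumptions. The class and registry names come from this PR's description, but the field names (`provider_type`, `connection_name`, `lakebase_table`, `file_path`), the provider keys, and the tuple-returning stub factories are illustrative guesses, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class DirectoryProviderContext:
    """Transport handles shared by all providers."""
    ws_client: Optional[Any] = None
    db_engine: Optional[Any] = None


@dataclass(frozen=True)
class DirectoryProviderConfig:
    """One bag of directory settings; each provider reads only its own fields."""
    provider_type: str
    connection_name: Optional[str] = None
    lakebase_table: Optional[str] = None
    file_path: Optional[str] = None


ProviderFactory = Callable[[DirectoryProviderContext, DirectoryProviderConfig], Any]

# Stub factories: real ones would return provider objects.
_PROVIDER_REGISTRY: dict[str, ProviderFactory] = {
    "entra": lambda ctx, cfg: ("entra", cfg.connection_name),
    "lakebase": lambda ctx, cfg: ("lakebase", cfg.lakebase_table),
    "file": lambda ctx, cfg: ("file", cfg.file_path),
}

# Config fields each provider needs before it counts as configured.
_REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "entra": ("connection_name",),
    "lakebase": ("lakebase_table",),
    "file": ("file_path",),
}


def missing_keys(config: DirectoryProviderConfig) -> list[str]:
    required = _REQUIRED_KEYS.get(config.provider_type, ())
    return [key for key in required if not getattr(config, key)]


def is_configured(config: DirectoryProviderConfig) -> bool:
    return config.provider_type in _PROVIDER_REGISTRY and not missing_keys(config)


def build_provider(ctx: DirectoryProviderContext, config: DirectoryProviderConfig) -> Any:
    factory = _PROVIDER_REGISTRY.get(config.provider_type)
    if factory is None:
        raise ValueError(f"unknown directory provider: {config.provider_type!r}")
    missing = missing_keys(config)
    if missing:
        raise ValueError(
            f"missing required setting(s) for {config.provider_type}: {', '.join(missing)}"
        )
    return factory(ctx, config)
```

In this sketch a fourth provider would need one new registry entry and, if it has mandatory settings, one new `_REQUIRED_KEYS` row; `build_provider` and `is_configured` stay untouched.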

Stacks on Phase 3 (#413). Ships two additional concrete providers and
generalises the provider abstraction so adding a new transport no
longer requires reshaping the manager's factory signature.

Providers added:
- LakebaseProvider — reads principals from a Postgres / Lakebase
  table at a caller-supplied FQN (catalog.schema.table or
  schema.table). Strict identifier validation at construction time;
  parameterised queries with LIKE/ESCAPE for portable case-insensitive
  prefix search; user wildcards (%/_) are escaped so a raw % can't
  dump the directory. Schema is documented in the help block under
  the Settings tab.
- FileProvider — CSV-backed, primarily for tests and demos. Required
  columns: type, id, display_name (+ optional sub_label). Re-reads
  on mtime change via a class-level cache so changes propagate
  without restart.
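
The mtime-based re-read described for FileProvider could look roughly like the sketch below. It is an illustration only: `CsvDirectory` is a hypothetical stand-in for the real class, and the allowed `type` values (`user`, `group`) are an assumption, since the accepted set is not listed here.

```python
import csv
import os
import tempfile
from typing import ClassVar

REQUIRED_COLUMNS = {"type", "id", "display_name"}
ALLOWED_TYPES = {"user", "group"}  # assumption: the real accepted set is not shown
COLUMNS = ("type", "id", "display_name", "sub_label")


class CsvDirectory:
    # Class-level cache: path -> (mtime in ns, parsed rows), shared by all instances.
    _cache: ClassVar[dict[str, tuple[int, list[dict[str, str]]]]] = {}

    def __init__(self, path: str) -> None:
        self.path = path

    def rows(self) -> list[dict[str, str]]:
        mtime = os.stat(self.path).st_mtime_ns
        cached = self._cache.get(self.path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged since last parse
        rows = self._parse()
        self._cache[self.path] = (mtime, rows)
        return rows

    def _parse(self) -> list[dict[str, str]]:
        with open(self.path, newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"missing required columns: {sorted(missing)}")
            rows = []
            for raw in reader:
                row = {col: (raw.get(col) or "").strip() for col in COLUMNS}
                if not any(row.values()):
                    continue  # blank row: skip
                if not row["id"]:
                    raise ValueError("row with blank id")
                if row["type"] not in ALLOWED_TYPES:
                    raise ValueError(f"unknown type: {row['type']!r}")
                rows.append(row)
            return rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "principals.csv")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("type,id,display_name,sub_label\nuser,u1,Alice,Finance\n")
    directory = CsvDirectory(path)
    first = directory.rows()
    cached = directory.rows()  # same mtime: served from the cache
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("group,g1,Admins,\n")
    stat = os.stat(path)
    # Bump mtime explicitly so the change is visible even on coarse filesystems.
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    reloaded = directory.rows()
```

Comparing the stored mtime on every call costs one `stat` per lookup, which is the price of picking up edits without a restart.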

Architecture:
- DirectoryProvider factories now take (DirectoryProviderContext,
  DirectoryProviderConfig). The context carries transport handles
  (ws_client, db_engine); the config carries every directory setting
  in one bag and each provider reads only the fields relevant to
  its type. Adding a future provider is one entry in _PROVIDER_REGISTRY
  plus an optional _REQUIRED_KEYS row -- no other code changes.
- DirectoryStatus exposes lakebase_table + file_path alongside the
  existing connection_name so the Settings tab can hydrate the right
  inputs in one round trip.
- DirectoryManager caches search results keyed on the config
  signature, so switching from Lakebase table A to table B
  invalidates correctly.
- Per-provider required keys are declared centrally; ``configured``
  is False whenever any required key for the active provider is
  missing.
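
Keying the search cache on the config signature can be pictured with the sketch below. `SearchCache`, `config_signature`, and the loader callables are hypothetical names for illustration; the real DirectoryManager may structure its cache differently (for example with expiry, which is omitted here).

```python
from typing import Callable, Hashable


def config_signature(provider_type, connection_name=None, lakebase_table=None, file_path=None):
    """Hashable summary of every setting that changes what a search returns."""
    return (provider_type, connection_name, lakebase_table, file_path)


class SearchCache:
    """Memoises search results per (config signature, normalised query)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[Hashable, str], list[str]] = {}

    def get_or_load(
        self, signature: Hashable, query: str, loader: Callable[[str], list[str]]
    ) -> list[str]:
        key = (signature, query.strip().lower())
        if key not in self._entries:
            self._entries[key] = loader(query)  # miss: ask the provider
        return self._entries[key]


calls: list[str] = []


def fake_loader(table: str) -> Callable[[str], list[str]]:
    """Stand-in for a provider search; records which table was queried."""
    def load(query: str) -> list[str]:
        calls.append(table)
        return [f"{table}:{query}"]
    return load


cache = SearchCache()
sig_a = config_signature("lakebase", lakebase_table="public.table_a")
sig_b = config_signature("lakebase", lakebase_table="public.table_b")
cache.get_or_load(sig_a, "al", fake_loader("table_a"))
cache.get_or_load(sig_a, "al", fake_loader("table_a"))  # hit: loader not called again
result_b = cache.get_or_load(sig_b, "al", fake_loader("table_b"))  # new signature: miss
```

Because the table name is part of the key, results cached for table A are never returned once the setting points at table B.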

Frontend:
- settings-directory.tsx: provider Select enables all three options;
  panel switches between EntraPanel / LakebasePanel / FilePanel on
  selection. Each panel renders its provider-specific input plus a
  help block (UC connection setup for Entra, required SQL schema for
  Lakebase, CSV format example for File). Form state is preserved
  per-field when toggling providers so users don't lose typed values.

Tests (34 new, 67 directory tests in total):
- test_lakebase_provider.py (18): FQN validation rejects SQL
  injection / illegal identifiers; prefix search hits against
  display_name and id; result shape; type filter; top cap; %/_
  wildcard escape; get_user happy/missing/empty; test() probe
  succeeds and fails; ctx/config validation.
- test_file_provider.py (15): CSV parsing happy path, blank rows
  skipped, blank id rejected, unknown type rejected, missing
  required columns rejected; case-insensitive prefix; id-column
  match; top cap; mtime-based cache re-read; test() probe.
- test_directory_manager.py (+3): status exposes all per-provider
  fields; configured=False when required setting absent; manager.test
  raises "missing required" message when a recognised provider lacks
  its required setting.
- test_entra_id_provider.py (16): unchanged behaviour, migrated to
  the new (ctx, config) factory signature.

Status:
- Backend directory + workflow tests: 158 passed.
- Frontend type-check clean, lint clean on touched files.
- Existing 705-test frontend suite unaffected.

larsgeorge-db requested a review from a team as a code owner May 21, 2026 16:30
larsgeorge-db added a commit that referenced this pull request May 21, 2026
…r scope

Documents what shipped under PRs #406 / #407 / #412 / #413 / #416 / #417:

- Renames the integration's manager / routes / settings keys in the
  PRD to match the implementation (Directory layer, /api/directory/*,
  DIRECTORY_* settings, Settings → Directory tab).
- Documents the DirectoryProvider interface and the
  (DirectoryProviderContext, DirectoryProviderConfig) factory
  signature so future provider plug-ins know what to implement.
- Documents the v1 provider set, which expanded during planning
  from Entra-only to entra + lakebase + file. The Lakebase table
  schema and CSV format are included so operators have a single
  reference.
- Preserves story content, the disambiguation rule, both picker
  modes, storage-compatibility guarantees, and graceful-degradation
  rules from the PRD body unchanged.
- Re-confirms the out-of-scope list (Okta/Ping, service principals,
  OBO, profile photos, manager hierarchy, role/team Select replacement,
  CSV bulk import) which the abstraction makes cheap to revisit.