Skip to content

feat(integration-registry): schema validator + dangling-linkedTypes surveillance (umbrella PR 2/N)#1468

Merged
rubenvdlinde merged 2 commits into
developmentfrom
feature/1307/schema-validator-refactor
May 18, 2026
Merged

feat(integration-registry): schema validator + dangling-linkedTypes surveillance (umbrella PR 2/N)#1468
rubenvdlinde merged 2 commits into
developmentfrom
feature/1307/schema-validator-refactor

Conversation

@rubenvdlinde
Copy link
Copy Markdown
Contributor

Stacked on #1467. Tasks 7-11 of #1307 — Backend — Schema validator refactor.

Task What
2.1 Schema::validateLinkedTypesValue() now consults both VALID_LINKED_TYPES (deprecated fallback) AND IntegrationRegistry::listIds(). Registry resolved lazily via \OC::$server->get() since Schema is a Nextcloud Entity, not a service. Falls back to fallback-only when container isn't booted (unit tests). AD-5 backwards-compat preserved.
2.2 VALID_LINKED_TYPES marked @deprecated with pointers to the registry + cleanup-linked-entity-type-map follow-up.
2.3 LinkedEntityService::TYPE_COLUMN_MAP marked @deprecated.
2.4 New PropertyReferenceTypeValidator service that validates the referenceType: <integration-id> marker on schema property definitions (AD-18). Standalone validator so existing schema validation paths don't change; CnFormDialog / CnDetailGrid wire it in tasks 25-46.
2.5 New LogDanglingLinkedTypes repair step — registered under <install> + <post-migration> in info.xml. Scans every schema, logs WARNING for any linkedTypes value not yet registered. Strictly informational; never throws, never modifies data.

Net new files

  • lib/Service/Integration/PropertyReferenceTypeValidator.php
  • lib/Repair/LogDanglingLinkedTypes.php
  • tests/Unit/Service/Integration/PropertyReferenceTypeValidatorTest.php

Modified

  • lib/Db/Schema.php — validator + deprecation note
  • lib/Service/LinkedEntityService.php — deprecation on TYPE_COLUMN_MAP
  • lib/Service/Integration/IntegrationRegistry.phpisValidIntegrationId() helper
  • lib/AppInfo/Application.php — DI bindings
  • appinfo/info.xml — repair-step entry

Tests

  • 34 tests, 48 assertions — all green inside the openregister-postgres dev container
  • 9 new for PropertyReferenceTypeValidator; the 25 from PR 1 still pass

Deferred to PR 3

Built-in providers (tasks 12-17 — Files/Notes/Tasks/Tags/AuditTrail wrappers). Keeps this PR a single coherent slice: "registry-driven validation + dangling-id surveillance".

Stacking note

This PR's base is feature/1307/pluggable-integration-registry (PR #1467). When #1467 merges into development, GitHub will retarget this PR to development automatically — no rebase required.

🤖 Generated with Claude Code

…urveillance (umbrella PR 2/N)

Second slice of #1307 — completes tasks 7-11 of the umbrella (Backend
— Schema validator refactor). Builds on PR 1's IntegrationRegistry +
ExternalIntegrationRouter foundation. Stacked PR — base is
feature/1307/pluggable-integration-registry; GitHub will retarget to
development once PR 1 merges.

Tasks completed in this slice (5/69 → cumulative 11/69):

- 2.1 Schema::validateLinkedTypesValue() now consults both
  VALID_LINKED_TYPES (deprecated fallback) AND
  IntegrationRegistry::listIds(). Registry resolved lazily via
  \OC::$server->get() since Schema is a Nextcloud Entity, not a
  service. Falls back to fallback-only when container isn't booted
  (unit tests). AD-5 backwards-compat preserved: existing schemas
  with 'mail' / 'calendar' / 'talk' / 'deck' still validate while
  the leaves land.
- 2.2 VALID_LINKED_TYPES marked @deprecated with pointers to the
  registry + cleanup follow-up.
- 2.3 LinkedEntityService::TYPE_COLUMN_MAP marked @deprecated.
- 2.4 PropertyReferenceTypeValidator — new opt-in service that
  validates the `referenceType: <integration-id>` marker on schema
  property definitions (AD-18). Kept as a standalone validator so
  existing schema validation paths don't change; CnFormDialog /
  CnDetailGrid wire it in tasks 25-46.
- 2.5 LogDanglingLinkedTypes repair step — registered under <install>
  + <post-migration> in info.xml. Scans every schema, logs WARNING
  for any linkedTypes value not yet registered. Strictly
  informational; never throws, never modifies data.

Net new files:
- lib/Service/Integration/PropertyReferenceTypeValidator.php
- lib/Repair/LogDanglingLinkedTypes.php
- tests/Unit/Service/Integration/PropertyReferenceTypeValidatorTest.php

Modified:
- lib/Db/Schema.php — validator + deprecation note
- lib/Service/LinkedEntityService.php — deprecation note on
  TYPE_COLUMN_MAP
- lib/Service/Integration/IntegrationRegistry.php — added
  isValidIntegrationId() helper consumed by the new validator
- lib/AppInfo/Application.php — DI bindings for the new validator
  service + the repair step
- appinfo/info.xml — repair-steps block adds LogDanglingLinkedTypes
- openspec/changes/pluggable-integration-registry/tasks.md
- openspec/changes/pluggable-integration-registry/plan.json

Unit tests:
- 34 tests, 48 assertions — all green (9 new for
  PropertyReferenceTypeValidator + the 25 from PR 1)

Built-in providers (tasks 12-17) intentionally deferred to PR 3 to
keep this PR reviewable as one coherent slice (the validator + the
dangling-id surveillance form one story).

Refs: #1307
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] scan() flags VALID_LINKED_TYPES ids as dangling — false positives on every upgrade

scan() marks a linkedType as dangling when absent from $registeredIds (the live registry). It does NOT consult VALID_LINKED_TYPES. Since built-in Wave-1 providers (FilesProvider, NotesProvider, etc.) are still pending (tasks 3.1–3.5 in plan.json), the registry is empty on any real install. Every schema with linkedTypes=['mail','calendar','talk','files',...] will produce a WARNING on every upgrade, even though validateLinkedTypesValue() in Schema.php accepts those values via the legacy fallback. Fix: pass array_merge(VALID_LINKED_TYPES, $registeredIds) as the accepted set, or suppress the warning when the registry itself is empty.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] findAll() with no limit OOMs on large installations at upgrade time

loadSchemas() calls $mapper->findAll() with no arguments, fetching the entire openregister_schemas table into memory in one query. On installations with thousands of schemas this is a full unbounded table scan during occ upgrade, risking OOM or timeout. Fix: batch-iterate in pages of e.g. 500 (findAll(500, $offset)).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] validate() rejects referenceType='files' today — built-in providers not yet registered

PropertyReferenceTypeValidator::validate() calls $this->registry->isValidIntegrationId($value) with no VALID_LINKED_TYPES fallback. Built-in providers (FilesProvider, NotesProvider, etc.) are Wave-1 work still marked pending in plan.json. Any schema that sets referenceType: "files" today will throw InvalidArgumentException. Either include VALID_LINKED_TYPES as an implicit fallback, or block validator calls until at least one built-in provider is registered.


return $dangling;
}//end scan()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] SPDX machine-readable headers missing from LogDanglingLinkedTypes.php

LogDanglingLinkedTypes.php only has the @license tag, not SPDX-License-Identifier: EUPL-1.2 and SPDX-FileCopyrightText: 2026 Conduction B.V. machine-readable headers required by hydra-gate-spdx. Add both SPDX lines to the opening docblock.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BLOCKER] SPDX machine-readable headers missing from PropertyReferenceTypeValidator.php

PropertyReferenceTypeValidator.php uses the @license docblock tag but omits the machine-readable SPDX-License-Identifier: EUPL-1.2 and SPDX-FileCopyrightText: 2026 Conduction B.V. lines required by hydra-gate-spdx.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CONCERN] Repair step runs unconditionally even when registry has zero providers — alert fatigue

run() scans all schemas unconditionally. When the registry is empty (standard state before Wave-1 providers land), it will warn on every type in every schema. Consider short-circuiting: if $registeredIds === [] then skip the scan entirely or restrict warnings to types not in VALID_LINKED_TYPES to prevent alert fatigue.

Comment thread lib/Db/Schema.php
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CONCERN] resolveIntegrationRegistryIds() swallows all Throwable silently — masking misconfiguration

The catch block catches \Throwable and returns [] with no log entry. If the registry binding throws unexpectedly (not just NotFoundExceptionInterface for missing service), validation silently degrades to the legacy list only. Fix: distinguish NotFoundExceptionInterface (expected) from other \Throwable (log at WARNING level).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CONCERN] No unit tests for LogDanglingLinkedTypes — surveillance step has zero test coverage

The test file covers only PropertyReferenceTypeValidator. LogDanglingLinkedTypes has three non-trivial private methods (loadSchemas, scan, extractLinkedTypes) but no tests. At minimum: a test for scan() with a mix of registered and legacy-only types to confirm the false-positive problem, and a test for the graceful skip path when loadSchemas() returns null.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CONCERN] Repair step injects registry at construction time — providers registered after boot are invisible

IntegrationRegistry is injected in the constructor and listIds() called once in run(). Built-in providers self-register via addProvider() at app bootstrap. If repair steps execute before app boot completes during occ upgrade, listIds() may return an incomplete set and produce spurious warnings. Recommend adding an $output->info() line printing the count of registered ids so operators can cross-check.

Copy link
Copy Markdown
Contributor

@WilcoLouwerse WilcoLouwerse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

🔴 Blockers (5)

🟡 Concerns (4)

🟢 Minor (1)

  • Test stub class named _ValidatorStubProvider — leading underscore violates PHPCS PSR-2 (tests/Unit/Service/Integration/PropertyReferenceTypeValidatorTest.php:734)
    PHP naming conventions (PSR-2 / PHPCS) prohibit leading underscores in class names. _ValidatorStubProvider will trigger a PHPCS sniff. Rename to ValidatorStubProvider or FakeIntegrationProvider.

Reviewed by WilcoLouwerse via automated batch review.

Base automatically changed from feature/1307/pluggable-integration-registry to development May 18, 2026 11:19
@rubenvdlinde rubenvdlinde merged commit c43ccee into development May 18, 2026
16 of 19 checks passed
@rubenvdlinde rubenvdlinde deleted the feature/1307/schema-validator-refactor branch May 18, 2026 11:19
@WilcoLouwerse
Copy link
Copy Markdown
Contributor

🔍 Retrospective audit finding — surfaced from #1515

While reviewing integration rollup PR #1515, one 🔴 blocker traces back to LogDanglingLinkedTypes in this PR:

LogDanglingLinkedTypes::loadSchemas() calls findAll() with no limit — the repair step is an IRepairStep (runs on every occ upgrade / app re-enable) and pulls every schema row into memory in a single call. On a multi-tenant deployment with thousands of schemas this will exhaust PHP memory mid-upgrade and leave the app in a half-migrated state — the exact failure mode the step exists to warn about. Either page via findAll($limit, $offset) in a loop, or stream via a generator/yield. Inline comment on #1515

The PR body of #1515 surfaced this under "#1468 false-positive risk" as a concern — confirming it merged unaddressed. Worth a fixup PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants