Skip to content

fix(server): align cache event actions in legacy EventHub path#3017

Open
dpol1 wants to merge 2 commits intoapache:masterfrom
dpol1:fix/3012-eventhub-cache-actions
Open

fix(server): align cache event actions in legacy EventHub path#3017
dpol1 wants to merge 2 commits intoapache:masterfrom
dpol1:fix/3012-eventhub-cache-actions

Conversation

@dpol1
Copy link
Copy Markdown
Contributor

@dpol1 dpol1 commented May 8, 2026

Purpose of the PR

Cache producers in the legacy EventHub path were emitting past-tense action
names (invalided / cleared) while local cache listeners only matched the
present-tense actions (invalid / clear). On single-node deployments, or
whenever the cache RPC bridge in StandardHugeGraph.AbstractCacheNotifier was
not active, those local cache invalidation events were silently dropped: no
error, no warning, just stale cache.

The fix aligns producers, listeners, and the bridge on the present-tense
constants, removes the obsolete past-tense ones, and wires notifyExcept(...)
so the bridge does not loop back into itself once both ends speak the same
action name.

Main Changes

  • Drop Cache.ACTION_INVALIDED / Cache.ACTION_CLEARED. All producers
    (CachedGraphTransaction, CachedSchemaTransaction) now emit
    Cache.ACTION_INVALID / Cache.ACTION_CLEAR, matching what local listeners
    and RaftContext already expect.
  • Update StandardHugeGraph.AbstractCacheNotifier to listen on the
    present-tense actions and to forward through EventHub.notifyExcept(...) so
    the bridge does not re-receive its own re-emit.
  • Add EventHub.notifyExcept(event, ignoredListener, args...) so a caller can
    notify everyone on the hub except the listener that originated the event.
    The signature is documented as a public API on EventHub, notify(...)
    is unchanged and now delegates internally.
  • Use notifyExcept(...) from the cache transactions and from
    AbstractCacheNotifier, with the registered cache listener as the ignored
    one, so producers do not feed their own listener and the RPC re-emit does
    not loop the bridge.
  • Track the registered cache listener per graph in
    GraphTransaction.graphCacheEventListeners and in a matching map inside
    CachedSchemaTransaction, with owner-aware teardown on close. This is the
    minimum needed for notifyExcept(...) to exclude the listener that is
    actually registered on the hub: with the old per-instance lambda +
    containsListener(...) / boolean guard pattern, only the first transaction
    per graph registered a listener, while later transactions could call
    notifyExcept(...) with a lambda that was never registered. In that case the
    real registered listener was not excluded and could still process the
    transaction's own local emit.

CachedSchemaTransactionV2 is intentionally not touched here. That class was
just adjusted in #3011 for HStore/V2 schema cache propagation through
MetaManager. The remaining containsListener(...) guard there is a related
listener-lifecycle concern, not the action-name mismatch fixed in this PR; it
should be handled in a follow-up that audits the remaining V2 path and, if
needed, moves cache listener ownership to a ref-counted holder keyed by graph
identity and EventHub.

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • EventHubTest#testNotifyExcept covers the new notifyExcept(...)
      semantics, including ANY_EVENT listeners and the ignored-listener
      exclusion.
    • CachedSchemaTransactionTest#testLegacySchemaChangeEmitsActionInvalid
      asserts the schema producer now emits ACTION_INVALID end-to-end.
    • CachedSchemaTransactionTest#testListenerRegistryClearedOnCloseAndRebuiltOnReopen
      asserts that closing a graph drops its entry from the per-graph
      listener registry and that a subsequent transaction registers a fresh
      listener instead of reusing the closed reference.
    • CachedSchemaTransactionTest#testCacheNotifierLocalEventCallsProxyOnce
      and testCacheNotifierRpcInvalidDoesNotLoopToProxy cover the bridge
      no-loop behavior.
    • CachedGraphTransactionTest#testClearCacheEmitsActionClear asserts
      graph cache clears now emit ACTION_CLEAR.
    • CachedGraphTransactionTest#testVertexMutationEmitsActionInvalid
      asserts graph vertex mutations now emit ACTION_INVALID.
    • CachedGraphTransactionTest#testClosingNonOwnerKeepsGraphCacheListenerRegistered
      asserts that closing a graph transaction which reused an existing
      per-graph listener does not unregister the shared graph cache listener.

Does this PR potentially affect the following parts?

  • Dependencies
  • Modify configurations
  • The public API
  • Other affects
  • Nope

EventHub.notifyExcept(...) is a new public method; existing notify(...)
behavior is unchanged.

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

- Align legacy cache invalidation producers and listeners on ACTION_INVALID and
  ACTION_CLEAR, removing the obsolete ACTION_INVALIDED/ACTION_CLEARED constants.
- Add EventHub.notifyExcept(...) so cache transactions and the cache notifier
  bridge can avoid re-processing their own local listener while still delivering
  events to other listeners.
- Track registered graph/schema cache listeners per graph so notifyExcept(...)
  uses the listener instance actually registered on the EventHub, including
  multi-transaction cases where later transactions reuse the first listener.
- Update cache notifier forwarding to prevent local RPC bridge loops after action
  names are unified.
- Add regression coverage for notifyExcept semantics, graph/schema action names,
  listener teardown/re-registration, and notifier no-loop behavior.
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels May 8, 2026
@dpol1 dpol1 changed the title fix(server): align legacy cache event actions fix(server): align cache event actions in legacy EventHub path May 8, 2026
@dpol1
Copy link
Copy Markdown
Contributor Author

dpol1 commented May 8, 2026

@imbajin I took the liberty to quickly fix a flaky test which failed a workflow twice - if you think is out of the scope with this PR I' ll revert it - Actually, I just noticed the first commit is all greens

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a legacy cache invalidation bug where cache event producers emitted past-tense action names (invalided/cleared) that didn’t match the actions local listeners (and some other paths) were expecting (invalid/clear), causing silent drops and stale caches in certain deployments. It also introduces EventHub.notifyExcept(...) to prevent cache-notification bridges from re-consuming their own re-emitted events.

Changes:

  • Remove legacy Cache.ACTION_INVALIDED / Cache.ACTION_CLEARED and align all cache event producers/listeners to Cache.ACTION_INVALID / Cache.ACTION_CLEAR.
  • Add EventHub.notifyExcept(event, ignoredListener, args...) and route notify(...) through the same internal implementation.
  • Track per-graph cache listeners in static registries and use notifyExcept(...) to avoid self-delivery/bridge loops; add/adjust unit tests for the new semantics.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/Cache.java Removes past-tense cache action constants to enforce a single action vocabulary.
hugegraph-commons/hugegraph-common/src/main/java/org/apache/hugegraph/event/EventHub.java Adds notifyExcept(...) and implements ignored-listener filtering in the notifier loop.
hugegraph-commons/hugegraph-common/src/test/java/org/apache/hugegraph/unit/event/EventHubTest.java Adds coverage for notifyExcept(...) (including ANY_EVENT listeners).
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/StandardHugeGraph.java Updates legacy notifier bridge to present-tense actions and uses notifyExcept(...) to avoid loops.
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/tx/GraphTransaction.java Replaces the boolean guard with a per-graph cache listener registry map.
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedSchemaTransaction.java Emits present-tense actions and uses per-graph listener registry + notifyExcept(...) on emits.
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/backend/cache/CachedGraphTransaction.java Emits present-tense actions and uses per-graph listener registry + notifyExcept(...) on emits.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CachedSchemaTransactionTest.java Adds tests validating action name alignment, listener registry lifecycle, and bridge no-loop behavior.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CachedGraphTransactionTest.java Adds tests validating action name alignment and “non-owner close” listener behavior.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/cache/CacheManagerTest.java Speeds up expiration tick test via reflective rescheduling and cancels the ad-hoc task.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 202 to +210
private void unlistenChanges() {
String graphName = this.params().spaceGraphName();
if (graphCacheListenStatus.remove(graphName) != null) {
if (this.registeredCacheEventListener) {
EventHub graphEventHub = this.params().graphEventHub();
graphEventHub.unlisten(Events.CACHE, this.cacheEventListener);
if (graphCacheEventListeners.remove(graphName,
this.cacheEventListener)) {
graphEventHub.unlisten(Events.CACHE, this.cacheEventListener);
}
this.registeredCacheEventListener = false;
Comment on lines 153 to +164
private void unlistenChanges() {
// Unlisten store event
this.store().provider().unlisten(this.storeEventListener);

// Unlisten cache event
EventHub schemaEventHub = this.params().schemaEventHub();
schemaEventHub.unlisten(Events.CACHE, this.cacheEventListener);
if (this.registeredCacheEventListener) {
schemaEventHub.unlisten(Events.CACHE, this.cacheEventListener);
SCHEMA_CACHE_EVENT_LISTENERS.remove(this.params().spaceGraphName(),
this.cacheEventListener);
this.registeredCacheEventListener = false;
}
Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks correct and the new tests are solid. Two concerns worth tracking: (1) storeEventListenStatus has the same ownership bug this PR fixes for the cache listener, acknowledged in the test's finally block — worth a follow-up; (2) the owner-first-close scenario for the cache listener deserves an explicit test and/or a ref-counted ownership model.

// Closing the secondary transaction exercises the pre-existing
// store listener guard too; restore it so teardown unregisters the
// primary transaction's store listener.
storeListeners.putIfAbsent(graphName, true);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ TODO (follow-up): this putIfAbsent is a workaround for a pre-existing bug in storeEventListenStatus that mirrors exactly the one this PR fixes for graphCacheEventListeners. In unlistenChanges():

if (storeEventListenStatus.remove(graphName) != null) {
    this.store().provider().unlisten(this.storeEventListener);
}

when a non-owner transaction closes, it (a) pops the map entry even though it doesn't own the registration, (b) unlistens its own per-instance storeEventListener which was never registered on the provider (no-op), and (c) leaves the primary's real store listener registered but un-tracked — so the owner's later close silently skips the unlisten and leaks it.

Suggest a follow-up PR that applies the same ref-counted / owner-aware pattern introduced here for the cache listener (i.e. a Map<String, EventListener> + registeredStoreEventListener flag). Keeping as TODO for now since it's out of scope.

this.cacheEventListener)) {
graphEventHub.unlisten(Events.CACHE, this.cacheEventListener);
}
this.registeredCacheEventListener = false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ What happens in the reverse ordering — owner closes first while non-owners are still alive? The new tests only cover non-owner close.

Trace:

  1. T1 opens (owner), registers listener, puts it in graphCacheEventListeners.
  2. T2 opens (non-owner), caches this.cacheEventListener = <T1's listener>.
  3. T1 closes → unlistenChanges() removes the map entry and calls hub.unlisten(...).
  4. T2 still alive. Its notifyChanges(...) calls notifyExcept(hub, <T1's listener>, ...). That listener is no longer on the hub, so the exclusion is a no-op and the emit fans out to whatever else is registered (e.g. AbstractCacheNotifier's bridge). Meanwhile T2 has no working local cache listener at all — the per-graph listener slot is empty until a new transaction repopulates it.

Two questions:

  • Is T1-closes-before-T2 a realistic lifecycle in practice (e.g. thread-local transactions on a long-lived graph)? If yes, a ref-counted ownership holder (unregister only when the last referent closes) would be safer than "first-wins" ownership.
  • Either way, worth adding a companion test to testClosingNonOwnerKeepsGraphCacheListenerRegistered that exercises the owner-first-close path and asserts whatever the intended semantics are, so the invariant is captured.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fair, thanks.

  1. storeEventListenStatus - same ownership bug, the putIfAbsent in finally is a workaround. Follow-up PR with whatever ownership model we land on.

  2. Owner-first-close - realistic, yeah. Two options: ref-counted holder, or stop unregistering on tx close (cache is already a CacheManager singleton per spaceGraphName, so listener lifetime = graph, not tx; cleanup moves to graph dispose). I'd lean toward the second - kills the race by killing the owner concept.

Wouldn't add a test pinning current behavior - codifies a limitation. Real test needs the fix first.

#3012 is the action-name + notifyExcept fix, so my preference is to merge this as-is and do ownership in a follow-up. Can fold either fix in here instead - your call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] EventHub action name mismatch silently drops cache invalidation events in legacy path

3 participants