Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Invalidate metadata children cache after key deleted #20363

Merged
merged 2 commits into from May 24, 2023

Conversation

Shawyeok
Copy link
Contributor

@Shawyeok Shawyeok commented May 20, 2023

Motivation

There is a chance if a topic reassign to a broker that owned this topic once, consumer reconnect will failure with BadVersionException, here's detail: #14779 (comment)

Reproduce steps:

  1. a topic deleted in broker1
  2. recreate the topic in broker2, start a consumer with it
  3. broker crash or bundle split, topic reassign to broker1, by now consumer will reconnect failure

Modifications

Cleanup metadata children cache if managed-ledger key deleted

Verifying this change

  • Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc-not-needed

Matching PR in forked repository

PR in forked repository: Shawyeok#7

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 20, 2023
@tisonkun tisonkun assigned tisonkun and Shawyeok and unassigned tisonkun May 21, 2023
@Shawyeok
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@lhotari lhotari added this to the 3.1.0 milestone May 24, 2023
@tisonkun
Copy link
Member

Besides, I remember that ZK cannot remove recursively at a batch but iterating from children to parent. How the cache won't be invalidated by its children getting deleted?

Anyway, invalidate cache eagerly won't introduce correctness issue.

@Shawyeok
Copy link
Contributor Author

Shawyeok commented May 24, 2023

Besides, I remember that ZK cannot remove recursively at a batch but iterating from children to parent. How the cache won't be invalidated by its children getting deleted?

Anyway, invalidate cache eagerly won't introduce correctness issue.

In the scenario described in #14779 (comment), node1 does not have a watcher associated with the newly created znode cursor. As a result, the mechanism to invalidate the parent key children cache will not work.

@codecov-commenter
Copy link

Codecov Report

Merging #20363 (4e9086f) into master (1e664b7) will increase coverage by 36.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #20363       +/-   ##
=============================================
+ Coverage     36.93%   72.95%   +36.01%     
- Complexity    12032    31889    +19857     
=============================================
  Files          1687     1864      +177     
  Lines        128761   138403     +9642     
  Branches      14002    15185     +1183     
=============================================
+ Hits          47553   100965    +53412     
+ Misses        74932    29427    -45505     
- Partials       6276     8011     +1735     
Flag Coverage Δ
inttests 24.17% <100.00%> (-0.16%) ⬇️
systests 24.90% <100.00%> (-0.10%) ⬇️
unittests 72.22% <100.00%> (+40.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...he/pulsar/metadata/impl/AbstractMetadataStore.java 85.38% <100.00%> (+32.67%) ⬆️

... and 1430 files with indirect coverage changes

@tisonkun
Copy link
Member

tisonkun commented May 24, 2023

Thanks for your explanation! Now I know that no children cause the cache be empty and return empty cursor even if there is some. And this patch is valid even for defensive programming (delete node should always invalid cache).

Although, if the cursor is a child of the topic znode, won't putInternal invalid childrenCache so that the cursor be properly loaded?

I'd appreciate it if you can provide a znode view of execution steps to demonstrate the exception.

@Shawyeok
Copy link
Contributor Author

I'd appreciate it if you can provide a znode view of execution steps to demonstrate the exception.

Try me break down into two steps to explain how this error could be produced in #14779:

  1. When topic back to node1, it will only initialize the managed-ledger without initializing any cursors, due to cache consistency
    pseudo call stack:
MetaStore#getCursor
ManagedLedgerImpl#initializeCursors
MetaStore#asyncUpdateLedgerIds
ManagedLedgerImpl#asyncCreateLedger
ManagedLedgerImpl#initializeBookKeeper
MetaStore#getManagedLedgerInfo
ManagedLedgerImpl#initialize

if (consumers.isEmpty()) {
callback.initializeComplete();
return;

  1. When a consumer reconnects to node1, the broker only checks if the cursor has been loaded into memory
    ManagedCursor cachedCursor = cursors.get(cursorName);
    if (cachedCursor != null) {

pseudo call stack:

AbstractMetadataStore#put
MetaStore#asyncUpdateCursorInfo
ManagedCursorImpl#persistPositionMetaStore
ManagedCursorImpl#switchToNewLedger
ManagedCursorImpl#createNewMetadataLedger
ManagedCursorImpl#initialize
ManagedLedger#asyncOpenCursor
PersistentTopic#getDurableSubscription
PersistentTopic#internalSubscribe
PersistentTopic#subscribe
ServerCnx#handleSubscribe

BTW, broker will try create a cursor znode which already exists in zk, so it'll receive NODEEXISTS.

Here's NODEEXISTS how to be handled, a BADVERSION:

} else if (code == Code.NODEEXISTS) {
// We're emulating a request to create node, so the version is invalid
op.getFuture().completeExceptionally(getException(Code.BADVERSION, op.getPath()));

In summary, this is how I observed the occurrence of BadVersionException.

@tisonkun
Copy link
Member

After a discussion with @Shawyeok I learned that it's about two broker node that:

  1. Broker 1 has cache for topic znode.
  2. Broker 1 switches to a different topic, delete the topic znode but forget to clear the children cache.
  3. Broker 2 create the topic znode and cursor child znode
  4. Broker 1 switches back to the original topic. It doesn't set watches or perform put cursor znode by itself so the child cache is still empty (not uninit).

@tisonkun
Copy link
Member

Merging...

Thanks for your contribution and detailed investigation @Shawyeok!

@tisonkun tisonkun merged commit 7dcb3ea into apache:master May 24, 2023
41 checks passed
@Technoboy-
Copy link
Contributor

@Shawyeok Could you add a test for this patch?

@Shawyeok
Copy link
Contributor Author

@Shawyeok Could you add a test for this patch?

Sure, I'll do it later.

@Shawyeok
Copy link
Contributor Author

Shawyeok commented May 30, 2023

This patch was aimed to fix #14779, I reproduced in my environment which is based on the 2.8.1 codebase. After following @Technoboy's suggestion to write a reproduction test in the master branch, I realized that #11198 uses zk persistent watches, which should also fix #14779.

However, in theory, there is a chance that the new cursor watcher event is not triggered before MetadataStore#getChildren gets called on other brokers. In such cases, this patch also works effectively.

poorbarcode pushed a commit that referenced this pull request May 30, 2023
- Add `MetadataStoreTest#testExistsDistributed` for distributed metaStore implementations only
- Add `MetadataStoreTest#testGetChildrenDistributed` for distributed metaStore implementations only
poorbarcode pushed a commit that referenced this pull request May 30, 2023
coderzc pushed a commit that referenced this pull request May 31, 2023
@coderzc coderzc added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label May 31, 2023
@michaeljmarshall
Copy link
Member

As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BadVersionException , unable to consume topic, failing to reconnect Failed to create subscription
9 participants