Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 10860][pulsar-metadata] Ensure metadata cache consistency across brokers #10862

Merged
merged 3 commits into from
Jun 8, 2021

Conversation

sursingh
Copy link
Contributor

@sursingh sursingh commented Jun 8, 2021

Fixes #10860

Motivation

When doing operations on multiple brokers, the state is not always consistent. Sometimes the operations don't seem to get replicated to other brokers in the cluster. The queries from the brokers returns different results. See #10860 for details

Modifications

  1. While creating a new object, we add the object to cache. However we are not adding a watch on the corresponding path in zookeeper. So when other brokers change/delete the object, the broker that added the object is not notified. With this change we will do a get(), which will add a watch on the corresponding path. This will also add the object to the cache.

  2. Similarly while deleting, we are adding an empty object to the cache. Any local request will now return a object not found error, without needing to go the zookeeper. However if the object get re-added on any other broker, we will not be notified. We will continue to return object not found error, even though an instance of the object now exist. This change remove the addition of empty object to the cache. If a request is received for the object, we will do a get from the zookeeper and if it still doesn't exist, an empty object will get added. Also a watch will get added for the corresponding path. This will ensure that cache will get update the object is again added by some other broker.

Verifying this change

  • Added a test case that clearly demonstrates the issue and ensured that behavior is fixed after the change

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Surinder Singh added 2 commits June 7, 2021 20:35
1. During create we should add a watch for the path in zookeeper. Without this
we will not be notified if the znode is changed on another brokers

2. similarly when deleting, the cache should be invalidated. But we shouldn't add an
entry to the cache. This could get added again on some other broker. In that
case we need to go a fetch the entry from the zookeeper. Adding an empty
entry to the cache prevents that.
@sursingh
Copy link
Contributor Author

sursingh commented Jun 8, 2021

@merlimat : Can you help review this change.

@sursingh sursingh closed this Jun 8, 2021
@sursingh sursingh reopened this Jun 8, 2021
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch here!

The only doubt I have here is if it makes sense to do the fix in the ZK store implementation instead.

The reason is that since this is mostly related with the watch semantics which is very peculiar of ZK. Also, since we're now on ZK 3.6, we can also switch to use persistent watches (eg: setting a watch on an entire subtree and not having to re-set it each time).

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Jun 8, 2021
@merlimat merlimat added this to the 2.8.0 milestone Jun 8, 2021
@Anonymitaet
Copy link
Member

@sursingh thanks for your contribution. For this PR, do we need to update docs?

@codelipenghui codelipenghui modified the milestones: 2.8.0, 2.9.0 Jun 8, 2021
@ravi-vaidyanathan
Copy link
Contributor

/pulsarbot run-failure-checks

@sursingh
Copy link
Contributor Author

sursingh commented Jun 8, 2021

@Anonymitaet This is only a bugfix. We don't need to update the docs for this.

Also add a small delay in test to allow notifications to propagate to other
caches. Without this the tests are occasionally failing
@merlimat merlimat merged commit 798b34f into apache:master Jun 8, 2021
@codelipenghui codelipenghui modified the milestones: 2.9.0, 2.8.0 Jun 10, 2021
codelipenghui pushed a commit that referenced this pull request Jun 11, 2021
…ss brokers (#10862)

* Added metadata cache test to simulate multi broker cache

* fix create and delete ops on cache

1. During create we should add a watch for the path in zookeeper. Without this
we will not be notified if the znode is changed on another brokers

2. similarly when deleting, the cache should be invalidated. But we shouldn't add an
entry to the cache. This could get added again on some other broker. In that
case we need to go a fetch the entry from the zookeeper. Adding an empty
entry to the cache prevents that.

* Address review comments

Also add a small delay in test to allow notifications to propagate to other
caches. Without this the tests are occasionally failing

Co-authored-by: Surinder Singh <surinders@splunk.com>
(cherry picked from commit 798b34f)
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Jun 14, 2021
…ss brokers (apache#10862)

* Added metadata cache test to simulate multi broker cache

* fix create and delete ops on cache

1. During create we should add a watch for the path in zookeeper. Without this
we will not be notified if the znode is changed on another brokers

2. similarly when deleting, the cache should be invalidated. But we shouldn't add an
entry to the cache. This could get added again on some other broker. In that
case we need to go a fetch the entry from the zookeeper. Adding an empty
entry to the cache prevents that.

* Address review comments

Also add a small delay in test to allow notifications to propagate to other
caches. Without this the tests are occasionally failing

Co-authored-by: Surinder Singh <surinders@splunk.com>
(cherry picked from commit 798b34f)
(cherry picked from commit d280fad)
yangl pushed a commit to yangl/pulsar that referenced this pull request Jun 23, 2021
…ss brokers (apache#10862)

* Added metadata cache test to simulate multi broker cache

* fix create and delete ops on cache

1. During create we should add a watch for the path in zookeeper. Without this
we will not be notified if the znode is changed on another brokers

2. similarly when deleting, the cache should be invalidated. But we shouldn't add an
entry to the cache. This could get added again on some other broker. In that
case we need to go a fetch the entry from the zookeeper. Adding an empty
entry to the cache prevents that.

* Address review comments

Also add a small delay in test to allow notifications to propagate to other
caches. Without this the tests are occasionally failing

Co-authored-by: Surinder Singh <surinders@splunk.com>
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…ss brokers (apache#10862)

* Added metadata cache test to simulate multi broker cache

* fix create and delete ops on cache

1. During create we should add a watch for the path in zookeeper. Without this
we will not be notified if the znode is changed on another brokers

2. similarly when deleting, the cache should be invalidated. But we shouldn't add an
entry to the cache. This could get added again on some other broker. In that
case we need to go a fetch the entry from the zookeeper. Adding an empty
entry to the cache prevents that.

* Address review comments

Also add a small delay in test to allow notifications to propagate to other
caches. Without this the tests are occasionally failing

Co-authored-by: Surinder Singh <surinders@splunk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent metadata cache across brokers
6 participants