Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-6879; Invoke session init callbacks outside lock to avoid deadlock #4977

Merged
merged 4 commits into from
May 8, 2018

Conversation

hachikuji
Copy link
Contributor

Fixes a deadlock between the controller's beforeInitializingSession callback which holds the zookeeper client initialization lock while awaiting completion of an asynchronous event which itself depends on the same lock.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji
Copy link
Contributor Author

cc @junrao I implemented the fix we discussed offline. We are still ensured that the expiration event is received prior to establishing the new session, which means controller resignation must happen before any further events. I think this fences off any events which might be added prior to session initialization, but let me know if I missed something.

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji : Thanks for the patch. LGTM. A couple of minor comments below. Also, we want expiryScheduler to be single threaded.

expireEvent.waitUntilProcessed()

// Block initialization of the new session until the expiration event is being handled to
// ensure it cannot be associated with the wrong session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly to ensure that all pending events in the queue are processed before we create a new ZK session.

info("Session expired.")

// Initialization callbacks are invoked outside of the lock to avoid deadlock potential
// since callbacks may acquire their own locks and add new requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"since callbacks may acquire their own locks and add new requests" => since the completion of beforeInitializingSession() in the state change handler may require Zookeeper requests ?

}

/**
* reinitialize method to use in unit tests
*/
private[zookeeper] def reinitialize(): Unit = {
private[zookeeper] def forceReinitialize(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a comment explaining how this differs from reinitialize?

Copy link
Contributor

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple of minor comments.

ZooDefs.Ids.OPEN_ACL_UNSAFE.asScala, CreateMode.PERSISTENT))
} finally {
latch.countDown()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can drop the braces from try and finally. Unless you think they make the code clearer?

}

reinitializeThread.start()
Thread.sleep(100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we explain why we need the sleep?

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji : Thanks for the updated patch. LGTM

@ijuma ijuma merged commit bce1079 into apache:trunk May 8, 2018
ijuma pushed a commit that referenced this pull request May 8, 2018
…oller deadlock (#4977)

Fixes a deadlock between the controller's beforeInitializingSession callback which holds the zookeeper client initialization lock while awaiting completion of an asynchronous event which itself depends on the same lock.

Also catch and log callback exceptions to ensure the ZooKeeper reconnection takes place.
Finally, configure KafkaScheduler in ZooKeeperClient to have at least 1 thread.

Added tests that fail or hang without the changes in this PR.

Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
@ijuma
Copy link
Contributor

ijuma commented May 8, 2018

Merged to trunk and 1.1.

ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…oller deadlock (apache#4977)

Fixes a deadlock between the controller's beforeInitializingSession callback which holds the zookeeper client initialization lock while awaiting completion of an asynchronous event which itself depends on the same lock.

Also catch and log callback exceptions to ensure the ZooKeeper reconnection takes place.
Finally, configure KafkaScheduler in ZooKeeperClient to have at least 1 thread.

Added tests that fail or hang without the changes in this PR.

Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants