Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4275 Slowness in sasl login or subject.doAs() causes zk client to falsely assume that the server did not respond, closes connection and goes to unnecessary retries #1684

Open
wants to merge 1 commit into
base: branch-3.5
Choose a base branch
from

Conversation

rvaleti
Copy link
Contributor

@rvaleti rvaleti commented Apr 14, 2021

No description provided.

…ent to falsely assume that the server did not respond, closes connection and goes to unnecessary retries
@rvaleti
Copy link
Contributor Author

rvaleti commented Apr 14, 2021

The test is passing on my machine. Not sure if it's a flapper?

$ mvn clean test -Dtest=ReconfigTest -pl zookeeper-server
[INFO] Scanning for projects...
[INFO]
[INFO] -------------------< org.apache.zookeeper:zookeeper >-------------------
[INFO] Building Apache ZooKeeper - Server 3.5.10-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ zookeeper ---
[INFO] Deleting ~/zookeeper_github_forked/zookeeper-server/target
[INFO]
[INFO] --- git-commit-id-plugin:4.0.4:revision (find-current-git-revision) @ zookeeper ---
[INFO]
[INFO] --- maven-antrun-plugin:1.8:run (default) @ zookeeper ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] --- build-helper-maven-plugin:3.0.0:timestamp-property (tbuild-time) @ zookeeper ---
[INFO]
[INFO] --- properties-maven-plugin:1.0.0:read-project-properties (default) @ zookeeper ---
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (pre-compile-vergen) @ zookeeper ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 2 source files to ~/zookeeper_github_forked/zookeeper-server/target/classes
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (generate-version-info) @ zookeeper ---
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) @ zookeeper ---
[INFO]
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ zookeeper ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory ~/zookeeper_github_forked/zookeeper-serversrc/main/java/resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ zookeeper ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 279 source files to ~/zookeeper_github_forked/zookeeper-server/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.7:testResources (default-testResources) @ zookeeper ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 34 resources
[INFO] Copying 3 resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ zookeeper ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 269 source files to ~/zookeeper_github_forked/zookeeper-server/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.22.1:test (default-test) @ zookeeper ---
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.zookeeper.test.ReconfigTest
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 285.152 s - in org.apache.zookeeper.test.ReconfigTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:05 min
[INFO] Finished at: 2021-04-14T11:50:57+05:30
[INFO] ------------------------------------------------------------------------

@symat
Copy link
Contributor

symat commented Apr 14, 2021

yeah, seems flaky... I retriggered the test and now everything seems green

Copy link
Contributor

@symat symat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, really nice catch, thank you!

I checked the code path, you are right and this is a good fix when the kerberos authentication is slow.

I wonder if it would be possible to do a unit test for this somehow. Can you take a look? If it is not easy (timing related unit tests can be ugly), then I don't think it would be absolutely necessary.

Also please submit a separate PR to the master branch, where the same issue is also present. We fix bugs first on the master branch, then usually cherry-pick the fixes to other active branches. Thank you!

@rvaleti
Copy link
Contributor Author

rvaleti commented Apr 14, 2021

Hi @symat

I raised a PR for master branch as suggested.

I did spend a lot of time on trying to add a test for this. Inducing a sleep in to ClientCnxn#SendThread#startConnect() or ZookeeperSaslClient was not feasible. We have to do a lot of refactoring in order to get there.

  • expose ZookeeperSaslClient so that it can be mocked/extended
  • or make sendThread non-final so it can be overridden from test and then induce a pause.

Please let me know if there's an easy way out.

@nkalmar
Copy link
Contributor

nkalmar commented Apr 14, 2021

This should be a clean cherry-pick to 3.5 from master, so I think this PR can be closed. But since all green it can as well stay and merged from here.

@symat
Copy link
Contributor

symat commented Apr 14, 2021

Thanks @rvaleti !

I don't see an easy way for unit test either, and I wouldn't make a major refactoring in the production code just for this test... I think we can skip the unit test this case.

Thanks for the PR on the master branch, I agree with @nkalmar that you can close this PR (on branch-3.5), as we will simply cherry-pick the master commit to the other branches. Anyway, it was good to get also a green CI run on branch-3.5, that never hurts :)

@maoling
Copy link
Member

maoling commented May 11, 2021

  • @rvaleti Actually, I also found this issue at Aug 2019 PR-1065 (I also have a analysis here, maybe it's still helpful )
  • @symat @nkalmar Since master now has this fix, what are you going to do next? merge this PR again 3.5 or cherry-pick directly from master?

@symat
Copy link
Contributor

symat commented May 12, 2021

hi @maoling !

I think we already pushed this fix to all active branches. E.g. it is present on 3.5, see: https://github.com/apache/zookeeper/commits/branch-3.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants