Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-2894: Memory and completions leak on zookeeper_close #1000

Closed
wants to merge 1 commit into from

Conversation

@xoiss
Copy link
Contributor

xoiss commented Jun 23, 2019

If I call zookeeper_close() when the call-stack does not pass through any of zookeeper mechanics (for example, some of completion handler called back by zookeeper) the following happens:

  1. If there are requests waiting for responses in zh.sent_requests queue, they all are removed from this queue and each of them is "completed" with personal fake response having status ZCLOSING. Such fake responses are put into zh.completions_to_process queue. It's Ok
  2. But then, zh.completions_to_process queue is left unhandled. Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed
  3. Different structures within zh are dismissed and finally zh is freed

Consequently we have a tiny memory leak and, which is much more important, completions leak. I.e. completions are not called at all.

This is not the case if I call zookeeper_close() when the call-stack does pass through some of zookeeper mechanics. Here, zookeeper client handles completions properly.

Proposed changes remove this defects.

For more details see: https://issues.apache.org/jira/browse/ZOOKEEPER-2894

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jun 23, 2019

Hello!

This is a reworked/updated/improved PR.

The previous one: #363

@eolivelli

This comment has been minimized.

Copy link
Contributor

eolivelli commented Jun 23, 2019

@xoiss do you have time to try to add some minimal test case? This way we won't have regressions in the future

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jun 23, 2019

@xoiss do you have time to try to add some minimal test case? This way we won't have regressions in the future

I have described the test-case in https://issues.apache.org/jira/browse/ZOOKEEPER-2894
Here it is:

To reproduce the case, do the following:

  • open ZooKeeper session, connect to a server, receive and process connected-watch, etc.
  • then somewhere from the main events loop call for example zoo_acreate() with valid arguments – it shall return ZOK
  • then, immediately after it returned, call zookeeper_close()
  • note that completion callback handler for zoo_acreate() will not be called

But I did not try to implement it as a unit-test.

Actually I don't have an idea how to perform all the preconditions, because the Server is also involved in the setup.

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jun 23, 2019

// if you can help me a bit, it would be great
// the best is to put me on some existing UT that I could copy-modify-commit

@eolivelli

This comment has been minimized.

Copy link
Contributor

eolivelli commented Jun 23, 2019

You can check https://github.com/apache/zookeeper/blob/master/zookeeper-client/zookeeper-client-c/tests/TestClient.cc

Honestly I tried to a new test case about the C client but without success and I gave up.
But I am not a C programmer.
Hopefully other committers in the community will help you better.

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jun 23, 2019

... ok, give me some time

// until I'm ready, if you wish, you may check another PR from me: #999

@enixon

This comment has been minimized.

Copy link

enixon commented Jun 24, 2019

This is awesome, thanks for the contribution!

@hanm

This comment has been minimized.

Copy link
Contributor

hanm commented Jun 28, 2019

Actually I don't have an idea how to perform all the preconditions, because the Server is also involved in the setup.

Cpp test starts a server by executing tests/zkServer.sh in command shell - search startServer in c code base would give a fair good amount of example code. If you need parameterize the server start up (e.g. pass addition JVM properties), you can do it by extending the zkServer.sh script.

@xoiss xoiss force-pushed the xoiss:ZOOKEEPER-2894 branch from defa39f to ae2b69c Jul 2, 2019
@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 2, 2019

try to add some minimal test case

Done!

I have provided two tests:

  • testCloseWhileInProgressFromMain -- this one fails when run on the code BEFORE the patch. This is the subcase described in the Ticket. And on the patched code, it passes successfully
  • testCloseWhileInProgressFromCompletion -- there are no problems with this subcase both before and after the patch. This is because close() is called not from the main loop here, but from a completion handler which in turn was called by ZK Client internals

Cpp test starts a server by executing tests/zkServer.sh in command shell

Yeah, I can see this at the very beginning of the TestDriver.cc: main()... But, it looks like those tests verifying requests treatment inside ZK Client internals actually use ZK Server Emulator (derived from the Socket Mock class). So, I have used this scheme in new tests from me.

Thanx!

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 2, 2019

rebased on 4a5d596
ZOOKEEPER-3389: Zookeeper does not export all required packages in OSGi (for Curator)

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 2, 2019

@phunt @anmolnar @eolivelli @hanm

Hello!

What happened with Jenkins on H2 (Hadoop) ?
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/4057/consoleFull

... ... ...
Buildfile: /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml

check-for-findbugs:

findbugs.check:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1868: 'findbugs.home' is not defined. Please pass -Dfindbugs.home=<base of Findbugs installation>    to Ant on the command-line.
... ... ...

The previous build (before rebase on 4a5d596) performed on H18 (Hadoop xenial) was successful:
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/4055/consoleFull

How to retry the build?

@xoiss xoiss force-pushed the xoiss:ZOOKEEPER-2894 branch from 6bbb5b7 to e7e1062 Jul 2, 2019
@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 2, 2019

// just change the commit-id to rerun precommit checks

@xoiss xoiss force-pushed the xoiss:ZOOKEEPER-2894 branch from e7e1062 to 9e29de6 Jul 2, 2019
@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 2, 2019

// ... and again one more time -- change the commit-id to rerun precommit checks
// // because Travis did not started on the previous commit for some reason...

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 3, 2019

@phunt @anmolnar @eolivelli @hanm

Hello!
Can we move this further?

@xoiss

This comment has been minimized.

Copy link
Contributor Author

xoiss commented Jul 5, 2019

Copy link
Contributor

eolivelli left a comment

Awesome work.

I am convinced that your fix does the right thing. I am still not an expert of the C client.

Please any committer expert in the C client take a look.

@hanm
hanm approved these changes Jul 9, 2019
Copy link
Contributor

hanm left a comment

LGTM, and kudos to the nicely written test case!

asfgit pushed a commit that referenced this pull request Jul 9, 2019
If I call `zookeeper_close()` when the call-stack does not pass through any of zookeeper mechanics (for example, some of completion handler called back by zookeeper) the following happens:
1. If there are requests waiting for responses in `zh.sent_requests` queue, they all are removed from this queue and each of them is "completed" with personal fake response having status `ZCLOSING`. Such fake responses are put into `zh.completions_to_process` queue. It's Ok
2. But then, `zh.completions_to_process` queue is left unhandled. **Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed**
3. Different structures within `zh` are dismissed and finally `zh` is freed

Consequently we have a tiny **memory leak** and, which is much more important, **completions leak**. I.e. completions are not called at all.

This is not the case if I call `zookeeper_close()` when the call-stack does pass through some of zookeeper mechanics. Here, zookeeper client handles completions properly.

Proposed changes remove this defects.

For more details see: https://issues.apache.org/jira/browse/ZOOKEEPER-2894

Author: Alexander A. Strelets <streletsaa@gmail.com>

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Michael Han <hanm@apache.org>

Closes #1000 from xoiss/ZOOKEEPER-2894
@asfgit asfgit closed this in f9610cc Jul 9, 2019
asfgit pushed a commit that referenced this pull request Jul 9, 2019
If I call `zookeeper_close()` when the call-stack does not pass through any of zookeeper mechanics (for example, some of completion handler called back by zookeeper) the following happens:
1. If there are requests waiting for responses in `zh.sent_requests` queue, they all are removed from this queue and each of them is "completed" with personal fake response having status `ZCLOSING`. Such fake responses are put into `zh.completions_to_process` queue. It's Ok
2. But then, `zh.completions_to_process` queue is left unhandled. **Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed**
3. Different structures within `zh` are dismissed and finally `zh` is freed

Consequently we have a tiny **memory leak** and, which is much more important, **completions leak**. I.e. completions are not called at all.

This is not the case if I call `zookeeper_close()` when the call-stack does pass through some of zookeeper mechanics. Here, zookeeper client handles completions properly.

Proposed changes remove this defects.

For more details see: https://issues.apache.org/jira/browse/ZOOKEEPER-2894

Author: Alexander A. Strelets <streletsaa@gmail.com>

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Michael Han <hanm@apache.org>

Closes #1000 from xoiss/ZOOKEEPER-2894
@hanm

This comment has been minimized.

Copy link
Contributor

hanm commented Jul 9, 2019

merged to 3.4, cherry picked to 3.5 and master.

stickyhipp added a commit to stickyhipp/zookeeper that referenced this pull request Aug 2, 2019
If I call `zookeeper_close()` when the call-stack does not pass through any of zookeeper mechanics (for example, some of completion handler called back by zookeeper) the following happens:
1. If there are requests waiting for responses in `zh.sent_requests` queue, they all are removed from this queue and each of them is "completed" with personal fake response having status `ZCLOSING`. Such fake responses are put into `zh.completions_to_process` queue. It's Ok
2. But then, `zh.completions_to_process` queue is left unhandled. **Neither completion callbacks are called, nor dynamic memory allocated for fake responses is freed**
3. Different structures within `zh` are dismissed and finally `zh` is freed

Consequently we have a tiny **memory leak** and, which is much more important, **completions leak**. I.e. completions are not called at all.

This is not the case if I call `zookeeper_close()` when the call-stack does pass through some of zookeeper mechanics. Here, zookeeper client handles completions properly.

Proposed changes remove this defects.

For more details see: https://issues.apache.org/jira/browse/ZOOKEEPER-2894

Author: Alexander A. Strelets <streletsaa@gmail.com>

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Michael Han <hanm@apache.org>

Closes apache#1000 from xoiss/ZOOKEEPER-2894
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.