-
Notifications
You must be signed in to change notification settings - Fork 986
DRILL-6410: Memory leak in Parquet Reader during cancellation #1333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sachouche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good; thanks @vrozov for making the changes.
| private long totalPageValuesRead = 0; | ||
| private Object pageQueueSyncronize = new Object(); // Object to use to synchronize access to the page Queue. | ||
| // FindBugs complains if we synchronize on a Concurrent Queue | ||
| private final ExecutableTasksLatch<AsyncPageReaderTask> executableTasksLatch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ExecutableTasksLatch replaces the ExecutorService for concurrent task manipulation:
- Is this the recommended way moving forward?
- There is no javadoc for the ExecutableTasksLatch; you need to document this new class APIs and execution semantics. This is especially true if we want to encourage the use of this implementation pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExecutableTasksLatch encapsulates ExecutorService for parallel tasks that require a synchronization point at cancellation. For all other tasks ExecutorService can be used directly.
I'll add more documentation after the first round of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation would be useful for the reviewer too. It is easier to check the code if one knows what what guarantees ExecutableTasksLatch is supposed to provide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the class comment for AsyncPageReader needs to be updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- IMO, most of the time, the source code should document itself and java documentation is necessary for an API distributed as an already compiled library/jar.
- I would prefer reviewers to point to obscure code rather than to rely on the documentation.
- Waiting for a review comments helps to avoid inconsistency between the code and the documentation (quite a common problem) as usually code evolves faster and documentation lags behind.
- I already added documentation for
ExecutableTasksLatchandExecutableTask. - I'll change comments for
AsyncPageReaderduring rebase and merge conflict resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I'm going to have to insist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be more specific. Do you refer to AsyncPageReader doc? It was already changed during the merge conflict resolution.
|
@ilooner can you also take a look at it? |
|
@ilooner I'll rebase after review. Travis CI failure is not related to the change, it failed due to the build exceeding Travis CI limit. |
ilooner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vrozov You put me in a tough spot :). After our last discussion I was under the impression that the majority of us were in agreement to use the approach outlined in #1257 with the exception of having a custom ExecutorService which would be removed.
The advantages of that approach were that we could delegate the complexity to the java concurrency library and have minimal maintenance overhead. With your approach we are reinventing the wheel to create our own version of concurrency classes that are already there. Considering how the first changes to PartitionerTask took 4 - 5 days of back and forth to resolve all the race conditions, we might be creating a lot of extra work for ourselves down the road with your approach.
So again you put me in a tough spot because I don't agree with the solution, and based on the discussion previously the majority wanted to move in a different direction. On one hand I have to provide honest feedback in my reviews, but on the other hand I definitely don't want to be the guy blocking a bug fix.
To move this PR forward I suggest finding a committer that agrees with the approach you decided to take, and who is willing to help maintain the code moving forward. They can take the review to completion and help you get this merged :).
|
@ilooner I guess by "last discussion" you refer to the discussion between you, me and @sachouche, where "majority" does not mean the community majority. In the Apache, any contributor can provide a solution that (s)he considers to be the best solution possible and then it can either be accepted by the community/contributor or blocked with -1 (requires technical justification). If another contributor provides an alternative solution, a community may decide to go with the alternate solution as long as it addresses technical concerns of the initial contribution. For this particular case, my requirements are a) a unified approach (@parthchandra has the same requirement) and b) the ability to cancel tasks asynchronously. If that can be done with the approach outlined in PR #1257 and a contributor will change it to address all the issues, let's move forward with the alternate approach. A note regarding the complexity of the implementation. This implementation uses public java concurrency classes as well. It does not rely on unsupported or unsafe to use Java classes and/or API. Basically, To summarize, I am perfectly fine to go with an alternate solution or with another committer to review the PR, it will be against Apache way to force a committer to review or commit a change, that (s)he is not comfortable with. |
|
@parthchandra Please review |
| private long totalPageValuesRead = 0; | ||
| private Object pageQueueSyncronize = new Object(); // Object to use to synchronize access to the page Queue. | ||
| // FindBugs complains if we synchronize on a Concurrent Queue | ||
| private final ExecutableTasksLatch<AsyncPageReaderTask> executableTasksLatch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation would be useful for the reviewer too. It is easier to check the code if one knows what what guarantees ExecutableTasksLatch is supposed to provide.
| private long totalPageValuesRead = 0; | ||
| private Object pageQueueSyncronize = new Object(); // Object to use to synchronize access to the page Queue. | ||
| // FindBugs complains if we synchronize on a Concurrent Queue | ||
| private final ExecutableTasksLatch<AsyncPageReaderTask> executableTasksLatch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the class comment for AsyncPageReader needs to be updated too.
| */ | ||
| public ExecutableTask<C> execute(C callable) { | ||
| ExecutableTasksLatch.ExecutableTask<C> task = new ExecutableTasksLatch.ExecutableTask<>(callable, this); | ||
| executor.execute(task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call Executor.submit and save the Future? That saves you with all the checking you are doing in take() which is a simplified version of Future.get()? Note that the Future is likely to be faster and more tested than this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that Future would be faster, but it is definitely more tested. Please see my comment why Future and FutureTask can't be used.
| // Do nothing. | ||
| } | ||
| } | ||
| executableTasksLatch.await(() -> true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the original code doing the same thing that await is doing? TBH, I'd really like to understand where the memory leak was occurring. (Just trying to understand how this PR, does, in fact, fix the issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, ExecutableTasksLatch.await() guarantees that when it returns all tasks submitted for execution are either done or canceled. Future.cancel() does not wait for the FutureTask to be canceled as it merely interrupts the thread where FutureTask is running (in the case it is already running). Note that after Future is canceled it is not possible to check whether it is finished or not (Future.get throws CancellationException).
Missing guarantee that all tasks are finished when clear() returns, so some tasks continue to run and reference vectors, so allocator reports a memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. Makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that we could have extended FutureTask to provide this guarantee. Perhaps that is what @ilooner would have liked too.
See this : https://stackoverflow.com/questions/6040962/wait-for-cancel-on-futuretask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know and saw that post before the fix for the same problem was implemented only for PartitionerDecorator. I decided to go with a similar approach outlined in the post, but that does not use FutureTaskto support additional functionality that I outlined in my response to @ilooner. This implementation supports asynchronous cancellation (call to cancel() does not block waiting for a task to complete allowing a faster cancellation) that the solution that extends FutureTask does not provide.
| * Helper class to wrap {@linkplain Callable}{@literal <Void>} with cancellation and waiting for completion support | ||
| * | ||
| */ | ||
| public static final class ExecutableTask<C extends Callable<Void>> implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate what this class has that is not already available in FutureTask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comment for ExecutableTasksLatch.await().
|
@parthchandra rebased and resolved conflicts. Please take a look. |
parthchandra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I'd like to see a perf test on this before we merge into master.
|
@vrozov Dechang is running a perf test to confirm the question from @parthchandra. Once that is done, we should merge to the branch. Was there any other reason to close it? |
@parthchandra @sachouche @ilooner Please review