TAJO-1469 allocateQueryMaster can leak resources if it times out (3 sec, hardcoded) #480
navis wants to merge 1 commit into apache:master from
I think this line is unnecessary if you change the immediately preceding line as follows:
LOG.warn("Got exception while allocating QueryMaster: " + t, t);
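The suggestion works because passing the `Throwable` as the second argument lets the logger print the stack trace itself, so a separate line that dumps the trace becomes redundant. A minimal sketch of the same idea, assuming Tajo's `LOG` behaves like a commons-logging `warn(Object, Throwable)`; it uses `java.util.logging` so it stays self-contained, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class AllocationLogging {
    private static final Logger LOG = Logger.getLogger(AllocationLogging.class.getName());

    // Builds the warning message once and hands the Throwable to the logger,
    // which then prints the stack trace; no extra "print the trace" line is needed.
    static String warnAllocationFailure(Throwable t) {
        String msg = "Got exception while allocating QueryMaster: " + t;
        LOG.log(Level.WARNING, msg, t);
        return msg;
    }

    public static void main(String[] args) {
        warnAllocationFailure(new TimeoutException("no response within 3 sec"));
    }
}
```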
Reaching the catch block with `response != null` means the response arrived just after the timeout. I don't think that case should be logged, since we did get the expected result, just not in time.
I investigated the locking in CallFuture. Its run() and get() should be synchronized with each other. The current code looks as if this were implemented, but it is not. If the following situation occurs, some resources or tasks will be lost forever.
If I'm wrong, please let me know.
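A minimal sketch of the locking described above, assuming a one-shot callback: `run()` (called by the RPC thread when the response arrives) and `get()` (called by the waiting caller) share one monitor, and a caller that gives up marks the callback cancelled under that monitor. A response that arrives afterwards is then handed to a release hook instead of being silently dropped. The class name follows the reviewer's naming suggestion, but the class, its fields, and the release hook are hypothetical, not Tajo's actual `CallFuture`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch, not Tajo's CallFuture.
public class CancelableRpcCallback<T> {
    private final Consumer<T> releaseLateResponse; // e.g. give allocated resources back
    private T response;
    private boolean done;      // a response has been delivered to the waiting caller
    private boolean cancelled; // the caller gave up (timeout or interrupt)

    public CancelableRpcCallback(Consumer<T> releaseLateResponse) {
        this.releaseLateResponse = releaseLateResponse;
    }

    /** Invoked by the RPC layer when the response arrives. */
    public void run(T value) {
        synchronized (this) {
            if (!cancelled) {
                response = value;
                done = true;
                notifyAll();
                return;
            }
        }
        // The caller already gave up, so nobody will consume this value:
        // release it outside the lock so a slow release cannot block get().
        releaseLateResponse.accept(value);
    }

    /** Waits up to the timeout; if no response came, later run() calls release it. */
    public synchronized T get(long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (!done) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("no response within " + timeout + " " + unit);
                }
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            }
            return response;
        } finally {
            if (!done) {
                cancelled = true; // covers both timeout and interruption
            }
        }
    }
}
```

Because the `cancelled` flag and the `done` flag are only read and written under the same monitor, there is no window in which a response can be stored after the caller has stopped waiting but before it is marked cancelled.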
I think SimpleExchange is not a proper name for this class. How about "CancelableRpcCallback"?
If this patch is intended as a temporary solution, it looks good to me.
Yes. If resource allocation is done over the network, fixing this will need more serious work. If the problem is fully acknowledged with this simple, in-process fix, I intend to fix that case, too.
Thanks @navis! This is a critical problem that should be fixed.
Addressed the comments and added "tajo.qm.resource.allocation.timeout" to configure the timeout.
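A minimal sketch of how such a setting might be read with a fallback to the previously hardcoded 3 seconds. Only the key name comes from this PR; the millisecond unit, the use of `java.util.Properties` in place of Tajo's configuration class, and the class name are assumptions for illustration.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; Tajo's real configuration lookup may differ.
public class QmAllocationTimeout {
    // Key name taken from the PR; the millisecond unit is an assumption.
    static final String KEY = "tajo.qm.resource.allocation.timeout";
    // Previously hardcoded value (3 sec), used when the key is absent or blank.
    static final long DEFAULT_MS = TimeUnit.SECONDS.toMillis(3);

    static long timeoutMillis(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MS;
        }
        long value = Long.parseLong(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + value);
        }
        return value;
    }
}
```

The returned value would then bound whatever timed wait guards the QueryMaster allocation call; rejecting non-positive values keeps a misconfiguration from turning into an immediate timeout.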
@jihoonson Would you mind reviewing TAJO-1385 first? It seems to overlap with this in some parts.
OK. I'll review it today.
+1
Tested with a hack to reproduce the timeout: the first two queries failed with exceptions, but all following queries succeeded.