-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16568. dfsadmin -reconfig option to start/query reconfig on all live datanodes #4264
Conversation
🎊 +1 overall
This message was automatically generated. |
@tomscut @jojochuang could you please take a look? |
} | ||
}); | ||
} | ||
while ((successCount.get() + failCount.get()) < nodes.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature makes sense to me. It looks good overall.
Maybe it's better to use CountDownLatch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, countDownLatch would be better but since we are also keeping count of both success and failure, and using these counts to return final status (0 or 1), it is difficult to allocate count down number initially. For instance, if we use two latches - successCountDownLatch and failCountDownLatch, we can't know what initial count to initiate both latches with. Hence, if we still want to use CountDownLatch, we will have to use all 3 of them:
AtomicInteger successCount = new AtomicInteger();
AtomicInteger failCount = new AtomicInteger();
CountDownLatch countDownLatch = new CountDownLatch(nodes.length); // total count as nodes array length
Hence, rather than using additional CountDownLatch, maybe we are good with only two AtomicIntegers? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @virajjasani for your explanation. Sounds good.
err.println("Only datanode type supports reconfiguration in bulk."); | ||
return 1; | ||
} | ||
ExecutorService executorService = Executors.newFixedThreadPool(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of threads is fixed at 5. Can we pass it in as an argument on the command line? Just a little suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of doing the same initially, but then I realized that allowing users to provide threadpool size might be bit complicated for admin/user (i.e. bit more headache for user) as they would have to learn about where this size is going to be used internally, hence I kept it as 5 as it seemed reasonable for even bigger clusters. Does this sound good?
If you have strong preference, will be happy to do it :)
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Hi @virajjasani , please take a look at the failed unit test. Thanks. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Taken care of in the latest commit, thanks @tomscut |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The failed unit tests seem unrelated to the change. +1 from my side. |
💔 -1 overall
This message was automatically generated. |
Hi @cndaimin , could you please help take a look at this failed unit test |
Hi @jojochuang , could you please also take a look at this. Thank you. |
Yes, it's my mistake, I will submit a PR to fix it immediately. Sorry for that. |
@cndaimin fixed the failed unit test on HDFS-16573. I retriggered the jenkins. |
🎊 +1 overall
This message was automatically generated. |
Thanks @virajjasani for your contribution! |
…live datanodes (apache#4264) Signed-off-by: Tao Li <tomscut@apache.org>
…live datanodes (apache#4264) Signed-off-by: Tao Li <tomscut@apache.org>
Description of PR
DFSAdmin provides option to initiate or query the status of reconfiguration operation on only specific host based on host:port provided by user. It would be good to provide an ability to initiate such operations in bulk, on all live datanodes.
How was this patch tested?
Dev cluster and UT.
Sample outputs:
For code changes: