HBASE-24436 The store file open and close thread pool should be shared at the region level #1783
base: master
Conversation
private RegionCoprocessorHost coprocessorHost;
private TableDescriptor htableDescriptor = null;
private ThreadPoolExecutor storeFileOpenAndCloseThreadPool;
Why is this shared at the Region level? Shouldn't it be at the Store level (in HStore)?
What does sharing at the region level gain us? Are you attempting to evenly round-robin store open work over all opening stores in the region? Just sharing an executor at region level won't do this: if the underlying stores are skewed, the order in which runnables are submitted to the executor will share the skew.
If I understand the jira correctly, what you are trying to solve is the case below.
One region with, say, 2 stores, Store1 having many more files than the other. Say the config for the number of threads in the open pool is 10. Now it will create 2 pools, one per store, with 5 threads each. Store2 will get finished soon, but Store1 will take much longer. So if it were a shared pool of 10 threads, the overall time for opening both stores would have been smaller. Is my understanding correct?
> If I understand the jira correctly, what you are trying to solve is the case below. One region with, say, 2 stores, Store1 having many more files than the other. Say the config for the number of threads in the open pool is 10. Now it will create 2 pools, one per store, with 5 threads each. Store2 will get finished soon, but Store1 will take much longer. So if it were a shared pool of 10 threads, the overall time for opening both stores would have been smaller. Is my understanding correct?
Yes, just that case.
> Why is this shared at the Region level? Shouldn't it be at the Store level (in HStore)? What does sharing at the region level gain us? Are you attempting to evenly round-robin store open work over all opening stores in the region? Just sharing an executor at region level won't do this: if the underlying stores are skewed, the order in which runnables are submitted to the executor will share the skew.
If only one store file is too slow and that is the bottleneck of the whole region open or close process, it does not seem easy to fix that with a thread pool here (maybe hedged read?). As Anoop said above, the problem we want to solve is that some stores may have too many store files while others have far fewer. Ideally it will be quicker if all of the store files in one region just share the whole thread pool.
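To make the claimed speed-up concrete, here is a minimal sketch (pure illustration, not HBase code; the 18/2 file counts, 100 ms per-file cost, and class name are invented). With one shared 10-thread pool, the 20 opens drain in about ceil(20/10) = 2 rounds; with two fixed 5-thread pools, Store1's backlog alone needs ceil(18/5) = 4 rounds and dictates the total open time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedOpenPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService shared = Executors.newFixedThreadPool(10);
    Runnable openOneFile = () -> {
      try {
        TimeUnit.MILLISECONDS.sleep(100); // stand-in for one store file open
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    long start = System.nanoTime();
    for (int i = 0; i < 18; i++) shared.submit(openOneFile); // Store1: 18 files
    for (int i = 0; i < 2; i++) shared.submit(openOneFile);  // Store2: 2 files
    shared.shutdown();
    shared.awaitTermination(1, TimeUnit.MINUTES);
    System.out.printf("opened 20 files in ~%d ms%n",
        (System.nanoTime() - start) / 1_000_000); // ~200 ms shared vs ~400 ms split
  }
}
```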
storeFileOpenAndCloseThreadPool.shutdownNow();
This means keeping this 'storeFileOpenAndCloseThreadPool' active till the Region is closed. Previously we created the pool at region open time, and once the stores were opened the pool was shut down. Keeping it around longer this way does not look good.
> This means keeping this 'storeFileOpenAndCloseThreadPool' active till the Region is closed. Previously we created the pool at region open time, and once the stores were opened the pool was shut down. Keeping it around longer this way does not look good.
There is a timeout for this thread pool; after some time the threads will be destroyed.
The non-core threads will get removed after the idle timeout. But the way we create the pool, we pass the number of threads (as per the config) as the core thread count. But we set
boundedCachedThreadPool.allowCoreThreadTimeOut(true)
so this helps. I missed this part.
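For reference, the "bounded cached" pool pattern being discussed can be sketched as below; the class, method name, and parameters here are illustrative stand-ins, not the exact HBase utility signature.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class BoundedCachedPoolSketch {
  // Illustrative stand-in for the pool construction discussed above.
  static ThreadPoolExecutor getBoundedCachedThreadPool(int maxThreads,
      long keepAlive, TimeUnit unit) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads,               // core == max, so the pool is bounded
        keepAlive, unit,
        new LinkedBlockingQueue<Runnable>()); // excess tasks queue up
    // Without this, the maxThreads core threads would live until shutdown;
    // with it, even core threads are reclaimed after the idle timeout.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```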
completionService.submit(() -> {
  Thread t = Thread.currentThread();
  String name = t.getName();
  Thread.currentThread().setName(executingThreadNamePrefix + "-"
This is done to dynamically change the thread name as per the Store name?
Yes
completionService.submit(new Callable<Void>() {
  @Override
  public Void call() throws IOException {
    Thread t = Thread.currentThread();
All this looks a bit hacky.
> All this looks a bit hacky.

Changing the thread name is not strictly necessary, but it is useful when jstacking the process, since we will know more detail. Yeah, it's a bit hacky.
> All this looks a bit hacky.

What do you think here, Anoop? Just remove the name-related logic? Or set the name and not reset it?
I don't think we should do these hacks. The threads can be named with the regionName and some seqNo. Even if we do these hacks and add the cf name, it only goes to that level: from a jstack we still cannot know which file open it is stuck on (assuming we took the jstack during a stuck case). We can make sure we have a proper log which says "opening store file xxx for cf x", and that log will include the thread name. Later, with the jstack and this log, we can track down issues. Sounds ok?
> I don't think we should do these hacks. The threads can be named with the regionName and some seqNo. Even if we do these hacks and add the cf name, it only goes to that level: from a jstack we still cannot know which file open it is stuck on (assuming we took the jstack during a stuck case). We can make sure we have a proper log which says "opening store file xxx for cf x", and that log will include the thread name. Later, with the jstack and this log, we can track down issues. Sounds ok?
Yeah, definitely, let me update the patch soon.
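The direction agreed on above could look roughly like this (a sketch only; the class, method names, and name format are hypothetical, not the actual patch): a thread factory bakes the region name and a sequence number into each thread's name once, and a log line at each file open carries the thread name, so a jstack taken during a hang can be matched against the log to find the exact file.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public final class NamedOpenerSketch {
  // Stable thread names from the factory: <regionName>-StoreFileOpener-<seqNo>.
  static ThreadFactory openerThreadFactory(String regionName) {
    AtomicInteger seq = new AtomicInteger();
    return r -> new Thread(r,
        regionName + "-StoreFileOpener-" + seq.getAndIncrement());
  }

  // One log line per file open, carrying the thread name; in HBase this would
  // go through the usual logger rather than stdout.
  static void logOpen(String storeFile, String family) {
    System.out.printf("[%s] Opening store file %s for cf %s%n",
        Thread.currentThread().getName(), storeFile, family);
  }
}
```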
Good discussion.
Patch LGTM. One nit: any evidence this helps?
} finally {
  storeFileOpenerThreadPool.shutdownNow();
If there is nothing to clean up, remove the finally block?
> If there is nothing to clean up, remove the finally block?

Ok, let me update the patch in a moment.
Can't commit till the -1 from JIRA is raised.
Thanks, sir, I am preparing the data.
Force-pushed from ca3f80d to 4cf98cd.
Force-pushed from fc31124 to 3617561.