-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix system.query_views_log for MVs that are pushed from background threads #46668
Conversation
The failure is real, this is because the query context was wrong, here is a fix - #46709 |
015991b
to
210f25b
Compare
And also
Seems that it is a known issue (#46624 (comment))
|
210f25b
to
b189a9b
Compare
b189a9b
to
f9faad7
Compare
I've just rebased. Can someone take a look please? |
Kind ping on this one, thanks. |
f442129
to
cfd4a1a
Compare
After #46709 the patch became cleaner, no more icky |
cfd4a1a
to
d84564d
Compare
…reads Some of such places: - push from Buffer - push from Distributed sends - system.*_log workers Before ClickHouse#47564 it simply does not work, but after it throws LOGICAL_ERROR in such situation. v2: remove expired() check after ClickHouse#46709 got merged v3: use ThreadGroupStatus ctor with ContextPtr (after ClickHouse#47564) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
d84564d
to
9235d1c
Compare
else | ||
running_group = std::make_shared<ThreadGroupStatus>(); | ||
if (!running_group) | ||
running_group = std::make_shared<ThreadGroupStatus>(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help with the question? When there is a current_thread->getThreadGroup and when there is no group in current thread?
One more question, in case when current_thread do not have TG, why is it right to make a TG for each view instead make one for all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help with the question? When there is a current_thread->getThreadGroup and when there is no group in current thread?
In case of background threads, i.e. Buffer flush/Distributed async INSERT
One more question, in case when current_thread do not have TG, why is it right to make a TG for each view insted make one for all?
Basically you are right here, it should be one ThreadGroupStatus, but for this particular place it should not make any difference, and making one ThreadGroupStatus for all views will make the code more complex, since there is not only multiple views, but it is possible to have nested views, so this function can be called recursively, while moving ThreadGroupStatus into the caller will split the code and make it less cleaner, plus you will need to pass to the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. The is recursion call buildPushingToViewsChain. And that call appears 6 times in the code. Actually, as for me, it not so much to touch. And only one of them is trigger recursion: src/Interpreters/InterpreterInsertQuery.cpp:276.
In order to achieve one TG for all, we could either pass it through the function args buildPushingToViewsChain.
Or:
Move the block
ThreadGroupStatusPtr running_group;
if (current_thread)
running_group = current_thread->getThreadGroup();
if (!running_group)
running_group = std::make_shared<ThreadGroupStatus>(context);
out of the for loop. Then, when we had to make new TG, bind current thread with this new TG with ThreadGroupSwitcher for the scope of buildPushingToViewsChain. That makes following recursive calls to see the new TG. (Or rewrite it in other similat way: if no current_thread->getThreadGroup(), then create it and attach temporarily)
Actually, all this things about finding/creating/attaching TG to the thread should be done between (or inside) InterpreterInsertQuery::buildChainImpl and buildPushingToViewsChain. In that way it looks more simple, without recursion context in the mind.
How do you see this? I do not insist for such approach. You are solving specific problem and the way is OK.
However one TG for MVs allows to collect all the memory accounting and profiles counters in one place: that TG. In case if that had been done in scope if query, all that information would be binded to that query. Your solution already do that binding, that is cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing: no need to use ThreadGroupSwitcher as long as we bind new thread group to the thread without thread group. That binding could be permanent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, when we had to make new TG, bind current thread with this new TG with ThreadGroupSwitcher for the scope of buildPushingToViewsChain. That makes following recursive calls to see the new TG. (Or rewrite it in other similat way: if no current_thread->getThreadGroup(), the create it add attach temporarily)
Well, it will work, but this implicit usage looks not clean.
Actually, all this things about finding/creating/attaching TG to the thread should be done between (or inside) InterpreterInsertQuery::buildChainImpl and buildPushingToViewsChain. In that way it looks more simple, without recursion context in the mind.
The reason I kept it in buildPushingToViewsChain
, because it play low level games to manipulate the ThreadStatus
, and this thing cannot moved to the InterpreterInsertQuery
.
And this place will be moved to the InterpreterInsertQuery
, then there will be two different places that will do this low level stuff, at first I thought that this is a bad idea, but while writing this paragraph, I changed my mind, so I guess it is OK.
One thing: no need to use ThreadGroupSwitcher as long as we bind new thread group to the thread without thread group. That binding could be permanent.
Yes, make sense, missed the patches where MemoryTrackerThreadSwitcher
had been renamed to ThreadGroupSwitcher
.
But I found it odd that it is located in the MergeTree
code, so what I'm going to do is:
- move
ThreadGroupSwitcher
into a separate file - rename
ThreadGroupStatus
toThreadGroup
- fix this place with multiple TG attached here
But I think it is better to do this on top of this PR in a separate one (since this includes some refactoring and is not a bugfix anymore, and so if there will be intention to backport this patch this can be done easily), @CheSema objections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
ThreadGroupSwitcher deserve better place indeed.
However here is it has no use since we create and bind designated TG only if there is no query scope TG. That binding could be permanent without ThreadGroupSwitcher. The other case: if there is a query scope TG, we attach it to all MV's thread statuses, in other words we reuse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix system.query_views_log for MVs that are pushed from background threads (
Buffer
,Distributed
,system.*_log
)Some of such places:
Before #47564 it simply does not work, but after it throws LOGICAL_ERROR
in such situation.
v2: remove expired() check after #46709 got merged
v3: use ThreadGroupStatus ctor with ContextPtr (after #47564)
Cc: @Algunenano
Fixes: #47311