-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove query status report from BE when query is cancelled normally #1489
Remove query status report from BE when query is cancelled normally #1489
Conversation
When query result reach limit, the Coordinator in FE will send a cancel request to BE to cancel the query. And when being cancelled, BE will report query status to FE for debug purpose. But actually it is not necessary and will generate too many logs. So I add a CancelReason to distinguish the difference between 'normally' cancellation and 'internal error' cancellation. if 'normally' cancelled, no status will be reported from BE. When query reach limit, or user cancel it actively, it is being cancelled 'normally'. Otherwise, the query is cancelled due to internal error, which will need a report from BE.
gensrc/proto/internal_service.proto
Outdated
@@ -116,6 +116,12 @@ message PExecPlanFragmentResult { | |||
|
|||
message PCancelPlanFragmentRequest { | |||
required PUniqueId finst_id = 1; | |||
enum CancelReason { | |||
LIMIT_REACH = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserve 0, start from 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
gensrc/proto/internal_service.proto
Outdated
@@ -116,6 +116,12 @@ message PExecPlanFragmentResult { | |||
|
|||
message PCancelPlanFragmentRequest { | |||
required PUniqueId finst_id = 1; | |||
enum CancelReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to move CancelReason out of PCancelPlanFragmentRequest
. Because fragment can be cancelled in other format. For example when Fragment is timed out, cancel thread will cancel this instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
// The reason of cancellation of this query. | ||
// This is used for telling BE whether it need to report query status when being cancelled. | ||
public enum CancelReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you have defined in protobuf, we should use it to avoid define one thing in different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
be/src/runtime/fragment_mgr.cpp
Outdated
std::lock_guard<std::mutex> l(_status_lock); | ||
RETURN_IF_ERROR(_exec_status); | ||
if (reason != PCancelPlanFragmentRequest::INTERNAL_ERROR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use equal other than not equal here. Because when we add more types of error later, we should let it report in default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -111,7 +112,7 @@ private synchronized PBackendService getProxy(TNetworkAddress address) { | |||
} | |||
|
|||
public Future<PCancelPlanFragmentResult> cancelPlanFragmentAsync( | |||
TNetworkAddress address, TUniqueId finstId) throws RpcException { | |||
TNetworkAddress address, TUniqueId finstId, CancelReason cancelReason) throws RpcException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cancelReason is not used?
for (BackendExecState backendExecState : backendExecStates) { | ||
TNetworkAddress address = backendExecState.getBackendAddress(); | ||
LOG.info("cancelRemoteFragments initiated={} done={} hasCanceled={} ip={} port={} fragment instance id={}", | ||
LOG.info("cancelRemoteFragments initiated={} done={} hasCanceled={} ip={} port={} fragment instance id={}, reason: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this log can be debug level
gensrc/proto/internal_service.proto
Outdated
@@ -114,8 +114,16 @@ message PExecPlanFragmentResult { | |||
required PStatus status = 1; | |||
}; | |||
|
|||
enum PCancelReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PFragmentCancelReason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…pache#1489) When query result reach limit, the Coordinator in FE will send a cancel request to BE to cancel the query. And when being cancelled, BE will report query status to FE for debug purpose. But actually it is not necessary and will generate too many logs. So I add a CancelReason to distinguish the difference between 'normally' cancellation and 'internal error' cancellation. if 'normally' cancelled, no status will be reported from BE. When query reach limit, or user cancel it actively, it is being cancelled 'normally'. Otherwise, the query is cancelled due to internal error, which will need a report from BE.
…ster (apache#1489) When there are pure compute plan nodes, such as join nodes, in a plan, the original impl. may assign a BE to the node which does not belong to the current cloud cluster. It leads to incomplete isolation of clusters.
…ster (apache#1489) When there are pure compute plan nodes, such as join nodes, in a plan, the original impl. may assign a BE to the node which does not belong to the current cloud cluster. It leads to incomplete isolation of clusters.
When query result reach limit, the Coordinator in FE will send a cancel
request to BE to cancel the query. And when being cancelled, BE will report
query status to FE for debug purpose. But actually it is not necessary
and will generate too many logs.
So I add a CancelReason to distinguish the difference between 'normally'
cancellation and 'internal error' cancellation. if 'normally' cancelled,
no status will be reported to FE.
When query reach limit, or user cancel it actively, it is being cancelled 'normally'.
Otherwise, the query is cancelled due to internal error, which will need
a report from BE.