-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41102][CONNECT] Merge SparkConnectPlanner and SparkConnectCommandPlanner #38604
Conversation
R: @cloud-fan |
@@ -39,14 +46,17 @@ final case class InvalidPlanInput( | |||
private val cause: Throwable = None.orNull) | |||
extends Exception(message, cause) | |||
|
|||
class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) { | |||
final case class InvalidCommandInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but today we prefer to use error classes to differentiate different errors, instead of using different java exception classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I should adopt and standardize errors for Connect soon.
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(responseObserver: StreamObserver[Response]) exte | |||
|
|||
def handlePlan(session: SparkSession, request: proto.Request): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for curiosity: why do we need to differentiate command and query requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because we have divided it at the proto level: in proto, query and command are different.
The primary reason for dividing is, query runs and return rows. Command apply a state change without return anything (except exceptions probably).
Maybe there is better way to mode all of these. For example, will Catalog API fit into this model? I am still thinking about it but do not have a clear answer now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commands can return rows, for example, SHOW TABLES
. Even for DML commands, we may also return rows in the future, to include metrics (num rows inserted, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yeah probably we can refactor proto to call all as a Plan....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell in the near term we probably won't touch Command
proto again. How about let us wait for some time to see if there is any new need that actually needs a separate Command
proto. If not we can deprecate it and merge with Relation
proto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
…rkConnectCommandPlanner.
04ed26d
to
c307508
Compare
thanks, merging to master! |
…andPlanner ### What changes were proposed in this pull request? In the past, Connect server side separates `Command` and `Relation` into two Planners. However, as we are adding new API, there are certainly cases that a `Command` still has an input which is a Relation. Thus when converting `Command`, it still needs to access the logic of converting `Relation`. View creation is an example of such cases. Usually DDL and DML of SQL will also follow. This PR refactors to merge the logic of dealing with `Command` and `Relation` into the same planner. ### Why are the changes needed? Refactoring. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? Existing UT Closes apache#38604 from amaliujia/refactor-planners. Authored-by: Rui Wang <rui.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In the past, Connect server side separates
Command
andRelation
into two Planners. However, as we are adding new API, there are certainly cases that aCommand
still has an input which is a Relation. Thus when convertingCommand
, it still needs to access the logic of convertingRelation
. View creation is an example of such cases. Usually DDL and DML of SQL will also follow.This PR refactors to merge the logic of dealing with
Command
andRelation
into the same planner.Why are the changes needed?
Refactoring.
Does this PR introduce any user-facing change?
NO
How was this patch tested?
Existing UT