Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41102][CONNECT] Merge SparkConnectPlanner and SparkConnectCommandPlanner #38604

Closed
wants to merge 1 commit into from

Conversation

amaliujia
Copy link
Contributor

What changes were proposed in this pull request?

In the past, Connect server side separates Command and Relation into two Planners. However, as we are adding new API, there are certainly cases that a Command still has an input which is a Relation. Thus when converting Command, it still needs to access the logic of converting Relation. View creation is an example of such cases. Usually DDL and DML of SQL will also follow.

This PR refactors to merge the logic of dealing with Command and Relation into the same planner.

Why are the changes needed?

Refactoring.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Existing UT

@amaliujia
Copy link
Contributor Author

R: @cloud-fan

@amaliujia amaliujia changed the title [SPARK-41102][CONNECT][REFACTORING] Merge SparkConnectPlanner and SparkConnectCommandPlanner. [SPARK-41102][CONNECT][REFACTORING] Merge SparkConnectPlanner and SparkConnectCommandPlanner Nov 10, 2022
@@ -39,14 +46,17 @@ final case class InvalidPlanInput(
private val cause: Throwable = None.orNull)
extends Exception(message, cause)

class SparkConnectPlanner(plan: proto.Relation, session: SparkSession) {
final case class InvalidCommandInput(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but today we prefer to use error classes to differentiate different errors, instead of using different java exception classes.

Copy link
Contributor Author

@amaliujia amaliujia Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I should adopt and standardize errors for Connect soon.

@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(responseObserver: StreamObserver[Response]) exte

def handlePlan(session: SparkSession, request: proto.Request): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for curiosity: why do we need to differentiate command and query requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because we have divided it at the proto level: in proto, query and command are different.

The primary reason for dividing is, query runs and return rows. Command apply a state change without return anything (except exceptions probably).

Maybe there is better way to mode all of these. For example, will Catalog API fit into this model? I am still thinking about it but do not have a clear answer now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commands can return rows, for example, SHOW TABLES. Even for DML commands, we may also return rows in the future, to include metrics (num rows inserted, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yeah probably we can refactor proto to call all as a Plan....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan

From what I can tell in the near term we probably won't touch Command proto again. How about let us wait for some time to see if there is any new need that actually needs a separate Command proto. If not we can deprecate it and merge with Relation proto?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@HyukjinKwon HyukjinKwon changed the title [SPARK-41102][CONNECT][REFACTORING] Merge SparkConnectPlanner and SparkConnectCommandPlanner [SPARK-41102][CONNECT] Merge SparkConnectPlanner and SparkConnectCommandPlanner Nov 11, 2022
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in fd1e0d0 Nov 11, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…andPlanner

### What changes were proposed in this pull request?

In the past, Connect server side separates `Command` and `Relation` into two Planners. However, as we are adding new API, there are certainly cases that a `Command` still has an input which is a Relation. Thus when converting `Command`, it still needs to access the logic of converting `Relation`. View creation is an example of such cases. Usually DDL and DML of SQL will also follow.

This PR refactors to merge the logic of dealing with `Command` and `Relation` into the same planner.

### Why are the changes needed?

Refactoring.

### Does this PR introduce _any_ user-facing change?

NO
### How was this patch tested?

Existing UT

Closes apache#38604 from amaliujia/refactor-planners.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants