[SPARK-41628][CONNECT][SERVER] The Design for support async query execution #40649

Closed
Hisoka-X wants to merge 3 commits into apache:master from Hisoka-X:connect_async_query_design

Hisoka-X (Member) commented Apr 3, 2023

What changes were proposed in this pull request?

A design for supporting async query execution in Spark Connect.

Why are the changes needed?

To prepare for implementing async query execution.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Not needed; this PR only adds a design document.

Hisoka-X (Member, Author) commented Apr 3, 2023

@grundprinzip @zhenlineo @LuciferYang Could you take a look and let me know if you see any problems? Thanks.

LuciferYang (Contributor) commented

I suggest putting the design doc in Google Docs and starting a discussion on the dev mailing list so that more people can participate.

Additionally, Spark Connect is not limited to Scala clients, so Python clients should also be considered.

Meanwhile, there is still a lot of unfinished work on Spark Connect (needed to keep its behavior the same as the native Spark API), so I am not sure whether everyone has the energy to discuss this new feature at the moment.

Hisoka-X (Member, Author) commented Apr 4, 2023

Thanks for the suggestion. I will add a Python client design, move the doc to Google Docs, and then send an email to the dev list. Before starting on this feature, I will first work on other missing Spark Connect features.

hvanhovell (Contributor) commented

@Hisoka-X thanks for the write-up. We should be able to support most of this at the moment. gRPC supports this type of execution out of the box. The reason we did not go for this is API compatibility. The SparkResult does support incremental collect, though, and can collect results in the background.

What Martin was getting at in the ticket is more about what to do when a disconnect happens. You probably want to reconnect in these cases, which requires some architectural rework. We are discussing how we should do this, and there are quite a few trade-offs here. Do you mind shelving this until we can provide a bit more clarity? Please let me know if you want in on these conversations.

Hisoka-X (Member, Author) commented Apr 5, 2023

That works for me. I would be happy to join the discussion.

Hisoka-X closed this on Apr 26, 2023
Hisoka-X deleted the connect_async_query_design branch on April 27, 2023 11:04

3 participants