-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] Support for Spark Connect #1570
Comments
This would be a great addition. Unfortunately we probably have to wait for Spark 3.4 release to make any substantial implementation work. Nonetheless I would love see a bit more design details, especially about how to make DeltaTable APIs work with this (since DeltaTable APIs hook on to logical/physical plans differently from SQL commands). Maybe we need to refactor some stuff internally to make it work? |
@tdas Here's a short sketch of what the implementation would look like: We start by creating Protobuf messages for every operation on message DescribeHistory {
string table_name = 1;
} Next we'll introduce from pyspark.sql.connect import LogicalPlan, SparkConnectClient
class DescribeHistory(LogicalPlan):
def __init__(self, tableName: str):
self._tableName = tableName
@override
def plan(self, client: SparkConnectClient) -> proto.Relation:
describe = proto.DescribeHistory()
describe.table_name = self._tableName
relation = proto.Relation()
relation.extension.Pack(describe)
return relation The client can then create from pyspark.sql.connect import DataFrame, SparkSession
class DeltaTable(object):
def __init__(self, spark: SparkSession, tableName: str):
self._spark = spark
self._tableName = tableName
def history(self) -> DataFrame:
return DataFrame.withPlan(DescribeHistory(self._tableName), session=self._spark)
@classmethod
def forName(cls, spark: SparkSession, tableName: str) -> "DeltaTable":
return DeltaTable(spark, tableName) Calling import io.delta.tables.DeltaTable
import org.apache.spark.sql.connect.planner.SparkConnectPlanner
import org.apache.spark.sql.connect.plugin.RelationPlugin
class DeltaRelationPlugin extends RelationPlugin {
override def transform(relation: protobuf.Any, planner: SparkConnectPlanner): Option[LogicalPlan] = {
if (!relation.is(classOf[proto.DescribeHistory])) {
return None
}
val history = relation.unpack(classOf[proto.DescribeHistory])
val deltaTable = DeltaTable.forName(planner.session, history.getTableName)
Some(deltaTable.history().queryExecution.analyzed)
}
} |
<!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> ## Description Add a documentation page for the [Delta Connect](#1570), in Delta 4.0 preview. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? N/A <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> --------- Co-authored-by: Allison Portis <allison.portis@databricks.com>
<!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> ## Description Add a documentation page for the [Delta Connect](delta-io#1570), in Delta 4.0 preview. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? N/A <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> --------- Co-authored-by: Allison Portis <allison.portis@databricks.com> (cherry picked from commit 4fac1f1)
<!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> ## Description Add a documentation page for the [Delta Connect](delta-io#1570), in Delta 4.0 preview. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? N/A <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> --------- Co-authored-by: Allison Portis <allison.portis@databricks.com> (cherry picked from commit 4fac1f1)
Feature request
Overview
The Spark community is adding a new interface to Spark 3.4 that is called Spark Connect. This new interface promises several benefits by separating user code from Spark by adding a gRPC layer between the user code and the driver. We should add a new implementation of the
DeltaTable
interface that is compatible with Spark Connect.Motivation
Delta Connect is expected to bring the same benefits as Spark Connect:
Further details
We can add support for Delta to Spark Connect by implementing the extension points that it provides. We can use the
extension
field in theRelation
andCommand
messages to add Delta specific relations such asDescribeHistory
and commands such asVacuum
respectively. On the server-side we can implement theRelationPlugin
andCommandPlugin
to translate these Protobuf messages toLogicalPlan
nodes in Spark.Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
The text was updated successfully, but these errors were encountered: