Overview
Spark Connect is a new initiative in Apache Spark that adds a decoupled client-server infrastructure, allowing Spark applications to connect remotely to a Spark server and run SQL and DataFrame operations. We want to develop what we're calling "Delta Connect" so that applications running in this client-server mode can also perform Delta operations.
Further details
These are the critical user journeys (CUJs) we would like to support:
Server
The server is packaged as io.delta:delta-spark-connect-server_2.13. Installing this package automatically installs the io.delta:delta-spark-connect-common_2.13 package.
Scala Client
The client is packaged as io.delta:delta-spark-connect-client_2.13. Installing this package also automatically installs the io.delta:delta-spark-connect-common_2.13 package.
The delta-spark-connect-client_2.13 package uses exactly the same class and package names as the delta-spark_2.13 package, so existing code can be used unchanged.
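To make the packaging above concrete, here is a minimal sketch that assembles the Maven coordinates for the server and Scala client artifacts. The group and artifact names come from this issue; the helper name `delta_connect_coordinate` and the version string passed to it are hypothetical, since no release version is stated here.

```python
# Sketch only: builds "group:artifact_scalaVersion:version" strings for the
# Delta Connect artifacts named in this issue. The version is an assumption
# supplied by the caller; this issue does not name a release.

GROUP = "io.delta"
ROLES = ("server", "client")


def delta_connect_coordinate(role: str, version: str, scala: str = "2.13") -> str:
    """Return the Maven coordinate for the given Delta Connect role."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {role!r}")
    # The common package is omitted on purpose: per the description above,
    # installing the server or client package pulls it in automatically.
    return f"{GROUP}:delta-spark-connect-{role}_{scala}:{version}"


if __name__ == "__main__":
    # "0.0.0" is a placeholder, not a real release number.
    print(delta_connect_coordinate("server", "0.0.0"))
    print(delta_connect_coordinate("client", "0.0.0"))
```

A string of this shape is what one would typically hand to a dependency mechanism such as Spark's `spark.jars.packages` setting, though the exact installation steps are not specified in this issue.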
Python Client
Usage is the same as with classic Delta Spark. We just need to pass a remote SparkSession (instead of a local one) to the DeltaTable API.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

# Connect to a Spark Connect server instead of creating a local session.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# From here on, the DeltaTable API is used exactly as in classic mode.
deltaTable = DeltaTable.forName(spark, "my_table")
deltaTable.update(
    condition=expr("id % 2 == 0"),
    set={"id": expr("id + 100")},
)
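Because only the session changes between classic and Connect mode, the same table logic can be written once and handed either kind of session. The following is a rough sketch under that assumption; the helper names `connect_url`, `get_session`, and `bump_even_ids` are hypothetical and not part of Delta or PySpark. The PySpark and Delta imports are deferred into the functions so the module can be loaded without a Spark installation.

```python
from typing import Optional


def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL of the form used in the example above."""
    return f"sc://{host}:{port}"


def get_session(remote: Optional[str] = None):
    """Return a remote session if a URL is given, else a local one.

    Assumption: a local session would also need the usual Delta session
    configuration, which is omitted here for brevity.
    """
    from pyspark.sql import SparkSession  # deferred import

    builder = SparkSession.builder
    if remote is not None:
        builder = builder.remote(remote)
    return builder.getOrCreate()


def bump_even_ids(spark, table_name: str) -> None:
    """Add 100 to every even id; identical for local and remote sessions."""
    from delta.tables import DeltaTable  # deferred import
    from pyspark.sql.functions import expr

    DeltaTable.forName(spark, table_name).update(
        condition=expr("id % 2 == 0"),
        set={"id": expr("id + 100")},
    )


if __name__ == "__main__":
    # Requires a running Spark Connect server and an existing "my_table".
    bump_even_ids(get_session(connect_url()), "my_table")
```

Keeping session creation separate from the table logic means switching between the two modes is a one-argument change, which is the property this proposal aims to preserve.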
The Delta Connect Python client is included in the same PyPI package as Delta Spark.