Commit 9b50cd2

Delta Universal Format (UniForm) allows you to read Delta tables with Iceberg clients.

## Description

UniForm takes advantage of the fact that both Delta and Iceberg consist of Parquet data files and a metadata layer. UniForm automatically generates Iceberg metadata asynchronously, allowing Iceberg clients to read Delta tables as if they were Iceberg tables. You can expect negligible Delta write overhead when UniForm is enabled, as the Iceberg conversion and transaction occur asynchronously after the Delta commit. A single copy of the data files serves clients of both formats.

This PR adds the implementation for Universal Format (Iceberg) as well as the IcebergCompatV1 protocol validation.

To create a table with UniForm:

```sql
CREATE TABLE T(c1 INT) USING DELTA TBLPROPERTIES(
  'delta.universalFormat.enabledFormats' = 'iceberg');
```

To enable UniForm on an existing table:

```sql
ALTER TABLE T SET TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.universalFormat.enabledFormats' = 'iceberg');
```

See the IcebergCompatV1 protocol specification PR here: #1869.

## How was this patch tested?

New UT `iceberg/src/test/scala/org/apache/spark/sql/delta/ConvertToIcebergSuite.scala`, as well as manual local publishing and integration testing with two Spark shells, one loaded with Delta, the other with Iceberg.

## Does this PR introduce _any_ user-facing changes?

Yes. A new optional Delta table property, `delta.universalFormat.enabledFormats`.

Closes #1870

GitOrigin-RevId: 8a4723680b12bb112190ee1f94a5eae9c4904a83
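As a rough illustration of the read path this enables (the catalog and database names below are hypothetical, not part of this PR), an Iceberg-aware client pointed at the generated Iceberg metadata could query the same data files directly:

```sql
-- Sketch only: `iceberg_catalog` and `db` are placeholder names for an
-- Iceberg catalog configured in the reader's Spark session. Once UniForm
-- has asynchronously generated Iceberg metadata for table T, an Iceberg
-- client reads it like any other Iceberg table:
SELECT * FROM iceberg_catalog.db.T;
```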
1 parent 27111ee commit 9b50cd2

24 files changed: +2622 −8 lines

build.sbt

+50 −7

```diff
@@ -272,24 +272,67 @@ lazy val storageS3DynamoDB = (project in file("storage-s3-dynamodb"))
   )
 )
 
+val icebergSparkRuntimeArtifactName = {
+  val (expMaj, expMin, _) = getMajorMinorPatch(sparkVersion)
+  s"iceberg-spark-runtime-$expMaj.$expMin"
+}
+
+// Build using: build/sbt clean icebergShaded/compile iceberg/compile
+// It will fail the first time, just re-run it.
 lazy val iceberg = (project in file("iceberg"))
   .dependsOn(spark % "compile->compile;test->test;provided->provided")
   .settings (
     name := "delta-iceberg",
     commonSettings,
     scalaStyleSettings,
     releaseSettings,
-    libraryDependencies ++= Seq( {
-      val (expMaj, expMin, _) = getMajorMinorPatch(sparkVersion)
-      ("org.apache.iceberg" % s"iceberg-spark-runtime-$expMaj.$expMin" % "1.3.0" % "provided")
-        .cross(CrossVersion.binary)
-    },
+    libraryDependencies ++= Seq(
       // Fix Iceberg's legacy java.lang.NoClassDefFoundError: scala/jdk/CollectionConverters$ error
       // due to legacy scala.
-      "org.scala-lang.modules" %% "scala-collection-compat" % "2.1.1"
-    )
+      "org.scala-lang.modules" %% "scala-collection-compat" % "2.1.1",
+      "org.apache.iceberg" %% icebergSparkRuntimeArtifactName % "1.3.0" % "provided",
+      "com.github.ben-manes.caffeine" % "caffeine" % "2.9.3"
+    ),
+    Compile / unmanagedJars += (icebergShaded / assembly).value,
+    // Generate the assembly JAR as the package JAR
+    Compile / packageBin := assembly.value,
+    assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar",
+    assembly / logLevel := Level.Info,
+    assembly / test := {},
+    assemblyPackageScala / assembleArtifact := false
   )
 
+lazy val generateIcebergJarsTask = TaskKey[Unit]("generateIcebergJars", "Generate Iceberg JARs")
+
+lazy val icebergShaded = (project in file("icebergShaded"))
+  .dependsOn(spark % "provided")
+  .settings (
+    name := "iceberg-shaded",
+    commonSettings,
+    skipReleaseSettings,
+
+    // Compile, patch and generate Iceberg JARs
+    generateIcebergJarsTask := {
+      import sys.process._
+      val scriptPath = baseDirectory.value / "generate_iceberg_jars.py"
+      // Download iceberg code in `iceberg_src` dir and generate the JARs in `lib` dir
+      Seq("python3", scriptPath.getPath)!
+    },
+    Compile / unmanagedJars := (Compile / unmanagedJars).dependsOn(generateIcebergJarsTask).value,
+    cleanFiles += baseDirectory.value / "iceberg_src",
+    cleanFiles += baseDirectory.value / "lib",
+
+    // Generate the shaded Iceberg JARs
+    Compile / packageBin := assembly.value,
+    assembly / assemblyJarName := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar",
+    assembly / logLevel := Level.Info,
+    assembly / test := {},
+    assembly / assemblyShadeRules := Seq(
+      ShadeRule.rename("org.apache.iceberg.**" -> "shadedForDelta.@0").inAll,
+    ),
+    assemblyPackageScala / assembleArtifact := false,
+    // Make the 'compile' invoke the 'assembly' task to generate the uber jar.
+  )
 
 lazy val hive = (project in file("connectors/hive"))
   .dependsOn(standaloneCosmetic)
```
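Since `delta-iceberg` is published as an assembly JAR that already bundles the shaded Iceberg classes (relocated under the `shadedForDelta` prefix by the shade rule above), a downstream project would depend on it like any other artifact. A minimal sketch of a consumer `build.sbt`; the version placeholder is an assumption, not taken from this commit:

```scala
// Hypothetical consumer build.sbt (sketch): pull in the published
// delta-iceberg assembly alongside the core Delta artifact. Replace
// <delta-version> with an actual released version.
libraryDependencies ++= Seq(
  "io.delta" %% "delta-core"    % "<delta-version>",
  "io.delta" %% "delta-iceberg" % "<delta-version>"
)
```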
