
Conversation

@XuQianJin-Stars
Contributor

@XuQianJin-Stars XuQianJin-Stars commented Nov 15, 2020

Changes involved in this PR:

  1. Move the PartitionedFanoutWriter from the flink module to the core/src/main/java/org/apache/iceberg/io package.
  2. Implement corresponding writers for the spark2 and spark3 modules.
  3. Users can turn it on by setting the table property write.partitioned.fanout.enabled=true; the default is false. The Spark option partitioned.fanout.enabled can also be set to override it.
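As a rough sketch of what a fanout writer does (a hypothetical, simplified model, not the actual PartitionedFanoutWriter API), the writer keeps one open appender per partition key, so rows can arrive in any partition order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a fanout writer: one open "file" (modeled here as a
// row buffer) per partition key, so input rows need no pre-sorting.
class SketchFanoutWriter {
  private final Map<String, List<String>> openFiles = new HashMap<>();

  // Route the row to the appender for its partition, opening one on first use.
  void write(String partition, String row) {
    openFiles.computeIfAbsent(partition, p -> new ArrayList<>()).add(row);
  }

  // Number of files currently held open (the memory cost of fanout).
  int openFileCount() {
    return openFiles.size();
  }
}
```

The map of open appenders is exactly where the extra memory goes: one open file per distinct partition seen by the task.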

@RussellSpitzer
Member

Wouldn't we need to also eliminate the pre-sorting from the Spark write to benefit from this? As is, all the rows that go into the partitioned writer are assumed to be pre-grouped already, so a fanout wouldn't really help. Maybe I'm missing something.

@rdsr
Contributor

rdsr commented Nov 16, 2020

Seems like this provides the ability to write out non-grouped records; won't this create too many, possibly small, files? What use cases do you have in mind for this, @XuQianJin-Stars?

@XuQianJin-Stars
Contributor Author

> Seems like this provides the ability to write out non grouped records, won't this create too many possibly small files? What use-cases you have in mind for this @XuQianJin-Stars ?

Hi @rdsr @RussellSpitzer, thanks for the reminder; I will take a closer look at this. I only started learning Iceberg recently, so I hope you'll have more comments to discuss with me. Thanks again.

@XuQianJin-Stars
Contributor Author

> Seems like this provides the ability to write out non grouped records, won't this create too many possibly small files? What use-cases you have in mind for this @XuQianJin-Stars ?

Well, I can add corresponding unit tests in TestRewriteDataFilesAction.

@chenjunjiedada
Collaborator

I think we could use the fanout writer for the streaming write case, where records may be hard to group due to latency requirements.

@rdblue
Contributor

rdblue commented Nov 16, 2020

If this is not the default and there is a way to turn it on, I could see some value in this. And we may need to use it by default for streaming. This just trades higher memory consumption for not needing to cluster rows by partition within tasks.

@XuQianJin-Stars
Contributor Author

> If this is not the default and there is a way to turn it on, I could see some value to this. And we may need to use it by default for streaming. This just uses higher memory consumption to avoid needing to cluster rows by partition in tasks.

Users can turn it on by setting PARTITIONED_FANOUT_WRITER=true as a table property; the default is false.
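The precedence described in this PR (a per-write Spark option overrides the table property, which overrides a false default) could be resolved with a helper along these lines (a hypothetical sketch; the actual code goes through TableProperties and PropertyUtil):

```java
import java.util.Map;

// Hypothetical precedence helper (names are illustrative): the per-write Spark
// option overrides the table property, which overrides the hard-coded default.
class FanoutFlagResolver {
  static boolean fanoutEnabled(Map<String, String> tableProps, Map<String, String> writeOptions) {
    String fromOption = writeOptions.get("partitioned.fanout.enabled");
    if (fromOption != null) {
      return Boolean.parseBoolean(fromOption);
    }
    String fromTable = tableProps.get("write.partitioned.fanout.enabled");
    if (fromTable != null) {
      return Boolean.parseBoolean(fromTable);
    }
    return false;  // fanout is disabled by default
  }
}
```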

@HeartSaVioR
Contributor

OK, now I understand what "fanout" means. I didn't know about the Flink implementation. Thanks.

I see the concern about the overall number of output files, but if I understand correctly, using the fanout writer would produce the same number of output files; it just eliminates the need for the "local sort" at the cost of keeping multiple files open for writing at once. For the best result in terms of the number of output files, we still need to repartition by partition, regardless of whether the fanout writer is used.
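A toy model (hypothetical, not Iceberg code) of the tradeoff above: a clustered writer rolls a new file on every partition-key change, so ungrouped input explodes into many small files, while a fanout writer opens at most one file per distinct partition; on pre-grouped input both produce the same number of files:

```java
import java.util.HashSet;
import java.util.List;

// Toy model comparing output file counts; not real Iceberg writers.
class WriterFileCounts {
  // Clustered writer: assumes input grouped by partition; it closes the current
  // file and opens a new one every time the partition key changes.
  static int clusteredFileCount(List<String> partitionKeys) {
    int files = 0;
    String current = null;
    for (String key : partitionKeys) {
      if (!key.equals(current)) {
        files++;
        current = key;
      }
    }
    return files;
  }

  // Fanout writer: keeps one open file per distinct partition key.
  static int fanoutFileCount(List<String> partitionKeys) {
    return new HashSet<>(partitionKeys).size();
  }
}
```

On the interleaved stream a, b, a, b, a the clustered writer would produce five files but the fanout writer only two, which is why fanout helps exactly when rows are not pre-grouped.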

Another question: is it better to have the flag in table properties, or as an option on the Spark Iceberg sink? The real concern would be predicting how many files need to be open together for writing. This depends heavily on the cardinality of the output partitions, which might depend on the characteristics of the output, but might also be query-dependent, as when we consider batch vs. streaming. I'm not maintaining an Iceberg table at production scale, so I can't say. Perhaps @aokolnychyi has some insight on this?

@github-actions github-actions bot added the docs label Nov 18, 2020
@rdblue rdblue requested a review from openinx November 25, 2020 01:16
@rdblue
Contributor

rdblue commented Nov 25, 2020

@openinx, it looks like there is a need for a Spark fanout writer as well. I think that affects your work in #1818, where you've removed this class. Could you take a look at this?

@openinx
Member

openinx commented Nov 25, 2020

I removed the PartitionedFanoutWriter in #1818 because:

  1. I found it easier and simpler to understand after unifying the unpartitioned and partitioned fanout writers in a single RowDataTaskWriter.
  2. Flink needs to parse the RowKind to decide whether a row should be dispatched to the write method or the delete method; the previous abstraction was not suitable for that requirement, so I created a unified task writer for Flink.

For a Spark fanout task writer, I think it's reasonable for Spark streaming scenarios, because in that case we don't need to shuffle the records based on partition keys. Moving the PartitionedFanoutWriter from the flink module to the core module looks good to me.

@XuQianJin-Stars Would you mind updating this PR to address the CI issue?

Thanks.

| Spark option | Default | Description |
| --- | --- | --- |
| target-file-size-bytes | As per table property | Overrides this table's write.target-file-size-bytes |
| check-nullability | true | Sets the nullable check on fields |
| snapshot-property._custom-key_ | null | Adds an entry with custom-key and corresponding value in the snapshot summary |
| partitioned.fanout.enabled | Table write.partitioned.fanout.enabled | Overrides this table's write.partitioned.fanout.enabled |
Member

> Table write.partitioned.fanout.enabled

We won't have a table option for write.partitioned.fanout.enabled, right? We could just write false by default here.

Contributor Author

> Table write.partitioned.fanout.enabled
>
> We won't have table option for the write.partitioned.fanout.enabled, right ? Could just write false by default here.

Right, we won't have the table option write.partitioned.fanout.enabled; false is OK.

TableProperties.WRITE_PARTITIONED_FANOUT_ENABLED,
TableProperties.WRITE_PARTITIONED_FANOUT_ENABLED_DEFAULT)) {
writer = new SparkPartitionedFanoutWriter(spec, format, appenderFactory, fileFactory, io.value(),
Long.MAX_VALUE,
Member

nit: don't need to break into a new line here

schema, structType);
} else {
writer = new SparkPartitionedWriter(spec, format, appenderFactory, fileFactory, io.value(),
Long.MAX_VALUE,
Member

ditto.

table.properties(), WRITE_TARGET_FILE_SIZE_BYTES, WRITE_TARGET_FILE_SIZE_BYTES_DEFAULT);
this.targetFileSize = options.getLong("target-file-size-bytes", tableTargetFileSize);

boolean tablePartitionedFanoutEnabled = PropertyUtil.propertyAsBoolean(
Member

Q: will we set this for a given table? In my opinion, it's per job?

Contributor Author

We need to set this option for a given table, because some tables require fanout.


Dataset<Row> df = spark.createDataFrame(expected, SimpleRecord.class);

df.select("id", "data").sort("data").write()
Member

For the partitioned fanout case, we shouldn't have to sort based on the data column, right? Otherwise, what's the difference compared to PartitionedWriter?

private final InternalRowWrapper internalRowWrapper;

public SparkPartitionedFanoutWriter(PartitionSpec spec, FileFormat format,
FileAppenderFactory<InternalRow> appenderFactory,
Member

nit: seems we could format the code to align with the previous line ?

Contributor Author

well

} else {
writer = new SparkPartitionedWriter(spec, format, appenderFactory, fileFactory, io.value(), Long.MAX_VALUE,
schema, structType);
if (PropertyUtil.propertyAsBoolean(properties,
Member

nit: I think it's clearer to just use:

if (spec.fields().isEmpty()) {
  return UnpartitionedWriter;
} else if (xxx) {
  return SparkPartitionedFanoutWriter;
} else {
  return SparkPartitionedWriter;
}

Contributor Author

well

}
@Test
public void testPartitionedFanoutCreateWithTargetFileSizeViaOption() throws IOException {
partitionedCreateWithTargetFileSizeViaOption(true);
Member

Do we need a unit test to cover Spark's partitioned.fanout.enabled option? I saw there's a unit test which uses the table's write.partitioned.fanout.enabled property to define the fanout behavior.

} else {
return new Partitioned24Writer(spec, format, appenderFactory, fileFactory, io.value(),
targetFileSize, writeSchema, dsSchema);
if (partitionedFanoutEnabled) {
Member

nit: could simplify it as:

if(spec.fields().isEmpty()){

} else if(partitionedFanoutEnabled){

} else {

}

Comment on lines 328 to 331
PartitionedFanout24Writer(PartitionSpec spec, FileFormat format,
SparkAppenderFactory appenderFactory,
OutputFileFactory fileFactory, FileIO fileIo, long targetFileSize,
Schema schema, StructType sparkSchema) {
Member

nit: format those lines as:

    PartitionedFanout24Writer(PartitionSpec spec, FileFormat format,
                              SparkAppenderFactory appenderFactory,
                              OutputFileFactory fileFactory, FileIO fileIo, long targetFileSize,
                              Schema schema, StructType sparkSchema) {
      super(spec, format, appenderFactory, fileFactory, fileIo, targetFileSize, schema, sparkSchema);
    }

} else {
return new Partitioned3Writer(
spec, format, appenderFactory, fileFactory, io.value(), targetFileSize, writeSchema, dsSchema);
if (partitionedFanoutEnabled) {
Member

ditto


private static class PartitionedFanout3Writer extends SparkPartitionedFanoutWriter
implements DataWriter<InternalRow> {
PartitionedFanout3Writer(PartitionSpec spec, FileFormat format, SparkAppenderFactory appenderFactory,
Member

nit: format

Member

@openinx openinx left a comment

The patch looks good to me overall; I left several comments. The key point for me is that I think we need a unit test covering the Spark option partitioned.fanout.enabled, for both spark2 and spark3.

Thanks @XuQianJin-Stars for the contribution.

@openinx openinx merged commit 7383b9d into apache:master Nov 26, 2020