
Core: Add BaseDeltaWriter #1888

Merged
rdblue merged 9 commits into apache:master from openinx:delta-writer
Dec 11, 2020

Conversation

@openinx openinx (Member) commented Dec 8, 2020

This is a separate PR to add the BaseDeltaWriter to BaseTaskWriter (https://github.com/apache/iceberg/pull/1818/files#diff-fc9a9fd84d24c607fd85e053b08a559f56dd2dd2a46f1341c528e7a0269f873cR92). The delta writer can accept both inserts and equality deletes.

For the CDC and upsert cases, compute engines such as Flink and Spark can use this delta writer to write streaming records (INSERT or DELETE) to an Apache Iceberg table.
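
For illustration, a delta-writing task might be driven roughly like the sketch below; write() and delete() follow the API discussed in this PR, while deltaWriter, ChangeEvent, Record, and the surrounding loop are assumptions made only for this example:

  // Hypothetical CDC apply loop; deltaWriter stands for an instance of the writer added here.
  void applyChangeStream(Iterable<ChangeEvent<Record>> events) throws IOException {
    for (ChangeEvent<Record> event : events) {
      if (event.isDelete()) {
        // Emits a position delete (for rows written by this task) or an equality delete.
        deltaWriter.delete(event.row());
      } else {
        // Appends to the current data file and records the row's path/offset for later deletes.
        deltaWriter.write(event.row());
      }
    }
    // Completing the task (not shown) hands the data files and delete files back for the commit.
  }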

private StructLikeMap<PathOffset> insertedRowMap;

public BaseDeltaWriter(PartitionKey partition, Schema eqDeleteSchema) {
Preconditions.checkNotNull(eqDeleteSchema, "equality-delete schema could not be null.");
Contributor

Nit: the error message is slightly misleading because it uses "could", which implies that there is some case where it could be null. How about "Equality delete schema cannot be null"?
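
Applied to the constructor check quoted above, the suggestion would read like this (a sketch of just that one line, not necessarily the final merged wording):

  Preconditions.checkNotNull(eqDeleteSchema, "Equality delete schema cannot be null");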

/**
 * Base delta writer to write both insert records and equality-deletes.
 */
protected abstract class BaseDeltaWriter implements Closeable {
@rdblue rdblue (Contributor) Dec 8, 2020

How about BaseEqualityDeltaWriter?

I think Spark MERGE INTO will likely use a delta writer that doesn't create the equality writer or use the SortedPosDeleteWriter because it will request that the rows are already ordered.

 *
 * @param key the projected values, which can be written to the eq-delete file directly.
 */
public void delete(T key) throws IOException {
@rdblue rdblue (Contributor) Dec 8, 2020

Why pass a key here instead of a row?

I think it would be easier to assume that this is a row, so that the write and delete methods accept the same data. That also provides a way to write the row to the delete file, or just the key based on configuration. The way it is here, there is no way to write the whole row in the delete.

Contributor

From the tests, I see that this can be used when the deleted row won't be passed. But I don't think this leaves an option for writing an entire row to the delete file and using a subset of its columns for the delete. This assumes that whatever struct is passed here is the entire delete key.

To support both use cases (writing an already projected key and writing a full row and projecting), I think we should have two methods: deleteKey and delete. That way both are supported. We would also need a test for passing the full row to the delete file, but deleting by key.
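
For clarity, the two entry points being proposed could be sketched as below; the interface form and the javadoc wording are illustrative only, not the literal code that was merged:

  import java.io.Closeable;
  import java.io.IOException;

  // Sketch of the proposed delete surface alongside write().
  interface EqualityDeltaWriterSketch<T> extends Closeable {
    /** Write an insert row. */
    void write(T row) throws IOException;

    /** Delete by a full row; the equality key is projected from the row's equality fields. */
    void delete(T row) throws IOException;

    /** Delete by an already-projected key; the key is written to the eq-delete file as-is. */
    void deleteKey(T key) throws IOException;
  }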

Member Author

I agree that it's better to provide a delete(T row) method which accepts an entire row to write to the eq-delete file, because it's possible to have a table with (a,b,c,d) columns whose equality fields are the (a,c) columns, while someone wants to write the full (a,b,c,d) values into the eq-delete file.

The problem is: how can we project the generic data type T to match the expected eqDeleteRowSchema (so that the eqDeleteWriter can write the correct column values)? It sounds like we will need an extra abstract method to accomplish that generic projection.
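
One possible shape for that projection inside the base class, assuming the row is already exposed as a StructLike via asStructLike (a sketch only, with schema and eqDeleteRowSchema assumed to be fields of the writer):

  // Project a row's StructLike view down to the equality-delete schema.
  private final StructProjection structProjection = StructProjection.create(schema, eqDeleteRowSchema);

  private StructLike projectKey(T row) {
    // wrap() returns a view over the underlying struct, so a caller that retains the
    // key (e.g. as a map key) must copy it first.
    return structProjection.wrap(asStructLike(row));
  }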

Member Author

Thinking about it again: if the table has the (a,b,c,d) columns and the equality fields are (b,c), then I think it's enough to provide two ways to write the equality-delete file:

  1. Only write the (b,c) column values into the equality-delete file. That is enough to maintain correct deletion semantics if people don't expect to do any real incremental pulling.

  2. Write the (a,b,c,d) column values into the equality-delete file. That's suitable if people want to consume the incremental inserts and deletes for future streaming analysis, data pipelines, etc.

Beyond those two cases, I can't think of other scenarios that require a custom equality-delete schema.
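
To make those two options concrete, the delete row schema handed to the writer could be either a projection of the equality fields or the full table schema; this is only an illustrative sketch using Schema.select, and the variable names are assumptions:

  // Table with columns (a, b, c, d); equality fields are (b, c).
  Schema tableSchema = table.schema();

  // Option 1: the equality-delete file carries only the key columns.
  Schema keyOnlyDeleteSchema = tableSchema.select("b", "c");

  // Option 2: the equality-delete file carries the whole row, which suits
  // downstream incremental or streaming consumers of the deletes.
  Schema fullRowDeleteSchema = tableSchema;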

Contributor

Yes, I think those two cases are correct: the caller may want to pass the key to delete, or an entire row to delete.


// Check records in the eq-delete file.
DeleteFile eqDeleteFile = result.deleteFiles()[0];
Assert.assertEquals(ImmutableList.of(record, record), readRecordsAsList(eqDeleteSchema, eqDeleteFile.path()));
Contributor

Is it necessary to encode the record twice? What about detecting already deleted keys and omitting the second delete? That might be over-complicating this for a very small optimization though.

Member Author

Agreed, we don't have to upsert the same key twice; the first upsert is enough.
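
For reference, the optimization mentioned above could look roughly like the sketch below. Only insertedRowMap appears in the quoted snippets; the PathOffset field names, deletedKeySet, posDeleteWriter, and eqDeleteWriter are assumptions for illustration and were not part of the merged change:

  // Hypothetical in-task dedup of equality deletes.
  private final StructLikeSet deletedKeySet = StructLikeSet.create(eqDeleteSchema.asStruct());

  public void delete(T row) throws IOException {
    StructLike key = asCopiedKey(row);
    PathOffset previous = insertedRowMap.remove(key);
    if (previous != null) {
      // The row was written by this task, so a position delete is sufficient.
      posDeleteWriter.delete(previous.path, previous.rowOffset);
    } else if (deletedKeySet.add(key)) {
      // First delete of this key against earlier files: write an equality delete.
      eqDeleteWriter.write(row);
    }
    // A repeated delete of the same key is silently dropped.
  }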

createRecord(4, "ccc")
)), actualRowSet("*"));

// Start the 2th transaction.
Contributor

Nit: it would normally be "2nd"

*/
protected abstract StructLike asStructLike(T data);

protected abstract StructLike asCopiedKey(T row);
Contributor

I think this could be removed from the API because this already has asStructLike. Rather than relying on the implementation to create the projection and copy it, this could be implemented here like this:

  protected StructLike asCopiedKey(T row) {
    return structProjection.copy().wrap(asStructLike(row));
  }

We would need to create StructProjection in this class, but it would be fairly easy.

Member Author

I think the above implementation of asCopiedKey may have problems, because some compute engines, Flink for example, use a singleton RowDataWrapper instance to implement asStructLike. The old key and the copied key would then share the same RowDataWrapper, which has already wrapped the new RowData, and that messes up the keys in insertedRowMap.

Considering this issue, I finally decided to abstract a separate asCopiedKey that clones a completely independent key which won't share object values with the old one.
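
To illustrate the difference, an engine-specific subclass could materialize the projected values into its own struct so the map key shares no state with a reused wrapper; this is only a sketch, and GenericRecord is used purely for illustration rather than as the actual Flink implementation:

  // Hypothetical asCopiedKey: copy the projected key into a standalone struct so it
  // does not share state with a singleton wrapper such as RowDataWrapper.
  @Override
  protected StructLike asCopiedKey(T row) {
    StructLike projected = structProjection.wrap(asStructLike(row));
    GenericRecord copy = GenericRecord.create(eqDeleteSchema);
    for (int pos = 0; pos < eqDeleteSchema.columns().size(); pos += 1) {
      copy.set(pos, projected.get(pos, Object.class));  // shallow copy of each key column
    }
    return copy;
  }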

Contributor

Good catch. I like the idea of copying the data into a new struct.

}

public Set<CharSequence> referencedDataFiles() {
return referencedDataFiles;
Contributor

Good catch. Should this set be part of the WriteResult instead of separate? I think that tasks are going to need to pass the set back to the commit for validation, so adding it to the WriteResult seems like the right way to handle it.
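
A sketch of what folding the set into the result could look like; the builder method names here are illustrative and not necessarily the merged WriteResult API:

  // Hypothetical WriteResult carrying the referenced data files alongside the
  // data and delete files produced by the task.
  WriteResult result = WriteResult.builder()
      .addDataFiles(completedDataFiles)
      .addDeleteFiles(completedDeleteFiles)
      .addReferencedDataFiles(referencedDataFiles)
      .build();

  // The committer can then read result.referencedDataFiles() to validate that the
  // files referenced by position deletes still exist at commit time.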

@rdblue rdblue merged commit 47b6b05 into apache:master Dec 11, 2020
@rdblue rdblue (Contributor) commented Dec 11, 2020

Merged! Great work, @openinx!
