Core: Add BaseDeltaWriter #1888
Conversation
```java
private StructLikeMap<PathOffset> insertedRowMap;

public BaseDeltaWriter(PartitionKey partition, Schema eqDeleteSchema) {
  Preconditions.checkNotNull(eqDeleteSchema, "equality-delete schema could not be null.");
```
Nit: the error message is slightly misleading because it uses "could", which implies that there is some case where it could be null. How about "Equality delete schema cannot be null"?
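For illustration, the suggested wording would look like this (only the message changes; the rest of the constructor stays as in the diff above):

```java
// Suggested message: states the requirement directly instead of "could not be null".
Preconditions.checkNotNull(eqDeleteSchema, "Equality delete schema cannot be null.");
```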
```java
/**
 * Base delta writer to write both insert records and equality-deletes.
 */
protected abstract class BaseDeltaWriter implements Closeable {
```
How about BaseEqualityDeltaWriter?
I think Spark MERGE INTO will likely use a delta writer that doesn't create the equality writer or use the SortedPosDeleteWriter because it will request that the rows are already ordered.
```java
 *
 * @param key is the projected values which could be write to eq-delete file directly.
 */
public void delete(T key) throws IOException {
```
Why pass a key here instead of a row?
I think it would be easier to assume that this is a row, so that the write and delete methods accept the same data. That also provides a way to write the row to the delete file, or just the key based on configuration. The way it is here, there is no way to write the whole row in the delete.
From the tests, I see that this can be used when the deleted row won't be passed. But I don't think this leaves an option for writing an entire row to the delete file and using a subset of its columns for the delete. This assumes that whatever struct is passed here is the entire delete key.
To support both use cases (writing an already projected key and writing a full row and projecting), I think we should have two methods: deleteKey and delete. That way both are supported. We would also need a test for passing the full row to the delete file, but deleting by key.
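For illustration, a rough sketch of that two-method surface (the `internalPosDelete` helper, `keyProjection` field, and `eqDeleteWriter` field are assumptions used only to make the sketch readable, not the final API):

```java
/**
 * Sketch only. Delete by an entire row: the equality key is projected out of the row
 * for the lookup, and the whole row is written to the equality-delete file.
 */
public void delete(T row) throws IOException {
  // internalPosDelete is assumed to return true when the key matches a row inserted
  // earlier by this writer, in which case a position delete is written instead.
  if (!internalPosDelete(keyProjection.wrap(asStructLike(row)))) {
    eqDeleteWriter.write(row);
  }
}

/**
 * Sketch only. Delete by an already-projected key: only the key columns are written
 * to the equality-delete file.
 */
public void deleteKey(T key) throws IOException {
  if (!internalPosDelete(asStructLike(key))) {
    eqDeleteWriter.write(key);
  }
}
```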
I agree that it's better to provide a delete(T row) method that accepts an entire row to write to the equality-delete file, because a table may have columns (a, b, c, d) with equality fields (a, c) while someone wants to write the full (a, b, c, d) values into the equality-delete file.
The problem is: how do we project the generic data type T to match the expected eqDeleteRowSchema (so that the eqDeleteWriter writes the correct column values)? It sounds like we will need an extra abstract method to accomplish that generic projection.
Thinking about it again: if the table has columns (a, b, c, d) and the equality fields are (b, c), then I think it's enough to provide two ways to write the equality-delete file:

- Only write the (b, c) column values into the equality-delete file. That is enough to maintain correct deletion semantics if people don't expect to do any real incremental pulling.
- Write the (a, b, c, d) column values into the equality-delete file. That's suitable if people want to consume the incremental inserts and deletes for future streaming analysis, data pipelines, etc.

Beyond those cases, I can't think of other scenarios that require a custom equality schema.
Yes, I think those two cases are correct: the caller may want to pass the key to delete, or an entire row to delete.
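A short usage sketch of those two cases (the `deltaWriter` variable and the `createKey`/`createRow` helpers are hypothetical):

```java
// Case 1: only the equality key (b, c) is available; only the key columns land in the
// equality-delete file.
deltaWriter.deleteKey(createKey("bbb", 2));

// Case 2: the full row (a, b, c, d) is available and should be preserved for incremental
// consumers; the whole row lands in the equality-delete file.
deltaWriter.delete(createRow(1, "bbb", 2, "ddd"));
```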
data/src/test/java/org/apache/iceberg/io/TestTaskDeltaWriter.java
```java
// Check records in the eq-delete file.
DeleteFile eqDeleteFile = result.deleteFiles()[0];
Assert.assertEquals(ImmutableList.of(record, record), readRecordsAsList(eqDeleteSchema, eqDeleteFile.path()));
```
Is it necessary to encode the record twice? What about detecting already deleted keys and omitting the second delete? That might be over-complicating this for a very small optimization though.
Agreed, we don't have to upsert the same key twice. The first upsert is enough.
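A sketch of the dedupe being discussed (the `writtenDeleteKeys` and `writeEqDelete` names are assumptions; it trades a per-task set of keys for fewer duplicate delete records):

```java
// Keys already written to the equality-delete file by this task, keyed on the
// equality-delete schema so the same key is only encoded once.
private final StructLikeSet writtenDeleteKeys = StructLikeSet.create(eqDeleteSchema.asStruct());

private void writeEqDelete(T key) throws IOException {
  // StructLikeSet#add returns false when an equal key is already present, so the
  // duplicate delete record is skipped; copying keeps the stored key independent of
  // any reused wrapper (see the asCopiedKey discussion below).
  if (writtenDeleteKeys.add(asCopiedKey(key))) {
    eqDeleteWriter.write(key);
  }
}
```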
```java
    createRecord(4, "ccc")
)), actualRowSet("*"));

// Start the 2th transaction.
```
Nit: it would normally be "2nd"
```java
 */
protected abstract StructLike asStructLike(T data);

protected abstract StructLike asCopiedKey(T row);
```
I think this could be removed from the API because this already has asStructLike. Rather than relying on the implementation to create the projection and copy it, this could be implemented here like this:

```java
protected StructLike asCopiedKey(T row) {
  return structProjection.copy().wrap(asStructLike(row));
}
```

We would need to create StructProjection in this class, but it would be fairly easy.
I think the above implementation of asCopiedKey may have problems: some compute engines, for example Flink, use a singleton RowDataWrapper instance to implement asStructLike, so the old key and the copied key would share the same RowDataWrapper, which has already wrapped the new RowData. That messes up the keys in insertedRowMap.
Considering this issue, I decided to abstract a separate asCopiedKey to clone a completely independent key that doesn't share object values with the old one.
Good catch. I like the idea of copying the data into a new struct.
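For illustration, one way an engine could produce a key that shares nothing with a reused wrapper (a sketch, not the Flink implementation; `keyProjection` is an assumed field, and the copy is shallow, so engines with mutable field objects may need a deeper copy):

```java
protected StructLike asCopiedKey(T row) {
  // Materialize the projected key into its own Object[] so re-wrapping the shared
  // wrapper with the next row cannot change keys already stored in insertedRowMap.
  StructLike projected = keyProjection.wrap(asStructLike(row));
  Object[] values = new Object[projected.size()];
  for (int i = 0; i < values.length; i += 1) {
    values[i] = projected.get(i, Object.class);
  }
  return new CopiedKey(values);
}

private static class CopiedKey implements StructLike {
  private final Object[] values;

  private CopiedKey(Object[] values) {
    this.values = values;
  }

  @Override
  public int size() {
    return values.length;
  }

  @Override
  public <V> V get(int pos, Class<V> javaClass) {
    return javaClass.cast(values[pos]);
  }

  @Override
  public <V> void set(int pos, V value) {
    values[pos] = value;
  }
}
```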
```java
}

public Set<CharSequence> referencedDataFiles() {
  return referencedDataFiles;
```
Good catch. Should this set be part of the WriteResult instead of separate? I think that tasks are going to need to pass the set back to the commit for validation, so adding it to the WriteResult seems like the right way to handle it.
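A rough sketch of what folding the set into the result could look like (the class shape here is an assumption, not the final WriteResult API):

```java
import java.io.Serializable;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DeleteFile;

// Sketch: one object handed from the task back to the commit, so the commit can use
// referencedDataFiles() for validation alongside the completed data and delete files.
public class WriteResult implements Serializable {
  private final DataFile[] dataFiles;
  private final DeleteFile[] deleteFiles;
  private final CharSequence[] referencedDataFiles;

  public WriteResult(DataFile[] dataFiles, DeleteFile[] deleteFiles,
                     CharSequence[] referencedDataFiles) {
    this.dataFiles = dataFiles;
    this.deleteFiles = deleteFiles;
    this.referencedDataFiles = referencedDataFiles;
  }

  public DataFile[] dataFiles() {
    return dataFiles;
  }

  public DeleteFile[] deleteFiles() {
    return deleteFiles;
  }

  public CharSequence[] referencedDataFiles() {
    return referencedDataFiles;
  }
}
```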
Merged! Great work, @openinx!
This is a separate PR to add the BaseDeltaWriter in BaseTaskWriter (https://github.com/apache/iceberg/pull/1818/files#diff-fc9a9fd84d24c607fd85e053b08a559f56dd2dd2a46f1341c528e7a0269f873cR92). The DeltaWriter can accept both insert and equality-delete records. For the CDC and upsert cases, compute engines such as Flink and Spark can use this delta writer to write streaming records (INSERT or DELETE) to an Apache Iceberg table.
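For readers skimming the thread, a hedged end-to-end sketch of the flow the description refers to (the `deltaWriter` construction, the `createRecord`/`createKey` helpers, and the shape of the `complete()` result are assumptions; the real coverage lives in TestTaskDeltaWriter):

```java
// CDC/upsert-style stream applied through the delta writer.
deltaWriter.write(createRecord(1, "aaa"));    // INSERT
deltaWriter.write(createRecord(2, "bbb"));    // INSERT
deltaWriter.delete(createRecord(2, "bbb"));   // DELETE by full row
deltaWriter.deleteKey(createKey(1));          // DELETE by equality key only

// Completing the writer yields the data files, delete files, and referenced data
// files that the engine passes back to the commit.
WriteResult result = deltaWriter.complete();
```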