Spark 3.3: Dataset writes for position deletes #7029

Merged: 8 commits merged into apache:master on Apr 5, 2023

Conversation

Collaborator

@szehon-ho commented on Mar 6, 2023

This is the last prerequisite for implementing RewriteDeleteFiles.

It allows dataset writes to the position_deletes metadata table, on the condition that a rewritten file set ID is set (i.e., the write comes from Iceberg's internal use).

Part of this PR, which is simple refactoring, was already split out into #6924.
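For context, here is a rough, purely illustrative sketch of what such a gated dataset write could look like; the "rewritten-file-set-id" option key and the helper below are assumptions for illustration, not this PR's exact API (the PR gates on writeConf.rewrittenFileSetId()).

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

class PositionDeleteRewriteSketch {
  // Hypothetical helper: write rewritten position deletes back to the
  // position_deletes metadata table, tagged with the file set id that
  // Iceberg's internal rewrite registered earlier.
  static void writeRewrittenDeletes(Dataset<Row> deletes, String fileSetId, String table) {
    deletes
        .write()
        .format("iceberg")
        .option("rewritten-file-set-id", fileSetId) // assumed option key; only internal rewrites set it
        .mode("append")
        .save(table + ".position_deletes");
  }
}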

@szehon-ho changed the title from "Allow writes for position deletes" to "Spark 3.3: Dataset writes for position deletes" on Mar 6, 2023
@@ -29,7 +29,7 @@ private static Schema pathPosSchema(Schema rowSchema) {
     return new Schema(
         MetadataColumns.DELETE_FILE_PATH,
         MetadataColumns.DELETE_FILE_POS,
-        Types.NestedField.required(
+        Types.NestedField.optional(
Collaborator Author

@szehon-ho, Mar 6, 2023

This was necessary so the writer can write position deletes with "row" but still be fine when "row" is null.

Currently, the writer code uses either a schema with a required "row" field, as here, or a schema without the "row" field (see the posPathSchema method just below). The variant with the required row field is actually not used, so changing it to optional should have no impact.

This is actually more in line with the position-delete schema in the spec, where "row" is optional.
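For reference, a sketch of roughly how pathPosSchema reads with this change (the field id constant and exact overload here are assumptions, not the verbatim diff):

private static Schema pathPosSchema(Schema rowSchema) {
  return new Schema(
      MetadataColumns.DELETE_FILE_PATH,
      MetadataColumns.DELETE_FILE_POS,
      Types.NestedField.optional(             // was required(...) before this change
          MetadataColumns.DELETE_FILE_ROW_FIELD_ID,
          "row",
          rowSchema.asStruct(),
          "Deleted row values"));
}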

Collaborator Author

@szehon-ho, Mar 6, 2023

Update: it looks like a few GenericWriter code paths depend on this to throw an exception when null rows are passed in. This will thus be a change of behavior, but a backward-compatible one.

Contributor

I have some concerns about this change. The Delete Formats spec says this column should be required in order to keep the statistics of the deleted row values accurate, and I think the statistics need to be accurate because the manifest reader uses them to filter delete files:

(screenshot of the manifest reader code that filters delete files using these statistics)

So if this type is changed to optional, the statistics may become unreliable, which could cause delete manifest entries to be incorrectly filtered? This is just my understanding of the spec, but I'm not sure.

Collaborator Author

Yeah, you are right, this is tricky. The spec says:

2147483544 row required struct<...> [1] Deleted row values. Omit the column when not storing deleted rows.

When present in the delete file, row is required because all delete entries must include the row values.

So either the entire position delete file has 'row', or the entire file does not have 'row'. (Currently it seems Spark does not set 'row' at all, ref: https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java#L436)

When compacting delete files, I somehow need a way to know whether the original position delete files all have rows or not. I am not sure at the moment how to get this.

Collaborator Author

I updated the PR with a fix for this; the idea came from chatting with @RussellSpitzer offline. SparkWrite.DeleteWriter is now a fan-out that can redirect deletes to two files: one with 'row' as a required struct, and one with no 'row' at all. In most cases only one will be chosen. Thanks for the initial comment.
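A hedged sketch of the fan-out idea (illustrative names only, not the PR's actual classes): each incoming delete record is routed to one of two underlying writers, so every output file either stores row values for all entries or omits the row column entirely.

class FanOutDeleteSketch {
  // Stand-in for the real position-delete file writer.
  interface DeleteSink {
    void write(String path, long pos, Object row);
  }

  private final DeleteSink withRow;     // schema where "row" is a required struct
  private final DeleteSink withoutRow;  // schema with no "row" column at all

  FanOutDeleteSketch(DeleteSink withRow, DeleteSink withoutRow) {
    this.withRow = withRow;
    this.withoutRow = withoutRow;
  }

  void write(String path, long pos, Object row) {
    if (row != null) {
      withRow.write(path, pos, row);      // deleted row values are preserved
    } else {
      withoutRow.write(path, pos, null);  // position-only delete
    }
  }
}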

@szehon-ho force-pushed the position_delete_write_master branch from 8775b2a to 1364baf on March 6, 2023 21:59
@github-actions bot added the data and ORC labels and removed the ORC label, Mar 6, 2023
@szehon-ho force-pushed the position_delete_write_master branch from 9fecbba to 38cc095 on March 9, 2023 01:21
@szehon-ho
Collaborator Author

Rebased on the updated version of #6924.

@szehon-ho force-pushed the position_delete_write_master branch from 38cc095 to fb29954 on March 13, 2023 17:33
@aokolnychyi
Contributor

I went through the change. Let me do a detailed review round with fresh eyes tomorrow.

@szehon-ho
Collaborator Author

Made the suggested changes (refactored into a new class, SparkPositionDeletesRewrite, out of SparkWrite).

Note: the new classes drop some unused code from the previous path, such as the reportMetrics method and the cleanupOnAbort flag that controls abort behavior. I assume we can come back to this when we implement the commit manager part; as of now it is not clear whether we need it or not.

Contributor

@aokolnychyi left a comment

This is getting close. I did a detailed round. Will check tests with fresh eyes tomorrow.

@szehon-ho force-pushed the position_delete_write_master branch 2 times, most recently from 29788fa to b02c011 on March 24, 2023 17:39
@szehon-ho force-pushed the position_delete_write_master branch from b02c011 to 652f37f on March 24, 2023 18:13
@szehon-ho added this to In progress in [Priority 1] Maintenance: Delete file compaction (via automation) on Mar 24, 2023
Contributor

@aokolnychyi left a comment

Almost there.

Contributor

@aokolnychyi left a comment

LGTM, I left a few minor comments. Feel free to merge whenever you are ready, @szehon-ho.
Nice work!


abstract class BaseFileRewriteCoordinator<F extends ContentFile<F>> {

private static final Logger LOG = LoggerFactory.getLogger(FileRewriteCoordinator.class);
Contributor

I think we are using the wrong class for logging. It should be BaseFileRewriteCoordinator.

Collaborator Author

Good catch, fixed
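Presumably the fix just points the logger at the enclosing class:

private static final Logger LOG = LoggerFactory.getLogger(BaseFileRewriteCoordinator.class);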

Preconditions.checkArgument(
fileSetId != null, "position_deletes table can only be written by RewriteDeleteFiles");
Preconditions.checkArgument(
writeConf.handleTimestampWithoutZone()
Contributor

I think this part would be easier to read if we defined fileSetId and handleTimestampWithoutZone as instance variables, similar to what we have in SparkWriteBuilder.

Collaborator Author

I think it's a bit harder to read if we define them in the constructor, like SparkWriteBuilder does, as that is a bit detached from this code. I rewrote it to define the variables at the beginning of the method.
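Something along these lines (a sketch of the shape only; the timestamp check body is abbreviated):

String fileSetId = writeConf.rewrittenFileSetId();
boolean handleTimestampWithoutZone = writeConf.handleTimestampWithoutZone();

Preconditions.checkArgument(
    fileSetId != null, "position_deletes table can only be written by RewriteDeleteFiles");
// ... the timestamp-without-zone check then reads handleTimestampWithoutZone the same way ...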

partitions.addAll(tasks.stream().map(ContentScanTask::partition).collect(Collectors.toList()));
Preconditions.checkArgument(
partitions.size() == 1,
"All scan tasks of %s are expected to have the same partition",
Contributor

Did we miss ", but got %s" at the end to include partitions?

Collaborator Author

Good catch, done
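The resulting check presumably reads:

Preconditions.checkArgument(
    partitions.size() == 1,
    "All scan tasks of %s are expected to have the same partition, but got %s",
    fileSetId,
    partitions);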

String fileSetId = writeConf.rewrittenFileSetId();

Preconditions.checkArgument(
fileSetId != null, "position_deletes table can only be written by RewriteDeleteFiles");
Contributor

I don't think there is a RewriteDeleteFiles.
What about a more generic message, like "Can only write to %s via actions", table.name()?

Collaborator Author

Yep, done
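That is, something like:

Preconditions.checkArgument(
    fileSetId != null, "Can only write to %s via actions", table.name());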

resultMap.remove(id);
}

public Set<String> fetchSetIDs(Table table) {
Contributor

I believe we renamed it to fetchSetIds instead of fetchSetIDs, so we have to keep the old method for now.
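A purely illustrative sketch of keeping the old name as a deprecated delegate:

/** @deprecated kept for compatibility; use {@link #fetchSetIds(Table)} instead. */
@Deprecated
public Set<String> fetchSetIDs(Table table) {
  return fetchSetIds(table);
}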


SparkFileWriterFactory writerFactoryWithRow =
SparkFileWriterFactory.builderFor(table)
.dataSchema(writeSchema)
Contributor

I am not sure we need to set these dataXXX methods since we are not writing any data (here).

Collaborator Author

Done
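A sketch of the trimmed-down builder call (the remaining builder method names here are assumptions about the factory's position-delete configuration, not the verbatim change):

SparkFileWriterFactory writerFactoryWithRow =
    SparkFileWriterFactory.builderFor(table)
        .deleteFileFormat(format)                          // assumed builder method
        .positionDeleteRowSchema(positionDeleteRowSchema)  // assumed builder method
        .positionDeleteSparkType(deleteSparkType)          // assumed builder method
        .build();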


private StructLike partition(String fileSetId, List<PositionDeletesScanTask> tasks) {
StructLikeSet partitions = StructLikeSet.create(tasks.get(0).spec().partitionType());
partitions.addAll(tasks.stream().map(ContentScanTask::partition).collect(Collectors.toList()));
Contributor

nit: I think you can use forEach instead of a temp list.

tasks.stream().map(ContentScanTask::partition).forEach(partitions::add);

In any case, I like what you did here.

Collaborator Author

Done

[Priority 1] Maintenance: Delete file compaction: automation moved this from In progress to Reviewer approved on Mar 31, 2023
@szehon-ho merged commit 22d29a5 into apache:master on Apr 5, 2023
32 checks passed
[Priority 1] Maintenance: Delete file compaction: automation moved this from Reviewer approved to Done on Apr 5, 2023
@szehon-ho
Collaborator Author

Merged. Thanks @aokolnychyi for the detailed review, and @zhongyujiang and @amogh-jahagirdar for the initial reviews.

ericlgoodman pushed a commit to ericlgoodman/iceberg that referenced this pull request Apr 12, 2023