Skip to content

Spark: Add builder pattern to SparkAppenderFactory to handle increasing number of arguments#2499

Merged
openinx merged 5 commits intoapache:masterfrom
szehon-ho:appender_factory
Apr 27, 2021
Merged

Spark: Add builder pattern to SparkAppenderFactory to handle increasing number of arguments#2499
openinx merged 5 commits intoapache:masterfrom
szehon-ho:appender_factory

Conversation

@szehon-ho
Copy link
Member

As discussed in : #2362, rebased and submit part of my PR: #2240 about the SparkAppenderFactoryBuilder

@github-actions github-actions bot added the spark label Apr 20, 2021
@szehon-ho
Copy link
Member Author

Hi @aokolnychyi @RussellSpitzer , change is ready if you guys have some time to take a look, thanks

@szehon-ho szehon-ho changed the title [spark] Add builder pattern to SparkAppenderFactory to handle increasing number of arguments Spark: Add builder pattern to SparkAppenderFactory to handle increasing number of arguments Apr 21, 2021
Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, but all the use cases replaced here don't seem like they benefit from the change, they all seem just as long? I'm probably just missing the spot in the original PR though.

@szehon-ho
Copy link
Member Author

Hi Russell, yea its just a code cleanup.

Both the aforementioned PR's (#2362 and #2240) noticed there's 4 versions of the SparkAppenderFactory Constructor and that it becomes harder and harder to add new things, and can be cleaner like this.

I noticed that I added an extraneous sort-order pointer as part of copy and paste from my PR (#2240), and removed it as its now redundant with the table added by the #2362, updated the PR with the small fix

}

public SparkAppenderFactory build() {
return new SparkAppenderFactory(properties, writeSchema, dsSchema, spec, equalityFieldIds,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to do the necessary validation before we construct the SparkAppenderFactory instance:

  1. The properties, writeSchema, dsSchema must not be null;
  2. If the equalityFieldIds is not null, means people plan to write equality deletes into delete files. Then the eqDeleteRowSchema must not be null; Another side, if eqDeleteRowSchema is not null, then equalityFieldIds must not be null.
  3. the posDeleteRowSchema can always choose to be null or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return new SparkAppenderFactoryBuilder(table, writeSchema, dsSchema);
}

public static class SparkAppenderFactoryBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the class SparkAppenderFactory was not marked as public, so I think the newly introduced SparkAppenderFactoryBuilder also don't have to be public ?

btw, I think we could just use the name Builder because it's in the class SparkAppenderFactory, the prefix SparkAppenderFactory seems to be redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@szehon-ho
Copy link
Member Author

Thanks for taking a look and the clarification of what should not be null @openinx , i made the suggested changes if you want to take another look.

Copy link
Member

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, just left a minor comment !

this.posDeleteRowSchema = posDeleteRowSchema;
}

public static Builder builderFor(Table table, Schema writeSchema, StructType dsSchema) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: As iceberg is a library which will be embedded in user's project, so we usually strictly control methods/classes marked as public. In this case, we don't have to mark this as public. Similar comment in the following Builder's methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@szehon-ho
Copy link
Member Author

@openinx thanks again, yea I missed it

@openinx openinx merged commit 172ecb5 into apache:master Apr 27, 2021
@openinx
Copy link
Member

openinx commented Apr 27, 2021

Got this merged, thanks @szehon-ho for contributing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments