Spark: Add builder pattern to SparkAppenderFactory to handle increasing number of arguments #2499
Hi @aokolnychyi @RussellSpitzer, the change is ready if you have some time to take a look, thanks.
RussellSpitzer left a comment
This looks fine to me, but none of the call sites replaced here seem to benefit from the change; they all seem just as long. I'm probably just missing the spot in the original PR, though.
Hi Russell, yeah, it's just a code cleanup. Both of the aforementioned PRs (#2362 and #2240) noticed that there are four versions of the SparkAppenderFactory constructor, that it becomes harder and harder to add new things, and that it can be cleaner like this. I noticed that I had added an extraneous sort-order pointer as part of a copy and paste from my PR (#2240), and removed it, as it is now redundant with the table added by #2362. I updated the PR with the small fix.
      }

      public SparkAppenderFactory build() {
        return new SparkAppenderFactory(properties, writeSchema, dsSchema, spec, equalityFieldIds,
We will need to do the necessary validation before we construct the SparkAppenderFactory instance:
- The properties, writeSchema, and dsSchema must not be null.
- If equalityFieldIds is not null, people plan to write equality deletes into delete files, so eqDeleteRowSchema must not be null. Conversely, if eqDeleteRowSchema is not null, then equalityFieldIds must not be null.
- The posDeleteRowSchema may be either null or non-null.
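The three validation rules above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name ValidatedAppenderFactory is hypothetical, schemas are represented as plain Strings so the sketch compiles without Iceberg or Spark, and plain Java checks stand in for whatever precondition helper the project actually uses.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for SparkAppenderFactory. Schemas are plain Strings
// here (an assumption) so the sketch compiles without Iceberg or Spark.
class ValidatedAppenderFactory {
  final Map<String, String> properties;
  final String writeSchema;
  final String dsSchema;
  final List<Integer> equalityFieldIds;
  final String eqDeleteRowSchema;
  final String posDeleteRowSchema;

  private ValidatedAppenderFactory(Builder builder) {
    this.properties = builder.properties;
    this.writeSchema = builder.writeSchema;
    this.dsSchema = builder.dsSchema;
    this.equalityFieldIds = builder.equalityFieldIds;
    this.eqDeleteRowSchema = builder.eqDeleteRowSchema;
    this.posDeleteRowSchema = builder.posDeleteRowSchema;
  }

  static Builder builderFor(Map<String, String> properties, String writeSchema, String dsSchema) {
    return new Builder(properties, writeSchema, dsSchema);
  }

  static class Builder {
    private final Map<String, String> properties;
    private final String writeSchema;
    private final String dsSchema;
    private List<Integer> equalityFieldIds;  // optional
    private String eqDeleteRowSchema;        // optional
    private String posDeleteRowSchema;       // optional

    private Builder(Map<String, String> properties, String writeSchema, String dsSchema) {
      this.properties = properties;
      this.writeSchema = writeSchema;
      this.dsSchema = dsSchema;
    }

    Builder equalityFieldIds(List<Integer> ids) {
      this.equalityFieldIds = ids;
      return this;
    }

    Builder eqDeleteRowSchema(String schema) {
      this.eqDeleteRowSchema = schema;
      return this;
    }

    Builder posDeleteRowSchema(String schema) {
      this.posDeleteRowSchema = schema;
      return this;
    }

    ValidatedAppenderFactory build() {
      // Rule 1: the required arguments must not be null.
      Objects.requireNonNull(properties, "Properties must not be null");
      Objects.requireNonNull(writeSchema, "Write schema must not be null");
      Objects.requireNonNull(dsSchema, "Dataset schema must not be null");

      // Rule 2: equality field ids and the equality delete row schema
      // must be either both set or both unset.
      if ((equalityFieldIds == null) != (eqDeleteRowSchema == null)) {
        throw new IllegalArgumentException(
            "Equality field ids and equality delete row schema must be set together");
      }

      // Rule 3: posDeleteRowSchema may be null, so there is nothing to check.
      return new ValidatedAppenderFactory(this);
    }
  }
}
```

Running the checks in build() rather than in the setters means the order in which callers chain the optional setters does not matter.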
        return new SparkAppenderFactoryBuilder(table, writeSchema, dsSchema);
      }

      public static class SparkAppenderFactoryBuilder {
Since the class SparkAppenderFactory is not marked as public, I think the newly introduced SparkAppenderFactoryBuilder doesn't have to be public either.
By the way, I think we could just use the name Builder, because it is nested in the class SparkAppenderFactory; the prefix SparkAppenderFactory seems redundant.
Thanks for taking a look and for clarifying what should not be null, @openinx. I made the suggested changes if you want to take another look.
openinx left a comment
Looks good to me overall, just left a minor comment!
        this.posDeleteRowSchema = posDeleteRowSchema;
      }

      public static Builder builderFor(Table table, Schema writeSchema, StructType dsSchema) {
Nit: Since Iceberg is a library that will be embedded in users' projects, we usually strictly control which methods and classes are marked as public. In this case, we don't have to mark this as public. The same comment applies to the Builder's methods that follow.
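The visibility and naming suggestions from this review could look roughly like the sketch below. It is an illustration under assumptions rather than the merged code: ScopedAppenderFactory is a hypothetical stand-in, and its String fields replace the real schema types. Omitting the access modifier makes the class, the nested Builder, and their methods package-private.

```java
// Hypothetical stand-in; no access modifier on the class means package-private.
class ScopedAppenderFactory {
  private final String writeSchema;
  private final String posDeleteRowSchema;

  private ScopedAppenderFactory(String writeSchema, String posDeleteRowSchema) {
    this.writeSchema = writeSchema;
    this.posDeleteRowSchema = posDeleteRowSchema;
  }

  String writeSchema() {
    return writeSchema;
  }

  String posDeleteRowSchema() {
    return posDeleteRowSchema;
  }

  // Package-private entry point: reachable only from code in the same package.
  static Builder builderFor(String writeSchema) {
    return new Builder(writeSchema);
  }

  // Nested in the factory, so the short name Builder is already unambiguous:
  // callers refer to it as ScopedAppenderFactory.Builder.
  static class Builder {
    private final String writeSchema;
    private String posDeleteRowSchema;  // optional

    private Builder(String writeSchema) {
      this.writeSchema = writeSchema;
    }

    Builder posDeleteRowSchema(String schema) {
      this.posDeleteRowSchema = schema;
      return this;
    }

    ScopedAppenderFactory build() {
      return new ScopedAppenderFactory(writeSchema, posDeleteRowSchema);
    }
  }
}
```

Keeping these members package-private leaves room to change the builder later without breaking code outside the package.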
@openinx thanks again, yeah, I missed it.
Got this merged, thanks @szehon-ho for contributing.
As discussed in #2362, I rebased and am submitting the part of my PR #2240 that concerns the SparkAppenderFactoryBuilder.