-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add column mapping metadata update functionality #3393
[Kernel] Add column mapping metadata update functionality #3393
Conversation
0f5ac79
to
3169b2e
Compare
Thanks for the PR @nastra ! Change LGTM so far. cc: @scottsand-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left some comments!
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionBuilderImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeltaTableWritesSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeltaTableWritesSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeltaTableWritesSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeltaTableWritesSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeltaTableWritesSuite.scala
Outdated
Show resolved
Hide resolved
7df2b70
to
bcd839a
Compare
f6ba330
to
599bd06
Compare
//////////////////////////// | ||
|
||
/** Visible for testing */ | ||
static int findMaxColumnId(StructType schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't handle nested schemas, right? are you planning on adding that in a subsequent PR?
I worry that adding it in a subsequent PR will leave us open to forgetting spots to add it ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I initially discussed this with @raveeram-db to add nested schema support in a separate PR. However, I can also add it as part of this PR if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #3427 to handled nested schema
if (field.getMetadata().contains(COLUMN_MAPPING_PHYSICAL_NAME_KEY)) { | ||
return (String) field | ||
.getMetadata() | ||
.get(COLUMN_MAPPING_PHYSICAL_NAME_KEY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you create that issue about getString
on FieldMetadata
class? we can leave a TODO here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on #3393 (comment) I was waiting for a response whether we should do it in the first place, so I haven't opened an issue yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #3419 to address that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for working on this.
Qn: looks like the data writing path is not yet handled which means we can create tables with column mapping enabled, but the data written is not correct. Can we split out the physical name/id mapping info method and complete the implementation (i.e handling the nested and array/map types)?
private static boolean allowMappingModeChange(String oldMode, String newMode) { | ||
// only upgrade from none to name mapping is allowed | ||
return oldMode.equals(newMode) || | ||
(COLUMN_MAPPING_MODE_NONE.equals(oldMode) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if we should just define an enum and let the TableConfig
return the enum value. Otherwise we need to deal with the case sensitive issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know, I can make a separate PR to change it to use the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would that enum be in ColumnMapping
or in TableConfig
? No worries, I can add that enum. However, that feels like it would be better to do in a follow-up PR in order to not increase the diff here, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum in either places is fine.
private static Metadata assignColumnIdAndPhysicalName(Metadata metadata, boolean isNewTable) { | ||
StructType schema = metadata.getSchema(); | ||
int maxColumnId = Math.max( | ||
Integer.parseInt(metadata.getConfiguration() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the MAX_ID property to TableConfig
and use it to parse? btw Delta-Spark is parsing it as Long
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why delta-spark
actually parses this as Long
instead of an Int
? Schema Ids should be ints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only reason I haven't added the MAX_COLUMN_ID
to TableConfig
because it's an internal table property that should be set by users and TableConfig
seems to be made for properties that can be explicitly set by users.
I think TableConfig
would need a concept of whether a property should be settable by a user or not, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we need a property in the TableConfig
to say it is changeable by the use or not. It can be a separate PR.
//////////////////////////// | ||
|
||
/** Visible for testing */ | ||
static int findMaxColumnId(StructType schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
.putString(COLUMN_MAPPING_PHYSICAL_NAME_KEY, physicalName) | ||
.build()); | ||
} | ||
newSchema = newSchema.add(updatedField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we need to handle the nested types and array/map elements?
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/ColumnMapping.java
Outdated
Show resolved
Hide resolved
@vkorukanti are you referring to what's being handled as part of #3384 w.r.t the data write path? |
599bd06
to
be1dc76
Compare
be1dc76
to
bb55534
Compare
bb55534
to
2652313
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor follow ups and in addition to the data path change. This PR LGTM.
private static boolean allowMappingModeChange(String oldMode, String newMode) { | ||
// only upgrade from none to name mapping is allowed | ||
return oldMode.equals(newMode) || | ||
(COLUMN_MAPPING_MODE_NONE.equals(oldMode) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum in either places is fine.
private static Metadata assignColumnIdAndPhysicalName(Metadata metadata, boolean isNewTable) { | ||
StructType schema = metadata.getSchema(); | ||
int maxColumnId = Math.max( | ||
Integer.parseInt(metadata.getConfiguration() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we need a property in the TableConfig
to say it is changeable by the use or not. It can be a separate PR.
Yes, that's the issue. |
Which Delta project/connector is this regarding?
Description
This adds column mapping metadata when the column mapping feature is enabled on a table and fixes #3383
How was this patch tested?
New tests have been added to verify the behavior
Does this PR introduce any user-facing changes?