-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 3.5: Don't change table distribution when only altering local order #10774
Spark 3.5: Don't change table distribution when only altering local order #10774
Conversation
...n/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala
Outdated
Show resolved
Hide resolved
...s/src/test/java/org/apache/iceberg/spark/extensions/TestSetWriteDistributionAndOrdering.java
Outdated
Show resolved
Hide resolved
...s/src/test/java/org/apache/iceberg/spark/extensions/TestSetWriteDistributionAndOrdering.java
Outdated
Show resolved
Hide resolved
29f80d6
to
66d58ed
Compare
@RussellSpitzer could you please take another look? |
@RussellSpitzer @aokolnychyi @szehon-ho would you mind taking another look? |
} else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) { | ||
DistributionMode.NONE | ||
Some(DistributionMode.HASH) | ||
} else if (orderingSpec.UNORDERED != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just return None when distributionSpec not set here? I thought that's more inline with the goal of the pr (to not override default unless explicitly set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize it would break a few tests and is a big behavior change. So @RussellSpitzer @aokolnychyi let me know if you disagree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which do you mean will break a few tests and is a big change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val distributionMode = if (distributionSpec != null) {
DistributionMode.HASH
} else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) {
DistributionMode.NONE
} else {
DistributionMode.RANGE
}
makes more sense to me as:
val distributionMode = if (distributionSpec != null) {
Some(DistributionMode.HASH)
} else {
None
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szehon-ho I'm not sure I follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the goal is to not set distributionMode , if its not in the ALTER statement. Hence why I thought the above code, which returns None in those cases unless its explicitly present in ALTER statement.
The current code works for the exact statement ALTER TABLE LOCALLY UNORDERED, but not for other ALTER's that dont have distributionMode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this function is used by other pieces of code. Specifically the Create statements where RANGE is set if there is an ordering. Maybe it makes sense to separate it out into a Distribution and ordering for ALTER and one for CREATE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer This method is for ALTER only.
Line 73 in c16cefa
| ALTER TABLE multipartIdentifier WRITE writeSpec #setWriteDistributionAndOrdering |
@szehon-ho The result of this method can be one of HASH
, NONE
, RANGE
or unchanged depending on the combination of distributionSpec
and orderSpec
. Thus, we cannot just return None (which mean unchanged) when distributionSpec
is not set.
@RussellSpitzer Could you please take another look? It might be valuable to be included in 1.7 since it's a long-standing issue. |
Yes I think we are good here! Thanks @manuzhang for the patch and @szehon-ho for the review |
As discussed with @aokolnychyi @szehon-ho and @RussellSpitzer in #10647 (comment)