Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark 3.5: Don't change table distribution when only altering local order #10774

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

manuzhang
Copy link
Contributor

@manuzhang manuzhang commented Jul 24, 2024

@manuzhang
Copy link
Contributor Author

@RussellSpitzer could you please take another look?

@manuzhang
Copy link
Contributor Author

@RussellSpitzer @aokolnychyi @szehon-ho would you mind taking another look?

} else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) {
DistributionMode.NONE
Some(DistributionMode.HASH)
} else if (orderingSpec.UNORDERED != null) {
Copy link
Collaborator

@szehon-ho szehon-ho Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return None when distributionSpec not set here? I thought that's more inline with the goal of the pr (to not override default unless explicitly set).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it would break a few tests and is a big behavior change. So @RussellSpitzer @aokolnychyi let me know if you disagree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which do you mean will break a few tests and is a big change?

Copy link
Collaborator

@szehon-ho szehon-ho Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 val distributionMode = if (distributionSpec != null) {
      DistributionMode.HASH
    } else if (orderingSpec.UNORDERED != null || orderingSpec.LOCALLY != null) {
      DistributionMode.NONE
    } else {
      DistributionMode.RANGE
    }

makes more sense to me as:

    val distributionMode = if (distributionSpec != null) {
       Some(DistributionMode.HASH)
    } else {
      None
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho I'm not sure I follow

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the goal is to not set distributionMode , if its not in the ALTER statement. Hence why I thought the above code, which returns None in those cases unless its explicitly present in ALTER statement.

The current code works for the exact statement ALTER TABLE LOCALLY UNORDERED, but not for other ALTER's that dont have distributionMode?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this function is used by other pieces of code. Specifically the Create statements where RANGE is set if there is an ordering. Maybe it makes sense to separate it out into a Distribution and ordering for ALTER and one for CREATE?

Copy link
Contributor Author

@manuzhang manuzhang Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RussellSpitzer This method is for ALTER only.

| ALTER TABLE multipartIdentifier WRITE writeSpec #setWriteDistributionAndOrdering

@szehon-ho The result of this method can be one of HASH, NONE, RANGE or unchanged depending on the combination of distributionSpec and orderSpec. Thus, we cannot just return None (which mean unchanged) when distributionSpec is not set.

@manuzhang
Copy link
Contributor Author

@RussellSpitzer Could you please take another look? It might be valuable to be included in 1.7 since it's a long-standing issue.

@RussellSpitzer RussellSpitzer merged commit 32e9f40 into apache:main Oct 25, 2024
42 checks passed
@RussellSpitzer
Copy link
Member

Yes I think we are good here! Thanks @manuzhang for the patch and @szehon-ho for the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants