Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38412][SS] Fix the swapped sequence of from and to in StateSchemaCompatibilityChecker #35731

Closed
wants to merge 2 commits into from

Conversation

HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes the StateSchemaCompatibilityChecker which mistakenly swapped from (should be provided schema) and to (should be existing schema).

Why are the changes needed?

The bug mistakenly allows the case where it should not be allowed, and disallows the case where it should be allowed.

That allows nullable column to be stored into non-nullable column, which should be prohibited. This is less likely making runtime problem since state schema is conceptual one and row can be stored even not respecting the state schema.

The opposite case is worse, that disallows non-nullable column to be stored into nullable column, which should be allowed. Spark fails the query for this case.

Does this PR introduce any user-facing change?

Yes, after the fix, storing non-nullable column into nullable column for state will be allowed, which should have been allowed.

How was this patch tested?

Modified UTs.

@HeartSaVioR
Copy link
Contributor Author

}

test("changing the nullability of nullable to non-nullable in value should fail") {
test("storing non-nullable to nullable in value schema should be allowed") {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you word this the same as the test above, this felt a bit harder to understand
"storing non-nullable column into nullable column in value should be allowed"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. explicitly mentioning column seems to be better to understand. will update the test name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I already did that and I just missed this one. Will fix. Thanks!

Copy link

@pranavanand pranavanand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@pranavanand pranavanand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@pranavanand pranavanand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@HeartSaVioR
Copy link
Contributor Author

Thanks all for reviewing! I'll merge this once test passes, given the last commit only refines the test name.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 9, 2022

Test passed. Merging to master/3.2/3.1.

HeartSaVioR added a commit that referenced this pull request Mar 9, 2022
…emaCompatibilityChecker

### What changes were proposed in this pull request?

This PR fixes the StateSchemaCompatibilityChecker which mistakenly swapped `from` (should be provided schema) and `to` (should be existing schema).

### Why are the changes needed?

The bug mistakenly allows the case where it should not be allowed, and disallows the case where it should be allowed.

That allows nullable column to be stored into non-nullable column, which should be prohibited. This is less likely making runtime problem since state schema is conceptual one and row can be stored even not respecting the state schema.

The opposite case is worse, that disallows non-nullable column to be stored into nullable column, which should be allowed. Spark fails the query for this case.

### Does this PR introduce _any_ user-facing change?

Yes, after the fix, storing non-nullable column into nullable column for state will be allowed, which should have been allowed.

### How was this patch tested?

Modified UTs.

Closes #35731 from HeartSaVioR/SPARK-38412.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 43c7824)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
HeartSaVioR added a commit that referenced this pull request Mar 9, 2022
…emaCompatibilityChecker

### What changes were proposed in this pull request?

This PR fixes the StateSchemaCompatibilityChecker which mistakenly swapped `from` (should be provided schema) and `to` (should be existing schema).

### Why are the changes needed?

The bug mistakenly allows the case where it should not be allowed, and disallows the case where it should be allowed.

That allows nullable column to be stored into non-nullable column, which should be prohibited. This is less likely making runtime problem since state schema is conceptual one and row can be stored even not respecting the state schema.

The opposite case is worse, that disallows non-nullable column to be stored into nullable column, which should be allowed. Spark fails the query for this case.

### Does this PR introduce _any_ user-facing change?

Yes, after the fix, storing non-nullable column into nullable column for state will be allowed, which should have been allowed.

### How was this patch tested?

Modified UTs.

Closes #35731 from HeartSaVioR/SPARK-38412.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 43c7824)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
@HeartSaVioR
Copy link
Contributor Author

Merged into master/3.2/3.1.

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…emaCompatibilityChecker

### What changes were proposed in this pull request?

This PR fixes the StateSchemaCompatibilityChecker which mistakenly swapped `from` (should be provided schema) and `to` (should be existing schema).

### Why are the changes needed?

The bug mistakenly allows the case where it should not be allowed, and disallows the case where it should be allowed.

That allows nullable column to be stored into non-nullable column, which should be prohibited. This is less likely making runtime problem since state schema is conceptual one and row can be stored even not respecting the state schema.

The opposite case is worse, that disallows non-nullable column to be stored into nullable column, which should be allowed. Spark fails the query for this case.

### Does this PR introduce _any_ user-facing change?

Yes, after the fix, storing non-nullable column into nullable column for state will be allowed, which should have been allowed.

### How was this patch tested?

Modified UTs.

Closes apache#35731 from HeartSaVioR/SPARK-38412.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 43c7824)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit f1efc95)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants