[SPARK-25371][SQL] struct() should allow being called with 0 args #22373
cc @dongjoon-hyun @gatorsmile @maropu who worked on SPARK-21281
Test build #95847 has finished for PR 22373 at commit
```diff
@@ -256,4 +256,9 @@ class VectorAssemblerSuite
     assert(runWithMetadata("keep", additional_filter = "id1 > 2").count() == 4)
   }
 
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
+    val vectorAssembler = new VectorAssembler().setInputCols(Array()).setOutputCol("a")
```
Is `VectorAssembler` with zero input columns useful?
cc @yanboliang and @srowen
It doesn't sound that useful, but the JIRA suggests this was the behavior in 2.2, and 2.3 throws a confusing error. I could imagine either allowing this behavior or throwing a better exception. Is there a use case for no input? Maybe you have a reusable pipeline that is applied to a subset of columns, and sometimes the subset matches nothing. The output is empty, but that may not matter for whatever purpose it serves; maybe it is assembled with something else afterwards. I could picture a valid use case.
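A minimal sketch of the reusable-pipeline scenario above, assuming a Spark release that includes this fix (it was merged to master/2.4) with `spark-sql` and `spark-mllib` on the classpath. The object `EmptyInputColsSketch`, the helper `assemblerFor`, and the `f_` column-prefix convention are hypothetical; they stand in for a stage whose column selection can legitimately match nothing.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmptyInputColsSketch {
  // Hypothetical reusable stage: assemble every column whose name starts with `prefix`.
  // For some inputs the selection matches no columns, so inputCols is empty.
  def assemblerFor(df: DataFrame, prefix: String): VectorAssembler =
    new VectorAssembler()
      .setInputCols(df.columns.filter(_.startsWith(prefix)))
      .setOutputCol("features")

  // Applies the stage to a DataFrame that has no "f_" columns and
  // returns the size of the assembled vector for each row.
  def run(): Seq[Int] = {
    val spark = SparkSession.builder().master("local[1]").appName("empty-inputCols").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
      val out = assemblerFor(df, "f_").transform(df)
      // With struct() accepted again, each row gets a zero-length vector instead of an error.
      out.select("features").collect().map(_.getAs[Vector](0).size).toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(","))
}
```

Whether an empty feature vector is meaningful is up to the downstream stages; the point is that the pipeline no longer fails just because the selection came back empty.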
@mgaido91, BTW, are you sure SPARK-21281 introduced that behaviour change? Before:
After:
If that was allowed before, it would be good to document the behaviour change in a separate PR.
@HyukjinKwon I am sure: I tried removing the added check, and the UT I added here passed.
cc @cloud-fan @jerryshao Although it is a very minor one, this can be considered a regression, so should it be considered a blocker for 2.4/2.3.2?
I think we should allow
Yes, I agree, @cloud-fan, at least until 3.0 IMHO. But since the change was already released in 2.3.0, I was not sure whether to revert that additional check or add this logic here. If nobody objects, I'll switch to removing the check, since this would also prevent other users' workflows from breaking because of the regression.
Thanks @cloud-fan @maropu, I'll update this accordingly ASAP.
Ya, sorry for bothering you all. Thanks, @mgaido91
```diff
@@ -256,4 +256,9 @@ class VectorAssemblerSuite
     assert(runWithMetadata("keep", additional_filter = "id1 > 2").count() == 4)
   }
 
+  test("SPARK-25371: VectorAssembler with empty inputCols") {
```
Do we still need this test?
I think so: the test passes with this patch and fails without it, so it helps prevent regressions like this in the future.
Can you also update the PR description? Thanks!
Test build #95877 has finished for PR 22373 at commit
retest this please
Test build #95895 has finished for PR 22373 at commit
+1, LGTM.
## What changes were proposed in this pull request?
SPARK-21281 introduced a check for the inputs of `CreateStructLike` to be non-empty. This means that `struct()`, which was previously considered valid, now throws an Exception. This behavior change was introduced in 2.3.0. The change may break users' applications on upgrade, and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

This PR removes the added check, making `struct()` valid again.

## How was this patch tested?
Added UT.

Closes #22373 from mgaido91/SPARK-25371.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0736e72)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master/2.4! @mgaido91, can you send a new PR for 2.3? It conflicts.
LGTM too
## What changes were proposed in this pull request?
SPARK-21281 introduced a check for the inputs of `CreateStructLike` to be non-empty. This means that `struct()`, which was previously considered valid, now throws an Exception. This behavior change was introduced in 2.3.0. The change may break users' applications on upgrade, and it causes `VectorAssembler` to fail when an empty `inputCols` is defined.

This PR removes the added check, making `struct()` valid again.

## How was this patch tested?
Added UT.
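To illustrate the restored behavior, here is a minimal sketch assuming a Spark release that includes this fix (for example 2.4.0, since it was merged to master/2.4) with `spark-sql` on the classpath; the object `EmptyStructSketch` and its method are hypothetical names. It builds a column with `struct()` and no arguments and returns the column's type, which should be a struct with zero fields. Per the description above, the same `select` throws an exception on 2.3.0 because of the non-empty check.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types.StructType

object EmptyStructSketch {
  // Selects struct() with no arguments and returns the resulting column type.
  // Dataset.select analyzes eagerly, so a rejected struct() would fail right here.
  def emptyStructType(): StructType = {
    val spark = SparkSession.builder().master("local[1]").appName("empty-struct").getOrCreate()
    try {
      val df = spark.range(2).select(struct().as("s"))
      df.schema("s").dataType.asInstanceOf[StructType]
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(emptyStructType().fields.length)
}
```

The zero-argument call resolves to the `struct(cols: Column*)` overload, which is the form `VectorAssembler` ends up producing when `inputCols` is empty.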