-
Notifications
You must be signed in to change notification settings - Fork 1.8k
fix: shuffle seed #18518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: shuffle seed #18518
Conversation
| pub fn new() -> Self { | ||
| Self { | ||
| signature: Signature::arrays(1, None, Volatility::Volatile), | ||
| signature: Signature::user_defined(Volatility::Volatile), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid using user_defined and make use of TypeSignature::ArraySignature, for example:
datafusion/datafusion/expr-common/src/signature.rs
Lines 1247 to 1262 in 7591919
| /// Specialized [Signature] for ArrayElement and similar functions. | |
| pub fn array_and_index(volatility: Volatility) -> Self { | |
| Signature { | |
| type_signature: TypeSignature::ArraySignature( | |
| ArrayFunctionSignature::Array { | |
| arguments: vec![ | |
| ArrayFunctionArgument::Array, | |
| ArrayFunctionArgument::Index, | |
| ], | |
| array_coercion: Some(ListCoercion::FixedSizedListToList), | |
| }, | |
| ), | |
| volatility, | |
| parameter_names: None, | |
| } | |
| } |
- Index can be used as seed since it is Int64 type
- We can't use
Signature::array_and_indexdirectly since that would coerce FixedSizeLists to List arrays, which we don't want since we have a native implementation for FixedSizeLists - Ensure there is support for both array only, and array + seed
| SELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL]; | ||
| SELECT shuffle([1, 2, 3, 4, 5, NULL], 1) != [1, 2, 3, 4, 5, NULL]; | ||
| ---- | ||
| true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just assert the output list instead of it being an inequality with its sorted version now, e.g.
query ?
SELECT shuffle([1, 2, 3, 4, 5, NULL], 1);
----
[2, 5, NULL, 3, 4, 1]| if arg_types.is_empty() { | ||
| return plan_err!("shuffle expects at least 1 argument"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if arg_types.is_empty() { | |
| return plan_err!("shuffle expects at least 1 argument"); | |
| } |
We don't need this check
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chenkovsky and @Jefffrey
## Which issue does this PR close? - Closes apache#18476. ## Rationale for this change shuffle test sometimes fails ## What changes are included in this PR? add seed to shuffle, make sure slt won't fail. ## Are these changes tested? UT ## Are there any user-facing changes? No
## Which issue does this PR close? - Closes apache#18476. ## Rationale for this change shuffle test sometimes fails ## What changes are included in this PR? add seed to shuffle, make sure slt won't fail. ## Are these changes tested? UT ## Are there any user-facing changes? No
Which issue does this PR close?
shuffle.sltIntermittent CI failure forSELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL];#18476.Rationale for this change
shuffle test sometimes fails
What changes are included in this PR?
add seed to shuffle, make sure slt won't fail.
Are these changes tested?
UT
Are there any user-facing changes?
No