@chenkovsky

Which issue does this PR close?

Rationale for this change

The shuffle sqllogictest (shuffle.slt) fails intermittently in CI.

What changes are included in this PR?

Add a seed argument to shuffle so that the slt test produces deterministic output and no longer fails.

Are these changes tested?

Yes, by unit tests (UT).

Are there any user-facing changes?

No
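
To illustrate why a seed makes the test stable, here is a minimal sketch of a seeded Fisher-Yates shuffle. It is not the code from this PR: `SplitMix64` and `shuffle_with_seed` are hypothetical names introduced for the example, and the real implementation may use a different generator. The point is only that the same seed always yields the same permutation.

```rust
/// Tiny deterministic PRNG (SplitMix64). A stand-in for whatever seeded
/// generator the real implementation uses (assumption, not from the PR).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle driven by a seed: the same seed always produces the
/// same permutation. The modulo introduces a slight bias, which is
/// acceptable for a sketch.
fn shuffle_with_seed<T>(items: &mut [T], seed: u64) {
    let mut rng = SplitMix64::new(seed);
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

fn main() {
    // `None` plays the role of the NULL element in the slt query.
    let mut a = vec![Some(1), Some(2), Some(3), Some(4), Some(5), None];
    let mut b = a.clone();
    shuffle_with_seed(&mut a, 1);
    shuffle_with_seed(&mut b, 1);
    // Two runs with the same seed agree, so a test can assert exact output.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

The permutation printed by this sketch will not match the one DataFusion produces for the same seed, since the generators differ.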

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Nov 6, 2025
pub fn new() -> Self {
    Self {
-       signature: Signature::arrays(1, None, Volatility::Volatile),
+       signature: Signature::user_defined(Volatility::Volatile),
We should avoid using user_defined and make use of TypeSignature::ArraySignature, for example:

/// Specialized [Signature] for ArrayElement and similar functions.
pub fn array_and_index(volatility: Volatility) -> Self {
    Signature {
        type_signature: TypeSignature::ArraySignature(
            ArrayFunctionSignature::Array {
                arguments: vec![
                    ArrayFunctionArgument::Array,
                    ArrayFunctionArgument::Index,
                ],
                array_coercion: Some(ListCoercion::FixedSizedListToList),
            },
        ),
        volatility,
        parameter_names: None,
    }
}

  • ArrayFunctionArgument::Index can serve as the seed argument, since it is an Int64 type
  • We can't use Signature::array_and_index directly, since it would coerce FixedSizeList arrays to List arrays; we don't want that because we have a native implementation for FixedSizeList
  • Ensure both forms are supported: array only, and array + seed
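
The three requirements above can be modelled in a small, self-contained sketch. This deliberately does not use the DataFusion types: `ArgType` and `check_shuffle_args` are hypothetical stand-ins invented for illustration, and the real signature would be expressed through TypeSignature::ArraySignature as suggested in the comment.

```rust
/// Simplified stand-in for Arrow data types (NOT the real arrow types).
#[derive(Debug, Clone, PartialEq)]
enum ArgType {
    List,
    FixedSizeList,
    Int64,
    Utf8,
}

impl ArgType {
    fn is_array(&self) -> bool {
        matches!(self, ArgType::List | ArgType::FixedSizeList)
    }
}

/// Accepts `shuffle(array)` or `shuffle(array, seed)` where the seed is Int64.
/// The array type is returned unchanged, i.e. a FixedSizeList is not
/// coerced to a List.
fn check_shuffle_args(args: &[ArgType]) -> Result<Vec<ArgType>, String> {
    match args {
        // Array only.
        [array] if array.is_array() => Ok(vec![array.clone()]),
        // Array plus an Int64 seed.
        [array, ArgType::Int64] if array.is_array() => {
            Ok(vec![array.clone(), ArgType::Int64])
        }
        _ => Err(format!("shuffle does not support argument types {:?}", args)),
    }
}

fn main() {
    println!("{:?}", check_shuffle_args(&[ArgType::List]));
    println!("{:?}", check_shuffle_args(&[ArgType::FixedSizeList, ArgType::Int64]));
    println!("{:?}", check_shuffle_args(&[ArgType::List, ArgType::Utf8]));
}
```

In this model both accepted forms keep the array type as given, which mirrors the wish to preserve the native FixedSizeList path.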

Comment on lines 25 to 27
- SELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL];
+ SELECT shuffle([1, 2, 3, 4, 5, NULL], 1) != [1, 2, 3, 4, 5, NULL];
----
true

Now that the shuffle is seeded, we can assert the exact output list instead of checking that it differs from the sorted input, e.g.

query ?
SELECT shuffle([1, 2, 3, 4, 5, NULL], 1);
----
[2, 5, NULL, 3, 4, 1]
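
For context on why the inequality form is fragile without a seed, here is a small worked calculation. It assumes the unseeded shuffle picks uniformly among all permutations and treats the six elements as distinct; both are assumptions made for illustration, not statements from the PR.

```rust
/// Number of permutations of `n` distinct elements (n!).
fn factorial(n: u64) -> u64 {
    (1..=n).product()
}

fn main() {
    let n = 6; // elements in [1, 2, 3, 4, 5, NULL]
    let perms = factorial(n);
    // Chance that a uniform shuffle returns the input order unchanged,
    // which would make the `!=` assertion evaluate to false.
    let p_unchanged = 1.0 / perms as f64;
    println!("{} permutations, P(unchanged) = {:.5}", perms, p_unchanged);
    // prints "720 permutations, P(unchanged) = 0.00139"
}
```

Under these assumptions roughly one run in 720 would fail, which fits the intermittent failure described in the linked issue. Asserting the exact seeded output removes that chance entirely.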

Comment on lines 71 to 73
if arg_types.is_empty() {
    return plan_err!("shuffle expects at least 1 argument");
}

Suggested change
- if arg_types.is_empty() {
-     return plan_err!("shuffle expects at least 1 argument");
- }

We don't need this check.

@Jefffrey Jefffrey added this pull request to the merge queue Nov 8, 2025
Merged via the queue into apache:main with commit 92727b5 Nov 8, 2025
28 checks passed
@alamb alamb left a comment


Thank you @chenkovsky and @Jefffrey

hsiang-c pushed a commit to hsiang-c/datafusion that referenced this pull request Nov 9, 2025
## Which issue does this PR close?

- Closes apache#18476.
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
Development

Successfully merging this pull request may close these issues.

[datafusion-cli] shuffle.slt Intermittent CI failure for SELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL];