Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46062][SQL] Sync the isStreaming flag between CTE definition and reference #43966

Closed
wants to merge 7 commits into from

Conversation

HeartSaVioR
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to sync the flag isStreaming from CTE definition to CTE reference.

The essential issue is that CTE reference node cannot determine the flag isStreaming by itself, and never be able to have a proper value and always takes the default as it does not have a parameter in constructor. The other flag resolved is handled, and we need to do the same for isStreaming.

Once we add the parameter to the constructor, we will also need to make sure the flag is in sync with CTE definition. We have a rule ResolveWithCTE doing the sync, hence we add the logic to sync the flag isStreaming as well.

Why are the changes needed?

The bug may impact some rules which behaves differently depending on isStreaming flag. It would no longer be a problem once CTE reference is replaced with CTE definition at some point in "optimization phase", but all rules in analyzer and optimizer being triggered before the rule takes effect may misbehave based on incorrect isStreaming flag.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UT.

Was this patch authored or co-authored using generative AI tooling?

No.

@HeartSaVioR
Copy link
Contributor Author

cc. @cloud-fan PTAL, thanks!

statsOpt: Option[Statistics] = None) extends LeafNode with MultiInstanceRelation {

final override val nodePatterns: Seq[TreePattern] = Seq(CTE)

override lazy val resolved: Boolean = _resolved

override lazy val isStreaming: Boolean = _isStreaming
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isStreaming is a def, so we can simply put override val isStreaming in the constructor

@HeartSaVioR
Copy link
Contributor Author

29874fc

Just fixed other part being broken from the suggestion.

6840d25

Just updated the golden files - the only diff is that CTE reference node now shows the value of isStreaming flag.

@HeartSaVioR
Copy link
Contributor Author

https://github.com/HeartSaVioR/spark/actions/runs/6969118403/job/18967484716

It failed only from test_pandas_map with connection refused error message, which is unrelated.

@HeartSaVioR
Copy link
Contributor Author

Thanks, merging to master/3.5/3.4.

HeartSaVioR added a commit that referenced this pull request Nov 23, 2023
…nd reference

This PR proposes to sync the flag `isStreaming` from CTE definition to CTE reference.

The essential issue is that CTE reference node cannot determine the flag `isStreaming` by itself, and never be able to have a proper value and always takes the default as it does not have a parameter in constructor. The other flag `resolved` is handled, and we need to do the same for `isStreaming`.

Once we add the parameter to the constructor, we will also need to make sure the flag is in sync with CTE definition. We have a rule `ResolveWithCTE` doing the sync, hence we add the logic to sync the flag `isStreaming` as well.

The bug may impact some rules which behaves differently depending on isStreaming flag. It would no longer be a problem once CTE reference is replaced with CTE definition at some point in "optimization phase", but all rules in analyzer and optimizer being triggered before the rule takes effect may misbehave based on incorrect isStreaming flag.

No.

New UT.

No.

Closes #43966 from HeartSaVioR/SPARK-46062.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 4304663)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
HeartSaVioR added a commit that referenced this pull request Nov 23, 2023
…nd reference

This PR proposes to sync the flag `isStreaming` from CTE definition to CTE reference.

The essential issue is that CTE reference node cannot determine the flag `isStreaming` by itself, and never be able to have a proper value and always takes the default as it does not have a parameter in constructor. The other flag `resolved` is handled, and we need to do the same for `isStreaming`.

Once we add the parameter to the constructor, we will also need to make sure the flag is in sync with CTE definition. We have a rule `ResolveWithCTE` doing the sync, hence we add the logic to sync the flag `isStreaming` as well.

The bug may impact some rules which behaves differently depending on isStreaming flag. It would no longer be a problem once CTE reference is replaced with CTE definition at some point in "optimization phase", but all rules in analyzer and optimizer being triggered before the rule takes effect may misbehave based on incorrect isStreaming flag.

No.

New UT.

No.

Closes #43966 from HeartSaVioR/SPARK-46062.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 4304663)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
…nd reference

This PR proposes to sync the flag `isStreaming` from CTE definition to CTE reference.

The essential issue is that CTE reference node cannot determine the flag `isStreaming` by itself, and never be able to have a proper value and always takes the default as it does not have a parameter in constructor. The other flag `resolved` is handled, and we need to do the same for `isStreaming`.

Once we add the parameter to the constructor, we will also need to make sure the flag is in sync with CTE definition. We have a rule `ResolveWithCTE` doing the sync, hence we add the logic to sync the flag `isStreaming` as well.

The bug may impact some rules which behaves differently depending on isStreaming flag. It would no longer be a problem once CTE reference is replaced with CTE definition at some point in "optimization phase", but all rules in analyzer and optimizer being triggered before the rule takes effect may misbehave based on incorrect isStreaming flag.

No.

New UT.

No.

Closes apache#43966 from HeartSaVioR/SPARK-46062.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(cherry picked from commit 4304663)
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants