[SPARK-34807][SQL] Transpose Window nodes with Project between them #31980

Closed
wants to merge 9 commits into from

Conversation

tanelk
Contributor

@tanelk tanelk commented Mar 27, 2021

What changes were proposed in this pull request?

Extend the TransposeWindow rule to transpose Window nodes that have a Project between them.

Why are the changes needed?

The analyzer turns a dataset.withColumn("colName", expressionWithWindowFunction) call into a Project - Window - Project chain in the logical plan. When this method is called several times in a row, the Project nodes can block the current TransposeWindow rule from transposing the Window nodes.
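
To illustrate the plan shape described above, here is a minimal toy model in plain Python. It is a sketch, not Spark's implementation: the `Relation`, `Project`, and `Window` classes, the `transpose_through_project` helper, and the three conditions it checks are all assumptions made for illustration. It rewrites `Window - Project - Window` into `Project - Window - Window` when the upper window does not read the lower window's output, reads only columns available below the lower window, and partitions by a strict prefix (in any order) of the lower window's partition columns.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class Window:
    output: str                     # column produced by the window function
    references: Tuple[str, ...]     # input columns the window function reads
    partition_by: Tuple[str, ...]
    child: "Plan"


Plan = Union[Relation, Project, Window]


def output_columns(plan: Plan) -> Tuple[str, ...]:
    # A Window passes through its child's columns and appends its own output.
    if isinstance(plan, Window):
        return output_columns(plan.child) + (plan.output,)
    return plan.columns


def transpose_through_project(plan: Plan) -> Plan:
    """Hypothetical rewrite; returns the plan unchanged if it does not apply."""
    if not (isinstance(plan, Window)
            and isinstance(plan.child, Project)
            and isinstance(plan.child.child, Window)):
        return plan
    upper, project, lower = plan, plan.child, plan.child.child
    grandchild = lower.child

    # Assumed condition 1: the upper window must not read the lower window's output.
    independent = lower.output not in upper.references
    # Assumed condition 2: everything the upper window needs exists below the lower window.
    needed = set(upper.references) | set(upper.partition_by)
    available = needed <= set(output_columns(grandchild))
    # Assumed condition 3: the upper partition columns are a strict prefix
    # (in any order) of the lower partition columns.
    n = len(upper.partition_by)
    compatible = (n < len(lower.partition_by)
                  and set(upper.partition_by) == set(lower.partition_by[:n]))
    if not (independent and available and compatible):
        return plan

    # Move the upper window below the lower one and keep the projected columns on top.
    moved_upper = Window(upper.output, upper.references, upper.partition_by, grandchild)
    moved_lower = Window(lower.output, lower.references, lower.partition_by, moved_upper)
    return Project(project.columns + (upper.output,), moved_lower)


# Two windows separated by a Project, as chained withColumn calls might produce.
events = Relation(("a", "b", "v"))
lower = Window("sum_ab", ("v",), ("a", "b"), events)
between = Project(("a", "b", "v", "sum_ab"), lower)
upper = Window("sum_a", ("v",), ("a",), between)
rewritten = transpose_through_project(upper)

# An upper window that depends on the lower window's output is left alone.
dependent = Window("rank_of_sum", ("sum_ab",), ("a",), between)
untouched = transpose_through_project(dependent)
```

In this toy model the rewritten plan exposes the same columns in the same order as the original, so the change is invisible to anything reading the plan's output.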

TPCDS q47 and q57 also improve with this change.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

@github-actions github-actions bot added the SQL label Mar 27, 2021
@tanelk
Contributor Author

tanelk commented Mar 27, 2021

cc @wangyum

@SparkQA

SparkQA commented Mar 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41178/

@SparkQA

SparkQA commented Mar 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41178/

@SparkQA

SparkQA commented Mar 27, 2021

Test build #136594 has finished for PR 31980 at commit b9c38a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41187/

@SparkQA

SparkQA commented Mar 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41187/

@SparkQA

SparkQA commented Mar 28, 2021

Test build #136604 has finished for PR 31980 at commit bcf0b09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41198/

@SparkQA

SparkQA commented Mar 28, 2021

Test build #136616 has finished for PR 31980 at commit 8f15d15.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

This looks fine too, but it would be great to have a sign-off from @hvanhovell as well; he has much better insight than I have.

@SparkQA

SparkQA commented Apr 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41425/

@SparkQA

SparkQA commented Apr 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41425/

@SparkQA

SparkQA commented Apr 2, 2021

Test build #136847 has finished for PR 31980 at commit 194b5bf.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 3, 2021

Test build #136879 has finished for PR 31980 at commit 194b5bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 24, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43400/

@SparkQA

SparkQA commented May 24, 2021

Test build #138877 has finished for PR 31980 at commit faa17f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 11, 2021

Test build #139691 has started for PR 31980 at commit b0b64a3.

@SparkQA

SparkQA commented Jun 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44215/

@SparkQA

SparkQA commented Jun 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44215/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44411/

@SparkQA

SparkQA commented Jun 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44411/

@SparkQA

SparkQA commented Jun 16, 2021

Test build #139881 has finished for PR 31980 at commit 1b0553e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanelk
Contributor Author

tanelk commented Jun 23, 2021

@maropu
This is another slight improvement to an optimizer rule that has been stuck in the approved state for quite a while.
Could you take another look so that we can perhaps merge this?

@SparkQA

SparkQA commented Jun 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44751/

@SparkQA

SparkQA commented Jun 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44751/

@SparkQA

SparkQA commented Jun 24, 2021

Test build #140223 has finished for PR 31980 at commit 60c6643.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class MergedBlockMetaRequest extends AbstractMessage implements RequestMessage
  • public class MergedBlockMetaSuccess extends AbstractResponseMessage
  • public abstract class AbstractFetchShuffleBlocks extends BlockTransferMessage
  • public class FetchShuffleBlockChunks extends AbstractFetchShuffleBlocks
  • public class FetchShuffleBlocks extends AbstractFetchShuffleBlocks
  • case class ShuffleBlockInfo(shuffleId: Int, mapId: Long)
  • class AvroSchemaHelper(avroSchema: Schema, avroPath: Seq[String])
  • class DecimalOps(FractionalOps):
  • class IntegralExtensionOps(IntegralOps):
  • class FractionalExtensionOps(FractionalOps):
  • class StringExtensionOps(StringOps):
  • class GroupBy(Generic[T_Frame], metaclass=ABCMeta):
  • class DataFrameGroupBy(GroupBy[DataFrame]):
  • class SeriesGroupBy(GroupBy[Series]):
  • class SparkIndexOpsMethods(Generic[T_IndexOps], metaclass=ABCMeta):
  • class SparkSeriesMethods(SparkIndexOpsMethods["ps.Series"]):
  • class SparkIndexMethods(SparkIndexOpsMethods["ps.Index"]):
  • case class TempResolvedColumn(child: Expression, nameParts: Seq[String]) extends UnaryExpression
  • case class Cast(
  • class ExpressionContainmentOrdering extends Ordering[Expression]
  • case class GetTimestampWithoutTZ(
  • case class ParseToTimestampWithoutTZ(
  • case class MakeDTInterval(
  • case class MakeYMInterval(years: Expression, months: Expression)
  • case class DayTimeIntervalType(startField: Byte, endField: Byte) extends AtomicType
  • case class YearMonthIntervalType(startField: Byte, endField: Byte) extends AtomicType
  • final class ParquetReadState
  • public class ParquetVectorUpdaterFactory
  • case class CommandResult(
  • case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric
  • case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric
  • case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric
  • trait TestGroupState[S] extends GroupState[S]

@wangyum wangyum closed this in b3a2ceb Jun 24, 2021
@wangyum
Member

wangyum commented Jun 24, 2021

Merged to master.

wangyum pushed a commit that referenced this pull request May 26, 2023
…995)

Extend the `TransposeWindow` rule to transpose `Window` nodes that have a `Project` between them.

The analyzer turns a `dataset.withColumn("colName", expressionWithWindowFunction)` call into a `Project - Window - Project` chain in the logical plan. When this method is called several times in a row, the `Project` nodes can block the current `TransposeWindow` rule from transposing the `Window` nodes.

TPCDS q47 and q57 are also improved by this.

No

UT

Closes #31980 from tanelk/SPARK-34807_transpose_window.

Lead-authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Co-authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>