Conversation

@ulysses-you
Contributor

What changes were proposed in this pull request?

Use runtime statistics to decide whether a join can be converted to a shuffled hash join.

Why are the changes needed?

Use AQE runtime statistics to decide whether we can use a shuffled hash join instead of a sort merge join. Currently, the formula for selecting shuffled hash join does not work because the number of shuffle partitions is dynamic.

Add a new config, spark.sql.adaptive.shuffledHashJoinLocalMapThreshold, to decide whether a join can be safely converted to a shuffled hash join.
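
As a rough illustration of the check described above, the following sketch treats the runtime statistics as a sequence of per-partition byte sizes and allows the conversion only when every partition fits under the threshold. The object name, method name, and sample sizes are hypothetical; only the config name and the 64MB default come from this PR, and the real patch may differ.

```scala
object LocalMapCheck {
  // Assumed default, mirroring the "64MB" default string proposed in this PR.
  val DefaultThresholdBytes: Long = 64L * 1024 * 1024

  /**
   * Returns true only if every shuffle partition is no larger than the
   * threshold, i.e. each task could build its local hash map safely.
   * (Hypothetical helper, not part of Spark.)
   */
  def canBuildLocalHashMap(
      partitionSizes: Seq[Long],
      thresholdBytes: Long = DefaultThresholdBytes): Boolean =
    partitionSizes.forall(_ <= thresholdBytes)
}
```

A single oversized partition is enough to reject the conversion, which is why the check uses the per-partition sizes rather than the total size divided by a fixed partition count.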

Does this PR introduce any user-facing change?

Yes, add a new config.

How was this patch tested?

Added a new test.

@github-actions github-actions bot added the SQL label May 6, 2021
@ulysses-you
Contributor Author

cc @maropu @cloud-fan @maryannxue @c21, do you have any thoughts on this new config?

@SparkQA

SparkQA commented May 6, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42713/

s"${PREFER_SORTMERGEJOIN.key} is false.")
.version("3.2.0")
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("64MB")
Contributor

Curious why we chose this default value. Is it meant to match spark.sql.adaptive.shuffle.targetPostShuffleInputSize?

Contributor Author

The main idea is that the default skew join size is 256MB and the local map should be 3x smaller than the other side (following the existing formula). So we assume the local map size is 64MB and the other side is 192MB.
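
One way to read this arithmetic, written out as a small sketch: if the two sides together add up to the 256MB skew join size and the other side is 3 times the local map, the local map gets one quarter of the total. The object and value names are hypothetical and only restate the numbers from the comment above.

```scala
object DefaultThresholdArithmetic {
  private val MB: Long = 1024L * 1024

  // Values taken from the comment: 256MB skew join size, factor of 3.
  val skewJoinSizeBytes: Long = 256 * MB
  val factor: Int = 3

  // If local + other == 256MB and other == 3 * local,
  // then local == 256MB / (3 + 1).
  val localMapBytes: Long = skewJoinSizeBytes / (factor + 1)
  val otherSideBytes: Long = localMapBytes * factor
}
```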

Comment on lines +60 to +61
isRuntime: Boolean = false,
mapOutputStatistics: Option[MapOutputStatistics] = None) {
Contributor

I feel it's a bit odd that Statistics has a MapOutputStatistics field: MapOutputStatistics belongs only to the physical shuffle operator, while Statistics applies to all logical operators. Maybe we can have:

RunTimeStatsSpec(
  isRuntime: Boolean,
  sizeInBytesPerPartition: Option[Array[Long]]
)

Statistics(
  runTimeStatsSpec: Option[RunTimeStatsSpec] = None,
  ...
)
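
For readers who want to try the shape suggested above, here is a compilable sketch. These are simplified, hypothetical stand-ins: the `Statistics` class below keeps only a `sizeInBytes` field plus the proposed optional spec and is not Spark's actual class, and `maxPartitionSize` is an illustrative helper that does not appear in the PR.

```scala
// Hypothetical, simplified stand-ins for the classes sketched in the review.
final case class RunTimeStatsSpec(
    isRuntime: Boolean,
    sizeInBytesPerPartition: Option[Array[Long]])

final case class Statistics(
    sizeInBytes: BigInt,
    runTimeStatsSpec: Option[RunTimeStatsSpec] = None)

object RunTimeStatsExample {
  // Largest partition size, if runtime per-partition sizes are present.
  def maxPartitionSize(stats: Statistics): Option[Long] =
    for {
      spec  <- stats.runTimeStatsSpec
      sizes <- spec.sizeInBytesPerPartition
      if sizes.nonEmpty
    } yield sizes.max
}
```

Keeping the runtime information behind a single `Option` means operators without shuffle output simply carry `None`, and no shuffle-specific type leaks into the logical statistics.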

Contributor Author

I had a similar thought, and the change you suggest seems like a better approach. What do you think, @maropu @cloud-fan?

Contributor

FYI, we took another approach to support SHJ in AQE. We added a rule in AdaptiveSparkPlanExec to convert SMJ to SHJ according to shuffle stats, which requires no changes in Statistics.scala because the statistics are already available in ShuffleStageInfo.

The SMJ could also be converted to SHJ if applicable even if PREFER_SORTMERGE is set. cc @Liulietong

cc @luuliietong
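
The rule-based alternative described in this comment could look roughly like the toy sketch below. It uses a made-up plan ADT rather than Spark's physical plan classes; `ShuffleStage`, `convert`, and the choice to try the right side as the build side first are all assumptions for illustration, and nothing here reflects the actual rule the commenter implemented.

```scala
object SmjToShjSketch {
  sealed trait Plan
  // A finished shuffle stage with its per-partition output sizes in bytes.
  final case class ShuffleStage(name: String, partitionSizes: Seq[Long]) extends Plan
  final case class SortMergeJoin(left: Plan, right: Plan) extends Plan
  final case class ShuffledHashJoin(left: Plan, right: Plan, buildLeft: Boolean) extends Plan

  // A side can be the build side only if all of its partitions fit the threshold.
  private def fits(plan: Plan, thresholdBytes: Long): Boolean = plan match {
    case ShuffleStage(_, sizes) => sizes.forall(_ <= thresholdBytes)
    case _                      => false // no runtime stats: stay conservative
  }

  // Rewrites sort merge joins bottom-up when one side is small enough.
  def convert(plan: Plan, thresholdBytes: Long): Plan = plan match {
    case SortMergeJoin(l, r) =>
      val left = convert(l, thresholdBytes)
      val right = convert(r, thresholdBytes)
      if (fits(right, thresholdBytes)) ShuffledHashJoin(left, right, buildLeft = false)
      else if (fits(left, thresholdBytes)) ShuffledHashJoin(left, right, buildLeft = true)
      else SortMergeJoin(left, right)
    case ShuffledHashJoin(l, r, buildLeft) =>
      ShuffledHashJoin(convert(l, thresholdBytes), convert(r, thresholdBytes), buildLeft)
    case other => other
  }
}
```

Because the decision reads sizes directly from the completed shuffle stages, this style of rule does not need any new field on the logical statistics.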

Contributor

We added a rule in AdaptiveSparkPlanExec to convert SMJ to SHJ according to shuffle stats

This looks like a better idea. Do you want to open a PR for it?

@SparkQA

SparkQA commented May 6, 2021

Test build #138192 has finished for PR 32450 at commit 0766487.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Note: this assume that the number of partition is fixed, requires additional work if it's
* dynamic.
* In AQE framework, we use runtime statistics to check if we can build local map. Only if
* all the partition size not large than `ADAPTIVE_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD`,
Member

size not large -> size is not larger

* dynamic.
* In AQE framework, we use runtime statistics to check if we can build local map. Only if
* all the partition size not large than `ADAPTIVE_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD`,
* we allow to build local hash map.
Member

build local -> build a local

@SparkQA

SparkQA commented May 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42772/

@SparkQA

SparkQA commented May 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42772/

@SparkQA

SparkQA commented May 7, 2021

Test build #138250 has finished for PR 32450 at commit a706472.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43140/

@SparkQA

SparkQA commented May 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43140/

@SparkQA

SparkQA commented May 17, 2021

Test build #138620 has finished for PR 32450 at commit e8283a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you ulysses-you deleted the SPARK-35282 branch November 22, 2021 12:28

6 participants