[VL] Make bloom_filter_agg fall back when might_contain is not transformable #3917

Merged: 5 commits, Dec 6, 2023

Conversation

zhli1142015
Contributor

@zhli1142015 zhli1142015 commented Dec 4, 2023

What changes were proposed in this pull request?

Make bloom_filter_agg fall back when might_contain falls back, either because might_contain itself is not transformable or because another expression in the same operator is not. This fixes the error below:

java.io.IOException: Unexpected Bloom filter version number (16777217)
  at org.apache.spark.util.sketch.BloomFilterImpl.readFrom0(BloomFilterImpl.java:256)
  at org.apache.spark.util.sketch.BloomFilterImpl.readFrom(BloomFilterImpl.java:265)
  at org.apache.spark.util.sketch.BloomFilter.readFrom(BloomFilter.java:178)
  at org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain.deserialize(BloomFilterMightContain.scala:111)
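
To make the number in this message easier to read: Spark's `BloomFilterImpl.readFrom0` reads the version as a big-endian `int` and expects 1. The reported value 16777217 is 0x01000001, so the buffer started with the bytes 01 00 00 01 rather than 00 00 00 01. That is consistent with the buffer having been written by a different serializer than the one reading it, which is an inference from this PR's description and not something the trace itself proves. A minimal sketch of the arithmetic, where `BloomFilterVersionCheck` and `leadingInt` are illustrative names only:

```scala
import java.io.{ByteArrayInputStream, DataInputStream}

object BloomFilterVersionCheck {
  // Version number reported in the stack trace above.
  val reported: Int = 16777217

  // Reads the first four bytes of a buffer as a big-endian int,
  // the same way java.io.DataInputStream.readInt does.
  def leadingInt(bytes: Array[Byte]): Int =
    new DataInputStream(new ByteArrayInputStream(bytes)).readInt()

  def main(args: Array[String]): Unit = {
    // Hexadecimal form of the reported value.
    println(f"0x$reported%08X")
    // A buffer starting with 01 00 00 01 decodes to the reported value.
    println(leadingInt(Array[Byte](1, 0, 0, 1)) == reported)
    // A buffer starting with 00 00 00 01 decodes to version 1.
    println(leadingInt(Array[Byte](0, 0, 0, 1)))
  }
}
```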

How was this patch tested?

Unit tests.


github-actions bot commented Dec 4, 2023

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:


github-actions bot commented Dec 4, 2023

Run Gluten Clickhouse CI

@zhli1142015 zhli1142015 marked this pull request as ready for review December 4, 2023 23:28
@zhli1142015 zhli1142015 changed the title from "[WIP][VL] fallback bloom_filter_agg when might_contain is fallbacked" to "[VL] fallback bloom_filter_agg when might_contain is fallbacked" Dec 4, 2023
@zhli1142015
Contributor Author

@JkSelf, @PHILO-HE, @jinchengchenghh, could you help review this?
Thanks.

Contributor

@PHILO-HE PHILO-HE left a comment


Thanks for your work! Just some suggestions.

@@ -714,6 +714,13 @@ case class AddTransformHintRule() extends Rule[SparkPlan] {
s"${e.getMessage}, original sparkplan is " +
s"${plan.getClass}(${plan.children.toList.map(_.getClass)})")
}

if (TransformHints.isAlreadyTagged(plan) && TransformHints.isNotTransformable(plan)) {
Contributor


Maybe we can move this part into a dedicated rule that runs after AddTransformHintRule (see link).

Contributor Author


Thanks, moved to a separate rule.

@@ -130,6 +130,33 @@ class Spark34Shims extends SparkShims {
@transient locations: Array[String] = Array.empty): PartitionedFile =
PartitionedFile(partitionValues, SparkPath.fromPathString(filePath), start, length, locations)

override def handleBloomFilterFallback(plan: SparkPlan)(fun: SparkPlan => Unit): Unit = {
Contributor


This method looks duplicated across the shims. If so, maybe we can move it into a common place and keep only the check for BloomFilterAggregate in the shims, since I noticed that class doesn't exist in Spark 3.2.

Spark 3.3/3.4:

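def hasBloomFilterAggregate(agg: ObjectHashAggregateExec): Boolean = {
  // The parameter type is an assumption; the original sketch omitted it.
  agg.aggregateExpressions.exists(_.aggregateFunction.isInstanceOf[BloomFilterAggregate])
}

Spark 3.2:

def hasBloomFilterAggregate(agg: ObjectHashAggregateExec): Boolean = {
  false
}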

Contributor Author


Updated, thanks.
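
The shim split suggested in this thread can be illustrated with a self-contained sketch. Every type below (`AggExec`, `AggregateExpression`, `Shims`, and the two shim objects) is a hypothetical stand-in so that the example compiles without Spark; it is not the Gluten or Spark API. The idea is that shared code calls one method, while each Spark version decides whether the `BloomFilterAggregate` class can even be referenced.

```scala
object ShimSketch {
  // Hypothetical stand-ins for Spark classes, so the sketch compiles on its own.
  trait AggregateFunction
  final class BloomFilterAggregate extends AggregateFunction
  final class Sum extends AggregateFunction
  final case class AggregateExpression(aggregateFunction: AggregateFunction)
  final case class AggExec(aggregateExpressions: Seq[AggregateExpression])

  // Common interface that version-independent code depends on.
  trait Shims {
    def hasBloomFilterAggregate(agg: AggExec): Boolean
  }

  // A Spark 3.2-style shim: per the review comment, the class does not exist
  // there, so the answer is always false and the class is never referenced.
  object Spark32LikeShims extends Shims {
    def hasBloomFilterAggregate(agg: AggExec): Boolean = false
  }

  // A Spark 3.3/3.4-style shim: inspect the aggregate expressions.
  object Spark33LikeShims extends Shims {
    def hasBloomFilterAggregate(agg: AggExec): Boolean =
      agg.aggregateExpressions.exists(_.aggregateFunction.isInstanceOf[BloomFilterAggregate])
  }
}
```

With this shape, the fallback logic itself lives in one common place and only the type check varies per Spark version.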

@PHILO-HE PHILO-HE changed the title from "[VL] fallback bloom_filter_agg when might_contain is fallbacked" to "[VL] Make bloom_filter_agg fall back when might_contain is not transformable" Dec 5, 2023
@PHILO-HE
Contributor

PHILO-HE commented Dec 5, 2023

Also note this doc: https://github.com/oap-project/gluten/blob/main/docs/velox-backend-limitations.md#runtime-bloomfilter. With this fix, we need to document this behavior in the fallback section. Thanks!


@zhli1142015
Contributor Author

Also note this doc: https://github.com/oap-project/gluten/blob/main/docs/velox-backend-limitations.md#runtime-bloomfilter. With this fix, we need to document this behavior in the fallback section. Thanks!

Thanks, updated the doc.


* need fall back related bloom filter agg.
*/
case class FallbackBloomFilterAggIfNeeded() extends Rule[SparkPlan] {
override def apply(plan: SparkPlan): SparkPlan = plan.transformDown {
Contributor


Looks good! Maybe we can skip the handling and just return the original plan if spark.gluten.sql.native.bloomFilter=false, right?

Contributor Author


Makes sense, thanks.
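
The behavior discussed in this thread can be sketched on a toy plan tree. This is a simplified model under stated assumptions, not the merged `FallbackBloomFilterAggIfNeeded` code: `Node`, its fields, and the `nativeBloomFilterEnabled` flag (standing in for `spark.gluten.sql.native.bloomFilter`) are all hypothetical. The sketch returns the plan untouched when the native bloom filter is disabled; otherwise, when an operator that uses might_contain is not transformable, it marks the bloom-filter aggregate that feeds it as not transformable too.

```scala
object FallbackSketch {
  // Hypothetical, simplified plan node; this is not Spark's SparkPlan.
  final case class Node(
      name: String,
      children: Seq[Node] = Nil,
      transformable: Boolean = true,
      usesMightContain: Boolean = false,
      hasBloomFilterAgg: Boolean = false,
      // Plan that builds the bloom filter consumed by might_contain, if any.
      bloomFilterSubquery: Option[Node] = None)

  // Marks every bloom-filter aggregate in the given subtree as not transformable.
  private def markAggFallback(n: Node): Node = {
    val self = if (n.hasBloomFilterAgg) n.copy(transformable = false) else n
    self.copy(children = self.children.map(markAggFallback))
  }

  // Top-down rewrite: a fallen-back might_contain drags its aggregate along.
  private def rewrite(n: Node): Node = {
    val sub =
      if (n.usesMightContain && !n.transformable) n.bloomFilterSubquery.map(markAggFallback)
      else n.bloomFilterSubquery.map(rewrite)
    n.copy(children = n.children.map(rewrite), bloomFilterSubquery = sub)
  }

  // Entry point: skip all handling when the native bloom filter is disabled.
  def apply(plan: Node, nativeBloomFilterEnabled: Boolean): Node =
    if (!nativeBloomFilterEnabled) plan else rewrite(plan)
}
```

Keeping the producer and the consumer of the filter on the same side is the point of the fix, since the error in the PR description arises when one side reads a buffer the other side wrote.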



Contributor

@PHILO-HE PHILO-HE left a comment


Looks good! Thanks!

Contributor

@JkSelf JkSelf left a comment


LGTM. Thanks @zhli1142015 @PHILO-HE .

@JkSelf JkSelf merged commit d7d8e28 into apache:main Dec 6, 2023
17 checks passed
@ulysses-you
Contributor

Hi all, there is a conflict between this PR and #3843. I sent #3940 to fix it.

loneylee pushed a commit to loneylee/gluten that referenced this pull request Dec 8, 2023
loneylee added a commit to loneylee/gluten that referenced this pull request Dec 8, 2023
loneylee added a commit to loneylee/gluten that referenced this pull request Dec 8, 2023
baibaichen pushed a commit that referenced this pull request Dec 8, 2023
…t transformable (#3917)" (#3977)

* Revert "[VL] Make bloom_filter_agg fall back when might_contain is not transformable (#3917)"

This reverts commit d7d8e28.

* Revert "[GLUTEN-3917][FOLLOWUP] Add back SparkShimLoader import (#3940)"

This reverts commit 81bb6c9.