
@zwangsheng
Contributor

@zwangsheng zwangsheng commented Sep 22, 2023

Why are the changes needed?

Fail fast if there are too many unsupported operators in the Spark plan when Gluten is enabled.

Too many unsupported operators with Gluten enabled may significantly reduce performance, so we want to notify the user by failing the application fast.

If the user can tolerate this kind of performance penalty, they can increase the value of spark.sql.gluten.fallbackOperatorThreshold.
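
For illustration only, the threshold check described above can be sketched without Spark. `PlanNode`, `FastFailSketch`, and the contents of `unsupported` are hypothetical stand-ins (not the classes in this patch), and failing when the count is strictly greater than the threshold is an assumption:

```scala
// Simplified stand-in for a physical plan node; this is NOT Spark's SparkPlan.
final case class PlanNode(name: String, children: Seq[PlanNode] = Nil) {
  // Pre-order traversal that keeps every result for which pf is defined.
  def collect[B](pf: PartialFunction[PlanNode, B]): Seq[B] =
    pf.lift(this).toList ++ children.flatMap(_.collect(pf))
}

object FastFailSketch {
  // Hypothetical set of operator names treated as unsupported.
  val unsupported: Set[String] = Set("BroadcastNestedLoopJoinExec")

  // Number of nodes in the plan whose operator is in the unsupported set.
  def countUnsupported(plan: PlanNode): Int =
    plan.collect { case p if unsupported.contains(p.name) => p }.size

  // Returns the plan unchanged, or throws when the count exceeds the threshold
  // (assumption: strictly greater than).
  def check(plan: PlanNode, threshold: Int): PlanNode = {
    val count = countUnsupported(plan)
    if (count > threshold) {
      throw new IllegalStateException(
        s"Found $count unsupported operators, threshold is $threshold")
    }
    plan
  }
}
```

With a threshold of `Int.MaxValue` the check in this sketch can never fail, so the rule becomes a no-op.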

See more Gluten unsupported operators in Velox Backend Support Progress.

Note:

The Gluten-related extension is experimental and under rapid development. This configuration is added to let users control the extension; it is not intended to be exposed to end users and may be removed at any time.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

Was this patch authored or co-authored using generative AI tooling?

No

@codecov-commenter

codecov-commenter commented Sep 24, 2023

Codecov Report

Merging #5321 (8a40a9d) into master (ffebc64) will not change coverage.
Report is 16 commits behind head on master.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #5321   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         590     588    -2     
  Lines       33436   33399   -37     
  Branches     4422    4387   -35     
======================================
+ Misses      33436   33399   -37     

see 11 files with indirect coverage changes


@zwangsheng zwangsheng marked this pull request as ready for review September 25, 2023 07:25
@zwangsheng
Contributor Author

cc @ulysses-you

@zwangsheng zwangsheng changed the title [WIP][NOT MERGE][EXTENSION] Inject Rule to fast fail gluten app if there are too much un-support operator [WIP][EXTENSION] Inject Rule to fast fail gluten app if there are too much un-support operator Sep 25, 2023
_: Bin |
_: Contains |
_: StartsWith |
_: EndsWith |
Contributor

it seems Contains, StartsWith, EndsWith are supported

Contributor Author

      "intended exposing to end users, it may be removed in anytime.")
    .version("1.8.0")
    .intConf
    .createWithDefault(5)
Contributor

let's use Int.MaxValue to avoid breaking something

override def apply(plan: SparkPlan): SparkPlan = {
  val count = plan.collect {
    case p: FileSourceScanExec
        if !p.relation.fileFormat.isInstanceOf[ParquetFileFormat] =>
Contributor

skip orc file format

_: BroadcastNestedLoopJoinExec =>
true
case p: SparkPlan
if p.expressions.exists(e =>
Contributor

          if p.expressions.exists(_.exists {
            ...
          })
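
As a rough illustration of the suggestion above, a tree-level `exists` lets the check over each expression tree be written as one nested pattern match. The `Expr` hierarchy here is a toy model, not Spark's `Expression`; `Leaf` and `Upper` are hypothetical, and `Bin` only borrows its name from the operator list in the diff:

```scala
// Toy expression tree; a stand-in for Spark's Expression/TreeNode.
sealed trait Expr {
  def children: Seq[Expr]
  // True if this node or any descendant satisfies f.
  def exists(f: Expr => Boolean): Boolean =
    f(this) || children.exists(_.exists(f))
}
final case class Leaf(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class Bin(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
}
final case class Upper(child: Expr) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

object ExistsSketch {
  // Outer exists walks the expression list; inner exists walks each tree.
  def hasUnsupported(expressions: Seq[Expr]): Boolean =
    expressions.exists(_.exists {
      case _: Bin => true
      case _ => false
    })
}
```

The outer `exists` is the ordinary `Seq` method, while the inner one is the tree traversal, so an unsupported expression is found even when it is nested below the top-level expression.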

@zwangsheng zwangsheng requested a review from ulysses-you October 7, 2023 10:03
.createWithDefault(Int.MaxValue)

val GLUTEN_NON_SUPPORT_OPERATOR_LIST =
  buildConf("spark.sql.gluten.nonSupportOperatorList")
Member

@pan3793 pan3793 Oct 8, 2023

how about naming spark.sql.gluten.fastFallbackOperators? and as an open source project, we'd better make the default value match a released version instead of the current main branch of Gluten

Contributor

+1, let's remove the default value and mark it as optional.

}

object GlutenPlanAnalysis extends Rule[SparkPlan] {
  private val nonSupportedOperatorList = conf.getConf(GLUTEN_NON_SUPPORT_OPERATOR_LIST)
Contributor

please move this inside apply, so we can change this config at runtime
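
A minimal sketch of why this matters, using a plain mutable map as a hypothetical stand-in for the session conf (the key `"operators"` and both rule objects are made up for illustration): a value read in the object body is captured once at initialization, while a value read inside `apply` reflects later changes.

```scala
import scala.collection.mutable

object RuntimeConfSketch {
  // Hypothetical mutable config; stands in for the real session conf.
  val conf: mutable.Map[String, String] = mutable.Map("operators" -> "A,B")

  // Reads the config once, when the object is first initialized.
  object EagerRule {
    private val operators: List[String] = conf("operators").split(",").toList
    def apply(): List[String] = operators
  }

  // Reads the config on every call, so runtime changes are picked up.
  object PerCallRule {
    def apply(): List[String] = conf("operators").split(",").toList
  }
}
```

In this sketch, once `EagerRule` has been initialized, updating `conf("operators")` changes what `PerCallRule()` returns but not what `EagerRule()` returns.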

@zwangsheng
Contributor Author

Thanks @ulysses-you @pan3793, I made some changes:

  1. Renamed the fallback operator config key
  2. Made the fallback operator config optional
  3. Made the fallback operators changeable at runtime (they are read on each call to the apply method)
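
As a hedged sketch of what an optional operator list could look like (the comma-separated format, `OptionalConfSketch`, and both helper names are assumptions, not the code in this PR): when the config is unset, no operator is treated as a fallback operator.

```scala
object OptionalConfSketch {
  // Parses a comma-separated operator list; None means the config is unset.
  def parse(raw: Option[String]): Option[Set[String]] =
    raw.map(_.split(",").map(_.trim).filter(_.nonEmpty).toSet)

  // With no configured list, nothing counts as a fallback operator.
  def isFallback(operator: String, configured: Option[Set[String]]): Boolean =
    configured.exists(_.contains(operator))
}
```

Keeping the value as an `Option` avoids shipping a default list that is tied to one particular Gluten version.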

@zwangsheng zwangsheng closed this by deleting the head repository Mar 5, 2024