
[VL] Support C2R and R2C between broadcast relations #4544

Merged
merged 13 commits into from
Jan 31, 2024

Conversation

zhztheplayer
Member

@zhztheplayer zhztheplayer commented Jan 26, 2024

This patch removes the restriction that broadcast hash join and broadcast exchange must be enabled and validated together. spark.gluten.sql.columnar.broadcastExchange and spark.gluten.sql.columnar.broadcastJoin can then be turned on/off individually.

C2Rs / R2Cs will work as expected to convert between vanilla Spark's broadcast relation and Gluten's (Velox, as of now) broadcast relation.
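
For illustration, a minimal sketch of toggling the two settings independently after this patch. This is not code from the PR; it assumes the Gluten plugin class io.glutenproject.GlutenPlugin is on the classpath and that the two flags can be set through the session builder, as most Gluten SQL flags can.

import org.apache.spark.sql.SparkSession

// Illustrative sketch only: keep columnar broadcast exchange on while falling back to
// vanilla Spark's broadcast hash join.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("broadcast-toggle-sketch")
  .config("spark.plugins", "io.glutenproject.GlutenPlugin")
  .config("spark.gluten.sql.columnar.broadcastExchange", "true")
  .config("spark.gluten.sql.columnar.broadcastJoin", "false")
  .getOrCreate()
// With the patch, a vanilla BroadcastHashJoinExec can consume the columnar broadcast
// relation through a C2R conversion, and R2C covers the opposite combination.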

With this change, the rules that validate broadcast exchange and broadcast join coherently are removed from the core module. They will remain in the backends-clickhouse module until we implement this feature for the CH backend.

Required for #4533


Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


Run Gluten Clickhouse CI

Comment on lines 1 to 209
            case bhj: BroadcastHashJoinExec =>
              // FIXME Hongze: In following codes we perform a lot of if-else conditions to
              // make sure the broadcast exchange and broadcast hash-join are of same type,
              // either vanilla or columnar. In order to simplify the codes we have to do
              // some tricks around C2R and R2C to make them adapt to columnar broadcast.
              // Currently their doBroadcast() methods just propagate child's broadcast
              // payloads which is not right in speaking of columnar.
              if (!enableColumnarBroadcastJoin) {
                TransformHints.tagNotTransformable(
                  bhj,
                  "columnar BroadcastJoin is not enabled in BroadcastHashJoinExec")
              } else {
                val isBhjTransformable: ValidationResult = {
                  val transformer = BackendsApiManager.getSparkPlanExecApiInstance
                    .genBroadcastHashJoinExecTransformer(
                      bhj.leftKeys,
                      bhj.rightKeys,
                      bhj.joinType,
                      bhj.buildSide,
                      bhj.condition,
                      bhj.left,
                      bhj.right,
                      isNullAwareAntiJoin = bhj.isNullAwareAntiJoin)
                  transformer.doValidate()
                }
                val buildSidePlan = bhj.buildSide match {
                  case BuildLeft => bhj.left
                  case BuildRight => bhj.right
                }

                val maybeExchange = buildSidePlan
                  .find {
                    case BroadcastExchangeExec(_, _) => true
                    case _ => false
                  }
                  .map(_.asInstanceOf[BroadcastExchangeExec])

                maybeExchange match {
                  case Some(exchange @ BroadcastExchangeExec(mode, child)) =>
                    TransformHints.tag(bhj, isBhjTransformable.toTransformHint)
                    if (!isBhjTransformable.isValid) {
                      TransformHints.tagNotTransformable(exchange, isBhjTransformable)
                    }
                  case None =>
                    // we are in AQE, find the hidden exchange
                    // FIXME did we consider the case that AQE: OFF && Reuse: ON ?
                    var maybeHiddenExchange: Option[BroadcastExchangeLike] = None
                    breakable {
                      buildSidePlan.foreach {
                        case e: BroadcastExchangeLike =>
                          maybeHiddenExchange = Some(e)
                          break
                        case t: BroadcastQueryStageExec =>
                          t.plan.foreach {
                            case e2: BroadcastExchangeLike =>
                              maybeHiddenExchange = Some(e2)
                              break
                            case r: ReusedExchangeExec =>
                              r.child match {
                                case e2: BroadcastExchangeLike =>
                                  maybeHiddenExchange = Some(e2)
                                  break
                                case _ =>
                              }
                            case _ =>
                          }
                        case _ =>
                      }
                    }
                    // restriction to force the hidden exchange to be found
                    val exchange = maybeHiddenExchange.get
                    // to conform to the underlying exchange's type, columnar or vanilla
                    exchange match {
                      case BroadcastExchangeExec(mode, child) =>
                        TransformHints.tagNotTransformable(
                          bhj,
                          "it's a materialized broadcast exchange or reused broadcast exchange")
                      case ColumnarBroadcastExchangeExec(mode, child) =>
                        if (!isBhjTransformable.isValid) {
                          throw new IllegalStateException(
                            s"BroadcastExchange has already been" +
                              s" transformed to columnar version but BHJ is determined as" +
                              s" non-transformable: ${bhj.toString()}")
                        }
                        TransformHints.tagTransformable(bhj)
                    }
                }
              }
          }
        } catch {
          case e: UnsupportedOperationException =>
            TransformHints.tagNotTransformable(
              p,
              s"${e.getMessage}, original Spark plan is " +
                s"${p.getClass}(${p.children.toList.map(_.getClass)})")
        }
    }
    plan
  }
}
Member Author

Code in this file is moved from gluten-core to backends-clickhouse.

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title WIP: [VL] Support C2R and R2C between broadcast relations [VL] Support C2R and R2C between broadcast relations Jan 30, 2024
@zhztheplayer zhztheplayer marked this pull request as ready for review January 30, 2024 01:40

Run Gluten Clickhouse CI

[VL] Support Gluten BHJ + Vanilla BE (broadcast exchange), Vanilla BHJ + Gluten BE

Run Gluten Clickhouse CI

@@ -1005,6 +1005,7 @@ class VeloxTestSettings extends BackendTestSettings {
"SPARK-9083: sort with non-deterministic expressions"
)
enableSuite[GlutenDataFrameTimeWindowingSuite]
.exclude("time window joins") // FIXME hongze
Member Author

The test fails with:

Input schema contains unsupported type when convert row to columnar for StructType(StructField(window,StructType(StructField(start,TimestampNTZType,true),StructField(end,TimestampNTZType,true)),false),StructField(othervalue,IntegerType,false)) due to do not support data type: TimestampNTZType
java.lang.UnsupportedOperationException: Input schema contains unsupported type when convert row to columnar for StructType(StructField(window,StructType(StructField(start,TimestampNTZType,true),StructField(end,TimestampNTZType,true)),false),StructField(othervalue,IntegerType,false)) due to do not support data type: TimestampNTZType
	at io.glutenproject.execution.RowToVeloxColumnarExec.$anonfun$doExecuteColumnarInternal$1(RowToVeloxColumnarExec.scala:52)
	at scala.Option.foreach(Option.scala:407)
	at io.glutenproject.execution.RowToVeloxColumnarExec.doExecuteColumnarInternal(RowToVeloxColumnarExec.scala:49)
	at io.glutenproject.execution.RowToColumnarExecBase.doExecuteColumnar(RowToColumnarExecBase.scala:62)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:221)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
	at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:217)
	at io.glutenproject.backendsapi.velox.SparkPlanExecApiImpl.createBroadcastRelation(SparkPlanExecApiImpl.scala:332)
	at org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$2(ColumnarBroadcastExchangeExec.scala:79)
	at io.glutenproject.utils.Arm$.withResource(Arm.scala:25)
	at io.glutenproject.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
	at org.apache.spark.sql.execution.ColumnarBroadcastExchangeExec.$anonfun$relationFuture$1(ColumnarBroadcastExchangeExec.scala:69)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Will fix in a later patch.

Run Gluten Clickhouse CI

@zhztheplayer
Member Author

@marin-ma @zhouyuan

@zhztheplayer
Member Author

cc @Surbhi-Vijay

@zhztheplayer
Member Author

@zzcclp Please review the CH changes, thanks.

Run Gluten Clickhouse CI

Contributor

@Surbhi-Vijay Surbhi-Vijay left a comment

@zhztheplayer the changes look good to me. Added some queries and minor suggestions.
Thanks.

  • Surbhi

@@ -152,6 +157,20 @@ case class ShuffledHashJoinExecTransformer(
copy(left = newLeft, right = newRight)
}

case class VeloxBroadcastBuildSideRDD(
Contributor

This case class is not specific to hash join and can be moved to a more generic place or to a new file.

Member Author

I'm moving it out of this file, since you think it can be reused for BNLJ.

    from: Broadcast[F],
    fn: Iterator[InternalRow] => Iterator[ColumnarBatch]): Broadcast[T] = {
  // HashedRelation to ColumnarBuildSideRelation.
  val fromBroadcast = from.asInstanceOf[Broadcast[HashedRelation]]
Contributor

Can you please add a check here for BroadcastMode? HashedRelation will only be present in the case of HashedRelationBroadcastMode. It will be Array[InternalRow] in the case of IdentityBroadcastMode.
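
For context, a minimal sketch of the suggested check. This is not the PR's exact code; the helper name is a placeholder, and it assumes the code lives in a package with access to Spark's internal joins classes (for example under org.apache.spark.sql.execution, where the PR's conversion utilities sit).

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.catalyst.plans.physical.{BroadcastMode, IdentityBroadcastMode}
import org.apache.spark.sql.execution.joins.{HashedRelation, HashedRelationBroadcastMode}

// Hypothetical guard: only cast the broadcast payload when the mode guarantees a HashedRelation.
def asHashedRelationBroadcast(
    mode: BroadcastMode,
    from: Broadcast[Any]): Broadcast[HashedRelation] = mode match {
  case HashedRelationBroadcastMode(_, _) =>
    // Vanilla Spark broadcasts a HashedRelation under this mode, so the cast is safe.
    from.asInstanceOf[Broadcast[HashedRelation]]
  case IdentityBroadcastMode =>
    // The payload would be Array[InternalRow] here; this conversion path does not handle it.
    throw new UnsupportedOperationException(
      "IdentityBroadcastMode is not supported by the columnar broadcast conversion")
}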

Contributor

Just checked your comment: "FIXME: Add checking for broadcast mode".

Member Author

Sure. I will add some checks for the mode.

case IdentityBroadcastMode =>
  throw new IllegalStateException("Unreachable code")
case HashedRelationBroadcastMode(_, _) =>
  val serialized: Array[ColumnarBatchSerializeResult] = child
Contributor

There is no hash table created in Gluten for the broadcast relation. IdentityBroadcastMode and HashedRelationBroadcastMode are effectively the same in Gluten's case.

Please check the below comment in ColumnarBroadcastExchangeExec:

// this created relation ignore HashedRelationBroadcastMode isNullAware, because we
// cannot get child output rows, then compare the hash key is null, if not null,
// compare the isNullAware, so gluten will not generate HashedRelationWithAllNullKeys
// or EmptyHashedRelation, this difference will cause performance regression in some
// cases.

This match block for the mode can be removed safely when BNLJ is supported. Please correct me if my understanding is wrong.

Member Author

The assumption sounds correct to me. Currently Gluten doesn't handle IdentityBroadcastMode at all, so it's reasonable to throw when it sees one. If BNLJ requires the same code as BHJ in this block, then the IdentityBroadcastMode case can be removed.

.filter(_.numRows() != 0)
.map(
  b => {
    ColumnarBatches.retain(b)
Contributor

Can there be out-of-memory scenarios? The original broadcast will still be in heap memory, and additional memory will be allocated until the serialization is finished.

Member Author

If I understand correctly, you are considering the amplified memory consumption from the original hash relation + converted columnar batches + serialized byte arrays?

If so, it could become a problem when the broadcast threshold is set to a large value. We can optimize it in future development iterations, with some benchmarks.

super.sparkConf
  .set("spark.sql.sources.useV1SourceList", "parquet")
  .set("spark.sql.autoBroadcastJoinThreshold", "30M")
  .set("spark.gluten.sql.columnar.broadcastJoin", "false")
Contributor

Should there be one more test case where both configs, broadcastJoin and broadcastExchange, are enabled?

Member Author

@zhztheplayer zhztheplayer Jan 31, 2024

The other existing suites in the same file should already cover the case you mentioned.
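
For reference, the combination being asked about, written out as a plain SparkConf. This is illustrative only and not code from the PR; the real suites override sparkConf on their test base class, as in the excerpt above.

import org.apache.spark.SparkConf

// Illustrative only: both Gluten broadcast flags enabled, alongside the settings used
// by the suite excerpt above.
val bothEnabledConf = new SparkConf()
  .set("spark.sql.sources.useV1SourceList", "parquet")
  .set("spark.sql.autoBroadcastJoinThreshold", "30M")
  .set("spark.gluten.sql.columnar.broadcastExchange", "true")
  .set("spark.gluten.sql.columnar.broadcastJoin", "true")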

@zzcclp
Contributor

zzcclp commented Jan 31, 2024

LGTM, we will support this feature for the CH backend later.

Run Gluten Clickhouse CI

Contributor

@PHILO-HE PHILO-HE left a comment

Thanks!

@zhztheplayer
Member Author

/Benchmark Velox

@zhztheplayer
Member Author

/Benchmark Velox

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4544_time.csv log/native_master_01_30_2024_ec7f10c58_time.csv difference percentage
q1 35.47 35.36 -0.111 99.69%
q2 24.11 24.10 -0.012 99.95%
q3 36.49 38.51 2.018 105.53%
q4 38.23 38.16 -0.068 99.82%
q5 70.11 68.29 -1.823 97.40%
q6 5.51 7.27 1.755 131.83%
q7 84.38 82.16 -2.218 97.37%
q8 84.82 84.80 -0.014 99.98%
q9 124.22 126.41 2.182 101.76%
q10 44.72 44.51 -0.213 99.52%
q11 20.65 20.56 -0.088 99.57%
q12 29.70 28.18 -1.523 94.87%
q13 45.60 45.37 -0.229 99.50%
q14 20.13 16.65 -3.477 82.72%
q15 27.19 28.38 1.183 104.35%
q16 14.39 14.19 -0.203 98.59%
q17 100.79 101.24 0.446 100.44%
q18 148.27 149.10 0.831 100.56%
q19 13.89 12.63 -1.262 90.92%
q20 26.50 26.37 -0.135 99.49%
q21 225.71 226.96 1.251 100.55%
q22 13.51 13.51 -0.006 99.96%
total 1234.42 1232.71 -1.715 99.86%

@zhztheplayer zhztheplayer merged commit adedf3d into apache:main Jan 31, 2024
16 of 20 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_4544_time.csv log/native_master_01_30_2024_ec7f10c58_time.csv difference percentage
q1 33.18 35.36 2.187 106.59%
q2 24.43 24.10 -0.331 98.64%
q3 37.86 38.51 0.652 101.72%
q4 38.14 38.16 0.023 100.06%
q5 69.74 68.29 -1.450 97.92%
q6 5.39 7.27 1.880 134.89%
q7 84.03 82.16 -1.876 97.77%
q8 85.49 84.80 -0.692 99.19%
q9 123.07 126.41 3.337 102.71%
q10 42.47 44.51 2.039 104.80%
q11 20.82 20.56 -0.252 98.79%
q12 25.50 28.18 2.677 110.49%
q13 45.31 45.37 0.067 100.15%
q14 21.04 16.65 -4.389 79.14%
q15 28.33 28.38 0.048 100.17%
q16 16.25 14.19 -2.054 87.36%
q17 102.30 101.24 -1.064 98.96%
q18 149.17 149.10 -0.068 99.95%
q19 12.60 12.63 0.033 100.27%
q20 25.93 26.37 0.443 101.71%
q21 224.45 226.96 2.517 101.12%
q22 13.82 13.51 -0.315 97.72%
total 1229.30 1232.71 3.410 100.28%
