[SPARK-25379][SQL] Improve AttributeSet and ColumnPruning performance #22364
Conversation
Test build #95828 has finished for PR 22364 at commit
```scala
new AttributeSet(
  baseSet
    .flatMap(_.references)
    .map(new AttributeEquals(_)).toSet)
```
hi, good catch. One question here: is it bad to simply write `new AttributeSet(baseSet.map(_.references.baseSet).foldLeft(Set.empty[AttributeEquals])(_ ++ _))`?
I think this is just duplicating the code in `fromAttributeSets`, right? What would be the benefit of doing that?
Isn't the simple one better? maropu@cba767e
ok, so there are 2 reasons why the current one is better than the proposed one (a simplified sketch follows the list):
- in your proposal we are calling `_.references` also in cases when it is not needed. This causes unneeded operations (basically https://github.com/apache/spark/pull/22364/files#diff-75576f0ec7f9d8b5032000245217d233R40 is called for each attribute; see also `Attribute.references`);
- the current one uses a mutable Set, but this is not a big deal, it can be changed.
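A minimal sketch of the first point, using hypothetical simplified types (`String` standing in for `Attribute`, and `AttributeEquals` reduced to a plain wrapper; the two function names are illustrative, not Spark API):

```scala
// Hypothetical, simplified model of the two construction paths.
final case class AttributeEquals(name: String) // stand-in for the real wrapper

// Proposed path: walks every attribute again and re-wraps each one
// in a fresh AttributeEquals before building the set.
def viaReferences(exprs: Seq[Seq[String]]): Set[AttributeEquals] =
  exprs.flatten.map(AttributeEquals(_)).toSet

// Path kept in the PR: the per-expression base sets already hold
// wrapped elements, so the union performs no extra wrapping work.
def viaBaseSets(baseSets: Seq[Set[AttributeEquals]]): Set[AttributeEquals] =
  baseSets.foldLeft(Set.empty[AttributeEquals])(_ ++ _)
```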
ah, ok. Can you rename `fromAttributeSets` -> `apply`, and then change the mutable set?
I cannot rename it to `apply` because there is already an `apply` taking an `Iterable`, and we cannot have two `apply` overloads, one with `Iterable[Expression]` and one with `Iterable[AbstractSet]`, because they would conflict (their erased signatures are identical).
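A minimal reproduction of that clash, with hypothetical stand-in element types; on the JVM both overloads erase to `apply(Iterable)`:

```scala
object OverloadDemo {
  // Compiles fine on its own.
  def apply(exprs: Iterable[String]): Int = exprs.size

  // Uncommenting the overload below is rejected by the compiler with
  // "double definition ... have same type after erasure":
  // def apply(sets: Iterable[Set[String]]): Int = sets.map(_.size).sum
}
```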
The name is ok. Do we still need to use the mutable set instead of `new AttributeSet(sets.foldLeft(Set.empty[AttributeEquals])(_ ++ _.baseSet))`?
I have to use a `LinkedHashSet` because without keeping the insertion order I had some UT failures. But I can use the more compact syntax you suggested here, thanks.
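A simplified sketch of the resulting builder, again with a hypothetical reduced wrapper type (not the actual Spark code): `mutable.LinkedHashSet` preserves insertion order, which a plain `mutable.HashSet` does not, while `++=` still gives cheap in-place unions.

```scala
import scala.collection.mutable

final case class AttributeEquals(name: String) // stand-in for the real wrapper

// Union all base sets into one mutable LinkedHashSet: ++= mutates the
// accumulator in place, and iteration follows insertion order, which
// is what the failing unit tests relied on.
def fromAttributeSets(
    sets: Iterable[mutable.LinkedHashSet[AttributeEquals]]): mutable.LinkedHashSet[AttributeEquals] =
  sets.foldLeft(new mutable.LinkedHashSet[AttributeEquals])(_ ++= _)
```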
Thanks!
Thank you for your comment and help here!
Test build #95869 has finished for PR 22364 at commit
Test build #95944 has finished for PR 22364 at commit
retest this please
Test build #95957 has finished for PR 22364 at commit
Any more comments, @maropu @gatorsmile?
Can we replace the syntax in spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala (line 294 in 9deddbb) too?
@maropu yes, that can be done as well, but I think the main focus of this PR is the `ColumnPruning` rule, so I'd rather do that change in a followup. In this way, we can also better track and evaluate the performance gain we can obtain with a given change. What do you think? Thanks.
IIUC this PR targets to improve `AttributeSet` in general, so I think it'd be fine to include that change here too.
@maropu anyway I checked and that is the only other place where this pattern happens, so I am ok including it here. The point is that there the situation is a bit different, i.e. it is not an `AttributeSet`. I'll try and do some benchmarks and I'll do the change if I see a significant difference. Thanks.
@maropu I have run a benchmark on that case, and for the case you mentioned using `--` brings no significant gain. The point is that there we have a plain `Set[Attribute]` rather than an `AttributeSet`, so the `--` improvement in this PR does not apply there. Hope this is clear, let me know otherwise. Thanks.
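A hypothetical sketch of the kind of micro-benchmark described here, with plain Scala `Set[String]` standing in for `Set[Attribute]`; all names and sizes are illustrative, not the original measurements:

```scala
object SetDiffBenchSketch extends App {
  // Tiny timing harness: average nanoseconds per iteration of `body`.
  def avgNanos(iters: Int)(body: => Unit): Long = {
    val t0 = System.nanoTime()
    (1 to iters).foreach(_ => body)
    (System.nanoTime() - t0) / iters
  }

  // On a plain Scala Set, `--` is the standard element-wise removal,
  // so the AttributeSet-specific fast path added in this PR is not in
  // play and the two forms below perform comparable work.
  val a: Set[String] = (1 to 100).map(i => s"c$i").toSet
  val b: Set[String] = (1 to 80).map(i => s"c$i").toSet

  println(s"--        : ${avgNanos(100000)(a -- b)} ns/iter")
  println(s"filterNot : ${avgNanos(100000)(a.filterNot(b))} ns/iter")
}
```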
oh, yea, thanks! I wrongly mixed up `AttributeSet` and `Set[Attribute]`.
Basically, this change looks good to me. I leave this to other reviewers.

thanks @maropu for your review! @gatorsmile do you have any comments?

also, cc: @dongjoon-hyun
I also ran the TPCDS and TPCH benchmarks with 10 runs; the results confirm a ~20% improvement after the change (with wide-column datasets, the improvement is much higher, as shown by the benchmark referenced in the PR description).
@cloud-fan @dongjoon-hyun @gatorsmile any luck with this? Thanks.

LGTM, merging to master!

@mgaido91 well done!

thanks @cloud-fan and @maropu
## What changes were proposed in this pull request?

This PR contains 3 optimizations:

1) It significantly improves the `--` operation on `AttributeSet`. As a benchmark for the `--` operation, the following code has been run:

```scala
test("AttributeSet -- benchmark") {
  val attrSetA = AttributeSet((1 to 100).map { i => AttributeReference(s"c$i", IntegerType)() })
  val attrSetB = AttributeSet(attrSetA.take(80).toSeq)
  val attrSetC = AttributeSet((1 to 100).map { i => AttributeReference(s"c2_$i", IntegerType)() })
  val attrSetD = AttributeSet((attrSetA.take(50) ++ attrSetC.take(50)).toSeq)
  val attrSetE = AttributeSet((attrSetC.take(50) ++ attrSetA.take(50)).toSeq)
  val n_iter = 1000000
  val t0 = System.nanoTime()
  (1 to n_iter) foreach { _ =>
    val r1 = attrSetA -- attrSetB
    val r2 = attrSetA -- attrSetC
    val r3 = attrSetA -- attrSetD
    val r4 = attrSetA -- attrSetE
  }
  val t1 = System.nanoTime()
  val totalTime = t1 - t0
  println(s"Average time: ${totalTime / n_iter} us")
}
```

The results are:

```
Before PR - Average time: 67674 us (100 %)
After PR  - Average time: 28827 us (42.6 %)
```

2) In `ColumnPruning`, it replaces the occurrences of `(attributeSet1 -- attributeSet2).nonEmpty` with `attributeSet1.subsetOf(attributeSet2)`, which is orders of magnitude more efficient (especially when there are many attributes). Running the previous benchmark with `--` replaced by `subsetOf` returns:

```
Average time: 67 us (0.1 %)
```

3) It provides a more efficient way of building `AttributeSet`s, which can greatly improve the performance of the methods `references` and `outputSet` of `Expression` and `QueryPlan`. This basically avoids unneeded operations (e.g. creating many `AttributeEquals` wrapper objects that could be avoided).

The overall effect of those optimizations has been tested on `ColumnPruning` with the following benchmark:

```scala
test("ColumnPruning benchmark") {
  val attrSetA = (1 to 100).map { i => AttributeReference(s"c$i", IntegerType)() }
  val attrSetB = attrSetA.take(80)
  val attrSetC = attrSetA.take(20).map(a => Alias(Add(a, Literal(1)), s"${a.name}_1")())
  val input = LocalRelation(attrSetA)
  val query1 = Project(attrSetB, Project(attrSetA, input)).analyze
  val query2 = Project(attrSetC, Project(attrSetA, input)).analyze
  val query3 = Project(attrSetA, Project(attrSetA, input)).analyze
  val nIter = 100000
  val t0 = System.nanoTime()
  (1 to nIter).foreach { _ =>
    ColumnPruning(query1)
    ColumnPruning(query2)
    ColumnPruning(query3)
  }
  val t1 = System.nanoTime()
  val totalTime = t1 - t0
  println(s"Average time: ${totalTime / nIter} us")
}
```

The output of the test is:

```
Before PR - Average time: 733471 us (100 %)
After PR  - Average time: 362455 us (49.4 %)
```

The performance improvement has also been evaluated on the `SQLQueryTestSuite`'s queries:

```
(before) org.apache.spark.sql.catalyst.optimizer.ColumnPruning  518413198 / 1377707172  2756 / 15717
(after)  org.apache.spark.sql.catalyst.optimizer.ColumnPruning  415432579 / 1121147950  2756 / 15717
% Running time: 80.1% / 81.3%
```

Other rules also benefit, especially from (3), although the impact is lower, e.g.:

```
(before) org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences  307341442 / 623436806  2154 / 16480
(after)  org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences  290511312 / 560962495  2154 / 16480
% Running time: 94.5% / 90.0%
```

The reason why the impact on the `SQLQueryTestSuite`'s queries is lower compared to the other benchmark is that the optimizations are more significant when the number of attributes involved is higher. Since the tests often involve very few attributes, the effect there is lower.

## How was this patch tested?

Run benchmarks + existing UTs.

Closes apache#22364 from mgaido91/SPARK-25379.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
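As an illustration of optimization (2) above, a minimal sketch with plain Scala sets standing in for `AttributeSet` (the names and values are hypothetical):

```scala
object SubsetSketch extends App {
  val a = Set("c1", "c2", "c3") // stands in for attributeSet1
  val b = Set("c1", "c2")       // stands in for attributeSet2

  // (a -- b).nonEmpty materializes the whole difference set just to
  // test whether it is empty; subsetOf can stop at the first element
  // of `a` missing from `b`, which is equivalent and far cheaper.
  val viaDifference = (a -- b).nonEmpty
  val viaSubset     = !a.subsetOf(b)
  assert(viaDifference == viaSubset) // both ask: does a have attributes not in b?
}
```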