[SPARK-26205][SQL] Optimize InSet Expression for bytes, shorts, ints, dates #23171

Closed
wants to merge 5 commits into from

aokolnychyi
Contributor

@aokolnychyi aokolnychyi commented Nov 28, 2018

What changes were proposed in this pull request?

This PR optimizes InSet expressions for byte, short, integer, and date types. It is a follow-up to PR #21442 from @dbtsai.

In expressions are compiled into a sequence of if-else statements, which results in O(n) time complexity. InSet is an optimized version of In that is supposed to improve performance when all values are literals and the number of elements is large enough. However, InSet actually worsens performance in many cases for several reasons, such as the boxing of primitive values.

The main idea of this PR is to use Java switch statements to significantly improve the performance of InSet expressions for bytes, shorts, ints, and dates. Java switch statements are compiled into either tableswitch or lookupswitch bytecode instructions. We get O(1) time complexity if the case values are compact and tableswitch can be used. Otherwise, lookupswitch gives us O(log n).
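To make the two code shapes concrete, here is a hand-written Java sketch (not the code Spark generates) comparing an if-else chain with an equivalent switch for an arbitrary literal set. Which bytecode instruction javac emits is its own decision; it can be checked with javap -c.

```java
// Hand-written sketch, not Spark's generated code: the if-else chain that In
// compiles to versus a switch, for the example literal set {1, 2, 3, 5, 8}.
final class InMembershipSketch {

    // If-else chain: up to one comparison per literal, so O(n).
    static boolean inViaIfElse(int v) {
        if (v == 1) return true;
        else if (v == 2) return true;
        else if (v == 3) return true;
        else if (v == 5) return true;
        else if (v == 8) return true;
        else return false;
    }

    // Switch: javac compiles this to tableswitch or lookupswitch. For these
    // dense keys its heuristic should pick tableswitch (verify with javap -c).
    static boolean inViaSwitch(int v) {
        switch (v) {
            case 1: case 2: case 3: case 5: case 8:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (int v = 0; v <= 9; v++) {
            System.out.println(v + ": " + inViaIfElse(v) + " " + inViaSwitch(v));
        }
    }
}
```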

Locally, I tried Spark's OpenHashSet and primitive collections from fastutil to solve the boxing issue in InSet. Both options significantly decreased memory consumption, and fastutil also improved the time compared to Scala's HashSet. However, the switch-based approach was still more than two times faster, even on 500+ non-compact elements.

I also noticed that applying the switch-based approach to fewer than 10 elements gives a relatively minor improvement compared to the if-else approach. Therefore, I placed the switch-based logic into InSet and added a new config to control when it is applied. Even if we migrate to primitive collections at some point, the switch logic will still be faster unless the number of elements is very large. Another option is a separate InSwitch expression. However, this would mean modifying other places (e.g., DataSourceStrategy).

See here and here for more information.

This PR does not cover long values as Java switch statements cannot be used on them. However, we can have a follow-up PR with an approach similar to binary search.
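A minimal sketch of the binary-search idea for longs, assuming a sorted primitive array and java.util.Arrays.binarySearch. The class name LongInSetSketch is hypothetical and this is not the follow-up PR's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: sort the long literals once, then answer each lookup
// with a binary search, O(log n) per value and no boxing.
final class LongInSetSketch {
    private final long[] sorted;

    LongInSetSketch(long... literals) {
        // Copy so the caller's array is not mutated, then sort for binary search.
        this.sorted = literals.clone();
        Arrays.sort(this.sorted);
    }

    boolean contains(long v) {
        // Arrays.binarySearch returns a non-negative index iff the key is present.
        return Arrays.binarySearch(sorted, v) >= 0;
    }

    public static void main(String[] args) {
        LongInSetSketch set = new LongInSetSketch(42L, -7L, 1L << 40, 3L);
        System.out.println(set.contains(1L << 40)); // true
        System.out.println(set.contains(8L));       // false
    }
}
```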

How was this patch tested?

There are new tests that verify the logic of the proposed optimization.

The performance was evaluated using existing benchmarks. This PR was also tested on an EC2 instance (OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 4.14.77-70.59.amzn1.x86_64, Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz).

Notes

  • This link contains source code that decides between tableswitch and lookupswitch. The logic was re-used in the benchmarks. See the isLookupSwitch method.
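For reference, a sketch of the kind of cost heuristic javac applies when choosing between the two instructions. It is reconstructed from memory of javac's Gen.visitSwitch, so treat the constants as an assumption and check the JDK source; the class name is hypothetical.

```java
// Sketch of javac's tableswitch-vs-lookupswitch cost heuristic, as recalled
// from com.sun.tools.javac.jvm.Gen.visitSwitch. Assumes distinct labels.
final class SwitchKindSketch {

    static boolean isLookupSwitch(int[] labels) {
        int n = labels.length;
        if (n == 0) {
            return true; // tableswitch is only chosen when there is at least one label
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (int label : labels) {
            lo = Math.min(lo, label);
            hi = Math.max(hi, label);
        }
        long tableSpaceCost = 4 + (hi - lo + 1); // words
        long tableTimeCost = 3;                  // comparisons
        long lookupSpaceCost = 3 + 2L * n;
        long lookupTimeCost = n;
        boolean useTable =
            tableSpaceCost + 3 * tableTimeCost <= lookupSpaceCost + 3 * lookupTimeCost;
        return !useTable;
    }

    public static void main(String[] args) {
        System.out.println(isLookupSwitch(new int[] {1, 2, 3, 4, 5})); // false
        System.out.println(isLookupSwitch(new int[] {1, 1000}));       // true
    }
}
```

Using long arithmetic for the range keeps labels such as Integer.MIN_VALUE and Integer.MAX_VALUE from overflowing the cost calculation.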

}

private def isSwitchCompatible: Boolean = list.forall {
case Literal(_, dt) => dt == ByteType || dt == ShortType || dt == IntegerType
Member

case Literal(_, dt) if dt == ByteType || dt == ShortType || dt == IntegerType => true

would be easier to read?

Member

Can this be simplified to the following?

private def isSwitchCompatible: Boolean = {
  inSetConvertible && (value.dataType == ByteType || value.dataType == ShortType || value.dataType == IntegerType)
}

@aokolnychyi
Contributor Author

aokolnychyi commented Nov 28, 2018

@gatorsmile @cloud-fan @dongjoon-hyun @viirya It would be great to have your feedback.

val (nullLiterals, nonNullLiterals) = list.partition {
case Literal(null, _) => true
case _ => false
}
Member

If there is a null in the list, there will be only one. As a result, we may not need to use nullLiterals.

val containNullInList = ...
val nonNullLiterals = ... 

Contributor

@cloud-fan cloud-fan Nov 29, 2018

Maybe we can follow InSet: define hasNull up front, and filter null values out of the list before processing.
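A sketch of the hasNull-up-front idea and the SQL three-valued semantics it has to preserve, written in plain Java with a null Boolean standing in for SQL NULL. Names are hypothetical; this is not Spark's In or InSet code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch of "value IN (list)" with nulls handled up front.
// A null Boolean result represents SQL NULL.
final class NullAwareInSketch {

    static Boolean in(Integer value, List<Integer> list) {
        boolean hasNull = list.contains(null);             // computed once, ahead of time
        List<Integer> nonNull = list.stream()
            .filter(Objects::nonNull)
            .collect(Collectors.toList());                  // nulls filtered out before lookup
        if (value == null) return null;                     // NULL IN (...) is NULL
        if (nonNull.contains(value)) return Boolean.TRUE;   // a match wins even if nulls exist
        return hasNull ? null : Boolean.FALSE;              // no match: NULL if the list had a null
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, null, 3);
        System.out.println(in(3, list)); // true
        System.out.println(in(4, list)); // null
    }
}
```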

@dbtsai
Member

dbtsai commented Nov 28, 2018

The approach looks great and can significantly improve performance. For Long, I agree that we should also implement a binary search approach for O(log n) lookup.

I wonder which one will be faster: binary search over an array, or rewriting the if-else chain in binary search form.
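One way to picture the second option: a hypothetical generator that emits the if-else chain as a balanced binary search over the sorted literals, so each lookup does O(log n) comparisons. It assumes the surrounding generated code declares the result variable and initializes it to false.

```java
import java.util.Arrays;

// Hypothetical sketch: generate Java source for an if-else chain shaped as a
// balanced binary search tree over sorted long literals.
final class BinarySearchIfElseSketch {

    static String emit(long[] literals, String valueVar, String resultVar) {
        long[] sorted = literals.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        emit(sorted, 0, sorted.length - 1, valueVar, resultVar, sb, "");
        return sb.toString();
    }

    private static void emit(long[] s, int lo, int hi, String v, String r,
                             StringBuilder sb, String indent) {
        if (lo > hi) return; // empty range: the result keeps its default (false)
        int mid = (lo + hi) >>> 1;
        sb.append(indent).append("if (").append(v).append(" == ").append(s[mid]).append("L) {\n");
        sb.append(indent).append("  ").append(r).append(" = true;\n");
        sb.append(indent).append("} else if (").append(v).append(" < ").append(s[mid]).append("L) {\n");
        emit(s, lo, mid - 1, v, r, sb, indent + "  "); // left half: smaller literals
        sb.append(indent).append("} else {\n");
        emit(s, mid + 1, hi, v, r, sb, indent + "  "); // right half: larger literals
        sb.append(indent).append("}\n");
    }

    public static void main(String[] args) {
        System.out.print(emit(new long[] {40L, -7L, 3L}, "value", "found"));
    }
}
```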

@SparkQA

SparkQA commented Nov 28, 2018

Test build #99393 has finished for PR 23171 at commit 1477f10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Also cc @ueshin

@cloud-fan
Contributor

I'm wondering if this is still useful after we fix the boxing issue in InSet. We can write a binary hash set for primitive types, like LongToUnsafeRowMap, which should have better performance.
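To illustrate what a boxing-free primitive set could look like, a minimal open-addressing int set with linear probing. It is a hypothetical sketch, not Spark's OpenHashSet or LongToUnsafeRowMap, and it omits resizing because the literal set is fixed at construction.

```java
// Hypothetical sketch of a primitive int hash set (no boxing), using open
// addressing with linear probing and a load factor of at most 0.5.
final class IntHashSetSketch {
    private final int[] keys;
    private final boolean[] used;
    private final int mask;

    IntHashSetSketch(int... values) {
        int cap = 2;
        while (cap < values.length * 2) cap <<= 1; // power of two, at least 2x the size
        keys = new int[cap];
        used = new boolean[cap];
        mask = cap - 1;
        for (int v : values) add(v);
    }

    // Bit mixer (murmur3 fmix32) so nearby ints do not cluster in the table.
    private static int mix(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    private void add(int v) {
        int i = mix(v) & mask;
        while (used[i]) {
            if (keys[i] == v) return; // duplicate literal
            i = (i + 1) & mask;
        }
        used[i] = true;
        keys[i] = v;
    }

    boolean contains(int v) {
        int i = mix(v) & mask;
        while (used[i]) {             // an empty slot always exists, so this terminates
            if (keys[i] == v) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    public static void main(String[] args) {
        IntHashSetSketch set = new IntHashSetSketch(1, 5, 500, -3);
        System.out.println(set.contains(500)); // true
        System.out.println(set.contains(6));   // false
    }
}
```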

val listGen = nonNullLiterals.map(_.genCode(ctx))
val valueGen = value.genCode(ctx)

val caseBranches = listGen.map(literal =>
Member

style:

listGen.map { literal =>
  ...
}

@aokolnychyi
Contributor Author

aokolnychyi commented Nov 29, 2018

@cloud-fan, yeah, let’s see if this PR is useful.

The original idea wasn’t to avoid fixing autoboxing in InSet. In was benchmarked on 250 numbers to demonstrate O(1) time complexity on compact values and to outline the problems with InSet. After this change, In will be faster than InSet, but this is not the goal. Overall, the intent was to have a tiny, straightforward change that optimizes In expressions even when the number of elements does not exceed “spark.sql.optimizer.inSetConversionThreshold” and Spark does not use InSet.

Once we solve autoboxing issues in InSet, we would need to benchmark against this approach in order to compare to the most efficient implementation of In.

Contributor

@mgaido91 mgaido91 left a comment

My (maybe stupid?) question is: once we make such a change, does it still make sense to convert In to InSet? In is now most likely even more efficient. Shall we change the optimizer to reflect this? Maybe we can do this in a follow-up.

|${CodeGenerator.JAVA_BOOLEAN} ${ev.value} = false;
|if (!${valueGen.isNull}) {
| switch (${valueGen.value}) {
| ${caseBranches.mkString("")}
Contributor

We should consider that if the number of items is very large, this can cause a compile exception due to the method size limit, so we should use the proper code-splitting methods here.
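To get a feel for the limit, a sketch that computes the size of the switch instruction alone from the JVM specification's instruction layouts (JVMS 6.5). Case bodies and the rest of the method add more, so this is only a lower bound; the class name is hypothetical.

```java
// Hypothetical sketch: byte size of a single tableswitch/lookupswitch
// instruction per JVMS 6.5. A method's code must fit in 65535 bytes (JVMS 4.7.3).
final class SwitchSizeSketch {
    static final int MAX_METHOD_CODE_BYTES = 65535;

    // 0-3 padding bytes so the 4-byte operands start at a multiple of 4
    // from the start of the method.
    static int padding(int opcodePc) {
        return (4 - ((opcodePc + 1) % 4)) % 4;
    }

    // opcode + padding + default + npairs + npairs * (match, offset)
    static long lookupSwitchBytes(int opcodePc, int npairs) {
        return 1L + padding(opcodePc) + 4 + 4 + 8L * npairs;
    }

    // opcode + padding + default + low + high + one offset per value in [low, high]
    static long tableSwitchBytes(int opcodePc, int low, int high) {
        return 1L + padding(opcodePc) + 4 + 4 + 4 + 4L * ((long) high - low + 1);
    }

    public static void main(String[] args) {
        long bytes = lookupSwitchBytes(0, 6000);
        System.out.println(bytes + " of " + MAX_METHOD_CODE_BYTES); // 48012 of 65535
    }
}
```

With 6000 sparse literals, the lookupswitch alone takes 48012 bytes before any case bodies are counted, which is in the same ballpark as the 6000+ elements reported later in this thread for hitting the limit.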

Member

@aokolnychyi Could you please address @mgaido91's comment? The current code will throw an exception for an In with a huge list of values.

Member

Could you please add test cases that generate more than 64KB of Java bytecode in one switch statement?

@dbtsai
Member

dbtsai commented Nov 29, 2018

@cloud-fan As @aokolnychyi said, a switch will still be faster than an optimized set without autoboxing when the number of elements is small. As a result, this PR is still very useful.

@mgaido91 InSet can be better for large numbers of elements (controlled by spark.sql.optimizer.inSetConversionThreshold) once it is implemented properly without autoboxing. Also, as you pointed out, generating In with huge lists can cause a compile exception due to the method size limit. As a result, we should convert it into InSet for large sets.

@mgaido91
Contributor

@dbtsai I see. It would be great, though, to determine what this threshold is. My understanding is that the current solution performs better even for several hundred items. If the threshold is in the thousands, and since it depends on the data type (which makes it hard for users to control with a single config), it is arguable which solution is best: I don't think it is very common to have thousands of elements, while for smaller (more common) lists we would be using the less efficient solution.

@aokolnychyi
Contributor Author

@dbtsai @mgaido91 I think we can come back to this question once SPARK-26203 is resolved. That JIRA will give us enough information about each data type.

@aokolnychyi
Contributor Author

aokolnychyi commented Nov 29, 2018

To sum up, I would set the goal of this PR as making In expressions as efficient as possible for bytes/shorts/ints. Then we can benchmark In vs InSet in SPARK-26203 and try to come up with a solution for InSet in SPARK-26204. By the time we solve SPARK-26204, we will have a clear understanding of the pros and cons of In and InSet and will be able to determine the right thresholds.

This approach sets a pretty high bar even for huge value lists, so it would be a nice basis to benchmark our solution for InSet.

@mgaido91
Contributor

Yes @aokolnychyi, I agree that the work can be done later (outside the scope of this PR). Maybe we can just open a new JIRA for it so we won't forget.

@rxin
Contributor

rxin commented Dec 3, 2018

I'm not a big fan of making the physical implementation of an expression very different depending on the situation. It complicates the code base and makes things more difficult to reason about. Why can't we just make InSet efficient and convert these cases to that?

@cloud-fan
Contributor

@rxin I proposed the same thing before, but one problem is that we only convert In to InSet when the length of the list reaches the threshold. If the switch approach is faster than a hash set when the list is small, it still seems worthwhile to optimize In using switch.

@rxin
Contributor

rxin commented Dec 4, 2018 via email

@cloud-fan
Contributor

I think InSet is not an optimized version of In, but just a way to separate the implementation for different conditions (the length of the list). Maybe we should do the same thing here: create an InSwitch and convert In to it when certain conditions are met. One problem is that In and InSwitch are the same in the interpreted version, so maybe we should create a base class for them.

@rxin
Contributor

rxin commented Dec 4, 2018 via email

@cloud-fan
Contributor

How about we create an OptimizedIn and convert In to OptimizedIn if the list is all literals? OptimizedIn would pick switch or hash set based on the length of the list.

@dbtsai
Member

dbtsai commented Dec 4, 2018

@rxin A switch in Java is still significantly faster than a hash set, even without boxing/unboxing problems, when the number of elements is small. We were thinking about having two implementations in InSet, picking the switch if the number of elements is small and the hash set otherwise. But this has the same complexity as having two implementations in In, as this PR does.

@cloud-fan are you suggesting we create an OptimizedIn that chooses between switch and hash set implementations based on the number of elements, and remove InSet? That is basically what we were thinking above.

@rxin
Contributor

rxin commented Dec 4, 2018 via email

@aokolnychyi
Contributor Author

As @rxin said, if we introduce a separate expression for the switch-based approach, then we will need to modify other places, for example DataSourceStrategy$translateFilter. So integrating it into In or InSet seems safer.

I think we can move the switch-based logic to InSet and make InSet responsible for picking the optimal execution path. We might need to modify the condition for converting In to InSet, as it will most likely depend on the underlying data type. Comparing the if-else approach to the switch-based one, I saw noticeable improvements starting from 5 elements. Right now, the conversion happens only for more than 10 elements.
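A hypothetical sketch of InSet picking its execution path as proposed here: switch-based code for small sets of switch-compatible types, a hash set otherwise. The enums and the threshold parameter are illustrative assumptions, not Spark's API.

```java
// Hypothetical sketch of the path selection inside InSet.
final class InSetStrategySketch {
    enum DataType { BYTE, SHORT, INT, DATE, LONG, STRING }
    enum Strategy { SWITCH, HASH_SET }

    static boolean isSwitchCompatible(DataType t) {
        // Java switch works on int-like values; dates are stored as int days since epoch.
        return t == DataType.BYTE || t == DataType.SHORT || t == DataType.INT || t == DataType.DATE;
    }

    static Strategy choose(DataType t, int setSize, int switchThreshold) {
        // A threshold of 0 disables the switch path entirely.
        boolean useSwitch = switchThreshold > 0 && isSwitchCompatible(t) && setSize <= switchThreshold;
        return useSwitch ? Strategy.SWITCH : Strategy.HASH_SET;
    }

    public static void main(String[] args) {
        System.out.println(choose(DataType.INT, 50, 400));  // SWITCH
        System.out.println(choose(DataType.LONG, 50, 400)); // HASH_SET
    }
}
```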

@aokolnychyi
Contributor Author

@dbtsai @cloud-fan @mgaido91 @rxin @dongjoon-hyun @viirya @gatorsmile

PR #23291 contains benchmarks for different data types.

@rxin was your latest suggestion to convert In to InSet if all elements are literals, regardless of the data type and the number of elements?

@mgaido91
Contributor

Thanks @aokolnychyi, could you please post the results of that benchmark with this patch applied here?

Just a quick question: can't we support timestamp too in the switch approach?

@aokolnychyi
Contributor Author

@mgaido91 It won't be possible to apply the switch-based approach to timestamps, as they are represented as longs. We can try dates, as they are represented as ints.
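A small sketch of why dates qualify: Spark stores DateType values as int days since 1970-01-01, so a date IN list reduces to an int switch. LocalDate.toEpochDay() stands in for Spark's internal representation here, and the three dates are arbitrary examples.

```java
import java.time.LocalDate;

// Hypothetical sketch: a date IN list as a switch over days since 1970-01-01.
final class DateSwitchSketch {
    static boolean inDates(LocalDate date) {
        int days = Math.toIntExact(date.toEpochDay()); // days since 1970-01-01
        switch (days) {
            case 17897: // 2019-01-01
            case 17928: // 2019-02-01
            case 17955: // 2019-02-28
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inDates(LocalDate.of(2019, 2, 28))); // true
        System.out.println(inDates(LocalDate.of(2019, 3, 1)));  // false
    }
}
```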

Below is the result of that benchmark with this patch:

================================================================================================
In Expression Benchmark
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
5 bytes:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   45 /   56        223.3           4.5       1.0X
InSet expression                                58 /   61        173.4           5.8       0.8X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
10 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   37 /   40        268.8           3.7       1.0X
InSet expression                                63 /   67        158.2           6.3       0.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
25 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   51 /   55        194.4           5.1       1.0X
InSet expression                                87 /   92        114.3           8.7       0.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
50 bytes:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   74 /   79        135.4           7.4       1.0X
InSet expression                               128 /  138         78.1          12.8       0.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
100 bytes:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                  123 /  128         81.1          12.3       1.0X
InSet expression                               197 /  212         50.7          19.7       0.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
5 shorts:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   29 /   31        342.2           2.9       1.0X
InSet expression                               110 /  114         90.8          11.0       0.3X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
10 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   25 /   27        407.3           2.5       1.0X
InSet expression                               122 /  127         82.1          12.2       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
25 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   27 /   29        367.2           2.7       1.0X
InSet expression                               124 /  127         80.9          12.4       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
50 shorts:                               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   26 /   27        386.5           2.6       1.0X
InSet expression                               158 /  162         63.2          15.8       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
100 shorts:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   28 /   30        353.5           2.8       1.0X
InSet expression                               136 /  141         73.4          13.6       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
200 shorts:                              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   33 /   36        307.0           3.3       1.0X
InSet expression                               136 /  141         73.5          13.6       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
5 ints:                                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   26 /   28        389.4           2.6       1.0X
InSet expression                               108 /  115         92.7          10.8       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
10 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   23 /   25        431.0           2.3       1.0X
InSet expression                               119 /  124         84.1          11.9       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
25 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   24 /   25        420.9           2.4       1.0X
InSet expression                               123 /  127         81.5          12.3       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
50 ints:                                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   25 /   26        406.6           2.5       1.0X
InSet expression                               153 /  157         65.4          15.3       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
100 ints:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   26 /   27        391.4           2.6       1.0X
InSet expression                               128 /  133         78.3          12.8       0.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
200 ints:                                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                   28 /   30        354.9           2.8       1.0X
InSet expression                               126 /  131         79.7          12.6       0.2X

@mgaido91
Contributor

Thanks @aokolnychyi. I just have a couple of comments on this:
1 - As @rxin mentioned, we now have InSet for handling literals and In for handling arbitrary expressions. Since this method works only with literals, I'd rather see it as an alternative execution path for InSet than for In. Then we might always convert (without a threshold) an In containing only literals into InSet and let InSet pick the best implementation (either switch or the actual hash set). What do you think about this?
2 - I think we may also support longs. We just need to split a long into 2 integers, so I think it would be doable with 2 nested switches. I see this will add complexity and we would need a dedicated implementation for it, but we may consider it as follow-up work. Do you agree?
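A sketch of the nested-switch idea for longs, assuming a fixed literal set {1, 5, 1L << 32}: switch on the high 32 bits, then on the low 32 bits. It is hand-written for illustration; a real implementation would presumably generate one inner switch per distinct high half.

```java
// Hypothetical sketch: long membership via two nested int switches.
final class LongNestedSwitchSketch {
    static boolean contains(long v) {
        int high = (int) (v >>> 32); // upper 32 bits
        int low = (int) v;           // lower 32 bits (truncating cast)
        switch (high) {
            case 0: // literals 1 and 5 have high half 0
                switch (low) {
                    case 1: case 5: return true;
                    default: return false;
                }
            case 1: // literal 1L << 32 has high half 1, low half 0
                switch (low) {
                    case 0: return true;
                    default: return false;
                }
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(contains(1L << 32)); // true
        System.out.println(contains(2L));       // false
    }
}
```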

""".stripMargin)
}

private def isSwitchCompatible: Boolean = list.forall {
Member

Could you please take care of the following limitation of the Java switch statement, too?

npairs pairs of signed 32-bit values

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5.lookupswitch
https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html#jvms-6.5.tableswitch

Contributor Author

Could you please elaborate a bit on this? I am not sure I follow. Shouldn't we be fine if we limit this approach to bytes/shorts/ints?

Member

@kiszk kiszk Dec 19, 2018

Sorry for missing some words. My point is that isSwitchCompatible can be true only if list.size is less than or equal to Integer.MAX_VALUE. Otherwise, Janino will fail.

withSQLConf(SQLConf.OPTIMIZER_INSET_SWITCH_THRESHOLD.key -> "20") {
checkAllTypes()
}
}
Member

Could you please add a test case where spark.sql.optimizer.inSetSwitchThreshold is set to its maximum value and this optimization calls genCodeWithSwitch()?

Contributor Author

Do you mean testing that if the set size is 100 and spark.sql.optimizer.inSetSwitchThreshold is 100, then genCodeWithSwitch is still applied?

Member

My question was about exactly what you are describing here. The current implementation accepts a large int value (e.g. Integer.MAX_VALUE) for spark.sql.optimizer.inSetSwitchThreshold. Thus, I am afraid the switch code may require more than 64KB of Java bytecode.
If the option had an appropriate upper limit, it would be fine.

@dongjoon-hyun
Member

I'm +1 for this approach. Thank you for updating, @aokolnychyi.

.internal()
.doc("Configures the max set size in InSet for which Spark will generate code with " +
"switch statements. This is applicable only to bytes, shorts, ints, dates.")
.intConf
Member

To prevent user configuration errors, can we have a meaningful min/max check?

.checkValue(v => v > 0 && v < ???, ...)

Contributor Author

@kiszk @mgaido91 we had a discussion about generating code larger than 64KB.

I am wondering if we still want to split the switch-based logic into multiple methods if we have this check suggested by @dongjoon-hyun. I've implemented the split logic locally. However, the code looks more complicated and we will need some extensions to splitExpressionsWithCurrentInputs.

Contributor

I am not sure why you'd need any extension. We have other parts of the code with switch statements that are split. I think in general it is safer to have it.

Contributor Author

@mgaido91 could you point me to an example?

Contributor

Ah, you're right, sorry, I was misremembering. There were switch-based expressions, and to split them we migrated them to a do-while approach. Since the whole point of this PR is to introduce the switch construct, I agree with you that the best way is to add a constraint here so that the number stays small enough not to cause issues with code generation.

Contributor Author

What about the default and max values, then? The switch logic was faster than HashSet on 500 elements for every data type and on every machine I tested. In some cases, HashSet started to outperform it at 550+ elements. Also, I had to generate a set of 6000+ elements to hit the 64KB limit. My proposal is 400 as the default and 600 as the max. Then we should be safe.

Contributor

Yes, sounds fine to me. Please add a comment in the codegen part to explain why we are not splitting the code. Thanks.

Contributor Author

Yeah, I'll add a comment.

OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Java HotSpot(TM) 64-Bit Server VM 1.8.0_192-b12 on Mac OS X 10.14.3
Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
200 dates:
Member

Hmm, this PR is unrelated to this ratio change, isn't it?

Contributor Author

No, it has no effect on this. I assume we see such a difference because of the different machines. My original evaluation had a ratio similar to what we see now.

Also, I re-tested this PR on a t2.xlarge EC2 instance.

OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 4.14.77-70.59.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
200 structs:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
In expression                                 2614 / 2895          0.4        2614.5       1.0X
InSet expression                               427 /  433          2.3         427.3       6.1X

@aokolnychyi aokolnychyi changed the title [SPARK-26205][SQL] Optimize In for bytes, shorts, ints [SPARK-26205][SQL] Optimize InSet Expression for bytes, shorts, ints, dates Feb 28, 2019
@SparkQA

SparkQA commented Feb 28, 2019

Test build #102871 has finished for PR 23171 at commit bab82f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

"switch statements. This is applicable only to bytes, shorts, ints, dates.")
.intConf
.checkValue(threshold => threshold >= 0 && threshold <= 600, "The max set size " +
"for using switch statements in InSet must be positive and less than or equal to 600")
Member

Hmm, the description does not match the condition check: "must be positive" implies threshold > 0?

Contributor Author

Yeah, I started with threshold > 0 but then changed it to threshold >= 0 and forgot to update the description. I kept 0 as a possible value so that we can disable this optimization if needed. Do you think that makes sense, or shall we require threshold > 0?

Member

Allowing it to be disabled is also a good idea, as long as the description states it clearly.
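A minimal sketch of the validation being agreed on here, written as plain Java rather than Spark's config builder: 0 disables the switch path, 600 is the upper bound, and 400 is the proposed default. The class name and message wording are hypothetical; the point is that the message describes the accepted range the same way the check does.

```java
// Hypothetical sketch of validating the switch threshold config value.
final class SwitchThresholdConfigSketch {
    static final int DEFAULT_THRESHOLD = 400;
    static final int MAX_THRESHOLD = 600;

    static int validate(int threshold) {
        // Accept 0 (optimization disabled) through MAX_THRESHOLD inclusive.
        if (threshold < 0 || threshold > MAX_THRESHOLD) {
            throw new IllegalArgumentException(
                "The max set size for using switch statements in InSet must be "
                    + "non-negative (0 disables the optimization) and less than or equal to "
                    + MAX_THRESHOLD + ", but got " + threshold);
        }
        return threshold;
    }

    public static void main(String[] args) {
        System.out.println(validate(DEFAULT_THRESHOLD)); // 400
    }
}
```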

val valueSQL = child.sql
val listSQL = hset.toSeq.map(Literal(_).sql).mkString(", ")
s"($valueSQL IN ($listSQL))"
}
Member

This function is unchanged. To make the code diff smaller and clearer, could you move override def sql and private def canBeComputedUsingSwitch after genCodeWithSwitch?

@@ -241,6 +242,52 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper {
}
}

test("SPARK-26205: Optimize InSet for bytes, shorts, ints, dates using switch statements") {
Member

Let's remove the SPARK-26205: prefix since this is an improvement. We use the JIRA ID only for bug fixes.

@@ -2,550 +2,739 @@
In Expression Benchmark
================================================================================================

OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
Member

Recently, #23914 added Stdev to the benchmark result. We need to rerun this.

@aokolnychyi, after you update the PR code, I'll rerun the benchmark on EC2 and open a PR against your branch.

dateValues)
}

withSQLConf(SQLConf.OPTIMIZER_INSET_SWITCH_THRESHOLD.key -> "0") {
Member

After https://github.com/apache/spark/pull/23171/files#r261888276, we need to increase this from 0 to 1.
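The test above wraps its checks in Spark's withSQLConf, which temporarily overrides config values for the duration of a block. The following is a minimal, hypothetical analogue of that scoped-override pattern over a plain mutable map; it is not Spark's implementation, and the object and method names are illustrative only.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session conf: a plain mutable key/value map.
object ScopedConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given pairs, runs body, then restores each key's previous state
  // (its old value, or its absence) even if body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Capture previous state before applying any override.
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(v)) => conf(k) = v
      case (k, None)    => conf.remove(k)
    }
  }
}
```

Because every override is undone in the finally block, a test using this pattern leaves no config changes behind for later tests.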

@@ -413,6 +415,43 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
}
}

test("SPARK-26205: Optimize InSet for bytes, shorts, ints, dates") {
Member

Ditto. Let's remove the SPARK-26205: prefix.

}

spark.sessionState.conf.clear()
}
Member

+1 for the intention, but I think we can skip this testing. :)
Could you revert the change on this file please?

${CodeGenerator.JAVA_BOOLEAN} ${ev.value} = false;
if (!${valueGen.isNull}) {
switch (${valueGen.value}) {
${caseBranches.mkString("")}
Member

Shall we add new lines?

-            ${caseBranches.mkString("")}
+            ${caseBranches.mkString("\n")}

Otherwise, the generated code is hard to read, as in the current (as-is) output:

/* 037 */           case 2:
/* 038 */             filter_value_0 = true;
/* 039 */           break;case 1:
...
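A minimal plain-Scala sketch of why the separator matters, assuming each case branch is emitted as a string ending in "break;" with no trailing newline, as the as-is output suggests. The object, method, and variable names are hypothetical, not Spark's codegen API, and the null check from the quoted template is omitted for brevity.

```scala
// Hypothetical sketch: emitting a Java switch-based membership check as source text.
// Not Spark's InSet codegen; it only mimics the caseBranches/mkString structure.
object SwitchCodegenSketch {
  def genSwitch(values: Seq[Int], valueVar: String, resultVar: String, sep: String): String = {
    // One branch per set element; each branch ends with "break;" and no newline.
    val caseBranches = values.map { v =>
      s"""case $v:
         |  $resultVar = true;
         |  break;""".stripMargin
    }
    // The separator decides whether branches run together or get their own lines.
    s"""boolean $resultVar = false;
       |switch ($valueVar) {
       |${caseBranches.mkString(sep)}
       |}""".stripMargin
  }
}
```

Passing "" reproduces the run-together `break;case 1:` form quoted above, while passing "\n" starts each case on its own line; the generated Java behaves the same either way, so the change only affects readability.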

@dongjoon-hyun
Member

I opened a PR against your branch with the benchmark results, @aokolnychyi.
They come from your branch rebased onto master and were run in the same EC2 environment. Please review and merge it.

@SparkQA

SparkQA commented Mar 4, 2019

Test build #102954 has finished for PR 23171 at commit bab82f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aokolnychyi
Contributor Author

@dongjoon-hyun thanks for running the benchmarks! It's great to verify the performance benefit on one more machine.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

This PR has a clear performance benefit, and the generated code is also safe and clean.

@dbtsai
Member

dbtsai commented Mar 4, 2019

LGTM too!

@dbtsai dbtsai self-assigned this Mar 4, 2019
@dbtsai dbtsai self-requested a review March 4, 2019 21:50
@SparkQA

SparkQA commented Mar 4, 2019

Test build #103006 has finished for PR 23171 at commit fcef14a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you, @aokolnychyi, @dbtsai, @gatorsmile, @cloud-fan, @rxin, @kiszk, @viirya, @mgaido91.

Merged to master.

10 participants