Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13242] [SQL] Generate one method per when clause #11221

Closed
wants to merge 4 commits into from

Conversation

joehalliwell
Copy link

This patch modifies code generation to split when clauses into separate methods.

As the included test demonstrates, with the previous behaviour it was quite easy to breach the 64KB method size limit with even moderately complex when expressions.

This contribution is my original work and I license the work to the project under the project's open source license.

This is presumably slower than packing them all into a single method, but
does allow many more clauses. A further improvement might be to combine
the two approaches using some heuristic to determine how many clauses
can be combined in a single method.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@rxin
Copy link
Contributor

rxin commented Feb 16, 2016

cc @davies

I wonder if we can do something smarter (e.g. batch some?). This is going to regress performance for very simple cases, especially when we use whole-stage codegen.

@davies
Copy link
Contributor

davies commented Feb 16, 2016

@joehalliwell This will not work in whole stage codegen, because there is not a variable for the input row.

I think an easy fix could be fallback to interpret mode if the number of branches go over 10? you could take Literal as example.

@joehalliwell
Copy link
Author

Thanks for taking the time to review and comment.

@rxin Yes, I wondered about batching to (say) 64KB of source code before splitting. That's the approach used in #7076. Without measuring, I wasn't sure what the performance impact of not batching would be. For example I believe JIT can inline small methods such as these. But if you think it's worth it, I'll push in that general direction in future work.

@davies Ahh... I'm only really familiar with few bits of codegen I needed to write this patch. Are there tests for or examples of whole stage codgen you could point me at?

In the meantime, I'll implement your suggestion of falling back to interpretation. Shall I do that on this PR, or open a new one?

@davies
Copy link
Contributor

davies commented Feb 17, 2016

@joehalliwell Since the implementation will be totally different, it make sense to open a new one, thanks!

@davies
Copy link
Contributor

davies commented Feb 17, 2016

@joehalliwell In order to test whole stage codegen, you need to have a SQL query to run.

@asfgit asfgit closed this in 9634e17 Mar 9, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

If there are many branches in a CaseWhen expression, the generated code could go above the 64K limit for single java method, will fail to compile. This PR change it to fallback to interpret mode if there are more than 20 branches.

This PR is based on apache#11243 and apache#11221, thanks to joehalliwell

Closes apache#11243
Closes apache#11221

## How was this patch tested?

Add a test with 50 branches.

Author: Davies Liu <davies@databricks.com>

Closes apache#11592 from davies/fix_when.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants