-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12286] [SPARK-12290] [SPARK-12294] [SPARK-12284] [SQL] always output UnsafeRow #10511
Conversation
cc @nongli What are the performance implications? |
The purpose of this PR is to pay off the technical debt, there could be a little performance regression because we may do more copies. |
Test build #48425 has finished for PR 10511 at commit
|
Test build #48432 has finished for PR 10511 at commit
|
The only heavy operator that didn't support unsaferow was just sortbased agg? Is that right? This seems reasonable to me given how much code this removes. |
@nongli InsertIntoHiveTable also does not support UnsafeRow now. |
@viirya I'm very sorry for this. Originally, I thought this could be done step by step, then finally do this clean up. After reviewed some of the PRs, realized that trying to make them work with both UnsafeRow and SafeRow just waster ours time (because we will remove the SafeRow support in the end), also waste our time in the process for tiny PRs. I just start to work on this in this morning, could left some comments about this earlier (but can't say much without actual prototyping). If you think it's worth to address the comments in your PR, go ahead to do that, I could merge your PR before this one (then rebase this PR). Sorry again for the conflict. |
@davies Thanks for replying my comments. It is good to know what it is going on. Now it does feel better. It is ok for the closed PR. I just got confusing why after several updates to the PR it is getting closed by replacing one. I would like to suggest to put some words explaining briefly why to close other PRs with replacing one like this when creating it, if it is possibly. |
Test build #48454 has finished for PR 10511 at commit
|
Test build #48457 has finished for PR 10511 at commit
|
Test build #2263 has finished for PR 10511 at commit
|
I took a quick look and I'm going to merge this because I'm working on something that conflicts with it. It would be great if @nongli or @JoshRosen can review this more carefully post-hoc. |
Actually I'm going to revert this for now. I've seen two separate pull requests that failed python tests. https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2292/consoleFull and https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2291/consoleFull Neither of them had anything to do with the failure and my hunch is that this is the one causing it. |
It doesn't look like the failure is related to this -- so I reverted the revert commit ... now it is merged again. |
It's confusing that some operator output UnsafeRow but some not, easy to make mistake.
This PR change to only output UnsafeRow for all the operators (SparkPlan), removed the rule to insert Unsafe/Safe conversions. For those that can't output UnsafeRow directly, added UnsafeProjection into them.
Closes #10330
cc @JoshRosen @rxin