Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12286] [SPARK-12290] [SPARK-12294] [SPARK-12284] [SQL] always output UnsafeRow #10511

Closed
wants to merge 5 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Dec 29, 2015

It's confusing that some operator output UnsafeRow but some not, easy to make mistake.

This PR change to only output UnsafeRow for all the operators (SparkPlan), removed the rule to insert Unsafe/Safe conversions. For those that can't output UnsafeRow directly, added UnsafeProjection into them.

Closes #10330

cc @JoshRosen @rxin

@rxin
Copy link
Contributor

rxin commented Dec 29, 2015

cc @nongli

What are the performance implications?

@davies
Copy link
Contributor Author

davies commented Dec 29, 2015

The purpose of this PR is to pay off the technical debt, there could be a little performance regression because we may do more copies.

@davies davies changed the title [SPARK-12286] [SPARK-12290] [SPARK-12294] always output UnsafeRow [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL] always output UnsafeRow Dec 29, 2015
@SparkQA
Copy link

SparkQA commented Dec 29, 2015

Test build #48425 has finished for PR 10511 at commit cba3934.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 30, 2015

Test build #48432 has finished for PR 10511 at commit 513597c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nongli
Copy link
Contributor

nongli commented Dec 30, 2015

The only heavy operator that didn't support unsaferow was just sortbased agg? Is that right? This seems reasonable to me given how much code this removes.

@davies
Copy link
Contributor Author

davies commented Dec 30, 2015

@nongli InsertIntoHiveTable also does not support UnsafeRow now.

@davies
Copy link
Contributor Author

davies commented Dec 30, 2015

@viirya I'm very sorry for this. Originally, I thought this could be done step by step, then finally do this clean up. After reviewed some of the PRs, realized that trying to make them work with both UnsafeRow and SafeRow just waster ours time (because we will remove the SafeRow support in the end), also waste our time in the process for tiny PRs.

I just start to work on this in this morning, could left some comments about this earlier (but can't say much without actual prototyping).

If you think it's worth to address the comments in your PR, go ahead to do that, I could merge your PR before this one (then rebase this PR).

Sorry again for the conflict.

@viirya
Copy link
Member

viirya commented Dec 30, 2015

@davies Thanks for replying my comments. It is good to know what it is going on. Now it does feel better. It is ok for the closed PR. I just got confusing why after several updates to the PR it is getting closed by replacing one. I would like to suggest to put some words explaining briefly why to close other PRs with replacing one like this when creating it, if it is possibly.

@davies davies changed the title [SPARK-12286] [SPARK-12290] [SPARK-12294] [SQL] always output UnsafeRow [SPARK-12286] [SPARK-12290] [SPARK-12294] [SPARK-12284] [SQL] always output UnsafeRow Dec 30, 2015
@SparkQA
Copy link

SparkQA commented Dec 30, 2015

Test build #48454 has finished for PR 10511 at commit 6bc518c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 30, 2015

Test build #48457 has finished for PR 10511 at commit 90405fc.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class ASTErrorNode extends ASTNode
    • public class ASTNode extends CommonTree implements Node, Serializable
    • public class ParseDriver
    • //function of the super class ANTLRStringStream. The LA() function is the lookahead function
    • public class ANTLRNoCaseStringStream extends ANTLRStringStream
    • public class HiveLexerX extends SparkSqlLexer
    • public class ParseError
    • public class ParseException extends Exception
    • public final class ParseUtils
    • public abstract class SemanticAnalyzer

@SparkQA
Copy link

SparkQA commented Dec 30, 2015

Test build #2263 has finished for PR 10511 at commit 90405fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 1, 2016

I took a quick look and I'm going to merge this because I'm working on something that conflicts with it. It would be great if @nongli or @JoshRosen can review this more carefully post-hoc.

@asfgit asfgit closed this in 0da7bd5 Jan 1, 2016
@rxin
Copy link
Contributor

rxin commented Jan 2, 2016

Actually I'm going to revert this for now. I've seen two separate pull requests that failed python tests.

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2292/consoleFull

and

https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2291/consoleFull

Neither of them had anything to do with the failure and my hunch is that this is the one causing it.

@rxin
Copy link
Contributor

rxin commented Jan 3, 2016

It doesn't look like the failure is related to this -- so I reverted the revert commit ... now it is merged again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants