-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10613] [SPARK-10624] [SQL] Reduce LocalNode tests dependency on SQLContext #8764
Conversation
This commit refactors DummyNode to take in data from LocalRelation. Then it rewrites FilterNodeSuite to make it read from DummyNode instead of from a DataFrame. Future commits will cover other LocalNode test suites.
…-cleanup Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LocalNode.scala sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala
@zsxwing please have a look in the mean time. Thanks. |
(new PoissonSampler[InternalRow](upperBound - lowerBound, useGapSamplingIfPossible = false), | ||
// Use the seed for partition 0 like PartitionwiseSampledRDD to generate the same result | ||
// of DataFrame | ||
random.nextLong()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zsxwing I had to remove this to make testing deterministic. Looking at this further I still don't see the point of introducing another layer of randomness here. What change in behavior does this entail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using DataFrame.sample
to test SampleNode and it mocked the behavior of DataFrame.sample(withReplacement = true)
. Since you don't use DataFrame to test it now, I agree that we can remove this tricky logic.
Test build #42479 has finished for PR 8764 at commit
|
I didn't notice spark/sql/core/src/main/scala/org/apache/spark/sql/execution/local/TakeOrderedAndProjectNode.scala Line 53 in e626ac5
It should be queue.toArray.sorted(ord).iterator .
|
62f0afd
to
3bd5ac7
Compare
Alright, as of the latest commit I believe this patch is basically ready. @zsxwing I fixed the take ordered issue we discussed, filed a new JIRA SPARK-10624 for it, and included the fix in this patch. |
Looks like the relevant tests have already passed. Merging into master. |
Thanks @andrewor14 I will update #8769 as per your changes. |
Test build #42510 has finished for PR 8764 at commit
|
Test build #1761 has finished for PR 8764 at commit
|
Test build #1763 has finished for PR 8764 at commit
|
Test build #1762 has finished for PR 8764 at commit
|
Instead of relying on
DataFrames
to verify our answers, we can just use simple arrays. This significantly simplifies the test logic forLocalNode
s and reduces a lot of code duplicated fromSparkPlanTest
.This also fixes an additional issue SPARK-10624 where the output of
TakeOrderedAndProjectNode
is not actually ordered.