[SPARK-9986][SPARK-9991][SPARK-9993][SQL] Create a simple test framework for local operators #8464
Conversation
cc @rxin

override def open(): Unit = {
  iterator = data.iterator
}
private var iter: Iterator[InternalRow] = _
Why not just assign the value here (we can use val then)? Your changes to FilterNode and ProjectNode also assign values to predicate/project immediately instead of in open().
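A minimal sketch of the contrast being discussed (the class shape and field names are illustrative assumptions, not the PR's exact code):

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Deferred style: the field must be a mutable var because open() fills it in later.
class SeqScanNodeDeferred(data: Seq[InternalRow]) {
  private var iterator: Iterator[InternalRow] = _
  def open(): Unit = { iterator = data.iterator }
}

// Immediate style: assigning at construction time lets the field be an immutable val.
class SeqScanNodeImmediate(data: Seq[InternalRow]) {
  private val iterator: Iterator[InternalRow] = data.iterator
}
```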
Test build #41628 has finished for PR 8464 at commit
I talked to @marmbrus and we thought it'd make more sense to just have the operators be iterators themselves, and then we will create new instances of these somewhere else.
I removed OpenCloseIterator and made LocalNodes be Iterators, as discussed with @rxin. Also updated the description of this PR.
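Roughly, the resulting shape looks like this (a sketch only; the trait and constructor here are simplified assumptions, not the PR's exact API):

```scala
import org.apache.spark.sql.catalyst.InternalRow

// A local operator is itself an iterator over rows, so consumers pull from it
// directly instead of driving separate open()/next()/close() hooks.
trait LocalNode extends Iterator[InternalRow]

// Simplified scan node over an in-memory sequence of rows.
class SeqScanNode(data: Seq[InternalRow]) extends LocalNode {
  private[this] val iter = data.iterator
  override def hasNext: Boolean = iter.hasNext
  override def next(): InternalRow = iter.next()
}
```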
CatalystTypeConverters.createToCatalystConverter(StructType.fromAttributes(output))
new SeqScanNode(
  output,
  df.collect().map(r => converter(r).asInstanceOf[InternalRow]))
We can use df.queryExecution.toRdd.collect() to get the internal rows directly.
Cool. Fixed it.
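For reference, a sketch of the two approaches (assuming a DataFrame df and its output attributes output, as in the diff above):

```scala
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.types.StructType

// Before: collect external Rows and convert them back to InternalRows by hand.
val converter =
  CatalystTypeConverters.createToCatalystConverter(StructType.fromAttributes(output))
val manuallyConverted = df.collect().map(r => converter(r).asInstanceOf[InternalRow])

// After (reviewer's suggestion): collect the InternalRows straight from the executed plan.
val internalRows = df.queryExecution.toRdd.collect()
```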
Test build #41676 has finished for PR 8464 at commit
Test build #41681 has finished for PR 8464 at commit
 * @param sortAnswers if true, the answers will be sorted by their toString representations prior
 *                    to being compared.
 */
protected def checkAnswer2(
Just name this checkAnswer.
It needs to be checkAnswer2 because there is a default parameter sortAnswers, and that cannot work with overloading.
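The restriction being referred to, as a minimal sketch (parameter types are simplified so the snippet stands alone; the real helpers take a DataFrame, a node-building function, and a Seq[Row]):

```scala
// Scala 2 allows at most one overloaded alternative of a method to define
// default arguments. Renaming checkAnswer2 to checkAnswer below would fail
// to compile with an error along the lines of "multiple overloaded
// alternatives of method checkAnswer define default arguments".
class LocalNodeTestSketch {
  def checkAnswer(input: Seq[Int], expected: Seq[Int],
                  sortAnswers: Boolean = true): Unit = ()

  def checkAnswer2(left: Seq[Int], right: Seq[Int], expected: Seq[Int],
                   sortAnswers: Boolean = true): Unit = ()
}
```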
Can you use overloading instead of defaults for that as well?
If so, type inference doesn't work well. E.g.,
checkAnswer(
  testData,
  node => FilterNode(condition.expr, node),
  testData.filter(condition).collect()
)
will need to change to
checkAnswer(
  testData,
  (node: LocalNode) => FilterNode(condition.expr, node),
  testData.filter(condition).collect()
)
It reminds me of some old arguments about overloading in the Scala community. It seems that they don't like overloading and don't want to improve it: https://groups.google.com/forum/#!msg/scala-language/h7akCAFnu8c/dmReTTsW11gJ
Test build #41717 has finished for PR 8464 at commit
import org.apache.spark.sql.test.SharedSQLContext

class FilterNodeSuite extends LocalNodeTest with SharedSQLContext { |
So I was thinking it would be great if we could get rid of the SQLContext in the test cases for this local stuff.
I just want to reuse SQLTestData. It would be easy if we could use DataFrames to test the local stuff. I feel that if we make sure not to use SQLContext in the main code of the LocalNodes, we don't need to get rid of SQLContext in the test cases.
Note that SQLTestData is going away though.... #7406
> Note that SQLTestData is going away though.... #7406
Didn't notice that. I will add test data for each test case manually.
Just realized we need to call DataFrame.resolve to create a Column. Looks like it's hard to get rid of SQLContext only in the tests. I think it's better to do that in a separate PR that adds an Analyzer for LocalNode.
OK, I'm going to merge this so we have the infrastructure in. We can fix issues later.
This PR includes the following changes:
- Add LocalNodeTest for local operator tests and add unit tests for FilterNode and ProjectNode. (SPARK-9986)
- Add LimitNode and UnionNode and their unit tests to show how to use LocalNodeTest. (SPARK-9991, SPARK-9993)
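To give a concrete picture of the framework, here is a rough sketch of a suite built on it, pieced together from the snippets quoted in this conversation (the column name comes from the shared testData fixture; the exact checkAnswer signature and class names are assumptions, not verbatim from the PR):

```scala
import org.apache.spark.sql.test.SharedSQLContext

// Sketch only: exercises FilterNode through the LocalNodeTest helper by
// comparing the local operator's output against DataFrame.filter on the
// same data. Names and signatures are assumptions based on this thread.
class FilterNodeSuiteSketch extends LocalNodeTest with SharedSQLContext {

  test("basic filter") {
    // Keep rows with even keys; condition.expr is the Catalyst expression
    // handed to the local FilterNode.
    val condition = (testData.col("key") % 2) === 0
    checkAnswer(
      testData,
      node => FilterNode(condition.expr, node),
      testData.filter(condition).collect()
    )
  }
}
```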