New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2373]RDD add span function (split an RDD to two RDD based on user's function)] #1306
Conversation
Hi all, I want to submit a basic operator Intersect For example , in sql case select * from table1 intersect select * from table2 So ,i want use this operator support this function in Spark SQL This operator will return the the intersection of SparkPlan child table RDD .
Thanks ,I optimize the code so it only evaluates the function once .Other comments are on JIRA |
This function is useful in some cases ,Such as when i do Skew Join in another PR,I need to split an RDD to two RDD,One has skew keys ,and the other is not . |
test this please |
QA tests have started for PR 1306 at commit
|
QA tests have finished for PR 1306 at commit
|
@@ -1270,4 +1270,20 @@ abstract class RDD[T: ClassTag]( | |||
def toJavaRDD() : JavaRDD[T] = { | |||
new JavaRDD(this)(elementClassTag) | |||
} | |||
|
|||
def span(p: T => Boolean) : (RDD[T], RDD[T]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs scaladoc, since this method's meaning won't be clear to users unless they're familiar with the Scala span
function.
@mateiz Thoughts on this? Do you think this API will be broadly useful enough to include? |
IMO this is too specialized to include. It's small enough that applications can do it themselves, but also fairly confusing unless your RDD is already sorted in some way. I think we should just leave it for applications to do it. If you are doing a skewed join operator for example, you can do it within the implementation of that but not show it to the user. |
Ok ,Got it, I will close this PR ; |
It's a nice start to code the scala |
FWIW, I frequently have this use case and would love a |
Hi all,
This function is a basic function in Scala.
def span(p: T => Boolean): (RDD[T], RDD[T])
Splits this RDD into a prefix/suffix pair according to a predicate .
returns
a pair consisting of the longest prefix of this RDD whose elements all satisfy p, and the rest of this list.
JIRA:https://issues.apache.org/jira/browse/SPARK-2373