[SPARK-7385][Core] Add RDD.foreachPartitionWithIndex #5927
Why not just use TaskContext? |
Easier for normal users. The alternative is.
This is ugly and non-intuitive. |
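The snippet referred to in the comment above did not survive in this copy of the thread. As a rough sketch only (not the commenter's original code), reading the partition index through TaskContext inside foreachPartition could look like the following; the object name, app name, and sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

// Hypothetical example object; names and data are illustrative only.
object PartitionIdViaTaskContext {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-id-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 10, numSlices = 4)
      rdd.foreachPartition { iter =>
        // TaskContext.get() returns the context of the task processing this partition.
        val partitionId = TaskContext.get().partitionId()
        val count = iter.size
        println(s"partition $partitionId holds $count records")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The index is available, but only to someone who already knows that TaskContext exposes it, which is the discoverability concern raised in this thread.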
@koeninger Isn't this going to make it easier to do transactional output operations? |
@tdas yeah, Kafka transactional output was why I originally wanted to add Although that usage of taskcontext shown above is better than my
|
@rxin Any objections? |
This seems fine to me. If we were to design this all over again, I'd consider just dropping the |
@JoshRosen I agree that we should not use this as a precedent for If there are no other objections, I will merge it.
|
Hm - I think it might be better to document |
But as @koeninger pointed out that it was not obvious to find out the |
Why not just add to the javadoc of mapPartitions to suggest how to get the partition id and task context. |
Still ugly IMO.
|
I think if you're going to decide you really don't like withContext/withIndex etc., they should be marked as deprecated, in addition to having a scaladoc reference to TaskContext.get. Either that or foreachPartitionWithIndex seems ok to me. |
Marking them deprecated sounds like a good idea. The static getter method was specifically designed to replace them. |
All right. Since adding foreachPartitionWithIndex is not the best idea, I will close this PR. Additionally, I will document it in the programming guide to use |
What do you do if you are in local mode and |
Spark Streaming apps often update external stores transactionally, which requires an ID that uniquely identifies the partition of data being inserted. This can be the pair (batch time, partition index).
The current workaround is to use mapPartitionsWithIndex().count(), which is quite hacky. This PR adds foreachPartitionWithIndex().
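For illustration, the workaround described above might be sketched as follows. This is a minimal example under assumed names (the object, app name, and sample records are hypothetical), not code taken from the PR. The side effect runs inside mapPartitionsWithIndex, and count() is called only to force evaluation.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative only.
object IndexedSideEffectWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("indexed-side-effect-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)
      rdd.mapPartitionsWithIndex[Unit] { (index, iter) =>
        // The real work is a side effect; in a streaming job, (batch time, index)
        // could serve as the unique ID for a transactional write.
        iter.foreach(record => println(s"partition $index -> $record"))
        // Nothing useful to return, so emit an empty iterator.
        Iterator.empty
      }.count() // count() exists only to trigger the job, hence "hacky"
    } finally {
      sc.stop()
    }
  }
}
```

The transformation produces an RDD that is immediately discarded, which is why the description calls the pattern hacky: an action is needed purely to make a lazy transformation run.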