SPARK-3604. Replace the map call in UnionRDD#getPartitions method to avo... #2463
…avoid creating an additional Seq.
QA tests have started for PR 2463 at commit
|
Is the goal here just to make the recursive calls take fewer stack frames and make it harder to overflow? I got the impression there was an infinite recursion lurking here, but I don't see that this fixes it. Maybe I misunderstood the JIRA. |
Yes. The issue is that there could be union RDDs inside the rdds array, so the recursion may be unavoidable, but we can make the recursive calls take fewer frames. I can't think of a real fix for this though. |
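To make the failure mode above concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: `LeafRDD`, `ToyUnionRDD`, and `chain` are hypothetical names, and Python's recursion limit stands in for the JVM stack. It only illustrates that a partition lookup which recurses into nested unions needs stack depth proportional to the nesting.

```python
import sys


class LeafRDD:
    """Hypothetical stand-in for a non-union RDD with a fixed partition count."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def get_partitions(self):
        return list(range(self.num_partitions))


class ToyUnionRDD:
    """Toy union: its partitions are the concatenation of its children's."""

    def __init__(self, rdds):
        self.rdds = list(rdds)

    def get_partitions(self):
        parts = []
        for rdd in self.rdds:
            # Recurses once per level of nested unions.
            parts.extend(rdd.get_partitions())
        return parts


def chain(n):
    """Build n nested unions, the way repeated rdd.union(other) calls would."""
    rdd = LeafRDD(1)
    for _ in range(n):
        rdd = ToyUnionRDD([rdd, LeafRDD(1)])
    return rdd


if __name__ == "__main__":
    n = sys.getrecursionlimit() + 100
    try:
        chain(n).get_partitions()
    except RecursionError:
        print("chained unions exhausted the stack")
    # The same leaves under one flat union need a single level of calls.
    flat = ToyUnionRDD(LeafRDD(1) for _ in range(n + 1))
    print(len(flat.get_partitions()))
```

Using fewer frames per level, as this patch does, raises the nesting depth at which the overflow occurs but does not remove the dependence on depth.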
QA tests have finished for PR 2463 at commit
|
Fundamentally, the way union works is flawed because it forces a caller to create a recursive structure. In my case, I have `files = [] # some list`. At each point in the loop, I'm creating a UnionRDD whose collection of RDDs contains exactly one RDD (also a UnionRDD). You've coded for a tree, but what you really have is a linked list that will blow up the stack. It should be possible for me to get a broad, flat structure instead, ideally by doing something like this: `rddgen = (sc.createAnRddInTheUsualWay(x) for x in files)`. The proposed patch does not do that, but it should. |
@ericdf What is the type of rddgen in your pseudocode? I'm not understanding why the existing |
Ah! I was not aware that there was an API for getting a union for a list on SparkContext -- I had only seen the one on RDD itself, which only takes a single `other' RDD. Yes, the SparkContext#union is exactly what I want. Thank you! |
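The difference between chaining pairwise unions and taking one union over a whole list can be sketched without Spark. The snippet below is a toy model under stated assumptions: `leaf`, `union`, and `depth` are hypothetical helpers that represent the lineage as nested tuples, and the `part-N` file names are made up. It shows only the shape of the two structures, not Spark's actual classes.

```python
from functools import reduce


def leaf(name):
    """Hypothetical node for an RDD built from a single file."""
    return ("leaf", name)


def union(children):
    """Hypothetical node for a union over any number of children."""
    return ("union", tuple(children))


def depth(node):
    """Nesting depth, computed iteratively so that a deep chain does not
    hit Python's own recursion limit while we measure it."""
    deepest = 0
    stack = [(node, 0)]
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        if current[0] == "union":
            stack.extend((child, level + 1) for child in current[1])
    return deepest


files = [f"part-{i}" for i in range(5000)]  # made-up input names
leaves = [leaf(f) for f in files]

# Pairwise chaining: every step wraps the previous result in a new union.
chained = reduce(lambda acc, nxt: union([acc, nxt]), leaves)

# One union over the whole list: a single, wide level.
flat = union(leaves)

print(depth(chained), depth(flat))
```

In this model the chained structure grows one level per file, while the flat one stays at a single level regardless of how many files are combined.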
@ericdf is your original issue fixed by using the union utility function? I misread it to be a bug report, but I think the issue is just that you were chaining together unions instead of composing them using the utility. |
@harishreedharan I think the fix is that for people chaining many unions together they should use |
Agreed. This patch simply makes it more difficult to overflow, so it is not really a fix. Will close this. Thanks, On Sat, Sep 20, 2014 at 5:33 PM, Patrick Wendell notifications@github.com
|
Gotcha - sounds good! |
Let's close this issue then |
Done |