[SPARK-3679] [PySpark] pickle the exact globals of functions #2522
QA tests have started for PR 2522 at commit
Based on the JIRA, it looks like this PR description should link to #2144 instead?
Note for other reviewers: this commit mostly reverts #2144's changes to
@JoshRosen fixed the description
out_names.add(names[oparg])
#print 'extracted', out_names, ' from ', names

if co.co_consts:  # see if nested function have any global refs
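For readers less familiar with this part of cloudpickle, here is a minimal Python 3 sketch of what this recursive step does. It is an illustrative reimplementation, not cloudpickle's actual code: it assumes the standard `dis` module in place of the manual bytecode walk in the PR's Python 2 code, and `outer`, `inner`, `helper`, and `CONSTANT` are hypothetical names used only for the example.

```python
import dis
import types

# Opcodes that refer to module-level (global) names.
GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}


def extract_code_globals(code):
    """Return the global names read, written, or deleted by `code`
    and by every code object nested inside it."""
    out_names = set()
    for instr in dis.get_instructions(code):
        if instr.opname in GLOBAL_OPS:
            out_names.add(instr.argval)  # argval is the global's name
    # Nested functions and lambdas are compiled into separate code
    # objects stored in co_consts; recurse so their globals are found too.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            out_names |= extract_code_globals(const)
    return out_names


def outer():
    def inner():
        # `helper` and `CONSTANT` only need to exist when inner() runs.
        return helper(CONSTANT)
    return inner


print(sorted(extract_code_globals(outer.__code__)))  # ['CONSTANT', 'helper']
```

Note that `outer` itself loads no globals; both names are only found because the scan descends into the code object for `inner`.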
Based on a read through this PR, it looks like this line is the first place where this function diverges from the pre-#2144 version of `cloudpickle`. It looks like the original version of `cloudpickle` called this code from outside of `extract_code_globals`, so I guess the old code would only perform one level of recursion when trying to extract globals?
Do you think that adding actual, unbounded recursion could cause problems here? If "nested function" means that this only applies to functions defined within other functions, then the nesting cannot contain cycles, so there shouldn't be any cycles that lead to infinite recursion.
The original code only handles two levels of functions; the new version can handle any number of levels.
Each level is a function's code object, which is created by `def` or `lambda`, so I think the nesting cannot be cyclic and the recursion always terminates.
This looks good to me, pending Jenkins.
QA tests have finished for PR 2522 at commit
Test PASSed.
LGTM; merging this now. Thanks!
`function.func_code.co_names` contains all the names used by the function, including attribute names. As a result, unnecessary globals get pickled whenever a global has the same name as an attribute (since both appear in `co_names`).
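A small Python 3 sketch of this over-approximation (in Python 3, `func_code` is spelled `__code__`). The helpers `globals_from_co_names` and `globals_from_bytecode` are hypothetical names for this illustration, and the bytecode scan uses the standard `dis` module rather than cloudpickle's own implementation.

```python
import dis

GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}

data = [1, 2, 3]
# A module-level name that happens to collide with the attribute `append`.
append = "unrelated global that should not be pickled"


def f(x):
    data.append(x)  # `append` here is an attribute lookup, not a global
    return len(data)


def globals_from_co_names(func):
    """Over-approximation: every co_names entry that exists in the
    function's module globals, attribute names included."""
    g = func.__globals__
    return {n for n in func.__code__.co_names if n in g}


def globals_from_bytecode(func):
    """Exact: only names actually loaded, stored, or deleted as globals."""
    g = func.__globals__
    used = {i.argval for i in dis.get_instructions(func.__code__)
            if i.opname in GLOBAL_OPS}
    return {n for n in used if n in g}


print(sorted(f.__code__.co_names))        # ['append', 'data', 'len']
print(sorted(globals_from_co_names(f)))   # ['append', 'data']
print(sorted(globals_from_bytecode(f)))   # ['data']
```

`len` drops out of both results because it is a builtin rather than a module global; `append` is only picked up by the `co_names` version, which is exactly the unnecessary global described above.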
This fixes a regression introduced by #2144 by reverting part of the changes in that PR.
cc @JoshRosen