Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3679] [PySpark] pickle the exact globals of functions #2522

Closed
wants to merge 1 commit into from

Conversation

davies
Copy link
Contributor

@davies davies commented Sep 24, 2014

function.func_code.co_names has all the names used in the function, including name of attributes. It will pickle some unnecessary globals if there is a global having the same name with attribute (in co_names).

There is a regression introduced by #2144, revert part of changes in that PR.

cc @JoshRosen

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have started for PR 2522 at commit dfbccf5.

  • This patch merges cleanly.

@JoshRosen
Copy link
Contributor

Based on the JIRA, it looks like this PR description should link to #2144 instead?

@JoshRosen
Copy link
Contributor

Note for other reviewers: this commit mostly reverts #2144's changes to extract_code_globals, so although the changed code is complicated, we were using it before in cloudpickle and should be able to trust that it works.

@davies
Copy link
Contributor Author

davies commented Sep 24, 2014

@JoshRosen fixed the description

out_names.add(names[oparg])
#print 'extracted', out_names, ' from ', names

if co.co_consts: # see if nested function have any global refs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a read through this PR, it looks like this line is the first place where this function diverges from the pre-#2144 version of cloudpickle.

It looks like the original version of cloudpickle called this code from outside of extract_code_globals, so I guess the old code would only perform one level of recursion when trying to extract globals?

Do you think that adding actual, unbounded recursion could cause problems here? If the "nested function" implies that this only applies to functions defined within other functions, then there aren't cycles in the nesting and therefore shouldn't be cycles that lead to infinite recursion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code only perform two levels of functions, the new version can handle multiple levels.

Each level is on function code, which is created by def or lambda, so I think the code cannot be recursive.

@JoshRosen
Copy link
Contributor

This looks good to me, pending Jenkins.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have finished for PR 2522 at commit dfbccf5.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20759/

@JoshRosen
Copy link
Contributor

LGTM; merging this now. Thanks!

@asfgit asfgit closed this in bb96012 Sep 24, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants