-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-1251] Fix Python 3 compatibility bug in pickler.py #7953
Conversation
R: @aaltay |
@@ -164,7 +164,7 @@ def new_save_module_dict(pickler, obj): | |||
obj_id = id(obj) | |||
if not known_module_dicts or '__file__' in obj or '__package__' in obj: | |||
if obj_id not in known_module_dicts: | |||
for m in sys.modules.values(): | |||
for m in list(sys.modules.values()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we understand why does sys.modules.values() change during iteration? Is it a side-effect of pytest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, though I don't understand why it sys.modules changes in this loop (assuming no background threads). I believe that accessing m.__dict__
triggers a lazy loading of the module.
I tested adding an additional for-loop before the original. The first loop iterates on list(sys.modules.value()), while the second on sys.modules.value(). The first loop conditionally accesses m.dict to see which modules change sys.modules. If the second loop succeeds, the touched modules are culprits.
I've isolated this module as a culprit:
py._vendored_packages.apipkg.ApiModule
Maybe related:
pytest has a module import hook for rewriting assertions (no need for self.assertEqual(), etc.):
https://github.com/pytest-dev/pytest/blob/7dcd9bf5add337686ec6f2ee81b24e8424319dba/src/_pytest/assertion/rewrite.py#L277-L301
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/pytest-dev/py/tree/master/py/_vendored_packages
https://github.com/pytest-dev/py/blob/master/py/__init__.py
So an alternative solution would be to add an additional loop before the original one:
_ = [m.__dict__ for m in list(sys.modules.values())]
for m in sys.modules.values():
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesccychen I believe my above suggestion is the more correct one, allowing lazily-loaded modules to initialize before putting their contents in known_module_dicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @udim--that makes sense. Are you going to update this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesccychen updated, ready to merge once all tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is list()
essential in line 168? If so please add a comment so that somebody doesn't try to clean it up, otherwise we can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, added more commentary
run python postcommit |
pytest docs say to avoid setup.cfg for pytest settings since it has a different parser. Include pickler fix from apache#7953.
run python postcommit |
Thanks, this LGTM. |
sys.modules changes during iteration. It seems that the intention was to make a copy using sys.modules.values(), which returns a copy in Python 2, but a view in Python 3. The issue comes up when using pytest, which implements lazy module loading.
run python postcommit |
Thanks! |
This issue was discovered while testing the pytest framework using Python 3 in #7949.
sys.modules changes during iteration. It seems that the intention was to
make a copy using sys.modules.values(), which returns a copy in Python
2, but a view in Python 3.
The issue comes up when using pytest, which implements lazy module
loading.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.