Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-1251] Fix Python 3 compatibility bug in pickler.py #7953

Merged
merged 1 commit into from
Feb 28, 2019

Conversation

udim
Copy link
Member

@udim udim commented Feb 27, 2019

This issue was discovered while testing the pytest framework using Python 3 in #7949.

sys.modules changes during iteration. It seems that the intention was to
make a copy using sys.modules.values(), which returns a copy in Python
2, but a view in Python 3.

The issue comes up when using pytest, which implements lazy module
loading.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@udim udim marked this pull request as ready for review February 27, 2019 02:10
@udim
Copy link
Member Author

udim commented Feb 27, 2019

R: @aaltay
CC: @tvalentyn , @charlesccychen

@@ -164,7 +164,7 @@ def new_save_module_dict(pickler, obj):
obj_id = id(obj)
if not known_module_dicts or '__file__' in obj or '__package__' in obj:
if obj_id not in known_module_dicts:
for m in sys.modules.values():
for m in list(sys.modules.values()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand why does sys.modules.values() change during iteration? Is it a side-effect of pytest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, though I don't understand why it sys.modules changes in this loop (assuming no background threads). I believe that accessing m.__dict__ triggers a lazy loading of the module.

I tested adding an additional for-loop before the original. The first loop iterates on list(sys.modules.value()), while the second on sys.modules.value(). The first loop conditionally accesses m.dict to see which modules change sys.modules. If the second loop succeeds, the touched modules are culprits.

I've isolated this module as a culprit:
py._vendored_packages.apipkg.ApiModule

Maybe related:
pytest has a module import hook for rewriting assertions (no need for self.assertEqual(), etc.):
https://github.com/pytest-dev/pytest/blob/7dcd9bf5add337686ec6f2ee81b24e8424319dba/src/_pytest/assertion/rewrite.py#L277-L301

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pytest-dev/py/tree/master/py/_vendored_packages
https://github.com/pytest-dev/py/blob/master/py/__init__.py

So an alternative solution would be to add an additional loop before the original one:

_ = [m.__dict__ for m in list(sys.modules.values())]

for m in sys.modules.values():
  ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesccychen I believe my above suggestion is the more correct one, allowing lazily-loaded modules to initialize before putting their contents in known_module_dicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @udim--that makes sense. Are you going to update this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesccychen updated, ready to merge once all tests pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is list() essential in line 168? If so please add a comment so that somebody doesn't try to clean it up, otherwise we can remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, added more commentary

@udim
Copy link
Member Author

udim commented Feb 27, 2019

run python postcommit

udim added a commit to udim/beam that referenced this pull request Feb 27, 2019
pytest docs say to avoid setup.cfg for pytest settings since it has a
different parser.

Include pickler fix from apache#7953.
@udim
Copy link
Member Author

udim commented Feb 27, 2019

run python postcommit

@charlesccychen
Copy link
Contributor

Thanks, this LGTM.

sys.modules changes during iteration. It seems that the intention was to
make a copy using sys.modules.values(), which returns a copy in Python
2, but a view in Python 3.

The issue comes up when using pytest, which implements lazy module
loading.
@udim
Copy link
Member Author

udim commented Feb 27, 2019

run python postcommit

@charlesccychen
Copy link
Contributor

Thanks!

@charlesccychen charlesccychen merged commit 7ddf735 into apache:master Feb 28, 2019
@udim udim deleted the pickler branch February 28, 2019 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants