Support to upload modules from zip files #886
Generally this looks pretty good to me. A few comments.
distributed/tests/test_client.py
Outdated
    if os.path.exists('myfile.zip'):
        os.remove('myfile.zip')

    sleep(1)  # TODO: why is this necessary?
What happens if this is not present?
The one-second timestamp resolution may depend on the filesystem. Try importlib.invalidate_caches() instead (see https://docs.python.org/3/library/importlib.html#importlib.invalidate_caches ). It isn't needed on Python 2.
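As an illustration of the suggestion above, here is a minimal sketch (not code from this PR) of calling importlib.invalidate_caches() after writing a new module file, so that path finders rescan the directory instead of relying on a cached listing. The module name demo_invalidate_mod and the helper import_fresh_module are made up for the example.

```python
import importlib
import os
import shutil
import sys
import tempfile


def import_fresh_module(directory, name, source):
    """Write ``name``.py into ``directory`` and import it (hypothetical helper)."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(source)
    # Finders may cache directory listings; ask them to rescan before importing.
    importlib.invalidate_caches()
    return importlib.import_module(name)


tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
try:
    mod = import_fresh_module(
        tmpdir, "demo_invalidate_mod", "def f():\n    return 123\n"
    )
    value = mod.f()
finally:
    # Undo the process-wide import state changes made by the demo.
    sys.path.remove(tmpdir)
    sys.modules.pop("demo_invalidate_mod", None)
    shutil.rmtree(tmpdir, ignore_errors=True)
```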
There's no case where we call invalidate_caches in the upload_file method.
Should I add it in all cases?
In the same way, the sleep call is in all the upload_file tests.
Should I remove it from all the tests?
if ext in ('.py', '.pyc'):
    logger.info("Reload module %s from .py file", name)
    name = name.split('-')[0]
    reload(import_module(name))
if ext == '.egg':
    import pkg_resources
    sys.path.append(out_filename)
    pkgs = pkg_resources.find_distributions(out_filename)
    for pkg in pkgs:
        logger.info("Load module %s from egg", pkg.project_name)
        reload(import_module(pkg.project_name))
    if not pkgs:
        logger.warning("Found no packages in egg file")
if ext == '.zip':
    logger.info("Reload module %s from .zip file", name)
    if out_filename not in sys.path:
        sys.path.insert(0, out_filename)
    invalidate_caches()
    reload(import_module(name))
You should at least try it and see if it works, yes :-)
I added the invalidate_caches call and factored out some common code:
try:
    name, ext = os.path.splitext(filename)
    names_to_import = []
    if ext in ('.py', '.pyc'):
        names_to_import.append(name)
    if ext in ('.egg', '.zip'):
        if out_filename not in sys.path:
            sys.path.insert(0, out_filename)
        if ext == '.egg':
            import pkg_resources
            pkgs = pkg_resources.find_distributions(out_filename)
            for pkg in pkgs:
                names_to_import.append(pkg.project_name)
        elif ext == '.zip':
            names_to_import.append(name)
    if not names_to_import:
        logger.warning("Found nothing to import from %s", filename)
    else:
        invalidate_caches()
        for name in names_to_import:
            logger.info("Reload module %s from %s file", name, ext)
            reload(import_module(name))
I removed the sleep in my test case and it works fine (but it also works without the call to invalidate_caches).
However, the test still fails in the test_upload_file case, which only uploads .py or .pyc files.
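For readers unfamiliar with the mechanism the .zip branch relies on, the following is a small standalone sketch (not the worker code) showing that a zip archive placed on sys.path can be imported from directly. The module name zip_demo_module is hypothetical.

```python
import importlib
import os
import shutil
import sys
import tempfile
import zipfile

MODULE_NAME = "zip_demo_module"  # hypothetical name, chosen to avoid clashes

tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, MODULE_NAME + ".zip")

# Build an archive holding a single top-level module.
with zipfile.ZipFile(zip_path, "w") as z:
    z.writestr(MODULE_NAME + ".py", "def f():\n    return 123\n")

# Putting the archive itself on sys.path makes its contents importable.
sys.path.insert(0, zip_path)
try:
    importlib.invalidate_caches()
    module = importlib.import_module(MODULE_NAME)
    result = module.f()
finally:
    # Restore the process-wide import state touched by the demo.
    sys.path.remove(zip_path)
    sys.modules.pop(MODULE_NAME, None)
    sys.path_importer_cache.pop(zip_path, None)
    shutil.rmtree(tmpdir, ignore_errors=True)
```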
Maybe I found the answer here:
http://stackoverflow.com/questions/8122734/pythons-imp-reload-function-is-not-working
That may be the case indeed. You may want to try to remove the cached bytecode file. On Python 3, see https://docs.python.org/3/library/importlib.html#importlib.util.cache_from_source . On Python 2, you will need to reimplement that function yourself.
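A minimal Python 3 sketch of that idea, assuming the default bytecode cache layout: importlib.util.cache_from_source maps a source path to its cached .pyc path, which can then be deleted. The helper name remove_cached_bytecode is made up for illustration and is not necessarily what the PR ended up with.

```python
import importlib.util
import os
import py_compile
import shutil
import tempfile


def remove_cached_bytecode(source_path):
    """Delete the cached .pyc for ``source_path``; return True if one was removed.

    Hypothetical helper, Python 3 only.
    """
    cache_path = importlib.util.cache_from_source(source_path)
    try:
        os.remove(cache_path)
    except FileNotFoundError:
        return False
    return True


# Small demonstration: compile a throwaway module, then drop its bytecode.
tmpdir = tempfile.mkdtemp()
try:
    src = os.path.join(tmpdir, "myfile_demo.py")
    with open(src, "w") as f:
        f.write("def f():\n    return 123\n")
    pyc = py_compile.compile(src)  # writes to the default cache location
    existed_before = os.path.exists(pyc)
    removed_first = remove_cached_bytecode(src)
    exists_after = os.path.exists(pyc)
    removed_second = remove_cached_bytecode(src)
finally:
    shutil.rmtree(tmpdir, ignore_errors=True)
```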
OK, I'll try this. This is becoming much bigger than the original change :)
By the way, I'm not used to writing code that supports both Python 2 and Python 3.
Is there a good way to do that? What I see on forums is something like this:
if hasattr(importlib, "invalidate_caches"):
    importlib.invalidate_caches()
Is that right?
Yes, though we have a distributed.compatibility module where you could add an invalidate_import_caches function.
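One possible shape for such a helper, as a sketch only: the name invalidate_import_caches comes from the suggestion above, and the code that was actually added to distributed.compatibility may differ. The Python 2 fallback is a no-op, following the earlier remark that the call isn't needed there.

```python
import importlib

if hasattr(importlib, "invalidate_caches"):
    # Python 3.3+: delegate to the standard library.
    invalidate_import_caches = importlib.invalidate_caches
else:
    def invalidate_import_caches():
        """No-op fallback for interpreters without importlib.invalidate_caches."""
        return None


# Callers can then use the helper unconditionally.
outcome = invalidate_import_caches()
```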
I made the changes to delete the .pyc file.
It now works fine without the sleep(1) call.
distributed/tests/test_client.py
Outdated
        os.remove('myfile.zip')

    sleep(1)  # TODO: why is this necessary?
    x = c.submit(g, pure=False)
You probably don't need pure=False here.
Similarly, pure=False is used in the other upload_file test, but I agree that it could be removed.
Should I remove it from the other tests?
distributed/tests/test_client.py
Outdated
        return myfile.f()

    with tmp_text('myfile.py', 'def f():\n    return 123') as fn_my_file, \
            tmp_text('__init__.py', '') as fn_init:
Why the file __init__.py?
Actually, you're right. I thought that __init__.py was necessary to make a valid module, but it's not. I tried removing it and the test passed as well.
Thanks for the review.
distributed/tests/test_client.py
Outdated
@@ -1116,6 +1117,27 @@ def g():
    result = yield y._result()
    assert result == 456


@gen_cluster(client=True)
def test_upload_file_zip(c, s, a, b):
The test would be more robust if it scrubbed out myfile from sys.modules at the start and at the end.
I'm not sure I understand this one.
This module is uploaded locally on the workers, right? Aren't the workers restarted before each test?
With gen_cluster, the workers should run in the current process, so they would inherit the current (process-wide) import state. For example, perhaps the myfile module seen in this test actually comes from another one of the tests here.
It seems all tests for upload_file are written in this style. I think it would deserve fixing.
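One way to express the scrubbing suggested above is a small context manager that removes the module from sys.modules on entry and on exit. This is only a sketch; the name scrubbed_module is hypothetical and not part of the test suite.

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def scrubbed_module(name):
    """Ensure ``name`` is absent from sys.modules before and after the block."""
    sys.modules.pop(name, None)
    try:
        yield
    finally:
        sys.modules.pop(name, None)


# Demonstration with a stand-in module object left over from "another test".
sys.modules["myfile_demo"] = types.ModuleType("myfile_demo")
with scrubbed_module("myfile_demo"):
    present_inside = "myfile_demo" in sys.modules
    # Simulate the test body importing the module again.
    sys.modules["myfile_demo"] = types.ModuleType("myfile_demo")
present_after = "myfile_demo" in sys.modules
```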
OK, so I'm going to apply all those modifications to all tests :)
All the upload_file_* tests, I mean.
distributed/worker.py
Outdated
    logger.info("Reload module %s from .zip file", name)
    if out_filename not in sys.path:
        sys.path.insert(0, out_filename)
    name = name.split('-')[0]
Just a question: why this line?
I kept this line from the ('.py', '.pyc') case:
if ext in ('.py', '.pyc'):
    logger.info("Reload module %s from .py file", name)
    name = name.split('-')[0]
    reload(import_module(name))
Actually, I don't know what it is for, so I can remove it in the zip case.
distributed/worker.py
Outdated
    if out_filename not in sys.path:
        sys.path.insert(0, out_filename)
    name = name.split('-')[0]
    reload(import_module(name))
invalidate_caches should probably be called just before this line.
You need pure=False if you're doing this twice and expect to get
different results. Otherwise the second call will just reference the first.
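The effect described above can be illustrated with a toy model. This is deliberately not Dask's implementation: ToyScheduler and its key scheme are invented for the example, under the assumption that a pure submission gets a deterministic key (so an identical second call reuses the stored result) while pure=False gets a fresh key every time.

```python
import uuid


class ToyScheduler:
    """Toy stand-in for key-based result reuse; not Dask's implementation."""

    def __init__(self):
        self.results = {}

    def submit(self, func, *args, pure=True):
        if pure:
            # Same function and arguments give the same key.
            key = (func.__qualname__, args)
        else:
            # A unique key forces a fresh computation.
            key = (func.__qualname__, uuid.uuid4().hex)
        if key not in self.results:
            self.results[key] = func(*args)
        return self.results[key]


state = {"value": 123}


def g():
    # Stands in for "import myfile; return myfile.f()".
    return state["value"]


sched = ToyScheduler()
first = sched.submit(g)
state["value"] = 456  # stands in for uploading a new version of the module
second_pure = sched.submit(g)
second_impure = sched.submit(g, pure=False)
```

In this model the pure resubmission still sees the old value, which matches the failure reported in the quoted message below.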
…On Tue, Feb 21, 2017 at 9:57 AM, bmaisonn ***@***.***> wrote:
It starts to be confusing here...
I modified my test to integrate your reviews (using invalidate_caches,
removing sleep and pure=False).
I also modified the test to upload twice with different code and
check that the modifications are correctly taken into account (as is done
in test_upload_file):
@gen_cluster(client=True)
def test_upload_file_zip(c, s, a, b):
    def g():
        import myfile
        return myfile.f()

    try:
        for value in [123, 456]:
            with tmp_text('myfile.py', 'def f():\n    return {}'.format(value)) as fn_my_file:
                with zipfile.ZipFile('myfile.zip', 'w') as z:
                    z.write(fn_my_file, arcname=os.path.basename(fn_my_file))
                yield c._upload_file('myfile.zip')
                x = c.submit(g)
                result = yield x._result()
                assert result == value
    finally:
        if os.path.exists('myfile.zip'):
            os.remove('myfile.zip')
        if 'myfile' in sys.modules:
            del sys.modules['myfile']
        for path in sys.path:
            if os.path.basename(path) == 'myfile.zip':
                sys.path.remove(path)
                break
With this code the second check, assert result == 456, fails because result == 123.
So the second file isn't picked up.
The same happens in test_upload_file if I remove pure=False OR sleep(1).
I really don't understand why it fails here ...
OK, I misunderstood the meaning of pure.
…e now remove the pyc files corresponding to the .py file being uploaded
As importlib.invalidate_caches and importlib.util.cache_from_source are Python 3 specific, move them into compatibility
(+1 squashed commits)
Squashed commits:
[92d94b4] Code reviews:
In worker.py::upload_file: Commonalize code, Remove split('-') as it seems useless, Use invalidate_caches before reloading modules
In test_client.py::test_upload_file_*: commonalize code, Make two tries to check for upload updates
I committed a new change that integrates the review comments.
distributed/worker.py
Outdated
    else:
        for name in names_to_import:
            logger.info("Reload module %s from %s file", name, ext)
            invalidate_caches()
Just a nit: you can pull invalidate_caches() out of the for loop.
Thanks a lot!
I moved the call to invalidate_caches and updated the documentation.
I don't have anything to add here, perhaps @mrocklin wants to take a last look.
Happy to trust you two on this. Thank you both for the effort and review.
Fix issue (#865).