Compatibility with distributed master #155

Closed
TomAugspurger opened this issue Jun 13, 2019 · 10 comments

@TomAugspurger

Seeing errors like

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/taugspurger/sandbox/dask-kubernetes/dask_kubernetes/tests/test_core.py(370)test_reject_evicted_workers()
-> assert time() < start + 60
(Pdb) q
2019-06-13 08:45:06,882 distributed.core[3810] WARNING No handler register found in Scheduler
Traceback (most recent call last):
  File "/Users/taugspurger/sandbox/distributed/distributed/core.py", line 395, in handle_comm
    handler = self.handlers[op]
KeyError: 'register'
2019-06-13 08:45:07,456 distributed.core[3810] WARNING No handler register found in Scheduler
Traceback (most recent call last):
  File "/Users/taugspurger/sandbox/distributed/distributed/core.py", line 395, in handle_comm
    handler = self.handlers[op]
KeyError: 'register'
@mrocklin

No server currently has a register route. My first guess would be that this is a version mismatch. Are you specifying --worker-image correctly when running pytest?
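
A quick way to confirm the versions really do match end to end is distributed's built-in version check (a minimal sketch; the scheduler address is a placeholder):

from dask.distributed import Client

client = Client("tcp://<scheduler-address>:8786")  # placeholder address
# Compares dask, distributed, and Python versions across the client,
# the scheduler, and every worker; check=True raises on a mismatch
# instead of just reporting it.
client.get_versions(check=True)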

@TomAugspurger commented Jun 13, 2019

That's my suspicion, but things seem to be OK...

I'm running

pytest dask_kubernetes/tests/test_core.py::test_basic --worker-image daskdev/dask:dev -x --pdb

where daskdev/dask:dev is an image with dask and distributed from master:

docker run --rm -it daskdev/dask:dev python -c 'import distributed; print(distributed.__version__)'
+ '[' '' ']'
+ '[' -e /opt/app/environment.yml ']'
+ echo 'no environment.yml'
no environment.yml
+ '[' '' ']'
+ '[' '' ']'
+ exec python -c 'import distributed; print(distributed.__version__)'
1.28.1+56.g2ba70b31

which matches my local version. Will do some more debugging.

edit: though my local Python version doesn't match the version in the Docker image. That may be part of it.

@TomAugspurger

Ahh what a fun way to waste an hour. I built the daskdev/dask:dev image locally on my host, but Minikube has a different docker registry. Normally, this would have raised an error, but my Minikube had a daskdev/dask:dev image from 13 months ago 😄
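
For anyone else who hits this: Minikube runs its own Docker daemon, separate from the host's, so images built on the host never reach it. The usual workaround (standard Minikube workflow, sketched here) is:

eval $(minikube docker-env)         # point the docker CLI at Minikube's daemon
docker build -t daskdev/dask:dev .  # rebuild so the image exists where the pods pull from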

@mrocklin

:/

@TomAugspurger

@mrocklin does it make sense for KubeCluster to inherit from SpecCluster now?

@jacobtomlinson

@TomAugspurger yes! This is on my todo list.
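
For reference, SpecCluster on distributed master is driven by per-name worker specs plus a scheduler spec, roughly as sketched below (following the pattern in distributed's docs, with local Worker/Nanny classes as stand-ins; a SpecCluster-based KubeCluster would plug in a Pod-backed worker class instead):

from dask.distributed import SpecCluster, Scheduler, Worker, Nanny

# Each spec is a dict of {"cls": <class>, "options": {<constructor kwargs>}}.
scheduler = {"cls": Scheduler, "options": {"dashboard_address": ":8787"}}
workers = {
    "worker-0": {"cls": Worker, "options": {"nthreads": 1}},
    "worker-1": {"cls": Nanny, "options": {"nthreads": 2}},
}
cluster = SpecCluster(scheduler=scheduler, workers=workers)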

@cicdw commented Jul 19, 2019

Not sure if this is 100% related, but it appears that dask-kubernetes isn't currently compatible with the newest version of distributed; I'm seeing a lot of:

tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f4183cc1810>>, <Future finished exception=AttributeError("'KubeCluster' object has no attribute 'workers'")>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/adaptive.py", line 216, in recommendations
    current = len(self.cluster.worker_spec)
AttributeError: 'KubeCluster' object has no attribute 'worker_spec'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
    future.result()
  File "/usr/local/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/adaptive.py", line 259, in _adapt
    recommendations = self.recommendations()
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/adaptive.py", line 218, in recommendations
    current = len(self.cluster.workers)
AttributeError: 'KubeCluster' object has no attribute 'workers'
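
The traceback boils down to this lookup in distributed's adaptive.py (paraphrased from the lines quoted above into a standalone sketch):

def current_workers(cluster):
    # Adaptive first looks for the SpecCluster-style attribute...
    try:
        return len(cluster.worker_spec)
    except AttributeError:
        # ...and falls back to the older interface. A pre-SpecCluster
        # KubeCluster has neither, hence the AttributeError above.
        return len(cluster.workers)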

@TomAugspurger

Hmm, I thought that #156 fixed all those. What version of dask-kubernetes are you on, @cicdw?

@cicdw commented Jul 19, 2019

Oh this is embarrassing; I just realized dask-kubernetes was pinned to 0.8.0 -- thanks and sorry for the noise!
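
(For anyone else who lands here with the same traceback, checking and bumping the pin looks like:)

pip show dask-kubernetes               # confirm which version is actually installed
pip install --upgrade dask-kubernetes  # pick up a release compatible with current distributed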

@jacobtomlinson

Going to close this as #162 has been merged.
