Skip to content

Environ plugin: also update dask config#5274

Open
gjoseph92 wants to merge 1 commit intodask:mainfrom
gjoseph92:plugin/env-update-config
Open

Environ plugin: also update dask config#5274
gjoseph92 wants to merge 1 commit intodask:mainfrom
gjoseph92:plugin/env-update-config

Conversation

@gjoseph92
Copy link
Copy Markdown
Collaborator

Because Nannies pass their config into worker through a separate config= argument, setting dask config environment variables with the Enviorn nanny plugin confusingly had no effect. Now, any dask-config environment variables work automatically (besides memory limit, which is still handled specially by Nanny).

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

Because Nannies pass their config into worker through a separate `config=` argument, setting dask config environment variables with the Enviorn nanny plugin confusingly had no effect. Now, any dask-config environment variables work automatically (besides memory limit, which is still handled specially by Nanny).
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gjoseph92!

It looks like this behavior can happen with the Nanny, independent of the Environ plugin:

os.environ.update(env)
dask.config.set(config)

Updating in WorkerProcess._run seems like a more centralized place to put this type of logic.

Also, can you add a test which demonstrates things are working as expected? See, for example, distributed/tests/test_nanny.py::test_environment_variable_by_config

@gjoseph92
Copy link
Copy Markdown
Collaborator Author

Sure, that makes sense to me. The interface for specifying config/environ to the Nanny seems a bit ambiguous, since there are a number of overlapping options (distributed.nanny.environ vs env= vs config= vs anything already set in dask.config.config via the actual process environment/YAMLs). But if having env= (but not anything in the actual os.environ) potentially override config= is desirable, that is a more general solution.

There were no tests for the Environ plugin to begin with so I continued the trend :) I'll add them for the changes to Nanny itself though.

@jrbourbeau
Copy link
Copy Markdown
Member

Yeah, prioritization here can get tricky. I think there was a previous conversation around this, but I can't seem to find it. cc @jacobtomlinson who may have thoughts on this topic

There's this existing tests which could serve as a good model

@gen_cluster(client=True, Worker=Nanny, timeout=10000000)
async def test_environ_plugin(c, s, a, b):
from dask.distributed import Environ
await c.register_worker_plugin(Environ({"ABC": 123}))
async with Nanny(s.address, name="new") as n:
results = await c.run(os.getenv, "ABC")
assert results[a.worker_address] == "123"
assert results[b.worker_address] == "123"
assert results[n.worker_address] == "123"

@gjoseph92
Copy link
Copy Markdown
Collaborator Author

There's this existing tests which could serve as a good model

🤦 because the plugin is in plugin.py, I just looked in test_plugin.py, not test_nanny.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants