-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow default dfk #50 #252
Conversation
parsl/dataflow/dflow.py
Outdated
module = importlib.util.module_from_spec(spec) | ||
spec.loader.exec_module(module) | ||
if hasattr(module, 'config'): | ||
cls._dfk = DataFlowKernel(config=module.config, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good/clean idea. Why would we not allow the payload to be a config ? I might be missing some advantage to this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yadudoc: agreed, I've switched to just loading the config dict and rebased onto master.
parsl/dataflow/dflow.py
Outdated
- **kwargs : Arbitrary keyword arguments which are passed to the DataFlowKernel constructor. | ||
|
||
""" | ||
if cls._dfk is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return a bool, so that whoever calls can know if this succeeded ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. If it doesn't succeed, it will raise an error, which the caller can handle. That seems cleaner to me than checking against a bool. In the former case, the error-handling code would only need to be traversed if there's an error, whereas a check against a bool would have to be traversed every time.
@@ -0,0 +1,5 @@ | |||
from parsl import App | |||
|
|||
@App('python') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, @App('python')
, @App('python', dfk)
and @App('python', dfk=DFK)
are all legal now ? This does look better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more variable assignment to get @python
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but let's wait on that for a separate PR
ea82bdc
to
220af63
Compare
from parsl.dataflow.dflow import DataFlowKernelLoader | ||
|
||
from config import local_threads | ||
DataFlowKernelLoader.load(local_threads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of words in that classname bothers me for something that is approximately a simple, mandatory init operation.
Perhaps parsl.init(local_threads)
and forget that the DFK exists at all: I don't see that the user needs to know that such a thing as a DFK exists, in the simple case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mandatory init operation.
It's not mandatory (you can still pass the DFK to the app decorator instead), but I agree it's long. I like your suggestion and will implement it. One thing to consider is that you can pass kwargs which are passed on to the DFK, so you'd need to look at the DFK docs to understand what is possible there. I am hoping when we re-work the config we can eliminate most of that, so that the main interface the user needs to deal with is the Config
class. So for now I'm fine with getting rid of DataFlowKernel
in the name here but still saying "these kwargs will be passed to the DFK", with the understanding that when the configs are tweaked we should be able to remove the reference to the DFK and just say "pass a Config
", which the user can read about in the Config
docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, to keep things simple, I'll just ditch passing the kwargs, and assume the user will manipulate the config dict beforehand if they need to.
docs/userguide/importing.rst
Outdated
:func:`~parsl.app.app.App` decorator, it must be loaded using | ||
:meth:`DataFlowKernelLoader.load <parsl.dataflow.dflow.DataFlowKernelLoader.load>` prior | ||
to importing the app. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I think if you take the approach that a starting user doesn't need to know about the DFK at all, then this paragraph looks awkward: "here's this thing I'm going to tell you about, ok, BUT NOW! you don't need to know about it". Just don't tell me about it, and document it the same as the other optional parameters to the App
decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a brief look at the code, it looks like the default-dfk resolution happens at the time of making a call to an @App
annotated function, not at the point of declaring the @App
-annotated function; in which case you can declare functions, and initialise later, as long as it's happened before executing. (I haven't actually tried that).
If that's the case, libraries of @App
s can be imported at the top of a file, like other imports - which I think is a nicer style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @benclifford-- I made that change (having the resolution executed at calltime instead of declaration time) and forgot to change it in the docs. I'll update this.
parsl/dataflow/dflow.py
Outdated
|
||
""" | ||
if cls._dfk is None: | ||
cls.load(config, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't silently ignore a load if this is already initialised: probably throw an exception - it's a horrible user experience to spend 3 days wondering why your configuration is being ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to throw an exception-- the point of this method is that it should only load if nobody else has loaded first. If you want to load no matter what, you should use load
instead of set_default
. Would more verbosity here be enough? ("Configuration has already been loaded; default will have no effect" and "Using default configuration").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specific failure case (that I have encountered on other projects) I am trying to prevent is:
A user, while hacking round on their application code, end up sticking in a bunch of inits in all different files.
Then six months later, when they've forget that they've done that, they modify one to include some different config options.
Then they spend a long time trying to understand why their config options aren't being applied.
I don't want people to load no matter what - I want to stop people putting multiple init calls in in the first place - or more specifically, specifying multiple configurations, and having one of them used based on the possibly fairly arbitrary choice of which of their modules was imported first.
At the very least output a warning, though I think it is insufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. The reason I put this in in the first place was in order for pytest
to have control over the DFK while still being able to run tests interactively. Now that the resolution is executed at calltime instead of declaration time, that isn't necessary, so I'll just remove this method, which I think should address your concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my same concern applies to load()
(and that's probably what I should have made this comment against as it's the documented one) - it silently overwrites one config with another, which I assert is a thing we should be reporting very hard as a likely mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Plan is to 1) remove set_default
2) raise an exception if config is already loaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that is good
220af63
to
12c3559
Compare
To be consistent with our naming conventions, functions should be lowercase. But if this is lowercase, it will collide with the `exec` statement. So I have re-named it.
This allows users to load a configuration which will be used to instantiate a DFK, which is set as the active DFK. After a DFK has been loaded, Apps can be imported without specifying an executor. If an app does not specify an executor and a DFK has not been loaded, a `RuntimeError` will be raised. Fixes #50.
I am adding the examples in a separate folder so they can be more easily tested. Fixes #141.
12c3559
to
3f0fbac
Compare
@benclifford sorry for the excessive rebasing, as we were discussing on Slack, I'm trying to keep "doing X", "undoing X, doing Y"-type commits out of the history where possible. Next time I'll put the unrelated commits in a separate PR, or at least before the relevant commits so that only the actually-changed commits are updated. I've implemented your requests, you can see the relevant diff here: 1ec7d04...3f0fbac. A few notes:
Thanks for your feedback! Here's a preview of the docs: |
looks good |
This PR adds a singleton class for loading the DFK which allows for more natural importing of apps. These changes were necessary in order to allow the testing framework to swap DFKs in and out such that we can factorize the testing code and the testing configurations, and cycle through such that each test is run on each configuration. Note it is possible to load a DFK either from a file containing a configuration or from a local DFK instance; I have only documented the latter way in the docstring, trying to keep things simple for now. Here is a preview of the new documentation: