Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow default dfk #50 #252

Merged
merged 5 commits into from
May 6, 2018
Merged

Allow default dfk #50 #252

merged 5 commits into from
May 6, 2018

Conversation

annawoodard
Copy link
Collaborator

@annawoodard annawoodard commented May 1, 2018

This PR adds a singleton class for loading the DFK which allows for more natural importing of apps. These changes were necessary in order to allow the testing framework to swap DFKs in and out such that we can factorize the testing code and the testing configurations, and cycle through such that each test is run on each configuration. Note it is possible to load a DFK either from a file containing a configuration or from a local DFK instance; I have only documented the latter way in the docstring, trying to keep things simple for now. Here is a preview of the new documentation:

screen shot 2018-05-01 at 10 26 50 am

@annawoodard annawoodard requested a review from yadudoc May 1, 2018 19:34
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
if hasattr(module, 'config'):
cls._dfk = DataFlowKernel(config=module.config, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good/clean idea. Why would we not allow the payload to be a config ? I might be missing some advantage to this method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yadudoc: agreed, I've switched to just loading the config dict and rebased onto master.

- **kwargs : Arbitrary keyword arguments which are passed to the DataFlowKernel constructor.

"""
if cls._dfk is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return a bool, so that whoever calls can know if this succeeded ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. If it doesn't succeed, it will raise an error, which the caller can handle. That seems cleaner to me than checking against a bool. In the former case, the error-handling code would only need to be traversed if there's an error, whereas a check against a bool would have to be traversed every time.

@@ -0,0 +1,5 @@
from parsl import App

@App('python')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, @App('python') , @App('python', dfk) and @App('python', dfk=DFK) are all legal now ? This does look better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

@benclifford benclifford May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more variable assignment to get @python !

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but let's wait on that for a separate PR

@annawoodard annawoodard force-pushed the allow-default-dfk-#50 branch 2 times, most recently from ea82bdc to 220af63 Compare May 4, 2018 03:06
from parsl.dataflow.dflow import DataFlowKernelLoader

from config import local_threads
DataFlowKernelLoader.load(local_threads)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of words in that classname bothers me for something that is approximately a simple, mandatory init operation.

Perhaps parsl.init(local_threads) and forget that the DFK exists at all: I don't see that the user needs to know that such a thing as a DFK exists, in the simple case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mandatory init operation.

It's not mandatory (you can still pass the DFK to the app decorator instead), but I agree it's long. I like your suggestion and will implement it. One thing to consider is that you can pass kwargs which are passed on to the DFK, so you'd need to look at the DFK docs to understand what is possible there. I am hoping when we re-work the config we can eliminate most of that, so that the main interface the user needs to deal with is the Config class. So for now I'm fine with getting rid of DataFlowKernel in the name here but still saying "these kwargs will be passed to the DFK", with the understanding that when the configs are tweaked we should be able to remove the reference to the DFK and just say "pass a Config", which the user can read about in the Config docs)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, to keep things simple, I'll just ditch passing the kwargs, and assume the user will manipulate the config dict beforehand if they need to.

:func:`~parsl.app.app.App` decorator, it must be loaded using
:meth:`DataFlowKernelLoader.load <parsl.dataflow.dflow.DataFlowKernelLoader.load>` prior
to importing the app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I think if you take the approach that a starting user doesn't need to know about the DFK at all, then this paragraph looks awkward: "here's this thing I'm going to tell you about, ok, BUT NOW! you don't need to know about it". Just don't tell me about it, and document it the same as the other optional parameters to the App decorator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a brief look at the code, it looks like the default-dfk resolution happens at the time of making a call to an @App annotated function, not at the point of declaring the @App-annotated function; in which case you can declare functions, and initialise later, as long as it's happened before executing. (I haven't actually tried that).

If that's the case, libraries of @Apps can be imported at the top of a file, like other imports - which I think is a nicer style.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @benclifford-- I made that change (having the resolution executed at calltime instead of declaration time) and forgot to change it in the docs. I'll update this.


"""
if cls._dfk is None:
cls.load(config, **kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't silently ignore a load if this is already initialised: probably throw an exception - it's a horrible user experience to spend 3 days wondering why your configuration is being ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to throw an exception-- the point of this method is that it should only load if nobody else has loaded first. If you want to load no matter what, you should use load instead of set_default. Would more verbosity here be enough? ("Configuration has already been loaded; default will have no effect" and "Using default configuration").

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific failure case (that I have encountered on other projects) I am trying to prevent is:

A user, while hacking round on their application code, end up sticking in a bunch of inits in all different files.

Then six months later, when they've forget that they've done that, they modify one to include some different config options.

Then they spend a long time trying to understand why their config options aren't being applied.

I don't want people to load no matter what - I want to stop people putting multiple init calls in in the first place - or more specifically, specifying multiple configurations, and having one of them used based on the possibly fairly arbitrary choice of which of their modules was imported first.

At the very least output a warning, though I think it is insufficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. The reason I put this in in the first place was in order for pytest to have control over the DFK while still being able to run tests interactively. Now that the resolution is executed at calltime instead of declaration time, that isn't necessary, so I'll just remove this method, which I think should address your concern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my same concern applies to load() (and that's probably what I should have made this comment against as it's the documented one) - it silently overwrites one config with another, which I assert is a thing we should be reporting very hard as a likely mistake.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Plan is to 1) remove set_default 2) raise an exception if config is already loaded

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is good

To be consistent with our naming conventions, functions should be
lowercase. But if this is lowercase, it will collide with the `exec`
statement. So I have re-named it.
This allows users to load a configuration which will be used to
instantiate a DFK, which is set as the active DFK. After a DFK has been
loaded, Apps can be imported without specifying an executor. If an app
does not specify an executor and a DFK has not been loaded, a
`RuntimeError` will be raised. Fixes #50.
I am adding the examples in a separate folder so they can be more
easily tested. Fixes #141.
@annawoodard
Copy link
Collaborator Author

@benclifford sorry for the excessive rebasing, as we were discussing on Slack, I'm trying to keep "doing X", "undoing X, doing Y"-type commits out of the history where possible. Next time I'll put the unrelated commits in a separate PR, or at least before the relevant commits so that only the actually-changed commits are updated. I've implemented your requests, you can see the relevant diff here: 1ec7d04...3f0fbac. A few notes:

  • To avoid using globals I wanted to retain the singleton class, but I've concealed it from the user by importing DataFlowKernelLoader.load in parsl.__init__.py such that the user can just call parsl.load. The motivation for doing that over writing a wrapper function was so that the method documentation only needs to appear once (note it is accessible via >>>help(parsl.load)).
  • Because init might be confused with __init__, I went with parsl.load instead of parsl.init. Let me know if you have a strong preference.

Thanks for your feedback! Here's a preview of the docs:

screen shot 2018-05-04 at 11 02 08 am

@benclifford
Copy link
Collaborator

looks good

@yadudoc yadudoc merged commit 37850a2 into master May 6, 2018
@annawoodard annawoodard deleted the allow-default-dfk-#50 branch May 9, 2018 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants