Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pluggable DependencyResolvers #3111

Merged
merged 26 commits into from
May 23, 2024
Merged

Conversation

benclifford
Copy link
Collaborator

@benclifford benclifford commented Feb 29, 2024

Description

This PR allows users to do things like store futures inside data structures such as dictionaries, which is a style of workflow that @Andrew-S-Rosen is especially enthusiastic about.

Changed Behaviour

By default nothing. This new behaviour will be (I think) somewhat slower when enabled, as a tradeoff for that newer functionality, but I have not quantified that.

Type of change

  • New feature

@Andrew-S-Rosen
Copy link
Contributor

Andrew-S-Rosen commented Mar 1, 2024

Thank you for taking an initial stab at this, @benclifford! As always, you've done a great job here thus far. (Linking this to #3108).

For tuples, this seems to work precisely as expected. I tried a slightly more complex example where there is a deferred Future via a __getitem__ call, and it still works.

import parsl
from parsl import python_app
from pathlib import Path

parsl.load()

@python_app
def job1():
    return {"directory": Path.cwd()}

@python_app
def job2(t: tuple[Path, str]):
    return Path(t[0], t[1])

job2((job1()["directory"], "hello")).result() # PosixPath('/home/rosen/test/hello')

In order for this to be practically useful, it would be ideal to have essentially the same support for other Python data structures. Naturally, the procedure for a list (and set) should be the same, although you have the added complexity of mutability. The dict was actually my original inspiration for this (see second example in #3108), as I had AppFutures as keys. For dict, doing an analogous check as the tuple would check all primary keys for AppFutures, which I think is a good start. All of these are effectively doing an if in check on any first-class data structure with a __contains__, which is at least internally consistent. The dict introduces some added complexity because if it works for primary keys, one might naively think the same about values, but that's a different beast altogether. This discussion, of course, neglects any mention of recursive traversal. That seems a fair bit tougher, and I too would be unsure if it would be wise to have it enabled by default if it were implemented.

@benclifford
Copy link
Collaborator Author

The dict introduces some added complexity because if it works for primary keys, one might naively think the same about values, but that's a different beast altogether.

I don't think there's any deep problem with values vs keys here: they're pairs of values-that-could-be-Futures that only get different meaning because things like the [] operator give them different meaning.

@Andrew-S-Rosen
Copy link
Contributor

Andrew-S-Rosen commented Mar 1, 2024

I don't think there's any deep problem with values vs keys here: they're pairs of values-that-could-be-Futures that only get different meaning because things like the [] operator give them different meaning.

Good point. So, with that being said, what should be the next course of action here to get it to the finish line? Is this something that you would like a hand on, or will you see it through when you get some time to do so?

@benclifford
Copy link
Collaborator Author

@Andrew-S-Rosen there's a bunch of grunt work that is implementing, for each data type that this behaviour should work on, the two singledispatch methods and a test case - that's not really high priority for me to work on, so implementing some of those would be a good thing do.

@Andrew-S-Rosen
Copy link
Contributor

Makes sense. Happy to take a stab at it when I get a spare moment.

@benclifford
Copy link
Collaborator Author

benclifford commented Mar 3, 2024

i realised that my implementation of tuple unwrap recreates every tuple, not only tuples that have futures in them -
because unwrap is called on every argument value, not only on values that are detected as Future-ish by the gather stage

@Andrew-S-Rosen
Copy link
Contributor

i realised that my implementation of tuple unwrap recreates every tuple, not only tuples that have futures in them - because unwrap is called on every argument value, not only on values that are detected as Future-ish by the gather stage

I noticed this as well but wasn't immediately sure how to avoid that.

@Andrew-S-Rosen
Copy link
Contributor

Andrew-S-Rosen commented Mar 24, 2024

Note to self: There are two main aspects left here to address.

1. We need to add support for dict types, e.g. with a Future as a key. We could also add support for the value, but it would be a shallow search just like we have for list, set, and tuple. I guess we'd probably generalize this to Mapping types rather than just dict, although I'm not sure it matters much.
2. We need to avoid doing the unwrap/recreate process for every single tuple, set, list, dict, etc. and only do it if there is a Future in it (for performance reasons).

Andrew-S-Rosen added a commit to Andrew-S-Rosen/parsl that referenced this pull request Mar 25, 2024
@Andrew-S-Rosen
Copy link
Contributor

@benclifford --- what should be done to continue this PR? Is it a need to find out how to not have it call the resolution every time?

@benclifford
Copy link
Collaborator Author

Is it a need to find out how to not have it call the resolution every time?

that, and this coming near the top of my attention stack...

@benclifford
Copy link
Collaborator Author

note to self, if this demo doesn't already do this: after some tossing round of ideas with @svandenhaute, I think possibly also join app end-results could be resolved this way. I suspect this PR doesn't actually do that but I think it's more consistent to do so and opens up some return value possibilities with less concurrency but less boilerplate.

@Andrew-S-Rosen
Copy link
Contributor

Is this good to go, @benclifford? :)

@benclifford benclifford marked this pull request as ready for review May 16, 2024 13:53
@benclifford
Copy link
Collaborator Author

I want to do a bit more tidyup around DependencyError and _unwrap_exceptions in a different PR - there is a bit of inconsistency in case handling there.

@benclifford
Copy link
Collaborator Author

see #3445 for:

I want to do a bit more tidyup around DependencyError and _unwrap_exceptions in a different PR - there is a bit of inconsistency in case handling there.

Copy link
Collaborator

@khk-globus khk-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing large jumps out at me, but I also note that we appear to be duplicating or recreating objects in the deep path. I wonder if that will prove to be a memory (if not strictly performance) concern for heavier use-cases. Specifically, this seems like an issue:

type_ = type(iterable)
return type_(map(deep_traverse_to_unwrap, iterable))

But I also don't think this is something to tackle until it becomes an issue. "YAGNI" and lazy evaluation being top of mind.

So, looks good, with some inline comments and suggestions if you'd like to follow up on them. (In particular, I do think the tests should be fleshed out, but I'll leave that to y'all's discretion.)

Comment on lines +93 to +104
When Parsl examines the arguments to an app, it uses a `DependencyResolver`.
The default `DependencyResolver` will cause Parsl to wait for
``concurrent.futures.Future`` instances (including `AppFuture` and
`DataFuture`), and pass through other arguments without waiting.

This behaviour is pluggable: Parsl comes with another dependency resolver,
`DEEP_DEPENDENCY_RESOLVER` which knows about futures contained with structures
such as tuples, lists, sets and dicts.

This plugin interface might be used to interface other task-like or future-like
objects to the Parsl dependency mechanism, by describing how they can be
interpreted as a Future.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text is accurate, and points to the right place, but as a "documentation consumer," I find myself without a proper mental model for what this looks like. Would an example implementation be an undue burden to place here? Or at the end of one of the links?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on a presentation for this for next Tuesday so I'll try to use the preparation for that as a way to get my head around more introductory material.

parsl/dataflow/dependency_resolvers.py Outdated Show resolved Hide resolved
Comment on lines +207 to +209
self.dependency_resolver = self.config.dependency_resolver if self.config.dependency_resolver is not None \
else SHALLOW_DEPENDENCY_RESOLVER

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that self.dependency_resolver is a required attribute. Is there utility in making it required at the configuration as well, rather than an implied requirement? That is, moving this conditional into config, and instead either trusting the config object, or asserting here? Perhaps something like:

class Config(...):
    def __init__(
        ...
        dependency_resolver: Optional[DependencyResolver],
        ...
    ):
        if dependency_resolver is None:
            dependency_resolver = SHALLOW_DEPENDENCY_RESOLVER
        self.dependency_resolver = dependency_resolver

Mypy may complain with that particular construction and not-Noneness (so fiddle!), but the point is that the config object is explicit as to what dependency resolver is in use.

Functionally a wash, I think (so I won't be fussed about this), but thinking in terms of overall clarity for when someone is poking at the REPL or CLI.

Comment on lines +14 to +15
def local_config():
return Config(dependency_resolver=DEEP_DEPENDENCY_RESOLVER)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good deal; this appears to test the majority of pathways for the deep resolver. But I think we should implement a similar set of tests for the shallow variant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current tests in this PR go through the whole Parsl machinery -- good! But an ancillary set of unit tests that verify strictly this class in isolation would be a good value-add. That is, this is a decently isolated class that doesn't depend on Parsl, so it's functionality could be verified independently of the rest of the infrastructure.

(Note that we've only recently added the tests/unit/ directory, so this would be a good second addition to that.)

@benclifford
Copy link
Collaborator Author

@khk-globus :

Nothing large jumps out at me, but I also note that we appear to be duplicating or recreating objects in the deep path. I wonder if that will prove to be a memory (if not strictly performance) concern for heavier use-cases.

[...]

But I also don't think this is something to tackle until it becomes an issue.

This was one of the things delaying me wanting to merge this but in some other conversations with @Andrew-S-Rosen I decided that for this PR:

i) this is an opt-in feature, and I'm fairly comfortable in this context with "you can opt into a worse-on-one-axis, better-on-another-axis" feature. More concerning is how this affects performances in the default case, but I think (without measuring) that it is noise around the function call stack and so I'm not super concerned.

ii) many execution paths already have quite heavy object-recreation behaviour that looks kinda like this: for example to any remote executor like HighThroughputExecutor, parameters go through a serialization/deserialization reconstruction.

So I expect there to be measurable performance change here, I expect there are much nicer ways to do it, users aren't subject to it initially, if it becomes a problem, someone can pay more attention later on.

@benclifford
Copy link
Collaborator Author

Merging what is here now. I'm working on this a bit more now, so hopefully there will be a follow-up PR addressing some more of @khk-globus 's comments.

@benclifford benclifford merged commit 1fc73aa into master May 23, 2024
6 checks passed
@benclifford benclifford deleted the benc-plugin-future-resolution branch May 23, 2024 11:50
@Andrew-S-Rosen
Copy link
Contributor

Wonderful. Thank you both!!

benclifford added a commit that referenced this pull request May 25, 2024
…hich hangs - rather than even giving an error directly
Andrew-S-Rosen added a commit to Quantum-Accelerators/quacc that referenced this pull request May 30, 2024
## Summary of Changes

Closes #1776.

Requires:
- Parsl/parsl#3111

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants