Add pluggable DependencyResolvers #3111

benclifford · 2024-02-29T11:28:19Z

Description

This PR allows users to do things like store futures inside data structures such as dictionaries, which is a style of workflow that @Andrew-S-Rosen is especially enthusiastic about.

Changed Behaviour

By default nothing. This new behaviour will be (I think) somewhat slower when enabled, as a tradeoff for that newer functionality, but I have not quantified that.

Type of change

New feature


Andrew-S-Rosen · 2024-03-01T03:11:12Z

Thank you for taking an initial stab at this, @benclifford! As always, you've done a great job here thus far. (Linking this to #3108).

For tuples, this seems to work precisely as expected. I tried a slightly more complex example where there is a deferred Future via a __getitem__ call, and it still works.

import parsl
from parsl import python_app
from pathlib import Path

parsl.load()

@python_app
def job1():
    return {"directory": Path.cwd()}

@python_app
def job2(t: tuple[Path, str]):
    return Path(t[0], t[1])

job2((job1()["directory"], "hello")).result() # PosixPath('/home/rosen/test/hello')

In order for this to be practically useful, it would be ideal to have essentially the same support for other Python data structures. Naturally, the procedure for a list (and set) should be the same, although you have the added complexity of mutability. The dict was actually my original inspiration for this (see second example in #3108), as I had AppFutures as keys. For dict, a check analogous to the one for tuples would look at all primary keys for AppFutures, which I think is a good start. All of these are effectively doing an "if ... in" check on any first-class data structure with a __contains__, which is at least internally consistent. The dict introduces some added complexity because if it works for primary keys, one might naively think the same about values, but that's a different beast altogether.

This discussion, of course, neglects any mention of recursive traversal. That seems a fair bit tougher, and I too would be unsure if it would be wise to have it enabled by default if it were implemented.
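The shallow check discussed here could be sketched roughly as follows. This is not the PR's code: `shallow_futures` is a hypothetical helper, and plain `concurrent.futures.Future` objects stand in for AppFutures.

```python
from concurrent.futures import Future


def shallow_futures(arg):
    """Return the Futures found directly inside arg (no recursion)."""
    if isinstance(arg, Future):
        return [arg]
    if isinstance(arg, (tuple, list, set)):
        return [e for e in arg if isinstance(e, Future)]
    if isinstance(arg, dict):
        # Iterating a dict yields its keys, mirroring an `in` check.
        return [k for k in arg if isinstance(k, Future)]
    return []


f = Future()
print(len(shallow_futures((f, "hello"))))     # Future at the top level of a tuple
print(len(shallow_futures({f: "value"})))     # Future used as a dict key
print(len(shallow_futures([(f, "nested")])))  # nested Future is not seen
```

The last call illustrates the limitation mentioned at the end of the comment: without recursive traversal, a Future one level further down goes unnoticed.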

benclifford · 2024-03-01T08:49:16Z

> The dict introduces some added complexity because if it works for primary keys, one might naively think the same about values, but that's a different beast altogether.

I don't think there's any deep problem with values vs keys here: they're pairs of values-that-could-be-Futures that only get different meaning because things like the [] operator give them different meaning.
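To illustrate the point that keys and values are just pairs of values that could be Futures, here is a minimal sketch that treats both sides of each item identically. The helper names `dict_futures` and `dict_unwrap` are made up for this example and do not come from the PR.

```python
from concurrent.futures import Future


def dict_futures(d):
    """Collect Futures appearing as keys or as values of d (shallow)."""
    found = []
    for k, v in d.items():
        found.extend(x for x in (k, v) if isinstance(x, Future))
    return found


def dict_unwrap(d):
    """Rebuild d with every completed Future replaced by its result."""
    def value_of(x):
        return x.result() if isinstance(x, Future) else x
    return {value_of(k): value_of(v) for k, v in d.items()}


key_fut, val_fut = Future(), Future()
d = {key_fut: "a", "b": val_fut}
assert dict_futures(d) == [key_fut, val_fut]

key_fut.set_result("k")
val_fut.set_result(2)
print(dict_unwrap(d))  # prints {'k': 'a', 'b': 2}
```

One caveat of this sketch: the result of a Future used as a key must itself be hashable, otherwise rebuilding the dict fails.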

Andrew-S-Rosen · 2024-03-01T18:16:11Z

> I don't think there's any deep problem with values vs keys here: they're pairs of values-that-could-be-Futures that only get different meaning because things like the [] operator give them different meaning.

Good point. So, with that being said, what should be the next course of action here to get it to the finish line? Is this something that you would like a hand on, or will you see it through when you get some time to do so?

benclifford · 2024-03-02T10:54:47Z

@Andrew-S-Rosen there's a bunch of grunt work that is implementing, for each data type that this behaviour should work on, the two singledispatch methods and a test case - that's not really high priority for me to work on, so implementing some of those would be a good thing to do.
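The "two singledispatch methods" pattern might look something like the sketch below, which uses `functools.singledispatch` from the standard library. The function names `gather` and `unwrap` are placeholders chosen for this example, not the names used in the PR.

```python
from concurrent.futures import Future
from functools import singledispatch


@singledispatch
def gather(o):
    """Default: an arbitrary object exposes no Futures."""
    return []


@gather.register
def _(fut: Future):
    return [fut]


@gather.register
def _(t: tuple):
    return [e for e in t if isinstance(e, Future)]


@singledispatch
def unwrap(o):
    """Default: pass the object through unchanged."""
    return o


@unwrap.register
def _(fut: Future):
    return fut.result()


@unwrap.register
def _(t: tuple):
    return tuple(unwrap(e) if isinstance(e, Future) else e for e in t)


fut = Future()
args = (fut, "hello")
assert gather(args) == [fut]

fut.set_result(41)
print(unwrap(args))  # prints (41, 'hello')
```

Under this layout, supporting another data type means registering one more implementation on each of the two functions, plus a test case.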

Andrew-S-Rosen · 2024-03-02T21:10:11Z

Makes sense. Happy to take a stab at it when I get a spare moment.

benclifford · 2024-03-03T15:29:02Z

i realised that my implementation of tuple unwrap recreates every tuple, not only tuples that have futures in them -
because unwrap is called on every argument value, not only on values that are detected as Future-ish by the gather stage

Andrew-S-Rosen · 2024-03-03T16:49:17Z

> i realised that my implementation of tuple unwrap recreates every tuple, not only tuples that have futures in them - because unwrap is called on every argument value, not only on values that are detected as Future-ish by the gather stage

I noticed this as well but wasn't immediately sure how to avoid that.

Andrew-S-Rosen · 2024-03-24T19:41:20Z

Note to self: There are two main aspects left here to address.

1. We need to add support for dict types, e.g. with a Future as a key. We could also add support for the value, but it would be a shallow search just like we have for list, set, and tuple. I guess we'd probably generalize this to Mapping types rather than just dict, although I'm not sure it matters much.
2. We need to avoid doing the unwrap/recreate process for every single tuple, set, list, dict, etc. and only do it if there is a Future in it (for performance reasons).
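One possible way to approach the second item is to scan the container first and return it untouched when nothing needs resolving. This is only a sketch under that assumption: `unwrap_if_needed` is a hypothetical name, it handles list, set, and tuple only, and it is not what the PR implements.

```python
from concurrent.futures import Future


def unwrap_if_needed(container):
    """Rebuild a list, set, or tuple only when it directly holds a Future."""
    if not any(isinstance(e, Future) for e in container):
        return container  # nothing to resolve, so keep the same object
    return type(container)(
        e.result() if isinstance(e, Future) else e for e in container
    )


plain = [1, 2, 3]
assert unwrap_if_needed(plain) is plain  # no copy was made

fut = Future()
fut.set_result("done")
mixed = [fut, 2]
assert unwrap_if_needed(mixed) == ["done", 2]
```

The tradeoff is an extra pass over each container, which is cheaper than rebuilding it but still not free.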

Andrew-S-Rosen · 2024-04-20T17:26:34Z

@benclifford --- what should be done to continue this PR? Is it a need to find out how to not have it call the resolution every time?

benclifford · 2024-04-21T06:47:04Z

> Is it a need to find out how to not have it call the resolution every time?

that, and this coming near the top of my attention stack...

benclifford · 2024-04-25T07:24:46Z

note to self, if this demo doesn't already do this: after some tossing round of ideas with @svandenhaute, I think possibly also join app end-results could be resolved this way. I suspect this PR doesn't actually do that but I think it's more consistent to do so and opens up some return value possibilities with less concurrency but less boilerplate.


Andrew-S-Rosen · 2024-05-15T17:51:43Z

Is this good to go, @benclifford? :)

benclifford · 2024-05-21T14:28:58Z

I want to do a bit more tidyup around DependencyError and _unwrap_exceptions in a different PR - there is a bit of inconsistency in case handling there.

benclifford · 2024-05-21T14:56:58Z

see #3445 for:

> I want to do a bit more tidyup around DependencyError and _unwrap_exceptions in a different PR - there is a bit of inconsistency in case handling there.


khk-globus

Nothing large jumps out at me, but I also note that we appear to be duplicating or recreating objects in the deep path. I wonder if that will prove to be a memory (if not strictly performance) concern for heavier use-cases. Specifically, this seems like an issue:

type_ = type(iterable)
return type_(map(deep_traverse_to_unwrap, iterable))

But I also don't think this is something to tackle until it becomes an issue. "YAGNI" and lazy evaluation being top of mind.

So, looks good, with some inline comments and suggestions if you'd like to follow up on them. (In particular, I do think the tests should be fleshed out, but I'll leave that to y'all's discretion.)
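For readers without the diff to hand, the recreation concern can be reproduced with a small recursive sketch. The two functions below are simplified stand-ins written for this example (the names `deep_gather` and `deep_unwrap` are assumptions); only the `type_(map(...))` line is modelled on the snippet quoted in the review.

```python
from concurrent.futures import Future


def deep_gather(o):
    """Recursively collect Futures nested inside common containers."""
    if isinstance(o, Future):
        return [o]
    if isinstance(o, (tuple, list, set)):
        return [f for e in o for f in deep_gather(e)]
    if isinstance(o, dict):
        return [f for kv in o.items() for e in kv for f in deep_gather(e)]
    return []


def deep_unwrap(o):
    """Recursively replace completed Futures with their results."""
    if isinstance(o, Future):
        return o.result()
    if isinstance(o, (tuple, list, set)):
        type_ = type(o)
        return type_(map(deep_unwrap, o))  # always builds a new container
    if isinstance(o, dict):
        return {deep_unwrap(k): deep_unwrap(v) for k, v in o.items()}
    return o


fut = Future()
nested = {"outer": [("inner", fut)]}
assert deep_gather(nested) == [fut]

fut.set_result(7)
assert deep_unwrap(nested) == {"outer": [("inner", 7)]}

no_futures = [1, (2, 3)]
assert deep_unwrap(no_futures) is not no_futures  # copied despite no Futures
```

The final assertion is the behaviour the review comment flags: every container on the path is rebuilt, whether or not it holds a Future.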

khk-globus · 2024-05-22T14:21:00Z

docs/userguide/plugins.rst

+When Parsl examines the arguments to an app, it uses a `DependencyResolver`.
+The default `DependencyResolver` will cause Parsl to wait for
+``concurrent.futures.Future`` instances (including `AppFuture` and
+`DataFuture`), and pass through other arguments without waiting.
+
+This behaviour is pluggable: Parsl comes with another dependency resolver,
+`DEEP_DEPENDENCY_RESOLVER` which knows about futures contained with structures
+such as tuples, lists, sets and dicts.
+
+This plugin interface might be used to interface other task-like or future-like
+objects to the Parsl dependency mechanism, by describing how they can be
+interpreted as a Future.


This text is accurate, and points to the right place, but as a "documentation consumer," I find myself without a proper mental model for what this looks like. Would an example implementation be an undue burden to place here? Or at the end of one of the links?

I'm working on a presentation for this for next Tuesday so I'll try to use the preparation for that as a way to get my head around more introductory material.
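As a rough mental model only, a resolver can be pictured as a pair of functions: one that finds the futures an argument depends on, and one that replaces them with values once they are done. Everything in this sketch is an assumption made for illustration: `Promise` is a made-up future-like class, `ResolverSketch` and its field names are stand-ins rather than Parsl's class, and the blocking `call_when_ready` driver is a toy that does not reflect how Parsl schedules tasks.

```python
from concurrent.futures import Future
from dataclasses import dataclass
from typing import Any, Callable, List


class Promise:
    """Hypothetical future-like object from some other library."""
    def __init__(self, fut: Future):
        self.fut = fut


@dataclass
class ResolverSketch:
    """Stand-in for a resolver: two functions applied to each argument."""
    traverse_to_gather: Callable[[Any], List[Future]]
    traverse_to_unwrap: Callable[[Any], Any]


def gather(arg):
    if isinstance(arg, Future):
        return [arg]
    if isinstance(arg, Promise):
        return [arg.fut]  # describe the Promise as a Future
    return []


def unwrap(arg):
    if isinstance(arg, Future):
        return arg.result()
    if isinstance(arg, Promise):
        return arg.fut.result()
    return arg


PROMISE_RESOLVER = ResolverSketch(gather, unwrap)


def call_when_ready(resolver, func, *args):
    """Toy driver: wait for gathered futures, then call func on unwrapped args."""
    for arg in args:
        for fut in resolver.traverse_to_gather(arg):
            fut.result()  # block until this dependency completes
    return func(*(resolver.traverse_to_unwrap(a) for a in args))


inner = Future()
inner.set_result(20)
print(call_when_ready(PROMISE_RESOLVER, lambda a, b: a + b, Promise(inner), 22))  # prints 42
```

The `Promise` branches correspond to the documentation's last paragraph: a foreign future-like object takes part in dependency handling once the resolver knows how to view it as a Future.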

parsl/dataflow/dependency_resolvers.py

khk-globus · 2024-05-22T14:35:42Z

parsl/dataflow/dflow.py

+        self.dependency_resolver = self.config.dependency_resolver if self.config.dependency_resolver is not None \
+            else SHALLOW_DEPENDENCY_RESOLVER
+


This implies that self.dependency_resolver is a required attribute. Is there utility in making it required at the configuration as well, rather than an implied requirement? That is, moving this conditional into config, and instead either trusting the config object, or asserting here? Perhaps something like:

class Config(...):
    def __init__(
        ...
        dependency_resolver: Optional[DependencyResolver],
        ...
    ):
        if dependency_resolver is None:
            dependency_resolver = SHALLOW_DEPENDENCY_RESOLVER
        self.dependency_resolver = dependency_resolver

Mypy may complain with that particular construction and not-Noneness (so fiddle!), but the point is that the config object is explicit as to what dependency resolver is in use.

Functionally a wash, I think (so I won't be fussed about this), but thinking in terms of overall clarity for when someone is poking at the REPL or CLI.

khk-globus · 2024-05-22T14:38:37Z

parsl/tests/test_python_apps/test_pluggable_future_resolution.py

+def local_config():
+    return Config(dependency_resolver=DEEP_DEPENDENCY_RESOLVER)


Good deal; this appears to test the majority of pathways for the deep resolver. But I think we should implement a similar set of tests for the shallow variant.

khk-globus · 2024-05-22T14:41:47Z

parsl/dataflow/dependency_resolvers.py

The current tests in this PR go through the whole Parsl machinery -- good! But an ancillary set of unit tests that verify strictly this class in isolation would be a good value-add. That is, this is a decently isolated class that doesn't depend on Parsl, so its functionality could be verified independently of the rest of the infrastructure.

(Note that we've only recently added the tests/unit/ directory, so this would be a good second addition to that.)

benclifford · 2024-05-23T09:37:30Z

@khk-globus :

> Nothing large jumps out at me, but I also note that we appear to be duplicating or recreating objects in the deep path. I wonder if that will prove to be a memory (if not strictly performance) concern for heavier use-cases.
>
> [...]
>
> But I also don't think this is something to tackle until it becomes an issue.

This was one of the things delaying me wanting to merge this but in some other conversations with @Andrew-S-Rosen I decided that for this PR:

i) this is an opt-in feature, and I'm fairly comfortable in this context with "you can opt into a worse-on-one-axis, better-on-another-axis" feature. More concerning is how this affects performance in the default case, but I think (without measuring) that it is noise around the function call stack and so I'm not super concerned.

ii) many execution paths already have quite heavy object-recreation behaviour that looks kinda like this: for example to any remote executor like HighThroughputExecutor, parameters go through a serialization/deserialization reconstruction.

So I expect there to be a measurable performance change here, and I expect there are much nicer ways to do it. But users aren't subject to it initially, and if it becomes a problem, someone can pay more attention later on.
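Point ii) can be seen with a serialization round trip. The standard library `pickle` module is used here purely as a stand-in for whatever serializer a remote executor applies to parameters.

```python
import pickle

params = {"paths": ("a", "b"), "sizes": [1, 2, 3]}
round_tripped = pickle.loads(pickle.dumps(params))

assert round_tripped == params                        # same contents
assert round_tripped is not params                    # but a new dict
assert round_tripped["sizes"] is not params["sizes"]  # and a new list inside
```

Arguments that travel to a remote worker are therefore already rebuilt at least once, independently of any dependency resolver.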


benclifford · 2024-05-23T11:43:31Z

Merging what is here now. I'm working on this a bit more now, so hopefully there will be a follow-up PR addressing some more of @khk-globus 's comments.

Andrew-S-Rosen · 2024-05-23T14:03:42Z

Wonderful. Thank you both!!


## Summary of Changes

Closes #1776.

Requires:
- Parsl/parsl#3111

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

benclifford added 5 commits February 29, 2024 10:53

Add deliberately breaking test for deep-future-resolution

fd9f668

Replace explicit future check with traverse calls

020a37f

Add example for tuples

1889bf9

fiddling round unlocks a new codepath that is not tested, and breaks,…

3241189

… when trying to render a DependencyException to a str, with tid = None not a str - the .join fails. this should be a separate tested/bugfix PR.

Merge branch 'master' into benc-plugin-future-resolution

9fabe49

Andrew-S-Rosen mentioned this pull request Mar 2, 2024

The copy_files keyword can fail when using the output directory from a prior @job with Parsl Quantum-Accelerators/quacc#1776

Closed

Andrew-S-Rosen added a commit to Andrew-S-Rosen/parsl that referenced this pull request Mar 3, 2024

Adding onto Parsl#3111

6ee91a4

Andrew-S-Rosen added a commit to Andrew-S-Rosen/parsl that referenced this pull request Mar 3, 2024

Adding onto Parsl#3111

051ab70

Andrew-S-Rosen mentioned this pull request Mar 3, 2024

Adding onto #3111: shallow resolution of lists, sets, and tuples #3118

Merged

Adding onto #3111: shallow resolution of lists, sets, and tuples (#3118)

e382e39

This was referenced Mar 24, 2024

[WIP] Add dict support for #3111 #3288

Closed

Add support for dicts in future-resolution-plugin (#3111) #3289

Merged

Add support for dicts in future-resolution-plugin (#3111) (#3289)

c0827be

Andrew-S-Rosen added a commit to Andrew-S-Rosen/parsl that referenced this pull request Mar 25, 2024

Use dict instead of Mapping in Parsl#3111

1772687

Andrew-S-Rosen mentioned this pull request Mar 25, 2024

Fix copy_files support with Parsl Quantum-Accelerators/quacc#1942

Merged

Use dict instead of Mapping in #3111 (#3291)

99ec556

benclifford added the demo label Apr 18, 2024

benclifford added 2 commits April 29, 2024 12:42

Merge remote-tracking branch 'origin/master' into benc-plugin-future-…

01f51b2

…resolution

Allow dependency resolver to be configured per-DFK

d979b08

benclifford added 5 commits April 29, 2024 16:55

Add a (broken/not implemented) test of very deep traversal

48bfb58

Implement deep list traversal

282f609

test tuples and lists deep resolver

77caf05

traverse dicts

93e4b9a

Remove some TODOs

7ac5b72

benclifford marked this pull request as ready for review May 16, 2024 13:53

benclifford requested a review from khk-globus May 17, 2024 15:22

benclifford assigned yadudoc May 17, 2024

Merge branch 'master' into benc-plugin-future-resolution

c1d7206

benclifford requested a review from yadudoc May 17, 2024 15:25

benclifford unassigned yadudoc May 17, 2024

benclifford removed the demo label May 17, 2024

benclifford added 2 commits May 21, 2024 14:20

Remove now-misplaced comment

3f54c17

Fix typo in debug log

0234d45

benclifford added 3 commits May 21, 2024 16:03

Merge remote-tracking branch 'origin/master' into benc-plugin-future-…

f781148

…resolution

Remove type expansion that was addressed differently in #3445

ffc6b4f

Fix typo in docstring

59628e0

khk-globus approved these changes May 22, 2024


benclifford added 2 commits May 23, 2024 09:38

Merge remote-tracking branch 'origin/master' into benc-plugin-future-…

b703b91

…resolution

pronoun typo fix in docstring

961bb6d

benclifford merged commit 1fc73aa into master May 23, 2024
6 checks passed

benclifford deleted the benc-plugin-future-resolution branch May 23, 2024 11:50

benclifford added a commit that referenced this pull request May 25, 2024

Add a test for a bug discovered working on a presentation of #3111, w…

9057dda

…hich hangs - rather than even giving an error directly
