Update DaskTaskRunner for compatibility with the updated engine #13555

Merged 7 commits into main from udpated-dask-task-runner-2 on May 27, 2024

Conversation

@desertaxle (Member) commented on May 24, 2024

Updates the task runner to implement the new TaskRunner interface. The new task runner delegates most responsibilities to the PrefectDistributedClient, but the task runner handles wrapping and unwrapping PrefectDaskFutures to ensure work is efficiently scheduled on a Dask cluster.

Example
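A minimal usage sketch (not taken from the PR itself; it assumes prefect and the prefect-dask collection from this branch are installed, and the flow/task names are illustrative):

```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner


@task
def add_one(x: int) -> int:
    return x + 1


@flow(task_runner=DaskTaskRunner())
def my_flow() -> list[int]:
    # submit/map return PrefectDaskFutures that wrap the underlying Dask futures
    futures = add_one.map([1, 2, 3])
    return [future.result() for future in futures]


if __name__ == "__main__":
    my_flow()
```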

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • If this pull request adds new functionality, it includes unit tests that cover the changes
  • This pull request includes a label categorizing the change e.g. maintenance, fix, feature, enhancement, docs.

For documentation changes:

  • This pull request includes redirect settings in netlify.toml for files that are removed or renamed.

For new functions or classes in the Python SDK:

  • This pull request includes helpful docstrings.
  • If a new Python file was added, this pull request contains a stub page in the Python SDK docs and an entry in mkdocs.yml navigation.

Base automatically changed from udpated-dask-task-runner to main May 24, 2024 15:49
Comment on lines +328 to +401
from prefect.utilities.engine import (
    collect_task_run_inputs_sync,
    resolve_inputs_sync,
)

# We need to resolve some futures to map over their data, collect the upstream
# links beforehand to retain relationship tracking.
task_inputs = {
    k: collect_task_run_inputs_sync(v, max_depth=0)
    for k, v in parameters.items()
}

# Resolve the top-level parameters in order to get mappable data of a known length.
# Nested parameters will be resolved in each mapped child where their relationships
# will also be tracked.
parameters = resolve_inputs_sync(parameters, max_depth=0)

# Ensure that any parameters in kwargs are expanded before this check
parameters = explode_variadic_parameter(task.fn, parameters)

iterable_parameters = {}
static_parameters = {}
annotated_parameters = {}
for key, val in parameters.items():
    if isinstance(val, (allow_failure, quote)):
        # Unwrap annotated parameters to determine if they are iterable
        annotated_parameters[key] = val
        val = val.unwrap()

    if isinstance(val, unmapped):
        static_parameters[key] = val.value
    elif isiterable(val):
        iterable_parameters[key] = list(val)
    else:
        static_parameters[key] = val

if not len(iterable_parameters):
    raise MappingMissingIterable(
        "No iterable parameters were received. Parameters for map must "
        f"include at least one iterable. Parameters: {parameters}"
    )

iterable_parameter_lengths = {
    key: len(val) for key, val in iterable_parameters.items()
}
lengths = set(iterable_parameter_lengths.values())
if len(lengths) > 1:
    raise MappingLengthMismatch(
        "Received iterable parameters with different lengths. Parameters for map"
        f" must all be the same length. Got lengths: {iterable_parameter_lengths}"
    )

map_length = list(lengths)[0]

futures = []
for i in range(map_length):
    call_parameters = {
        key: value[i] for key, value in iterable_parameters.items()
    }
    call_parameters.update(
        {key: value for key, value in static_parameters.items()}
    )

    # Add default values for parameters; these are skipped earlier since they should
    # not be mapped over
    for key, value in get_parameter_defaults(task.fn).items():
        call_parameters.setdefault(key, value)

    # Re-apply annotations to each key again
    for key, annotation in annotated_parameters.items():
        call_parameters[key] = annotation.rewrap(call_parameters[key])

    # Collapse any previously exploded kwargs
    call_parameters = collapse_variadic_parameters(task.fn, call_parameters)
desertaxle (Member, Author):

This is identical to a chunk of logic in the ThreadPoolTaskRunner. If I need to repeat this in the RayTaskRunner too, I'll move it up to the TaskRunner base class.

@desertaxle marked this pull request as ready for review on May 24, 2024 17:25
@desertaxle requested review from zzstoatzz, chrisguidry, and a team as code owners on May 24, 2024 17:25
@cicdw (Member) left a comment:

Overall looks good; I left a few questions that I'd like to understand before ✅

parameters: Dict[str, Any],
wait_for: Iterable[PrefectFuture],
dependencies: Optional[Dict[str, Set[TaskRunInput]]] = None,
) -> PrefectDaskFuture:
Member:

Every future type that we implement for each task runner will have the same interface, right? It's just that there will be some special handling based on the underlying system?

desertaxle (Member, Author):

Yes. Each specific future will have a different implementation for waiting on the future and retrieving its result, but the .wait, .result, and .state interfaces will all be the same.
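A rough sketch of that shared surface (the class name and anything beyond wait/result/state are illustrative, not the actual Prefect classes):

```python
import abc
from typing import Any, Generic, Optional, TypeVar
from uuid import UUID

F = TypeVar("F")  # the wrapped native future type, e.g. distributed.Future


class WrappedFutureSketch(abc.ABC, Generic[F]):
    """Illustrative only: every task-runner-specific future wraps its native
    future but exposes the same wait/result/state surface."""

    def __init__(self, task_run_id: UUID, wrapped_future: F) -> None:
        self.task_run_id = task_run_id
        self._wrapped_future = wrapped_future

    @abc.abstractmethod
    def wait(self, timeout: Optional[float] = None) -> None:
        """Block until the wrapped task run reaches a final state."""

    @abc.abstractmethod
    def result(
        self, timeout: Optional[float] = None, raise_on_failure: bool = True
    ) -> Any:
        """Return the task run's result, waiting for completion if needed."""

    @property
    @abc.abstractmethod
    def state(self) -> Any:
        """Return the current Prefect State of the task run."""
```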

"""
self.__dict__.update(data)
self._client = distributed.get_client()
def __exit__(self, *args):
Member:

What let you drop the serialization requirement?

desertaxle (Member, Author):

Previously, a PrefectFuture carried around an instance of the task runner that submitted it. We don't have to do that anymore because we wrap the underlying future directly, so we shouldn't be pickling task runners anymore.
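For illustration, a sketch of why that works (the class name is hypothetical; only the task run id and the wrapped distributed.Future need to survive pickling, mirroring the __setstate__ shown in the diff above):

```python
from uuid import UUID

import distributed


class DaskFutureSketch:
    """Illustrative only: the wrapper carries its task run id and the
    distributed.Future it wraps; the task runner that submitted it never
    needs to be pickled."""

    def __init__(self, task_run_id: UUID, wrapped_future: distributed.Future) -> None:
        self.task_run_id = task_run_id
        self._wrapped_future = wrapped_future

    def __getstate__(self):
        # Only plain, picklable attributes are carried across processes.
        return self.__dict__.copy()

    def __setstate__(self, data):
        self.__dict__.update(data)
        # Reattach to the active Dask client on the receiving side, as in the
        # snippet shown above.
        self._client = distributed.get_client()
```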

Member:

Nice.

@desertaxle requested a review from cicdw on May 27, 2024 03:00
@desertaxle merged commit 9e7bb98 into main on May 27, 2024
30 checks passed
@desertaxle deleted the udpated-dask-task-runner-2 branch on May 27, 2024 04:01