Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow HookContext to access Output Metadata #20025

Closed
JYoussouf opened this issue Feb 24, 2024 · 1 comment
Closed

Allow HookContext to access Output Metadata #20025

JYoussouf opened this issue Feb 24, 2024 · 1 comment

Comments

@JYoussouf
Copy link
Contributor

JYoussouf commented Feb 24, 2024

What's the use case?

Upon hook execution, one can access output values via the HookContext (HookContext.op_output_values, but there is currently no way to access output metadata in the same way. Enabling this would allow users to pass information more dynamically to their hooks, flexible with respect to each individual ops Output metadata.

One application of this is in dynamically applying tags to runs based on the metadata of the last failed/successful run.

Ideas of implementation

If a hook is present, we currently initialize a step_output_capture dict here:

# Enable step output capture if there are any hooks which will receive them.
# Expect in the future that hooks may control whether or not they get outputs,
# but for now presence of any will cause output capture.
if self.job_def.get_all_hooks_for_handle(self.node_handle):
self._step_output_capture = {}

then, we capture the step output value here:
if step_context.step_output_capture is not None:
step_context.step_output_capture[step_output_handle] = output.value

and finally, the hook accesses this the step_output_capture dict here:

@property
def solid_output_values(self) -> Mapping[str, Union[Any, Mapping[str, Any]]]:
"""The computed output values.
Returns a dictionary where keys are output names and the values are:
* the output values in the normal case
* a dictionary from mapping key to corresponding value in the mapped case
"""
results: Dict[str, Union[Any, Dict[str, Any]]] = {}
captured = self._step_execution_context.step_output_capture
if captured is None:
check.failed("Outputs were unexpectedly not captured for hook")
# make the returned values more user-friendly
for step_output_handle, value in captured.items():
if step_output_handle.mapping_key:
if results.get(step_output_handle.output_name) is None:
results[step_output_handle.output_name] = {
step_output_handle.mapping_key: value
}
else:
results[step_output_handle.output_name][step_output_handle.mapping_key] = value
else:
results[step_output_handle.output_name] = value
return results

Implementation:

Option 1: One proposed solution would be to change what gets stored in step_context.step_output_capture, to include both {"value": output.value, "metadata": output.metadata}. This is a relatively simple patch, but it will break nearly all applications that currently call HookContext.op_output_values, since we would be returning a dict now, instead of a single output value.

    if step_context.step_output_capture is not None:
        step_context.step_output_capture[step_output_handle] = {"value": output.value, "metadata": output.metadata}

Option 2: My preferred solution is to replicate the step_output_capture behaviour with the _output_metadata property currently available here:

self._output_metadata: Dict[str, Any] = {}

We could then add the following line:
step_context.output_metadata[step_output_handle] = output.metadata to execute_step.py
and a solid_output_metadata/op_output_metadata method to hooks.py

Additional information

N/A

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@JYoussouf
Copy link
Contributor Author

JYoussouf commented Feb 24, 2024

I've created a PR for this here: #20027

smackesey added a commit that referenced this issue Mar 19, 2024
## Summary & Motivation
Please refer to the following issue/project spec for context and
alternative solutions:
#20025.

Context:
> Upon hook execution, one can access output values via the HookContext
(using HookContext's `op_output_values`), but there is currently no way
to access output metadata in the same way. Enabling this would allow
users to pass information more dynamically to their hooks with respect
to each individual ops Output metadata. One application of this is in
dynamically applying tags to runs based on the metadata of the last
failed/successful run.

_Transparency note: Long-time Dagster user, first time Dagster
contributor, thanks in advance for your support!_

## How I Tested These Changes
- Ran `make ruff` and `make pyright` before committing
- Successfully ran `python -m pytest
python_modules/dagster/dagster_tests/core_tests/`
- Installed these changes locally with `pip install -e
dagster/python_modules/dagster` and confirmed successful cases

---------

Co-authored-by: Sean Mackesey <s.mackesey@gmail.com>
PedramNavid pushed a commit that referenced this issue Mar 28, 2024
## Summary & Motivation
Please refer to the following issue/project spec for context and
alternative solutions:
#20025.

Context:
> Upon hook execution, one can access output values via the HookContext
(using HookContext's `op_output_values`), but there is currently no way
to access output metadata in the same way. Enabling this would allow
users to pass information more dynamically to their hooks with respect
to each individual ops Output metadata. One application of this is in
dynamically applying tags to runs based on the metadata of the last
failed/successful run.

_Transparency note: Long-time Dagster user, first time Dagster
contributor, thanks in advance for your support!_

## How I Tested These Changes
- Ran `make ruff` and `make pyright` before committing
- Successfully ran `python -m pytest
python_modules/dagster/dagster_tests/core_tests/`
- Installed these changes locally with `pip install -e
dagster/python_modules/dagster` and confirmed successful cases

---------

Co-authored-by: Sean Mackesey <s.mackesey@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant