Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove incorrectly-global-mutating update_wrapper in bash_app #3492

Merged
merged 2 commits into from
Jun 24, 2024

Conversation

benclifford
Copy link
Collaborator

tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor function, which is bad. This PR makes it not do that.

This addresses behaviour subtle enough we've gone years without noticing but is now causing problems for a test introduced in apparently-unrelated PR #3489.

What was happening before this PR, in the removed line is that update_wrapper is being used in two ways:

i) Deliberately, a functional style: return a remote_side_bash_executor
that looks like self.func

and

ii) Accidentally, modify the global remote_side_bash_executor to look
like self.func

That second step is problematic: it modifies a global object (the remote_side_bash_executor callable object) on every bash_app decoration, and so leaves it finally looking like the most recent bash_app decoration.

For example:

$ cat uw1.py
from functools import update_wrapper

def a():
  pass

def b():
  return 7

print("b looks like this:")
print(repr(b))

print("update_wrapper of b to look like a:")
print(update_wrapper(b, a))

print("b looks like this:")
print(repr(b))

$ python3 uw1.py
b looks like this:
<function b at 0x70fd0ff1e2a0>
update_wrapper of b to look like a:
<function a at 0x70fd0ff1e2a0>
b looks like this:
<function a at 0x70fd0ff1e2a0>

PR #3489 introduces a bash_app that cannot be serialized. That's fine in the context of that PR, because it only tries to run it in a ThreadPoolExecutor where serialization does not happen, and executor-specific apps are fine - see, for example, the concept of join apps, which must run in the main DFK process in a ThreadPoolExecutor.

However, in certain test case orders, the __wrapped__ value of remote_side_bash_executor points to that "bad" bash_app, and when that has happened, remote_side_bash_executor cannot be serialized as part of an app invocation to a remote worker in a different (for example, htex-using) test.

This PR removes that update_wrapper, causing a few changes: because wrapped is now not set, the function for the last-decorated bash_app is no longer sent along side every invocation of every other bash_app. This removes the error in PR #3489.

Because the name of remote_side_bash_executor is no longer mutated, the default pickle pass-by-name serialization can happen, as pickle is able to assume that it can import the function as a global on the remote side rather than sending the modified definition of remote_side_bash_executor.

These two changes result in a reduction of the serialized form of an example bash_app (measured in DillCallableSerializer.serialize) from 6940 bytes to 2305 bytes. Issue #3941 contains a feature request to look at those remaining 2305 bytes to see if anything else can be removed here.

This change also removes some confusing repr() behaviour when debugging: when update_wrapper is used, any reference in reprs to remote_side_bash_executor are output as references to the most recently decorated bash_app. After this PR, when update_wrapper is removed, references are output correctly.

Description

Please include a summary of the change and (optionally) which issue is fixed. Please also include
relevant motivation and context.

Changed Behaviour

Which existing user workflows or functionality will behave differently after this PR?

Fixes

Fixes # (issue)

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Bug fix
  • New feature
  • Update to human readable text: Documentation/error messages/comments
  • Code maintenance/cleanup

tl;dr bash_app unexpectedly mutates the global remote_side_bash_executor
function, which is bad. This PR makes it not do that.

This addresses behaviour subtle enough we've gone years without noticing
but is now causing problems for a test introduced in apparently-unrelated
PR #3489.

What was happening before this PR, in the removed line is that
update_wrapper is being used in two ways:

i) Deliberately, a functional style: return a remote_side_bash_executor
   that looks like self.func

and

ii) Accidentally, modify the global remote_side_bash_executor to look
    like self.func

That second step is problematic: it modifies a global object (the
remote_side_bash_executor callable object) on every bash_app decoration,
and so leaves it finally looking like the most recent bash_app decoration.

For example:

```
$ cat uw1.py
from functools import update_wrapper

def a():
  pass

def b():
  return 7

print("b looks like this:")
print(repr(b))

print("update_wrapper of b to look like a:")
print(update_wrapper(b, a))

print("b looks like this:")
print(repr(b))

$ python3 uw1.py
b looks like this:
<function b at 0x70fd0ff1e2a0>
update_wrapper of b to look like a:
<function a at 0x70fd0ff1e2a0>
b looks like this:
<function a at 0x70fd0ff1e2a0>
```

PR #3489 introduces a bash_app that cannot be serialized. That's fine in
the context of that PR, because it only tries to run it in a
ThreadPoolExecutor where serialization does not happen, and executor-specific
apps are fine - see, for example, the concept of join apps, which must run
in the main DFK process in a ThreadPoolExecutor.

However, in certain test case orders, the `__wrapped__` value of
remote_side_bash_executor points to that "bad" bash_app, and when that has
happened, remote_side_bash_executor cannot be serialized as part of an
app invocation to a remote worker in a different (for example, htex-using)
test.

This PR removes that update_wrapper, causing a few changes: because
__wrapped__ is now not set, the function for the last-decorated bash_app
is no longer sent along side every invocation of every other bash_app.
This removes the error in PR #3489.

Because the __name__ of remote_side_bash_executor is no longer mutated, the
default pickle pass-by-name serialization can happen, as pickle is able to
assume that it can import the function as a global on the remote side rather
than sending the modified definition of remote_side_bash_executor.

These two changes result in a reduction of the serialized form of an
example bash_app (measured in DillCallableSerializer.serialize) from
6940 bytes to 2305 bytes. Issue #3941 contains a feature request to look at
those remaining 2305 bytes to see if anything else can be removed here.

This change also removes some confusing repr() behaviour when debugging:
when update_wrapper is used, any reference in reprs to
remote_side_bash_executor are output as references to the most recently
decorated bash_app. After this PR, when update_wrapper is removed, references
are output correctly.
Copy link
Collaborator

@khk-globus khk-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugly. Subtle find. Thanks for this. "Woof."

@benclifford benclifford merged commit a32886a into master Jun 24, 2024
7 checks passed
@benclifford benclifford deleted the benc-update-wrapper branch June 24, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants