
monitoring vs serialization extremely poor performance with many apps #2848

Closed
benclifford opened this issue Aug 1, 2023 · 2 comments · Fixed by #2852

@benclifford (Collaborator)
Describe the bug
PR #2468 moves the monitoring remote wrapper into its own module, parsl.monitoring.remote, because serializing that wrapper was bringing along the entirety of the module it was defined in. Placing the remote wrapper in its own module reduces the size of the module brought along.

However, PR #2493 subsequently introduces a cache stored as a global in the parsl.monitoring.remote module, to work around the lack of caching in parsl.serialize anywhere except at the top level. Prior to PR #2493, a new callable was defined for each invocation when monitoring was in use, bringing along the task and try ID in a closure. PR #2493 instead builds a callable that is re-used across multiple invocations of the same app, with the task and try ID passed as additional parameters: equivalent to the closure, but unwrapped so that the naive parsl.serialize callable serializer can cache it. To do that, it introduces its own cache, so that any particular app is wrapped only once.

Because this cache is in the parsl.monitoring.remote module, it is serialized as part of serializing a monitoring-wrapped app invocation.

In workflows with a small number of apps, this is a small-ish (but undesired) overhead. In workflows with many apps (in my case, ctrl_bps_parsl, which makes a new app for every task), the cache easily grows to hundreds of kilobytes and is re-serialized on every invocation.
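To make the scaling concrete, here is a minimal stand-alone sketch (not parsl code; `pickle` stands in for the real serializer, and the cache contents are invented) showing how serializing a module-global cache grows with the number of cached entries:

```python
import pickle

# Hypothetical stand-in for the per-app cache in parsl.monitoring.remote:
# one entry per decorated app, serialized along with every invocation.
cache = {}
sizes = []
for n in (0, 100, 1000):
    while len(cache) < n:
        cache[len(cache)] = "wrapped-app-%d" % len(cache)
    # The cost paid per task: serializing the whole cache again.
    sizes.append(len(pickle.dumps(cache)))

print(sizes)  # strictly increasing: every new app makes every later task bigger
```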

While debugging this, I discovered that this serialization of the cache only happens when the monitoring wrapper has the functools.wraps decorator applied.

Removing that wraps decorator makes the problem go away.

More subtly, having the wraps decorator copy a smaller subset of attributes, rather than its default of functools.WRAPPER_ASSIGNMENTS, also makes the problem go away:

        @functools.wraps(f, assigned=('__name__', '__qualname__', '__doc__', '__annotations__'))

This copies the same attributes except for the __module__ attribute.
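As a stand-alone illustration (nothing parsl-specific here), the assigned subset behaves like this: name, qualname, doc and annotations are copied across, while the wrapper keeps its own __module__. The 'user_code' module name is invented for the demo:

```python
import functools

def original():
    """Docstring copied by wraps."""
    return 7

# Pretend the function came from some other module.
original.__module__ = 'user_code'

@functools.wraps(original,
                 assigned=('__name__', '__qualname__', '__doc__', '__annotations__'))
def wrapper(*args, **kwargs):
    return original(*args, **kwargs)

print(wrapper.__name__)    # 'original' -- copied across
print(wrapper.__module__)  # the wrapper's own defining module, not 'user_code'
```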

It isn't clear to me why copying that attribute onto the wrapped function causes the module to be serialized, when that module is probably also the __module__ attribute of the underlying function.

To Reproduce
Observe the size of serialized messages using a configuration such as parsl/tests/configs/htex_local_alternate.py.

Expected behavior
This cache should not be transmitted.

This whole caching feature in parsl.monitoring.remote exists only to cope with the lack of nested-function-aware caching inside parsl.serialize.concretes.DillCallableSerializer, so perhaps some more work can happen there to make nested functions get cached, at which point PR #2493 could be reverted.
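A sketch of what caching inside the serializer itself could look like, rather than in user-facing modules. This is hypothetical code, not parsl's actual DillCallableSerializer, and `pickle` stands in for dill:

```python
import pickle

class CachingCallableSerializer:
    """Hypothetical serializer that memoizes per callable object."""

    def __init__(self):
        # Keyed on id(); a real implementation would need to guard
        # against id reuse after garbage collection.
        self._cache = {}
        self.misses = 0

    def serialize(self, fn):
        key = id(fn)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = pickle.dumps(fn)
        return self._cache[key]

s = CachingCallableSerializer()
first = s.serialize(len)    # serialized once...
second = s.serialize(len)   # ...then served from the cache
print(s.misses)  # 1
```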

Environment
my laptop, parsl master 4b8b0b8

@benclifford (Collaborator, Author) commented Aug 2, 2023

As part of trying to quantify a potential fix to this, I wrote a script that monkeypatches parsl.serialize.serialize and reports both time and byte usage.

import time
import parsl.serialize

# Accumulated across the whole run; the per-task figures below divide
# these by the total number of tasks executed so far.
cumulative_serializer_time = 0
cumulative_serializer_bytes = 0

def s(*args, **kwargs):
  global cumulative_serializer_time
  global cumulative_serializer_bytes
  start = time.time()
  b = old_serializer(*args, **kwargs)
  end = time.time()
  cumulative_serializer_time += end - start
  cumulative_serializer_bytes += len(b)
  return b

# Patch both symbols: the top-level re-export and the facade module.
old_serializer = parsl.serialize.serialize
parsl.serialize.serialize = s
parsl.serialize.facade.serialize = s

import parsl

# Configuration axes:
from parsl.tests.configs.htex_local_alternate import fresh_config
decorate_many = True

parsl.load(fresh_config())

def g():
  return 7


if not decorate_many:
  @parsl.python_app
  def f():
    return 7

count = 1000

for b in (1, 2):
  print(f"Batch {b}")
  for _ in range(count):
    if decorate_many:
      # Re-decorate the same underlying function for every task.
      f = parsl.python_app(g)
    r = f().result()
    assert r == 7

  print(f"Cumulative serializer time: {cumulative_serializer_time}")
  print(f"Cumulative serializer bytes: {cumulative_serializer_bytes}")

  print(f"Serializer time per task: {cumulative_serializer_time / (b*count)}")
  print(f"Serializer bytes per task: {cumulative_serializer_bytes / (b*count)}")

I ran this against current master, 4b8b0b8, in 4 configurations, on 2 axes:

Monitoring on or off?
i) parsl/tests/configs/htex_local.py - has no monitoring
ii) parsl/tests/configs/htex_local_alternate.py - has monitoring

Decoration style:
a) decorate once, invoke 2000 times
b) decorate each invocation, 2000 times

This latter decoration style may sound surprising, but it is relatively common in power-user cases where the callable to be executed is dynamically generated - for example, ctrl_bps_parsl in LSST generates and decorates a partial for every invocation.
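The pattern looks roughly like this (fake_app is an illustrative stand-in for parsl.python_app, so the snippet runs without parsl; a real app call would return a Future rather than the value):

```python
import functools

def fake_app(fn):
    """Illustrative stand-in for parsl.python_app."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)  # a real app would submit work and return a Future
    return wrapper

def compute(task_id, x):
    return task_id + x

# Decorate a freshly constructed partial for every task, as ctrl_bps_parsl does;
# each iteration produces a brand-new callable for the serializer to handle.
results = [fake_app(functools.partial(compute, i))(10) for i in range(3)]
print(results)  # [10, 11, 12]
```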

The results against master are below. The values in the cells are serialized bytes per invocation and serialization time per invocation.

                 No monitoring       With monitoring
Decorate once    2063b, 2x10^-5s     5665b, 2.6x10^-5s
Decorate many    2063b, 2x10^-3s     732kb, 0.37s

The bottom-right corner is a demonstration of this current issue, #2848.

This test was run on my laptop with a custom-built Python 3.11.

@benclifford (Collaborator, Author) commented Aug 3, 2023

I've prototyped removing this cache entirely, in PR #2852. These are the results of the same test:

                 No monitoring       With monitoring
Decorate once    2063b, 2x10^-5s     5570b, 5x10^-3s
Decorate many    2063b, 2x10^-3s     5572b, 5x10^-3s

This makes the decorate-once, reuse-apps, monitoring-enabled case more expensive (by around 5 milliseconds per call in this test), the same cost as the decorate-many case, because caching is no longer happening.

It also pulls the decorate-many case down to that same value, because the bad-cache behaviour is removed.

I think this is the right way to go, in light of more recent experiences with serialization, which push me towards any caching happening deeper inside the serialization code, rather than further out towards user space.

benclifford added a commit that referenced this issue Aug 3, 2023
See issue #2848 for a bunch of description and benchmarking.
benclifford added a commit that referenced this issue Aug 3, 2023
…2852)

See issue #2848 for a bunch of description and benchmarking.