## Synchronous and asynchronous UDF execution

This notebook demonstrates the following features that were introduced in release 0.7.0:

* Execute several UDFs in one pass
* Obtain intermediate results from each merge step by executing UDFs as an iterator
* Execute UDFs asynchronously

See the example notebook `live-plotting.ipynb` for the related live plotting feature.

In [1]:
import os
import pprint
import asyncio
import copy

import numpy as np

import libertem.api as lt
from libertem.udf.sum import SumUDF
from libertem.udf.sumsigudf import SumSigUDF

In [2]:
ctx = lt.Context()


In [3]:
data_base_path = os.environ.get("TESTDATA_BASE_PATH", "/home/alex/Data/")

In [4]:
ds = ctx.load("auto", path=os.path.join(data_base_path, "20200518 165148/default.hdr"))




In [5]:
udfs = [SumUDF(), SumSigUDF()]

### Synchronous execution, only result

Note that both UDFs are executed in a single pass!

In [6]:
res = ctx.run_udf(dataset=ds, udf=udfs)

The result is a tuple with one entry per UDF:

In [7]:
pprint.pprint(res)

({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})


The previous API for passing a single UDF is unchanged: it returns a single UDF result rather than a tuple.

In [8]:
res = ctx.run_udf(dataset=ds, udf=udfs[0])

In [9]:
pprint.pprint(res)

{'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>}
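The return convention described above (a list of UDFs gives a tuple, a single UDF gives a bare result) can be pictured with a small stand-in function. This is only an illustrative sketch: `run_all`, `sum_like` and `sumsig_like` are hypothetical names and not part of LiberTEM, and the strings merely take the place of real result buffers.

```python
def run_all(udf):
    """Hypothetical stand-in that mimics the return convention of run_udf."""
    many = isinstance(udf, (list, tuple))
    udfs = list(udf) if many else [udf]
    # Each "UDF" here is just a callable returning a dict of named results
    results = tuple(u() for u in udfs)
    # Unwrap when the caller passed a single UDF rather than a sequence
    return results if many else results[0]


def sum_like():
    return {"intensity": "sig-shaped result"}


def sumsig_like():
    return {"intensity": "nav-shaped result"}


both = run_all([sum_like, sumsig_like])
single = run_all(sum_like)

# Tuple results can be unpacked in the order the UDFs were passed
sum_res, sumsig_res = both
```

Unpacking the tuple in the order the UDFs were passed keeps each result associated with the UDF that produced it.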


### Asynchronous execution, only result

By setting `sync=False`, the result is awaitable:

In [10]:
async_res = ctx.run_udf(dataset=ds, udf=udfs, sync=False)
print("Do something else while UDFs are running in the background")
res = await async_res
print("Finished")

Do something else while UDFs are running in the background
Finished


In [11]:
pprint.pprint(res)

({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
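The general pattern of starting work, doing something else, and then awaiting the result can be sketched with plain `asyncio`. This is not how LiberTEM implements `sync=False`; `slow_sum` is a made-up stand-in for a UDF run. The sketch uses `asyncio.run` so that it works as a script; in a notebook, `await main()` would take its place.

```python
import asyncio
import time


def slow_sum(values):
    """Hypothetical blocking computation standing in for a UDF run."""
    time.sleep(0.1)
    return sum(values)


async def main():
    loop = asyncio.get_running_loop()
    # Start the blocking work in a worker thread; this returns an awaitable
    pending = loop.run_in_executor(None, slow_sum, range(10))
    events = ["started"]
    # The event loop is free here, so other work can proceed
    events.append("did something else")
    result = await pending
    events.append("finished")
    return result, events


result, events = asyncio.run(main())
```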


Just like in the synchronous case, running a single UDF returns the UDF result directly, not a tuple:

In [12]:
async_res = ctx.run_udf(dataset=ds, udf=udfs[0], sync=False)
print("Do something else while UDF is running in the background")
res = await async_res
print("Finished")

Do something else while UDF is running in the background
Finished


In [13]:
pprint.pprint(res)

{'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>}


### Synchronous execution as an iterator

This returns `UDFResults` objects with attributes `buffers` and `damage`. `buffers` is a tuple with the results per UDF, and `damage` is a `BufferWrapper` with `kind='nav'` and `dtype=bool` that indicates which parts of the navigation space have been merged already.
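The idea behind `damage` can be illustrated with a plain-Python generator that merges partial results one partition at a time and yields a snapshot after each merge. This is a simplified sketch, not LiberTEM code: `iter_partial_sums`, the partition layout, and the use of lists instead of `BufferWrapper` objects are assumptions made for illustration.

```python
def iter_partial_sums(frames, partition_size):
    """Yield (buffer, damage) after merging each partition (hypothetical helper)."""
    n = len(frames)
    buffer = [0] * n        # one result value per nav position
    damage = [False] * n    # True where a result has been merged already
    for start in range(0, n, partition_size):
        stop = min(start + partition_size, n)
        for i in range(start, stop):
            buffer[i] = sum(frames[i])  # per-frame sum, similar in spirit to SumSigUDF
            damage[i] = True
        # Yield copies so that earlier snapshots are not modified later
        yield list(buffer), list(damage)


# Five tiny "frames" of two pixels each, processed two frames at a time
frames = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
processed = []
for buffer, damage in iter_partial_sums(frames, partition_size=2):
    processed.append(sum(damage))  # number of nav positions merged so far
```

Positions where `damage` is still `False` hold placeholder values, so a consumer such as a live plot would only use the entries that are already marked as merged.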

In [14]:
# NBVAL_IGNORE_OUTPUT
# (output is ignored in nbval run because the number of nav positions can be different)
for res in ctx.run_udf_iter(dataset=ds, udf=udfs):
    print(np.count_nonzero(res.damage.data), "nav positions processed")
    pprint.pprint(res.buffers)

683 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
1366 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
2048 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
2731 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
3414 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>})
4096 nav positions processed
({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
 {'intensity': <BufferWrapper kind=nav dtype=f

### Asynchronous execution as an iterator

This allows several iterators to proceed concurrently. The backend of the LiberTEM web GUI uses this approach to run several analyses at the same time. It could also be useful for live feedback from UDF results to instrument control, provided the control solution works asynchronously.
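How two asynchronous iterators interleave on one event loop can be seen in a small `asyncio` sketch. The generator `fake_results` is a hypothetical stand-in for an asynchronous UDF run and simply yields a counter; it says nothing about how LiberTEM schedules work. As a script it uses `asyncio.run`; in a notebook, `await main()` would be used instead.

```python
import asyncio


async def fake_results(steps):
    """Hypothetical async generator standing in for an asynchronous UDF run."""
    for step in range(1, steps + 1):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield step


async def consume(label, steps, log):
    last = None
    async for last in fake_results(steps):
        log.append((label, last))
    return last


async def main():
    log = []
    # Both consumers advance on the same event loop
    finals = await asyncio.gather(
        consume("one", 3, log),
        consume("two", 2, log),
    )
    return finals, log


finals, log = asyncio.run(main())
```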

Note that the UDFs are copied here so that different instances are executed in parallel. Executing the same UDF instances concurrently can lead to undefined behavior.
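Why sharing one stateful instance between concurrent runs is problematic, and how `copy.deepcopy` avoids it, can be shown with a toy class. `Accumulator` is a hypothetical stand-in for an object that keeps internal state while it runs; real UDFs are more complex, so treat this only as an illustration of the principle.

```python
import asyncio
import copy


class Accumulator:
    """Hypothetical stateful object standing in for a UDF instance."""

    def __init__(self):
        self.total = 0

    async def run(self, values):
        for v in values:
            self.total += v
            await asyncio.sleep(0)  # other tasks may run between updates
        return self.total


async def main():
    shared = Accumulator()
    # Both runs update the same instance, so their state gets mixed
    mixed = await asyncio.gather(shared.run([1, 2, 3]), shared.run([10, 20, 30]))

    template = Accumulator()
    # Each run gets its own deep copy, so the results stay independent
    separate = await asyncio.gather(
        copy.deepcopy(template).run([1, 2, 3]),
        copy.deepcopy(template).run([10, 20, 30]),
    )
    return mixed, separate


mixed, separate = asyncio.run(main())
```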

In [15]:
# NBVAL_IGNORE_OUTPUT
# (output is ignored in nbval run because the number of nav positions can be different)
async def doit(label, udfs):
    async for res in ctx.run_udf_iter(dataset=ds, udf=udfs, sync=False):
        print(label, np.count_nonzero(res.damage.data), "nav positions processed")
        pprint.pprint((label, res.buffers))
    return res
        
p1 = doit("one", copy.deepcopy(udfs))
p2 = doit("two", copy.deepcopy(udfs))
print("Do something else while UDFs are running in the background")
await asyncio.gather(p1, p2)

Do something else while UDFs are running in the background
one 682 nav positions processed
('one',
 ({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
  {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}))
one 1365 nav positions processed
('one',
 ({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
  {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}))
one 2048 nav positions processed
('one',
 ({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
  {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}))
one 2731 nav positions processed
('one',
 ({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
  {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}))
one 3414 nav positions processed
('one',
 ({'intensity': <BufferWrapper kind=sig dtype=float32 extra_shape=()>},
  {'intensity': <BufferWrapper kind=nav dtype=float32 extra_shape=()>}))
one 4097 nav

[<libertem.udf.base.UDFResults at 0x7f01744ebe50>,
 <libertem.udf.base.UDFResults at 0x7f01770e5f70>]