### Comparing `dask.bag` vs `concurrent.futures.ThreadPoolExecutor` for multi-threading

In [1]:
import dask.bag as db
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import skimage.filters

In [2]:
# Test image: 512 x 512 array of random 8-bit grayscale values
img = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)

In [3]:
# Sigmas (Gaussian standard deviations) to filter with
sigmas = list(range(10,25))
sigmas

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]

Use `dask.bag.map` to distribute Gaussian filtering with the different sigmas over multiple threads:

In [4]:
sigmas_bag = db.from_sequence(sigmas)

In [5]:
%timeit _res_db = sigmas_bag.map(lambda s: skimage.filters.gaussian(img, s)).compute(scheduler="threads")

209 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Now do the same with `concurrent.futures.ThreadPoolExecutor.map`:

In [6]:
# Create the pool once, outside the timed statement, with the default number of workers
te = ThreadPoolExecutor()

In [7]:
%timeit _res_te = list(te.map(lambda s: skimage.filters.gaussian(img, s), sigmas))

183 ms ± 6.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
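The pool `te` above is created once and never shut down. Outside a benchmark, one common pattern is to use the executor as a context manager so that its threads are released when the block exits. The following is a minimal sketch of that pattern, not the code that was timed: `scaled_sum` is a hypothetical stand-in for `skimage.filters.gaussian(img, s)` so the example runs with the standard library alone, and `max_workers=4` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def scaled_sum(sigma):
    """Hypothetical stand-in for skimage.filters.gaussian(img, sigma)."""
    return sum(i * sigma for i in range(1000))


sigmas = list(range(10, 25))

# The with-block shuts the pool down on exit.
# max_workers=4 is an assumption for illustration, not a tuned value.
with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in the order of the inputs,
    # regardless of which thread finishes first.
    threaded = list(pool.map(scaled_sum, sigmas))

# Same computation without threads, for comparison.
serial = [scaled_sum(s) for s in sigmas]
results_match = threaded == serial
```

Because `map` preserves input order, `threaded` and `serial` can be compared element by element. Note that a pure-Python stand-in like this one is not expected to run faster on threads; it only illustrates the calling pattern.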


And as a baseline, the single-threaded calculation:

In [8]:
%timeit _res_singlethread = list(map(lambda s: skimage.filters.gaussian(img, s), sigmas))

701 ms ± 36.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
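To put the three measurements side by side, the speedup of each threaded variant over the baseline can be computed from the reported means. This is a small sketch; the dictionary keys and variable names are illustrative, and the numbers are copied from the `%timeit` outputs above (a single session, so they are indicative only).

```python
# Mean time per loop in milliseconds, taken from the %timeit outputs above.
mean_ms = {
    "dask.bag (threads)": 209.0,
    "ThreadPoolExecutor": 183.0,
    "single-threaded": 701.0,
}

baseline = mean_ms["single-threaded"]

# Speedup = baseline time / threaded time.
speedups = {
    name: baseline / t
    for name, t in mean_ms.items()
    if name != "single-threaded"
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster than single-threaded")
```

With these numbers both threaded variants come out at more than three times faster than the baseline, with `ThreadPoolExecutor` slightly ahead of `dask.bag`.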
