Skip to content

Design review report #26

@kdw503

Description

@kdw503

Design Review Report — RegisterDriver.jl


Likely Design Issues

  1. mm_package_loader silently loads only one device's backend

The vector overload mm_package_loader(algorithms::Vector{<:AbstractWorker}) always delegates to algorithms[1]. If a user passes workers bound to different
compute devices (e.g., one CPU worker and one CUDA worker), only the first device's mismatch package is loaded. The second worker will fail at runtime —
possibly deep inside the pipeline with a cryptic error — instead of at setup time. This looks like an accidental implementation detail (the common case is
all workers on the same device) promoted to a rule without validation.

  1. threadids() is a scheduler probe with no registration-domain semantics

threadids() works by spawning tasks with both @threads and Threads.@Spawn, observing which thread IDs appear, deduplicating, and sorting. This is a
general Julia scheduler introspection utility. It has no registration-domain meaning. Exporting it gives users access to an implementation detail whose
behavior depends on Julia internals, and whose contract is hard to document or test. A user reading the package's public API would not expect to find it
here.

  1. The three driver overloads have inconsistent return types
  • driver(outfile, algorithm, img, mon) → Nothing (writes to disk)
  • driver(outfile, algorithms, img, mon) → Nothing (writes to disk)
  • driver(algorithm, img, mon) → Dict (in-memory, returns results)

The in-memory overload has a fundamentally different contract: no output file, single-frame only, returns the populated monitor dict. While overloading on
presence/absence of outfile is reasonable, the implicit single-frame constraint on the third form is not signaled by the signature. A user who passes a
multi-frame img to the in-memory form will get unexpected behavior (or an error) rather than a natural extension of the file-writing form.

  1. Re-exporting StaticArray from StaticArrays.jl

The package re-exports AbstractWorker, ArrayDecl, NumDenom, and StaticArray. The first three appear directly in driver's logic and the worker protocol, so
re-exporting them is defensible. StaticArray is a general-purpose type from a major package with its own ecosystem. Putting it in RegisterDriver's
namespace is surprising — it implies users should import StaticArray from here rather than from StaticArrays.jl.


Design Questions

Q1. Should nicehdf5 be public?

nicehdf5 is not exported, but the test suite calls it with qualified names (RegisterDriver.nicehdf5(...)), suggesting it is tested as an intentional API
surface. Its job — converting arrays of StaticArray, NumDenom, or SharedArray elements into plain HDF5-storable arrays — could be genuinely useful to
users who want to manually write registration results or inspect what the driver would store. Was this meant to be public? If so, it should be exported
(or at minimum marked public). If not, the qualified test calls are exposing an internal — consider whether testing it through the driver interface would
be sufficient.

Q2. Is the in-memory driver supposed to handle multiple frames?

The file-based driver is explicitly designed for multi-frame image stacks. The in-memory form requires exactly one image. Is this a deliberate scope
limitation ("use this for quick single-image experiments"), or did the multi-frame case just not come up? If users would naturally want in-memory
multi-frame registration (e.g., for testing or small datasets), the current single-frame constraint is a footgun.

Q3. What is the error model for the async writer task?

The file-based driver spawns an async writer task that reads from a Channel and flushes results to the JLD file. If the writer task throws (e.g., disk
full, HDF5 error), does the exception propagate back to the calling task, or is it silently dropped? The worker tasks and the writer task are concurrent —
a failure in either leg needs to cleanly shut down the other. Is this handled, and if so, is the behavior intentional?

Q4. Why does BitsType exist as a module-level constant?

BitsType = HDF5.BitsType is defined as a module-level const. It's used internally to dispatch on types that can be written directly as HDF5 bit-for-bit
datasets vs. those that need per-frame serialization. This is an HDF5 implementation detail. Is there a reason to keep it at the module level rather than
inlining HDF5.BitsType at its use sites?


Observations

  • initialize_jld! is a meaningful internal function (pre-allocates HDF5 datasets and returns a have_unpackable flag) that is not exported and not public.
    The implicit HDF5 schema conventions it encodes — e.g., "stackN" group naming, the 2×n layout for NumDenom — live only in the implementation. Users who
    want to read the output files need to reverse-engineer these conventions. A documented schema or a companion reader function would close this gap.
  • The package uses SharedArray in several nicehdf5 methods, but the primary parallelism mechanism is @threads :dynamic, not Distributed. SharedArray is a
    distributed-memory abstraction that predates the modern threading model. Its presence suggests either an older design choice or that upstream packages
    return SharedArray results that RegisterDriver must handle. Either way, the coexistence of two parallelism paradigms (Threads and SharedArray) is worth
    noting.
  • The threading threshold (≤2 workers → sequential, >2 → @threads :dynamic) is a hard-coded policy baked into driver. Users who want to force sequential
    or parallel execution cannot do so without changing the number of workers.

Overall Characterization

RegisterDriver.jl has a clear identity and a sensible core: it is a thin driver harness that sequences init!/worker/close! calls and routes results to
HDF5. That core composes well. The main tension is between the package's stated role as a "thin harness" and the presence of lower-level or broader
utilities (threadids, mm_package_loader, re-exported StaticArray, nicehdf5) that sit at the wrong abstraction level or belong in a different package. The
two highest-leverage changes would be (1) auditing what is truly public (export, public, or internal) with explicit intent, and (2) fixing the
silent-failure hazard in mm_package_loader when called with heterogeneous workers.


Phase 5 — Values Clarification

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions