Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve interface for AUX data #626

Open
uellue opened this issue Feb 20, 2020 · 3 comments
Open

Improve interface for AUX data #626

uellue opened this issue Feb 20, 2020 · 3 comments
Milestone

Comments

@uellue
Copy link
Member

uellue commented Feb 20, 2020

Discussion with @sk1p and @Brow71189:

Currently, UDF users have to declare AUX data explicitly when constructing an UDF. This can make using a built-in UDF with aux data unnecessarily complicated.

A cleaner solution could be to implement wrapping parameters in the constructor. That means an UDF implements a specific __init__ function with dedicated parameters instead of the default catch-all. That is emerging as good practice for documentation purposes anyway. Users pass plain NumPy arrays as specific parameters to the constructor, following the signature and documentation of the UDF. The UDF's constructor then wraps parameters in an AUX buffer as needed and handles them, for example by passing them up to the superclasse's constructor. One just has to take care about parameters being passed again on the worker nodes.

@uellue
Copy link
Member Author

uellue commented Feb 21, 2020

We can make sure that UDF.aux_data() is a fix point, i.e. that creating aux data from aux data just returns the same aux data without change. That way the constructor doesn't have to distinguish between being called on the main node or re-called on the workers with the modified arguments that it passed up to the super-constructor for the case of wrapping arrays in aux data.

@uellue uellue added the good first issue Good for newcomers label Feb 21, 2020
@uellue
Copy link
Member Author

uellue commented Mar 16, 2020

A quick feedback after using AUX data "for real" for the first time LiberTEM/LiberTEM-blobfinder#23

  • Having to construct AUX data with specifying kind, extra_shape, dtype etc really is clunky, as the issue states.
  • It feels strange to pass AUX data to the UDF constructor. IMO it would be more intuitive to pass the data to run_udf() like the ROI and dataset, since the shape has to match the dataset, and the content can vary between runs, just like the ROI.
    • What about run_udf(..., aux=dict(name_1=data_1, name_2=data_2))
    • Passing that would be optional. A missing AUX buffer content would be initialized with zero.
  • It would feel natural to specify AUX data similar to get_result_buffers(). In that case it can be populated from the aux=dict(...) argument automatically. NumPy broadcasting could be used for the assignment, perhaps, so that a constant initializer can be passed instead of an array?

@uellue
Copy link
Member Author

uellue commented Mar 16, 2020

A minor note about dtype behavior: Currently, the dtype defaults to float32. It should default to None and use the passed data's dtype.

@uellue uellue added this to the 0.6 milestone Apr 9, 2020
@uellue uellue modified the milestones: 0.6, 0.7 Jun 10, 2020
@uellue uellue modified the milestones: 0.7, 0.8 May 27, 2021
@sk1p sk1p modified the milestones: 0.8, backlog Aug 24, 2021
@uellue uellue mentioned this issue Oct 23, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants