New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add chunk yielding plugin and tests #769
Conversation
Tested processing on a small run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implements a new DownChunkingPlugin
base class, for plugins that produce multiple chunks iteratively from a single compute
call.
The main use case is plugins that create data that is much more itense in memory/time than the input, e.g. simulating waveforms from photon arrival times. Currently, you instead have to ensure the input is read in in small enough chunks. If memory isn't an issue but you want smaller files on disk, I suppose you could also do that with a modified Saver. But since memory is smaller than the disk, I guess that wouldn't help you much.
I left some comments but I definitely like the idea. This just makes a very small modification to the plugin base class so there should be no impact on other things.
There may be some ways to generalize this in the future, e.g. track the last produced chunk time in the base class so the user doesn't have to, or make a plugin type where compute gets iterators in its arguments too (like the iters
in Plugin.iter
, but time-synchronized), so it can keep state in local variables rather than instance variables.
(I didn't look at the code style since #762 is likely incoming, but happy to do that if you like.)
I will not merge master into this now since that will mess up Jelle's review. Let's merge master into this PR later. |
for more information, see https://pre-commit.ci
Thank you very much @JelleAalbers for your thorough review. I agree with all your suggestions and I think I adopted all of them. Most of your points raised are originating from the fact that first I modified the strax.Plugin class in #751, before we decided to move it into a separate plugin class. (Except for the organization of our tests here I was simply lazy, but I moved the tests now in a separate file like we did with CutPlugins.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Daniel -- Looks good!
What is the problem / what does the code in this PR do
Adds new type of plugin which allows compute method to return a generator instead of chunks. This allows to return sub-chunks early, in case the chunk of the new data types becomes too large. This is needed for example when simulating events and computing from a list of photon hits raw waveforms.