
Allow compute to return a generator instead of chunks #751

Closed
WenzDaniel wants to merge 7 commits into master from add_chunk_yielding_for_fuse

Conversation

WenzDaniel
Collaborator

@WenzDaniel WenzDaniel commented Aug 24, 2023

What is the problem / what does the code in this PR do
A slight modification that allows a plugin to yield chunks instead of returning them.

This makes it possible to reduce the chunk size while creating new data. It is not needed for normal data processing, but it is for simulations. A simulation starts with a small list of photons which are then turned into pulses and fragments, where each photon then takes 110 x 16 bit of data. In such a case it is helpful to yield multiple smaller chunks for a single larger input chunk.

Since the chunking needs to be done while computing, the plugin's compute method needs to yield the data early. An example plugin can be found in testutils; a rough sketch of the pattern is also shown below.

This change will help fuse avoid the out-of-memory issues we were facing with wfsim, leading to more reliable and stable performance.
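Loosely, the pattern looks like the following sketch. This is not the actual testutils plugin; the data-type names, the dtype, and the `_photons_to_records` helper are made up for illustration, and only the "yield a chunk per slice instead of returning one array" idea matters here.

```python
import numpy as np
import strax


class YieldingExamplePlugin(strax.Plugin):
    """Illustrative sketch of a plugin whose compute yields several
    small chunks for a single input chunk."""
    depends_on = ('photons',)         # hypothetical input data kind
    provides = 'simulated_records'    # hypothetical output data type
    dtype = strax.raw_record_dtype()  # assumed output dtype

    def compute(self, photons, start, end):
        # Split the input time range into a few slices and yield the
        # result for each slice as its own chunk, instead of building
        # one large array and returning it at the end.
        n_slices = 4
        edges = np.linspace(start, end, n_slices + 1).astype(np.int64)
        for t0, t1 in zip(edges[:-1], edges[1:]):
            mask = (photons['time'] >= t0) & (photons['time'] < t1)
            records = self._photons_to_records(photons[mask])  # hypothetical helper
            yield self.chunk(start=t0, end=t1, data=records)
```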

@coveralls

coveralls commented Aug 24, 2023

Coverage Status

coverage: 91.517% (+0.08%) from 91.439% when pulling bc4e4eb on add_chunk_yielding_for_fuse into 5aba752 on master.

@WenzDaniel WenzDaniel marked this pull request as ready for review August 25, 2023 15:15
@WenzDaniel
Collaborator Author

WenzDaniel commented Aug 25, 2023

I won't be able to fix the CodeFactor complaint about a too-complex method, so I would propose to merge without fixing it.

@jmosbacher
Contributor

This is essentially just online rechunking, right?
In that case, wouldn't it be better to have the context control the online rechunking via some other setting, such as max_chunk_length?
Why is control over the re-chunking ceded to the plugin's compute method?
I feel very uncomfortable about this PR. Maybe we can have an in-depth discussion about it in team A?

@dachengx
Collaborator

I guess that, judging from the logic of Plugin.compute, the memory overflow happens when the resulting chunk is very large, and the new functionality splits the result into pieces.

@WenzDaniel
Collaborator Author

WenzDaniel commented Aug 28, 2023

This is essentially just online rechunking, right?

Yes.

In that case, wouldn't it be better to have the context control the online rechunking via some other setting, such as max_chunk_length?
Why is control over the re-chunking ceded to the plugin's compute method?

Hmm, I am not sure this will work. As Dacheng pointed out, the overflow happens while performing the tasks inside the compute method, so compute must yield early. For example, in the case of wfsim and fuse we transform information in the form of photons detected by a PMT (time, channel) into raw_records. This means that you suddenly blow up your information from ~10 byte/photon to ~10 + 220 byte/photon.
So either you need to make all your input chunks tiny, or you need to provide a way for a plugin to yield results early when a user-defined condition is met (a sketch of that pattern is shown below).
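As an illustration of the kind of user-defined condition meant here, a compute method could buffer converted records and yield a chunk as soon as the buffer exceeds a size limit. All names below, including the two helpers and the byte limit, are hypothetical:

```python
import numpy as np

MAX_OUTPUT_BYTES = 500_000_000  # hypothetical user-defined limit


def compute(self, photons, start, end):
    buffer, buffer_bytes, chunk_start = [], 0, start
    for batch in self._iter_photon_batches(photons):  # hypothetical helper
        records = self._photons_to_records(batch)     # hypothetical helper
        buffer.append(records)
        buffer_bytes += records.nbytes
        if buffer_bytes > MAX_OUTPUT_BYTES:
            # Yield everything accumulated so far as one chunk and start
            # a fresh buffer, so the full output for this input chunk
            # never has to fit in memory at once.
            chunk_end = int(records['time'][-1]) + 1  # simplistic boundary choice
            yield self.chunk(start=chunk_start, end=chunk_end,
                             data=np.concatenate(buffer))
            buffer, buffer_bytes, chunk_start = [], 0, chunk_end
    if buffer:
        yield self.chunk(start=chunk_start, end=end,
                         data=np.concatenate(buffer))
```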

I feel very uncomfortable about this PR. Maybe we can have an in-depth discussion about it in team A?

I can see your point; I am also not 100% happy with this solution. An alternative would be to develop a dedicated ChunkDown plugin class with its own do_compute and iter methods (which is of course much more work; a rough outline is sketched below). But sure, let us discuss it in team A.
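Purely illustrative outline of that alternative (the `_split_into_smaller_chunks` helper does not exist; the point is only that the base class, not the user's compute method, would own the down-chunking):

```python
import strax


class ChunkDownPlugin(strax.Plugin):
    """Sketch of a base class that splits oversized compute results."""

    def do_compute(self, chunk_i=None, **kwargs):
        # Run the normal compute, then split the oversized result into
        # smaller chunks before handing them to the processing loop.
        big_chunk = super().do_compute(chunk_i=chunk_i, **kwargs)
        yield from self._split_into_smaller_chunks(big_chunk)  # hypothetical helper
```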
