Use chunked loading in ctapipe-train-* tools #2413

Closed
maxnoe opened this issue Oct 19, 2023 · 2 comments · Fixed by #2423

Comments

@maxnoe
Member

maxnoe commented Oct 19, 2023

Please describe the use case that requires this feature.

At the moment, the ctapipe-train-... tools use TableLoader.read_telescope_events to load all telescope events for a given telescope type in one go.

This can use far more memory than necessary, because afterwards we

  • Apply quality criteria that will throw away a significant percentage of the events
  • Only use a subset of the available columns
  • Sub-sample events if n_events or n_signal / n_background are configured.

Describe the solution you'd like

Load the data in smaller chunks, apply the event selection and column selection to each chunk, and then merge the reduced chunks into the final training table. This lowers peak memory usage because the full, unfiltered table never has to be held in memory at once.
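A minimal sketch of the proposed pattern, in plain Python rather than with ctapipe's actual API. The helper names (`iter_chunks`, `load_training_table`), the column names, and the quality cut are all hypothetical and only illustrate the "filter and trim per chunk, then merge" idea:

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from an iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def load_training_table(rows, columns, is_good, chunk_size=1000):
    """Apply event and column selection per chunk, then merge the results."""
    table = []
    for chunk in iter_chunks(rows, chunk_size):
        # Only rows passing the cut, reduced to the needed columns, are kept.
        # The unfiltered chunk is released before the next one is read.
        table.extend(
            {name: row[name] for name in columns}
            for row in chunk
            if is_good(row)
        )
    return table


# Hypothetical input: a lazy stream of events with one unused column.
events = (
    {"intensity": i, "width": 0.1 * i, "unused": "x"} for i in range(10)
)

table = load_training_table(
    events,
    columns=["intensity", "width"],
    is_good=lambda row: row["intensity"] >= 5,  # stand-in quality criterion
    chunk_size=4,
)
```

With this structure, peak memory is roughly one unfiltered chunk plus the reduced table, instead of the full unfiltered table. Sub-sampling could be applied per chunk in the same loop.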

@kosack
Contributor

kosack commented Oct 20, 2023

For the quality criteria: pytables has efficient filtering (table.where()) that could also be used to filter events before creating the astropy tables and even before chunking, but that would require some lower-level changes to how data are read and I'm not sure the added complexity is worth it.

@maxnoe
Member Author

maxnoe commented Oct 20, 2023

We already support that in read_table:

if condition is None:
    array = table.read(start=start, stop=stop, step=step)
else:
    array = table.read_where(
        condition=condition, start=start, stop=stop, step=step
    )

and it is used to filter the telescope trigger table by tel_id in the TableLoader.
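For illustration, this is roughly how `read_where` behaves when called on a PyTables table directly. It is a standalone sketch, not ctapipe code: the file, the table name `trigger`, and its contents are made up for the example, and it assumes `numpy` and `tables` are installed.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical trigger-like table; names and values are illustrative only.
data = np.zeros(6, dtype=[("event_id", "<i8"), ("tel_id", "<i4")])
data["event_id"] = [1, 1, 2, 2, 3, 3]
data["tel_id"] = [1, 2, 1, 3, 2, 3]

path = os.path.join(tempfile.mkdtemp(), "events.h5")
with tables.open_file(path, mode="w") as h5:
    h5.create_table("/", "trigger", obj=data)

with tables.open_file(path, mode="r") as h5:
    table = h5.root.trigger
    # The condition is evaluated by PyTables while reading, so only the
    # matching rows are returned as a NumPy structured array.
    selected = table.read_where("tel_id == 2")
    # start/stop limit the scanned row range, so the same filter can be
    # combined with chunk-wise reading.
    first_half = table.read_where("tel_id == 2", start=0, stop=3)
```

Because `start` and `stop` restrict which rows are scanned, a condition-based filter like this could in principle be applied chunk by chunk as well.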
