Chunked loading of training data #2423

LukasBeiske · 2023-10-25T14:23:42Z

This will fix #2413.

ctapipe-train-energy-regressor
ctapipe-train-particle-classifier
ctapipe-train-disp-reconstructor

ctapipe/tools/train_energy_regressor.py

Tobychev · 2023-10-27T11:54:16Z

Is there any test that checks that the _read_table functions are correct?

maxnoe · 2023-10-27T16:29:22Z

This looks extremely similar between the three tools. Can we refactor the code into a single common function?

LukasBeiske · 2023-10-27T17:17:34Z

> This looks extremely similar between the three tools. Can we refactor the code into a single common function?

For the disp tool, the calculation of true_disp needs columns which get dropped right after this calculation is done. But we could try to keep these needed columns if all of them are loaded, which only happens in the disp tool, and calculate true_disp outside of _read_table.

For the other two tools, I see no problem.

ctapipe/tools/train_disp_reconstructor.py

LukasBeiske · 2023-11-07T09:35:58Z

> This looks extremely similar between the three tools. Can we refactor the code into a single common function?

I did this now, but maybe there are better ways to do this. I also don't have a strong opinion on whether this is worth it or if we should leave it as it was before.

> Is there any test that checks that the _read_table functions are correct?

I haven't come up with anything useful for this yet, since most of it is already covered by the existing tests of the QualityQuery and the ChunkIterator. If you have a good idea, please let me know.
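One possible shape for such a test, as a rough sketch only: check that selecting events chunk by chunk gives the same rows as selecting on the full table at once, for several chunk sizes. The functions `select_valid` and `select_valid_chunked` are hypothetical stand-ins written for this illustration; they are not the ctapipe `_read_table` functions, and plain NumPy arrays stand in for the event tables.

```python
import numpy as np


def select_valid(values, min_value):
    """Hypothetical quality cut: keep entries above a threshold."""
    return values[values > min_value]


def select_valid_chunked(values, min_value, chunk_size):
    """Apply the same cut chunk by chunk and merge the surviving rows."""
    parts = [
        select_valid(values[start:start + chunk_size], min_value)
        for start in range(0, len(values), chunk_size)
    ]
    # An empty input produces no chunks, so fall back to an empty slice.
    return np.concatenate(parts) if parts else values[:0]


def test_chunked_matches_unchunked():
    rng = np.random.default_rng(0)
    values = rng.uniform(0, 1, size=1000)
    expected = select_valid(values, 0.5)
    # Include chunk sizes smaller than, equal to, and larger than the input.
    for chunk_size in (1, 7, 1000, 5000):
        result = select_valid_chunked(values, 0.5, chunk_size)
        np.testing.assert_array_equal(result, expected)
```

Whether this adds coverage beyond the existing QualityQuery and ChunkIterator tests depends on how much logic the shared reading function contains on its own.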

ctapipe/tools/utils.py

ctapipe/tools/train_particle_classifier.py

- Since the total number of telescope events in a file is not known when loading in chunks (only the total number of subarray events), a progress bar for loading does not make sense.
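To illustrate the point about the progress bar, here is a small sketch with made-up numbers. It assumes chunks are defined by a fixed number of subarray events and that each subarray event contributes a variable number of telescope events; `iter_subarray_chunks` and `tel_counts` are hypothetical names for this example and not part of ctapipe.

```python
import numpy as np


def iter_subarray_chunks(tel_counts, chunk_size):
    """Yield the number of telescope events in each chunk of subarray events."""
    for start in range(0, len(tel_counts), chunk_size):
        yield int(np.sum(tel_counts[start:start + chunk_size]))


# Telescope events per subarray event (invented values for illustration).
tel_counts = np.array([2, 4, 1, 3, 5])

# Chunks hold a fixed number of subarray events but a varying number of rows.
rows_per_chunk = list(iter_subarray_chunks(tel_counts, chunk_size=2))
total_rows = sum(rows_per_chunk)
```

In this sketch the total number of telescope rows is only available after the last chunk has been read, so a bar counting telescope events would have no total to count towards.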

codecov · 2023-11-17T11:04:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7751fc1) 60.60% compared to head (bb01357) 60.60%.
Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2423   +/-   ##
=======================================
  Coverage   60.60%   60.60%           
=======================================
  Files           3        3           
  Lines          33       33           
=======================================
  Hits           20       20           
  Misses         13       13

LukasBeiske marked this pull request as draft October 25, 2023 14:23

maxnoe reviewed Oct 25, 2023

ctapipe/tools/train_energy_regressor.py (outdated)

maxnoe reviewed Oct 25, 2023

ctapipe/tools/train_energy_regressor.py (outdated)

LukasBeiske force-pushed the chunked_training branch from be1e8d3 to b0235ad October 26, 2023 14:17

LukasBeiske marked this pull request as ready for review October 26, 2023 14:54

LukasBeiske requested a review from maxnoe October 27, 2023 11:38

LukasBeiske force-pushed the chunked_training branch from b0235ad to 9e185fe November 6, 2023 11:51

Tobychev reviewed Nov 6, 2023

ctapipe/tools/train_disp_reconstructor.py (outdated)

LukasBeiske force-pushed the chunked_training branch from 9d0b81a to 9815e7a November 7, 2023 09:08

Tobychev previously approved these changes Nov 7, 2023

LukasBeiske dismissed Tobychev’s stale review via e864e92 November 7, 2023 09:42

maxnoe reviewed Nov 14, 2023

ctapipe/tools/utils.py (outdated)

maxnoe reviewed Nov 14, 2023

ctapipe/tools/train_particle_classifier.py (outdated)

LukasBeiske requested review from maxnoe and Tobychev November 15, 2023 09:52

Tobychev previously approved these changes Nov 15, 2023

maxnoe previously approved these changes Nov 15, 2023

LukasBeiske dismissed stale reviews from maxnoe and Tobychev via 0c8a3d7 November 17, 2023 10:37

LukasBeiske force-pushed the chunked_training branch from c70d36b to 0c8a3d7 November 17, 2023 10:37

maxnoe previously approved these changes Nov 17, 2023

LukasBeiske added 5 commits November 17, 2023 11:53

Load training data in chunks when training an energy regressor

bb7cc84

Check event validity after merging chunks and make it faster

0849576

Do all event validation in chunk loop; remove loading bar

8b60cf5

- Since the total number of telescope events in a file is not known when loading in chunks (only the total number of subarray events), a progress bar for loading does not make sense.

Chunked loading for training particle clf and training disp reco

88f00b5

Add changelog

525e3ba

LukasBeiske added 8 commits November 17, 2023 11:53

Fix line too long

4ac7709

Count invalid events as non-predictable, not valid events

3fae0b0

Refactor reading training data into standalone function

8378645

Add docstring

a389183

Add type hints

b52be84

Use relative imports, default logger, and correct type hint

f93e245

Use keyword arguments

b61576e

Use n_background for background events

bb01357

LukasBeiske dismissed maxnoe’s stale review via bb01357 November 17, 2023 10:53

LukasBeiske force-pushed the chunked_training branch from 0c8a3d7 to bb01357 November 17, 2023 10:53

maxnoe approved these changes Nov 17, 2023

LukasBeiske requested a review from Tobychev November 17, 2023 13:52

Tobychev approved these changes Nov 17, 2023

maxnoe merged commit 3117023 into cta-observatory:main Nov 17, 2023
14 checks passed

LukasBeiske deleted the chunked_training branch November 20, 2023 09:30
