Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked loading of training data #2423

Merged
merged 13 commits into from Nov 17, 2023

Conversation

LukasBeiske
Copy link
Contributor

@LukasBeiske LukasBeiske commented Oct 25, 2023

This will fix #2413.

  • ctapipe-train-energy-regressor
  • ctapipe-train-particle-classifier
  • ctapipe-train-disp-reconstructor

@LukasBeiske LukasBeiske marked this pull request as draft October 25, 2023 14:23
@LukasBeiske LukasBeiske marked this pull request as ready for review October 26, 2023 14:54
@Tobychev
Copy link
Contributor

Is there any test that checks that the _read_tabel functions are correct?

@maxnoe
Copy link
Member

maxnoe commented Oct 27, 2023

This looks extremely similar between the three tools. Can we refactor the code into a single common function?

@LukasBeiske
Copy link
Contributor Author

LukasBeiske commented Oct 27, 2023

This looks extremely similar between the three tools. Can we refactor the code into a single common function?

For the disp tool, the calculation of true_disp needs columns which get dropped right after this calculation is done. But we could try to keep these needed columns if all of them are loaded, which only happens in the disp tool, and calculate true_disp outside of _read_table.

For the other two tools, I see no problem.

Tobychev
Tobychev previously approved these changes Nov 7, 2023
@LukasBeiske
Copy link
Contributor Author

LukasBeiske commented Nov 7, 2023

This looks extremely similar between the three tools. Can we refactor the code into a single common function?

I did this now, but maybe there are better ways to do this. I also don't have a strong opinion on whether this is worth it or if we should leave it as it was before.

Is there any test that checks that the _read_tabel functions are correct?

I haven't come up with anything useful for this yet, since most of it is already covered by the existing tests of the QualityQuery and the ChunkIterator. If you have a good idea, please let me know.

ctapipe/tools/utils.py Outdated Show resolved Hide resolved
Tobychev
Tobychev previously approved these changes Nov 15, 2023
maxnoe
maxnoe previously approved these changes Nov 15, 2023
maxnoe
maxnoe previously approved these changes Nov 17, 2023
Copy link

codecov bot commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7751fc1) 60.60% compared to head (bb01357) 60.60%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2423   +/-   ##
=======================================
  Coverage   60.60%   60.60%           
=======================================
  Files           3        3           
  Lines          33       33           
=======================================
  Hits           20       20           
  Misses         13       13           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxnoe maxnoe merged commit 3117023 into cta-observatory:main Nov 17, 2023
14 checks passed
@LukasBeiske LukasBeiske deleted the chunked_training branch November 20, 2023 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use chunked loading in ctapipe-train-* tools
3 participants