Add option to not parse blocks #21
Codecov Report
Base: 96.09% // Head: 95.98% // Decreases project coverage by 0.12%.
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
- Coverage   96.09%   95.98%   -0.12%
==========================================
  Files          19       20       +1
  Lines         384      473      +89
==========================================
+ Hits          369      454      +85
- Misses         15       19       +4
Hi @maxnoe! I played around with this over the last week and have been using it successfully in https://github.com/The-Ludwig/PANAMA. I think this is fine to merge; should I add a test, though? The only comment I have is: also checking
@orelgueta Could you give this a quick review? It's a nice-to-have feature for @The-Ludwig and shouldn't interfere with our usage.
The added code all looks good.
Just for my understanding, though: if this option is used, the arrays remain unparsed and the user has to parse them after reading all the events, using their own code. Is that correct?
@orelgueta Yes, that is correct. In my testing, reading is around 5 times faster if I don't parse the particle blocks in the event loop, but instead put them into a Python list, make a pandas DataFrame out of them, and then name the columns. Of course, this depends on the size and structure of the file itself, but there are definitely some good use cases.
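The workflow described above (collect unparsed blocks in a list, build one DataFrame, then label the columns) can be sketched roughly as follows. This is not code from the PR or from PANAMA: it assumes, for illustration only, that each raw block is a flat float32 array with 7 values per particle, and the column names are hypothetical labels.

```python
import numpy as np
import pandas as pd

# Assumption for illustration: 7 float32 values per particle.
# These column names are hypothetical, not taken from the library.
COLUMNS = ["description", "px", "py", "pz", "x", "y", "t"]


def blocks_to_dataframe(raw_blocks):
    """Stack unparsed float blocks once, then build and label a DataFrame."""
    # Join all raw blocks (one per event) into a single flat array.
    flat = np.concatenate(raw_blocks)
    # One row per particle, len(COLUMNS) values each.
    rows = flat.reshape(-1, len(COLUMNS))
    # Column names are attached only once, after all data is collected.
    return pd.DataFrame(rows, columns=COLUMNS)


# Usage: three fake "events" with 2, 5 and 3 particles.
rng = np.random.default_rng(0)
raw_blocks = [rng.random(7 * n, dtype=np.float32) for n in (2, 5, 3)]
df = blocks_to_dataframe(raw_blocks)
print(df.shape)  # (10, 7)
```

The per-event work in the loop is then just appending an array to a list; all reshaping and labelling happens once at the end.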
Not with their own code, though: the functions in this package can still be used. The difference is basically that, for the use case of reading all events in a file into a single data structure, instead of parsing n arrays and then stacking them, you stack n simple arrays first and then parse once.
When loading large files in bulk, it's much faster to accumulate the low-level float arrays and parse them once than to parse each event directly.
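A minimal NumPy sketch of the two strategies, "parse n arrays, then stack" versus "stack n float arrays, then parse once". It assumes a hypothetical record layout of 7 float32 fields per particle and uses `ndarray.view` as a stand-in for the package's real parsing functions, which may do more work per call. Both paths produce identical records; the second one makes a single parse call no matter how many events there are.

```python
import numpy as np

# Hypothetical record layout, assumed for illustration only:
# 7 float32 fields per particle. The real block format may differ.
PARTICLE_DTYPE = np.dtype(
    [(name, "f4") for name in ("description", "px", "py", "pz", "x", "y", "t")]
)


def parse(raw):
    """Interpret a flat, contiguous float32 array as particle records."""
    return raw.view(PARTICLE_DTYPE)


def parse_then_stack(raw_blocks):
    # One parse call per event, then concatenate the structured arrays.
    return np.concatenate([parse(block) for block in raw_blocks])


def stack_then_parse(raw_blocks):
    # Concatenate the plain float arrays first, then parse exactly once.
    return parse(np.concatenate(raw_blocks))


# Usage: both strategies yield the same 11 records.
rng = np.random.default_rng(1)
blocks = [rng.random(7 * n, dtype=np.float32) for n in (4, 1, 6)]
a = parse_then_stack(blocks)
b = stack_then_parse(blocks)
print(len(a), len(b))  # 11 11
```

With a cheap stand-in like `view`, the gap between the two is small; the benefit grows with the per-call cost of the real parser and with the number of events in the file.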
Added an option to just keep the float array in the event loop.
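A rough sketch of what such an option in an event loop could look like. Everything here is hypothetical: `iter_events`, the `Event` container, the `parse_blocks` flag name and the 7-field record layout are assumptions for illustration, not the package's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

import numpy as np

# Hypothetical layout: 7 float32 fields per particle (assumption).
PARTICLE_DTYPE = np.dtype(
    [(name, "f4") for name in ("description", "px", "py", "pz", "x", "y", "t")]
)


@dataclass
class Event:
    event_id: int
    # Either parsed records or the untouched float32 block.
    particles: np.ndarray


def iter_events(raw_blocks: Iterable[np.ndarray], parse_blocks: bool = True) -> Iterator[Event]:
    """Hypothetical event loop with an option to skip parsing."""
    for event_id, block in enumerate(raw_blocks):
        if parse_blocks:
            yield Event(event_id, block.view(PARTICLE_DTYPE))
        else:
            # Keep the low-level float array; the caller parses later, in bulk.
            yield Event(event_id, block)


# Usage: read unparsed, then stack and parse once at the end.
raw = [np.arange(14, dtype=np.float32), np.arange(7, dtype=np.float32)]
events = list(iter_events(raw, parse_blocks=False))
particles = np.concatenate([e.particles for e in events]).view(PARTICLE_DTYPE)
print(len(particles))  # 3
```

Defaulting the flag to parsing keeps existing callers unaffected, which matches the point that the option shouldn't interfere with current usage.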