Performance on huge files #26
Just to comment here what I also wrote in Slack: the main issue is that we read the files 273 bytes at a time, resulting in a lot of system calls that switch between user space and kernel space. This could be much improved by reading the data in larger chunks, e.g. some largish multiple of 273 for files without the "buffer size", and the actual buffer size for files that do have one. I can't promise when I will be able to try that out; feel free to try it yourself and open a PR. I doubt multi-threading will help much here: CORSIKA files are sequential, and you need to look for the markers in the first 4 bytes of every chunk (RUNH / EVTH / EVTE / LONGI / RUNE).
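A minimal sketch of the chunked-reading idea described above. The 273-value sub-block size and the markers come from the comment; the function name `iter_subblocks` and the default chunk size are hypothetical illustrations, not the library's actual API:

```python
import io

import numpy as np

# Markers that can start a CORSIKA sub-block; a "LONGI" block starts
# with the bytes b"LONG" in its first 4-byte word.
MARKERS = {b"RUNH", b"EVTH", b"EVTE", b"LONG", b"RUNE"}

SUBBLOCK_FLOATS = 273               # one sub-block = 273 single-precision floats
SUBBLOCK_BYTES = SUBBLOCK_FLOATS * 4


def iter_subblocks(fileobj, subblocks_per_read=1000):
    """Yield one 273-float sub-block at a time, but issue far fewer
    read() system calls by fetching many sub-blocks per call."""
    chunk_bytes = subblocks_per_read * SUBBLOCK_BYTES
    while True:
        buf = fileobj.read(chunk_bytes)
        if not buf:
            return
        data = np.frombuffer(buf, dtype="<f4")
        for start in range(0, len(data), SUBBLOCK_FLOATS):
            yield data[start:start + SUBBLOCK_FLOATS]


# Tiny demonstration on an in-memory "file" of five sub-blocks,
# the first of which starts with the RUNH marker.
raw = np.zeros(5 * SUBBLOCK_FLOATS, dtype="<f4")
raw[0] = np.frombuffer(b"RUNH", dtype="<f4")[0]
fake = io.BytesIO(raw.tobytes())

blocks = list(iter_subblocks(fake, subblocks_per_read=2))
headers = [b.tobytes()[:4] for b in blocks if b.tobytes()[:4] in MARKERS]
print(len(blocks), headers)  # 5 [b'RUNH']
```

Reading, say, 1000 sub-blocks per call replaces a thousand syscalls with one, while the per-sub-block marker check stays sequential exactly as described above.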
In this regard, but also for simpler use cases, what about adding some computing benchmarks to the CI, using files stored with Git LFS?
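A hedged sketch of what such a benchmark could measure, written as a plain timing script rather than an actual CI job; the file size, chunk sizes, and function name are made up for illustration:

```python
import os
import tempfile
import time

SUBBLOCK_BYTES = 273 * 4  # one CORSIKA sub-block in bytes

# Throwaway random file standing in for a real (Git LFS-hosted) sample file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(SUBBLOCK_BYTES * 2000))
    path = f.name


def read_all(path, chunk_size):
    """Read the whole file chunk_size bytes at a time; return total bytes read."""
    total = 0
    with open(path, "rb") as f:
        while data := f.read(chunk_size):
            total += len(data)
    return total


# Compare one-sub-block reads against much larger chunked reads.
sizes = {}
for chunk_size in (SUBBLOCK_BYTES, SUBBLOCK_BYTES * 500):
    t0 = time.perf_counter()
    sizes[chunk_size] = read_all(path, chunk_size)
    print(f"chunk={chunk_size:>7d} B: {time.perf_counter() - t0:.4f} s")

os.remove(path)
```

A CI benchmark would wrap the same comparison in something like pytest-benchmark and run it against a small committed sample file instead of random data.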
The main issue here was addressed: we now read much larger blocks than 273 bytes from the filesystem, and this has resulted in a speedup: #29. I am closing this. If performance is still an issue, please provide profiling information in a new issue.
I am dealing with ~135 GB particle files and am wondering about the best way to work with them.
The code I used is the following:
I then used cProfile to produce the following profile file, test_pycorsikaio_simplest.prof.zip, which can be opened with e.g. SnakeViz.
My ideal solution would be to read the file in multi-threaded chunks, but given that this is a CORSIKA file, I am not sure if and how that can be done.
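For reference, a generic way to produce a profile file like the one attached is a sketch along these lines; `read_particle_file` is a hypothetical stand-in for the actual pycorsikaio reading loop, not code from the issue:

```python
import cProfile
import pstats


def read_particle_file():
    # Hypothetical stand-in for a loop over pycorsikaio events;
    # here it just burns some CPU so there is something to profile.
    return sum(i * i for i in range(200_000))


# Run the function under the profiler and save the stats to disk.
profiler = cProfile.Profile()
result = profiler.runcall(read_particle_file)
profiler.dump_stats("test_pycorsikaio_simplest.prof")

# Quick text summary; the same file can be opened graphically with
#   snakeviz test_pycorsikaio_simplest.prof
stats = pstats.Stats("test_pycorsikaio_simplest.prof")
stats.sort_stats("cumulative").print_stats(5)
```

Sorting by cumulative time makes it easy to see whether the time goes into the read() calls themselves or into per-block Python overhead.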