Performance on huge files #26

Closed
HealthyPear opened this issue Mar 27, 2023 · 3 comments

@HealthyPear
Member

I am dealing with ~135 GB particle files and am wondering about the best way to work with them.

The code I used is the following:

from corsikaio import CorsikaParticleFile

input_file = "[....]/DAT100001"

# Iterate over events, stopping as soon as the second event is reached
with CorsikaParticleFile(input_file) as f:
    for event in f:
        if event.header["event_number"] == 2:
            break

I then used cProfile to produce the following profile file,

test_pycorsikaio_simplest.prof.zip

which can be opened with e.g. SnakeViz.

My ideal solution would be to read the file in multi-threaded chunks, but given that this is a CORSIKA file I am not sure whether and how that can be done.

@maxnoe
Member

maxnoe commented Mar 27, 2023

Just to repeat here what I also wrote in Slack:

The main issue is that we read the files 273 bytes at a time, resulting in a lot of system calls that switch between userland and kernel space.

This could be improved considerably by reading the data in larger chunks, e.g. some large multiple of 273 for files without the "buffer size", and the actual buffer size for files that do have one.
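A minimal sketch of that idea, assuming the fixed 273-byte record size quoted above and a made-up chunk factor (this is not the library's actual implementation):

# Sketch: one large read() per CHUNK_RECORDS records instead of one
# system call per record; RECORD_SIZE and CHUNK_RECORDS are illustrative.
RECORD_SIZE = 273
CHUNK_RECORDS = 4096  # roughly 1 MB per read() call

def iter_records(path):
    """Yield fixed-size records from path, buffering large reads."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD_SIZE * CHUNK_RECORDS):
            # a trailing partial record (e.g. padding) is ignored
            for start in range(0, len(chunk) - RECORD_SIZE + 1, RECORD_SIZE):
                yield chunk[start:start + RECORD_SIZE]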

I can't promise when I will be able to try that out, feel free to try yourself and open a PR.

I doubt multi-threading will help much here: CORSIKA files are sequential, and you need to look for the markers in the first 4 bytes of every chunk (RUNH / EVTH / EVTE / LONGI / RUNE).
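For illustration, a hedged sketch of such a marker check (the marker set follows the list above; "LONGI" shows up as its first four bytes, b"LONG"):

# Sketch: classify a record by its first four bytes.
MARKERS = {b"RUNH", b"EVTH", b"EVTE", b"LONG", b"RUNE"}

def record_kind(record):
    """Return the 4-byte marker if the record starts with one,
    else None (a particle data record of the current event)."""
    head = record[:4]
    return head if head in MARKERS else None

Because every record has to be classified in order to know where events begin and end, the file has to be walked sequentially, which is why parallel chunked reading does not map onto the format easily.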

@HealthyPear
Member Author

In this regard, but also for simpler use cases, what about adding some performance benchmarks to the CI, using test files tracked with Git LFS?
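One way this could look, as a hedged sketch: a pytest-benchmark test over a small sample file. Both the pytest-benchmark dependency and the sample file path are assumptions, not anything that exists in the repository:

# Sketch of a CI benchmark using the pytest-benchmark fixture.
from corsikaio import CorsikaParticleFile

SAMPLE_FILE = "tests/resources/DAT_benchmark"  # hypothetical LFS-tracked file

def test_iterate_all_events(benchmark):
    def read_all():
        with CorsikaParticleFile(SAMPLE_FILE) as f:
            for _ in f:
                pass
    benchmark(read_all)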

@maxnoe
Member

maxnoe commented Nov 23, 2023

The main issue here was addressed: we now read much larger blocks than 273 bytes from the filesystem, which resulted in a speedup: #29

I am closing this. If performance is still an issue, please provide profiling information in a new issue.

maxnoe closed this as completed on Nov 23, 2023