Add offline file-size optimisation #45
Pull request #48 implements a more efficient data structure.
I've tested PR #48 with the complete ATLAS DY 3D grid, before (LagrangeSubgridV1) and after (LagrangeSparseSubgridV1) the optimisation, and both with and without LZ4 compression. Here's the table:
The numbers before compression are essentially the memory requirements when loading the grid (for convolutions, etc.). Because the grid is smaller, convolutions with PDFs are also faster: they go from 22 seconds down to 16 seconds, and LZ4 compression makes virtually no difference.
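The memory saving comes from storing only the non-zero weights of the three-dimensional (Q², x1, x2) subgrid instead of the full dense array. The sketch below assumes a simple coordinate-list layout; the class `SparseSubgrid3` is hypothetical and is not the actual layout used by `LagrangeSparseSubgridV1`:

```python
from typing import Dict, List, Tuple

Index = Tuple[int, int, int]


class SparseSubgrid3:
    """Hypothetical sparse 3D subgrid: only non-zero (q2, x1, x2) weights are kept."""

    def __init__(self, shape: Index) -> None:
        self.shape = shape
        self.entries: Dict[Index, float] = {}

    @classmethod
    def from_dense(cls, dense: List[List[List[float]]]) -> "SparseSubgrid3":
        shape = (len(dense), len(dense[0]), len(dense[0][0]))
        grid = cls(shape)
        for i, plane in enumerate(dense):
            for j, row in enumerate(plane):
                for k, value in enumerate(row):
                    if value != 0.0:  # exact zeros are simply not stored
                        grid.entries[(i, j, k)] = value
        return grid

    def to_dense(self) -> List[List[List[float]]]:
        n0, n1, n2 = self.shape
        dense = [[[0.0] * n2 for _ in range(n1)] for _ in range(n0)]
        for (i, j, k), value in self.entries.items():
            dense[i][j][k] = value
        return dense

    def nnz(self) -> int:
        """Number of stored (non-zero) weights."""
        return len(self.entries)


# Example: a 3 x 4 x 5 subgrid with only two filled weights.
dense = [[[0.0] * 5 for _ in range(4)] for _ in range(3)]
dense[1][2][3] = 2.5
dense[0][0][0] = -1.0
sparse = SparseSubgrid3.from_dense(dense)
```

A coordinate list also stores three indices per entry, so it only pays off when most weights are zero; whether that holds depends on the grid.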
To optimise a grid, simply run
Here is a comparison against APPLgrid (using the converter from PR #17 and
Commit 1ca0a55 further reduces the file sizes of grids for processes with symmetric initial states (for instance proton-proton collisions) by exploiting the symmetry of the double sum over the interpolated
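The symmetry argument can be illustrated with a single weight matrix W over interpolation nodes. Assuming both hadrons use the same PDF values f and the channel is symmetric under exchanging the two partons, W[i][j] and W[j][i] multiply the same product f(x_i) f(x_j), so they can be added and stored once. This is a minimal sketch with hypothetical function names, not PineAPPL's implementation:

```python
import random
from typing import List

Matrix = List[List[float]]


def fold_symmetric(w: Matrix) -> Matrix:
    """Add W[j][i] onto W[i][j] for i < j; keep the diagonal, leave the lower triangle zero."""
    n = len(w)
    folded = [[0.0] * n for _ in range(n)]
    for i in range(n):
        folded[i][i] = w[i][i]
        for j in range(i + 1, n):
            folded[i][j] = w[i][j] + w[j][i]
    return folded


def convolve(w: Matrix, f: List[float]) -> float:
    """Double sum over interpolation nodes: sum_ij W[i][j] * f(x_i) * f(x_j)."""
    n = len(w)
    return sum(w[i][j] * f[i] * f[j] for i in range(n) for j in range(n))


def stored_entries(w: Matrix) -> int:
    """Number of non-zero weights that would have to be written to disk."""
    return sum(1 for row in w for value in row if value != 0.0)


random.seed(0)
n = 6
weights = [[random.random() for _ in range(n)] for _ in range(n)]
pdf = [random.random() for _ in range(n)]  # same PDF values for both initial-state hadrons
folded = fold_symmetric(weights)
```

For n nodes the folded matrix keeps n(n+1)/2 instead of n² weights, while the convolution is unchanged because f(x_i) f(x_j) = f(x_j) f(x_i).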
Commits 098fe5d and a0f32fc further decrease the size of all grids that have a static scale (grids with different static scales in different bins are also optimised). By default the size shrinks by a factor of four (the interpolation degree, three, plus one). This optimisation changes the numerical value of the convolution, since the PDFs are no longer evaluated at multiple
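The size factor and the numerical change can both be seen from Lagrange interpolation in the scale variable. With a static scale t0, every event spreads its weight over the same degree + 1 nodes, and the convolution effectively evaluates the PDF at those nodes and interpolates. Replacing them with a single node placed at t0 evaluates the PDF exactly there. In this sketch the node positions, t0, and the use of `math.log` as a stand-in for a PDF are all hypothetical:

```python
import math
from typing import Callable, List


def lagrange_weights(nodes: List[float], t: float) -> List[float]:
    """Lagrange basis polynomials l_k(t) for the given interpolation nodes."""
    weights = []
    for k, tk in enumerate(nodes):
        w = 1.0
        for m, tm in enumerate(nodes):
            if m != k:
                w *= (t - tm) / (tk - tm)
        weights.append(w)
    return weights


def interpolated(f: Callable[[float], float], nodes: List[float], t: float) -> float:
    """Unoptimised result: f evaluated at the nodes, combined with the Lagrange weights."""
    return sum(w * f(tk) for w, tk in zip(lagrange_weights(nodes, t), nodes))


degree = 3
nodes = [1.0, 1.5, 2.0, 2.5]  # degree + 1 hypothetical nodes in the scale variable
t0 = 1.8                      # hypothetical static scale, in the same variable

weights = lagrange_weights(nodes, t0)
optimised = math.log(t0)      # single node at t0: the stand-in PDF is evaluated exactly
unoptimised = interpolated(math.log, nodes, t0)
```

The weights sum to one, so four stored values collapse into one; the two results differ only by the interpolation error, which vanishes for polynomials of degree three or less.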
Using
Commit fce09e1 removes empty luminosity entries, which is mainly needed for generating smaller FK tables.
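Removing empty luminosity entries amounts to dropping every channel whose subgrids carry no non-zero weight. The `Channel` structure below is hypothetical: it assumes each channel lists its parton combinations as (pid_a, pid_b, factor) and holds its subgrids for all orders and bins as flat lists of weights:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Channel:
    """Hypothetical luminosity entry: parton combinations plus all of its (flattened) subgrids."""
    combinations: List[Tuple[int, int, float]]  # (pid_a, pid_b, factor)
    subgrids: List[List[float]] = field(default_factory=list)

    def is_empty(self) -> bool:
        # A channel without subgrids, or with only zero weights, contributes nothing.
        return all(value == 0.0 for subgrid in self.subgrids for value in subgrid)


def remove_empty_channels(channels: List[Channel]) -> List[Channel]:
    """Keep only channels that carry at least one non-zero subgrid weight."""
    return [channel for channel in channels if not channel.is_empty()]


channels = [
    Channel([(21, 21, 1.0)], [[0.0, 1.2, 0.0]]),
    Channel([(2, -2, 1.0), (-2, 2, 1.0)], [[0.0, 0.0, 0.0]]),
    Channel([(1, -1, 1.0)], []),
]
kept = remove_empty_channels(channels)
```

A channel can only be dropped if it is empty in every order and bin, which is why the sketch checks all of its subgrids at once.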
@scarlehoff just came across this one: "strip numerical zeros". Maybe we can increase the priority?
In the meantime the size of
Definitely not a problem
That's great :)
Here's an update of the numbers from #45 (comment), using the CLI
That's a 46% reduction!
Are you comparing with the
The PineAPPL grids are LZ4-compressed and, as far as I understand, the ROOT file format is ZLIB-compressed, so in that sense I think it's a fair comparison. However, you might wonder how good or helpful the compression is in PineAPPL's case, so I added the numbers without compression to the comment above.
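One way to see why a general-purpose compressor gains little once zeros are no longer stored: the remaining double-precision weights have nearly random mantissa bytes. The sketch below uses `zlib` from the Python standard library as a stand-in for LZ4 (which is not in the standard library) and synthetic random values instead of real grid weights, so it only illustrates the trend, not PineAPPL's actual numbers:

```python
import random
import struct
import zlib


def packed(values):
    """Serialise floats as little-endian IEEE-754 doubles, as a binary format would."""
    return struct.pack(f"<{len(values)}d", *values)


def ratio(data: bytes) -> float:
    """Compressed size divided by raw size (1.0 means no gain at all)."""
    return len(zlib.compress(data, 9)) / len(data)


random.seed(1)
# Only non-zero values, as after sparsification.
dense = [random.random() for _ in range(10_000)]
# The same kind of values, but 90% of the entries are explicit zeros.
padded = [0.0] * 100_000
for i in range(0, 100_000, 10):
    padded[i] = random.random()

dense_ratio = ratio(packed(dense))
padded_ratio = ratio(packed(padded))
```

The zero-padded buffer shrinks dramatically, while the buffer of non-zero doubles barely compresses: most of the redundancy a compressor could exploit has already been removed by not storing the zeros.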
Ok, good. Then PineAPPL is already doing a great job on its own :D
Perfect, it was reasonable.
I wonder why LZ4 compression achieves so little. In some sense, that's a good sign in itself.
I think the reason is that the format is binary with already very small entropy (one
In eko we're compressing .npy
We can, but this is an N3LO problem, I'd say.
N3LO is essentially now :)
This has long been implemented; let's open a new issue for further optimisations.
Possible optimisations:
- … factor = 0.0 (why are they there in the first place?)
- … x1 and x2 grid values in subgrids #151
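Entries with factor = 0.0 cannot contribute to any convolution, so they can be dropped from a channel without changing results. A minimal sketch, assuming a hypothetical (pid_a, pid_b, factor) tuple layout for the parton combinations of one channel:

```python
from typing import List, Tuple

Combination = Tuple[int, int, float]  # (pid_a, pid_b, factor), hypothetical layout


def strip_zero_factors(combinations: List[Combination]) -> List[Combination]:
    """Drop parton combinations whose factor is exactly 0.0; their contribution is always zero."""
    return [c for c in combinations if c[2] != 0.0]


channel = [(2, -2, 1.0), (4, -4, 0.0), (-2, 2, 1.0)]
stripped = strip_zero_factors(channel)
```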