Remove empty unused x1 and x2 grid values in subgrids #151

Closed
Tracked by #45
cschwan opened this issue Jul 2, 2022 · 7 comments · Fixed by #193
Assignees: cschwan
Labels: enhancement (New feature or request)

Comments

cschwan (Contributor) commented Jul 2, 2022

We already remove unused grid points in the muf2/mur2 dimension, and we should be able to do the same with x1 and x2. This might require some modifications here and there, but it should reduce the size of each subgrid, since subgrids are usually non-zero only in a small range of x.

This will require changing Grid::optimize to also modify ImportOnlySubgridV1 subgrids.
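
As a rough illustration of the intended optimization, here is a minimal sketch in plain Rust; the function name `trim_empty_x` and the dense `Vec<Vec<f64>>` layout are assumptions for illustration and not the actual `ImportOnlySubgridV1` representation.

```rust
/// Hypothetical sketch: drop x1/x2 grid values whose rows/columns are entirely
/// zero, so that a subgrid only stores the x range in which it is non-zero.
/// The dense `array[ix1][ix2]` layout stands in for one slice of a real
/// subgrid; it is not the actual `ImportOnlySubgridV1` representation.
fn trim_empty_x(
    array: &[Vec<f64>], // array[ix1][ix2]
    x1: &[f64],
    x2: &[f64],
) -> (Vec<Vec<f64>>, Vec<f64>, Vec<f64>) {
    // indices of x1 values whose row contains at least one non-zero entry
    let keep1: Vec<usize> = (0..x1.len())
        .filter(|&i| array[i].iter().any(|&v| v != 0.0))
        .collect();
    // indices of x2 values whose column contains at least one non-zero entry
    let keep2: Vec<usize> = (0..x2.len())
        .filter(|&j| array.iter().any(|row| row[j] != 0.0))
        .collect();

    // keep only the selected rows and columns together with their x values
    let new_array = keep1
        .iter()
        .map(|&i| keep2.iter().map(|&j| array[i][j]).collect())
        .collect();
    let new_x1 = keep1.iter().map(|&i| x1[i]).collect();
    let new_x2 = keep2.iter().map(|&j| x2[j]).collect();

    (new_array, new_x1, new_x2)
}

fn main() {
    // 3x3 toy subgrid that is non-zero only at the central (x1, x2) point
    let array = vec![
        vec![0.0, 0.0, 0.0],
        vec![0.0, 1.5, 0.0],
        vec![0.0, 0.0, 0.0],
    ];
    let (trimmed, x1, x2) = trim_empty_x(&array, &[0.1, 0.2, 0.4], &[0.1, 0.2, 0.4]);
    assert_eq!(trimmed, vec![vec![1.5]]);
    assert_eq!((x1, x2), (vec![0.2], vec![0.2]));
}
```

In the actual grid, `Grid::optimize` would apply this kind of filtering to each `ImportOnlySubgridV1`, as noted above.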

cschwan added the enhancement label and self-assigned this on Jul 2, 2022
cschwan (Contributor, Author) commented Sep 9, 2022

@alecandido @felixhekhorn I am working on this change; it will not decrease the file sizes, but I hope it will improve the performance of the evolution a bit. The price we have to pay for this is that, instead of always having the same x-grid everywhere - the usual 50 points - the grids return a subset of these 50 points. So while the x-grid points themselves are still the same, each process uses a different range of them. This might interfere with the caching of the EKOs; what do you think?

felixhekhorn (Contributor):

Caching? The best we do at the moment is just the slice here (and that is on pid, so it is unaffected): https://github.com/N3PDF/pineappl/blob/d924e9e41798c1f755849fd054938c05bbdb99c2/pineappl/src/grid.rs#L1740
Afterwards we have the index remap, which, as far as I understand, would need to be shifted down to the subgrid level:
https://github.com/N3PDF/pineappl/blob/d924e9e41798c1f755849fd054938c05bbdb99c2/pineappl/src/grid.rs#L1754

cschwan (Contributor, Author) commented Sep 9, 2022

Sorry, I meant caching of the EKOs in pineko, or probably somewhere higher up. It might be a problem that every grid we'd like to evolve then has a different x-grid (well, technically a subset, as I said).

felixhekhorn (Contributor):

OK, that is not a problem - with the output rework my idea was to compute the EKOs just by Q2 and then do the interpolation in x, at the fitting or process scale, only on demand.
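
A sketch of what "interpolate in x on demand" could look like, reduced to a single flavour and written in plain Rust; `rotate_to_requested_x` and the flat matrix layout are illustrative assumptions, not the eko or pineko API.

```rust
/// Hypothetical sketch: an EKO computed once per Q2 on the full theory x-grid
/// (here a plain 2D matrix `op[x_out][x_in]` for a single flavour) is rotated
/// on demand to whatever x points a grid requests, by applying an
/// interpolation matrix `r[requested][x_out]`.
fn rotate_to_requested_x(r: &[Vec<f64>], op: &[Vec<f64>]) -> Vec<Vec<f64>> {
    // plain matrix product: result[a][j] = sum_i r[a][i] * op[i][j]
    r.iter()
        .map(|row| {
            (0..op[0].len())
                .map(|j| row.iter().zip(op).map(|(ri, opi)| ri * opi[j]).sum())
                .collect()
        })
        .collect()
}

fn main() {
    // toy operator on a 3-point x-grid; the "requested" grid is the subset
    // {x_0, x_2}, so the interpolation matrix is just a row selection
    let op = vec![
        vec![1.0, 0.1, 0.0],
        vec![0.0, 0.8, 0.2],
        vec![0.0, 0.0, 0.9],
    ];
    let r = vec![vec![1.0, 0.0, 0.0], vec![0.0, 0.0, 1.0]];
    assert_eq!(
        rotate_to_requested_x(&r, &op),
        vec![vec![1.0, 0.1, 0.0], vec![0.0, 0.0, 0.9]]
    );
}
```

When the requested points are a subset of the theory x-grid, as in the toy example, the interpolation matrix reduces to a row selection.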

alecandido (Member) commented Sep 9, 2022

Of course we can do that if needed (that's why we implemented it in the first place), but I'd try to avoid re-interpolating as much as possible.

I.e.: everything is fine, and we'll do as @felixhekhorn said if we have to. But since we agreed to have a single x-grid for the theory, it would be ideal to have all the PineAPPL grids use the same one, if possible. This will automatically skip re-interpolation when it is not needed.

In any case, a subset is better than the other options, since there will be less interpolation error.

cschwan (Contributor, Author) commented Sep 10, 2022

> I.e.: everything is fine, and we'll do as @felixhekhorn said if we have to. But since we agreed to have a single x-grid for the theory,

This won't change.

> it would be ideal to have all the PineAPPL grids use the same one, if possible. This will automatically skip re-interpolation when it is not needed.

That's the case: you should be able to feed the same set of EKOs to Grid::convolute_eko, and it'll select the necessary subset. But Grid::eko_info will return a smaller set of x-grid points, and I wonder whether, higher up in the toolchain, this will lead to regenerating many subsets of EKOs of the original 50x50 over and over for the newly implemented datasets in the current pineko implementation. But it's certainly possible to avoid that.

alecandido (Member):

> That's the case: you should be able to feed the same set of EKOs to Grid::convolute_eko, and it'll select the necessary subset.

The point is that if you are going to use a smaller set of points, the interpolation basis is different, and thus so are the interpolation coefficients (you also have to account for the missing part). So re-interpolation is needed (though for a subset it is certainly advantageous).
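
A generic illustration of this point, using plain Lagrange interpolation rather than the interpolation actually used by eko or PineAPPL: the basis polynomials, and therefore every coefficient, depend on the whole node set, so dropping nodes changes all of them.

```rust
/// Lagrange basis weights at the point `x` for the given interpolation nodes:
/// weight_j(x) = prod_{k != j} (x - x_k) / (x_j - x_k).
fn lagrange_weights(nodes: &[f64], x: f64) -> Vec<f64> {
    nodes
        .iter()
        .enumerate()
        .map(|(j, &xj)| {
            nodes
                .iter()
                .enumerate()
                .filter(|&(k, _)| k != j)
                .map(|(_, &xk)| (x - xk) / (xj - xk))
                .product()
        })
        .collect()
}

fn main() {
    let full = [0.1, 0.2, 0.4, 0.8];
    let subset = [0.2, 0.4, 0.8]; // the same nodes minus one
    // removing a single node changes every remaining weight
    println!("{:?}", lagrange_weights(&full, 0.3));
    println!("{:?}", lagrange_weights(&subset, 0.3));
}
```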

> But Grid::eko_info will return a smaller set of x-grid points, and I wonder whether, higher up in the toolchain, this will lead to regenerating many subsets of EKOs of the original 50x50 over and over for the newly implemented datasets in the current pineko implementation. But it's certainly possible to avoid that.

As @felixhekhorn said: if it's not already the case, we'll make sure that a new EKO won't be recomputed (unless strictly needed, e.g. because of a new Q2 request); instead, the available one will just be re-interpolated.

However, what is already happening is that one EKO is generated for each dataset, since in general they have different Q2 values, even when the x-grid is actually the same. The "unique" EKO for the theory is only the one needed to evolve the fit result.
Our goal is to transition to a two-EKOs-per-theory scheme, in which we'll have:

  • a small EKO, the one already present, to evolve the fit;
  • a big EKO, which will accumulate all the pieces needed to evolve all the datasets (only possible if the accumulation happens on disk; it is unfeasible for a fully in-memory approach).

cschwan linked a pull request on Dec 7, 2022 that will close this issue