Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize FkTable representation #79

Closed
scarlehoff opened this issue Oct 19, 2021 · 38 comments
Closed

Optimize FkTable representation #79

scarlehoff opened this issue Oct 19, 2021 · 38 comments
Labels
enhancement New feature or request
Milestone

Comments

@scarlehoff
Copy link
Member

For instance, at the initial scale we have that

sigma = T24 = T35
V = v15 = v24 = v35

One thing that the NNPDF fktables do is to say "ok, since these are equal I'll just sum everything in, say, V, and empty the other rows" so one gets "V+V15+V24+V35" but the luminosity function is just V.

@scarlehoff scarlehoff added the enhancement New feature or request label Oct 19, 2021
@scarlehoff scarlehoff changed the title Simplify fktable for trivial contributions Simplify fktables for trivial contributions Oct 19, 2021
@cschwan
Copy link
Contributor

cschwan commented Oct 19, 2021

That's a good idea. However, it only works when the 1) charm PDF is equal to the anti-charm PDF and if, at the same time, 2) bottom- and top-quark PDFs are zero at the FK table scale. Since this isn't the case in general, but is the case for NNPDF, the place where we can do this is in Grid::convolute_eko; let's add @alecandido and @felixhekhorn to see what they're saying.

In practice I'd give convolute_eko a few more parameters that allow the specification of these constraints. That metadata of the FK tables should also reflect the constraints, and ideally allow PineAPPL to check the PDF that is used to prevent the user from using sets that don't fulfill them. We should probably discuss more optimization strategies that in general would allow us to speed up convolute_eko and make the FK tables smaller; I'm changing the title of this Issue to reflect that.

@scarlehoff
Copy link
Member Author

scarlehoff commented Oct 19, 2021

  1. charm PDF is equal to the anti-charm PDF

@felixhekhorn had the same worry. I'd say for now we can accept that to be true

  1. bottom- and top-quark PDFs are zero at the FK table scale

But this is always true (or always false) for a given fktable, right?

Personally I was thinking of reduced_table() and reduced_lumi() methods (or something like that) where the simplification is done by the user (i.e., me) if they (i.e., I) need a smaller fktable. It doesn't really matter to me if the fktable are unnecessarily big on disk or if I need to spend a few more seconds to load them* but I want them to be as small as possible in memory for the fit.

*of course, I'm even happier if their size is reduced also in disk but that's a second order problem for me

@felixhekhorn
Copy link
Contributor

But as you said this is a NNPDF thing so let's do as @scarlehoff and I said in person: let's change the .tables() ... (that is at the latest point possible ...)

@alecandido
Copy link
Member

I agree on not to add any further complication to convolute_eko, and add a method to the FkTable object.

However, instead of providing methods for a different representation, I would add a method to get a new FkTable object by collapsing some rows of the original one. And then you can even dump the new object, already collapsed.

@felixhekhorn
Copy link
Contributor

We should probably discuss more optimization strategies that in general would allow us to speed up convolute_eko and make the FK tables smaller; I'm changing the title of this Issue to reflect that.

at some point definetly yes, however let's focus for the moment on the things we need to reach our deadline 🙃

@alecandido
Copy link
Member

I would say that the idea of returning a new collapsed object serves well any possible usage:

  • if you don't care about disk representation and initial effort, you can load whatever you have there, call the method and get the new collapsed FkTable in memory
  • if you do care, you do the operation offline, and you dump the result

@cschwan cschwan changed the title Simplify fktables for trivial contributions Optimize FkTable representation Oct 19, 2021
@cschwan
Copy link
Contributor

cschwan commented Oct 19, 2021

I would say that the idea of returning a new collapsed object serves well any possible usage:

* if you **don't** care about disk representation and initial effort, you can load whatever you have there, call the method and get the new collapsed `FkTable` in memory

* if you **do** care, you do the operation offline, and you dump the result

That's a good idea! We could call it FkTable::optimize(&self, constraints: ...) -> FkTable.

@alecandido
Copy link
Member

I agree, at the end of the day an FkTable is conceptually different from a Grid, so there is no clash.

@felixhekhorn
Copy link
Contributor

That's a good idea! We could call it FkTable::optimize(constraints: ...) -> FkTable.

Mmm but this clashes with our idea of an FKTable beeing immutable ...

@cschwan
Copy link
Contributor

cschwan commented Oct 19, 2021

That's a good idea! We could call it FkTable::optimize(constraints: ...) -> FkTable.

Mmm but this clashes with our idea of an FKTable beeing immutable ...

Not really, because it returns another FkTable and doesn't modify self.

@felixhekhorn
Copy link
Contributor

but will this work if the thing is not clonable? (and we want this)

@cschwan
Copy link
Contributor

cschwan commented Oct 19, 2021

That's a good question...

@cschwan
Copy link
Contributor

cschwan commented Nov 5, 2021

@alecandido @felixhekhorn Important question: are the grids (opposed to the grids convoluted with PDFs) for sigma, T24, T35, and for V, V15, V24, V35 the same? In that case we can easily implement this optimization at the level of redefining the luminosity functions.

@felixhekhorn
Copy link
Contributor

No (if I understand correctly) because some processes, e.g. CC DIS or CC DY, can resolve difference between b and bbar, e.g., and hence V24 and T24 are unique

@scarlehoff
Copy link
Member Author

@cschwan is this available? i..e, can I do .optimize() to get

sigma = T24 = T35
V = v15 = v24 = v35

? Or maybe this should be done by pineko (since it depends on the final scale)? @felixhekhorn

@cschwan
Copy link
Contributor

cschwan commented Feb 11, 2022

@scarlehoff The short answer is no. How important is that for the fit in terms of a timescale?

I'm a bit hesitant to implement the optimization because I'm still wondering what the proper way to do it would be. Isn't this technically a different basis?

In any case I'd like make the assumptions explicit somewhere so that the user can't use a PDFs where these assumptions are not fulfilled.

@scarlehoff
Copy link
Member Author

scarlehoff commented Feb 11, 2022

@scarlehoff The short answer is no. How important is that for the fit in terms of the timescale?

When we do the fit with all the tables the memory is going to grow quite a bit and thus the fits will take longer. It's not a disaster but it's a big QoL feature.'

Isn't this technically a different basis?

Yes and no. It's just an optimization over the evolution basis at a given scale.

In any case I'd like make the assumptions explicit somewhere so that the user can't use a PDFs where these assumptions are not fulfilled.

One possibility is that this is done at loading time, so that one can ask for the fktable table or the optimized_fktable where some of the values are summed (for instance T24 and 35 get absorbed into singlet) Saving them in an already optimized format makes it easier in the storage but I don't care that much about that part.

@cschwan cschwan added this to the v0.5.0 milestone Feb 11, 2022
@cschwan
Copy link
Contributor

cschwan commented Feb 11, 2022

When we do the fit with all the tables the memory is going to grow quite a bit and thus the fits will take longer. It's not a disaster but it's a big QoL feature.'

I see, that's why I've added this to the v0.5.0 milestone.

Would the interface to the fit understand if I'd remove T24, T35, V15, V24 and V35 from the list of luminosities or should I keep them in there (their contribution would be zero)?

@felixhekhorn @alecandido I'd suggest to add one parameter to FkTable::optimize, because somewhere higher up the theory must tell us which assumptions we need (for instance whether we distinguish between charm and anti-charm as mentioned in the PC, or not).

@scarlehoff
Copy link
Member Author

Would the interface to the fit understand if I'd remove T24, T35, V15, V24 and V35 from the list of luminosities or should I keep them in there (their contribution would be zero)?

Since this is an optimization feature I guess all 0s should be removed for maximum optimization.

@alecandido
Copy link
Member

In any case, we don't want this to be a breaking change: we want the fit to work without, but it would be quite bad if it's not going to still work with the 0s.

Thus, I propose to release v0.5.0, and start a v0.5.1 milestone

@cschwan
Copy link
Contributor

cschwan commented Feb 11, 2022

@felixhekhorn @alecandido I'd suggest to add one parameter to FkTable::optimize, because somewhere higher up the theory must tell us which assumptions we need (for instance whether we distinguish between charm and anti-charm as mentioned in the PC, or not).

Since FkTable::optimize doesn't exist yet, contrary to what I said above, we can add FkTable::optimize which accepts one parameter that specifies the assumptions. This wouldn't be a breaking change, and then we can add it in v0.5.1, as @alecandido said.

@cschwan cschwan modified the milestones: v0.5.0, v0.5.1 Feb 11, 2022
@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

Commit cf02e33 should implement all interface-visible changes to do exactly what @scarlehoff proposed.

As promised there's one new method FkTable::optimize, which requires an enum FkAssumptions; please have a look at the documentation of it,

https://github.com/N3PDF/pineappl/blob/cf02e33dc65c53343c0276fd411094c6fe2bc95e/pineappl/src/fk_table.rs#L47-L72

and let me know if I can improve it. It should enable a general version of the optimization that @scarlehoff proposed in the first comment, if we want to fit bottom = anti-bottom, for instance. The optimization should work for FK tables in the flavour and evolution-basis.

What we need next is 1) the corresponding method in the Python interface and 2) probably a slightly modified theory that specifies the assumption and that calls this method in Pineko/fktable. @alecandido @felixhekhorn could you please have a look at this?

What's still missing is the actual optimization, for the time being only the luminosity function is rewritten and therefore one gets repeated entries. I'm working on that now.

@alecandido
Copy link
Member

alecandido commented Feb 25, 2022

I'll update immediately Python interface, I guess there are only two things to bind:

  • extra method FkTable.optimize()
  • enum FkAssumptions, that is needed as an input

It should not be hard.

@alecandido
Copy link
Member

alecandido commented Feb 25, 2022

In 834a041 I added the bindings for the method.

I had to make the FkAssumptions Copy (it's just a plain C-like enum, so it should be fine). In order to be Copy, it had to be Clone (according to cargo, and so I guess rustc).

@felixhekhorn
Copy link
Contributor

At this point it would have been worth a dedicated PR ;-)

@alecandido
Copy link
Member

In c63b0ea it becomes actually possible to generate FkAssumptions, choosing them by string.

Since this mapping is encoded twice:
https://github.com/N3PDF/pineappl/blob/c63b0ea0b2355b84b8d16b8d767e34513bb830c1/pineappl/src/fk_table.rs#L323-L331
https://github.com/N3PDF/pineappl/blob/c63b0ea0b2355b84b8d16b8d767e34513bb830c1/pineappl_py/src/fk_table.rs#L34-L42

I would suggest to split it in an HashMap and reuse it.

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

@alecandido I agree, I'll implement the trait Display and TryFrom (?) for FkAssumptions, for which I'll probably also make the strings look like exactly like the enum values. We should probably hold off feeding strings into the Python interface until this is final.

@alecandido
Copy link
Member

Ok.

At the moment, it is what it is, if you wish to remove it you can do it, but since it's not part of a release I believe we can keep it, and use creation from Python just to experiment. The moment you have a better candidate, I'll make sure to update.

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

Implemented in commit 4427e9e.

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

Commit 19aa0a5 should implement everything we need, and adds the option --fk-table=Nf4Sym, for instance, to pineappl optimize. This will allow us to optimize the already generated FK tables. I still have to test everything, though.

@alecandido
Copy link
Member

alecandido commented Feb 25, 2022

Well done :D

Is there anything left that is needed (but testing) @scarlehoff?

@scarlehoff
Copy link
Member Author

To first approximation only testing.

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

First tests look very promising:

  • run mkdir ../appl_subfktables_opt && for i in *.pineappl.lz4; do pineappl optimize --fk-table=Nf4Sym $i ../appl_subfktables_opt/$i; done in /media/FK/fktables/data/appl_subgrids on dom to optimize the tables - that's surprisingly quick
  • the size difference between appl_subfktables and appl_subfktables_opt is quite nice: 1.3 GB vs. 643 MB, that's a bit less than half the size
  • the differences after convolution are typically 1e-12, at maximum 1e-10; run: for i in *.pineappl.lz4; do pineappl --silence-lhapdf diff --ignore-lumis $i ../appl_subfktables_opt/$i NNPDF40_nnlo_as_01180; done

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

Finally it seems to do what it promises:

pineappl obl --lumis appl_subfktables_opt/200-ATLASDY2D8TEV-aMCfast_obs_0.pineappl.lz4

doesn't show evolution IDs higher than 208 (V8) and 115 (T15). Prior to optimization this grid has 91 luminosities, after optimization there are only 45 luminosities which is plausible if you think about the size reduction.

Applying pineappl optimize to the PineAPPL grids seems to be non-modifying as expected.

@cschwan
Copy link
Contributor

cschwan commented Feb 25, 2022

As far as PineAPPL is concerned this Issue can be closed, if you agree. Don't forget to call FkTable::optimize in pineko.

@cschwan cschwan closed this as completed Feb 25, 2022
@alecandido
Copy link
Member

In order to use this in pineko, we'll need a new release of pineappl.

If you don't have anything more to add close in the future @cschwan, I would just release 0.5.1.

@scarlehoff
Copy link
Member Author

I have one question, the optimize only removes 1) flavours that are 0 and 2) reduces flavours by summing them into the same "histogram-bin" (depending on the theory) but the xgrid remains the same for all fktables? right @cschwan ?

@cschwan
Copy link
Contributor

cschwan commented Mar 8, 2022

That's correct, @scarlehoff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants