
Add Grid::evolve, a better version of Grid::convolute_eko #184

Merged: 73 commits from grid_evolve into master on Oct 21, 2022

Conversation

@cschwan (Contributor) commented Oct 6, 2022

@andreab1997 @alecandido @felixhekhorn

This pull request adds the method Grid::evolve, which is going to replace Grid::convolute_eko. The new method should primarily be faster, but (in the end) also easier to read and maintain. It should also avoid unwraps and index access in general.

I also replaced EkoInfo with OperatorInfo (the name is up for debate), in which I removed the use of GridAxes and mainly changed the naming of the variables. Instead of naming objects with 'target' and 'source', terminology that's used differently in EKO and in Grid::convolute_eko, I use 0 for objects defined in the FkTable/at the fitting scale and 1 for objects defined in the Grid/at the process scale(s).
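
To make the 0/1 convention concrete, here is a rough sketch of what such a struct could look like; the field names below are illustrative guesses rather than the actual OperatorInfo definition in this pull request:

```rust
// Illustrative sketch only; consult the real `OperatorInfo` in this PR
// for the actual field names and types.
pub struct OperatorInfo {
    /// squared factorization scale of the FkTable (fitting scale)
    pub fac0: f64,
    /// particle IDs and x-grid values of the FkTable
    pub pids0: Vec<i32>,
    pub x0: Vec<f64>,
    /// squared factorization scales of the Grid (process scales)
    pub fac1: Vec<f64>,
    /// particle IDs and x-grid values of the Grid
    pub pids1: Vec<i32>,
    pub x1: Vec<f64>,
}
```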

There's also a test added, which will not work as-is since I didn't want to upload the EKO and the grid. If you'd like to test, simply place the grid LHCB_WP_8TEV.pineappl.lz4 and the EKO in PineAPPL's main directory. You must untar the operator and also uncompress alphas.npy and operators.npy.

Right now there are a few limitations which are documented in TODOs. I will remove them soon, of course:

  • only double-hadronic grids can be converted;
  • only grids that share a single x grid can be converted (those generated from pinefarm, but not the converted grids);
  • there's a bug that causes x-grid value ordering in the operator and grid to be ignored (you might remember that PineAPPL sorts the x-grid values in descending order, EKO in ascending order); that's what I'm going to fix next.

If you have comments I'm happy to hear them!

@codecov-commenter commented Oct 6, 2022

Codecov Report

Base: 91.22% // Head: 91.38% // Increases project coverage by +0.15% 🎉

Coverage data is based on head (f841dff) compared to base (5b1a3c0).
Patch coverage: 92.30% of modified lines in pull request are covered.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #184      +/-   ##
==========================================
+ Coverage   91.22%   91.38%   +0.15%     
==========================================
  Files          47       49       +2     
  Lines        6920     7369     +449     
==========================================
+ Hits         6313     6734     +421     
- Misses        607      635      +28
```
| Flag | Coverage Δ |
|------|------------|
| python | 78.35% <25.00%> (-2.30%) ⬇️ |
| rust | 91.55% <92.87%> (+0.18%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| pineappl/src/bin.rs | 93.11% <ø> (+1.22%) ⬆️ |
| pineappl/src/lib.rs | 100.00% <ø> (ø) |
| pineappl_py/pineappl/grid.py | 69.38% <25.00%> (-3.95%) ⬇️ |
| pineappl/src/lumi.rs | 85.23% <60.00%> (-1.27%) ⬇️ |
| pineappl_cli/src/helpers.rs | 89.94% <87.50%> (+0.22%) ⬆️ |
| pineappl/src/grid.rs | 89.64% <90.74%> (-0.17%) ⬇️ |
| pineappl/src/evolution.rs | 93.50% <93.50%> (ø) |
| pineappl_cli/src/evolve.rs | 95.83% <95.83%> (ø) |
| pineappl_cli/src/main.rs | 94.59% <100.00%> (+0.15%) ⬆️ |
| ... and 1 more | |

☔ View full report at Codecov.

@cschwan (Contributor, Author) commented Oct 6, 2022

To give you a starting point, pineappl/src/grid.rs, lines 2066 to 2076 at commit 58e1132:

```rust
let mut result = Array2::zeros((array.dim().1, array.dim().2));
for imu2 in 0..array.dim().0 {
    let opa = opa.index_axis(Axis(0), imu2);
    let opb = opb.index_axis(Axis(0), imu2);
    let arr = array.index_axis(Axis(0), imu2);
    result += &opa.dot(&arr.dot(&opb.t()));
}
fk_table.scaled_add(factor, &result);
```

This is where the actual convolution happens; everything else is just bookkeeping. In line 2073, the values opa and opb are two-dimensional EKOs and arr is the two-dimensional grid. The operations dot and t perform matrix-matrix multiplication and matrix transposition, respectively.
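
For illustration, here is a minimal self-contained version of the same contraction with toy dimensions and dummy data; the sizes and the ones/zeros are made up, and only the ndarray calls mirror the snippet above:

```rust
use ndarray::{Array2, Array3, Axis};

fn main() {
    // toy sizes: number of scales, FkTable x points, Grid x points
    let (nmu2, nx0, nx1) = (2, 3, 4);
    let opa = Array3::<f64>::ones((nmu2, nx0, nx1)); // EKO for hadron A
    let opb = Array3::<f64>::ones((nmu2, nx0, nx1)); // EKO for hadron B
    let array = Array3::<f64>::ones((nmu2, nx1, nx1)); // grid values

    let mut result = Array2::<f64>::zeros((nx0, nx0));
    for imu2 in 0..nmu2 {
        let opa = opa.index_axis(Axis(0), imu2); // shape (nx0, nx1)
        let opb = opb.index_axis(Axis(0), imu2); // shape (nx0, nx1)
        let arr = array.index_axis(Axis(0), imu2); // shape (nx1, nx1)

        // two matrix products replace four nested index loops
        result += &opa.dot(&arr.dot(&opb.t()));
    }
    println!("{:?}", result);
}
```

With ndarray's optional blas feature, these dot calls dispatch to a BLAS backend without any code change.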

@felixhekhorn (Contributor) commented:

The amount of code is still massive (~300 lines), so can you try to split it a bit? E.g.

  • an initialization step
  • the big loop and each of its nested sub-loops (I count 7 nested loops)
  • a post-processing step

@cschwan (Contributor, Author) commented Oct 6, 2022

@felixhekhorn yes, that'll definitely happen. I will probably write a different method for DIS, because the core operation can be optimized differently as well.

@alecandido (Member) commented:

> @felixhekhorn yes, that'll definitely happen. I will probably write a different method for DIS, because the core operation can be optimized differently as well.

Maybe this is also the best option, but then I'd definitely complete double-hadronic first.

The moment we have it, we can check how many "units" there are and try to split them into functions. Then we may reuse those units to build a completely separate function for DIS (and maybe this will be an improvement, since we don't have to bloat one function to accommodate both).

For splitting, I'd try to look for more sensible units rather than just cutting pieces into "pre", "during" and "post", also because you might end up passing everything you create in one function to another function. If possible, let's try to have sensible reusable blocks.

I would also consider splitting the whole business into a different file (evolve.rs, or something like that), since this is not related to the core lifecycle of a grid but is a further, well-separated task.

@cschwan (Contributor, Author) commented Oct 6, 2022

>> @felixhekhorn yes, that'll definitely happen. I will probably write a different method for DIS, because the core operation can be optimized differently as well.
>
> Maybe this is also the best option, but then I'd definitely complete double-hadronic first.

Agreed.

> The moment we have it, we can check how many "units" there are and try to split them into functions. Then we may reuse those units to build a completely separate function for DIS (and maybe this will be an improvement, since we don't have to bloat one function to accommodate both).
>
> For splitting, I'd try to look for more sensible units rather than just cutting pieces into "pre", "during" and "post", also because you might end up passing everything you create in one function to another function. If possible, let's try to have sensible reusable blocks.
>
> I would also consider splitting the whole business into a different file (evolve.rs, or something like that), since this is not related to the core lifecycle of a grid but is a further, well-separated task.

Yes, that sounds like a good idea; grid.rs is already very big.

@cschwan (Contributor, Author) commented Oct 6, 2022

Commit 58e8321 fixes the bug I mentioned above, and now I get the same results as produced by Grid::convolute_eko. It's strange that the results do not agree with the ones produced before evolution, but this must be a mistake in the choice of parameters or even in the EKO itself (maybe I copied the wrong one)? Could one of you check?

In any case the numbers from Grid::convolute_eko and Grid::evolve are numerically the same, and the speedup is promising: Grid::convolute_eko takes 80 seconds, while Grid::evolve takes less than 0.5 seconds!

@felixhekhorn (Contributor) commented:

Wow! That sounds almost too good to be true.

@cschwan (Contributor, Author) commented Oct 6, 2022

This process might be a bit of a special case, because it has very few initial states; but on the other hand, here are some good reasons that I've long suspected:

  1. I implemented a zero-detection algorithm (pineappl/src/grid.rs, lines 1911 to 1921 at commit bc48d90):

     ```rust
     // 1) at least one element of the operator must be non-zero, and 2) the pid must be
     // contained in the lumi somewhere
     operator
         .slice(s![.., pid1_idx, .., pid0_idx, ..])
         .iter()
         .any(|&value| value != 0.0)
         && self
             .lumi
             .iter()
             .flat_map(|lumi| lumi.entry())
             .any(|&(a, b, _)| a == info.pids1[pid1_idx] || b == info.pids1[pid1_idx])
     ```

     which avoids doing any work on flavour combinations that are zero. The second part wasn't implemented in Grid::convolute_eko.
  2. The heart of the method is the piece of code shown in #184 (comment), which I think is close to ideal from an algorithmic point of view. Instead of multiplying single components of the operators and the grid, I request actual matrix multiplications, which ndarray performs for us; this avoids any kind of index checking, which is probably expensive in deeply nested loops. We could easily switch on BLAS support in ndarray, which could make it even faster without changing the code!
  3. This is probably minor, but I'm really trying to avoid any kind of index access anywhere in Grid::evolve, which does require some not-so-easy chains of iterators; see the sketch after this list.
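
As an illustration of point 3, here is a small self-contained sketch of such an iterator chain; Entry and its field are hypothetical stand-ins for the actual pineappl luminosity types:

```rust
// Hypothetical stand-in for a luminosity entry; not the real pineappl type.
struct Entry {
    /// (pid of parton a, pid of parton b, numerical factor)
    pids: Vec<(i32, i32, f64)>,
}

/// Returns true if any luminosity entry references `pid` in either slot,
/// walking all entries without any explicit index access.
fn lumi_contains(lumi: &[Entry], pid: i32) -> bool {
    lumi.iter()
        .flat_map(|entry| entry.pids.iter())
        .any(|&(a, b, _)| a == pid || b == pid)
}

fn main() {
    let lumi = vec![Entry { pids: vec![(2, -2, 1.0), (21, 21, 0.5)] }];
    assert!(lumi_contains(&lumi, 21)); // gluon appears in the second entry
    assert!(!lumi_contains(&lumi, 4)); // charm appears nowhere
    println!("checks passed");
}
```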

@cschwan (Contributor, Author) commented Oct 7, 2022

@alecandido @felixhekhorn There's another issue with moving forward with this implementation: in order to keep the size of the operators small, EKO produces a muf2-dependent x grid (the x in the Grid, not the one in the FkTable). I don't remember all the details, but I think a potential problem is that the EKO is sorted according to the ordering of Grid::axes, and that this is possibly not reconstructible from metadata.yaml alone (without the grid). It's difficult to describe this here; would you mind discussing it live?

@cschwan self-assigned this on Oct 7, 2022
@cschwan added the enhancement (New feature or request) label on Oct 7, 2022
@alecandido (Member) commented Oct 17, 2022

@andreab1997 you can have a look at the workflow at .github/workflows/rust.yaml.

What @cschwan has done from f91e7ca to 8d58423 is more or less what I was proposing to do in pineko:

  • store the data on the NNPDF server (they don't need to be versioned and would blow up the .git repo)
  • download them in the workflow
  • cache for better performance

I believe we can mirror this on that side :)

The only improvement I would make is to add a shell script for the download to the repository (containing what is now in the workflow's download step), so that it is simple to reproduce the workflow environment locally (without needing to copy-paste from the workflow).

@felixhekhorn (Contributor) commented:

> The following are wrong because of the gluon exception (#184 (comment)):

I guess that would need to be fixed before merging ...

As for the DIS grids listed above: it is not sufficient to just look at the difference between grid and FK table, since the observables also span a non-physical range. More precisely, they include points below $Q_0^2$, where we expect differences: first LHAPDF breaks down (backward evolution) and then pQCD. So we need the bin information next to the numbers. @cschwan, can you produce such a list? (Both pineappl diff and pineko compare should be able to do so, or else let me know where your grids/FK tables are ...)

That being said, I can do the JOIN also by hand, and most of them are indeed alright (matching where they should and differing where we expect them). However, looking at NUTEV_CC_NB_FE_SIGMARED I can also see something strange: only the first few bins are off, but they should correspond to an increasing series of $Q^2$ points, so I wonder whether the old bug about not preserving the observable order has become visible again? So the question is: which versions did you use, @cschwan?

Output from @cschwan:

```
>>> NUTEV_CC_NB_FE_SIGMARED.tar
b    FkTable      Grid       rel. diff
--+-----------+-----------+-------------
0  3.6074998e0 5.8221893e0  6.1391255e-1
1  7.7797554e0 9.9638287e0  2.8073804e-1
2  1.1450347e1 1.3461273e1  1.7562146e-1
3  9.7389131e0 1.0754171e1  1.0424760e-1
4  8.8859954e0 8.2236821e0 -7.4534507e-2
5  2.0471413e1 1.9412808e1 -5.1711398e-2
6  1.7321241e1 1.6906333e1 -2.3953712e-2
7  1.7655144e1 1.7655176e1  1.8078417e-6
8  1.0827803e1 1.0827640e1 -1.5123138e-5
9  7.1845983e0 7.1846006e0  3.2141418e-7
10 2.3103035e1 2.3102408e1 -2.7160240e-5
11 1.3527970e1 1.3527618e1 -2.6071438e-5
```
https://github.com/NNPDF/runcards/blob/7f11afce4242791acad47d4c7be393e629b5121d/pinecards/NUTEV_CC_NB_FE_SIGMARED/observable.yaml#L61
```yaml
observables:
  XSNUTEVCC_charm:
  - Q2: 0.7648485767999998
    x: 0.015
    y: 0.349
  - Q2: 2.1415760150399996
    x: 0.042
    y: 0.349
  - Q2: 3.671273168639999
    x: 0.072
    y: 0.349
  - Q2: 5.812849183679999
    x: 0.114
    y: 0.349
  - Q2: 10.554910359839997
    x: 0.207
    y: 0.349
  - Q2: 1.2689035127999997
    x: 0.015
    y: 0.579
  - Q2: 3.5529298358399997
    x: 0.042
    y: 0.579
  - Q2: 6.090736861439998
    x: 0.072
    y: 0.579
  - Q2: 9.643666697279999
    x: 0.114
    y: 0.579
  - Q2: 17.510868476639995
    x: 0.207
    y: 0.579
  - Q2: 1.7006375232
    x: 0.015
    y: 0.776
  - Q2: 4.76178506496
    x: 0.042
    y: 0.776
  - Q2: 8.163060111359998
    x: 0.072
    y: 0.776
  - Q2: 12.92484517632
    x: 0.114
    y: 0.776
  - Q2: 23.468797820159995
    x: 0.207
    y: 0.776
  - Q2: 1.4116504163999999
    x: 0.015
    y: 0.349
  - Q2: 3.95262116592
    x: 0.042
    y: 0.349
```

@cschwan (Contributor, Author) commented Oct 18, 2022

> I guess that would need to be fixed before merging ...

This is fixed in commits 7b6fdb2 and 8c4274e.

@cschwan (Contributor, Author) commented Oct 18, 2022

> As for the DIS grids listed above: it is not sufficient to just look at the difference between grid and FK table, since the observables also span a non-physical range. More precisely, they include points below $Q_0^2$, where we expect differences: first LHAPDF breaks down (backward evolution) and then pQCD. So we need the bin information next to the numbers. @cschwan, can you produce such a list? (Both pineappl diff and pineko compare should be able to do so, or else let me know where your grids/FK tables are ...)

That makes a lot of sense. For the bin limits see test-output.txt. It seems that every bin with $Q^2 < 2.62892$ is bad, which is exactly what you described (backward evolution). So it seems everything's OK!

> That being said, I can do the JOIN also by hand, and most of them are indeed alright (matching where they should and differing where we expect them). However, looking at NUTEV_CC_NB_FE_SIGMARED I can also see something strange: only the first few bins are off, but they should correspond to an increasing series of $Q^2$ points, so I wonder whether the old bug about not preserving the observable order has become visible again? So the question is: which versions did you use, @cschwan?

I'm not sure which version you'd like to know; I simply copied the files from theory 208.

@cschwan (Contributor, Author) commented Oct 18, 2022

@andreab1997 @alecandido @felixhekhorn I consider this done and ready for merging into master. After merging I'll release a new version of PineAPPL so you can use Grid::evolve in pineko.

We've got integration tests and I tested theory 208 completely. Do you think there's anything missing?

@cschwan (Contributor, Author) commented Oct 18, 2022

What's clearly missing are tests for evolved and scale-varied FkTables. In commit 1d45b83 I added two parameters that allow for scale variation, and this shows that the code is very likely wrong.

@cschwan merged commit d3d020d into master on Oct 21, 2022
@cschwan deleted the grid_evolve branch on Oct 21, 2022
@cschwan (Contributor, Author) commented Oct 21, 2022

I merged this branch into master. We'll need further tests, but for those I'm going to create a new issue.

@alecandido (Member) commented:

We will need a release, and maybe also a deprecation plan for Grid::convolute_eko.

@cschwan (Contributor, Author) commented Oct 21, 2022

I'll remove Grid::convolute_eko in v0.6.0. For a release I'd like to test scale-varied FkTables first, since this can still be wrong.

@alecandido (Member) commented:

OK, fine. But I'd like to have another intermediate release before v0.6.0, because we need to upgrade in pineko, and it would be handy to be able to test there with both methods available.

@cschwan (Contributor, Author) commented Oct 21, 2022

I agree, I'm working on it right now!

Labels
enhancement (New feature or request)

Development
Successfully merging this pull request may close these issues:

  • Use linear algebra routine for Grid::convolute_eko
  • Add realistic test case for convolute_eko