vegasflow+pineappl example #56

Merged: 18 commits into master from pineappl, Sep 24, 2020
Conversation

@scarrazza (Contributor) commented Sep 11, 2020

Here is a first prototype. However, there are 3 points which require investigation:

  • apply multiple custom cuts
  • make the fill function async (i.e. fight with pickle)
  • replace lhapdf with pdfflow (i.e. make sure pdfflow works in eager mode)

@scarlehoff (Member)

Pdfflow should work in eager mode, 100%.

What problem do you have with pickle? In #55 I found out that tensorflow 2.2 has some unpicklable internal state that tf 2.3 doesn't have.

@scarrazza (Contributor, Author)

Concerning pdfflow (master, tf2.3), here is a short example which fails in eager mode:

import tensorflow as tf
tf.config.run_functions_eagerly(True)  # force eager execution
from pdfflow.pflow import mkPDF

pdf = mkPDF('NNPDF31_nlo_as_0118_luxqed/0')
pdf.alphasQ2([10.5])  # called with a plain python list

this fails with:

InvalidArgumentError: cannot compute GreaterEqual as input #1 (zero-based) was expected to be a float tensor but is a double tensor [Op:GreaterEqual]

Concerning pickle, I have tried the multiprocessing apply_async function:

pool.apply_async(fill, [x1, x2, q2, yll, weight])

however this fails with:

AttributeError: Can't get attribute 'fill' on <module '__main__' from 'example_pineappl.py'>

@scarlehoff (Member)

> Concerning pdfflow (master, tf2.3), here is a short example which fails in eager mode:
>
> import tensorflow as tf
> tf.config.run_functions_eagerly(True)
> from pdfflow.pflow import mkPDF
> pdf = mkPDF('NNPDF31_nlo_as_0118_luxqed/0')
> pdf.alphasQ2([10.5])
>
> this fails with:
>
> InvalidArgumentError: cannot compute GreaterEqual as input #1 (zero-based) was expected to be a float tensor but is a double tensor [Op:GreaterEqual]

You are calling the tf interface with a python object. You have to either use py_alpha or use float_me.

This is a very tricky point; in one of the pdfflow PRs the error is caught and you get told (at least for xfq... maybe I forgot alphas!)
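For illustration, a minimal sketch of the fixed call (the import path of float_me from pdfflow.configflow is an assumption here):

import tensorflow as tf
tf.config.run_functions_eagerly(True)

from pdfflow.configflow import float_me  # assumed location of float_me
from pdfflow.pflow import mkPDF

pdf = mkPDF('NNPDF31_nlo_as_0118_luxqed/0')
# cast the python list to a tensor with the right float dtype
# before calling the tf interface
pdf.alphasQ2(float_me([10.5]))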

> Concerning pickle, I have tried the multiprocessing apply_async function:
>
> pool.apply_async(fill, [x1, x2, q2, yll, weight])
>
> however this fails with:
>
> AttributeError: Can't get attribute 'fill' on <module '__main__' from 'example_pineappl.py'>

But how do you know this is a pickle error?

@scarrazza (Contributor, Author)

Great, thanks for the pdfflow hint, now it works.

@scarlehoff (Member)

Many thanks for this, this is a fairly complicated and real-world use case.

So, seeing the problem you were having, I should have understood it immediately, but I had completely overlooked the imports. Sorry. For many of these things you cannot pickle stuff to send to different processes; you are allowed to open threads, but they must all be children of the same process (see the sketch below).
(btw, dask would probably also fail for pineappl, pickles are bad)
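As a sketch, the thread-based alternative looks like this (threads share the parent process, so fill never has to be pickled; fill and its arguments are the names from the example above):

from multiprocessing.pool import ThreadPool

pool = ThreadPool(processes=1)
# threads live in the same process, so `fill` needs no pickling
result = pool.apply_async(fill, (x1, x2, q2, yll, weight))
result.wait()  # make sure the fill finished before reusing the buffers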

Btw, this example highlights the need for a feature to pass arguments around. Neither the usage of partial nor the importance of avoiding tf.function before the partial is obvious. So compile needs to check whether the input function is a tensorflow function and, if it is, it should fail, telling the user "look, you can't do this in this order".
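A sketch of such a guard (compile_integrand is a hypothetical name; checking for the python_function attribute, which tf.function-wrapped callables expose, is one heuristic way to detect them):

from functools import partial
import tensorflow as tf

def compile_integrand(integrand, **bound_args):
    # tf.function objects expose the wrapped callable as .python_function;
    # if we see one here, the user applied tf.function before partial
    if hasattr(integrand, "python_function"):
        raise TypeError("look, you can't do this in this order: "
                        "bind arguments with partial first, then tf.function")
    return tf.function(partial(integrand, **bound_args))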

@scarrazza (Contributor, Author)

Right, thanks for spotting this. The current async implementation is 10x faster (CPU).
At this point the only remaining minor point is how to apply cuts in an "elegant" way, in particular the conditions in lines 78-80.
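One possible pattern for stacking cuts (a sketch with hypothetical cut values and variable names, not the actual conditions in lines 78-80): build one boolean mask per condition, combine them, and select the passing events.

import tensorflow as tf

def apply_cuts(yll, mll, weight):
    # combine each per-event condition into a single boolean mask
    mask = tf.math.logical_and(yll > -2.4, yll < 2.4)
    mask = tf.math.logical_and(mask, mll > 60.0)
    # keep only the events that pass every cut
    return tf.boolean_mask(weight, mask)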

@scarlehoff scarlehoff added this to the v1.2 milestone Sep 14, 2020
@scarlehoff scarlehoff mentioned this pull request Sep 14, 2020
@scarrazza scarrazza marked this pull request as ready for review September 17, 2020 12:09
@scarrazza (Contributor, Author)

@cschwan, in principle this PR is ready. After applying cuts, the numbers I get are similar to the python example in pineappl.
Could you please have a look and cross-check?

@scarrazza (Contributor, Author)

This default configuration takes ~2s on CPU and ~1s on GPU, so close to a 50% improvement.
If I remove pineappl the GPU time drops to 0.5s, so overall the performance is good.

@cschwan (Member) commented Sep 18, 2020

I need the patch NNPDF/pineappl@74ae0d0 to make the PineAPPL Python API work. The bad news is that the numbers from the grid seem random: when I increase the statistics successively by factors of ten, the numbers change wildly. Since the integrated results from vegasflow are stable, something must be wrong with filling the grid. The integrated result seems to be missing the multiplication with the number of calls.

@cschwan (Member) commented Sep 18, 2020

Are you filling all iterations into a single grid? If so, that will surely give wrong numbers. In that case let's try to use only the last iteration.
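A sketch of how that could look in the driver loop (run_integration taking an iteration count is assumed to be vegasflow's API; the fill_pineappl flag is hypothetical, read by the integrand before calling fill):

# warm up: let vegas adapt its grid without touching the PineAPPL grid
vegas_instance.run_integration(n_iter - 1)
# hypothetical global flag checked inside the integrand
fill_pineappl = True
# only this last iteration fills the PineAPPL grid
vegas_instance.run_integration(1)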

@cschwan (Member) commented Sep 18, 2020

Another concern is that the example doesn't convolve the matrix elements with PDFs. With a PLAIN MC that's fine, but I reckon that the adaptation can go wrong if you use VEGAS without PDFs, because the PDFs change the importance of the integrand.

@cschwan (Member) commented Sep 18, 2020

The last commit breaks the correctness. You can't fill the grids asynchronously, at least not that easily.

@scarlehoff (Member)

Ah, I was indeed wondering whether the result seemed correct by chance.
In any case, passing the .numpy() seems to make it go much faster (I think due to the loop). This should make it work.

@cschwan (Member) commented Sep 18, 2020

Strange, the results are still wrong and I can't tell you why. The fact that it's faster suggests it does less work, but that's only a guess, of course. I didn't pull the latest commit...

@scarlehoff (Member) commented Sep 18, 2020

No, I just checked, and it is slower because iterating over a tensorflow tensor is slower than over a numpy array.
I compared the values in the .numpy() and they are numerically the same as in the tensorflow tensor.

What do you mean by wrong results? (what are the right ones)

I see :P so they are right, and the problem is simply that the grid needs to be filled synchronously, which is fair I'd say.

@cschwan (Member) commented Sep 18, 2020

The 'correct' results are the results that are calculated by the PineAPPL example program, which I check against mg5_aMC@NLO.

@scarlehoff (Member)

Oh, I know now why my results were correct in the async case. Of course they were: I was running the whole batch in one go.

@cschwan (Member) commented Sep 18, 2020

When I compare the numbers from commits 1dcf172 and a9f626a I don't get the exact same numbers, but they seem to differ only within the MC uncertainty.

@scarrazza (Contributor, Author)

The async issue should be fixed in my last commit. We forgot to explicitly ask the pool to "wait" for the pineappl fill. @cschwan could you please double-check?

The apply_async seems to queue the fill evaluations. The .numpy() call is pretty useful; it accelerates the integration procedure a lot, since otherwise eager mode performs the tensor conversion element by element, adding a huge overhead.
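For illustration, the pattern looks roughly like this (the per-event fill signature is an assumption, not PineAPPL's actual API):

# convert each tensor to numpy once per batch, outside the loop
x1_np, x2_np, q2_np = x1.numpy(), x2.numpy(), q2.numpy()
yll_np, weight_np = yll.numpy(), weight.numpy()
for i in range(len(weight_np)):
    # hypothetical per-event fill signature
    grid.fill(x1_np[i], x2_np[i], q2_np[i], yll_np[i], weight_np[i])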

Anyway, the bottleneck of the implementation seems to be the pineappl fill: for 10M events and events_limit=1e6 it takes 1 minute on my laptop to complete, whether we run sequentially or asynchronously. Asynchronously the vegasflow time drops from 50s to 23s, but the pool dominates the calculation.

So, following our discussion, I believe we should try the approach suggested by Christopher: create n_events/events_limit pineappl grids and fill them in separate processes. @scarlehoff, is there some way to extract the thread id (batch id) inside the integrand function? If we have that, we can pass an array of pineappl grids and try to fill with apply_async using more than 1 process.

@cschwan (Member) commented Sep 21, 2020

The numbers from the last commit look fine, but they seem to have changed again. Is there a way to ensure that, given N evaluations, the random numbers are always the same? In that case you could unit-test it.

@scarlehoff (Member)

Why would you need the batch id? Every new call should be a different process regardless of the id, right?

Wrt reproducibility, this has been an outstanding issue with tf. One would need to seed numpy and tensorflow, and then hope that the multithreading works similarly between the two runs. Let me have a go at it; maybe it works better with the GPU (for n3fit I can get reproducible results only when running on 1 thread).

@scarrazza (Contributor, Author)

> Why would you need the batch id? Every new call should be a different process regardless of the id, right?

In principle we need to know the thread number only for the last iteration (the one where fill is called). I just tried with a global variable and it works; however, the performance is pretty bad because we are forced to allocate tons of threads, withdrawing computing power from vegasflow. So I think we should keep the single-thread fill implementation and try to get rid of the python for loop.

@scarlehoff (Member)

But all batches must fill the grid.

The python loop is surely hurting; can't pineappl take arrays? Numpy arrays can be passed to C.

@cschwan (Member) commented Sep 21, 2020

> Anyway, the bottleneck of the implementation seems to be the pineappl fill: for 10M events and events_limit=1e6 it takes 1 minute on my laptop to complete, whether we run sequentially or asynchronously. Asynchronously the vegasflow time drops from 50s to 23s, but the pool dominates the calculation.

This is expected, since the matrix element and the phase space generation in this example are extremely cheap. For more realistic scenarios the timings will be more favourable.

@scarlehoff (Member)

This seems to work. I'm not sure it would work as expected if not running eagerly, though... I need to look into this, and I'll add a set_seed method; it will be useful.
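A minimal sketch of what such a set_seed could do (seeding both RNGs; as noted above, this alone does not guarantee reproducibility across threads):

import numpy as np
import tensorflow as tf

def set_seed(seed):
    # seed both generators; thread scheduling can still break reproducibility
    np.random.seed(seed)
    tf.random.set_seed(seed)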

@cschwan (Member) commented Sep 21, 2020

> The python loop is surely hurting; can't pineappl take arrays? Numpy arrays can be passed to C.

I've implemented a function in NNPDF/pineappl@bfb0d40. Is this what you need? If yes, we also need the corresponding function in the Python interface.
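Once exposed in the Python interface, the call could look like this (a sketch; the argument list of fill_array is assumed, not the actual signature):

# one vectorized call per batch instead of a python-level loop
grid.fill_array(x1.numpy(), x2.numpy(), q2.numpy(),
                yll.numpy(), weight.numpy())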

@scarrazza (Contributor, Author)

Yes, that's good, thanks!

@scarrazza (Contributor, Author)

Could you please port this function to the C API?

@cschwan (Member) commented Sep 21, 2020

@scarrazza That is the C API.

@scarrazza (Contributor, Author)

Strange, I have recompiled and the header does not contain this function, and I get an undefined symbol: pineappl_grid_fill_array.

@scarrazza (Contributor, Author)

Ok, here are some results (CPU and GPU) for 10M events.

  • i9-9980xe:
    • vegas time without pineappl: 7.6s
    • vegas time with pineappl: 7.8s
    • vegas+pool time: 9.4s
  • rtx2080ti:
    • vegas time without pineappl: 2.8s
    • vegas time with pineappl: 2.8s
    • vegas+pool time: 5.1s

So, pineappl is really well integrated and adds a minimal overhead.

@scarlehoff (Member)

I guess point two shows one interesting advantage: if your integrator is running on the GPU, the CPU is free to do its thing without having an effect on the integrator. Although that 0.2s difference might be a fluctuation :P

@scarlehoff scarlehoff merged commit 1bf91f2 into master Sep 24, 2020
@scarlehoff scarlehoff deleted the pineappl branch September 24, 2020 18:20