vegasflow+pineappl example #56
Conversation
Pdfflow should work 100% in eager mode. What problem do you have with pickle? In #55 I found out that tensorflow 2.2 has some unpicklable internal state that tf 2.3 doesn't have.
Concerning pdfflow (master, tf 2.3), here is a short example which fails in eager mode:

```python
import tensorflow as tf
tf.config.run_functions_eagerly(True)

from pdfflow.pflow import mkPDF

pdf = mkPDF('NNPDF31_nlo_as_0118_luxqed/0')
pdf.alphasQ2([10.5])
```

This fails with:
Concerning pickle, I have tried the multiprocessing `apply_async` function: `pool.apply_async(fill, [x1, x2, q2, yll, weight])`, however this fails with:
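A minimal stdlib-only illustration of the pickling constraint behind the failure above (the `fill` here is a toy stand-in, not the real integrand): `apply_async` must pickle the callable and its arguments to ship them to a worker process, so module-level functions and plain data work, while lambdas (and, per the comment above, some tf-internal state) do not.

```python
import pickle

def fill(x1, x2, weight):
    # Module-level functions are pickled by reference,
    # so they can be shipped to worker processes.
    return x1 * x2 * weight

# Plain Python data and module-level callables round-trip fine.
payload = pickle.dumps((fill, [0.1, 0.2, 1.5]))
func, args = pickle.loads(payload)
result = func(*args)

# Objects pickle cannot serialize (a lambda here, tf internal state
# in the issue above) raise an error instead of round-tripping.
try:
    pickle.dumps(lambda x: x)
    picklable = True
except Exception:
    picklable = False

print("lambda picklable:", picklable)
```

This is why passing tf objects (or closures over them) through `apply_async` fails, while converting to plain numpy/Python data first does not.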
You are calling the tf interface with a python object. You have to either use ... This is a very tricky point: in one of the PRs of pdfflow the error is caught and you get told (at least for xfq... maybe I forgot alphas!)
But how do you know this is a pickle error?
Great, thanks; pdfflow now works.
Many thanks for this, this is a fairly complicated and real-world usage case. So, seeing the problem you were having, I should have understood it immediately, but I had completely overlooked the imports. Sorry. For many of these things you cannot pickle stuff to send to different processes; you are allowed to open threads, but they must all be children of the same process. Btw, this example highlights the need for a feature to pass arguments around. Neither the usage of partial nor the importance of avoiding ...
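The threads-vs-processes point above can be shown with a short stdlib sketch: a closure over local state is not picklable, so it could not be sent to a separate process, but a thread pool works because threads share the parent process's memory and nothing needs to be pickled.

```python
from multiprocessing.pool import ThreadPool

weight = 2.0
# A closure capturing `weight`: not picklable, so it could not be
# shipped to a worker *process* -- but threads need no pickling.
task = lambda x: x * weight

with ThreadPool(4) as pool:
    # map() preserves input order, so the result is deterministic
    results = pool.map(task, [1, 2, 3])

print(results)  # [2.0, 4.0, 6.0]
```

`multiprocessing.pool.ThreadPool` has the same API as `Pool`, which makes it an easy drop-in when the workload tolerates the GIL (e.g. when the heavy lifting happens inside C/tf calls).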
Right, thanks for spotting this. The current implementation with async is 10x faster (CPU).
Add a conditions generator
@cschwan, in principle this PR is ready. After applying cuts, the numbers I get are similar to the python example in pineappl.
This default configuration takes ~2s on CPU and ~1s on GPU, so close to a 50% improvement.
I need the patch NNPDF/pineappl@74ae0d0 to make the PineAPPL Python API work. The bad news is that the numbers from the grid seem random. When I increase the statistics successively by factors of ten, the numbers change wildly. |
Are you filling all iterations into a single grid? If so, that will surely give wrong numbers. In that case let's try to use only the last iteration. |
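The "fill only the last iteration" suggestion can be sketched as follows; `Grid` and `run_iteration` are toy stand-ins (not the PineAPPL or vegasflow APIs). The point is that warm-up iterations, whose samples carry different adaptive weights, never touch the grid:

```python
class Grid:
    """Toy stand-in for a PineAPPL grid."""
    def __init__(self):
        self.entries = []

    def fill(self, x, weight):
        self.entries.append((x, weight))

def run_iteration(it):
    # Stand-in for one VEGAS iteration: returns (samples, weights).
    return [0.1 * it, 0.2 * it], [1.0, 1.0]

grid = Grid()
n_iter = 5
for it in range(n_iter):
    samples, weights = run_iteration(it)
    # Warm-up iterations adapt the VEGAS grid but are discarded;
    # only the final iteration fills the histogram grid.
    if it == n_iter - 1:
        for x, w in zip(samples, weights):
            grid.fill(x, w)

print(len(grid.entries))  # 2
```

Mixing iterations in one grid would combine samples drawn from differently adapted importance distributions, which is why the numbers came out wrong.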
Another concern is that the example doesn't convolve the matrix elements with PDFs. With a PLAIN MC that's fine, but I reckon that the adaptation can go wrong if you use VEGAS without PDFs, because the PDFs change the importance of the integrand.
The last commit breaks the correctness. You can't fill the grids asynchronously, at least not that easily. |
Ah, I was indeed wondering whether the result seemed correct by chance |
I see :P so they are right, and the problem is simply that the grid needs to be filled synchronously, which is fair I'd say.
The 'correct' results are the results that are calculated by the PineAPPL example program, which I check against mg5_aMC@NLO. |
Oh, I know now why my results were correct in the ...
The async issue should be fixed in my last commit. We forgot to explicitly ask the pool to "wait" for the pineappl fill. @cschwan could you please double check? The apply_async seems to queue the fill evaluations. Anyway, the bottleneck of the implementation seems to be the pineappl fill: for 10M events and events_limit=1e6 it takes 1 minute on my laptop to complete whether we run sequentially or asynchronously. Obviously the vegasflow time drops from 50s to 23s, but the pool dominates the calculation. So, following our discussion, I believe we should try the approach suggested by Christopher: create n_events/events_limit pineappl grids and fill them in separate processes. @scarlehoff is there some way to extract the thread id (batch id) inside the integrand function? If we have that, we can pass an array of pineappl grids and try to fill with apply_async using more than 1 process.
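The "forgot to wait" bug above comes down to `apply_async` returning an `AsyncResult` handle: unless something blocks on it (`.get()`) or the pool is joined, the program can proceed before the queued fills have run. A minimal sketch (the `fill` here is a toy stand-in):

```python
from multiprocessing.pool import ThreadPool

filled = []

def fill(x, weight):
    # Toy stand-in for the pineappl fill call.
    filled.append((x, weight))

pool = ThreadPool(1)
handles = [pool.apply_async(fill, (i, 1.0)) for i in range(4)]

# Without this loop the main program could read the grid (or exit)
# before the queued fill calls have actually executed.
for h in handles:
    h.get()  # blocks until that fill has completed

pool.close()
pool.join()
print(len(filled))  # 4
```

Calling `.get()` also re-raises any exception from the worker, which would otherwise be swallowed silently -- another reason the async version can look "fine" while doing nothing.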
The numbers from the last commit look fine, but they seem to have changed again. Is there a way to ensure that, given N evaluations, the random numbers are always the same? In that case you could unit-test it.
Why would you need the batch id? Every new call should be a different process regardless of the id, right? Wrt reproducibility, this has been an outstanding issue with tf. One would need to seed numpy, tensorflow and then hope that the multithreading works similarly between the two runs. Let me have a go at it, maybe it works better with the GPU (for n3fit I can get reproducible results only running on 1 thread).
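The seeding idea can be sketched with the stdlib `random` module as a stand-in; in the actual code one would additionally call `np.random.seed(...)` and `tf.random.set_seed(...)`, and, as noted above, multithreading can still break bit-level determinism even then.

```python
import random

def draw(seed, n=5):
    # An explicitly seeded, independent generator instance: the same
    # seed always reproduces the same sequence.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

run_a = draw(42)
run_b = draw(42)
print(run_a == run_b)  # True -> results can be unit-tested
```

Using a dedicated `random.Random(seed)` instance (rather than the module-level global state) keeps the stream isolated from any other code that draws random numbers.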
In principle we need to know the thread number only for the last iteration (the one where fill is called). I just tried with a global variable and it works; however, the performance is pretty bad because we are forced to allocate tons of threads, withdrawing computing power from vegasflow. So I think we should keep the single-thread fill implementation and try to get rid of the python for loop.
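For reference, the thread-id trick above can be done without a hand-rolled global counter: `threading.get_ident()` identifies the calling thread, so a dict can map each worker to its own per-thread "grid" (a plain list in this toy sketch, not the real PineAPPL object).

```python
import threading

grids = {}  # one toy "grid" (a list here) per worker thread

def fill(x):
    tid = threading.get_ident()  # id of the calling thread
    # setdefault + append are safe under the GIL for this usage
    grids.setdefault(tid, []).append(x)

threads = [threading.Thread(target=fill, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(len(v) for v in grids.values()))  # 4
```

Note `get_ident()` values can be reused once a thread exits, which is fine for routing fills during a run but not for persistent bookkeeping across iterations.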
But all batches must fill the grid. The python loop is surely hurting; can't pineappl take arrays? Numpy arrays can be passed to C.
This is expected, since the matrix element and the phase space generation in this example are extremely cheap. For more realistic scenarios the timings will be more favourable.
This seems to work. I'm not sure it would work as expected if not running eagerly though... I need to look into this and I'll add a ...
I've implemented a function in NNPDF/pineappl@bfb0d40. Is this what you need? If yes, we also need the corresponding function in the Python interface. |
Yes, that's good, thanks! |
Could you please port this function to the C API?
@scarrazza That is the C API. |
Strange, I have recompiled and the header does not contain this function, and I get ab ...
Ok, here are some results with GPU for 10M events.
So, pineappl is really well integrated and adds a minimal overhead.
I guess point two shows one interesting advantage: if your integrator is running on the GPU, the CPU is free to do its thing without having an effect on the integrator. Although that 0.2s difference might be a fluctuation :P
Here is a first prototype. However, there are 3 points which require investigation:
- `fill` function async (i.e. fight with pickle)