Roadmap for the paper #19

Closed
4 tasks done
scarrazza opened this issue Apr 6, 2020 · 18 comments

@scarrazza
Contributor

scarrazza commented Apr 6, 2020

Following the development, here is my wish list for the paper:

  • create final performance benchmark plots
  • create accuracy benchmark plots for NNPDF and other PDF sets
  • prepare examples (single top, FK convolution)
  • finalize code and related tasks.
@scarrazza
Contributor Author

@marcorossi5 could you please take care of the first 2 points?

@marcorossi5
Collaborator

I collected a lot of accuracy plots for different PDF sets, covering both central and non-central PDFs.
As for the performance plots, I can only make them on my laptop (and maybe on a GPU at CERN, which is a Tesla P100 I think). I don't have access to other devices.

@scarlehoff
Member

I can take care of the GPU performance plots, I have access to a good number of different GPUs.

@marcorossi5
Collaborator

marcorossi5 commented Apr 7, 2020

I made the accuracy plots and collected them in this folder, from which they can be downloaded: https://cernbox.cern.ch/index.php/s/AfTAO0tMf0p3xk0
password: pdfflow
If you want to edit or upload something, you should be able to do that as well.

@scarrazza
Contributor Author

I have placed a pair of simple scripts in the paper repo (https://github.com/N3PDF/papers/pull/8). I don't understand why the code below is extremely slow compared to LHAPDF, and I was wondering whether multi-replica evaluation is something we should consider implementing:

#!/usr/bin/env python
import time
import pdfflow.pflow as pdf
import numpy as np

pdfs = [
    pdf.mkPDF(f'NNPDF31_nlo_as_0118_1000/{i}', dirname='/opt/lhapdf/share/LHAPDF/')
    for i in range(1001)
    ]
xgrids = np.loadtxt('xgrids.dat')
q2 = 1.65**2*np.ones(len(xgrids))
fls = [-5,-4,-3,-2,-1,0,1,2,3,4,5]

t0 = time.time()
for member in pdfs:  # loop variable renamed so it no longer shadows the imported `pdf` module
    member.xfxQ2(fls, xgrids, q2)
print('total time (s):', time.time()-t0)

@marcorossi5
Collaborator

marcorossi5 commented Apr 9, 2020

Could it be because tf builds the graph the first time you call xfxQ2? The first iteration is way slower than the others. Here you rebuild a graph for each of the 1000 PDFs.
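
One way to check this hypothesis is to time the first call separately from the following ones. The helper below is a minimal sketch using only the standard library; `time_calls` and the `LazySetup` stand-in are hypothetical names introduced for illustration and are not part of pdfflow.

```python
import time


def time_calls(fn, n_calls):
    """Call fn() n_calls times; return (first call seconds, mean seconds of the rest)."""
    if n_calls < 2:
        raise ValueError("need at least two calls to compare first vs later")
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    rest = timings[1:]
    return timings[0], sum(rest) / len(rest)


class LazySetup:
    """Stand-in for a callable that pays a one-off setup cost on its first call."""

    def __init__(self):
        self.setup_runs = 0

    def __call__(self):
        if self.setup_runs == 0:
            time.sleep(0.05)  # pretend this is the graph construction
            self.setup_runs += 1
        return sum(range(1000))


demo = LazySetup()
first, steady = time_calls(demo, 5)
print(f"first call: {first:.4f} s, later calls: {steady:.6f} s on average")
```

With the script above, the same helper could be pointed at a single member, for example `time_calls(lambda: pdfs[0].xfxQ2(fls, xgrids, q2), 5)`. A first call that dominates the total would support the graph-building explanation.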

@scarrazza
Contributor Author

scarrazza commented Apr 10, 2020

Indeed, here are some numbers for pdfflow vs LHAPDF timings on CPU. We should think about whether there are ways to improve on that:

LHAPDF

loading from file time (s): 19.299583673477173
total FK evaluation time (s): 13.704259634017944

PDFFlow

loading from file time (s): 88.5237967967987
dry run (s): 2184.7245454788208
total FK evaluation time (s): 2.3665452003479004
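
To put these numbers in perspective, the short calculation below estimates the per-pass speed-up and the number of full evaluation passes after which PDFFlow's one-off costs would be amortized. It is a back-of-the-envelope sketch that assumes the loading time and the dry run are paid only once, and that every later pass costs the same as the measured one; neither assumption is confirmed in this thread.

```python
# Timings quoted above, in seconds.
lhapdf_load = 19.299583673477173
lhapdf_eval = 13.704259634017944

pdfflow_load = 88.5237967967987
pdfflow_dry_run = 2184.7245454788208
pdfflow_eval = 2.3665452003479004

# Speed-up of one evaluation pass once the graphs already exist.
speedup = lhapdf_eval / pdfflow_eval

# Solve lhapdf_load + n * lhapdf_eval == pdfflow_fixed + n * pdfflow_eval for n.
pdfflow_fixed = pdfflow_load + pdfflow_dry_run
break_even_passes = (pdfflow_fixed - lhapdf_load) / (lhapdf_eval - pdfflow_eval)

print(f"per-pass speed-up: {speedup:.2f}x")
print(f"break-even after about {break_even_passes:.0f} passes")
```

Under those assumptions the steady-state evaluation is roughly 5.8 times faster than LHAPDF, but the dry run only pays off after about 200 passes over all replicas, which is why avoiding the per-replica graph build matters.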

@scarlehoff
Member

As you said, a multi-replica implementation makes a lot of sense. There are also some intermediate solutions, such as making sure that the graph is not rebuilt from replica to replica (checking what changes from one to the next and turning it into a tensor instead of Python/NumPy variables).

My guess is that option 2 will make option 1 easier in the future so there's that.
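
As an illustration of what multi-replica evaluation could look like, the sketch below interpolates all replicas in a single vectorized call by stacking their grid values into one array. It is written in plain NumPy and uses linear interpolation in log(x) on a shared one-dimensional grid as a deliberate simplification; it is not pdfflow's algorithm or API, and `interp_all_replicas` is a hypothetical name.

```python
import numpy as np


def interp_all_replicas(x_knots, values, x_query):
    """Linear interpolation in log(x) for every replica at once.

    x_knots: (n_knots,) increasing grid shared by all replicas
    values:  (n_replicas, n_knots) grid values, one row per replica
    x_query: (n_points,) query points inside the grid range
    returns: (n_replicas, n_points)
    """
    log_knots = np.log(x_knots)
    log_query = np.log(x_query)
    # Left knot of the interval containing each query point, found once for all replicas.
    idx = np.searchsorted(log_knots, log_query, side="right") - 1
    idx = np.clip(idx, 0, len(x_knots) - 2)
    weight = (log_query - log_knots[idx]) / (log_knots[idx + 1] - log_knots[idx])
    left = values[:, idx]
    right = values[:, idx + 1]
    return left + weight * (right - left)


rng = np.random.default_rng(0)
x_knots = np.logspace(-5, 0, 50)
values = rng.normal(size=(100, 50))  # toy numbers, not real PDF grids
x_query = np.logspace(-4, -0.5, 200)

batched = interp_all_replicas(x_knots, values, x_query)

# Reference: one np.interp call per replica.
looped = np.stack([np.interp(np.log(x_query), np.log(x_knots), row) for row in values])
print(batched.shape, np.allclose(batched, looped))
```

The interval search depends only on the shared grid and the query points, so it is done once rather than once per replica; only the final weighted sum touches the replica dimension.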

@marcorossi5
Collaborator

Currently the graph is grid dependent. Every PDF instance creates a new graph because of the shape of its grid; so far we have abstracted only the shape of the query points.
We could think of abstracting the concept of the operational graph and making the ops independent of the grid. This way we could load a computational graph just once, when a script that uses pdfflow starts, and reuse it every time we call the interpolation.
It's like having a placeholder for the grid.
I am wondering if this is closer to the concept of tf versions <=1.15 than to @tf.function.

@scarlehoff
Member

> We could think of abstracting the concept of operational graph and making ops independent of the grid. This way we could load a computational graph just once when a script (which wants to use pdfflow) starts and employ it every time we call interpolations.

Yes, this is what I was thinking. And I don't think that using a placeholder which we fill in later will change anything for one-replica calls since the grids are not that big anyway.

@marcorossi5
Collaborator

Can we insert a check that counts how many times the graph is being rebuilt?
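
A possible way to do this: Python side effects inside a `tf.function` body run only while TensorFlow traces the function, so a counter incremented there records how many graphs are built. The sketch below compares a grid baked into the function as a NumPy constant with a grid passed as a tensor argument under a fixed `input_signature`. The "interpolation" is a trivial stand-in and all function names are hypothetical; this is not pdfflow code.

```python
import numpy as np
import tensorflow as tf

trace_counts = {"closure": 0, "argument": 0}


def make_closure_interpolator(grid):
    """Grid captured as a NumPy constant: each instance owns a separate tf.function."""

    @tf.function
    def evaluate(x):
        trace_counts["closure"] += 1  # Python side effect: executed only during tracing
        return tf.reduce_sum(tf.constant(grid)) * x  # stand-in for the interpolation

    return evaluate


@tf.function(
    input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.float64),  # grid of any length
        tf.TensorSpec(shape=[None], dtype=tf.float64),  # query points of any length
    ]
)
def evaluate_with_grid(grid, x):
    """Grid passed as a tensor: one graph serves every grid."""
    trace_counts["argument"] += 1  # executed only during tracing
    return tf.reduce_sum(grid) * x  # same stand-in computation


x = tf.constant(np.linspace(0.1, 0.9, 5))  # float64 query points
grids = [np.random.rand(10 + i) for i in range(4)]  # four grids of different lengths

for grid in grids:
    make_closure_interpolator(grid)(x)
    evaluate_with_grid(tf.constant(grid), x)

print(trace_counts)
```

In this toy setup the closure version is traced once per grid, while the version that takes the grid as an argument is traced a single time, which matches the "placeholder for the grid" idea discussed above.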

@scarlehoff
Member

So, the only thing missing here is adding docs for the normal usage.
@marcorossi5, could you add a section (maybe in the overview) showing how the different PDF routines are used? Some examples, like generating points for a few flavours at a few values of Q^2.
And then also a section (also in the overview, maybe at the beginning) on how to install new PDFs, saying there are two options: either installing directly from LHAPDF, or downloading the set and pointing to the correct folder.

(I'd do it myself but I am in a -long- meeting, so I'm not sure when I'll have time today before the paper gets sent at 8pm :P)

Also, the section "General Usage" should maybe have another name. Maybe "Advanced usage"...
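
As a rough idea of what such a usage example could contain, the sketch below evaluates a few flavours on every combination of a few x and Q^2 values. It relies only on the `xfxQ2(flavours, x, q2)` call shown in the script earlier in this thread; `evaluate_on_grid` and `FakePDF` are hypothetical, and the stand-in's output layout (one row per point, one column per flavour) is an assumption, not a statement about pdfflow.

```python
import numpy as np


def evaluate_on_grid(pdf, flavours, x_values, q2_values):
    """Evaluate pdf.xfxQ2 on every (x, Q2) combination.

    Returns the flattened x and Q2 arrays together with whatever xfxQ2 returns.
    """
    x_mesh, q2_mesh = np.meshgrid(x_values, q2_values, indexing="ij")
    x_flat = x_mesh.ravel()
    q2_flat = q2_mesh.ravel()
    return x_flat, q2_flat, pdf.xfxQ2(flavours, x_flat, q2_flat)


class FakePDF:
    """Hypothetical stand-in so the sketch runs without pdfflow or an installed PDF set."""

    def xfxQ2(self, flavours, x, q2):
        # Arbitrary smooth function, NOT a real PDF; one column per flavour.
        pid = np.asarray(flavours, dtype=float)[None, :]
        return (1.0 - x)[:, None] ** 3 * np.log(q2)[:, None] / (1.0 + np.abs(pid))


flavours = [-2, -1, 0, 1, 2]
x_values = np.logspace(-4, -1, 4)
q2_values = np.array([1.65**2, 10.0, 100.0])

x_flat, q2_flat, result = evaluate_on_grid(FakePDF(), flavours, x_values, q2_values)
print(x_flat.shape, result.shape)
```

In the real documentation, `FakePDF()` would be replaced by an object created with `pdf.mkPDF(...)` as in the script above, after checking which input types the released version expects.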

@marcorossi5
Collaborator

marcorossi5 commented Sep 14, 2020

Do you mean in the paper? Outlook section? Or appendix?

@scarlehoff
Member

No, no, the documentation in this repository. So that, before submitting the paper, we can generate the version 1 release of pdfflow.

@marcorossi5
Collaborator

Ok, I was confused

@scarlehoff
Member

Sorry, I was writing here while paying attention to the voice of people :__

@marcorossi5
Collaborator

marcorossi5 commented Sep 14, 2020

Ok then, it's clear. I'm going to create a new branch from master named docs, to fix some typos I found and to add these new sections.
