
Computational graph toolchain #73

Closed · alecandido opened this issue Jul 27, 2020 · 12 comments · Fixed by #111
Labels: enhancement (New feature or request), question (Further information is requested), refactor (Refactor code)

Comments

@alecandido (Member)

Continuing the investigation from #29, we decided to look into a proper way to replace and speed up the hard part of this library.

The goal

The situation is the following:

  • we have a perturbative expansion f(x) = sum(alpha**k * f_k(x)), which involves multiplying and summing functions
  • thus we need lazy evaluation
  • at the end we will integrate the result

Proposals?

We are searching for a proper tool. Currently:

  • lazy evaluation is implemented by nesting lambdas, and the only efficiency measure is trying to keep the nesting as shallow as possible, based on a manual type check (the best we can achieve with lambdas alone); see the sketch below
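A minimal sketch of this nesting pattern, with hypothetical helper names (`add`, `scale`) standing in for the actual combinators:

```python
# Lazy evaluation by nesting lambdas: sums and products of coefficient
# functions are wrapped in new lambdas, so nothing is evaluated until
# the final call. The callable() check is the "manual type check"
# used to avoid one useless nesting level for plain numbers.

def add(f, g):
    if not callable(g):
        return lambda x: f(x) + g
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    return lambda x: alpha * f(x)

# f(x) = f0(x) + alpha * f1(x), built lazily and evaluated only here
f0 = lambda x: x**2
f1 = lambda x: 2.0 * x
f = add(f0, scale(0.1, f1))
print(f(3.0))  # 9.6
```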

What we would like:

  • a library for dealing automatically with computational graphs; maybe we just need tensorflow, but perhaps all we need is the compilation part, which might be XLA (I'm still trying to understand the tf internals), in which case jax would be enough (it ships just autograd, which we don't need, together with XLA and nothing more, so it is rather minimal compared with tf)
  • maybe the integration should also be managed at the same level; do we need something like tfp.math.ode.Solver to target performance?
@alecandido added the enhancement, question, and refactor labels on Jul 27, 2020
@alecandido (Member Author)

It seems like in tf2 the best option (easiest and most efficient) is just to decorate things with @tf.function and replace the sporadic np calls with their tf counterparts (e.g. np.log with tf.math.log).
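A minimal sketch of this approach, where `coeff` is a hypothetical stand-in for one of the coefficient functions:

```python
import tensorflow as tf

# Decorating with tf.function lets tf trace and compile the function;
# the numpy call is replaced by its tf counterpart so it can be traced.

@tf.function
def coeff(x):
    return tf.math.log(x) / (1.0 - x)

print(coeff(tf.constant(0.5)))
```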

@scarrazza (Member)

I think there are several options; e.g., in vp2 we use reportengine, which does the DAG evaluation. Other, simpler options are tf, networkx and dash. More sophisticated options are airflow and Luigi.

Maybe you just need to code a simple DAG in python with numba precompilation.
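A rough sketch of this last suggestion, assuming the nodes are listed in topological order (all names here are hypothetical):

```python
import numba

# Nodes are numba-precompiled kernels; each node lists its dependencies,
# and the graph is evaluated by walking the dict in insertion order
# (assumed topological).

@numba.njit
def square(x):
    return x * x

@numba.njit
def double(x):
    return 2.0 * x

graph = {
    "a": (square, []),    # no dependencies: fed the external input
    "b": (double, ["a"]),
}

def evaluate(graph, x):
    results = {}
    for name, (kernel, deps) in graph.items():
        inputs = [results[d] for d in deps] or [x]
        results[name] = kernel(*inputs)
    return results

print(evaluate(graph, 3.0))  # {'a': 9.0, 'b': 18.0}
```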

@alecandido (Member Author) commented Jul 28, 2020

We looked into your proposals:

  • the DAG stuff (the reportengine model and networkx) is interesting, but too manual for us, and since they implement the thing in python they won't escape the function-nesting problem: to evaluate the computational graph you still need to unravel the nodes in some way, and without escaping python no inlining is really possible
  • the job schedulers, like airflow and Luigi (and also dask, I think you simply mistyped), are an interesting further development; indeed, when we are ready to perform some actual fk generation it will be useful to make use of the hardware we have, but currently we are trying to optimize inside the single jobs, so they do not target this issue

On the other hand, we investigated further and came up with an alternative to tf (which would probably do the job perfectly for us, but at the price of a huge dependency of which we would use very little):

  • the thing we need should match the template of theano: simply computing the graph by operating over some tensors and inlining it efficiently at the end (even if theano itself is no longer an alternative)
  • we found that jax really does this task with a simple decorator, @jax.jit, so it should be fine for everything (the inlining at the end is done using XLA, which should be the same backend as tf); moreover, it has an almost complete numpy replacement, so it is able to trace the operations through logs and so on simply by using its numpy
  • the last good thing we found is that the ode solver I mentioned is part of tfp, which is not part of and not dependent on tf, and tfp also provides a jax backend

In conclusion, we will probably first try the jax-scipy combination for inlining-integrating, and maybe in the end jax-tfp.
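A minimal sketch of the jax-scipy combination, assuming a hypothetical coefficient function `coeff`:

```python
import jax
import jax.numpy as jnp
from scipy import integrate

# jax.jit compiles the function through XLA (the inlining step);
# scipy performs the final integration over the compiled function.

@jax.jit
def coeff(x):
    return jnp.log(x) / (1.0 + x)

# scipy.integrate.quad expects a scalar-valued Python callable
result, error = integrate.quad(lambda x: float(coeff(x)), 0.1, 1.0)
print(result, error)
```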

Thank you very much for your proposals @scarrazza; as said, we will consider the others, but for a further goal.

@scarrazza (Member)

Yes, jax sounds like a good choice. Comparing its features with tf, I see that both are quite similar, in particular if we do not need gradients. Do you know if jax provides an interface for custom operators in C++/CUDA?

@scarrazza (Member)

Another question, after looking at their documentation and the big "research" statement: what about just using numba? At the end of the day, jax is replicating numba's behaviour with gradients, right?

@felixhekhorn (Contributor)

When we have time, maybe we should read https://numba.pydata.org/numba-doc/latest/developer/inlining.html more carefully ...

@alecandido (Member Author)

The second paragraph of the cited numba page is exactly:

When attempting to inline at this level, it is important to understand what purpose this serves and what effect this will have. In contrast to the inlining performed by LLVM, which is aimed at improving performance, the main reason to inline at the Numba IR level is to allow type inference to cross function boundaries.

Of course numba was the first thing we tried; the reason we gave up at the time was that it seemed unable to inline nested functions at all.
To my knowledge numba is not exactly like tf or jax: it is not tracking a graph of operations to compile at the end, but simply trying to compile python code with LLVM, so it is something like a python-to-c transpiler plus a c compiler. Maybe I misinterpreted, but if I am right, the nesting of python functions would simply translate into a nesting of c functions (and I don't know whether it would be a c nesting of functions or still a python one).

If you have any deeper understanding of numba, of course it would be the perfect tool, since it is the only one that provides just a compiler and nothing more.
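For the record, a minimal sketch of the Numba-IR inlining described in the page cited above (function names are hypothetical):

```python
import numba

# inline="always" asks numba to inline the callee at the Numba IR level;
# per the docs, the main point is letting type inference cross function
# boundaries, not performance (LLVM handles that kind of inlining).

@numba.njit(inline="always")
def inner(x):
    return x * x

@numba.njit
def outer(x):
    # the call to inner is inlined into outer's IR before lowering
    return inner(x) + 1.0

print(outer(3.0))  # 10.0
```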

@alecandido (Member Author) commented Oct 5, 2020

Relevant advice from @scarlehoff

tf.function requires tf.Tensor as input, and the output will be a tf.Tensor as well, so the I/O conversions can cause considerable overhead
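A minimal sketch of where this overhead shows up (the function is a hypothetical example):

```python
import numpy as np
import tensorflow as tf

# Every call converts the numpy input to a tf.Tensor and the result back
# to numpy; for a plethora of tiny coefficient functions, these
# conversions can dominate the runtime.

@tf.function
def coeff(x):
    return tf.math.log(x)

x = np.linspace(0.1, 1.0, 5)
y = coeff(tf.constant(x)).numpy()  # explicit Tensor in, numpy out
print(y)
```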

@alecandido (Member Author) commented Oct 5, 2020

Another relevant piece of information would be to know in advance what happens when closures are used together with tf.function.

In particular, the functions we are going to decorate will be only the very small coefficient functions (small, but there is a plethora of them), and they will usually depend on a single argument, x/z, or two (the Q2 for the heavy cf).
On top of them there are all the QCD constants (TR, CF, CA), and we already agreed with @felixhekhorn to make them globals, because passing them around is rather messy and painful with no advantage.

Perhaps you (@scarlehoff, @scarrazza) already know?

e.g. numba recompiles any time the environment changes, making it a lot more expensive if things are not really constant, but apart from that it is not harmful (if the things you are passing through the closures are not functions or objects, of course...)
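For comparison, a minimal sketch of how tf.function treats a closed-over global (mirroring the QCD constants; the function is hypothetical):

```python
import tensorflow as tf

CF = 4.0 / 3.0  # global constant, as agreed for TR, CF, CA

@tf.function
def coeff(x):
    # CF is captured as a Python constant and frozen into the traced
    # graph; rebinding the global later does not affect traced calls
    return CF * tf.math.log(x)

print(coeff(tf.constant(2.0)))
```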

@alecandido (Member Author) commented Dec 22, 2020

It looks like jax was a good guess, so maybe we definitely won't need tensorflow:

  • the PyMC team resurrected the old theano, trying to use jax as a backend
  • theano does exactly this: computing graphs and compiling them together

So we have the following alternatives:

  • we can copy the way Theano-PyMC is using jax
    • this has the benefit of a minimal dependency, only the backend we need
  • we can immediately make use of Theano-PyMC and let it compute the graphs for us
    • this has the benefit of not reimplementing existing code
    • on the other hand, it may be designed for a specific use case that is too constraining for us, or end up doing very little in our case

More investigation is needed, but this is probably the promising direction.

@felixhekhorn linked a pull request on May 13, 2021 that will close this issue
@stale (bot) commented Oct 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale added the wontfix (This will not be worked on) label on Oct 18, 2021
@alecandido (Member Author)

This time @stale is correct: this one is not going anywhere, since yadism is sufficiently fast on its own, light-speed (on light).

The way to improve is simply to compile more ingredients, and maybe to parallelize a bit (if really needed).

@stale removed the wontfix (This will not be worked on) label on Oct 18, 2021