
Computational graph toolchain #73

Closed · alecandido opened this issue Jul 27, 2020 · 12 comments · Fixed by #111
Labels: enhancement (New feature or request), question (Further information is requested), refactor (Refactor code)

Comments

@alecandido (Member)

Continuing the investigation from #29, we decided to look into a proper way to replace and speed up the hard part of this library.

The goal

The situation is the following:

  • we have a perturbative expansion f(x) = sum(alpha**k * f_k(x)), which involves multiplying and summing functions
  • thus we need lazy evaluation
  • at the end we will integrate the result

Proposals?

We are searching for a proper tool. Currently:

  • lazy evaluation is implemented by nesting lambdas, and the only efficiency measure is trying to keep the nesting as shallow as possible, based on a manual type check (the best we can achieve with lambdas alone); see the sketch below
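A minimal sketch of this nesting pattern, with hypothetical helper names (`add`, `scale`) standing in for the actual combinators:

```python
# Lazy evaluation by nesting lambdas: sums and products of coefficient
# functions are wrapped in new lambdas, so nothing is evaluated until
# the final call. The callable() check is the "manual type check"
# used to avoid one useless nesting level for plain numbers.

def add(f, g):
    if not callable(g):
        return lambda x: f(x) + g
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    return lambda x: alpha * f(x)

# f(x) = f0(x) + alpha * f1(x), built lazily and evaluated only here
f0 = lambda x: x**2
f1 = lambda x: 2.0 * x
f = add(f0, scale(0.1, f1))
print(f(3.0))  # 9.6
```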

What we would like:

  • a library for dealing automatically with computational graphs; maybe we just need tensorflow, but perhaps all we need is the compilation part, which might be XLA (I'm still trying to understand the tf internals), in which case jax would be enough (it ships just autograd, which we don't need, together with XLA and nothing more, so it is rather minimal compared with tf)
  • maybe the integration should also be managed at the same level; do we need something like tfp.math.ode.Solver to target performance?
@alecandido added the enhancement, question, and refactor labels on Jul 27, 2020
@alecandido (Member Author)

It seems like in tf2 the best option (easiest and most efficient) is just to decorate things with @tf.function and replace the sporadic np calls with their tf counterparts (e.g. np.log with tf.math.log).
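A minimal sketch of this approach, where `coeff` is a hypothetical stand-in for one of the coefficient functions:

```python
import tensorflow as tf

# Decorating with tf.function lets tf trace and compile the function;
# the numpy call is replaced by its tf counterpart so it can be traced.

@tf.function
def coeff(x):
    return tf.math.log(x) / (1.0 - x)

print(coeff(tf.constant(0.5)))
```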

@scarrazza (Member)

I think there are several options; e.g., in vp2 we use reportengine, which does the DAG evaluation. Other, simpler options are tf, networkx and dash. More sophisticated options are airflow and Luigi.

Maybe you just need to code a simple DAG in python with numba precompilation.
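A rough sketch of this last suggestion, assuming the nodes are listed in topological order (all names here are hypothetical):

```python
import numba

# Nodes are numba-precompiled kernels; each node lists its dependencies,
# and the graph is evaluated by walking the dict in insertion order
# (assumed topological).

@numba.njit
def square(x):
    return x * x

@numba.njit
def double(x):
    return 2.0 * x

graph = {
    "a": (square, []),    # no dependencies: fed the external input
    "b": (double, ["a"]),
}

def evaluate(graph, x):
    results = {}
    for name, (kernel, deps) in graph.items():
        inputs = [results[d] for d in deps] or [x]
        results[name] = kernel(*inputs)
    return results

print(evaluate(graph, 3.0))  # {'a': 9.0, 'b': 18.0}
```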

@alecandido (Member Author) commented Jul 28, 2020

We looked into your proposals:

  • the DAG stuff (the reportengine model and networkx) is interesting, but too manual for us, and since they implement the thing in python they won't escape the function-nesting problem: to evaluate the computational graph you still need to unravel the nodes in some way, and without escaping python no inlining is really possible
  • the job schedulers, like airflow and Luigi (and also dask, I think you simply mistyped), are an interesting further development; indeed, when we are ready to perform some actual fk generation it will be useful to make use of the hardware we have, but currently we are trying to optimize inside the single jobs, so they do not target this issue

On the other hand, we investigated further and came up with an alternative to tf (which would probably do the job perfectly for us, but at the price of a huge dependency of which we would use very little):

  • the thing we need should match the template of theano: simply computing the graph by operating over some tensors and inlining it efficiently at the end (even if theano itself is no longer an alternative)
  • we found that jax really does this task with a simple decorator, @jax.jit, so it should be fine for everything (the inlining at the end is done using XLA, which should be the same backend as tf); moreover, it has an almost complete numpy replacement, so it is able to trace the operations through logs and so on simply by using its numpy
  • the last good thing we found is that the ode solver I mentioned is part of tfp, which is not part of and not dependent on tf, and tfp also provides a jax backend

In conclusion, we will probably first try the jax-scipy combination for inlining-integrating, and maybe in the end jax-tfp.
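A minimal sketch of the jax-scipy combination, assuming a hypothetical coefficient function `coeff`:

```python
import jax
import jax.numpy as jnp
from scipy import integrate

# jax.jit compiles the function through XLA (the inlining step);
# scipy performs the final integration over the compiled function.

@jax.jit
def coeff(x):
    return jnp.log(x) / (1.0 + x)

# scipy.integrate.quad expects a scalar-valued Python callable
result, error = integrate.quad(lambda x: float(coeff(x)), 0.1, 1.0)
print(result, error)
```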

Thank you very much for your proposals @scarrazza; as said, we will consider the others, but for a further goal.

@scarrazza (Member)

Yes, jax sounds like a good choice. Comparing its features with tf, I see that both are quite similar, in particular if we do not need gradients. Do you know if jax provides an interface for custom operators in C++/CUDA?

@scarrazza (Member)

Another question, after looking at their documentation and the big "research" statement: what about just using numba? At the end of the day, jax is replicating numba's behaviour with gradients, right?

@felixhekhorn (Contributor)

When we have time, maybe we should read https://numba.pydata.org/numba-doc/latest/developer/inlining.html more carefully ...

@alecandido (Member Author)

The second paragraph of the cited numba page is exactly:

When attempting to inline at this level, it is important to understand what purpose this serves and what effect this will have. In contrast to the inlining performed by LLVM, which is aimed at improving performance, the main reason to inline at the Numba IR level is to allow type inference to cross function boundaries.

Of course numba was the first thing we tried; the reason we gave up at the time was that it seemed unable to inline nested functions at all.
To my knowledge numba is not exactly like tf or jax: it is not tracking a graph of operations to compile at the end, but simply trying to compile python code with LLVM, so it is something like a python-to-c transpiler plus a c compiler. Maybe I misinterpreted, but if I am right, the nesting of python functions would simply translate into a nesting of c functions (and I don't know whether it would be a c nesting of functions or still a python one).

If you have any deeper understanding of numba, of course it would be the perfect tool, since it is the only one that provides just a compiler and nothing more.
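For the record, a minimal sketch of the Numba-IR inlining described in the page cited above (function names are hypothetical):

```python
import numba

# inline="always" asks numba to inline the callee at the Numba IR level;
# per the docs, the main point is letting type inference cross function
# boundaries, not performance (LLVM handles that kind of inlining).

@numba.njit(inline="always")
def inner(x):
    return x * x

@numba.njit
def outer(x):
    # the call to inner is inlined into outer's IR before lowering
    return inner(x) + 1.0

print(outer(3.0))  # 10.0
```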

@alecandido (Member Author) commented Oct 5, 2020

Relevant advice from @scarlehoff

tf.function requires tf.Tensor as input, and the output will be a tf.Tensor as well, so the I/O conversions can cause considerable overhead
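A minimal sketch of where this overhead shows up (the function is a hypothetical example):

```python
import numpy as np
import tensorflow as tf

# Every call converts the numpy input to a tf.Tensor and the result back
# to numpy; for a plethora of tiny coefficient functions, these
# conversions can dominate the runtime.

@tf.function
def coeff(x):
    return tf.math.log(x)

x = np.linspace(0.1, 1.0, 5)
y = coeff(tf.constant(x)).numpy()  # explicit Tensor in, numpy out
print(y)
```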

@alecandido (Member Author) commented Oct 5, 2020

Another relevant piece of information would be to know in advance what happens when closures are used together with tf.function.

In particular, the functions we are going to decorate will be only the very small coefficient functions (small, but there is a plethora of them), and they will usually depend on a single argument, x/z, or two (the Q2 for the heavy cf).
On top of them there are all the QCD constants (TR, CF, CA), and we already agreed with @felixhekhorn to make them globals, because passing them around is rather messy and painful with no advantage.

Perhaps you (@scarlehoff, @scarrazza) already know?

e.g. numba recompiles any time the environment changes, making it a lot more expensive if things are not really constant, but apart from that it is not harmful (if the things you are passing through the closures are not functions or objects, of course...)
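For comparison, a minimal sketch of how tf.function treats a closed-over global (mirroring the QCD constants; the function is hypothetical):

```python
import tensorflow as tf

CF = 4.0 / 3.0  # global constant, as agreed for TR, CF, CA

@tf.function
def coeff(x):
    # CF is captured as a Python constant and frozen into the traced
    # graph; rebinding the global later does not affect traced calls
    return CF * tf.math.log(x)

print(coeff(tf.constant(2.0)))
```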

@alecandido (Member Author) commented Dec 22, 2020

It looks like jax was a good guess, so maybe we definitely won't need tensorflow:

  • the PyMC team resurrected the old theano, trying to use jax as a backend
  • theano does exactly this: computing graphs and compiling them together

So we have the following alternatives:

  • we can copy the way Theano-PyMC is using jax
    • this has the benefit of a minimal dependency, only the backend we need
  • we can immediately make use of Theano-PyMC and let it compute the graphs for us
    • this has the benefit of not reimplementing existing code
    • on the other hand, it may be designed for a specific use case that is too constraining for us, or end up doing very little in our case

More investigation is needed, but this is probably the promising direction.

@felixhekhorn linked a pull request on May 13, 2021 that will close this issue
@stale (bot) commented Oct 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale added the wontfix (This will not be worked on) label on Oct 18, 2021
@alecandido (Member Author)

This time @stale is correct: this one is not going anywhere, since yadism is sufficiently fast on its own, light-speed (on light).

The way to improve is simply to compile more ingredients, and maybe to parallelize a bit (if really needed).

@stale removed the wontfix (This will not be worked on) label on Oct 18, 2021