
Making the code faster #19

Merged 30 commits into LO on Dec 5, 2019

Conversation

@scarlehoff (Member) commented Nov 21, 2019

Still haven't changed anything, because interpolation.py will need considerable changes in order to make things work the way they should, but I wanted to have a place to collect my thoughts where they can be monitored/audited.

  • Scipy's quad works very nicely with numba, but not the other way around (obviously). This addresses the doubts @scarrazza had the other day. If the integrand is a numba-compiled object, quad will be very fast.
  • The number of dictionaries being passed around needs to be decreased, because a dictionary is a complicated class, albeit a builtin one. In other words, it doesn't make sense to write C-looking code if there is a collection of dictionaries flying around.
  • Parallelization: I would say this is irrelevant and to first approximation trivial. The grid size gives you enough leeway to parallelize in the most naive way, so I won't do anything about this (because it can be done in the future in an orthogonal way).

@scarlehoff (Member Author)

Python parallelization and numba do not necessarily play well together.

Current experimental roadmap:

  • Compiling the numba objects before entering any loop
  • Opening the threads at a level at which they can be opened

It might actually make sense to parallelize over the interpolators themselves rather than inside any loop.

@scarlehoff (Member Author)

> Python parallelization and numba do not necessarily play well together.

It doesn't work on dom but it does work on my laptop. I really hate it when this happens.

@scarlehoff (Member Author)

Ok, bar anything that might have gone wrong (for instance, I might have been stupid and broken something... testing is needed), I get an important speed-up.

These are the times for a test with run_dglap from the Benchmark jupyter notebook from PR #9. There are two run_dglap calls, so I tried both.

Time with PR #9

  1. 2086s
  2. 1224s

Time with this PR

  1. 229s
  2. 253s

@scarrazza (Member)

Ok, that's quite impressive.

@scarlehoff (Member Author)

And it seems to work. And that's without parallelization yet!

@scarlehoff (Member Author)

@felixhekhorn I just enforced a much stricter integration error and the errors are back where they were before. Thanks for your input!

Ok, as of the last commit this is fully functional, so now I am going to:

  • Clean it
  • Add documentation (I've been frankly sloppy on that front, but I was experimenting...)
  • Make the parallelization work in general (on dom in particular), because it is a 1/number_of_cores reduction in running time for free.

@scarlehoff (Member Author)

One comment I am writing here so I don't forget: using np.linalg is actually a performance hit when doing things like computing the eigenvalues of a matrix. I'm guessing either the numba compilation is slower or less aggressive.

Until profiling is done (or NLO is done) I won't know. If it is just the compilation being slower then it is fine (it will be compensated by more difficult integrands); if not, we can re-implement some of the numpy functions to achieve better performance.

One thing I've already mentioned several times is that one of the main performance penalties is simply the fact that we need to integrate many times. Reducing the number of integrations (for instance, integrating all splitting functions at once) would reduce the running time by roughly the same factor.
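One way to picture the saving, purely as an illustration (this is not what the PR implements): a fixed Gauss-Legendre rule lets several integrands share the same nodes and weights, so N integrals reuse one set of evaluation points instead of each running its own adaptive quad.

```python
import numpy as np

# 32-point Gauss-Legendre rule on [-1, 1], mapped to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(32)
x = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# evaluate all integrands once on the shared grid
# (x, x^2, x^3 are stand-ins for splitting-function kernels)
values = np.stack([x, x ** 2, x ** 3])

# one matrix-vector product yields all three integrals at once
integrals = values @ w
```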

@scarlehoff (Member Author)

The only things left are interpolator.py, which will need extensive documentation, and dglap.py, for which the same can be said. But this is almost done.

@scarlehoff mentioned this pull request Dec 3, 2019
@scarlehoff (Member Author)

Ok, not sure whether the test will work on travis, but bar that, this is now finished.

Several things to note @felixhekhorn

  1. I moved your original benchmark to a folder called benchmarks in the root folder of the repository. I only took the things I needed for the test.
  2. I think there was an error in the original benchmark notebook. Now the absolute differences are always below 1e-1 (there is still a big relative error in the last number, but this is a feature at this point :P)
    The error, I think, was in the definition of g1 in plot_table2_2. Please have a look and check I am not stupid :P

@felixhekhorn (Contributor)

@scarlehoff

  • 2: yeah, it's me who is stupid ;-) fixed (hopefully);
  • the "last number": are you referring to the largest x-bin of the gluon? Yes, I guess we need to do some more advanced numerical stuff there;
  • 1: I reworked my benchmark, which I want to improve further ...

shall we then merge this PR to LO?

@scarlehoff (Member Author) commented Dec 4, 2019

Yes, if you @felixhekhorn and @scarrazza don't have any comments we can merge, merge LO into master, and start thinking about the "next levels" (for instance, I think this might be a good moment to do an APFEL-EKO comparison test to make sure the results are compatible).

ret["operator_errors"]["S_qg"] = op_s_qg_err
ret["operator_errors"]["S_gq"] = op_s_gq_err
ret["operator_errors"]["S_gg"] = op_s_gg_err
ret["operators"]["S_qq"] = output_array[:, :, 0, 0]
Member:

Should we explain somewhere how the indices of output_array are sorted?

Member Author:

Well, it is explained in line 140 above :P
Personally I don't like passing and updating a dictionary, but I also don't see a better option right now. As a matter of principle, though, I don't like the output of this function.

Ideally we would get two matrices of results directly from quad, but as we discovered, the vectorized version of quad is much slower.

Contributor:

  • output_array: well, that's internal, and if Juan knows, fine ;-)
  • I think we should join _run_nonsinglet and run_singlet into a single function/loop, as they serve the same purpose
  • updating the dictionary: we have to replace it as soon as we start iterating over Q2; for now it was just the quickest fix;

I will replace it by

ret = {};
# assign
return ret

@scarlehoff (Member Author), Dec 4, 2019:

> I think we should join _run_nonsinglet and run_singlet into a single function/loop as they serve the same purpose

Well, they are different functions in that one generates a matrix.
But I think the correct thing would indeed be to join them, so that the output is an array [nonsinglet, (singlet1, singlet2, singlet3, singlet4)].
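A toy sketch of what the joined output shape could look like; the function name and keys below are hypothetical, not the eko API, and merely illustrate the "[nonsinglet, (singlet1, singlet2, singlet3, singlet4)]" idea.

```python
import numpy as np

def run_all(op_ns, op_s):
    # op_ns: non-singlet result, op_s: 2x2 singlet block;
    # one structure instead of two separate function outputs
    return {
        "NS": op_ns,
        "S_qq": op_s[0, 0], "S_qg": op_s[0, 1],
        "S_gq": op_s[1, 0], "S_gg": op_s[1, 1],
    }

ret = run_all(1.0, np.arange(4.0).reshape(2, 2))
```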

return ret


if __name__ == "__main__":
Member:

Do we really need a main here?

Member Author:

It is useful for quickly testing that you didn't break anything, and it doesn't hurt anybody.

Member:

Ok, but then maybe we should consider moving this to the test folder?

Member Author:

I mean, this is more for convenience: you can do python -i dglap.py, and then it has run some stuff and you can play with the functions from dglap. It is not a test in the proper sense.

Contributor:

> It is useful for quickly testing

That says it all ;-)

r = np.real(N)
i = np.imag(N)
out = np.empty(2)
c_digamma(
Member:

This call gives me

src/eko/tests/test_cfunctions.py::test_digamma
  /home/carrazza/repo/n3pdf/eko/src/eko/tests/test_cfunctions.py:25: UserWarning: implicit cast from 'char *' to a different pointer type: will be forbidden in the future (check that the types are as you expect; use an explicit ffi.cast() if they are correct)
    c_digamma(x, 0.0, _gsl_digamma.ffi.from_buffer(out))

Member Author:

Which version of cffi do you have? Mine is 1.13.2

Member:

Same version for me.


Member Author:

Strange, it happens only from pytest. I'll have a look.

Member Author:

Indeed, it only happens in the test file.

@scarlehoff (Member Author), Dec 4, 2019:

Ok, so the problem is that upon first call ffi.from_buffer defaults to char *; the correct call would be c_digamma(x, 0.0, _gsl_digamma.ffi.from_buffer("double[]", out)).

But then numba is not happy; numba maybe wants to be in control of the compilation, I don't know.
So I'll let it be. Since this is a problem within cffi, I hope that once they decide to deprecate the implicit cast for good, they will make from_buffer read the type of the buffer (it already looks at the size of the buffer without me telling it anything).
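For reference, the typed from_buffer call in isolation, using a plain cffi.FFI(); the gsl-backed _gsl_digamma module is not reproduced here, only the cast that silences the warning.

```python
import cffi
import numpy as np

ffi = cffi.FFI()
out = np.array([1.5, 2.5])

# explicit C type: cffi builds a double[] view of the numpy buffer
# instead of falling back to the (soon-forbidden) implicit char* cast
buf = ffi.from_buffer("double[]", out)
```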

Contributor:

just to let you know: I get two more warnings, the first is related to numba:

env/lib/python3.7/site-packages/numba-0.46.0-py3.7-linux-x86_64.egg/numba/types/containers.py:3
  /home/felix/Physik/N3PDF/EKO/eko/env/lib/python3.7/site-packages/numba-0.46.0-py3.7-linux-x86_64.egg/numba/types/containers.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  from collections import Iterable

and somewhere there is a cast missing, or an actual comparison to 0+0j:

src/eko/tests/test_cfunctions.py::test_digamma
   /home/felix/Physik/N3PDF/EKO/eko/env/lib/python3.7/site-packages/numpy/testing/_private/utils.py:664: ComplexWarning: Casting complex values to real discards the imaginary part
  (actual, desired) = map(float, (actual, desired))

Member Author:

Neither of these is really our problem; the second is undesired (it's numpy's fault), but it can be worked around. I'll push a fix.

@scarrazza (Member) left a review:

Looks pretty good; in particular the self._compile mechanism is clear and very non-intrusive.

@felixhekhorn merged commit 2c4b36b into LO on Dec 5, 2019
@felixhekhorn deleted the gottagofast branch on December 5, 2019 at 09:01