
Making the code faster #19

Merged 30 commits into LO on Dec 5, 2019

Conversation

@scarlehoff (Member) commented Nov 21, 2019

Still haven't changed anything, because interpolation.py will need considerable changes in order to make things work the way they should, but I wanted to have a place to collect my thoughts where they can be monitored/audited.

  • Scipy's quad works very nicely with numba, but not the other way around (obviously). This addresses the doubts @scarrazza had the other day. If the integrand is a numba-compiled object, quad will be very fast.
  • The number of dictionaries being passed around needs to be decreased, because a dictionary is a complicated class, albeit a builtin one. In other words, it doesn't make sense to write C-looking code if there is a collection of dictionaries flying around.
  • Parallelization: I would say this is irrelevant and to first approximation trivial. The grid size gives you enough leeway to parallelize in the most naive way, so I won't do anything about this (because it can be done in the future in an orthogonal way).

@scarlehoff (Member Author)

Python parallelization and numba do not necessarily play well together.

Current experimental roadmap:

  • Compiling the numba objects before entering any loop
  • Opening the threads at a level at which they can be opened

It might actually make sense to parallelize over the interpolators themselves rather than inside any loop.

@scarlehoff (Member Author)

> Python parallelization and numba do not necessarily play well together.

It doesn't work on dom but it does work on my laptop. I really hate it when this happens.

@scarlehoff (Member Author)

Ok, bar anything that might have gone wrong (for instance, I might have been stupid and broken something... testing is needed), I get an important speed-up.

These are the times for a test with run_dglap from the Benchmark jupyter notebook from PR #9. There are two run_dglap calls, so I tried both.

Time with PR #9

  1. 2086s
  2. 1224s

Time with this PR

  1. 229s
  2. 253s

@scarrazza (Member)

Ok, that's quite impressive.

@scarlehoff (Member Author)

And it seems to work. And that's without parallelization yet!

@scarlehoff (Member Author)

@felixhekhorn I just enforced a much stricter integration error and the errors are back where they were before. Thanks for your input!

Ok, as of the last commit this is fully functional, so now I am going to:

  • Clean it
  • Add documentation (I've been frankly sloppy on that front, but I was experimenting...)
  • Make the parallelization work in general (on dom in particular), because it is a 1/number_of_cores reduction in running time for free.

@scarlehoff (Member Author)

One comment I am writing here so I don't forget: using np.linalg is actually a performance hit when doing things like computing the eigenvalues of a matrix. I'm guessing either the numba compilation is slower or less aggressive.

Until profiling is done (or NLO is done) I won't know. If it is just the compilation being slower then it is fine (it will be compensated by more difficult integrands); if not, we can re-implement some of the numpy functions to achieve better performance.

One thing I've already mentioned several times is that one of the main performance penalties is simply the fact that we need to integrate many times. Reducing the number of integrations (for instance, integrating all splitting functions at once) would reduce the running time by roughly the same factor.
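One way to picture the saving, purely as an illustration (this is not what the PR implements): a fixed Gauss-Legendre rule lets several integrands share the same nodes and weights, so N integrals reuse one set of evaluation points instead of each running its own adaptive quad.

```python
import numpy as np

# 32-point Gauss-Legendre rule on [-1, 1], mapped to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(32)
x = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# evaluate all integrands once on the shared grid
# (x, x^2, x^3 are stand-ins for splitting-function kernels)
values = np.stack([x, x ** 2, x ** 3])

# one matrix-vector product yields all three integrals at once
integrals = values @ w
```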

@scarlehoff (Member Author)

The only things left are interpolator.py, which will need extensive documentation, and dglap.py, for which the same can be said. But this is almost done.

@scarlehoff mentioned this pull request Dec 3, 2019
@scarlehoff (Member Author)

Ok, not sure whether the test will work on travis, but bar that, this is now finished.

Several things to note @felixhekhorn

  1. I moved your original benchmark to a folder called benchmarks in the root folder of the repository. I only took the things I needed for the test.
  2. I think there was an error in the original benchmark notebook. Now the absolute differences are always below 1e-1 (there is still a big relative error in the last number, but this is a feature at this point :P)
    The error, I think, was in the definition of g1 in plot_table2_2. Please have a look and check I am not stupid :P

@felixhekhorn (Contributor)

@scarlehoff

  • 2: yeah, it's me who is stupid ;-) fixed (hopefully);
  • the "last number": are you referring to the largest x-bin of the gluon? Yes, I guess we need to do some more advanced numerical stuff there;
  • 1: I reworked my benchmark, which I want to improve further ...

shall we then merge this PR to LO?

@scarlehoff (Member Author) commented Dec 4, 2019

Yes, if you @felixhekhorn and @scarrazza don't have any comments we can merge, merge LO into master, and start thinking about the "next levels" (for instance, I think this might be a good moment to do an APFEL-EKO comparison test to make sure the results are compatible).

ret["operator_errors"]["S_qg"] = op_s_qg_err
ret["operator_errors"]["S_gq"] = op_s_gq_err
ret["operator_errors"]["S_gg"] = op_s_gg_err
ret["operators"]["S_qq"] = output_array[:, :, 0, 0]
Member:

Should we explain somewhere how the indices of output_array are sorted?

Member Author:

Well, it is explained in line 140 above :P
Personally I don't like passing and updating a dictionary, but I also don't see a better option right now. As a matter of principle, though, I don't like the output of this function.

Ideally we would get two matrices of results directly from quad, but as we discovered, the vectorized version of quad is much slower.

Contributor:

  • output_array: well, that's internal, and if Juan knows, fine ;-)
  • I think we should join _run_nonsinglet and run_singlet into a single function/loop, as they serve the same purpose
  • updating the dictionary: we have to replace it as soon as we start iterating over Q2; for now it was just the quickest fix;

I will replace it by

ret = {};
# assign
return ret

@scarlehoff (Member Author), Dec 4, 2019:

> I think we should join _run_nonsinglet and run_singlet into a single function/loop as they serve the same purpose

Well, they are different functions in that one generates a matrix.
But I think the correct thing would indeed be to join them, so that the output is an array [nonsinglet, (singlet1, singlet2, singlet3, singlet4)].
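A toy sketch of what the joined output shape could look like; the function name and keys below are hypothetical, not the eko API, and merely illustrate the "[nonsinglet, (singlet1, singlet2, singlet3, singlet4)]" idea.

```python
import numpy as np

def run_all(op_ns, op_s):
    # op_ns: non-singlet result, op_s: 2x2 singlet block;
    # one structure instead of two separate function outputs
    return {
        "NS": op_ns,
        "S_qq": op_s[0, 0], "S_qg": op_s[0, 1],
        "S_gq": op_s[1, 0], "S_gg": op_s[1, 1],
    }

ret = run_all(1.0, np.arange(4.0).reshape(2, 2))
```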

return ret


if __name__ == "__main__":
Member:

Do we really need a main here?

Member Author:

It is useful for quickly testing that you didn't break anything, and it doesn't hurt anybody.

Member:

Ok, but then maybe we should consider moving this to the test folder?

Member Author:

I mean, this is more for convenience: you can do python -i dglap.py, and then it has run some stuff and you can play with the functions from dglap. It is not a test in the proper sense.

Contributor:

> It is useful for quickly testing

That says it all ;-)

r = np.real(N)
i = np.imag(N)
out = np.empty(2)
c_digamma(
Member:

This call gives me

src/eko/tests/test_cfunctions.py::test_digamma
  /home/carrazza/repo/n3pdf/eko/src/eko/tests/test_cfunctions.py:25: UserWarning: implicit cast from 'char *' to a different pointer type: will be forbidden in the future (check that the types are as you expect; use an explicit ffi.cast() if they are correct)
    c_digamma(x, 0.0, _gsl_digamma.ffi.from_buffer(out))

Member Author:

Which version of cffi do you have? Mine is 1.13.2

Member:

Same version for me.


Member Author:

Strange, it happens only from pytest. I'll have a look.

Member Author:

Indeed, it only happens in the test file.

@scarlehoff (Member Author), Dec 4, 2019:

Ok, so the problem is that upon first call ffi.from_buffer defaults to char *; the correct call would be c_digamma(x, 0.0, _gsl_digamma.ffi.from_buffer("double[]", out)).

But then numba is not happy; numba maybe wants to be in control of the compilation, I don't know.
So I'll let it be. Since this is a problem within cffi, I hope that once they decide to deprecate the implicit cast for good, they will make from_buffer read the type of the buffer (it already looks at the size of the buffer without me telling it anything).
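For reference, the typed from_buffer call in isolation, using a plain cffi.FFI(); the gsl-backed _gsl_digamma module is not reproduced here, only the cast that silences the warning.

```python
import cffi
import numpy as np

ffi = cffi.FFI()
out = np.array([1.5, 2.5])

# explicit C type: cffi builds a double[] view of the numpy buffer
# instead of falling back to the (soon-forbidden) implicit char* cast
buf = ffi.from_buffer("double[]", out)
```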

Contributor:

just to let you know: I get two more warnings, the first is related to numba:

env/lib/python3.7/site-packages/numba-0.46.0-py3.7-linux-x86_64.egg/numba/types/containers.py:3
  /home/felix/Physik/N3PDF/EKO/eko/env/lib/python3.7/site-packages/numba-0.46.0-py3.7-linux-x86_64.egg/numba/types/containers.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  from collections import Iterable

and somewhere there is a cast missing, or an actual comparison to 0+0j:

src/eko/tests/test_cfunctions.py::test_digamma
   /home/felix/Physik/N3PDF/EKO/eko/env/lib/python3.7/site-packages/numpy/testing/_private/utils.py:664: ComplexWarning: Casting complex values to real discards the imaginary part
  (actual, desired) = map(float, (actual, desired))

Member Author:

Neither of these is really our problem; the second is undesired (it's numpy's fault), but it can be worked around. I'll push a fix.

@scarrazza (Member) left a review:

Looks pretty good; in particular the self._compile mechanism is clear and very non-intrusive.

@felixhekhorn merged commit 2c4b36b into LO on Dec 5, 2019
@felixhekhorn deleted the gottagofast branch on December 5, 2019 at 09:01