
GTH_SOLVE: Numba version #103

Closed · wants to merge 38 commits

Conversation

oyamad
Copy link
Member

@oyamad oyamad commented Nov 28, 2014

This is to open a discussion on the trade-off between speed and simplicity/clarity of the code, using gth_solve as a concrete example: I replaced the current NumPy-based vectorized version with a Numba version.

  • The routine is divided into a wrapper and the jit-compiled part _gth_solve_jit.
  • Matrix operations are replaced with explicit for loops in _gth_solve_jit; the code there is now almost the same as the Julia version.
  • The Numba version is 3x-30x faster than the NumPy version: see the performance comparison here.
    (The option overwrite has been dropped, as it did not contribute much to the speed-up.)
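The wrapper/jitted split described above can be sketched roughly as follows. This is a sketch, not the PR's exact code: it assumes an irreducible input (so the zero-denominator check discussed later in this thread is omitted), and it includes a pure-Python fallback in case Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # pure-Python fallback if Numba is unavailable
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True)
def _gth_solve_jit(A1, out):
    # GTH elimination with explicit loops instead of matrix operations,
    # so that the whole routine can compile in nopython mode
    n = A1.shape[0]
    # Reduction phase
    for k in range(n - 1):
        scale = 0.0
        for j in range(k + 1, n):
            scale += A1[k, j]
        for i in range(k + 1, n):
            A1[i, k] /= scale
            for j in range(k + 1, n):
                A1[i, j] += A1[i, k] * A1[k, j]
    # Backward substitution
    out[n - 1] = 1.0
    for k in range(n - 2, -1, -1):
        for i in range(k + 1, n):
            out[k] += out[i] * A1[i, k]
    # Normalization
    norm = 0.0
    for k in range(n):
        norm += out[k]
    for k in range(n):
        out[k] /= norm

def gth_solve(A):
    # Wrapper: copy the input, then call the jit-compiled core
    A1 = np.array(A, dtype=float, order='C')
    out = np.zeros(A1.shape[0])
    _gth_solve_jit(A1, out)
    return out
```

For a stochastic matrix like [[0.4, 0.6], [0.2, 0.8]], this returns the stationary distribution [0.25, 0.75].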

You guys have already had a discussion on this trade-off before, in #36. What degree of speed-up justifies the use of Numba? Any thoughts on this particular case?

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.46%) when pulling 0bbb5f9 on oyamad:gth_solve_jit into bb4e9a7 on QuantEcon:master.

@albop
Copy link
Contributor

albop commented Nov 29, 2014

In this particular instance, factoring out the loops should not provide performance gains, because of the automatic loop-lifting feature that was introduced recently in Numba. I haven't tested it, but it should be enough to just jit the whole function. The important modification, though, is still to type the array and to replace the matrix multiplication with custom code, so that the loops (or the factored-out function) can be compiled in nopython mode. At some point nopython mode will certainly support matrix multiplication, so this won't be required anymore.
This shows a hidden trade-off of numbaification: in certain cases like this one, the Numba code can be almost identical to the original, but to understand which changes are needed for compilation, one often needs an intermediate factored-out step.
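As an illustration of the kind of change described above (a hypothetical helper, not code from this PR): replacing a call like np.dot with an explicit loop so that the function contains only operations Numba could compile in nopython mode at the time. A pure-Python fallback is included in case Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # pure-Python fallback if Numba is unavailable
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True)
def matvec(A, x):
    # Hand-written matrix-vector product: at the time, np.dot was not
    # supported in nopython mode, so explicit loops were required
    m, n = A.shape
    out = np.zeros(m)
    for i in range(m):
        for j in range(n):
            out[i] += A[i, j] * x[j]
    return out
```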

@sglyon
Copy link
Member

sglyon commented Nov 29, 2014

@oyamad: good work here, very cool.

@albop: Great to have a numba expert on the team

@oyamad
Copy link
Member Author

oyamad commented Dec 2, 2014

@albop Thanks.

I tried putting jit on the whole function, and here is what I found:

With Numba 0.14.0, this function (with matrix operations replaced with for loops) does not seem to allow nopython mode; the second of the Loop Jitting Constraints (version 0.14.0) appears to be violated.

(Note that this is the version contained in the current Anaconda; see the Package Documentation.)
EDIT: conda update numba updates the default Numba to 0.15.1.

With Numba 0.15.1 installed, where the constraint above has been removed from the Loop Jitting Constraints (version 0.15.1), automatic nopython mode seems to be working, but it is much slower than the original NumPy version: 50x slower for a 100x100 matrix (and much worse for 1000x1000). See the demonstration.

Maybe I need some more tricks to let this work?

@albop
Copy link
Contributor

albop commented Dec 2, 2014

@oyamad I see two possible explanations:

  • they temporarily removed support for AVX instructions in 0.14, which could affect the efficiency of auto-vectorization. Since I don't know precisely when LLVM triggers auto-vectorization, I can't tell exactly where this matters. The np.sum function is certainly affected.
  • more importantly, it looks like loop-lifting is still failing. In the LLVM machine code there are some :pyobject types starting on line 20. This seems to be due to the break statement, which disables loop-lifting in 0.15.1 (another temporary fix).
    Replacing that for loop with a while loop should do the trick, but I don't know whether you want to do that.
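As a generic illustration of this rewrite (hypothetical helpers, not code from this PR), a for loop that exits early with break can be expressed as a while loop with the exit condition folded into the loop test:

```python
import numpy as np

def first_small_row_for(A, tol):
    # Original pattern: a for loop with break, which disabled
    # loop-lifting in Numba 0.15.1
    found = -1
    for k in range(A.shape[0]):
        if A[k].sum() < tol:
            found = k
            break
    return found

def first_small_row_while(A, tol):
    # Equivalent while loop containing no break statement
    n = A.shape[0]
    k = 0
    while k < n and A[k].sum() >= tol:
        k += 1
    return k if k < n else -1
```

Both functions return the index of the first row whose sum falls below tol, or -1 if there is none.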

@oyamad
Copy link
Member Author

oyamad commented Dec 2, 2014

@albop Great, you are right.

I removed the if block containing the break statement, and loop-lifting now works. See

  • this for a custom environment, and
  • this for an Anaconda environment with Numba updated to 0.15.1.

It works even without type declaration.

  • gth_solve_jit: loops factored out
  • gth_solve_jit2: automatic loop-jitting with type declaration
  • gth_solve_jit3: automatic loop-jitting without type declaration

`gth_solve_jit2` and `gth_solve_jit3` exhibit almost the same speed as `gth_solve_jit` for large matrices; they are up to 3x slower for small matrices, but in those cases the absolute difference should be negligible.

(For an irreducible input matrix, scale will never equal zero (at least in exact arithmetic), so the if block is unnecessary if we restrict attention to irreducible matrices.)

@albop
Copy link
Contributor

albop commented Dec 2, 2014

Excellent! I'm glad I could help. I wonder where the small difference between gth_solve_jit and gth_solve_jit2/gth_solve_jit3 comes from. If it comes from the loop-lifting part, it will probably vanish over time.

@jstac
Copy link
Contributor

jstac commented Dec 2, 2014

This is pretty cool.

What do other people think of the speed up / complexity trade off?

My rule of thumb has been that if the speed-up is not somewhere near two orders of magnitude, then it's not worth the effort, but maybe that's too strict. Is a one-order-of-magnitude speed-up worth the extra complexity?

@jstac jstac mentioned this pull request Apr 7, 2015
@oyamad
Copy link
Member Author

oyamad commented Apr 23, 2015

I updated gth_solve.

  • I believe the routine should check whether the denominator is nonzero. By adding one line, it can then accept a reducible matrix input.
  • This requires a break statement in a for loop (it could be replaced by some dirty code, which I want to avoid), but with the break, automatic loop-jitting is not activated. So I separated out the for-loop part to run in nopython mode.
  • I added the NumPy version back.
  • There remains a testing issue: the NumPy version is not tested if Numba is installed.
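The one-line check described in the first bullet can be sketched as follows (variable names are illustrative; this mirrors my reading of the change, not necessarily the exact PR code). On a reducible input the trailing row sum can be exactly zero, and truncating the elimination there lets the routine proceed:

```python
import numpy as np

def gth_reduce(A):
    # Reduction phase of the GTH algorithm with the zero-denominator
    # check, allowing a reducible matrix input (sketch)
    A1 = np.array(A, dtype=float, order='C')
    n = A1.shape[0]
    for k in range(n - 1):
        scale = np.sum(A1[k, k+1:n])
        if scale <= 0:
            # The input is reducible: states k+1, ..., n-1 are not
            # reachable from the class being solved, so stop here
            n = k + 1
            break
        A1[k+1:n, k] /= scale
        A1[k+1:n, k+1:n] += np.outer(A1[k+1:n, k], A1[k, k+1:n])
    return A1, n
```

For the identity matrix (reducible), the elimination stops after the first row; for an irreducible 2x2 stochastic matrix, it runs to completion.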

(This PR is to be merged into the numba_improvements branch.)

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 8ea1206 on oyamad:gth_solve_jit into QuantEcon:master.


@oyamad
Copy link
Member Author

oyamad commented Apr 23, 2015

Something strange is going on.
The tests failed for the previous commit, so I tried just swapping the order of the element that caused the error (Q) with another element (np.eye(2)) in testset in test_mc_tools.py, and nothing else. Now the tests pass.

@jstac
Copy link
Contributor

jstac commented Apr 24, 2015

That is strange. Are they failing locally or when you update the PR?

@oyamad
Copy link
Member Author

oyamad commented Apr 25, 2015

The issue has been handled amazingly quickly in numba/numba#1104.

The fix in gth_solve is to add order='C' when making a copy A1 of the input A. I added a test to make sure that the problem is avoided with this.
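A minimal sketch of this fix, using the matrix values from the example later in this thread (the point is only the order='C' copy; at the time, the fancy-indexed submatrix came back F-ordered, which tripped up the jitted code):

```python
import numpy as np

P = np.array([[0.4, 0. , 0.6],
              [0. , 1. , 0. ],
              [0.2, 0. , 0.8]])
rec_class = [0, 2]
A = P[rec_class, :][:, rec_class]  # this submatrix came back F-ordered

# The fix: force a C-ordered copy before passing the array to the
# jitted routine, which assumes a C (row-major) memory layout
A1 = np.array(A, dtype=float, order='C')
assert A1.flags['C_CONTIGUOUS']
```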

@oyamad
Copy link
Member Author

oyamad commented Apr 25, 2015

As another issue, which is better, `self.P[rec_class, :][:, rec_class]` or `self.P[:, rec_class][rec_class, :]`, or is there another, better way?

import numpy as np

P = np.zeros((3, 3))
P[0, [0, 2]] = 0.4, 0.6
P[1, 1] = 1
P[2, [0, 2]] = 0.2, 0.8
print(P)
[[ 0.4  0.   0.6]
 [ 0.   1.   0. ]
 [ 0.2  0.   0.8]]

rec_class0 = [0, 2]
P_0 = P[rec_class0, :][:, rec_class0]
P_1 = P[:, rec_class0][rec_class0, :]

print(P_0)
print(P_0.flags)
[[ 0.4  0.6]
 [ 0.2  0.8]]
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

print(P_1)
print(P_1.flags)
[[ 0.4  0.6]
 [ 0.2  0.8]]
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Does OWNDATA : False mean that the array is actually a view? If that's the case, the former seems more efficient in terms of memory consumption. Is there any difference in access speed?

@jstac
Copy link
Contributor

jstac commented Apr 26, 2015

Yes, as far as I know it's a view (of the intermediate array produced by the first indexing step), while P_1 is a copy. The latter can also be seen to be a copy because its base attribute is None.

print(P_0.base)
[[ 0.4  0.2]
 [ 0.6  0.8]]
print(P_1.base)
None

So P_0 seems more efficient. I don't really know which is better though.
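As a possible "another, better way": np.ix_ builds the cross-product index in a single step and always returns a fresh, C-contiguous copy, which sidesteps the view-versus-copy question entirely (a suggestion, not something settled in this thread):

```python
import numpy as np

P = np.array([[0.4, 0. , 0.6],
              [0. , 1. , 0. ],
              [0.2, 0. , 0.8]])
rec_class = [0, 2]

# Select rows and columns in one indexing operation
P_sub = P[np.ix_(rec_class, rec_class)]   # values [[0.4, 0.6], [0.2, 0.8]]
print(P_sub.flags['C_CONTIGUOUS'])        # True
```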

@jstac
Copy link
Contributor

jstac commented May 3, 2015

@oyamad @mmcky Hi guys, I've kind of lost track of what's happening with this PR and the different issues. Would you mind updating me? From memory, we are looking to merge in

  • a utilities file
  • a numbafied routine for sampling from MCs
  • some standard code for numba imports with a common warning message

Just a quick update on the situation would be helpful. Also, if possible, I would like the utilities file added soon, because one of Tom's coauthors has some utilities that look useful and are needed for a lecture Tom wants to add to quant-econ.net.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 89302ec on oyamad:gth_solve_jit into QuantEcon:master.

@oyamad
Copy link
Member Author

oyamad commented May 3, 2015

@jstac Regarding this PR (on gth_solve):

  • the main part is Numba jitted with nopython=True;
  • the common import/warning module external is incorporated;
  • the NumPy version is kept alive, which is executed if not numba_installed;
  • the current structure of the code does not allow the unit tests to cover the NumPy version.

If it's ok, I would like to merge this PR into numba_improvements to be reviewed with other changes in that branch.

As for the discussions regarding numba_improvements, we should open a new PR and make a list of the issues to discuss there.

@jstac
Copy link
Contributor

jstac commented May 3, 2015

@oyamad That sounds like a good next step. Please go ahead.

(Thanks for the summary.)

@oyamad
Copy link
Member Author

oyamad commented May 4, 2015

Merged into `numba_improvements`. Closing.

@oyamad oyamad closed this May 4, 2015
@oyamad oyamad mentioned this pull request May 4, 2015
@oyamad oyamad deleted the gth_solve_jit branch August 8, 2015 02:35