
GTH_SOLVE: Numba version #103

Closed · wants to merge 38 commits

Conversation

oyamad
Copy link
Member

@oyamad oyamad commented Nov 28, 2014

This is to open a discussion on the trade-off between speed and simplicity/clarity of the code, using gth_solve as a concrete example: I replaced the current NumPy-based vectorized version with a Numba version.

  • The routine is divided into a wrapper and the jit-compiled part _gth_solve_jit.
  • Matrix operations are replaced with explicit for loops in _gth_solve_jit; the code there is now almost the same as the Julia version.
  • The Numba version is 3x-30x faster than the NumPy version: see the performance comparison here.
    (The option overwrite has been dropped, as it did not contribute much to the speed-up.)
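The wrapper/jitted split described above can be sketched roughly as follows. This is a sketch, not the PR's exact code: it assumes an irreducible input (so the zero-denominator check discussed later in this thread is omitted), and it includes a pure-Python fallback in case Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # pure-Python fallback if Numba is unavailable
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True)
def _gth_solve_jit(A1, out):
    # GTH elimination with explicit loops instead of matrix operations,
    # so that the whole routine can compile in nopython mode
    n = A1.shape[0]
    # Reduction phase
    for k in range(n - 1):
        scale = 0.0
        for j in range(k + 1, n):
            scale += A1[k, j]
        for i in range(k + 1, n):
            A1[i, k] /= scale
            for j in range(k + 1, n):
                A1[i, j] += A1[i, k] * A1[k, j]
    # Backward substitution
    out[n - 1] = 1.0
    for k in range(n - 2, -1, -1):
        for i in range(k + 1, n):
            out[k] += out[i] * A1[i, k]
    # Normalization
    norm = 0.0
    for k in range(n):
        norm += out[k]
    for k in range(n):
        out[k] /= norm

def gth_solve(A):
    # Wrapper: copy the input, then call the jit-compiled core
    A1 = np.array(A, dtype=float, order='C')
    out = np.zeros(A1.shape[0])
    _gth_solve_jit(A1, out)
    return out
```

For a stochastic matrix like [[0.4, 0.6], [0.2, 0.8]], this returns the stationary distribution [0.25, 0.75].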

You guys have already had a discussion on this trade-off before, in #36. What degree of speed-up justifies the use of Numba? Any thoughts on this particular case?

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.46%) when pulling 0bbb5f9 on oyamad:gth_solve_jit into bb4e9a7 on QuantEcon:master.

@albop
Copy link
Contributor

albop commented Nov 29, 2014

In this particular instance, factoring out the loops should not provide performance gains, because of the automatic loop-lifting feature that was introduced recently in Numba. I haven't tested it, but it should be enough to just jit the whole function. The important modification, though, is still to type the array and to replace the matrix multiplication with custom code, so that the loops (or the factored-out function) can be compiled in nopython mode. At some point nopython mode will certainly support matrix multiplication, so this won't be required anymore.
This shows a hidden trade-off of numbaification: in certain cases like this one, the Numba code can be almost identical to the original, but to understand which changes are needed for compilation, one often needs an intermediate factored-out step.
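As an illustration of the kind of change described above (a hypothetical helper, not code from this PR): replacing a call like np.dot with an explicit loop so that the function contains only operations Numba could compile in nopython mode at the time. A pure-Python fallback is included in case Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # pure-Python fallback if Numba is unavailable
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True)
def matvec(A, x):
    # Hand-written matrix-vector product: at the time, np.dot was not
    # supported in nopython mode, so explicit loops were required
    m, n = A.shape
    out = np.zeros(m)
    for i in range(m):
        for j in range(n):
            out[i] += A[i, j] * x[j]
    return out
```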

@sglyon
Copy link
Member

sglyon commented Nov 29, 2014

@oyamad: good work here, very cool.

@albop: Great to have a numba expert on the team

@oyamad
Copy link
Member Author

oyamad commented Dec 2, 2014

@albop Thanks.

I tried putting jit on the whole function, and here is what I found:

With Numba 0.14.0, this function (with matrix operations replaced with for loops) does not seem to allow nopython mode; the second of the Loop Jitting Constraints (version 0.14.0) appears to be violated.

(Note that this is the version contained in the current Anaconda; see the Package Documentation.)
EDIT: conda update numba updates the default Numba to 0.15.1.

With Numba 0.15.1 installed, where the constraint above has been removed from the Loop Jitting Constraints (version 0.15.1), automatic nopython mode seems to be working, but it is much slower than the original NumPy version: 50x slower for a 100x100 matrix (and much worse for 1000x1000). See the demonstration.

Maybe I need some more tricks to let this work?

@albop
Copy link
Contributor

albop commented Dec 2, 2014

@oyamad I see two possible explanations:

  • they temporarily removed support for AVX instructions in 0.14, which could affect the efficiency of auto-vectorization. Since I don't know precisely when LLVM triggers auto-vectorization, I can't tell exactly where this matters. The np.sum function is certainly affected.
  • more importantly, it looks like loop-lifting is still failing. In the LLVM machine code there are some :pyobject types starting on line 20. This seems to be due to the break statement, which disables loop-lifting in 0.15.1 (another temporary fix).
    Replacing that for loop with a while loop should do the trick, but I don't know whether you want to do that.
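As a generic illustration of this rewrite (hypothetical helpers, not code from this PR), a for loop that exits early with break can be expressed as a while loop with the exit condition folded into the loop test:

```python
import numpy as np

def first_small_row_for(A, tol):
    # Original pattern: a for loop with break, which disabled
    # loop-lifting in Numba 0.15.1
    found = -1
    for k in range(A.shape[0]):
        if A[k].sum() < tol:
            found = k
            break
    return found

def first_small_row_while(A, tol):
    # Equivalent while loop containing no break statement
    n = A.shape[0]
    k = 0
    while k < n and A[k].sum() >= tol:
        k += 1
    return k if k < n else -1
```

Both functions return the index of the first row whose sum falls below tol, or -1 if there is none.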

@oyamad
Copy link
Member Author

oyamad commented Dec 2, 2014

@albop Great, you are right.

I removed the if block containing the break statement, and loop-lifting now works. See

  • this for a custom environment, and
  • this for an Anaconda environment with Numba updated to 0.15.1.

It works even without type declaration.

  • gth_solve_jit: loops factored out
  • gth_solve_jit2: automatic loop-jitting with type declaration
  • gth_solve_jit3: automatic loop-jitting without type declaration

`gth_solve_jit2` and `gth_solve_jit3` exhibit almost the same speed as `gth_solve_jit` for large matrices; they are up to 3x slower for small matrices, but in those cases the absolute difference should be negligible.

(For an irreducible input matrix, scale will never equal zero (at least in exact arithmetic), so the if block is unnecessary if we restrict attention to irreducible matrices.)

@albop
Copy link
Contributor

albop commented Dec 2, 2014

Excellent! I'm glad I could help. I wonder where the small difference between gth_solve_jit and gth_solve_jit2/gth_solve_jit3 comes from. If it comes from the loop-lifting part, it will probably vanish over time.

@jstac
Copy link
Contributor

jstac commented Dec 2, 2014

This is pretty cool.

What do other people think of the speed up / complexity trade off?

My rule of thumb has been that if the speed-up is not somewhere near two orders of magnitude, then it's not worth the effort, but maybe that's too strict. Is a one-order-of-magnitude speed-up worth the extra complexity?

@jstac jstac mentioned this pull request Apr 7, 2015
@oyamad
Copy link
Member Author

oyamad commented Apr 23, 2015

I updated gth_solve.

  • I believe the routine should check whether the denominator is nonzero. By adding one line, it can then accept a reducible matrix input.
  • This requires a break statement in a for loop (it could be replaced by some dirty code, which I want to avoid), but with the break, automatic loop-jitting is not activated. So I separated out the for-loop part to run in nopython mode.
  • I added the NumPy version back.
  • There remains a testing issue: the NumPy version is not tested if Numba is installed.
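The one-line check described in the first bullet can be sketched as follows (variable names are illustrative; this mirrors my reading of the change, not necessarily the exact PR code). On a reducible input the trailing row sum can be exactly zero, and truncating the elimination there lets the routine proceed:

```python
import numpy as np

def gth_reduce(A):
    # Reduction phase of the GTH algorithm with the zero-denominator
    # check, allowing a reducible matrix input (sketch)
    A1 = np.array(A, dtype=float, order='C')
    n = A1.shape[0]
    for k in range(n - 1):
        scale = np.sum(A1[k, k+1:n])
        if scale <= 0:
            # The input is reducible: states k+1, ..., n-1 are not
            # reachable from the class being solved, so stop here
            n = k + 1
            break
        A1[k+1:n, k] /= scale
        A1[k+1:n, k+1:n] += np.outer(A1[k+1:n, k], A1[k, k+1:n])
    return A1, n
```

For the identity matrix (reducible), the elimination stops after the first row; for an irreducible 2x2 stochastic matrix, it runs to completion.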

(This PR is to be merged into the numba_improvements branch.)

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 8ea1206 on oyamad:gth_solve_jit into QuantEcon:master.


@oyamad
Copy link
Member Author

oyamad commented Apr 23, 2015

Something strange is going on.
The tests failed for the previous commit, so I tried just swapping the order of the element that caused the error (Q) with another element (np.eye(2)) in testset in test_mc_tools.py, and nothing else. Now the tests pass.

@jstac
Copy link
Contributor

jstac commented Apr 24, 2015

That is strange. Are they failing locally or when you update the PR?

@oyamad
Copy link
Member Author

oyamad commented Apr 25, 2015

The issue has been handled amazingly quickly in numba/numba#1104.

The fix in gth_solve is to add order='C' when making a copy A1 of the input A. I added a test to make sure that the problem is avoided with this.
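A minimal sketch of this fix, using the matrix values from the example later in this thread (the point is only the order='C' copy; at the time, the fancy-indexed submatrix came back F-ordered, which tripped up the jitted code):

```python
import numpy as np

P = np.array([[0.4, 0. , 0.6],
              [0. , 1. , 0. ],
              [0.2, 0. , 0.8]])
rec_class = [0, 2]
A = P[rec_class, :][:, rec_class]  # this submatrix came back F-ordered

# The fix: force a C-ordered copy before passing the array to the
# jitted routine, which assumes a C (row-major) memory layout
A1 = np.array(A, dtype=float, order='C')
assert A1.flags['C_CONTIGUOUS']
```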

@oyamad
Copy link
Member Author

oyamad commented Apr 25, 2015

As another issue, which is better, `self.P[rec_class, :][:, rec_class]` or `self.P[:, rec_class][rec_class, :]`, or is there another, better way?

import numpy as np

P = np.zeros((3, 3))
P[0, [0, 2]] = 0.4, 0.6
P[1, 1] = 1
P[2, [0, 2]] = 0.2, 0.8
print(P)
[[ 0.4  0.   0.6]
 [ 0.   1.   0. ]
 [ 0.2  0.   0.8]]

rec_class0 = [0, 2]
P_0 = P[rec_class0, :][:, rec_class0]
P_1 = P[:, rec_class0][rec_class0, :]

print(P_0)
print(P_0.flags)
[[ 0.4  0.6]
 [ 0.2  0.8]]
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

print(P_1)
print(P_1.flags)
[[ 0.4  0.6]
 [ 0.2  0.8]]
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Does OWNDATA : False mean that the array is actually a view? If that's the case, the former seems more efficient in terms of memory consumption. Is there any difference in access speed?

@jstac
Copy link
Contributor

jstac commented Apr 26, 2015

Yes, as far as I know it's a view (of the intermediate array produced by the first indexing step), while P_1 is a copy. The latter can also be seen to be a copy because its base attribute is None.

print(P_0.base)
[[ 0.4  0.2]
 [ 0.6  0.8]]
print(P_1.base)
None

So P_0 seems more efficient. I don't really know which is better though.
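As a possible "another, better way": np.ix_ builds the cross-product index in a single step and always returns a fresh, C-contiguous copy, which sidesteps the view-versus-copy question entirely (a suggestion, not something settled in this thread):

```python
import numpy as np

P = np.array([[0.4, 0. , 0.6],
              [0. , 1. , 0. ],
              [0.2, 0. , 0.8]])
rec_class = [0, 2]

# Select rows and columns in one indexing operation
P_sub = P[np.ix_(rec_class, rec_class)]   # values [[0.4, 0.6], [0.2, 0.8]]
print(P_sub.flags['C_CONTIGUOUS'])        # True
```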

@jstac
Copy link
Contributor

jstac commented May 3, 2015

@oyamad @mmcky Hi guys, I've kind of lost track of what's happening with this PR and the different issues. Would you mind updating me? From memory, we are looking to merge in

  • a utilities file
  • a numbafied routine for sampling from MCs
  • some standard code for numba imports with a common warning message

Just a quick update on the situation would be helpful. Also, if possible, I would like the utilities file added soon, because one of Tom's coauthors has some utilities that look useful and are needed for a lecture Tom wants to add to quant-econ.net.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 89302ec on oyamad:gth_solve_jit into QuantEcon:master.

@oyamad
Copy link
Member Author

oyamad commented May 3, 2015

@jstac Regarding this PR (on gth_solve):

  • the main part is Numba jitted with nopython=True;
  • the common import/warning module external is incorporated;
  • the NumPy version is kept alive, which is executed if not numba_installed;
  • the current structure of the code does not allow the unit tests to cover the NumPy version.

If it's ok, I would like to merge this PR into numba_improvements to be reviewed with other changes in that branch.

As for the discussions regarding numba_improvements, we should open a new PR and make a list of the issues to discuss there.

@jstac
Copy link
Contributor

jstac commented May 3, 2015

@oyamad That sounds like a good next step. Please go ahead.

(Thanks for the summary.)

@oyamad
Copy link
Member Author

oyamad commented May 4, 2015

Merged into `numba_improvements`. Closing.

@oyamad oyamad closed this May 4, 2015
@oyamad oyamad mentioned this pull request May 4, 2015
@oyamad oyamad deleted the gth_solve_jit branch August 8, 2015 02:35