# Sl/numba tauchen #227

Merged
merged 2 commits into master from sl/numba-tauchen on Feb 2, 2016
+23 −10


### 6 participants

Member
commented Jan 31, 2016
I needed a faster version of `tauchen`, so I dropped the loops into numba. Here are the before timings (on the actual size problem I'm working on in the application):

```python
In [21]: timeit tauchen(0.95, 0.1, 3, 451)
1 loops, best of 3: 32.4 s per loop
```

After timings:

```python
In [4]: timeit qe.tauchen(0.95, 0.1, 3, 451)
10 loops, best of 3: 65.4 ms per loop
```

Not quite as quick as the Julia version, but much closer (Julia timings below).

```julia
julia> @benchmark tauchen(451, 0.95, 0.1, 0.0, 3)
================ Benchmark Results ========================
     Time per evaluation: 28.28 ms [27.11 ms, 29.45 ms]
Proportion of time in GC: 0.51% [0.00%, 1.71%]
        Memory allocated: 1.55 mb
   Number of allocations: 3 allocations
       Number of samples: 100
    Number of evaluations: 100
 Time spent benchmarking: 3.04 s
```
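For reference, a minimal sketch of what "dropping the loops into numba" can look like. This is not the exact PR diff: the names `tauchen_sketch`, `_fill_tauchen`, and `std_norm_cdf` and the grid construction are assumptions reconstructed from this discussion, and the decorator falls back to plain Python when numba isn't installed.

```python
import math
import numpy as np

try:
    from numba import njit  # jit-compile the hot loop when numba is available
except ImportError:          # fall back to plain Python so the sketch still runs
    njit = lambda f: f

@njit
def std_norm_cdf(x):
    # standard normal CDF via the complementary error function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

@njit
def _fill_tauchen(x, P, n, rho, sigma, half_step):
    # P[i, j] = Prob(state j next period | state i today), with the
    # first and last grid points absorbing the tails
    for i in range(n):
        P[i, 0] = std_norm_cdf((x[0] - rho * x[i] + half_step) / sigma)
        P[i, n - 1] = 1.0 - std_norm_cdf((x[n - 1] - rho * x[i] - half_step) / sigma)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = (std_norm_cdf((z + half_step) / sigma)
                       - std_norm_cdf((z - half_step) / sigma))

def tauchen_sketch(rho, sigma_u, m=3, n=7):
    # evenly spaced grid over +/- m unconditional standard deviations
    std_y = math.sqrt(sigma_u**2 / (1 - rho**2))
    x, step = np.linspace(-m * std_y, m * std_y, n, retstep=True)
    P = np.empty((n, n))
    _fill_tauchen(x, P, n, rho, sigma_u, step / 2)
    return x, P
```

The heavy lifting lives in `_fill_tauchen`, which is plain scalar loops over a preallocated array, exactly the shape of code numba compiles well.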
added some commits Jan 31, 2016
- sglyon: numbaize tauchen (abd11b8)
- sglyon: numbaize tauchen (5e9a6af)
Member
commented Jan 31, 2016
The speedup is very impressive. I have a not-so-important comment: in the loop, `x[j] - rho * x[i] +/- half_step` is computed many times, but isn't it enough to compute it once and increment by `step` each time? Something like:

```python
def _fill_tauchen(x, P, n, rho, sigma, step):
    for i in range(n):
        w = x[0] - rho * x[i] + step / 2
        P[i, 0] = std_norm_cdf(w / sigma)
        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w / sigma)
            w += step
            P[i, j] += std_norm_cdf(w / sigma)
        P[i, n-1] = 1 - std_norm_cdf(w / sigma)
```

or, hoisting the division by `sigma` out of the inner loop as well:

```python
def _fill_tauchen(x, P, n, rho, sigma, step):
    step_sigma = step / sigma
    for i in range(n):
        w = (x[0] - rho * x[i] + step / 2) / sigma
        P[i, 0] = std_norm_cdf(w)
        for j in range(1, n - 1):
            P[i, j] = -std_norm_cdf(w)
            w += step_sigma
            P[i, j] += std_norm_cdf(w)
        P[i, n-1] = 1 - std_norm_cdf(w)
```

(The speedup may be negligible, though.)
Contributor
commented Jan 31, 2016
The performance gains do indeed seem quite good. There is probably no point in looking for further improvements, since it is unlikely that `tauchen` will be called repeatedly (you can always scale the normal). I still did a bit of digging: most of the cost seems to come from the `erfc` function (I replaced it with the identity to check). Maybe the difference in the implementations of this routine explains the difference. In numba, the implementation of `erfc` is the one from CPython's C source (https://github.com/numba/numba/blob/81c83aaacf1b8d871d4e31719efd670bfee50c9c/numba/_helperlib.c). The one used in Julia is apparently the same as in glibc. LLVM 3.7, to which numba is currently switching, has incorporated `erfc`, so the situation may be easy to improve on the numba side.
Member
commented Feb 1, 2016
I agree with @albop that it is probably futile to look for more performance gains, but FWIW this was faster on my computer (though only marginally, and we lose the benefit of the function being jitted):

```python
import numpy as np
import scipy.stats as st


def tauchen(n, m, ybar, sigma, rho=0.0):
    r"""
    Takes as inputs n, m, ybar, sigma and produces a Markov chain
    approximation to the process:

        y_t = \bar{y} + \rho y_{t-1} + \varepsilon_t

    where \varepsilon_t is i.i.d. normal with mean 0 and standard
    deviation sigma.

    Parameters
    ----------
    n : int
        The number of points to approximate the distribution
    m : int
        The number of standard deviations to go away from the mean
        in each direction
    ybar : float
        The value \bar{y} in the process. Note that the mean of this
        AR(1) process, y, is simply ybar/(1 - rho)
    sigma : float
        The standard deviation of the \varepsilon process
    rho : float
        By default this will be 0, but if you are approximating an
        AR(1) process then this is the autocorrelation across periods

    Returns
    -------
    bar : 1d array of floats
        The possible states for the Markov chain
    pi : 2d array of floats
        The transition probabilities across states
    """
    # Get the standard deviation of y
    y_sd = np.sqrt(sigma**2 / (1 - rho**2))

    # Go out m standard deviations and create equally spaced points
    # from -m*y_sd to m*y_sd; note d is the distance between points
    ubar = m * y_sd
    lbar = -ubar
    bar, d = np.linspace(lbar, ubar, n, retstep=True)

    # Create matrix of values
    barmat = np.repeat(bar, n).reshape(n, n)
    barup = (barmat.T + d/2. - rho*barmat) / sigma
    bardo = (barmat.T - d/2. - rho*barmat) / sigma

    pi = st.norm.cdf(barup) - st.norm.cdf(bardo)
    pi[:, 0] = st.norm.cdf(barup[:, 0])
    pi[:, -1] = 1. - st.norm.cdf(bardo[:, -1])

    bar += ybar / (1. - rho)

    return bar, pi
```
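As a quick sanity check on the vectorized approach, each row of `pi` should be a probability distribution. A standalone sketch (the core of the function above is repeated here, without the docstring, so the snippet runs on its own):

```python
import numpy as np
import scipy.stats as st

def tauchen_vectorized(n, m, ybar, sigma, rho=0.0):
    # vectorized core: build upper/lower bin edges as n x n matrices
    # and take one pair of vectorized normal-CDF calls
    y_sd = np.sqrt(sigma**2 / (1 - rho**2))
    bar, d = np.linspace(-m * y_sd, m * y_sd, n, retstep=True)
    barmat = np.repeat(bar, n).reshape(n, n)          # barmat[i, j] = bar[i]
    barup = (barmat.T + d / 2.0 - rho * barmat) / sigma
    bardo = (barmat.T - d / 2.0 - rho * barmat) / sigma
    pi = st.norm.cdf(barup) - st.norm.cdf(bardo)
    pi[:, 0] = st.norm.cdf(barup[:, 0])               # absorb left tail
    pi[:, -1] = 1.0 - st.norm.cdf(bardo[:, -1])       # absorb right tail
    return bar + ybar / (1.0 - rho), pi

bar, pi = tauchen_vectorized(9, 3, 0.5, 0.1, rho=0.9)
assert np.allclose(pi.sum(axis=1), 1.0)
```

The interior columns telescope and the two tail columns soak up the remaining mass, which is why the rows sum to one exactly (up to floating point).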
Member
commented Feb 1, 2016
The `erfc` function must be quite a bit slower in Python than whatever Julia is using, because the algorithm is exactly the same in both languages: fill one row at a time. That access order is not consistent with Julia's column-major arrays, so we are likely taking a performance hit in Julia by not "transposing" the algorithm. Anyway, I would be happy merging either this or Chase's function; I'm indifferent between the two. I also agree with @albop that searching for milliseconds here probably doesn't need to be a priority. I just didn't like waiting 35+ seconds to initialize a model when I knew it could take under a second ;)
Member
commented Feb 1, 2016
 I have slight preference for the numbaized function. Cleaner and the difference is pretty negligible (and sounds like it will get better for free).
Contributor
commented Feb 1, 2016
Yes, @cc7768, it will get better over time. Just for fun: one can call the libm function directly using:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef('double erfc(double x);')
libm = ffi.dlopen("m")
erfc = libm.erfc
```

On my machine, this test produces the following timings:

```
cpython: 2.5553548336029053
libm:    0.09564900398254395
```

That is a bit more than 20 times faster than the CPython version!
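For anyone who wants to try this without the cffi dependency, the same libm call can be made with the standard library's ctypes. A hedged sketch, assuming a POSIX system where `erfc` is reachable via libm (or via symbols already loaded into the interpreter):

```python
import ctypes
import ctypes.util
import math

# Locate libm; on some platforms erfc lives in libc or is already
# linked into the interpreter, so fall back to the process's own symbols.
_libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declare the C signature so ctypes converts floats correctly
libm.erfc.argtypes = [ctypes.c_double]
libm.erfc.restype = ctypes.c_double

# It should agree with Python's math.erfc to machine precision
for v in (-2.0, 0.0, 0.5, 3.0):
    assert abs(libm.erfc(v) - math.erfc(v)) < 1e-12
```

Note that each ctypes call carries per-call overhead, so this is for checking agreement, not for use inside a hot loop.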
Contributor
commented Feb 1, 2016
With the libm version of `erfc`, the numbaized version goes from 88 ms to 25 ms on my computer. Just to be clear: my vote goes to keeping the numbaized version exactly as it is and not bothering with more optimization. This is essentially a numba "bug". There has been an ongoing discussion about the relative performance of `math.` and `numpy.` functions in numba/numba#1616 (see also: https://williamjshipman.wordpress.com/2016/01/12/whats-faster-in-numba-jit-functions-numpy-or-the-math-package/). We should just make sure they are aware of the potential differences with libm.
Member
commented Feb 1, 2016
Thanks for checking out those benchmarks. I think that as long as @jstac is OK with this, we can merge as is.
Contributor
commented Feb 1, 2016
Thanks all for the discussion. I've been following along and have learned from reading it. It makes sense to numbaize the routine, and I agree that we don't need to search for the last drop of speed here. As a general comment for this PR and others: you guys are in charge of QE.py and QE.jl, so please feel free to merge without my input. They are really your libraries now, and you are better equipped to take the lead than I am. If a change affects the lectures, I'd appreciate it if you let me know (although it's not actually your responsibility, and in any case I follow the discussions in the PRs). It will be the job of Tom and me to keep the lectures up to date with changes you make to the libraries. For QuantEcon.applications, Tom and I still need to take charge of changes, since that repo just serves the lectures at this stage.
merged commit 9ff954c into master Feb 2, 2016

#### 1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Member
commented Feb 2, 2016
 Sounds great, thanks @jstac!
Contributor
commented Feb 2, 2016
 @spencerlyon2 thanks for this update. Do you mind if we delete the branch?
Member
commented Feb 2, 2016
Not at all. Let's keep things tidy.
deleted the sl/numba-tauchen branch Feb 2, 2016