[@LorenaABarba](https://twitter.com/LorenaABarba)

This notebook complements the [interactive CFD online](https://bitbucket.org/cfdpython/cfd-python-class/overview) module **12 steps to Navier-Stokes**, addressing the issue of high performance with Python.

Optimizing Loops with Numba
----

***

You will recall from our exploration of [array operations with NumPy](./06_Array_Operations_with_NumPy.ipynb) that implementing our discretizations with NumPy-optimized array operations, instead of many nested loops, yields large speed gains.



[Numba](http://numba.pydata.org/) is a tool that offers another approach to optimizing our Python code.  It is a Python library that uses LLVM to turn Python functions into C-style compiled functions.  Depending on the original code and the size of the problem, Numba can provide a significant speedup over NumPy-optimized code.
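
As a rough illustration of the idea, here is a minimal sketch of compiling a loop-based function with Numba's `njit` decorator. The function `sum_of_squares` and the sample data are hypothetical and not part of the original notebook; the `try`/`except` fallback is an assumption added so the snippet still runs (uncompiled) when Numba is not installed.

```python
import numpy

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop like this is slow in plain Python,
    # but Numba compiles it to machine code on the first call.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = numpy.arange(5, dtype=numpy.float64)
result = sum_of_squares(data)
print(result)
```

The function body is ordinary Python; only the decorator changes how it is executed.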


Let's revisit the 2D Laplace equation:
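
For reference, and assuming the second-order central-difference discretization used in Step 9, the equation and the iterative update that the code below implements can be written as follows. The subscripts follow the index order used in the code, and the boundary conditions are taken from the code comments.

$$\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} = 0$$

$$p_{i,j}^{n+1} = \frac{\Delta y^2 \left(p_{i+1,j}^{n} + p_{i-1,j}^{n}\right) + \Delta x^2 \left(p_{i,j+1}^{n} + p_{i,j-1}^{n}\right)}{2\left(\Delta x^2 + \Delta y^2\right)}$$

$$p = 0 \ \text{at}\ x = 0, \qquad p = y \ \text{at}\ x = 2, \qquad \frac{\partial p}{\partial y} = 0 \ \text{at}\ y = 0,\ 1$$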


In [27]:
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib import pyplot
import numpy

##variable declarations
nx = 81
ny = 81
c = 1
dx = 2.0/(nx-1)
dy = 2.0/(ny-1)

##initial conditions
p = numpy.zeros((ny,nx)) ##create a XxY vector of 0's

##plotting aids
x = numpy.linspace(0,2,nx)
y = numpy.linspace(0,1,ny)

##boundary conditions
p[:,0] = 0		##p = 0 @ x = 0
p[:,-1] = y		##p = y @ x = 2
p[0,:] = p[1,:]		##dp/dy = 0 @ y = 0
p[-1,:] = p[-2,:]	##dp/dy = 0 @ y = 1


Here is the function for iterating over the Laplace equation that we wrote in Step 9:


In [17]:
def laplace2d(p, y, dx, dy, l1norm_target):
    l1norm = 1
    pn = numpy.empty_like(p)

    while l1norm > l1norm_target:
        pn = p.copy()
        p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) 
        p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))
        p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) 
    
        p[:,0] = 0		##p = 0 @ x = 0
        p[:,-1] = y		##p = y @ x = 2
        p[0,:] = p[1,:]		##dp/dy = 0 @ y = 0
        p[-1,:] = p[-2,:]	##dp/dy = 0 @ y = 1
        l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))
     
    return p

Let's use the `%%timeit` cell magic to see how fast it runs:

In [28]:
%%timeit
laplace2d(p, y, dx, dy, .00001)

1 loops, best of 3: 206 us per loop


Ok!  Our function `laplace2d` takes around 206 *micro*seconds to complete.  That's pretty fast, and we have our array operations to thank for that.  Let's see how long a more 'vanilla' Python version takes.

In [29]:
def laplace2d_vanilla(p, y, dx, dy, l1norm_target):
    l1norm = 1
    pn = numpy.empty_like(p)
    nx, ny = len(y), len(y)

    while l1norm > l1norm_target:
        pn = p.copy()
        
        for i in range(1, nx-1):
            for j in range(1, ny-1):
                p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))
                          
        p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))
        p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) 
    
        p[:,0] = 0		##p = 0 @ x = 0
        p[:,-1] = y		##p = y @ x = 2
        p[0,:] = p[1,:]		##dp/dy = 0 @ y = 0
        p[-1,:] = p[-2,:]	##dp/dy = 0 @ y = 1
        l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))
     
    return p

In [30]:
%%timeit
laplace2d_vanilla(p, y, dx, dy, .00001)

10 loops, best of 3: 32 ms per loop


The simple Python version takes 32 *milli*seconds to complete.  Let's calculate the speedup we gained by using array operations:

In [35]:
32*1e-3/(206*1e-6)

155.33980582524273

So NumPy gives us a 155x speed increase over regular Python code.  That said, implementing our discretizations as array operations can sometimes be a little tricky.
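
One way to guard against mistakes when moving between the two forms is to compare a single sweep written with array slices against the same sweep written with nested loops. The sketch below is illustrative only: the helper names `sweep_array` and `sweep_loops`, the random test field, and the grid spacings are assumptions, not part of the original notebook.

```python
import numpy


def sweep_array(pn, dx, dy):
    # One update of the interior points using array slices.
    p = pn.copy()
    p[1:-1, 1:-1] = (dy**2 * (pn[2:, 1:-1] + pn[0:-2, 1:-1]) +
                     dx**2 * (pn[1:-1, 2:] + pn[1:-1, 0:-2])) / (2 * (dx**2 + dy**2))
    return p


def sweep_loops(pn, dx, dy):
    # The same update written point by point.
    p = pn.copy()
    ny, nx = pn.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            p[i, j] = (dy**2 * (pn[i + 1, j] + pn[i - 1, j]) +
                       dx**2 * (pn[i, j + 1] + pn[i, j - 1])) / (2 * (dx**2 + dy**2))
    return p


rng = numpy.random.default_rng(0)   # fixed seed so the check is repeatable
pn = rng.random((6, 7))             # deliberately non-square test field
same = numpy.allclose(sweep_array(pn, 0.1, 0.2), sweep_loops(pn, 0.1, 0.2))
print(same)
```

Using a non-square field and unequal spacings makes swapped indices or a flipped sign show up as a mismatch.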

Let's see what Numba can do.  We'll start by importing the special function decorator `autojit` from the `numba` library (newer Numba releases have dropped `autojit` in favor of `jit`, which is applied the same way):

In [36]:
from numba import autojit

To integrate Numba with our existing function, all we have to do is add the `@autojit` function decorator before our `def` statement:

In [38]:
@autojit
def laplace2d_numba(p, y, dx, dy, l1norm_target):
    l1norm = 1
    pn = numpy.empty_like(p)

    while l1norm > l1norm_target:
        pn = p.copy()
        p[1:-1,1:-1] = (dy**2*(pn[2:,1:-1]+pn[0:-2,1:-1])+dx**2*(pn[1:-1,2:]+pn[1:-1,0:-2]))/(2*(dx**2+dy**2)) 
        p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))
        p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) 
    
        p[:,0] = 0		##p = 0 @ x = 0
        p[:,-1] = y		##p = y @ x = 2
        p[0,:] = p[1,:]		##dp/dy = 0 @ y = 0
        p[-1,:] = p[-2,:]	##dp/dy = 0 @ y = 1
        l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))
     
    return p

The only lines that have changed are the `@autojit` line and the function name, which was changed so we can compare performance.  Now let's see what happens:

In [39]:
%%timeit
laplace2d_numba(p, y, dx, dy, .00001)

1 loops, best of 3: 137 us per loop


Ok!  So it's not a 155x speed increase like we saw between vanilla Python and NumPy, but going from 206 to 137 microseconds is roughly a 1.5x speedup, a non-trivial gain given how easy it was to implement.  Another cool feature of Numba is that you can use the `@autojit` decorator on functions that don't use array operations, too.  Let's try adding it to our vanilla version:

In [41]:
@autojit
def laplace2d_vanilla_numba(p, y, dx, dy, l1norm_target):
    l1norm = 1
    pn = numpy.empty_like(p)
    nx, ny = len(y), len(y)

    while l1norm > l1norm_target:
        pn = p.copy()
        
        for i in range(1, nx-1):
            for j in range(1, ny-1):
                p[i,j] = (dy**2*(pn[i+1,j]+pn[i-1,j])+dx**2*(pn[i,j+1]+pn[i,j-1]))/(2*(dx**2+dy**2))
                          
        p[0,0] = (dy**2*(pn[1,0]+pn[-1,0])+dx**2*(pn[0,1]+pn[0,-1]))/(2*(dx**2+dy**2))
        p[-1,-1] = (dy**2*(pn[0,-1]+pn[-2,-1])+dx**2*(pn[-1,0]+pn[-1,-2]))/(2*(dx**2+dy**2)) 
    
        p[:,0] = 0		##p = 0 @ x = 0
        p[:,-1] = y		##p = y @ x = 2
        p[0,:] = p[1,:]		##dp/dy = 0 @ y = 0
        p[-1,:] = p[-2,:]	##dp/dy = 0 @ y = 1
        l1norm = (numpy.sum(numpy.abs(p[:])-numpy.abs(pn[:])))/numpy.sum(numpy.abs(pn[:]))
     
    return p

In [42]:
%%timeit
laplace2d_vanilla_numba(p, y, dx, dy, .00001)

1 loops, best of 3: 561 us per loop


561 microseconds.  That's about a 57x speedup over the vanilla version: not quite the 155x increase we saw with NumPy, but a large gain all the same.  And all we did was add one line of code.
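
For readers on a recent Numba release, the following is a minimal sketch of the same idea using `njit` in place of `autojit`. The helper `jacobi_sweep`, the linear test field, and the fallback decorator are assumptions for illustration. Because Numba compiles a function the first time it is called, the sketch times the first and second calls separately.

```python
import time

import numpy

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python when Numba is unavailable.
    def njit(func):
        return func


@njit
def jacobi_sweep(p, pn, dx, dy):
    # One nested-loop update of the interior points.
    ny, nx = p.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            p[i, j] = (dy**2 * (pn[i + 1, j] + pn[i - 1, j]) +
                       dx**2 * (pn[i, j + 1] + pn[i, j - 1])) / (2 * (dx**2 + dy**2))
    return p


nx, ny = 81, 81
dx = 2.0 / (nx - 1)
dy = 2.0 / (ny - 1)
# A field that is linear in x already satisfies the discrete Laplace equation,
# so one sweep should leave it unchanged.
pn = numpy.tile(numpy.linspace(0, 2, nx), (ny, 1))
p = pn.copy()

start = time.perf_counter()
jacobi_sweep(p, pn, dx, dy)          # first call: includes compilation
first_call = time.perf_counter() - start

start = time.perf_counter()
jacobi_sweep(p, pn, dx, dy)          # second call: compiled code only
second_call = time.perf_counter() - start

print("first call :", first_call)
print("second call:", second_call)
```

With Numba installed, the first call is usually much slower than the second, so a warm-up call before benchmarking gives a fairer number.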

So we have:

Vanilla Python: 32 milliseconds 

NumPy Python: 206 microseconds 

Vanilla + Numba: 561 microseconds

NumPy + Numba:  137 microseconds

Clearly the NumPy + Numba combination is the fastest, but the ability to quickly optimize code with nested loops can also come in very handy in certain applications.
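
A note for anyone reproducing these timings: the solver functions update `p` in place, so repeated calls on the same array start from an already converged field and may exit after very few iterations. The sketch below illustrates this with a hypothetical variant, `laplace2d_count`, that returns the number of iterations. The smaller grid and looser tolerance are assumptions chosen to keep the example quick.

```python
import numpy


def laplace2d_count(p, y, dx, dy, l1norm_target):
    # Same iteration as laplace2d, but returns how many sweeps were needed.
    l1norm = 1.0
    iterations = 0
    while l1norm > l1norm_target:
        pn = p.copy()
        p[1:-1, 1:-1] = (dy**2 * (pn[2:, 1:-1] + pn[0:-2, 1:-1]) +
                         dx**2 * (pn[1:-1, 2:] + pn[1:-1, 0:-2])) / (2 * (dx**2 + dy**2))
        p[:, 0] = 0            # p = 0 @ x = 0
        p[:, -1] = y           # p = y @ x = 2
        p[0, :] = p[1, :]      # dp/dy = 0 @ y = 0
        p[-1, :] = p[-2, :]    # dp/dy = 0 @ y = 1
        l1norm = numpy.sum(numpy.abs(p) - numpy.abs(pn)) / numpy.sum(numpy.abs(pn))
        iterations += 1
    return iterations


nx, ny = 21, 21
dx = 2.0 / (nx - 1)
dy = 2.0 / (ny - 1)
y = numpy.linspace(0, 1, ny)

p = numpy.zeros((ny, nx))
p[:, 0] = 0
p[:, -1] = y
p[0, :] = p[1, :]
p[-1, :] = p[-2, :]
p_initial = p.copy()

first = laplace2d_count(p, y, dx, dy, 1e-4)                  # starts from the initial field
second = laplace2d_count(p, y, dx, dy, 1e-4)                 # starts from the converged field
fresh = laplace2d_count(p_initial.copy(), y, dx, dy, 1e-4)   # fresh copy each time

print("first call :", first, "iterations")
print("second call:", second, "iterations")
print("fresh copy :", fresh, "iterations")
```

Passing a fresh copy of the initial field on every timed call keeps each run solving the same problem.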



In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

> (The cell above executes the style for this notebook. We modified a style we found on the GitHub of [CamDavidsonPilon](https://github.com/CamDavidsonPilon), [@Cmrn_DP](https://twitter.com/cmrn_dp).)