<a href="https://colab.research.google.com/github/applejxd/colaboratory/blob/master/numba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[An example of a heavy computation with numpy](https://qiita.com/gyu-don/items/9d223b007ca620e95abc)

In [8]:
import sys
sys.setrecursionlimit(100000)

def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

Measure the baseline computation time in plain Python.

In [9]:
import time
from contextlib import contextmanager

@contextmanager
def timer():
    t = time.perf_counter()
    yield None
    print('Elapsed:', time.perf_counter() - t)

with timer():
    ack(3, 10)

Elapsed: 8.20762064400003
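
A single wall-clock measurement like the one above can be noisy. As a rough sketch (not part of the original notebook), the standard library's `timeit.repeat` can run the call several times and report the fastest run. The helper name `best_of` and the small input `ack(2, 3)` are assumptions chosen only to keep the example quick.

```python
import sys
import timeit

sys.setrecursionlimit(100000)


def ack(m, n):
    """Ackermann function, same definition as in the cell above."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))


def best_of(func, *args, repeat=5):
    """Hypothetical helper: fastest of `repeat` single runs, in seconds."""
    times = timeit.repeat(lambda: func(*args), repeat=repeat, number=1)
    return min(times)


# A small input keeps the sketch fast; swap in ack(3, 10) to mirror the notebook.
print('Best of 5:', best_of(ack, 2, 3))
```

Taking the minimum rather than the mean is a common choice here, since slower runs mostly reflect interference from other processes.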


Using numba's nopython mode (njit)

In [10]:
from numba import njit

@njit
def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

# First call: elapsed time includes JIT compilation
with timer():
    ack(3, 10)

# Second call: compilation is cached, so only execution is timed
with timer():
    ack(3, 10)

Elapsed: 0.5134473350000235
Elapsed: 0.30062071299994386
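
The gap between the two timings above is the one-time JIT compilation cost. The sketch below shows the same first-call versus second-call pattern on a simpler loop. The function `triangular`, the helper `time_call`, and the fallback decorator used when numba is not installed are all illustrative assumptions, not part of the notebook.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without numba.
    def njit(func):
        return func


@njit
def triangular(n):
    """Sum of the integers 0..n, written as an explicit loop."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def time_call(func, *args):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# With numba installed, the first call compiles for the argument types it sees.
first_result, first_elapsed = time_call(triangular, 100_000)
# The second call with the same argument types reuses the compiled code.
second_result, second_elapsed = time_call(triangular, 100_000)

print('First call :', first_elapsed)
print('Second call:', second_elapsed)
```

Compilation happens once per argument-type signature, so calling the function with a different type (for example a float) would trigger another compile.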


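[Performance tips](https://numba.readthedocs.io/en/stable/user/performance-tips.html) from the numba documentation.
The cell below tries two of them: parallelization (`parallel=True` with `prange`) and `fastmath`.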

In [11]:
import numpy as np
from numba import prange


@njit
def sum_of_squares(arr):
    s = 0
    for i in range(arr.shape[0]):
        s += arr[i] ** 2
    return s

@njit(parallel=True)
def sum_of_squares_parallel(arr):
    s = 0
    for i in prange(arr.shape[0]):
        s += arr[i] ** 2
    return s

@njit(parallel=True, fastmath=True)
def sum_of_squares_fast(arr):
    s = 0
    for i in prange(arr.shape[0]):
        s += arr[i] ** 2
    return s

arr = np.random.randn(1000000)

# Warm-up call: triggers JIT compilation so the timed call below excludes it
sum_of_squares(arr)
with timer():
    sum_of_squares(arr)

sum_of_squares_parallel(arr)
with timer():
    sum_of_squares_parallel(arr)

sum_of_squares_fast(arr)
with timer():
    sum_of_squares_fast(arr)

Elapsed: 0.001429400999995778
Elapsed: 0.0007882520000066506
Elapsed: 0.0005882750000409942
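
Both options can change the order in which floating-point terms are added (a parallel reduction splits the loop across threads, and `fastmath` lets the compiler reassociate operations), so the result may differ from a sequential sum in the last few bits. A minimal sketch of a sanity check against `np.dot` follows. The tolerance, the array size, and the fallback used when numba is not installed are assumptions.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without numba.
    prange = range

    def njit(*args, **kwargs):
        # Support both the bare @njit form and the @njit(...) form.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True, fastmath=True)
def sum_of_squares_fast(arr):
    s = 0.0
    for i in prange(arr.shape[0]):
        s += arr[i] ** 2
    return s


rng = np.random.default_rng(0)
arr = rng.standard_normal(10_000)

reference = float(np.dot(arr, arr))   # NumPy reference value
result = sum_of_squares_fast(arr)

print('Relative difference:', abs(result - reference) / reference)
```

Comparing with a relative tolerance rather than `==` is the safer check here, because exact bitwise equality is not guaranteed once the summation order changes.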


In [17]:
from numba import cuda
import numpy as np
import sys
sys.setrecursionlimit(100000)


# Kernel function
@cuda.jit
def add_kernel(a, b, c):
    i = cuda.grid(1)
    # Guard against surplus threads when the grid is larger than the array
    if i < c.size:
        c[i] = a[i] + b[i]

# Launch function (host side)
def add_arrays(a, b):
    # Compute the number of GPU blocks to launch
    threads_per_block = 128
    blocks = (a.size + threads_per_block - 1) // threads_per_block

    # Allocate device memory for the result
    result = cuda.to_device(np.zeros_like(a))

    add_kernel[blocks, threads_per_block](
        cuda.to_device(a), cuda.to_device(b), result)
    return result

array_size = 100000000
a = np.ones(array_size, dtype=np.float32)
b = np.ones(array_size, dtype=np.float32)

with timer():
    a + b

add_arrays(a, b)
with timer():
    add_arrays(a, b)

Elapsed: 0.10005498299983628
Elapsed: 0.3054407480001373
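
The block count in `add_arrays` is a ceiling division, so the grid can contain more threads than there are array elements. The pure-Python sketch below (no GPU needed) reproduces that arithmetic. The helper name `launch_config` is a hypothetical name introduced for illustration.

```python
def launch_config(n_elements, threads_per_block=128):
    """Hypothetical helper: ceiling division, as in add_arrays above."""
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


for n in (1000, 100_000_000):
    blocks, threads = launch_config(n)
    surplus = blocks * threads - n  # threads with no element to process
    print(f'n={n}: blocks={blocks}, surplus threads={surplus}')
```

For `n = 1000` the grid has 24 surplus threads, which is why CUDA kernels usually guard the index with a check such as `if i < c.size`. The notebook's size of 100000000 happens to divide evenly by 128, so no surplus threads arise there. Note also that the timed `add_arrays` call includes the host-to-device copies made by `cuda.to_device`, which likely accounts for a good part of the gap to `a + b`.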


In [13]:
from numba import cuda

@njit(parallel=True, fastmath=True)
def sum_of_squares_fast(arr):
    s = 0
    for i in prange(arr.shape[0]):
        s += arr[i] ** 2
    return s

In [14]:
from concurrent import futures

# NOTE: test_func is not defined anywhere in this notebook, so this cell raises NameError
with futures.ProcessPoolExecutor() as executor:
    results = executor.map(test_func, np.arange(100))

print(list(results))

NameError: name 'test_func' is not defined
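
The cell fails because `test_func` was never defined. As a sketch of what a working version might look like, the example below maps a module-level function over a range with `ProcessPoolExecutor`. The function `square` is a hypothetical stand-in for `test_func`, and `run_parallel` with its `executor_cls` parameter is an assumption added so the same code can also be tried with threads.

```python
from concurrent import futures


def square(x):
    """Hypothetical stand-in for the undefined test_func."""
    return x * x


def run_parallel(func, values, executor_cls=futures.ProcessPoolExecutor):
    """Map func over values with the given executor; results keep input order."""
    with executor_cls() as executor:
        return list(executor.map(func, values))


if __name__ == '__main__':
    # The guard matters on platforms that start workers by re-importing this module.
    print(run_parallel(square, range(10)))
```

The worker function is defined at module level because process pools send it to the workers by pickling, which does not work for lambdas or locally nested functions.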