# Array computation with Bodo

Bodo allows for JIT-compiled and MPI-distributed execution of NumPy- and pandas-based programs.

First, we'll import what we need.

In [1]:
import bodo
import numpy as np
import pandas as pd
import ipyparallel as ipp
import numba
c = ipp.Client(profile="mpi")
view = c[:]
view.activate()
view.block = True
import os
view["cwd"] = os.getcwd()
%px cd $cwd

[stdout:0] /home/dale/Documents/bodo-benchmarks/notebooks
[stdout:1] /home/dale/Documents/bodo-benchmarks/notebooks
[stdout:2] /home/dale/Documents/bodo-benchmarks/notebooks
[stdout:3] /home/dale/Documents/bodo-benchmarks/notebooks


Since Bodo extends Numba, even without any parallelism we can see some acceleration for basic loop-based code. Here we have a basic matrix multiply function, and the timings below show that Bodo performs similarly to Numba.

In [2]:
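def matmul(a, b):
    # Naive triple-loop matrix multiply: out[i, k] = sum over j of a[i, j] * b[j, k]
    out = np.zeros((a.shape[0], b.shape[1]), dtype=a.dtype)
    for i in range(a.shape[0]):          # rows of a
        for j in range(b.shape[0]):      # shared inner dimension
            for k in range(b.shape[1]):  # columns of b
                out[i, k] += a[i, j] * b[j, k]
    return out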

numba_matmul = numba.jit(matmul)
bodo_matmul = bodo.jit(matmul)

In [3]:
a = np.random.random((100,100))
b = np.random.random((100,100))

In [4]:
np.allclose(a.dot(b), matmul(a,b))

True
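
The check above uses square inputs, which cannot reveal a mistake in the loop bounds. Below is a minimal sketch in plain NumPy that repeats the comparison on non-square shapes. The helper name matmul_loops, the shapes, and the seed are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def matmul_loops(a, b):
    # Hypothetical helper: naive triple-loop product,
    # out[i, k] = sum over j of a[i, j] * b[j, k]
    assert a.shape[1] == b.shape[0], "inner dimensions must match"
    out = np.zeros((a.shape[0], b.shape[1]), dtype=a.dtype)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(b.shape[1]):
                out[i, k] += a[i, j] * b[j, k]
    return out

rng = np.random.default_rng(0)
a_rect = rng.random((3, 5))
b_rect = rng.random((5, 2))

result = matmul_loops(a_rect, b_rect)
# Compare against NumPy's own product on the same non-square inputs.
print(result.shape, np.allclose(result, a_rect.dot(b_rect)))
```

The same function could be passed to numba.jit or bodo.jit as in the cell above; that step is left out here so the sketch runs with NumPy alone.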

In [5]:
%timeit out = matmul(a,b)

603 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [6]:
%timeit out = numba_matmul(a,b)

202 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [7]:
%timeit out = bodo_matmul(a,b)

198 µs ± 26.5 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)




While Bodo is certainly effective at accelerating algorithmic code, it is designed for use in standard analysis pipelines. Here we generate a random array, compute the sum along the first axis, and take a slice of the result.

In [8]:
%%time

# using Numpy
x = np.random.normal(10, 0.1, size=(8000, 8000))
y = x.sum(axis=0)[::100]

CPU times: user 1.39 s, sys: 39.3 ms, total: 1.43 s
Wall time: 1.43 s


In [9]:
%%time

# using Bodo
x = np.random.normal(10, 0.1, size=(8000, 8000))

@bodo.jit
def array_sum(x):    
    y = x.sum(axis=0)[::100]
array_sum(x)



CPU times: user 2.02 s, sys: 36 ms, total: 2.06 s
Wall time: 2.05 s


As we can see from the warning, we didn't get any parallelism from this function. To enable MPI inside a Jupyter notebook we need to use the %%px cell magic. Also, in the JIT decorator we can tell Bodo which objects we would like to distribute. In this case, we'll distribute the array 'x'.

In [10]:
%%px
%%time


import numpy as np
import bodo

x = np.random.normal(10, 0.1, size=(8000, 8000))

@bodo.jit(distributed=['x'])
def array_sum(x):
    y = x.sum(axis=0)[::100]
array_sum(x)

[stdout:0] 
CPU times: user 2.38 s, sys: 70.6 ms, total: 2.45 s
Wall time: 2.45 s
[stdout:1] 
CPU times: user 2.4 s, sys: 37 ms, total: 2.44 s
Wall time: 2.44 s
[stdout:2] 
CPU times: user 2.45 s, sys: 70 ms, total: 2.52 s
Wall time: 2.54 s
[stdout:3] 
CPU times: user 2.57 s, sys: 41.9 ms, total: 2.61 s
Wall time: 2.61 s
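
To make the idea of distributing 'x' concrete, here is a minimal sketch in plain NumPy, with no MPI or Bodo involved. It assumes the array is split into contiguous row blocks, one per process. How Bodo actually partitions data and communicates is not shown in this notebook, so the splitting scheme and the names n_ranks, row_blocks, and partial_sums are illustrative assumptions. Each simulated process sums its own rows, and the partial results are then added together, which is the role a reduction would play across MPI ranks.

```python
import numpy as np

n_ranks = 4  # matches the four engines in the output above
rng = np.random.default_rng(0)
x = rng.normal(10, 0.1, size=(800, 800))  # smaller than the notebook's array

# Assumed layout: each simulated rank owns a contiguous block of rows.
row_blocks = np.array_split(x, n_ranks, axis=0)

# Each rank computes the column sums of its own block only.
partial_sums = [block.sum(axis=0) for block in row_blocks]

# Adding the partial sums stands in for a cross-rank reduction.
y_combined = np.sum(partial_sums, axis=0)[::100]

# The serial computation from the NumPy cell, for comparison.
y_serial = x.sum(axis=0)[::100]
print(np.allclose(y_combined, y_serial))
```

A sum along axis 0 suits this layout because each block contributes an independent partial result, and only one vector per process has to be combined at the end.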


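Bodo supports a large portion of the NumPy ndarray API. Here we can see how it performs with a matrix multiply followed by an elementwise multiply. Note that the Bodo cell is timed with %%time around the function's first call, so its wall time includes JIT compilation.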

In [15]:
%%time
A = np.random.random((4000,4000))
A.dot(A.T) * 5

CPU times: user 1.42 s, sys: 36.1 ms, total: 1.45 s
Wall time: 1.48 s


array([[6613.10857233, 4985.22299454, 4967.0096968 , ..., 4944.15787411,
        4962.45570918, 5024.09110694],
       [4985.22299454, 6641.94719669, 4980.24049265, ..., 4966.06562832,
        4954.87059481, 5048.20910047],
       [4967.0096968 , 4980.24049265, 6618.30575987, ..., 4943.68686453,
        4960.93817956, 5035.38475147],
       ...,
       [4944.15787411, 4966.06562832, 4943.68686453, ..., 6676.50723882,
        4958.39803417, 5008.26384702],
       [4962.45570918, 4954.87059481, 4960.93817956, ..., 4958.39803417,
        6609.38072068, 5001.54931209],
       [5024.09110694, 5048.20910047, 5035.38475147, ..., 5008.26384702,
        5001.54931209, 6771.24013721]])

In [17]:
%%px
%%time

import numpy as np
import bodo

@bodo.jit
def dot():
    A = np.random.random((4000,4000))
    return A.dot(A.T) * 5
dot()

[stdout:0] 
CPU times: user 4.48 s, sys: 148 ms, total: 4.63 s
Wall time: 4.62 s
[stdout:1] 
CPU times: user 4.55 s, sys: 127 ms, total: 4.68 s
Wall time: 4.66 s
[stdout:2] 
CPU times: user 4.53 s, sys: 111 ms, total: 4.64 s
Wall time: 4.62 s
[stdout:3] 
CPU times: user 4.59 s, sys: 131 ms, total: 4.72 s
Wall time: 4.7 s


Out[0:13]:
array([[6549.31493325, 4934.71881109, 4854.00281397, ..., 4943.82146498,
        5013.17831785, 4879.60686016],
       [4934.71881109, 6690.74829348, 4942.71338107, ..., 5051.67575504,
        5017.53486812, 4943.84624566],
       [4854.00281397, 4942.71338107, 6576.72292076, ..., 4997.46079528,
        5008.23689438, 4926.81070032],
       ...,
       [4943.82146498, 5051.67575504, 4997.46079528, ..., 6749.56177356,
        5105.19363168, 5004.15213347],
       [5013.17831785, 5017.53486812, 5008.23689438, ..., 5105.19363168,
        6793.5573002 , 5031.60800397],
       [4879.60686016, 4943.84624566, 4926.81070032, ..., 5004.15213347,
        5031.60800397, 6601.60650939]])

Out[1:13]:
array([[6561.94382376, 5050.62079619, 4928.85175852, ..., 4860.17007196,
        4971.20655104, 4879.67868334],
       [5050.62079619, 6828.79771849, 5166.43391521, ..., 5066.12086631,
        5097.69170661, 5034.09766605],
       [4928.85175852, 5166.43391521, 6712.05720872, ..., 5011.95407599,
        4958.44056326, 5018.71360619],
       ...,
       [4860.17007196, 5066.12086631, 5011.95407599, ..., 6558.52038069,
        4995.72217924, 4945.14494714],
       [4971.20655104, 5097.69170661, 4958.44056326, ..., 4995.72217924,
        6679.83092895, 4959.98391125],
       [4879.67868334, 5034.09766605, 5018.71360619, ..., 4945.14494714,
        4959.98391125, 6605.17063217]])

Out[2:13]:
array([[6638.78930873, 4954.86227183, 4929.13692585, ..., 4929.40446717,
        4986.7108257 , 4988.89217535],
       [4954.86227183, 6569.7342018 , 4919.06699744, ..., 4933.28317015,
        4987.33531995, 4881.66082614],
       [4929.13692585, 4919.06699744, 6646.73912693, ..., 4864.80317895,
        5004.22205429, 4940.04506403],
       ...,
       [4929.40446717, 4933.28317015, 4864.80317895, ..., 6562.06352389,
        4975.00797189, 4922.02962773],
       [4986.7108257 , 4987.33531995, 5004.22205429, ..., 4975.00797189,
        6724.50281105, 4979.87894362],
       [4988.89217535, 4881.66082614, 4940.04506403, ..., 4922.02962773,
        4979.87894362, 6601.58659428]])

Out[3:13]:
array([[6458.97497218, 4933.51340083, 4887.48950067, ..., 4896.80684146,
        4901.90465838, 4936.62178744],
       [4933.51340083, 6764.10891506, 5051.44551924, ..., 5029.93870267,
        5005.24567035, 5090.85666185],
       [4887.48950067, 5051.44551924, 6607.58719422, ..., 4931.07005764,
        4965.65978352, 5005.98816171],
       ...,
       [4896.80684146, 5029.93870267, 4931.07005764, ..., 6633.90954576,
        4920.26418979, 5024.89828036],
       [4901.90465838, 5005.24567035, 4965.65978352, ..., 4920.26418979,
        6616.47527018, 5034.33736194],
       [4936.62178744, 5090.85666185, 5005.98816171, ..., 5024.89828036,
        5034.33736194, 6724.25547082]])