# `uarray` NumPy Compatibility

In [1]:
from uarray import *
import numpy as np
from numba import njit

## Original Expression

Let's look at this simple NumPy expression, which takes the outer product of two arrays and then indexes the result:

In [2]:
def some_fn(a, b):
    return np.multiply.outer(a, b)[5]

We can see that this does a lot of extra work, since we discard most of the results of the outer product after indexing. We can look at the time:

In [3]:
args = [np.arange(1000), np.arange(10)]

In [4]:
# NBVAL_IGNORE_OUTPUT
%timeit some_fn(*args)

24.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
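To make the wasted work concrete, here is a small sketch using the same inputs as above. It only uses plain NumPy; the variable names `full`, `row`, and `direct` are ours, chosen for illustration. The outer product builds a 1000 by 10 array, yet indexing keeps a single row of 10 values, which could have been computed directly as `a[5] * b`.

```python
import numpy as np

a = np.arange(1000)
b = np.arange(10)

# The full outer product: every a[i] * b[j], 10,000 products in total.
full = np.multiply.outer(a, b)

# Indexing keeps one row; the other 999 rows are thrown away.
row = full[5]

# The same 10 values, computed without building the full array.
direct = a[5] * b

print(full.shape, row.shape)
print(np.array_equal(row, direct))
```

This is the rewrite we would like to get automatically, without changing how `some_fn` is written.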


## Uarray reduced

Now let's use uarray's `optimize` decorator to create an updated function that specifies the dimensionality of the arrays, via their shapes, to produce an optimized form:

In [5]:
# enable_logging()

In [6]:
optimized_some_fn = optimize(args[0].shape, args[1].shape)(some_fn)

Now let's try our function out to see if it's faster:

In [7]:
# NBVAL_IGNORE_OUTPUT
%timeit optimized_some_fn(*args)

5.63 µs ± 57.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Yep, about 4x as fast in this run (24.6 µs down to 5.63 µs). Let's look at how this is done! First, we create an abstract representation of the array operations:

In [8]:
optimized_some_fn.__optimize_steps__['resulting_expr']

Index(
    Sequence(Int(1), Vector(Scalar(Int(5)))),
    OuterProduct(
        BinaryUfunc(np.ufunc(multiply)),
        Sequence(
            Int(1000),
            UnaryFunction(
                Scalar(
                    Content(CallUnary(GetItem(NPArray(Expression(Name("a", Load())))), Unbound(variable_name="i2")))
                ),
                Unbound(variable_name="i2"),
            ),
        ),
        Sequence(
            Int(10),
            UnaryFunction(
                Scalar(
                    Content(CallUnary(GetItem(NPArray(Expression(Name("b", Load())))), Unbound(variable_name="i3")))
                ),
                Unbound(variable_name="i3"),
            ),
        ),
    ),
)


Then, we compile that to a Python AST, shown here as source:

In [9]:
print(optimized_some_fn.__optimize_steps__['ast_as_source'])



def fn(a, b):
    i_5 = ()
    i_6 = 10
    i_1 = ((i_6,) + i_5)
    i_0 = np.empty(i_1)
    i_2 = 10
    for i_3 in range(i_2):
        i_4 = i_0[i_3]
        i_9 = 5
        i_10 = a
        i_13 = i_10[i_9]
        i_11 = i_3
        i_12 = b
        i_14 = i_12[i_11]
        i_4 = (i_13 * i_14)
        i_0[i_3] = i_4
    return i_0
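The generated loop only ever reads `a[5]`, so we can check it without uarray. The sketch below is a hand-simplified version of the source above, compared against the original NumPy expression. The name `fn_simplified` is ours, not something uarray produces. One thing to be aware of: the output buffer comes from `np.empty` with its default dtype, so the result is `float64` even for integer inputs.

```python
import numpy as np


def fn_simplified(a, b):
    # Hand-simplified from the generated source: the temporaries are
    # inlined, but the structure (allocate, loop, multiply) is the same.
    out = np.empty((10,))
    for i in range(10):
        out[i] = a[5] * b[i]
    return out


a = np.arange(1000)
b = np.arange(10)

result = fn_simplified(a, b)
expected = np.multiply.outer(a, b)[5]

# Values agree with the original expression; only the dtype differs.
print(np.array_equal(result, expected))
print(result.dtype, expected.dtype)
```

Note that the length `10` is baked into the code, which is why the shapes have to be known at optimization time.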



## Numba optimized

To give this an extra speed boost, we can compile the returned expression with Numba:

In [10]:
numba_optimized = njit(optimized_some_fn)

In [11]:
# NBVAL_IGNORE_OUTPUT
# run once first to compile
numba_optimized(*args)
                
%timeit numba_optimized(*args)

809 ns ± 19.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


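Great, another speedup! In this run that is roughly 7x faster than the optimized version and about 30x faster than the original NumPy expression.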
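For reference, the ratios between the three timings reported above can be worked out directly. This sketch only does arithmetic on the numbers printed in this notebook; the timings will differ on other machines, and the variable names are ours.

```python
# Mean times per loop as reported by %timeit above, in nanoseconds.
numpy_ns = 24.6e3      # some_fn
optimized_ns = 5.63e3  # optimized_some_fn
numba_ns = 809.0       # numba_optimized

uarray_speedup = numpy_ns / optimized_ns
numba_speedup = optimized_ns / numba_ns
total_speedup = numpy_ns / numba_ns

print(f"uarray vs NumPy:  {uarray_speedup:.1f}x")
print(f"Numba vs uarray:  {numba_speedup:.1f}x")
print(f"Numba vs NumPy:   {total_speedup:.1f}x")
```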

## Unknown dimensionality?

What if we want to produce a version of the function that works on input of any dimensionality? Or what if we just want to defer to NumPy's implementation and not replace `outer`? We simply omit the shape arguments to `optimize`, and we get back an abstract representation that is compiled without any knowledge of the dimensionality:

In [12]:
dims_not_known = optimize(some_fn)

In [13]:
dims_not_known.__optimize_steps__['resulting_expr']

Index(
    Sequence(Int(1), Vector(Scalar(Int(5)))),
    OuterProduct(
        BinaryUfunc(np.ufunc(multiply)), NPArray(Expression(Name("a", Load()))), NPArray(Expression(Name("b", Load())))
    ),
)


In [14]:
print(dims_not_known.__optimize_steps__['ast_as_source'])



def fn(a, b):
    i_18 = 5
    i_16 = a
    i_17 = b
    i_19 = np.multiply.outer(i_16, i_17)
    i_15 = i_19[i_18]
    return i_15

