# Defining `ufuncs` using `vectorize`

You have been able to define your own NumPy [`ufuncs`](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) for quite some time, but it's a little involved.  

You can read through the [documentation](http://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html); the example given there is a ufunc that computes

$$f(a) = \log \left(\frac{a}{1-a}\right)$$

It looks like this:

```c
static void double_logit(char **args, npy_intp *dimensions,
                            npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(double *)in;
        tmp /= 1-tmp;
        *((double *)out) = log(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}
```

And **note**: that's just the loop for `double`.  If you want `float`, `long double`, etc., you have to write each of those, too, and then create a `setup.py` file to build and install it.  I've also left out a bunch of boilerplate that sets up the module and registers the ufunc with NumPy.

# Making your first ufunc

We can use Numba to define ufuncs without all of the pain.

In [None]:
import numpy
import math

Let's define a function that operates on two inputs.

In [None]:
def trig(a, b):
    return math.sin(a**2) * math.exp(b)

In [None]:
trig(1, 1)

Seems reasonable.  However, the `math` library only works on scalars.  If we try to pass in arrays, we'll get an error.

In [None]:
a = numpy.ones((5,5))
b = numpy.ones((5,5))

In [None]:
trig(a, b)

In [None]:
from numba import vectorize

In [None]:
vec_trig = vectorize()(trig)

In [None]:
vec_trig(a, b)

And just like that, the scalar function `trig` is now a NumPy `ufunc` called `vec_trig`.

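Note that this is a "Dynamic UFunc": because no signature was given, Numba compiles a loop lazily, the first time the ufunc is called with a new combination of input types.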

How does it compare to just using NumPy?  Let's check.

In [None]:
def numpy_trig(a, b):
    return numpy.sin(a**2) * numpy.exp(b)

In [None]:
a = numpy.random.random((1000, 1000))
b = numpy.random.random((1000, 1000))

In [None]:
%timeit vec_trig(a, b)

In [None]:
%timeit numpy_trig(a, b)

What happens if we do specify a signature?  Is there a speed boost?

In [None]:
vec_trig = vectorize('float64(float64, float64)')(trig)

In [None]:
%timeit vec_trig(a, b)

No, not really.  But(!) if we have a signature, then we can add the `target` keyword argument.

In [None]:
vec_trig = vectorize('float64(float64, float64)', target='parallel')(trig)

In [None]:
%timeit vec_trig(a, b)

Automatic multicore operations!

**Note**: `target='parallel'` is not always the best option.  There is overhead in setting up the threading, so if the individual scalar operations that make up a `ufunc` are simple you'll probably get better performance in serial.  If the individual operations are more expensive (like trig!) then parallel is (usually) a good option.

### Passing multiple signatures

If you use multiple signatures, they have to be listed in order from most specific to least specific.

In [None]:
@vectorize(['int32(int32, int32)',
            'int64(int64, int64)',
            'float32(float32, float32)',
            'float64(float64, float64)'])
def trig(a, b):
    return math.sin(a**2) * math.exp(b)

In [None]:
trig(1, 1)

In [None]:
trig(1., 1.)

In [None]:
trig.ntypes

## [Exercise: Clipping an array](./exercises/07.Vectorize.Exercises.ipynb#Exercise:-Clipping-an-array)

Yes, NumPy already has a `clip` function, but let's pretend it doesn't.  

Create a Numba vectorized ufunc that takes a vector `a`, a lower limit `amin` and an upper limit `amax`.  It should return the vector `a` with all values clipped such that $a_{min} \le a \le a_{max}$:

In [None]:
# %load snippets/clip.py

In [None]:
a = numpy.random.random((5000))

In [None]:
amin = .2
amax = .6

In [None]:
%timeit vec_truncate_serial(a, amin, amax)

In [None]:
%timeit vec_truncate_par(a, amin, amax)

In [None]:
%timeit numpy.clip(a, amin, amax)

In [None]:
a = numpy.random.random((100000))

In [None]:
%timeit vec_truncate_serial(a, amin, amax)

In [None]:
%timeit vec_truncate_par(a, amin, amax)

In [None]:
%timeit numpy.clip(a, amin, amax)

## [Exercise: Create `logit` ufunc](./exercises/07.Vectorize.Exercises.ipynb#Exercise:-Create-logit-ufunc)

Recall from above that this is a ufunc which performs this operation:

$$f(a) = \log \left(\frac{a}{1-a}\right)$$

In [None]:
# %load snippets/logit.py

In [None]:
logit(a)

## Performance of `vectorize` vs. regular array-wide operations

In [None]:
@vectorize
def discriminant(a, b, c):
    return b**2 - 4 * a * c

In [None]:
a = numpy.arange(10000)
b = numpy.arange(10000)
c = numpy.arange(10000)

In [None]:
%timeit discriminant(a, b, c)

In [None]:
%timeit b**2 - 4 * a * c

What's going on?

* Each array operation creates a temporary array to hold its intermediate result
* Each of these arrays is loaded into and out of cache repeatedly
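To make the temporaries visible, the expression can be unrolled by hand. This is an illustrative sketch using only NumPy: the names `t1`, `t2`, `t3` and `fused` are hypothetical, and the explicit Python loop is only there to show the single-pass shape of the computation that a compiled ufunc performs, not to be fast.

```python
import numpy

a = numpy.arange(10000)
b = numpy.arange(10000)
c = numpy.arange(10000)

# Roughly what NumPy does for b**2 - 4 * a * c: one full-size
# intermediate array per operation.
t1 = b**2          # temporary 1
t2 = 4 * a         # temporary 2
t3 = t2 * c        # temporary 3
stepwise = t1 - t3

# The element-wise view: each output value is computed in one pass,
# with no intermediate arrays.
fused = numpy.empty_like(a)
for i in range(a.size):
    fused[i] = b[i]**2 - 4 * a[i] * c[i]

print(numpy.array_equal(stepwise, fused))
```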

In [None]:
del a, b, c