In [28]:
%matplotlib inline

In [29]:
import numpy as np
from numba import jit, njit, vectorize
import random


# Tutorial 1
https://www.youtube.com/watch?v=x58W9A2lnQc&list=WL&index=6&t=1s

## Reading list
- [Deviation from Python Semantics](https://numba.pydata.org/numba-doc/dev/reference/pysemantics.html)
- [Compiling code with @jit](http://numba.pydata.org/numba-doc/latest/user/jit.html#eager-compilation)
- [Flexible specialization with @generated_jit](https://numba.pydata.org/numba-doc/dev/user/generated-jit.html)
- [Compiling classes with @jitclass](https://numba.pydata.org/numba-doc/dev/user/jitclass.html)
- [Troubleshooting and tips](https://numba.pydata.org/numba-doc/dev/user/troubleshoot.html)
- [Types and signatures](https://numba.pydata.org/numba-doc/dev/reference/types.html)
- [Compiling code ahead of time](https://numba.pydata.org/numba-doc/dev/user/pycc.html)
- [Performance tips](https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#fastmath)
- [The Threading layers](https://numba.pydata.org/numba-doc/dev/user/threading-layer.html)

## GPU stuff
- [Numba for CUDA GPUs](https://numba.pydata.org/numba-doc/dev/cuda/index.html)
- [Numba for AMD ROC GPUs](https://numba.pydata.org/numba-doc/dev/roc/index.html)

## FAQ
- [FAQ](https://numba.pydata.org/numba-doc/dev/user/faq.html)

## Numba site example

In [30]:
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples


Without jit

In [31]:
%time monte_carlo_pi(10000)

CPU times: total: 0 ns
Wall time: 14 ms


3.1704

In [32]:
monte_carlo_pi_jit = jit()(monte_carlo_pi)


First run takes a while to compile

In [33]:
%time monte_carlo_pi_jit(10000)

CPU times: total: 78.1 ms
Wall time: 132 ms


3.1392

Subsequent runs are blazing fast (the 0 ns below only means the call finished faster than the timer could resolve)

In [34]:
%time monte_carlo_pi_jit(10000)


CPU times: total: 0 ns
Wall time: 0 ns


3.1496

## Failing with Numba

### jit vs njit

In [35]:
def original_function(input_list):
    output_list = []
    for item in input_list:
        if item % 2 == 0:
            output_list.append(2)
        else:
            output_list.append('1')
    return output_list

test_array = list(range(100000))

In [36]:
%time _ = original_function(test_array)

CPU times: total: 15.6 ms
Wall time: 16 ms


In [37]:
jitted_function = jit()(original_function)

In [38]:
%time _ = jitted_function(test_array)

Compilation is falling back to object mode WITH looplifting enabled because Function "original_function" failed type inference due to: Invalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](1))

During: resolving callee type: BoundFunction(list.append for list(int64)<iv=None>)
During: typing of call at C:\Users\demva\AppData\Local\Temp\ipykernel_13304\2450009860.py (7)

File "C:\Users\demva\AppData\Local\Temp\ipykernel_13304\2450009860.py", line 7:
def original_function(input_list):
    <source elided>
        else:
            output_list.append('1')
            ^

  def original_function(input_list):
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "original_function" failed type inference due to: Cannot determine Numba type of <class 'numba.core.dispatcher.LiftedLoop'>

File "C:\Users\demva\AppData\Local\Temp\ip

CPU times: total: 328 ms
Wall time: 384 ms


Here we should see a warning, which is suppressed the second time the function is run. It shows that Numba inferred `output_list` to be a list of `int64` and was then asked to append a string to it.

In the Numba version used here, `jit` only emits this warning and falls back to 'object mode', which gives essentially no speedup over plain Python. To avoid that, use `njit`: it raises an actual error, which we then have to resolve before the function compiles and we get the proper speed boost.

In [39]:
njitted_function = njit()(original_function)

In [40]:
# %time _ = njitted_function(test_array)

The call is commented out because in nopython mode it raises a typing error instead of running.

### Fixing the function (attempt 1)

In [41]:
def original_function(input_list):
    output_list = []
    for item in input_list:
        if item % 2 == 0:
            output_list.append(2)
        else:
            output_list.append(1)
    return output_list


test_array = list(range(100000))


In [42]:
%time _ = original_function(test_array)

CPU times: total: 0 ns
Wall time: 20 ms


In [43]:
njitted_function = njit()(original_function)

In [44]:
%time _ = njitted_function(test_array)

Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'input_list' of function 'original_function'.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "C:\Users\demva\AppData\Local\Temp\ipykernel_13304\1362277157.py", line 1:
def original_function(input_list):
^



CPU times: total: 375 ms
Wall time: 453 ms


Now another warning comes up, this time about a "reflected list". A plain Python list passed as an argument has to be converted to Numba's own representation on each call, and that mechanism is scheduled for deprecation.

Plain Python lists should be avoided as arguments to jitted functions wherever possible; NumPy arrays are the usual replacement. The FAQ has more on this:
https://numba.pydata.org/numba-doc/dev/user/faq.html

### Fixing the function (attempt 2)

In [45]:
def original_function(input_list):
    output_list = []
    for item in input_list:
        if item % 2 == 0:
            output_list.append(2)
        else:
            output_list.append(1)
    return output_list


test_array = np.arange(100000)

In [46]:
%time _ = original_function(test_array)

CPU times: total: 0 ns
Wall time: 40 ms


In [47]:
njitted_function = njit()(original_function)

In [48]:
%time _ = njitted_function(test_array)

CPU times: total: 109 ms
Wall time: 130 ms


In [49]:
%time _ = njitted_function(test_array)

CPU times: total: 0 ns
Wall time: 2.02 ms


Now the second call runs in about 2 ms, much better than the 40 ms of the plain Python version.

## Vectorize

In [50]:
@vectorize
def scalar_computation(num):
    if num % 2 == 0:
        return 2
    else:
        return 1

In [61]:
%time scalar_computation(test_array)

CPU times: total: 0 ns
Wall time: 2 ms


array([2, 1, 2, ..., 1, 2, 1], dtype=int64)

This is supposed to be significantly faster than the previous function (it was in the video), but here it takes about the same 2 ms.

Either way, the point is that the original function builds a list whose final size is not known in advance. If we rewrite it to fill a preallocated NumPy array, it runs about as fast as the `@vectorize` version.

In [52]:
@njit
def fixed_function(input_list):
    output_list = np.zeros_like(input_list)
    for ii, item in enumerate(input_list):
        if item % 2 == 0:
            output_list[ii] = 2
        else:
            output_list[ii] = 1
    return output_list

In [55]:
%time fixed_function(test_array)

CPU times: total: 0 ns
Wall time: 998 µs


array([2, 1, 2, ..., 1, 2, 1])

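Now it takes about 1 ms, in the same range as the `@vectorize` version (a single `%time` run is noisy at this scale).

Whether to use `@vectorize` mostly depends on how you want to write your functions: as scalar functions that Numba applies across arrays for you, or as functions that receive arrays and loop over them explicitly.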

# Other

In [54]:
dims = (2,3)

@jit(nopython=True)
def empty(dims):
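    # Use an explicit dtype such as np.float64; the np.float alias was removed in NumPy 1.24.
    # np.empty does not initialize memory, so the returned values are arbitrary.
    return np.empty(dims, np.float64)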


empty(dims)


array([[-1.36311572e+57, -1.36311572e+57, -1.36311572e+57],
       [-1.36311572e+57, -1.36311572e+57, -1.36311572e+57]])