## Is this just magic?  What is Numba doing to make code run quickly?

Let's define a trivial example function.

In [22]:
from numba import jit

In [23]:
@jit
def add(a, b):
    return a + b

In [24]:
add(1, 1)

2

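Numba analyzes the function's Python bytecode, infers the type of every variable, and translates it into its own 'intermediate representation' (IR), which is then lowered to LLVM IR and compiled to machine code. Compilation happens on the first call, so run `add` once before calling its `inspect_types` method to view the typed IR.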

In [25]:
add.inspect_types()

add (int64, int64)
--------------------------------------------------------------------------------
# File: <ipython-input-23-1c683d2d00ee>
# --- LINE 1 --- 

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    # label 0
    #   a = arg(0, name=a)  :: int64
    #   b = arg(1, name=b)  :: int64
    #   $0.3 = a + b  :: int64
    #   del b
    #   del a
    #   $0.4 = cast(value=$0.3)  :: int64
    #   del $0.3
    #   return $0.4

    return a + b




OK. Numba has correctly inferred the argument types as `int64` and compiled a version of `add` specialized for them.

(What happens if you do `add(1., 1.)` and then `inspect_types`?)

In [26]:
add(1., 1.)

2.0

In [27]:
add.inspect_types()

add (int64, int64)
--------------------------------------------------------------------------------
# File: <ipython-input-23-1c683d2d00ee>
# --- LINE 1 --- 

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    # label 0
    #   a = arg(0, name=a)  :: int64
    #   b = arg(1, name=b)  :: int64
    #   $0.3 = a + b  :: int64
    #   del b
    #   del a
    #   $0.4 = cast(value=$0.3)  :: int64
    #   del $0.3
    #   return $0.4

    return a + b


add (float64, float64)
--------------------------------------------------------------------------------
# File: <ipython-input-23-1c683d2d00ee>
# --- LINE 1 --- 

@jit

# --- LINE 2 --- 

def add(a, b):

    # --- LINE 3 --- 
    # label 0
    #   a = arg(0, name=a)  :: float64
    #   b = arg(1, name=b)  :: float64
    #   $0.3 = a + b  :: float64
    #   del b
    #   del a
    #   $0.4 = cast(value=$0.3)  :: float64
    #   del $0.3
    #   return $0.4

    return a + b




### What about the actual LLVM code?

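You can see the LLVM IR generated by Numba with the `inspect_llvm()` method, and the native assembly that LLVM produces from it with `inspect_asm()`. Both return a `dict` mapping each compiled signature to a string, so looping over the items, as below, is more readable than printing the `dict` directly. The cell below prints the assembly.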

In [29]:
for k, v in add.inspect_asm().items():
    print(k, v)

(float64, float64) 	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 11
	.globl	___main__.add$6.float64.float64
	.align	4, 0x90
___main__.add$6.float64.float64:
	vaddsd	%xmm1, %xmm0, %xmm0
	vmovsd	%xmm0, (%rdi)
	xorl	%eax, %eax
	retq

	.globl	_cpython.__main__.add$6.float64.float64
	.align	4, 0x90
_cpython.__main__.add$6.float64.float64:
	.cfi_startproc
	pushq	%r15
Ltmp0:
	.cfi_def_cfa_offset 16
	pushq	%r14
Ltmp1:
	.cfi_def_cfa_offset 24
	pushq	%r13
Ltmp2:
	.cfi_def_cfa_offset 32
	pushq	%r12
Ltmp3:
	.cfi_def_cfa_offset 40
	pushq	%rbx
Ltmp4:
	.cfi_def_cfa_offset 48
	subq	$32, %rsp
Ltmp5:
	.cfi_def_cfa_offset 80
Ltmp6:
	.cfi_offset %rbx, -48
Ltmp7:
	.cfi_offset %r12, -40
Ltmp8:
	.cfi_offset %r13, -32
Ltmp9:
	.cfi_offset %r14, -24
Ltmp10:
	.cfi_offset %r15, -16
	movq	%rdi, %rbx
	movabsq	$_.const.add, %r10
	movabsq	$_PyArg_UnpackTuple, %r11
	leaq	24(%rsp), %r8
	leaq	16(%rsp), %r9
	movl	$2, %edx
	movl	$2, %ecx
	xorl	%eax, %eax
	movq	%rsi, %rdi
	movq	%r10, %rsi
	call

## But there's a caveat

Now watch what happens when the function we want to speed up operates on arbitrary Python objects instead of numbers.

In [30]:
def add_object_n2_times(a, b, n):
    cumsum = 0
    for i in range(n):
        for j in range(n):
            cumsum += a.x + b.x
    
    return cumsum

In [31]:
class MyInt(object):
    def __init__(self, x):
        self.x = x

In [32]:
a = MyInt(5)
b = MyInt(6)

In [33]:
%timeit add_object_n2_times(a, b, 500)

10 loops, best of 3: 41.7 ms per loop


In [34]:
add_object_jit = jit()(add_object_n2_times)

In [35]:
%timeit add_object_jit(a, b, 500)

The slowest run took 6.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 50.2 ms per loop


In [36]:
add_object_jit.inspect_types()

add_object_n2_times (pyobject, pyobject, int64)
--------------------------------------------------------------------------------
# File: <ipython-input-30-99c680a668e7>
# --- LINE 1 --- 

def add_object_n2_times(a, b, n):

    # --- LINE 2 --- 
    # label 0
    #   a = arg(0, name=a)  :: pyobject
    #   b = arg(1, name=b)  :: pyobject
    #   n = arg(2, name=n)  :: pyobject
    #   $const0.1 = const(int, 0)  :: pyobject
    #   cumsum = $const0.1  :: pyobject
    #   del $const0.1

    cumsum = 0

    # --- LINE 3 --- 
    #   jump 6
    # label 6
    #   jump 9
    # label 9
    #   $35 = const(LiftedLoop, LiftedLoop(<function add_object_n2_times at 0x10dfae8c8>))  :: XXX Lifted Loop XXX
    #   $36 = call $35(a, b, cumsum, n)  :: XXX Lifted Loop XXX
    #   del n
    #   del b
    #   del a
    #   del $35
    #   cumsum = static_getitem(index_var=None, index=0, value=$36)  :: pyobject
    #   del $36
    #   jump 72

    for i in range(n):

        # --- LINE 4 --- 

        for j

## What's all this pyobject business?  

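`pyobject` means the function was compiled in `object` mode, where Numba treats values as generic Python objects and manipulates them through the Python C API. Object mode can be somewhat faster than regular Python when Numba can apply loop lifting (compiling loops that don't touch Python objects separately, in `nopython` mode), but not by much; in the timings above, the jitted version was actually slower (50.2 ms vs. 41.7 ms per loop).

We want those `pyobject`s to be `int64` or another type that Numba can infer. Your best bet is forcing `nopython` mode: Numba then raises an error instead of silently falling back to object mode, so you _know_ when it can't give you a speedup.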

For the full list of Python features supported in `nopython` mode, see the Numba documentation: http://numba.pydata.org/numba-doc/latest/reference/pysupported.html (supported NumPy features are listed on a companion page in the same reference section).

## Figuring out what isn't working

In [None]:
%%file nopython_failure.py
from numba import jit

class MyInt(object):
    def __init__(self, x):
        self.x = x
        
@jit
def add_object(a, b):
    for i in range(100):
        c = i
        f = i + 7
        l = c + f
        
    return a.x + b.x

a = MyInt(5)
b = MyInt(6)

add_object(a, b)

In [None]:
!numba --annotate-html fail.html nopython_failure.py

[fail.html](fail.html)

## Forcing `nopython` mode

In [37]:
add_object_jit = jit(nopython=True)(add_object_n2_times)

In [38]:
# This will fail
add_object_jit(a, b, 5)

UntypedAttributeError: Failed at nopython (nopython frontend)
Unknown attribute 'x' of type pyobject
File "<ipython-input-30-99c680a668e7>", line 5
[1] During: typing of get attribute at <ipython-input-30-99c680a668e7> (5)

This error may have been caused by the following argument(s):
- argument 0: cannot determine Numba type of value <__main__.MyInt object at 0x10df3d048>
- argument 1: cannot determine Numba type of value <__main__.MyInt object at 0x10df3d080>


In [None]:
from numba import njit

In [None]:
add_object_jit = njit(add_object_n2_times)

In [None]:
# This will also fail
add_object_jit(a, b, 5)

## Other compilation flags

There are two other main compilation flags for `@jit`:

```python
cache=True
```

caches the compiled machine code on disk, so later runs in a new Python process can load it instead of paying the compilation cost every time.
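Because the cache only pays off across separate Python processes, the sketch below writes a small module that uses `cache=True` to a temporary directory and runs it twice in fresh interpreters. The file name `cached_add.py` is arbitrary, and the `.nbi`/`.nbc` extensions are what current Numba versions use for the cache index and compiled data; treat that detail as an assumption that may change between releases.

```python
import os
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# A tiny module that uses cache=True (hypothetical file name)
SOURCE = textwrap.dedent("""
    from numba import njit

    @njit(cache=True)
    def add(a, b):
        return a + b

    print(add(1, 2))
""")

# Make sure the cache is written next to the source, not to a custom location
env = dict(os.environ)
env.pop("NUMBA_CACHE_DIR", None)

with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "cached_add.py"
    script.write_text(SOURCE)

    # First run compiles and writes the cache; second run loads it from disk.
    runs = [
        subprocess.run([sys.executable, str(script)], env=env,
                       capture_output=True, text=True, check=True)
        for _ in range(2)
    ]

    # Assumed layout: an index (.nbi) and compiled data (.nbc) in __pycache__
    cache_dir = pathlib.Path(tmp) / "__pycache__"
    suffixes = sorted({p.suffix for p in cache_dir.iterdir()})

for run in runs:
    print(run.stdout.strip())
print(suffixes)
```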

```python
nogil=True
```

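This releases the GIL while the compiled function runs, so other Python threads can execute at the same time; it only helps for code compiled in `nopython` mode. Note, however, that it doesn't do anything else, such as making your program thread-safe or starting threads for you. You have to manage all of those things on your own (for example, with `concurrent.futures`).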