# About the notebook

This notebook is based on [GTC 2017 Numba Tutorial Notebook 1: Numba Basics](https://github.com/ContinuumIO/gtc2017-numba/blob/master/1%20-%20Numba%20Basics.ipynb). I have simplified some parts and added notes to clarify a few concepts.

# What is Numba?

- function compiler
- type-specializing
- just-in-time
- numerically-focused

### Note: Decorators

When we say **"functions are first-class objects"**, we mean that functions can be passed around and used as arguments just like any other value (e.g., a string, int, or float).

**Decorators are functions that take a Python function and return a transformed version of it, which is usually bound to the same name:**
```
just_some_function = my_decorator(just_some_function)
```

Python lets you apply a decorator more concisely with the **@** symbol (sometimes called **"pie" syntax**). The example above becomes:
```
@my_decorator
def just_some_function():
    ...
```

Information from [Primer on Python Decorators](https://realpython.com/primer-on-python-decorators/).
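
To make the equivalence of the two forms concrete, here is a minimal sketch in plain Python. The decorator `my_decorator` and the functions `add_manual` and `add_pie` are hypothetical names invented for this illustration; the decorator simply counts how often the wrapped function is called.

```python
import functools


def my_decorator(func):
    """Hypothetical decorator: wrap func and count how often it is called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


# Manual form: define the function, then rebind its name to the decorated version.
def add_manual(a, b):
    return a + b


add_manual = my_decorator(add_manual)


# "Pie" syntax: the same rebinding, done in one step.
@my_decorator
def add_pie(a, b):
    return a + b


print(add_manual(1, 2), add_manual.calls)
print(add_pie(1, 2), add_pie.calls)
```

Both functions behave the same way after decoration, which is why `@jit` and `hypot = jit(hypot)` in the next section are interchangeable.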

# First Steps

Let's write our first Numba function and compile it for the CPU. The Numba compiler is typically enabled by applying a decorator to a Python function. Decorators are functions that transform Python functions. Here we will use the CPU compilation decorator:

In [1]:
from numba import jit
import math

In [2]:
@jit
def hypot(x, y):
    # Implementation from https://en.wikipedia.org/wiki/Hypot
    x = abs(x);
    y = abs(y);
    t = min(x, y);
    x = max(x, y);
    t = t / x;
    return x * math.sqrt(1+t*t)

The above code is equivalent to writing:

In [3]:
def hypot(x, y):
    x = abs(x);
    y = abs(y);
    t = min(x, y);
    x = max(x, y);
    t = t / x;
    return x * math.sqrt(1+t*t)

hypot = jit(hypot)

In [4]:
hypot(3.0, 4.0)

5.0

The first time we call hypot, the compiler is triggered and compiles a machine code implementation for float inputs. Numba also saves the original Python implementation of the function in the **.py_func** attribute, so we can call the original Python code to make sure we get the same answer:

In [5]:
hypot.py_func(3.0, 4.0)

5.0

# Benchmarking

Original Python function

In [6]:
%timeit hypot.py_func(3.0, 4.0)

592 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Numba version

In [7]:
%timeit hypot(3.0, 4.0)

152 ns ± 0.0243 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Function from Python's `math` module

In [8]:
%timeit math.hypot(3.0, 4.0)

92.8 ns ± 0.00417 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


**The standard library's `math.hypot` is even faster than Numba!** This is because Numba adds overhead to each function call that is larger than the function call overhead of Python itself. Extremely fast functions (like the one above) are hurt by this.

(However, if you call one Numba function from another one, there is very little function overhead, sometimes even zero if the compiler inlines the function into the other one.)
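
`%timeit` only exists inside IPython and Jupyter. Outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The example below times only the pure-Python function (named `py_hypot` here for illustration) against `math.hypot`, so it runs without Numba. A Numba-compiled function could be timed the same way, provided it is called once beforehand so that compilation time is not included.

```python
import math
import timeit


def py_hypot(x, y):
    # Pure-Python version of the hypot function used in this notebook.
    x = abs(x)
    y = abs(y)
    t = min(x, y)
    x = max(x, y)
    t = t / x
    return x * math.sqrt(1 + t * t)


n = 100_000  # number of calls per measurement (arbitrary choice)

# The lambda adds a small, equal overhead to both measurements.
t_py = timeit.timeit(lambda: py_hypot(3.0, 4.0), number=n)
t_math = timeit.timeit(lambda: math.hypot(3.0, 4.0), number=n)

print(f"py_hypot:   {t_py / n * 1e9:.0f} ns per call")
print(f"math.hypot: {t_math / n * 1e9:.0f} ns per call")
```

The absolute numbers depend on the machine and Python version, so they will not match the figures quoted above.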

# How does Numba work?

![](https://github.com/ContinuumIO/gtc2017-numba/raw/6ddaeec9baecf07df1a22e3e685d5f6e3b4f33d9/img/numba_flowchart.png)

We can see the result of type inference by using the .inspect_types() method, which prints an annotated version of the source code:

In [9]:
hypot.inspect_types()

hypot (float64, float64)
--------------------------------------------------------------------------------
# File: <ipython-input-3-977262a9be06>
# --- LINE 1 --- 
# label 0
#   del x
#   del $0.1
#   del $0.3
#   del y
#   del $0.4
#   del $0.6
#   del $0.7
#   del $0.10
#   del y.1
#   del x.1
#   del $0.11
#   del $0.14
#   del t
#   del $0.17
#   del $0.19
#   del t.1
#   del $const0.21
#   del $0.24
#   del $0.25
#   del $0.20
#   del x.2
#   del $0.26
#   del $0.27

def hypot(x, y):

    # --- LINE 2 --- 
    #   x = arg(0, name=x)  :: float64
    #   y = arg(1, name=y)  :: float64
    #   $0.1 = global(abs: <built-in function abs>)  :: Function(<built-in function abs>)
    #   $0.3 = call $0.1(x, func=$0.1, args=[Var(x, <ipython-input-3-977262a9be06> (2))], kws=(), vararg=None)  :: (float64,) -> float64
    #   x.1 = $0.3  :: float64

    x = abs(x);

    # --- LINE 3 --- 
    #   $0.4 = global(abs: <built-in function abs>)  :: Function(<built-in function abs>)
    #   $0.6 = cal

# When Things Go Wrong

Numba cannot compile all Python code. Some functions don't have a Numba translation, and **some kinds of Python types can't be efficiently compiled at all (yet)**. For example, Numba does not support **dictionaries** (as of this tutorial):

In [13]:
@jit
def cannot_compile(x):
    return x['key']

cannot_compile(dict(key='value'))

'value'

Wait, what happened? In the Numba version used for this notebook, `@jit` by default falls back to a mode called **"object mode"**, which does not do type specialization. (Newer Numba releases make nopython mode the default for `@jit`, so there the call above raises an error instead.) Object mode exists to enable other Numba functionality, but in many cases you want Numba to tell you when type inference fails. You can force **"nopython mode"** (the other compilation mode) by passing `nopython=True` to the decorator:

In [15]:
@jit(nopython=True)
def cannot_compile(x):
    return x['key']

cannot_compile(dict(key='value'))

TypingError: Failed at nopython (nopython frontend)
Invalid usage of getitem with parameters (pyobject, const('key'))
 * parameterized
File "<ipython-input-15-42c374763781>", line 3
[1] During: typing of intrinsic-call at <ipython-input-15-42c374763781> (3)
[2] During: typing of static-get-item at <ipython-input-15-42c374763781> (3)

This error may have been caused by the following argument(s):
- argument 0: cannot determine Numba type of <class 'dict'>


## Note: object mode and nopython mode

### [object mode](https://numba.pydata.org/numba-doc/dev/glossary.html#term-object-mode)
- Python C API
- often no faster than Python interpreted code

A Numba compilation mode that generates code that handles all values as Python objects and **uses the Python C API** to perform all operations on those objects. **Code compiled in object mode will often run no faster than Python interpreted code**, unless the Numba compiler can take advantage of loop-jitting.

### [nopython mode](https://numba.pydata.org/numba-doc/dev/glossary.html#term-nopython-mode)
- does not access the Python C API
- produces the highest performance code

A Numba compilation mode that generates code that **does not access the Python C API**. **This compilation mode produces the highest performance code**, but requires that the native types of all values in the function can be inferred. Unless otherwise instructed, the @jit decorator will automatically fall back to object mode if nopython mode cannot be used.

# Exercise

Below is a function that **loops over two input NumPy arrays and puts their sum into the output array**. 

Modify this function to call the hypot function we defined above. We will learn a more efficient way to write such functions in a future section.

(Make sure to execute all the cells in this notebook so that hypot is defined.)

In [16]:
@jit(nopython=True)
def ex1(x, y, out):
    for i in range(x.shape[0]):
        out[i] = x[i] + y[i]

In [17]:
import numpy as np

in1 = np.arange(10, dtype=np.float64)
in2 = 2 * in1 + 1
out = np.empty_like(in1)

print('in1:', in1)
print('in2:', in2)

ex1(in1, in2, out)

print('out:', out)

in1: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
in2: [ 1.  3.  5.  7.  9. 11. 13. 15. 17. 19.]
out: [ 1.  4.  7. 10. 13. 16. 19. 22. 25. 28.]


In [18]:
# This test will fail until you fix the ex1 function
np.testing.assert_almost_equal(out, np.hypot(in1, in2))

AssertionError: 
Arrays are not almost equal to 7 decimals

(mismatch 90.0%)
 x: array([ 1.,  4.,  7., 10., 13., 16., 19., 22., 25., 28.])
 y: array([ 1.       ,  3.1622777,  5.3851648,  7.6157731,  9.8488578,
       12.083046 , 14.3178211, 16.5529454, 18.7882942, 21.023796 ])

## My answer

The fix is to replace the element-wise sum in `ex1` with a call to `hypot`. As a reminder, `hypot` was defined as:

```
# review the code of hypot
def hypot(x, y):
    # Implementation from https://en.wikipedia.org/wiki/Hypot
    x = abs(x);
    y = abs(y);
    t = min(x, y);
    x = max(x, y);
    t = t / x;
    return x * math.sqrt(1+t*t)
```

In [22]:
@jit(nopython=True)
def ex1(x, y, out):
    for i in range(x.shape[0]):
        #out[i] = x[i] + y[i]
        out[i] = hypot(x[i], y[i])

In [23]:
import numpy as np

in1 = np.arange(10, dtype=np.float64)
in2 = 2 * in1 + 1
out = np.empty_like(in1)

print('in1:', in1)
print('in2:', in2)

ex1(in1, in2, out)

print('out:', out)

# This test now passes because ex1 calls hypot
np.testing.assert_almost_equal(out, np.hypot(in1, in2))

in1: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
in2: [ 1.  3.  5.  7.  9. 11. 13. 15. 17. 19.]
out: [ 1.          3.16227766  5.38516481  7.61577311  9.8488578  12.08304597
 14.31782106 16.55294536 18.78829423 21.02379604]
