<a href="https://colab.research.google.com/github/deepakk7195/IISC_CDS_DS/blob/Scalable_ML_GenAI/Addl_NB_(ungraded)_Numba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Additional Notebook (Ungraded): Introduction to Numba

## Learning Objectives

At the end of the experiment, you will be able to:

* use the `jit` decorator to improve performance
* understand the difference between Numba’s compilation modes
* understand the limitations of Numba with examples
* vectorize code for use as a ufunc

## Information

#### Numba in a Nutshell

Numba is a Python module which translates a subset of Python and NumPy code into high-speed machine code. Numba allows the compilation of selected portions of pure Python code to native code, and generates optimized machine code using the LLVM (Low Level Virtual Machine) compiler infrastructure.

With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time (JIT) optimized to achieve performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.

**High-Level architecture of Numba**

The Numba translation process can be broken down into a set of important steps, ranging from bytecode analysis to the final machine code generation. The picture below illustrates this process, where the green boxes correspond to the frontend of the Numba compiler and the blue boxes belong to the backend.

![Image](https://cdn.iisc.talentsprint.com/CDS/Images/numba.png)

To know more about Numba click [here](https://towardsdatascience.com/speed-up-your-algorithms-part-2-numba-293e554c5cc1)


In [None]:
# Upgrade numba
!pip install numba --upgrade

Importing necessary packages

In [None]:
from numba import jit, vectorize # Importing the Numba decorators used in this notebook
import numpy as np # Importing the numpy package under the name np

Let us first write a small Python function that computes the sum of all the elements of a given array, and then look at how to speed it up with Numba.

In [None]:
# Pure-Python version
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
A = np.random.random((200,200)) # Generating a numpy array
ArraySum(A) # Calling the ArraySum function

Now let us time how long the ArraySum function takes to calculate the sum of the elements in array 'A'.

In [None]:
# timing the execution
%timeit ArraySum(A)

To know more about the timeit function click [here](https://docs.python.org/3/library/timeit.html)
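
`%timeit` is an IPython magic, so it only works in a notebook or IPython shell. Outside a notebook, a similar measurement can be made with the standard-library `timeit` module. The following is a minimal sketch, assuming a hypothetical function `python_loop_sum` that stands in for `ArraySum`:

```python
import timeit


def python_loop_sum(values):
    """Hypothetical stand-in for ArraySum: sums a sequence with an explicit loop."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(10_000))

# Five timing runs, each calling the function 100 times
timings = timeit.repeat(lambda: python_loop_sum(values), repeat=5, number=100)

# The minimum is usually the most stable estimate of the cost per call
best_per_call = min(timings) / 100
print(f"best of 5 runs: {best_per_call * 1e6:.1f} microseconds per call")
```

Taking the minimum over several runs reduces the influence of other processes running on the machine.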

Now let us see how to speed up the ArraySum function on array 'A' using Numba.

**jit as a function call**

In [None]:
sum_array_numba = jit()(ArraySum) # jit() returns a decorator, which is then applied to ArraySum

The function **sum_array_numba** is a version of **ArraySum** that is “targeted” for JIT-compilation.

In [None]:
# Timing the execution of the sum_array_numba function

%timeit sum_array_numba(A)

The timings above show that the JIT-compiled function runs faster than the pure-Python one. Now let us use the NumPy version to calculate the sum of the elements in an array and time it.

In [None]:
A.sum() # using NumPy's built-in sum method (the better, Pythonic approach)

In [None]:
# Timing the code
%timeit A.sum()

`A.sum()` above is the NumPy array method. To know more about Python's built-in sum function click [here](https://docs.python.org/3/library/functions.html#sum)

In the above code, we have created a JIT-compiled version of **ArraySum** via the call **jit()(ArraySum)**. In practice this would typically be done using the alternative **decorator** syntax.

To know more about Python decorators click [here](https://link.medium.com/rixEI1907db)
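
The relationship between the two notations can be illustrated without Numba. The sketch below uses a hypothetical decorator `count_calls` (not part of Numba or this notebook) to show that placing `@decorator` above a definition has the same effect as calling the decorator on the function and keeping the result:

```python
import functools


def count_calls(func):
    """Hypothetical decorator: wraps func and counts how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def square(x):
    return x * x


# Function-call form: the same idea as sum_array_numba = jit()(ArraySum)
square_counted = count_calls(square)


# Decorator form: the same idea as writing @jit above the definition
@count_calls
def cube(x):
    return x * x * x


print(square_counted(3), cube(2))
print(square_counted.calls, cube.calls)
```

The decorator form rebinds the original name, so the undecorated function is no longer reachable under that name. This is why the cells that follow redefine `ArraySum` rather than introducing a new name.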

**Decorator Notation**

To target a function for JIT compilation, we put **@jit** before the ArraySum function definition.

In [None]:
# Defining a function targeted for JIT compilation
@jit
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
# Timing the execution
%timeit ArraySum(A)

#### Think for a While!!

- How does Numba get the code to run quickly?

Numba examines the Python bytecode and translates it into an 'intermediate representation'. We can view this using the `inspect_types` method.

In [None]:
ArraySum.inspect_types() # Inspecting the types

From the above results, we can infer that
- every line of Python code is preceded by several lines of Numba IR (intermediate representation) code that give a glimpse into what Numba is doing to the Python code behind the scenes.
- at the end of most lines there are type annotations that show how Numba is treating variables and function calls.

### Compilation modes

There are two important modes: nopython and object. The nopython mode avoids the Python interpreter completely and translates the full code to native instructions that can run without the help of Python. If that mode is not available (for example, when the code uses unsupported Python features or external libraries), compilation can fall back to object mode, which uses the Python interpreter for the parts it is unable to compile. Older Numba releases performed this fallback automatically for `@jit`; since Numba 0.59, `@jit` compiles in nopython mode by default and object mode has to be requested explicitly with `forceobj=True`. Naturally, nopython mode is the one that offers the best performance gains.

**nopython mode**

In [None]:
# Defining a function compiled in nopython mode
@jit(nopython=True)
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

#### Compilation flags for jit

There are two other main compilation flags for `@jit`: `cache` and `nogil`.

**a. cache mode**

If we do not want to pay the compilation cost on every run, we can use cache mode. Numba then saves the compiled function to disk, in something like a pyc file in your `__pycache__` directory, so the function stays fast even between sessions.

In [None]:
# Defining a function whose compiled version is cached on disk
@jit(cache=True)
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

**b. nogil mode**

Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is no longer necessary to hold Python’s global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you pass `nogil=True`.

To know more about the global interpreter lock (GIL) click [here](https://docs.python.org/3/glossary.html#term-global-interpreter-lock)

In [None]:
# Releasing the GIL so that several threads can run the compiled function at the same time
# Defining a function
@jit(nogil=True) # Option to release the GIL
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

Now let us add `fastmath=True`, which trades accuracy for speed in some computations, and time it.

In [None]:
# Defining a function compiled with relaxed floating-point rules
@jit(fastmath=True)
def ArraySum(array):
    m, n = array.shape # shape of the array (rows, columns)
    # Looping over elements is a non-Pythonic way to sum an array (array.sum() is preferred)
    total = 0 # running total
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # adding the current element
    return total # returning the sum of the elements of the array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)
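
The accuracy trade-off of `fastmath=True` comes from relaxing strict floating-point rules, which, among other things, lets the compiler regroup additions. Floating-point addition is not associative, so regrouping can change the last bits of a result. The following plain-Python sketch (no Numba involved, values chosen only for illustration) shows the effect of regrouping by hand:

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, added in two different groupings
left_to_right = (values[0] + values[1]) + values[2]
regrouped = values[0] + (values[1] + values[2])

print(left_to_right == regrouped)       # the two groupings disagree
print(abs(left_to_right - regrouped))   # by a tiny rounding error
print(math.fsum(values))                # correctly rounded reference sum
```

For a sum like the one in `ArraySum` the difference is usually negligible, but it can matter in code that relies on exact reproducibility.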

#### ParallelAccelerator

- ParallelAccelerator is a special compiler pass contributed by Intel Labs
    - Todd A. Anderson, Ehsan Totoni, Paul Liu
    - Based on a similar contribution to Julia
- Automatically generates multithreaded code in a Numba-compiled function for:
    - Array expressions and reductions
    - Random functions
    - Dot products
    - Explicit loops indicated with a prange() call
    
To know more about Parallel Accelerator click [here](https://numba.pydata.org/numba-doc/dev/user/parallel.html)


Now let us add the `parallel=True` option to `@jit` to use a multi-core CPU via threading and to perform automatic parallelization.

In [None]:
# without using parallel tag

@jit
def f(x): # Defining a function
    return np.cos(x) ** 2 + np.sin(x) ** 2 # calculating the value

In [None]:
data = np.random.random((10000000))

In [None]:
%timeit f(data)

In [None]:
# Using parallel tag
@jit(parallel=True)
def f(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

In [None]:
%timeit f(data)

Before we dive deeper into Numba, let us try to understand a few of its limitations.

In [None]:
# Example 1
@jit
def hello(n):
    return ["hell0", 44] * 4

In [None]:
%timeit hello(1)

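On Numba releases that still fall back automatically (before 0.59), the code above returns the desired output but warns that compilation is falling back to object mode. Now let us write the above code in nopython mode to see the limitation. Numba compiles lazily, so a function that cannot be compiled raises its error on the first call rather than at definition.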

In [None]:
# Example 1
@jit(nopython=True)
def hello(n):
    return ["hell0", 44]

In [None]:
# Example 2
@jit(nopython=True)
def display():
    data = {"numbers":[1, 3, 4], "evens":[2, 4, 6]}
    return data["numbers"]

To know more about limitations of Numba click [here](https://www.oreilly.com/library/view/python-high-performance/9781787282896/6e5cc5c4-ad53-4657-b502-6630dd9efced.xhtml)

#### Universal Functions (Ufuncs)

- Ufuncs are a core concept in NumPy for array-oriented computing.
- A function with scalar inputs is broadcast across the elements of the input arrays:
    - np.add([1, 2, 3], 5) = [6, 7, 8]
- Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested.

To know more about NumPy ufuncs click [here](https://numpy.org/doc/stable/reference/ufuncs.html)

In [None]:
# Numpy ufuncs
print(np.add(4, 5)) # Adding two numbers
print(np.add([1, 4, 5], 6)) # Adding 6 to the elements in the list
print(np.add(1, [3, 4])) # Adding 1 to the elements in the list
print(np.add.accumulate([4, 5, 7, 2, 4])) # Accumulate the result of applying the operator to all elements.

In [None]:
# Numba ufuncs
# Function to add two values
@vectorize("(int64, int64)")
def add(x, y):
    # adding the values
    return x + y

In [None]:
print(add(4, 5)) # Adding two numbers
print(add([1, 4, 5], 6)) # Adding 6 to the elements in the list
print(add(1, [3, 4])) # Adding 1 to the elements in the list
print(add.accumulate([4, 5, 7, 2, 4])) # Accumulate the result of applying the operator to all elements.

To know more about vectorize decorator click [here](https://numba.pydata.org/numba-doc/dev/user/vectorize.html)

#### Research Question

1. Write code to approximate $\pi$ using the Monte Carlo method, and compare the speed with and without Numba when the sample size is large.

    To know about $\pi$ by Monte Carlo click [here](https://medium.com/cantors-paradise/estimating-%CF%80-using-monte-carlo-simulations-3459a84b5ef9)
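
As a possible starting point, the sketch below gives a pure-Python baseline only; the function name `estimate_pi`, the fixed seed and the sample size are assumptions made for illustration. The Numba version and the timing comparison are left as the exercise.

```python
import math
import random


def estimate_pi(n_samples, seed=0):
    """Hypothetical baseline: estimates pi from random points in the unit square."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x = rng.random()
        y = rng.random()
        # Count the points that fall inside the quarter circle of radius 1
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square
    return 4.0 * inside / n_samples


estimate = estimate_pi(100_000)
print(estimate, abs(estimate - math.pi))
```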