<a href="https://colab.research.google.com/github/Prakum14/Testfiles/blob/master/M8_AST_01_Numba_C.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Programme in AI and MLOps
## A Program by IISc and TalentSprint
### Assignment 1: Introduction to Numba

## Learning Objectives

At the end of the experiment, you will be able to:

* use the jit decorator to improve the performance
* understand the difference between Numba’s compilation modes
* understand limitations of Numba with examples
* vectorize code for use as a ufunc

## Information

#### Numba in a Nutshell

Numba is a Python module which translates a subset of Python and NumPy code into high-speed machine code. Numba allows the compilation of selected portions of pure Python code to native code, and generates optimized machine code using the LLVM (Low Level Virtual Machine) compiler infrastructure.

With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time (JIT) optimized to achieve performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.

**High-Level architecture of Numba**

The Numba translation process can be broken down into a set of important steps, ranging from bytecode analysis to final machine code generation. The picture below illustrates this process, where the green boxes correspond to the frontend of the Numba compiler and the blue boxes belong to the backend.

![Image](https://cdn.iisc.talentsprint.com/CDS/Images/numba.png)

To know more about Numba click [here](https://towardsdatascience.com/speed-up-your-algorithms-part-2-numba-293e554c5cc1)


### Install dependencies

In [1]:
%%capture
!pip -q install numpy==1.26.0 --force-reinstall
!pip -q install numba==0.58.1     # Install numba

# Restart the Runtime after running this cell.

## <font color='#992211'>Restart the Runtime</font>

### Setup Steps:

In [1]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2416218" #@param {type:"string"}

In [2]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "8975485400" #@param {type:"string"}

In [3]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M8_AST_01_Numba_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")

    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://aimlops-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else:
      return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


### Import required packages

In [4]:
from numba import jit, vectorize      # Importing the jit and vectorize decorators from the numba package
import numpy as np                    # Importing numpy package under a name np

#import warnings
#warnings.filterwarnings("ignore")

Let us first write a small piece of Python code to find the sum of all the elements of a given array, and then understand its implementation using Numba.

In [5]:
# Python version code
# Defining a function
def ArraySum(array):
    m, n = array.shape               # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                        # Defining a variable
    for j in range(m):               # iterating over rows
        for i in range(n):           # iterating over columns
            total += array[j, i]     # calculating the sum
    return total                     # returning the sum of elements of an array

In [6]:
A = np.random.random((200,200))  # Generating a numpy array
ArraySum(A)                      # Calling the ArraySum function

20084.239787464845

Now let us time the execution of ArraySum function while calculating the sum of elements in array 'A'

In [7]:
# timing the execution
%timeit ArraySum(A)

7.7 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


To know more about the timeit function click [here](https://docs.python.org/3/library/timeit.html)

Now let us see how to speed up execution of ArraySum function while calculating the sum of elements in array 'A' using numba

**Jit as function call**

In [8]:
sum_array_numba = jit()(ArraySum)      # Wrapping ArraySum with the JIT compiler (compilation happens on the first call)


The function **sum_array_numba** is a version of **ArraySum** that is “targeted” for JIT-compilation.

In [9]:
# Timing the execution of sum_array_numba function

%timeit sum_array_numba(A)

62.9 µs ± 7.44 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


From the timings above, we can see that the JIT-compiled version is much faster than the pure Python loop (about 63 µs versus 7.7 ms per call). Now let us write the NumPy version of the code to calculate the sum of elements in an array and time it.

In [10]:
A.sum()        # using NumPy's built-in sum method to find the sum of elements in an array (a better idea; Pythonic style)

20084.23978746482

In [11]:
# Timing the code
%timeit A.sum()

14.4 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Note that `A.sum()` is the NumPy array method, not Python's built-in `sum`. To know more about the built-in sum function click [here](https://docs.python.org/3/library/functions.html#sum)

In the above code, we created a JIT-compiled version of **ArraySum** via the call **jit()(ArraySum)**. In practice, this is typically done using the alternative **decorator** syntax.

To know more about Python decorators click [here](https://link.medium.com/rixEI1907db)

**Decorator Notation**

To target a function for JIT compilation, we put **@jit** directly above the ArraySum function definition.

In [12]:
@jit
# Defining a function
def ArraySum(array):
    m, n = array.shape                 # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                          # Defining a variable
    for j in range(m):                 # iterating over rows
        for i in range(n):             # iterating over columns
            total += array[j, i]       # calculating the sum
    return total                       # returning the sum of elements of an array


In [13]:
# Timing the execution
%timeit ArraySum(A)

59.2 µs ± 386 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


#### Think for a While!!

- How does Numba get the code to run quickly?

Numba examines the Python bytecode and translates it into an 'intermediate representation' (IR), annotated with inferred types, before generating machine code. We can view this using the `inspect_types` method.

In [14]:
ArraySum.inspect_types()      # Inspecting the types

ArraySum (Array(float64, 2, 'C', False, aligned=True),)
--------------------------------------------------------------------------------
# File: <ipython-input-12-553a806b45af>
# --- LINE 1 --- 
# label 0
#   array = arg(0, name=array)  :: array(float64, 2d, C)

@jit

# --- LINE 2 --- 

# Defining a function

# --- LINE 3 --- 

def ArraySum(array):

    # --- LINE 4 --- 
    #   $6load_attr.1 = getattr(value=array, attr=shape)  :: UniTuple(int64 x 2)
    #   $16unpack_sequence.4 = exhaust_iter(value=$6load_attr.1, count=2)  :: UniTuple(int64 x 2)
    #   del $6load_attr.1
    #   $16unpack_sequence.2 = static_getitem(value=$16unpack_sequence.4, index=0, index_var=None, fn=<built-in function getitem>)  :: int64
    #   $16unpack_sequence.3 = static_getitem(value=$16unpack_sequence.4, index=1, index_var=None, fn=<built-in function getitem>)  :: int64
    #   del $16unpack_sequence.4
    #   m = $16unpack_sequence.2  :: int64
    #   del $16unpack_sequence.2
    #   n = $16unpack_sequence

From the above results, we can infer that
- every line of Python code is preceded by several lines of Numba IR (intermediate representation) code that give a glimpse into what Numba is doing to the Python code behind the scenes.
- at the end of most lines there are type annotations that show how Numba is treating variables and function calls.

### Compilation modes

There are two important modes: **nopython** and **object**.

> The **nopython** mode completely avoids the Python interpreter and translates the full code to native instructions that can run without the help of Python.

> However, if that mode is not available for some reason (for example, when using unsupported Python features or external libraries), the compilation falls back to the **object** mode, where Numba uses the Python interpreter for the code it is unable to compile. This automatic fallback applies to the Numba 0.58.1 installed above, where it triggers a deprecation warning; from Numba 0.59 onwards, `@jit` defaults to nopython mode and raises an error instead.

Naturally, the nopython mode is the one which offers the best performance gains.

**nopython mode**

In [15]:
@jit(nopython=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape                 # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                          # Defining a variable
    for j in range(m):                 # iterating over rows
        for i in range(n):             # iterating over columns
            total += array[j, i]       # calculating the sum
    return total                       # returning the sum of elements of an array

In [16]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

59.6 µs ± 375 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


#### Compilation flags for jit

There are two other main compilation flags for @jit

**a. cache mode**

If we don't want to pay the compilation cost on every run, we can use cache mode. Numba then saves the compiled function to disk, in files similar to pyc files in the \_\_pycache\_\_ directory, so the function stays fast even between sessions.

In [17]:
@jit(cache=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape                 # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                          # Defining a variable
    for j in range(m):                 # iterating over rows
        for i in range(n):             # iterating over columns
            total += array[j, i]       # calculating the sum
    return total                       # returning the sum of elements of an array


In [18]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

60.7 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


**b. nogil mode**

Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is not necessary anymore to hold Python's global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you passed `nogil=True`.

To know more about the global interpreter lock (GIL) click [here](https://docs.python.org/3/glossary.html#term-global-interpreter-lock)

In [19]:
# Releasing the GIL with nogil so that other threads can run concurrently

@jit(nogil=True)  # Option to release the gil
# Defining a function
def ArraySum(array):
    m, n = array.shape                 # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                          # Defining a variable
    for j in range(m):                 # iterating over rows
        for i in range(n):             # iterating over columns
            total += array[j, i]       # calculating the sum
    return total                       # returning the sum of elements of an array


In [20]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

59.2 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Now let us add `fastmath=True` to trade accuracy for speed in some computations, and time it.

In [21]:
@jit(fastmath=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape                 # shape of the array
    # Explicit loops are a slow, non-Pythonic way to sum an array; they are used here only for illustration
    total = 0                          # Defining a variable
    for j in range(m):                 # iterating over rows
        for i in range(n):             # iterating over columns
            total += array[j, i]       # calculating the sum
    return total                       # returning the sum of elements of an array


In [22]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

8.03 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


#### ParallelAccelerator

- ParallelAccelerator is a special compiler pass contributed by Intel Labs
    - Todd A. Anderson, Ehsan Totoni, Paul Liu
    - Based on similar contribution to Julia
- Automatically generates multithreaded code in a Numba-compiled function for:
    - Array expressions and reductions
    - Random functions
    - Dot products
    - Explicit loops indicated with a prange() call
    
To know more about Parallel Accelerator click [here](https://numba.pydata.org/numba-doc/dev/user/parallel.html)


Now let us add the `parallel=True` option to @jit to use a multi-core CPU via threading and to perform automatic parallelization

In [23]:
# without using parallel tag

@jit
def f(x): # Defining a function
    return np.cos(x) ** 2 + np.sin(x) ** 2        # calculating the value


In [24]:
data = np.random.random((10000000))

In [25]:
%timeit f(data)

248 ms ± 7.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [26]:
# Using parallel tag

@jit(parallel=True)
def f(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2


In [27]:
%timeit f(data)

214 ms ± 9.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Before we dive deeper into Numba, let us try to understand a few of its limitations

In [28]:
# Example 1
@jit
def hello(n):
    return ["hell0", 44] * 4


In [29]:
%timeit hello(1)

Compilation is falling back to object mode WITH looplifting enabled because Function "hello" failed type inference due to: No implementation of function Function(<built-in function mul>) found for signature:
 
 >>> mul(LiteralList((Literal[str](hell0), Literal[int](44))), Literal[int](4))
 
There are 14 candidate implementations:
  - Of which 7 did not match due to:
  Overload in function 'MulList.generic': File: numba/core/typing/listdecl.py: Line 114.
    With argument(s): '(Poison<LiteralList((Literal[str](hell0), Literal[int](44)))>, int64)':
   Rejected as the implementation raised a specific error:
     TypingError: Poison type used in arguments; got Poison<LiteralList((Literal[str](hell0), Literal[int](44)))>
  raised from /usr/local/lib/python3.11/dist-packages/numba/core/types/functions.py:236
  - Of which 6 did not match due to:
  Overload of function 'mul': File: <numerous>: Line N/A.
    With argument(s): '(LiteralList((Literal[str](hell0), Literal[int](44))), Literal[int](

340 ns ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


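The above code returns the desired output, but with a warning that compilation is falling back to object mode. Now let us define the examples in nopython mode to see the limitation. Because Numba compiles lazily, any typing error is raised only when the function is first called, not when it is defined.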

In [30]:
# Example 1
@jit(nopython=True)
def hello(n):
    return ["hell0", 44]

In [31]:
# Example 2
@jit(nopython=True)
def display():
    data = {"numbers":[1, 3, 4], "evens":[2, 4, 6]}
    return data["numbers"]

To know more about limitations of Numba click [here](https://www.oreilly.com/library/view/python-high-performance/9781787282896/6e5cc5c4-ad53-4657-b502-6630dd9efced.xhtml)

#### Universal Functions (Ufuncs)

- Ufuncs are a core concept in NumPy for array-oriented computing.
- A function with scalar inputs is broadcast across the elements of the input arrays:
    - np.add([1, 2, 3], 5) = [6, 7, 8]
- Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested.

To know more about Numpy Ufuncs click [here](https://numpy.org/doc/stable/reference/ufuncs.html)

In [32]:
# Numpy ufuncs
print(np.add(4, 5))             # Adding two numbers
print(np.add([1, 4, 5], 6))     # Adding 6 to the elements in the list
print(np.add(1, [3, 4]))        # Adding 1 to the elements in the list
print(np.add.accumulate([4, 5, 7, 2, 4]))       # Accumulate the result of applying the operator to all elements.

9
[ 7 10 11]
[4 5]
[ 4  9 16 18 22]


In [33]:
# Numba ufuncs
# Function to add two values
@vectorize("(int64, int64)")
def add(x, y):
    # adding the values
    return x + y

In [34]:
print(add(4, 5))              # Adding two numbers
print(add([1, 4, 5], 6))      # Adding 6 to the elements in the list
print(add(1, [3, 4]))         # Adding 1 to the elements in the list
print(add.accumulate([4, 5, 7, 2, 4])) # Accumulate the result of applying the operator to all elements.

9
[ 7 10 11]
[4 5]
[ 4  9 16 18 22]


To know more about vectorize decorator click [here](https://numba.pydata.org/numba-doc/dev/user/vectorize.html)

#### Research Question

1. Write code to approximate $\pi$ by Monte Carlo simulation, and compare the speed with and without Numba when the sample size is large.

    To know about $\pi$ by Monte Carlo click [here](https://drive.google.com/file/d/1S1Jo-WAllaBh6JhRPIIc3kLSufgaPRjn/view?usp=sharing)

### Please answer the questions below to complete the experiment:




In [35]:
# @title Select the FALSE statement below: { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "Numba is a library that performs JIT compilation that translates pure python code to optimized machine code at runtime using the LLVM industry standard compiler" #@param ["","Just-in-time (JIT) compilation means compilation of a function at execution time, as opposed to compilation of a function in a separate step before running the program code", "nopython=True compiles the decorated function so that it will run entirely with the involvement of the Python interpreter", "Numba is a library that performs JIT compilation that translates pure python code to optimized machine code at runtime using the LLVM industry standard compiler"]

In [36]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [37]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "Nothing" #@param {type:"string"}


In [38]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [39]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [40]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [41]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 7204
Date of submission:  18 May 2025
Time of submission:  09:59:25
View your submissions: https://aimlops-iisc.talentsprint.com/notebook_submissions
