# Introduction to NumPy powered by oneAPI

### Learning Objectives:

- Describe why inefficient code, such as time-consuming loops, wastes resources and time
- Describe why using Python for highly repetitive small tasks is inefficient
- Describe the additive value of leveraging packages such as NumPy, which are powered by oneAPI, in a cloud world
- Describe the importance of keeping oneAPI and third-party packages such as NumPy, SciPy, and others up to date
- Enumerate ways in which NumPy accelerates code
- Apply loop replacement methodologies in a variety of scenarios


#### Here is a list of topics we will explore in this module:
- The "WHY": Why use NumPy as a replacement for loops? It's FAST!
- NumPy Universal Functions, or ufuncs
- NumPy Broadcasting
- NumPy Aggregations
- NumPy Where
- NumPy Select
- A quick reference to SciPy algorithms to set the stage for the next module

### Replacing Inefficient code
![SLowWadeWater.PNG](Assets/SlowWadeWater.png)

Code that is written inefficiently can:
- Be less readable (less Pythonic)
- Consume more time
- Waste energy
- Waste purchased or leased resources

This module focuses on making code simultaneously more readable and more efficient, as measured by how much we accelerate the code examples. While the code examples themselves are small, the techniques described are applicable in a wide variety of scenarios in AI.

### Python loops are bad for performance
**Python is great!** It's a great language for AI. There are many, many advantages to using Python, especially for data science:
- Easy to program (you don't have to worry about data types and persnickety syntax, at least relative to C/C++ and other languages)
- FAST for developing code!
- Leverages a huge array of libraries to conquer any domain
- Lots of quick answers to common issues on Stack Exchange

#### Python, however, is slow for massively repeated small tasks, such as those found in loops! **Python loops are SLOW**

- Compared to C, C++, Fortran and other typed languages
- Python is forced to look up the type of every variable at every occurrence in a loop to determine what operations it can perform on that data type
- It usually cannot take advantage of advances in hardware such as wider vector units, multiple cores, new instructions from a new HW instruction set, new AI accelerators, effective cache and memory layout, and more

#### BUT: Python has library remedies to these ills!
- Importing key libraries shifts the burden of computation to highly efficient code
- NumPy, for example, through its focus on efficient elementwise operations, gives indirect access to the efficiencies afforded by C
- Libraries included in oneAPI, along with NumPy, SciPy, and scikit-learn when powered by oneAPI, give access to modern hardware advancements: better cache and memory usage, low-level vector instructions, and more
- By leveraging packages such as these powered by oneAPI AND keeping libraries up to date, more capability is added to your underlying frameworks, so that moving code, especially in a cloud world, can give you ready access to hardware accelerations, in many cases without having to modify this vectorized code
- Routines are written in C (based on the CPython framework)
- NumPy arrays are densely packed arrays of homogeneous type. Python lists, by contrast, are arrays of pointers to objects, even when all of them are of the same type. So, you get the benefits of not having to check data types, and you also get locality of reference. Also, many NumPy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you're performing.
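
The memory-layout point in the last bullet can be checked directly. The following is a minimal sketch rather than part of the original notebook; the names `py_list`, `np_array`, and the size `n` are illustrative assumptions. It shows that a NumPy array keeps one fixed-size value per element in a single buffer, while a Python list keeps references to separate float objects.

```python
import sys
import numpy as np

n = 1_000  # illustrative size (assumption, not from the notebook)

py_list = [float(i) for i in range(n)]      # list of references to float objects
np_array = np.arange(n, dtype=np.float64)   # one contiguous buffer of float64

# Every element of the array shares one dtype and one fixed item size.
print("array dtype:   ", np_array.dtype)
print("bytes per item:", np_array.itemsize)
print("array buffer:  ", np_array.nbytes, "bytes")  # equals n * itemsize

# The list only holds pointers; each float object lives elsewhere in memory.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("list + floats: ", list_total, "bytes (CPython, approximate)")

# A list may hold any mix of types, so Python has to inspect each element.
print("element types: ", {type(x).__name__ for x in py_list})
```

The exact byte counts for the list depend on the Python build, so treat them as indicative only.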
            
**Goal of this module: Search and destroy (replace) loops**

Avoid loops if you can, and find an alternative where possible. Sometimes it cannot be done: true data dependencies may limit our options. But many, many times there are alternatives.


**The problem** 
- Loops isolate your code from hardware and software advances that update frequently.
- They prevent you from effectively using key underlying resources - it is a waste.
- They consume your time!

### Reference:

- [Video:  **Losing your Loops Fast Numerical Computing with NumPy** by Jake VanderPlas ](https://www.youtube.com/watch?v=EEUXKG97YRw). 

- [Book:  **Python Data Science Handbook** by Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/). 

- [Book:  **Elegant SciPy: The Art of Scientific Python** by Juan Nunez-Iglesias, Stéfan van der Walt, Harriet Dashnow](https://www.amazon.com/Elegant-SciPy-Art-Scientific-Python/dp/1491922877)

- [Article:  **The Ultimate NumPy Tutorial for Data Science Beginners**](https://www.analyticsvidhya.com/blog/2020/04/the-ultimate-numpy-tutorial-for-data-science-beginners/) :   by Aniruddha April 28, 2020 at www.analyticsvidhya.com

- [Academic Lecture pdf: **Vectorization** by  Aaron Birkland Cornell CAC](http://www.cac.cornell.edu/education/training/StampedeJune2013/Vectorization-2013_06_18.pdf)

# Exercises (7 in total):

Do a page search for each **Exercise** in this notebook. Complete all seven exercises. Code in the cells above each exercise may give insight into a solid approach.

## Why use NumPy as a replacement for loops?

## It's FAST!

In this section we will explore a smattering of different NumPy approaches that lead to accelerations over naive loops.

As a general rule, the bigger the loop (more iterations) and the bigger the data (more dimensions), the greater NumPy's advantage.

Ultimately, we are hunting for **"BIG LOOPS"**. What is a BIG LOOP? One that consumes a lot of time! Sometimes even a loop with a somewhat smaller iteration count can be time consuming because each iteration takes a long time by itself. We'll call these BIG LOOPS too.

#### Compare different ways of computing Log10 of a large vector

In this next section, we will create a list of 1 million random floating point numbers. Then we will use a **for** loop to iterate over its elements, take **Log10** of each, and store the values in another list. We'll compare the execution speed with that of a direct NumPy Log10 operation.

For this log10 problem, we will compare:
- Naive loop
- Map function
- List Comprehension
- Numpy

#### Import updated libraries

In [2]:

import numpy as np
from math import log10 as lg10
import time
import matplotlib.pyplot as plt
import random
%matplotlib inline
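
The four-way log10 comparison described above can be sketched as follows. This is a hedged, self-contained illustration, not the notebook's original cell: the helper `timed`, the fixed seed, and the value range `[1, 100]` are assumptions chosen so that log10 is always defined and the run is repeatable.

```python
import random
import time
from math import log10

import numpy as np

N = 1_000_000        # vector size taken from the text above
random.seed(0)       # fixed seed for repeatability (assumption)
values = [random.uniform(1.0, 100.0) for _ in range(N)]  # all values >= 1
values_np = np.array(values)

def timed(label, func):
    """Run func once, print the wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"{label:<20s} {elapsed:.4f} s")
    return result, elapsed

def naive_loop():
    # Explicit loop: one Python-level call and one append per element.
    out = []
    for v in values:
        out.append(log10(v))
    return out

loop_result, t_loop = timed("Naive loop", naive_loop)
map_result, t_map = timed("Map function", lambda: list(map(log10, values)))
comp_result, t_comp = timed("List comprehension", lambda: [log10(v) for v in values])
numpy_result, t_numpy = timed("NumPy", lambda: np.log10(values_np))

print(f"NumPy speedup over naive loop: {t_loop / t_numpy:.0f}X")
```

Timings vary by machine and library build, so no expected numbers are shown. A single `time.perf_counter()` measurement is noisy; repeating each measurement a few times gives a steadier picture.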

Whatever loopy code you have, spend time looking for alternatives such as the direct NumPy operation. The acceleration can be extraordinary.

# NumPy Aggregation

Aggregation is where we operate on an array and generate resulting data with a smaller dimension than the original array

The aggregations can typically be done using different axes to control the direction

![Aggregation0.png](attachment:1485cd7a-0b51-4e75-8006-45e52c161eb4.png)

![Aggregation1.PNG](attachment:4bd02f93-aec1-4e30-801a-471d93955a8b.PNG)

Common examples in AI are:
- min
- max
- sum
- mean
- std ... among others

----------------------------------------------------------------------------------
| Functions | Description | 
| --- | --- |
| np.mean() | Compute the arithmetic mean along the specified axis. |
| np.std() | Compute the standard deviation along the specified axis. |
| np.var() | Compute the variance along the specified axis. |
| np.sum() | Sum of array elements over a given axis. |
| np.prod() | Return the product of array elements over a given axis. |
| np.cumsum() | Return the cumulative sum of the elements along a given axis. |
| np.cumprod() | Return the cumulative product of elements along a given axis. |
| np.min(), np.max() | Return the minimum / maximum of an array, or the minimum / maximum along an axis. |
| np.argmin(), np.argmax() | Return the indices of the minimum / maximum values along an axis. |
| np.all() | Test whether all array elements along a given axis evaluate to True. |
| np.any() | Test whether any array element along a given axis evaluates to True. |
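
To make the role of the `axis` argument concrete, here is a small sketch on a 2 x 3 array. The array `A` below is an illustrative assumption, not data from the notebook. With `axis=0` the rows are collapsed, leaving one result per column; with `axis=1` the columns are collapsed, leaving one result per row.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

total = A.sum()                  # 15: aggregate over every element
col_sums = A.sum(axis=0)         # [3 5 7]: collapse rows, one value per column
row_sums = A.sum(axis=1)         # [ 3 12]: collapse columns, one value per row
col_means = A.mean(axis=0)       # [1.5 2.5 3.5]
row_argmax = A.argmax(axis=1)    # [2 2]: column index of each row's maximum
any_gt2 = (A > 2).any(axis=0)    # [ True  True  True]
all_gt2 = (A > 2).all(axis=1)    # [False  True]

print(total, col_sums, row_sums, col_means, row_argmax, any_gt2, all_gt2)
```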


Specialty calculations exist, so always examine your code with a view to simplifying it and replacing loops with off-the-shelf solutions.

For example, in AI there are times we need to add the values on the diagonal of special arrays.

For very long vectors these will accelerate noticeably, and more so for larger multidimensional arrays.

Below is a naive approach for adding all the diagonal elements of a smallish array of 1000 x 1000. So the acceleration is reasonable but not outlandish.


In [3]:
A = np.arange(1_000_000).reshape(1000, 1000)
Diag = 0

t1 = time.time()
for i in range(len(A)):
    for j in range(len(A)): 
        if i == j:
            Diag += A[i,j]
t2 = time.time()
Elapsed_Diag_base = t2-t1
print("elapsed time: ", Elapsed_Diag_base)
print("Diag: ", Diag)

elapsed time:  0.06961965560913086
Diag:  499999500


## Exercise:

Use a search engine to find a NumPy method that computes the sum of the diagonal of this array.
- Hint: trace
- Hint: Diag = np.trace(A)

In [4]:
t1 = time.time()
#### insert your code below #####
Diag = np.trace(A)

#################################    
t2 = time.time()
Elapsed_Diag_numpy = t2 - t1
print("elapsed time: ", Elapsed_Diag_numpy)
print("Diag: ", Diag)
print("Acceleration: {:4.0f}X".format(Elapsed_Diag_base/Elapsed_Diag_numpy))

elapsed time:  0.0009424686431884766
Diag:  499999500
Acceleration:   74X
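
`np.trace` is not the only loop-free route to the diagonal sum. The sketch below, which is an addition and not part of the original exercise, rebuilds the same array and shows three other NumPy formulations that should agree with it; which one is fastest depends on the array size and library build.

```python
import numpy as np

A = np.arange(1_000_000).reshape(1000, 1000)  # same array as in the cell above

by_trace = np.trace(A)                # direct sum of the main diagonal
by_diagonal = np.diagonal(A).sum()    # extract the diagonal, then aggregate
by_einsum = np.einsum("ii", A)        # repeated index sums the (i, i) elements
idx = np.arange(A.shape[0])
by_index = A[idx, idx].sum()          # integer-array indexing of elements (i, i)

print(by_trace, by_diagonal, by_einsum, by_index)  # each is 499999500
```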


In [5]:
print("Done")

Done
