# Introduction to NumPy powered by oneAPI

### Learning Objectives: 

- Describe why inefficient code, such as time-consuming loops, wastes resources and time, and why it should be replaced
- Describe why using Python for highly repetitive small tasks is inefficient
- Describe the additive value of leveraging packages such as NumPy, which are powered by oneAPI, in a cloud world
- Describe the importance of keeping oneAPI and third-party packages such as NumPy, SciPy and others up to date
- Enumerate ways in which NumPy accelerates code
- Apply loop replacement methodologies in a variety of scenarios



#### Here is a list of topics we will explore in this module:
- The "WHY": why use NumPy as a replacement for "for" loops? It's FAST!
- NumPy Universal Functions or ufuncs
- NumPy Broadcasting 
- NumPy Aggregations
- NumPy Where
- NumPy Select
- A quick reference to SciPy algorithms to set the stage for the next module 


### Replacing Inefficient code
![SLowWadeWater.PNG](Assets/SlowWadeWater.png)

Code that is written inefficiently can:
- Be less readable (less Pythonic)
- Consume more time
- Waste energy
- Waste purchased or leased resources


This module focuses on making code both more readable and more efficient, where efficiency is measured by how much we accelerate the code examples. While the code examples themselves are small, the techniques described are applicable in a wide variety of scenarios in AI.

### Python loops are bad for performance
**Python is great!** It's a great language for AI. There are many, many advantages in using Python, especially for data science.
- Easy to program (no need to worry about data types and fussy syntax, at least relative to C/C++ and other languages)
- FAST for developing code!
- Leverages a huge array of libraries to conquer any domain
- Lots of quick answers to common issues on Stack Exchange


#### Python, however, is slow for massively repeated small tasks, such as those found in loops! **Python loops are SLOW**

- Compared to C, C++, Fortran and other typed languages
- Python is forced to look up every occurrence and type of variable in a loop to determine what operations it can perform on that data type
- It cannot usually take advantage of advances in hardware in terms of vector width increases, multiple cores, new instructions from a new HW instruction set, new AI accelerators, effective cache memory layout, and more


#### BUT: Python has library remedies to these ills!
- Importing key libraries shifts the burden of computation to highly efficient code
- NumPy, for example, through its focus on efficient elementwise operations, gives indirect access to the efficiencies afforded by C
- Libraries included in oneAPI, along with NumPy, SciPy and scikit-learn builds powered by oneAPI, give access to modern hardware advancements: better cache and memory usage, low-level vector instructions, and more
- By leveraging packages such as these powered by oneAPI AND keeping libraries up to date, more capability is added to your underlying frameworks, so that moving code, especially in a cloud world, can give you ready access to hardware acceleration, in many cases without having to modify the vectorized code
- Many NumPy routines are written in C, with some parts written in Cython
- NumPy arrays are densely packed arrays of homogeneous type. Python lists, by contrast, are arrays of pointers to objects, even when all of them are of the same type. So, you get the benefits of not having to check data types, and you also get locality of reference. Also, many NumPy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you’re performing. 
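The contrast between a Python list and a NumPy array described above can be sketched in a few lines. This is only an illustration under simple assumptions (a four-element list of floats; the variable names are made up for the example), not a benchmark.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]   # Python list: an array of pointers to float objects
arr = np.array(values)          # NumPy array: one contiguous block of float64 values

# Every element shares a single dtype, so no per-element type check is needed.
print(arr.dtype)                 # float64
print(arr.itemsize, arr.nbytes)  # 8 bytes per element, 32 bytes of data in total

# Loop version: the interpreter inspects the type of each element in turn.
doubled_loop = [v * 2 for v in values]

# Vectorized version: one call, and the loop runs in compiled code.
doubled_vec = arr * 2

print(doubled_vec.tolist() == doubled_loop)  # True
```

On an array this small the difference in speed is negligible; the gap tends to widen as the number of elements grows.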

            
**Goal of this module: Search and destroy (replace) loops**

Avoid loops if you can, and look for an alternative whenever possible. Sometimes it cannot be done, because true data dependencies may limit our options. But many, many times there are alternatives.
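To make the idea of a true data dependency concrete, here is a small sketch contrasting a loop whose iterations are independent with one whose iterations are not. The recurrence and the coefficient 0.5 are arbitrary choices made for illustration.

```python
import numpy as np

x = np.arange(1.0, 6.0)   # [1., 2., 3., 4., 5.]

# Independent iterations: each output depends only on its own input,
# so the whole loop can be replaced by one array expression.
squares_loop = np.empty_like(x)
for i in range(len(x)):
    squares_loop[i] = x[i] ** 2
squares_vec = x ** 2

# True data dependency: step i needs the result of step i - 1.
# This recurrence is not a plain elementwise operation, so a simple
# one-line array expression does not replace the loop.
state = np.empty_like(x)
state[0] = x[0]
for i in range(1, len(x)):
    state[i] = 0.5 * state[i - 1] + x[i]

print(np.allclose(squares_loop, squares_vec))  # True
print(state)
```

The first loop is the kind this module sets out to replace; the second may need a specialized routine or may have to remain a loop.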


**The problem** 
- Loops isolate your code from hardware and software advances that update frequently.
- They prevent you from effectively using key underlying resources - it is a waste.
- They consume your time!


### Reference:

- [Video:  **Losing your Loops Fast Numerical Computing with NumPy** by Jake VanderPlas ](https://www.youtube.com/watch?v=EEUXKG97YRw). 

- [Book:  **Python Data Science Handbook** by Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/). 

- [Book:  **Elegant SciPy: The Art of Scientific Python** by Juan Nunez-Iglesias, Stéfan van der Walt, Harriet Dashnow](https://www.amazon.com/Elegant-SciPy-Art-Scientific-Python/dp/1491922877)

- [Article:  **The Ultimate NumPy Tutorial for Data Science Beginners**](https://www.analyticsvidhya.com/blog/2020/04/the-ultimate-numpy-tutorial-for-data-science-beginners/) :   by Aniruddha April 28, 2020 at www.analyticsvidhya.com

- [Academic Lecture pdf: **Vectorization** by  Aaron Birkland Cornell CAC](http://www.cac.cornell.edu/education/training/StampedeJune2013/Vectorization-2013_06_18.pdf)

# Exercises (7 in total):

Do a page search for each **Exercise** in this notebook. Complete all seven exercises. Code in the cells above each exercise may give insight into a solid approach.

## Why use NumPy as a replacement for loops?

## It's FAST!

In this section we will explore a smattering of different NumPy approaches that lead to accelerations over naive loops.

Generally, the bigger the loop (the more iterations) and the bigger the data (the more dimensions), the greater the advantage of NumPy.

Ultimately, we are hunting for "BIG LOOPS". What is a BIG LOOP? One that consumes a lot of time! Sometimes even a loop with fewer iterations can be time consuming because each iteration takes a long time by itself. We'll call these BIG LOOPS too.


#### Compare different ways of computing log10 of a large vector

In this next section, we will create a list of 1 million random floating-point numbers. Then we will use a for loop to iterate over its elements, take the log10 of each, and store the value in another list. We'll compare the execution speed with that of a direct NumPy log10 operation.

For this log10 problem, we will compare:

- Naive loop
- Map function
- List Comprehension
- NumPy
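As a rough sketch of what such a comparison might look like, the block below times the four approaches on the same data. It assumes a smaller vector (100,000 values rather than 1 million) to keep the run short, and the helper name `naive_loop` and the dictionary names are illustrative only. Absolute timings will vary by machine.

```python
import math
import random
import time

import numpy as np

random.seed(0)
n = 100_000  # assumption: smaller than the 1 million in the text, to keep this quick
data = [random.uniform(1.0, 100.0) for _ in range(n)]  # strictly positive inputs
arr = np.array(data)


def naive_loop(values):
    """Append log10 of each value to a new list, one element at a time."""
    out = []
    for v in values:
        out.append(math.log10(v))
    return out


approaches = {
    "naive loop": lambda: naive_loop(data),
    "map": lambda: list(map(math.log10, data)),
    "list comprehension": lambda: [math.log10(v) for v in data],
    "numpy": lambda: np.log10(arr),
}

results, timings = {}, {}
for name, fn in approaches.items():
    start = time.perf_counter()
    results[name] = fn()
    timings[name] = time.perf_counter() - start

for name, seconds in timings.items():
    print(f"{name:20s} {seconds:.4f} s")
```

All four approaches produce the same values; only the place where the loop runs (interpreter versus compiled code) differs.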


#### Import updated libraries

In [42]:

import numpy as np
from math import log10 as lg10
import time
import matplotlib.pyplot as plt
import random
%matplotlib inline

np.__version__

'1.21.4'

# Pairwise Distance Comparisons

In [43]:
import numpy as np
from scipy.spatial import distance_matrix
from scipy.spatial import distance
from sklearn.metrics import pairwise_distances

a = np.array([[0,0], [1,1], [2,2], [3,3]])
b = np.array([[0,0], [-1,-1], [-2,-2], [-3,-3], [-4,-4]])

print("a.shape",a.shape)
print("b.shape",b.shape)
print('Euclidean')
print(distance_matrix(a, b))
print('Manhattan')
print(distance.cdist(a, b, 'cityblock'))


a.shape (4, 2)
b.shape (5, 2)
Euclidean
[[0.         1.41421356 2.82842712 4.24264069 5.65685425]
 [1.41421356 2.82842712 4.24264069 5.65685425 7.07106781]
 [2.82842712 4.24264069 5.65685425 7.07106781 8.48528137]
 [4.24264069 5.65685425 7.07106781 8.48528137 9.89949494]]
Manhattan
[[ 0.  2.  4.  6.  8.]
 [ 2.  4.  6.  8. 10.]
 [ 4.  6.  8. 10. 12.]
 [ 6.  8. 10. 12. 14.]]


# Euclidean Distance using broadcasting


In [44]:
# Euclidean Distance

np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

array([[0.        , 1.41421356, 2.82842712, 4.24264069, 5.65685425],
       [1.41421356, 2.82842712, 4.24264069, 5.65685425, 7.07106781],
       [2.82842712, 4.24264069, 5.65685425, 7.07106781, 8.48528137],
       [4.24264069, 5.65685425, 7.07106781, 8.48528137, 9.89949494]])

# Manhattan Distance using broadcasting

In [45]:
# Manhattan Distance

np.sum(np.abs(a[:, None, :] - b[None, :, :]), axis=-1)

array([[ 0,  2,  4,  6,  8],
       [ 2,  4,  6,  8, 10],
       [ 4,  6,  8, 10, 12],
       [ 6,  8, 10, 12, 14]])
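The two broadcasting expressions above work by inserting length-1 axes so that every row of `a` is paired with every row of `b`. The sketch below spells out the intermediate shapes using the same small `a` and `b`; the names `a3`, `b3` and `diff` are introduced here only for illustration.

```python
import numpy as np

a = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])                  # shape (4, 2)
b = np.array([[0, 0], [-1, -1], [-2, -2], [-3, -3], [-4, -4]])  # shape (5, 2)

a3 = a[:, None, :]   # shape (4, 1, 2): a new axis for the points of b
b3 = b[None, :, :]   # shape (1, 5, 2): a new axis for the points of a
diff = a3 - b3       # broadcast to shape (4, 5, 2): every a-point minus every b-point
print(a3.shape, b3.shape, diff.shape)

# Reducing over the last axis (the coordinates) leaves a (4, 5) distance matrix.
euclid = np.sqrt((diff ** 2).sum(axis=-1))
manhattan = np.abs(diff).sum(axis=-1)
print(euclid.shape, manhattan.shape)
```

Note that `diff` holds one entry per pair of points per coordinate, so its size grows with the product of the two point counts.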

In [46]:
def PointsDist(a,b):
    s = 0
    for i in range(len(a)):
        s += (a[i]-b[i])**2
    return np.sqrt(s)

In [47]:
a = np.array([1,1])
b = np.array([-1,-1])

PointsDist(a,b)

2.8284271247461903
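For comparison, the loop inside `PointsDist` can be replaced by a single array expression. The sketch below repeats the loop version under a different name and adds a vectorized counterpart; both function names are hypothetical and chosen only for this example.

```python
import numpy as np


def points_dist_loop(a, b):
    """Euclidean distance computed with an explicit Python loop."""
    s = 0
    for i in range(len(a)):
        s += (a[i] - b[i]) ** 2
    return np.sqrt(s)


def points_dist_vec(a, b):
    """Same distance, with the loop replaced by array operations."""
    return np.linalg.norm(np.asarray(a) - np.asarray(b))


a = np.array([1, 1])
b = np.array([-1, -1])
print(points_dist_loop(a, b), points_dist_vec(a, b))
```

For two-element points the loop is cheap, but the vectorized form scales to long vectors without any change.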

In [48]:
np.random.seed(42)
a = np.random.randint(10, size=(20000,2))
b = np.random.randint(10, size=(20000,2))
timing = {}

In [49]:
t1 = time.time()
distance.cdist(a, b, 'euclidean')
timing['scipy cdist'] = time.time() - t1

In [50]:
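t1 = time.time()
distance_matrix(a, b)
# Note: this cell times scipy.spatial.distance_matrix; the key name below is kept as-is so it matches the recorded results
timing['sklearn pairwise_distances'] = time.time() - t1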

In [51]:
t1 = time.time()
np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
timing['broadcasting'] = time.time() - t1

In [52]:
from sklearn.metrics.pairwise import euclidean_distances
t1 = time.time()
euclidean_distances(a,b)
timing['sklearn euclidean_distances'] = time.time() - t1

In [53]:
from sklearnex import patch_sklearn, unpatch_sklearn
unpatch_sklearn()
from sklearn.metrics.pairwise import cosine_distances
from sklearn.metrics import pairwise_distances

t1 = time.time()
#cosine_distances(a.reshape(-1,1),b.reshape(-1,1))
pairwise_distances(a.reshape(-1,1),b.reshape(-1,1),metric='cosine')
timing['sklearn cosine_distances'] = time.time() - t1

In [54]:

patch_sklearn()
from sklearn.metrics.pairwise import cosine_distances
from sklearn.metrics import pairwise_distances

t1 = time.time()
pairwise_distances(a.reshape(-1,1),b.reshape(-1,1),metric='cosine')
timing['sklearnEX cosine_distances'] = time.time() - t1

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [55]:
timing

{'scipy cdist': 2.0619640350341797,
 'sklearn pairwise_distances': 13.997550010681152,
 'broadcasting': 14.801432371139526,
 'sklearn euclidean_distances': 1.840453863143921,
 'sklearn cosine_distances': 9.12495756149292,
 'sklearnEX cosine_distances': 9.102379322052002}
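One plausible reason the broadcasting approach lands among the slowest entries is the size of its intermediate array, which holds one value per pair of points per coordinate. The sketch below estimates that size, assuming `randint` produced 64-bit integers (the default on many platforms, but not all). It also shows a hypothetical helper, `euclidean_via_dot`, that uses the expansion ||a - b||² = ||a||² + ||b||² - 2 a·b to avoid the three-dimensional intermediate. This is similar in spirit to the formula described in the scikit-learn documentation for `euclidean_distances`, but it is not the library's code.

```python
import numpy as np

# Size of the (20000, 20000, 2) intermediate created by a[:, None, :] - b[None, :, :]
n_a, n_b, dims = 20_000, 20_000, 2
itemsize = np.dtype(np.int64).itemsize   # assumption: 64-bit integers
intermediate_bytes = n_a * n_b * dims * itemsize
print(f"{intermediate_bytes / 1e9:.1f} GB")   # 6.4 GB


def euclidean_via_dot(a, b):
    """Pairwise Euclidean distances without building an (n_a, n_b, dims) array."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding


# Check against the broadcasting formula on a small random sample.
rng = np.random.default_rng(42)
small_a = rng.integers(0, 10, size=(6, 2))
small_b = rng.integers(0, 10, size=(4, 2))
reference = np.linalg.norm(small_a[:, None, :] - small_b[None, :, :], axis=-1)
print(np.allclose(euclidean_via_dot(small_a, small_b), reference))  # True
```

The expanded form trades the large intermediate for a matrix product, at the cost of some floating-point rounding for points that are very close together.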