## Recap: NumPy Indexing and Selection

An `ndarray` can be indexed using the standard Python `x[obj]` syntax, where `x` is the array and `obj` is the selection. There are three kinds of indexing available:
   - field access, 
   - basic slicing, 
   - advanced indexing. 
 
Which one occurs depends on `obj`.
   - https://docs.python.org/release/2.3.5/whatsnew/section-slices.html
   - https://realpython.com/pandas-settingwithcopywarning/


Reading from an ndarray (indexing on the right-hand side of an assignment, `... = x[obj]`) follows these principles:
- Slicing returns a view, so modifying the result also modifies the original ndarray.
- Indexing with an index array or a mask array returns a copy.

Writing to an ndarray (indexing on the left-hand side of an assignment, `x[obj] = ...`) follows this principle:
- Slicing, index arrays, and mask arrays all write into the original ndarray, so it is modified in place.
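
The view and copy rules above can be checked with a small sketch. The array contents and variable names are illustrative choices, not part of the original notes; `np.shares_memory` is used only to confirm whether two arrays share a buffer.

```python
import numpy as np

x = np.arange(6)            # [0 1 2 3 4 5]

# Right-hand side: a slice is a view onto the same memory.
s = x[1:4]
s[0] = 100                  # also changes x[1]

# Right-hand side: index arrays and boolean masks give copies.
f = x[[0, 2]]               # index-array ("fancy") indexing
m = x[x > 3]                # boolean-mask indexing
f[0] = -1                   # x is unaffected
m[0] = -1                   # x is unaffected

# Left-hand side: all three forms write into x itself.
x[[0, 2]] = 7
x[x > 50] = 0

print(x)                        # [7 0 7 3 4 5]
print(np.shares_memory(x, s))   # True  (view)
print(np.shares_memory(x, f))   # False (copy)
```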





## Vectorization
    - https://numpy.org/doc/stable/glossary.html#term-vectorization
    
- Most applications have to deal with large amounts of data. A computationally inefficient function can therefore become a major bottleneck in your algorithm and result in a model that takes ages to run. To keep the code computationally efficient, we will use vectorization.


Vectorization describes the absence of any explicit looping, indexing, etc., in the code: these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. 

Vectorized code has many advantages, among which are:
- vectorized code is more concise and easier to read
- fewer lines of code generally means fewer bugs
- the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)
- vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult-to-read `for` loops.

Common operations on vectors include:
- the dot product, also known as the scalar product because it produces a single number,
- the outer product, which for vectors of lengths m and n produces an m x n matrix (a square matrix when both vectors have the same length),
- element-wise multiplication, which multiplies the elements at the same index and leaves the shape unchanged.
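
As a minimal sketch of the three operations listed above, assuming two small example vectors `u` and `v` chosen only for illustration:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

dot = np.dot(u, v)         # 1*4 + 2*5 + 3*6 = 32, a single number
outer = np.outer(u, v)     # shape (3, 3); outer[i, j] == u[i] * v[j]
elementwise = u * v        # [4, 10, 18], same shape as u and v

print(dot)
print(outer.shape)
print(elementwise)
```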
    
Instead of executing operations on individual array items, one at a time, your code is much more efficient if you try to stick to array operations. This is called *vectorization*. This way, you can benefit from NumPy's many optimizations.

In [4]:
# Dot product
import time
import array

import numpy as np

# Two vectors of 100,000 signed 8-byte integers (type code 'q')
a = array.array('q', range(100000))
b = array.array('q', range(100000, 200000))

# Classic dot product of two vectors using an explicit Python loop
tic = time.process_time()
dot = 0.0

for i in range(len(a)):
    dot += a[i] * b[i]

toc = time.process_time()

print("dot_product = " + str(dot))
print("Computation time = " + str(1000 * (toc - tic)) + "ms")

dot_product = 833323333350000.0
Computation time = 31.25ms


In [5]:
n_tic = time.process_time() 
n_dot_product = np.dot(a, b) 
n_toc = time.process_time() 

print("\nn_dot_product = "+str(n_dot_product)) 
print("Computation time = "+str(1000*(n_toc - n_tic ))+"ms") 


n_dot_product = 833323333350000
Computation time = 0.0ms
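
The `0.0ms` reading most likely reflects the coarse resolution of `time.process_time()` on some platforms rather than a computation that takes no time at all. A hedged variant of the same comparison is sketched below, assuming `time.perf_counter()` as the timer and NumPy `int64` arrays in place of `array.array`. The timings depend on the machine, so no expected values are shown.

```python
import time

import numpy as np

n = 100_000
a = np.arange(n, dtype=np.int64)          # 0 .. 99999
b = np.arange(n, 2 * n, dtype=np.int64)   # 100000 .. 199999

# Explicit Python loop
t0 = time.perf_counter()
loop_dot = 0
for i in range(n):
    loop_dot += int(a[i]) * int(b[i])
t1 = time.perf_counter()

# Vectorized version
vec_dot = int(np.dot(a, b))
t2 = time.perf_counter()

print(f"loop:   {1000 * (t1 - t0):.3f} ms, result {loop_dot}")
print(f"np.dot: {1000 * (t2 - t1):.3f} ms, result {vec_dot}")
```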


## Broadcasting
    - https://numpy.org/doc/stable/user/basics.broadcasting.html
    - https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html
    - https://numpy.org/doc/stable/user/basics.broadcasting.html#module-numpy.doc.broadcasting
    
    
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is "broadcast" across the larger array so that they have compatible shapes.

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data, which usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows down the computation.
    
    
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

In [6]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

array([2., 4., 6.])

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [8]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

array([2., 4., 6.])

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies so that broadcasting operations are as memory and computationally efficient as possible.
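
One way to see that the stretching is only conceptual is `np.broadcast_to`, which returns a read-only view rather than a copy. This is a small sketch and not a description of what NumPy does internally during `a * b`; the variable name `stretched` is an illustrative choice.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0

# A view of the scalar "stretched" to the shape of a.
# A stride of 0 means every element refers to the same stored value.
stretched = np.broadcast_to(b, a.shape)

print(stretched.shape)             # (3,)
print(stretched.strides)           # (0,)
print((a * stretched).tolist())    # [2.0, 4.0, 6.0]
```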


When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is raised, indicating that the arrays have incompatible shapes. Along each axis, the size of the resulting array is the largest of the input sizes along that axis.
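
The compatibility rule can be written out as a short pure-Python function. This is an illustrative sketch of the rule as stated above, not NumPy's implementation, and `broadcast_shape` is a hypothetical helper name.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    result = []
    # Compare dimensions starting from the trailing one.
    # A missing leading dimension is treated as size 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or da == 1 or db == 1:
            # Keep the size that is not 1 (or the common size).
            result.append(db if da == 1 else da)
        else:
            raise ValueError(
                f"operands could not be broadcast together with shapes {shape_a} {shape_b}"
            )
    return tuple(reversed(result))


print(broadcast_shape((256, 256, 3), (3,)))    # (256, 256, 3)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```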

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:

`Image  (3d array): 256 x 256 x 3`

`Scale  (1d array):             3`

`Result (3d array): 256 x 256 x 3`
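
A minimal sketch of the image example, assuming an array of ones as a stand-in for real RGB data and arbitrary per-channel scale factors:

```python
import numpy as np

image = np.ones((256, 256, 3))        # stand-in for real RGB data
scale = np.array([0.5, 1.0, 2.0])     # hypothetical per-channel factors

# scale (3,) is broadcast across the two leading axes of image.
scaled = image * scale

print(scaled.shape)              # (256, 256, 3)
print(scaled[0, 0].tolist())     # [0.5, 1.0, 2.0]
```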

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:

----------------------------------

`A      (4d array):  8 x 1 x 6 x 1`

`B      (3d array):      7 x 1 x 5`

`Result (4d array):  8 x 7 x 6 x 5`

----------------------------------

`A      (2d array):  5 x 4`

`B      (1d array):      1`

`Result (2d array):  5 x 4`

----------------------------------

`A      (2d array):  5 x 4`

`B      (1d array):      4`

`Result (2d array):  5 x 4`

----------------------------------

`A      (3d array):  15 x 3 x 5`

`B      (3d array):  15 x 1 x 5`

`Result (3d array):  15 x 3 x 5`

----------------------------------

`A      (3d array):  15 x 3 x 5`

`B      (2d array):       3 x 5`

`Result (3d array):  15 x 3 x 5`

----------------------------------

`A      (3d array):  15 x 3 x 5`

`B      (2d array):       3 x 1`

`Result (3d array):  15 x 3 x 5`

----------------------------------

The following shapes do not broadcast:

`A      (1d array):  3`

`B      (1d array):  4 # trailing dimensions do not match`

----------------------------------

`A      (2d array):      2 x 1`

`B      (3d array):  8 x 4 x 3 # second from last dimensions mismatched`
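
The shape tables above can be checked against NumPy itself. The sketch below uses `np.broadcast` on empty arrays of the given shapes; `result_shape` is a hypothetical helper name introduced here for illustration.

```python
import numpy as np


def result_shape(shape_a, shape_b):
    """Broadcast two empty arrays of the given shapes and return the result shape."""
    return np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape


print(result_shape((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)
print(result_shape((15, 3, 5), (3, 1)))        # (15, 3, 5)

# Incompatible shapes raise ValueError.
failures = []
for shape_a, shape_b in [((3,), (4,)), ((2, 1), (8, 4, 3))]:
    try:
        result_shape(shape_a, shape_b)
    except ValueError as exc:
        failures.append((shape_a, shape_b))
        print(shape_a, shape_b, "->", exc)
```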


In [2]:
import numpy as np

In [3]:
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))

print('x', x.shape)

print('y', y.shape)

print('x+y', x + y)

x (4,)
y (5,)


ValueError: operands could not be broadcast together with shapes (4,) (5,) 