### NumPy review and Vectorization

In [1]:
import numpy as np

Run the following Code Cells.

In [2]:
# NumPy routines which allocate memory and fill arrays with values
a = np.zeros(4);                print(f"np.zeros(4) :   a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,));             print(f"np.zeros(4,) :  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.zeros(4) :   a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.zeros(4,) :  a = [0. 0. 0. 0.], a shape = (4,), a data type = float64


In [3]:
a = np.arange(4.);              print(f"np.arange(4.):     a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.arange(4.):     a = [0. 1. 2. 3.], a shape = (4,), a data type = float64


In [4]:
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.random.random_sample(4): a = [0.70420108 0.83887211 0.76012042 0.57428879], a shape = (4,), a data type = float64


In [5]:
a = np.random.rand(4);          print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.random.rand(4): a = [0.64082766 0.88644441 0.60145449 0.80316571], a shape = (4,), a data type = float64


Values can also be specified manually.

In [6]:
a = np.array([5,4,3,2]);  print(f"np.array([5,4,3,2]):  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")

np.array([5,4,3,2]):  a = [5 4 3 2], a shape = (4,), a data type = int32
np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64
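
The data type above is inferred from the values: all-integer input gives an integer type (shown as `int32` here; the default integer width depends on the platform and NumPy version), while a single float promotes the whole array to `float64`. As a small sketch, the `dtype` argument of `np.array` can request a type explicitly instead of relying on inference. The names `a_float` and `a_int` are illustrative only.

```python
import numpy as np

# Integer literals, but floating point storage requested explicitly
a_float = np.array([5, 4, 3, 2], dtype=np.float64)
print(f"a_float = {a_float}, dtype = {a_float.dtype}")

# astype returns a converted copy of an existing array
a_int = a_float.astype(np.int64)
print(f"a_int = {a_int}, dtype = {a_int.dtype}")
```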


### Indexing

In [7]:
#vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [8]:
#access an element
print(f"a[2]  = {a[2]}")
print(f"a[2].shape is {a[2].shape}                   => Accessing an element returns a scalar")

a[2]  = 2
a[2].shape is ()                   => Accessing an element returns a scalar


In [9]:
# access the last element, negative indexes count from the end
print(f"a[-1] = {a[-1]}")

a[-1] = 9


Access the element `7` in `a` using a negative index. Write the code below.

In [10]:
a[-3]

7

In [11]:
#indexes must be within the range of the vector or they will produce an error
c=a[10]

IndexError: index 10 is out of bounds for axis 0 with size 10

You can also wrap the access in `try`/`except` to print the error message cleanly:

In [None]:
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)
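
To tie the negative-index and out-of-range cases together, here is a minimal sketch (the names `v`, `n`, and `bad_index` are illustrative). For a vector of length `n`, valid indices run from `-n` to `n - 1`, and anything outside that range raises `IndexError`. Catching `IndexError` specifically, rather than the broader `Exception`, keeps unrelated bugs from being silently swallowed.

```python
import numpy as np

v = np.arange(10)
n = v.shape[0]

# v[-k] refers to the same element as v[n - k]
print(f"v[-3] = {v[-3]}, v[n - 3] = {v[n - 3]}")

# Indices outside -n .. n-1 raise IndexError, in either direction
for bad_index in (n, -n - 1):
    try:
        v[bad_index]
    except IndexError as e:
        print(f"index {bad_index}: {e}")
```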

### Slicing
Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:

In [12]:
#vector slicing operations
a = np.arange(10)
print(f"a         = {a}")

#access 5 consecutive elements (start:stop:step)
c = a[2:7:1];     print("a[2:7:1] = ", c)

# access 3 elements separated by two 
c = a[2:7:2];     print("a[2:7:2] = ", c)

# access all elements index 3 and above
c = a[3:];        print("a[3:]    = ", c)

# access all elements below index 3
c = a[:3];        print("a[:3]    = ", c)

# access all elements
c = a[:];         print("a[:]     = ", c)

a         = [0 1 2 3 4 5 6 7 8 9]
a[2:7:1] =  [2 3 4 5 6]
a[2:7:2] =  [2 4 6]
a[3:]    =  [3 4 5 6 7 8 9]
a[:3]    =  [0 1 2]
a[:]     =  [0 1 2 3 4 5 6 7 8 9]
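
One detail worth knowing about the slices above: basic slicing in NumPy returns a *view* onto the original data rather than a copy, so writing through the slice also changes the source array. The sketch below uses illustrative names (`base`, `window`, `safe`) and `np.shares_memory` to make this visible.

```python
import numpy as np

base = np.arange(10)
window = base[2:7]                      # basic slice: a view onto base
print(np.shares_memory(base, window))   # do the two arrays share storage?

window[0] = 99                          # writing through the view changes base
print(base)

safe = base[2:7].copy()                 # an explicit copy is independent
safe[1] = -1                            # base is not affected by this write
print(base)
```

Use `.copy()` whenever the slice will be modified and the original must stay intact.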


Code everything in one Code Cell:
<br>1-Create an array from 5 to 61 with a step of 3 and name it `bb`
<br>2-Print it
<br>3-Access 4 elements separated by two 
<br>4-Access any 4 consecutive elements
<br>5-Access all elements below index 4
<br>6-Access all elements at index 5 and above

In [15]:
# 1. Create an array from 5 to 61 with a step of 3
bb = np.arange(5, 61, 3)

# 2. Print it
print(f"bb = {bb}")

# 3. Access 4 elements separated by two
print(f"bb[1:9:2] = {bb[1:9:2]}")

# 4. Access any 4 consecutive elements
print(f"bb[2:6] = {bb[2:6]}")

# 5. Access all elements below index 4
print(f"bb[:4] = {bb[:4]}")

# 6. Access all elements at index 5 and above
print(f"bb[5:] = {bb[5:]}")


bb = [ 5  8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59]
bb[1:9:2] = [ 8 14 20 26]
bb[2:6] = [11 14 17 20]
bb[:4] = [ 5  8 11 14]
bb[5:] = [20 23 26 29 32 35 38 41 44 47 50 53 56 59]


#### Single vector operations

In [16]:
a = np.array([1,2,3,4])
print(f"a             : {a}")
# negate elements of a
b = -a 
print(f"b = -a        : {b}")

# sum all elements of a, returns a scalar
b = np.sum(a) 
print(f"b = np.sum(a) : {b}")

b = np.mean(a)
print(f"b = np.mean(a): {b}")

b = a**2
print(f"b = a**2      : {b}")

a             : [1 2 3 4]
b = -a        : [-1 -2 -3 -4]
b = np.sum(a) : 10
b = np.mean(a): 2.5
b = a**2      : [ 1  4  9 16]


#### Vector Vector element-wise operations
Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example 
$$ c_i = a_i + b_i $$

In [17]:
a = np.array([ 1, 2, 3, 4])
b = np.array([-1,-2, 3, 4])
print(f"Binary operators work element wise: {a + b}")

Binary operators work element wise: [0 0 6 8]


Of course, for this to work correctly, the vectors must be of the same size:

In [18]:
#try a mismatched vector operation
c = np.array([1, 2])
try:
    d = a + c
except Exception as e:
    print("The error message you'll see is:")
    print(e)

The error message you'll see is:
operands could not be broadcast together with shapes (4,) (2,) 
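
The word "broadcast" in the message points to one exception to the same-size rule: NumPy stretches an operand of length 1 to match the other operand. A brief sketch, with illustrative variable names:

```python
import numpy as np

a4 = np.array([1, 2, 3, 4])

one = np.array([10])      # shape (1,) is broadcast across all 4 elements
print(a4 + one)

two = np.array([1, 2])    # shape (2,) cannot be stretched to (4,)
try:
    a4 + two
except ValueError as e:
    print(e)
```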


#### Scalar Vector operations
Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector.

In [19]:
a = np.array([1, 2, 3, 4])

# multiply a by a scalar
b = 5 * a 
print(f"b = 5 * a : {b}")

b = 5 * a : [ 5 10 15 20]


#### Vector Vector dot product
The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below.
<br><img src="./images/C1_W2_Lab04_dot_notrans.gif" width=400> 

The dot product multiplies the values in two vectors element-wise and then sums the result.
Vector dot product requires the dimensions of the two vectors to be the same. 

Let's implement our own version of the dot product below:

**Using a for loop**, implement a function which returns the dot product of two vectors. Given inputs $a$ and $b$, the function should return:
$$ x = \sum_{i=0}^{n-1} a_i b_i $$
Assume both `a` and `b` are the same shape.

For additional information regarding the `for` loop, refer to https://www.w3schools.com/python/python_for_loops.asp

In [20]:
m1=np.array([0,2,4])
m2=np.array([1,3,5])

In [21]:
range(m1.shape[0])

range(0, 3)

In [22]:
for i in range(m1.shape[0]):
    s=s+m1[i]*m2[i]
    print (s)

NameError: name 's' is not defined

In [23]:
s=0
for i in range(m1.shape[0]):
    s=s+m1[i]*m2[i]
    print (s)

0
6
26


In [24]:
s=0
for i in range(m1.shape[0]):
    s=s+m1[i]*m2[i]
print (s)

26


Now compute the dot product of [1,2,3,4] and [5,6,7,8] in one Code Cell. Use the `arange` function to build the vectors, store the result in `y`, and print it.

In [28]:

a = np.arange(1, 5)  # [1, 2, 3, 4]
b = np.arange(5, 9)  # [5, 6, 7, 8]

y = np.dot(a, b)

print(f"Dot product of {a} and {b} = {y}")


Dot product of [1 2 3 4] and [5 6 7 8] = 70


In [29]:
def my_dot(a, b): 
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)):  input vector 
      b (ndarray (n,)):  input vector with same dimension as a

    Returns:
      x (scalar): dot product of a and b
    """
    x=0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

In [30]:
# Using the function we just made
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
my_dot(a,b)

24

Note, the dot product is expected to return a scalar value. 

Let's try the same operations using `np.dot`.  

In [31]:
# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ") 
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(b, a).shape = {c.shape} ")

NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = () 
NumPy 1-D np.dot(b, a) = 24, np.dot(b, a).shape = () 


Above, you will note that the results for 1-D matched our implementation.
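
Since the dot product is "multiply element-wise, then sum", the same value can be reached in a few equivalent ways. The sketch below compares `np.dot` with `np.sum(a * b)` and the `@` operator, which for two 1-D inputs also yields the dot product. The names `via_dot`, `via_sum`, and `via_at` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

via_dot = np.dot(a, b)     # library routine
via_sum = np.sum(a * b)    # element-wise multiply, then sum
via_at = a @ b             # matmul operator; a dot product for 1-D inputs

print(via_dot, via_sum, via_at)
```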

#### The Need for Speed: vector vs for loop
We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:

In [32]:
import time

In [33]:
np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

In [34]:
tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

my_dot(a, b) =  2501072.5817
loop version duration: 8751.0645 ms 


In [35]:
tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

np.dot(a, b) =  2501072.5817
Vectorized version duration: 40.5660 ms 


In [36]:
del(a);del(b)  #remove these big arrays from memory

So, vectorization provides a large speedup in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, allowing multiple operations to be issued in parallel. This is critical in Machine Learning, where the data sets are often very large.
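
A single `time.time()` measurement can be noisy, so the exact numbers will vary from run to run and from machine to machine. As a hedged sketch, the comparison can be repeated with `time.perf_counter()`, a higher-resolution clock intended for timing short intervals. Here `loop_dot` is a stand-in for `my_dot` so the block runs on its own, and the arrays are deliberately smaller so the loop finishes quickly.

```python
import time
import numpy as np

def loop_dot(u, v):
    """Dot product with an explicit Python loop (same idea as my_dot)."""
    total = 0.0
    for i in range(u.shape[0]):
        total += u[i] * v[i]
    return total

rng = np.random.default_rng(1)
u = rng.random(100_000)    # smaller than the arrays used earlier
v = rng.random(100_000)

tic = time.perf_counter()
loop_result = loop_dot(u, v)
loop_ms = 1000 * (time.perf_counter() - tic)

tic = time.perf_counter()
vec_result = np.dot(u, v)
vec_ms = 1000 * (time.perf_counter() - tic)

print(f"loop: {loop_ms:.3f} ms, vectorized: {vec_ms:.3f} ms")
print(f"results agree: {np.isclose(loop_result, vec_result)}")
```

The two results are compared with `np.isclose` rather than `==` because the loop and the vectorized routine may add the terms in a different order, which can change the last few floating point digits.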

#### Vector Vector operations
- Going forward, our examples will be stored in an array, `X_train`, of dimension (m,n). This will be explained more in context, but here it is important to note that it is a 2-dimensional array or matrix (see the next section on matrices).
- `w` will be a 1-dimensional vector of shape (n,).
- We will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example: `X[i]`.
- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector.  

That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations.

In [37]:
X = np.array([[1],[2],[3],[4]])
w = np.array([2])
c = np.dot(X[1], w)

print(f"X[1] has shape {X[1].shape}")
print(f"w has shape {w.shape}")
print(f"c has shape {c.shape}")

X[1] has shape (1,)
w has shape (1,)
c has shape ()
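
The cell above uses a single feature (n = 1). As a sketch of the same pattern with several features, the block below loops over the rows of a small (m, n) array and takes a vector-vector dot product for each one. The names `X_demo`, `w_demo`, and `out` are illustrative, and the final line shows that the matrix-vector product `X_demo @ w_demo` gives the same values in one call.

```python
import numpy as np

X_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])            # m = 2 examples, n = 3 features
w_demo = np.array([1, 0, -1])             # shape (n,)

m = X_demo.shape[0]
out = np.zeros(m)
for i in range(m):
    # X_demo[i] has shape (n,), so this is a vector-vector dot product
    out[i] = np.dot(X_demo[i], w_demo)

print(f"X_demo[0] has shape {X_demo[0].shape}")
print(f"loop result     : {out}")
print(f"X_demo @ w_demo : {X_demo @ w_demo}")
```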


--------

#### Matrix Creation
The same functions that created 1-D vectors will create 2-D or n-D arrays. Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further that NumPy, when printing, will print one row per line.

In [52]:
a = np.zeros((1, 5))                                       
print(f"a shape = {a.shape}, a = {a}")                     

a = np.zeros((2, 1))                                                                   
print(f"a shape = {a.shape}, a = {a}") 

a = np.random.random_sample((1, 1))  
print(f"a shape = {a.shape}, a = {a}") 

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.]
 [0.]]
a shape = (1, 1), a = [[0.04997798]]


One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above.

In [53]:
a = np.array([[5], [4], [3]]);   print(f" a shape = {a.shape}, np.array: a = {a}")

 a shape = (3, 1), np.array: a = [[5]
 [4]
 [3]]


a = np.array([[5],   # One can also
              [4],   # separate values
              [3]]); #into separate rows
print(f" a shape = {a.shape}, np.array: a = {a}")

#### Matrix indexing
Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:

In [54]:
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \n a= {a}")

a.shape: (3, 2), 
 a= [[0 1]
 [2 3]
 [4 5]]


What does `\n` do? Answer here => It inserts a new line.

Change the numbers inside `reshape` to (-1,1), (3,4), and (2,-1) in the three Code Cells below and see what happens. Note that a (3,4) shape needs 12 elements, so that cell uses `np.arange(12)`.

In [55]:
a = np.arange(6).reshape(-1, 1)
print(f"a.shape: {a.shape}, \n a = \n{a}")

a.shape: (6, 1), 
 a = 
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]]


In [68]:
a = np.arange(12).reshape(3, 4)
print(f"a.shape: {a.shape}, \n a = \n{a}")

a.shape: (3, 4), 
 a = 
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [69]:
a = np.arange(6).reshape(2, -1)
print(f"a.shape: {a.shape}, \n a = \n{a}")


a.shape: (2, 3), 
 a = 
[[0 1 2]
 [3 4 5]]


In [71]:
a = np.arange(6).reshape(-1, 2) 
print(a[2, 0])
print(a[2, 0].shape)
print(type(a[2, 0]))

4
()
<class 'numpy.int32'>


In [72]:
#access an element
print(f"\na[2,0].shape: is {a[2, 0].shape}, \na[2,0] = {a[2, 0]},          type(a[2,0]) = {type(a[2, 0])} \n\nAccessing an element returns a scalar")


a[2,0].shape: is (), 
a[2,0] = 4,          type(a[2,0]) = <class 'numpy.int32'> 

Accessing an element returns a scalar


In [73]:
#access a row
print(f"a[2].shape is   {a[2].shape} \na[2]   = {a[2]} \ntype(a[2])   = {type(a[2])}")

a[2].shape is   (2,) 
a[2]   = [4 5] 
type(a[2])   = <class 'numpy.ndarray'>


It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*.

**Reshape**  
The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array.  
`a = np.arange(6).reshape(-1, 2) `   
This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written:  
`a = np.arange(6).reshape(3, 2) `  
To arrive at the same 3 row, 2 column array.
The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.
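
The `-1` can stand in for either dimension, but only one dimension at a time, and the total number of elements must divide evenly. A minimal sketch, using the illustrative name `flat`:

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape(-1, 4).shape)   # rows inferred: 12 / 4 = 3
print(flat.reshape(2, -1).shape)   # columns inferred: 12 / 2 = 6

try:
    flat.reshape(-1, 5)            # 12 is not divisible by 5
except ValueError as e:
    print(e)
```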


#### Slicing 2d array
Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:

In [74]:
#vector 2-D slicing operations
a = np.arange(20).reshape(-1, 10)
print(f"a = \n{a}")

a = 
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]


In [75]:
#access 5 consecutive elements (start:stop:step)
print("a[0, 2:7:1] = ", a[0, 2:7:1], ",  a[0, 2:7:1].shape =", a[0, 2:7:1].shape, "a 1-D array")

a[0, 2:7:1] =  [2 3 4 5 6] ,  a[0, 2:7:1].shape = (5,) a 1-D array


In [76]:
#access 5 consecutive elements (start:stop:step) in two rows
print("a[:, 2:7:1] = \n", a[:, 2:7:1], ",  a[:, 2:7:1].shape =", a[:, 2:7:1].shape, "a 2-D array")

a[:, 2:7:1] = 
 [[ 2  3  4  5  6]
 [12 13 14 15 16]] ,  a[:, 2:7:1].shape = (2, 5) a 2-D array


In [77]:
# access all elements
print("a[:,:] = \n", a[:,:], ",  a[:,:].shape =", a[:,:].shape)

a[:,:] = 
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] ,  a[:,:].shape = (2, 10)


In [78]:
# access all elements in one row (very common usage)
print("a[1,:] = ", a[1,:], ",  a[1,:].shape =", a[1,:].shape, "a 1-D array")
# same as
print("a[1]   = ", a[1],   ",  a[1].shape   =", a[1].shape, "a 1-D array")

a[1,:] =  [10 11 12 13 14 15 16 17 18 19] ,  a[1,:].shape = (10,) a 1-D array
a[1]   =  [10 11 12 13 14 15 16 17 18 19] ,  a[1].shape   = (10,) a 1-D array
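
The same rules apply to columns. As a short sketch (the names `grid`, `col_1d`, and `col_2d` are illustrative), indexing a column with an integer drops that axis and returns a 1-D array, while a one-element slice keeps the result 2-D:

```python
import numpy as np

grid = np.arange(20).reshape(-1, 10)

col_1d = grid[:, 3]      # integer index drops the column axis -> shape (2,)
col_2d = grid[:, 3:4]    # slice keeps the column axis         -> shape (2, 1)

print("grid[:, 3]   =", col_1d, ", shape =", col_1d.shape)
print("grid[:, 3:4] shape =", col_2d.shape)
```

Keeping track of which form you have matters later, since a shape of (2,) and a shape of (2, 1) behave differently in dot products and broadcasting.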


## -----------