# NumPy and Vectorization

In [1]:
import numpy as np
import time


## Useful References
- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)
- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)


# Vectors
 - Vectors are ordered arrays of numbers.
 - The number of elements in the array is often referred to as the *dimension* of the vector. Note that NumPy uses "dimension" differently: there it means the number of axes, so a vector is a 1-D array regardless of its length.
 - The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science, indexing typically runs from 0 to n-1.
 - The $0^{th}$ element of the vector $\mathbf{x}$ is $x_0$.


# Vector Creation

In [3]:
a = np.zeros(4)

print(a)                        # [0. 0. 0. 0.]
print(a.shape)                  # (4,)
print(a.shape[0])               #  4
print(a.dtype)                  # float64

b = np.random.random_sample(4)

print(b)                        # random array, example: [0.18196338 0.84879154 0.11797209 0.24751849]
print(b.shape)                  # (4,)
print(b.shape[0])               #  4
print(b.dtype)                  # float64

[0. 0. 0. 0.]
(4,)
4
float64
[0.9129051  0.93204295 0.80457786 0.62561488]
(4,)
4
float64


Some data creation routines do not take a shape tuple:

In [4]:
a = np.arange(4.)

print(a)               # [0. 1. 2. 3.]

print(a.shape)         # (4,)
print(a.shape[0])      #  4
print(a.dtype)         # float64

[0. 1. 2. 3.]
(4,)
4
float64


Values can be specified manually:

In [5]:
a = np.array([5,4,3,2])
b = np.array([5.,4,3,2])

print(a)              # [5 4 3 2]
print(a.shape)        # (4,)
print(a.shape[0])     # 4
print(a.dtype)        # int64

print(b)              # [5. 4. 3. 2.]
print(b.shape)        # (4,)
print(b.shape[0])     # 4
print(b.dtype)        # float64

[5 4 3 2]
(4,)
4
int64
[5. 4. 3. 2.]
(4,)
4
float64


These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here `a.shape` is `(4,)`, indicating a 1-D array with 4 elements.
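
To make the meaning of `(4,)` concrete, the sketch below compares a 1-D vector with a 2-D array holding the same number of elements. The names `v` and `col` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

v = np.zeros(4)          # 1-D vector with 4 elements
col = np.zeros((4, 1))   # 2-D array: 4 rows, 1 column

print(v.shape, v.ndim, v.size)        # (4,) 1 4
print(col.shape, col.ndim, col.size)  # (4, 1) 2 4
```

Both arrays hold four values, but only `v` is a vector in the sense used here: its shape tuple has a single entry.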


## Operations on Vectors

### Indexing
Elements of vectors can be accessed via indexing and slicing.

- **Indexing** means referring to *an element* of an array by its position within the array.  
- **Slicing** means getting a *subset* of elements from an array based on their indices.  

NumPy starts indexing at zero so the 3rd element of a vector $\mathbf{a}$ is `a[2]`.

In [6]:
a = np.arange(10)

print(a)              # [0 1 2 3 4 5 6 7 8 9]
print(a[2])           # 2
print(a[2].shape)     # ()

print(a[-1])          # 9 (i.e., the last element)

[0 1 2 3 4 5 6 7 8 9]
2
()
9


### Slicing
Slicing selects a range of indices using up to three values (`start:stop:step`). Any of the three may be omitted, in which case a default is used.

In [7]:
a = np.arange(10)

print(a)           # [0 1 2 3 4 5 6 7 8 9]

b = a[0:7:1]       # [0 1 2 3 4 5 6]

c = a[0:7:2]       # [0 2 4 6]

d = a[3:]          # [3 4 5 6 7 8 9]

e = a[:3]          # [0 1 2]

f = a[:]           # [0 1 2 3 4 5 6 7 8 9]


[0 1 2 3 4 5 6 7 8 9]
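
The cell above assigns the slices without printing them. As a small sketch, the same kind of slices can be printed directly, along with two variations (step only, and a negative start) that are not in the original cell:

```python
import numpy as np

a = np.arange(10)

print(a[0:7:2])   # start, stop, step  -> [0 2 4 6]
print(a[3:])      # start only         -> [3 4 5 6 7 8 9]
print(a[:3])      # stop only          -> [0 1 2]
print(a[::3])     # step only          -> [0 3 6 9]
print(a[-3:])     # negative start     -> [7 8 9]
```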



### Single vector operations

In [8]:
a = np.array([1,2,3,4])

print(a)                       # [1 2 3 4]

b = -a
print(b)                       # [-1 -2 -3 -4]

b = np.sum(a)
print(b)                       # 10

b = np.mean(a)
print(b)                       # 2.5

b = a**2
print(b)                       # [1 4 9 16]



[1 2 3 4]
[-1 -2 -3 -4]
10
2.5
[ 1  4  9 16]


### Vector Vector element-wise operations

In [9]:
a = np.array([ 1, 2, 3, 4])
b = np.array([-1,-2, 3, 4])

c = a + b
print(c)                          # [0 0 6 8]

[0 0 6 8]
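
Element-wise operations pair up elements by position, so the two vectors generally need the same length. The sketch below (with an illustrative second vector `c`) shows what happens when the lengths are 4 and 2; the exact wording of the error message may vary between NumPy versions.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
c = np.array([1, 2])      # different length from a

try:
    d = a + c
    error_message = None
except ValueError as e:
    error_message = str(e)
    print("Element-wise addition failed:", error_message)
```

A length-1 array is a special case: it is broadcast across the other vector, which is what makes the scalar operations in the next section work.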



### Scalar Vector operations
Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector.

In [10]:
a = np.array([1, 2, 3, 4])

b = 5 * a
print(b)                        # [5 10 15 20]

[ 5 10 15 20]


### Vector Vector dot product

The dot product multiplies the values in two vectors element-wise and then sums the result.
Vector dot product requires the dimensions of the two vectors to be the same.


**Using a for loop**, implement a function which returns the dot product of two vectors. Given inputs $a$ and $b$, the function should return:
$$ x = \sum_{i=0}^{n-1} a_i b_i $$
Assume both `a` and `b` are the same shape.

In [19]:
def dotProduct(a, b):
  # Computes the dot product of two vectors
  m = a.shape[0]

  dot_product = 0
  for i in range(m):
    f = a[i] * b[i]
    dot_product += f

  return dot_product


In [18]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

print(dotProduct(a,b))

x = np.array([1, 3, 2])
y = np.array([3, 2, 1])

print(dotProduct(x,y))

70
11


Note, the dot product is expected to return a scalar value.

The same operation using `np.dot`:

In [13]:
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

c = np.dot(a, b)
print(c)
c = np.dot(b, a)
print(c)

24
24
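
As a quick check of the definition (multiply element-wise, then sum), the sketch below computes the two steps separately and compares the result with `np.dot`. The names `products` and `total` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

products = a * b            # element-wise products: [-1  8  9  8]
total = np.sum(products)    # sum of the products: 24

print(products)
print(total)
print(total == np.dot(a, b))   # True
print(total == a @ b)          # True; @ is the matrix-multiplication operator
```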


## The Need for Speed: vector vs for loop
We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:

In [11]:
x = 30
print(f"{x:.4f}")

30.0000


In [22]:
np.random.seed(1)
# Calling np.random.seed(1) ensures that random number generation starts
# from the same state each time you run the code, so the values of a and b
# will be the same across different runs, provided that the code leading to
# their generation remains the same. (Note: a is not equal to b.)
a = np.random.rand(10000000)
b = np.random.rand(10000000)

# Vectorized version
start_time = time.time()
c = np.dot(a,b)
end_time = time.time()

print(f"{c:.4f}") # dot product to four decimal points
print(f"Vectorized version time: {1000*(end_time-start_time):.4f} ms")

# Loop Version
start_time = time.time()
c = dotProduct(a,b)
end_time = time.time()

print(f"{c:.4f}") # dot product to four decimal points
print(f"Loop version time: {1000*(end_time-start_time):.4f} ms")

2501072.5817
Vectorized version time: 12.9504 ms
2501072.5817
Loop version time: 3715.7433 ms


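Vectorization provides a large speedup: in the run above, `np.dot` was roughly 287 times faster than the loop (about 13 ms versus about 3716 ms).

- This is because NumPy makes better use of available data parallelism in the underlying hardware, and because it avoids the per-element overhead of a Python loop.
- GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, allowing multiple operations to be issued in parallel.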
- This is critical in Machine Learning where the data sets are often very large.
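
For a quicker, self-contained version of the comparison, the sketch below uses smaller arrays and `time.perf_counter`, which is generally better suited to measuring short intervals than `time.time`. The helper `dot_loop` and the array size are assumptions made for this example; the timings depend on the machine, so none are shown.

```python
import time
import numpy as np

def dot_loop(a, b):
    """Dot product with an explicit Python loop (for comparison only)."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

rng = np.random.default_rng(1)      # seeded generator for repeatable inputs
a = rng.random(100_000)
b = rng.random(100_000)

start = time.perf_counter()
c_vec = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - start)

start = time.perf_counter()
c_loop = dot_loop(a, b)
loop_ms = 1000 * (time.perf_counter() - start)

print(f"vectorized: {vec_ms:.4f} ms, loop: {loop_ms:.4f} ms")
print(np.isclose(c_vec, c_loop))    # the two results agree up to rounding
```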


### Vector Vector operations

- Going forward, examples will be stored in an array `X_train` of dimension (m,n). It is a 2-dimensional array, or matrix.
- `w` will be a 1-dimensional vector of shape (n,).
- We will perform operations by looping through the examples, extracting each example to work on individually by indexing `X`, for example `X[i]`.
- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector.

In [37]:
X = np.array([[1],[2],[3],[4]])

print("For X:")
print(X)
print(X.shape)
print(X.shape[0])
print(X[1].shape)

w = np.array([2])
print("\nFor w:")
print(w)
print(w.shape)
print(w.shape[0])
print(w[0].shape)

c = np.dot(X[1], w)
print("\nFor c:")
print(c)
print(c.shape)



For X:
[[1]
 [2]
 [3]
 [4]]
(4, 1)
4
(1,)

For w:
[2]
(1,)
1
()

For c:
4
()
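
The cell above uses a single feature (n = 1). As a sketch of the looping pattern described earlier, the example below uses made-up data with m = 3 examples and n = 2 features; the values of `X` and `w` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical training data: m = 3 examples, n = 2 features each
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
w = np.array([10, 1])          # shape (n,) = (2,)

m = X.shape[0]
out = np.zeros(m)
for i in range(m):
    out[i] = np.dot(X[i], w)   # X[i] has shape (2,), so this is vector-vector

print(out)                     # [12. 34. 56.]
print(X @ w)                   # [12 34 56], the same values without a loop
```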



# Matrices


- Matrices are two-dimensional arrays. The elements of a matrix are all of the same type.
- The elements of a matrix can be referenced with a two dimensional index.   


## NumPy Arrays

- Matrices have a two-dimensional (2-D) index [m,n].
- 2-D matrices are used to hold training data.

## Matrix Creation

In [None]:
a = np.zeros((1, 5))
print(f"a shape = {a.shape}, a = {a}")

a = np.zeros((2, 1))
print(f"a shape = {a.shape}, a = {a}")

a = np.random.random_sample((1, 1))
print(f"a shape = {a.shape}, a = {a}")

a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.]
 [0.]]
a shape = (1, 1), a = [[0.44236513]]


In [46]:
a = np.zeros((1, 5))
print(a)                   # [[0. 0. 0. 0. 0.]]
print(a.shape)             # (1, 5)

a = np.zeros((2, 1))
print(a)                   # [[0.],[0.]]
print(a.shape)             # (2, 1)

a = np.random.random_sample((1, 1))
print(a)                   # [[0.16252618]] random number
print(a.shape)             # (1, 1)

[[0. 0. 0. 0. 0.]]
(1, 5)
[[0.]
 [0.]]
(2, 1)
[[0.77390955]]
(1, 1)


Values can also be specified manually. Each dimension is written as an additional level of brackets, matching the printed format above.

In [3]:
a = np.array([[5], [4], [3]])

print(a)
print(a.shape)


[[5]
 [4]
 [3]]
(3, 1)


## Operations on Matrices

### Indexing


- The two indexes describe [row, column].
- Access can either return an element or a row/column.

In [None]:
#vector indexing operations on matrices
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")

#access an element
print(f"\na[2,0].shape:   {a[2, 0].shape}, a[2,0] = {a[2, 0]},     type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row
print(f"a[2].shape:   {a[2].shape}, a[2]   = {a[2]}, type(a[2])   = {type(a[2])}")

a.shape: (3, 2), 
a= [[0 1]
 [2 3]
 [4 5]]

a[2,0].shape:   (), a[2,0] = 4,     type(a[2,0]) = <class 'numpy.int64'> Accessing an element returns a scalar

a[2].shape:   (2,), a[2]   = [4 5], type(a[2])   = <class 'numpy.ndarray'>


In [9]:
a = np.arange(6).reshape(-1, 2)
# This reshapes the array into a 2-D array with 2 columns (because of the 2).
# The -1 in reshape is a placeholder that means "infer the appropriate number
# of rows based on the total number of elements and the number of columns".
# In this case, since there are 6 elements and we want 2 columns, the array
# will be reshaped into 3 rows (because 6 / 2 = 3).
print(a)                         # [[0 1], [2 3], [4 5]]
print(a.shape)                   # (3, 2)
print(a[2])                      # [4 5]
print(a[2][0])                   #  4

[[0 1]
 [2 3]
 [4 5]]
(3, 2)
[4 5]
4
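
The `-1` placeholder can stand in for either axis, as long as the total number of elements divides evenly. A brief sketch (the flag `reshape_failed` is an illustrative name):

```python
import numpy as np

a = np.arange(6)

print(a.reshape(-1, 2).shape)   # (3, 2): 6 elements / 2 columns = 3 rows
print(a.reshape(2, -1).shape)   # (2, 3): 6 elements / 2 rows = 3 columns

try:
    a.reshape(-1, 4)            # 6 is not divisible by 4
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)           # True
```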


Accessing a matrix by just specifying the row will return a *1-D vector*.

### Slicing
- Slicing selects a range of indices using up to three values (`start:stop:step`).
- Any of the three values may be omitted, in which case a default is used.

In [4]:
a = np.arange(20).reshape(-1,10)

print(a)                # [[ 0  1  2  3  4  5  6  7  8  9],[10 11 12 13 14 15 16 17 18 19]]

print(a[0][2:7:1])      # [2 3 4 5 6]

print(a[:][:])          # [[ 0  1  2  3  4  5  6  7  8  9],[10 11 12 13 14 15 16 17 18 19]]

print(a[1][:])          # [10 11 12 13 14 15 16 17 18 19]


[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
[2 3 4 5 6]
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
[10 11 12 13 14 15 16 17 18 19]
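
The cell above chains two index operations (`a[0][2:7:1]`), which slices within a single row. To slice rows and columns together, NumPy also accepts a comma-separated `[row, column]` index. The sketch below contrasts the two forms; note that `a[:][2:7]` is not a column slice, because `a[:]` returns the whole array and the second index is again applied to rows.

```python
import numpy as np

a = np.arange(20).reshape(-1, 10)   # shape (2, 10)

print(a[0, 2:7])     # row 0, columns 2..6          -> [2 3 4 5 6]
print(a[:, 2:7])     # all rows, columns 2..6       (shape (2, 5))
print(a[:, 0])       # first column of every row    -> [ 0 10]
print(a[:][2:7])     # rows 2..6 of a: there are none, so shape is (0, 10)
```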
