# worksheet5: Numpy Part 1

In [None]:
%%html
<h1>Numpy Basics</h1>

- Python library optimized for numerical computing
- ability to create and manipulate n-dimensional arrays, critical for ML/DL applications

In [None]:
import numpy as np

### Basics

#### simple 1D array

In [None]:
array_1D = np.array([2,2,4,6,8,10])

In [None]:
array_1D

In [None]:
type(array_1D)

#### dimensions of array (.ndim attribute)

In [None]:
array_1D.ndim

#### shape of array (.shape attribute)

In [None]:
array_1D.shape

#### Q: get unique elements of the array

#### Python list to numpy array

In [None]:
list_1D = [2,3,6,8,10]
np.array(list_1D)

#### basic useful functionality
Experiment with the following:
- create an array from a range of numbers (np.arange(n)), where n is the data_size
- create an array using start, stop (stop is exclusive), step (np.arange(start,stop,step))
- create n evenly spaced numbers over a specified interval, with stop included by default (np.linspace(start,stop,n))
- create an array of ones (np.ones(n))
- create an array of zeros (np.zeros(n))

#### `np.arange`

In [None]:
np.arange(10)

#### Q: which built-in Python function offers functionality similar to `np.arange`?
- 2 mins

In [None]:
np.arange(start=10,stop=20,step=2)

#### Q: calculate the square of all even numbers between 0 to 10
- 10 is included
- store the results in a numpy array

In [None]:
np.array(
    [num**2 for num in np.arange(start=0, stop=11, step=2)]
)
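
The list comprehension above loops in Python. As a minimal sketch of an alternative, the same result can be obtained by applying `**` to the whole array at once; the names `evens` and `even_squares` are illustrative and not part of the worksheet.

```python
import numpy as np

# Even numbers from 0 to 10 inclusive (stop=11 because stop is exclusive)
evens = np.arange(start=0, stop=11, step=2)

# ** is applied element-wise, so no Python-level loop is needed
even_squares = evens ** 2
print(even_squares)  # values: 0, 4, 16, 36, 64, 100
```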

#### `np.linspace`

In [None]:
np.linspace(2,3,10)
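
A short sketch of how `np.linspace` treats its endpoints, which is where it differs most visibly from `np.arange`. The variable names are illustrative only.

```python
import numpy as np

# linspace takes the number of points; stop is included by default
points = np.linspace(2, 3, 5)
print(points)  # values: 2.0, 2.25, 2.5, 2.75, 3.0

# endpoint=False leaves the stop value out, as arange does
open_points = np.linspace(2, 3, 5, endpoint=False)
print(open_points)  # values: 2.0, 2.2, 2.4, 2.6, 2.8
```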

#### `np.ones`, `np.zeros`

In [None]:
np.ones(5)

In [None]:
np.zeros(5)

#### `np.random.rand`
- generates random numbers drawn from a uniform distribution over [0, 1)

In [None]:
np.random.rand(10)
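
For comparison, a small sketch contrasting `np.random.rand` (uniform samples on [0, 1)) with `np.random.randn` (standard normal samples). The seed value and variable names are assumptions, chosen only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, used only for repeatability

uniform_samples = np.random.rand(1000)   # uniform on [0, 1)
normal_samples = np.random.randn(1000)   # standard normal: mean 0, std 1

# Uniform samples never leave [0, 1)
print(uniform_samples.min() >= 0, uniform_samples.max() < 1)

# Normal samples are unbounded, so negative values are expected
print((normal_samples < 0).any())
```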

### Numpy Datatypes
- richer datatype support than vanilla Python
- e.g. fixed-width integers and floats (`int8`, `int32`, `float32`, `float64`), alongside boolean and complex types

#### explore the datatype of an array with the .dtype attribute

In [None]:
array_1D

In [None]:
array_1D.dtype

In [None]:
array_float = np.linspace(2,10,6)

In [None]:
array_float.dtype

#### manually assign dtypes

In [None]:
array_assign_dtype =  np.array([3,7,9,11])
array_assign_dtype.dtype

In [None]:
array_assign_dtype =  np.array([3,7,9,11], dtype='float64')
array_assign_dtype.dtype
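
Besides passing `dtype=` at creation time, an existing array can be converted with `.astype`. The sketch below uses illustrative names (`ints`, `floats`, `small`) that are not part of the worksheet.

```python
import numpy as np

ints = np.array([3, 7, 9, 11])

# astype returns a new array; the original keeps its integer dtype
floats = ints.astype('float64')
print(ints.dtype.kind, floats.dtype)  # 'i' for integer, float64

# Converting floats to integers truncates the fractional part toward zero
small = np.array([3.7, 7.2]).astype('int32')
print(small)  # values: 3, 7
```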

### Indexing and Fancy Indexing (1-D)

#### Indexing

In [None]:
a = np.random.rand(10)

In [None]:
a[0]

In [None]:
print(a[0])

#### Fancy Indexing
- advanced mechanism for indexing array elements with an array of integers or booleans (boolean indexing is also called masking)
- Example: select the even-valued elements of the array

##### Option 1: Boolean indexing 


In [None]:
random_ints = np.random.randint(1,20,10)

In [None]:
random_ints

In [None]:
random_ints % 2 == 0

#### access the even-valued elements using the boolean mask

In [None]:
random_ints[random_ints % 2 == 0]

##### Option 2: Integer array indexing
- This uses integer arrays themselves as indices

In [None]:
random_ints

In [None]:
# positions picked by hand for illustration; they are not guaranteed to hold even values
even_indices = np.array([0,3,4,9])

In [None]:
even_indices

In [None]:
random_ints[even_indices]
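
Because `random_ints` changes on every run, hand-written positions will not generally line up with the even values. One way to compute the positions from a boolean mask is `np.flatnonzero`; this is a sketch on fixed, hypothetical data (`values`) so the result is predictable.

```python
import numpy as np

values = np.array([4, 7, 10, 3, 8])  # hypothetical example data

# Positions at which the boolean mask is True
even_positions = np.flatnonzero(values % 2 == 0)
print(even_positions)  # positions 0, 2, 4

# Integer array indexing with those positions selects the same
# elements as boolean indexing with the mask itself
print(values[even_positions])
print(values[values % 2 == 0])
```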

#### Q: what would be the output of the code below:
random_ints[np.array([1,1,2])]

### check for nan

In [None]:
np.isnan(random_ints)

In [None]:
x = np.array([1,2,3,np.nan])
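
A brief sketch of why `np.isnan` is used rather than `==`: NaN is a floating-point value that never compares equal to anything, including itself. The name `x_demo` is illustrative, so that `x` above stays untouched.

```python
import numpy as np

x_demo = np.array([1, 2, 3, np.nan])  # hypothetical name

# NaN is a float, so the whole array is stored as float64
print(x_demo.dtype)

# Comparing with == finds nothing, because NaN != NaN
print(x_demo == np.nan)  # all False

# np.isnan flags the NaN entry
print(np.isnan(x_demo))  # True only in the last position
```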

#### Q: what would be the output of:
x[~np.isnan(x)]

#### assign constant to a slice

In [None]:
x[:3] = 1

In [None]:
x

#### Q: will assigning a constant to a slice work with a plain Python list?

### Compare runtime behaviours
- In the following code, we calculate the sum of the squares of the numbers 0 to 99 and measure how long the calculation takes. We repeat it 100 times and report the total time so that the measurement is accurate enough.
- Knowing how to optimize numpy code takes great skill and practice

In [None]:
a = np.array(a)

In [None]:
a

In [None]:
a*a

In [None]:
a*2

#### Q: what will a regular python list operation output for above?


In [None]:
import timeit
normal_py_sec = timeit.timeit('sum(x*x for x in range(100))',
                              number=100)
naive_np_sec = timeit.timeit('sum(na*na)',
                             setup="import numpy as np; na=np.arange(100)",
                             number=100)
good_np_sec = timeit.timeit('na.dot(na)',
                            setup="import numpy as np; na=np.arange(100)",
                            number=100)
print('Normal Python: {} sec'.format(normal_py_sec))
print('Naive NumPy: {} sec'.format(naive_np_sec))
print('Good NumPy: {} sec'.format(good_np_sec))
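
In the "naive" case, the built-in `sum` iterates over the array one element at a time in Python. A sketch of a further variant, assuming you also want to time `np.sum`, which performs the reduction inside numpy; the variable names are illustrative, and the timings depend on your machine.

```python
import timeit
import numpy as np

na = np.arange(100)

# timeit also accepts a callable, which avoids code-in-a-string
builtin_sum_sec = timeit.timeit(lambda: sum(na * na), number=100)
np_sum_sec = timeit.timeit(lambda: np.sum(na * na), number=100)
dot_sec = timeit.timeit(lambda: na.dot(na), number=100)

print(f'built-in sum on array: {builtin_sum_sec} sec')
print(f'np.sum:                {np_sum_sec} sec')
print(f'dot:                   {dot_sec} sec')

# All three approaches compute the same value
total = int(na.dot(na))
print(total == int(np.sum(na * na)) == int(sum(na * na)))
```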

### Slicing a 1D array

In [None]:
random_ints

In [None]:
random_ints[:3]

In [None]:
# b is a view
b = random_ints[:3]

In [None]:
b += 1

#### Q: what would be the output of:
- b
- random_ints


In [None]:
b = random_ints[:3].copy()
b += 1

In [None]:
random_ints

In [None]:
b

In [None]:
random_ints[1:5]
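
To check whether an array is a view or a copy without modifying anything, `np.shares_memory` and the `.base` attribute can help. This is a sketch on hypothetical data (`data`, `view`, `copied`), separate from `random_ints`.

```python
import numpy as np

data = np.arange(10)        # hypothetical example data
view = data[:3]             # basic slicing returns a view
copied = data[:3].copy()    # .copy() allocates new memory

print(np.shares_memory(data, view))    # True: same underlying buffer
print(np.shares_memory(data, copied))  # False: independent buffer

# A view records the array it was derived from in .base
print(view.base is data)      # True
print(copied.base is None)    # True
```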

### Q: Experiment with other useful functions
- np.log(array)
- np.sum(array)
- np.exp(array)
- array.max()
- array.min()
- np.flip(array)


## Collaborative exercises

### Exercise 1

Implement a sigmoid function on a toy dataset you create with your team
- The function should accept an input numpy array of length 10 and calculate the sigmoid function `f(z) = 1/(1+e^(-z))`
- This function only returns values between 0 and 1, and is used in models that return probabilities
- First, implement this function using vanilla python, using the `exp` function from the `math` module. Run the function on your input. What do you observe?
- Second, implement this function using the numpy function for exponentiation. Run the function on your input. What do you observe this time?
- Third, implement this function using numpy functions for addition, exponentiation and division. Run the function on your input. What do you observe this time?
- Time the computations for all three cases listed above. Which one runs the fastest?
- Now create a large input array of 100000 random numbers drawn from a normal distribution, and pass it to the three functions. Which one runs the fastest?

### Exercise 2

Implement a function that returns an MSE (mean squared error) estimate, given two arrays as input

- The formula is widely available online, e.g. `https://statisticsbyjim.com/regression/mean-squared-error-mse/`
- With your team, construct an input array of predictions and an input array of observed values, each with three values
- Calculate MSE using these two arrays as input
- How long does your code take to run?
- What if you passed two differently sized input arrays: a prediction array of size 1, and an observed array of size 3? Does it work? Why, or why not?
- What if you now passed a prediction array of size 2, and an observed array of size 3? Does it work? Why, or why not?
- Modify your original function to raise an appropriate error or exception to handle unequal-sized inputs, and run the modified function on all cases
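
For reference, the mean squared error over $n$ paired values is commonly written as below. The symbols are a notational assumption: $y_i$ stands for the observed values and $\hat{y}_i$ for the predictions.

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$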