<img src="img/01.png">

# 01. NumPy

## 01.01 What is NumPy?

#### According to the NumPy website: https://www.numpy.org

NumPy is the fundamental package for scientific computing with Python. It contains among other things:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

#### According to [REF1](../README.md) :

NumPy (short for Numerical Python) provides an **efficient interface to store and operate on dense data** buffers. In some ways, NumPy arrays are **like Python’s built-in list type, but** NumPy arrays provide **much more efficient** storage and data operations as the arrays grow larger in size.
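
To make the storage claim concrete, the sketch below compares the approximate memory footprint of a Python list of integers with a NumPy array holding the same values. It assumes CPython (where every list element is a separate `int` object) and an explicit `np.int64` dtype; the exact figure for the list varies between Python versions, and the variable names are only illustrative.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers; each element is a separate Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# An array stores the raw 8-byte values in one contiguous buffer.
array_bytes = np_array.nbytes

print(f"list : ~{list_bytes:,} bytes")
print(f"array: {array_bytes:,} bytes")
```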

## 01.02 NumPy arrays - basics

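### 01.02.01 Why do we need NumPy arrays?

Computations with NumPy are much faster: in the timings below, `np.sum` on an array is roughly ten times faster than the built-in `sum` on the equivalent list :)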

In [1]:
import numpy as np
long_list = [i for i in range(1_000_000)]
long_array = np.array(long_list)

In [2]:
%%timeit
# standard Python
sum(long_list)

7.44 ms ± 804 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [3]:
%%timeit
# numpy
np.sum(long_array)

716 µs ± 99.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


### 01.02.02 Creating NumPy arrays

1) The first option is to create an array from Python objects

In [4]:
np.array(2)

array(2)

In [5]:
np.array([1,2,3,4])

array([1, 2, 3, 4])

In [6]:
np.array([[1,2,3],[4,5,6],[7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [7]:
np.array((1,2,3))

array([1, 2, 3])

2) The second option is to create an array using NumPy functions

In [8]:
np.zeros(shape=(3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [9]:
np.ones(shape=(3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [10]:
np.full(shape=(3,3), fill_value=np.pi)

array([[3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265]])

In [11]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [12]:
np.random.randint(low=-3, high=10, size=(4,3))

array([[-1, -2, -1],
       [ 2, -1,  8],
       [ 9,  9,  7],
       [-2,  5,  6]])

In [13]:
np.random.randn(2,3)

array([[ 0.55607603, -0.53019198,  1.35885044],
       [-0.18005786,  0.92083392, -0.10689384]])

### 01.02.03 Attributes of NumPy arrays

1) dtype

In [14]:
sample_matrix1 = np.array([2,3,4], dtype=np.int8)
sample_matrix2 = np.array([130,3,4], dtype=np.int8)

In [15]:
sample_matrix1

array([2, 3, 4], dtype=int8)

In [16]:
# 130 is outside the range supported by `np.int8` (-128 to 127), so here it wraps around to -126
# (newer NumPy versions, 2.0 and later, raise an OverflowError instead of wrapping)
sample_matrix2

array([-126,    3,    4], dtype=int8)
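
The range of an integer dtype can be queried with `np.iinfo`, which makes it possible to check values before casting. The sketch below also shows that casting an existing array with `astype` wraps around silently. `fits_in` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

# Casting an existing array with `astype` wraps around silently:
# 130 does not fit in int8, so it becomes 130 - 256 = -126.
wrapped = np.array([130, 3, 4]).astype(np.int8)
print(wrapped)

def fits_in(values, dtype):
    """Return True if every value lies within the range of an integer dtype."""
    limits = np.iinfo(dtype)
    values = np.asarray(values)
    return bool(values.min() >= limits.min and values.max() <= limits.max)

print(fits_in([2, 3, 4], np.int8))    # True
print(fits_in([130, 3, 4], np.int8))  # False
```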

In [17]:
result = sample_matrix1 + np.array([130, 129, 128])
result

array([132, 132, 132])

In [18]:
result.dtype

dtype('int64')

In [19]:
# we will cover operations on NumPy arrays in a couple of minutes
# bear with me :)
sample_matrix1 + 2000

array([2002, 2003, 2004], dtype=int16)

In [20]:
result = sample_matrix1 + 2000.2
result

array([2002.2, 2003.2, 2004.2])

In [21]:
result.dtype

dtype('float64')

In [22]:
# sometimes the kind ('i', 'f', 'O') of a dtype might be useful!
result.dtype.kind

'f'

2) Other attributes

In [23]:
np.random.seed(0)
sample_array = np.random.randint(0, 10, size=(2,3,4))
sample_array

array([[[5, 0, 3, 3],
        [7, 9, 3, 5],
        [2, 4, 7, 6]],

       [[8, 8, 1, 6],
        [7, 7, 8, 1],
        [5, 9, 8, 9]]])

In [24]:
for attr in ['dtype', 'ndim', 'shape', 'size', 'itemsize', 'nbytes']:
    attr_val = getattr(sample_array, attr)
    print(f"{attr:10} is  {attr_val}")

dtype      is  int64
ndim       is  3
shape      is  (2, 3, 4)
size       is  24
itemsize   is  8
nbytes     is  192


### 01.02.04 Accessing array elements (indexing, slicing) 
1) 1D arrays

In [25]:
sample_array = np.array([x for x in range(100, 110)])
sample_array

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

In [26]:
sample_array[3]

103

In [27]:
sample_array[-3]

107

In [28]:
sample_array[1:9:3]

array([101, 104, 107])

In [29]:
sample_array[9:2:-2]

array([109, 107, 105, 103])

In [30]:
# fancy indexing
# we pass arrays of indices in place of single scalars.
sample_array[[1,2,1,2,1,5,8]]

array([101, 102, 101, 102, 101, 105, 108])

2) 2D arrays

In [31]:
sample_array = np.array([x for x in range(100, 120)]).reshape(-1,4)
sample_array

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [112, 113, 114, 115],
       [116, 117, 118, 119]])

In [32]:
sample_array[2]

array([108, 109, 110, 111])

In [33]:
sample_array[2,3]

111

In [34]:
sample_array[:,3]

array([103, 107, 111, 115, 119])

In [35]:
sample_array[::2,:1:-1]

array([[103, 102],
       [111, 110],
       [119, 118]])

In [36]:
# fancy indexing
sample_array[[1,2,1]]

array([[104, 105, 106, 107],
       [108, 109, 110, 111],
       [104, 105, 106, 107]])

In [37]:
sample_array[[1,2,1],2]

array([106, 110, 106])

In [38]:
sample_array[[1,2,1],[0,1,0]]

array([104, 109, 104])
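
When two index lists are passed, NumPy pairs them element by element rather than selecting a rectangular block. If the block (every combination of the given rows and columns) is what you need, `np.ix_` is one way to get it. A small sketch on an array with the same values as above (the name `grid` is only illustrative):

```python
import numpy as np

grid = np.arange(100, 120).reshape(-1, 4)

# Two index lists are paired element by element: (1, 0), (2, 1), (1, 0).
paired = grid[[1, 2, 1], [0, 1, 0]]

# np.ix_ builds an open mesh, selecting every combination of the
# given rows and columns (rows 1 and 2, columns 0 and 1).
block = grid[np.ix_([1, 2], [0, 1])]

print(paired)  # [104 109 104]
print(block)
```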

**IMPORTANT** A crucial difference between slicing NumPy arrays and slicing Python lists is that slicing a NumPy array returns a **view**, not a copy, so modifying the slice also modifies the original array.

In [39]:
print("PYTHON >>>")
my_list = [x for x in range(10)]
print("my list:", my_list)
sliced_list = my_list[:5]
print(f"sliced list: {sliced_list}")
sliced_list[0]=100
print(f"modified sliced list: {sliced_list}")
print("my list:", my_list)

PYTHON >>>
my list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sliced list: [0, 1, 2, 3, 4]
modified sliced list: [100, 1, 2, 3, 4]
my list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [40]:
print("NUMPY >>>")
my_array = np.array([x for x in range(10)])
print("my array:", my_array)
sliced_array = my_array[:5]
print(f"sliced array: {sliced_array}")
sliced_array[0]=100
print(f"modified sliced array: {sliced_array}")
print("my array:", my_array)

NUMPY >>>
my array: [0 1 2 3 4 5 6 7 8 9]
sliced array: [0 1 2 3 4]
modified sliced array: [100   1   2   3   4]
my array: [100   1   2   3   4   5   6   7   8   9]


In [41]:
# to get a copy of an array use `copy` method
my_array2 = my_array.copy()
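
If it is unclear whether an operation returned a view or a copy, `np.shares_memory` can tell. The sketch below contrasts basic slicing (view), fancy indexing (copy), and an explicit `copy()`; the variable names are only illustrative.

```python
import numpy as np

base = np.arange(10)

sliced = base[:5]           # basic slicing -> view
fancy = base[[0, 1, 2]]     # fancy indexing -> copy
copied = base[:5].copy()    # explicit copy

print(np.shares_memory(base, sliced))  # True
print(np.shares_memory(base, fancy))   # False
print(np.shares_memory(base, copied))  # False

# Writing through the view changes the original; writing to a copy does not.
sliced[0] = 100
fancy[1] = -1
print(base)
```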

**EXERCISE 03.01**

Write a function that converts a Python list of indices into one-hot encoded vectors:

`IN:`
```python
indices=[0,1,3,1], depth=5  
```
`OUT:`
```python
np.array([
    [1,0,0,0,0],  
    [0,1,0,0,0],  
    [0,0,0,1,0],  
    [0,1,0,0,0]])
```

In [42]:
def values_to_ohe(indices: list, depth: int):
    if max(indices) > depth-1:
        raise ValueError(f"Given vector of indices:'{indices}' contains " \
                         f"values beyond range (0, depth-1): (0,{depth-1})")
    ### INSERT YOUR CODE HERE ####
    raise NotImplementedError
    ### STOP YOUR CODE HERE ####
    return result

In [43]:
# ### Simple tests to check if you implemented the correct functionality
# assert np.array_equal(values_to_ohe([0,1,3,1], 5), 
#                       np.array([
#                           [1., 0., 0., 0., 0.],
#                           [0., 1., 0., 0., 0.],
#                           [0., 0., 0., 1., 0.],
#                           [0., 1., 0., 0., 0.]]))

# assert np.array_equal(values_to_ohe([2,1,2], 3), 
#                       np.array([
#                           [0., 0., 1.],
#                           [0., 1., 0.],
#                           [0., 0., 1.]]))

In [44]:
### TO SHOW SOLUTION USE LINE BELOW ###
# %load ../91_solutions/ex3_1.py

### 01.02.05 Shape manipulation
1) Reshape

In [45]:
flat_array = np.array([i for i in range(100,124)])
flat_array

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
       113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123])

In [46]:
# standard reshape
array3d = flat_array.reshape(2,3,4)
array3d

array([[[100, 101, 102, 103],
        [104, 105, 106, 107],
        [108, 109, 110, 111]],

       [[112, 113, 114, 115],
        [116, 117, 118, 119],
        [120, 121, 122, 123]]])

In [47]:
# reshape with one unknown dimension
array3d = flat_array.reshape(-1,3,4)
array3d

array([[[100, 101, 102, 103],
        [104, 105, 106, 107],
        [108, 109, 110, 111]],

       [[112, 113, 114, 115],
        [116, 117, 118, 119],
        [120, 121, 122, 123]]])

In [48]:
# actually, in the NumPy version used here any negative value works, but only `-1` is documented, so always use `-1`
array3d = flat_array.reshape(-112141,3,4)
array3d

array([[[100, 101, 102, 103],
        [104, 105, 106, 107],
        [108, 109, 110, 111]],

       [[112, 113, 114, 115],
        [116, 117, 118, 119],
        [120, 121, 122, 123]]])

In [49]:
# don't try this at home: only one dimension can be unknown, so this raises a ValueError
# array3d = flat_array.reshape(-1,-1, 4)

In [50]:
# in case you wish to iterate over the flattened array (very useful in data visualization)
[i for i in array3d[0].flat]

[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
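
Like slicing, reshaping a contiguous array gives a view of the same data rather than a copy. The sketch below also contrasts `ravel` (a view when the array is contiguous) with `flatten` (always a copy); names are illustrative.

```python
import numpy as np

flat = np.arange(100, 124)
cube = flat.reshape(2, 3, 4)

# For a contiguous array, reshape and ravel return views of the same data...
raveled = cube.ravel()
# ...while flatten always returns a new copy.
flattened = cube.flatten()

cube[0, 0, 0] = -1
print(flat[0], raveled[0], flattened[0])  # -1 -1 100
```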

### 01.02.06 Stacking, concatenation

1) Concatenate

In [51]:
first_array = np.array([i for i in range(100,112)]).reshape(3,4)
second_array = np.array([i for i in range(200,212)]).reshape(3,4)

In [52]:
first_array

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111]])

In [53]:
second_array

array([[200, 201, 202, 203],
       [204, 205, 206, 207],
       [208, 209, 210, 211]])

In [54]:
np.concatenate((first_array, second_array), axis = 0)

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [200, 201, 202, 203],
       [204, 205, 206, 207],
       [208, 209, 210, 211]])

In [55]:
np.concatenate((first_array, second_array), axis = 1)

array([[100, 101, 102, 103, 200, 201, 202, 203],
       [104, 105, 106, 107, 204, 205, 206, 207],
       [108, 109, 110, 111, 208, 209, 210, 211]])

In [56]:
np.concatenate((first_array, second_array), axis = -1)

array([[100, 101, 102, 103, 200, 201, 202, 203],
       [104, 105, 106, 107, 204, 205, 206, 207],
       [108, 109, 110, 111, 208, 209, 210, 211]])

2) Stack

In [57]:
# stack in another dimension
result = np.stack([first_array, second_array])
print("result shape=", result.shape)
result

result shape= (2, 3, 4)


array([[[100, 101, 102, 103],
        [104, 105, 106, 107],
        [108, 109, 110, 111]],

       [[200, 201, 202, 203],
        [204, 205, 206, 207],
        [208, 209, 210, 211]]])

In [58]:
# stacking as an alternative to concatenation
np.vstack((first_array, second_array))

array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111],
       [200, 201, 202, 203],
       [204, 205, 206, 207],
       [208, 209, 210, 211]])

In [59]:
np.hstack((first_array, second_array))

array([[100, 101, 102, 103, 200, 201, 202, 203],
       [104, 105, 106, 107, 204, 205, 206, 207],
       [108, 109, 110, 111, 208, 209, 210, 211]])

In [60]:
# adding a new axis of length 1 to a single array with 'np.newaxis'
print("first array shape=", first_array.shape)
result = first_array[:, np.newaxis, :]
print("result shape=", result.shape)
result

first array shape= (3, 4)
result shape= (3, 1, 4)


array([[[100, 101, 102, 103]],

       [[104, 105, 106, 107]],

       [[108, 109, 110, 111]]])

In [61]:
# adding new axes to a single array with 'None' (`np.newaxis` is an alias for `None`)
print("first array shape=", first_array.shape)
result = first_array[:, None, :, None]
print("result shape=", result.shape)
result

first array shape= (3, 4)
result shape= (3, 1, 4, 1)


array([[[[100],
         [101],
         [102],
         [103]]],


       [[[104],
         [105],
         [106],
         [107]]],


       [[[108],
         [109],
         [110],
         [111]]]])

Honorable mentions:
```python
np.split
np.hsplit
np.vsplit
```
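
These functions are the inverse of concatenation and stacking. A brief sketch of how they might be used (array names are only illustrative):

```python
import numpy as np

table = np.arange(100, 112).reshape(3, 4)

# Split into two equal halves along the columns (axis=1).
left, right = np.hsplit(table, 2)

# Split along the rows (axis=0) at row index 1.
top, bottom = np.vsplit(table, [1])

# np.split is the general form; here it cuts a 1-D array at positions 2 and 5.
a, b, c = np.split(np.arange(8), [2, 5])

print(left.shape, right.shape)   # (3, 2) (3, 2)
print(top.shape, bottom.shape)   # (1, 4) (2, 4)
print(a, b, c)                   # [0 1] [2 3 4] [5 6 7]
```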

## 01.03 Computation on NumPy Arrays

### 01.03.01 Python functions

In [62]:
my_array = np.array([k for k in range(4,30,3)])
my_array

array([ 4,  7, 10, 13, 16, 19, 22, 25, 28])

In [63]:
sum(my_array)

144

In [64]:
my_array - 1

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27])

In [65]:
my_array % 5

array([4, 2, 0, 3, 1, 4, 2, 0, 3])

In [66]:
my_array // 7

array([0, 1, 1, 1, 2, 2, 3, 3, 4])

### 01.03.02 NumPy functions

In [67]:
from dstip_utils.utils import print_with_shape

In [68]:
my_array

array([ 4,  7, 10, 13, 16, 19, 22, 25, 28])

In [69]:
np.sum(my_array)

144

In [70]:
np.prod(my_array)

17041024000

In [71]:
np.multiply(my_array, 0.5)

array([ 2. ,  3.5,  5. ,  6.5,  8. ,  9.5, 11. , 12.5, 14. ])

In [72]:
my_array2d = np.array([k for k in range(15)]).reshape(-1, 3)
my_array2d

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [73]:
print_with_shape(np.mean(my_array2d, axis = 0))

Shape is:  (3,)


array([6., 7., 8.])

In [74]:
print_with_shape(np.mean(my_array2d, axis = 1))

Shape is:  (5,)


array([ 1.,  4.,  7., 10., 13.])

In [75]:
print_with_shape(np.mean(my_array2d, axis = 0, keepdims=True))

Shape is:  (1, 3)


array([[6., 7., 8.]])

In [76]:
print_with_shape(np.mean(my_array2d, axis = 1, keepdims=True))

Shape is:  (5, 1)


array([[ 1.],
       [ 4.],
       [ 7.],
       [10.],
       [13.]])
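
One practical reason to use `keepdims=True` is that the result still lines up with the original array, so it can be combined with it directly. The sketch below centers each row by subtracting its mean; it relies on broadcasting, which is covered in the next subsection, and the variable names are only illustrative.

```python
import numpy as np

data = np.arange(15).reshape(-1, 3)           # shape (5, 3)

row_means = data.mean(axis=1, keepdims=True)  # shape (5, 1)
centered = data - row_means                   # (5, 3) - (5, 1) -> (5, 3)
print(centered)

# Without keepdims the means have shape (5,), which does not line up
# with the trailing dimension of size 3, so the subtraction fails.
try:
    data - data.mean(axis=1)
except ValueError as err:
    print("ValueError:", err)
```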

### 01.03.03 Broadcasting - friend or enemy?

According to [REF1](../README.md)

<img src="img/02.png">


**Rules of Broadcasting**

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

* Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
* Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
* Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
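
The three rules can be sketched as a small pure-Python function that predicts the resulting shape. `broadcast_shape` is a hypothetical helper written for illustration, not how NumPy implements broadcasting internally.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shapes (illustrative helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)      # sizes agree, or Rule 2 stretches b
        elif dim_a == 1:
            result.append(dim_b)      # Rule 2 stretches a
        else:
            # Rule 3: sizes disagree and neither is 1.
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3), (5, 2, 3)))  # (5, 2, 3)
print(broadcast_shape((2, 3), (1, 3)))     # (2, 3)

# Cross-check against NumPy itself.
print(np.broadcast(np.empty((2, 3)), np.empty((5, 2, 3))).shape)
```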

In [77]:
basic_array = np.random.randint(10, size = (2,3))
first_rule_array = np.random.randint(10, size = (5,2,3))
second_rule_array = np.random.randint(10, size = (1,3))
third_rule_array = np.random.randint(10, size = (2,3,5))

In [78]:
(basic_array + first_rule_array).shape

(5, 2, 3)

In [79]:
(basic_array + second_rule_array).shape

(2, 3)

In [80]:
# uncomment to see what happens if we break 'Rule 3'
# (basic_array + third_rule_array).shape

**EXERCISE 03.02**

Write your own softmax implementation in NumPy:

The formula for the softmax function $\sigma(x)$ for a vector $x = \{x_0, x_1, ..., x_{n-1}\}$ is $$\sigma(x)_j = \frac{e^{x_j}}{\sum_k e^{x_k}}$$


`IN:`
```python
array=array([
    [ 2,  3,  5,  7],
    [11, 13, 17, 19],
    [23, 29, 31, 37]])
axis=1
```
`OUT:`
```python
array([[0.006, 0.016, 0.117, 0.862],
       [0.   , 0.002, 0.119, 0.879],
       [0.   , 0.   , 0.002, 0.997]])
```
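
A note on numerical stability that may help here: exponentials of large inputs overflow in floating point, but softmax does not change when the same constant $c$ is subtracted from every component. Taking $c = \max_k x_k$ is a common choice (a standard trick, not something the exercise prescribes), because every exponent then becomes non-positive:

$$\frac{e^{x_j - c}}{\sum_k e^{x_k - c}} = \frac{e^{-c}\, e^{x_j}}{e^{-c} \sum_k e^{x_k}} = \frac{e^{x_j}}{\sum_k e^{x_k}} = \sigma(x)_j$$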

In [81]:
def np_softmax(array: np.ndarray, axis = -1):
    ### INSERT YOUR CODE HERE ###
    raise NotImplementedError
    ### STOP YOUR CODE HERE ###
    return result

In [82]:
# ### Simple tests to check if you implemented the correct functionality
# from scipy.special import softmax as scipy_softmax

# testing_examples = [
#     np.array([1,2,3]),
#     np.array([[1,2,3]]),
#     np.array([x for x in range(12)]).reshape(3,-1),
#     np.array([x for x in range(9)]).reshape(3,-1),
#     np.array([1e9,1e9+1,1e9+2]),
# ]
# for te in testing_examples:
#     for axis in range(te.ndim):
#         np.testing.assert_allclose(scipy_softmax(te,axis=axis), np_softmax(te, axis=axis))

In [83]:
### TO SHOW SOLUTION USE LINE BELOW ###
# %load ../91_solutions/ex3_2.py

### 01.03.04 Boolean and Masking
* Basics

In [84]:
array1 = np.array([x for x in range(12)]).reshape(3,-1)
array1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [85]:
mask1 = array1 % 3 == 0
mask1

array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])

In [86]:
# we get flat array
array1[mask1]

array([0, 3, 6, 9])

In [87]:
# how many elements are greater than 5?
np.sum(array1 > 5)

6

In [88]:
np.sum(array1 > 5, axis=0, keepdims=True)

array([[1, 1, 2, 2]])

In [89]:
np.sum(array1 > 5, axis=1, keepdims=True)

array([[0],
       [2],
       [4]])

In [90]:
np.mean(array1 > 5, axis=1, keepdims=True)

array([[0. ],
       [0.5],
       [1. ]])
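
Masks are also handy for replacing values. The sketch below shows two common patterns: `np.where`, which returns a new array, and assignment through a mask, which modifies the array in place. Variable names are only illustrative.

```python
import numpy as np

values = np.arange(12).reshape(3, -1)

# np.where picks from the second argument where the mask is True,
# and from the third argument elsewhere; the input is left untouched.
clipped = np.where(values > 5, 5, values)

# Assigning through a mask modifies the array in place.
values[values % 2 == 1] = 0

print(clipped)
print(values)
```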

* All, Any, Boolean operators

In [91]:
array1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [92]:
mask1 = array1 % 5 == 0
mask1

array([[ True, False, False, False],
       [False,  True, False, False],
       [False, False,  True, False]])

In [93]:
np.all(mask1)

False

In [94]:
np.any(mask1, axis=0, keepdims=True)

array([[ True,  True,  True, False]])

In [95]:
mask_div2 = array1 % 2 == 0
mask_div2

array([[ True, False,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False]])

In [96]:
mask_div3 = array1 % 3 == 0
mask_div3

array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])

In [97]:
# np.bitwise_or (as an alternative)
mask_div2 | mask_div3

array([[ True, False,  True,  True],
       [ True, False,  True, False],
       [ True,  True,  True, False]])

In [98]:
# np.bitwise_and (as an alternative)
mask_div2 & mask_div3

array([[ True, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]])

In [99]:
# Note: Python's `and` / `or` do not work element-wise; on arrays like these they raise a ValueError!

# mask_div2 and mask_div3
# mask_div2 or mask_div3
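
A short sketch of what goes wrong with the Python keywords, and of the element-wise alternatives NumPy offers (the array and mask names are only illustrative):

```python
import numpy as np

numbers = np.arange(12).reshape(3, -1)
div2 = numbers % 2 == 0
div3 = numbers % 3 == 0

# Python's `and` asks for the truth value of a whole array, which is ambiguous.
try:
    div2 and div3
except ValueError as err:
    print("ValueError:", err)

# Element-wise alternatives:
both = np.logical_and(div2, div3)     # same as div2 & div3
either = np.logical_or(div2, div3)    # same as div2 | div3
neither = ~either                     # element-wise NOT

# Parentheses matter: & binds tighter than comparisons.
in_range = (numbers > 2) & (numbers < 9)
print(numbers[in_range])
```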

**EXERCISE 03.03**

Write your own function that numerically checks the gradient of a function:

`IN:`
```python
cost: Callable
grad: Callable
x: np.array
```

`OUT:`
```python
print("Gradient check passed!")
```
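
The numerical side of such a check is usually a central difference, $f'(t) \approx \frac{f(t+h) - f(t-h)}{2h}$. The sketch below applies it to a scalar function only, so the multi-dimensional version is still left for the exercise. `central_difference` is an illustrative name, not part of NumPy.

```python
def central_difference(f, t, h=1e-4):
    """Approximate f'(t) with a symmetric finite difference."""
    return (f(t + h) - f(t - h)) / (2 * h)

f = lambda t: t ** 3
analytic = lambda t: 3 * t ** 2

t0 = 1.5
numeric = central_difference(f, t0)
print(numeric, analytic(t0))

# Relative difference, guarded against division by very small numbers.
reldiff = abs(numeric - analytic(t0)) / max(1, abs(numeric), abs(analytic(t0)))
print(reldiff < 1e-5)
```

The central difference is preferred over a one-sided difference because its error shrinks with $h^2$ rather than $h$.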

In [100]:
# What will our data look like?
cost = lambda x: np.sum(x**2)
grad = lambda x: x*2
x = np.random.randn(2,3)
x

array([[ 1.11971196, -0.45792242,  0.4253934 ],
       [-0.02797118,  1.47598983,  0.6467801 ]])

In [101]:
# Create an iterator
x_iter = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not x_iter.finished:
    print(x_iter.multi_index)
    x_iter.iternext()

(0, 0)
(0, 1)
(0, 2)
(1, 0)
(1, 1)
(1, 2)


In [102]:
from typing import Callable

# First implement a gradient checker by filling in the following function
def gradcheck_numeric(cost: Callable, grad: Callable, x: np.ndarray):
    """
    cost - cost function
    grad - cost function gradients
    x -- the point (numpy array) to check the gradient at
    """
    fx = cost(x)    # Evaluate cost
    grads = grad(x) # Evaluate grads
    h = 1e-4        # Offset to calculate central difference
    prec = 1e-5     # Max relative difference for numerical computations
    
    assert x.dtype.kind == 'f' # we need float data

    # Iterate over all indexes ix in x to check the gradient.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index

        ### YOUR CODE HERE:
        raise NotImplementedError
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numgrad - grads[ix]) / max(1, abs(numgrad), abs(grads[ix]))
        if reldiff > prec:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (
                grads[ix], numgrad))
            return

        it.iternext() # Step to next index

    print("Gradient check passed!")

In [103]:
# ### Uncomment these cells to run checks ####
# cost_mean = lambda x: np.mean(x ** 2)
# cost_mean_grad = lambda x: (x * 2)/x.size

# gradcheck_numeric(cost_mean, cost_mean_grad, np.array(np.pi))         # scalar test
# gradcheck_numeric(cost_mean, cost_mean_grad, np.random.randn(10,))    # 1-D test
# gradcheck_numeric(cost_mean, cost_mean_grad, np.random.randn(4,5))    # 2-D test

In [104]:
### TO SHOW SOLUTION USE LINE BELOW ###
# %load ../91_solutions/ex3_3.py