## CS210 - Recitation 2

### NumPy

NumPy is a scientific computing library that provides high-performance multidimensional array objects and tools for working with these arrays. More information can be obtained from [this link.](https://docs.scipy.org/doc/numpy/reference/index.html)

#### Installation

You may use _pip_ to install the library.

``` .sh
pip install numpy
```

Or, if you are using Anaconda (you may not need to, since Anaconda ships with most common libraries, including NumPy):

``` .sh
conda install -c anaconda numpy
```

Once the library is installed, run the following import statement to check that everything works.

``` .py
>>> import numpy
```

``` py
# import numpy under the alias np
# from now on, we can use the library with the np prefix
import numpy as np
```

### Arrays

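NumPy arrays are very similar to lists; however, all elements of a NumPy array must have the same data type. In exchange, arrays consume less memory than lists and operations on them run faster, because the elements are stored in one contiguous block and processed by compiled code rather than by the Python interpreter.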
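A quick sketch of the same-type rule: when a list mixes integers and floats, NumPy upcasts every element to one common type. The `nbytes` attribute used here (not covered in these notes) reports the size of the array's data buffer.

```python
import numpy as np

# Mixing ints and floats: every element is upcast to a common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]

# Each float64 element occupies exactly 8 bytes in one contiguous buffer
print(mixed.nbytes)  # 24
```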

###### Array creation

We can initialize numpy arrays from Python lists and access elements with indices.

``` py
>>> a = np.array([1, 2, 3])
>>> print(a[0])
    1
```

In addition, we can index ndarrays with tuples. For example, to get the element at the first row, second column, we can use the following statement.

``` py
>>> a = np.array([[1., 2., 3.], [3., 4., 5.], [6., 7., 8.]])
>>> print(a[0,1])  # row index, column index
    2.0
```
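The sketch below shows that `a[0, 1]` is just indexing with the tuple `(0, 1)`, compares it with chained indexing, and uses negative indices, which behave as they do for Python lists.

```python
import numpy as np

a = np.array([[1., 2., 3.], [3., 4., 5.], [6., 7., 8.]])

# a[0, 1] is shorthand for indexing with the tuple (0, 1)
idx = (0, 1)
print(a[idx])     # 2.0

# Chained indexing gives the same value but first builds the row a[0]
print(a[0][1])    # 2.0

# Negative indices count from the end
print(a[-1, -1])  # 8.0
```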

###### Array attributes

- number of dimensions (`ndarray.ndim`): the rank of the array.
- shape (`ndarray.shape`): a tuple holding the size of each dimension.
- size (`ndarray.size`): the total number of elements in the array.
- data type (`ndarray.dtype`): an object describing the type of the elements in the array.

``` py
ndarray = np.array([1, 2, 3])

# returns the number of dimensions, in other words the length of the shape attribute
print("rank: {}".format(ndarray.ndim))

# a tuple that represents the size of each dimension
print("shape: {}".format(ndarray.shape))

# total number of elements in the array
print("size: {}".format(ndarray.size))

# the data type of the elements
print("dtype: {}".format(ndarray.dtype))

print("first element {}".format(ndarray[0]))  # indexing to access an element
```

Output:

```
rank: 1
shape: (3,)
size: 3
dtype: int64
first element 1
```


``` py
# another example with a 2d nested list
ndarray = np.array([[1, 2, 3], [3, 4, 5], [6, 7, 8]], dtype=float)  # dtype parameter

print("rank: {}".format(ndarray.ndim))
print("shape: {}".format(ndarray.shape))
print("size: {}".format(ndarray.size))
print("dtype: {}".format(ndarray.dtype))

row = 0
col = 1
print("element @ row: {}, col: {} -> {}".format(row, col, ndarray[row,col]))  # since we have a 2d array
```

Output:

```
rank: 2
shape: (3, 3)
size: 9
dtype: float64
element @ row: 0, col: 1 -> 2.0
```
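To see how the attributes relate in more than two dimensions, here is a small sketch with a 3-D array: `ndim` is always the length of `shape`, and `size` is the product of its entries.

```python
import numpy as np

# A 3-dimensional array: 2 blocks of 3 rows and 4 columns each
cube = np.zeros((2, 3, 4))

print(cube.ndim)   # 3
print(cube.shape)  # (2, 3, 4)
print(cube.size)   # 24

# ndim is the length of shape, and size is the product of its entries
assert cube.ndim == len(cube.shape)
assert cube.size == np.prod(cube.shape)
```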


### ndarray Functions

NumPy also provides functions that create arrays with special contents, such as all zeros, all ones, a constant value, the identity matrix, or random values.

``` py
>>> a = np.zeros((2,2))
>>> print(a)
    [[ 0.  0.]
    [ 0.  0.]]
```

``` py
>>> a = np.ones((2,2))
>>> print(a)
    [[ 1.  1.]
    [ 1.  1.]]
```

``` py
>>> a = np.full((2,2), 3)  # creates a constant array (dtype follows the fill value)
>>> print(a)
    [[3 3]
     [3 3]]
```

``` py
>>> a = np.eye(2)  # creates an identity matrix
>>> print(a)              
    [[ 1.  0.]
    [ 0.  1.]]
```

``` py
>>> a = np.random.random((2,2))  # creates an array filled with random values
>>> print(a)                     # a possible output
    [[ 0.91940167  0.08143941]
    [ 0.68744134  0.87236687]]
```
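These functions also accept a `dtype` argument. A short sketch of the defaults: `np.zeros` produces `float64` unless told otherwise, while `np.full` takes its type from the fill value (the exact integer width is platform dependent, so only "some integer type" is claimed here).

```python
import numpy as np

z = np.zeros((2, 3))              # default dtype is float64
zi = np.zeros((2, 3), dtype=int)  # request integers explicitly
f = np.full((2, 2), 3)            # an int fill value gives an integer array
ff = np.full((2, 2), 3.0)         # a float fill value gives a float array

print(z.dtype, zi.dtype, f.dtype, ff.dtype)
print(np.eye(3))                  # 3x3 identity matrix
```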

### More on array creation

In native Python, the built-in `range` function creates a sequence of integers that can be converted to a list.

``` py
>>> lst = [i for i in range(1,10,2)]
>>> print(lst)
    [1, 3, 5, 7, 9]
```

NumPy offers similar functions that return arrays directly and also accept floating-point values and explicit data types.

``` py
>>> ndarray = np.arange(8,20,4, dtype=float)  # start, stop (exclusive), step
>>> ndarray
    array([ 8., 12., 16.])
```

``` py
>>> ndarray = np.linspace(10, 100, 5)  # start, stop (inclusive), number of elements
>>> ndarray                            # evenly spaced elements
    array([ 10. ,  32.5,  55. ,  77.5, 100. ])
```
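The two functions differ in whether the stop value is included. The sketch below uses a step of 0.25, which is exactly representable in binary, so the results compare equal; with non-exact float steps, `linspace` is usually the safer choice because the number of elements is fixed explicitly.

```python
import numpy as np

# arange with a float step: the stop value 1 is excluded
a = np.arange(0, 1, 0.25)
print(a)  # [0.   0.25 0.5  0.75]

# linspace includes the stop value by default ...
b = np.linspace(0, 1, 5)
print(b)  # [0.   0.25 0.5  0.75 1.  ]

# ... unless endpoint=False, which here reproduces the arange result
c = np.linspace(0, 1, 4, endpoint=False)
print(np.array_equal(a, c))  # True
```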

Another great tool of Numpy is the `reshape` function that allows us to change the shape of the ndarray.

``` py
>>> ndarray = np.arange(35).reshape(5,7)
>>> ndarray
    array([[ 0,  1,  2,  3,  4,  5,  6],
           [ 7,  8,  9, 10, 11, 12, 13],
           [14, 15, 16, 17, 18, 19, 20],
           [21, 22, 23, 24, 25, 26, 27],
           [28, 29, 30, 31, 32, 33, 34]])
```
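Two properties of `reshape` worth knowing, shown in a short sketch: the new shape must hold exactly the same number of elements, and for a contiguous array like this one the result is a view that shares data with the original.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)  # 6 elements -> 2 x 3 = 6 elements, OK

# The new shape must hold exactly the same number of elements
try:
    a.reshape(4, 2)  # 4 x 2 = 8 != 6
except ValueError as err:
    print("ValueError:", err)

# b is a view: writing through it changes a as well
b[0, 0] = 100
print(a[0])  # 100
```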

### Array Indexing

As mentioned before, NumPy's indexing is more flexible than that of native Python lists; for example, we can slice several dimensions at once.

``` py
>>> ndarray = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> ndarray[:2, 1:3]
    array([[2, 3],
           [6, 7]])
```
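One consequence of slicing that often surprises newcomers: a basic slice is a view into the original array, not a copy. The sketch below modifies a slice and then uses `.copy()` to get an independent array.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A slice is a view into the original array, not a copy
sub = a[:2, 1:3]
sub[0, 0] = 77
print(a[0, 1])  # 77, the original changed too

# Use .copy() when an independent array is needed
independent = a[:2, 1:3].copy()
independent[0, 0] = -1
print(a[0, 1])  # still 77
```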

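Another, less obvious technique is integer array indexing: indexing with a list of integers selects the corresponding rows, in the given order and with repetition allowed.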

``` py
>>> ndarray = np.array([[1, 2], [3, 4], [5, 6]])
>>> ndarray[[1, 1, 0, 2]]  # each element is a row index
    array([[3, 4],
           [3, 4],
           [1, 2],
           [5, 6]])
```
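Integer array indexing also works with one index list per dimension. In the sketch below, the two lists are paired element by element, so the result picks the entries at `(0, 0)`, `(1, 1)` and `(2, 0)`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Two index lists are paired element by element: (0, 0), (1, 1), (2, 0)
picked = a[[0, 1, 2], [0, 1, 0]]
print(picked)  # [1 4 5]

# Equivalent to collecting the three elements one by one
manual = np.array([a[0, 0], a[1, 1], a[2, 0]])
print(np.array_equal(picked, manual))  # True
```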

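Boolean indexing uses an array of `True`/`False` values, typically produced by a comparison, to select the elements where the condition holds.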

``` py
>>> ndarray = np.array([[1, 2, 3, 4, 5, 6]])
>>> ndarray > 5
    array([[False, False, False, False, False,  True]])
    
>>> ndarray[ndarray > 5]
    array([6])
```
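Conditions can be combined with the elementwise operators `&` (and) and `|` (or), and a mask can also be used on the left-hand side of an assignment. The parentheses in the sketch are needed because `&` binds more tightly than the comparison operators.

```python
import numpy as np

a = np.arange(1, 11)  # the integers 1, 2, ..., 10

# Even numbers greater than 3
mask = (a > 3) & (a % 2 == 0)
print(a[mask])  # [ 4  6  8 10]

# Assign through a mask: set every element greater than 8 to zero
a[a > 8] = 0
print(a)  # [1 2 3 4 5 6 7 8 0 0]
```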

### Data types

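In addition to native Python data types such as `int` and `float`, NumPy has its own numeric data types. When you create an array, NumPy infers a data type from the input values (the default integer type is platform dependent; for example, it is `int32` on Windows with NumPy versions before 2.0).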

``` py
>>> ndarray = np.array([[1, 2, 3, 4, 5, 6]])
>>> ndarray.dtype
    dtype('int64')
```

We can set the data type explicitly with the `dtype` parameter:

``` py
>>> ndarray = np.array([[1, 2, 3, 4, 5, 6]], dtype=np.int32)
>>> ndarray.dtype
    dtype('int32')
```

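Or, after the array is created, by converting it with `astype`, which returns a new array. Note that assigning to `ndarray.dtype` directly does not convert the values: it reinterprets the raw bytes, which turns the integers into meaningless floats.

``` py
>>> ndarray = np.array([[1, 2, 3, 4, 5, 6]])
>>> ndarray = ndarray.astype(np.float64)
>>> ndarray.dtype
    dtype('float64')
```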
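A sketch of what `astype` (a standard ndarray method) does when converting floats to integers: the values are truncated toward zero rather than rounded, and the original array is left untouched.

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])

# Float -> int conversion truncates toward zero (it does not round)
i = f.astype(np.int64)
print(i)  # [ 1 -1  2]

# astype returns a new array; the original keeps its dtype and values
print(f.dtype)  # float64
```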

The choice of data type can affect memory usage and performance, depending on the problem.
The full list of available data types can be found at this [link.](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html)
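A small sketch of the trade-off: a 32-bit type halves the memory of a 64-bit one, but smaller types also have smaller ranges. For example, `uint8` holds only 0 to 255, and arithmetic between `uint8` arrays wraps around silently on overflow.

```python
import numpy as np

# Same number of elements, half the memory with a 32-bit type
a64 = np.ones(1000, dtype=np.float64)
a32 = np.ones(1000, dtype=np.float32)
print(a64.nbytes, a32.nbytes)  # 8000 4000

# uint8 wraps around: 200 + 200 = 400, and 400 - 256 = 144
u = np.array([200, 100], dtype=np.uint8)
print(u + u)  # [144 200]
```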

### Array Math

Mathematical functions operate elementwise on arrays.

``` py
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

print("scalar summation")
print(x + 5)
print()

print("array-wise summation")
print(x + y)
print()

print(np.add(x, y))  # same as the previous one
print()

print("subtraction")
print(x - y)
print()

print(np.subtract(x, y))
print()

print("multiplication")
print(x * y)
print()

print(np.multiply(x, y))
print()

print("division")
print(x / y)
print()

print(np.divide(x, y))
print()

print("square root")
print(np.sqrt(x))
```

Output:

```
scalar summation
[[6. 7.]
 [8. 9.]]

array-wise summation
[[ 6.  8.]
 [10. 12.]]

[[ 6.  8.]
 [10. 12.]]

subtraction
[[-4. -4.]
 [-4. -4.]]

[[-4. -4.]
 [-4. -4.]]

multiplication
[[ 5. 12.]
 [21. 32.]]

[[ 5. 12.]
 [21. 32.]]

division
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

[[0.2        0.33333333]
 [0.42857143 0.5       ]]

square root
[[1.         1.41421356]
 [1.73205081 2.        ]]
```
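"Elementwise" means each output entry depends only on the entries at the same position. The sketch below spells out `x * y` as an explicit double loop and checks that both give the same result; functions such as `np.abs` follow the same rule.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# What x * y computes, written as an explicit double loop
manual = np.empty_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        manual[i, j] = x[i, j] * y[i, j]

print(np.array_equal(manual, x * y))  # True

# np.abs is also applied element by element
print(np.abs(x - y))
```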


``` py
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

print("inner-product of vectors")
print(v.dot(w))
print(np.dot(v, w))
print()

print("element-wise product")
print(x * y)
print()

print("Matrix product")
print(x.dot(y))
print(np.dot(x, y))
print()

print("np.matmul can also be used")
print(np.matmul(x, y))
```

Output:

```
inner-product of vectors
219
219

element-wise product
[[ 5 12]
 [21 32]]

Matrix product
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]

np.matmul can also be used
[[19 22]
 [43 50]]
```
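Since Python 3.5, the `@` operator performs matrix multiplication on NumPy arrays (it dispatches to `np.matmul`). The sketch also computes one entry of the product by hand to show where 19 comes from.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

# The @ operator: matrix product for 2-D arrays, inner product for 1-D arrays
print(x @ y)
print(v @ w)  # 219

# Entry (0, 0) by hand: row 0 of x dotted with column 0 of y, 1*5 + 2*7
print(x[0, :].dot(y[:, 0]))  # 19
```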


``` py
x = np.array([[1,2],[3,4]])

# sum of all elements in the array
print(np.sum(x))

# sum of each column
print(np.sum(x, axis=0))

# sum of each row
print(np.sum(x, axis=1))
```

Output:

```
10
[4 6]
[3 7]
```
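With a square matrix it is easy to mix up the two axes. A non-square sketch makes the rule visible: the axis you pass is the one that disappears from the result's shape.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# axis=0 collapses the rows: one result per column
col_sums = np.sum(x, axis=0)
print(col_sums, col_sums.shape)  # [5 7 9] (3,)

# axis=1 collapses the columns: one result per row
row_sums = np.sum(x, axis=1)
print(row_sums, row_sums.shape)  # [ 6 15] (2,)
```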


``` py
# taking the transpose of an array
x = np.array([[1,2], [3,4]])

print("original matrix")
print(x)
print()

print("transposed")
print(x.T)
```

Output:

```
original matrix
[[1 2]
 [3 4]]

transposed
[[1 3]
 [2 4]]
```
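A sketch of two transpose details: for a non-square matrix the shape is swapped, while transposing a 1-D array has no effect, since it is neither a row nor a column. Reshaping to two dimensions gives an explicit column vector.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
print(m.T.shape)  # (3, 2)

# Transposing a 1-D array returns the same shape
v = np.arange(3)
print(v.T.shape)  # (3,)

# An explicit column vector needs two dimensions
print(v.reshape(-1, 1).shape)  # (3, 1)
```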


``` py
x = np.arange(10)

print("Mean value is: ", x.mean())
print("Max value is: ", x.max())
print("Min value is: ", x.min())
```

Output:

```
Mean value is:  4.5
Max value is:  9
Min value is:  0
```


``` py
x = np.arange(10).reshape(2, 5)

# axis=0 reduces along the rows, giving one value per column
print("Mean value is: ", x.mean(axis=0))
print("Max value is: ", x.max(axis=0))
print("Min value is: ", x.min(axis=0))
```

Output:

```
Mean value is:  [2.5 3.5 4.5 5.5 6.5]
Max value is:  [5 6 7 8 9]
Min value is:  [0 1 2 3 4]
```
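The same methods with `axis=1` give one value per row instead, as this short sketch on the same 2x5 array shows.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)  # rows: [0 1 2 3 4] and [5 6 7 8 9]

# axis=1 reduces along each row: one value per row
print("Mean value is: ", x.mean(axis=1))  # [2. 7.]
print("Max value is: ", x.max(axis=1))    # [4 9]
print("Min value is: ", x.min(axis=1))    # [0 5]
```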


### Back to reshaping

The shape of an array can be changed with various commands.

``` py
>>> x = np.arange(10)
>>> x.reshape(2, 5)  # returns the array with a modified shape
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
       
>>> x.resize(5, 2)  # modifies the array itself
>>> x
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
```

If one dimension is given as -1 in a reshaping operation, NumPy infers that dimension from the total number of elements and the other dimensions.

``` py
>>> x = np.arange(12)
>>> x.reshape(3, -1)  # the number of columns (4) is inferred by NumPy
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
```
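A few more `-1` cases in a short sketch: it can appear in any position, `reshape(-1)` flattens the array, and a reshape fails when the known dimensions do not divide the total size.

```python
import numpy as np

x = np.arange(12)

print(x.reshape(-1, 6).shape)     # (2, 6)
print(x.reshape(2, 2, -1).shape)  # (2, 2, 3)
print(x.reshape(-1).shape)        # (12,)

# The known dimensions must divide the total number of elements
try:
    x.reshape(5, -1)  # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)
```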

In some cases, we need a flattened (one-dimensional) version of the array.

``` py
>>> x = np.arange(10).reshape(2, 5)
>>> y = x.flatten()
>>> y.shape
(10,)
```

``` py
>>> x = np.arange(10).reshape(2, 5)
>>> y = x.ravel()
>>> y.shape
(10,)
```

`ravel` and `flatten` produce the same result. The difference is that `flatten` always returns a copy, while `ravel` returns a view of the original array whenever possible, so changes to the raveled array may also change the original. In exchange, `ravel` is more memory efficient, because it avoids a copy when it can.
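The difference can be checked directly. In this sketch, writing into the result of `flatten` leaves the original alone, writing into the result of `ravel` does not, and `np.shares_memory` confirms that `ravel` has to copy when the input (here a transpose) is not contiguous.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

f = x.flatten()   # always a copy
f[0] = 100
print(x[0, 0])    # 0: x is unchanged

r = x.ravel()     # a view here, because x is contiguous in memory
r[0] = 100
print(x[0, 0])    # 100: x changed through the view

# For a non-contiguous array such as a transpose, ravel has to copy
print(np.shares_memory(x.T, x.T.ravel()))  # False
```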

### Random

The `np.random` module provides functions that generate arrays of random numbers of any given shape, drawn from various statistical distributions.

``` py
# random numbers between [0,1) of shape 2,2
>>> np.random.rand(2,2)
[[0.60332224 0.37830503]
 [0.1736118  0.38234196]]
 
# standard normal distribution with mean=0 and variance=1 of shape 2,2
>>> np.random.randn(2,2)
[[-2.10285019  0.4114038 ]
 [-0.65259364 -0.05859651]]

# random integers between [0, 10) of shape 2,2
>>> np.random.randint(0, 10, size=[2,2])
[[8 9]
 [9 9]]

# one random number between [0,1)
>>> np.random.random()
0.8150009405258001

# random numbers between [0,1) of shape 2,2
>>> np.random.random(size=[2,2])
[[0.64642465 0.22353422]
 [0.97918432 0.44304317]]

# pick 10 items from a given list, with equal probability
>>> np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10)
['o' 'e' 'i' 'a' 'i' 'a' 'e' 'u' 'e' 'i']

# pick 10 items from a given list with predefined probabilities
>>> np.random.choice(['a', 'e', 'i', 'o', 'u'],
...     size=10, p=[0.3, .1, 0.1, 0.4, 0.1])
['o' 'o' 'o' 'o' 'o' 'i' 'o' 'e' 'e' 'u']
```
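The functions above use NumPy's legacy global random state (which can be seeded with `np.random.seed`). Since NumPy 1.17, the recommended interface is a `Generator` object created with `np.random.default_rng`; this sketch, which goes beyond the original material, shows that two generators with the same seed produce identical numbers, which makes experiments reproducible.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

a = rng1.integers(0, 10, size=(2, 2))  # integers in [0, 10)
b = rng2.integers(0, 10, size=(2, 2))
print(np.array_equal(a, b))  # True

# Generator counterparts of the functions above
print(rng1.random((2, 2)))           # floats in [0, 1)
print(rng1.standard_normal((2, 2)))  # mean 0, variance 1
print(rng1.choice(['a', 'e', 'i', 'o', 'u'], size=5))
```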

## Exercise 1

- Create a `5x5` matrix that contains `random integers between 0-50`.
- Normalize the matrix so that `all rows sum to 1.0.`
- Finally, show that each row sums to 1.

``` py
# your code
```

## Exercise 2

- Create a function named `euclidean_dist` that takes two NumPy arrays as parameters and returns the Euclidean distance between them.
 
**Euclidean Distance**

$\sqrt{\sum_{i=1}^{N} (v_i - u_i)^2}$

``` py
# your code
```