# DSC4001-01 Exercise 04

**This exercise notebook covers NumPy's core data structure and its operations:**


* NumPy Arrays
* NumPy Array Operations


NumPy is a linear algebra library for Python. It is essential for data science in Python because almost every library in the PyData ecosystem relies on NumPy as one of its main building blocks.

Once you've installed NumPy you can import it as a library.

NumPy has many built-in functions and capabilities. We will focus on some of the most important aspects of NumPy: arrays, number generation, ... 

In [None]:
import numpy as np
np.random.seed(0)

## NumPy Arrays

### Creating NumPy Arrays

We can create an array (or matrix) by passing a list (or a list of lists) to ``np.array``.

In [None]:
my_list = [1,4,2,5,3]
my_list

In [None]:
my_arr = np.array(my_list)
my_arr

In [None]:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
print(my_matrix)

In [None]:
np.array(my_matrix)

In [None]:
my_arr = np.array(my_list, dtype='float32')
my_arr
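NumPy stores every element of an array with a single dtype, which it infers from the list contents unless ``dtype`` is given. A minimal sketch of that behavior, plus ``astype`` for converting an existing array (the variable names are illustrative):

```python
import numpy as np

# All integers -> an integer dtype (the exact width is platform-dependent)
int_arr = np.array([1, 4, 2, 5, 3])
print(int_arr.dtype.kind)   # 'i' means signed integer

# Mixing ints and floats upcasts every element to float64
mixed_arr = np.array([1, 4, 2.5])
print(mixed_arr.dtype)      # float64

# An explicit dtype overrides inference
f32_arr = np.array([1, 4, 2, 5, 3], dtype='float32')
print(f32_arr.dtype)        # float32

# astype returns a converted copy; the original keeps its dtype
as_float = int_arr.astype(np.float64)
print(as_float.dtype, int_arr.dtype.kind)   # float64 i
```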

There are lots of built-in ways to generate arrays:

* ``arange``: return evenly spaced values within the half-open interval ``[start, stop)`` (integers when the arguments are integers)
* ``zeros`` and ``ones``: generate arrays of zeros or ones
* ``linspace``: return a given number of evenly spaced floats over the closed interval ``[start, stop]``
* ``eye``: create an identity matrix

In [None]:
print(np.arange(0,10))
print(np.arange(0,11,2))

In [None]:
print(np.zeros(3))
print(np.zeros((3,4)))

In [None]:
print(np.ones(3))
print(np.ones((3,2,2)))

In [None]:
print(np.linspace(0,10,3))
print(np.linspace(0,10,7))

In [None]:
print(np.eye(4))
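The main practical difference between ``arange`` and ``linspace`` is the endpoint: ``arange`` excludes ``stop`` and takes a step size, while ``linspace`` includes ``stop`` by default and takes a number of points. A short sketch comparing the two (names are illustrative):

```python
import numpy as np

# arange(start, stop, step): stop is excluded
ar = np.arange(0, 10, 2.5)
print(ar)       # values 0.0, 2.5, 5.0, 7.5

# linspace(start, stop, num): stop is included by default
lin = np.linspace(0, 10, 5)
print(lin)      # values 0.0, 2.5, 5.0, 7.5, 10.0

# endpoint=False drops stop, matching the arange result here
lin_open = np.linspace(0, 10, 4, endpoint=False)
print(np.array_equal(ar, lin_open))   # True

# zeros (and ones) default to float64
zs = np.zeros((2, 3))
print(zs.dtype)  # float64
```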

NumPy also has lots of ways to create random number arrays:

* ``rand``: create an array of the given shape with random samples from a uniform distribution over $[0,1)$
* ``randn``: create an array of the given shape with random samples from the standard normal distribution
* ``randint``: return random integers from ``low`` (inclusive) to ``high`` (exclusive); with a single bound ``n``, from ``0`` to ``n`` (exclusive)



In [None]:
print(np.random.rand(2))
print(np.random.rand(3,4))

In [None]:
print(np.random.randn(2))
print(np.random.randn(3,4))

In [None]:
print(np.random.randint(2, size=5))
print(np.random.randint(1, size=5))
print(np.random.randint(1, 100, 10))

In [None]:
# 2x4 array of ints between 0 and 4, inclusive:
print(np.random.randint(5, size=(2,4)))

# 10x3 array; each column has its own (exclusive) upper bound: 3, 5, 10
print(np.random.randint(1, [3,5,10], size=(10,3)))
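The functions above belong to NumPy's legacy global random state. Newer NumPy versions (1.17+) also provide a ``Generator`` object via ``np.random.default_rng``; this notebook does not use it, but a sketch of the rough equivalents looks like this. Note that ``rng.integers`` also excludes ``high`` by default.

```python
import numpy as np

# A seeded Generator: the same seed gives the same sequence of draws
rng = np.random.default_rng(0)

u = rng.random((3, 4))               # uniform on [0, 1), like np.random.rand(3, 4)
z = rng.standard_normal((3, 4))      # standard normal, like np.random.randn(3, 4)
k = rng.integers(1, 100, size=10)    # ints in [1, 100), like np.random.randint(1, 100, 10)

# Re-creating the generator with the same seed reproduces the draws
rng_again = np.random.default_rng(0)
print(np.array_equal(u, rng_again.random((3, 4))))   # True
```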

### Array Attributes and Methods

Let's explore some useful array attributes and methods:

In [None]:
arr = np.random.randint(10, size=(3,4,2))
arr

In [None]:
print('arr ndim: ', arr.ndim)
print('arr shape: ', arr.shape)
print('arr size: ', arr.size)
print('arr dtype: ', arr.dtype)
print('arr itemsize: ', arr.itemsize, 'bytes')
print('arr nbytes: ', arr.nbytes, 'bytes')
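These attributes are related: ``size`` is the product of the ``shape`` entries, and ``nbytes`` is ``size * itemsize``. A quick check on a deterministic array (``demo_arr`` is an illustrative name, built with ``arange`` so it does not consume the random seed):

```python
import numpy as np

demo_arr = np.arange(24).reshape(3, 4, 2)

# size is the product of the shape entries: 3 * 4 * 2
print(demo_arr.size == np.prod(demo_arr.shape))             # True
# nbytes is the number of elements times bytes per element
print(demo_arr.nbytes == demo_arr.size * demo_arr.itemsize)  # True

# itemsize depends on the dtype: float32 uses 4 bytes, float64 uses 8
print(demo_arr.astype(np.float32).itemsize, demo_arr.astype(np.float64).itemsize)  # 4 8
```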

Reshape 

* ``reshape``: return an array containing the same data with a new shape


In [None]:
print(arr)
print(arr.shape)

In [None]:
arr.reshape(4,3,2)

In [None]:
arr.reshape(2,2,6)
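``reshape`` only works when the new shape holds exactly the same number of elements. One dimension may be given as ``-1``, and NumPy infers it from the total size. A small sketch (names are illustrative):

```python
import numpy as np

flat = np.arange(24)

# -1 lets NumPy infer this dimension: 24 / (2 * 3) = 4
cube = flat.reshape(2, 3, -1)
print(cube.shape)               # (2, 3, 4)

# reshape(-1) flattens back to 1-D
print(cube.reshape(-1).shape)   # (24,)

# A shape with a different number of elements is rejected
try:
    flat.reshape(5, 5)
except ValueError as err:
    print('ValueError:', err)
```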

``max``, ``min``, ``argmax``, ``argmin``: the largest and smallest values, and the indices where they occur


In [None]:
rand_arr = np.random.randint(0,50,10)
rand_arr

In [None]:
print(rand_arr.max())
print(rand_arr.min())

print(rand_arr.argmax())
print(rand_arr.argmin())
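On a multi-dimensional array, ``argmax`` without an ``axis`` returns an index into the *flattened* array. ``np.unravel_index`` converts it back to per-dimension indices. A sketch on a small hand-written matrix:

```python
import numpy as np

grid = np.array([[3, 9, 1],
                 [7, 2, 8]])

# Without an axis, argmax indexes into the flattened array
flat_idx = grid.argmax()
print(flat_idx)                       # 1

# unravel_index turns the flat index into (row, col)
row, col = np.unravel_index(flat_idx, grid.shape)
print(row, col, grid[row, col])       # 0 1 9

# With ties, argmax returns the first occurrence
print(np.array([5, 1, 5]).argmax())   # 0
```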

### Indexing and Slicing

* ``arr[1st dim index, 2nd dim index, ...]``
* Remember that indexing starts from ``0`` ...
* ... and negative indexing starts from ``-1`` (the last element)

In [None]:
arr = np.random.randint(10, size=(3,4,2))
arr

In [None]:
# Get a value at an index
print(arr[0,0,0])
print(arr[0,1,1])
print(arr[2,2,1])

In [None]:
# Get values in a range
print(arr[:1, :2, :1])

In [None]:
print(arr[:2, ::2, 0])
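Two details worth checking on a deterministic array: negative indices count from the end of each axis, and an integer index drops its dimension while a slice keeps it. A sketch (``blocks`` is an illustrative name):

```python
import numpy as np

blocks = np.arange(24).reshape(3, 4, 2)

# -1 is the last position along each axis
print(blocks[-1, -1, -1])                   # 23

# blocks[i, j, k] and blocks[i][j][k] select the same element;
# the comma form does it in a single indexing step
print(blocks[2, 1, 0] == blocks[2][1][0])   # True

# An integer index removes that axis; a length-1 slice keeps it
print(blocks[:, 0, :].shape)                # (3, 2)
print(blocks[:, 0:1, :].shape)              # (3, 1, 2)
```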

``Broadcasting``: unlike a normal Python list, a NumPy array slice can be assigned a single value, which is broadcast to every selected element (e.g. ``sub_arr[:,1] = 5`` below).

``view``: Array slices return views rather than copies of the array data, so modifying a subarray also changes the original array. This avoids copying data and saves memory.

``copy``: To get an independent copy of the data, use ``.copy()``.


In [None]:
sub_arr = arr[1]
print(sub_arr)

In [None]:
sub_arr[:,1] = 5
print(sub_arr)

In [None]:
print(arr)

In [None]:
arr_copy = arr.copy()
arr_copy

In [None]:
arr_copy[1,:,1] = 100
print(arr_copy)
print(arr) 
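``np.shares_memory`` is one way to check whether two arrays share data. A sketch contrasting a basic slice (a view) with ``.copy()`` and with integer-array ("fancy") indexing, which also returns a copy (names are illustrative):

```python
import numpy as np

base = np.arange(10)

sliced = base[2:5]          # basic slice -> view
copied = base[2:5].copy()   # explicit copy
fancy = base[[2, 3, 4]]     # integer-array indexing -> copy

print(np.shares_memory(base, sliced))   # True
print(np.shares_memory(base, copied))   # False
print(np.shares_memory(base, fancy))    # False

# Writing through the view changes base; writing to the copies does not
sliced[0] = -1
copied[1] = -1
fancy[2] = -1
print(base[:5])             # values 0, 1, -1, 3, 4
```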

Selection: put a comparison inside brackets to select only the elements that satisfy it

In [None]:
arr = np.arange(1,11)
arr 

In [None]:
bool_arr = arr > 4
bool_arr

In [None]:
arr[bool_arr]

In [None]:
x = 7
arr[arr < x]
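Conditions can be combined with ``&`` (and), ``|`` (or) and ``~`` (not). Each comparison needs its own parentheses because ``&`` and ``|`` bind more tightly than ``<`` and ``>``; Python's ``and``/``or`` do not work element-wise and raise a ``ValueError`` on arrays. A sketch (names are illustrative):

```python
import numpy as np

vals = np.arange(1, 11)

# Elements strictly between 3 and 8
mid = vals[(vals > 3) & (vals < 8)]
print(mid)     # values 4, 5, 6, 7

# Elements below 3 or above 8
ends = vals[(vals < 3) | (vals > 8)]
print(ends)    # values 1, 2, 9, 10

# Elements that are NOT even
odd = vals[~(vals % 2 == 0)]
print(odd)     # values 1, 3, 5, 7, 9
```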

## NumPy Operations


### Arithmetic operations

We can perform element-wise arithmetic between arrays:

* ``+`` or ``np.add``: addition
* ``-`` or ``np.subtract``: subtraction
* ``*`` or ``np.multiply``: multiplication
* ``/`` or ``np.divide``: division
* ``**`` or ``np.power``: exponentiation
* ``%`` or ``np.mod``: modulus / remainder

In [None]:
arr = np.arange(0, 10)
arr

In [None]:
arr + arr

In [None]:
arr - arr

In [None]:
arr * arr

In [None]:
# 0 / 0 gives nan (with a RuntimeWarning); a nonzero value divided by 0 would give inf
arr / arr

In [None]:
arr ** 3
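Each operator is shorthand for the matching NumPy function, which a quick equality check confirms. The sketch also shows ``//`` (``np.floor_divide``), which the list above does not mention, and that ``/`` always produces floats (names are illustrative):

```python
import numpy as np

nums = np.arange(0, 10)

# Operators and their function forms give identical results
print(np.array_equal(nums + nums, np.add(nums, nums)))   # True
print(np.array_equal(nums ** 3, np.power(nums, 3)))      # True

# % gives the remainder, // the floor quotient
print(nums % 3)    # values 0 1 2 0 1 2 0 1 2 0
print(nums // 3)   # values 0 0 0 1 1 1 2 2 2 3

# / returns floats even for integer inputs
print((nums / 2).dtype)   # float64
```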

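Broadcasting: when two arrays have different shapes, NumPy stretches the smaller one (or a scalar) along its missing or length-1 dimensions so the operation can be applied element-wise.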

In [None]:
arr_1 = np.arange(3)
print(arr_1)

arr_1 + 5

In [None]:
arr_2 = np.ones((3,3))
print(arr_1)
print(arr_2)

arr_1 + arr_2

In [None]:
print(arr_1)
print(arr_1.reshape(3,1))

arr_1 + arr_1.reshape((3,1))
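The shape rule behind these results: right-align the two shapes, pad the shorter one with leading 1s, and make each pair of dimensions either equal or 1. ``broadcast_shape`` below is a hypothetical helper written only to illustrate that rule; the last line cross-checks it against NumPy's own ``np.broadcast``.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper illustrating NumPy's broadcasting rule."""
    # Right-align the shapes by padding the shorter one with leading 1s
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Each pair must be equal, or one of them must be 1
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f'incompatible shapes {shape_a} and {shape_b}')
    return tuple(result)

# The same pairs as the cells above
print(broadcast_shape((3,), (3, 3)))   # (3, 3)
print(broadcast_shape((3,), (3, 1)))   # (3, 3)

# Cross-check against NumPy
print(np.broadcast(np.arange(3), np.arange(3).reshape(3, 1)).shape)   # (3, 3)
```

Shapes such as ``(3,)`` and ``(4,)`` fail the rule, and NumPy raises a ``ValueError`` for them as well.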

### Universal Array Functions

NumPy comes with many universal array functions (ufuncs), which apply a mathematical operation to every element of an array. The list below also includes aggregations (``np.sum``, ``np.mean``, ...), which reduce an array to a single summary value.

* ``np.sum``: compute sum of elements
* ``np.prod``: compute product of elements
* ``np.mean``: compute mean of elements
* ``np.std``: compute standard deviation
* ``np.var``: compute variance
* ``np.median``: compute median of elements
* ``np.sqrt``: compute square root
* ``np.exp``: exponential ($e^x$ for each element)
* ``np.log``: natural logarithm
* ``np.abs``: absolute value


In [None]:
arr = np.arange(0, 10)
arr

In [None]:
np.sum(arr)

In [None]:
np.prod(arr[1:4])

In [None]:
print(np.mean(arr))
print(np.std(arr))
print(np.var(arr))

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
# log(0) is -inf, so the first entry is -inf and NumPy emits a RuntimeWarning
print(np.log(arr))
print(np.log2(arr))
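The difference between the two kinds of function shows up in the output shape: element-wise ufuncs keep the input shape, while aggregations collapse it. ``np.sum`` gives the same result as reducing with the ``np.add`` ufunc. A sketch on strictly positive values, so ``log`` stays finite (names are illustrative):

```python
import numpy as np

pos = np.arange(1, 6)

# Element-wise ufuncs return an array of the same shape
print(np.sqrt(pos).shape == pos.shape)            # True

# Aggregations reduce to a single value
print(np.sum(pos), np.add.reduce(pos))            # 15 15
print(np.prod(pos), np.multiply.reduce(pos))      # 120 120

# exp and log undo each other element-wise, up to float rounding
print(np.allclose(np.log(np.exp(pos)), pos))      # True
```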

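We can specify the ``axis`` along which an aggregate is computed. Axis indices also start from ``0``: for a 2-D array, ``axis=0`` aggregates down the rows (one result per column) and ``axis=1`` aggregates across the columns (one result per row).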

In [None]:
arr = np.random.randint(1,10,size=(3,4))
arr

In [None]:
print(arr.sum())

In [None]:
print(arr.sum(axis=0))
print(arr.sum(axis=1))

In [None]:
print(arr.mean())
print(arr.mean(axis=0))
print(arr.mean(axis=1))

In [None]:
print(arr.max())
print(arr.max(axis=0))
print(arr.max(axis=1))
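A hand-written matrix makes the axis direction easy to verify by eye. The sketch also uses ``keepdims=True``, which keeps the collapsed axis with length 1 so the result can be broadcast back against the original (names are illustrative):

```python
import numpy as np

table = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])   # shape (3, 4)

# axis=0 collapses the rows: one result per column
print(table.sum(axis=0))   # column sums 15, 18, 21, 24
# axis=1 collapses the columns: one result per row
print(table.sum(axis=1))   # row sums 10, 26, 42

# keepdims=True gives shape (3, 1), which broadcasts against (3, 4)
row_means = table.mean(axis=1, keepdims=True)
centered = table - row_means
print(centered.mean(axis=1))   # 0.0 for every row
```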

Comparison operators:

* ``==`` or ``np.equal``: equal
* ``!=`` or ``np.not_equal``: not equal
* ``<`` or ``np.less``: less than
* ``<=`` or ``np.less_equal``: less than or equal
* ``>`` or ``np.greater``: greater than
* ``>=`` or ``np.greater_equal``: greater than or equal

In [None]:
arr = np.arange(0, 10)
arr

In [None]:
np.less(arr, 5)

In [None]:
np.less_equal(arr, 5)
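Comparisons return boolean arrays, which are easy to summarize: ``True`` counts as 1 in a sum, and ``np.any`` / ``np.all`` collapse the mask to a single answer. To test whether two whole arrays are equal, ``np.array_equal`` returns one boolean instead of an element-wise mask. A sketch (names are illustrative):

```python
import numpy as np

nums = np.arange(0, 10)
mask = nums < 5                  # same as np.less(nums, 5)

# Count the matches
print(mask.sum())                # 5
print(np.count_nonzero(mask))    # 5

# Summarize the whole mask
print(np.any(nums > 8), np.all(nums >= 0))   # True True

# Whole-array equality as a single boolean
print(np.array_equal(nums, np.arange(10)))   # True
```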

Sorting arrays:

* ``np.sort``: return a sorted version of the array
* ``np.argsort``: return the indices of the sorted elements

In [None]:
arr = np.random.randint(0,50,10)
arr

In [None]:
# arr.sort() sorts in place (it returns None and modifies arr itself)
arr.sort()
arr

In [None]:
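# np.sort returns a sorted copy and leaves its input unchanged.
# arr was already sorted in place above, so draw a fresh unsorted array first.
arr = np.random.randint(0,50,10)
print(np.sort(arr))
print(arr)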

In [None]:
arr = np.random.randint(0,50,size=(3,4))
arr

In [None]:
# default axis=-1 (the last axis): each row is sorted independently
print(np.sort(arr))
print(np.sort(arr, axis=0))
print(np.sort(arr, axis=1))
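``np.argsort`` returns the order rather than the values, so indexing with its result reproduces ``np.sort``, and reversing it gives a descending order. A sketch on a small hand-written array (names are illustrative):

```python
import numpy as np

scores = np.array([30, 10, 50, 20])

order = np.argsort(scores)
print(order)                 # indices 1, 3, 0, 2

# Indexing with the argsort result gives the sorted values
print(np.array_equal(scores[order], np.sort(scores)))   # True

# Reverse the index order for a descending sort
print(scores[order[::-1]])   # values 50, 30, 20, 10
```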

## Quiz

Now we've learned about NumPy Arrays. Let's test your knowledge.

1. Create an array of 10 zeros

2. Create an array of 10 ones

3. Create an array of the integers from 10 to 25

4. Create an array of 5 values evenly spaced from 0 to 1 (inclusive). Output should be ``array([0., 0.25, 0.5, 0.75, 1.])``

5. Create a 4 x 3 matrix with integers ranging from 1 to 12

hint: use arange and reshape

6. Create a random number between 0 and 1

7. Create a 5x5 matrix with 25 random numbers sampled from a standard normal distribution

8. Create the following matrix: 

``array([[0.01, 0.02, 0.03, 0.04, 0.05], [0.06, 0.07, 0.08, 0.09, 0.1], [0.11, 0.12, 0.13, 0.14, 0.15], [0.16, 0.17, 0.18, 0.19, 0.2]])``

hint: first create a matrix with the integers 1 to 20, then divide it by 100


You will be given a matrix, and be asked to replicate the resulting matrix outputs by using **indexing, slicing, and selection**

In [None]:
mat = np.arange(1,17).reshape(4,4)
mat

9. Output should be 
``array([[5,6,7,8],[9,10,11,12]])``

10. Output should be ``10``

11. Output should be ``array([[3], [7], [11]])``

Note that the output is not ``array([3, 7, 11])``: each of ``3``, ``7``, ``11`` must sit in its own inner bracket, i.e., the output shape should be ``(3,1)``, not ``(3,)``.

12. Output should be ``array([[6,7],[14,15]])``

With a new matrix:

In [None]:
rand_mat = np.array([[4, 19, 48, 4], [43, 13, 39, 36], [23, 6, 24, 44]])
rand_mat

13. Compute the sum of all values in rand_mat

14. Compute the standard deviation of the values in rand_mat

15. Compute the mean of each column in rand_mat

16. Get the maximum of each column in rand_mat and the row index where it occurs

In [None]:
# max

In [None]:
# argmax