# DSCI511 Python NumPy Review Session (Quiz 2)

**NUMPY IS IMPORTANT**

<img src="img/np_important.png" alt="Its You" width="500px">


[image source](https://www.nature.com/articles/s41586-020-2649-2)

In [2]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)

## 0. Import Modules

VS Code demo

## 1. Python List VS Numpy Array

**Pros of NumPy**
- Fewer bytes of memory, because every element has the same fixed type (and we can set that data type explicitly)
- Faster access, because the elements sit in one contiguous block of memory (more on why: [Link1](https://www.labri.fr/perso/nrougier/from-python-to-numpy/#memory-layout), [Link2](https://www.datadiscuss.com/proof-that-numpy-is-much-faster-than-normal-python-array/), [Link3](https://www.jessicayung.com/numpy-arrays-memory-and-strides/))
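
As a rough illustration of the memory point, the sketch below compares a Python list of integers with arrays holding the same values. The names `py_list` and `np_arr` are made up for this example, and the `sys.getsizeof` total is a CPython-specific estimate, not an exact measurement.

```python
import sys
import numpy as np

n = 100
py_list = list(range(n))
np_arr = np.arange(n, dtype='int64')

# A list stores pointers to separate int objects, so count both
# the list container and every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values back to back: n elements * itemsize.
arr_bytes = np_arr.nbytes

# The values 0..99 fit in int8, so a smaller dtype shrinks the array further.
small_bytes = np_arr.astype('int8').nbytes

print(f'python list : ~{list_bytes} bytes')
print(f'numpy int64 : {arr_bytes} bytes')
print(f'numpy int8  : {small_bytes} bytes')
```

Choosing a smaller dtype only makes sense when every value fits in it; larger values would wrap around silently.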

<img src="img/li_np.png" alt="Its You" width="900px">

## 2. NumPy Array Attributes

### 2.1 Basics

In [89]:
# -----------------------  create an array (more coming up) --------------
arr = np.array([[1, 3], [2, 4]], dtype='int8')
print(f'the arr is \n {arr} \n')

# --------------- type ----------------
# array type - numpy.ndarray
print(f'the type: {type(arr)} \n')

# element type - int8
print(f'the element type: {arr.dtype} \n')

# -------- shape vs dimension----------
# shape
print(f'the array shape: {arr.shape} \n')

# dimension
print(f'the array dimension: {arr.ndim} \n')

# --------------- size ----------------
# array size - how many elements
print(f'# of elements: {arr.size} \n')

# element size
print(f'memory usage (bytes) of each element: {arr.itemsize} \n')

# memory usage
print(f'memory usage (bytes) of the array: {arr.nbytes} \n')


the arr is 
 [[1 3]
 [2 4]] 

the type: <class 'numpy.ndarray'> 

the element type: int8 

the array shape: (2, 2) 

the array dimension: 2 

# of elements: 4 

memory usage (bytes) of each element: 1 

memory usage (bytes) of the array: 4 



> **side note**: how to read a multi-dimensional array

In [106]:
a = np.arange(120).reshape(2, 3, 4, 5)
a

array([[[[  0,   1,   2,   3,   4],
         [  5,   6,   7,   8,   9],
         [ 10,  11,  12,  13,  14],
         [ 15,  16,  17,  18,  19]],

        [[ 20,  21,  22,  23,  24],
         [ 25,  26,  27,  28,  29],
         [ 30,  31,  32,  33,  34],
         [ 35,  36,  37,  38,  39]],

        [[ 40,  41,  42,  43,  44],
         [ 45,  46,  47,  48,  49],
         [ 50,  51,  52,  53,  54],
         [ 55,  56,  57,  58,  59]]],


       [[[ 60,  61,  62,  63,  64],
         [ 65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74],
         [ 75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84],
         [ 85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94],
         [ 95,  96,  97,  98,  99]],

        [[100, 101, 102, 103, 104],
         [105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114],
         [115, 116, 117, 118, 119]]]])
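
One way to read the printout above is from the outside in: each index you fix removes the leading axis. The sketch below is only an illustration of that reading, using the same `a` as the cell above.

```python
import numpy as np

a = np.arange(120).reshape(2, 3, 4, 5)

# shape (2, 3, 4, 5): 2 blocks, each holding 3 matrices of 4 rows x 5 columns
print(a.ndim)          # number of axes
print(a[0].shape)      # fixing the first index drops the first axis
print(a[1, 2].shape)   # one 4 x 5 matrix
print(a[1, 2, 3])      # one row of 5 values
print(a[1, 2, 3, 4])   # a single element

# Each index skips a whole sub-block: 60, 20, 5 and 1 elements respectively.
flat_position = 1 * 60 + 2 * 20 + 3 * 5 + 4
print(flat_position == a[1, 2, 3, 4])
```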

### 2.2 strides (understand reshape)

[stride tricks in CNN](https://jessicastringham.net/2017/12/31/stride-tricks/)

In [91]:
# Understanding strides helps you understand reshape and other array manipulations.
# strides[k] is the number of bytes to step to reach the next element along axis k,
# so a[i, j] sits at byte offset i * strides[0] + j * strides[1] from the start of the data.

# create an array (3, 2)
a1 = np.array([[1, 2], [3, 4], [5, 6]], dtype='int8')
print(a1, '\n')

# shape - (3, 2)
print(f'a1 array shape: {a1.shape} \n')
# strides - (2, 1) 
print(f'a1 strides: {a1.strides} \n') # stepping to the next row skips 2 bytes; stepping to the next column skips 1 byte

# reshape
re_a1 = a1.reshape(2,3)
# re_a1 = a1.reshape(2, -1) # using -1 will calculate the dimension for you (if possible)
print(f'after reshape: \n {re_a1} \n')

# shape - (2, 3)
print(f're_a1 array shape: {re_a1.shape} \n')
# strides - (3, 1)
print(f're_a1 strides: {re_a1.strides}') # stepping to the next row skips 3 bytes; stepping to the next column skips 1 byte

[[1 2]
 [3 4]
 [5 6]] 

a1 array shape: (3, 2) 

a1 strides: (2, 1) 

after reshape: 
 [[1 2 3]
 [4 5 6]] 

re_a1 array shape: (2, 3) 

re_a1 strides: (3, 1)
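
To check the stride arithmetic by hand, the sketch below looks up one element directly in the raw bytes. `byte_offset` is a hypothetical helper written for this example, not a NumPy function, and it assumes a 2-D array.

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4], [5, 6]], dtype='int8')
raw = a1.tobytes()  # the 6 one-byte values in row-major order

def byte_offset(arr, i, j):
    """Hypothetical helper: byte offset of arr[i, j] computed from the strides."""
    return i * arr.strides[0] + j * arr.strides[1]

offset = byte_offset(a1, 2, 1)
print(offset, raw[offset], a1[2, 1])  # the byte at that offset is the element

# reshape only changes shape and strides here; the data is shared, not copied
re_a1 = a1.reshape(2, 3)
print(np.shares_memory(a1, re_a1))

# a transpose just swaps the strides
print(a1.T.strides)
```

Because `reshape` returned a view in this case, writing to `re_a1` would also change `a1`.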


> **side note**: a heterogeneous array (mixed element types) needs `dtype='object'`

In [58]:
# heterogeneous array
a = np.array([['a', 'b', 'c'], 1, 3.14159], dtype='object')
a

array([list(['a', 'b', 'c']), 1, 3.14159], dtype=object)
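
A quick way to see what an object array actually holds is to inspect its elements. This is only a small sketch; the list name `kinds` is made up for the example.

```python
import numpy as np

a = np.array([['a', 'b', 'c'], 1, 3.14159], dtype='object')

# The array is 1-D with three slots; each slot refers to an ordinary Python object.
print(a.shape)
kinds = [type(x).__name__ for x in a]
print(kinds)

# itemsize is the size of one stored reference, not of the object it points to
print(a.dtype, a.itemsize)
```

Since the elements are plain Python objects, an object array gives up the fixed-type memory and speed advantages described in section 1.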

## 3. Create an array

In [72]:
# 1. np.array(list)
np_li1 = np.array([1, 2, 3, 4])
print(np_li1)

np_li2 = np.array([5, 6, 7, 8])
print(np_li2)


# 2. np.zeros(shape) - Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)


# 3. np.ones(shape) - Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)


# 4. np.full(shape, fill_value) - Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)


# 5. np.arange(start, stop, step) - Create an array filled with a linear sequence
# Starting at 0, stepping by 2, stopping before 20 (the stop value is excluded)
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)


# 6. np.linspace(start, stop, num) - Create an array of five values evenly spaced from 0 to 1 (both endpoints included)
np.linspace(0, 1, 5)


# 7. np.random.random(shape) - Create a 3x3 array of uniformly distributed
# random values in the interval [0, 1)
np.random.random((3, 3))

# 8. np.random.normal(mean, std, shape) - Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

# 9. np.random.randint(start, end, shape) - Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

# 10. np.eye(n) - Create a 3x3 identity matrix
np.eye(3)
# np.identity(n) also builds an identity matrix, here 5x5
np.identity(5)

# 11. Repeat an array
arr = np.array([[1,2,3]]) 
r1 = np.repeat(arr,3, axis=0)
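
The cell above does not print most of its results, so here is a small sketch that shows a few of them. It also uses the newer `np.random.default_rng` generator with a seed, which is an addition for reproducibility and not something the session code uses; the seed `511` is arbitrary.

```python
import numpy as np

evens = np.arange(0, 20, 2)
print(evens)                    # stop value 20 is excluded

spaced = np.linspace(0, 1, 5)
print(spaced)                   # both endpoints are included

arr = np.array([[1, 2, 3]])
r1 = np.repeat(arr, 3, axis=0)  # repeat the single row 3 times along axis 0
print(r1)

# Seeded generator: the same numbers on every run
rng = np.random.default_rng(511)
uniform = rng.random((3, 3))            # uniform in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10)
print(uniform, normal, integers, sep='\n\n')
```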

> **side note**   
> NumPy arrays support more operations than Python lists, including:
> - [math](https://numpy.org/doc/stable/reference/routines.math.html) (+ - * / \**)
> - [linear algebra](https://numpy.org/doc/stable/reference/routines.linalg) (matmul)
> - [Statistics](https://numpy.org/doc/stable/reference/routines.statistics.html) (min, max, sum)

In [104]:
np_arr1 = np.array([1, 2, 3, 4])
np_arr2 = np.array([5, 6, 7, 8])

# 1. math (+ - * /)
# ----------------------------------
# for numpy arrays - YES, operators work element-wise

print(f'+ for array - adding two arrays: {np_arr1 + np_arr2} \n')

print(f'np.concatenate - appending two arrays: {np.concatenate([np_arr1, np_arr2])} \n')

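# for python lists - NO, + appends and * between two lists fails
py_li1 = [1, 2, 3, 4]
py_li2 = [5, 6, 7, 8]
print(f'+ for python lists - appending two lists: {py_li1 + py_li2} \n')
# mul_lis = py_li1 * py_li2  # TypeError: can't multiply sequence by non-int of type 'list'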





# 2. math functions (sqrt, sum, power) - hypotenuse of a right triangle with sides 3 and 4
# ----------------------------------
sides = np.array([3, 4])
print(f'triangle side: {np.sqrt(np.sum([np.power(sides[0], 2), np.power(sides[1], 2)]))} \n')






# 3. linear algebra (matrix product)
a = np.ones((2,3)) 
print(f'a is \n {a} \n') 
b = np.full((3,2), 2) 
print(f'b is \n {b} \n') 
print(f'matrix product is \n {np.matmul(a,b)}')

+ for array - adding two arrays: [ 6  8 10 12] 

np.concatenate - appending two arrays: [1 2 3 4 5 6 7 8] 

+ for python lists - appending two lists: [1, 2, 3, 4, 5, 6, 7, 8] 

triangle side: 5.0 

a is 
 [[1. 1. 1.]
 [1. 1. 1.]] 

b is 
 [[2 2]
 [2 2]
 [2 2]] 

matrix product is 
 [[6. 6.]
 [6. 6.]]
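
The side note lists `min`, `max` and `sum`, which the cell above does not exercise on a 2-D array. The sketch below shows them with the `axis` argument, plus the `@` operator as shorthand for `np.matmul`. The array `m` is made up for this example.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()            # all elements: 21
col_sums = m.sum(axis=0)   # collapse the rows: one sum per column
row_sums = m.sum(axis=1)   # collapse the columns: one sum per row
row_max = m.max(axis=1)    # largest value in each row
print(total, col_sums, row_sums, row_max, m.min())

# * is element-wise, while @ (same as np.matmul) is the matrix product
a = np.ones((2, 3))
b = np.full((3, 2), 2)
squares = m * m
same_product = np.array_equal(a @ b, np.matmul(a, b))
print(squares)
print(same_product)
```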


## 4. Array Indexing

In [81]:
arr = np.arange(1, 13).reshape(3, 4)
print('arr is')
print(arr, '\n')

# 1. Numeric Indexing
print(f'arr[-1] is \n {arr[-1]} \n')

print(f'arr[0:2] is \n {arr[0:2]} \n')

print(f'arr[2:0:-1] is \n {arr[2:0:-1]} \n')

print(f'arr[:, 2] is \n {arr[:, 2]} \n')

print(f'arr[2, 0] is \n {arr[2, 0]} \n')

# 2. boolean indexing
print(f'arr[arr > 6] is \n {arr[arr > 6]}')

arr is
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]] 

arr[-1] is 
 [ 9 10 11 12] 

arr[0:2] is 
 [[1 2 3 4]
 [5 6 7 8]] 

arr[2:0:-1] is 
 [[ 9 10 11 12]
 [ 5  6  7  8]] 

arr[:, 2] is 
 [ 3  7 11] 

arr[2, 0] is 
 9 

arr[arr > 6] is 
 [ 7  8  9 10 11 12]
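
Two details are worth adding to the indexing examples above: indexing with lists of integers, and the fact that slices return views while boolean indexing returns copies. The sketch below illustrates both on the same `arr`; the variable names are made up for the example.

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# integer-list indexing
rows = arr[[0, 2]]            # rows 0 and 2
picks = arr[[0, 2], [1, 3]]   # elements at (0, 1) and (2, 3)
print(rows)
print(picks)

# combine conditions with & and |, each wrapped in parentheses
big_evens = arr[(arr > 3) & (arr % 2 == 0)]
print(big_evens)

# a slice is a view: writing through it changes the original
top = arr[0:2]
top[0, 0] = 100
print(arr[0, 0])

# boolean indexing returns a copy: writing to it leaves the original alone
big = arr[arr > 6]
big[0] = -1
print(arr)
```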


## 5. Summary

<img src="img/np_summary.png" alt="Its You" width="1700px">


[image source](https://www.nature.com/articles/s41586-020-2649-2)

# Questions?

## 1) Deep copy VS Shallow copy for numpy array

In [29]:
import copy
print('------------------ a ---------------------')
# a = np.array([1, 2, "hello", 5])
a = np.array([1, 'm', 3], dtype=object)
print(f"a's id is {id(a)}")

print('------------------ b ---------------------')
b = np.copy(a)
print(f"b's id is {id(b)}")

b[2] = 10
print(f'a is \n {a}')
print(f'b is \n {b}')

# print('------------------ c ---------------------')
# c = copy.copy(a)
# print(f"c's id is {id(c)}")

# c[2] = 11
# print(f'a is \n {a}')
# print(f'c is \n {c}')

# print('------------------ d ---------------------')

# d = copy.deepcopy(a)
# print(f"d's id is {id(d)}")

# d[2] = 12
# print(f'a is \n {a}')
# print(f'd is \n {d}')

------------------ a ---------------------
a's id is 140654537262784
------------------ b ---------------------
b's id is 140654537265024
a is 
 [1 'm' 3]
b is 
 [1 'm' 10]
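
The cell above only reassigns a slot, which cannot show the difference between a shallow and a deep copy. The difference appears when an object array holds a mutable element. The sketch below is one way to see it; the inner list `[3, 4]` and the names `same`, `shallow` and `deep` are made up for this example.

```python
import copy
import numpy as np

a = np.array([1, 'm', [3, 4]], dtype=object)

same = a                  # plain assignment: another name for the same array
shallow = np.copy(a)      # new array, but its slots refer to the same objects
deep = copy.deepcopy(a)   # new array and new copies of the objects inside

shallow[0] = 99           # reassigning a slot only affects the copy
shallow[2].append(5)      # mutating the shared list is visible through a
deep[2].append(6)         # the deep copy owns its own list

print(same is a)
print(f'a is \n {a}')
print(f'shallow is \n {shallow}')
print(f'deep is \n {deep}')
```

For arrays with a numeric dtype there are no nested objects, so `np.copy` already gives a fully independent array.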
