<h1 align="center">Data Manipulation Using NumPy</h1>


## 1. Introduction
This notebook focuses on NumPy, a fundamental library that is widely used in data science for storing and manipulating data. NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. This notebook is divided into three parts: 1D arrays, 2D arrays, and multi-dimensional arrays.

## 2. 1D Array
A 1D array, or vector, is a sequence of elements along a single axis.

In [1]:
import time
import numpy as np
# start time
start = time.time()
# one way to create a vector is from a list
v = np.array([1, 2, 2], np.float64)
print(f'vector v has: {v.ndim} dimension with shape {v.shape} and type {v.dtype}')
# it is also possible to create a vector with the same shape and dtype as v:
# np.zeros_like(); np.ones_like(); np.empty_like()
v_zero = np.zeros_like(v)
print(f'the zero vector has: {v_zero.ndim} dimension with shape {v_zero.shape} and type {v_zero.dtype}')
# create a vector filled with a given number
v_seven = np.full_like(v, 7)
print(f'the sevens vector has: {v_seven.ndim} dimension with shape {v_seven.shape} and type {v_seven.dtype}')

vector v has: 1 dimension with shape (3,) and type float64
the zero vector has: 1 dimension with shape (3,) and type float64
the sevens vector has: 1 dimension with shape (3,) and type float64


In [2]:
# create a vector from a range of numbers
v_stop = np.arange(10) # from 0 up to, but not including, 10
print(f'v_stop: {v_stop}')
v_start_stop = np.arange(2, 10) # from 2 up to, but not including, 10
print(f'v_start_stop: {v_start_stop}')
v_start_stop_step = np.arange(2, 10, 2) # from 2 up to, but not including, 10, with step 2
print(f'v_start_stop_step: {v_start_stop_step}')
# create a vector of evenly spaced numbers over a range
v_linspace = np.linspace(1, 9, 10) # 10 numbers from 1 to 9, both ends included
print(f'v_linspace: {v_linspace}')
print(f'note that linspace will include the stop number')

v_stop: [0 1 2 3 4 5 6 7 8 9]
v_start_stop: [2 3 4 5 6 7 8 9]
v_start_stop_step: [2 4 6 8]
v_linspace: [1.         1.88888889 2.77777778 3.66666667 4.55555556 5.44444444
 6.33333333 7.22222222 8.11111111 9.        ]
note that linspace will include the stop number
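
As a small complement to the cell above, the sketch below shows two optional `np.linspace` arguments: `endpoint=False` drops the stop value, and `retstep=True` also returns the spacing. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# endpoint=False excludes the stop value, giving a half-open range like np.arange
v_open = np.linspace(0, 1, 5, endpoint=False)
print(f'v_open: {v_open}')

# retstep=True returns the samples together with the spacing between them
v_closed, step = np.linspace(0, 1, 5, retstep=True)
print(f'v_closed: {v_closed} with step {step}')
```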


In [3]:
# random numbers
# the newer interface for random numbers is np.random.default_rng()
rng = np.random.default_rng() # create a random number generator
v_random = rng.random(3) # uniform random floats in [0, 1)
print(f'v_random: {v_random}')
# random integers
v_random_int = rng.integers(1, 10, 3, endpoint=True) # uniform random integers from 1 to 10, end point included
print(f'v_random_int: {v_random_int}')
v_random_normal = rng.normal(0, 1, 3) # normal random numbers with mean 0 and std 1
print(f'v_random_normal: {v_random_normal}')


v_random: [0.40303597 0.11833715 0.46480547]
v_random_int: [5 2 6]
v_random_normal: [ 0.11985137 -0.75694375  0.97754483]
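
The generator above is unseeded, so its output differs on every run. A minimal sketch of reproducible sampling follows, assuming an arbitrary seed of 42 chosen only for illustration; the names `rng_a`, `rng_b`, and `ints` are hypothetical.

```python
import numpy as np

# two generators built from the same seed produce the same stream
rng_a = np.random.default_rng(42)  # 42 is an arbitrary illustrative seed
rng_b = np.random.default_rng(42)
sample_a = rng_a.random(3)
sample_b = rng_b.random(3)
print(f'same samples: {np.array_equal(sample_a, sample_b)}')  # True

# with endpoint=True the upper bound (10) can be drawn as well
ints = rng_a.integers(1, 10, 1000, endpoint=True)
print(f'min {ints.min()}, max {ints.max()}')
```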


In [4]:
# indexing and slicing are similar to lists
v_index = np.arange(10)
print(f'this is the vector: {v_index}')
print(f"this is the first element: {v_index[0]}")
print(f'this is a slice from the second element up to the fourth element: {v_index[1:4]}')
print(f'this is a slice from the second-to-last element to the end: {v_index[-2:]}')
print(f'this selects every second element: {v_index[::2]}')
print(f'this is fancy indexing, selecting the elements at index 3, 5, 7: {v_index[[3, 5, 7]]}')


this is the vector: [0 1 2 3 4 5 6 7 8 9]
this is the first element: 0
this is a slice from the second element up to the fourth element: [1 2 3]
this is a slice from the second-to-last element to the end: [8 9]
this selects every second element: [0 2 4 6 8]
this is fancy indexing, selecting the elements at index 3, 5, 7: [3 5 7]


Boolean indexing also works on 1D arrays. It is a powerful tool for filtering data. Let's see an example.

In [5]:
v_bool = np.array([1, 2, 3, 4, 5, 6, 7, 5, 4, 3, 2, 1])
print(f'is any element greater than 6: {np.any(v_bool > 6)}')
print(f'are all elements greater than 5: {np.all(v_bool > 5)}')
print(f'compare each element with 5: {v_bool == 5}')
print(f'these are the elements greater than 5: {v_bool[v_bool > 5]}')

is any element greater than 6: True
are all elements greater than 5: False
compare each element with 5: [False False False False  True False False  True False False False False]
these are the elements greater than 5: [6 7]
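
Boolean masks can also be combined. The sketch below reuses the same values as `v_bool` and shows element-wise `&`, `np.flatnonzero()` to recover positions, and `np.where()` to replace values conditionally; the names `between`, `positions`, and `capped` are illustrative.

```python
import numpy as np

v_bool = np.array([1, 2, 3, 4, 5, 6, 7, 5, 4, 3, 2, 1])

# combine conditions with & (and) or | (or); the parentheses are required
between = v_bool[(v_bool > 2) & (v_bool < 6)]
print(f'elements between 2 and 6 (exclusive): {between}')  # [3 4 5 5 4 3]

# positions where the condition holds
positions = np.flatnonzero(v_bool == 5)
print(f'indices of elements equal to 5: {positions}')  # [4 7]

# replace every element greater than 5 with 5, keep the others unchanged
capped = np.where(v_bool > 5, 5, v_bool)
print(f'capped at 5: {capped}')
```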


## 3. 2D Array
A 2D array is a matrix. Its creation syntax is similar to that of a 1D array, with the shape passed as a tuple. Let's see some examples:

In [6]:
m_ones = np.ones((3, 3))
print(f'matrix of ones:\n {m_ones}')
m_list = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f'matrix from list:\n {m_list}')
m_full = np.full((3, 4), 5) # create a matrix with shape (3, 4) filled with 5
print(f'this is a matrix fill with 5:\n {m_full} \n the matrix has shape {m_full.shape}')

matrix of ones:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
matrix from list:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
this is a matrix fill with 5:
 [[5 5 5 5]
 [5 5 5 5]
 [5 5 5 5]] 
 the matrix has shape (3, 4)


### 3.1 Indexing

In [7]:
# indexing and slicing are similar to lists, with zero-based indices
m_index = np.arange(16).reshape(4, 4)
print(f'this is the index matrix:\n {m_index}')

this is the index matrix:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [8]:
print(f'basic slicing returns a view, not a copy')
print(f'this selects the second row and all columns: {m_index[1]}')
print(f'this selects the second column and all rows: {m_index[:, 1]}')
print(f'this selects the element at the second row and second column: {m_index[1, 1]}')
print(f'this selects the last two rows and the last two columns: {m_index[-2:, -2:]}')
print(f'this selects every second row starting from 0 and every second column starting from 1:\n {m_index[::2, 1::2]}')

basic slicing returns a view, not a copy
this selects the second row and all columns: [4 5 6 7]
this selects the second column and all rows: [ 1  5  9 13]
this selects the element at the second row and second column: 5
this selects the last two rows and the last two columns: [[10 11]
 [14 15]]
this selects every second row starting from 0 and every second column starting from 1:
 [[ 1  3]
 [ 9 11]]
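
The view-versus-copy distinction matters when writing to a selection. A minimal sketch, using a fresh array `m` rather than `m_index` so the notebook state is untouched: basic indexing shares memory with the original, while fancy indexing returns an independent copy.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

# basic indexing returns a view: writing to it changes m as well
row_view = m[1]
row_view[0] = -1
print(f'm[1, 0] after writing to the view: {m[1, 0]}')  # -1

# fancy indexing returns a copy: writing to it leaves m untouched
fancy_copy = m[[0, 2]]
fancy_copy[0, 0] = 99
print(f'm[0, 0] after writing to the copy: {m[0, 0]}')  # 0

# np.shares_memory confirms which result aliases the original
print(np.shares_memory(m, row_view), np.shares_memory(m, fancy_copy))  # True False
```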


In my experience, when working with high-dimensional arrays (in NumPy) or tensors (in PyTorch or TensorFlow), one of the most important tasks is manipulating the shape of the array or tensor. Mastering this skill is critical for data preprocessing and model building. This notebook focuses on arrays, but the same concepts apply to tensors.

### 3.2. Reshape
As the word suggests, reshape changes the shape of an array. This includes moving between lower and higher dimensions (for example, from 1D to 2D) and switching rows and columns (transpose). In this notebook, I focus on `np.reshape()` and `np.transpose()`, but there are other useful functions and methods such as `np.ravel()`, `ndarray.flatten()`, `np.squeeze()`, and `np.swapaxes()`.
Note that a 1D NumPy array has a single axis, so it is neither a row nor a column vector, although it is printed like a row. Let's see some examples:

In [9]:
m_original = np.arange(20)
print(f'this is the original matrix:\n {m_original}')

this is the original matrix:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [10]:
m_2D = m_original.reshape(4, 5)
print(f'this is 2D matrix from m_original:\n {m_2D}')

this is 2D matrix from m_original:
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


In [11]:
m_1D_3D = m_original.reshape(2, 2, 5)
print(f'this is 3D matrix from m_original:\n {m_1D_3D}')

this is 3D matrix from m_original:
 [[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]]


In [12]:
m_2D_3D_front = m_2D.reshape(1, 4, 5) # adding a new dimension at the front
print(f'this is 3D matrix from m_2D:\n {m_2D_3D_front}')

this is 3D matrix from m_2D:
 [[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]]


In [13]:
m_2D_3D_back = m_2D.reshape(4, 5, 1) # adding a new dimension at the back
print(f'this is 3D matrix from m_2D:\n {m_2D_3D_back}')

this is 3D matrix from m_2D:
 [[[ 0]
  [ 1]
  [ 2]
  [ 3]
  [ 4]]

 [[ 5]
  [ 6]
  [ 7]
  [ 8]
  [ 9]]

 [[10]
  [11]
  [12]
  [13]
  [14]]

 [[15]
  [16]
  [17]
  [18]
  [19]]]


In [14]:
m_2D_3D_middle = m_2D.reshape(4, 1, 5) # adding a new dimension at the middle
print(f'this is 3D matrix from m_2D:\n {m_2D_3D_middle}')

this is 3D matrix from m_2D:
 [[[ 0  1  2  3  4]]

 [[ 5  6  7  8  9]]

 [[10 11 12 13 14]]

 [[15 16 17 18 19]]]


In [15]:
# flatten a matrix to 1D; collapse all dimensions
m_2D_3D_middle_flatten = m_2D_3D_middle.flatten()
print(f'this is 1D matrix from m_2D_3D_middle:\n {m_2D_3D_middle_flatten}')

this is 1D matrix from m_2D_3D_middle:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In the previous examples, reshape changes the shape of an array by specifying a new shape while the elements and their order stay the same. The total number of elements is unchanged, but the number of elements along each axis changes.
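
A few reshape conveniences are worth a short sketch, under the assumption that the array is contiguous (as arrays from `np.arange` are): `-1` lets NumPy infer one dimension, `flatten()` always copies, `ravel()` returns a view when it can, and `squeeze()` drops axes of length 1. The variable names are illustrative.

```python
import numpy as np

a = np.arange(20)

# -1 asks NumPy to infer that dimension from the total number of elements
m = a.reshape(4, -1)
print(m.shape)  # (4, 5)
column = a.reshape(-1, 1)
print(column.shape)  # (20, 1)

# flatten() always returns a copy; ravel() returns a view when possible
flat_copy = m.flatten()
flat_view = m.ravel()
print(np.shares_memory(m, flat_copy), np.shares_memory(m, flat_view))  # False True

# squeeze() removes axes of length 1
squeezed = m.reshape(4, 1, 5).squeeze()
print(squeezed.shape)  # (4, 5)
```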

If we want to keep the number of elements along each dimension and only reorder the dimensions, `np.transpose()` serves this purpose. Let's see an example:

In [16]:
m_for_transpose = np.arange(6).reshape(2, 3)
print(f'this is the original matrix:\n {m_for_transpose}')
print(f'this is switch between row and column:\n {np.transpose(m_for_transpose)}')

this is the original matrix:
 [[0 1 2]
 [3 4 5]]
this is switch between row and column:
 [[0 3]
 [1 4]
 [2 5]]


In [17]:
# transpose for higher dimensions
# we provide the new order of the axes as a tuple
m_3D = np.arange(24).reshape(2, 3, 4)
print(f'this is the original 3D matrix:\n {m_3D}')
print(f'this is switch between first and second dimension:\n {np.transpose(m_3D, (1, 0, 2))}')
print(f'this switch between second and third dimension:\n {np.transpose(m_3D, (0, 2, 1))}')
print(f'this switch between first and third dimension:\n {np.transpose(m_3D, (2, 1, 0))}')

this is the original 3D matrix:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
this is switch between first and second dimension:
 [[[ 0  1  2  3]
  [12 13 14 15]]

 [[ 4  5  6  7]
  [16 17 18 19]]

 [[ 8  9 10 11]
  [20 21 22 23]]]
this switch between second and third dimension:
 [[[ 0  4  8]
  [ 1  5  9]
  [ 2  6 10]
  [ 3  7 11]]

 [[12 16 20]
  [13 17 21]
  [14 18 22]
  [15 19 23]]]
this switch between first and third dimension:
 [[[ 0 12]
  [ 4 16]
  [ 8 20]]

 [[ 1 13]
  [ 5 17]
  [ 9 21]]

 [[ 2 14]
  [ 6 18]
  [10 22]]

 [[ 3 15]
  [ 7 19]
  [11 23]]]
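
To make the axis permutation concrete, the sketch below checks the resulting shapes and shows two equivalent spellings: with no axes given, `np.transpose()` (and the `.T` attribute) reverses all axes, and `np.swapaxes()` exchanges exactly two. The name `m_3d` is illustrative.

```python
import numpy as np

m_3d = np.arange(24).reshape(2, 3, 4)

# the result's shape is the original shape reordered by the axes tuple
swapped_01 = np.transpose(m_3d, (1, 0, 2))
print(swapped_01.shape)  # (3, 2, 4)

# element [i, j, k] of the result is element [j, i, k] of the original
print(swapped_01[2, 1, 3] == m_3d[1, 2, 3])  # True

# without an axes argument all axes are reversed; .T does the same
print(np.transpose(m_3d).shape, m_3d.T.shape)  # (4, 3, 2) (4, 3, 2)

# swapaxes exchanges two axes and matches the explicit permutation
same = np.array_equal(np.swapaxes(m_3d, 0, 2), np.transpose(m_3d, (2, 1, 0)))
print(same)  # True
```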


### 3.3. Join Arrays
NumPy provides a wide range of functions for joining arrays, such as `np.vstack()`, `np.hstack()`, and `np.concatenate()`. This notebook focuses on `np.concatenate()`. Let's see an example:

In [23]:
join_1 = np.arange(4).reshape(2, 2)
join_2 = np.arange(4, 8).reshape(2, 2)
np.concatenate((join_1, join_2), axis=0) # join along axis 0: rows are stacked vertically

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [24]:
np.concatenate((join_1, join_2), axis=1) # join along axis 1: columns are placed side by side


array([[0, 1, 4, 5],
       [2, 3, 6, 7]])
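
For 2D inputs, `np.vstack()` and `np.hstack()` give the same results as `np.concatenate()` along axis 0 and axis 1, while `np.stack()` joins along a new axis instead. A minimal sketch with illustrative names `a` and `b`:

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

# vstack / hstack match concatenate along axis 0 / axis 1 for 2D arrays
same_rows = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_cols = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
print(same_rows, same_cols)  # True True

# stack creates a new axis rather than extending an existing one
stacked = np.stack((a, b), axis=0)
print(stacked.shape)  # (2, 2, 2)
```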

### 3.4 Split Arrays
NumPy provides a wide range of functions for splitting arrays, such as `np.vsplit()`, `np.hsplit()`, and `np.split()`. This notebook focuses on `np.split()`. Let's see an example:

In [28]:
m_split = np.arange(16).reshape(4, 4)
print(f'this is the original matrix:\n {m_split}')

this is the original matrix:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [None]:
# split into two equal matrices along the rows (axis 0)
# it returns a list of matrices
np.split(m_split, 2, axis=0)


[array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]])]

In [30]:
# split into two equal matrices along the columns (axis 1)
# it returns a list of matrices
np.split(m_split, 2, axis=1)

[array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]])]

In [32]:
# split at specific indices
# split at index 1 and 3 along the rows
# the same works for columns with axis=1
np.split(m_split, [1, 3], axis=0) # returns 3 matrices: rows [0:1], [1:3], [3:]

[array([[0, 1, 2, 3]]),
 array([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]),
 array([[12, 13, 14, 15]])]
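
`np.split()` with an integer requires an equal division and raises `ValueError` otherwise. The sketch below shows that failure and `np.array_split()`, which tolerates uneven pieces, along with the `np.vsplit()`/`np.hsplit()` shorthands. The variable names are illustrative.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

# vsplit / hsplit are shorthands for split along axis 0 / axis 1
top, bottom = np.vsplit(m, 2)
left, right = np.hsplit(m, 2)
print(top.shape, left.shape)  # (2, 4) (4, 2)

# 4 rows cannot be divided into 3 equal parts, so np.split raises
try:
    np.split(m, 3, axis=0)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True

# array_split allows uneven parts; earlier parts get the extra rows
parts = np.array_split(m, 3, axis=0)
print([p.shape for p in parts])  # [(2, 4), (1, 4), (1, 4)]
```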

### 3.5 Replicate Arrays
There are two functions for replicating arrays in NumPy: `np.repeat()` and `np.tile()`.
`np.tile()` replicates the whole array, while `np.repeat()` repeats each element in place, immediately after itself. Let's see an example:

In [34]:
m_replicate = np.arange(4).reshape(2, 2)
print(f'this is the original matrix:\n {m_replicate}')

this is the original matrix:
 [[0 1]
 [2 3]]


In [36]:
# if no axis is provided, the matrix is flattened before repeating
np.repeat(m_replicate, 2)

array([0, 0, 1, 1, 2, 2, 3, 3])

In [37]:
# repeat each row 2 times (along axis 0)
np.repeat(m_replicate, 2, axis=0)

array([[0, 1],
       [0, 1],
       [2, 3],
       [2, 3]])

In [38]:
# repeat each column 2 times (along axis 1)
np.repeat(m_replicate, 2, axis=1)

array([[0, 0, 1, 1],
       [2, 2, 3, 3]])

In [None]:
# np.tile replicates the whole input array along the specified dimensions
np.tile(m_replicate, (3, 2)) # repeat the array 3 times along the first dimension and 2 times along the second

array([[0, 1, 0, 1],
       [2, 3, 2, 3],
       [0, 1, 0, 1],
       [2, 3, 2, 3],
       [0, 1, 0, 1],
       [2, 3, 2, 3]])

In [41]:
# repeat the array 2 times along the first dim, 2 times along the second, 3 times along the third
# the 2D input is first treated as shape (1, 2, 2)
np.tile(m_replicate, (2, 2, 3))

array([[[0, 1, 0, 1, 0, 1],
        [2, 3, 2, 3, 2, 3],
        [0, 1, 0, 1, 0, 1],
        [2, 3, 2, 3, 2, 3]],

       [[0, 1, 0, 1, 0, 1],
        [2, 3, 2, 3, 2, 3],
        [0, 1, 0, 1, 0, 1],
        [2, 3, 2, 3, 2, 3]]])
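
Two further variations, sketched here with an illustrative array `m`: `np.repeat()` accepts one count per row or column, and `np.tile()` with a single integer repeats along the last axis only.

```python
import numpy as np

m = np.arange(4).reshape(2, 2)

# one repeat count per row: keep the first row once, repeat the second 3 times
uneven = np.repeat(m, [1, 3], axis=0)
print(uneven)

# a scalar reps is applied to the last axis, so the result has shape (2, 4)
tiled = np.tile(m, 2)
print(tiled)
```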

### 3.6 Delete and Add Elements
Elements can be removed from or added to an array using `np.delete()` and `np.insert()`. Both return a new array and leave the original unchanged. Let's see an example:

In [53]:
m_delete = np.arange(1, 16).reshape(3, 5)
print(f'this is the original matrix:\n {m_delete}')

this is the original matrix:
 [[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]


In [55]:
# delete the second column
# change axis to 0 to delete the second row
np.delete(m_delete, 1, axis=1)

array([[ 1,  3,  4,  5],
       [ 6,  8,  9, 10],
       [11, 13, 14, 15]])

In [57]:
# delete the second and the last column
np.delete(m_delete, [1, -1], axis=1)

array([[ 1,  3,  4],
       [ 6,  8,  9],
       [11, 13, 14]])

In [58]:
# delete from the second column through the second-to-last column
np.delete(m_delete, slice(1, -1), axis=1)

array([[ 1,  5],
       [ 6, 10],
       [11, 15]])

In [73]:
m_insert_1 = np.arange(6).reshape(2, 3)
print(f'this is matrix 1:\n {m_insert_1}')
m_insert_2 = np.arange(6, 12).reshape(2, 3)
print(f'this is the matrix 2:\n {m_insert_2}')

this is matrix 1:
 [[0 1 2]
 [3 4 5]]
this is the matrix 2:
 [[ 6  7  8]
 [ 9 10 11]]


In [71]:
# insert a column of zeros before the first column and after the last column of m_insert_1
# change axis to 0 to insert rows instead
np.insert(m_insert_1, [0, 3], 0, axis=1)

array([[0, 0, 1, 2, 0],
       [0, 3, 4, 5, 0]])

In [75]:
# insert m_insert_2 into m_insert_1 before the second column (index 1)
# change axis to 0 to insert rows instead
np.insert(m_insert_1, [1], m_insert_2, axis=1)

array([[ 0,  6,  7,  8,  1,  2],
       [ 3,  9, 10, 11,  4,  5]])
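
Because these functions return copies, the original array keeps its shape. The sketch below verifies that, shows `np.append()` for adding a row at the end, and uses a boolean mask as an alternative to `np.delete()`; the names `m`, `trimmed`, `appended`, and `keep` are illustrative.

```python
import numpy as np

m = np.arange(1, 16).reshape(3, 5)

# np.delete returns a new array; m itself is unchanged
trimmed = np.delete(m, 1, axis=1)
print(m.shape, trimmed.shape)  # (3, 5) (3, 4)

# np.append adds values at the end; with axis=0 the new row must be 2D
appended = np.append(m, [[16, 17, 18, 19, 20]], axis=0)
print(appended.shape)  # (4, 5)

# a boolean mask selects the columns to keep, matching np.delete(m, [1, -1], axis=1)
keep = np.ones(m.shape[1], dtype=bool)
keep[[1, -1]] = False
print(np.array_equal(m[:, keep], np.delete(m, [1, -1], axis=1)))  # True
```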

## 4. Multi-dimensional Array
For arrays with more than two dimensions, the same rules apply.
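
As a brief illustration of that statement, the sketch below applies indexing, concatenation, and splitting to a 3D array. The array `t` and the other names are illustrative and not part of the original notebook.

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

# one index per axis selects a single element
print(t[1, 2, 3])  # 23

# a slice on every axis except one drops that axis
print(t[:, 0, :].shape)  # (2, 4)

# ... (Ellipsis) stands for all remaining axes
print(t[..., -1].shape)  # (2, 3)

# concatenate and split accept any valid axis
wide = np.concatenate((t, t), axis=2)
print(wide.shape)  # (2, 3, 8)
halves = np.split(t, 2, axis=2)
print([h.shape for h in halves])  # [(2, 3, 2), (2, 3, 2)]
```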

In [18]:
end = time.time()
print(f'Total running time: {end - start:.3f} seconds')

Total running time: 0.146 seconds
