# Numpy handbook for a Data Scientist

#### Written by https://linkedin.com/in/bongsang/
---

> I think the most efficient way to learn a new API is not through many words but through simple sample code.
> I hope the sample code below is helpful to you.

## Contents
1. Numpy array information
   - ndim, shape, size, dtype, itemsize
   

2. Numpy array creation
   - from list
   - using utils: zeros(), ones()
   - from sequence range, spacing
   
   
3. Numpy array reshaping
   - np.reshape((shape, ..,..))


4. Numpy array operations
   - Product: element, matrix
   - Sum: row sum, col sum
   - Boolean checking: NAN check, equality


5. Numpy array slicing and indexing
   - Slicing array in one or multi dimension
   - Negative slicing
   - Ellipsis


In [1]:
import numpy as np

## 1. Numpy array information

In [2]:
def print_info(x):
    print(x)
    print(f"Type = {type(x)}")
    print(f"[ndim] The number of dimensions(axes) of the array = {x.ndim}")
    print(f"[shape] The size of the array in each dimension(axe) = {x.shape}")
    print(f"[size] The total number of elements of the array = {x.size}")
    print(f"[dtype] The type of the elements in the array = {x.dtype}")
    print(f"[itemsize] the size in bytes of each element of the array = {x.itemsize}")

## 2. Numpy array creation
### 2.1 A Numpy array can be created from a list or tuple passed as a single argument, not from separate numeric arguments.
- np.array(1, 2, 3, 4)  # Error!

In [3]:
a = np.array([1, 2, 3, 4]) # Correct! from list
print_info(a)


[1 2 3 4]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 1
[shape] The size of the array in each dimension(axe) = (4,)
[size] The total number of elements of the array = 4
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [4]:
b = np.array((1, 2, 3, 4)) # Correct! from tuple
print_info(b)


[1 2 3 4]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 1
[shape] The size of the array in each dimension(axe) = (4,)
[size] The total number of elements of the array = 4
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [5]:
c = np.array([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])
print_info(c)


[[1 2 3]
 [4 5 6]
 [7 8 9]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 2
[shape] The size of the array in each dimension(axe) = (3, 3)
[size] The total number of elements of the array = 9
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8
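
The rule in 2.1 can be checked directly. The sketch below is only an illustration: the variable names (`from_list`, `raised`, `as_float`) are made up for this example, and the exact exception type raised by `np.array(1, 2, 3, 4)` depends on the Numpy version, so both likely types are caught.

```python
import numpy as np

# A list or tuple is passed as ONE argument, so both of these work.
from_list = np.array([1, 2, 3, 4])
from_tuple = np.array((1, 2, 3, 4))

# Passing the numbers as separate arguments fails, because the extra
# positional arguments are not treated as data.
try:
    np.array(1, 2, 3, 4)
    raised = False
except (TypeError, ValueError) as err:
    # The exception type and message vary between Numpy versions.
    raised = True
    print(f"np.array(1, 2, 3, 4) failed with {type(err).__name__}")

# The element type can be set explicitly instead of being inferred.
as_float = np.array([1, 2, 3, 4], dtype=np.float64)
print(as_float.dtype)  # float64
```

The inferred integer dtype (int64 in the outputs above) depends on the platform, which is one reason to pass `dtype` explicitly when the size matters.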


### 2.2 Utility functions create an array filled with zeros or ones from a shape argument.

In [6]:
np.zeros((3, 3))


array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [7]:
np.ones((3, 3))


array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
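
As a small addition to the two cells above, this sketch shows that the shape is a single tuple argument and that the element type can be chosen with `dtype`. `np.full` is not used elsewhere in this handbook; it is included here only as a related helper.

```python
import numpy as np

# The shape is one tuple argument; the default dtype is float64.
z = np.zeros((2, 3))
print(z.dtype)  # float64

# dtype can be chosen explicitly.
o = np.ones((2, 3), dtype=np.int32)
print(o.dtype)  # int32

# np.full fills the array with an arbitrary constant.
f = np.full((2, 3), 7)
print(f)
# [[7 7 7]
#  [7 7 7]]
```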

### 2.3 A sequence array can be created from range arguments with a step.
 - By default, the sequence starts at zero and increases by 1.
 - The end of the range is not included.


In [8]:
a = np.arange(10)
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [9]:
b = np.arange(0, 10, 1)
print(b)

[0 1 2 3 4 5 6 7 8 9]


In [10]:
c = np.arange(2, 10, 2)
print(c)

[2 4 6 8]


### 2.4 A sequence array can be created by spacing.
 - np.linspace(start, stop, num) returns num evenly spaced values.
 - The end of the range is included.

In [11]:
b = np.linspace(0, 10, 5)
print_info(b)

[ 0.   2.5  5.   7.5 10. ]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 1
[shape] The size of the array in each dimension(axe) = (5,)
[size] The total number of elements of the array = 5
[dtype] The type of the elements in the array = float64
[itemsize] the size in bytes of each element of the array = 8
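
To compare 2.3 and 2.4 side by side, the following sketch builds the same kind of sequence with both functions. The names `r`, `s`, and `t` are only for this example.

```python
import numpy as np

# arange(start, stop, step): stop is excluded.
r = np.arange(0, 10, 2)
print(r)  # [0 2 4 6 8]

# linspace(start, stop, num): stop is included by default.
s = np.linspace(0, 10, 5)
print(s)  # [ 0.   2.5  5.   7.5 10. ]

# endpoint=False excludes stop, giving the same values as arange here.
t = np.linspace(0, 10, 5, endpoint=False)
print(t)  # [0. 2. 4. 6. 8.]
```

arange is driven by the step size, while linspace is driven by the number of points, which is usually the safer choice for non-integer spacing.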


## 3. Numpy array reshaping
### Arrays are displayed with the following layout:
 - the last axis is printed from left to right,
 - the second-to-last is printed from top to bottom,
 - the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

In [12]:
a = np.arange(10)
print(f"a = {a}")

reshape_a = a.reshape((5, 2))
print(f"After reshaping, the previous a = {a}")
print(f"The reshaped a = \n{reshape_a}")

a = [0 1 2 3 4 5 6 7 8 9]
After reshaping, the previous a = [0 1 2 3 4 5 6 7 8 9]
The reshaped a = 
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [13]:
b = np.arange(12)
b.reshape((3, 2, 2))

array([[[ 0,  1],
        [ 2,  3]],

       [[ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11]]])

In [14]:
c = np.arange(10000)
c.reshape((100, 100))


array([[   0,    1,    2, ...,   97,   98,   99],
       [ 100,  101,  102, ...,  197,  198,  199],
       [ 200,  201,  202, ...,  297,  298,  299],
       ...,
       [9700, 9701, 9702, ..., 9797, 9798, 9799],
       [9800, 9801, 9802, ..., 9897, 9898, 9899],
       [9900, 9901, 9902, ..., 9997, 9998, 9999]])
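
A few more properties of reshape, sketched under the assumption of a freshly created (contiguous) array: one dimension can be given as -1 and is then inferred, the result shares memory with the original, and the total number of elements must match. The names `m` and `mismatch_raised` are only for this example.

```python
import numpy as np

a = np.arange(12)

# -1 lets Numpy infer that dimension from the total size: 12 / 4 = 3.
m = a.reshape((-1, 4))
print(m.shape)  # (3, 4)

# reshape does not change the shape of a itself.
print(a.shape)  # (12,)

# For a contiguous array like this one, the result is a view:
# writing through m is visible in a.
m[0, 0] = 99
print(a[0])  # 99

# The new shape must hold exactly the same number of elements.
try:
    a.reshape((5, 5))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```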

## 4. Numpy array Operations
 - Element product: A * B
 - Matrix product: A @ B , A.dot(B)
 - Sum: sum(-1) means row sum, sum(0) means col sum
 - NAN checking: np.isnan()
 - Boolean: A[A>2] returns a 1-D array of the selected elements; A[A>2] += 100 updates A in place, so A keeps its shape

**Important!**
 - Indexing with a boolean array returns only the elements where it is True, so a mask with the full shape of the array returns a 1-D array.
 - But combining a boolean mask with an assignment such as += changes the matching elements in place, so the source array keeps its original shape.


 

In [15]:
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

print("elementwise product")
print(A*B)


elementwise product
[[ 5 12]
 [21 32]]


In [16]:
print("matrix product")
print(A@B)


matrix product
[[19 22]
 [43 50]]


In [17]:
print("matrix product")
print(A.dot(B))


matrix product
[[19 22]
 [43 50]]
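
The three product cells above can be summarized in one check. This is a minimal sketch using the same A and B; `np.matmul` is an addition here and, for 2-D arrays, is expected to agree with `@` and `.dot`.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B   # multiplies matching entries
matrix = A @ B        # rows of A times columns of B

# For 2-D arrays, @, np.matmul and .dot give the same result.
same = (np.array_equal(matrix, np.matmul(A, B))
        and np.array_equal(matrix, A.dot(B)))
print(same)  # True

# The matrix product is not commutative in general.
print(np.array_equal(A @ B, B @ A))  # False
```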


In [18]:
A

array([[1, 2],
       [3, 4]])

In [19]:
# Row sum
A.sum(-1)

array([3, 7])

In [20]:
# Col sum
A.sum(0)

array([4, 6])
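
The same sums can be written with the axis keyword, which may be easier to read. The sketch below also shows the sum over all elements and the keepdims option, neither of which appears in the cells above.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

row_sums = A.sum(axis=1)   # same as A.sum(-1) for a 2-D array
col_sums = A.sum(axis=0)
total = A.sum()            # no axis: sum of all elements

print(row_sums)  # [3 7]
print(col_sums)  # [4 6]
print(total)     # 10

# keepdims=True keeps the summed axis with length 1.
row_sums_2d = A.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)  # (2, 1)
```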

**Combination of Sum, Boolean, and Slicing**

In [21]:
A[A.sum(-1)<4, :]

array([[1, 2]])

**NAN checking**

In [22]:
X = np.array([[1, 2],
                [np.nan, 3],
                [4, 5],
                [np.nan, 6]])
X

array([[ 1.,  2.],
       [nan,  3.],
       [ 4.,  5.],
       [nan,  6.]])

In [23]:
np.isnan(X)

array([[False, False],
       [ True, False],
       [False, False],
       [ True, False]])

In [24]:
X[~np.isnan(X)]

array([1., 2., 3., 4., 5., 6.])
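
Filtering with the full mask flattens the result to 1-D. When whole rows should be dropped instead, one possible approach is to reduce the mask per row first. The names `row_has_nan` and `clean` are only for this sketch.

```python
import numpy as np

X = np.array([[1, 2],
              [np.nan, 3],
              [4, 5],
              [np.nan, 6]])

# True for each row that contains at least one NaN.
row_has_nan = np.isnan(X).any(axis=1)

# Keep only the complete rows; the result stays 2-D.
clean = X[~row_has_nan]
print(clean)
# [[1. 2.]
#  [4. 5.]]

# NaN is not equal to itself, so == cannot be used to find it.
print(np.nan == np.nan)  # False

# np.nansum ignores NaN values.
print(np.nansum(X))  # 21.0
```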

--- 

In [25]:
B>=3

array([[ True,  True],
       [ True,  True]])

In [26]:
boolean = (B>=3)
boolean

array([[ True,  True],
       [ True,  True]])

In [27]:
B[boolean] # return 1-D array

array([5, 6, 7, 8])

In [28]:
C = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]],
              [[9, 10], [11, 12]]])
C

array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])

In [29]:
boolean = (C>3)
boolean

array([[[False, False],
        [False,  True]],

       [[ True,  True],
        [ True,  True]],

       [[ True,  True],
        [ True,  True]]])

In [30]:
C[boolean]

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12])

In [31]:
print(f"shape = {C.shape}")

shape = (3, 2, 2)


In [32]:
C[[True, False, True]]

array([[[ 1,  2],
        [ 3,  4]],

       [[ 9, 10],
        [11, 12]]])

**Boolean checking combined with an assignment does not return a 1-D array; the array keeps the same dimensions.**

In [33]:
C[C>3] += 100
C

array([[[  1,   2],
        [  3, 104]],

       [[105, 106],
        [107, 108]],

       [[109, 110],
        [111, 112]]])
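
The difference between reading and writing through a boolean mask can be sketched as follows. The array is rebuilt with arange so the block runs on its own, and `np.where` is added only as one way to get a modified copy of the original shape.

```python
import numpy as np

C = np.arange(1, 13).reshape((3, 2, 2))
mask = C > 3

# Reading through the mask gives a 1-D array of the matching elements.
selected = C[mask]
print(selected.shape)  # (9,)

# selected is a copy: changing it leaves C untouched.
selected[0] = -1
print(C[0, 1, 1])  # 4

# Assigning through the mask writes into C itself.
C[mask] += 100
print(C.shape)     # (3, 2, 2)
print(C[0, 1, 1])  # 104

# np.where builds a new array of the original shape without modifying C.
D = np.where(C > 100, 0, C)
print(D[0])
# [[1 2]
#  [3 0]]
```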

## 5. Numpy array indexing
### 5.1 Slicing in one dimension
 - Basic slicing: [start : stop : step]
 - Negative indices count from the end of the array.
 - Arrays produced by basic slicing are always views of the original array.
 - The default step is one.
 - A blank start or stop extends the slice to the corresponding end of the array.
 

In [34]:
a = np.arange(10)
print(f'original array = {a}')

b = a[1:7:2]
print(f'sliced array = {b}')
print(f'original array = {a}')


original array = [0 1 2 3 4 5 6 7 8 9]
sliced array = [1 3 5]
original array = [0 1 2 3 4 5 6 7 8 9]
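
Because a basic slice is a view, writing to it changes the original array. The sketch below illustrates this, along with blank start and stop values; `np.shares_memory` and `.copy()` are additions for this example.

```python
import numpy as np

a = np.arange(10)

# A blank start or stop runs to the corresponding end of the array.
print(a[:3])   # [0 1 2]
print(a[7:])   # [7 8 9]
print(a[::3])  # [0 3 6 9]

# A basic slice is a view on the same memory.
b = a[1:7:2]
print(np.shares_memory(a, b))  # True

# Writing to the view changes the original array.
b[0] = 100
print(a[1])  # 100

# .copy() gives an independent array.
c = a[1:7:2].copy()
c[1] = -5
print(a[3])  # 3
```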


### 5.2 Slicing in multi dimensions
 - A single slice such as a[:n] applies to the first dimension (axis) only. No matter how complex the shape is, it selects along the first axis, like (2, .., .., .., .....)
 - Ex) [ [1, 2, 3],
         [4, 5, 6],
         [7, 8, 9] ]
 - Possible with a single slice: [[1, 2, 3]] or [[1, 2, 3], [4, 5, 6]]
 - Not possible with a single slice: [[1, 2], [4, 5]]. This needs one slice per axis, separated by commas, such as [:2, :2].


In [35]:
a = np.array([[[1,1], [2,2], [3,3]],
              [[4,4], [5,5], [6,6]]])
print_info(a)

[[[1 1]
  [2 2]
  [3 3]]

 [[4 4]
  [5 5]
  [6 6]]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 3
[shape] The size of the array in each dimension(axe) = (2, 3, 2)
[size] The total number of elements of the array = 12
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [36]:
a[:1]

array([[[1, 1],
        [2, 2],
        [3, 3]]])

In [37]:
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

c = b[:2]
print_info(c)

[[1 2 3]
 [4 5 6]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 2
[shape] The size of the array in each dimension(axe) = (2, 3)
[size] The total number of elements of the array = 6
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [38]:
b = np.array([[[1], [2], [3]],
              [[4], [5], [6]],
              [[7], [8], [9]],
              [[10], [11], [12]]])

print_info(b)

[[[ 1]
  [ 2]
  [ 3]]

 [[ 4]
  [ 5]
  [ 6]]

 [[ 7]
  [ 8]
  [ 9]]

 [[10]
  [11]
  [12]]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 3
[shape] The size of the array in each dimension(axe) = (4, 3, 1)
[size] The total number of elements of the array = 12
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [39]:
b[:2]


array([[[1],
        [2],
        [3]],

       [[4],
        [5],
        [6]]])
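
A slice can also be given for each axis, separated by commas. The following minimal sketch uses a 3x3 array to show the difference between slicing one axis and slicing several.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# One slice: applies to the first axis only.
first_rows = b[:2]
print(first_rows.shape)  # (2, 3)

# One slice per axis: first two rows, first two columns.
block = b[:2, :2]
print(block)
# [[1 2]
#  [4 5]]

# An integer index removes that axis.
column = b[:, 1]
print(column)  # [2 5 8]

# A length-1 slice keeps the axis.
column_2d = b[:, 1:2]
print(column_2d.shape)  # (3, 1)
```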

### 5.3 Negative slicing
 - All indices are zero-based: for the $i$-th index $n_i$, the valid range is $0 \leq n_i < d_i$, where $d_i$ is the $i$-th element of the shape of the array.
 - Negative indices are interpreted as counting from the end of the array.
 - If $n_i < 0$, it means $n_i + d_i$.
 - [-start + length : end - (length+1) : -1] counts backward from index length - start down to index end, inclusive.

In [40]:
a = np.arange(4)
length = a.size
a

array([0, 1, 2, 3])

In [41]:
a[-1 + length : 0 - (length+1) : -1]

array([3, 2, 1, 0])

In [42]:
a[-2 + length : 0 - (length+1) : -1]

array([2, 1, 0])

In [43]:
a[-1 + length : 1-(length+1)+1 : -1]

array([3, 2])

In [44]:
a[-2 + length : 1 - (length+1) : -1]

array([2, 1])

In [45]:
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b

array([[1, 2],
       [3, 4],
       [5, 6]])

In [46]:
b[::-1]

array([[5, 6],
       [3, 4],
       [1, 2]])

In [47]:
c = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
c

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [48]:
c[::-1]

array([[[5, 6],
        [7, 8]],

       [[1, 2],
        [3, 4]]])
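
In everyday code, shorter forms of negative slicing are more common than the explicit arithmetic above. This sketch shows a few of them and checks that an omitted stop behaves like the stop of -(length + 1) when the step is -1.

```python
import numpy as np

a = np.arange(4)
length = a.size

# A negative index n refers to position n + length.
print(a[-1] == a[-1 + length])  # True

print(a[-2:])   # [2 3]      last two elements
print(a[:-1])   # [0 1 2]    everything except the last
print(a[::-1])  # [3 2 1 0]  reversed

# With step -1, an omitted stop runs past index 0,
# like the stop of -(length + 1).
print(np.array_equal(a[3::-1], a[3:-(length + 1):-1]))  # True

# Reversing a later axis needs a slice for each earlier axis.
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])
print(b[:, ::-1])
# [[2 1]
#  [4 3]
#  [6 5]]
```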

### 5.4 Ellipsis expands to the number of : objects needed for the selection tuple to index all dimensions.
 - Usage: [..., 0]
 - The last axis of the shape is removed. ex) (2, 3, 4) -> (2, 3)
 - Only the elements at index 0 of the last axis are kept; the elements at its other indices are dropped from the result.

In [49]:
c = b[..., 0]
print_info(c)

[1 3 5]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 1
[shape] The size of the array in each dimension(axe) = (3,)
[size] The total number of elements of the array = 3
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [50]:
c[::-1]

array([5, 3, 1])

In [51]:
d = np.array([[[[1], [2], [3]],
              [[4], [5], [6]]],
              [[[7], [8], [9]],
              [[10], [11], [12]]]])
print_info(d)

[[[[ 1]
   [ 2]
   [ 3]]

  [[ 4]
   [ 5]
   [ 6]]]


 [[[ 7]
   [ 8]
   [ 9]]

  [[10]
   [11]
   [12]]]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 4
[shape] The size of the array in each dimension(axe) = (2, 2, 3, 1)
[size] The total number of elements of the array = 12
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [52]:
e = d[..., 0]
print_info(e)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 3
[shape] The size of the array in each dimension(axe) = (2, 2, 3)
[size] The total number of elements of the array = 12
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [53]:
f = e[...,0]
print_info(f)

[[ 1  4]
 [ 7 10]]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 2
[shape] The size of the array in each dimension(axe) = (2, 2)
[size] The total number of elements of the array = 4
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [54]:
g = f[..., 0]
print_info(g)

[1 7]
Type = <class 'numpy.ndarray'>
[ndim] The number of dimensions(axes) of the array = 1
[shape] The size of the array in each dimension(axe) = (2,)
[size] The total number of elements of the array = 2
[dtype] The type of the elements in the array = int64
[itemsize] the size in bytes of each element of the array = 8


In [55]:
e.shape

(2, 2, 3)

In [56]:
e[:,:,:,np.newaxis].shape

(2, 2, 3, 1)
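
To make the expansion of the ellipsis concrete, the sketch below compares it with the explicit colons it stands for. The array is rebuilt with arange so the block runs on its own, and it holds the same values as d above.

```python
import numpy as np

d = np.arange(1, 13).reshape((2, 2, 3, 1))

# ... stands for as many ':' as needed to cover the remaining axes.
print(np.array_equal(d[..., 0], d[:, :, :, 0]))  # True
print(d[..., 0].shape)  # (2, 2, 3)

# It can also be placed at the end or in the middle.
print(d[0, ...].shape)     # (2, 3, 1)
print(d[0, ..., 0].shape)  # (2, 3)

# Only index 0 along the last axis is kept.
e = d[..., 0]
print(e[..., 0])
# [[ 1  4]
#  [ 7 10]]

# np.newaxis (an alias for None) adds a length-1 axis.
print(e[..., np.newaxis].shape)  # (2, 2, 3, 1)
print(np.newaxis is None)        # True
```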

# Good luck ^^;