### Packt >
# Python Data Analysis - Third Edition

## NumPy arrays

A NumPy array is a sequence of homogeneous items. Homogeneous means that all elements of the array have the same data type.

The main benefit of an array is that its storage size is predictable, because every item has the same type. With a Python list, you iterate over the elements in a loop to perform operations on them. Another benefit is that **NumPy arrays offer vectorized operations**, which apply an operation to every element without an explicit per-item loop. NumPy arrays are indexed just like a Python list, starting from 0. NumPy uses an optimized C API for fast processing of array operations.
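The difference between looping over a list and using a vectorized operation can be sketched as follows. This is a minimal illustration; the variable names are chosen for this example only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Python list: an explicit loop (here a list comprehension) visits each item
squared_list = [v * v for v in values]

# NumPy array: one vectorized expression is applied to every element
arr = np.array(values)
squared_arr = arr * arr

print(squared_list)  # [1, 4, 9, 16, 25]
print(squared_arr)   # [ 1  4  9 16 25]
```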

In [48]:
import numpy as np

### Creating an array

Options for creating a NumPy array:

- **array()**: creates an array from the list of values
- **arange()**: creates an evenly spaced array specified by start, stop and step values
- **zeros()**: creates an array of a given shape filled with zeros
- **ones()**: creates an array of a given shape filled with ones
- **full()**: generates an array of a given shape filled with a constant value
- **eye()**: creates an identity matrix
- **random()**: creates an array of random values with a given shape

In [49]:
a = np.array([2, 4, 6, 8, 10])
a

array([ 2,  4,  6,  8, 10])

In [50]:
# arange([start,] stop[, step]) - creates an evenly spaced NumPy array
# default step = 1 (if not set)
b = np.arange(100, 500, 64)
b

array([100, 164, 228, 292, 356, 420, 484])

In [51]:
np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [52]:
np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [53]:
np.full((3, 4), 236)

array([[236, 236, 236, 236],
       [236, 236, 236, 236],
       [236, 236, 236, 236]])

In [54]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [55]:
np.random.random((3, 3))

array([[0.10671778, 0.92938073, 0.98829423],
       [0.54304881, 0.94501061, 0.18743092],
       [0.00490092, 0.89378929, 0.70207722]])

In [56]:
np.random.randn(3, 3)

array([[ 0.47308904,  0.53388436,  1.14530876],
       [ 0.05602078,  0.12798272, -0.7580227 ],
       [ 0.86172306,  0.37195651,  0.76808858]])

In [57]:
a.dtype

dtype('int64')

In [58]:
a.shape

(5,)

In [59]:
a = np.append(a, [12])

In [60]:
a.reshape(6, 1)

array([[ 2],
       [ 4],
       [ 6],
       [ 8],
       [10],
       [12]])

In [63]:
s = np.array([[14, 4], [16, 10]])
s  # s[1, 1] would select the single element 10

array([[14,  4],
       [16, 10]])

### Manipulating array shapes

- **reshape()** - returns the array with a new shape; the original array is unchanged
- **resize()** - changes the shape of the original array in place and returns None
---
- **flatten()** - transforms an n-dimensional array into a one-dimensional array and returns a copy
- **ravel()** - also transforms an n-dimensional array into a one-dimensional array, but returns a view of the original array when possible
---
- **transpose()** - transposes the given two-dimensional matrix
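
The copy-versus-view distinction between flatten() and ravel() can be checked with np.shares_memory(). The sketch below uses a small illustrative array `m`; the names are not part of the notebook.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

flat_copy = m.flatten()  # always a new array with its own data
flat_view = m.ravel()    # a view here, because m is contiguous in memory

print(np.shares_memory(m, flat_copy))  # False
print(np.shares_memory(m, flat_view))  # True

# Writing through the view changes the original; the copy is unaffected
flat_view[0] = 100
print(m[0, 0])       # 100
print(flat_copy[0])  # 0
```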


In [82]:
arr = np.arange(7, 99, 3)  # 31 values: 7, 10, ..., 97
arr.shape
arr = np.delete(arr, 30)   # drop the last value so that 30 remain (10 x 3)

In [83]:
# doesn't change the initial array
print(arr.reshape(10, 3))
print(arr)

[[ 7 10 13]
 [16 19 22]
 [25 28 31]
 [34 37 40]
 [43 46 49]
 [52 55 58]
 [61 64 67]
 [70 73 76]
 [79 82 85]
 [88 91 94]]
[ 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76
 79 82 85 88 91 94]


In [87]:
# changes the initial array in place; resize() itself returns None
print(arr.resize(10, 3))
print(arr)

None
[[ 7 10 13]
 [16 19 22]
 [25 28 31]
 [34 37 40]
 [43 46 49]
 [52 55 58]
 [61 64 67]
 [70 73 76]
 [79 82 85]
 [88 91 94]]


In [91]:
print(arr.flatten())
print(arr)

[ 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76
 79 82 85 88 91 94]
[[ 7 10 13]
 [16 19 22]
 [25 28 31]
 [34 37 40]
 [43 46 49]
 [52 55 58]
 [61 64 67]
 [70 73 76]
 [79 82 85]
 [88 91 94]]


In [92]:
print(arr.ravel())
print(arr)

[ 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70 73 76
 79 82 85 88 91 94]
[[ 7 10 13]
 [16 19 22]
 [25 28 31]
 [34 37 40]
 [43 46 49]
 [52 55 58]
 [61 64 67]
 [70 73 76]
 [79 82 85]
 [88 91 94]]


In [93]:
print(arr.transpose())

[[ 7 16 25 34 43 52 61 70 79 88]
 [10 19 28 37 46 55 64 73 82 91]
 [13 22 31 40 49 58 67 76 85 94]]


### The stacking of NumPy arrays

Stacking means joining arrays of the same dimensions along a given axis.


- **Horizontal stacking**: arrays of the same dimensions are joined along the horizontal axis using **hstack()** or **concatenate()** with axis=1
- **Vertical stacking**: arrays of the same dimensions are joined along the vertical axis using **vstack()** or **concatenate()** with axis=0
- **Depth stacking**: arrays of the same dimensions are joined along a third axis (depth) using **dstack()**
- **Column stacking**: stacks multiple one-dimensional arrays as columns of a single two-dimensional array using **column_stack()**
- **Row stacking**: stacks multiple one-dimensional arrays as rows of a single two-dimensional array using **row_stack()**
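
A quick way to compare the stacking functions is to look at the shape of each result. The sketch below uses small placeholder arrays (`x`, `y`, `u`, `v`) that are not part of the notebook, and uses vstack() for the row-stacking case.

```python
import numpy as np

x = np.zeros((5, 4))
y = np.ones((5, 4))

h_shape = np.hstack((x, y)).shape  # columns are appended
v_shape = np.vstack((x, y)).shape  # rows are appended
d_shape = np.dstack((x, y)).shape  # a third axis is added

u = np.arange(3)
v = np.arange(3, 6)
col_shape = np.column_stack((u, v)).shape  # 1-D arrays become columns
row_shape = np.vstack((u, v)).shape        # 1-D arrays become rows

print(h_shape)    # (5, 8)
print(v_shape)    # (10, 4)
print(d_shape)    # (5, 4, 2)
print(col_shape)  # (3, 2)
print(row_shape)  # (2, 3)
```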


In [102]:
# define arrays
arr1 = np.arange(10, 50, 2).reshape(5, 4)
arr2 = np.arange(50, 90, 2).reshape(5,4)
print(f'array 1:\n {arr1} \n array2: \n {arr2}')

array 1:
 [[10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]
 [34 36 38 40]
 [42 44 46 48]] 
 array2: 
 [[50 52 54 56]
 [58 60 62 64]
 [66 68 70 72]
 [74 76 78 80]
 [82 84 86 88]]


In [121]:
# stack arrays horizontally
arr_h_stack = np.hstack((arr1, arr2))
print(f'stack arrays using hstack():\n {arr_h_stack}')
arr_h_concat = np.concatenate((arr1, arr2), axis=1)
print(f'stack arrays using concatenate():\n {arr_h_concat}')

stack arrays using hstack():
 [[10 12 14 16 50 52 54 56]
 [18 20 22 24 58 60 62 64]
 [26 28 30 32 66 68 70 72]
 [34 36 38 40 74 76 78 80]
 [42 44 46 48 82 84 86 88]]
stack arrays using concatenate():
 [[10 12 14 16 50 52 54 56]
 [18 20 22 24 58 60 62 64]
 [26 28 30 32 66 68 70 72]
 [34 36 38 40 74 76 78 80]
 [42 44 46 48 82 84 86 88]]


In [122]:
# stack arrays vertically
arr_v_stack = np.vstack((arr1, arr2))
print(f'stack arrays using vstack():\n {arr_v_stack}')
arr_v_concat = np.concatenate((arr1, arr2), axis=0)
print(f'stack arrays using concatenate():\n {arr_v_concat}')

stack arrays using vstack():
 [[10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]
 [34 36 38 40]
 [42 44 46 48]
 [50 52 54 56]
 [58 60 62 64]
 [66 68 70 72]
 [74 76 78 80]
 [82 84 86 88]]
stack arrays using concatenate():
 [[10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]
 [34 36 38 40]
 [42 44 46 48]
 [50 52 54 56]
 [58 60 62 64]
 [66 68 70 72]
 [74 76 78 80]
 [82 84 86 88]]


In [123]:
# stack in depth along a third axis
arr_d_stack = np.dstack((arr1, arr2))
arr_d_stack

array([[[10, 50],
        [12, 52],
        [14, 54],
        [16, 56]],

       [[18, 58],
        [20, 60],
        [22, 62],
        [24, 64]],

       [[26, 66],
        [28, 68],
        [30, 70],
        [32, 72]],

       [[34, 74],
        [36, 76],
        [38, 78],
        [40, 80]],

       [[42, 82],
        [44, 84],
        [46, 86],
        [48, 88]]])

In [124]:
# column stacking
arr_col_stack = np.column_stack((arr1.reshape(20), arr2.reshape(20)))
arr_col_stack

array([[10, 50],
       [12, 52],
       [14, 54],
       [16, 56],
       [18, 58],
       [20, 60],
       [22, 62],
       [24, 64],
       [26, 66],
       [28, 68],
       [30, 70],
       [32, 72],
       [34, 74],
       [36, 76],
       [38, 78],
       [40, 80],
       [42, 82],
       [44, 84],
       [46, 86],
       [48, 88]])

In [125]:
# row stacking (row_stack() is an alias of vstack() and is deprecated since NumPy 2.0)
arr_row_stack = np.row_stack((arr1.reshape(20), arr2.reshape(20)))
arr_row_stack

array([[10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40,
        42, 44, 46, 48],
       [50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80,
        82, 84, 86, 88]])

### Partitioning NumPy arrays

- **Horizontal splitting**: the given array is divided into N equal sub-arrays along the horizontal axis using **hsplit()** or **split()** with axis=1
- **Vertical splitting**: the given array is divided into N equal sub-arrays along the vertical axis using **vsplit()** or **split()**
- **Depth splitting**: the given array is divided into N equal sub-arrays along the third axis using **dsplit()**
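
Because these functions produce N equal sub-arrays, the split axis must be evenly divisible by N. The following sketch, using an illustrative 4 x 3 array named `grid`, shows the resulting shapes and the error raised when an equal division is not possible.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)

cols = np.hsplit(grid, 3)  # three pieces, one column each
rows = np.vsplit(grid, 2)  # two pieces, two rows each

print([c.shape for c in cols])  # [(4, 1), (4, 1), (4, 1)]
print([r.shape for r in rows])  # [(2, 3), (2, 3)]

# 3 columns cannot be divided into 2 equal parts
split_failed = False
try:
    np.hsplit(grid, 2)
except ValueError as exc:
    split_failed = True
    print("ValueError:", exc)
```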


In [142]:
arr.shape

(10, 3)

In [146]:
arr_h_split_1 = np.hsplit(arr, 3)
arr_h_split_2 = np.split(arr, 3, axis=1)
# result is the same
print(f'splitting with hsplit() function:\n {arr_h_split_1}')
print(f'splitting with split() function:\n {arr_h_split_2}')

splitting with hsplit() function:
 [array([[ 7],
       [16],
       [25],
       [34],
       [43],
       [52],
       [61],
       [70],
       [79],
       [88]]), array([[10],
       [19],
       [28],
       [37],
       [46],
       [55],
       [64],
       [73],
       [82],
       [91]]), array([[13],
       [22],
       [31],
       [40],
       [49],
       [58],
       [67],
       [76],
       [85],
       [94]])]
splitting with split() function:
 [array([[ 7],
       [16],
       [25],
       [34],
       [43],
       [52],
       [61],
       [70],
       [79],
       [88]]), array([[10],
       [19],
       [28],
       [37],
       [46],
       [55],
       [64],
       [73],
       [82],
       [91]]), array([[13],
       [22],
       [31],
       [40],
       [49],
       [58],
       [67],
       [76],
       [85],
       [94]])]


In [147]:
arr_v_split_1 = np.vsplit(arr, 5)
arr_v_split_2 = np.split(arr, 5)
# result is the same
print(f'splitting with vsplit() function:\n {arr_v_split_1}')
print(f'splitting with split() function:\n {arr_v_split_2}')

splitting with vsplit() function:
 [array([[ 7, 10, 13],
       [16, 19, 22]]), array([[25, 28, 31],
       [34, 37, 40]]), array([[43, 46, 49],
       [52, 55, 58]]), array([[61, 64, 67],
       [70, 73, 76]]), array([[79, 82, 85],
       [88, 91, 94]])]
splitting with split() function:
 [array([[ 7, 10, 13],
       [16, 19, 22]]), array([[25, 28, 31],
       [34, 37, 40]]), array([[43, 46, 49],
       [52, 55, 58]]), array([[61, 64, 67],
       [70, 73, 76]]), array([[79, 82, 85],
       [88, 91, 94]])]


In [140]:
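# assumed definition (missing from the cells shown): two (5, 3) halves, matching the output below
arr_v_split = np.vsplit(arr, 2)
arr_depth_2 = np.dstack((arr_v_split[0], arr_v_split[1]))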
arr_depth_2

array([[[ 7, 52],
        [10, 55],
        [13, 58]],

       [[16, 61],
        [19, 64],
        [22, 67]],

       [[25, 70],
        [28, 73],
        [31, 76]],

       [[34, 79],
        [37, 82],
        [40, 85]],

       [[43, 88],
        [46, 91],
        [49, 94]]])

In [141]:
arr_d_split = np.dsplit(arr_depth_2, 2)
arr_d_split

[array([[[ 7],
         [10],
         [13]],
 
        [[16],
         [19],
         [22]],
 
        [[25],
         [28],
         [31]],
 
        [[34],
         [37],
         [40]],
 
        [[43],
         [46],
         [49]]]),
 array([[[52],
         [55],
         [58]],
 
        [[61],
         [64],
         [67]],
 
        [[70],
         [73],
         [76]],
 
        [[79],
         [82],
         [85]],
 
        [[88],
         [91],
         [94]]])]

In [153]:
# the astype() function converts the data type of the array
arr.dtype
arr_f = arr.astype(float)
arr_f.dtype

dtype('float64')
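
Converting in the other direction, from float to integer, discards the fractional part. A small sketch with illustrative values:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])

# astype() returns a new array; the fractional part is truncated toward zero
ints = floats.astype(int)

print(ints)          # [ 1 -1  2]
print(floats.dtype)  # float64 (the original array is unchanged)
```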

In [156]:
# the tolist() function converts a NumPy array into a Python list
type(arr)
list1 = arr.tolist()
type(list1)

list

In [163]:
# NumPy views and copies
arr_cv = np.arange(1, 5).reshape(2, 2)
# no copy only assignment
arr_new = arr_cv
# create a deep copy
arr_copy = arr_cv.copy()
# create shallow copy using View
arr_view = arr_cv.view()

In [165]:
# output object IDs
print("Original Array: ",id(arr_cv))
print("Assignment: ",id(arr_new))
print("Deep Copy: ",id(arr_copy))
print("Shallow Copy(View): ",id(arr_view))

Original Array:  140631775828928
Assignment:  140631775828928
Deep Copy:  140631776159104
Shallow Copy(View):  140631188988432


In [166]:
arr_cv[1] = [99, 89]
arr_cv

array([[ 1,  2],
       [99, 89]])

In [167]:
# Check values of array view
print("View Array:\n", arr_view)

# Check values of array copy
print("Copied Array:\n", arr_copy)

View Array:
 [[ 1  2]
 [99 89]]
Copied Array:
 [[1 2]
 [3 4]]
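
Besides comparing object IDs, the relationship between an array and its view or copy can be inspected with the base attribute and np.shares_memory(). A minimal sketch, using names chosen for this example:

```python
import numpy as np

original = np.arange(1, 5)
alias = original         # assignment: the same object
view = original.view()   # a new object that shares the data
copy = original.copy()   # a new object with its own data

print(alias is original)                 # True
print(view.base is original)             # True
print(copy.base is None)                 # True
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
```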


### Slicing and Indexing NumPy arrays

Indexing selects a single value, while slicing selects multiple values from an array.
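
The same distinction applies to two-dimensional arrays, where an index or slice is given per axis. A brief sketch with an illustrative 3 x 4 array named `table`:

```python
import numpy as np

table = np.arange(1, 13).reshape(3, 4)

element = table[1, 2]    # indexing: one value (row 1, column 2)
row = table[1]           # one whole row
column = table[:, 1]     # slicing: every row, column 1
block = table[:2, 1:3]   # slicing: rows 0-1, columns 1-2

print(element)  # 7
print(row)      # [5 6 7 8]
print(column)   # [ 2  6 10]
print(block)
# [[2 3]
#  [6 7]]
```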

In [176]:
arr = np.arange(0, 10)
# slicing takes three values: start, stop, and step
print(arr)
print(arr[3:6])
print(arr[3:])
# select the last three values of the array
print(arr[-3:])
# start, stop, step
print(arr[2:7:2])

[0 1 2 3 4 5 6 7 8 9]
[3 4 5]
[3 4 5 6 7 8 9]
[7 8 9]
[2 4 6]


In [179]:
# Boolean indexing uses a Boolean expression in the place of indexes
arr = np.arange(21, 41, 2)
print("Original Array:\n",arr)
# Boolean Indexing
print("After Boolean Condition:",arr[arr>30])

Original Array:
 [21 23 25 27 29 31 33 35 37 39]
After Boolean Condition: [31 33 35 37 39]


In [184]:
arr = np.arange(1, 21).reshape(5, 4)
print("Original Array:\n",arr)
# selecting 2nd and 3rd row
ind = [1,2]
print("Selected 2nd and 3rd Row:\n", arr[ind])
# selecting 4th and 5th row
ind = [3,4]
print("Selected 4th and 5th Row:\n", arr[ind])

Original Array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]
Selected 2nd and 3rd Row:
 [[ 5  6  7  8]
 [ 9 10 11 12]]
Selected 4th and 5th Row:
 [[13 14 15 16]
 [17 18 19 20]]


In [185]:
# create row and column indices; they are paired element-wise: (1, 2) and (2, 3)
row = np.array([1, 2])
col = np.array([2, 3])
print("Selected Sub-Array:", arr[row, col])

Selected Sub-Array: [ 7 12]
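
Index arrays passed this way are paired element by element, so the result holds two single elements rather than a rectangular block. If a block of rows and columns is wanted, one option is np.ix_(), as in this sketch:

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)
row = np.array([1, 2])
col = np.array([2, 3])

pairs = arr[row, col]          # elements at (1, 2) and (2, 3)
block = arr[np.ix_(row, col)]  # rows 1-2 crossed with columns 2-3

print(pairs)  # [ 7 12]
print(block)
# [[ 7  8]
#  [11 12]]
```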


In [190]:
# broadcasting arrays: arithmetic is element-wise, and a scalar is applied to every element
arr1 = np.arange(1, 5).reshape(2, 2)
arr2 = np.arange(5, 9).reshape(2, 2)
print(f'first matrix:\n {arr1}\n second matrix:\n {arr2}')
print(f'addition of two matrices:\n {arr1+arr2}')
print(f'multiplication of two matrices:\n {arr1*arr2}')
print(f'add a scalar value to the first matrix:\n {arr1+3}')
print(f'multiply with a scalar value to the second matrix:\n {arr2*3}')

first matrix:
 [[1 2]
 [3 4]]
 second matrix:
 [[5 6]
 [7 8]]
addition of two matrices:
 [[ 6  8]
 [10 12]]
multiplication of two matrices:
 [[ 5 12]
 [21 32]]
add a scalar value to the first matrix:
 [[4 5]
 [6 7]]
multiply with a scalar value to the second matrix:
 [[15 18]
 [21 24]]
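
Broadcasting also applies when the two arrays have different but compatible shapes. The sketch below, with illustrative names, adds a row vector and a column vector to a 2 x 3 matrix.

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)  # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,): added to every row
col = np.array([[100], [200]])          # shape (2, 1): added to every column

plus_row = matrix + row
plus_col = matrix + col

print(plus_row)
# [[11 22 33]
#  [14 25 36]]
print(plus_col)
# [[101 102 103]
#  [204 205 206]]
```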
