# Chapter 4

## Arrays of data

Arrays generally structure other (fundamental) objects of the *same data type* in rows and columns.<br>
<br>
An array represents an *i × j* matrix of elements.<br>
<br>
This concept generalizes to i × j × k cubes of elements in three dimensions as well as to general n-dimensional arrays of shape i × j × k × l × ....

In [6]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

$$
\mathbf{m} = 
\begin{bmatrix}
0.5 & 0.75 & 1.0 & 1.5 & 2.0 \\
0.5 & 0.75 & 1.0 & 1.5 & 2.0 \\
0.5 & 0.75 & 1.0 & 1.5 & 2.0
\end{bmatrix}
$$

In [7]:
m[1]

[0.5, 0.75, 1.0, 1.5, 2.0]

In [8]:
m[1][0]    # row index 1 (second row), column index 0 (first column)

0.5

In [9]:
v1 = [0.5, 1.5]
v2 = [1, 2]

m = [v1, v2]
c = [m, m] 
c

[[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]

$$
c = \left[
\begin{bmatrix}
0.5 & 1.5 \\
1 & 2
\end{bmatrix},
\begin{bmatrix}
0.5 & 1.5 \\
1 & 2
\end{bmatrix}
\right]
$$

In [10]:
c[1][1][0]    # second matrix, second row, first column

1

In [11]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

Change the value of the first element of the v object and see what happens to the m object. Because m holds three references to the same list v, the change shows up in every row.

In [12]:
v[0] = 'Python'
m

[['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0],
 ['Python', 0.75, 1.0, 1.5, 2.0]]

In [13]:
from copy import deepcopy
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = 3 * [deepcopy(v), ]  # one deep copy of v, referenced three times; m no longer refers to v itself

m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [14]:
v[0] = 'Python' 

m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]
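One caveat that the output above does not show: multiplying a one-element list by 3 repeats the reference to the single deep copy, so the three rows of m are detached from v but still share one inner list. The sketch below contrasts this with a list comprehension that makes one copy per row. The names `m_shared` and `m_independent` are illustrative and do not appear in the original notes.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

# One deep copy, referenced three times: the rows are detached from v
# but still aliased with each other.
m_shared = 3 * [deepcopy(v)]
m_shared[0][0] = 'Python'

# One deep copy per row: every row is an independent list.
m_independent = [deepcopy(v) for _ in range(3)]
m_independent[0][0] = 'Python'

print(m_shared[1][0])       # the change leaks into the other rows
print(m_independent[1][0])  # the other rows are untouched
```

If the rows are meant to be edited individually later on, the list comprehension is probably the safer choice.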

### The Python array class

In [15]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]
import array

a = array.array('f', v) 
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [16]:
a.append(0.5) # this means add 0.5 to the end of the array
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5])

In [17]:
a.extend([0.75, 1.0]) # this means add 0.75 and 1.0 to the end of the array
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 0.75, 1.0])

In [18]:
2*a  # this means concatenate the array with itself, not multiply each element by 2

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 0.75, 1.0, 0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 0.75, 1.0])

Only values compatible with the type code (here, numbers that can be stored as a *float*) can be appended; other data types, such as a str, raise a TypeError.

In [19]:
a.tolist()   # this means convert the array to a list

[0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 0.75, 1.0]

### An advantage of the array class is that it has built-in storage and retrieval functionality

In [20]:
f = open('array.apy', 'wb') 
a.tofile(f)   # this means write the array to a file in binary format

f.close()  # this means close the file

with open('array.apy', 'wb') as f: 
    a.tofile(f)  # this means write the array to a file in binary format

In [21]:
b = array.array('f') 

with open('array.apy', 'rb') as f: 
    b.fromfile(f, 5)    # this means read 5 elements from the file and store them in the array b

b

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [22]:
b = array.array('d')

with open('array.apy', 'rb') as f:
    b.fromfile(f, 2)    # read 2 elements as 8-byte doubles ('d'); the file holds 4-byte floats ('f'), so the bytes are misinterpreted

b

array('d', [0.0004882813645963324, 0.12500002956949174])
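The odd values above stem from a type code mismatch: the data was written as 4-byte 'f' values but read back as 8-byte 'd' values, so each double is assembled from the bytes of two floats. A minimal sketch of the same effect without a file, using `tobytes` and `frombytes` (the names `raw`, `as_double`, and `as_float` are illustrative, and the item sizes assume a typical platform):

```python
import array

a = array.array('f', [0.5, 0.75, 1.0, 1.5])
raw = a.tobytes()                 # 4 values, 4 bytes each on typical platforms

as_double = array.array('d')
as_double.frombytes(raw)          # interprets the same bytes as 8-byte doubles

as_float = array.array('f')
as_float.frombytes(raw)           # the matching type code restores the data

print(a.itemsize, as_double.itemsize)
print(len(as_double), as_double)
print(as_float.tolist())
```

The array class stores no type information in the file, so the reader has to know the type code that was used for writing.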

## ***numpy.ndarray***: a class built with the specific goal of handling n-dimensional arrays both conveniently and efficiently

In [23]:
import numpy as np

a = np.array([0, 0.5, 0.75, 1.0, 1.5, 2.0])    # create a numpy array from a list
a

array([0.  , 0.5 , 0.75, 1.  , 1.5 , 2.  ])

In [24]:
type(a)

numpy.ndarray

In [25]:
a = np.array(['a', 'b', 'c'])   # create a numpy array from a list of strings
a

array(['a', 'b', 'c'], dtype='<U1')

In [26]:
a = np.arange(2, 20, 2)  # this means create an array of even numbers from 2 to 18
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [27]:
a[5:]   # this means get the elements from index 5 to the end

array([12, 14, 16, 18])

In [28]:
a[:2]  # this means get the elements from the start to index 2 (not including index 2)

array([2, 4])

In [29]:
a.sum()

np.int64(90)

In [30]:
a.std()

np.float64(5.163977794943222)

### (vectorized) mathematical operations

In [31]:
l = [0., 0.5, 1.5, 3., 5.]
2 * l

[0.0, 0.5, 1.5, 3.0, 5.0, 0.0, 0.5, 1.5, 3.0, 5.0]

In [32]:
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [33]:
2 * a

array([ 4,  8, 12, 16, 20, 24, 28, 32, 36])

In [34]:
a ** 2      # this means raise each element of the array to the power of 2

array([  4,  16,  36,  64, 100, 144, 196, 256, 324])

In [35]:
2 ** a  # this means raise 2 to the power of each element of the array

array([     4,     16,     64,    256,   1024,   4096,  16384,  65536,
       262144])

In [36]:
a ** a # raise each element to the power of itself; 16**16 and 18**18 exceed the int64 range, so the last two results silently overflow

array([                  4,                 256,               46656,
                  16777216,         10000000000,       8916100448256,
         11112006825558016,                   0, -497033925936021504])
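The last two entries above (0 and a negative number) are overflow artifacts rather than real powers. The sketch below checks which results fit into a signed 64-bit integer by recomputing them with Python's arbitrary-precision integers, and shows a float cast as one possible workaround. The dtype is set explicitly to int64 here as an assumption, since the default integer type can differ by platform and NumPy version.

```python
import numpy as np

a = np.arange(2, 20, 2, dtype=np.int64)

# Exact results with Python's arbitrary-precision integers.
exact = [int(x) ** int(x) for x in a]

# Flag every result that does not fit into a signed 64-bit integer.
limit = np.iinfo(np.int64).max
overflows = [value > limit for value in exact]

# Casting to float avoids the wrap-around at the cost of exactness.
as_float = a.astype(np.float64) ** a

print(overflows)
print(as_float[-2:])
```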

### Universal functions provided by *NumPy*

In [37]:
np.exp(a) # this means calculate the exponential of each element of the array

array([7.38905610e+00, 5.45981500e+01, 4.03428793e+02, 2.98095799e+03,
       2.20264658e+04, 1.62754791e+05, 1.20260428e+06, 8.88611052e+06,
       6.56599691e+07])

In [38]:
np.sqrt(a) # this means calculate the square root of each element of the array

array([1.41421356, 2.        , 2.44948974, 2.82842712, 3.16227766,
       3.46410162, 3.74165739, 4.        , 4.24264069])

In [39]:
import math
math.sqrt(a) # math.sqrt only accepts scalars, so passing an ndarray raises a TypeError

TypeError: only 0-dimensional arrays can be converted to Python scalars

In [None]:
%timeit np.sqrt(2.5) # this means measure the time it takes to calculate the square root of 2.5 using the numpy sqrt function

120 ns ± 4.67 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [None]:
%timeit math.sqrt(2.5) # this means measure the time it takes to calculate the square root of 2.5 using the math sqrt function, which is faster than the numpy sqrt function for scalar values

40.5 ns ± 2.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


## Multiple dimensions

In [None]:
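a = np.array([0, 0.5, 0.75, 1.0, 1.5, 2.0])    # restore the float array from In [23]; a was rebound by np.arange above
b = np.array([a, a * 2]) 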

b

array([[0.  , 0.5 , 0.75, 1.  , 1.5 , 2.  ],
       [0.  , 1.  , 1.5 , 2.  , 3.  , 4.  ]])

In [None]:
b[0]    # this means get the first row of the array b

array([0.  , 0.5 , 0.75, 1.  , 1.5 , 2.  ])

In [None]:
b[0, 2]  # this means get the element in the first row and third column of the array b

np.float64(0.75)

In [None]:
b[:, 1] # this means get the second column of the array b

array([0.5, 1. ])

In [None]:
b.sum()

np.float64(17.25)

In [None]:
b.sum(axis=0) # this means calculate the sum of each column of the array b

array([0.  , 1.5 , 2.25, 3.  , 4.5 , 6.  ])

In [None]:
b.sum(axis = 1) # this means calculate the sum of each row of the array b

array([ 5.75, 11.5 ])

**Another way...**

In [None]:
c = np.zeros((2, 3), dtype='i', order='C')     # this means create a 2 by 3 array of zeros with integer data type and C-style memory order
c

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [None]:
c = np.ones((2, 3, 4), dtype='i', order='C')   # this means create a 2 by 3 by 4 array of ones with integer data type and C-style memory order

c

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)

In [None]:
d = np.zeros_like(c, dtype='f2', order='C')    # this means create an array of zeros with the same shape as c, but with float16 data type and C-style memory order

d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float16)

In [None]:
d = np.zeros_like(c, dtype='f4', order='C')    # this means create an array of zeros with the same shape as c, but with float32 data type and C-style memory order

d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)

In [None]:
d = np.zeros_like(c, dtype='f8', order='C')    # this means create an array of zeros with the same shape as c, but with float64 data type and C-style memory order

d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [None]:
d = np.ones_like(c, dtype='f2', order='C') 

d

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float16)

In [None]:
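e = np.empty((2, 3, 2))  # allocate a 2 by 3 by 2 array without initializing it; the values shown are whatever was in memory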

e

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

$$
\mathbf{e} = \left[
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix},
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix}
\right]
$$
<br>

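$$
\mathbf{e} \in \mathbb{R}^{2\times3\times2},\quad \mathbf{e}_{i,j,k} = 0 \ \text{(in this run)}
$$

*np.empty* only allocates memory; the entries happen to be zero here, but they are not guaranteed to be.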
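Since the contents of an array created with np.empty are undefined, it is usually filled before it is read. A minimal sketch of three common ways to get defined values (the names `z` and `s` and the fill value 7.5 are arbitrary choices for illustration):

```python
import numpy as np

e = np.empty((2, 3, 2))      # contents are undefined at this point
e.fill(0.0)                  # explicit initialization before first use

z = np.zeros((2, 3, 2))      # allocate and initialize in one step
s = np.full((2, 3, 2), 7.5)  # same idea with an arbitrary fill value

print(np.array_equal(e, z))
print(s[0, 0, 0])
```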

In [40]:
f = np.empty_like(c) 

f

array([[[4.67296746e-307, 1.69121096e-306],
        [1.06810675e-306, 1.89146896e-307]],

       [[7.56571288e-307, 3.11525958e-307],
        [1.24610723e-306, 1.29061142e-306]]])

In [41]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [42]:
g = np.linspace(5, 15, 12)  # this means create an array of 12 evenly spaced numbers between 5 and 15

g

array([ 5.        ,  5.90909091,  6.81818182,  7.72727273,  8.63636364,
        9.54545455, 10.45454545, 11.36363636, 12.27272727, 13.18181818,
       14.09090909, 15.        ])

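For the creation functions np.zeros, np.ones, and np.empty (and their np.zeros_like, np.ones_like, and np.empty_like variants), one can provide the following parameters:
- **shape**  
  Either an int, a sequence of int objects, or (for the like-variants) a reference to another ndarray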

- **dtype (optional)**  
  A dtype—these are NumPy-specific data types for ndarray objects

- **order (optional)**  
  The order in which to store elements in memory: C for C-like (i.e., row-wise) or F for Fortran-like (i.e., column-wise)

The ndarray object has built-in *dimensions* (axes).  
The ndarray object has a *fixed size*: its elements can be changed in place, but its length cannot.  
It only allows for a *single data type* (np.dtype) for the whole array.

| dtype | Description           | Example                  |
|-------|------------------------|--------------------------|
| ?     | Boolean                | ? (True or False)        |
| i     | Signed integer         | i8 (64-bit)              |
| u     | Unsigned integer       | u8 (64-bit)              |
| f     | Floating point         | f8 (64-bit)              |
| c     | Complex floating point | c16 (128-bit)            |
| m     | timedelta              | m (64-bit)               |
| M     | datetime               | M (64-bit)               |
| O     | Object                 | O (pointer to object)    |
| U     | Unicode                | U24 (24 Unicode characters) |
| V     | Raw data (void)        | V12 (12-byte data block) |

## Metainformation

In [43]:
g.size   # this means get the number of elements in the array g

12

In [44]:
g.itemsize   # this means get the size in bytes of each element in the array g

8

In [45]:
g.ndim    # this means get the number of dimensions of the array g

1

In [46]:
g.shape    # this means get the shape of the array g, which is a tuple of the number of elements in each dimension

(12,)

In [47]:
g.dtype   # this means get the data type of the elements in the array g

dtype('float64')

In [48]:
g.nbytes   # this means get the total number of bytes consumed by the array g, which is equal to the number of elements times the size of each element

96
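The same attributes on a two-dimensional array, as a quick check that nbytes equals size times itemsize. The array `m` and its dtype are chosen only for illustration.

```python
import numpy as np

m = np.zeros((3, 4), dtype='f4')

print(m.ndim, m.shape, m.size)   # 2 dimensions, 3 * 4 = 12 elements
print(m.itemsize, m.nbytes)      # 4 bytes per element, 48 bytes in total
```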

## Reshaping

In [49]:
g = np.arange(15) # this means create an array of integers from 0 to 14

g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [51]:
g.shape    # this means get the shape of the array g, which is a tuple of the number of elements in each dimension

(15,)

In [52]:
g.reshape((3, 5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [53]:
h = g.reshape((5, 3))
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [54]:
h.T

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])

In [55]:
h.transpose()

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])
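Reshaping and transposing normally return views on the same data rather than copies, which matters as soon as elements are assigned. A minimal sketch, using np.shares_memory to check the relationship (the name `h_copy` is illustrative):

```python
import numpy as np

g = np.arange(15)
h = g.reshape((5, 3))

print(np.shares_memory(g, h))      # reshape returned a view here
print(np.shares_memory(h, h.T))    # so did the transpose

h[0, 0] = 99                       # writing through the view ...
print(g[0])                        # ... changes the original array

h_copy = g.reshape((5, 3)).copy()  # an explicit copy is independent
h_copy[0, 1] = -1
print(g[1])
```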

## Resizing

In [56]:
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [57]:
np.resize(g,(3,1)) # this means return a new array of shape (3, 1) that holds the first 3 elements of g, one per row; g itself is unchanged

array([[0],
       [1],
       [2]])

In [58]:
np.resize(g, (3, 5)) # this means return a new array of shape (3, 5) built from the elements of g; g itself is unchanged

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
np.resize(g, (5, 4)) # this means return a new array of shape (5, 4); it needs 20 elements, so the 15 elements of g are repeated from the start to fill the rest

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14,  0],
       [ 1,  2,  3,  4]])
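The function np.resize and the method ndarray.resize behave differently, which is easy to mix up. The sketch below contrasts the two; `bigger` and `g2` are illustrative names, and passing refcheck=False is an assumption made so that the in-place call also works in interactive sessions.

```python
import numpy as np

g = np.arange(15)

# Function form: returns a new array and repeats g to fill it.
bigger = np.resize(g, (5, 4))

# Method form: changes the array in place and pads with zeros.
# refcheck=False skips the reference check, which interactive
# sessions often trip over; use it only when no other object
# relies on the old buffer.
g2 = np.arange(15)
g2.resize((5, 4), refcheck=False)

print(g.shape)        # the original is untouched
print(bigger[3:])     # the tail repeats the start of g
print(g2[3:])         # the tail is zero-padded
```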

## Stacking

In [62]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [63]:
np.hstack((h, h)) # this means stack the arrays h and h horizontally (column-wise) to create a new array with twice the number of columns as h  

array([[ 0,  1,  2,  0,  1,  2],
       [ 3,  4,  5,  3,  4,  5],
       [ 6,  7,  8,  6,  7,  8],
       [ 9, 10, 11,  9, 10, 11],
       [12, 13, 14, 12, 13, 14]])

In [None]:
np.hstack((h, 2*h))  # this means stack h and 2*h horizontally (column-wise): the first three columns come from h, the last three from h multiplied by 2 element-wise

array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16],
       [ 9, 10, 11, 18, 20, 22],
       [12, 13, 14, 24, 26, 28]])

In [65]:
np.vstack((h, 0.5*h)) # this means stack h and 0.5*h vertically (row-wise): the first five rows come from h, the last five from h multiplied by 0.5 element-wise; the result is upcast to float

array([[ 0. ,  1. ,  2. ],
       [ 3. ,  4. ,  5. ],
       [ 6. ,  7. ,  8. ],
       [ 9. , 10. , 11. ],
       [12. , 13. , 14. ],
       [ 0. ,  0.5,  1. ],
       [ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ]])
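Both stacking functions can be read as special cases of np.concatenate along a given axis. A small sketch under that reading (the names `side_by_side` and `on_top` are illustrative):

```python
import numpy as np

h = np.arange(15).reshape((5, 3))

side_by_side = np.concatenate((h, 2 * h), axis=1)   # like np.hstack
on_top = np.concatenate((h, 0.5 * h), axis=0)       # like np.vstack

print(side_by_side.shape, side_by_side.dtype.kind)
print(on_top.shape, on_top.dtype.kind)
```

Mixing an integer array with a float array yields a float result, which is why the vertically stacked output above shows decimal points.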

## Flattening

In [66]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [67]:
h.flatten() # this means return a copy of the array h collapsed into one dimension, which is a 1D array containing all the elements of h in row-major order

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [None]:
h.flatten(order = 'C') # this means return a copy of the array h collapsed into one dimension in row-major order (C-style)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [69]:
h.flatten(order='F') # this means return a copy of the array h collapsed into one dimension, which is a 1D array containing all the elements of h in column-major order (Fortran-style order)

array([ 0,  3,  6,  9, 12,  1,  4,  7, 10, 13,  2,  5,  8, 11, 14])

In [None]:
for i in h.flat:
    print(i, end = ',')   # h.flat is an iterator over the elements in row-major order

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,

In [None]:
for i in h.ravel(order = 'C'):
    print(i, end = ',')   # ravel is an alternative to flatten; it returns a view where possible instead of a copy

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,
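The practical difference between flatten and ravel shows up when the flattened result is modified. A minimal sketch (the names `flat_copy` and `flat_view` are illustrative):

```python
import numpy as np

h = np.arange(15).reshape((5, 3))

flat_copy = h.flatten()   # always a copy
flat_view = h.ravel()     # a view when the memory layout allows it

flat_copy[0] = -1         # does not touch h
flat_view[1] = -2         # writes through to h

print(h[0])
print(np.shares_memory(h, flat_copy), np.shares_memory(h, flat_view))
```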

## Boolean arrays

In [73]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [74]:
h > 8

array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True]])

In [75]:
h <= 7

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True, False],
       [False, False, False],
       [False, False, False]])

In [76]:
h == 5

array([[False, False, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False]])

In [77]:
(h == 5).astype(int) # this means convert the boolean array (h == 5) to an integer array, where True is converted to 1 and False is converted to 0  

array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [79]:
(h > 4) & (h <= 12) # this means create a boolean array where each element is True if the corresponding element of h is greater than 4 and less than or equal to 12, and False otherwise. The & operator is used for element-wise logical AND operation between the two conditions.

array([[False, False, False],
       [False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True, False, False]])

The following operations use a boolean array as an index; they return the selected elements as a flat, one-dimensional array:

In [81]:
h[h>8] # this means get the elements of h that are greater than 8

array([ 9, 10, 11, 12, 13, 14])

In [None]:
h[(h > 4) & (h <= 12)] # this means get the elements of h that are greater than 4 and less than or equal to 12

array([ 5,  6,  7,  8,  9, 10, 11, 12])

In [None]:
h[(h < 4) | (h >= 12)] # this means get the elements of h that are less than 4 or greater than or equal to 12, where the | operator is used for element-wise logical OR operation between the two conditions

array([ 0,  1,  2,  3, 12, 13, 14])

np.where defines the resulting elements depending on whether a condition is True or False:

In [85]:
np.where(h > 7, 1, 0) # this means create an array of the same shape as h, where each element is 1 if the corresponding element of h is greater than 7, and 0 otherwise. The np.where function takes three arguments: the condition, the value to use when the condition is True, and the value to use when the condition is False.   

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [None]:
np.where(h % 2 == 0, 'even', 'odd') # The condition h % 2 == 0 checks if each element of h is even, and the np.where function assigns 'even' or 'odd' accordingly.

array([['even', 'odd', 'even'],
       ['odd', 'even', 'odd'],
       ['even', 'odd', 'even'],
       ['odd', 'even', 'odd'],
       ['even', 'odd', 'even']], dtype='<U4')

In [None]:
np.where(h <= 7, h * 2, h / 2) # this means create an array of the same shape as h, where each element is h * 2 if the corresponding element of h is less than or equal to 7, and h / 2 otherwise. The np.where function takes three arguments: the condition, the value to use when the condition is True, and the value to use when the condition is False.

array([[ 0. ,  2. ,  4. ],
       [ 6. ,  8. , 10. ],
       [12. , 14. ,  4. ],
       [ 4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ]])
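The last np.where call can also be spelled out with a boolean mask and in-place assignment, which may make the mechanics clearer. A sketch assuming the same h as above; `picked`, `manual`, and `mask` are illustrative names.

```python
import numpy as np

h = np.arange(15).reshape((5, 3))

# np.where builds a new array from two candidate arrays.
picked = np.where(h <= 7, h * 2, h / 2)

# The same result via boolean-mask assignment on a float array.
manual = h / 2                 # start with the "False" branch
mask = h <= 7
manual[mask] = h[mask] * 2     # overwrite where the condition holds

print(np.array_equal(picked, manual))
print(mask.sum())              # number of elements that satisfy the condition
```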

## Speed comparison

In [88]:
import random
I = 5000

In [92]:
%time mat = [[random.gauss(0, 1) for j in range(I)] \
for i in range(I)]



CPU times: total: 10.7 s
Wall time: 10.7 s


In [None]:
mat[0][:5]   # this means get the first 5 elements of the first row of the matrix mat

[-1.0706532335451093,
 -0.9355137280796387,
 -0.7056786964177011,
 1.0857437872607962,
 -0.5970023844435064]

In [94]:
%time sum([sum(l) for l in mat])

CPU times: total: 172 ms
Wall time: 173 ms


2961.532023296603

In [None]:
import sys
sum([sys.getsizeof(l) for l in mat])   # this means sum the sizes of the row lists using sys.getsizeof; this counts only the list objects themselves, not the float objects they reference

209400000

The same operations with the NumPy package...

In [96]:
%time mat = np.random.standard_normal((I, I))

CPU times: total: 734 ms
Wall time: 965 ms


In [97]:
%time mat.sum()

CPU times: total: 31.2 ms
Wall time: 25.4 ms


np.float64(-2699.885309684171)

In [98]:
mat.nbytes

200000000

In [99]:
sys.getsizeof(mat)

200000128
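The two memory figures above are not quite like for like: sys.getsizeof on a list counts the list object with its references, but not the float objects behind them, whereas nbytes covers the raw data of the ndarray. The sketch below adds the float objects to the list-side estimate on a deliberately small matrix. The size n = 200 and all variable names are illustrative, and the object sizes are CPython implementation details.

```python
import random
import sys

import numpy as np

n = 200  # deliberately small; the text uses I = 5000

mat_list = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
mat_np = np.array(mat_list)

# Size of the row lists alone (what the text measures) ...
rows_only = sum(sys.getsizeof(row) for row in mat_list)
# ... plus the float objects those rows point to.
with_floats = rows_only + sum(sys.getsizeof(x) for row in mat_list for x in row)

print(rows_only, with_floats)
print(mat_np.nbytes)   # n * n * 8 bytes of raw float64 data
```

On this accounting the memory advantage of the ndarray is larger than the comparison of the two figures above suggests.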