# Numpy Go

## Generating Arrays

In [90]:
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)], dtype='float64')
print('A:\n', A, end='\n\n')

B = np.random.randint(100, size=(5, 5)) # @1
print('B:\n', B, end='\n\n')

C = np.arange(25, dtype='complex').reshape(5, 5)
print('C:\n', C, end='\n\n')

A:
 [[ 0.  1.  2.  3.  4.]
 [10. 11. 12. 13. 14.]
 [20. 21. 22. 23. 24.]
 [30. 31. 32. 33. 34.]
 [40. 41. 42. 43. 44.]]

B:
 [[ 3 10 54 51  8]
 [85 51 51 59 43]
 [70  4 58 74 49]
 [52 76 16 82 36]
 [10 17 54 92 10]]

C:
 [[ 0.+0.j  1.+0.j  2.+0.j  3.+0.j  4.+0.j]
 [ 5.+0.j  6.+0.j  7.+0.j  8.+0.j  9.+0.j]
 [10.+0.j 11.+0.j 12.+0.j 13.+0.j 14.+0.j]
 [15.+0.j 16.+0.j 17.+0.j 18.+0.j 19.+0.j]
 [20.+0.j 21.+0.j 22.+0.j 23.+0.j 24.+0.j]]



@1: `randint(low, high=None, size=None, dtype='l')` returns random integers of the specified dtype, drawn from the discrete uniform distribution over the half-open interval \[`low`, `high`). If `high` is `None` (the default), the results are drawn from \[0, `low`).
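
The half-open interval can be checked empirically. The following is a minimal sketch; the seed and the variable names `one_arg` and `two_args` are assumptions introduced here only to make the run repeatable and readable.

```python
import numpy as np

np.random.seed(0)  # assumption: seed chosen only for repeatability

# One positional argument: values are drawn from [0, low)
one_arg = np.random.randint(3, size=1000)

# Two positional arguments: values are drawn from [low, high)
two_args = np.random.randint(5, 8, size=1000)

print(one_arg.min(), one_arg.max())    # the upper bound 3 never appears
print(two_args.min(), two_args.max())  # the upper bound 8 never appears
```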

Common data types that can be used with dtype are: `int`, `float`, `complex`, `bool`, etc.

We can also explicitly define the **bit size** of the data types, for example: `int16`, `int64`, `float32`, `float64`, `complex128`, etc. Visit [numpy.data_types](https://www.numpy.org.cn/user_guide/numpy_basics/data_types.html) for more details.
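
As a small illustration of what the bit size means in practice, the sketch below compares the storage per element of a few explicitly sized types. The array names are hypothetical.

```python
import numpy as np

small = np.zeros(4, dtype='int16')
large = np.zeros(4, dtype='int64')
print(small.itemsize, large.itemsize)  # bytes per element: 2 and 8

# complex128 stores two float64 values (real and imaginary parts)
z = np.zeros(4, dtype='complex128')
print(z.itemsize)  # 16

# The bit size also limits the representable range
info = np.iinfo('int16')
print(info.min, info.max)  # -32768 32767
```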

In [89]:
import numpy as np

A = np.random.rand(5, 5) # uniform random numbers in [0, 1)
print('A:\n', A, end='\n\n')

B = np.random.randn(5, 5) # standard normal distributed random numbers
print('B:\n', B, end='\n\n')

A:
 [[0.30724316 0.79087085 0.65633142 0.33814552 0.4396678 ]
 [0.03463735 0.54576797 0.18399033 0.34879007 0.36254229]
 [0.40384877 0.24965217 0.22323893 0.7375096  0.16318439]
 [0.80182033 0.72455729 0.41606049 0.49571567 0.82389849]
 [0.63157215 0.57396807 0.29438008 0.79058877 0.70691893]]

B:
 [[ 0.68346177 -0.3130113   0.86398395  0.96317944  0.75394931]
 [ 1.38441171  0.44999173  0.46438095  1.60060128 -0.90109521]
 [-0.46295955  1.64646929  1.60120506 -0.26872706 -0.60863494]
 [ 0.43785386 -0.3368861   0.12219972  1.14137366 -0.7246897 ]
 [ 1.34852095 -0.56821616  0.43062723  0.28836228 -0.09861638]]
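
NumPy 1.17 and later also provide the `Generator` interface through `np.random.default_rng`. The sketch below shows what appear to be the closest counterparts of `rand`, `randn`, and `randint`. The seed and the names `U`, `N`, and `K` are assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seed 0 chosen only for repeatability

U = rng.random((5, 5))              # uniform on [0, 1), like np.random.rand(5, 5)
N = rng.standard_normal((5, 5))     # standard normal, like np.random.randn(5, 5)
K = rng.integers(100, size=(5, 5))  # integers in [0, 100), like np.random.randint

print(U.min() >= 0.0, U.max() < 1.0)
print(N.shape, K.dtype)
```

Note that the `Generator` methods take the shape as a single tuple, while `rand` and `randn` take each dimension as a separate argument.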



### Generate from and Save to File

In [88]:
import numpy as np

data = np.genfromtxt('data.dat') # Load data from a text file, with missing values handled as specified. @1
print(data, end='\n\n')

data_modified = data[:, [0, 1, 2, 3]].astype('int64')
print(data_modified, end='\n\n')

np.savetxt('data_modified.txt', data_modified) # default format: '%.18e'
!cat data_modified.txt

[[ 1.80e+03  1.00e+00  1.00e+00 -6.10e+00 -6.10e+00 -6.10e+00  1.00e+00]
 [ 1.80e+03  1.00e+00  2.00e+00 -1.54e+01 -1.54e+01 -1.54e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  3.00e+00 -1.50e+01 -1.50e+01 -1.50e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  4.00e+00 -1.93e+01 -1.93e+01 -1.93e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  5.00e+00 -1.68e+01 -1.68e+01 -1.68e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  6.00e+00 -1.14e+01 -1.14e+01 -1.14e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  7.00e+00 -7.60e+00 -7.60e+00 -7.60e+00  1.00e+00]
 [ 1.80e+03  1.00e+00  8.00e+00 -7.10e+00 -7.10e+00 -7.10e+00  1.00e+00]
 [ 1.80e+03  1.00e+00  9.00e+00 -1.01e+01 -1.01e+01 -1.01e+01  1.00e+00]
 [ 1.80e+03  1.00e+00  1.00e+01 -9.50e+00 -9.50e+00 -9.50e+00  1.00e+00]]

[[1800    1    1   -6]
 [1800    1    2  -15]
 [1800    1    3  -15]
 [1800    1    4  -19]
 [1800    1    5  -16]
 [1800    1    6  -11]
 [1800    1    7   -7]
 [1800    1    8   -7]
 [1800    1    9  -10]
 [1800    1   10   -9]]

1.800000000000000000e+03 1.000000000

@1: The default is `dtype=float`. With `dtype=None`, the type of each column is determined individually from its contents.
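
To see how `dtype` and missing values affect the result without needing `data.dat`, here is a minimal sketch that reads from in-memory text. The sample strings `text` and `with_gap` are hypothetical stand-ins, not the contents of the original file.

```python
import io
import numpy as np

text = "1800 1 -6.1\n1800 2 -15.4\n"  # hypothetical stand-in for data.dat

as_float = np.genfromtxt(io.StringIO(text))                # default dtype=float
per_column = np.genfromtxt(io.StringIO(text), dtype=None)  # type inferred per column

print(as_float.dtype, as_float.shape)  # float64 (2, 3)
print(per_column.dtype.names)          # one field per column: ('f0', 'f1', 'f2')

# An empty field is treated as missing and becomes nan for float data
with_gap = "1,,3\n4,5,6\n"
filled = np.genfromtxt(io.StringIO(with_gap), delimiter=',')
print(filled)
```

Because the columns here mix integers and floats, `dtype=None` yields a one-dimensional structured array rather than a 2-D float array.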

In [71]:
import numpy as np

A = np.diag(range(5))
np.save('diag_matrix.npy', A) # Save an array to a binary file in NumPy ``.npy`` format.

B = np.load('diag_matrix.npy')
print(B)

[[0 0 0 0 0]
 [0 1 0 0 0]
 [0 0 2 0 0]
 [0 0 0 3 0]
 [0 0 0 0 4]]
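
The text and binary formats differ in what survives a round trip. The sketch below writes to a temporary directory (an assumption made so that no files are left behind) and compares the dtypes after loading.

```python
import os
import tempfile
import numpy as np

A = np.diag(range(5))

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'diag_matrix.txt')
    npy_path = os.path.join(tmp, 'diag_matrix.npy')

    np.savetxt(txt_path, A, fmt='%d')  # fmt='%d' writes plain integers
    np.save(npy_path, A)

    from_txt = np.loadtxt(txt_path)  # text round trip: values come back as float64
    from_npy = np.load(npy_path)     # binary round trip: dtype is preserved

print(from_txt.dtype, from_npy.dtype)
```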


## Array Properties

In [74]:
import numpy as np

A = np.random.randint(100, size=(5, 5))

print(A.itemsize) # bytes per element
print(A.nbytes) # total number of bytes used by the elements
print(A.ndim) # number of dimensions

8
200
2
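
These properties are related: `nbytes` is the element count times `itemsize`. A short sketch, using an explicit `int64` dtype (an assumption, so the numbers do not depend on the platform's default integer size):

```python
import numpy as np

A = np.zeros((5, 5), dtype='int64')

print(A.shape)  # (5, 5)
print(A.size)   # 25 elements in total
print(A.dtype)  # int64
print(A.nbytes == A.size * A.itemsize)  # True: 25 * 8 == 200
```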


## Compare a[:][:] and a[:, :]

In [None]:
import numpy as np

I = np.eye(10)

# compare difference
print(I[0:4][0:3])
print(I[0:4, 0:3])
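
The difference is easiest to see from the shapes. In the sketch below, the names `chained` and `tupled` are introduced only for illustration.

```python
import numpy as np

I = np.eye(10)

# I[0:4] selects rows 0..3; the second [0:3] then selects rows 0..2 of that result
chained = I[0:4][0:3]

# A single index with a comma selects rows 0..3 and columns 0..2 in one step
tupled = I[0:4, 0:3]

print(chained.shape)  # (3, 10)
print(tupled.shape)   # (4, 3)
```

Each pair of brackets in `I[0:4][0:3]` indexes the first axis again, so the columns are never sliced.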

## Compare np.vstack(), np.hstack(), np.concatenate()

In [87]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

print('vstack:\n', np.vstack((a, b)), end='\n\n') # Stack arrays in sequence vertically (row wise). @1
print('hstack:\n', np.hstack((a, b)), end='\n\n') # Stack arrays in sequence horizontally (column wise). @2
print('concatenate:\n', np.concatenate((a, b)), end='\n\n') # Join a sequence of arrays along an existing axis. @3

vstack:
 [[1 2 3]
 [2 3 4]]

hstack:
 [1 2 3 2 3 4]

concatenate:
 [1 2 3 2 3 4]



In [91]:
c = np.array([[1], [2], [3]])
d = np.array([[2], [3], [4]])

print('vstack:\n', np.vstack((c, d)), end='\n\n') # Stack arrays in sequence vertically (row wise). @1
print('hstack:\n', np.hstack((c, d)), end='\n\n') # Stack arrays in sequence horizontally (column wise). @2
print('concatenate:\n', np.concatenate((c, d)), end='\n\n') # Join a sequence of arrays along an existing axis. @3

vstack:
 [[1]
 [2]
 [3]
 [2]
 [3]
 [4]]

hstack:
 [[1 2]
 [2 3]
 [3 4]]

concatenate:
 [[1]
 [2]
 [3]
 [2]
 [3]
 [4]]



@1: This is equivalent to concatenation along the **first axis** after 1-D arrays of shape `(N,)` have been reshaped to `(1,N)`.

@2: This is equivalent to concatenation along the **second axis**, except for 1-D arrays where it concatenates along the first axis.

@3: `concatenate((a1, a2, ...), axis=0, out=None)`. **Any axis** can be chosen. The default axis is 0.

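**Summary:**

`np.concatenate` joins arrays along axis 0 (the first axis, the batch axis) by default. Special case: a vector has no batch axis, so vectors are simply joined end to end. A different `axis` can also be chosen.

`np.vstack` joins arrays along axis 0 (the first axis, the batch axis). Special case: when given vectors, it first adds a new axis (the batch axis) to each vector and then joins them.

`np.hstack` joins arrays along the second axis. Special case: a vector has no second axis, so vectors are joined end to end.

For arrays that have a batch axis, the default behavior of `np.concatenate` is the same as `np.vstack`. The two differ only when given vectors.

For vectors, the default behavior of `np.concatenate` is the same as `np.hstack`.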
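
The equivalences stated in notes @1 to @3 can be checked directly. This is a minimal sketch that reuses the arrays `a`, `b`, `c`, and `d` from the cells above; the names `v_equiv`, `h_equiv_1d`, and `h_equiv_2d` are hypothetical.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
c = np.array([[1], [2], [3]])
d = np.array([[2], [3], [4]])

# vstack on 1-D input: reshape (N,) to (1, N), then join along axis 0
v_equiv = np.concatenate((a.reshape(1, -1), b.reshape(1, -1)), axis=0)
print(np.array_equal(np.vstack((a, b)), v_equiv))  # True

# hstack on 1-D input: join along axis 0
h_equiv_1d = np.concatenate((a, b), axis=0)
print(np.array_equal(np.hstack((a, b)), h_equiv_1d))  # True

# hstack on 2-D input: join along axis 1
h_equiv_2d = np.concatenate((c, d), axis=1)
print(np.array_equal(np.hstack((c, d)), h_equiv_2d))  # True
```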

## Compare flatten(), ravel()

In [86]:
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])

B = A.flatten() # Return a copy of the array collapsed into one dimension.
print('B:\n', B, end='\n\n')

C = A.ravel() # Same result as flatten(), but returns a view when possible, so changing C also changes A
print('C:\n', C, end='\n\n')

B[0] = 99
print('A after change B:\n', A, end='\n\n')

C[0] = 99
print('A after change C:\n', A, end='\n\n')

B:
 [ 0  1  2  3  4 10 11 12 13 14 20 21 22 23 24 30 31 32 33 34 40 41 42 43
 44]

C:
 [ 0  1  2  3  4 10 11 12 13 14 20 21 22 23 24 30 31 32 33 34 40 41 42 43
 44]

A after change B:
 [[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]

A after change C:
 [[99  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]



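So avoid `ravel()` when you intend to modify the result and do not want the original array to change. Use `flatten()` in that case, since it always returns an independent copy.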
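
Whether a result shares memory with the original can be tested with `np.shares_memory`. The sketch below also shows that `ravel()` falls back to a copy when a view is not possible, here for a transposed (non-C-contiguous) array. The small array `A` is chosen only for illustration.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

flat_copy = A.flatten()    # always a copy
flat_view = A.ravel()      # a view, because A is C-contiguous
flat_of_T = A.T.ravel()    # a copy, because A.T is not C-contiguous

print(np.shares_memory(A, flat_copy))  # False
print(np.shares_memory(A, flat_view))  # True
print(np.shares_memory(A, flat_of_T))  # False
```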

## Slicing

### Fancy Indexing

In [79]:
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])
print('A:\n', A)

row_indices = [1, 2, 3] # slice these rows of A
A[row_indices]

A:
 [[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]


array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
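
Unlike basic slicing, indexing with a list of row numbers returns a copy, and the list may repeat or reorder rows. A minimal sketch, with the names `picked` and `reordered` introduced here for illustration:

```python
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])

picked = A[[1, 2, 3]]
picked[0, 0] = -1          # modifies the copy only
print(A[1, 0])             # still 10

reordered = A[[3, 1, 1]]   # rows may be reordered and repeated
print(reordered[:, 0])     # [30 10 10]
```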

In [100]:
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])
print('A:\n', A, end='\n\n')

row_mask = np.array([True, True, True, False, False])
col_mask = np.array([1, 1, 1, 0, 0], dtype=bool)
whole_mask = (10 < A) & (A < 40) # element-wise AND of two boolean arrays
indices = np.where(whole_mask)

print('row_mask:\n', row_mask, end='\n\n')
print('col_mask:\n', col_mask, end='\n\n')

print('mask:\n', whole_mask, end='\n\n')
print('mask indices:\n', indices, end='\n\n')

print('row_masked rows of A:\n', A[row_mask], end='\n\n') # First three rows of A
print('column_masked columns of A:\n', A[:, col_mask], end='\n\n') # First three columns of A
print('whole_masked A:\n', A[whole_mask])

A:
 [[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]

row_mask:
 [ True  True  True False False]

col_mask:
 [ True  True  True False False]

mask:
 [[False False False False False]
 [False  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [False False False False False]]

mask indices:
 (array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]), array([1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))

row_masked rows of A:
 [[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]]

column_masked columns of A:
 [[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]]

whole_masked A:
 [11 12 13 14 20 21 22 23 24 30 31 32 33 34]
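
Indexing with a boolean mask and indexing with the tuple returned by `np.where` select the same elements. The sketch below checks this and also shows a mask used for assignment. The names `mask`, `same`, and `B` are hypothetical.

```python
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])

# Parentheses are required because & binds more tightly than <
mask = (10 < A) & (A < 40)
indices = np.where(mask)  # tuple of (row indices, column indices)

same = np.array_equal(A[mask], A[indices])
print(same)  # True

# A mask can also appear on the left-hand side of an assignment
B = A.copy()
B[mask] = 0
print(B)
```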


In [92]:
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])

# compare these two:
print(A[[1,2,3], [1,2,3]], end='\n\n')
print(A[[1,2,3], 1:4], end='\n\n')

[11 22 33]

[[11 12 13]
 [21 22 23]
 [31 32 33]]
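
Two index lists are paired element by element, which is why the first expression returns only the diagonal entries. To get the full block from two lists, one option is `np.ix_`. This is a sketch; the names `rows`, `cols`, `pairs`, and `block` are introduced for illustration.

```python
import numpy as np

A = np.array([[n+10*m for n in range(5)] for m in range(5)])

rows = [1, 2, 3]
cols = [1, 2, 3]

pairs = A[rows, cols]          # elements (1,1), (2,2), (3,3)
block = A[np.ix_(rows, cols)]  # every combination of rows and cols

print(pairs)  # [11 22 33]
print(np.array_equal(block, A[[1, 2, 3], 1:4]))  # True
```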

