# NumPy Basics: Arrays and Vectorized Computation
NumPy, short for Numerical Python, is one of the most important foundational packages
for numerical computing in Python.

## 4.1 The NumPy ndarray: A Multidimensional Array Object
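One of NumPy's key features is its N-dimensional array object, the `ndarray`, a fast and flexible container for large datasets. Arrays let you perform mathematical operations on whole blocks of data using the same syntax as the equivalent operations between scalar elements.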

In [1]:
import numpy as np
data = np.random.randn(2, 3)
data

array([[-1.99853705,  0.65872514,  0.74939594],
       [ 1.22605233,  0.67103549, -0.92052294]])

In [2]:
data * 10

array([[-19.98537046,   6.58725138,   7.49395936],
       [ 12.26052331,   6.71035487,  -9.20522944]])

In [3]:
data + data

array([[-3.99707409,  1.31745028,  1.49879187],
       [ 2.45210466,  1.34207097, -1.84104589]])

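An ndarray is a generic multidimensional container for homogeneous data; that is, all of its elements must be of the same type. Every array has a `shape`, a tuple giving the size of each dimension, and a `dtype`, an object describing the data type of the array: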

In [4]:
print(data.shape)  # shape is an attribute, so no parentheses
print(data.dtype)  # dtype is an attribute too

(2, 3)
float64
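Because every element must share one dtype, NumPy picks a common type when the input mixes kinds of values. A minimal sketch of that behaviour, plus a few other standard ndarray attributes (`ndim`, `size`, `itemsize`); the variable names are illustrative only:

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for every element
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)  # float64: the ints were upcast

# Mixing numbers and strings: every element becomes a Unicode string
mixed_types = np.array([1, 'two', 3.0])
print(mixed_types.dtype.kind)  # 'U', a Unicode string dtype

# A few other attributes every ndarray carries
grid = np.zeros((2, 3))
print(grid.ndim, grid.size, grid.itemsize)  # 2 6 8 (float64 uses 8 bytes per element)
```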


### Creating ndarrays
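The easiest way to create an array is to use the `array` function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.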

In [5]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [6]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [7]:
# check dimensions of the arrays
print(arr2.ndim)
print(arr2.shape)

2
(2, 4)
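`np.array` copies its input by default, while `np.asarray` reuses an input that is already an ndarray. The sketch below, with illustrative variable names, checks both behaviours and shows that tuples work as input too:

```python
import numpy as np

source = np.arange(5)

copied = np.array(source)     # np.array copies its input by default
aliased = np.asarray(source)  # np.asarray returns the input itself when it is already an ndarray

source[0] = 100
print(copied[0])          # 0: the copy is unaffected
print(aliased[0])         # 100: same underlying data
print(aliased is source)  # True

# Tuples (and other sequences) work as input too
from_tuple = np.array((1.5, 2.5, 3.5))
print(from_tuple.shape)  # (3,)
```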


#### Zeros & Ones

In [8]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [9]:
np.zeros((3,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [10]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

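Because NumPy is focused on numerical computing, creation functions such as `zeros`, `ones`, and `empty` produce float64 arrays unless you specify a dtype (integer arguments to `arange`, as above, produce integers). The table below lists the standard array creation functions: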

| Function          | Description                                                                                                                                                                            |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| array             | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype; copies the input data by default           |
| asarray           | Convert input to ndarray, but do not copy if the input is already an ndarray                                                                                                           |
| arange            | Like the built-in range but returns an ndarray instead of a list                                                                                                                       |
| ones, ones_like   | Produce an array of all 1s with the given shape and dtype; ones_like takes another array and produces a ones array of the same shape and dtype                                         |
| zeros, zeros_like | Like ones and ones_like but producing arrays of 0s instead                                                                                                                             |
| empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros                                                                                    |
| full, full_like   | Produce an array of the given shape and dtype with all values set to the indicated “fill value”, full_like takes another array and produces a filled array of the same shape and dtype |
| eye, identity     | Create a square N x N identity matrix (1s on the diagonal and 0s elsewhere)                                                                                                            |
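A short sketch exercising several functions from the table. Note that `np.empty` leaves its contents uninitialized, so its values should not be read before they are assigned:

```python
import numpy as np

ones = np.ones((2, 3))                  # float64 by default
ints = np.ones((2, 3), dtype=np.int32)  # explicit dtype
like = np.zeros_like(ints)              # same shape and dtype as ints
sevens = np.full((2, 2), 7)             # every element set to the fill value
identity = np.eye(3)                    # 3 x 3 identity matrix
buffer = np.empty((2, 2))               # uninitialized: contents are arbitrary

print(ones.dtype, ints.dtype, like.dtype)  # float64 int32 int32
print(sevens)
print(identity)
```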

### Data Types for ndarrays

In [11]:
# specify the dtype when creating the array
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1.dtype

dtype('float64')

In [12]:
# specify the dtype explicitly
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

#### Converting dtypes with astype

In [13]:
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
float_arr = arr.astype(np.float64)
print(float_arr)
print(float_arr.dtype)

int64
[1. 2. 3. 4. 5.]
float64


In [14]:
# convert string to numerical
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
numeric_strings.astype(float)  # Python's built-in float is accepted and maps to np.float64

array([ 1.25, -9.6 , 42.  ])

In [15]:
# cast to the dtype of another array
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
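A few `astype` details worth checking, shown as a small sketch: casting floats to integers truncates the fractional part, `astype` returns a new array even when the dtype is unchanged, and strings that cannot be parsed raise `ValueError`:

```python
import numpy as np

floats = np.array([3.7, -1.7, 0.5])
truncated = floats.astype(np.int32)  # fractional part is dropped (truncation toward zero)
print(truncated)  # [ 3 -1  0]

# astype returns a new array, even if the dtype does not change
same = floats.astype(np.float64)
print(same is floats)  # False

# Strings that cannot be parsed as numbers raise ValueError
failed = False
try:
    np.array(['1.5', 'abc']).astype(float)
except ValueError as exc:
    failed = True
    print('conversion failed:', exc)
```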

### Arithmetic with NumPy Arrays
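Arrays are important because they let you express batch operations on data without writing any `for` loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays is applied element-wise: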

In [16]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [17]:
print(arr * arr)
print(arr - arr)
print(arr * 2)

[[ 1.  4.  9.]
 [16. 25. 36.]]
[[0. 0. 0.]
 [0. 0. 0.]]
[[ 2.  4.  6.]
 [ 8. 10. 12.]]


In [18]:
# comparing arrays of the same size yields a boolean array
arr1 = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr1 > arr2

array([[ True, False,  True],
       [False,  True, False]])
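To make the idea of vectorization concrete, the sketch below computes the same element-wise product once with an explicit double loop and once as a single array expression, then shows arithmetic with scalars, which propagates the scalar to every element:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Element-wise product written as an explicit Python loop...
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# ...and as a single vectorized expression
vectorized = arr * arr
print(np.array_equal(looped, vectorized))  # True

# Operations with scalars apply the scalar to every element
print(1 / arr)
print(arr ** 0.5)
```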

### Basic Indexing and Slicing
#### Indexing
1-D array

In [19]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
arr[5]

5

In [21]:
arr[:5]

array([0, 1, 2, 3, 4])

In [22]:
arr[5:8]

array([5, 6, 7])

In [23]:
# note: a plain Python list does not support assigning a scalar to a slice like this
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

#### Slices are views, not copies!

In [24]:
arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

In [25]:
# modifying the slice also modifies the original array
arr_slice[1] = 1111
arr

array([   0,    1,    2,    3,    4,   12, 1111,   12,    8,    9])

In [26]:
# use .copy() to get an independent copy of the slice's data
arr = np.array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
copy = arr[5:8].copy()
copy[1] = 1111
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
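Whether two arrays share data can be checked directly with `np.shares_memory`, and a view records the array it came from in its `base` attribute. A minimal sketch:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]           # basic slice: a view
copied = arr[5:8].copy()  # explicit copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True: the view refers back to arr
```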

2-D Array

In [27]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

In [28]:
print(arr2d[0][2])
print(arr2d[0, 2])

3
3


#### Indexing elements in a 2-D array: axis 0 is the rows, axis 1 is the columns
<img src='./resources/numpy index.png' alt='Indexing elements in a 2-D array' style="width: 200px;"/>

In [29]:
# in a multidimensional array, omitting later indices returns a lower-dimensional ndarray
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d.shape

(2, 2, 3)

In [30]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [31]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [32]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [33]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [34]:
arr3d[0,1,2]

6
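Each integer index you supply removes one dimension. The sketch below shows that `arr3d[1, 0]` is the same as indexing in two steps, and how `ndim` shrinks as indices are added:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

row = arr3d[1, 0]  # fixes the first two axes: a 1-D array
print(row)         # [7 8 9]

# Equivalent two-step indexing: first arr3d[1], then row 0 of that 2-D array
x = arr3d[1]
print(np.array_equal(x[0], row))  # True

# Each index you supply removes one dimension
print(arr3d.ndim, arr3d[0].ndim, arr3d[0, 1].ndim, arr3d[0, 1, 2])  # 3 2 1 6
```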

#### Slicing
1-D

In [35]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [36]:
arr[1:6]

array([ 1,  2,  3,  4, 12])

2-D

In [37]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [38]:
# first two rows
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [39]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])
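Mixing integer indices and slices affects the result's dimensionality: an integer drops its axis while a slice keeps it. Assigning to a slice expression writes into the original array. A short sketch:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index drops that axis; a slice keeps it
print(arr2d[1, :2].shape)    # (2,)   second row, first two columns
print(arr2d[1:2, :2].shape)  # (1, 2) same values, but still 2-D
print(arr2d[:, :1])          # first column as a (3, 1) array

# Assigning to a slice expression writes into the original array
arr2d[:2, 1:] = 0
print(arr2d)
```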

<img src='./resources/numpy slicing.png' alt='Two-dimensional array slicing' style="width: 400px;"/>


### Boolean Indexing

In [40]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4)
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [41]:
data

array([[-0.94369109, -0.36190453, -0.03090493,  0.88695893],
       [ 0.15238414, -0.23065609,  0.08577557, -0.16590819],
       [ 0.64180204,  0.01480726,  0.6992835 ,  1.05281868],
       [-0.1162269 , -0.19638401, -1.0644653 ,  0.86332538],
       [ 0.79408234,  0.30599424,  0.53313465,  0.00775279],
       [ 0.29416067,  0.35411864, -0.17484025,  0.162245  ],
       [-2.09075163, -0.74954936, -0.19222341,  1.59341156]])

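Suppose each name corresponds to a row in the `data` array and we want to select all the rows corresponding to the name `'Bob'`. Like arithmetic operations, comparisons with arrays (such as `==`) are vectorized, so comparing `names` with the string `'Bob'` yields a boolean array: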

In [42]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [43]:
data[names == 'Bob']

array([[-0.94369109, -0.36190453, -0.03090493,  0.88695893],
       [-0.1162269 , -0.19638401, -1.0644653 ,  0.86332538]])

In [44]:
# the boolean array selects rows; the slice selects columns
data[names == 'Bob', 2:]

array([[-0.03090493,  0.88695893],
       [-1.0644653 ,  0.86332538]])

In [45]:
# the boolean array selects rows; the integer selects a single column
data[names == 'Bob', 3]

array([0.88695893, 0.86332538])

In [46]:
data[data < 0] = 0
data

array([[0.        , 0.        , 0.        , 0.88695893],
       [0.15238414, 0.        , 0.08577557, 0.        ],
       [0.64180204, 0.01480726, 0.6992835 , 1.05281868],
       [0.        , 0.        , 0.        , 0.86332538],
       [0.79408234, 0.30599424, 0.53313465, 0.00775279],
       [0.29416067, 0.35411864, 0.        , 0.162245  ],
       [0.        , 0.        , 0.        , 1.59341156]])

In [47]:
data[names != 'Joe'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.15238414, 0.        , 0.08577557, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.29416067, 0.35411864, 0.        , 0.162245  ],
       [0.        , 0.        , 0.        , 1.59341156]])
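Boolean conditions can be combined with `&` (and) and `|` (or) and negated with `~`; the Python keywords `and`/`or` do not work element-wise on boolean arrays. Selecting data with a boolean array returns a copy, whereas assigning through a boolean mask (as above) writes into the original. The sketch below uses `np.arange` as a deterministic stand-in for the random `data`:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))  # deterministic stand-in for the random data

mask = (names == 'Bob') | (names == 'Will')  # parentheses are required
print(mask)              # [ True False  True  True  True False False]
print(data[mask][:, 0])  # first column of the selected rows: [ 0  8 12 16]

not_bob = ~(names == 'Bob')  # ~ negates a boolean array
print(data[not_bob].shape)   # (5, 4)

# Selecting with a boolean array returns a copy, not a view
selected = data[names == 'Bob']
selected[:] = -1
print(data[0, 0])  # 0: the original is unchanged
```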

### Fancy indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. Suppose we have an 8 × 4 array:

In [48]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [49]:
arr[[4, 3, 0, 6],:]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [50]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

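Passing multiple index arrays does something slightly different: it selects a one-dimensional array of elements corresponding to each tuple of indices: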

In [51]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [52]:
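# rows and columns are paired up: this selects elements (1, 0), (5, 3), (7, 1) and (2, 2)
# with 1-D index arrays the result is 1-D, however many dimensions arr has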
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

In [53]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
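The copying behaviour can be verified directly. The sketch also uses `np.ix_`, which turns the row and column index arrays into an open mesh so that they select a rectangular block, matching the two-step selection in In [53]:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# np.ix_ builds an "open mesh" so that the row and column index arrays select
# a rectangular block instead of being paired element by element
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(np.array_equal(block, arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]))  # True

# Fancy indexing copies: modifying the result leaves arr untouched
picked = arr[[1, 5]]
picked[:] = 0
print(arr[1])                         # [4 5 6 7]
print(np.shares_memory(arr, picked))  # False
```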

### Transposing Arrays and Swapping Axes
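Transposing is a special form of reshaping that returns a view on the underlying data without copying anything.

Arrays have the `transpose` method and also the special `T` attribute: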

In [54]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [55]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

This operation comes up often in matrix computations, for example when computing the inner matrix product X^T X with `np.dot`:

In [56]:
arr = np.random.randn(6, 3)
arr

array([[-0.18810736, -0.68371178, -1.51818464],
       [-1.87058629,  1.74382137,  0.03372371],
       [-0.54936902,  0.31310843, -0.34740168],
       [ 0.67755496, -0.58172426,  0.39476003],
       [ 1.27783315, -0.1052962 , -0.1582659 ],
       [ 0.22470409,  0.85954932, -1.00698553]])

In [57]:
np.dot(arr.T,arr)

array([[ 5.97871397, -3.64092608,  0.25231074],
       [-3.64092608,  4.6947271 , -0.09049596],
       [ 0.25231074, -0.09049596,  3.62161325]])
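Since Python 3.5 the `@` operator also performs matrix multiplication. The sketch below, which uses a seeded `np.random.default_rng` generator in place of `randn` so that it is reproducible, checks that both forms agree, that `arr.T` is a view, and that X^T X is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((6, 3))

xtx_dot = np.dot(arr.T, arr)
xtx_at = arr.T @ arr  # the @ operator performs matrix multiplication
print(np.allclose(xtx_dot, xtx_at))  # True
print(xtx_at.shape)                  # (3, 3)

# arr.T is a view: no data is copied
print(np.shares_memory(arr, arr.T))  # True
# X^T X is always symmetric
print(np.allclose(xtx_at, xtx_at.T))  # True
```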

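For higher-dimensional arrays, `transpose` accepts a tuple of axis numbers and permutes the axes accordingly. With `(1, 0, 2)` below, the first and second axes are swapped and the last axis is left unchanged: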

In [58]:
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [59]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [60]:
# transpose returned a view; arr itself is unchanged
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [61]:
arr.swapaxes(1, 2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
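`swapaxes` takes a pair of axis numbers and exchanges them, so `swapaxes(1, 2)` is the same permutation as `transpose((0, 2, 1))`. Like `transpose`, it returns a view, as this sketch checks:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)         # exchange axes 1 and 2
permuted = arr.transpose((0, 2, 1))  # same permutation written out explicitly
print(np.array_equal(swapped, permuted))  # True
print(swapped.shape)                      # (2, 4, 2)

# swapaxes returns a view of the same data
swapped[0, 0, 0] = 99
print(arr[0, 0, 0])  # 99
```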

## 4.2 Universal Functions: Fast Element-Wise Array Functions
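A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of ufuncs as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.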

In [62]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [63]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [64]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
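The "vectorized wrapper" description can be checked against the scalar functions in Python's `math` module. Every ufunc is an instance of `np.ufunc`, and its `nin` attribute reports how many inputs it takes (one for unary, two for binary ufuncs):

```python
import math
import numpy as np

arr = np.arange(10)

# np.sqrt and np.exp are ufuncs
print(isinstance(np.sqrt, np.ufunc), isinstance(np.exp, np.ufunc))  # True True
print(np.sqrt.nin, np.maximum.nin)  # 1 2: unary vs binary

# They compute the same thing as applying the scalar function element by element
loop_sqrt = np.array([math.sqrt(x) for x in arr])
loop_exp = np.array([math.exp(x) for x in arr])
print(np.allclose(np.sqrt(arr), loop_sqrt))  # True
print(np.allclose(np.exp(arr), loop_exp))    # True

# A ufunc also accepts a plain Python scalar
print(np.sqrt(16.0))  # 4.0
```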

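#### Binary ufuncs
Other ufuncs, such as `add` or `maximum`, take two arrays (hence binary ufuncs) and return a single array as the result: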

In [65]:
x = np.random.randn(8)
y = np.random.randn(8)
print(x)
print(y)

[-1.19691169  0.31259296 -0.30733137 -1.10315746  0.63931448  1.51134583
  0.02683957  0.72295825]
[ 0.66731431  0.38725361  0.37196765 -0.86739675 -0.14662643 -0.00551347
 -1.06370974  0.68969909]


In [66]:
np.maximum(x, y)

array([ 0.66731431,  0.38725361,  0.37196765, -0.86739675,  0.63931448,
        1.51134583,  0.02683957,  0.72295825])

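While not common, a ufunc can return multiple arrays. `modf` is one example: a vectorized version of Python's `math.modf`, it returns the fractional and integral parts of a floating-point array: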

In [67]:
arr = np.random.randn(7) * 5
arr

array([-5.95474698,  3.47944162, -8.32845296,  0.33807012, -5.69256855,
        2.8251946 ,  0.30842283])

In [68]:
remainder, whole_part = np.modf(arr)
print(remainder)
print(whole_part)

[-0.95474698  0.47944162 -0.32845296  0.33807012 -0.69256855  0.8251946
  0.30842283]
[-5.  3. -8.  0. -5.  2.  0.]
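A deterministic sketch of two multi-output ufuncs: `np.modf`, and `np.divmod`, the vectorized counterpart of Python's built-in `divmod`. The `nout` attribute reports how many arrays a ufunc returns:

```python
import numpy as np

values = np.array([3.5, -2.25])
fractional, integral = np.modf(values)
print(fractional)  # 0.5 and -0.25
print(integral)    # 3.0 and -2.0

# np.divmod is the vectorized counterpart of Python's built-in divmod
quotient, remainder = np.divmod(np.array([7, -7, 9]), 2)
print(quotient)   # 3, -4, 4
print(remainder)  # 1, 1, 1

# Both ufuncs report two outputs
print(np.modf.nout, np.divmod.nout)  # 2 2
```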


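Ufuncs accept an optional `out` argument that lets them write their result into an existing array, so an operation can be performed in place: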

In [69]:
arr

array([-5.95474698,  3.47944162, -8.32845296,  0.33807012, -5.69256855,
        2.8251946 ,  0.30842283])

In [71]:
# negative inputs have no real square root and become nan (NumPy emits a RuntimeWarning)
np.sqrt(arr, out=arr)

array([       nan, 1.36576943,        nan, 0.7625208 ,        nan,
       1.29646887, 0.74522365])

In [73]:
# the original arr has been modified in place
arr

array([       nan, 1.36576943,        nan, 0.7625208 ,        nan,
       1.29646887, 0.74522365])
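`out` can also point at a separate preallocated array, which avoids allocating a new result each time; the ufunc returns that same array. A small sketch with non-negative inputs so no `nan` appears:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([10.0, 20.0, 30.0])

result = np.empty(3)                 # preallocated output buffer
returned = np.add(a, b, out=result)  # results are written into result
print(returned is result)            # True: the ufunc returns the out array
print(result)                        # [11. 24. 39.]

np.sqrt(a, out=a)                    # in place: a now holds its own square roots
print(a)                             # [1. 2. 3.]
```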

#### Unary ufuncs


| Function | Description |
|----------|-------------|
| abs, fabs | Compute the absolute value element-wise for integer, floating-point, or complex values (fabs does not support complex values) |
| sqrt | Compute the square root of each element (equivalent to arr ** 0.5) |
| square | Compute the square of each element (equivalent to arr ** 2) |
| exp | Compute the exponent e^x of each element |
| log, log10, log2, log1p | Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively |
| sign | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
| ceil | Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number) |
| floor | Compute the floor of each element (i.e., the largest integer less than or equal to each element) |
| rint | Round elements to the nearest integer, preserving the dtype |
| modf | Return fractional and integral parts of array as separate arrays |
| isnan | Return boolean array indicating whether each value is NaN (Not a Number) |
| isfinite, isinf | Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively |
| cos, cosh, sin, sinh, tan, tanh | Regular and hyperbolic trigonometric functions |
| arccos, arccosh, arcsin, arcsinh, arctan, arctanh | Inverse trigonometric functions |
| logical_not | Compute truth value of not x element-wise (equivalent to ~arr) |
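A quick sketch of several unary ufuncs from the table on an array containing `nan` and `inf`. Note that `rint` rounds halves to the nearest even integer rather than always upward:

```python
import numpy as np

x = np.array([-2.5, 0.0, 1.2, np.nan, np.inf])

print(np.sign(x[:3]))   # -1, 0, 1 for the finite values
print(np.ceil(x[:3]))   # -2, 0, 2
print(np.floor(x[:3]))  # -3, 0, 1
print(np.isnan(x))      # True only for the NaN entry
print(np.isfinite(x))   # False for both NaN and inf

# rint rounds halves to the nearest even integer
print(np.rint(np.array([0.5, 1.5, 2.5])))  # 0, 2, 2

# abs works on complex values (returns the magnitude); fabs would raise a TypeError here
print(np.abs(np.array([3 + 4j])))  # 5.0
```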

#### Binary universal functions

| Function | Description |
|----------|-------------|
|add| Add corresponding elements in arrays|
|subtract| Subtract elements in second array from first array|
|multiply| Multiply array elements|
|divide, floor_divide| Divide or floor divide (truncating the remainder)|
|power| Raise elements in first array to powers indicated in second array|
|maximum, fmax| Element-wise maximum; fmax ignores NaN|
|minimum, fmin| Element-wise minimum; fmin ignores NaN|
|mod| Element-wise modulus (remainder of division)|
|copysign| Copy sign of values in second argument to values in first argument|
|greater, greater_equal, less, less_equal, equal, not_equal|Perform element-wise comparison, yielding boolean array (equivalent to infix operators >, >=, <, <=, ==, !=)|
|logical_and, logical_or, logical_xor|Compute element-wise truth value of logical operation (equivalent to infix operators &, \|, ^)|
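The NaN handling of `maximum` versus `fmax` is easy to miss, so here is a small sketch; it also exercises `copysign` and confirms that the `&` and `^` operators on boolean arrays match `logical_and` and `logical_xor`:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 0.0, np.nan])

print(np.maximum(x, y))  # NaN propagates: 2.0, nan, nan
print(np.fmax(x, y))     # NaN ignored when the other value is a number: 2.0, 0.0, 3.0

print(np.copysign(np.array([1.0, 2.0]), np.array([-1.0, 1.0])))  # -1.0, 2.0

# The infix operators map to the corresponding ufuncs for boolean arrays
a, b = np.array([True, False, True]), np.array([True, True, False])
print(np.array_equal(a & b, np.logical_and(a, b)))  # True
print(np.array_equal(a ^ b, np.logical_xor(a, b)))  # True
```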

## 4.3 Array-Oriented Programming with Arrays
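NumPy arrays let you express many kinds of data processing tasks as concise array expressions that would otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations are often one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in numerical computations.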
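As an illustration, the sketch below evaluates sqrt(x^2 + y^2) over a 1000 × 1000 grid of points with a single array expression, using `np.meshgrid` to build the coordinate matrices, and checks a small corner of the result against an explicit loop. The grid size and range are arbitrary choices for the example; no timing is claimed here:

```python
import numpy as np

points = np.linspace(-5, 5, 1000)     # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)  # two 1000 x 1000 coordinate matrices

# Vectorized: one array expression evaluates sqrt(x^2 + y^2) on the whole grid
z = np.sqrt(xs ** 2 + ys ** 2)

# Equivalent explicit loop over a small corner of the grid, for comparison
# (meshgrid puts the first argument along columns and the second along rows)
n = 5
z_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        z_loop[i, j] = (points[j] ** 2 + points[i] ** 2) ** 0.5

print(z.shape)                         # (1000, 1000)
print(np.allclose(z[:n, :n], z_loop))  # True
```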