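NumPy is an open-source Python extension library for scientific computing that gives Python high-performance array and matrix operations. It provides true multidimensional arrays together with a rich library of functions for working with them. Its common mathematical functions are vectorized, so they operate directly on whole arrays: loops that would otherwise run at the Python level are executed in C instead, which makes programs noticeably faster.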
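
To make the vectorization claim concrete, here is a minimal sketch that computes the same square roots twice, once with a Python-level loop and once with a single NumPy call. The variable names are illustrative only.

```python
import math
import numpy as np

values = np.arange(1, 6, dtype=float)   # [1., 2., 3., 4., 5.]

# Python-level loop: the interpreter handles one element at a time.
looped = np.array([math.sqrt(v) for v in values])

# Vectorized call: the loop over elements runs inside NumPy's compiled code.
vectorized = np.sqrt(values)

print(np.allclose(looped, vectorized))
```

Both approaches give the same numbers; the difference is where the loop runs, which is what matters for speed on large arrays.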

NumPy http://www.numpy.org/  

Pandas http://pandas.pydata.org/  

pandas relies heavily on NumPy arrays to implement its Series and DataFrame objects, and NumPy also supports slicing and vectorized operations, so we look at NumPy before learning pandas.

NumPy provides two fundamental objects:

+ An array object of arbitrary dimension (ndarray, n-dimensional array object)
+ A universal function object (ufunc, universal function object)
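
As a small illustration of the second object, the sketch below checks that `np.add` is a ufunc and applies it element by element to two arrays. The arrays `a` and `b` are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# np.add is a ufunc object; the + operator on arrays dispatches to it.
print(isinstance(np.add, np.ufunc))

total = np.add(a, b)                  # element-wise sum
print(total)
print(np.array_equal(total, a + b))
```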


In [1]:
import numpy as np

# ndarray
The core of NumPy is the "ndarray" (n-dimensional array) data structure. Its features:

+ Contiguous memory allocation
+ Vectorized operations
+ Boolean selection
+ Slicing (sliceability)
#### ndarray basics
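+ ndarray.ndim: the number of axes (dimensions) of the array. In NumPy this count is sometimes called the rank.
+ ndarray.shape: the dimensions of the array, given as a tuple of integers with the size of the array along each axis. For a matrix with n rows and m columns, shape is (n, m). The length of this tuple is the rank, that is, the ndim attribute.
+ ndarray.size: the total number of elements in the array, equal to the product of the entries of shape.
+ ndarray.dtype: an object describing the type of the elements. A dtype can be created or specified using standard Python types; NumPy also provides its own data types.
+ ndarray.itemsize: the size in bytes of each element. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of complex64 elements also has itemsize 8 (=64/8).
+ ndarray.data: the buffer holding the actual array elements. We normally do not need this attribute, because elements are accessed through indexing.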
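
The attributes above can be checked on arrays with an explicitly chosen dtype. This is a minimal sketch; the arrays `m` and `c` are illustrative and not part of the cells that follow.

```python
import numpy as np

m = np.zeros((2, 3), dtype=np.float64)
c = np.zeros(4, dtype=np.complex64)

print(m.ndim, m.shape, m.size)   # 2 (2, 3) 6
print(m.dtype, m.itemsize)       # float64 8
print(c.dtype, c.itemsize)       # complex64 8

# Total bytes of element storage is size * itemsize.
print(m.nbytes == m.size * m.itemsize)
```

Passing `dtype` explicitly avoids relying on the platform's default integer width, which is why integer arrays may report an itemsize of 4 on some systems and 8 on others.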


In [2]:
x = np.array([1,2,3,4,5]) # create an ndarray and inspect its attributes
x

array([1, 2, 3, 4, 5])

In [3]:
x.ndim

1

In [4]:
x.shape

(5,)

In [5]:
x.size

5

In [6]:
x.itemsize

4

In [7]:
y = np.array([[1,2,3,4,5],[6,7,8,9,10]]) # a list of lists gives a 2-D array
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [8]:
y.ndim

2

In [9]:
y.shape

(2, 5)

In [10]:
y.size

10

In [11]:
y.itemsize

4

In [13]:
x1 = np.array([1,2,3,4.6,5.0])
x1

array([ 1. ,  2. ,  3. ,  4.6,  5. ])

In [14]:
x2 = np.array([1]*5)
x2

array([1, 1, 1, 1, 1])

In [15]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [16]:
np.array(range(1,10,2))

array([1, 3, 5, 7, 9])

In [17]:
range?

In [25]:
np.linspace(0,1,3)

array([ 0. ,  0.5,  1. ])
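
`np.linspace` takes a number of samples rather than a step. The sketch below shows two of its keyword arguments, `endpoint` and `retstep`; the variable names are illustrative.

```python
import numpy as np

closed = np.linspace(0, 1, 5)                       # both endpoints included
half_open = np.linspace(0, 1, 5, endpoint=False)    # stop value excluded
samples, step = np.linspace(0, 1, 5, retstep=True)  # also return the spacing

print(closed)
print(half_open)
print(step)
```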

In [23]:
np.linspace?

### Chain Operation

In [27]:
x2 = np.arange(0,12).reshape(4,3)
x2

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [33]:
x3 = np.array(range(12)).reshape(4,3)
x3

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [35]:
np.size(x3,0) # number of rows

4

In [36]:
np.size(x3,1) # number of columns

3

In [37]:
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [40]:
np.ones((2,3,4),dtype = float)

array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]])

In [41]:
np.empty((5,6))

array([[  2.14126505e-284,   6.79038653e-313,   2.37663529e-312,
          2.05833592e-312,   2.41907520e-312,   2.56761491e-312],
       [  1.93101617e-312,   1.03977794e-312,   1.06099790e-312,
          1.08221785e-312,   1.10343781e-312,   1.12465777e-312],
       [  9.33678148e-313,   1.14587773e-312,   1.16709769e-312,
          1.18831764e-312,   1.20953760e-312,   1.03977794e-312],
       [  1.97345609e-312,   8.70018275e-313,   7.42698527e-313,
          2.22809558e-312,   2.46151512e-312,   6.79038654e-313],
       [  2.16443571e-312,   2.29175545e-312,   2.44029516e-312,
          2.12199580e-313,   2.55942694e-295,   1.94139779e-109]])
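
The values printed by `np.empty` are whatever happened to be in memory, because the function allocates the array without initializing it. A minimal sketch of the usual pattern, with `np.full` shown as an alternative that allocates and initializes in one step:

```python
import numpy as np

scratch = np.empty((2, 3))      # contents are arbitrary until assigned
scratch[:] = 7.0                # fill every element before reading

filled = np.full((2, 3), 7.0)   # allocate and initialize together
print(np.array_equal(scratch, filled))
```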

In [43]:
np.random.randint(0,10,3)

array([8, 1, 4])

In [44]:
np.random.randint?

In [45]:
np.random.random(3)

array([ 0.06061691,  0.89051858,  0.10264849])
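
The random results above change on every run. When reproducible numbers are needed, one option is a seeded generator from `np.random.default_rng`. This is a sketch, and the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

ints_a = rng_a.integers(0, 10, 3)   # three integers in [0, 10)
ints_b = rng_b.integers(0, 10, 3)
floats_a = rng_a.random(3)          # three floats in [0, 1)
floats_b = rng_b.random(3)

# Generators built from the same seed produce the same sequence.
print(np.array_equal(ints_a, ints_b), np.array_equal(floats_a, floats_b))
```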

### Boolean Selection

In [46]:
x

array([1, 2, 3, 4, 5])

In [47]:
x<2

array([ True, False, False, False, False], dtype=bool)

In [49]:
(x<2)|(x>4)

array([ True, False, False, False,  True], dtype=bool)

In [51]:
mask = x<3
mask

array([ True,  True, False, False, False], dtype=bool)

In [52]:
x[mask]   # select the elements where the mask is True

array([1, 2])

In [53]:
np.sum(x[mask])

3

In [54]:
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [56]:
x == y[0,]

array([ True,  True,  True,  True,  True], dtype=bool)

In [58]:
a1 = np.arange(9).reshape(3,3)
a2 = np.arange(9,0,-1).reshape(3,3)
a1<a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]], dtype=bool)
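
Boolean masks can also be combined with `&` and `~` and used on the left-hand side of an assignment. A minimal sketch, redefining `x` so that it runs on its own:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

inside = (x > 1) & (x < 5)        # each comparison needs its own parentheses
print(x[inside])                  # [2 3 4]
print(x[~inside])                 # [1 5]

clipped = x.copy()
clipped[clipped > 3] = 3          # assign through a mask
print(clipped)                    # [1 2 3 3 3]
print(np.count_nonzero(inside))   # 3
```

The parentheses matter because `&` and `|` bind more tightly than the comparison operators in Python.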

In [57]:
np.arange?

### Slicing: start:end:step

In [65]:
x

array([1, 2, 3, 4, 5])

In [61]:
# slicing one-dimensional data
x[3:5]

array([4, 5])

In [62]:
x[::-1]

array([5, 4, 3, 2, 1])

In [63]:
x[[0,2]]

array([1, 3])

In [66]:
# slicing multi-dimensional data
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [70]:
y[1:,1:3]

array([[7, 8]])

In [77]:
y[[0,1],2:]

array([[ 3,  4,  5],
       [ 8,  9, 10]])
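
The two indexing styles used above behave differently with respect to memory: a basic slice is a view onto the original array, while indexing with a list of integers produces a copy. The sketch below checks this with `np.shares_memory`; `window` and `picked` are illustrative names.

```python
import numpy as np

y = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

window = y[1:, 1:3]       # basic slice: a view onto y
picked = y[[0, 1], 2:]    # integer-list index: a copy

print(np.shares_memory(y, window))
print(np.shares_memory(y, picked))

window[0, 0] = -7         # writes through to y[1, 1]
picked[0, 0] = -3         # leaves y untouched
print(y)
```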

### reshape

In [79]:
z = np.arange(0,9).reshape(3,3)
z

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [81]:
z.reshape(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [82]:
z.ravel() # flatten to one dimension

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [87]:
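# z.reshape() and z.ravel() return views that share memory with z; z.flatten() returns a copy
reshaped = z.reshape(np.size(z))
raveled = z.ravel()
flattened = z.flatten()
reshaped[2] = 1000
raveled[5] = 2000   # writing through the raveled view changes z
flattened[0] = 500  # writing to the flattened copy does not change z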
z

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

In [90]:
flattened.shape=(3,3)
flattened

array([[ 500,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])
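
Whether a flattened result shares memory with the original can be checked directly with `np.shares_memory`. This sketch also shows the case where `ravel` has to fall back to a copy, using a transposed array `zt` introduced only for the example.

```python
import numpy as np

z = np.arange(9).reshape(3, 3)

print(np.shares_memory(z, z.reshape(9)))   # True: a view
print(np.shares_memory(z, z.ravel()))      # True: a view, since z is contiguous
print(np.shares_memory(z, z.flatten()))    # False: always a copy

# The transpose is not C-contiguous, so ravel must copy here.
zt = z.T
print(np.shares_memory(zt, zt.ravel()))    # False
```

So `ravel` returns a view only when it can; code that must never modify the original should use `flatten` or an explicit `copy()`.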

### Combining arrays

In [97]:
a = np.arange(0,9).reshape(3,3)
b = a*10
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [96]:
b

array([[ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])

In [99]:
np.vstack((a,b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])

In [101]:
np.hstack((a,b))

array([[ 0,  1,  2,  0, 10, 20],
       [ 3,  4,  5, 30, 40, 50],
       [ 6,  7,  8, 60, 70, 80]])

In [102]:
np.concatenate((a,b),axis=0)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])

In [103]:
np.concatenate((a,b),axis=1)

array([[ 0,  1,  2,  0, 10, 20],
       [ 3,  4,  5, 30, 40, 50],
       [ 6,  7,  8, 60, 70, 80]])

In [104]:
np.dstack((a,b))

array([[[ 0,  0],
        [ 1, 10],
        [ 2, 20]],

       [[ 3, 30],
        [ 4, 40],
        [ 5, 50]],

       [[ 6, 60],
        [ 7, 70],
        [ 8, 80]]])

In [105]:
one_d_a = np.arange(5)
one_d_b = (one_d_a + 1) * 10
np.column_stack((one_d_a, one_d_b))

array([[ 0, 10],
       [ 1, 20],
       [ 2, 30],
       [ 3, 40],
       [ 4, 50]])

In [106]:
np.row_stack((one_d_a, one_d_b))  # alias of np.vstack; deprecated since NumPy 2.0

array([[ 0,  1,  2,  3,  4],
       [10, 20, 30, 40, 50]])
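
For 2-D inputs, the stacking helpers above are shorthand for joining along a particular axis. The sketch below checks those equivalences for the same `a` and `b`, redefined here so the block runs on its own.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = a * 10

# vstack joins along axis 0, hstack along axis 1 (for 2-D arrays).
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))

# dstack adds a new third axis, which matches np.stack with axis=2 for 2-D inputs.
print(np.array_equal(np.dstack((a, b)), np.stack((a, b), axis=2)))
print(np.dstack((a, b)).shape)   # (3, 3, 2)
```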

### Universal functions

In [107]:
m = np.arange(10, 19).reshape(3, 3)
print(m)
print("{0} min of the entire matrix".format(m.min()))
print("{0} max of entire matrix".format(m.max()))
print("{0} position of the min value".format(m.argmin()))
print("{0} position of the max value".format(m.argmax()))
print("{0} mins down each column".format(m.min(axis = 0)))
print("{0} mins across each row".format(m.min(axis = 1)))
print("{0} maxs down each column".format(m.max(axis = 0)))
print("{0} maxs across each row".format(m.max(axis = 1)))

[[10 11 12]
 [13 14 15]
 [16 17 18]]
10 min of the entire matrix
18 max of entire matrix
0 position of the min value
8 position of the max value
[10 11 12] mins down each column
[10 13 16] mins across each row
[16 17 18] maxs down each column
[12 15 18] maxs across each row


In [108]:
a = np.arange(1,10)
a.mean(), a.std(), a.var()

(5.0, 2.5819888974716112, 6.666666666666667)
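
The statistics above are population values (divisor n), and the standard deviation is the square root of the variance. The sketch below checks that relationship, shows the `ddof=1` option for the sample variance, and applies `mean` along each axis of a 3x3 reshape. The name `grid` is illustrative.

```python
import numpy as np

a = np.arange(1, 10)

print(np.isclose(a.std() ** 2, a.var()))   # std is the square root of var

sample_var = a.var(ddof=1)                 # divide by n - 1 instead of n
print(sample_var)                          # 7.5

grid = a.reshape(3, 3)
print(grid.mean(axis=0))                   # column means: [4. 5. 6.]
print(grid.mean(axis=1))                   # row means:    [2. 5. 8.]
```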