NumPy http://www.numpy.org/

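NumPy is an open-source Python extension library for scientific computing. It gives Python high-performance array and matrix operations. NumPy brings true multidimensional arrays to Python, along with a rich library of functions for working with them. Its common mathematical functions are vectorized, so they operate directly on whole arrays: loops that would otherwise run at the Python level are moved into compiled C code, which makes programs markedly faster.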


Pandas http://pandas.pydata.org/  

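The pandas library relies heavily on NumPy arrays to implement its Series and DataFrame objects, and NumPy also supports slicing and vectorized operations. So before learning pandas, let's get to know NumPy. NumPy provides two fundamental objects:

+ N-dimensional array object (ndarray)
+ Universal function object (ufunc)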
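
As a rough illustration of the two object kinds, the sketch below builds a small ndarray and applies one ufunc to it. The names `values` and `roots` are made up for this example.

```python
import numpy as np

# An ndarray stores elements of a single dtype.
values = np.array([1.0, 4.0, 9.0, 16.0])

# np.sqrt is a ufunc: it is applied to every element without a Python loop.
roots = np.sqrt(values)

print(type(values).__name__)   # ndarray
print(type(np.sqrt).__name__)  # ufunc
print(roots)                   # [1. 2. 3. 4.]
```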

In [1]:
import numpy as np

In [4]:
def squares(values):
    result = []
    for v in values:
        result.append(v * v)
    return result

to_square = range(10000)
%timeit squares(to_square)

983 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [3]:
array_to_square = np.arange(0, 10000)
# vectorized operation
%timeit array_to_square ** 2

3.92 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


# ndarray

The core of NumPy is the "ndarray" (N-dimensional array) data structure. Its key features:
+ Contiguous memory allocation
+ Vectorized operations
+ Boolean selection
+ Slicing (sliceability)

## ndarray basics

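+ ``ndarray.ndim`` The number of axes of the array. In the Python world, the number of axes is called the rank.
+ ``ndarray.shape`` The dimensions of the array: a tuple of integers giving the size of the array along each axis. A matrix with n rows and m columns has shape (n, m); for example, a matrix with 2 rows and 3 columns has shape (2, 3). The length of this tuple is the rank, that is, the ndim attribute.
+ ``ndarray.size`` The total number of elements in the array, equal to the product of the elements of shape.
+ ``ndarray.dtype`` An object describing the type of the elements in the array. A dtype can be created or specified using standard Python types; NumPy also provides its own data types.
+ ``ndarray.itemsize`` The size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of int32 elements has itemsize 4 (=32/8).
+ ``ndarray.data`` The buffer containing the actual elements of the array. We normally do not need this attribute, because we access the elements through indexing.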
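
The attributes listed above can be checked on a small array. This is a minimal sketch that assumes a hypothetical 2-row, 3-column array `m`; the `int32` dtype is passed explicitly so that `itemsize` does not depend on the platform's default integer type.

```python
import numpy as np

# dtype is given explicitly so itemsize is the same on every platform.
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(m.ndim)      # 2
print(m.shape)     # (2, 3)
print(m.size)      # 6
print(m.dtype)     # int32
print(m.itemsize)  # 4
print(m.astype(np.float64).itemsize)  # 8
```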

In [5]:
x=np.array([1,2,3,4,5])
x

array([1, 2, 3, 4, 5])

In [6]:
type(x)

numpy.ndarray

In [7]:
x.ndim

1

In [8]:
x.shape

(5,)

In [9]:
x.size

5

In [10]:
x.itemsize

8

In [11]:
x.data

<memory at 0x10ccb94c8>

In [12]:
y=np.array([[1,2,3,4,5],[6,7,8,9,10]])
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [13]:
y.ndim

2

In [14]:
y.size

10

In [15]:
y.itemsize

8

In [16]:
y.shape

(2, 5)

In [17]:
x1 = np.array([1, 2, 3, 4.0, 5.0])
x1.dtype,x.dtype

(dtype('float64'), dtype('int64'))

### Creating an ndarray

In [18]:
x2=np.array([1]*5)
x2

array([1, 1, 1, 1, 1])

In [19]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
range?

Init signature: range(self, /, *args, **kwargs)
Docstring:
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type


In [22]:
np.array(range(0,10,2))

array([0, 2, 4, 6, 8])

In [23]:
np.array(range(10,0,-1))

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [26]:
np.linspace(0,10,5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [27]:
np.linspace?

Signature: np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Docstring:
Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

Parameters
----------
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples to generate. Default is 50. M
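
The `endpoint` and `retstep` parameters from the signature above change what `np.linspace` returns. A small sketch with illustrative variable names:

```python
import numpy as np

closed = np.linspace(0, 10, 5)                       # stop is included
half_open = np.linspace(0, 10, 5, endpoint=False)    # stop is excluded
samples, step = np.linspace(0, 10, 5, retstep=True)  # also return the spacing

print(closed)     # [ 0.   2.5  5.   7.5 10. ]
print(half_open)  # [0. 2. 4. 6. 8.]
print(step)       # 2.5
```

Note that the spacing changes from 2.5 to 2.0 when `endpoint=False`, because the same number of samples now covers an interval that stops short of `stop`.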

In [28]:
# vectorized operations
print(x)
print(x1)
x+x1

[1 2 3 4 5]
[1. 2. 3. 4. 5.]


array([ 2.,  4.,  6.,  8., 10.])

In [29]:
x2 = np.arange(0, 12).reshape(4, 3) #chain operation
x2

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [30]:
np.size(x2)

12

In [31]:
np.size(x2,0)

4

In [32]:
np.size(x2,1)

3

In [34]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [36]:
np.ones((2,3,4),dtype=int)

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [37]:
np.empty((5,6))

array([[ 2.10077583e-312,  1.15998412e-028,  2.44171989e+232,
         8.00801729e+159,  1.21359026e+132,  8.01768645e-096],
       [ 6.12743486e-154,  6.14099335e-071,  1.05132387e-153,
         6.01391519e-154,  8.12549826e-096,  4.57669057e-072],
       [ 1.81154498e-152,  1.20336039e+132,  8.01768645e-096,
         6.12743486e-154,  6.14099335e-071,  1.05132387e-153],
       [ 6.01391519e-154,  8.12549826e-096,  4.57669057e-072,
         4.56295599e-144,  2.86530729e+161,  8.93176633e+271],
       [ 4.98131536e+151,  6.35296669e-062,  2.00389942e+000,
         2.00389942e+000, -1.29074143e-231,  3.11109010e+231]])
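
The values printed above are whatever happened to be in memory: `np.empty` allocates the array without initializing it. A short sketch of two ways to get a defined fill value instead (the name `buf` and the value 7.0 are arbitrary choices for illustration):

```python
import numpy as np

buf = np.empty((2, 3))       # contents are arbitrary until assigned
buf.fill(7.0)                # overwrite every element in place

same = np.full((2, 3), 7.0)  # allocate and fill in one call

print(buf)
print(np.array_equal(buf, same))  # True
```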

In [42]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [43]:
np.random.randint(0,10,3)

array([3, 5, 1])

In [44]:
np.random.random(10)

array([0.09071368, 0.52538488, 0.60365084, 0.18172769, 0.93459968,
       0.06816214, 0.9920622 , 0.9114329 , 0.27830929, 0.79485533])

In [45]:
np.random?

Type:        module
String form: <module 'numpy.random' from '/anaconda3/lib/python3.6/site-packages/numpy/random/__init__.py'>
File:        /anaconda3/lib/python3.6/site-packages/numpy/random/__init__.py
Docstring:
Random Number Generation

Utility functions
random_sample        Uniformly distributed floats over ``[0, 1)``.
random               Alias for `random_sample`.
bytes                Uniformly distributed random bytes.
random_integers      Uniformly distributed integers in a given range.
permutation          Randomly permute a sequence / generate a random sequence.
shuffle              Randomly permute a sequence in place.
seed                 Seed the random number generator.
choice               Random sample from 1-D array.


Compatibility functions
rand                 Uniformly distributed values.
randn                Normally distributed values.
ranf                 Uniformly distributed floating point numbers.
randint       
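
The `seed` function listed in the help text makes random draws reproducible. A minimal sketch using the same legacy `np.random` functions as the cells above; the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                    # fix the global generator state
first = np.random.randint(0, 10, 3)

np.random.seed(42)                    # same seed, same state
second = np.random.randint(0, 10, 3)

print(np.array_equal(first, second))  # True
```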

# Selecting array elements

In [46]:
x

array([1, 2, 3, 4, 5])

In [47]:
x[0],x[3]

(1, 4)

In [48]:
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [49]:
y[1,3]

9

In [50]:
y[0,]

array([1, 2, 3, 4, 5])

In [51]:
y[:,1]

array([2, 7])

In [52]:
y[:,1:4]

array([[2, 3, 4],
       [7, 8, 9]])

# Boolean selection

In [42]:
x

array([1, 2, 3, 4, 5])

In [43]:
x<2

array([ True, False, False, False, False], dtype=bool)

In [44]:
x<2 or x>4

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [45]:
(x<2) | (x>4)

array([ True, False, False, False,  True], dtype=bool)
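
Python's `or` and `and` ask for a single truth value, which an array with several elements does not have, hence the error above. The element-wise operators are `|`, `&` and `~`. The parentheses are required because these operators bind more tightly than comparisons. A brief sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

either = (x < 2) | (x > 4)   # element-wise OR
both = (x > 1) & (x < 5)     # element-wise AND
negated = ~(x < 3)           # element-wise NOT

print(either)   # [ True False False False  True]
print(both)     # [False  True  True  True False]
print(negated)  # [False False  True  True  True]
print(np.array_equal(either, np.logical_or(x < 2, x > 4)))  # True
```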

In [46]:
mask=x<3
mask

array([ True,  True, False, False, False], dtype=bool)

In [47]:
x[mask]

array([1, 2])

In [48]:
np.sum(x<3)

2

In [49]:
y

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [50]:
x==y[0,]

array([ True,  True,  True,  True,  True], dtype=bool)

In [51]:
a1 = np.arange(9).reshape(3, 3)
a2 = np.arange(9, 0 , -1).reshape(3, 3)
a1 < a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]], dtype=bool)
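
A boolean mask can also be used to assign values, and `np.where` and `np.nonzero` build on the same idea. The sketch below works on a copy so the original array is left alone; the names `clipped` and `labels` are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

clipped = x.copy()               # copy so x itself stays unchanged
clipped[clipped < 3] = 0         # assign through a boolean mask
print(clipped)                   # [0 0 3 4 5]

labels = np.where(x < 3, -1, x)  # -1 where the mask is True, else the value of x
print(labels)                    # [-1 -1  3  4  5]

print(np.nonzero(x < 3)[0])      # [0 1], the positions where the mask is True
```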

# slice

Slice syntax: start:end:step (end is exclusive)

In [52]:
x=np.arange(0,10)
x[3:9]

array([3, 4, 5, 6, 7, 8])

In [53]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [54]:
x[::2]

array([0, 2, 4, 6, 8])

In [55]:
x[[0,2]]

array([0, 2])

In [56]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [57]:
x[:6]

array([0, 1, 2, 3, 4, 5])

In [58]:
x[2:]

array([2, 3, 4, 5, 6, 7, 8, 9])

In [59]:
y=np.arange(0,16).reshape(4,4)
y

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [60]:
y[:,1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])

In [61]:
y[1:3,2:4]

array([[ 6,  7],
       [10, 11]])

In [62]:
y[[1,3],:]

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15]])
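
One detail worth checking is that a basic slice returns a view onto the same memory, while indexing with a list of integers returns a copy. A small sketch using `np.shares_memory` to confirm this:

```python
import numpy as np

y = np.arange(16).reshape(4, 4)

view = y[1:3, 2:4]      # basic slice: a view onto y
picked = y[[1, 3], :]   # integer-list index: an independent copy

print(np.shares_memory(y, view))    # True
print(np.shares_memory(y, picked))  # False

view[0, 0] = -1         # writes through to y[1, 2]
picked[0, 0] = -2       # y is not affected
print(y[1, 2], y[1, 0])  # -1 4
```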

# reshape

In [65]:
x=np.arange(0,9)
y=x.reshape(3,3)
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [85]:
y.reshape(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [66]:
y  # note that y itself is unchanged here

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [64]:
y.ravel()

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [67]:
reshaped = y.reshape(np.size(y))
raveled = y.ravel()

reshaped[2] = 1000
raveled[5] = 2000
y

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

In [68]:
reshaped

array([   0,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [111]:
y = np.arange(0, 9).reshape(3,3)
flattened = y.flatten()

flattened[0] = 1000
flattened

array([1000,    1,    2,    3,    4,    5,    6,    7,    8])

In [112]:
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [113]:
flattened.shape = (3, 3)
flattened

array([[1000,    1,    2],
       [   3,    4,    5],
       [   6,    7,    8]])

In [114]:
flattened.T

array([[1000,    3,    6],
       [   1,    4,    7],
       [   2,    5,    8]])
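
The behaviour seen above, where writing to `reshaped` and `raveled` changed `y` but writing to `flattened` did not, can be checked directly with `np.shares_memory`. A sketch under the assumption that `y` is a contiguous array as in the cells above:

```python
import numpy as np

y = np.arange(9).reshape(3, 3)

print(np.shares_memory(y, y.reshape(9)))  # True: reshape returns a view here
print(np.shares_memory(y, y.ravel()))     # True: ravel returns a view when it can
print(np.shares_memory(y, y.flatten()))   # False: flatten always copies

# The transpose is not contiguous in row-major order, so ravel must copy.
print(np.shares_memory(y, y.T.ravel()))   # False

print(y.reshape(-1).shape)                # (9,), -1 lets NumPy infer the length
```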

http://stackoverflow.com/questions/33116936/differences-between-x-ravel-and-x-reshapes0s1s2-when-number-of-axes-known
http://www.python-course.eu/matrix_arithmetic.php

# Combining arrays

In [70]:
a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [116]:
b

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [117]:
np.hstack((a, b))

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

In [118]:
np.vstack((a, b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [119]:
np.concatenate((a, b), axis = 0)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

In [120]:
np.concatenate((a, b), axis = 1)

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

In [71]:
np.dstack((a, b))

array([[[ 0, 10],
        [ 1, 20],
        [ 2, 30]],

       [[ 3, 40],
        [ 4, 50],
        [ 5, 60]],

       [[ 6, 70],
        [ 7, 80],
        [ 8, 90]]])

In [122]:
one_d_a = np.arange(5)
one_d_b = (one_d_a + 1) * 10
np.column_stack((one_d_a, one_d_b))

array([[ 0, 10],
       [ 1, 20],
       [ 2, 30],
       [ 3, 40],
       [ 4, 50]])

In [123]:
np.row_stack((one_d_a, one_d_b))

array([[ 0,  1,  2,  3,  4],
       [10, 20, 30, 40, 50]])
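
The outputs above suggest that `hstack` and `vstack` are shorthand for `concatenate` along a given axis for 2-D inputs. The sketch below checks that, and uses `np.hsplit` to undo a horizontal stack.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

h = np.hstack((a, b))
v = np.vstack((a, b))
print(np.array_equal(h, np.concatenate((a, b), axis=1)))  # True
print(np.array_equal(v, np.concatenate((a, b), axis=0)))  # True

left, right = np.hsplit(h, 2)   # split into two equal column blocks
print(np.array_equal(left, a), np.array_equal(right, b))  # True True

print(np.dstack((a, b)).shape)  # (3, 3, 2): stacked along a new third axis
```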

# matrix

+ Addition
+ Subtraction
+ Multiplication
+ Inner product
+ Outer product

## Arithmetic operations

+ `+`
+ `-`
+ `*`
+ `/`
+ `**`
+ `%`

In [91]:
x = np.array([1,5,2])
y = np.array([7,4,1])
x + y

array([8, 9, 3])

![](./img/vector_addition.png)

In [92]:
x * y

array([ 7, 20,  2])

In [93]:
x - y

array([-6,  1,  1])

![](./img/vector_subtraction.png)

In [94]:
x / y

array([ 0.14285714,  1.25      ,  2.        ])

In [95]:
x % y

array([1, 1, 0], dtype=int32)

# matrix

http://baike.baidu.com/item/%E5%90%91%E9%87%8F%E7%A7%AF?fromtitle=cross+product&type=syn

## scalar product/dot product

$$\vec{a}\cdot\vec{b}=|\vec{a}||\vec{b}|\cos\angle(\vec{a},\vec{b})$$

$$\vec{a}\cdot\vec{b}=a_{1}b_{1}+a_{2}b_{2}+a_{3}b_{3}$$

Some commonly used [LaTeX](http://www.mohu.org/info/symbols/symbols.htm) symbols

In [96]:
x = np.array([1,2,3])
y = np.array([-7,8,9])
np.dot(x,y)

36

In [98]:
dot = np.dot(x,y)
x_modulus = np.sqrt((x*x).sum())
y_modulus = np.sqrt((y*y).sum())
cos_angle = dot / x_modulus / y_modulus # cosine of angle between x and y
angle = np.arccos(cos_angle)
print("angle=",angle)
print(angle * 360 / 2 / np.pi) # angle in degrees
x_modulus*y_modulus*cos_angle

angle= 0.808233789011
46.3083849702


36.0
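
The same angle can be computed a little more compactly with `np.linalg.norm` and `np.degrees`. This is an alternative sketch, not a replacement for the step-by-step cell above; it also checks the component formula for the dot product.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([-7, 8, 9])

# Component form: a1*b1 + a2*b2 + a3*b3
print(np.dot(x, y) == (x * y).sum())   # True

# Geometric form solved for the angle
cos_angle = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
angle_deg = np.degrees(np.arccos(cos_angle))
print(round(float(angle_deg), 2))      # 46.31
```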

## matrix

In [99]:
x = np.array( ((2,3), (3, 5)) )
y = np.array( ((1,2), (5, -1)) )
print( x * y)
x = np.matrix( ((2,3), (3, 5)) )
y = np.matrix( ((1,2), (5, -1)) )
print(x * y)

[[ 2  6]
 [15 -5]]
[[17  1]
 [28  1]]


![](./img/matrix_product2.jpeg)

In [100]:
x = np.array( ((2,3), (3, 5)) )
y = np.matrix( ((1,2), (5, -1)) )
np.dot(x,y)

matrix([[17,  1],
        [28,  1]])

In [101]:
np.mat(x) * np.mat(y)

matrix([[17,  1],
        [28,  1]])
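
With plain ndarrays the matrix product can also be written with the `@` operator (Python 3.5 and later), which avoids converting to `np.matrix`; the NumPy documentation no longer recommends `np.matrix` for new code. A short sketch:

```python
import numpy as np

x = np.array([[2, 3], [3, 5]])
y = np.array([[1, 2], [5, -1]])

print(x * y)   # element-wise product
# [[ 2  6]
#  [15 -5]]

print(x @ y)   # matrix product, same values as np.dot(x, y)
# [[17  1]
#  [28  1]]
```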

Suppose four people, Tom, Mike, Jason, and Jack, bought three kinds of food:  
Tom: 100g A, 175g B, 210g C  
Mike: 90g A, 160g B, 150g C  
Jason: 200g A, 50g B, 100g C  
Jack: 120g A, 310g C (no B)  

A: 2.98 per 100g  
B: 3.90 per 100g  
C: 1.99 per 100g  

In [9]:
NumPersons = np.array([[100,175,210],[90,160,150],[200,50,100],[120,0,310]])
Price_per_100_g = np.array([2.98,3.90,1.99])
Price_in_Cent = np.dot(NumPersons,Price_per_100_g)
Price_in_Euro = Price_in_Cent/np.array([100,100,100,100])
Price_in_Euro

array([ 13.984,  11.907,   9.9  ,   9.745])
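
To see what the matrix-vector product is doing, the totals can be rebuilt from a per-item cost table using broadcasting. The names `grams`, `per_item` and `totals` are introduced here only for illustration.

```python
import numpy as np

grams = np.array([[100, 175, 210],
                  [90, 160, 150],
                  [200, 50, 100],
                  [120, 0, 310]])
price_per_100g = np.array([2.98, 3.90, 1.99])

# Each row is one person, each column one food; prices broadcast across rows.
per_item = grams * price_per_100g / 100
totals = per_item.sum(axis=1)   # one total per person

print(np.allclose(totals, grams.dot(price_per_100g) / 100))  # True
print(totals.round(3))
```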

## cross product

$$\vec{a}\times\vec{b}=|\vec{a}||\vec{b}|\sin\angle(\vec{a},\vec{b})\,\vec{n}$$

![](./img/Cross_product_vector.png)

Here $\vec{n}$ is a unit vector perpendicular to the plane spanned by $\vec{a}$ and $\vec{b}$; its direction is given by the right-hand rule.

In [102]:
x = np.array([0,0,1])
y = np.array([0,1,0])

np.cross(x,y)

array([-1,  0,  0])

In [103]:
np.cross(y,x)

array([1, 0, 0])
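
Two properties of the cross product are visible in the outputs above and can be checked numerically: the result is perpendicular to both inputs, and swapping the inputs flips the sign. A minimal sketch with the same two vectors:

```python
import numpy as np

a = np.array([0, 0, 1])
b = np.array([0, 1, 0])
c = np.cross(a, b)

print(np.dot(c, a), np.dot(c, b))          # 0 0, so c is perpendicular to a and b
print(np.array_equal(np.cross(b, a), -c))  # True, swapping the operands flips the sign
print(np.linalg.norm(c))                   # 1.0, which is |a||b|sin(90 degrees)
```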

# Universal functions

In [73]:
m = np.arange(10, 19).reshape(3, 3)
print (m)
print ("{0} min of the entire matrix".format(m.min()))
print ("{0} max of entire matrix".format(m.max()))
print ("{0} position of the min value".format(m.argmin()))
print ("{0} position of the max value".format(m.argmax()))
print ("{0} mins down each column".format(m.min(axis = 0)))
print ("{0} mins across each row".format(m.min(axis = 1)))
print ("{0} maxs down each column".format(m.max(axis = 0)))
print ("{0} maxs across each row".format(m.max(axis = 1)))

[[10 11 12]
 [13 14 15]
 [16 17 18]]
10 min of the entire matrix
18 max of entire matrix
0 position of the min value
8 position of the max value
[10 11 12] mins down each column
[10 13 16] mins across each row
[16 17 18] maxs down each column
[12 15 18] maxs across each row
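
Without an `axis` argument, `argmin` and `argmax` return a position in the flattened array, which is why the output above shows 0 and 8 rather than row and column pairs. `np.unravel_index` converts such a position back to coordinates. A short sketch:

```python
import numpy as np

m = np.arange(10, 19).reshape(3, 3)

flat_pos = m.argmax()                           # index into the flattened array
row, col = np.unravel_index(flat_pos, m.shape)  # convert to (row, column)
print(flat_pos, (int(row), int(col)))           # 8 (2, 2)

print(m.argmax(axis=0))   # [2 2 2], row index of the max in each column
print(m.argmax(axis=1))   # [2 2 2], column index of the max in each row
```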


In [74]:
a = np.arange(1,10)
a.mean(), a.std(), a.var()

(5.0, 2.5819888974716112, 6.666666666666667)
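
By default `std` and `var` divide by N (the population form). Passing `ddof=1` divides by N - 1 instead, which gives the sample estimate. A brief sketch on the same array:

```python
import numpy as np

a = np.arange(1, 10)

print(a.var())         # population variance, divides by N
print(a.var(ddof=1))   # 7.5, sample variance, divides by N - 1
print(np.isclose(a.std(), np.sqrt(a.var())))  # True
```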

In [128]:
a.sum(), a.prod()

(45, 362880)

In [75]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [129]:
a.cumsum(), a.cumprod()

(array([ 1,  3,  6, 10, 15, 21, 28, 36, 45], dtype=int32),
 array([     1,      2,      6,     24,    120,    720,   5040,  40320,
        362880], dtype=int32))

In [77]:
a = np.arange(10)
(a < 5).any() # is any element < 5? (use .all() to test every element)

True

In [76]:
a<5

array([ True,  True,  True,  True, False, False, False, False, False], dtype=bool)
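
The difference between `any` and `all` is easiest to see side by side. A minimal sketch, assuming `a = np.arange(10)`:

```python
import numpy as np

a = np.arange(10)
mask = a < 5

print(mask.any())   # True, at least one element is below 5
print(mask.all())   # False, not every element is below 5
print(mask.sum())   # 5, True counts as 1 when summed
```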

+ Array creation: arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
+ Manipulation: array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack
+ all, any, nonzero, where
+ argmax, argmin, argsort, max, min, ptp, searchsorted, sort
+ choose, compress, cumprod, cumsum, inner, fill, imag, prod, put, putmask, real, sum
+ Basic statistics: cov, mean, std, var
+ Basic linear algebra: cross, dot, outer, svd, vdot
