# NumPy Basics
  NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the `ndarray` object. This encapsulates n-dimensional arrays of homogeneous data types.

Differences from Python's built-in sequences:
1. NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
2. The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
3. NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
4. A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. 
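
The first two points can be checked directly. The following is a minimal sketch, with illustrative variable names, showing that "growing" an array really produces a new array and that mixed inputs are converted to one common dtype:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)            # returns a new array; a itself is unchanged
print(a.shape, b.shape)        # (3,) (4,)
print(b is a)                  # False

mixed = np.array([1, 2.5, 3])  # the ints are upcast so every element shares one dtype
print(mixed.dtype)             # float64
print(mixed.itemsize)          # 8 (bytes per element)
```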


- Official documentation: https://docs.scipy.org/doc/numpy/



## Section 1: Getting familiar with NumPy
1. Initializing an ndarray
2. Matrix multiplication
3. Data types an ndarray can store<br>
bool_,int_,intc,intp,int8,int16,int32,int64,uint8,uint16,uint32,uint64,float_,float16,float32,float64,complex_,complex64,complex128
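
As a small illustration of how the dtype determines the per-element storage size, here is a sketch that uses explicitly sized names (`np.int8`, `np.float32`, `np.complex128`). Sized names are used on the assumption that the code may run on NumPy 2.0 or later, where the `float_` and `complex_` aliases are no longer available:

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int8)
y = x.astype(np.float32)          # astype returns a converted copy
print(x.dtype, x.itemsize)        # int8 1
print(y.dtype, y.itemsize)        # float32 4

z = np.array([1 + 2j], dtype=np.complex128)
print(z.dtype, z.itemsize)        # complex128 16
```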

In [2]:
import numpy as np
score = np.array([[90,100,80],
                  [70,87,92],
                  [80,72,60],
                  [89,85,86],
                  [100,97,95]],dtype=np.int64)
ones = [1,1,1,1,1]
weight = np.array(ones)/len(ones)
res = np.dot(weight,score)
print(res,res.dtype)

[85.8 88.2 82.6] float64


### Array creation
1. Conversion from other Python structures (e.g., lists, tuples)
2. Intrinsic NumPy array creation functions (e.g., arange, ones, zeros, linspace, etc.)
3. Use of special library functions (e.g., random)
4. Reading arrays from disk, either from standard or custom formats
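
A short sketch of the items in this list that the cell below does not exercise: `arange`, `linspace`, and reading from a file. To stay self-contained it reads from an in-memory `io.StringIO` buffer rather than a real file on disk; that buffer and its contents are made up for illustration.

```python
import io
import numpy as np

evens = np.arange(0, 10, 2)          # start, stop (exclusive), step
print(evens)                         # [0 2 4 6 8]

grid = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced points, both ends included
print(grid)

# np.loadtxt accepts a file name or a file-like object
text = io.StringIO("1 2 3\n4 5 6\n")
table = np.loadtxt(text)
print(table.shape)                   # (2, 3)
```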

In [3]:
import numpy as np
print('-----       conversion      -------')
print(np.array([2,3,1,0]))
print(np.array(range(10)))
print('----- intrinsic numpy array -------')
print(np.zeros((2, 3)))
print(np.ones(10))
print(np.diag(np.ones(5)))
print('----- special library function -------')
print('//randint(1,30)')
print(np.random.randint(1,30))
print('//rand(4,3)')
print(np.random.rand(4,3))           # random floats drawn uniformly from [0, 1)
print('//choice(10,3)')
print(np.random.choice(10,3))


-----       conversion      -------
[2 3 1 0]
[0 1 2 3 4 5 6 7 8 9]
----- intrinsic numpy array -------
[[0. 0. 0.]
 [0. 0. 0.]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
----- special library function -------
//randint(1,30)
20
//rand(4,3)
[[0.83413407 0.58561419 0.46077176]
 [0.26359379 0.8424286  0.71321036]
 [0.96891165 0.08461212 0.7957113 ]
 [0.12360638 0.64701291 0.49151608]]
//choice(10,3)
[9 2 7]


### Indexing

In [5]:
import numpy as np
a = np.array([0,1,2,3,4,5,6,7,8,9])
print(type(a))
print(a[5])
print(a[0:3])
print(a[-1])
print(a[:])

<class 'numpy.ndarray'>
5
[0 1 2]
9
[0 1 2 3 4 5 6 7 8 9]


In [32]:
import numpy as np
data = np.array([0,1,2,3,4,5,6,7,8,9,10])
num_train = 8
num_val   = 3
mask = range(num_train)
data_train = data[mask]
print('{}'.format("train data:"),data_train)
mask = range(num_train,num_train+num_val)     # range(start,stop): [start,stop)
data_val = data[mask]
print('{}'.format("val data:"),data_val)

train data: [0 1 2 3 4 5 6 7]
val data: [ 8  9 10]


### np.reshape

In [39]:
a = np.array([
              [[1,2],[3,4]],
              [[5,6],[7,8]],
              [[9,10],[11,12]]
             ])
print(a.shape)
b = np.reshape(a,(a.shape[0],-1))
print(b)

(3, 2, 2)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


### expand_dims

In [3]:
X_train = np.array([1,2,3])
print(X_train)
X_train = np.expand_dims(X_train, axis=-1)
print(X_train)

[1 2 3]
[[1]
 [2]
 [3]]


### random.choice
- param: replace

replace=True: the same element of a may be drawn more than once.

replace=False: each element of a can be drawn at most once.
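
A minimal sketch of the difference, using a fixed seed (chosen arbitrarily) so the run is repeatable. With `replace=False`, drawing as many samples as there are elements yields a permutation, and asking for more raises a `ValueError`:

```python
import numpy as np

np.random.seed(0)
with_rep = np.random.choice(5, 5, replace=True)      # duplicates are allowed
without_rep = np.random.choice(5, 5, replace=False)  # every value appears exactly once
print(with_rep)
print(without_rep)
print(sorted(without_rep.tolist()))                  # [0, 1, 2, 3, 4]

try:
    np.random.choice(5, 6, replace=False)            # more samples than elements
except ValueError as err:
    print("ValueError:", err)
```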

In [68]:
import numpy as np
print('-----       vector random choice      -------')
data = np.array([1,2,3,4,5,6,7,8,9,10])
mask = np.random.choice(10,3)
print(mask)
print(data[mask])
print('-----       matrix random choice      -------')
data = np.array([[0,1,1],
                [0,2,2],
                [3,3,0],
                [4,0,4]])
mask = np.random.choice(data.shape[0],2)
print(mask)
print(data[mask])


-----       vector random choice      -------
[1 7 4]
[2 8 5]
-----       matrix random choice      -------
[1 2]
[[0 2 2]
 [3 3 0]]
-----       random choice:param replace     -------
[1 5 3]
[2 6 4]


### Matrix operations: add/sub

In [52]:
import numpy as np
print('-----       number + matrix      -------')
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
b = 4
print(a+b)
print('-----       vector + matrix (added to every row)     -------')
b = np.array([1,1,1])
print(a+b)
print('-----       vector + matrix (added to every column)     -------')
b = np.array([1,1,1,1]).reshape(4,1)
print(a+b)

-----       number + matrix      -------
[[4 5 5]
 [4 6 6]
 [7 7 4]
 [8 4 8]]
-----       vector + matrix (added to every row)     -------
[[1 2 2]
 [1 3 3]
 [4 4 1]
 [5 1 5]]
-----       vector + matrix (added to every column)     -------
[[1 2 2]
 [1 3 3]
 [4 4 1]
 [5 1 5]]
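
The two vector cases above give identical results only because every entry of the vector is 1. The sketch below uses distinct values (made up for illustration) so the broadcasting direction is visible: a shape `(3,)` vector is stretched across rows, a shape `(4, 1)` vector across columns, and a shape `(4,)` vector cannot be broadcast against a `(4, 3)` matrix at all.

```python
import numpy as np

a = np.zeros((4, 3))
row = np.array([1, 2, 3])                        # shape (3,): added to every row
col = np.array([10, 20, 30, 40]).reshape(4, 1)   # shape (4, 1): added to every column

total = a + row + col
print(total.shape)                               # (4, 3)
print(total)

try:
    a + np.array([1, 2, 3, 4])                   # shape (4,) vs trailing axis of length 3
except ValueError as err:
    print("ValueError:", err)
```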


### Matrix operations: mul/div

In [1]:
import numpy as np
print('-----       vector * matrix      -------')
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
b = np.array([2,3,4,5]).reshape(4,1)
print(a*b)

-----       vector * matrix      -------
[[ 0  2  2]
 [ 0  6  6]
 [12 12  0]
 [20  0 20]]


In [5]:
a = np.array([[1,1,1],
              [2,2,2],
              [3,3,3]])
m = a.dot(a)
print(m)
print(a*a)
b = np.array([5,5,5])
print(a*b)
print(a+b)

[[ 6  6  6]
 [12 12 12]
 [18 18 18]]
[[1 1 1]
 [4 4 4]
 [9 9 9]]
[[ 5  5  5]
 [10 10 10]
 [15 15 15]]
[[6 6 6]
 [7 7 7]
 [8 8 8]]


## Section 2


### Mask
1. Extract one value from each row of a 2-D matrix.
2. Modify the values at specific positions.

In [56]:
import numpy as np
a = np.array([[0,2,3],
              [0,5,6],
              [8,7,0],
              [7,0,1]])
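mask_col = np.array([0,0,2,1])      # column index of the element to extract from each row
mask_row = np.arange(4)             # row index of each element to extract
mask     = (mask_row,mask_col)      # use a tuple: recent NumPy no longer treats a list of index arrays as a multidimensional index
res = a[mask]
print(res)

a[mask] = 1                         # the same mask can be used to assign to those positions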
print(a)

[0 0 0 0]
[[1 2 3]
 [1 5 6]
 [8 7 1]
 [7 1 1]]


### Mask
1. Reassign every position that satisfies a condition.

In [18]:
import numpy as np
a = np.array([[0,2,3],
              [0,5,6],
              [8,7,0],
              [7,0,1]])
mask = a > 2
print(mask)
a[mask] = 0
print(a)

[[False False  True]
 [False  True  True]
 [ True  True False]
 [ True False False]]
[[0 2 0]
 [0 0 0]
 [0 0 0]
 [0 0 1]]


### sum

In [55]:
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
b = np.sum(a)
print(b)
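c = np.sum(a,1)    # sum across the columns of each row; result length equals the number of rows
print(c)
d = np.sum(a,0)    # sum down the rows of each column; result length equals the number of columns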
print(d)

20
[2 4 6 8]
[7 6 7]


### argmax
numpy.argmax(a, axis=None, out=None)

Parameters:	

- a : array_like Input array.
- axis : int, optional By default, the index is into the flattened array, otherwise along the specified axis.

- out : array, optional. If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.

Returns:

- index_array : ndarray of ints. Array of indices into the array. It has the same shape as a.shape with the dimension along axis removed.

In [76]:
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
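index = np.argmax(a,axis=0)  # compare along axis 0: row index of the maximum in each column
print(index)
print(a[index,np.arange(3)])   # the maximum values themselves

index = np.argmax(a,axis=1)  # compare along axis 1: column index of the maximum in each row
print(index)


[3 2 3]
[4 3 4]
[1 1 0 0]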


### maximum

In [2]:
import numpy as np
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
print(np.maximum(2,a))
print(a[a>0])
print(a>0)

[[2 2 2]
 [2 2 2]
 [3 3 2]
 [4 2 4]]
[1 1 2 2 3 3 4 4]
[[False  True  True]
 [False  True  True]
 [ True  True False]
 [ True False  True]]


### max

In [17]:
import numpy as np
a = np.array([[0,1,1],
              [0,2,2],
              [3,3,0],
              [4,0,4]])
print(np.max(a))
print(np.max(a,axis=1)) # along axis 1: maximum of each row (compares across columns)
print(np.max(a,axis=0)) # along axis 0: maximum of each column (compares across rows)

4
[1 2 3 4]
[4 3 4]


### Diagonal matrices

In [None]:
# Build a diagonal matrix from a vector
a = np.array([1,2,3,4])
b = np.diag(a)
print(b)
# Extract the diagonal elements from a matrix
c = np.diag(b)
print(c)


### Stacking and appending arrays
- hstack: append extra columns (stack horizontally)
- vstack: append extra rows (stack vertically)
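
A minimal 2-D sketch of both calls, with small made-up arrays, showing which dimension grows in each case:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])                  # shape (2, 2)
extra_cols = np.array([[5],
                       [6]])            # shape (2, 1): same number of rows as a
extra_rows = np.array([[7, 8]])         # shape (1, 2): same number of columns as a

h = np.hstack((a, extra_cols))          # columns are appended
v = np.vstack((a, extra_rows))          # rows are appended
print(h.shape)                          # (2, 3)
print(v.shape)                          # (3, 2)
```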

In [12]:
import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[1,2,3],[4,5,6],[7,8,9]])

mask = np.random.choice(3,3,replace=False)
print(mask)
c = np.vstack((a[mask],b[mask]))
print(c)

[1 0 2]
[[4 5 6]
 [1 2 3]
 [7 8 9]
 [4 5 6]
 [1 2 3]
 [7 8 9]]


In [17]:
import numpy as np
a = np.zeros(10)
b = np.ones(10)
c = np.hstack((a,b))
print(c)
print(c==1)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[False False False False False False False False False False  True  True
  True  True  True  True  True  True  True  True]


## Finding the indices of elements that satisfy a condition


### Index of the maximum in each row/column
numpy.argmax(a, axis=None, out=None)

Returns the indices of the maximum values along an axis.

In [6]:
import numpy as np
a = np.array([[1,2,3,4],
              [5,6,8,8],
              [9,18,11,12]])
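# Without an axis, the matrix is flattened and the index into the flattened array is returned
print(np.argmax(a)) 
# Compare along the first axis (across rows): row index of the maximum in each column
idxs = np.argmax(a,0)  
print(idxs)
# Compare along the second axis (across columns): column index of the maximum in each row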
idxs = np.argmax(a,1)
print(idxs)

9
[2 2 2 2]
[3 2 1]
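
The flat index printed first (9) can be converted back to a (row, column) pair with `np.unravel_index`. A minimal sketch, reusing the same matrix:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 8, 8],
              [9, 18, 11, 12]])

flat = np.argmax(a)                        # index into the flattened array
row, col = np.unravel_index(flat, a.shape)
print(flat, row, col)                      # 9 2 1
print(a[row, col])                         # 18
```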


## Exercises

### Expanding dimensions

- Using np.reshape()

In [5]:
import numpy as np
a = np.array([[1,2],
              [3,4]])
print(a.shape)
#b = np.reshape(a,(1,a.shape[0],a.shape[1]))
b = np.reshape(a,(1,)+a.shape)
print(b)
print(b.shape)

(2, 2)
[[[1 2]
  [3 4]]]
(1, 2, 2)


- Using np.newaxis

In [13]:
import numpy as np
a = np.array([[1,2],
              [3,4]])
print(a.shape)
#b = a[np.newaxis,:,:]  #ok
#b = a[np.newaxis,:]    #ok
b = a[np.newaxis]
print(b)
print(b.shape)

(2, 2)
[[[1 2]
  [3 4]]]
(1, 2, 2)


In [14]:
b = a[:,np.newaxis,:]
print(b)
print(b.shape)

[[[1 2]]

 [[3 4]]]
(2, 1, 2)


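# random
The np.random module

### numpy.random.seed()
Seeding with seed() means that successive random calls within one run still produce different results, while re-running the program reproduces exactly the same sequence as the previous run.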
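
A minimal sketch of that behaviour inside a single script: re-seeding with the same value restarts the sequence, so the first draw is reproduced, while consecutive draws without re-seeding differ.

```python
import numpy as np

np.random.seed(231)
first = np.random.randn(3)
second = np.random.randn(3)      # continues the sequence, so it differs from first

np.random.seed(231)              # restart the sequence with the same seed
again = np.random.randn(3)

print(np.array_equal(first, again))    # True
print(np.array_equal(first, second))   # False
```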

In [4]:
import numpy as np
np.random.seed(231)
for i in range(3):
    x = np.random.randn(2,3)
    print("--------------")
    print(x)

--------------
[[ 0.41794341  1.39710028 -1.78590431]
 [-0.70882773 -0.07472532 -0.77501677]]
--------------
[[-0.1497979   1.86172902 -1.4255293 ]
 [-0.3763567  -0.34227539  0.29490764]]
--------------
[[-0.83732373  0.95218767  1.32931659]
 [ 0.52465245 -0.14809998  0.88953195]]


In [20]:
c = np.random.randint(1,5,size=(3,5,5))
print(c.shape)
print(c)
print(c[1,:,:])   # the second 5x5 block (index 1 along axis 0)

(3, 5, 5)
[[[1 4 3 3 2]
  [3 4 2 4 2]
  [2 1 1 1 2]
  [4 3 3 1 3]
  [2 2 3 2 2]]

 [[2 1 4 1 2]
  [3 2 1 4 4]
  [3 1 1 2 4]
  [2 4 4 1 2]
  [4 4 4 4 2]]

 [[4 1 4 3 4]
  [3 4 3 2 3]
  [2 3 1 4 1]
  [4 2 1 1 2]
  [3 2 2 2 3]]]
[[2 1 4 1 2]
 [3 2 1 4 4]
 [3 1 1 2 4]
 [2 4 4 1 2]
 [4 4 4 4 2]]


### random.rand(d0,d1,d2...dn)
Creates an array of the given shape filled with samples from a uniform distribution over `[0,1)`.

In [8]:
import numpy as np
x = np.ones((2,3))
m = np.random.rand(2,3)
print(m)
p = m<0.4
print(p)
print(x*p)

[[0.64597798 0.04273061 0.61705425]
 [0.26200791 0.71747807 0.99686535]]
[[False  True False]
 [ True False False]]
[[0. 1. 0.]
 [1. 0. 0.]]


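### Gaussian matrices
- np.random.randn
returns samples from a Gaussian distribution with mean 0 and standard deviation 1.
- np.random.normal
lets you specify the mean, the standard deviation, and the output shape.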

In [14]:
import numpy as np
a = np.random.randn(2,3)*2
b = np.random.normal(0,2,(2,3))
print(a)
print(b)

[[ 1.38025586  4.28970843 -0.37593854]
 [-0.25863409  0.84407187  1.71028294]]
[[ 1.83443816  0.59560862  1.04718473]
 [ 4.2749247  -1.73347933 -0.98769928]]


In [12]:
import numpy as np
a = np.random.randn(1000) * 2
print("mean:{}".format(np.mean(a)))
print("std:{}".format(np.std(a)))



mean:-0.06620128898604043
std:1.980867119218788


In [None]:
http://www.cnblogs.com/sench/p/9683905.html

### np.pad
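
A minimal sketch of `np.pad` on a small made-up matrix, assuming constant (zero) padding is the case of interest. `pad_width` can be a single integer applied to every side, or one `(before, after)` pair per axis:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

p = np.pad(a, pad_width=1, mode="constant", constant_values=0)
print(p)
# [[0 0 0 0]
#  [0 1 2 0]
#  [0 3 4 0]
#  [0 0 0 0]]

# No padding on axis 0; one column before and two columns after on axis 1
q = np.pad(a, ((0, 0), (1, 2)), mode="constant")
print(q.shape)                   # (2, 5)
```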

# Saving and loading NumPy arrays

In [4]:
a = np.random.randn(2,3)*2
np.save("filename.npy",a) 

In [5]:
b = np.load("filename.npy")
b

array([[-1.47295776,  1.68217604, -1.39034703],
       [ 1.57932087,  1.76822766, -1.00277456]])

In [6]:
import numpy as np
x = np.array(12)
print(x.ndim)
x = np.array([1,2,3,4])
print(x.ndim)

0
1
