## NumPy Common Commands Practice

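- Learning objectives
1. Creating NumPy arrays
2. NumPy array attributes
3. NumPy array manipulation
4. NumPy array arithmetic
5. NumPy array slicing
6. NumPy array concatenation
7. NumPy array sorting
8. NumPy array statistics
9. NumPy array I/O
10. NumPy array data types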

###  Creating Arrays

In [2]:
import numpy as np

In [3]:
a = np.array([1,2,3])
print(a)

[1 2 3]


In [4]:
b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
print(b[0,:])
print(b[:,1])

[1.5 2.  3. ]
[2. 5.]


In [5]:
c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], 
                 dtype = float)
print(c[0,:,:])
print(c[:,0,:])
print(c[:,:,0])

[[1.5 2.  3. ]
 [4.  5.  6. ]]
[[1.5 2.  3. ]
 [3.  2.  1. ]]
[[1.5 4. ]
 [3.  4. ]]


###  Initial Placeholders

In [6]:
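# Create an array filled with zeros
np.zeros((3,4))
# Create an array filled with ones; dtype sets the element type
np.ones((2,3,4),dtype=np.int16)
# np.arange creates a 1-D array from a start value, a stop value (excluded), and a step
d = np.arange(10,25,5)
# np.linspace creates a 1-D array from a start value, a stop value, and a number of elements;
# endpoint (default True) controls whether the stop value is included
d1 = np.linspace(0,2,9)
# np.full creates an array filled with a constant value
e = np.full((2,2),7)
# np.eye creates an identity matrix
f = np.eye(2)
# np.random.random creates an array of random floats in [0, 1)
g = np.random.random((2,2))
# np.empty creates an uninitialized array; its values are whatever happens to be in memory
h = np.empty((3,2))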
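
The cell above shows no output, so the difference between `np.arange` and `np.linspace` is easy to miss. The following is a minimal sketch that prints the results; the name `d2` is introduced here for illustration only and is not part of the original notebook.

```python
import numpy as np

# arange excludes the stop value, so this yields 10, 15, 20
d = np.arange(10, 25, 5)
print(d)

# linspace includes the stop value by default (endpoint=True)
d1 = np.linspace(0, 2, 9)
print(d1)

# with endpoint=False the stop value is left out
d2 = np.linspace(0, 2, 8, endpoint=False)
print(d2)

# full and eye produce arrays with fully determined contents
e = np.full((2, 2), 7)
f = np.eye(2)
print(e)
print(f)
```

Because `np.empty` and `np.random.random` return unpredictable values, they are left out of this sketch.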

### I/O Saving & Loading On Disk

In [7]:
np.save('my_array', a)
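# Unlike .npy, an .npz file can hold multiple arrays
np.savez('array.npz', a, b)
# Load the .npz file; arrays passed positionally are stored under the keys arr_0, arr_1, ...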
npzfile = np.load('array.npz')
a,b = npzfile['arr_0'], npzfile['arr_1']
print(a)
print(b)

[1 2 3]
[[1.5 2.  3. ]
 [4.  5.  6. ]]
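
The positional keys `arr_0` and `arr_1` can be hard to remember. As a possible alternative, `np.savez` also accepts keyword arguments, which become the keys in the archive. The sketch below assumes a temporary directory and the hypothetical key names `first` and `second`.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1.5, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "named_arrays.npz")  # hypothetical file name
    # keyword arguments become the keys inside the archive
    np.savez(path, first=a, second=b)
    # np.load returns an NpzFile; using it as a context manager closes the file
    with np.load(path) as npz:
        keys = sorted(npz.files)
        a_loaded = npz["first"]
        b_loaded = npz["second"]

print(keys)
print(a_loaded)
print(b_loaded)
```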


### I/O Saving & Loading Text Files

In [8]:
np.savetxt("myfile.txt", a, delimiter=",")
aa = np.loadtxt("myfile.txt", delimiter=",")
print(aa)
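# Write a CSV file; delimiter sets the column separator, and fmt="%d" writes every value as an integer (1.5 is stored as 1)
np.savetxt("test.csv", b, delimiter=",", fmt="%d")
# Load the CSV file
b = np.loadtxt('test.csv', delimiter=',')
print(b)

[1. 2. 3.]
[[1. 2. 3.]
 [4. 5. 6.]]
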

###  Data Types

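NumPy data types include:
- np.int64: signed 64-bit integer
- np.float32: standard single-precision floating point
- np.complex128: complex number (the old alias np.complex was removed in NumPy 1.24)
- np.bool_: Boolean (the old alias np.bool was removed in NumPy 1.24; NumPy 2.0 reintroduced np.bool as a NumPy scalar type)
- np.object_: Python object (the old alias np.object was removed in NumPy 1.24)
- np.bytes_: fixed-length byte string (formerly np.string_)
- np.str_: fixed-length Unicode string (formerly np.unicode_)

Note: np.string_ and np.unicode_ were removed in NumPy 2.0; use np.bytes_ and np.str_ instead.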
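
To make the list above concrete, here is a small sketch that builds one array per data type and prints its `dtype` and per-element size in bytes. It uses the explicit scalar type names (`np.complex128`, `np.bool_`, `np.object_`, `np.bytes_`, `np.str_`), which exist in both NumPy 1.x and 2.x; the `samples` dictionary and its labels are made up for this illustration.

```python
import numpy as np

# one small array per data type; the labels are illustrative only
samples = {
    "int64": np.array([1, 2, 3], dtype=np.int64),
    "float32": np.array([1.5, 2.5], dtype=np.float32),
    "complex128": np.array([1 + 2j], dtype=np.complex128),
    "bool": np.array([True, False], dtype=np.bool_),
    "object": np.array([{"k": 1}, "text"], dtype=np.object_),
    "bytes": np.array([b"abc", b"de"], dtype=np.bytes_),
    "str": np.array(["abc", "de"], dtype=np.str_),
}

for label, arr in samples.items():
    # itemsize is the number of bytes used by one element
    print(label, arr.dtype, arr.dtype.itemsize)
```

For the fixed-length string types, the element size is set by the longest item in the array, which is why both string arrays reserve room for three characters.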

### Array Attributes

In [9]:
print(a) 
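print(a.shape) # array shape
print(a.ndim) # number of dimensions
print(a.size)  # total number of elements
print(a.dtype) # element data type
print(a.dtype.name) # name of the data type
# Convert to another data type (astype returns a new array)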
print(a.astype(float))

[1 2 3]
(3,)
1
3
int64
int64
[1. 2. 3.]


### Asking for Help

In [10]:
# np.info() prints the NumPy documentation for an object; it returns None, so wrapping it in print() adds a trailing "None"
print(np.info(np.ndarray.dtype))

Data-type of the array's elements.


    Setting ``arr.dtype`` is discouraged and may be deprecated in the
    future.  Setting will replace the ``dtype`` without modifying the
    memory (see also `ndarray.view` and `ndarray.astype`).

Parameters
----------
None

Returns
-------
d : numpy dtype object

See Also
--------
ndarray.astype : Cast the values contained in the array to a new data-type.
ndarray.view : Create a view of the same data but a different data-type.
numpy.dtype

Examples
--------
>>> x
array([[0, 1],
       [2, 3]])
>>> x.dtype
dtype('int32')
>>> type(x.dtype)
<type 'numpy.dtype'>
None


In [11]:
print(np.info(np.einsum))

einsum(subscripts, *operands, out=None, dtype=None, order='K',
       casting='safe', optimize=False)

Evaluates the Einstein summation convention on the operands.

Using the Einstein summation convention, many common multi-dimensional,
linear algebraic array operations can be represented in a simple fashion.
In *implicit* mode `einsum` computes these values.

In *explicit* mode, `einsum` provides further flexibility to compute
other array operations that might not be considered classical Einstein
summation operations, by disabling, or forcing summation over specified
subscript labels.

See the notes and examples for clarification.

Parameters
----------
subscripts : str
    Specifies the subscripts for summation as comma separated list of
    subscript labels. An implicit (classical Einstein summation)
    calculation is performed unless the explicit indicator '->' is
    included as well as subscript labels of the precise output form.
operands : list of array_like
    These are the a

### Mathematical Functions

In [12]:
a,b

(array([1, 2, 3]),
 array([[1., 2., 3.],
        [4., 5., 6.]]))

In [13]:
g=a-b
g


array([[ 0.,  0.,  0.],
       [-3., -3., -3.]])

In [14]:
np.subtract(a,b)

array([[ 0.,  0.,  0.],
       [-3., -3., -3.]])

In [15]:
a+b

array([[2., 4., 6.],
       [5., 7., 9.]])

In [16]:
np.add(a,b)

array([[2., 4., 6.],
       [5., 7., 9.]])

In [17]:
a/b

array([[1.  , 1.  , 1.  ],
       [0.25, 0.4 , 0.5 ]])

In [18]:
np.divide(a,b)

array([[1.  , 1.  , 1.  ],
       [0.25, 0.4 , 0.5 ]])

In [19]:
a*b

array([[ 1.,  4.,  9.],
       [ 4., 10., 18.]])

In [20]:
np.multiply(a,b)

array([[ 1.,  4.,  9.],
       [ 4., 10., 18.]])
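
The arithmetic cells above combine `a` with shape `(3,)` and `b` with shape `(2, 3)`. This works through broadcasting: `a` is treated as if it were repeated along the missing leading axis. The sketch below illustrates that reading; the names `explicit` and `col` are introduced here for illustration, and `np.broadcast_shapes` assumes NumPy 1.20 or later.

```python
import numpy as np

a = np.array([1, 2, 3])                       # shape (3,)
b = np.array([[1., 2., 3.], [4., 5., 6.]])    # shape (2, 3)

# shape that results from combining the two operands
result_shape = np.broadcast_shapes(a.shape, b.shape)
print(result_shape)

# broadcasting behaves as if a were expanded to b's shape first
explicit = np.broadcast_to(a, b.shape) - b
print(np.array_equal(a - b, explicit))

# a column of shape (2, 1) is stretched across the columns instead
col = np.array([10., 20.]).reshape(2, 1)
shifted = b + col
print(shifted)
```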

In [21]:
np.exp(a)

array([ 2.71828183,  7.3890561 , 20.08553692])

In [22]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081])

In [23]:
f,e

(array([[1., 0.],
        [0., 1.]]),
 array([[7, 7],
        [7, 7]]))

In [24]:
# dot: matrix multiplication
e.dot(f)

array([[7., 7.],
       [7., 7.]])
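
Since `f` is the identity matrix, the example above does not show how matrix multiplication differs from the elementwise `*` used earlier. A minimal sketch with a hypothetical matrix `m` (not part of the original notebook) makes the contrast visible:

```python
import numpy as np

e = np.full((2, 2), 7)
f = np.eye(2)
m = np.array([[1, 2], [3, 4]])  # hypothetical example matrix

# the @ operator computes the same matrix product as dot for 2-D arrays
print(np.array_equal(e.dot(f), e @ f))

matrix_product = m @ m   # rows times columns
elementwise = m * m      # each entry squared
print(matrix_product)
print(elementwise)
```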

### Aggregate Functions

In [25]:
print(c)
a.sum() # sum
# print(c.min(axis=2))  # minimum along the last axis
a.max()  # maximum
print(a.mean()) # mean
a.std()  # standard deviation
a.var()  # variance
np.median(a) # median
print(b)
b.cumsum(axis=1) # cumulative sum along each row

[[[1.5 2.  3. ]
  [4.  5.  6. ]]

 [[3.  2.  1. ]
  [4.  5.  6. ]]]
2.0
[[1. 2. 3.]
 [4. 5. 6.]]


array([[ 1.,  3.,  6.],
       [ 4.,  9., 15.]])
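
Only the printed values and the last expression appear in the output above, so most of the aggregates are computed but never shown. The following sketch prints several of them and shows how the `axis` argument changes the result; the variable names are chosen for this illustration.

```python
import numpy as np

b = np.array([[1., 2., 3.], [4., 5., 6.]])

total = b.sum()               # all elements
col_sums = b.sum(axis=0)      # collapse the rows: one value per column
row_sums = b.sum(axis=1)      # collapse the columns: one value per row
row_mins = b.min(axis=1)      # minimum of each row
running = b.cumsum(axis=0)    # cumulative sum down each column

print(total)
print(col_sums)
print(row_sums)
print(row_mins)
print(running)

# var and std divide by N by default; ddof=1 divides by N - 1
print(b.var(), b.var(ddof=1))
```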

In [26]:
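# Correlation coefficient: 1 means perfect positive correlation, -1 perfect negative correlation, 0 no linear correlation
print(a,b)
print(np.corrcoef(a,b)) # correlation coefficient matrix
# Covariance (np.cov) measures how strongly two variables vary together linearly
# The procedure: a and b are first stacked into one matrix (each row is a variable), then the covariance matrix of that matrix is computed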

[1 2 3] [[1. 2. 3.]
 [4. 5. 6.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
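
The comments above describe covariance, but the cell never calls it, and the correlation matrix is all ones because every row rises in lockstep. Here is a small sketch with made-up data (`x`, `y`, `z` are hypothetical) that shows both a negative correlation and a covariance matrix:

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = 2 * x        # rises together with x
z = x[::-1]      # falls as x rises

# each row is treated as one variable
corr = np.corrcoef([x, y, z])
print(corr)

# np.cov stacks x and y as rows and uses the sample covariance (divides by N - 1)
cov_xy = np.cov(x, y)
print(cov_xy)
```

The diagonal of `cov_xy` holds the variances of `x` and `y`, and the off-diagonal entries hold their covariance.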


###  Copying Arrays & Sorting Arrays

In [27]:
print(a)
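# Create a view: it shares data with the original array, so a change to either one is visible in the other
h = a.view() 
# Create a copy: it does not share data with the original array, so a change to either one does not affect the other
a1 = np.copy(a)
a2 = a.copy()
a[0] = 100
# Inspect h, a1, a2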
print(h,a1,a2)

[1 2 3]
[100   2   3] [1 2 3] [1 2 3]
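
One way to check whether two arrays share data, rather than inferring it from a modification, is `np.shares_memory` together with the `base` attribute. The sketch below is an illustration with hypothetical names (`v`, `s`, `dup`); it also shows that basic slicing returns a view.

```python
import numpy as np

a = np.array([1, 2, 3])
v = a.view()     # explicit view
s = a[1:]        # basic slicing also returns a view
dup = a.copy()   # independent copy

print(np.shares_memory(a, v), np.shares_memory(a, s), np.shares_memory(a, dup))

# writing through the slice changes the original, but not the copy
s[0] = 99
print(a)
print(dup)

# a view records the array it was derived from; a copy owns its data
print(v.base is a, dup.base is None)
```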


In [28]:
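# Sorting: note that a is rebound to a plain Python list here, so this calls list.sort(), which sorts in place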
a = [1,3,2]
a.sort()
a

[1, 2, 3]
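
The cell above sorts a Python list. For NumPy arrays, a comparable sketch might look like the following; the names `arr`, `order`, and `m` are hypothetical and chosen so that the notebook's `a` is left untouched.

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)   # returns a sorted copy, arr is unchanged
order = np.argsort(arr)      # indices that would sort arr
print(sorted_copy, order, arr)

arr.sort()                   # ndarray.sort sorts in place and returns None
print(arr)

# for 2-D arrays, axis selects the direction of the sort
m = np.array([[3, 1, 2], [9, 7, 8]])
by_row = np.sort(m, axis=1)
print(by_row)
```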

###  Subsetting, Slicing, Indexing

In [29]:
a[2],b[1,2]

(3, np.float64(6.0))

In [30]:
b[[1, 0, 1, 0]][:,[0,1,2,0]]

array([[4., 5., 6., 4.],
       [1., 2., 3., 1.],
       [4., 5., 6., 4.],
       [1., 2., 3., 1.]])
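
The chained expression above first selects rows, then columns. The sketch below compares it with two related forms: `np.ix_`, which builds the same row-by-column grid in one step, and passing both index lists at once, which instead pairs them up element by element. The variable names are introduced for this illustration.

```python
import numpy as np

b = np.array([[1., 2., 3.], [4., 5., 6.]])
rows = [1, 0, 1, 0]
cols = [0, 1, 2, 0]

chained = b[rows][:, cols]        # rows first, then columns
grid = b[np.ix_(rows, cols)]      # same 4x4 result in one indexing step
print(np.array_equal(chained, grid))

# two index lists together pick the elements (1,0), (0,1), (1,2), (0,0)
paired = b[rows, cols]
print(paired)

# a boolean mask selects the elements where the condition holds
above_two = b[b > 2]
print(above_two)
```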

### Array Manipulation

In [31]:
print(np.transpose(b))

[[1. 4.]
 [2. 5.]
 [3. 6.]]


In [32]:
b.T

array([[1., 4.],
       [2., 5.],
       [3., 6.]])

In [33]:
print(b.ravel())

[1. 2. 3. 4. 5. 6.]


In [34]:
print(b.T.ravel())

[1. 4. 2. 5. 3. 6.]


In [35]:
# ndarray.resize changes the array's shape in place
b.resize((2,3))
b

array([[1., 2., 3.],
       [4., 5., 6.]])

In [36]:
b.resize((3,2))
b.T

array([[1., 3., 5.],
       [2., 4., 6.]])
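
Because `b.resize` modifies `b` itself, every later cell sees the new `(3, 2)` shape. The sketch below contrasts this with `reshape`, which leaves the original shape alone, and with the function `np.resize`, which returns a new array and repeats the data when the new size is larger. The names `src`, `r`, `owner`, and `grown` are hypothetical.

```python
import numpy as np

src = np.arange(1., 7.).reshape(2, 3)

# reshape returns a new array object (a view here); src keeps its shape
r = src.reshape(3, 2)
print(src.shape, r.shape, np.shares_memory(src, r))

# ndarray.resize changes the shape of the array itself
owner = np.arange(1., 7.)
owner.resize((3, 2))
print(owner.shape)

# np.resize returns a new array and repeats the data to fill a larger shape
grown = np.resize(owner, (2, 4))
print(grown)
```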

In [37]:
np.append(a,[4,5,6])

array([1, 2, 3, 4, 5, 6])

In [38]:
np.insert(a,1,5)

array([1, 5, 2, 3])

In [39]:
np.delete(a,[1])

array([1, 3])

In [45]:
at = np.array([[1,2],[3,4]])
bt = np.array([[5,6],[7,8]])
print(at, bt)

[[1 2]
 [3 4]] [[5 6]
 [7 8]]


In [None]:
np.r_[at,bt]
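# For 2-D arrays, np.r_ concatenates along the first axis (it stacks rows), so it gives the same result as np.vstack; np.c_ concatenates along the last axis (columns)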

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [47]:
np.column_stack((at,bt))

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

In [48]:
np.vstack((at,bt))

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [50]:
np.c_[at,bt]

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
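
The stacking helpers above can all be expressed, for 2-D inputs, with `np.concatenate` and an explicit `axis`. The following sketch checks that equivalence and adds `np.stack`, which joins along a new axis instead; the result names are made up for this illustration.

```python
import numpy as np

at = np.array([[1, 2], [3, 4]])
bt = np.array([[5, 6], [7, 8]])

# axis=0 appends rows, like np.vstack and np.r_
by_rows = np.concatenate((at, bt), axis=0)
print(np.array_equal(by_rows, np.vstack((at, bt))))
print(np.array_equal(by_rows, np.r_[at, bt]))

# axis=1 appends columns, like np.column_stack and np.c_ for 2-D arrays
by_cols = np.concatenate((at, bt), axis=1)
print(np.array_equal(by_cols, np.column_stack((at, bt))))
print(np.array_equal(by_cols, np.c_[at, bt]))

# np.stack creates a new leading axis rather than extending an existing one
stacked = np.stack((at, bt))
print(stacked.shape)
```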

In [56]:
b

array([[1., 2.],
       [3., 4.],
       [5., 6.]])

In [55]:
np.vsplit(b, 3)

[array([[1., 2.]]), array([[3., 4.]]), array([[5., 6.]])]

In [64]:
c

array([[[1.5, 2. , 3. ],
        [4. , 5. , 6. ]],

       [[3. , 2. , 1. ],
        [4. , 5. , 6. ]]])

In [70]:
np.hsplit(c, 2)

[array([[[1.5, 2. , 3. ]],
 
        [[3. , 2. , 1. ]]]),
 array([[[4., 5., 6.]],
 
        [[4., 5., 6.]]])]

In [None]:
np.hsplit(c, (0, 1, 2))

[array([], shape=(2, 0, 3), dtype=float64),
 array([[[1.5, 2. , 3. ]],
 
        [[3. , 2. , 1. ]]]),
 array([[[4., 5., 6.]],
 
        [[4., 5., 6.]]]),
 array([], shape=(2, 0, 3), dtype=float64)]

In [72]:
import numpy as np

c = np.arange(24).reshape(2, 3, 4)
print("Original array c:")
print(c)

Original array c:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [74]:
result = np.hsplit(c, (1, 2))
print("Result of np.hsplit(c, (1, 2)):")
for arr in result:
    print(arr)

Result of np.hsplit(c, (1, 2)):
[[[ 0  1  2  3]]

 [[12 13 14 15]]]
[[[ 4  5  6  7]]

 [[16 17 18 19]]]
[[[ 8  9 10 11]]

 [[20 21 22 23]]]
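
For arrays with two or more dimensions, `np.hsplit` splits along the second axis (`axis=1`), which is why the 3-D array above is cut into pieces of shape `(2, 1, 4)`. The sketch below expresses the same split with `np.split` and shows splits along the last axis; the result names are hypothetical.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# hsplit on a 3-D array is the same as split with axis=1
parts = np.hsplit(c, (1, 2))
same = np.split(c, (1, 2), axis=1)
print([p.shape for p in parts])
print(all(np.array_equal(p, q) for p, q in zip(parts, same)))

# an integer asks for that many equal pieces along the chosen axis
halves = np.split(c, 2, axis=2)
print([p.shape for p in halves])

# array_split tolerates sizes that do not divide evenly
uneven = np.array_split(c, 3, axis=2)
print([p.shape for p in uneven])
```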
