## Advantages of NumPy

In [1]:

"""
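1. Why use NumPy:
    NumPy operations can be hundreds of times faster than the equivalent operations on Python lists
    (roughly 100x for the mean computed below).
    This is because NumPy arrays store data compactly in memory, and NumPy uses optimized routines for arithmetic, statistical and linear-algebra operations.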
"""

import numpy as np
import time
x = np.random.random(100000000)

start = time.time()
sum(x)/len(x)
end = time.time()
print(end - start)

9.646975040435791


In [2]:
start = time.time()
np.mean(x)
end = time.time()
print(end - start)


0.09040713310241699
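Timings like the ones above depend on the machine and the NumPy version. A slightly more careful sketch uses `time.perf_counter()` (a high-resolution clock intended for measuring intervals) and a smaller array so the pure-Python loop finishes quickly; the array size is an arbitrary choice.

```python
import time
import numpy as np

# Arbitrary size chosen so the pure-Python version finishes in well under a second
x = np.random.random(1_000_000)

start = time.perf_counter()
py_mean = sum(x) / len(x)        # built-in sum visits the elements one by one
py_time = time.perf_counter() - start

start = time.perf_counter()
np_mean = np.mean(x)             # vectorized reduction in compiled code
np_time = time.perf_counter() - start

print(f"Python sum/len: {py_time:.4f} s")
print(f"np.mean:        {np_time:.4f} s")
print(f"speed-up:       {py_time / np_time:.0f}x")
```

Both versions compute the same mean up to floating-point rounding; only the speed differs.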


## Creating and saving

In [18]:
"""
Creating a NumPy ndarray
"""
y = np.array([1,2,3,4,5,6,7])
print(y)
print(type(y)) # n-dimensional array (ndarray)

[1 2 3 4 5 6 7]
<class 'numpy.ndarray'>


In [19]:
y.dtype

dtype('int64')

In [23]:
z = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
print(z)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [24]:
z.shape

(4, 3)

In [25]:
z.size

12

In [26]:
x = np.array(['hello','world'])
print(x)

['hello' 'world']


In [27]:
print(x.shape)

(2,)


In [28]:
print(type(x))

<class 'numpy.ndarray'>


In [29]:
print(x.dtype)

<U5


In [30]:
"""
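Upcasting: when the elements have mixed types, NumPy converts them all to a common,
more general type (here the integers are upcast to float64).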
"""

z = np.array([1, 2.5, 4]) 
print(z.dtype)

float64


In [31]:
d = np.array([1,2,3])
print(d.dtype)

int64


In [34]:
"""
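    Element data type (dtype)
When NumPy creates an ndarray, it automatically assigns a dtype based on the types of the elements used to create it.
The dtype keyword overrides this: below, the floats are converted to int64 and their fractional parts are discarded.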
"""

x = np.array([2.4, 2.3, 4.5], dtype = np.int64)
print(x.dtype)

int64


In [35]:
print(x)


[2 2 4]
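Converting floats to an integer dtype truncates toward zero rather than rounding. A small sketch with `astype()` and `np.round()` makes the difference visible; the sample values are illustrative.

```python
import numpy as np

x = np.array([2.4, 2.6, -2.6, 4.5])

truncated = x.astype(np.int64)          # drops the fractional part (toward zero)
rounded = np.round(x).astype(np.int64)  # rounds first (halves go to the even neighbour)

print(truncated)  # [ 2  2 -2  4]
print(rounded)    # [ 2  3 -3  4]
```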


In [36]:
x = np.array([12,19,20,21])
print(x)

[12 19 20 21]


In [37]:
np.save('my_array', x)


In [40]:
y = np.load('my_array.npy')
print(y)

[12 19 20 21]
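`np.save` appends the `.npy` extension automatically, which is why the file is loaded back as `'my_array.npy'`. To keep several arrays in one file, `np.savez` writes an uncompressed `.npz` archive keyed by the names you pass. The sketch below writes into a temporary directory so it leaves no files behind; the key names `a` and `b` are arbitrary.

```python
import os
import tempfile
import numpy as np

x = np.array([12, 19, 20, 21])
y = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays")      # np.savez appends ".npz"
    np.savez(path, a=x, b=y)                # keyword names become keys in the archive
    with np.load(path + ".npz") as data:    # the loaded archive works as a context manager
        loaded_a = data["a"]                # each access reads the array into memory
        loaded_b = data["b"]

print(loaded_a)
print(loaded_b)
```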


In [None]:
"""
Creating ndarrays with built-in functions
"""

In [49]:
# Create an ndarray of the given shape in which every element is 0
x = np.zeros((3,4))
print(x)
print(type(x))
print(x.dtype) 

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<class 'numpy.ndarray'>
float64


In [57]:
# Create an ndarray of the given shape in which every element is 1
x = np.ones((3,2))
#x = np.ones((3,2), dtype = np.int64)
print(x)
print(type(x))
print(x.shape)
print(x.dtype) # np.ones() also creates a float64 array by default; use the dtype keyword to change the data type.

[[1. 1.]
 [1. 1.]
 [1. 1.]]
<class 'numpy.ndarray'>
(3, 2)
float64


In [5]:
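# np.full(shape, fill_value) creates an array filled with a constant. By default its dtype
# matches the type of fill_value (5 -> an integer array); use the dtype keyword to change it.
# The output below was produced with dtype = np.float64:
#x = np.full((2,3),5)
x = np.full((2,3),5, dtype = np.float64)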
print(x)
print(type(x))
print(x.dtype)

[[5. 5. 5.]
 [5. 5. 5.]]
<class 'numpy.ndarray'>
float64


In [8]:
# Create a square N x N ndarray corresponding to the identity matrix
X = np.eye(5)

print(X)
print(type(X))
print(X.dtype) # np.eye() also creates a float64 array by default; use the dtype keyword to change the data type.

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
<class 'numpy.ndarray'>
float64


In [9]:
# Create a diagonal matrix: a square matrix whose values lie only on the main diagonal
X = np.diag([10,20,30,50])

print(X)
print(type(X))
print(X.dtype)

[[10  0  0  0]
 [ 0 20  0  0]
 [ 0  0 30  0]
 [ 0  0  0 50]]
<class 'numpy.ndarray'>
int64


In [10]:
x = np.arange(10)

print(x)
print(type(x))
print(x.dtype)

[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>
int64


In [11]:
x = np.arange(4,10)

print(x)
print(type(x))
print(x.dtype)

[4 5 6 7 8 9]
<class 'numpy.ndarray'>
int64


In [15]:
# Create a rank-1 ndarray of evenly spaced values in the half-open interval [start, stop); step is the difference between adjacent values.
x = np.arange(1,14,3)

print(x)
print(x.shape)
print(type(x))
print(x.dtype)

# x contains integers between 1 and 13, with a difference of 3 between adjacent values.

[ 1  4  7 10 13]
(5,)
<class 'numpy.ndarray'>
int64


In [29]:
"""
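np.arange() accepts a non-integer step such as 0.3, but because floating-point precision is limited,
the number of elements returned is not always what you expect.
For non-integer spacing, np.linspace() is therefore usually recommended.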
"""
x = np.linspace(0,25,10)

print(x)
print(x.shape)
print(type(x))
print(x.dtype)


# Returns an ndarray of 10 evenly spaced elements in the closed interval [0, 25].
# Note that both the start and end points, 0 and 25, are included.

[ 0.          2.77777778  5.55555556  8.33333333 11.11111111 13.88888889
 16.66666667 19.44444444 22.22222222 25.        ]
(10,)
<class 'numpy.ndarray'>
float64
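To see the floating-point issue with `np.arange`, compare how many values each call returns. With a float step, the length is derived from `(stop - start) / step`, so rounding error can add an element at (or extremely close to) the excluded `stop`; `np.linspace` takes the number of points directly. The specific start, stop and step values are illustrative.

```python
import numpy as np

# Float step: the length depends on (stop - start) / step, which is subject to rounding
a = np.arange(1, 1.3, 0.1)
print(len(a), a)   # may contain a value at the "excluded" stop

# linspace: the number of points is given explicitly
b = np.linspace(1, 1.3, 4)                   # 4 points, both endpoints included
c = np.linspace(1, 1.3, 3, endpoint=False)   # 3 points, stop excluded like arange
print(len(b), b)
print(len(c), c)
```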


In [30]:
# To exclude the end point of the interval (as np.arange() does), set the endpoint keyword of np.linspace() to False

x = np.linspace(0,25,10, endpoint = False)

print(x)
print(x.shape)
print(type(x))
print(x.dtype)


[ 0.   2.5  5.   7.5 10.  12.5 15.  17.5 20.  22.5]
(10,)
<class 'numpy.ndarray'>
float64


In [21]:

x = np.arange(20)
print('Original x = ', x)

x = np.reshape(x, (4,5))
print('Reshaped x = \n', x)

print()
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype) 

Original x =  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
Reshaped x = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

x has dimensions: (4, 5)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64


In [23]:
Y = np.arange(20).reshape(4, 5)

print('Y = \n', Y)
print()

# We print information about Y
print('Y has dimensions:', Y.shape)
print('Y is an object of type:', type(Y))
print('The elements in Y are of type:', Y.dtype) 

Y = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Y has dimensions: (4, 5)
Y is an object of type: <class 'numpy.ndarray'>
The elements in Y are of type: int64


In [None]:
"""

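Note that when reshape() is used as a method, it is called as ndarray.reshape(new_shape).
This returns the ndarray with the specified shape new_shape (the original ndarray is not modified).
As before, new_shape must be consistent with the number of elements in the ndarray.

In the example above, np.arange(20) creates an ndarray, and reshape() is then called as a method on that ndarray.
So when reshape() is used as a method, we don't pass the ndarray as an argument; we only pass the new_shape argument.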
"""




In [24]:
X = np.linspace(0,50,10, endpoint=False).reshape(5,2)

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)


X = 
 [[ 0.  5.]
 [10. 15.]
 [20. 25.]
 [30. 35.]
 [40. 45.]]

X has dimensions: (5, 2)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64


In [33]:
"""
Random ndarrays

First, np.random.random(shape) creates an ndarray of the given shape containing random floats in the half-open interval [0.0, 1.0).

"""

X = np.random.random((3,3))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in x are of type:', X.dtype)



X = 
 [[0.08177358 0.72967485 0.82369626]
 [0.2676428  0.94889624 0.74971608]
 [0.94745619 0.78699773 0.41310231]]

X has dimensions: (3, 3)
X is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: float64
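The values above change on every run. For reproducible random arrays, NumPy (1.17 and later) recommends creating a `Generator` with `np.random.default_rng(seed)`; its `random` and `integers` methods play the roles of `np.random.random` and `np.random.randint`, and `integers` also excludes the upper bound by default. The seed value 42 is arbitrary.

```python
import numpy as np

rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

A = rng1.random((3, 3))                 # floats in [0.0, 1.0)
B = rng2.random((3, 3))                 # same seed -> same sequence of numbers
print(np.array_equal(A, B))             # True

C = rng1.integers(4, 15, size=(3, 2))   # integers in [4, 15), like np.random.randint
print(C.min() >= 4, C.max() < 15)       # True True
```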


In [34]:
X = np.random.randint(4,15,size=(3,2)) 
# np.random.randint(start, stop, size=shape) creates an ndarray of the given shape containing random integers in the half-open interval [start, stop).

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)


X = 
 [[11  7]
 [10  9]
 [ 7 10]]

X has dimensions: (3, 2)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: int64


In [35]:
# Create a 1,000 x 1,000 ndarray of floats sampled at random from a normal distribution with mean 0 and standard deviation 0.1.
X = np.random.normal(0, 0.1, size=(1000,1000))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)
print('The elements in X have a mean of:', X.mean())
print('The maximum value in X is:', X.max())
print('The minimum value in X is:', X.min())
print('X has', (X < 0).sum(), 'negative numbers')
print('X has', (X > 0).sum(), 'positive numbers')


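# The mean of the random numbers is close to 0, the maximum and minimum are roughly symmetric about 0 (the mean), and the numbers of positive and negative values are nearly equal.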


X = 
 [[ 2.04322594e-02 -4.77114470e-02  4.50307948e-02 ... -1.62053618e-02
  -4.75238448e-02 -6.27606979e-02]
 [ 2.24916098e-01 -4.32345427e-02 -3.91974940e-02 ...  2.14089498e-01
  -1.10039800e-01 -7.35491039e-02]
 [ 2.61725718e-03  7.29816775e-03 -1.99337520e-02 ... -1.43015447e-04
   7.75239575e-02  9.74718946e-02]
 ...
 [-4.77411310e-02 -1.00541162e-01  8.92454883e-02 ...  8.25553482e-02
   1.45010197e-02  1.54546332e-02]
 [-1.76609446e-01  3.25784791e-02 -3.82888247e-02 ... -5.43586508e-03
  -7.39444520e-02 -7.76347365e-02]
 [ 2.77491420e-02  2.42577193e-02 -1.02186542e-01 ...  3.18175416e-02
  -8.62335531e-02 -4.40930385e-02]]

X has dimensions: (1000, 1000)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64
The elements in X have a mean of: -1.874033782552222e-06
The maximum value in X is: 0.49924200449033745
The minimum value in X is: -0.46060595959025785
X has 500140 negative numbers
X has 499860 positive numbers
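The mean, extremes and sign counts above are indirect checks. A more direct check is the sample standard deviation, which should be close to the requested `0.1`. A sketch using a seeded generator; the seed and the tolerances are arbitrary (deliberately loose for one million samples):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=0.1, size=(1000, 1000))

print("mean:", X.mean())   # close to 0
print("std: ", X.std())    # close to 0.1
print(abs(X.mean()) < 1e-3, abs(X.std() - 0.1) < 1e-3)
```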


## Accessing elements

In [59]:
# Accessing elements of a rank-1 ndarray (indices start at 0; negative indices count from the end):
x = np.array([1, 2, 3, 4, 5])

# We print x
print()
print('x = ', x)
print()

# Let's access some elements with positive indices
print('This is First Element in x, x[0] :', x[0]) 
print('This is Second Element in x, x[1] :', x[1])
print('This is Fifth (Last) Element in x, x[4]:', x[4])
print()

# Let's access the same elements with negative indices
print('This is First Element in x, x[-5]:', x[-5])
print('This is Second Element in x, x[-4] :', x[-4])
print('This is Fifth (Last) Element in x, x[-1]:', x[-1])


x =  [1 2 3 4 5]

This is First Element in x, x[0] : 1
This is Second Element in x, x[1] : 2
This is Fifth (Last) Element in x, x[4]: 5

This is First Element in x, x[-5]: 1
This is Second Element in x, x[-4] : 2
This is Fifth (Last) Element in x, x[-1]: 5


In [61]:
# We create a 3 x 3 rank 2 ndarray that contains integers from 1 to 9
X = np.array([[1,2,3],[4,5,6],[7,8,9]])

# We print X
print()
print('X = \n', X)
print()

# Let's access some elements in X
print('This is (0,0) Element in X:', X[0,0]) # To access an element of a rank-2 ndarray, provide two indices in the form [row, column]
print('This is (0,1) Element in X:', X[0,1])
print('This is (2,2) Element in X:', X[2,2])


X = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

This is (0,0) Element in X: 1
This is (0,1) Element in X: 2
This is (2,2) Element in X: 9


## Modifying elements

In [60]:
# Modify an element of a rank-1 ndarray by accessing it and assigning a new value with the = sign:

# We create a rank 1 ndarray that contains integers from 1 to 5
x = np.array([1, 2, 3, 4, 5])

# We print the original x
print()
print('Original:\n x = ', x)
print()

# We change the fourth element in x from 4 to 20
x[3] = 20

# We print x after it was modified 
print('Modified:\n x = ', x)



Original:
 x =  [1 2 3 4 5]

Modified:
 x =  [ 1  2  3 20  5]


In [41]:
# Elements of a rank-2 ndarray can be modified in the same way as those of a rank-1 ndarray

# We create a 3 x 3 rank 2 ndarray that contains integers from 1 to 9
X = np.array([[1,2,3],[4,5,6],[7,8,9]])

# We print the original x
print('Original:\n X = \n', X)
print()

# We change the (0,0) element in X from 1 to 20
X[0,0] = 20

# We print X after it was modified 
print('Modified:\n X = \n', X)

Original:
 X = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Modified:
 X = 
 [[20  2  3]
 [ 4  5  6]
 [ 7  8  9]]


## Deleting elements

In [47]:
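# Deleting elements
# Use np.delete(ndarray, elements, axis) to delete elements.
# It removes the given list of elements from the ndarray along the specified axis and returns a new array.
# For a rank-1 ndarray, the axis keyword is not needed.
# For a rank-2 ndarray, axis = 0 selects rows and axis = 1 selects columns.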

# We create a rank 1 ndarray 
x = np.array([1, 2, 3, 4, 5])

# We print x
print('Original x = ', x)

# We delete the first and last element of x
x = np.delete(x, [0,4])

# We print x with the first and last element deleted
print('Modified x = ', x)


Original x =  [1 2 3 4 5]
Modified x =  [2 3 4]
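`np.delete` does not modify the array in place; it returns a new array, which is why the result is assigned back to `x`. Also note that without `axis`, a rank-2 array is flattened first. A short check (the index 1 in the last call is arbitrary):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.delete(x, [0, 4])   # returns a new array

print(x)  # [1 2 3 4 5]  (unchanged)
print(y)  # [2 3 4]

Y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.delete(Y, 1))     # no axis: Y is flattened, then index 1 is removed
```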


In [69]:
# For a rank-1 ndarray, the axis keyword is not needed.
# For a rank-2 ndarray, axis = 0 selects rows and axis = 1 selects columns.

# We create a rank 2 ndarray
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])

# We print Y
print('Original Y = \n', Y)

# We delete the first row of y
w = np.delete(Y, 0, axis=0)

# We delete the first and last column of y
v = np.delete(Y, [0,2], axis=1)

# We print w
print('w = \n', w)

# We print v
print('v = \n', v)

# Note: when appending rows or columns to a rank-2 ndarray (np.append, below), they must have a shape compatible with that ndarray.

Original Y = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
w = 
 [[4 5 6]
 [7 8 9]]
v = 
 [[2]
 [5]
 [8]]


## Inserting elements

In [68]:
# Appending values to an ndarray
# Use np.append(ndarray, elements, axis) to append values to an ndarray.
# It appends the given list of elements to the ndarray along the specified axis and returns a new array.

# We create a rank 1 ndarray 
x = np.array([1, 2, 3, 4, 5])

# We print x
print('Original x = ', x)

# We append the integer 6 to x
x = np.append(x, 6)

# We print x
print()
print('x = ', x)

# We append the integer 7 and 8 to x
x = np.append(x, [7, 8])

# We print x
print()
print('x = ', x)


Original x =  [1 2 3 4 5]

x =  [1 2 3 4 5 6]

x =  [1 2 3 4 5 6 7 8]
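Like `np.delete`, `np.append` returns a new array. When `axis` is omitted, both inputs are flattened before appending, so a rank-2 array silently becomes rank 1. A sketch with an illustrative 2 x 3 array:

```python
import numpy as np

Y = np.array([[1, 2, 3], [4, 5, 6]])

flat = np.append(Y, [7, 8, 9])              # no axis: the result is 1-D
rows = np.append(Y, [[7, 8, 9]], axis=0)    # axis=0: appended values must be 2-D, shape (1, 3)

print(flat.shape, rows.shape)  # (9,) (3, 3)
```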


In [71]:
# We create a rank 2 ndarray 
Y = np.array([[1,2,3],[4,5,6]])

# We print Y
print('Original Y = \n', Y)

# We append a new row containing 7,8,9 to y
v = np.append(Y, [[7,8,9]], axis=0)

# We print v
print()
print('v = \n', v)

Original Y = 
 [[1 2 3]
 [4 5 6]]

v = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


In [76]:
print('Original Y = \n', Y)
# We append a new column containing 9 and 10 to y
q = np.append(Y,[[9],[10]], axis=1)

# We print q
print()
print('q = \n', q)

Original Y = 
 [[1 2 3]
 [4 5 6]]

q = 
 [[ 1  2  3  9]
 [ 4  5  6 10]]


In [70]:
"""
Use np.insert(ndarray, index, elements, axis) to insert values into an ndarray.

It inserts the given list of elements along the specified axis, placing them before the given index.
"""
# We create a rank 1 ndarray 
x = np.array([1, 2, 5, 6, 7])

print('Original x = ', x)

# We insert the integer 3 and 4 between 2 and 5 in x. 
x = np.insert(x,2,[3,4])

print('x = ', x)

Original x =  [1 2 5 6 7]
x =  [1 2 3 4 5 6 7]
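The index given to `np.insert` refers to positions in the original array. Passing a list of indices inserts one value before each of those positions; the values 10 and 20 are illustrative:

```python
import numpy as np

x = np.array([1, 2, 5, 6, 7])

# Insert 10 before original index 1 (the 2) and 20 before original index 3 (the 6)
y = np.insert(x, [1, 3], [10, 20])
print(y)  # [ 1 10  2  5 20  6  7]
```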


In [79]:
# We create a rank 2 ndarray 
Y = np.array([[1,2,3],[7,8,9]])

print('Original Y = \n', Y)

# We insert a row between the first and last row of y
w = np.insert(Y,1,[4,5,6],axis=0)

# We print w
print()
print('w = \n', w)

Original Y = 
 [[1 2 3]
 [7 8 9]]

w = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


In [80]:
print('Original Y = \n', Y)
# We insert a column full of 5s between the first and second column of y
v = np.insert(Y,1,5, axis=1)

# We print v
print()
print('v = \n', v)

Original Y = 
 [[1 2 3]
 [7 8 9]]

v = 
 [[1 5 2 3]
 [7 5 8 9]]


### Stacking

In [82]:
# NumPy can also stack ndarrays on top of each other or side by side.
# Use np.vstack() for vertical stacking
# or np.hstack() for horizontal stacking.
# The ndarrays must have compatible shapes to be stacked. Let's look at some examples:

# We create a rank 1 ndarray 
x = np.array([1,2])
# We create a rank 2 ndarray 
Y = np.array([[3,4],[5,6]])

print('x = ', x)
print('Y = \n', Y)

# We stack x on top of Y
z = np.vstack((x,Y)) # vertical stacking

# We print z
print()
print('z = \n', z)

x =  [1 2]
Y = 
 [[3 4]
 [5 6]]

z = 
 [[1 2]
 [3 4]
 [5 6]]


In [84]:
po = x.reshape(2,1)
print('po = \n', po)
print()

# We stack x on the right of Y. We need to reshape x in order to stack it on the right of Y. 
w = np.hstack((Y,x.reshape(2,1))) # horizontal stacking

# We print w
print('w = \n', w)

w = 
 [[3 4 1]
 [5 6 2]]
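`np.vstack` and `np.hstack` are convenience wrappers; once the arrays have the same number of dimensions, `np.concatenate` with an explicit `axis` gives the same result. The 1-D `x` needs a second dimension first, for example via `reshape` or `x[:, np.newaxis]`:

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[3, 4], [5, 6]])

v1 = np.vstack((x, Y))
v2 = np.concatenate((x.reshape(1, 2), Y), axis=0)   # x as a row: shape (1, 2)

h1 = np.hstack((Y, x.reshape(2, 1)))
h2 = np.concatenate((Y, x[:, np.newaxis]), axis=1)  # x as a column: shape (2, 1)

print(np.array_equal(v1, v2), np.array_equal(h1, h2))  # True True
```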


## Slicing

In [86]:
"""
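Slicing uses a colon : inside square brackets to separate the start and end indices. You will usually see three forms:

1. ndarray[start:end]
2. ndarray[start:]
3. ndarray[:end]

Note that in the first and third forms, the end index is excluded.
Because an ndarray can have multiple dimensions, a slice is usually given for each dimension, separated by commas.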

"""

# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We print X
print('X = \n', X)
print()

Z = X[1:4,2:5]
print('Z = \n', Z)

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Z = 
 [[ 7  8  9]
 [12 13 14]
 [17 18 19]]
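Slices also accept an optional step, `start:end:step`, and negative values count from the end. A few illustrative examples on the same `X`:

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

print(X[::2, :])    # every other row: rows 0 and 2
print(X[:, ::-1])   # columns in reverse order
print(X[-1, -2:])   # last row, last two columns: [18 19]
```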


In [87]:
W = X[1:,2:5]

print('W = \n', W)

W = 
 [[ 7  8  9]
 [12 13 14]
 [17 18 19]]


In [88]:
Y = X[:3,2:5]

print('Y = \n', Y)

Y = 
 [[ 2  3  4]
 [ 7  8  9]
 [12 13 14]]


In [89]:
v = X[2,:]
print('v = ', v)

v =  [10 11 12 13 14]


In [90]:
q = X[:,2]
print('q = ', q)

q =  [ 2  7 12 17]


In [91]:
R = X[:,2:3]
print('R = \n', R)

R = 
 [[ 2]
 [ 7]
 [12]
 [17]]


In [93]:
"""请务必注意，如果对 ndarray 进行切片并将结果保存到新的变量中，就像之前一样，数据不会复制到新的变量中。"""

Z = X[1:4,2:5]
print('Z = \n', Z)

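# The slice of X is not copied into Z; Z is a view that refers to the same data as part of X.
# In other words, slicing only creates a view of the original array,
# so changing an element of Z also changes the corresponding element of X.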

Z = 
 [[ 7  8  9]
 [12 13 14]
 [17 18 19]]


In [95]:
# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)

# We print X
print('X = \n', X)
print()

# We select all the elements that are in the 2nd through 4th rows and in the 3rd through 5th columns
Z = X[1:4,2:5]

# We print Z
print('Z = \n', Z)
print()

# We change the last element in Z to 555
Z[2,2] = 555

# We print X
print('X = \n', X)
print()


"""

Changing Z also changed X.
"""

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Z = 
 [[ 7  8  9]
 [12 13 14]
 [17 18 19]]

X = 
 [[  0   1   2   3   4]
 [  5   6   7   8   9]
 [ 10  11  12  13  14]
 [ 15  16  17  18 555]]
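Whether two arrays share data can be checked directly with `np.shares_memory`. A copy owns its data, so its `base` attribute is `None`. A short sketch:

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
Z = X[1:4, 2:5]          # basic slice -> view
C = X[1:4, 2:5].copy()   # explicit copy

print(np.shares_memory(X, Z))  # True
print(np.shares_memory(X, C))  # False
print(C.base is None)          # True: the copy owns its data
```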



In [96]:
"""
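To create a new ndarray that contains a copy of the values in a slice, use the np.copy() function.
np.copy(ndarray) creates a copy of the given ndarray.
It can also be used as a method, just as we used reshape earlier.


Let's repeat the previous example, but this time create copies of the array, using copy both as a function and as a method.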

"""

# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)

# We print X
print('X = \n', X)
print()

# create a copy of the slice using the np.copy() function
Z = np.copy(X[1:4,2:5])

#  create a copy of the slice using the copy as a method
W = X[1:4,2:5].copy()

# We change the last element in Z to 555
Z[2,2] = 555

# We change the last element in W to 444
W[2,2] = 444

# We print X
print()
print('X = \n', X)

# We print Z
print()
print('Z = \n', Z) 

# We print W
print()
print('W = \n', W)

# Using copy creates new ndarrays that are completely independent of X.

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]


X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

Z = 
 [[  7   8   9]
 [ 12  13  14]
 [ 17  18 555]]

W = 
 [[  7   8   9]
 [ 12  13  14]
 [ 17  18 444]]


In [101]:
"""
We often use one ndarray to slice, select, or change the elements of another ndarray.
"""

# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)

# We create a rank 1 ndarray that will serve as indices to select elements from X
indices = np.array([1,3])

# We print X
print('X = \n', X)
print()

# We print indices
print('indices = ', indices)
print()

# We use the indices ndarray to select the 2nd and 4th row of X
Y = X[indices,:]

# We use the indices ndarray to select the 2nd and 4th column of X
Z = X[:, indices]

# We print Y
print('Y = \n', Y)
print()

# We print Z
print('Z = \n', Z)

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

indices =  [1 3]

Y = 
 [[ 5  6  7  8  9]
 [15 16 17 18 19]]

Z = 
 [[ 1  3]
 [ 6  8]
 [11 13]
 [16 18]]
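Unlike basic slicing, indexing with an integer array ("fancy indexing") returns a copy, so modifying `Y` does not change `X`. To select a sub-grid of chosen rows and columns, `np.ix_` builds matching index arrays; the column choice `[0, 4]` is arbitrary.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
indices = np.array([1, 3])

Y = X[indices, :]
Y[0, 0] = -100                 # modifies the copy only
print(X[1, 0])                 # 5 (unchanged)

# Rows 1 and 3, columns 0 and 4, as a 2 x 2 grid
sub = X[np.ix_(indices, [0, 4])]
print(sub)
# Note: X[indices, [0, 4]] would instead pair the indices and return elements (1, 0) and (3, 4)
```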


In [103]:
"""
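NumPy also provides built-in functions for selecting specific elements of an ndarray.

For example, np.diag(ndarray, k=N) extracts the elements on the diagonal defined by N. By default k=0, meaning the main diagonal.
Values k > 0 select elements on diagonals above the main diagonal, and values k < 0 select elements on diagonals below it.
"""
# We create a 5 x 5 ndarray that contains integers from 0 to 24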
X = np.arange(25).reshape(5, 5)

# We print X
print('X = \n', X)
print()

# We print the elements in the main diagonal of X
print('z =', np.diag(X))

# We print the elements above the main diagonal of X
print('y =', np.diag(X, k=1))

# We print the elements below the main diagonal of X
print('w = ', np.diag(X, k=-1))

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

z = [ 0  6 12 18 24]
y = [ 1  7 13 19]
w =  [ 5 11 17 23]


In [106]:
"""
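Extracting the unique elements of an ndarray.

Use the np.unique() function to find the unique elements of an ndarray.
np.unique(ndarray) returns the unique elements of the given ndarray, sorted in ascending order.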
"""

# Create 3 x 3 ndarray with repeated values
X = np.array([[1,2,3],[5,2,8],[1,2,3]])

# We print X
print('X = \n', X)

# We print the unique elements of X 
print('The unique elements in X are:',np.unique(X))

X = 
 [[1 2 3]
 [5 2 8]
 [1 2 3]]
The unique elements in X are: [1 2 3 5 8]
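With `return_counts=True`, `np.unique` also reports how many times each unique value occurs. Using the same `X`:

```python
import numpy as np

X = np.array([[1, 2, 3], [5, 2, 8], [1, 2, 3]])

values, counts = np.unique(X, return_counts=True)
print(values)  # [1 2 3 5 8]
print(counts)  # [2 3 2 1 1]
```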


## Boolean indexing, set operations and sorting

In [110]:
"""
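In many cases we don't know the indices of the elements we want to select.
For example, suppose we have a 10,000 x 10,000 ndarray of random integers from 1 to 15,000,
and we only want the integers that are less than 20. This is where Boolean indexing comes in.

With Boolean indexing, we select elements with a logical condition instead of explicit indices.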
"""


# We create a 5 x 5 ndarray that contains integers from 0 to 24
X = np.arange(25).reshape(5, 5)

# We print X
print()
print('Original X = \n', X)
print()

# We use Boolean indexing to select elements in X:
print('The elements in X that are greater than 10:', X[X > 10])
print('The elements in X that are less than or equal to 7:', X[X <= 7])
print('The elements in X that are between 10 and 17:', X[(X > 10) & (X < 17)])

# We use Boolean indexing to assign the elements that are between 10 and 17 the value of -1
X[(X > 10) & (X < 17)] = -1

# We print X
print()
print('X = \n', X)


Original X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

The elements in X that are greater than 10: [11 12 13 14 15 16 17 18 19 20 21 22 23 24]
The elements in X that are less than or equal to 7: [0 1 2 3 4 5 6 7]
The elements in X that are between 10 and 17: [11 12 13 14 15 16]

X = 
 [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 -1 -1 -1 -1]
 [-1 -1 17 18 19]
 [20 21 22 23 24]]
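A comparison such as `X > 10` produces a Boolean array of the same shape. It can be counted with `sum()`, turned into positions with `np.nonzero`, or combined with `np.where(condition, a, b)` to build a new array instead of assigning in place. A sketch; the threshold 10 and the value 12 are arbitrary:

```python
import numpy as np

X = np.arange(25).reshape(5, 5)
mask = X > 10                      # Boolean array, shape (5, 5)

print(mask.sum())                  # 14 elements satisfy the condition
rows, cols = np.nonzero(X == 12)   # positions of the matching elements
print(rows, cols)                  # [2] [2]

# New array: keep values > 10, replace the rest with 0 (X itself is not modified)
Y = np.where(mask, X, 0)
print(Y[2])                        # [ 0 11 12 13 14]
```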


In [112]:
"""
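Besides Boolean indexing, NumPy also supports set operations.

They are useful for comparing ndarrays, for example to find the elements that two ndarrays have in common.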

"""
# We create a rank 1 ndarray
x = np.array([1,2,3,4,5])

# We create a rank 1 ndarray
y = np.array([6,7,2,8,4])

# We print x
print('x = ', x)

# We print y
print('y = ', y)

# We use set operations to compare x and y:
print()
print('The elements that are both in x and y:', np.intersect1d(x,y))
print('The elements that are in x that are not in y:', np.setdiff1d(x,y))
print('All the elements of x and y:',np.union1d(x,y))

x =  [1 2 3 4 5]
y =  [6 7 2 8 4]

The elements that are both in x and y: [2 4]
The elements that are in x that are not in y: [1 3 5]
All the elements of x and y: [1 2 3 4 5 6 7 8]
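Two related functions: `np.isin` tests each element of one array for membership in another and returns a Boolean mask usable for indexing, and `np.setxor1d` returns the elements found in exactly one of the two arrays. Using the same `x` and `y`:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 8, 4])

mask = np.isin(x, y)        # [False  True False  True False]
print(x[mask])              # [2 4]
print(np.setxor1d(x, y))    # [1 3 5 6 7 8]
```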


In [119]:
"""
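Sorting ndarrays in NumPy.

We'll see how to use np.sort() to sort rank-1 and rank-2 ndarrays in different ways.
Like the other functions we've seen, sort can also be used as a method.
However, the function and the method differ in how they treat the data in memory:

When np.sort() is used as a function, it does not sort the ndarray in place; the original ndarray is left unchanged and a sorted copy is returned.
When sort is used as a method, ndarray.sort() sorts the ndarray in place, so the original array becomes the sorted array.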

"""
# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))

# We print x
print()
print('Original x = ', x)

# We sort x and print the sorted array using sort as a function.
print()
print('Sorted x (out of place):', np.sort(x))

# When we sort out of place the original array remains intact. To see this we print x again
print()
print('x after sorting:', x)


Original x =  [ 7  1  8 10  4  2  1  9  3  6]

Sorted x (out of place): [ 1  1  2  3  4  6  7  8  9 10]

x after sorting: [ 7  1  8 10  4  2  1  9  3  6]


In [120]:
"""
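Note that np.sort() keeps duplicates: if the ndarray contains repeated values, they all appear in the sorted array.
If we only want the unique elements, we can combine the sort and unique functions.
(np.unique() already returns its result in sorted order, so np.unique(x) alone gives the same output.)

Let's sort only the unique elements of x from above: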

"""
# We sort x but only keep the unique elements in x
print(np.sort(np.unique(x)))
print('x after sorting:', x)

[ 1  2  3  4  6  7  8  9 10]
x after sorting: [ 7  1  8 10  4  2  1  9  3  6]


In [125]:
""" 
Sorting an ndarray in place by using sort as a method:
"""
# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))

# We print x
print('Original x = ', x)

# We sort x and print the sorted array using sort as a method.
x.sort()

# When we sort in place the original array is changed to the sorted array. To see this we print x again
print()
print('x after sorting:', x)

Original x =  [ 4  7  2  4  3  9  8  5 10  4]

x after sorting: [ 2  3  4  4  4  5  7  8  9 10]
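`np.argsort` returns the indices that would sort an array instead of the sorted values, which is handy for reordering a second array the same way. The sample scores and names are illustrative:

```python
import numpy as np

scores = np.array([70, 95, 60, 85])
names = np.array(["a", "b", "c", "d"])

order = np.argsort(scores)   # indices of scores from smallest to largest
print(order)                 # [2 0 3 1]
print(names[order])          # ['c' 'a' 'd' 'b']
print(scores[order][::-1])   # descending: [95 85 70 60]
```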


In [122]:
"""
When sorting a rank-2 ndarray, we must tell np.sort() whether to sort along rows or columns, using the axis keyword:
axis = 0 sorts each column, and axis = 1 sorts each row.
"""
# We create an unsorted rank 2 ndarray
X = np.random.randint(1,11,size=(5,5))

# We print X
print('Original X = \n', X)
print()

# We sort the columns of X and print the sorted array
print()
print('X with sorted columns :\n', np.sort(X, axis = 0))

Original X = 
 [[10  1  9  2  8]
 [ 4  1  4  9  3]
 [ 3 10  6  6  2]
 [ 1  3  9  1  6]
 [ 6  7  7  4  5]]


X with sorted columns :
 [[ 1  1  4  1  2]
 [ 3  1  6  2  3]
 [ 4  3  7  4  5]
 [ 6  7  9  6  6]
 [10 10  9  9  8]]


In [124]:
# We sort the rows of X and print the sorted array
print('Original X = \n', X)
print()
print('X with sorted rows :\n', np.sort(X, axis = 1))

Original X = 
 [[10  1  9  2  8]
 [ 4  1  4  9  3]
 [ 3 10  6  6  2]
 [ 1  3  9  1  6]
 [ 6  7  7  4  5]]

X with sorted rows :
 [[ 1  2  8  9 10]
 [ 1  3  4  4  9]
 [ 2  3  6  6 10]
 [ 1  1  3  6  9]
 [ 4  5  6  7  7]]
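`np.sort(X, axis=...)` sorts each row or column independently, so the original rows are broken apart. To reorder whole rows by the values in one column, combine `argsort` on that column with integer indexing. The small array and the choice of column 0 are illustrative:

```python
import numpy as np

X = np.array([[10, 1, 9],
              [ 4, 1, 4],
              [ 3, 10, 6]])

order = np.argsort(X[:, 0])   # row order by the first column
print(X[order])
# [[ 3 10  6]
#  [ 4  1  4]
#  [10  1  9]]
```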


## Arithmetic operations and broadcasting

In [None]:
""" NumPy 如何对 ndarray 进行算术运算  
 
 NumPy 允许对 ndarray 执行元素级运算以及矩阵运算。
 
1. 如何对 ndarray 进行元素级运算。
为了进行元素级运算，NumPy 有时候会用到广播功能。
广播一词用于描述 NumPy 如何对具有不同形状的 ndarray 进行元素级算术运算。

例如，在标量和 ndarray 之间进行算术运算时，会隐式地用到广播。
 
"""  


    

In [126]:
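# Element-wise addition, subtraction, multiplication and division

"""
In NumPy we can use functions such as np.add(), or arithmetic operators such as +; the latter read more like mathematical equations.
Both forms perform the same operation; the only difference is that the functions usually provide extra options that can be adjusted with keywords.

Note that for element-wise operations, the ndarrays involved must have the same shape or be broadcastable.

We'll cover this in more detail later in this lesson. First, let's perform element-wise arithmetic on rank-1 ndarrays: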

"""
# We create two rank 1 ndarrays
x = np.array([1,2,3,4])
y = np.array([5.5,6.5,7.5,8.5])

# We print x
print()
print('x = ', x)

# We print y
print()
print('y = ', y)
print()

# We perform basic element-wise operations using arithmetic symbols and functions
print('x + y = ', x + y)
print('add(x,y) = ', np.add(x,y))
print()
print('x - y = ', x - y)
print('subtract(x,y) = ', np.subtract(x,y))
print()
print('x * y = ', x * y)
print('multiply(x,y) = ', np.multiply(x,y))
print()
print('x / y = ', x / y)
print('divide(x,y) = ', np.divide(x,y))


x =  [1 2 3 4]

y =  [5.5 6.5 7.5 8.5]

x + y =  [ 6.5  8.5 10.5 12.5]
add(x,y) =  [ 6.5  8.5 10.5 12.5]

x - y =  [-4.5 -4.5 -4.5 -4.5]
subtract(x,y) =  [-4.5 -4.5 -4.5 -4.5]

x * y =  [ 5.5 13.  22.5 34. ]
multiply(x,y) =  [ 5.5 13.  22.5 34. ]

x / y =  [0.18181818 0.30769231 0.4        0.47058824]
divide(x,y) =  [0.18181818 0.30769231 0.4        0.47058824]


In [127]:
"""
   The same element-wise operations on rank-2 ndarrays. Again, the ndarrays must have the same shape or be broadcastable.
"""
# We create two rank 2 ndarrays
X = np.array([1,2,3,4]).reshape(2,2)
Y = np.array([5.5,6.5,7.5,8.5]).reshape(2,2)

# We print X
print()
print('X = \n', X)

# We print Y
print()
print('Y = \n', Y)
print()

# We perform basic element-wise operations using arithmetic symbols and functions
print('X + Y = \n', X + Y)
print()
print('add(X,Y) = \n', np.add(X,Y))
print()
print('X - Y = \n', X - Y)
print()
print('subtract(X,Y) = \n', np.subtract(X,Y))
print()
print('X * Y = \n', X * Y)
print()
print('multiply(X,Y) = \n', np.multiply(X,Y))
print()
print('X / Y = \n', X / Y)
print()
print('divide(X,Y) = \n', np.divide(X,Y))



X = 
 [[1 2]
 [3 4]]

Y = 
 [[5.5 6.5]
 [7.5 8.5]]

X + Y = 
 [[ 6.5  8.5]
 [10.5 12.5]]

add(X,Y) = 
 [[ 6.5  8.5]
 [10.5 12.5]]

X - Y = 
 [[-4.5 -4.5]
 [-4.5 -4.5]]

subtract(X,Y) = 
 [[-4.5 -4.5]
 [-4.5 -4.5]]

X * Y = 
 [[ 5.5 13. ]
 [22.5 34. ]]

multiply(X,Y) = 
 [[ 5.5 13. ]
 [22.5 34. ]]

X / Y = 
 [[0.18181818 0.30769231]
 [0.4        0.47058824]]

divide(X,Y) = 
 [[0.18181818 0.30769231]
 [0.4        0.47058824]]


In [128]:
# Apply a mathematical function, such as sqrt(x), to every element of an ndarray.

# We create a rank 1 ndarray
x = np.array([1,2,3,4])

# We print x
print()
print('x = ', x)

# We apply different mathematical functions to all elements of x
print()
print('EXP(x) =', np.exp(x))
print()
print('SQRT(x) =',np.sqrt(x))
print()
print('POW(x,2) =',np.power(x,2)) # We raise all elements to the power of 2


x =  [1 2 3 4]

EXP(x) = [ 2.71828183  7.3890561  20.08553692 54.59815003]

SQRT(x) = [1.         1.41421356 1.73205081 2.        ]

POW(x,2) = [ 1  4  9 16]


In [129]:
"""
Another important NumPy feature is its large set of statistical functions, which give statistical information about the elements of an ndarray.
"""

# We create a 2 x 2 ndarray
X = np.array([[1,2], [3,4]])

# We print x
print()
print('X = \n', X)
print()

print('Average of all elements in X:', X.mean())
print('Average of all elements in the columns of X:', X.mean(axis=0))
print('Average of all elements in the rows of X:', X.mean(axis=1))
print()
print('Sum of all elements in X:', X.sum())
print('Sum of all elements in the columns of X:', X.sum(axis=0))
print('Sum of all elements in the rows of X:', X.sum(axis=1))
print()
print('Standard Deviation of all elements in X:', X.std())
print('Standard Deviation of all elements in the columns of X:', X.std(axis=0))
print('Standard Deviation of all elements in the rows of X:', X.std(axis=1))
print()
print('Median of all elements in X:', np.median(X))
print('Median of all elements in the columns of X:', np.median(X,axis=0))
print('Median of all elements in the rows of X:', np.median(X,axis=1))
print()
print('Maximum value of all elements in X:', X.max())
print('Maximum value of all elements in the columns of X:', X.max(axis=0))
print('Maximum value of all elements in the rows of X:', X.max(axis=1))
print()
print('Minimum value of all elements in X:', X.min())
print('Minimum value of all elements in the columns of X:', X.min(axis=0))
print('Minimum value of all elements in the rows of X:', X.min(axis=1))


X = 
 [[1 2]
 [3 4]]

Average of all elements in X: 2.5
Average of all elements in the columns of X: [2. 3.]
Average of all elements in the rows of X: [1.5 3.5]

Sum of all elements in X: 10
Sum of all elements in the columns of X: [4 6]
Sum of all elements in the rows of X: [3 7]

Standard Deviation of all elements in X: 1.118033988749895
Standard Deviation of all elements in the columns of X: [1. 1.]
Standard Deviation of all elements in the rows of X: [0.5 0.5]

Median of all elements in X: 2.5
Median of all elements in the columns of X: [2. 3.]
Median of all elements in the rows of X: [1.5 3.5]

Maximum value of all elements in X: 4
Maximum value of all elements in the columns of X: [3 4]
Maximum value of all elements in the rows of X: [2 4]

Minimum value of all elements in X: 1
Minimum value of all elements in the columns of X: [1 2]
Minimum value of all elements in the rows of X: [1 3]
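`X.std()` computes the population standard deviation, dividing by N. For the sample standard deviation, dividing by N - 1, pass `ddof=1`. With the same `X`:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

# Squared deviations from the mean 2.5 sum to 5.0
print(X.std())         # population: sqrt(5 / 4) ≈ 1.1180
print(X.std(ddof=1))   # sample:     sqrt(5 / 3) ≈ 1.2910
```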


In [130]:
""" 
 How NumPy adds a single number to every element of an ndarray without writing explicit loops. 
"""

# We create a 2 x 2 ndarray
X = np.array([[1,2], [3,4]])

# We print x
print()
print('X = \n', X)
print()

print('3 * X = \n', 3 * X)
print()
print('3 + X = \n', 3 + X)
print()
print('X - 3 = \n', X - 3)
print()
print('X / 3 = \n', X / 3)

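# Behind the scenes, NumPy broadcasts the scalar 3 to the shape of X. This lets us add 3 to every element of X (or apply the other operations) with a single line of code.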


X = 
 [[1 2]
 [3 4]]

3 * X = 
 [[ 3  6]
 [ 9 12]]

3 + X = 
 [[4 5]
 [6 7]]

X - 3 = 
 [[-2 -1]
 [ 0  1]]

X / 3 = 
 [[0.33333333 0.66666667]
 [1.         1.33333333]]


In [132]:
""" 
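NumPy can also perform these operations on two ndarrays with different shapes, but only when the shapes are compatible for broadcasting (comparing from the last dimension, each pair of dimensions must be equal or one of them must be 1), as shown below. 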
"""


# We create a rank 1 ndarray
x = np.array([1,2,3])

# We create a 3 x 3 ndarray
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])

# We create a 3 x 1 ndarray
Z = np.array([1,2,3]).reshape(3,1)

# We print x
print()
print('x = ', x)
print()

# We print Y
print()
print('Y = \n', Y)
print()

# We print Z
print()
print('Z = \n', Z)
print()

print('x + Y = \n', x + Y)
print()
print('Z + Y = \n',Z + Y)


x =  [1 2 3]


Y = 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


Z = 
 [[1]
 [2]
 [3]]

x + Y = 
 [[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]

Z + Y = 
 [[ 2  3  4]
 [ 6  7  8]
 [10 11 12]]
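Conceptually, `x + Y` behaves as if `x` were repeated as every row of `Y`, and `Z + Y` as if `Z` were repeated as every column. `np.broadcast_to` makes this explicit as a read-only view without copying data. Shapes that cannot be broadcast raise a `ValueError`; the `(2,)` array in the last step is an illustrative mismatch.

```python
import numpy as np

x = np.array([1, 2, 3])
Y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Z = np.array([1, 2, 3]).reshape(3, 1)

x_b = np.broadcast_to(x, Y.shape)   # x repeated as every row
Z_b = np.broadcast_to(Z, Y.shape)   # Z repeated as every column

print(np.array_equal(x + Y, x_b + Y), np.array_equal(Z + Y, Z_b + Y))  # True True

try:
    Y + np.array([1, 2])            # shape (2,) does not match the last dimension 3
except ValueError as err:
    print("ValueError:", err)
```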



# Mean Normalization

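In machine learning, we train our models on large amounts of data. Some machine learning algorithms need normalized data to work properly.
Normalization refers to feature scaling, which aims to put all the data on a similar scale, i.e. give all features values in a similar range.
For example, a dataset might have values ranging from 0 to 5,000; normalizing the data could bring the values into the range 0 to 1.
In this lab, you will perform a special form of feature scaling called **mean normalization**. Mean normalization not only scales the data but also makes its mean equal to 0.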


We'll import NumPy and create a rank-2 ndarray of random integers between 0 and 50,000 (inclusive), with 1000 rows and 20 columns. This array will simulate a dataset with a wide range of values.

In [2]:
# Import NumPy and create a rank-2 ndarray of random integers between 0 and 50,000 (inclusive), with 1000 rows and 20 columns.
# This array will simulate a dataset with a wide range of values.

# import NumPy into Python
import numpy as np
# Create a 1000 x 20 ndarray with random integers in the half-open interval [0, 50001).
X = np.random.randint(0,50001, size = (1000, 20))

# print X
print(X)


[[ 6603 49297  1780 ... 34467 11275  1790]
 [19869 38654  3967 ... 13616 25696 46362]
 [31322 30398 47077 ... 32275 29725 16893]
 ...
 [30942  1232 21396 ... 28533 48922 21170]
 [26008 45708 22957 ...  1400 45263 10668]
 [30462 15675 26194 ... 22758 25810 29888]]


Once the array is created, we'll normalize the data using the following equation for mean normalization:

$\mathrm{Norm\_Col}_i = \frac{\mathrm{Col}_i - \mu_i}{\sigma_i}$

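where $\mathrm{Col}_i$ is the $i$-th column of $X$, $\mu_i$ is the mean of the $i$-th column of $X$, and $\sigma_i$ is the standard deviation of the $i$-th column of $X$. In other words, mean normalization subtracts each column's mean from the values in that column of $X$ and then divides by that column's standard deviation. In the space below, first compute the mean and standard deviation of each column of $X$.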
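Thanks to broadcasting, the formula can be applied to all columns at once: the per-column means and standard deviations have shape `(20,)`, which broadcasts against `(1000, 20)`. A small sketch on a separate illustrative array checks that every normalized column ends up with mean 0 and standard deviation 1. The helper `mean_normalize` and the names `data` and `data_norm` are hypothetical, chosen so they do not overwrite the lab's `X`.

```python
import numpy as np

def mean_normalize(data):
    """Hypothetical helper: subtract each column's mean, divide by its std."""
    mu = data.mean(axis=0)       # shape (n_cols,)
    sigma = data.std(axis=0)     # shape (n_cols,); assumes no constant columns
    return (data - mu) / sigma   # broadcast across the rows

rng = np.random.default_rng(1)
data = rng.integers(0, 50001, size=(1000, 20))
data_norm = mean_normalize(data)

print(np.allclose(data_norm.mean(axis=0), 0))  # True (up to floating-point error)
print(np.allclose(data_norm.std(axis=0), 1))   # True
```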

In [3]:
# Average of the values in each column of X
ave_cols = np.mean(X, axis=0)

# Standard Deviation of the values in each column of X
std_cols = np.std(X, axis=0)

In [4]:
# Print ave_cols
print(ave_cols)

# Print std_cols
print(std_cols)

[24864.46  25268.536 25194.67  24747.029 24400.029 24859.45  24875.843
 24089.975 24767.849 24797.986 24859.487 24301.63  24120.492 24888.505
 25557.144 25529.485 24705.723 24662.456 24668.968 25606.204]
[14492.46296067 14098.7178113  14591.53475879 14309.71910095
 14231.71776927 14550.69832694 13846.28959044 14281.50585339
 14330.75303709 14216.64984488 14501.34557149 14274.0865713
 14473.87995197 14161.14178045 14567.38390993 14544.9165693
 14372.72292275 14352.01820366 14486.09857142 14738.46893508]


In [None]:
# Mean normalize X
X_norm = (X - ave_cols) / std_cols

In [None]:
# Print the average of all the values of X_norm
print(X_norm.mean())

# Print the minimum value of each column of X_norm
print(X_norm.min(axis=0))

# Print the maximum value of each column of X_norm
print(X_norm.max(axis=0))



Note that because $X$ was created with random integers, the values above will vary from run to run.

# Data Separation

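After the data has been mean normalized, it is common in machine learning to split the dataset into three sets:

1. Training set
2. Cross-validation set
3. Test set

A typical split puts 60% of the data in the training set, 20% in the cross-validation set and 20% in the test set.

In this section, you will split `X_norm` into a training set, a cross-validation set and a test set. Each set will contain randomly chosen rows of `X_norm`, and no row may be chosen more than once. This guarantees that every row of `X_norm` is selected and that the rows are randomly distributed among the three new sets.

First, create a rank-1 ndarray containing a random permutation of the row indices of `X_norm`. You can use the `np.random.permutation()` function for this: `np.random.permutation(N)` returns a random permutation of the integers from 0 to `N - 1`. Let's look at an example: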

In [None]:
# We create a random permutation of integers 0 to 4
np.random.permutation(5)

# TODO

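In the space below, create a rank-1 ndarray containing a random permutation of the row indices of `X_norm`. It takes a single line of code: use the `shape` attribute to get the number of rows of `X_norm`, then pass it to the `np.random.permutation()` function. Note that the `shape` attribute returns a tuple of two numbers in the form `(rows,columns)`.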

In [None]:
# Create a rank 1 ndarray that contains a random permutation of the row indices of `X_norm`
row_indices = np.random.permutation(X_norm.shape[0])

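Now you can use the `row_indices` ndarray to create the three datasets, selecting the rows that go into each one. Remember that the training set contains 60% of the data, the cross-validation set 20%, and the test set 20%. Each set can be created with a single line of code. Fill in the code below: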
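Because `row_indices` is a random permutation, consecutive chunks of it select disjoint, randomly chosen rows. A sketch on a small illustrative array with 10 rows (so the 60/20/20 split gives 6/2/2 rows); computing the cut points with `int()` is an assumption that splits cleanly when the row count is a multiple of 5, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
data = np.arange(30).reshape(10, 3)        # 10 illustrative rows

n = data.shape[0]
idx = rng.permutation(n)                   # shuffled row indices 0..n-1
n_train = int(0.6 * n)
n_val = int(0.2 * n)

train_set = data[idx[:n_train]]
val_set = data[idx[n_train:n_train + n_val]]
test_set = data[idx[n_train + n_val:]]

print(train_set.shape, val_set.shape, test_set.shape)  # (6, 3) (2, 3) (2, 3)
```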

In [None]:
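# Make any necessary calculations.
# You can save your calculations into variables to use later.
n_rows = X_norm.shape[0]
n_train = int(0.6 * n_rows)
n_crossVal = int(0.2 * n_rows)

# Create a Training Set
X_train = X_norm[row_indices[:n_train]]

# Create a Cross Validation Set
X_crossVal = X_norm[row_indices[n_train:n_train + n_crossVal]]

# Create a Test Set
X_test = X_norm[row_indices[n_train + n_crossVal:]]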

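If you completed the steps above correctly, `X_train` should have 600 rows and 20 columns, `X_crossVal` should have 200 rows and 20 columns, and `X_test` should have 200 rows and 20 columns. You can verify this by filling in the code below: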

In [None]:
# Print the shape of X_train
print(X_train.shape)

# Print the shape of X_crossVal
print(X_crossVal.shape)

# Print the shape of X_test
print(X_test.shape)