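### Jupyter shortcuts
- Enter command mode
    - esc
    - a -> insert a cell above
    - b -> insert a cell below
    - m -> switch the cell to Markdown
    - y -> switch the cell to Python code
    - d + d -> delete the cell (press d twice)
- Enter edit mode
    - enter -> new line
    - shift + enter -> run the cell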

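### Deep learning computation
- Build models on top of Block, which is more flexible than the earlier approach:
    - A Block can be a single layer such as Dense, a group of layers, or a whole model
- Methods:
    - `__init__`: creates the model parameters
    - `forward`: defines the forward computation
    - `add` (Sequential): appends Block subclass instances that are chained in sequence
    - `initialize`: initializes the parameters, by default from a uniform distribution: `initialize(init, ctx, verbose, force_reinit)`
    - mxnet computes gradients automatically and generates the backward function needed for backpropagation (impressive)
- ![Computational graph](http://zh.d2l.ai/_images/forward.svg)
- Accessing model parameters
    - Through the params attribute: `.params['dense0_weight']`
    - Directly with the dot operator: `.weight`, `.bias`
- Deferred initialization:
    - Parameters are not actually initialized when `initialize()` is called, because the input size is not yet known
- Avoiding deferred initialization
    - Specify the input size when declaring the layer: `nn.Dense(num_outputs, in_units=input_units)`
- Saving/loading parameters:
    - `Block.save_parameters`
    - `Block.load_parameters`
- Saving/loading ndarrays:
    - `nd.save`
    - `nd.load`
- GPU context:
    - `x_gpu = nd.array([1,2,3], ctx=mx.gpu(0))`
    - `net.initialize(ctx=mx.gpu())`
- Transferring data between devices:
    - Deep copy, `copyto`: `x.copyto(mx.gpu(0))`
    - Shallow copy, `as_in_context`: `x.as_in_context(mx.gpu(0))`
        - If the source and target contexts are the same, `as_in_context` makes the target variable share the source variable's memory (or GPU memory); otherwise a copy is made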
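
The difference between `copyto` and `as_in_context` noted above can be sketched without a GPU. The snippet below is a toy model in plain Python, not mxnet code: `ToyArray` and its string contexts are hypothetical stand-ins that only mimic the sharing behaviour described in the notes.

```python
class ToyArray:
    """Toy stand-in for an NDArray: a list of values tagged with a context name."""

    def __init__(self, values, ctx="cpu(0)"):
        self.values = list(values)  # list() makes a fresh buffer
        self.ctx = ctx

    def copyto(self, ctx):
        # Always allocates a new buffer, even when the context is unchanged.
        return ToyArray(self.values, ctx)

    def as_in_context(self, ctx):
        # Returns the same object when the context already matches,
        # so source and target share memory; otherwise falls back to a copy.
        return self if ctx == self.ctx else self.copyto(ctx)


x = ToyArray([1, 2, 3])
shared = x.as_in_context("cpu(0)")
copied = x.copyto("cpu(0)")

shared.values[0] = 100   # visible through x, because shared is x
copied.values[1] = -1    # x is unaffected by changes to the copy
print(x.values)          # [100, 2, 3]
```

In this sketch, writing through `shared` changes `x`, while writing through `copied` does not, which is the practical consequence of the memory sharing mentioned in the last bullet.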

    

In [103]:
from mxnet import nd, init
from mxnet.gluon import nn

In [6]:
class MLP(nn.Block):
    """
    Build a model by subclassing Block
    """
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        # as with Sequential, no input layer needs to be defined
        self.hidden = nn.Dense(256, activation='relu')
        self.output = nn.Dense(10)
    
    def forward(self, X):
        # calling a layer runs its forward(): activation(dot(X, W.T) + b)
        return self.output(self.hidden(X))

In [43]:
X = nd.random.uniform(shape=(2, 20))

In [44]:
X


[[0.52580893 0.04401636 0.14181727 0.53580195 0.31673068 0.4527305
  0.6267065  0.3261177  0.7275436  0.30018947 0.02427271 0.26678815
  0.430116   0.19952953 0.6521246  0.49263242 0.853246   0.34513777
  0.47532478 0.33060053]
 [0.96920586 0.06620718 0.26563254 0.01386505 0.0135087  0.91165674
  0.48375288 0.41375843 0.2561138  0.6770445  0.82371765 0.45239896
  0.23277268 0.20063767 0.31062922 0.9294752  0.79122746 0.12266199
  0.71514326 0.48137522]]
<NDArray 2x20 @cpu(0)>

In [45]:
net = MLP()
net.initialize()
net(X)


[[-0.06070112 -0.07294163  0.04112982  0.0487619  -0.00259803 -0.04815025
   0.04822326 -0.02091947 -0.0527753  -0.06360553]
 [-0.06579573 -0.11664213  0.02646019  0.06755988  0.06176336 -0.10675579
   0.03999376 -0.01457253 -0.05268126 -0.08559721]]
<NDArray 2x10 @cpu(0)>

In [46]:
class MySequential(nn.Block):
    """
    Build a Sequential by subclassing Block
    """
    def __init__(self, **kwargs):
        super(MySequential, self).__init__(**kwargs)
        
    def add(self, *blocks):
        """
        self._children is an OrderedDict
        """
        for block in blocks:
            self._children[block.name] = block
        
    def forward(self, X):
        for block in self._children.values():
            X = block(X)
        return X

In [47]:
my_sequential = MySequential()
my_sequential.add(nn.Dense(256, activation='relu'), nn.Dense(10))
my_sequential.initialize()
my_sequential(X)


[[-0.01626736  0.06074644  0.01985445 -0.01832244 -0.02094231  0.0452922
   0.02652163 -0.07844356 -0.02887667  0.07291901]
 [-0.00768407  0.08003666  0.01333294  0.01722985 -0.06433114  0.03854203
   0.05133884 -0.03060111 -0.02143979  0.04306421]]
<NDArray 2x10 @cpu(0)>

In [70]:
class FancyMLP(nn.Block):
    """
    A slightly more complex network:
    X -> dense -> relu(X . rand_weight + 1) -> dense (shared)
      -> while |X| > 1: X /= 2 -> if |X| < 0.8: X *= 10 -> sum
    """
    def __init__(self, **kwargs):
        super(FancyMLP, self).__init__(**kwargs)
        # parameters created with get_constant are constants and are not updated during training
        self.rand_weight = self.params.get_constant(
            'rand_weight', nd.random.uniform(shape=(20, 20)))
        self.dense = nn.Dense(20, activation="relu")
        print("__init__: ", self.dense)
    
    def forward(self, X):
        print("Initialize: ", X)
        X = self.dense(X)
        print("First dense: ", X)
        X = nd.relu(nd.dot(X, self.rand_weight.data()) + 1)
        print("RELU and dot: ", X)
        X = self.dense(X)  # reuse the fully connected layer, so its parameters are shared
        print("Second dense: ", X, X.norm(), X.norm().asscalar())
        while X.norm() > 1: # .asscalar() > 1:
            print("Greater than 1: ", X.norm().asscalar())
            X /= 2
        if X.norm() < 0.8:  # .asscalar() < 0.8:
            print("Less than 0.8: ", X.norm().asscalar())
            X *= 10
        return X.sum()

In [71]:
fancy_mlp = FancyMLP()
fancy_mlp.initialize()
fancy_mlp(X)

__init__:  Dense(None -> 20, Activation(relu))
Initialize:  
[[0.52580893 0.04401636 0.14181727 0.53580195 0.31673068 0.4527305
  0.6267065  0.3261177  0.7275436  0.30018947 0.02427271 0.26678815
  0.430116   0.19952953 0.6521246  0.49263242 0.853246   0.34513777
  0.47532478 0.33060053]
 [0.96920586 0.06620718 0.26563254 0.01386505 0.0135087  0.91165674
  0.48375288 0.41375843 0.2561138  0.6770445  0.82371765 0.45239896
  0.23277268 0.20063767 0.31062922 0.9294752  0.79122746 0.12266199
  0.71514326 0.48137522]]
<NDArray 2x20 @cpu(0)>
First dense:  
[[0.07024643 0.         0.06379425 0.         0.09142933 0.
  0.         0.         0.01337921 0.00755521 0.00864871 0.
  0.         0.03104995 0.         0.09223591 0.16523626 0.
  0.         0.        ]
 [0.         0.         0.1921282  0.03930675 0.12466463 0.00540278
  0.         0.         0.02709076 0.02961882 0.         0.
  0.         0.         0.         0.09034698 0.22440496 0.
  0.         0.        ]]
<NDArray 2x20 @cpu(0)>
[output truncated]


[19.623453]
<NDArray 1 @cpu(0)>
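
The control flow at the end of `FancyMLP.forward` can be traced by hand on a small vector. The sketch below is plain Python with a hypothetical `rescale` helper, not the mxnet code; it only mirrors the `while` and `if` steps on a list of floats.

```python
import math


def l2_norm(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def rescale(values):
    """Halve until the norm is at most 1, then scale by 10 if it fell below 0.8."""
    while l2_norm(values) > 1:
        values = [v / 2 for v in values]
    if l2_norm(values) < 0.8:
        values = [v * 10 for v in values]
    return values


# Norm goes 5 -> 2.5 -> 1.25 -> 0.625, which is below 0.8, so the result is scaled by 10.
print(rescale([3.0, 4.0]))   # [3.75, 5.0]
```

Because of the final multiplication by 10, the returned values can end up with a norm well above 1 even though the loop first pushed it below 1.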

In [50]:
class NestMLP(nn.Block):
    """
    Nest a Block subclass together with Sequential
    Input: X_m*n
    X_m*n * W1_n*64 -(relu)-> H1_m*64 * W2_64*32 -(relu)-> O1_m*32 * W3_32*16 -(relu)-> O2_m*16
    """
    def __init__(self, **kwargs):
        super(NestMLP, self).__init__(**kwargs)
        self.net = nn.Sequential()
        self.net.add(nn.Dense(64, activation='relu'), nn.Dense(32, activation='relu'))
        self.dense = nn.Dense(16, activation='relu')
    
    def forward(self, X):
        return self.dense(self.net(X))

In [51]:
nest_mlp = NestMLP()
nest_mlp.initialize()
nest_mlp(X)


[[0.         0.00068155 0.00380999 0.00216276 0.00086105 0.
  0.         0.00184968 0.00136407 0.         0.         0.
  0.         0.00276379 0.00014916 0.        ]
 [0.         0.0021382  0.00417513 0.00222392 0.00198193 0.
  0.         0.0026347  0.         0.         0.         0.
  0.         0.00541991 0.00186445 0.        ]]
<NDArray 2x16 @cpu(0)>

In [52]:
nest_seq = nn.Sequential()
nest_seq.add(NestMLP(), nn.Dense(20), FancyMLP())
nest_seq.initialize()
nest_seq(X)

__init__:  Dense(None -> 20, Activation(relu))
Initialize:  
[[-1.3215184e-04  1.6501137e-04  3.3890214e-04  3.0373316e-04
   2.7971971e-04  4.4500313e-04 -3.1594638e-04  1.8301987e-04
   3.2636931e-04 -9.0669550e-05  6.0050341e-05  1.3032819e-04
  -3.4925487e-04  1.7909531e-04  1.9252884e-04 -1.5353228e-04
  -2.9035352e-04  6.0592662e-04  2.0440918e-04 -3.2938588e-05]
 [-1.6919333e-04  2.8088191e-04  3.5906580e-04  2.7008241e-04
   1.1220090e-04  5.9315155e-04 -1.9164746e-04  1.2763157e-04
   4.2085416e-04 -1.8352392e-04  8.0380181e-05 -3.2929376e-05
  -2.5981246e-04  1.0404169e-04  2.6852544e-04  2.5695845e-05
  -3.7983494e-04  6.2260020e-04 -9.8600703e-06 -4.3311229e-05]]
<NDArray 2x20 @cpu(0)>
First dense:  
[[1.3807738e-05 1.1303832e-05 1.6630645e-04 0.0000000e+00 3.6790429e-05
  0.0000000e+00 5.3354311e-06 7.2666131e-05 8.4236824e-05 0.0000000e+00
  3.9687871e-05 0.0000000e+00 0.0000000e+00 0.0000000e+00 4.2794942e-05
  0.0000000e+00 0.0000000e+00 3.0408901e-05 0.0000000e+00 0.00


[17.592234]
<NDArray 1 @cpu(0)>

In [53]:
X  # unchanged


[[0.52580893 0.04401636 0.14181727 0.53580195 0.31673068 0.4527305
  0.6267065  0.3261177  0.7275436  0.30018947 0.02427271 0.26678815
  0.430116   0.19952953 0.6521246  0.49263242 0.853246   0.34513777
  0.47532478 0.33060053]
 [0.96920586 0.06620718 0.26563254 0.01386505 0.0135087  0.91165674
  0.48375288 0.41375843 0.2561138  0.6770445  0.82371765 0.45239896
  0.23277268 0.20063767 0.31062922 0.9294752  0.79122746 0.12266199
  0.71514326 0.48137522]]
<NDArray 2x20 @cpu(0)>

In [54]:
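class ErrorMLP(nn.Block):
    """
    Build a model by subclassing Block, deliberately skipping the parent __init__
    """
    def __init__(self, **kwargs):
        # super(ErrorMLP, self).__init__(**kwargs) is intentionally not called
        # as with Sequential, no input layer needs to be defined
        self.hidden = nn.Dense(256, activation='relu')
        self.output = nn.Dense(10)
    
    def forward(self, X):
        return self.output(self.hidden(X))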

In [55]:
mlp = ErrorMLP()

AttributeError: 'ErrorMLP' object has no attribute '_children'
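
Why the missing `super().__init__` call produces this particular error can be sketched with a toy base class. The code below is plain Python under the assumption that Block registers child blocks inside `__setattr__`; `ToyBlock`, `Good` and `Bad` are hypothetical names, and this is not the Gluon implementation.

```python
from collections import OrderedDict


class ToyBlock:
    """Toy base class: registers child ToyBlocks on attribute assignment."""

    def __init__(self):
        self._children = OrderedDict()

    def __setattr__(self, name, value):
        if isinstance(value, ToyBlock):
            self._children[name] = value  # requires _children to exist already
        super().__setattr__(name, value)


class Good(ToyBlock):
    def __init__(self):
        super().__init__()        # creates self._children first
        self.hidden = ToyBlock()


class Bad(ToyBlock):
    def __init__(self):
        self.hidden = ToyBlock()  # _children was never created


good = Good()
print(list(good._children))       # ['hidden']

try:
    Bad()
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)
```

In this sketch the failure happens on the first assignment of a child, before `forward` is ever called, which matches the error appearing as soon as `ErrorMLP()` is constructed.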

In [72]:
class ErrorNestMLP(nn.Block):
    """
    Nest a Block subclass together with a list (instead of Sequential)
    Input: X_m*n
    X_m*n * W1_n*64 -(relu)-> H1_m*64 * W2_64*32 -(relu)-> O1_m*32 * W3_32*16 -(relu)-> O2_m*16
    """
    def __init__(self, **kwargs):
        super(ErrorNestMLP, self).__init__(**kwargs)
        # use a list instead of Sequential()
        self.net = [nn.Dense(64, activation='relu'), nn.Dense(32, activation='relu')]
        self.dense = nn.Dense(16, activation='relu')
    
    def forward(self, X):
        return self.dense(self.net(X))

In [75]:
error_nest_mlp = ErrorNestMLP()
error_nest_mlp.initialize()
error_nest_mlp(X)

  self.collect_params().initialize(init, ctx, verbose, force_reinit)


TypeError: 'list' object is not callable

In [62]:
print("success" if X.norm() < 1 else "failure")

failure


In [61]:
print(X.norm())


[3.2021255]
<NDArray 1 @cpu(0)>


In [79]:
class MultipleDenses(nn.Block):
    def __init__(self, **kwargs):
        super(MultipleDenses, self).__init__(**kwargs)
        # wrong: Blocks stored in a plain list are not registered as children
        self.denses = [nn.Dense(64, activation='relu'), nn.Dense(32, activation='relu'), nn.Dense(16, activation='relu')]
    
    def forward(self, X):
        for dense in self.denses:
            X = dense(X)
        return X

In [80]:
multiple_dense = MultipleDenses()
multiple_dense.initialize()
multiple_dense(X)

RuntimeError: Parameter 'dense61_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks
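
The error above comes from layers that `initialize()` never reached. A toy model of that behaviour, in plain Python with hypothetical class names (`ToyLayer`, `ListNet`, `RegisteredNet`) and assuming that only directly assigned child layers are registered:

```python
class ToyLayer:
    """Toy layer with one parameter that starts uninitialized."""

    def __init__(self):
        self.weight = None
        self._children = {}

    def __setattr__(self, name, value):
        # Only attributes that are ToyLayers get registered; lists are not inspected.
        if isinstance(value, ToyLayer):
            self._children[name] = value
        object.__setattr__(self, name, value)

    def initialize(self):
        self.weight = 0.0
        for child in self._children.values():
            child.initialize()


class ListNet(ToyLayer):
    def __init__(self):
        super().__init__()
        self.layers = [ToyLayer(), ToyLayer()]   # plain list: nothing registered


class RegisteredNet(ToyLayer):
    def __init__(self):
        super().__init__()
        self.layers = []
        for i in range(2):
            layer = ToyLayer()
            setattr(self, "layer%d" % i, layer)  # attribute assignment registers it
            self.layers.append(layer)


list_net, reg_net = ListNet(), RegisteredNet()
list_net.initialize()
reg_net.initialize()
print([layer.weight for layer in list_net.layers])  # [None, None]
print([layer.weight for layer in reg_net.layers])   # [0.0, 0.0]
```

In Gluon itself, `nn.Sequential` plays the role that `RegisteredNet` plays here, which is why the notes recommend it over a bare list.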

In [81]:
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize()

In [85]:
net[0].weight.data()  # parameters are only initialized during the first forward pass

DeferredInitializationError: Parameter 'dense64_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
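
The idea behind deferred initialization can be sketched in plain Python. `LazyDense` below is a hypothetical toy layer, not the Gluon `Dense`: it allocates its weight only once the input size is known, either from `in_units` or from the first batch it sees. The uniform range is an illustrative choice.

```python
import random


class LazyDense:
    """Toy dense layer whose weight shape is inferred on the first forward pass."""

    def __init__(self, units, in_units=None):
        self.units = units
        self.in_units = in_units
        self.weight = None
        self._init_requested = False

    def initialize(self):
        self._init_requested = True
        if self.in_units is not None:      # shape known: initialize right away
            self._allocate()

    def _allocate(self):
        # units x in_units matrix with small random values (illustrative range)
        self.weight = [[random.uniform(-0.07, 0.07) for _ in range(self.in_units)]
                       for _ in range(self.units)]

    def forward(self, x):
        if self.weight is None:
            if not self._init_requested:
                raise RuntimeError("initialize() was never called")
            self.in_units = len(x[0])      # infer the input size from the data
            self._allocate()
        return [[sum(w * v for w, v in zip(row, sample)) for row in self.weight]
                for sample in x]


lazy = LazyDense(4)
lazy.initialize()
print(lazy.weight)                              # None: initialization was deferred
lazy.forward([[1.0, 2.0, 3.0]])
print(len(lazy.weight), len(lazy.weight[0]))    # 4 3

eager = LazyDense(4, in_units=3)
eager.initialize()
print(len(eager.weight), len(eager.weight[0]))  # 4 3
```

Passing `in_units` up front corresponds to the `nn.Dense(num_outputs, in_units=input_units)` form, which makes the weight available immediately after `initialize()`.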

In [88]:
Y = net(X)

In [91]:
net[0].params

dense64_ (
  Parameter dense64_weight (shape=(256, 20), dtype=float32)
  Parameter dense64_bias (shape=(256,), dtype=float32)
)

In [94]:
net[0].params['dense64_weight'].data() == net[0].weight.data()


[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
<NDArray 256x20 @cpu(0)>

In [95]:
net[0].params['dense64_bias'].data() == net[0].bias.data()


[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
<NDArray 256 @cpu(0)>

In [97]:
net[0].bias.data()


[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
<NDArray 256 @cpu(0)>

In [96]:
# gradients: all values are 0 before any backward pass
net[0].weight.grad()


[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
<NDArray 256x20 @cpu(0)>

In [100]:
# collect_params can select parameters by matching a regular expression
print(net.collect_params())
print(net.collect_params(".*weight"))

sequential11_ (
  Parameter dense64_weight (shape=(256, 20), dtype=float32)
  Parameter dense64_bias (shape=(256,), dtype=float32)
  Parameter dense65_weight (shape=(10, 256), dtype=float32)
  Parameter dense65_bias (shape=(10,), dtype=float32)
)
sequential11_ (
  Parameter dense64_weight (shape=(256, 20), dtype=float32)
  Parameter dense65_weight (shape=(10, 256), dtype=float32)
)
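
The selection shown above can be reproduced on the parameter names alone. This is a minimal sketch using Python's `re` module, assuming the pattern is applied with `re.match` semantics (anchored at the start of the name); `select` is a hypothetical helper, not part of mxnet.

```python
import re

# Parameter names copied from the collect_params() output above.
param_names = ["dense64_weight", "dense64_bias", "dense65_weight", "dense65_bias"]


def select(names, pattern):
    """Keep the names that the pattern matches from the start of the string."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


print(select(param_names, ".*weight"))   # ['dense64_weight', 'dense65_weight']
print(select(param_names, "dense65.*"))  # ['dense65_weight', 'dense65_bias']
```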


In [104]:
# force re-initialization of the model parameters (force_reinit)
net.initialize(init=init.Normal(sigma=0.01), force_reinit=True)
net[0].weight.data()


[[ 2.0388728e-03  5.0904877e-03  1.9365953e-03 ... -7.9620769e-03
  -2.0083881e-03 -8.2816696e-03]
 [-3.2664095e-03  3.5921552e-03  2.9730289e-03 ...  3.8690998e-03
   4.8305974e-03 -3.5797848e-04]
 [ 4.0084305e-03  6.6222262e-04  1.1165363e-02 ... -4.2826491e-03
   6.1445148e-03 -1.5364649e-03]
 ...
 [-2.4511721e-02 -1.0328277e-02 -6.6571822e-03 ... -8.2627647e-03
  -5.4091369e-03 -7.8883460e-03]
 [ 1.4808148e-02 -8.1008542e-03 -1.3116258e-04 ...  1.0994526e-03
   1.5791535e-02  4.4286947e-04]
 [ 7.0945927e-05 -5.7791389e-04 -1.2423999e-03 ...  1.0772002e-02
   1.2763192e-02  2.5891021e-03]]
<NDArray 256x20 @cpu(0)>

In [105]:
# initialize the parameters with a constant
net.initialize(init.Constant(1), force_reinit=True)
net[0].weight.data()


[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
<NDArray 256x20 @cpu(0)>

In [106]:
# custom initialization method
class MyInit(init.Initializer):
    def _init_weight(self, name, data):
        print('Init', name, data.shape)
        data[:] = nd.random.uniform(low=-10, high=10, shape=data.shape)
        data *= data.abs() >= 5

net.initialize(MyInit(), force_reinit=True)
net[0].weight.data()[0]

Init dense64_weight (256, 20)
Init dense65_weight (10, 256)



[ 0.         0.         9.494465  -0.         0.         0.
  0.        -0.         9.339119  -0.        -0.        -7.1404243
 -0.        -0.        -9.122751   0.        -6.3159137  0.
 -5.2562075  0.       ]
<NDArray 20 @cpu(0)>
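
The line `data *= data.abs() >= 5` in `MyInit` works as a mask: the comparison yields 1 or 0, and the multiplication keeps or zeroes each value. A plain-Python sketch of the same trick, with an arbitrary seed and sample count chosen only for illustration:

```python
import random

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
samples = [random.uniform(-10, 10) for _ in range(20)]

# (abs(v) >= 5) is a bool; multiplying by it keeps v (True) or zeroes it (False).
masked = [v * (abs(v) >= 5) for v in samples]

kept = [v for v in masked if v != 0]
print(len(kept), "of", len(masked), "values kept")
```

Since the samples are uniform on [-10, 10], about half of them are expected to be zeroed. A negative value multiplied by zero gives `-0.0` in floating point, which would explain the `-0.` entries in the output above.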

In [109]:
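# overwrite the model parameters with set_data
# (the output below is consistent with this cell having been run twice: each value grew by 2)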
net[0].weight.set_data(net[0].weight.data() + 1)
net[0].weight.data()


[[ 2.         2.        11.494465  ...  2.        -3.2562075  2.       ]
 [-4.3299108  2.         7.0956774 ...  2.        -7.4145813 -6.4642754]
 [ 2.         2.        10.1654625 ...  2.         2.         8.529558 ]
 ...
 [-6.5428123 -3.6284614  2.        ...  2.         2.         7.1772413]
 [-4.460295   8.433802   8.040327  ...  2.         2.         2.       ]
 [11.683821   9.593361   2.        ...  2.        -7.5182877 10.365871 ]]
<NDArray 256x20 @cpu(0)>

In [110]:
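# sharing model parameters
net = nn.Sequential()
shared = nn.Dense(8, activation='relu')
net.add(nn.Dense(8, activation='relu'),
        shared,
        # the third hidden layer reuses the second hidden layer's parameters via params
        nn.Dense(8, activation='relu', params=shared.params),
        nn.Dense(10))
net.initialize()

X = nd.random.uniform(shape=(2, 20))
net(X)

net[1].weight.data()[0] == net[2].weight.data()[0]
# the parameters of the second and third hidden layers must have the same shape
# parameters carry their gradients, so during backpropagation the gradients of the second and
# third hidden layers are both accumulated into the shared parameter's gradient (e.g. shared.weight.grad())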


[1. 1. 1. 1. 1. 1. 1. 1.]
<NDArray 8 @cpu(0)>
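
The gradient accumulation mentioned in the comments can be checked on a scalar example. This is a hand-worked sketch in plain Python, not mxnet autograd: two stacked "layers" without bias or activation use the same weight `w`, and the values of `w` and `x` are arbitrary.

```python
def forward(w_first, w_second, x):
    """Two stacked scalar layers: the second is applied to the output of the first."""
    return w_second * (w_first * x)


w, x = 3.0, 2.0

# Gradient of the output with respect to each use of the weight.
grad_first_use = w * x     # d/dw_first  of w_second * w_first * x, with w_second = w
grad_second_use = w * x    # d/dw_second of w_second * w_first * x, with w_first = w

# With sharing, both uses refer to one parameter, so the contributions add up.
shared_grad = grad_first_use + grad_second_use

# Numerical check with a central finite difference on f(w) = w * w * x.
eps = 1e-6
numeric = (forward(w + eps, w + eps, x) - forward(w - eps, w - eps, x)) / (2 * eps)
print(shared_grad, round(numeric, 4))   # 12.0 12.0
```

The summed value equals the derivative of `w * w * x`, namely `2 * w * x`, which is what one would expect to find in the shared parameter's gradient after a backward pass.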

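### Exercises
1. What error is raised if a Block subclass's `__init__` does not call the parent's `__init__`?
    - The attributes that the parent's `__init__` creates (such as `_children`) never exist, which leads to: `AttributeError: 'ErrorMLP' object has no attribute '_children'`
2. What goes wrong if the asscalar calls are removed from the FancyMLP class?
    - Nothing in this case: `X.norm()` holds a single element, so the comparison result can still be used as a condition (the notebook's `X.norm() < 1` check runs without error)
    - An NDArray with multiple elements cannot be converted to a boolean: ValueError: The truth value of an NDArray with multiple elements is ambiguous.
3. What goes wrong if self.net in the NestMLP class, defined through a Sequential instance, is changed to self.net = [nn.Dense(64, activation='relu'), nn.Dense(32, activation='relu')]?
    - TypeError: 'list' object is not callable
4. When you need a list of Dense layers in `__init__`, prefer Sequential(); a list of attributes such as [self.dense1, self.dense2] is the second choice
    - A bare list of layers fails with: RuntimeError: Parameter 'dense61_weight' has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks
    - Reason: objects held in a list are not registered in the Block._children attribute, so initialize() cannot find their parameters
5. Build a multilayer perceptron with a shared-parameter layer and train it. Observe each layer's parameters and gradients during training
    - Why share parameters? To save memory and computation??
    - Parameter sharing will probably come up again later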