## Write code in MXNet/Gluon/NumPy
Getting started today. [Chinese community](https://discuss.gluon.ai), [English community](https://discuss.mxnet.io)

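### Outline
#### Lecture 1: Background
#### Lecture 2: Prerequisites
#### Lecture 3: Basic concepts and techniques
#### Lecture 4: Key components
#### Lecture 5: Convolutional networks
#### Lecture 6: Recurrent neural networks for sequence data
#### Lecture 7: Optimization algorithms
#### Lecture 8: Checking performance
#### Lecture 9: Computer vision
#### Lecture 10: Natural language processing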

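1. Think backwards: rather than designing a program that solves the problem, start from the final requirement and search for a solution.
2. English vocabulary:
    - exemplar: a model example
    - practitioners: people working in the field
    - realize: implement, make real
    - simultaneously: at the same time
    - motivations: reasons for doing something
    - pitfalls: hidden traps
    - navigate: find one's way through
    - get the most out of: make full use of
    - formidable: daunting, hard to overcome
    - unified: brought together into one
    - would-be: aspiring, soon to become
    - sporadic: scattered, occasional
    - elude: evade, escape
    - huddle: crowd together
    - ponder: think over carefully
    - encapsulate: wrap up, enclose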
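3. Applying deep learning requires understanding:
    - the motivation for casting the problem in a particular way
    - the mathematics behind the approach
    - the optimization algorithms that fit the model to the data
    - the engineering needed to train models effectively
4. My own confusion, which is also one of the book's goals: most textbooks focus on how to implement a given approach but neglect why certain algorithmic decisions are made. What I need is:
    - the concepts behind deep learning
    - realizations of the concepts in code
5. The website is built with:
    - Jupyter: combines code, equations, and text
    - Sphinx: generates the output for code and LaTeX
    - Discourse: the forum
6. Are there situations in your life where many examples show how to solve a problem, yet no algorithm solves it automatically? These may be the best targets for deep learning.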
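7. Compared with NumPy's ndarray, MXNet's NDArray additionally provides GPU computation and automatic differentiation.
8. Broadcasting: when an elementwise operation is applied to ndarrays of different shapes, elements are first replicated as needed so that both ndarrays have the same shape.
9. Saving memory: write the result into existing memory with `[:]`, as in `X[:] = X + Y`, or use `X += Y`. Note that `X[:] = X + Y` still allocates a temporary for `X + Y` before copying it into `X`; `nd.elemwise_add(X, Y, out=Z)` avoids that temporary.
10. Converting between MXNet's NDArray and NumPy's ndarray:
    - NumPy to MXNet: `nd.array(np_array)`
    - MXNet to NumPy: `nd_array.asnumpy()`
11. Automatic differentiation (computing gradients)
12. Random number generators (`nd.random`):
    - normal: normal (Gaussian) distribution
    - uniform: uniform distribution
    - poisson: Poisson distribution
13. [How to write math formulas in Jupyter (translated article, in Chinese)](https://www.jianshu.com/p/93ccc63e5a1b)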
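
The broadcasting rule in item 8 can be sketched with NumPy, whose ndarray follows the same idea. NumPy is used here only so the snippet runs without MXNet, and the arrays `A` and `B` are made up for illustration:

```python
import numpy as np

# Hypothetical example arrays: A is a 3x1 column, B is a 1x2 row.
A = np.arange(3).reshape((3, 1))   # [[0], [1], [2]]
B = np.arange(2).reshape((1, 2))   # [[0, 1]]

# Elementwise addition of mismatched shapes: A is stretched across its
# columns and B across its rows, so both behave like 3x2 arrays.
C = A + B

# The same result written out with explicit copies.
C_explicit = np.tile(A, (1, 2)) + np.tile(B, (3, 1))

print(C)
```

The explicit `np.tile` version is only there to show what broadcasting is equivalent to; in practice no copies need to be written by hand.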

In [2]:
from mxnet import nd

In [3]:
x = nd.arange(12)
X = x.reshape((3,4))
Y = nd.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

In [4]:
nd.random.normal(0, 1, shape=(3, 4))  # each element is sampled from a normal distribution with mean 0 and standard deviation 1


[[ 1.1630787   0.4838046   0.29956347  0.15302546]
 [-1.1688148   1.5580711  -0.5459446  -2.3556297 ]
 [ 0.5414402   2.6785066   1.2546344  -0.54877394]]
<NDArray 3x4 @cpu(0)>
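
For comparison, the three distributions listed under `nd.random` also exist in NumPy. The sketch below is a NumPy analog, not MXNet code; the seed, shapes, and distribution parameters are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so runs are repeatable

normal_sample = rng.normal(loc=0.0, scale=1.0, size=(3, 4))    # mean 0, std 1
uniform_sample = rng.uniform(low=0.0, high=1.0, size=(3, 4))   # values in [0, 1)
poisson_sample = rng.poisson(lam=1.0, size=(3, 4))             # non-negative counts

print(normal_sample.shape, uniform_sample.min(), poisson_sample.min())
```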

In [5]:
X + Y


[[ 2.  2.  6.  6.]
 [ 5.  7.  9. 11.]
 [12. 12. 12. 12.]]
<NDArray 3x4 @cpu(0)>

In [20]:
nd.elemwise_add(X, Y)


[[ 2.  2.  6.  6.]
 [ 5.  7.  9. 11.]
 [12. 12. 12. 12.]]
<NDArray 3x4 @cpu(0)>

In [21]:
Z = Y.zeros_like()
nd.elemwise_add(X, Y, out=Z)
Z


[[ 2.  2.  6.  6.]
 [ 5.  7.  9. 11.]
 [12. 12. 12. 12.]]
<NDArray 3x4 @cpu(0)>
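
Why `out=` matters for memory can be checked by comparing object identities before and after each kind of update. This is a minimal sketch using NumPy's `np.add(..., out=...)` as a stand-in for `nd.elemwise_add(..., out=...)`; the array values are arbitrary:

```python
import numpy as np

X = np.arange(12, dtype=np.float32).reshape((3, 4))
Y = np.ones((3, 4), dtype=np.float32)

# Plain assignment rebinds the name to a newly allocated array.
before = id(Y)
Y = Y + X
rebinding_allocates = id(Y) != before

# Writing through [:] keeps the existing array object.
Z = np.zeros_like(Y)
before = id(Z)
Z[:] = X + Y          # still builds a temporary for X + Y, then copies it in
slice_keeps_buffer = id(Z) == before

np.add(X, Y, out=Z)   # writes the sum straight into Z, no temporary
out_keeps_buffer = id(Z) == before

print(rebinding_allocates, slice_keeps_buffer, out_keeps_buffer)
```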

In [7]:
X * Y  # element-wise


[[ 0.  1.  8.  9.]
 [ 4. 10. 18. 28.]
 [32. 27. 20. 11.]]
<NDArray 3x4 @cpu(0)>

In [8]:
X / Y


[[ 0.    1.    0.5   1.  ]
 [ 4.    2.5   2.    1.75]
 [ 2.    3.    5.   11.  ]]
<NDArray 3x4 @cpu(0)>

In [13]:
Y.exp()


[[ 7.389056   2.7182817 54.59815   20.085537 ]
 [ 2.7182817  7.389056  20.085537  54.59815  ]
 [54.59815   20.085537   7.389056   2.7182817]]
<NDArray 3x4 @cpu(0)>

In [10]:
nd.concat(X, Y, dim=0)  # concatenate along rows (stack vertically)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [ 2.  1.  4.  3.]
 [ 1.  2.  3.  4.]
 [ 4.  3.  2.  1.]]
<NDArray 6x4 @cpu(0)>

In [11]:
nd.concat(X, Y, dim=1)  # concatenate along columns (join horizontally)


[[ 0.  1.  2.  3.  2.  1.  4.  3.]
 [ 4.  5.  6.  7.  1.  2.  3.  4.]
 [ 8.  9. 10. 11.  4.  3.  2.  1.]]
<NDArray 3x8 @cpu(0)>

In [12]:
nd.dot(X, Y.T)


[[ 18.  20.  10.]
 [ 58.  60.  50.]
 [ 98. 100.  90.]]
<NDArray 3x3 @cpu(0)>

In [14]:
X == Y


[[0. 1. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>

In [18]:
X > Y


[[0. 0. 0. 0.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
<NDArray 3x4 @cpu(0)>

In [19]:
X < Y


[[1. 0. 1. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>

In [15]:
X.sum()


[66.]
<NDArray 1 @cpu(0)>

In [16]:
X.norm()  # L2 norm: square root of the sum of squares of all elements


[22.494442]
<NDArray 1 @cpu(0)>

In [17]:
X.norm().asscalar()  # convert the result to a Python scalar

22.494442
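
The value returned by `X.norm()` can be reproduced by hand from the definition. A small sketch in plain Python, using the same element values as `X` (0 through 11):

```python
import math

# Same values as X above: 0, 1, ..., 11 (the shape does not affect the norm).
values = [float(v) for v in range(12)]

sum_of_squares = sum(v * v for v in values)
l2_norm = math.sqrt(sum_of_squares)

print(sum_of_squares, l2_norm)
```

The result agrees with the float32 output above up to rounding.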

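### Automatic differentiation
Take a quadratic function of the column vector $x$ as the example: $$y = 2x^\top x$$ whose gradient with respect to $x$ is $4x$.
1. `attach_grad`: allocates the memory that will hold the gradient
2. `record`: to reduce computation and memory overhead, MXNet does not record the computation needed for gradients by default; wrap the computation in `autograd.record()` to record it
3. `backward`: computes the gradient automatically
4. There are two run modes, training mode and prediction mode; entering `autograd.record()` also switches to training mode by default
5. `autograd`: differentiates ordinary <b>imperative programs</b>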

In [22]:
x = nd.arange(4).reshape((4, 1))

In [23]:
x.attach_grad()  # allocate memory for the gradient

In [24]:
from mxnet import autograd

In [45]:
with autograd.record():  # Python does not create a new scope for variables inside a with block
    y = 2 * nd.dot(x.T, x)

In [46]:
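y  # if y is not a scalar, MXNet by default sums all elements of y into a new variable and then computes the gradient of that variable with respect to x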


[[28.]]
<NDArray 1x1 @cpu(0)>

In [47]:
y.backward() 

In [48]:
x.grad == 4 * x


[[1.]
 [1.]
 [1.]
 [1.]]
<NDArray 4x1 @cpu(0)>
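
The check `x.grad == 4 * x` can also be confirmed without autograd by estimating the gradient numerically. The sketch below uses plain Python lists and central differences; `y_of` and `numerical_grad` are illustrative helper names, not MXNet functions:

```python
def y_of(x):
    """y = 2 * x^T x for a vector given as a list of floats."""
    return 2.0 * sum(v * v for v in x)


def numerical_grad(func, x, eps=1e-4):
    """Central-difference estimate of d func / d x[i] for every i."""
    grad = []
    for i in range(len(x)):
        plus = x[:i] + [x[i] + eps] + x[i + 1:]
        minus = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad


x = [0.0, 1.0, 2.0, 3.0]          # same values as nd.arange(4)
approx = numerical_grad(y_of, x)
exact = [4.0 * v for v in x]      # analytic gradient 4x

print(y_of(x), approx)
```

Because `y` is quadratic, the central difference matches `4 * x` up to floating-point rounding.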

In [43]:
y.backward()  # Cannot differentiate node because it is not in a computational graph.

MXNetError: [23:46:14] src/imperative/imperative.cc:295: Check failed: !AGInfo::IsNone(*i): Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x00000001139c7929 mxnet::op::NDArrayOpProp::~NDArrayOpProp() + 4473
  [bt] (1) 2   libmxnet.so                         0x0000000114f4eea1 mxnet::Imperative::Backward(std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, bool, bool, bool) + 16353
  [bt] (2) 3   libmxnet.so                         0x0000000114e8bffe MXAutogradBackwardEx + 1022
  [bt] (3) 4   _ctypes.cpython-37m-darwin.so       0x0000000109302367 ffi_call_unix64 + 79
  [bt] (4) 5   ???                                 0x00007ffee85b7ed0 0x0 + 140732796731088



In [44]:
print(autograd.is_recording())

False


In [49]:
print(autograd.is_training())

False


In [50]:
with autograd.record():
    print(autograd.is_recording())
    print(autograd.is_training())

True
True


In [61]:
# Differentiate through Python control flow
def f(a):
    """
    f(a) = num * a, so the gradient is num
    """
    b = a * 2
    while b.norm().asscalar() < 1000:
        # each doubling also doubles the gradient
        print("b.norm().asscalar(): ", b.norm().asscalar())
        b = b * 2
    if b.sum().asscalar() > 0:
        print("b.sum().asscalar() > 0: ", b.sum().asscalar())
        c = b
    else:
        print("b.sum().asscalar() <= 0: ", b.sum().asscalar())
        c = 100 * b
    print("c = ", c)
    return c

In [62]:
a = nd.random.normal(shape=1)
a.attach_grad()
with autograd.record():
    c = f(a)

b.norm().asscalar():  1.1425364
b.norm().asscalar():  2.2850728
b.norm().asscalar():  4.5701456
b.norm().asscalar():  9.140291
b.norm().asscalar():  18.280582
b.norm().asscalar():  36.561165
b.norm().asscalar():  73.12233
b.norm().asscalar():  146.24466
b.norm().asscalar():  292.48932
b.norm().asscalar():  584.97864
b.sum().asscalar() > 0:  1169.9573
c =  
[1169.9573]
<NDArray 1 @cpu(0)>


In [63]:
c.backward()

In [64]:
a.grad


[2048.]
<NDArray 1 @cpu(0)>

In [65]:
c / a


[2048.]
<NDArray 1 @cpu(0)>

In [58]:
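a  # output from an earlier run (cell 58 ran before cell 62); the sample used above was about 0.5713 (= 1169.9573 / 2048)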


[0.37723127]
<NDArray 1 @cpu(0)>
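
The reason `a.grad` equals `c / a` is that, for a given input, `f` only multiplies `a` by constants, so `c = k * a` and the gradient is `k`. The sketch below re-implements `f` for a single Python float and checks this with a finite difference; `f_scalar` is an illustrative helper, and the value of `a` is an approximation inferred from the printed output (1169.9573 / 2048), not taken from the notebook:

```python
def f_scalar(a):
    """Scalar re-implementation of f above; returns (c, k) with c == k * a."""
    k = 2.0
    b = a * 2.0
    while abs(b) < 1000:      # for a single element, the L2 norm is |b|
        b = b * 2.0
        k = k * 2.0
    if b > 0:                 # the sum of one element is the element itself
        return b, k
    return 100.0 * b, 100.0 * k


a = 0.5712682                 # assumed: roughly the sample behind the output above
c, k = f_scalar(a)

# Finite-difference slope; eps is small enough not to change the branch taken.
eps = 1e-6
slope = (f_scalar(a + eps)[0] - f_scalar(a - eps)[0]) / (2 * eps)

print(c, k, c / a, slope)
```

The number of doublings, and therefore `k`, depends on the sampled `a`, which is why the gradient changes from run to run.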