# Fundamentals of MXNet-NumPy Module

## Namespaces for Imperative Programming
- `mxnet.numpy`: Regular NumPy operators
- `mxnet.numpy.random`: NumPy random operators
- `mxnet.numpy.linalg`: NumPy linear algebra operators
- `mxnet.numpy_extension`: Operators implemented in MXNet that do not exist in the official NumPy, plus some utilities (e.g. context-related functions).

## Operator Namespaces for Gluon
`F` can be either `mxnet.ndarray` or `mxnet.symbol`. Note that `np` and `npe` are aliases of `numpy` and `numpy_extension`, respectively.
- `F.np`: Regular NumPy operators
- `F.np.random`: NumPy random operators
- `F.np.linalg`: NumPy linear algebra operators
- `F.npe`: Operators implemented in MXNet that do not exist in official NumPy

## New `ndarray` and `symbol`
`mxnet.numpy.ndarray` (visible to users) and `mxnet.symbol.numpy._Symbol` (not directly visible to users)
- Same name as in the official NumPy package
- Dispatch convenience fluent method calls to MXNet NumPy operators
- Override many convenience fluent methods that do not exist in the official NumPy ndarray
- Make the behavior of built-in methods consistent with the official NumPy
    - Indexing: `__getitem__` and `__setitem__`
    - Many binary element-wise operations with broadcasting, which are not supported in `mxnet.symbol.Symbol`
    
## User Experience of Module Importing (In Progress)
**Legacy**
```python
import mxnet as mx
from mxnet import gluon
```
**Numpy**
```python
from mxnet import np, npe, gluon
```

    
## MXNet NumPy in Action
### Scalar and zero-size tensors

In [1]:
import mxnet as mx
from mxnet import numpy as np

# create a scalar tensor
x = np.array(3.14)
print(x)  # x is actually an ndarray, but a scalar value will be printed

3.14


In [2]:
s = x.item()  # copy the element from the scalar tensor to a python scalar
print('s = {}'.format(str(s)))

s = 3.140000104904175
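
The extra digits in `s` come from precision, not from the copy. MXNet arrays default to 32-bit floats, so `x` stores 3.14 rounded to float32, and `item()` widens that value to a 64-bit Python float. The sketch below reproduces the round trip with the standard library only; it does not use MXNet and is meant purely as an illustration.

```python
import struct

# Round-trip 3.14 through a 32-bit float, then read it back as a
# regular (64-bit) Python float.
as_float32 = struct.unpack('f', struct.pack('f', 3.14))[0]

print(as_float32)          # 3.140000104904175
print(as_float32 == 3.14)  # False: the float32 rounding error survives the widening
```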


In [3]:
# create a scalar tensor with only one element 1.0
y = np.ones(())
print(y)

1.0


In [4]:
# create a zero-size tensor
x = np.ones((5, 4, 0, 6))
print(x)

[]
<ndarray shape=(5, 4, 0, 6)>


In [5]:
# transpose the zero-size tensor
y = np.transpose(x)
print(y)

[]
<ndarray shape=(6, 0, 4, 5)>
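
A zero-size tensor has at least one dimension of length 0, so it holds no elements even though its shape is fully defined, and a transpose without explicit axes reverses that shape. A minimal standard-library sketch of this shape arithmetic follows; the helper names `num_elements` and `transposed_shape` are hypothetical and are not MXNet APIs.

```python
import math

def num_elements(shape):
    """Number of elements in a tensor of the given shape (1 for the scalar shape ())."""
    return math.prod(shape)

def transposed_shape(shape):
    """Shape after a default transpose, which reverses the axis order."""
    return tuple(reversed(shape))

print(num_elements(()))                # 1: a scalar tensor
print(num_elements((5, 4, 0, 6)))      # 0: a zero-size tensor
print(transposed_shape((5, 4, 0, 6)))  # (6, 0, 4, 5)
```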


### Conversion between classic and numpy ndarrays

In [6]:
# create a classic MXNet NDArray
x = mx.nd.random.uniform(shape=(2, 3))
print(x)


[[0.5488135  0.5928446  0.71518934]
 [0.84426576 0.60276335 0.8579456 ]]
<NDArray 2x3 @cpu(0)>


In [7]:
# convert classic NDArray type to mxnet.numpy.ndarray with zero-copy
y = x.as_np_ndarray()
print(y)

[[0.5488135  0.5928446  0.71518934]
 [0.84426576 0.60276335 0.8579456 ]]
<ndarray shape=(2, 3)>


In [8]:
# changing y's content changes x's content too
y[:] = 1
print(x)


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>


In [9]:
# convert mxnet.numpy.ndarray to classic NDArray with zero-copy
z = y.as_classic_ndarray()
print(z)


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>


In [10]:
# changing z's content changes y's content too
z[:] = 2
print(y)

[[2. 2. 2.]
 [2. 2. 2.]]
<ndarray shape=(2, 3)>
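
"Zero-copy" here means that the converted array refers to the same memory as the original instead of duplicating it, which is why writes through one object show up in the other. As a loose analogy in plain Python (this is not how MXNet implements the conversion), a `memoryview` over an `array` behaves the same way:

```python
from array import array

buf = array('f', [0.5, 0.25, 0.75])  # owns the data
view = memoryview(buf)               # zero-copy view over the same memory

view[0] = 1.0                        # write through the view...
print(buf[0])                        # ...and the owner sees it: 1.0

buf[1] = 2.0                         # writes through the owner are visible in the view too
print(view[1])                       # 2.0
```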


### There is a line between classic operators and numpy operators...
- Numpy operators can only accept numpy `ndarray`s/`_Symbol`s as inputs
- Classic operators can only accept classic `NDArray`s/`Symbol`s as inputs
- Explicit conversions must be performed if users want to leverage operators on both sides
- A layer inheriting from `HybridBlock` must produce outputs of a single type, i.e., either all classic `NDArray`s or all numpy `ndarray`s, before hybridization

#### Imperative

In [11]:
a = mx.nd.ones((2, 3))  # create a classic NDArray
print(a)
out = np.sum(a)  # feeding it to a numpy operator would result in failure


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>


TypeError: Operator `_np_sum` registered in backend is known as `sum` in Python. This is a numpy operator which can only accept MXNet numpy ndarrays, while received a classic ndarray. Please call `as_np_ndarray()` upon the classic ndarray to convert it to an MXNet numpy ndarray, and then feed the converted array to this operator.

In [12]:
b = a.as_np_ndarray()  # convert `a` to a numpy ndarray sharing the same data memory
print(b)
out = np.sum(b)  # feed the numpy ndarray to a numpy operator
print('np.sum(b) =', out)

[[1. 1. 1.]
 [1. 1. 1.]]
<ndarray shape=(2, 3)>
np.sum(b) = 6.0


In [13]:
out = mx.nd.sum(b)  # feeding `b` to a classic operator would result in failure

TypeError: Operator `sum` registered in backend is known as `sum` in Python. This is a classic operator which can only accept classic ndarrays, while received an MXNet numpy ndarray. Please call `as_classic_ndarray()` upon the numpy ndarray to convert it to a classic ndarray, and then feed the converted array to this operator.

In [14]:
c = b.as_classic_ndarray()  # convert `b` to a classic ndarray
out = mx.nd.sum(c)  # feed the classic ndarray to a classic operator
print('mx.nd.sum(c) =', str(out))

mx.nd.sum(c) = 
[6.]
<NDArray 1 @cpu(0)>


#### Gluon

In [15]:
from mxnet import gluon
class TestMultipleOutputs(gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        ret1 = F.sum(x)  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1, ret2

net = TestMultipleOutputs()
net.hybridize()
out = net(a)  # `a` is a classic NDArray and will cause an error on `F.np.sum` which is a numpy operator

TypeError: Operator `_np_sum` registered in backend is known as `sum` in Python. This is a numpy operator which can only accept MXNet numpy ndarrays, while received a classic ndarray. Please call `as_np_ndarray()` upon the classic ndarray to convert it to an MXNet numpy ndarray, and then feed the converted array to this operator.

In [16]:
net = TestMultipleOutputs()  # redefine a net with no pre-built graph
net.hybridize()
out = net(b)  # `b` is a numpy ndarray and will cause an error on `F.sum` which is a classic operator

TypeError: Operator `sum` registered in backend is known as `sum` in Python. This is a classic operator which can only accept classic ndarrays, while received an MXNet numpy ndarray. Please call `as_classic_ndarray()` upon the numpy ndarray to convert it to a classic ndarray, and then feed the converted array to this operator.

In [17]:
class TestMultipleOutputs2(gluon.HybridBlock):
    def hybrid_forward(self, F, x):  # x is known to be a numpy ndarray
        ret1 = F.sum(x.as_classic_ndarray())  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1, ret2  # two outputs of the layer with different types would result in failure in building the graph

net = TestMultipleOutputs2()
net.hybridize()
out = net(b)

TypeError: Found both classic symbol (mx.sym.Symbol) and numpy symbol (mx.sym.np._Symbol) in outputs. This will prevent you from building a computation graph by grouping them since different types of symbols are not allowed to be grouped in Gluon to form a computation graph. You will need to convert them to the same type of symbols, either classic or numpy following this rule: if you want numpy ndarray output(s) from the computation graph, please convert all the classic symbols in the list to numpy symbols by calling `as_np_ndarray()` on each of them; if you want classic ndarray output(s) from the computation graph, please convert all the numpy symbols in the list to classic symbols by calling `as_classic_ndarray()` on each of them.

In [18]:
class TestMultipleOutputs3(gluon.HybridBlock):
    def hybrid_forward(self, F, x):  # x is known to be a numpy ndarray
        ret1 = F.sum(x.as_classic_ndarray())  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1.as_np_ndarray(), ret2  # convert the classic output so that both outputs are numpy ndarrays and the graph can be built

net = TestMultipleOutputs3()
net.hybridize()
out = net(b)
print('classic operator output: ', out[0])
print('numpy operator output: ', out[1])

classic operator output:  [6.]
<ndarray shape=(1,)>
numpy operator output:  6.0


### Binary element-wise operations with broadcasting in new and old symbols

In [19]:
class TestBinaryBroadcast(gluon.HybridBlock):
    def hybrid_forward(self, F, x1, x2):
        print("x1 type in hybrid_forward:", str(type(x1)))
        print("x2 type in hybrid_forward:", str(type(x2)))
        return x1 + x2

net = TestBinaryBroadcast()
x1 = mx.nd.ones((2, 1))
x2 = mx.nd.ones((1, 3))
print('x1 input tensor type: ', str(type(x1)))
print('x2 input tensor type: ', str(type(x2)))
out = net(x1, x2)  # ok: imperative execution supports broadcasting
print(out)

x1 input tensor type:  <class 'mxnet.ndarray.ndarray.NDArray'>
x2 input tensor type:  <class 'mxnet.ndarray.ndarray.NDArray'>
x1 type in hybrid_forward: <class 'mxnet.ndarray.ndarray.NDArray'>
x2 type in hybrid_forward: <class 'mxnet.ndarray.ndarray.NDArray'>

[[2. 2. 2.]
 [2. 2. 2.]]
<NDArray 2x3 @cpu(0)>


In [20]:
net.hybridize()  # mark the block for execution using a computational graph
try:
    out = net(x1, x2)  # error: old symbol `+` operation does not support broadcasting
    assert False  # should not reach here
except mx.MXNetError:
    print("ERROR: cannot perform broadcast add for two symbols of type mx.sym.Symbol")

x1 type in hybrid_forward: <class 'mxnet.symbol.symbol.Symbol'>
x2 type in hybrid_forward: <class 'mxnet.symbol.symbol.Symbol'>
ERROR: cannot perform broadcast add for two symbols of type mx.sym.Symbol


In [21]:
net = TestBinaryBroadcast()  # redefine a net to clear the pre-built graph cache
net.hybridize()

x1 = x1.as_np_ndarray()  # convert x1 to np.ndarray
x2 = x2.as_np_ndarray()  # convert x2 to np.ndarray
print('x1 input tensor type: ', str(type(x1)))
print('x2 input tensor type: ', str(type(x2)))
out = net(x1, x2)  # ok: because the inputs are np.ndarrays, the graph is built with numpy symbols, which support broadcasting
print(out)

x1 input tensor type:  <class 'mxnet.numpy.ndarray'>
x2 input tensor type:  <class 'mxnet.numpy.ndarray'>
x1 type in hybrid_forward: <class 'mxnet.symbol.numpy._Symbol'>
x2 type in hybrid_forward: <class 'mxnet.symbol.numpy._Symbol'>
[[2. 2. 2.]
 [2. 2. 2.]]
<ndarray shape=(2, 3)>
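
The `(2, 1) + (1, 3) -> (2, 3)` result above follows the NumPy broadcasting rule: shapes are aligned from the trailing dimension, and two dimensions are compatible when they are equal or when one of them is 1. A minimal standard-library sketch of that shape rule is shown below; the function name `broadcast_shape` is hypothetical and the code only computes shapes, it does not reproduce MXNet's implementation.

```python
from itertools import zip_longest

def broadcast_shape(shape1, shape2):
    """Return the result shape of broadcasting two shapes, NumPy style."""
    result = []
    # Align from the trailing dimension; missing leading dimensions count as 1.
    for d1, d2 in zip_longest(reversed(shape1), reversed(shape2), fillvalue=1):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(
                f"shapes {shape1} and {shape2} cannot be broadcast together")
    return tuple(reversed(result))

print(broadcast_shape((2, 1), (1, 3)))  # (2, 3)
```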


## A Simple Linear Regression Model
Let's consider a simple linear regression model.
Given a dataset `{x, y}`, where the `x`s are input examples and the `y`s are observed data, find the parameters `w1` and `w2` of the following model.
```
y_pred = np.dot(np.maximum(np.dot(x, w1), 0), w2)
```
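
To make the formula concrete, here is a small pure-Python sketch of the same forward pass on hand-picked numbers, together with the sum of squared errors that the `TotalLoss` block in this section computes. The helpers `matmul`, `relu`, `predict`, and `total_loss` and all the values are illustrative assumptions; the training code uses MXNet operators instead.

```python
def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise maximum(value, 0)."""
    return [[max(v, 0.0) for v in row] for row in m]

def predict(x, w1, w2):
    """y_pred = dot(maximum(dot(x, w1), 0), w2)"""
    return matmul(relu(matmul(x, w1)), w2)

def total_loss(pred, label):
    """Sum of squared errors over all entries."""
    return sum((p - t) ** 2
               for pred_row, label_row in zip(pred, label)
               for p, t in zip(pred_row, label_row))

x = [[1.0, 2.0]]                # one example with two features
w1 = [[1.0, -3.0], [0.5, 0.5]]  # 2 inputs -> 2 hidden units
w2 = [[2.0], [3.0]]             # 2 hidden units -> 1 output
y = [[5.0]]

pred = predict(x, w1, w2)       # hidden values [2.0, -2.0] are clamped to [2.0, 0.0]
print(pred)                     # [[4.0]]
print(total_loss(pred, y))      # 1.0
```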

In [23]:
import mxnet as mx
from mxnet import gluon, autograd, np


@np.use_np_compat
class LinearRegression(gluon.HybridBlock):
    def __init__(self, num_input_dim=1000, num_hidden_dim=100, num_output_dim=10):
        super(LinearRegression, self).__init__()
        with self.name_scope():
            self.w1 = self.params.get('w1', shape=(num_input_dim, num_hidden_dim),
                                      allow_deferred_init=True)
            self.w2 = self.params.get('w2', shape=(num_hidden_dim, num_output_dim),
                                      allow_deferred_init=True)

    def hybrid_forward(self, F, x, w1, w2):
        h = x.dot(w1)  # equivalent to F.np.dot(x, w1)
        h_relu = F.npe.relu(h)  # equivalent to F.relu(h) but generating np.ndarray
        y_pred = h_relu.dot(w2)  # equivalent to F.np.dot(h_relu, w2)
        return y_pred


class TotalLoss(gluon.HybridBlock):
    def hybrid_forward(self, F, pred, label):
        return ((pred - label) ** 2).sum()  # equivalent to F.np.sum(F.np.square(pred - label))


regressor = LinearRegression()
regressor.initialize(mx.init.Normal())
regressor.hybridize()

# Create random input and output data
x = mx.nd.random.normal(shape=(64, 1000)).as_np_ndarray()  # x is of type mxnet.numpy.ndarray
y = mx.nd.random.normal(shape=(64, 10)).as_np_ndarray()  # y is of type mxnet.numpy.ndarray

total_loss = TotalLoss()
trainer = gluon.Trainer(regressor.collect_params(), 'sgd', {'learning_rate': 1e-3, 'momentum': 0.9, 'is_np_compat': True})

for t in range(50):
    with autograd.record():
        output = regressor(x)  # output is of type np.ndarray because np.dot is the last op in the network
        loss = total_loss(output, y)  # loss is a scalar np.ndarray
    loss.backward()
    print(t, loss)  # note that printing loss implicitly calls loss.asnumpy()
    trainer.step(1)

0 635.3452471435195
1 609.4113292234346
2 559.87546427799
3 461.41301937442336
4 304.486978149979
5 151.7615894066132
6 253.24537498643855
7 135.55642916363826
8 129.34821836911033
9 128.3391935015758
10 143.42610490476966
11 84.85890323140177
12 74.78409554605787
13 61.45087224745606
14 66.81406598305887
15 39.86413416325183
16 48.55699012603194
17 39.183020775950695
18 36.9853225859044
19 29.4935944052759
20 28.067953118780892
21 22.82714183189558
22 20.185846625871605
23 23.296924431902553
24 14.45740539640055
25 16.068418413967166
26 13.014963073113483
27 13.498709249818049
28 9.324084546687768
29 9.906318719499112
30 8.94690978997059
31 7.0988503201665765
32 7.340095323153349
33 5.780370839915179
34 5.986697304177648
35 4.479480878404672
36 4.794883905060743
37 3.3289017163204235
38 3.7745999211903296
39 2.9971716826483714
40 2.9576141802030813
41 2.293731976414021
42 2.4810501748718004
43 1.8422429964271139
44 1.7880043325485007
45 1.5820533577979816
46 1.561094909726819
47 1.227