# Fundamentals of MXNet-NumPy Module

## Namespaces for Imperative Programming
- `mxnet.numpy`: Regular NumPy operators
- `mxnet.numpy.random`: NumPy random operators
- `mxnet.numpy.linalg`: NumPy linear algebra operators
- `mxnet.numpy_extension`: Operators implemented in MXNet that do not exist in the official NumPy, plus utilities (e.g. context-related functions).

## Operator Namespaces for Gluon
`F` can be either `mxnet.ndarray` or `mxnet.symbol`. Note that `np` and `npe` are aliases of `numpy` and `numpy_extension`, respectively.
- `F.np`: Regular NumPy operators
- `F.np.random`: NumPy random operators
- `F.np.linalg`: NumPy linear algebra operators
- `F.npe`: Operators implemented in MXNet that do not exist in official NumPy

## New `ndarray` and `symbol`
`mxnet.numpy.ndarray` (visible to users) and `mxnet.symbol.numpy._Symbol` (not directly visible to users)
- Same name as in the official NumPy package
- Dispatch convenience fluent method calls to MXNet NumPy operators
- Override many convenience fluent methods that do not exist in the official NumPy ndarray
- Make the behavior of built-in methods consistent with the official NumPy
    - Indexing: `__getitem__` and `__setitem__`
    - Many binary element-wise operations with broadcasting, which `mxnet.symbol.Symbol` does not support
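
The NumPy-consistent indexing behavior listed above can be illustrated with the official NumPy package, which serves here as the reference semantics. This is a minimal sketch rather than MXNet code: `onp` is an alias chosen for this example, and it assumes `mxnet.numpy.ndarray` mirrors these results, which has not been checked against a particular MXNet build.

```python
import numpy as onp  # official NumPy, used only as the reference behavior

a = onp.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

block = a[0:2, 1:3]    # __getitem__ with slices on both axes
picked = a[[0, 2], 1]  # __getitem__ with an integer list on axis 0 (returns a copy)
print(block.shape)
print(picked)

a[:, 0] = -1           # __setitem__: the scalar is broadcast over the first column
print(a[:, 0])
```

Running the same statements with `from mxnet import np` in place of the `onp` import would be one way to compare the two packages side by side.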
    
## User Experience of Module Importing (In Progress)
**Legacy**
```python
import mxnet as mx
from mxnet import gluon
```
**NumPy**
```python
from mxnet import np, npe, gluon
```

    
## MXNet NumPy in Action
### Scalar and zero-size tensors

In [None]:
import mxnet as mx
from mxnet import numpy as np

# create a scalar tensor
x = np.array(3.14)
print(x)  # x is actually an ndarray, but a scalar value will be printed

In [None]:
s = x.item()  # copy the element from the scalar tensor to a Python scalar
print('s = {}'.format(str(s)))

In [None]:
# create a scalar tensor with only one element 1.0
y = np.ones(())
print(y)

In [None]:
# create a zero-size tensor
x = np.ones((5, 4, 0, 6))
print(x)

In [None]:
# transpose the zero-size tensor
y = np.transpose(x)
print(y)
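
For comparison, the shape bookkeeping behind scalar and zero-size tensors can be checked with the official NumPy package. This is a sketch of the reference semantics only: `onp` is an alias chosen for this example, and the assumption is that `mxnet.numpy` reports the same shapes.

```python
import numpy as onp  # official NumPy, used only as the reference behavior

scalar = onp.array(3.14)
print(scalar.shape, scalar.ndim, scalar.size)  # () 0 1

empty = onp.ones((5, 4, 0, 6))
print(empty.shape, empty.size)                 # (5, 4, 0, 6) 0

flipped = onp.transpose(empty)                 # reverses the axis order
print(flipped.shape)                           # (6, 0, 4, 5)
```

A scalar tensor has an empty shape but still holds one element, while a zero-size tensor has a full shape and holds none.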

### Conversion between classic and numpy ndarrays

In [None]:
# create a classic MXNet NDArray
x = mx.nd.random.uniform(shape=(2, 3))
print(x)

In [None]:
# convert classic NDArray type to mxnet.numpy.ndarray with zero-copy
y = x.as_np_ndarray()
print(y)

In [None]:
# changing y's content changes x's content too
y[:] = 1
print(x)

In [None]:
# convert mxnet.numpy.ndarray to classic NDArray with zero-copy
z = y.as_classic_ndarray()
print(z)

In [None]:
# changing z's content changes y's content too
z[:] = 2
print(y)
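
Zero-copy conversion means both handles refer to a single buffer, so a write through one is visible through the other. As a loose analogy only, views in official NumPy behave the same way. The sketch below uses `ndarray.view()` from official NumPy, not the MXNet conversion methods, and `onp` is an alias chosen for this example.

```python
import numpy as onp  # official NumPy, used only as an analogy

x = onp.zeros((2, 3))
y = x.view()   # a new array object over the same underlying buffer (no copy)

y[:] = 1       # writing through the view...
print(x)       # ...is visible through the original array
print(onp.shares_memory(x, y))
```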

### There is a line between classic operators and numpy operators...
- Numpy operators can only accept numpy `ndarray`s/`_Symbol`s as inputs
- Classic operators can only accept classic `NDArray`s/`Symbol`s as inputs
- Explicit conversions must be performed if users want to leverage operators on both sides
- A layer inheriting from `HybridBlock` must return outputs of a single type, i.e., either all classic `NDArray`s or all numpy `ndarray`s, before hybridization

#### Imperative

In [None]:
a = mx.nd.ones((2, 3))  # create a classic NDArray
print(a)
out = np.sum(a)  # feeding it to a numpy operator would result in failure

In [None]:
b = a.as_np_ndarray()  # convert `a` to a numpy ndarray sharing the same data memory
print(b)
out = np.sum(b)  # feed the numpy ndarray to a numpy operator
print('np.sum(b) =', out)

In [None]:
out = mx.nd.sum(b)  # feeding `b` to a classic operator would result in failure

In [None]:
c = b.as_classic_ndarray()  # convert `b` to a classic NDArray
out = mx.nd.sum(c)  # feed the classic ndarray to a classic operator
print('mx.nd.sum(c) =', str(out))

#### Gluon

In [None]:
from mxnet import gluon
class TestMultipleOutputs(gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        ret1 = F.sum(x)  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1, ret2

net = TestMultipleOutputs()
net.hybridize()
out = net(a)  # `a` is a classic NDArray and will cause an error on `F.np.sum` which is a numpy operator

In [None]:
net = TestMultipleOutputs()  # redefine a net with no pre-built graph
net.hybridize()
out = net(b)  # `b` is a numpy ndarray and will cause an error on `F.sum` which is a classic operator

In [None]:
class TestMultipleOutputs2(gluon.HybridBlock):
    def hybrid_forward(self, F, x):  # x is known to be a numpy ndarray
        ret1 = F.sum(x.as_classic_ndarray())  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1, ret2  # two outputs of the layer with different types would result in failure in building the graph

net = TestMultipleOutputs2()
net.hybridize()
out = net(b)

In [None]:
class TestMultipleOutputs3(gluon.HybridBlock):
    def hybrid_forward(self, F, x):  # x is known to be a numpy ndarray
        ret1 = F.sum(x.as_classic_ndarray())  # a classic operator produces a classic NDArray
        ret2 = F.np.sum(x)  # a numpy operator produces a numpy ndarray
        return ret1.as_np_ndarray(), ret2  # convert ret1 so that both outputs are numpy ndarrays and the graph can be built

net = TestMultipleOutputs3()
net.hybridize()
out = net(b)
print('classic operator output: ', out[0])
print('numpy operator output: ', out[1])

### Binary element-wise operations with broadcasting in new and old symbols

In [None]:
class TestBinaryBroadcast(gluon.HybridBlock):
    def hybrid_forward(self, F, x1, x2):
        print("x1 type in hybrid_forward:", str(type(x1)))
        print("x2 type in hybrid_forward:", str(type(x2)))
        return x1 + x2

net = TestBinaryBroadcast()
x1 = mx.nd.ones((2, 1))
x2 = mx.nd.ones((1, 3))
print('x1 input tensor type: ', str(type(x1)))
print('x2 input tensor type: ', str(type(x2)))
out = net(x1, x2)  # ok: imperative execution supports broadcasting
print(out)

In [None]:
net.hybridize()  # mark the block for execution using a computational graph
try:
    out = net(x1, x2)  # error: old symbol `+` operation does not support broadcasting
    assert False  # should not reach here
except mx.MXNetError:
    print("ERROR: cannot perform broadcast add for two symbols of type mx.sym.Symbol")

In [None]:
net = TestBinaryBroadcast()  # redefine a net to clear the pre-built graph cache
net.hybridize()

x1 = x1.as_np_ndarray()  # convert x1 to np.ndarray
x2 = x2.as_np_ndarray()  # convert x2 to np.ndarray
print('x1 input tensor type: ', str(type(x1)))
print('x2 input tensor type: ', str(type(x2)))
out = net(x1, x2)  # ok: the inputs are np.ndarrays, so the graph is built with numpy symbols, which support broadcasting
print(out)
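
The shapes used above, `(2, 1)` and `(1, 3)`, broadcast to `(2, 3)` under NumPy rules. The sketch below shows the expected result with the official NumPy package as the reference; `onp` is an alias chosen for this example, and the assumption is that the numpy symbols in MXNet follow the same rule.

```python
import numpy as onp  # official NumPy, used only as the reference behavior

x1 = onp.ones((2, 1))
x2 = onp.ones((1, 3))

out = x1 + x2     # each size-1 axis is stretched to match the other operand
print(out.shape)  # (2, 3)
print(out)        # every element is 2.0
```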

## A Simple Linear Regression Model
Let's consider a simple linear regression model as follows.
Given a dataset `{x, y}`, where `x` holds the input examples and `y` the observed data, find the parameters `w1` and `w2` of the following model.
```
y_pred = np.dot(np.maximum(np.dot(x, w1), 0), w2)
```
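
To make the shapes in this formula concrete, here is a minimal forward-pass sketch in official NumPy. The dimensions match the defaults used in the Gluon code of this section; the random seed, the `0.01` weight scale, and the `onp` alias are assumptions made for the example.

```python
import numpy as onp  # official NumPy, used only as the reference behavior

rng = onp.random.default_rng(0)  # fixed seed so the run is repeatable

x = rng.normal(size=(64, 1000))           # 64 examples, 1000 input features
y = rng.normal(size=(64, 10))             # 64 targets, 10 outputs each
w1 = 0.01 * rng.normal(size=(1000, 100))  # input -> hidden weights (assumed scale)
w2 = 0.01 * rng.normal(size=(100, 10))    # hidden -> output weights (assumed scale)

h = onp.dot(x, w1)                # shape (64, 100)
h_relu = onp.maximum(h, 0)        # ReLU: negative entries become 0
y_pred = onp.dot(h_relu, w2)      # shape (64, 10)

loss = ((y_pred - y) ** 2).sum()  # total squared error, a scalar
print(h.shape, y_pred.shape)
```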

In [None]:
import mxnet as mx
from mxnet import gluon, autograd, np


@np.use_np_compat
class LinearRegression(gluon.HybridBlock):
    def __init__(self, num_input_dim=1000, num_hidden_dim=100, num_output_dim=10):
        super(LinearRegression, self).__init__()
        with self.name_scope():
            self.w1 = self.params.get('w1', shape=(num_input_dim, num_hidden_dim),
                                      allow_deferred_init=True)
            self.w2 = self.params.get('w2', shape=(num_hidden_dim, num_output_dim),
                                      allow_deferred_init=True)

    def hybrid_forward(self, F, x, w1, w2):
        h = x.dot(w1)  # equivalent to F.np.dot(x, w1)
        h_relu = F.npe.relu(h)  # equivalent to F.relu(h) but generating np.ndarray
        y_pred = h_relu.dot(w2)  # equivalent to F.np.dot(h_relu, w2)
        return y_pred


class TotalLoss(gluon.HybridBlock):
    def hybrid_forward(self, F, pred, label):
        return ((pred - label) ** 2).sum()  # equivalent to F.np.sum(F.np.square(pred - label))


regressor = LinearRegression()
regressor.initialize(mx.init.Normal())
regressor.hybridize()

# Create random input and output data
x = mx.nd.random.normal(shape=(64, 1000)).as_np_ndarray()  # x is of type mxnet.numpy.ndarray
y = mx.nd.random.normal(shape=(64, 10)).as_np_ndarray()  # y is of type mxnet.numpy.ndarray

total_loss = TotalLoss()
trainer = gluon.Trainer(regressor.collect_params(),
                        'sgd',
                        {'learning_rate': 1e-3, 'momentum': 0.9, 'allow_np': True})

for t in range(50):
    with autograd.record():
        output = regressor(x)  # output is of type np.ndarray because np.dot is the last op in the network
        loss = total_loss(output, y)  # loss is a scalar np.ndarray
    loss.backward()
    print(t, loss)  # note that printing loss implicitly calls loss.asnumpy()
    trainer.step(1)
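
What `autograd` and the trainer do in this loop can be approximated by hand. The sketch below is written in official NumPy with manually derived gradients and a plain SGD update. It is not the MXNet implementation: it omits momentum, and the seed, the `0.01` weight scale, the learning rate of `1e-4`, and the `onp` alias are all assumptions chosen for the example.

```python
import numpy as onp  # official NumPy, used only as the reference behavior

rng = onp.random.default_rng(0)
x = rng.normal(size=(64, 1000))
y = rng.normal(size=(64, 10))
w1 = 0.01 * rng.normal(size=(1000, 100))
w2 = 0.01 * rng.normal(size=(100, 10))

learning_rate = 1e-4  # assumed value, smaller than the Trainer setting above
losses = []
for t in range(50):
    # forward pass
    h = x.dot(w1)
    h_relu = onp.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    losses.append(((y_pred - y) ** 2).sum())

    # backward pass: chain rule applied by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T) * (h > 0)  # ReLU passes gradient only where h > 0
    grad_w1 = x.T.dot(grad_h)

    # plain SGD update (no momentum)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

print(losses[0], losses[-1])
```

Comparing the first and last recorded losses gives a quick check that the hand-derived gradients point in a descent direction.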