[MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems](https://arxiv.org/pdf/1512.01274.pdf)

- https://medium.com/@julsimon/an-introduction-to-the-mxnet-api-part-2-ce761513124e
- https://indico.io/blog/getting-started-with-mxnet/

**NDArray**

In [1]:
import mxnet as mx
import numpy as np
import logging

logging.basicConfig(level=logging.INFO)

In [2]:
# NDArray
a = mx.nd.array([[1, 2, 3], [4, 5, 6]])
print a.size
print a.shape
print a.dtype

6
(2L, 3L)
<type 'numpy.float32'>


In [3]:
b = mx.nd.array([[1, 2, 3], [1, 2, 3]], dtype=np.int32)
print b.dtype
b.asnumpy(), b

<type 'numpy.int32'>


(array([[1, 2, 3],
        [1, 2, 3]], dtype=int32), 
 [[1 2 3]
  [1 2 3]]
 <NDArray 2x3 @cpu(0)>)

In [4]:
# Element-wise product
a = mx.nd.array([[1, 2, 3], [4, 5, 6]])
b = a * a
b.asnumpy()

array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]], dtype=float32)

In [5]:
# Dot product
a = mx.nd.array([[1, 2, 3], [4, 5, 6]])
print a.shape
print a.asnumpy()
b = a.T
print b.shape
c = mx.nd.dot(a, b)
c.shape
c.asnumpy()

(2L, 3L)
[[ 1.  2.  3.]
 [ 4.  5.  6.]]
(3L, 2L)


array([[ 14.,  32.],
       [ 32.,  77.]], dtype=float32)
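
To make the difference between the two products concrete, here is a plain-Python check of the arithmetic above, with no MXNet involved. The helpers `elementwise`, `transpose` and `dot` are illustrative names only and are not part of the MXNet API.

```python
# Plain-Python check of the two products shown above (no MXNet needed).
# `elementwise`, `transpose` and `dot` are illustrative helpers, not MXNet functions.

def elementwise(x, y):
    """Multiply matching entries of two same-shaped matrices."""
    return [[p * q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]

def transpose(x):
    """Swap rows and columns."""
    return [list(col) for col in zip(*x)]

def dot(x, y):
    """Matrix product: each row of x times each column of y, summed."""
    columns = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in columns] for row in x]

a = [[1, 2, 3], [4, 5, 6]]
squares = elementwise(a, a)    # same shape as a: 2x3
gram = dot(a, transpose(a))    # (2x3) . (3x2) gives 2x2

print(squares)  # [[1, 4, 9], [16, 25, 36]]
print(gram)     # [[14, 32], [32, 77]]
```

The element-wise product keeps the input shape, while the dot product contracts the shared dimension of size 3 and leaves a 2x2 result.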

In [6]:
# Initialize matrices with different distributions
u = mx.nd.uniform(low=0, high=1, shape=(1000, 1000))

n = mx.nd.normal(loc=1, scale=2, shape=(1000, 1000))

# Dot product
d = mx.nd.dot(u, n)
d.shape

(1000L, 1000L)

**Symbol**

Dataflow programming is a flexible way of defining parallel computation in which data flows through a **graph**. The graph defines the order of operations. Each operation is a *black box*: only its inputs and outputs are defined.

$$E = (A \times B) + (C \times D)$$

In [7]:
# Define symbols
a = mx.sym.Variable(name='A')
b = mx.sym.Variable(name='B')
c = mx.sym.Variable(name='C')
d = mx.sym.Variable(name='D')

# Define graph
e = (a * b) + (c * d)

print 'Symbols: ', a, b, c, d
print 'E: ', e
print 'E type: ', type(e)

Symbols:  <Symbol A> <Symbol B> <Symbol C> <Symbol D>
E:  <Symbol _plus0>
E type:  <class 'mxnet.symbol.Symbol'>


In [8]:
# E is a symbol and a result of '+' operation
print 'E arguments, i.e. E depends on variables: ', e.list_arguments()
print 'E inputs: ', e.list_inputs()
print 'E outputs, i.e. Operation that computes E: ', e.list_outputs()
print 'E internals: ', e.get_internals()
print 'E internals outputs: ', e.get_internals().list_outputs()

E arguments, i.e. E depends on variables:  ['A', 'B', 'C', 'D']
E inputs:  ['A', 'B', 'C', 'D']
E outputs, i.e. Operation that computes E:  ['_plus0_output']
E internals:  <Symbol group [A, B, _mul0, C, D, _mul1, _plus0]>
E internals outputs:  ['A', 'B', '_mul0_output', 'C', 'D', '_mul1_output', '_plus0_output']


**Binding NDArrays and Symbols**

Applying the computation steps defined with Symbols to data stored in NDArrays requires an operation called **binding**, i.e. assigning an NDArray to each input variable of the graph.
 
- Data is loaded and prepared using the `imperative` programming model
- Computation is performed using the `symbolic` programming model, which allows MXNet to decouple code and data and to perform parallel execution and graph optimization
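
The define-then-bind idea can be sketched in a few lines of plain Python. This is a toy illustration under simplifying assumptions, not how MXNet implements its graphs: the `Var` and `Op` classes and their `forward` method are hypothetical names invented for this example.

```python
# Toy dataflow graph: a sketch of the define-then-bind idea, not MXNet's implementation.
# `Var` and `Op` are hypothetical classes used only for this illustration.
import operator

class Var(object):
    """A named placeholder that has no value until binding time."""
    def __init__(self, name):
        self.name = name

    def forward(self, bindings):
        return bindings[self.name]

class Op(object):
    """A node that applies a binary function to the outputs of two other nodes."""
    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def forward(self, bindings):
        return self.fn(self.left.forward(bindings), self.right.forward(bindings))

# Define the graph once: E = (A * B) + (C * D). No data is involved yet.
a, b, c, d = Var('A'), Var('B'), Var('C'), Var('D')
e = Op(operator.add, Op(operator.mul, a, b), Op(operator.mul, c, d))

# "Bind" by supplying a value for every input, then run the graph.
result = e.forward({'A': 1, 'B': 2, 'C': 3, 'D': 4})
print(result)  # 14

# The same graph can be reused with different inputs.
other = e.forward({'A': 0.5, 'B': 2.0, 'C': 1.0, 'D': 1.0})
print(other)  # 2.0
```

Because the graph is described separately from the data, the same definition can be evaluated on any set of inputs, which is the property the cells below rely on.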

In [9]:
# Create NDArrays
a_data = mx.nd.array([1], dtype=np.int32)
b_data = mx.nd.array([2], dtype=np.int32)
c_data = mx.nd.array([3], dtype=np.int32)
d_data = mx.nd.array([4], dtype=np.int32)

In [10]:
# Binding each NDArray to its corresponding Symbol. 
# NOTE: Need to select the context where execution will take place

arguments = {'A': a_data, 'B': b_data, 'C': c_data, 'D': d_data}
executor = e.bind(ctx=mx.cpu(device_id=0), args=arguments)
executor

<mxnet.executor.Executor at 0x7fe0ced6ce50>

In [11]:
# Let input data flow through the graph
e_data = executor.forward()
print e_data
print e_data[0]
print e_data[0].asnumpy()

[
[14]
<NDArray 1 @cpu(0)>]

[14]
<NDArray 1 @cpu(0)>
[14]


In [12]:
# Apply same graph to larger matrices: Just need to define inputs (binding and computation are identical)
a_data = mx.nd.uniform(low=0, high=1, shape=(1000, 1000))
b_data = mx.nd.uniform(low=0, high=1, shape=(1000, 1000))
c_data = mx.nd.uniform(low=0, high=1, shape=(1000, 1000))
d_data = mx.nd.uniform(low=0, high=1, shape=(1000, 1000))

# Bind data to symbol
arguments = {'A': a_data, 'B': b_data, 'C': c_data, 'D': d_data}
executor = e.bind(ctx=mx.cpu(device_id=0), args=arguments)

# Let input data flow through the graph
e_data = executor.forward()
print e_data

[
[[ 0.25428662  0.2716822   0.08855453 ...,  0.17513204  0.12643263
   0.24611941]
 [ 0.43185544  0.93409884  1.13012767 ...,  1.26165771  0.93693149
   0.92390227]
 [ 0.34301594  0.10574408  0.47203985 ...,  0.06712524  0.86225599
   0.38010356]
 ..., 
 [ 0.7397536   0.76276726  0.5228703  ...,  0.40895808  1.04495525
   1.34238076]
 [ 0.09233697  1.00399196  0.06909392 ...,  0.69651967  0.70354009
   0.40740401]
 [ 0.39363438  0.85467219  0.08330497 ...,  0.17517376  0.36264491
   0.81301337]]
<NDArray 1000x1000 @cpu(0)>]


**Module**

- Synthetic data
    - 1000 samples
    - 100 features (each represented by a float value between 0 and 1)
    - 10 categories
    - Train/Test split: 80/20
    - Batch size: 10

In [13]:
# Generate the data set
sample_count = 1000
train_count = 800
val_count = sample_count - train_count

feature_count = 100
category_count = 10
batch_size = 10

X = mx.nd.uniform(low=0, high=1, shape=(sample_count, feature_count))
print('X shape: ', X.shape)

Y = mx.nd.empty(shape=(sample_count,))
# Assign a random category to every sample, including the last one
for i in range(sample_count):
    Y[i] = np.random.randint(0, category_count)
    
print('Y shape: ', Y.shape)

('X shape: ', (1000L, 100L))
('Y shape: ', (1000L,))


In [14]:
# Splitting the data set (data is random so no need to shuffle)
# Slice ends are exclusive, so end=feature_count keeps all feature columns
X_train = mx.nd.slice(data=X, begin=(0, 0), end=(train_count, feature_count))
X_val = mx.nd.slice(data=X, begin=(train_count, 0), end=(sample_count, feature_count))

Y_train = Y[0:train_count]
Y_val = Y[train_count:sample_count]
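
Slice bounds follow the usual half-open convention: `begin` is included and `end` is excluded, for `mx.nd.slice` as for Python's own slicing. The plain-Python sketch below uses a hypothetical 5x4 toy matrix (not the tutorial's data) to show that an `end` of `n` keeps indices `0` to `n-1`, so covering every column requires `end` to equal the column count.

```python
# Half-open slicing on a toy 5x4 matrix (illustrative sizes, not the tutorial's data).
rows, cols = 5, 4
toy = [[r * cols + c for c in range(cols)] for r in range(rows)]

split = 3  # first 3 rows for "training", the remaining 2 for "validation"

train = [row[0:cols] for row in toy[0:split]]
val = [row[0:cols] for row in toy[split:rows]]
print("{} {}".format(len(train), len(train[0])))  # 3 4
print("{} {}".format(len(val), len(val[0])))      # 2 4

# Ending one short silently drops the last column.
short = [row[0:cols - 1] for row in toy[0:split]]
print(len(short[0]))  # 3
```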

In [15]:
# Building the network
data = mx.sym.Variable(name='data')
fc1 = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=64)
relu1 = mx.sym.Activation(data=fc1, name='relu1', act_type='relu')
fc2 = mx.sym.FullyConnected(data=relu1, name='fc2', num_hidden=category_count)
out = mx.sym.SoftmaxOutput(data=fc2, name='softmax')

# Create a module
module = mx.mod.Module(out)
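
The final layer turns the `category_count` scores produced by `fc2` into probabilities that sum to 1, and the predicted category is the one with the highest probability. As a rough, MXNet-free sketch of that transformation (the `softmax` helper and the example scores are made up for illustration):

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

# Hypothetical scores for a 3-category problem.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most likely category

print(["{:.3f}".format(p) for p in probs])
print(predicted)  # 0
```

Picking the index of the largest probability is the same `argmax` step used later when computing validation accuracy.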

In [16]:
# Building the data iterator
train_iter = mx.io.NDArrayIter(data=X_train, label=Y_train, batch_size=batch_size)

In [17]:
# Training the network

# Bind the input symbol to data
module.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)

# Initialize the neuron weights
module.init_params(initializer=mx.init.Xavier(magnitude=2.0))

# Define optimization parameters
module.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.1), ))

# Train
module.fit(train_data=train_iter, num_epoch=50)

INFO:root:Epoch[0] Train-accuracy=0.092500
INFO:root:Epoch[0] Time cost=0.202
INFO:root:Epoch[1] Train-accuracy=0.128750
INFO:root:Epoch[1] Time cost=0.166
INFO:root:Epoch[2] Train-accuracy=0.153750
INFO:root:Epoch[2] Time cost=0.129
INFO:root:Epoch[3] Train-accuracy=0.175000
INFO:root:Epoch[3] Time cost=0.115
INFO:root:Epoch[4] Train-accuracy=0.200000
INFO:root:Epoch[4] Time cost=0.107
INFO:root:Epoch[5] Train-accuracy=0.220000
INFO:root:Epoch[5] Time cost=0.240
INFO:root:Epoch[6] Train-accuracy=0.241250
INFO:root:Epoch[6] Time cost=0.169
INFO:root:Epoch[7] Train-accuracy=0.251250
INFO:root:Epoch[7] Time cost=0.100
INFO:root:Epoch[8] Train-accuracy=0.280000
INFO:root:Epoch[8] Time cost=0.088
INFO:root:Epoch[9] Train-accuracy=0.293750
INFO:root:Epoch[9] Time cost=0.071
INFO:root:Epoch[10] Train-accuracy=0.310000
INFO:root:Epoch[10] Time cost=0.169
INFO:root:Epoch[11] Train-accuracy=0.336250
INFO:root:Epoch[11] Time cost=0.115
...

In [18]:
# Validating the model

# Build val iterator
val_iter = mx.io.NDArrayIter(data=X_val, label=Y_val, batch_size=batch_size)

# Validation accuracy
pred_count = val_count
correct_preds = total_correct_preds = 0

for preds, i_batch, batch in module.iter_predict(val_iter):
    label = batch.label[0].asnumpy().astype(int)
    pred_label = preds[0].asnumpy().argmax(axis=1)
    correct_preds = np.sum(pred_label==label)
    total_correct_preds += correct_preds
    
print('Validation accuracy: {:2.3f}'.format(total_correct_preds/float(pred_count)))

Validation accuracy: 0.095
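
An accuracy close to 0.1 is what one would expect here. The labels were drawn at random, independently of the features, so there is nothing for the network to learn and no predictor can beat chance on held-out data, which is 1 in `category_count`. The rising training accuracy reflects memorization of the training set rather than generalization. The simulation below is a small sketch of that chance baseline using only the standard library; the seed and the number of trials are arbitrary choices.

```python
import random

random.seed(0)  # arbitrary seed, only for repeatability

category_count = 10
trials = 100000

# Random labels and random guesses, both uniform over the categories.
hits = 0
for _ in range(trials):
    label = random.randrange(category_count)
    guess = random.randrange(category_count)
    if guess == label:
        hits += 1

accuracy = hits / float(trials)
print("Chance accuracy: {:.3f}".format(accuracy))  # close to 1/10
```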
