# Manipulate data the MXNet way with `ndarray`

It's impossible to get anything done if we can't manipulate data. 
Generally, there are two important things we need to do with data: 
(i) acquire it and (ii) process it once it's inside the computer.
There's no point in trying to acquire data if we don't even know how to store it,
so let's get our hands dirty first by playing with synthetic data.

We'll start by introducing NDArrays, MXNet's primary tool for storing and transforming data. If you've worked with NumPy before, you'll notice that NDArrays are, by design, similar to NumPy's multi-dimensional array. However, they confer a few key advantages. First, NDArrays support asynchronous computation on CPU, GPU, and distributed cloud architectures. Second, they provide support for automatic differentiation. These properties make NDArray an ideal library for machine learning, both for researchers and engineers launching production systems.


## Getting started

In this chapter, we'll get you going with the basic functionality. Don't worry if you don't understand any of the basic math, like element-wise operations or normal distributions. In the next two chapters we'll take another pass at NDArray, teaching you both the math you'll need and how to realize it in code.

To get started, let's import `mxnet`. We'll also import the `ndarray` module from `mxnet` under its short name `nd` for convenience. We'll make a habit of setting a random seed so that you always get the same results that we do.

In [1]:
import mxnet as mx
from mxnet import nd
mx.random.seed(1)

Next, let's see how to create an NDArray, without any values initialized. Specifically, we'll create a 2D array (also called a *matrix*) with 3 rows and 4 columns.

In [2]:
x = nd.empty((3, 4))
print(x)


[[  4.81077988e-24   4.57103559e-41  -2.63939521e-36   3.08159545e-41]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]]
<NDArray 3x4 @cpu(0)>


The `empty` function just grabs some memory and hands us back a matrix without setting the values of any of its entries. This means that the entries can hold arbitrary values, including very big ones! But typically, we'll want our matrices initialized. Commonly, we want a matrix of all zeros. 

In [3]:
x = nd.zeros((3, 5))
x


[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
<NDArray 3x5 @cpu(0)>

Similarly, `ndarray` has a function to create a matrix of all ones. 

In [4]:
x = nd.ones((3, 4))
x


[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
<NDArray 3x4 @cpu(0)>

Often, we'll want to create arrays whose values are sampled randomly. This is especially common when we intend to use the array as a parameter in a neural network. In this snippet, we initialize with values drawn from a standard normal distribution with zero mean and unit variance.

In [5]:
y = nd.random_normal(0, 1, shape=(3, 4))
y


[[-0.67765152  0.10073948  0.57595438 -0.3469252 ]
 [-0.22134334 -1.80471897 -0.80642909  1.22033095]
 [ 2.23235631  0.20070229 -0.54968649 -0.19819015]]
<NDArray 3x4 @cpu(0)>
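
To make "zero mean and unit variance" concrete, here is a minimal sketch that uses only Python's standard library rather than MXNet. The sample count and the seed are arbitrary choices for illustration. With enough samples, the empirical mean should land close to 0 and the empirical variance close to 1.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Draw 10,000 samples from a normal distribution with mean 0 and standard deviation 1.
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")
```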

As in NumPy, the dimensions of each NDArray are accessible via the `.shape` attribute.

In [6]:
y.shape

(3, 4)

We can also query its size, which is equal to the product of the components of the shape. Together with the precision of the stored values, this tells us how much memory the array occupies.

In [7]:
y.size

12
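
As a worked example of the memory claim, the sketch below multiplies the number of entries by the bytes per entry. It assumes 32-bit floats (4 bytes each), which is an assumption about the stored precision rather than something printed above.

```python
import math

shape = (3, 4)
bytes_per_value = 4  # assumption: each entry is a 32-bit float

size = math.prod(shape)                # number of entries: 3 * 4
memory_bytes = size * bytes_per_value  # raw storage for the values only

print(size, memory_bytes)  # prints: 12 48
```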

## Operations

NDArray supports a large number of standard mathematical operations, such as element-wise addition:

In [8]:
x + y


[[ 0.32234848  1.10073948  1.57595444  0.6530748 ]
 [ 0.77865666 -0.80471897  0.19357091  2.22033095]
 [ 3.23235631  1.20070231  0.45031351  0.80180985]]
<NDArray 3x4 @cpu(0)>

Element-wise multiplication (since `x` is all ones here, the result equals `y`):

In [9]:
x * y


[[-0.67765152  0.10073948  0.57595438 -0.3469252 ]
 [-0.22134334 -1.80471897 -0.80642909  1.22033095]
 [ 2.23235631  0.20070229 -0.54968649 -0.19819015]]
<NDArray 3x4 @cpu(0)>

And exponentiation:

In [10]:
nd.exp(y)


[[ 0.50780815  1.1059885   1.77882743  0.70685822]
 [ 0.80144149  0.16452068  0.44644946  3.388309  ]
 [ 9.321805    1.22226083  0.57713073  0.82021385]]
<NDArray 3x4 @cpu(0)>

We can also grab a matrix's transpose to compute a proper matrix-matrix product.

In [11]:
nd.dot(x, y.T)


[[-0.34788287 -1.61216044  1.68518198]
 [-0.34788287 -1.61216044  1.68518198]
 [-0.34788287 -1.61216044  1.68518198]]
<NDArray 3x3 @cpu(0)>
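
Every row of this result is identical, and a plain-Python sketch shows why. Because `x` is all ones, entry `(i, j)` of the product is just the sum of row `j` of `y`. The helpers `transpose` and `matmul` below are illustrative stand-ins written for this example, not MXNet functions, and the values of `y` are copied from the output above.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m), both lists of lists."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


x = [[1.0] * 4 for _ in range(3)]  # 3x4 matrix of ones
y = [[-0.67765152, 0.10073948, 0.57595438, -0.3469252],
     [-0.22134334, -1.80471897, -0.80642909, 1.22033095],
     [2.23235631, 0.20070229, -0.54968649, -0.19819015]]

product = matmul(x, transpose(y))   # (3x4) times (4x3) gives 3x3
row_sums = [sum(row) for row in y]  # each row of `product` equals this list

print(product[0])
print(row_sums)
```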

We'll explain these operations and present even more operators in the [linear algebra](P01-C03-linear-algebra.ipynb) chapter. But for now, we'll stick with the mechanics of working with NDArrays.

## In-place operations

In the previous examples, every time we ran an operation, we allocated new memory to host its result. For example, if we write `y = y + x`, the name `y` stops referring to its old matrix and instead points at newly allocated memory. In the following example we demonstrate this with Python's `id()` function, which returns a unique identifier for the referenced object (in CPython, its address in memory). After running `y = y + x`, we'll find that `id(y)` has changed. That's because Python first evaluates `y + x`, allocating new memory for the result, and then rebinds `y` to this new location in memory.

In [12]:
print('id(y):', id(y))
y = y + x
print('id(y):', id(y))

id(y): 140099375706464
id(y): 140101792980552


This might be undesirable for two reasons. First, we don't want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we'll want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we don't update in place, this could cause a memory leak, and could cause us to inadvertently reference stale parameters. 

Fortunately, performing in-place operations in MXNet is easy. We can assign the result of an operation to a previously allocated array with slice notation, e.g., `y[:] = <expression>`.

In [13]:
print('id(y):', id(y))
y[:] = x + y
print('id(y):', id(y))

id(y): 140101792980552
id(y): 140101792980552


While this slice assignment is syntactically nice, `x + y` will still allocate a temporary buffer to store the result before copying it to `y[:]`. To make even better use of memory, we can directly invoke the underlying `ndarray` operation, in this case `elemwise_add`, avoiding temporary buffers. We do this by specifying the `out` keyword argument, which every `ndarray` operator supports:

In [14]:
nd.elemwise_add(x, y, out=y)


[[ 2.32234859  3.10073948  3.57595444  2.65307474]
 [ 2.77865672  1.19528103  2.19357085  4.22033119]
 [ 5.23235607  3.20070219  2.45031357  2.80180979]]
<NDArray 3x4 @cpu(0)>

If we're not planning to re-use `x`, then we can assign the result to `x` itself. There are two ways to do this in MXNet:

1. By using slice notation, `x[:] = x op y`
2. By using the op-equals operators like `+=`

In [15]:
print('id(x):', id(x))
x += y
x
print('id(x):', id(x))

id(x): 140101793630696
id(x): 140101793630696
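
The stale-reference problem mentioned earlier in this section is easy to reproduce with plain Python lists, which follow the same rebinding rules as any other Python object. This is only a sketch of the idea: `params` and `alias` are hypothetical names, and lists stand in for NDArrays.

```python
update = [1.0, 1.0, 1.0]
params = [0.5, -0.5, 2.0]
alias = params  # a second variable pointing at the same object

# Rebinding: `params` now refers to a brand-new list, `alias` keeps the old one.
old_id = id(params)
params = [p + u for p, u in zip(params, update)]
rebound = id(params) != old_id
alias_is_stale = alias == [0.5, -0.5, 2.0]

# In place: slice assignment overwrites the existing list, so every alias sees it.
params = alias
old_id = id(params)
params[:] = [p + u for p, u in zip(params, update)]
same_object = id(params) == old_id
alias_is_current = alias == [1.5, 0.5, 3.0]

print(rebound, alias_is_stale, same_object, alias_is_current)
```

All four flags come out `True`: rebinding leaves `alias` behind with the old values, while slice assignment keeps both names in sync.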


## Slicing
MXNet NDArrays support slicing in all the ridiculous ways you might imagine accessing your data. Here's an example of reading the second and third rows from `x`.

In [16]:
x[1:3]


[[ 3.77865672  2.19528103  3.19357085  5.22033119]
 [ 6.23235607  4.20070219  3.45031357  3.80180979]]
<NDArray 2x4 @cpu(0)>

Now let's try writing to a specific element.

In [17]:
x[1,2] = 9.0
x


[[ 3.32234859  4.10073948  4.57595444  3.65307474]
 [ 3.77865672  2.19528103  9.          5.22033119]
 [ 6.23235607  4.20070219  3.45031357  3.80180979]]
<NDArray 3x4 @cpu(0)>

Multi-dimensional slicing is also supported.

In [18]:
x[1:2,1:3]


[[ 2.19528103  9.        ]]
<NDArray 1x2 @cpu(0)>

In [19]:
x[1:2,1:3] = 5.0
x


[[ 3.32234859  4.10073948  4.57595444  3.65307474]
 [ 3.77865672  5.          5.          5.22033119]
 [ 6.23235607  4.20070219  3.45031357  3.80180979]]
<NDArray 3x4 @cpu(0)>
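
One way to read `x[1:2, 1:3]` is "rows 1 up to (not including) 2, then columns 1 up to (not including) 3". The sketch below spells that out with nested Python lists, which lack the comma syntax, using the values of `x` from just before the write above. It is an illustration of the indexing logic, not of how MXNet implements it.

```python
x = [[3.32234859, 4.10073948, 4.57595444, 3.65307474],
     [3.77865672, 2.19528103, 9.0, 5.22033119],
     [6.23235607, 4.20070219, 3.45031357, 3.80180979]]

rows, cols = slice(1, 2), slice(1, 3)

# Read: pick the selected rows, then the selected columns within each row.
block = [row[cols] for row in x[rows]]

# Write: overwrite the same region with a constant, modifying the rows in place.
for row in x[rows]:
    row[cols] = [5.0] * len(row[cols])

print(block)
print(x[1])
```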

## Broadcasting

You might wonder what happens if you add a vector `y` to a matrix `X`. Operations like this, where we combine a low-dimensional array `y` with a high-dimensional array `X`, invoke a functionality called broadcasting. Here, the low-dimensional array is duplicated along any axis with dimension $1$ to match the shape of the high-dimensional array. Consider the following example.

In [20]:
x = nd.ones(shape=(3,3))
print('x = ', x)
y = nd.arange(3)
print('y = ', y)
print('x + y = ', x + y)

x =  
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 3x3 @cpu(0)>
y =  
[ 0.  1.  2.]
<NDArray 3 @cpu(0)>
x + y =  
[[ 1.  2.  3.]
 [ 1.  2.  3.]
 [ 1.  2.  3.]]
<NDArray 3x3 @cpu(0)>


While `y` initially has shape (3,), 
MXNet treats it as having shape (1,3) 
and then broadcasts along the rows to form a (3,3) matrix. 
You might wonder why MXNet chose to interpret `y` as a (1,3) matrix and not (3,1). 
That's because, as in NumPy, broadcasting aligns shapes starting from the rightmost axis and pads any missing axes with size 1 on the left, so the duplication happens along the leftmost axis. 
We can alter this behavior by explicitly giving `y` a 2D shape.

In [21]:
y = y.reshape((3,1))
print('y = ', y)
print('x + y = ', x+y)

y =  
[[ 0.]
 [ 1.]
 [ 2.]]
<NDArray 3x1 @cpu(0)>
x + y =  
[[ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]]
<NDArray 3x3 @cpu(0)>
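
The shape logic behind both examples can be sketched in a few lines of plain Python. The function below follows the NumPy-style rule (compare shapes from the right, treat a missing axis as size 1, and stretch any axis of size 1). That this matches MXNet in every case is an assumption based on the two outputs above, and `broadcast_shape` is a hypothetical helper, not an MXNet function.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    result = []
    # Walk both shapes from the rightmost axis; missing axes count as size 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
    return tuple(reversed(result))


print(broadcast_shape((3, 3), (3,)))    # (3,) is treated as (1, 3)
print(broadcast_shape((3, 3), (3, 1)))  # the column is stretched across columns
```

Both calls return `(3, 3)`. A pair such as `(3, 4)` and `(3,)` raises an error under this rule, because the rightmost axes (4 and 3) disagree and neither is 1.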


## Converting from MXNet NDArray to NumPy
Converting MXNet NDArrays to and from NumPy is easy. The converted arrays do not share memory.

In [22]:
a = x.asnumpy()
type(a)

numpy.ndarray

In [23]:
y = nd.array(a) 
y


[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 3x3 @cpu(0)>

## Managing context
You might have noticed that MXNet NDArray looks almost identical to NumPy. 
But there are a few crucial differences.
One of the key features that differentiates MXNet from NumPy is its support for diverse hardware devices.

In MXNet, every array has a context. 
One context could be the CPU. 
Other contexts might be various GPUs. 
Things can get even hairier when we deploy jobs across multiple servers. 
By assigning arrays to contexts intelligently, 
we can minimize the time spent transferring data between devices. 
For example, when training neural networks on a server with a GPU, 
we typically prefer for the model's parameters to live on the GPU. 
To start, let's try initializing an array on the first GPU.

In [24]:
z = nd.ones(shape=(3,3), ctx=mx.gpu(0))
z


[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 3x3 @gpu(0)>

Given an NDArray on a given context, we can copy it to another context by using the `copyto()` method.

In [25]:
x_gpu = x.copyto(mx.gpu(0))
print(x_gpu)


[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 3x3 @gpu(0)>


The result of an operator will have the same context as the inputs.

In [26]:
x_gpu + z


[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]
<NDArray 3x3 @gpu(0)>

If we ever want to check the context of an NDArray programmatically, 
we can just read its `.context` attribute.

In [27]:
print(x_gpu.context)
print(z.context)

gpu(0)
gpu(0)


In order to perform an operation on two ndarrays `x1` and `x2`,
we need them both to live on the same context. 
And if they don't already, 
we may need to explicitly copy data from one context to another.
You might think that's annoying. 
After all, we just demonstrated that MXNet knows where each NDArray lives. 
So why can't MXNet just automatically copy `x1` to `x2.context` and then add them?

In short, people use MXNet to do machine learning
because they expect it to be fast. 
But transferring variables between different contexts is slow. 
So we want you to be 100% certain that you want to do something slow 
before we let you do it. 
If MXNet just did the copy automatically without crashing
then you might not realize that you had written some slow code.
We don't want you to spend your entire life on StackOverflow,
so we make some mistakes impossible. 

![](../img/operator-context.png)

## Watch out!

Imagine that your variable `z` already lives on your first GPU (`gpu(0)`). What happens if we call `z.copyto(mx.gpu(0))`? It will make a copy and allocate new memory, even though that variable already lives on the desired device!

There are times when, depending on the environment our code is running in,
two variables may already live on the same device.
So we only want to make a copy if the variables currently live on different contexts. 
In these cases, we can call `as_in_context()`. 
If the variable already lives on the specified context, then this is a no-op.

In [28]:
print('id(z):', id(z))
z = z.copyto(mx.gpu(0))
print('id(z):', id(z))
z = z.as_in_context(mx.gpu(0))
print('id(z):', id(z))
print(z)

id(z): 140101780190600
id(z): 140101780192784
id(z): 140101780192784

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 3x3 @gpu(0)>
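
The difference between the two methods can be summarized with a toy model that runs without a GPU. `ToyArray` below is a hypothetical stand-in written for this illustration. It mimics only the behavior described above (always copy versus copy only when needed) and says nothing about how MXNet implements either method.

```python
class ToyArray:
    """Toy stand-in for an array that remembers which device it lives on."""

    def __init__(self, values, context):
        self.values = list(values)
        self.context = context  # e.g. the string "cpu(0)" or "gpu(0)"

    def copyto(self, context):
        # Always allocates a new object, even if the context is unchanged.
        return ToyArray(self.values, context)

    def as_in_context(self, context):
        # No-op when the array already lives on the requested context.
        if self.context == context:
            return self
        return self.copyto(context)


z = ToyArray([1.0, 1.0, 1.0], "gpu(0)")
copied = z.copyto("gpu(0)")        # a new object on the same device
same = z.as_in_context("gpu(0)")   # the very same object, nothing copied
moved = z.as_in_context("cpu(0)")  # a real copy, because the device differs

print(copied is z, same is z, moved is z)  # prints: False True False
```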


## Next
[Linear algebra](../chapter01_crashcourse/linear-algebra.ipynb)

For whinges or inquiries, [open an issue on GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)

In [1]:
import logging

In [3]:
import math
import random
import mxnet as mx
import numpy as np
logging.getLogger().setLevel(logging.DEBUG)

In [20]:
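n_sample = 1000      # number of training examples
batchsize = 10       # examples per mini-batch
learning_rate = 0.1  # SGD step size
n_epoch = 1          # passes over the training data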

In [5]:
train_in = [[random.uniform(0,1) for c in range(2)] for n in range(n_sample)]

In [6]:
train_in

[[0.7900633085203065, 0.044807790361679145],
 [0.6539154006667066, 0.94089065849957],
 [0.8070766419909117, 0.9360508315029309],
 [0.572245943004865, 0.4071177343331972],
 [0.16979524391326417, 0.30727415087021626],
 [0.8182693590012237, 0.7014178810949341],
 [0.5102908298677126, 0.9855257205083898],
 [0.3345360942588129, 0.01468618037637226],
 [0.012907341324591592, 0.7285121473351921],
 [0.9529313564633165, 0.8089518955030356],
 [0.8241442455998116, 0.8685051655411183],
 [0.07265406740337432, 0.1544827002699457],
 [0.5916520521086587, 0.0913171203114328],
 [0.11922752993472308, 0.9610884163396227],
 [0.3885882648200478, 0.5535977417712672],
 [0.8958666425757527, 0.4327438498135404],
 [0.885364480656064, 0.5986733682346562],
 [0.7863403370667333, 0.1617702222557228],
 [0.739042594831718, 0.2927949195771039],
 [0.7068446656223393, 0.5052013139735675],
 [0.21403903534348, 0.47757494199140116],
 [0.7287181422597351, 0.7780110096591639],
 [0.44452874862448355, 0.2605893550720718],
 [0.209

In [7]:
train_out = [0 for n in range(n_sample)]

In [8]:
for i in range(n_sample):
    train_out[i] = max(train_in[i][0], train_in[i][1])

In [9]:
train_out

[0.7900633085203065,
 0.94089065849957,
 0.9360508315029309,
 0.572245943004865,
 0.30727415087021626,
 0.8182693590012237,
 0.9855257205083898,
 0.3345360942588129,
 0.7285121473351921,
 0.9529313564633165,
 0.8685051655411183,
 0.1544827002699457,
 0.5916520521086587,
 0.9610884163396227,
 0.5535977417712672,
 0.8958666425757527,
 0.885364480656064,
 0.7863403370667333,
 0.739042594831718,
 0.7068446656223393,
 0.47757494199140116,
 0.7780110096591639,
 0.44452874862448355,
 0.3482438437089307,
 0.21258654377276986,
 0.38863993392834983,
 0.19116119612248317,
 0.8891474042472653,
 0.9080366614804349,
 0.57535278505867,
 0.43338712905731636,
 0.6661296402093466,
 0.021962947768306873,
 0.6618922732432475,
 0.6780933895603043,
 0.824520528123539,
 0.6004289630572687,
 0.8901605522528311,
 0.7037627626404642,
 0.8490130844704884,
 0.8317063145899669,
 0.8105076726881468,
 0.39272334684169075,
 0.9887397875602735,
 0.23551053515331355,
 0.7342082339411029,
 0.625471882811338,
 0.63797953

In [11]:
train_iter = mx.io.NDArrayIter(data=np.array(train_in),
                               label={'reg_label': np.array(train_out)},
                               batch_size=batchsize,
                               shuffle=True)

In [12]:
train_iter

<mxnet.io.io.NDArrayIter at 0x7fb82a006588>

In [13]:
src = mx.sym.Variable('data')

In [14]:
fc = mx.sym.FullyConnected(data=src, num_hidden=1, name='fc')

In [15]:
net = mx.sym.LinearRegressionOutput(data=fc, name='reg')

In [16]:
module = mx.mod.Module(symbol=net, label_names=['reg_label'])

In [17]:
def epoch_callback(epoch, symbol, arg_params, aux_params):
    # Print the name and value of every learned parameter at the end of each epoch.
    for k in arg_params:
        print(k)
        print(arg_params[k].asnumpy())

In [21]:
module.fit(
    train_iter,
    eval_data=None,
    eval_metric = mx.metric.create('mse'),
    optimizer = 'sgd',
    optimizer_params = {'learning_rate': learning_rate},
    num_epoch = n_epoch,
    batch_end_callback = mx.callback.Speedometer(batchsize, 100),
    epoch_end_callback = epoch_callback,
)

  allow_missing=allow_missing, force_init=force_init)
INFO:root:Epoch[0] Batch [0-100]	Speed: 15824.46 samples/sec	mse=0.016863
INFO:root:Epoch[0] Batch [100-200]	Speed: 14844.57 samples/sec	mse=0.013711
INFO:root:Epoch[0] Batch [200-300]	Speed: 14807.10 samples/sec	mse=0.015332
INFO:root:Epoch[0] Batch [300-400]	Speed: 18595.40 samples/sec	mse=0.015705
INFO:root:Epoch[0] Batch [400-500]	Speed: 18141.85 samples/sec	mse=0.015852
INFO:root:Epoch[0] Batch [500-600]	Speed: 18206.42 samples/sec	mse=0.016993
INFO:root:Epoch[0] Batch [600-700]	Speed: 17528.78 samples/sec	mse=0.013882
INFO:root:Epoch[0] Batch [700-800]	Speed: 18312.06 samples/sec	mse=0.016820
INFO:root:Epoch[0] Batch [800-900]	Speed: 18179.59 samples/sec	mse=0.013476
INFO:root:Epoch[0] Train-mse=0.015173
INFO:root:Epoch[0] Time cost=0.594


fc_weight
[[0.5047617  0.45395592]]
fc_bias
[0.16543563]
