# Manipulate data the MXNet way with `ndarray`

It's impossible to get anything done if we can't manipulate data. 
Generally, there are two important things we need to do with data: 
(i) acquire it and (ii) process it once it's inside the computer.
There's no point in trying to acquire data if we don't even know how to store it,
so let's get our hands dirty first by playing with synthetic data.

We'll start by introducing NDArrays, MXNet's primary tool for storing and transforming data. If you've worked with NumPy before, you'll notice that NDArrays are, by design, similar to NumPy's multi-dimensional array. However, they confer a few key advantages. First, NDArrays support asynchronous computation on CPU, GPU, and distributed cloud architectures. Second, they provide support for automatic differentiation. These properties make NDArray an ideal library for machine learning, both for researchers and engineers launching production systems.


## Getting started

In this chapter, we'll get you going with the basic functionality. Don't worry if you don't understand any of the basic math, like element-wise operations or normal distributions. In the next two chapters we'll take another pass at NDArray, teaching you both the math you'll need and how to realize it in code.

To get started, let's import `mxnet`. We'll also import the `ndarray` module from `mxnet` under its short alias `nd` for convenience. We'll make a habit of setting a random seed so that you always get the same results that we do.

In [1]:
import mxnet as mx
from mxnet import nd
mx.random.seed(1)

Let's start with a very simple 1-dimensional array built from a Python list.

In [2]:
x = nd.array([1,2,3])
print(x)


[ 1.  2.  3.]
<NDArray 3 @cpu(0)>


Now a 2-dimensional array.

In [3]:
y = nd.array([[1,2,3,4], [1,2,3,4], [1,2,3,4]])
print(y)


[[ 1.  2.  3.  4.]
 [ 1.  2.  3.  4.]
 [ 1.  2.  3.  4.]]
<NDArray 3x4 @cpu(0)>


Next, let's see how to create an NDArray without any values initialized. Specifically, we'll create a 2D array (also called a *matrix*) with 3 rows and 3 columns using the `.empty` function. We'll also try out `.full`, which takes an additional parameter for the value you want to fill the array with.

In [4]:
x = nd.empty((3, 3))
print(x)
x = nd.full((3,3), 7)
print(x)


[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
<NDArray 3x3 @cpu(0)>

[[ 7.  7.  7.]
 [ 7.  7.  7.]
 [ 7.  7.  7.]]
<NDArray 3x3 @cpu(0)>


`empty` just grabs some memory and hands us back a matrix without setting the values of any of its entries. This means that the entries can hold arbitrary values, including very big ones! Typically, we'll want our matrices initialized, and very often we want a matrix of all zeros, so we can use the `.zeros` function. If you're feeling experimental, try one of the several [array creation functions](https://mxnet.incubator.apache.org/api/python/ndarray.html?highlight=random_normal#array-creation-routines).


In [5]:
x = nd.zeros((3, 10))
print(x)


[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
<NDArray 3x10 @cpu(0)>


Similarly, `ndarray` has a function to create a matrix of all ones aptly named [ones](https://mxnet.incubator.apache.org/api/python/ndarray.html?highlight=random_normal#mxnet.ndarray.ones).

In [6]:
x = nd.ones((3, 4))
print(x)


[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
<NDArray 3x4 @cpu(0)>


Often, we'll want to create arrays whose values are sampled randomly. This is especially common when we intend to use the array as a parameter in a neural network. In this snippet, we initialize with values drawn from a standard normal distribution using [random_normal](https://mxnet.incubator.apache.org/api/python/ndarray.html?highlight=random_normal#mxnet.ndarray.random_normal). The first two arguments are the mean (0) and the standard deviation (1), so the samples have zero mean and unit variance.


In [7]:
y = nd.random_normal(0, 1, shape=(3, 4))
print(y)


[[ 0.03629481 -0.67765152 -0.49024421  0.10073948]
 [-0.95017916  0.57595438  0.03751944 -0.3469252 ]
 [-0.72984636 -0.22134334 -2.04010558 -1.80471897]]
<NDArray 3x4 @cpu(0)>


Sometimes you will want a new array with the same shape as an existing one but not its contents. You can do this with `.zeros_like`.

In [8]:
z = nd.zeros_like(y)
print(z)


[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
<NDArray 3x4 @cpu(0)>


As in NumPy, the dimensions of each NDArray are accessible via the `.shape` attribute.

In [9]:
y.shape

(3, 4)

We can also query its `.size`, which is equal to the product of the components of the shape. Together with the precision of the stored values, this tells us how much memory the array occupies.

In [10]:
y.size

12

We can query the data type using `.dtype`.

In [11]:
y.dtype

numpy.float32

`float32` is the default data type. Lower precision can improve performance, or you might want a different data type altogether. You can force the data type when you create the array by passing a NumPy type, which requires importing NumPy first.

In [12]:
import numpy as np
a = nd.array([1,2,3])
b = nd.array([1,2,3], dtype=np.int32)
c = nd.array([1.2, 2.3], dtype=np.float16)
(a.dtype, b.dtype, c.dtype)

(numpy.float32, numpy.int32, numpy.float16)
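
To make the earlier remark about memory concrete, here is a minimal sketch in plain Python. The helper `array_nbytes` and the `ITEMSIZE` table are hypothetical names introduced for illustration and are not part of MXNet. The sketch counts only the bytes of the stored values and ignores any bookkeeping overhead the library may add.

```python
from math import prod

# Bytes per element for a few common dtypes (fixed by the type definitions).
ITEMSIZE = {"float16": 2, "float32": 4, "float64": 8, "int32": 4}

def array_nbytes(shape, dtype="float32"):
    """Bytes needed for the raw values of an array (hypothetical helper)."""
    # Number of elements (the array's size) times the width of one element.
    return prod(shape) * ITEMSIZE[dtype]

full_precision = array_nbytes((3, 4))             # a 3x4 float32 array like `y`
half_precision = array_nbytes((3, 4), "float16")  # same shape at half precision
print(full_precision, half_precision)  # 48 24
```

Halving the precision halves the footprint, which is one reason a smaller dtype can pay off.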

As you will come to learn in detail later, operations and memory storage happen on specific devices that you can set. You can compute on the CPU, GPU, a specific GPU, or all of the above depending on your situation and preference. Using `.context` reveals the location of the variable.

In [13]:
y.context

cpu(0)

## Operations

NDArray supports a large number of standard mathematical operations, such as element-wise addition:

In [14]:
print('x=', x)
print('y=', y)
x = x + y
print('x = x + y, x=', x)

x= 
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
<NDArray 3x4 @cpu(0)>
y= 
[[ 0.03629481 -0.67765152 -0.49024421  0.10073948]
 [-0.95017916  0.57595438  0.03751944 -0.3469252 ]
 [-0.72984636 -0.22134334 -2.04010558 -1.80471897]]
<NDArray 3x4 @cpu(0)>
x = x + y, x= 
[[ 1.03629482  0.32234848  0.50975579  1.10073948]
 [ 0.04982084  1.57595444  1.03751945  0.6530748 ]
 [ 0.27015364  0.77865666 -1.04010558 -0.80471897]]
<NDArray 3x4 @cpu(0)>


Multiplication:

In [15]:
x = nd.array([1, 2, 3])
y = nd.array([2, 2, 2])
x * y


[ 2.  4.  6.]
<NDArray 3 @cpu(0)>

And exponentiation:

In [17]:
nd.exp(x)


[  2.71828175   7.38905621  20.08553696]
<NDArray 3 @cpu(0)>

We can also grab a transpose with `.T` and compute a product with `nd.dot`. Since `x` and `y` here are both vectors, the result is their dot product.

In [18]:
nd.dot(x, y.T)


[ 12.]
<NDArray 1 @cpu(0)>
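
As a sanity check on the two results above, the sketch below redoes the arithmetic with plain Python lists. It is only meant to show how the element-wise product and the dot product relate for these particular vectors; it does not use MXNet at all.

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 2.0]

# Element-wise product: multiply entries that sit at the same position.
elementwise = [a * b for a, b in zip(x, y)]

# Dot product: the sum of those element-wise products.
dot = sum(elementwise)

print(elementwise)  # [2.0, 4.0, 6.0]
print(dot)          # 12.0
```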

We'll explain these operations and present even more operators in the [linear algebra](P01-C03-linear-algebra.ipynb) chapter. But for now, we'll stick with the mechanics of working with NDArrays.

## In-place operations

In the previous examples, every time we ran an operation, we allocated new memory to host its results. For example, if we write `y = x + y`, the name `y` stops referring to the original array and instead points at newly allocated memory. We can show this using Python's `id()` function, which returns a unique identifier for the object a variable refers to.


In [19]:
print('y=', y)
print('id(y):', id(y))
y = y + x
print('after y=y+x, y=', y)
print('id(y):', id(y))

y= 
[ 2.  2.  2.]
<NDArray 3 @cpu(0)>
id(y): 4538672016
after y=y+x, y= 
[ 3.  4.  5.]
<NDArray 3 @cpu(0)>
id(y): 4538673416


We can assign the result to a previously allocated array with slice notation, e.g., `result[:] = ...`.

In [20]:
print('x=', x)
z = nd.zeros_like(x)
print('z is zeros_like x, z=', z)
print('id(z):', id(z))
print('y=', y)
z[:] = x + y
print('z[:] = x + y, z=', z)
print('id(z) is the same as before:', id(z))

x= 
[ 1.  2.  3.]
<NDArray 3 @cpu(0)>
z is zeros_like x, z= 
[ 0.  0.  0.]
<NDArray 3 @cpu(0)>
id(z): 4538674648
y= 
[ 3.  4.  5.]
<NDArray 3 @cpu(0)>
z[:] = x + y, z= 
[ 4.  6.  8.]
<NDArray 3 @cpu(0)>
id(z) is the same as before: 4538674648


However, `x + y` here will still allocate a temporary buffer to store the result before copying it to `z`. To make better use of memory, we can perform operations in place, avoiding temporary buffers. To do this we specify the `out` keyword argument that every operator supports:

In [21]:
print('x=', x, 'is in id(x):', id(x))
print('y=', y, 'is in id(y):', id(y))
print('z=', z, 'is in id(z):', id(z))
nd.elemwise_add(x, y, out=z)
print('after nd.elemwise_add(x, y, out=z), x=', x, 'is in id(x):', id(x))
print('after nd.elemwise_add(x, y, out=z), y=', y, 'is in id(y):', id(y))
print('after nd.elemwise_add(x, y, out=z), z=', z, 'is in id(z):', id(z))

x= 
[ 1.  2.  3.]
<NDArray 3 @cpu(0)> is in id(x): 4538672072
y= 
[ 3.  4.  5.]
<NDArray 3 @cpu(0)> is in id(y): 4538673416
z= 
[ 4.  6.  8.]
<NDArray 3 @cpu(0)> is in id(z): 4538674648
after nd.elemwise_add(x, y, out=z), x= 
[ 1.  2.  3.]
<NDArray 3 @cpu(0)> is in id(x): 4538672072
after nd.elemwise_add(x, y, out=z), y= 
[ 3.  4.  5.]
<NDArray 3 @cpu(0)> is in id(y): 4538673416
after nd.elemwise_add(x, y, out=z), z= 
[ 4.  6.  8.]
<NDArray 3 @cpu(0)> is in id(z): 4538674648


If we're not planning to re-use `x`, then we can assign the result to `x` itself. There are two ways to do this in MXNet:

1. By using slice notation, `x[:] = x op y`
2. By using the op-equals operators like `+=`

In [22]:
print('x=', x, 'is in id(x):', id(x))
x += y
print('x=', x, 'is in id(x):', id(x))

x= 
[ 1.  2.  3.]
<NDArray 3 @cpu(0)> is in id(x): 4538672072
x= 
[ 4.  6.  8.]
<NDArray 3 @cpu(0)> is in id(x): 4538672072
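
The difference between rebinding a name and updating in place is not specific to NDArrays. As a rough analogy, the sketch below shows the same behavior with built-in Python lists. Note that for lists `+` concatenates rather than adds element-wise, so only the identity behavior carries over, not the arithmetic.

```python
a = [1, 2, 3]
original = a            # second name for the same list object
a = a + [4]             # builds a new list and rebinds the name `a`
rebound = a is original
print(rebound)          # False: `a` now refers to a different object

b = [1, 2, 3]
alias = b               # second name for the same list object
b += [4]                # updates the existing list in place
same = b is alias
print(same)             # True: still the same object
print(alias)            # [1, 2, 3, 4]: the alias sees the update
```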


## Slicing
MXNet NDArrays support slicing in all the ridiculous ways you might imagine accessing your data. For a quick review:

```
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # all items in the array
```

Here's an example of reading the second and third elements of a 1D array, and then the second and third rows of a 2D array.

In [23]:
x = nd.array([1, 2, 3])
print('1D complete array, x=', x)
s = x[1:3]
print('slicing the 2nd and 3rd elements, s=', s)
x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print('multi-D complete array, x=', x)
s = x[1:3]
print('slicing the 2nd and 3rd elements, s=', s)

1D complete array, x= 
[ 1.  2.  3.]
<NDArray 3 @cpu(0)>
slicing the 2nd and 3rd elements, s= 
[ 2.  3.]
<NDArray 2 @cpu(0)>
multi-D complete array, x= 
[[  1.   2.   3.   4.]
 [  5.   6.   7.   8.]
 [  9.  10.  11.  12.]]
<NDArray 3x4 @cpu(0)>
slicing the 2nd and 3rd elements, s= 
[[  5.   6.   7.   8.]
 [  9.  10.  11.  12.]]
<NDArray 2x4 @cpu(0)>


Now let's try writing to an entire row, to a single element, and to a range of elements.

In [24]:
print('original x, x=', x)
x[2] = 9.0
print('replaced entire row with x[2] = 9.0, x=', x)
x[0,2] = 9.0
print('replaced specific element with x[0,2] = 9.0, x=', x)
x[1:2,1:3] = 5.0
print('replaced range of elements with x[1:2,1:3] = 5.0, x=', x)

original x, x= 
[[  1.   2.   3.   4.]
 [  5.   6.   7.   8.]
 [  9.  10.  11.  12.]]
<NDArray 3x4 @cpu(0)>
replaced entire row with x[2] = 9.0, x= 
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9.  9.  9.  9.]]
<NDArray 3x4 @cpu(0)>
replaced specific element with x[0,2] = 9.0, x= 
[[ 1.  2.  9.  4.]
 [ 5.  6.  7.  8.]
 [ 9.  9.  9.  9.]]
<NDArray 3x4 @cpu(0)>
replaced range of elements with x[1:2,1:3] = 5.0, x= 
[[ 1.  2.  9.  4.]
 [ 5.  5.  5.  8.]
 [ 9.  9.  9.  9.]]
<NDArray 3x4 @cpu(0)>


Multi-dimensional slicing is also supported.

In [33]:
x = nd.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print('original x, x=', x)
s = x[1:2,1:3]
print('plucking specific elements with x[1:2,1:3]', s)
s = x[:,:1]
print('first column with x[:,:1]', s)
s = x[:1,:]
print('first row with x[:1,:]', s)
s = x[:,3:]
print('last column with x[:,3:]', s)
s = x[2:,:]
print('last row with x[2:,:]', s)

original x, x= 
[[  1.   2.   3.   4.]
 [  5.   6.   7.   8.]
 [  9.  10.  11.  12.]]
<NDArray 3x4 @cpu(0)>
plucking specific elements with x[1:2,1:3] 
[[ 6.  7.]]
<NDArray 1x2 @cpu(0)>
first column with x[:,:1] 
[[ 1.]
 [ 5.]
 [ 9.]]
<NDArray 3x1 @cpu(0)>
first row with x[:1,:] 
[[ 1.  2.  3.  4.]]
<NDArray 1x4 @cpu(0)>
last column with x[:,3:] 
[[  4.]
 [  8.]
 [ 12.]]
<NDArray 3x1 @cpu(0)>
last row with x[2:,:] 
[[  9.  10.  11.  12.]]
<NDArray 1x4 @cpu(0)>
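
If the half-open ranges are hard to keep straight, Python's built-in `slice.indices` method can spell out which positions a slice touches. The sketch below applies it to the slices used above on a 3x4 array. The helper name `selected` is hypothetical and exists only for this illustration.

```python
def selected(s, length):
    """List the positions that slice `s` picks from an axis of this length."""
    return list(range(*s.indices(length)))

rows, cols = 3, 4  # shape of the 3x4 array `x` used above

print(selected(slice(1, 2), rows), selected(slice(1, 3), cols))  # x[1:2,1:3] -> [1] [1, 2]
print(selected(slice(None, 1), cols))  # x[:,:1], first column -> [0]
print(selected(slice(3, None), cols))  # x[:,3:], last column  -> [3]
print(selected(slice(2, None), rows))  # x[2:,:], last row     -> [2]
```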


## Shape Manipulation
Using `reshape` we can change the array's shape. For example, say you want to take a series of numbers like 0 to 23 and turn that into a multi-dimensional array.

In [39]:
x = mx.nd.array(np.arange(24))
print('a simple array:', x)
# reshape the array we just created (x), not an unrelated variable
b = x.reshape((2,3,4))
print('converted to a multi-dimensional array', b)

a simple array: 
[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.  16.  17.  18.  19.  20.  21.  22.  23.]
<NDArray 24 @cpu(0)>
converted to a multi-dimensional array 
[[[  0.   1.   2.   3.]
  [  4.   5.   6.   7.]
  [  8.   9.  10.  11.]]

 [[ 12.  13.  14.  15.]
  [ 16.  17.  18.  19.]
  [ 20.  21.  22.  23.]]]
<NDArray 2x3x4 @cpu(0)>
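
The output above suggests that `reshape` fills the new shape in row-major order, with the last axis varying fastest. Assuming that ordering, the position of an entry in the original flat array can be computed from its multi-dimensional index. The function `flat_index` below is a hypothetical helper written for this illustration, not an MXNet call.

```python
def flat_index(index, shape):
    """Row-major position of a multi-dimensional index (hypothetical helper)."""
    flat = 0
    for i, dim in zip(index, shape):
        # Stepping into the next axis scales everything so far by its length.
        flat = flat * dim + i
    return flat

shape = (2, 3, 4)
print(flat_index((0, 0, 0), shape))  # 0
print(flat_index((1, 0, 2), shape))  # 14
print(flat_index((1, 2, 3), shape))  # 23
```

These match the printed array: the entry at `[1, 0, 2]` is 14 and the last entry is 23.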


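`concat` is another useful function that we'll use to stack arrays. By default it joins its inputs along axis 1 (here, the columns); pass the `dim` argument to join along a different axis.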

In [46]:
x = mx.nd.ones((2,3))
print('x=', x)
y = mx.nd.ones((2,3))*2
print('y=', y)
z = mx.nd.concat(x, y)
print('z = mx.nd.concat(x, y)', z)

x= 
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
<NDArray 2x3 @cpu(0)>
y= 
[[ 2.  2.  2.]
 [ 2.  2.  2.]]
<NDArray 2x3 @cpu(0)>
z = mx.nd.concat(x, y) 
[[ 1.  1.  1.  2.  2.  2.]
 [ 1.  1.  1.  2.  2.  2.]]
<NDArray 2x6 @cpu(0)>


## Broadcasting

You might wonder, what happens if you add a vector `y` to a matrix `X`? Operations like this, where we combine a low-dimensional array `y` with a higher-dimensional array `X`, invoke a functionality called broadcasting. First we'll introduce `.arange`, which is useful for filling out an array with evenly spaced data. Then we can take the low-dimensional array and duplicate it along any axis of length 1 to match the shape of the high-dimensional array. 
Consider the following example.


In [47]:
x = nd.ones(shape=(3,6))
print('x = ', x)
y = nd.arange(6)
print('y = ', y)
print('x + y = ', x + y)

x =  
[[ 1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.]]
<NDArray 3x6 @cpu(0)>
y =  
[ 0.  1.  2.  3.  4.  5.]
<NDArray 6 @cpu(0)>
x + y =  
[[ 1.  2.  3.  4.  5.  6.]
 [ 1.  2.  3.  4.  5.  6.]
 [ 1.  2.  3.  4.  5.  6.]]
<NDArray 3x6 @cpu(0)>


While `y` is initially of shape (6,), 
MXNet infers its shape to be (1,6), 
and then broadcasts along the rows to form a (3,6) matrix. 
You might wonder why MXNet chose to interpret `y` as a (1,6) matrix and not (6,1). 
That's because broadcasting lines shapes up starting from the rightmost axis, so any missing axes are added on the left and duplicated there. 
We can alter this behavior by explicitly giving `y` a 2D shape using `.reshape`. You can also chain `.arange` and `.reshape` to do this in one step.

In [48]:
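# y needs 3 elements (one per row of x) to be reshaped to (3,1)
y = nd.arange(3)
y = y.reshape((3,1))
print('y = ', y)
print('x + y = ', x+y)
# the same thing in one step, chaining arange and reshape
y = nd.arange(3).reshape((3,1))
print('y = ', y)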

y =  
[[ 0.]
 [ 1.]
 [ 2.]]
<NDArray 3x1 @cpu(0)>
x + y =  
[[ 1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.]]
<NDArray 3x6 @cpu(0)>
y =  
[[ 0.]
 [ 1.]
 [ 2.]]
<NDArray 3x1 @cpu(0)>
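
The shape rule at work in these examples can be sketched in a few lines of plain Python. This assumes the NumPy-style convention, which is consistent with the outputs above: shapes are aligned from the right, missing axes count as length 1, and an axis of length 1 is stretched to match the other array. The function `broadcast_shape` is a hypothetical helper for illustration, not an MXNet API.

```python
def broadcast_shape(a, b):
    """Result shape of broadcasting two shapes, NumPy-style (hypothetical helper)."""
    # Pad the shorter shape with leading 1s so both have the same number of axes.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"cannot broadcast axis lengths {da} and {db}")
    return tuple(result)

print(broadcast_shape((3, 6), (6,)))    # (3, 6): y is treated as (1, 6)
print(broadcast_shape((3, 6), (3, 1)))  # (3, 6): the column is repeated 6 times
```

Under this rule, a shape like `(3,)` cannot be combined with `(3, 6)`, because the rightmost axes (3 and 6) differ and neither is 1. That is why the column version needs the explicit `(3, 1)` shape.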


## Converting from MXNet NDArray to NumPy
Converting MXNet NDArrays to and from NumPy is easy. The converted arrays do not share memory.

In [28]:
a = x.asnumpy()
type(a)

numpy.ndarray

In [29]:
y = nd.array(a) 
print('id(a)=', id(a), 'id(x)=', id(x), 'id(y)=', id(y))

id(a)= 4538685840 id(x)= 4538674312 id(y)= 4538671344


## Managing context
You might have noticed that MXNet NDArray looks almost identical to NumPy. 
But there are a few crucial differences.
One of the key features that differentiates MXNet from NumPy is its support for diverse hardware devices.

In MXNet, every array has a context. 
One context could be the CPU. 
Other contexts might be various GPUs. 
Things can get even hairier when we deploy jobs across multiple servers. 
By assigning arrays to contexts intelligently, 
we can minimize the time spent transferring data between devices. 
For example, when training neural networks on a server with a GPU, 
we typically prefer for the model's parameters to live on the GPU. 
If you have a GPU, let's try initializing an array on the first GPU.
Otherwise, use `ctx=mx.cpu()` in place of `ctx=gpu(0)`.

In [None]:
from mxnet import gpu
z = nd.ones(shape=(3,3), ctx=gpu(0))
print(z)

Given an NDArray on a given context, we can copy it to another context by using the `copyto()` method. Skip this if you don't have a GPU at the moment.

In [None]:
x_gpu = x.copyto(gpu(0))
print(x_gpu)

The result of an operator will have the same context as the inputs.

In [None]:
x_gpu + z

If we ever want to check the context of an NDArray programmatically, 
we can just call its `.context` attribute.

In [None]:
print(x_gpu.context)
print(z.context)

In order to perform an operation on two NDArrays `x1` and `x2`,
we need them both to live on the same context. 
And if they don't already, 
we may need to explicitly copy data from one context to another.
You might think that's annoying. 
After all, we just demonstrated that MXNet knows where each NDArray lives. 
So why can't MXNet just automatically copy `x1` to `x2.context` and then add them?

In short, people use MXNet to do machine learning
because they expect it to be fast. 
But transferring variables between different contexts is slow. 
So we want you to be 100% certain that you want to do something slow 
before we let you do it. 
If MXNet just did the copy automatically without crashing
then you might not realize that you had written some slow code.
We don't want you to spend your entire life on StackOverflow,
so we make some mistakes impossible. 

![](../img/operator-context.png)

## Watch out!

Imagine that your variable `z` already lives on your first GPU (`gpu(0)`). What happens if we call `z.copyto(gpu(0))`? It will make a copy and allocate new memory, even though that variable already lives on the desired device!

There are times when, depending on the environment our code is running in,
two variables may already live on the same device.
So we only want to make a copy if the variables currently live on different contexts. 
In these cases, we can call `as_in_context()`. 
If the variable is already in the specified context then this is a no-op.

In [None]:
print('id(z):', id(z))
z = z.copyto(gpu(0))
print('id(z):', id(z))
z = z.as_in_context(gpu(0))
print('id(z):', id(z))
print(z)

## Next
[Linear algebra](../chapter01_crashcourse/linear-algebra.ipynb)

For whinges or inquiries, [open an issue on  GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)