This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add three sparse tutorials #7921

Merged

merged 11 commits into apache:master from eric-haibin-lin:three-tutorials on Oct 4, 2017
Conversation

Member

@eric-haibin-lin eric-haibin-lin commented Sep 16, 2017

Member

@rahul003 rahul003 left a comment

Tutorials look good :)

Added a few minor comments


``dense[rsp.indices[i], :, :, :, ...] = rsp.data[i, :, :, :, ...]``

A RowSparseNDArray is typically used to represent non-zero row slices of a large NDArray of shape [LARGE0, D1, .. , Dn] where LARGE0 >> D0 and most row slices are zeros.
Member

The naming of LARGE0 just felt slightly odd

Member Author

Hmm. Any suggestion for better names?
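For readers following along, here is a minimal sketch of the row-slice relationship described above, using a 2-D example. It assumes the tuple form of `mx.nd.sparse.row_sparse_array` and `tostype`, which may differ slightly across MXNet versions:

```python
import mxnet as mx

# Two non-zero row slices (rows 1 and 3) of a 4x2 array.
data = [[1., 2.], [3., 4.]]
indices = [1, 3]
rsp = mx.nd.sparse.row_sparse_array((data, indices), shape=(4, 2))

# dense[rsp.indices[i], :] = rsp.data[i, :]
dense = rsp.tostype('default')
print(dense.asnumpy())
# [[0. 0.]
#  [1. 2.]
#  [0. 0.]
#  [3. 4.]]
```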


## Inspecting Arrays

Also as with `CSRNDArray`, you can inspect the contents of a `RowSparseNDArray` by filling
Member

The word "Also" can be dropped.

```python
{'b is a': b is a, 'b.asnumpy()': b.asnumpy(), 'c.asnumpy()': c.asnumpy(), 'd.asnumpy()': d.asnumpy()}
```

If the storage types of source array and destination array do not match,
Member

The normal copyto function requires the shapes of the arrays to be the same, right?
How does this rowSparse.copyto work when the storage types are different, since that would require different amounts of memory? I think it would be good to clarify what happens here.

Member Author

The source array will be converted if its storage type does not match. I'll add some clarification here
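A minimal sketch of what the clarification could show, assuming `copyto` casts the source's storage type to the destination's, per the reply above:

```python
import mxnet as mx

# Source is csr, destination is dense; shapes must match, and the
# data is converted to the destination's storage type on copy.
src = mx.nd.ones((2, 2)).tostype('csr')
dst = mx.nd.zeros((2, 2))         # default (dense) storage
src.copyto(dst)
print(dst.stype)                  # 'default'
print(dst.asnumpy())              # [[1. 1.] [1. 1.]]
```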


* If sparse outputs are provided, MXNet will convert the dense outputs generated by the dense operator into the provided sparse format.

* For operators that don't specialize in sparse arrays, you can still use them with sparse inputs with some performance penalty.
Member

The first point is repeated

Member Author

Yeah I noticed that, too. Will remove it

Member Author

Actually, they're different...


* For operators that don't specialize in sparse arrays, you can still use them with sparse inputs with some performance penalty.
In MXNet, dense operators require all inputs and outputs to be in the dense format.
Member

Merging these two sentences into one bullet point is odd.

I'd actually suggest merging all the points into a paragraph, because there is a flow between them.
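To make the fallback concrete, a small sketch (the exact operators that fall back may vary by version; adding a non-zero scalar is one case where a sparse result would no longer be sparse):

```python
import mxnet as mx

a = mx.nd.ones((2, 2)).tostype('csr')
# Adding a non-zero scalar makes every element non-zero, so MXNet
# falls back to the dense operator and returns a dense result.
b = a + 1
print(a.stype)   # 'csr'
print(b.stype)   # 'default'
```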


You can instantiate an executor from a sparse symbol by using the `simple_bind` method,
which allocates zeros to all free variables according to their storage types.
The executor provides a `forward` method for evaluation and an attribute
Member

Is evaluation the right word to use here? As I understand it, calling this function executes the commands (to allocate zeros to those variables in this case), doesn't it?

Member Author

Which sentence are you referring to? `bind` will allocate the memory; `forward` will perform evaluation.
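A minimal sketch of the bind/forward split being discussed (variable names and the `stype` keyword are assumptions; the keyword differed in early releases):

```python
import mxnet as mx

a = mx.sym.var('a', stype='csr')
b = mx.sym.var('b')
c = mx.sym.elemwise_add(a, b)

# simple_bind allocates zero-filled arrays for the free variables
# according to their storage types; no computation happens yet.
exe = c.simple_bind(mx.cpu(), a=(2, 2), b=(2, 2))

# forward performs the actual evaluation.
exe.forward()
print(exe.outputs[0].asnumpy())   # all zeros, since the inputs are zeros
```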

```python
# Create module
mod = mx.mod.Module(symbol=lro, data_names=['data'], label_names=['label'])
# Allocate memory by given the input data and label shapes
```
Member

typo: given -> giving
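A self-contained sketch of the surrounding Module workflow (the symbol construction and shapes here are hypothetical, not taken from the tutorial):

```python
import mxnet as mx

# Build a small linear-regression symbol with a sparse data input.
data = mx.sym.var('data', stype='csr')
label = mx.sym.var('label')
fc = mx.sym.FullyConnected(data=data, num_hidden=1, name='fc')
lro = mx.sym.LinearRegressionOutput(data=fc, label=label, name='lro')

# Create module
mod = mx.mod.Module(symbol=lro, data_names=['data'], label_names=['label'])
# Allocate memory by giving the input data and label shapes
mod.bind(data_shapes=[('data', (8, 4))], label_shapes=[('label', (8,))])
mod.init_params()
```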

Contributor

@bhavinthaker bhavinthaker left a comment

Nicely written tutorials.

and each row is sparse (i.e. with only a few nonzeros).

## Advantages of Compressed Sparse Row NDArray (CSRNDArray)
For matrices of high sparsity (e.g. ~1% non-zeros), there are two primary advantages of `CSRNDArray` over the existing `NDArray`:
Contributor

You may want to introduce both terms, sparsity and density. Say: "For matrices of high sparsity, also known as low density (e.g. ~1% non-zeros = ~1% density), there are ..."

- certain operations are much faster (e.g. matrix-vector multiplication)

You may be familiar with the CSR storage format in [SciPy](https://www.scipy.org/) and will note the similarities in MXNet's implementation. However, there are some additional competitive features in `CSRNDArray` inherited from `NDArray`, such as lazy evaluation and automatic parallelization, that are not available in SciPy's flavor of CSR.

Contributor

It is possible that some newbie user may read the Sparse tutorial without having background on the terms. Hmm, do we want to define the terms here, even though one could argue that these are easy-to-understand terms?

Lazy evaluation means that any operations on the CSRNDArray are not performed immediately, but are delayed until their evaluation is specifically requested. Automatic parallelization means that all operations are automatically executed in parallel with each other without requiring explicit mention.

Member Author

Good point. These are mentioned in the NDArray tutorial which I listed as a prerequisite. I'll add the description nonetheless and add a link for further reading

Member Author

BTW I find "async non-blocking" more appropriate/accurate than "lazy evaluation", since the evaluation is actually not delayed. https://mxnet.incubator.apache.org/tutorials/basic/ndarray.html#lazy-evaluation-and-automatic-parallelization

Member Author

I'd suggest we avoid using the term lazy eval in the original NDArray tutorial, too. It's misleading
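A small sketch of the non-blocking behavior being discussed (assuming the standard `wait_to_read` API):

```python
import mxnet as mx

a = mx.nd.ones((1000, 1000))
# The call below queues the computation with the backend engine and
# returns immediately; it does not block the Python thread.
b = mx.nd.dot(a, a)
# Reading the result (or calling wait_to_read) blocks until it is ready.
b.wait_to_read()
print(b.shape)
```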

The `indptr` array is what will help identify the rows where the data appears. It stores the offset into `data` of the first non-zero element of each row of the matrix. This array always starts with 0 (reasons can be explored later), so indptr[0] is 0. Each subsequent value in the array is the aggregate number of non-zero elements up to that row. Looking at the first row of the matrix you can see two non-zero values, so indptr[1] is 2. The next row contains all zeros, so the aggregate is still 2, and indptr[2] is 2. Finally, the last row contains one non-zero element, bringing the aggregate to 3, so indptr[3] is 3. Your result:

indptr = [0, 2, 2, 3]

Contributor

Here is a pictorial summary of the above explanation:

```
[[7, 0, 8, 0]
 [0, 0, 0, 0]
 [0, 9, 0, 0]]

       C0 C1 C2 C3
        |  |  |  |
R0 -> [[7, 0, 8, 0]
R1 ->  [0, 0, 0, 0]
R2 ->  [0, 9, 0, 0]]

       R0    (R1,R2) Row-end => indptr
        |     |      |
       D0 D1 D2      D3
        |  |  |      |
data = [7, 8, 9]     Sentinel

indices = [C0, C2, C1]     = [0, 2, 1]
indptr  = [D0, D2, D2, D3] = [0, 2, 2, 3]
```

Contributor

We haven't talked about Sentinel. Can you describe that further?

Member Author

```python
data_list = [7, 8, 9]
indptr_list = [0, 2, 2, 3]
indices_list = [0, 2, 1]
a = mx.nd.sparse.csr_matrix(data_list, indptr_list, indices_list, shape)
```
Contributor

It may help to have this example too, saying that the user does not have to do the calculations by hand to create a CSRNDArray.

```python
darr = np.array([[7., 0., 8., 0.], [0., 0., 0., 0.], [0., 9., 0., 0.]])
dmx = mx.nd.array(darr)
csr = dmx.tostype('csr')
print({'csr': csr, 'csr.data': csr.data,
       'csr.indices': csr.indices, 'csr.indptr': csr.indptr})
```

Contributor

I like this approach. In fact, we should address this towards the beginning by showing how it describes the data and even extend it to show how these values can be used to regenerate the full dataset.
Also, I think it would help to build up the scenarios a bit more. Using only 0's and 1's to show that duplicated data is fine. (There was some content before that mentioned you couldn't duplicate data, but it was changed to refer to the indices.) Without an explicit example the reader may assume incorrectly that data should be unique. Then, as we move to this example, one or two duplicated numbers would be good as well as a larger dataset. We should have an example that demonstrates the compression factor since the small ones we currently have don't show any savings of space.
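As a cross-check of the hand-derived components, one could verify them against SciPy (SciPy here is just an independent reference, not part of the tutorial):

```python
import numpy as np
from scipy import sparse

# The same matrix as above; SciPy derives identical CSR components.
dense = np.array([[7, 0, 8, 0],
                  [0, 0, 0, 0],
                  [0, 9, 0, 0]])
m = sparse.csr_matrix(dense)
print(m.data)      # [7 8 9]
print(m.indices)   # [0 2 1]
print(m.indptr)    # [0 2 2 3]
```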

shape = (3, 4)
data_list = [7, 8, 9]
indptr_list = [0, 2, 2, 3]
indices_list = [0, 2, 1]
Contributor

Picky: I would suggest keeping the order here the same as the order when these terms were explained, that is, indices_list followed by indptr_list.


## Variables

Variables are placeholders for arrays. We can use them to hold sparse arrays, too.
Contributor

Remove the comma between arrays and too.



```python
var_exec.arg_dict['b'][:] = mx.nd.ones(shape).tostype('csr')
```
Contributor

FYI: NameError: name 'var_exec' is not defined
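A possible definition of `var_exec` so the snippet runs (a sketch; the tutorial presumably binds it from the variables introduced earlier):

```python
import mxnet as mx

shape = (2, 2)
b = mx.sym.var('b', stype='csr')
# Bind an executor so arg_dict['b'] exists and can be assigned to.
var_exec = b.simple_bind(mx.cpu(), b=shape)
var_exec.arg_dict['b'][:] = mx.nd.ones(shape).tostype('csr')
var_exec.forward()
print(var_exec.outputs[0].asnumpy())
```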

### Storage Type Inference

What will be the output storage types of sparse symbols? In MXNet, for any sparse symbol, the result storage types are inferred based on the storage types of the inputs.
You can read the [Sparse Symbol API](mxnet.io/api/python/symbol/sparse.html) documentation to find out what the output storage types are. In the example below we will try out the storage types introduced in the Row Sparse and Compressed Sparse Row tutorials: `default` (dense), `csr`, and `row_sparse`.
Contributor

Add http:// to the URL.
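A brief sketch of the inference rule (a hypothetical example; the exact output stypes are operator-specific, as the linked API docs describe):

```python
import mxnet as mx

a = mx.sym.var('a', stype='csr')
b = mx.sym.var('b', stype='csr')
c = mx.sym.elemwise_add(a, b)
exe = c.simple_bind(mx.cpu(), a=(2, 2), b=(2, 2))
# With two csr inputs, elemwise_add infers a csr output.
print(exe.outputs[0].stype)   # 'csr'
```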

and [mx.io.NDArrayIter](https://mxnet.incubator.apache.org/versions/master/api/python/io.html#mxnet.io.NDArrayIter)
support loading sparse data in CSR format. In this example, we'll use the `NDArrayIter`.

You may see some warnings from SciPy. You don't need to worry about those for this example.
Contributor

Hmm, I don't find use of SciPy below. What is the reason to see possible warnings from SciPy?

Member Author

mx.test_utils.rand_ndarray may call scipy.random if scipy is available. The random function in scipy converts coo to csr. In some versions of scipy, the random function uses a deprecated argument when performing the conversion. This is not something we can control.

Contributor

This is the output you get from running the block below.

```
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/coo.py:200: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  if np.rank(self.data) != 1 or np.rank(self.row) != 1 or np.rank(self.col) != 1:
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/compressed.py:130: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  if np.rank(self.data) != 1 or np.rank(self.indices) != 1 or np.rank(self.indptr) != 1:
```
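A minimal sketch of iterating CSR data with `NDArrayIter` (hypothetical data; note that, to my knowledge, CSR inputs require `last_batch_handle='discard'`):

```python
import mxnet as mx

data = mx.nd.ones((8, 4)).tostype('csr')
label = mx.nd.zeros((8,))
train_iter = mx.io.NDArrayIter(data, label, batch_size=4,
                               last_batch_handle='discard')
for batch in train_iter:
    print(batch.data[0].stype)   # 'csr' for each batch
```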

### Training the model with multiple machines

To train a sparse model with multiple machines, please refer to the example in [mxnet/example/sparse/](https://github.com/apache/incubator-mxnet/tree/master/example/sparse)

Contributor

All 3 tutorials are well-written!


data = [7, 8, 9]

The `indices` array stores the column index for each non-zero element in `data`. As you cycle through the data array, starting with 7, you can see it is in column 0. Then looking at 8, you can see it is in column 2. Lastly, 9 is in column 1. Your result:
Member

Does it make sense to mention here that column indices for a specific row will be looked at in sorted order?

Member Author

Yes, that's mentioned a few lines below.

```python
{'b': b, 'c': c}
```

Note that accessing individual elements in a CSRNDArray is currently not supported.
Member

Should we also mention that slicing across axis 1 is not supported?


To create a RowSparseNDArray on gpu, we need to explicitly specify the context:

**Note** If a GPU is not available, an error will be reported in the following section. In order to execute it a cpu, set gpu_device to mx.cpu().
Member

In order to execute it a cpu -> In order to execute it on a cpu
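A small sketch of the CPU fallback the note describes (`gpu_device` follows the tutorial's naming; `mx.nd.sparse.zeros` is assumed from the sparse API):

```python
import mxnet as mx
from mxnet.base import MXNetError

gpu_device = mx.gpu()   # set this to mx.cpu() if no GPU is available
try:
    a = mx.nd.sparse.zeros('row_sparse', (2, 2), ctx=gpu_device)
    a.wait_to_read()    # force execution so a missing GPU surfaces here
    print(a.context)
except MXNetError:
    print("No GPU found; set gpu_device = mx.cpu() instead.")
```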

@piiswrong piiswrong merged commit 7b7f9ae into apache:master Oct 4, 2017
mbaijal pushed a commit to mbaijal/incubator-mxnet that referenced this pull request Oct 12, 2017
* add three sparse tutorials

* update csr and rsp

* update train.md

* fix wrong var name

* first round of edits

* first edits and removed outputs

* first round of edits

* first round of edits

* updates based on CR comments

* update tutorial link. It wont work until the PR is merged

* update based on cr comments
@eric-haibin-lin eric-haibin-lin deleted the three-tutorials branch October 18, 2017 17:30
crazy-cat pushed a commit to crazy-cat/incubator-mxnet that referenced this pull request Oct 26, 2017