Thank you for adding this tutorial!
docs/tutorials/sparse/csr.md
Outdated
(i.e. most of the elements are zeros).

Storing and manipulating such large sparse matrices in the default dense structure results
in wated memory and processing on the zeros.
wated -> wasted
indices_list = [0, 2, 1]
a = mx.nd.sparse.csr_matrix(data_list, indptr_list, indices_list, shape)
# create a CSRNDArray with numpy arrays
data_np = np.array([7, 8, 9])
We can just use the above lists, data_list, indptr_list, indices_list
docs/tutorials/sparse/csr.md
Outdated
- memory consumption is reduced significantly
- certain operations (e.g. matrix-vector multiplication) are much faster

Meanwhile, ``CSRNDArray`` inherits competitve features from ``NDArray`` such as
Typo: competitve -> competitive
docs/tutorials/sparse/csr.md
Outdated
- certain operations (e.g. matrix-vector multiplication) are much faster

Meanwhile, ``CSRNDArray`` inherits competitve features from ``NDArray`` such as
lazy evaluation and automatic parallelization, which is not available in the
is -> are
docs/tutorials/sparse/csr.md
Outdated
@@ -166,7 +175,7 @@ a.copyto(d)
{'b is a': b is a, 'b.asnumpy()':b.asnumpy(), 'c.asnumpy()':c.asnumpy(), 'd.asnumpy()':d.asnumpy()}
```

If the storage types of source array and destination array doesn't match,
* If the storage types of source array and destination array doesn't match,
Type...doesn't match
Or
Types...don't match
docs/tutorials/sparse/csr.md
Outdated
Many real world datasets deal with high dimensional sparse feature vectors. For instance,
in a recommendation system, the number of categories and users is in the order of millions,
while most users only made a few purchases, leading to feature vectors with high sparsity
Make all sentences have a common tense -- which is the present tense here.
Suggestion: while most users typically make a few purchases only, which leads to ...
docs/tutorials/sparse/csr.md
Outdated
Storing and manipulating such large sparse matrices in the default dense structure results
in wasted memory and processing on the zeros.
To take advantage of the sparse structure of the matrix, the ``CSRNDArray`` in MXNet
stores the matrix in [compressed sparse row(CSR)](https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_.28CSR.2C_CRS_or_Yale_format.29) format
Suggestion: Add a space before the opening parenthesis throughout the document. Please check other occurrences in the doc and fix them as well.
FYI: https://english.stackexchange.com/questions/5987/is-there-any-rule-for-the-placement-of-space-after-and-before-parenthesis
docs/tutorials/sparse/csr.md
Outdated
the existing ``NDArray`` is that

- memory consumption is reduced significantly
- certain operations (e.g. matrix-vector multiplication) are much faster
Add a period after faster.
docs/tutorials/sparse/csr.md
Outdated
[0, 2, 1] # indices
[0, 2, 2, 3] # indptr
```
Suggestion: I think the suggested text below may help newbies understand the various numbers better. Try it with a newbie if you like (correct the text spacing appropriately).
[7, 8, 9] # data: the non-zero elements of the dense matrix, flattened in row-major order.
[0, 2, 1] # indices: the column index of each element in data[].
[0, 2, 2, 3] # indptr: offsets into data[] that mark where each row of the dense matrix starts.
# i.e. Row 0 starts at offset 0, pointing to element 7, in data[].
# i.e. Row 1 starts at offset 2 and also ends at offset 2, so it holds no elements (Row 1 is all zeros).
# i.e. Row 2 starts at offset 2, pointing to element 9, in data[].
# i.e. the last element in indptr always equals the size of data[] (one past its last index), signifying the end of data[].
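A short NumPy-only sketch may also help newcomers check their reading of the three arrays. It rebuilds the dense matrix from `data`, `indices`, and `indptr`; the helper name `csr_to_dense` is made up for illustration and is not part of MXNet.

```python
import numpy as np

def csr_to_dense(data, indices, indptr, shape):
    """Expand CSR arrays into a dense 2-D NumPy array (illustrative helper)."""
    dense = np.zeros(shape, dtype=np.asarray(data).dtype)
    for row in range(shape[0]):
        # The non-zeros of this row live in data[start:end];
        # start == end means the row is all zeros.
        start, end = indptr[row], indptr[row + 1]
        for k in range(start, end):
            dense[row, indices[k]] = data[k]
    return dense

data = [7, 8, 9]
indices = [0, 2, 1]
indptr = [0, 2, 2, 3]
dense = csr_to_dense(data, indices, indptr, (3, 4))
print(dense)
```

Row `i` is always the slice `data[indptr[i]:indptr[i + 1]]`, which is why `indptr` has one more entry than the number of rows.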
# create a CSRNDArray from a scipy csr object
d = mx.nd.sparse.array(c)
{'d':d}
except ImportError:
Somehow in the rendered text, there is a newline between try and except and that causes invalid syntax when I cut-paste the text. Please check.
Fixed now
```python
b = a * 2 # b will be a CSRNDArray since zero multiplied by 2 is still zero
c = a + 1 # c will be a dense NDArray
{'b.stype':b.stype, 'c.stype':c.stype}
You say: b will be a CSRNDArray, but I see it as NDArray only. Am I interpreting things correctly?
b = a * 2 # b will be a CSRNDArray since zero multiplied by 2 is still zero
c = a + 1 # c will be a dense NDArray
{'b.stype':b.stype, 'c.stype':c.stype}
{'c.stype': 'default', 'b.stype': 'default'}
a.stype
'csr'
b.stype
'default' <======= NOT a CSRNDArray.
c.stype
'default'
b
[[ 14. 0. 16. 0.]
[ 0. 0. 0. 0.]
[ 0. 18. 0. 0.]]
<NDArray 3x4 @cpu(0)> <======= NOT a CSRNDArray.
a
[[ 7. 0. 8. 0.]
[ 0. 0. 0. 0.]
[ 0. 9. 0. 0.]]
<CSRNDArray 3x4 @cpu(0)>
Currently it results in a dense NDArray because Chris's PR is not merged yet, as mentioned in the summary.
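To illustrate the rule the tutorial comment relies on (an operation can keep CSR storage only if it maps zero to zero), here is a NumPy-only sketch. It is not MXNet's storage type inference; the helper `elemwise_on_csr` is hypothetical, and only the `'csr'` and `'default'` labels are borrowed from the `stype` values shown above.

```python
import numpy as np

def elemwise_on_csr(fn, data, indices, indptr, shape):
    """Apply fn elementwise; keep CSR storage only if fn maps zero to zero."""
    data = np.asarray(data, dtype=np.float64)
    if fn(np.float64(0.0)) == 0.0:
        # Implicit zeros stay zero, so only the stored values change.
        return 'csr', (fn(data), indices, indptr)
    # Otherwise every implicit zero becomes non-zero: build a dense result.
    dense = np.zeros(shape, dtype=np.float64)
    for row in range(shape[0]):
        for k in range(indptr[row], indptr[row + 1]):
            dense[row, indices[k]] = data[k]
    return 'default', fn(dense)

csr = ([7, 8, 9], [0, 2, 1], [0, 2, 2, 3], (3, 4))
stype_b, b = elemwise_on_csr(lambda x: x * 2, *csr)  # zero * 2 is still zero
stype_c, c = elemwise_on_csr(lambda x: x + 1, *csr)  # zero + 1 is not zero
print(stype_b, stype_c)
```

With this rule `a * 2` stays sparse while `a + 1` has to become dense, which is the behavior the code comments in the tutorial describe.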
docs/tutorials/sparse/csr.md
Outdated
|
||
* For operators that don't specialize in sparse arrays, we can still use them with sparse inputs with some performance penalty.
What happens is that MXNet will generate temporary dense inputs from sparse inputs so that the dense operators can be used.
Warning messages will be printed when such storage fallback event happens.
Suggestion: when such a storage fallback event happens. (add the article: "a")
Where are the warnings printed? I did not see them when I tried in a terminal window on macOS.
d = mx.nd.log(a) # warnings will be printed
a
[[ 7. 0. 8. 0.]
[ 0. 0. 0. 0.]
[ 0. 9. 0. 0.]]
<CSRNDArray 3x4 @cpu(0)>
d
[[ 1.9459101 -inf 2.07944155 -inf]
[ -inf -inf -inf -inf]
[ -inf 2.19722462 -inf -inf]]
<NDArray 3x4 @cpu(0)>
As mentioned in the summary, the behavior described in the Sparse Operators and Storage Type Inference section requires #7577 and the storage inference refactoring, so the warning message is not in the current master branch yet.
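Until that lands, the intended fallback can be pictured with a NumPy-only sketch: convert the sparse input to a temporary dense array, emit a warning, then run the dense operator. The function name `dense_log_with_fallback` and the warning text are hypothetical and are not what MXNet prints.

```python
import warnings
import numpy as np

def dense_log_with_fallback(data, indices, indptr, shape):
    """Hypothetical storage fallback: densify the CSR input, warn, run dense log."""
    warnings.warn("storage fallback: converting csr input to dense for log")
    dense = np.zeros(shape, dtype=np.float64)
    for row in range(shape[0]):
        for k in range(indptr[row], indptr[row + 1]):
            dense[row, indices[k]] = data[k]
    # log(0) is -inf; silence NumPy's own divide-by-zero warning for it.
    with np.errstate(divide='ignore'):
        return np.log(dense)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d = dense_log_with_fallback([7, 8, 9], [0, 2, 1], [0, 2, 2, 3], (3, 4))

print(len(caught), caught[0].message)
```

The result is dense because `log(0)` is `-inf`, so the zeros no longer compress away; that matches the `-inf` entries in the output pasted above.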
docs/tutorials/sparse/csr.md
Outdated
```

* For operators that don't specialize in sparse arrays, we can still use them with sparse inputs with some performance penalty.
What happens is that MXNet will generate temporary dense inputs from sparse inputs so that the dense operators can be used.
Did you mean: temporary dense outputs?
I meant temporary dense inputs, because dense operators don't handle sparse inputs. I should mention the storage type for outputs, too. I'll update the section.
docs/tutorials/sparse/csr.md
Outdated
### GPU Support

By default, CSRNDArray operators are executed on CPU. In MXNet, GPU support for CSRNDArray is experimental
with few sparse operators such as cast_storage and dot.
Add: with only a few sparse operators such as...
I used `few` instead of `a few` because we only have 2 operators supported for GPU. I can change it if `only a few` is more accurate.
gpu_device=mx.gpu() # Change this to mx.cpu() in absence of GPUs.

a = mx.nd.sparse.zeros('csr', (100, 100), ctx=gpu_device)
a
If I run this code on macOS with no GPU, the python session seg-faults. I know that the context is set incorrectly to GPU when GPU is not present, but should the python session seg-fault? Shouldn't the python session give an error/exception that can be caught by the user and handled appropriately?
Did it segfault and exit python? For me there was error msg "GPU support is disabled..":
>>> mx.nd.sparse.zeros('csr', (100, 100), ctx=mx.gpu())
[20:01:08] src/c_api/c_api_ndarray.cc:148: GPU support is disabled. Compile MXNet with USE_CUDA=1 to enable GPU support.
[20:01:08] /Users/haibilin/mxnet/dmlc-core/include/dmlc/logging.h:308: [20:01:08] src/c_api/c_api_ndarray.cc:546: Operator _zeros is not implemented for GPU.
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x0000000107a73358 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1 libmxnet.so 0x000000010822f447 _Z20ImperativeInvokeImplRKN5mxnet7ContextEON4nnvm9NodeAttrsEPNSt3__16vectorINS_7NDArrayENS6_9allocatorIS8_EEEESC_PNS7_IbNS9_IbEEEESF_ + 2039
[bt] (2) 2 libmxnet.so 0x00000001082304f7 MXImperativeInvoke + 439
[bt] (3) 3 libmxnet.so 0x0000000108230ace MXImperativeInvokeEx + 46
[bt] (4) 4 _ctypes.so 0x0000000106fd67d7 ffi_call_unix64 + 79
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/mxnet-0.11.1-py2.7.egg/mxnet/ndarray/sparse.py", line 123, in __repr__
shape_info, self.context)
File "/usr/local/lib/python2.7/site-packages/mxnet-0.11.1-py2.7.egg/mxnet/ndarray/ndarray.py", line 1147, in context
return Context(Context.devtype2str[dev_typeid.value], dev_id.value)
KeyError: 0
>>>
will be fixed in #7676
Updated the tutorial with runnable example for data iterators.
Moved to #7921
Note: The Sparse Operators and Storage Type Inference section requires "Sparse operators for unary and binary elemwise NDArray operators" (#7577) and the storage inference refactoring. This should not be merged before #7577.