ARROW-7622: [Format] Mark Tensor and SparseTensor fields required #6234
pitrou wants to merge 2 commits into apache:master
Force-pushed from 148eb39 to 81a8cd8.
Force-pushed from 81a8cd8 to 925d44f.
cpp/src/arrow/ipc/reader.cc (outdated)
@mrkn Can you check if those changes look ok to you?
Force-pushed from 925d44f to 0598e33.
@mrkn Is there a reason why Tensor strides default to row-major but SparseIndexCOO strides default to column-major?
Actually, the default strides of SparseIndexCOO are column-major, but that was not decided intentionally; it is just historical. I think now is the best time to change the default strides to row-major.
(Deleted a message where I had made a mistake. Row-major is actually the preferred order, since it makes coordinates sequential in memory.)
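To make the two stride conventions concrete, here is a minimal sketch in plain Python. It assumes the COO index matrix has one row per non-zero value and one column per dimension, with 8-byte integer indices; `default_strides` is a hypothetical helper written for illustration, not an Arrow API.

```python
def default_strides(shape, itemsize, order="row-major"):
    """Return byte strides for a dense array of the given shape.

    Hypothetical helper for illustration only (not part of Arrow).
    """
    strides = []
    step = itemsize
    # Row-major: the last dimension varies fastest.
    # Column-major: the first dimension varies fastest.
    dims = reversed(shape) if order == "row-major" else shape
    for dim in dims:
        strides.append(step)
        step *= dim
    if order == "row-major":
        strides.reverse()
    return tuple(strides)


# Assumed index matrix: 5 non-zero values in a 2-dimensional tensor, int64 indices.
index_shape = (5, 2)
row_major = default_strides(index_shape, 8, "row-major")
col_major = default_strides(index_shape, 8, "column-major")
print(row_major)  # (16, 8)
print(col_major)  # (8, 40)
```

With row-major strides, the two indices of one non-zero value sit 8 bytes apart, so reading a full coordinate touches adjacent memory. With column-major strides they sit 40 bytes apart, and the gap grows with the number of non-zero values.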
I'll note that pydata/sparse uses row-major order for coordinates:

>>> import numpy as np
>>> import sparse
>>> x = np.eye(4)
>>> x[2, 3] = 5
>>> s = sparse.COO.from_numpy(x)
>>> s.coords
array([[0, 1, 2, 2, 3],
       [0, 1, 2, 3, 3]])
>>> s.coords.strides
(40, 8)

As for scipy.sparse, it only supports 2-dimensional sparse tensors and stores row and column coordinates in separate arrays (which has the same memory access properties as column-major).
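The remark about separate row and column arrays can be illustrated with a small sketch using plain Python lists. It assumes an index matrix with one row per non-zero value; the variable names are illustrative only and do not correspond to any scipy or Arrow API.

```python
# Coordinates of the five non-zero values from the example above,
# written as one (row, column) pair per value.
coords = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 3)]

# Row-major buffer: each coordinate pair is stored contiguously.
row_major_buffer = [index for coord in coords for index in coord]

# Separate per-dimension arrays: all row indices, then all column indices.
rows = [r for r, _ in coords]
cols = [c for _, c in coords]

# Concatenating the per-dimension arrays gives the column-major layout
# of the same index matrix.
column_major_buffer = rows + cols

print(row_major_buffer)     # [0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
print(column_major_buffer)  # [0, 1, 2, 2, 3, 0, 1, 2, 3, 3]
```

In the row-major buffer, the indices of one non-zero value are neighbours. In the column-major buffer, they are separated by the number of non-zero values, which is the access pattern that two separate arrays also produce.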
Force-pushed from 0598e33 to b318bcf.
Ok, I've switched to row-major by default for COO indices.
Also make sparse COO indices row-major by default, both for consistency with tensors and because it packs related coordinates sequentially.