Skip to content

ARROW-7622: [Format] Mark Tensor and SparseTensor fields required#6234

Closed
pitrou wants to merge 2 commits intoapache:masterfrom
pitrou:ARROW-7622-tensor-required-fbs
Closed

ARROW-7622: [Format] Mark Tensor and SparseTensor fields required#6234
pitrou wants to merge 2 commits intoapache:masterfrom
pitrou:ARROW-7622-tensor-required-fbs

Conversation

@pitrou
Copy link
Copy Markdown
Member

@pitrou pitrou commented Jan 20, 2020

Also make sparse COO indices row-major by default, both by consistency with tensors and because it packs related coordinates sequentially.

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@pitrou pitrou force-pushed the ARROW-7622-tensor-required-fbs branch from 148eb39 to 81a8cd8 Compare January 21, 2020 09:46
@pitrou pitrou changed the title [EXP] ARROW-7622: [Format] Mark Tensor and SparseTensor fields required ARROW-7622: [Format] Mark Tensor and SparseTensor fields required Jan 21, 2020
@github-actions
Copy link
Copy Markdown

Copy link
Copy Markdown
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment but otherwise looks good. cc @mrkn for thoughts also

@pitrou pitrou force-pushed the ARROW-7622-tensor-required-fbs branch from 81a8cd8 to 925d44f Compare January 22, 2020 16:37
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrkn Can you check if those changes look ok to you?

@pitrou pitrou force-pushed the ARROW-7622-tensor-required-fbs branch from 925d44f to 0598e33 Compare January 22, 2020 18:35
@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Jan 22, 2020

@mrkn Is there a reason why Tensor strides default to row-major but SparseIndexCOO strides default to column-major?

@mrkn
Copy link
Copy Markdown
Member

mrkn commented Jan 23, 2020

Actually, the default strides of SparseIndexCOO is column-major, but it isn't intentionally decided, just historically.
At first, SparseIndexCOO only supported column-major strides because scipy.sparse.coo_matrix always has a column-major index. When I added the support of row-major, I didn't change the default strides. That's all.

I think now is the best timing to change the default strides to row-major.

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Jan 23, 2020

(delete a message where I was making a mistake. Row-major is actually the preferred order which makes coordinates sequential in memory)

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Jan 23, 2020

I'll note that pydata/sparse uses row-major order for coordinates:

>>> x = np.eye(4)                                                                                                                                                     
>>> x[2, 3] = 5                                                                                                                                                       
>>> s = sparse.COO.from_numpy(x)                                                                                                                                      
>>> s.coords                                                                                                                                                          
array([[0, 1, 2, 2, 3],
       [0, 1, 2, 3, 3]])
>>> s.coords.strides                                                                                                                                                  
(40, 8)

As for scipy.sparse, it only supports 2-dim sparse tensors and stores row and column coordinates in separate arrays... (which has the same memory access properties as column-major)

@pitrou pitrou force-pushed the ARROW-7622-tensor-required-fbs branch from 0598e33 to b318bcf Compare January 23, 2020 15:22
@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Jan 23, 2020

Ok, I've switched to row-major by default for COO indices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants