Skip to content

increased performance of k-diagonal extraction in da.diag() and da.diagonal()#8689

Merged
jcrist merged 9 commits intodask:mainfrom
ParticularMiner:extract_kdiag
Feb 23, 2022
Merged

increased performance of k-diagonal extraction in da.diag() and da.diagonal()#8689
jcrist merged 9 commits intodask:mainfrom
ParticularMiner:extract_kdiag

Conversation

@ParticularMiner
Copy link
Copy Markdown
Contributor

@ParticularMiner ParticularMiner commented Feb 8, 2022

Perhaps you might find this useful. It mirrors the dask-grblas implementation, which supports rectangular chunks.

It follows the straight path of the k-diagonal through the rectangular chunks of the input matrix , constructing the dask graph along the way. So chunks untouched by the diagonal end up not being part of the final dask graph, reducing the algorithmic complexity of diagonal(A, offset, axis1, axis2) from

O(M Naxis1 Naxis2)       to        O(M max(1, Nk)) ,

where Naxis1 (Naxis2) is the number of chunks along axis1 (axis2) of array A; M is the total number of chunks of A after axis1 and axis2 have been removed; while Nk is the number of chunks touched by the k-diagonal after all axes, except axis1 and axis2, have been removed.

Relevant test-units have been modified to reflect this change.

Critique is welcome.

NB: It might be worth comparing this with #5683 (which I only discovered after pushing this commit).

@GPUtester
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added the array label Feb 8, 2022
@quasiben
Copy link
Copy Markdown
Member

quasiben commented Feb 8, 2022

add to allowlist

@ParticularMiner
Copy link
Copy Markdown
Contributor Author

FYI, I have extended this PR with @TAdeJong's simple padding solution for diag(v, k) when v is 1d and k != 0.

@TAdeJong
Copy link
Copy Markdown
Contributor

TAdeJong commented Feb 9, 2022

At the time, #5683 failed on the issue described in #5661 with test_corrcoef() in dask\array\tests\test_routines.py (which was semi-unrelated). As of master today, #5661 still seems valid, so it might be interesting in comparing to see if this does not trigger #5661 and if so: why not.

@TAdeJong
Copy link
Copy Markdown
Contributor

TAdeJong commented Feb 9, 2022

For completeness:
I am the original author of #5683 and happy to let this PR supersede it 👍
As per discussion there, #5661 is essentially independent of this PR. (Which makes this all the better PR to merge of the 2).

@TAdeJong
Copy link
Copy Markdown
Contributor

Other than thinking about not duplicating code from da.diagonal as mentioned above, this PR seems in all ways better than #5683 to me.
I think (admittedly from the sidelines, it has been some time since I was this deep in the dask code base) that if @ParticularMiner already had a good reason to not call da.diagonal this can be merged, #5683 can be closed/rejected and #8703 can be handled separately, otherwise it might be worth it to incorporate #8703 here, call da.diagonal and review and merge this as one PR.

@ParticularMiner
Copy link
Copy Markdown
Contributor Author

@TAdeJong

Good question. You're right about code-duplication issues in that code-section. I would do so but ...

I feel it would be less performant to call diagonal()'s code from diag(). The reason being that diagonal() converts numpy arrays to dask arrays before constructing the dask graph which later extracts the k-diagonal, while diag() takes a more direct route to the graph without such any such conversion.

On the other hand, performance here is up for discussion as very large numpy arrays could benefit from being converted to dask arrays prior to graph construction. As a rule, however, I try to avoid creating very large numpy arrays in the first place.

I don't feel strongly though about performance in this particular case, so if you have other arguments in favor of avoiding code duplication, feel free to share!

@ParticularMiner
Copy link
Copy Markdown
Contributor Author

Btw, many thanks for your constructive critique and support of this PR @TAdeJong .

@ParticularMiner
Copy link
Copy Markdown
Contributor Author

@TAdeJong , @pavithraes , @ian-r-rose

RE: code duplication

Upon further reflection, I agree with @TAdeJong's suggestion to reduce code-duplication.

To do this, I generalized this k-diagonal extraction algorithm to higher dimensions and transferred the code into diagonal(), so that diag() now calls diagonal() where applicable. Now both diag() and diagonal() are independent of #5661.

See the opening comment above of this PR for a note on algorithmic complexity/performance.

@ParticularMiner ParticularMiner changed the title added to dask.array.diag() support for extracting k-diagonals from a 2d-array reduced algorithmic complexity of k-diagonal extraction in dask.array.diag() and dask.array.diagonal() Feb 13, 2022
@ParticularMiner ParticularMiner changed the title reduced algorithmic complexity of k-diagonal extraction in dask.array.diag() and dask.array.diagonal() increased performance of k-diagonal extraction in da.diag() and da.diagonal() Feb 13, 2022
Copy link
Copy Markdown
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ParticularMiner , overall this looks good to me. I left one test comment, but the rest looks good. While you're looking at this code, can I ask that you look through the tests for da.diagonal as well to see if they offer full coverage for the different parameters we'd care about here (dimensions of input array, chunking variation, axis arguments, etc...)? It'd be good to ensure we have full coverage here while you're still thinking about this code.

@ParticularMiner ParticularMiner force-pushed the extract_kdiag branch 2 times, most recently from b3b1076 to a5b2b37 Compare February 16, 2022 14:57
@pavithraes
Copy link
Copy Markdown
Member

Thanks for the update @ParticularMiner!

@jcrist Did the updates address your comments?

Copy link
Copy Markdown
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@jcrist jcrist merged commit e3b3259 into dask:main Feb 23, 2022
@jcrist jcrist mentioned this pull request Feb 23, 2022
2 tasks
@jakirkham
Copy link
Copy Markdown
Member

Thanks all! 😄

phobson pushed a commit to phobson/dask that referenced this pull request Feb 28, 2022
…agonal() (dask#8689)

* added support for extracting k-diagonals from a 2d-array

* included heterogeneous chunks in test_diag()

* fixed linting errors in test_diag()

* improved efficiency of diagonal extractor a bit

* stole @TAdeJong's simple padding solution for diag(v, k) when v is 1d

* reduced complexity of `diagonal()` from O(N**2) to O(N)

diag() now calls diagonal()

* fixed linting errors in diagonal()

* reorganized tests and ensured coverage of diag() & diagonal()

as per @jcrist's advice

* catered for cupy type input arrays to diagonal()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dask Array's diag supporting diagonal selection

7 participants