increased performance of k-diagonal extraction in da.diag() and da.diagonal() by ParticularMiner · Pull Request #8689 · dask/dask

ParticularMiner · 2022-02-08T16:23:46Z

Closes #2726 (Dask Array's diag supporting diagonal selection) and maybe #5683 ([WIP] Add k arg to diag)
Tests added / passed
Passes pre-commit run --all-files

Perhaps you might find this useful. It mirrors the dask-grblas implementation, which supports rectangular chunks.

It follows the straight path of the k-diagonal through the rectangular chunks of the input matrix, constructing the dask graph along the way. Chunks untouched by the diagonal therefore never enter the final dask graph, which reduces the algorithmic complexity of diagonal(A, offset, axis1, axis2) from

O(M N_axis1 N_axis2) to O(M max(1, N_k)),

where N_axis1 (N_axis2) is the number of chunks along axis1 (axis2) of array A, M is the total number of chunks of A after axis1 and axis2 have been removed, and N_k is the number of chunks touched by the k-diagonal after all axes except axis1 and axis2 have been removed.
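
The walk described above can be sketched in plain Python. This is an illustration only, not the code from this PR: the function name `kdiag_chunks` and its return format are hypothetical, and it covers just the two-dimensional case, taking the chunk sizes along each axis as tuples (in the style of `A.chunks`).

```python
from itertools import accumulate


def kdiag_chunks(row_chunks, col_chunks, k=0):
    """Yield (i, j, r, c, n) for every chunk the k-diagonal passes through.

    (i, j) is the chunk's position in the chunk grid, (r, c) is the local
    position inside that chunk where the diagonal enters it, and n is the
    number of diagonal elements that lie inside the chunk.
    """
    row_ends = list(accumulate(row_chunks))
    col_ends = list(accumulate(col_chunks))
    n_rows = row_ends[-1] if row_ends else 0
    n_cols = col_ends[-1] if col_ends else 0

    # Global coordinates of the first element of the k-diagonal.
    row, col = (0, k) if k >= 0 else (-k, 0)
    i = j = 0
    while row < n_rows and col < n_cols:
        # Advance to the chunk containing (row, col); this also skips
        # zero-size chunks.
        while row_ends[i] <= row:
            i += 1
        while col_ends[j] <= col:
            j += 1
        # The diagonal stays in this chunk until it reaches the chunk's
        # bottom edge or right edge, whichever comes first.
        n = min(row_ends[i] - row, col_ends[j] - col)
        row_start = row_ends[i] - row_chunks[i]
        col_start = col_ends[j] - col_chunks[j]
        yield (i, j, row - row_start, col - col_start, n)
        row += n
        col += n
```

For a 4x4 matrix with row chunks `(2, 2)` and column chunks `(3, 1)`, `list(kdiag_chunks((2, 2), (3, 1)))` visits three of the four chunks, and with `k=1` only two. Each step moves at least one chunk boundary forward, so the number of visited chunks grows with the sum of the chunk counts along the two axes rather than with their product.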

The relevant unit tests have been modified to reflect this change.

Critique is welcome.

NB: It might be worth comparing this with #5683 (which I only discovered after pushing this commit).

GPUtester · 2022-02-08T16:23:47Z

Can one of the admins verify this patch?

quasiben · 2022-02-08T16:33:06Z

add to allowlist

ParticularMiner · 2022-02-09T15:35:44Z

FYI, I have extended this PR with @TAdeJong's simple padding solution for diag(v, k) when v is 1d and k != 0.
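
For readers unfamiliar with the padding approach: a matrix with `v` on its k-th diagonal is the ordinary `diag(v)` padded with |k| rows and |k| columns of zeros. Below is a minimal sketch on nested Python lists rather than dask arrays; the helper names `diag0` and `diag_k` are made up for this illustration and are not part of dask.

```python
def diag0(v):
    """Square matrix (list of lists) with v on the main diagonal."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def diag_k(v, k=0):
    """Matrix with v on the k-th diagonal, built by zero-padding diag0(v)."""
    m = abs(k)
    size = len(v) + m
    zero_rows = [[0] * size for _ in range(m)]
    if k >= 0:
        # Pad k columns on the left and k rows at the bottom.
        return [[0] * m + row for row in diag0(v)] + zero_rows
    # Pad |k| rows at the top and |k| columns on the right.
    return zero_rows + [row + [0] * m for row in diag0(v)]
```

In array terms this amounts to a constant-mode pad of `diag(v)` with pad widths `[[0, k], [k, 0]]` for k > 0 and `[[-k, 0], [0, -k]]` for k < 0 (rows first, then columns).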

TAdeJong · 2022-02-09T16:52:40Z

At the time, #5683 failed on the issue described in #5661, triggered by test_corrcoef() in dask/array/tests/test_routines.py (which was semi-unrelated). As of master today, #5661 still seems valid, so it might be interesting to compare the two and see whether this PR avoids triggering #5661 and, if so, why.

TAdeJong · 2022-02-09T19:24:48Z

For completeness:
I am the original author of #5683 and happy to let this PR supersede it 👍
As per the discussion there, #5661 is essentially independent of this PR, which makes this the better of the two PRs to merge.

dask/array/creation.py

TAdeJong · 2022-02-10T11:33:10Z

Other than thinking about not duplicating code from da.diagonal as mentioned above, this PR seems in all ways better than #5683 to me.
I think (admittedly from the sidelines, it has been some time since I was this deep in the dask code base) that if @ParticularMiner already had a good reason not to call da.diagonal, this can be merged, #5683 can be closed/rejected, and #8703 can be handled separately. Otherwise it might be worth incorporating #8703 here, calling da.diagonal, and reviewing and merging this as one PR.

ParticularMiner · 2022-02-10T12:24:27Z

@TAdeJong

Good question. You're right about code-duplication issues in that code-section. I would do so but ...

I feel it would be less performant to call diagonal()'s code from diag(). The reason is that diagonal() converts numpy arrays to dask arrays before constructing the dask graph that later extracts the k-diagonal, while diag() takes a more direct route to the graph without any such conversion.

On the other hand, performance here is up for discussion as very large numpy arrays could benefit from being converted to dask arrays prior to graph construction. As a rule, however, I try to avoid creating very large numpy arrays in the first place.

I don't feel strongly though about performance in this particular case, so if you have other arguments in favor of avoiding code duplication, feel free to share!

ParticularMiner · 2022-02-10T12:28:56Z

Btw, many thanks for your constructive critique and support of this PR @TAdeJong .

ParticularMiner · 2022-02-13T12:56:21Z

@TAdeJong , @pavithraes , @ian-r-rose

RE: code duplication

Upon further reflection, I agree with @TAdeJong's suggestion to reduce code-duplication.

To do this, I generalized this k-diagonal extraction algorithm to higher dimensions and transferred the code into diagonal(), so that diag() now calls diagonal() where applicable. Now both diag() and diagonal() are independent of #5661.
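
As a reminder of what the higher-dimensional case means: NumPy's `diagonal(a, offset, axis1, axis2)` removes `axis1` and `axis2` from the shape and appends one new axis holding the diagonal. A small sketch of that shape rule follows; the helper `diagonal_shape` is hypothetical, exists only for this illustration, and assumes non-negative, distinct axes.

```python
def diagonal_shape(shape, offset=0, axis1=0, axis2=1):
    """Shape of the result of diagonal() for an input of the given shape."""
    n1, n2 = shape[axis1], shape[axis2]
    if offset >= 0:
        # A positive offset shifts the diagonal along axis2.
        length = max(0, min(n1, n2 - offset))
    else:
        # A negative offset shifts the diagonal along axis1.
        length = max(0, min(n1 + offset, n2))
    # Every axis other than axis1 and axis2 is kept, in its original order.
    rest = tuple(s for ax, s in enumerate(shape) if ax not in (axis1, axis2))
    return rest + (length,)
```

For example, `diagonal_shape((2, 3, 4), axis1=0, axis2=2)` keeps the middle axis of size 3 and appends a diagonal of length 2.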

See the opening comment of this PR above for a note on algorithmic complexity/performance.

jcrist

Thanks @ParticularMiner, overall this looks good to me. I left one comment on the tests; everything else looks fine. While you're looking at this code, can I ask that you look through the tests for da.diagonal as well to see if they offer full coverage for the different parameters we'd care about here (dimensions of input array, chunking variation, axis arguments, etc.)? It'd be good to ensure we have full coverage here while you're still thinking about this code.

dask/array/tests/test_creation.py

pavithraes · 2022-02-23T17:17:09Z

Thanks for the update @ParticularMiner!

@jcrist Did the updates address your comments?

jcrist

🚀

jakirkham · 2022-02-23T19:43:22Z

Thanks all! 😄

…agonal() (dask#8689)

* added support for extracting k-diagonals from a 2d-array
* included heterogeneous chunks in test_diag()
* fixed linting errors in test_diag()
* improved efficiency of diagonal extractor a bit
* stole @TAdeJong's simple padding solution for diag(v, k) when v is 1d
* reduced complexity of `diagonal()` from O(N**2) to O(N)
  diag() now calls diagonal()
* fixed linting errors in diagonal()
* reorganized tests and ensured coverage of diag() & diagonal()
  as per @jcrist's advice
* catered for cupy type input arrays to diagonal()

added support for extracting k-diagonals from a 2d-array

9b7188e

github-actions bot added the array label Feb 8, 2022

ParticularMiner added 2 commits February 8, 2022 17:27

included heterogeneous chunks in test_diag()

5a81f96

fixed linting errors in test_diag()

6f4e62c

improved efficiency of diagonal extractor a bit

687611d

stole @TAdeJong's simple padding solution for diag(v, k) when v is 1d

9d46f4a

ParticularMiner force-pushed the extract_kdiag branch from 4e429e3 to 9d46f4a Compare February 9, 2022 15:38

TAdeJong mentioned this pull request Feb 9, 2022

da.diagonal is broken #5661

Closed

pavithraes requested review from ian-r-rose and pavithraes February 9, 2022 17:00

ParticularMiner mentioned this pull request Feb 10, 2022

offered a fix for rechunking with zero-size-chunks #8703

Merged

3 tasks

TAdeJong reviewed Feb 10, 2022

dask/array/creation.py

reduced complexity of diagonal() from O(N**2) to O(N)

5e61451

diag() now calls diagonal()

ParticularMiner force-pushed the extract_kdiag branch from 0b09ffc to 12786c9 Compare February 13, 2022 13:05

fixed linting errors in diagonal()

ab940bb

ParticularMiner force-pushed the extract_kdiag branch from 12786c9 to ab940bb Compare February 13, 2022 13:07

ParticularMiner changed the title ~~added to dask.array.diag() support for extracting k-diagonals from a 2d-array~~ reduced algorithmic complexity of k-diagonal extraction in dask.array.diag() and dask.array.diagonal() Feb 13, 2022

ParticularMiner changed the title ~~reduced algorithmic complexity of k-diagonal extraction in dask.array.diag() and dask.array.diagonal()~~ increased performance of k-diagonal extraction in da.diag() and da.diagonal() Feb 13, 2022

jcrist reviewed Feb 15, 2022

dask/array/tests/test_creation.py

reorganized tests and ensured coverage of diag() & diagonal()

f93e75c

as per @jcrist's advice

ParticularMiner force-pushed the extract_kdiag branch from 09029ef to f93e75c Compare February 16, 2022 14:10

ParticularMiner force-pushed the extract_kdiag branch 2 times, most recently from b3b1076 to a5b2b37 Compare February 16, 2022 14:57

catered for cupy type input arrays to diagonal()

8b66f9f

ParticularMiner force-pushed the extract_kdiag branch from a5b2b37 to 8b66f9f Compare February 16, 2022 15:02

pavithraes removed request for ian-r-rose and pavithraes February 23, 2022 17:15

jcrist approved these changes Feb 23, 2022

jcrist merged commit e3b3259 into dask:main Feb 23, 2022

jcrist mentioned this pull request Feb 23, 2022

[WIP] Add k arg to diag #5683

Closed

2 tasks

7 participants
