Decompose tensordot function to avoid memory #821
Comments
@shoyer @clarkfitzg solutions to this problem may come close to your desire to support more sophisticated
Looking over use cases of Currently So in One alternate proposal is to provide So in
On second thought, I can achieve this now with multiple calls to
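For illustration, a decomposition along these lines can be done by hand with several smaller `tensordot` calls over slices of the contracted axis, summed together. This sketch uses plain NumPy; the array names and shapes are made up for the example:

```python
import numpy as np

# Manual alternative to one big contraction: several smaller tensordot
# calls over slices of the contracted axis, then a sum of the results.
# Shapes here are made up for illustration.
rng = np.random.default_rng(0)
a = rng.random((3, 12))
b = rng.random((12, 5))

# Contract each (slice of a, slice of b) pair independently.
pieces = [np.tensordot(a[:, i:i + 4], b[i:i + 4, :], axes=1)
          for i in (0, 4, 8)]

# Summing the small partial results gives the full contraction.
out = sum(pieces)

assert np.allclose(out, np.tensordot(a, b, axes=1))
```

Each partial contraction only needs its own slices in memory, which is the point of the decomposition.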
Previously the main operation used to compute `da.tensordot` computed the chunkwise tensordots and summed them together in a single in-memory operation. This could result in memory blowup, particularly in short-and-fat `.dot` tall-and-skinny cases. Now we treat each sub-tensordot call as an independent task. This allows the scheduler to clear out intermediate results more intelligently. We expose more of the algorithm to the scheduler. Fixes dask#821
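The change in task-graph shape described in this PR can be illustrated with plain dicts in a dask-like `(function, *args)` convention. This is a hedged sketch, not dask's actual graph for `tensordot`; the key names and chunk shapes are invented:

```python
import numpy as np

# Hypothetical before/after task graphs for a chunked W.dot(v).
# Chunk contents are made up for illustration.
W = [np.full((2, 3), 1.0), np.full((2, 3), 2.0)]
v = [np.full(3, 1.0), np.full(3, 1.0)]

# Before: a single task receives every chunk of W and v at once,
# so all chunks must be in memory simultaneously.
graph_before = {
    'z': (lambda Ws, vs: sum(np.dot(Wi, vi) for Wi, vi in zip(Ws, vs)),
          W, v),
}

# After: one independent task per chunkwise product, plus a separate
# sum task. A scheduler can release each task's inputs as it finishes.
graph_after = {
    ('part', 0): (np.dot, W[0], v[0]),
    ('part', 1): (np.dot, W[1], v[1]),
    'z': (np.add, ('part', 0), ('part', 1)),
}

# Evaluating both graphs by hand gives the same result.
f, Ws, vs = graph_before['z']
z_before = f(Ws, vs)
parts = [graph_after[('part', i)][0](*graph_after[('part', i)][1:])
         for i in (0, 1)]
z_after = graph_after['z'][0](*parts)
assert np.allclose(z_before, z_after)
```

The "after" graph exposes the per-chunk products as separate keys, which is what lets the scheduler free intermediates early.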
@mrocklin My thought for an automatic wrapper of That said, I've found the current way
I propose that The solution here in `tensordot` was to separate things exactly as you suggest. This was done just by fiddling with indices and calling If you have an explicit API and a set of test cases then I can probably just bang out the function that you need.
Thanks @mrocklin, I'm definitely interested in doing more with this, but I first need to invest the time to understand what's currently possible before I can provide more useful feedback. And that's going to have to wait until my schedule clears up, hopefully this spring.
Consider a matrix-vector multiply

```python
z = W.dot(v)
```

for `W` a short and fat matrix and `v` a tall vector. Under common chunking approaches this has a single output chunk. To compute this efficiently in small space we want to compute the three chunks `W1.dot(v1)`, `W2.dot(v2)`, `W3.dot(v3)` first, allow their children to be released, and then sum these intermediate results into our final result.

Unfortunately this is not what the current `tensordot` function does. It provides the two iterators `[W1, W2, W3]`, `[v1, v2, v3]` to a function `many`, which does both the per-block `tensordot` and the resulting sum. Because of this choice, all of `W` and all of `v` will be in memory at once.

So we should change this behavior. The only issue here is that the full scope of `tensordot` is rather large, and it largely depends on `atop` which, while beautiful and a saver of untold amounts of frustration, also isn't something that we can easily tweak.
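The memory-friendly evaluation order described above can be sketched with plain NumPy. The chunk shapes below are made up for illustration; the point is only the order of operations:

```python
import numpy as np

# Sketch of the desired evaluation order: each Wi.dot(vi) is computed
# independently, so its inputs can be released before the final sum.
# Chunk shapes are made up for illustration.
rng = np.random.default_rng(0)
W_chunks = [rng.random((2, 4)) for _ in range(3)]  # W1, W2, W3
v_chunks = [rng.random(4) for _ in range(3)]       # v1, v2, v3

# Compute the three small partial products first ...
partials = [Wi.dot(vi) for Wi, vi in zip(W_chunks, v_chunks)]

# ... then sum the intermediate results into the final answer.
z = sum(partials)

# Same result as the single short-and-fat dot tall-and-skinny multiply.
W = np.hstack(W_chunks)       # shape (2, 12)
v = np.concatenate(v_chunks)  # shape (12,)
assert np.allclose(z, W.dot(v))
```

Only one `Wi`/`vi` pair plus the small partial sums need to be live at any moment, rather than all of `W` and `v` at once.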