[SPARK-18471][CORE] New treeAggregate overload for large aggregators #16038
Conversation
The zero for the aggregation used to be shipped inside a closure, which is highly problematic when this zero is big (100s of MB is typical for ML). This change introduces a new overload of treeAggregate which only ships a function able to generate this zero.
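A minimal plain-Scala sketch of the idea (the method name and the `Seq`-of-partitions stand-in are illustrative, not Spark's actual API): each "task" builds its own zero from a generator function, so only the small generator closure would need to be serialized, never the big zero value itself.

```scala
object ZeroGeneratorSketch {
  // Local stand-in for an RDD: a Seq of partitions. In a real
  // implementation only zeroGen (a small closure) would cross the wire;
  // the potentially huge zero value U is built per task.
  def treeAggregateWithZeroGenerator[T, U](partitions: Seq[Seq[T]])(
      zeroGen: () => U)(seqOp: (U, T) => U, combOp: (U, U) => U): U = {
    // Each partition materializes its own zero locally.
    val partials = partitions.map(p => p.foldLeft(zeroGen())(seqOp))
    partials.reduceOption(combOp).getOrElse(zeroGen())
  }
}
```

For the ML case discussed here, `zeroGen` would be something like `() => new Array[Double](n)` for a very large `n`, so never capturing that array in a closure is the whole point.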
Can one of the admins verify this patch?
This is not really related to the other change or the JIRA, though. It also doesn't seem to address the problem as you describe it elsewhere. It's not that the zero value is big (a sparse 0 vector serializes into something tiny) but that some other representation gets into the closure?
It is related: if this PR gets accepted, then the fix for #16037 becomes trivial: replace the treeAggregate call with a call to the new overload.
If you're right here, then #16037 does nothing, right? In which case my understanding at #15905 (review) is wrong. But if your other change does work, and that would be explained by my understanding, then this doesn't help. At https://issues.apache.org/jira/browse/SPARK-18471?focusedCommentId=15670795&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15670795 you describe aggregating a dense vector, but this is different from having a dense vector zero value.
Maybe we could have a live talk/chat this evening Paris time / morning West Coast time? If I'm not mistaken, both #16037 and this change fix the issue at two different levels.
I might have misunderstood your comment.
(I am in the UK, but let's keep the discussion online here.) OK, I had assumed otherwise. However, this means there's a more direct solution without changing or adding to the public API: pass a sparse vector (of zeroes) as the zero value. It looks like this should work, and even be a little faster on the first pass when the gradient is 0. If for some reason something doesn't like the sparse vector, it's easy enough to convert it to dense. In that case, I would not make the change in this PR, and would change #16037 to make the initial vector sparse only.
No, this does not do the trick, as the result of the aggregation IS dense, and the zero in (tree)aggregate has the same type as the result. Said otherwise, in L-BFGS we aggregate vectors that are each pretty sparse but whose aggregation is dense, as they have different support. So taking a dense vector as the zero of the aggregator makes perfect sense: adding a sparse contribution to a dense aggregator yields a dense aggregator, and this is what is desired. We just need not to waste bandwidth by sending this huge zero to the executors. If you do not want to change or add a public API in core, we might contribute a treeAggregateWithZeroGenerator to MLlib using a third way to solve the issue: use an Option[DenseVector] as the aggregate type and unwrap it as a dense zero by wrapping the seqOp and combOp; we have a draft of that locally...
Yes, but the result is inherently dense -- not going to be many zeroes in the gradient, if any. There's no way around that. The discussion has been about the initial value, the zero vector, right? That, you agree, is sparse. I don't see what the problem is, then. A huge sparse 0 vector is not huge when serialized.
Yes, sure, the zero vector is very sparse. But I do not get your suggestion. I see no way to pass a sparse vector as zero and have the type of vector change along the way to a dense vector with only operations on sparse vectors. What do you propose? We do want to accumulate the sparse vector built from each datapoint into a dense vector. And because the type of the zero is the type of the result, we need to use a dense vector as the zero, don't we? You might have seen a way out of this I haven't, have you?
It might be as simple as passing a sparse zero vector. The method does ultimately want to return a dense vector, and does insert a cast that assumes it's already a dense vector. If that's a problem, it's easy to insert a conversion to dense.
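To make the suggestion concrete, here is a plain-Scala sketch with simple stand-ins for MLlib's vector types (the `SparseVec`/`DenseVec` names are assumptions, not the real API): the zero is an empty sparse vector that serializes to almost nothing, and the first accumulation promotes the accumulator to dense, so subsequent axpy-style adds stay dense.

```scala
sealed trait Vec
final case class SparseVec(size: Int, entries: Map[Int, Double]) extends Vec
final case class DenseVec(values: Array[Double]) extends Vec

object SparseZeroSketch {
  // Promote a vector to a dense representation (no-op if already dense).
  def toDense(v: Vec): DenseVec = v match {
    case d: DenseVec => d
    case SparseVec(n, entries) =>
      val arr = new Array[Double](n)
      entries.foreach { case (i, x) => arr(i) = x }
      DenseVec(arr)
  }

  // seqOp: the accumulator becomes dense on first use, then the sparse
  // contribution is added in place (an axpy-like update).
  def accumulate(acc: Vec, contrib: SparseVec): DenseVec = {
    val d = toDense(acc)
    contrib.entries.foreach { case (i, x) => d.values(i) += x }
    d
  }
}
```

The zero shipped to executors is `SparseVec(n, Map.empty)`, which is tiny however large `n` is; only the final result is a full dense array.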
The big doubt I have is on axpy. Of course, the first call to axpy should produce a dense vector, but that's already on the executor: the other operand is sparse, and an axpy call between two sparse vectors has to return a sparse vector. And we do not want to cast at the end, because the performance penalties of using sparse instead of dense will already have been paid. Fully exploring the trade-off between sparse and dense aggregation is out of scope and likely very application-specific. I'll have a new look at the problem, trying to find a workaround along the lines of sparse vectors, but this will not solve the typing issue as far as I can see for now.
Right now, everything is dense here, right? That's the worst case. Your goal is to avoid serializing a dense zero vector, and I say it can just be sparse, which solves the immediate problem. From there, some operations may or may not result in sparse vectors, but that's not relevant -- it can only be better than being all dense, which is the current case. It matters at the end because the return type is dense, but making it dense is easy. Paying the cost of copying to a dense representation is the only downside, but that's small compared to the saving in serialization (I presume). I don't understand the problem you're suggesting. Are you saying that you don't want any operations to be on sparse vectors? I'd leave that choice to the implementations, but if you're worried about it, simply force the argument of the seqOp to become dense. Then the only change is the smaller serialization, and everything should be identical afterwards.
Actually, aggregating (and thus sending over the network) quite dense SparseVectors with tens of millions of elements is not to be taken lightly. This would require serious benchmarking. What I am saying is that we cannot do that lightly. Anyway, if you consider this is not worth a new API and an optimization in core, the discussion should probably go on in the MLlib-only PR #16037, for which we already have a specific workaround. Feel free to close this PR if you deem it irrelevant.
I see, you are saying that encoding an actually-dense vector as sparse is somewhat less efficient, and if the implementations happened to make that choice given sparse input, it could be bad. In that case, just short-cut the problem and force the seqOp argument to dense up front.
Without understanding the specifics of the ML part here: won't the actual impact of a large dense vector on task bytes be minimal at best?
@mridulm we don't know how to monitor the size of the serialized task. Sure, it would not be 10MB due to all those zeros. But we nonetheless measure a significant increase in performance and (more importantly) in stability when using the workaround and our custom TreeAggregatorWithZeroGenerator. @srowen when the density of a sparse vector increases it becomes very inefficient, not only because it uses almost twice the memory, but mainly because you fragment memory accesses and overload the GC :( See #16078 for a generic wrapper around treeAggregate which uses Option[U] to denote the zero of U by None and generates a full representation on demand when calling seqOp or combOp.
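The Option[U] trick described above can be sketched in plain Scala (the names are illustrative and this is not the actual #16078 code; a local fold stands in for treeAggregate): None encodes the zero and costs nothing to serialize, and the real zero is only generated when seqOp or combOp actually needs it.

```scala
object OptionZeroSketch {
  // Wrap seqOp/combOp so the shipped zero is None (tiny to serialize),
  // and the real zero U is materialized lazily by zeroGen on first use.
  def wrap[T, U](zeroGen: () => U)(seqOp: (U, T) => U, combOp: (U, U) => U)
      : ((Option[U], T) => Option[U], (Option[U], Option[U]) => Option[U]) = {
    val wSeq: (Option[U], T) => Option[U] =
      (acc, t) => Some(seqOp(acc.getOrElse(zeroGen()), t))
    val wComb: (Option[U], Option[U]) => Option[U] = {
      case (Some(a), Some(b)) => Some(combOp(a, b))
      case (a, None)          => a // empty partition: keep the other side
      case (None, b)          => b
    }
    (wSeq, wComb)
  }
}
```

The wrapped functions can then be handed to the existing treeAggregate with `Option.empty[U]` as the zero value, without any change to core's public API.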
Agreed, this is too ML-specific to deserve a fix in core.
What changes were proposed in this pull request?
The zero for the aggregation used to be shipped inside a closure, which is highly problematic when this zero is big (100s of MB is typical for ML). This change introduces a new overload of treeAggregate which only ships a function able to generate this zero.
How was this patch tested?
Test suites for the core and mllib modules were launched locally.