
[SPARK-18471][CORE] New treeAggregate overload for large aggregators #16038

Closed
wants to merge 1 commit

Conversation

AnthonyTruchet

What changes were proposed in this pull request?

The zero for the aggregation used to be shipped inside a closure, which is
highly problematic when this zero is big (100s of MB is typical for ML).
This change introduces a new overload of treeAggregate which only ships a
function able to generate this zero.
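
A minimal sketch of the idea (illustrative only, not the actual patch; a real implementation would live on RDD itself and mirror treeAggregate's handling of edge cases such as empty RDDs):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Only `zeroGenerator` is shipped in the task closure; the zero itself is
// built on each executor, once per partition.
def treeAggregateWithZeroGenerator[T: ClassTag, U: ClassTag](
    rdd: RDD[T])(zeroGenerator: () => U)(
    seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int = 2): U = {
  val perPartition: RDD[U] = rdd.mapPartitions { it =>
    Iterator.single(it.foldLeft(zeroGenerator())(seqOp))
  }
  // Combine the per-partition results tree-wise, as treeAggregate does.
  perPartition.treeReduce(combOp, depth)
}
```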

How was this patch tested?

Test suites for the core and mllib modules were run locally.

@AnthonyTruchet
Author

@srowen here is the companion PR to #16037.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Nov 28, 2016

This is not really related to the other change or the JIRA though. It also doesn't seem to address the problem as you describe it elsewhere. It's not that the zero value is big (a sparse 0 vector serializes into something tiny) but that some other representation gets into the closure?

@AnthonyTruchet
Author

It is related: if this PR gets accepted, then the fix in PR #16037 becomes trivial: replace treeAggregate with a call to treeAggregateWithZeroGenerator(() => (Vectors.zeros(n), 0.0)), in which case only n gets caught in the closure, if I'm not mistaken. This fix then becomes much easier to port to other similar use cases in MLlib.
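
For illustration, a rough measurement of what plain (uncompressed) Java serialization would ship in each case — a sketch, best run as a standalone script, since inside a class the closure could also capture the enclosing instance:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import org.apache.spark.mllib.linalg.Vectors

// Size of an object under plain Java serialization.
def serializedSize(obj: AnyRef): Int = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj)
  out.close()
  bytes.size()
}

val n = 10000000
println(serializedSize(Vectors.zeros(n)))       // the dense zero: ~80 MB
println(serializedSize(() => Vectors.zeros(n))) // the generator: tiny, it captures only `n`
```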

@srowen
Member

srowen commented Nov 28, 2016

If you're right here, then #16037 does nothing, right?

In which case my understanding at #15905 (review) is wrong.

But if your other change does work, and that would be explained by my understanding, then this doesn't help.

At https://issues.apache.org/jira/browse/SPARK-18471?focusedCommentId=15670795&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15670795 you describe aggregating a dense vector, but this is different from having a dense vector zero value.

@AnthonyTruchet
Author

Maybe we could have a live talk/chat this evening Paris time / morning West Coast time?

If I'm not mistaken both #16037 and this change fix the issue at two different levels.

I might have misunderstood your comment at #15905 (review).

@srowen
Member

srowen commented Nov 29, 2016

(I am in the UK, but let's keep the discussion online here)

OK, I had assumed that Vectors.zeros makes a sparse vector. It does not; it makes a dense vector. If it were sparse, then it would be tiny to serialize. Hence I didn't understand this to be the issue, and thought you were saying that another copy of a dense representation was accidentally in the closure.

However, this means there's a more direct solution without changing or adding to the public API: pass a sparse vector (of zeroes) as the zero value. It looks like this should work, and may even be a little faster on the first pass when the gradient is 0. If for some reason something doesn't like the sparse vector, it's easy enough to call .toDense such that it's only expanded once, and not on the driver.

In that case, I would not make the change in this PR, and would change #16037 to only make the initial vector sparse.

@AnthonyTruchet
Author

No, this does not do the trick, as the result of the aggregation IS dense. And the zero in (tree)aggregate has the same type as the result. Said otherwise: in L-BFGS we aggregate vectors that are each pretty sparse but whose aggregate is dense, because they have different supports. So taking a dense vector as the zero of the aggregator makes perfect sense: adding a sparse contribution to a dense aggregator yields a dense aggregator, and this is what is desired. We just need not to waste bandwidth by sending this huge zero to the executors.

If you do not want to change or add a public API in core, we might contribute a treeAggregateWithZeroGenerator to MLlib using a third way to solve the issue: use an Option[DenseVector] as the aggregate type and unwrap it as a dense zero by wrapping the seqOp and combOp; we have a draft of that locally...

@srowen
Member

srowen commented Nov 29, 2016

Yes, but the result is inherently dense -- there are not going to be many zeroes in the gradient, if any. There's no way around that. The discussion has been about the initial value, the zero vector, right? That, you agree, is sparse. I don't see what the problem is then. A huge sparse 0 vector is not huge when serialized.

@AnthonyTruchet
Author

Yes, sure, the zero vector is very sparse. But I do not get your suggestion. I see no way to pass a SparseVector as the zero and have the type of the vector change along the way to DenseVector using only operations on sparse vectors.

What do you propose? We do want to accumulate the SparseVector built from each data point into a DenseVector. And because the type of the zero is the type of the result, we need to use a dense vector as the zero, don't we? You might have seen a way out of this that I haven't, have you?

@srowen
Member

srowen commented Nov 29, 2016

It might be as simple as writing Vectors.sparse(n, Seq()) as the zero value. Everything else appears to operate on a Vector that's either sparse or dense then. (Of course, the first call to axpy should produce a dense vector, but that's already on the executor.)

The method does ultimately want to return a dense vector, and does insert a cast that assumes it's already a dense vector. If that's a problem, it's easy to insert a call to .toDense on the Vector at the end, which is a no-op if it's already dense and definitely produces a dense vector.
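
A sketch of that suggestion, assuming an L-BFGS-style gradient sum; axpyInto is an illustrative stand-in for the (private) BLAS axpy used in the real CostFun:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Add x into acc, densifying acc first (toDense is cheap if already dense).
def axpyInto(x: Vector, acc: Vector): Vector = {
  val dense = acc.toDense
  x.foreachActive((i, v) => dense.values(i) += v)
  dense
}

def gradientSum(points: RDD[Vector], n: Int): Vector = {
  val zero: Vector = Vectors.sparse(n, Seq()) // serializes to a few bytes
  points.treeAggregate(zero)(
    (acc, x) => axpyInto(x, acc), // densified on first use, on the executor
    (a, b) => axpyInto(b, a)      // merging partials keeps the result dense
  ).toDense                       // no-op if already dense
}
```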

@AnthonyTruchet
Author

The big doubt I have is about "Of course, the first call to axpy should produce a dense vector, but that's already on the executor": the other operand is sparse and has to be sparse, and this axpy call between two sparse vectors has to return a sparse vector. And we do not want to cast at the end, because the performance penalties of using sparse instead of dense will already have been paid.

Fully exploring the trade-off between sparse and dense aggregation is out of scope and likely very application-specific. I'll have a new look at the problem and try to work out a workaround along the lines of sparse vectors, but this will not solve the typing issue as far as I can see for now.

@srowen
Member

srowen commented Nov 29, 2016

Right now, everything is dense here, right? That's the worst case. Your goal is to avoid serializing a dense zero vector, and I say it can just be sparse, which solves the immediate problem. From there, some operations may or may not result in sparse vectors, but that's not relevant -- it can only be better than being all dense, which is the current case. It matters at the end because the return type is dense, but making it dense is easy. Paying the cost of copying to a dense representation is the only downside, but that's small compared to the saving in serialization (I presume).

I don't understand the problem you're suggesting. Are you saying that you don't want any operations to be on sparse vectors? I'd leave that choice to the implementations, but if you're worried about it, simply force the argument of the seqOp to become dense. Then the only change is the smaller serialization, and everything should be identical afterwards.
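
A one-liner sketch of that forcing (illustrative, with MLlib vector types):

```scala
import org.apache.spark.mllib.linalg.Vector

// Wrap an existing seqOp so the accumulator is densified before any
// arithmetic; only the serialized zero value stays sparse (and small).
def forceDense(seqOp: (Vector, Vector) => Vector): (Vector, Vector) => Vector =
  (acc, x) => seqOp(acc.toDense, x)
```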

@AnthonyTruchet
Author

Actually, aggregating (and thus sending over the network) quite dense SparseVectors with tens of millions of elements is not to be taken lightly. This would require serious benchmarking. What I am saying is that we cannot do that lightly, because we know that aggregating acc += partial when partial is sparse and acc is dense is efficient locally (i.e. within a partition), and that after a partition has been aggregated the vector is already quite dense (enough so that the dense and sparse representations have similar size).

Anyway, if you consider this not worth a new API and an optimization in core, the discussion should probably continue on the MLlib-only PR #16037, for which we already have a specific workaround.
I just naively assumed that aggregating over big accumulators was typical enough to deserve some support from core; if this is not the case, I'll fall back on an MLlib-only workaround.

Feel free to close this PR if you deem it irrelevant.

@srowen
Member

srowen commented Nov 29, 2016

I see: you are saying that encoding an actually-dense vector as sparse is somewhat less efficient, and if the implementations happened to make that choice given sparse input, that could be bad. In that case, just short-cut the problem and use .toDense to force a dense representation immediately.

@mridulm
Contributor

mridulm commented Nov 29, 2016

Without understanding the specifics of the ML part here: won't the actual impact of a large dense vector on task bytes be minimal at best?
We do compress the task binary, and 1B zeros should have a fairly low compressed footprint, no?

@AnthonyTruchet
Author

@mridulm we don't know how to monitor the size of the serialized task. Sure, it would not be 10 MB, thanks to all those zeros compressing well. But we nonetheless measure a significant increase in performance and (more importantly) in stability when using the workaround and our custom TreeAggregatorWithZeroGenerator.

@srowen when the density of a SparseVector increases it becomes very inefficient, not only because it uses almost twice as much memory, but mainly because you fragment memory accesses and overload the GC :(

See #16078 for a generic wrapper around treeAggregate which uses Option[U] to denote the zero of U by None and generates a full representation on demand when calling seqOp or combOp.
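
Sketched, that wrapper could look like this (names are illustrative, not necessarily those of #16078; unlike the overload proposed in this PR, it needs no change to core):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// None stands for the zero of U, so no big zero value is ever serialized;
// it is generated on demand inside the wrapped seqOp/combOp on executors.
def treeAggregateWithZeroGenerator[T: ClassTag, U: ClassTag](
    rdd: RDD[T])(zeroGen: () => U)(
    seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int = 2): U = {
  val wrappedSeq = (acc: Option[U], t: T) =>
    Some(seqOp(acc.getOrElse(zeroGen()), t))
  val wrappedComb = (a: Option[U], b: Option[U]) => (a, b) match {
    case (Some(x), Some(y)) => Some(combOp(x, y))
    case (x, None)          => x
    case (None, y)          => y
  }
  rdd.treeAggregate(Option.empty[U])(wrappedSeq, wrappedComb, depth)
    .getOrElse(zeroGen()) // empty RDD: build the zero locally
}
```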

@srowen
Member

srowen commented Nov 30, 2016

As with #16078, I do not think we should merge a change like this. Let's fix the problem directly in #16037.

@AnthonyTruchet
Author

Agreed, this is too ML-specific to deserve a fix in core.

AnthonyTruchet deleted the SPARK-18471 branch November 30, 2016 10:10