[SPARK-8733][MLLIB] ML RDD.unpersist calls should use blocking = false #7160
Conversation
Merged build triggered.

Merged build started.

Test build #36269 has started for PR 7160 at commit
Inline review comments on the diff: import ordering (flagged in two places); spacing.
Merged build triggered.

Merged build started.

Test build #36269 has finished for PR 7160 at commit

Merged build finished. Test FAILed.

Test build #36272 has started for PR 7160 at commit

Test build #36272 has finished for PR 7160 at commit

Merged build finished. Test FAILed.

Merged build triggered.

Merged build started.

Test build #36273 has started for PR 7160 at commit

Test build #36273 has finished for PR 7160 at commit

Merged build finished. Test PASSed.
@ilganeli I'm not sure whether this is a safe change. If the old RDDs are not yet deleted from the block managers, new RDDs might not be able to cache, or may have to downgrade their storage level. Since it is hard to tell when the next caching happens, we set blocking to true to be safe. It would be great if you could share some benchmark results, so we can easily see the trade-off and decide.
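To make the trade-off concrete, here is a minimal sketch of the iterative cache-and-release pattern at issue, assuming a `SparkContext` named `sc`; the data, sizes, and loop body are illustrative, not code from this PR:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Illustrative iterative loop: each pass caches a new RDD and drops the
// previous one, the common pattern in MLlib optimizers.
def iterate(sc: SparkContext): Unit = {
  var current = sc.parallelize(0 until 1000000).persist(StorageLevel.MEMORY_ONLY)
  for (_ <- 0 until 10) {
    val next = current.map(_ + 1).persist(StorageLevel.MEMORY_ONLY)
    next.count() // materialize the new RDD

    // blocking = true: the driver waits until every block manager has
    // dropped the old blocks, so the freed memory is guaranteed before
    // the next iteration caches.
    // blocking = false (what this PR proposes): returns immediately; the
    // old blocks may still occupy executor memory when the next RDD caches.
    current.unpersist(blocking = true)
    current = next
  }
}
```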
@mengxr, @jkbradley initially reported this; perhaps he should weigh in here?
I think the concern is that the memory cache is not fully LRU. If a big new block shows up when the memory cache is almost full, there's a chance the new block will be dropped straight to disk instead. Here's how that relates to this change: if the asynchronous unpersist happens too slowly, a workload that previously benefited from the new block being cached in memory may now drop that block directly to disk. Whether this actually happens is largely workload-dependent. @ilganeli Have you had a chance to benchmark the performance of a few algorithms before and after this change? It would be good to see whether this does indeed speed up many workloads.
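As a hedged illustration of how one might observe such a spill, the sketch below caches a new RDD right after a non-blocking unpersist and inspects `SparkContext.getRDDStorageInfo`; `sc`, the sizes, and the function name are assumptions for illustration:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Sketch: cache a new RDD immediately after a non-blocking unpersist and
// report where its blocks landed.
def observeSpill(sc: SparkContext): Unit = {
  val old = sc.parallelize(0L until 5000000L).persist(StorageLevel.MEMORY_AND_DISK)
  old.count()

  // Fire-and-forget: executors may not have freed `old`'s blocks yet
  // when `fresh` is materialized below.
  old.unpersist(blocking = false)

  val fresh = sc.parallelize(0L until 5000000L).persist(StorageLevel.MEMORY_AND_DISK)
  fresh.count()

  // getRDDStorageInfo reports per-RDD memory/disk usage; a nonzero
  // diskSize for `fresh` means some of its blocks spilled to disk.
  sc.getRDDStorageInfo.foreach { info =>
    println(s"RDD ${info.id}: mem=${info.memSize} B, disk=${info.diskSize} B")
  }
}
```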
Good points; I suppose unpersisting without blocking might cause other slowdowns, which could in turn cause other timeouts to occur. This does seem like the kind of change that might require running all of the MLlib tests in spark-perf on a cluster to test for speed changes. Perhaps a better solution in the meantime is simply to adjust timeout settings as needed. @mengxr I'm OK with closing the JIRA if it does not seem worth the trouble of testing. @ilganeli I doubt we have the bandwidth to test right now, but do let us know if you'd like to keep this open and run tests.
All - thanks for the clarification and for weighing in. I don't have any pipelines set up that would exercise this with the level of rigor necessary. I think it's fine to close it for now. Thanks.
OK, sorry for the trouble! Can you please close this PR? (We can't)
Will do once I have access to a PC. No trouble at all.
Updated all usages within the MLlib module to use blocking = false.
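For reference, the shape of the change is roughly the following; `releaseCachedRdd` and its argument are hypothetical names, not identifiers from the diff:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper illustrating the pattern applied at MLlib call sites.
def releaseCachedRdd(data: RDD[_]): Unit = {
  // Before: data.unpersist() — which defaulted to blocking = true at the
  // time, stalling the driver until all block managers confirmed removal.
  // After (this PR): fire-and-forget; executors drop blocks asynchronously.
  data.unpersist(blocking = false)
}
```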