-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10955][streaming] Add a warning if dynamic allocation for Streaming applications #8998
Conversation
…lications. Dynamic allocation can be painful for streaming apps and can lose data. The one drawback though is that apps which run Streaming and non-streaming form the same context will also end up not being able to do dynamic allocation.
Actually looking at this again, I think our only option is to log a message. It is possible that the |
/cc @andrewor14 |
This looks good. I personally prefer to fail fast loudly since this is a correctness issue. Many people don't pay attention to warnings these days, especially in hosted environments. |
Test build #43289 has finished for PR 8998 at commit
|
Test build #43290 has finished for PR 8998 at commit
|
Yeah, looks fine to me. Perhaps an internal config option so that the check can be disabled? e.g. if someone wants to actually try it out and see how it behaves without having to recompile Spark. |
463fd1a
to
725f090
Compare
Added config parameter to enable it if the user really wants to enable dynamic allocation |
Pointing out the obvious here, but we should document this new property so someone doesn't have to read the source code to figure out what's the name of the property to enable dynamic allocation in streaming. Also, I personally prefer the name |
If you document it, it becomes public, and you need to document that you removed it, and all sorts of things. Since the point is not to publicize it, instead just have a way for those tinkering with Spark itself to enable the feature, documentation is not necessary. |
Test build #43299 has finished for PR 8998 at commit
|
I don't have a strong preference for the config name. @vanzin, @andrewor14 - like the current name or the one which @markgrover suggested? Vote please :) |
I strongly think that we should not add that require. It can break existing streaming applications very easily. People might be using dynamic allocation with direct kafka or kinesis or with anything + WAL, in which case it is fine to have dynamic allocation as the system will not loose data. Upgrading Spark will immediately break their application, and it might be hard to figure out the solution for disabling dynamic allocation in their setting. We have made mistakes in the past by throwing errors like this for usecases which we didnt think would be useful in production, and people have complained that it broke their testing stuff. See https://issues.apache.org/jira/browse/SPARK-8630 and discussion https://www.mail-archive.com/search?l=user@spark.apache.org&q=subject:%22Re%5C%3A+QueueStream+Does+Not+Support+Checkpointing%22&o=newest&f=1 We must not break stuff. So I suggest this be turned into a warning. In that case we dont need a new configuration. |
Also please update the PR title to be similar to other PRs. |
As I said, I don't mind either. But in this case, we need to be conservative. We can additional checks to see if WAL is enabled, but it is not possible to actually check what DStreams are being used (it could be a custom one, which does not break if executors go away), so that case still exists. How about we print the config param out or document it in that case? |
@tdas What do you think about the above? If you still think we should just make it a warn, I will make the change. |
Checking which types of dstreams are used and what configurations are used and accordingly throwing errors seems to be complicated and brittle. I think its best to log a warning. In general, if they are not using WAL or reliable sources, they should be aware that they are going to loose data under failures anyways, and they should be aware of that anyways. Dynamic allocation really does not aggravate the problem significantly more. |
5ea6690
to
eaa48cc
Compare
if (Utils.isDynamicAllocationEnabled(sc.conf)) { | ||
logWarning("Dynamic Allocation is enabled for this application. " + | ||
"Enabling Dynamic allocation for Spark Streaming applications can cause data loss if " + | ||
"Write Ahead Log is not enabled for non-replayable sources, like Flume.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove ","
You can also add "See the documentation for enabling the Write Ahead Log".
Test build #43306 has finished for PR 8998 at commit
|
Test build #43308 has finished for PR 8998 at commit
|
@harishreedharan you missed one small point, probably because i edited the comment later. Could you also add "See the programming guide for enabling the Write Ahead Log" . |
Other than that its LGTM. |
Test build #43330 has finished for PR 8998 at commit
|
LGTM, merging this to master and branch 1.5 |
…eaming applications Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #8998 from harishreedharan/ss-log-error and squashes the following commits: 462b264 [Hari Shreedharan] Improve log message. 2733d94 [Hari Shreedharan] Minor change to warning message. eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled. 725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it. b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled. a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications. (cherry picked from commit 0984129) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
…eaming applications Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled. Author: Hari Shreedharan <hshreedharan@apache.org> Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits: 462b264 [Hari Shreedharan] Improve log message. 2733d94 [Hari Shreedharan] Minor change to warning message. eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled. 725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it. b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled. a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.
Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.