Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10955][streaming] Add a warning if dynamic allocation for Streaming applications #8998

Closed
wants to merge 6 commits into from

Conversation

harishreedharan
Copy link
Contributor

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

…lications.

Dynamic allocation can be painful for streaming apps and can lose data. The one drawback though
is that apps which run Streaming and non-streaming form the same context will also end up not
being able to do dynamic allocation.
@harishreedharan
Copy link
Contributor Author

/cc @vanzin @tdas

@harishreedharan
Copy link
Contributor Author

Actually looking at this again, I think our only option is to log a message. It is possible that the SparkContext was already created and passed to us, in which case we can't do anything about dynamic allocation.

@JoshRosen
Copy link
Contributor

/cc @andrewor14

@andrewor14
Copy link
Contributor

This looks good. I personally prefer to fail fast loudly since this is a correctness issue. Many people don't pay attention to warnings these days, especially in hosted environments.

@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43289 has finished for PR 8998 at commit a4a5212.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43290 has finished for PR 8998 at commit b3f9a95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Oct 6, 2015

Yeah, looks fine to me. Perhaps an internal config option so that the check can be disabled? e.g. if someone wants to actually try it out and see how it behaves without having to recompile Spark.

@harishreedharan
Copy link
Contributor Author

Added config parameter to enable it if the user really wants to enable dynamic allocation

@markgrover
Copy link
Member

Pointing out the obvious here, but we should document this new property so someone doesn't have to read the source code to figure out what's the name of the property to enable dynamic allocation in streaming.

Also, I personally prefer the name spark.streaming.dynamicAllocation.allowed over spark.streaming.dynamicAllocation.enabled because one will still have to enable dynamic allocation for the entire spark context separately. And, as much as I don't like the idea of adding yet another property I don't really see a better way to do this, essentially we want to decouple folks from enabling dynamic allocation for streaming vs. non-streaming workloads. Currently, we use the same property, we could decouple by having 2 separate properties - one for streaming and one for the rest, or, alternatively, as is being suggested here, have a second property to allow/disallow dynamic allocation for streaming. The latter seems less hairy so I am for it.

@vanzin
Copy link
Contributor

vanzin commented Oct 6, 2015

but we should document this new property

If you document it, it becomes public, and you need to document that you removed it, and all sorts of things. Since the point is not to publicize it, instead just have a way for those tinkering with Spark itself to enable the feature, documentation is not necessary.

@SparkQA
Copy link

SparkQA commented Oct 6, 2015

Test build #43299 has finished for PR 8998 at commit 725f090.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@harishreedharan
Copy link
Contributor Author

I don't have a strong preference for the config name. @vanzin, @andrewor14 - like the current name or the one which @markgrover suggested? Vote please :)

@tdas
Copy link
Contributor

tdas commented Oct 6, 2015

I strongly think that we should not add that require. It can break existing streaming applications very easily. People might be using dynamic allocation with direct kafka or kinesis or with anything + WAL, in which case it is fine to have dynamic allocation as the system will not loose data. Upgrading Spark will immediately break their application, and it might be hard to figure out the solution for disabling dynamic allocation in their setting.

We have made mistakes in the past by throwing errors like this for usecases which we didnt think would be useful in production, and people have complained that it broke their testing stuff. See https://issues.apache.org/jira/browse/SPARK-8630 and discussion https://www.mail-archive.com/search?l=user@spark.apache.org&q=subject:%22Re%5C%3A+QueueStream+Does+Not+Support+Checkpointing%22&o=newest&f=1

We must not break stuff. So I suggest this be turned into a warning. In that case we dont need a new configuration.

@tdas
Copy link
Contributor

tdas commented Oct 6, 2015

Also please update the PR title to be similar to other PRs.

@harishreedharan
Copy link
Contributor Author

As I said, I don't mind either. But in this case, we need to be conservative. We can additional checks to see if WAL is enabled, but it is not possible to actually check what DStreams are being used (it could be a custom one, which does not break if executors go away), so that case still exists.

How about we print the config param out or document it in that case?

@harishreedharan harishreedharan changed the title [streaming] SPARK-10955. Disable dynamic allocation for Streaming app… [SPARK-10955][streaming] Disable dynamic allocation for Streaming app… Oct 6, 2015
@harishreedharan
Copy link
Contributor Author

@tdas What do you think about the above? If you still think we should just make it a warn, I will make the change.

@tdas
Copy link
Contributor

tdas commented Oct 6, 2015

Checking which types of dstreams are used and what configurations are used and accordingly throwing errors seems to be complicated and brittle. I think its best to log a warning.

In general, if they are not using WAL or reliable sources, they should be aware that they are going to loose data under failures anyways, and they should be aware of that anyways. Dynamic allocation really does not aggravate the problem significantly more.

if (Utils.isDynamicAllocationEnabled(sc.conf)) {
logWarning("Dynamic Allocation is enabled for this application. " +
"Enabling Dynamic allocation for Spark Streaming applications can cause data loss if " +
"Write Ahead Log is not enabled for non-replayable sources, like Flume.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove ","
You can also add "See the documentation for enabling the Write Ahead Log".

@SparkQA
Copy link

SparkQA commented Oct 7, 2015

Test build #43306 has finished for PR 8998 at commit eaa48cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 7, 2015

Test build #43308 has finished for PR 8998 at commit 2733d94.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Oct 7, 2015

@harishreedharan you missed one small point, probably because i edited the comment later. Could you also add "See the programming guide for enabling the Write Ahead Log" .

@tdas
Copy link
Contributor

tdas commented Oct 7, 2015

Other than that its LGTM.

@SparkQA
Copy link

SparkQA commented Oct 7, 2015

Test build #43330 has finished for PR 8998 at commit 462b264.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@harishreedharan harishreedharan changed the title [SPARK-10955][streaming] Disable dynamic allocation for Streaming app… [SPARK-10955][streaming] Add a warning if dynamic allocation for Streaming applications Oct 7, 2015
@tdas
Copy link
Contributor

tdas commented Oct 9, 2015

LGTM, merging this to master and branch 1.5

asfgit pushed a commit that referenced this pull request Oct 9, 2015
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #8998 from harishreedharan/ss-log-error and squashes the following commits:

462b264 [Hari Shreedharan] Improve log message.
2733d94 [Hari Shreedharan] Minor change to warning message.
eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.

(cherry picked from commit 0984129)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in 0984129 Oct 9, 2015
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits:

462b264 [Hari Shreedharan] Improve log message.
2733d94 [Hari Shreedharan] Minor change to warning message.
eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants