[SPARK-19605][DStream] Fail it if existing resource is not enough to run streaming job #16936
Conversation
Test build #72926 has finished for PR 16936 at commit
```scala
      ssc.conf.get(DYN_ALLOCATION_MAX_EXECUTORS)
    } else {
      val targetNumExecutors =
        sys.env.get("SPARK_EXECUTOR_INSTANCES").map(_.toInt).getOrElse(2)
```
here "2" refers to "YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS"
```scala
    .intConf
    .createWithDefault(1)

  private[spark] val EXECUTOR_MEMORY_OVERHEAD =
    ConfigBuilder("spark.yarn.executor.memoryOverhead")
```
This causes a multi-definition error in ApplicationMaster.scala; remove it here since we add the same entry in core.
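A minimal, self-contained sketch of the conflict being described (hypothetical object names, not the actual Spark sources): once the entry is declared in core, keeping a second definition of the same name in the YARN config object makes the reference ambiguous wherever both are wildcard-imported, as in ApplicationMaster.scala.

```scala
object CoreConfig {
  // the entry now lives in core
  val EXECUTOR_MEMORY_OVERHEAD = "spark.yarn.executor.memoryOverhead"
}

object YarnConfig {
  // val EXECUTOR_MEMORY_OVERHEAD = ...  // duplicate removed; keeping it would make
  //                                     // the reference below ambiguous
}

object ApplicationMasterSketch {
  import CoreConfig._
  import YarnConfig._
  val overheadKey = EXECUTOR_MEMORY_OVERHEAD // resolves cleanly to the core entry
}
```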
```diff
   def test() {
-    val conf = new SparkConf().setMaster("local").setAppName("CreationSite test")
+    val conf = new SparkConf().setMaster("local[2]").setAppName("CreationSite test")
     val ssc = new StreamingContext(conf, Milliseconds(100))
```
The unit test fails here with master "local", so it is changed to "local[2]".
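For context, a brief sketch of why a single-threaded local master is not enough for a receiver-based test; the socket source below is only an illustrative input, not part of the PR:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Milliseconds, StreamingContext}

// "local[2]" gives one core for the receiver and one for batch processing;
// with plain "local" the receiver pins the only core and no batch can run.
val conf = new SparkConf().setMaster("local[2]").setAppName("CreationSite test")
val ssc = new StreamingContext(conf, Milliseconds(100))
val lines = ssc.socketTextStream("localhost", 9999) // receiver occupies one core
lines.print()                                        // output needs the other core
```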
Test build #72928 has finished for PR 16936 at commit
(There is no detail in the JIRA you point to.) I think we already discussed this in part? This duplicates some of the logic about how cluster resource management works. Does streaming really not work if not enough receivers can be scheduled? That seems like a condition to check somewhere else, e.g. log a warning that a batch can't operate or can't start because not all receivers were scheduled. This could happen even when enough resources are theoretically available, just not available to the app.
That's not what I want to express. What I mean is that the stream output cannot operate if there are not enough resources, i.e. the existing resources are only just enough, or not even enough, to schedule the receivers.
Hm, it just seems like the wrong approach to externally estimate whether, in theory, it won't schedule. It is certainly a problem if streaming doesn't work, though users would already notice that. The error check or message could be more explicit, but this seems like something the streaming machinery itself should know and warn about.
Let's ask @zsxwing for suggestions.
I agree with @srowen. Streaming should not need to know the details of each deploy mode, and when someone changes other modules, they won't know about this code inside streaming. I would like to see a cleaner solution. In addition, when there are not enough resources in a cluster, the user will still not be able to run the streaming job even if they set a large core number. Hence, users should always be aware of this limitation of Streaming receivers, and it does not seem worth fixing this issue.
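For reference, a hedged sketch of the kind of check being debated in this thread; the helper name and the way usable cores would be estimated are assumptions, not the actual patch:

```scala
// Before starting the streaming job, compare the cores the application can use
// against the number of receivers, and fail fast when no core would be left
// for batch processing. Estimating `totalUsableCores` per cluster manager is
// exactly the logic duplication the reviewers object to.
def validateReceiverResources(totalUsableCores: Int, numReceivers: Int): Unit = {
  require(
    totalUsableCores > numReceivers,
    s"Only $totalUsableCores core(s) available for $numReceivers receiver(s); " +
      "receivers would occupy every core and no batch could be processed.")
}
```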
What changes were proposed in this pull request?
For more detailed discussion, please review:
How was this patch tested?
Added a new unit test.