TaskCount == partitions if you don't specify it in your spec #13339
churromorales wants to merge 3 commits into apache:master
Conversation
@kfaraz does this look good to you? Thank you
@churromorales , it makes sense for the MM-less setup. But for a cluster running on middle managers, if we set the number of tasks equal to the number of partitions, that means each task would be mapped to a single partition. If we don't have enough workers to launch that many tasks, some partitions will never be read from. So maybe we can use the new default logic only for the MM-less setup for now?
This only applies if you leave the taskCount parameter out of your spec. You can always specify taskCount=xx in the spec and it would launch only that many tasks.
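For illustration, here is a trimmed-down sketch of a Kafka supervisor spec with an explicit taskCount in the ioConfig, which pins the task count regardless of how many partitions the topic has (datasource and topic names are made up; most required fields are omitted for brevity):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": { "dataSource": "my_datasource" },
    "ioConfig": {
      "topic": "my_topic",
      "taskCount": 4
    }
  }
}
```

Leaving `taskCount` out of the ioConfig is what triggers the new defaulting behavior discussed in this PR.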
One other thing: I have run into issues like this when having tasks handle multiple partitions (#8139). I personally think the default should do the most sensible thing, instead of defaulting to 1 just to satisfy a small set of users. 1 is the safest thing to do, but if you have more partitions than tasks, you are not necessarily safe either. Now, I haven't checked whether this happens in the latest Druid version; we stopped having a task handle more than one partition, so I never looked into this issue again after it bit us enough times. Also, this is part of the supervisor, so I believe the change set is agnostic to the task runner implementation. I don't see an easy way to make this work just for MM-less without a hack. Anyway, let me know your thoughts. I do agree that in an MM world, having tasks == partitions may not be an ideal default. But in most deployments, 1 as a default doesn't make much sense either. I guess there is no "right answer" here :(. Thanks
Yeah, #8139 needs some investigation. We should be able to support multiple partitions per task. As you said, 1 is the safest option, but I see your point.
For this, we already have a …
Fair enough, can we do a compromise here? How about this: if you specify taskCount, it is respected; if you specify a negative taskCount, it defaults to the number of partitions. I think that way, for all existing users, nothing changes. But users that want to default to the number of partitions for their queue, and not really worry about things, can pass a negative value. Are you okay with this?
Thanks for your prompt response, @churromorales . I don't really like the idea of having a special meaning attached to a specific value of a config. Especially, passing a negative value to a config that is clearly supposed to be positive is not so great if we wanted to validate the parameters. That said, we could probably have this behaviour on passing the task count as a special value. The idea of defaulting task count to the number of partitions is a good one; we just need to implement it in the right way. Does the worker-capacity-based solution not seem viable to you? Even with passing the …
I agree with Kashif about overloading this configuration. I would rather stick with the original proposal made in this PR. Any user should be setting a reasonable value for the number of tasks in a production environment, and those specs don't get affected by this change. I have marked it as "Design Review" since it's a change in config behavior.
Any update on this? I personally think in a production environment it would not be prudent to run with a default of 1 task. I think having taskCount == partitions would be more sensible as a default. But if you guys have customers that rely on having a default of 1, …
@churromorales , as @abhishekagarwal87 mentions, in a production environment, we should always be prudent while using the default value of any config, especially one that dictates the usage of resources such as task slots. So I agree that in prod, no one should be using a task count of 1. I have not seen anyone do it in my experience either. The only concern I have with defaulting to the number of partitions is that the cluster might not have enough task slots to run that many tasks. How about we do this: default the task count to the number of partitions, but cap it at the available worker capacity?
This way, we ensure that ingestion always runs successfully and has a better default value than 1. I agree that this might lead to multiple partitions being mapped to a single task, but only in the case where your setup doesn't have enough task slots. In a prod environment, we would ideally have enough task slots. Let me know what you think.
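The capped default discussed above can be sketched as follows. This is a hypothetical method, not the actual Druid implementation; the names `resolveTaskCount`, `partitionCount`, and `availableWorkerCapacity` are made up for illustration:

```java
// Hypothetical sketch of the proposed defaulting logic, not actual Druid code.
public class TaskCountDefaults {
    /**
     * If the user supplied a taskCount in the spec, respect it. Otherwise,
     * default to one task per partition, capped at the worker capacity
     * actually available so that ingestion can always start.
     */
    public static int resolveTaskCount(
        Integer configuredTaskCount,
        int partitionCount,
        int availableWorkerCapacity
    ) {
        if (configuredTaskCount != null) {
            return configuredTaskCount;
        }
        // When there are fewer slots than partitions, some tasks will read
        // from more than one partition; floor at 1 so we always run something.
        return Math.max(1, Math.min(partitionCount, availableWorkerCapacity));
    }
}
```

With enough capacity this behaves like the PR's one-task-per-partition default; with a constrained cluster it degrades to the capacity-limited behavior Kashif describes rather than leaving partitions unread.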
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that …
If you don't specify taskCount in your supervisor spec, it now defaults to the number of partitions you have in Kinesis/Kafka. This is useful for customers in a growth phase, where they start with some number of partitions but then have to grow by some amount. Before, you would have to update the supervisor spec and add middle managers (if needed). With the MM-less patch in Druid, if a customer increases the number of partitions, all you have to do is restart the supervisor and it will pick up the new partitions. Even if they don't grow, it is best to have one task per partition as the default (unless others oppose this).
Future work: this patch still requires a manual restart of the supervisor, but no spec changes are needed, nor any knowledge of how many partitions your message bus has. In the future we could have a thread that detects a partition change and automatically restarts the supervisor, but that might be overkill.