[Feature] User's resources quota #211

Closed
smallzhongfeng opened this issue Sep 13, 2022 · 34 comments

@smallzhongfeng
Contributor

At present, we cannot limit a user's resources. Maybe we could maintain, in a configuration file that is updated manually, the number of tasks each user may submit. When a user exceeds the quota, the app is rejected, so the number of apps per user serves as the resource quota. What do you think? @jerqi
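
To make the proposal above concrete, here is a minimal Java sketch of a per-user app-count quota. Every name in it (`UserAppQuota`, `tryAdmit`, `release`, `updateLimit`) is hypothetical and chosen for illustration; it is not taken from the Uniffle codebase, and the in-memory limit map merely stands in for the configuration file mentioned in the comment.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from the Uniffle codebase.
class UserAppQuota {
  // Per-user limits, e.g. loaded from a manually maintained config file (assumption).
  private final Map<String, Integer> limits;
  // Limit applied to users that have no explicit entry.
  private final int defaultLimit;
  // Apps currently admitted, grouped by user.
  private final Map<String, Set<String>> running = new ConcurrentHashMap<>();

  UserAppQuota(Map<String, Integer> limits, int defaultLimit) {
    this.limits = new ConcurrentHashMap<>(limits);
    this.defaultLimit = defaultLimit;
  }

  // Returns true and records the app if the user is under quota; false means "reject".
  synchronized boolean tryAdmit(String user, String appId) {
    Set<String> apps = running.computeIfAbsent(user, u -> new HashSet<>());
    if (apps.contains(appId)) {
      return true; // already admitted, e.g. a retried request
    }
    if (apps.size() >= limits.getOrDefault(user, defaultLimit)) {
      return false;
    }
    apps.add(appId);
    return true;
  }

  // Frees the slot when an app finishes.
  synchronized void release(String user, String appId) {
    Set<String> apps = running.get(user);
    if (apps != null) {
      apps.remove(appId);
    }
  }

  // Manual update of one user's limit.
  void updateLimit(String user, int limit) {
    limits.put(user, limit);
  }
}
```

With a limit of 1 for a user, the first `tryAdmit` call for that user succeeds and a second call with a different app ID is rejected until `release` is called for the first app.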

@zuston
Member

zuston commented Sep 13, 2022

Maybe this could be solved by implementing a custom AccessChecker that limits each user's quota. I have done this.

@smallzhongfeng
Contributor Author

At present, the app level is also limited, right?

@smallzhongfeng
Contributor Author

I haven't seen a similar PR so far, so I created this issue.

@zuston
Member

zuston commented Sep 13, 2022

At present, the app level is also limited, right?

Yes, I introduced a custom access checker that does the following:

  1. Do gray-scale (gradual) rollout.
  2. Blacklist some jobs so that they fall back to ESS.
  3. ...

So I think the resource quota limit could also be implemented in a custom access checker.

Please let me know if I have misunderstood you.
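
The operations listed above can be sketched as a chain of small checkers. This is only an illustration under assumptions: `SimpleAccessChecker` is a hypothetical stand-in interface, not Uniffle's actual `AccessChecker` API, and the gray-scale rule (a stable hash of the user name compared against a percentage) is one possible choice rather than the method described in this thread.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an access-checker hook; not Uniffle's actual interface.
interface SimpleAccessChecker {
  boolean allow(String user, String appId);
}

// Rejects blacklisted apps so that the client can fall back to ESS.
class BlacklistChecker implements SimpleAccessChecker {
  private final Set<String> blockedApps;

  BlacklistChecker(Set<String> blockedApps) {
    this.blockedApps = blockedApps;
  }

  public boolean allow(String user, String appId) {
    return !blockedApps.contains(appId);
  }
}

// Gray-scale rollout: admits users whose stable hash bucket (0..99) is below `percent`.
class GrayScaleChecker implements SimpleAccessChecker {
  private final int percent;

  GrayScaleChecker(int percent) {
    this.percent = percent;
  }

  public boolean allow(String user, String appId) {
    return Math.floorMod(user.hashCode(), 100) < percent;
  }
}

// Runs the checkers in order; the first rejection wins.
class CheckerChain implements SimpleAccessChecker {
  private final List<SimpleAccessChecker> checkers;

  CheckerChain(List<SimpleAccessChecker> checkers) {
    this.checkers = checkers;
  }

  public boolean allow(String user, String appId) {
    for (SimpleAccessChecker checker : checkers) {
      if (!checker.allow(user, appId)) {
        return false;
      }
    }
    return true;
  }
}
```

A quota limit would then be one more `SimpleAccessChecker` added to the chain, which is why the access checker looks like a natural place for it.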

@smallzhongfeng
Contributor Author

Very coincidentally, our ideas are similar :) .

@zuston
Member

zuston commented Sep 13, 2022

Very coincidentally, our ideas are similar :) .

Maybe this is the best practice

@smallzhongfeng
Contributor Author

When will this feature be available? @zuston

@zuston
Member

zuston commented Sep 15, 2022

I think you misunderstood me. I implemented a custom access checker to solve the problem you mentioned, and you can do something similar. I don't think I will submit this access checker to the Uniffle codebase, because it may not be general enough.

@smallzhongfeng
Contributor Author

Well, I understand what you mean, although I think this could actually help with user isolation: for example, we could let high-priority users use more resources.

@smallzhongfeng
Contributor Author

In fact, we have also implemented this and launched it. The effect is clear: users' resources are managed effectively, and it is easier to calculate each user's usage cost for billing. That is why I raised this issue.

@jerqi
Contributor

jerqi commented Sep 21, 2022

User quota is OK for us. I think it is part of the work on multi-tenant user support.

@smallzhongfeng
Contributor Author

smallzhongfeng commented Sep 22, 2022

I can raise a pr if needed. @jerqi

@jerqi
Contributor

jerqi commented Sep 27, 2022

I can raise a pr if needed. @jerqi

If the pr is large, you could write a design document first.

@Gustfh

Gustfh commented Oct 12, 2022

Any update? We also plan to do this. May I ask what the scope of the quota limit is: a single shuffle server, or the whole shuffle size? I am thinking we could make it a server-level quota, so that this feature works together with the multiple-server feature: the shuffle writer could write the remaining blocks to another server.

@zuston
Member

zuston commented Oct 13, 2022

Currently I have no concrete design in mind. If you want to contribute this feature, it would be better to have a simple design doc for review. @Gustfh

Do you plan to work on this ticket? @smallzhongfeng

@smallzhongfeng
Contributor Author

In the version used internally at our company, we use quotas to limit the number of apps that a single user can submit. I don't have a clear idea yet about limiting the number of shuffle servers that a single user can use, but I will write a simple document this weekend so we can discuss whether there are other requirements to develop in the future. @jerqi @zuston @Gustfh

@smallzhongfeng
Contributor Author

@jerqi
Contributor

jerqi commented Oct 17, 2022

Could you add some diagrams?

@smallzhongfeng
Contributor Author

OK, I will add later.

@jerqi
Contributor

jerqi commented Oct 17, 2022

Could you give us permission to comment on the document?

@zuston
Member

zuston commented Oct 17, 2022

Could you give us permission to comment on the document?

+1

@smallzhongfeng
Contributor Author

Sorry, I forgot to give you permission. It has been updated. @jerqi @zuston

@Gustfh

Gustfh commented Oct 19, 2022

So it's a user-level quota. What if a single app produces a large amount of shuffle data and impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

@zuston
Member

zuston commented Oct 19, 2022

So it's a user-level quota. What if a single app produces a large amount of shuffle data and impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

+1. I think a quota on the bytes used per app or per Hadoop user should also be covered in the design. The different quota types, such as app number and storage bytes, could each be enabled by the user.
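
One way to picture independently switchable quota types is the Java sketch below. It is purely illustrative: `QuotaType`, `MultiQuota`, and all method names are hypothetical, and the accounting is keyed by user only, although the same structure could be keyed by app ID instead.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical quota types that an operator could enable independently.
enum QuotaType { APP_NUMBER, STORAGE_BYTES }

// Hypothetical sketch; names and limits are illustrative only.
class MultiQuota {
  private final Set<QuotaType> enabled = EnumSet.noneOf(QuotaType.class);
  private final int maxAppsPerUser;
  private final long maxBytesPerUser;
  private final Map<String, Integer> appsByUser = new ConcurrentHashMap<>();
  private final Map<String, Long> bytesByUser = new ConcurrentHashMap<>();

  MultiQuota(Set<QuotaType> enabledTypes, int maxAppsPerUser, long maxBytesPerUser) {
    this.enabled.addAll(enabledTypes);
    this.maxAppsPerUser = maxAppsPerUser;
    this.maxBytesPerUser = maxBytesPerUser;
  }

  // Counts a new app for the user; rejects it only if the app-number quota is enabled and full.
  synchronized boolean tryRegisterApp(String user) {
    int apps = appsByUser.getOrDefault(user, 0);
    if (enabled.contains(QuotaType.APP_NUMBER) && apps >= maxAppsPerUser) {
      return false;
    }
    appsByUser.put(user, apps + 1);
    return true;
  }

  // Reserves bytes for the user; rejects only if the byte quota is enabled and would be exceeded.
  synchronized boolean tryReserveBytes(String user, long bytes) {
    long used = bytesByUser.getOrDefault(user, 0L);
    if (enabled.contains(QuotaType.STORAGE_BYTES) && used + bytes > maxBytesPerUser) {
      return false;
    }
    bytesByUser.put(user, used + bytes);
    return true;
  }

  // Returns bytes to the user's budget, e.g. after shuffle data is flushed or deleted.
  synchronized void releaseBytes(String user, long bytes) {
    bytesByUser.computeIfPresent(user, (u, used) -> Math.max(0L, used - bytes));
  }
}
```

Keeping each quota type behind its own flag means a deployment that only needs app-number limits pays nothing for byte accounting decisions, and vice versa.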

@jerqi
Contributor

jerqi commented Oct 21, 2022

@smallzhongfeng If you add some extra interfaces, you should describe them in the document.

@smallzhongfeng
Contributor Author

Could you add some diagrams?

I added a simple diagram to illustrate the process of limiting Spark's resources. A more complete PR will be proposed this week.

@smallzhongfeng
Contributor Author

So it's a user-level quota. What if a single app produces a large amount of shuffle data and impacts other apps? For example, an app may have large shuffle data and many stages, and run for days. If memory storage is enabled, this app's shuffle data could live in memory for a long time. I wonder whether we should have a quota for this situation.

This is a good suggestion. I am currently developing it, and it may be implemented in the next PR.

@jerqi
Contributor

jerqi commented Oct 25, 2022

@smallzhongfeng @Gustfh @zuston Would you like to discuss this issue in a meeting? I am going to hold a meeting to discuss issue #80, and I would like to discuss this issue there too. There are some other issues we need to discuss as well, so I will send an email to our dev mailing list and pick a suitable date for the meeting. You can tell me by email when you are free.

@smallzhongfeng
Contributor Author

Of course, I'm looking forward to it.

@jerqi
Contributor

jerqi commented Oct 26, 2022

@Gustfh @smallzhongfeng I have already sent an email: https://lists.apache.org/thread/2jlm3fswmsxy619ldyo4px700p3ybnvc. Do you have time at 11 am (UTC+8) on Thursday this week?

@jerqi
Contributor

jerqi commented Oct 26, 2022

@smallzhongfeng @Gustfh

Meeting link is https://meeting.tencent.com/dm/oR95wASCNe91

@Gustfh

Gustfh commented Oct 26, 2022

Yes, we are looking forward to it.

@jerqi
Contributor

jerqi commented Oct 27, 2022

Offline discussion result:
We should consider both the coordinator quota and the shuffle server quota.

jerqi pushed a commit that referenced this issue Nov 22, 2022
### What changes were proposed in this pull request?
For issue #211 and the design document [https://docs.google.com/document/d/1MApSMFQgoS1VAoKbZjomqSRm0iTbSuKG1yvKNlWW65c/edit?usp=sharing](https://docs.google.com/document/d/1MApSMFQgoS1VAoKbZjomqSRm0iTbSuKG1yvKNlWW65c/edit?usp=sharing)

### Why are the changes needed?
Better isolation of resources between different users.

### Does this PR introduce _any_ user-facing change?
Adds the config `rss.coordinator.quota.default.app.num` to set the default number of apps for each user, and `rss.coordinator.quota.default.path` to set the path of a file that records the number of apps each user can run.

### How was this patch tested?

Added unit tests (UTs).
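
As a rough illustration of how the two settings named in this commit message might fit together, the Java sketch below reads both keys from a `java.util.Properties` object and looks up a per-user limit. Only the two key names come from the commit message. The `user=number` line format of the quota file, the class name `QuotaFileLoader`, and the fallback value of `5` are assumptions made for this example and are not confirmed by the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; only the two config key names are taken from the commit message.
class QuotaFileLoader {
  static final String DEFAULT_APP_NUM_KEY = "rss.coordinator.quota.default.app.num";
  static final String QUOTA_PATH_KEY = "rss.coordinator.quota.default.path";

  // Parses lines of the ASSUMED form "user=number"; blank lines and '#' comments are skipped.
  static Map<String, Integer> parse(Iterable<String> lines) {
    Map<String, Integer> limits = new HashMap<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      String[] parts = line.split("=", 2);
      if (parts.length == 2) {
        limits.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
      }
    }
    return limits;
  }

  // Loads per-user limits from the configured path; returns an empty map if no path is set.
  static Map<String, Integer> load(Properties conf) throws IOException {
    String path = conf.getProperty(QUOTA_PATH_KEY);
    if (path == null) {
      return new HashMap<>();
    }
    return parse(Files.readAllLines(Paths.get(path)));
  }

  // Per-user limit if present, otherwise the configured default (placeholder fallback "5").
  static int limitFor(String user, Map<String, Integer> limits, Properties conf) {
    int fallback = Integer.parseInt(conf.getProperty(DEFAULT_APP_NUM_KEY, "5"));
    return limits.getOrDefault(user, fallback);
  }
}
```

Users listed in the file get their own limit, and everyone else falls back to the default setting, which matches the split between the two config keys described above.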
@jerqi
Contributor

jerqi commented Nov 22, 2022

Closed by #311.

@jerqi jerqi closed this as completed Nov 22, 2022