adapter: Add resource limit for compute credits #18846

Merged: 33 commits, Apr 21, 2023

@jkosh44 jkosh44 commented Apr 19, 2023

Adds a resource limit for compute credits per hour. Additionally, updates the resource limit error messages.

Resolves #18830

Motivation

This PR adds a known-desirable feature.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (https://github.com/MaterializeInc/cloud/issues/6216).
  • This PR includes the following user-facing behavior changes:
    • This release adds a max compute credits per hour resource limit

@jkosh44 jkosh44 marked this pull request as ready for review April 19, 2023 17:17
@jkosh44 jkosh44 requested a review from a team as a code owner April 19, 2023 17:17
@jkosh44 jkosh44 requested review from a team April 19, 2023 17:17
@jkosh44 jkosh44 requested a review from a team as a code owner April 19, 2023 17:17
@jkosh44
Contributor Author

jkosh44 commented Apr 19, 2023

@philip-stoev Feel free to delegate to someone else on the QA team. Also is there a better way to get the tests passing other than sprinkling a bunch of ALTER SYSTEM SET everywhere?

Comment on lines 82 to 83
/// The number of compute credits per hour.
pub compute_credits_per_hour: f64,
Contributor

It feels odd that this is a float (instead of e.g. an int representing 1000ths of a credit). I've traditionally been taught that anything accounting-related being a float is a code smell.

Contributor Author

What about Numeric (i.e. Decimal<13>), does that have the same code smell?

Contributor

AFAIK no

Contributor Author

I switched to Numeric then. I think it's a bit more straightforward than converting back and forth between 1000ths of a credit.
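
For illustration only (this is not code from the PR): a minimal Rust sketch of why floats raise eyebrows for accounting-style values. Repeatedly adding a fractional rate as `f64` drifts away from the exact total, while an integer count of thousandths of a credit stays exact. The function names are hypothetical, and the sketch uses integer thousandths rather than the Numeric type adopted above, since the Rust standard library has no decimal type.

```rust
/// Sums `n` copies of a fractional rate using binary floating point.
/// Rounding error can accumulate with each addition.
fn sum_f64(rate: f64, n: u32) -> f64 {
    (0..n).map(|_| rate).sum()
}

/// Sums `n` copies of a rate expressed in thousandths of a credit.
/// Integer addition is exact (barring overflow).
fn sum_millicredits(rate_millis: u64, n: u32) -> u64 {
    (0..n).map(|_| rate_millis).sum()
}
```

With these definitions, `sum_f64(0.1, 10)` does not compare equal to `1.0`, whereas `sum_millicredits(100, 10)` is exactly `1000`.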

@umanwizard
Contributor

Don't we need a cloud PR to add the new values to the size mappings?

@jkosh44
Contributor Author

jkosh44 commented Apr 19, 2023

Don't we need a cloud PR to add the new values to the size mappings?

Yes, but I think the cloud team is going to do that, @chaas can you confirm?

@umanwizard
Contributor

We should have an issue with the release-blocker tag for this to be added to cloud (per the PR checklist)

@chaas
Contributor

chaas commented Apr 19, 2023

size mappings

Filed an issue in cloud https://github.com/MaterializeInc/cloud/issues/6216

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Member

@benesch benesch left a comment

LGTM modulo bikeshed about max_credits_per_hour.

Maybe 64 was too low of a default. Should we set it to 1024 by default so you don't have to constantly adjust tests to bump the limit?

Resolved review threads (outdated):

  • src/adapter/src/catalog/builtin.rs
  • src/adapter/src/catalog/builtin_table_updates.rs (2 threads)
  • src/adapter/src/catalog/config.rs (4 threads)
  • src/controller/src/clusters.rs
  • src/sql/src/session/vars.rs

@benesch
Member

benesch commented Apr 19, 2023

Thanks again for turning this around so quickly!

jkosh44 and others added 3 commits April 19, 2023 18:18
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
@jkosh44
Contributor Author

jkosh44 commented Apr 20, 2023

Item No 3. The setting has "per hour" in its name, but is there anything timing-dependent here? From what I understand, it limits the current usage at any given time; that is, it is not possible to have 60 concurrent clusters running for 1 minute and consume 1 credit :-) Some customer confusion may arise as a result.

This could hopefully be solved with a better description for the variable itself. @MaterializeInc/docs any thoughts? The current description is in show.md and is "The maximum number of credits per hour in the region."

@jkosh44
Contributor Author

jkosh44 commented Apr 20, 2023

I'm going to merge this once the cloud PR is merged.

@jseldess
Contributor

jseldess commented Apr 20, 2023

Sorry for the delay, @jkosh44.

This could hopefully be solved with a better description for the variable itself. @MaterializeInc/docs any thoughts? The current description is in show.md and is "The maximum number of credits per hour in the region."

This is hard to do well because it ties into how we charge, but I agree that per_hour in the name makes this even harder. If this isn't a limit per hour, but rather a limit at any given point in time, like max_databases or max_clusters, I would remove per_hour from the name. I'm not sure I can come up with a better description, but here's an attempt:

The maximum credit usage in a region at any point in time. Credits are charged based on the number of replicas and their size.

Also, is this really per region and not per account?

@benesch
Member

benesch commented Apr 20, 2023

Item No 3. The setting has "per hour" in its name, but is there anything timing-dependent here? From what I understand, it limits the current usage at any given time; that is, it is not possible to have 60 concurrent clusters running for 1 minute and consume 1 credit :-) Some customer confusion may arise as a result.

This is hard to do well because it ties into how we charge, but I agree that per_hour in the name makes this even harder. If this isn't a limit per hour, but rather a limit at any given point in time, like max_databases or max_clusters, I would remove per_hour from the name.

The problem is that the unit is not "credits" but "credits per hour." I agree this is confusing!

Here's how I think about it: this limit is equivalent to a driving speed limit. We call that a "speed limit" or a "maximum speed", but the limit is expressed in miles per hour. You might even say "max mph." But you wouldn't want to say "max miles", as that would express a constraint on the maximum distance traveled, not the maximum rate at which distance is traveled (speed).

Would max_credit_rate be less confusing? (Or ... max_credit_speed?)

@benesch
Member

benesch commented Apr 20, 2023

The problem with max_credit_rate is that "credit rate" sounds like "the price I pay for credits", not "the rate at which I am depleting credits." Sigh. Naming is hard.

@sploiselle
Contributor

max_credit_burn? Or too negative?

@benesch benesch added the release-blocker Critical issue that should block *any* release if not fixed label Apr 20, 2023
@benesch
Member

benesch commented Apr 20, 2023

Filing as a release blocker because this needs to make the release tomorrow.

@jkosh44
Contributor Author

jkosh44 commented Apr 20, 2023

Here are the current proposals (plus two that I'm adding):

  • max_credits_per_hour
  • max_credit_rate
  • max_credit_burn
  • max_credit_limit
  • credit_limit

@benesch Do you want to just have the final say and pick one? We can always change it later since users never actually type in the field name.

Also,

Also, is this really per region and not per account?

Can you confirm this? I'm pretty sure the answer is yes.

@benesch
Member

benesch commented Apr 20, 2023

Also, is this really per region and not per account?

Can you confirm this? I'm pretty sure the answer is yes.

Yes, confirmed.

@benesch
Member

benesch commented Apr 20, 2023

@benesch Do you want to just have the final say and pick one?

Well, I like what we currently have (max_credits_per_hour)—after all I proposed it! But @jseldess is the one who has to explain it.

One more proposal to add: max_credit_consumption_rate.

I say we give it until 11am tomorrow for @jseldess and/or @frankmcsherry to express an opinion. If we don't hear from them, then let's go with what you have. Does that timing work for you, @jkosh44?

@jseldess
Contributor

The problem is that the unit is not "credits" but "credits per hour." I agree this is confusing!

@benesch, I definitely don't want to block, but I'm still having trouble understanding this. The credits per hour unit is for billing, correct? So if in a given hour I have 2 medium replicas (4 credits per hour each), for the full hour or any portion of it, we bill for 8 credits?

Assuming that's how it works, and it's probably not, this internal limit isn't using the unit the same way. Here, it's not for billing per hour but for limiting the number of resources we allow at any given point in time. Am I way off?

@benesch
Member

benesch commented Apr 20, 2023

@benesch, I definitely don't want to block, but I'm still having trouble understanding this. The credits per hour unit is for billing, correct? So if in a given hour I have 2 medium replicas (4 credits per hour each), for the full hour or any portion of it, we bill for 8 credits?

So, we bill by the second. Say you have those 2 medium replicas for 3 minutes each. We'll bill you for 2 replicas * 4 credits/hour/replica * 3 minutes * 1/60 hours/minute = 0.4 credits.
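
The per-second billing arithmetic above can be sketched as follows. This is a hypothetical helper, not Materialize's billing code; it assumes rates are held as whole thousandths of a credit per hour and that any fractional remainder is truncated, which is a simplification made only for this example.

```rust
/// Credits billed, in thousandths of a credit, for `replicas` identical
/// replicas that each consume `rate_millis_per_hour` (thousandths of a
/// credit per hour) and run for `seconds` seconds.
/// Assumption: integer division truncates any sub-thousandth remainder.
fn millicredits_billed(replicas: u64, rate_millis_per_hour: u64, seconds: u64) -> u64 {
    replicas * rate_millis_per_hour * seconds / 3600
}
```

For the case described above, `millicredits_billed(2, 4000, 180)` yields `400`, i.e. 0.4 credits for two medium replicas running three minutes each.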

The limit restricts the maximum instantaneous credit consumption rate. So if max_credits_per_hour were set to 6, you'd be allowed to have one medium replica (instantaneous credit consumption rate of 4 credits per hour), but not two, as that would result in an instantaneous credit consumption rate of 8 credits per hour, exceeding the maximum consumption rate of 6. But you could have one medium replica and one small replica, as that would be an instantaneous credit consumption rate of exactly 6 credits per hour.
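
A minimal sketch of that check, for illustration only. The `Size` enum and `within_limit` function are hypothetical names, and the rates (small = 2, medium = 4 credits per hour) are taken from the example above rather than from the real size mappings.

```rust
/// Hypothetical replica sizes with the rates implied by the example above.
#[derive(Clone, Copy)]
enum Size {
    Small,
    Medium,
}

impl Size {
    /// Instantaneous consumption rate in credits per hour (assumed values).
    fn credits_per_hour(self) -> u32 {
        match self {
            Size::Small => 2,
            Size::Medium => 4,
        }
    }
}

/// Returns true if the combined instantaneous rate of `replicas`
/// is at or below `max_rate` credits per hour.
fn within_limit(replicas: &[Size], max_rate: u32) -> bool {
    let total: u32 = replicas.iter().map(|s| s.credits_per_hour()).sum();
    total <= max_rate
}
```

With a limit of 6, `within_limit(&[Size::Medium, Size::Medium], 6)` is false, while `within_limit(&[Size::Medium, Size::Small], 6)` is true. How long the replicas run plays no part in the check.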

@benesch
Member

benesch commented Apr 20, 2023

I am increasingly thinking that max_credit_consumption_rate is more clear than max_credits_per_hour.

@jkosh44
Contributor Author

jkosh44 commented Apr 21, 2023

I say we give it until 11am tomorrow for @jseldess and/or @frankmcsherry to express an opinion. If we don't hear from them, then let's go with what you have. Does that timing work for you, @jkosh44?

Yep, that gives me 3 hours before the 2pm deadline which should be plenty.

@philip-stoev
Contributor

We can suggest users run this query?

SELECT SUM(s.credits_per_hour) AS total_credits_per_hour
FROM mz_clusters c
JOIN mz_cluster_replicas r ON c.id = r.cluster_id
JOIN mz_internal.mz_cluster_replica_sizes s ON r.size = s.size;

oh, I did not realize this column exists -- it does not seem to be exercised by the test suite.

Apart from that, the query is reasonable (except that it also counts the internal clusters, which the limit enforcement does not), so I think no further changes to SHOW or any of the other tables would be warranted.
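
To make the discrepancy concrete, here is a rough sketch of the two totals. The `Replica` struct and its `internal` flag are hypothetical; how the real enforcement identifies internal clusters is not shown in this thread, and whole-number rates are a simplification.

```rust
/// Hypothetical, simplified view of one replica row from the query above.
struct Replica {
    /// Assumed flag marking replicas that belong to internal clusters.
    internal: bool,
    /// Rate in credits per hour, simplified to a whole number.
    credits_per_hour: u32,
}

/// Total rate as the query computes it: every replica counts.
fn total_rate_all(replicas: &[Replica]) -> u32 {
    replicas.iter().map(|r| r.credits_per_hour).sum()
}

/// Total rate counting only replicas outside internal clusters,
/// which is what the limit enforcement is described as doing.
fn total_rate_user(replicas: &[Replica]) -> u32 {
    replicas
        .iter()
        .filter(|r| !r.internal)
        .map(|r| r.credits_per_hour)
        .sum()
}
```

With one internal replica at 2 credits per hour and one user replica at 4, the first total is 6 and the second is 4, so the query can overstate how close a region is to its limit.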

@philip-stoev
Contributor

I suspected this PR would have a merge skew conflict with a test-fixing PR that operated in the same area, so I kicked the merge skew checker, and indeed it is now red:

https://buildkite.com/materialize/tests/builds/54145#0187a3fd-4a99-42bc-b3a2-7e8550a99006

@jseldess
Contributor

Sorry to be so down to the wire here. I think it's hard to find a name that fully captures the meaning here, but I like max_credit_consumption_rate better than max_credits_per_hour. If we go with that, could we change the description to the following?

The maximum rate of credit consumption in a region. Credits are consumed based on the size of cluster replicas in use.

Lots that still needs to be unpacked, but probably not (just) in this variable description.

@jkosh44
Contributor Author

jkosh44 commented Apr 21, 2023

Ok, it's 11AM. I went with the name max_credit_consumption_rate and description "The maximum rate of credit consumption in a region. Credits are consumed based on the size of cluster replicas in use.". I'm setting this to auto-merge, so speak up and disable auto-merge if you have something to say.

@jkosh44 jkosh44 enabled auto-merge (squash) April 21, 2023 15:01
@jkosh44 jkosh44 merged commit a4053bb into MaterializeInc:main Apr 21, 2023
@jkosh44 jkosh44 deleted the max-compute-credits branch April 21, 2023 16:06
Labels: release-blocker (Critical issue that should block *any* release if not fixed)
Linked issue: Add max_credits_per_hour system limit
8 participants