Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[storage_gas] implement customizable storage gas config #3810

Merged
merged 1 commit into from Sep 16, 2022

Conversation

lightmark
Copy link
Contributor

@lightmark lightmark commented Sep 2, 2022

Description

We implement a customizable curve for storage-based gas base unit and map x <-[0, 10000] to y<-[0, 10000].
storage gas = min_gas + (max_gas - min_gas) * (current_usage / target_utilization)

The curve is a vector of point(x, y), both are basis points.

  1. curve's points' y value must be monotonically non-decreasing, and x values must be monotonically increasing.
  2. curve's points must start with a point of x = 0 and end with a point of x = 10000.

Test Plan

ut


This change is Reviewable

davidiw
davidiw previously requested changes Sep 6, 2022
Copy link
Contributor

@davidiw davidiw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff.

However, this is a breaking change.

We can't just delete GasParameter but we can stop using it.

We'll also need to figure out how to upgrade long lived testnet.

@lightmark lightmark changed the title [storage_gas] implement a 1/4 arc curve for storage gas params [storage_gas] implement customizable storage gas config Sep 9, 2022
@msmouse msmouse self-requested a review September 9, 2022 15:49
Copy link
Contributor

@wrwg wrwg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a compatible change. I have not seen a clean indicating on slack that we are ok with breaking changes now.

Frankly, I also think this is a lot of new code in the last minute. There are little tests with this code, or do I miss them?

/// epoch for gas calculation of the entire next epoch. -- The data is one
/// epoch older than ideal, but the Vm doesn't need to worry about reloading
/// gas parameters after the first txn of an epoch.
struct GasParameter has key, store {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't remove this without reset of the chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's something we'll discuss with @davidiw ... Yeah, I guess I have to keep and put it to the end of the file for now tho.

Copy link
Contributor

@vgao1996 vgao1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! Thanks for the hard work!

};
};
// Check the corner case that current_usage_bps drops before the first point.
let (left, right) = if (current_usage_bps < vector::borrow(points, i).x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could only happen when i == 0, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. it is a corner case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I wonder if it's clearer to replace the is with 0s?

Comment on lines +199 to +202
while (i <= len) {
let cur = if (i == 0) { &Point {x: 0, y: 0} } else { vector::borrow(points, i - 1) };
let next = if (i == len) { &Point {x: BASIS_POINT_DENOMINATION, y: BASIS_POINT_DENOMINATION} } else { vector::borrow(points, i) };
assert!(cur.x < next.x && cur.y <= next.y, error::invalid_argument(EINVALID_MONOTONICALLY_NON_DECREASING_CURVE));
i = i + 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you already checked the points at the two ends, you can just loop from i = 0 to (len - 2) inclusive and get rid of the special checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the points at the ends. but if doesn't tells me points[len - 1].x < end_point.x and points[len - 1].y <= end_point.y. So tho I have len points, actually we have len + 2 in total and have to compare len + 1 pairs that's why we loop i in [0, len], which has len + 1 iterations.

Point {x: 2000, y: 300},
Point {x: 3000, y: 600},
Point {x: 4000, y: 1000},
Point {x: 5000, y: 1500},
Copy link
Contributor

@vgao1996 vgao1996 Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msmouse when we discussed previously, you said something like "we want the the cost to be 1.15x of the base cost at 50% utilization". However here it's base + 0.15x the max price hike. Just to double check: which one is correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.15x the max price hike
^ is what I meant, sorry

@msmouse msmouse self-requested a review September 12, 2022 22:21
@lightmark lightmark force-pushed the storage_gas_curve branch 4 times, most recently from 14690e6 to 82e3a44 Compare September 14, 2022 21:39
Comment on lines +92 to +93
new_point(9500, 6372),
new_point(9900, 9138),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful

@lightmark lightmark enabled auto-merge (squash) September 15, 2022 19:24
@@ -44,4 +46,11 @@ module aptos_framework::gas_schedule {
// Need to trigger reconfiguration so validator nodes can sync on the updated gas schedule.
reconfiguration::reconfigure();
}

public fun set_storage_gas_config(aptos_framework: &signer, config: StorageGasConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this extra function? Can't we just call storage_gas::set_config directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can. But I think @vgao1996 insists we control gas-related params from a single module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just call storage_gas::set_config directly?

I'm totally fine with calling storage_gas::set_config directly.

@vgao1996 insists we control gas-related params from a single module.

Well I did prefer us having a single entity to manage all gas-related stuff, but my definition for this single entity is pretty broad -- the combined group of "gas_schedule" & "storage" gas is conceptually still a single entity to me ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that being said I remember there were some technical issues preventing us from having everything in the same module in the first place, that is, modules cannot have cyclic dependencies and I have a feeling that the same issue also applies here.

Correct me if I'm wrong:

When we update the gas schedule, we want to call reconfigure(), so the module this entry point is in needs to depend on reconfiguration. However, storage_gas has an on_reconfig event so the reconfiguration module needs to depend on it. Therefore we cannot really call reconfigure() in storage_gas, but have to do it in a different module...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vgao1996 is correct about the reconfig. Yes, we want reconfig after set_storage_gas_config but we cannot do it inside storage_gas. we can only set_config inside.

*borrow_global_mut<StorageGasConfig>(@aptos_framework) = config;
}

public(friend) fun initialize(aptos_framework: &signer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not compatible with LLT as we'd not redo genesis. @areshand @wrwg Do you know if a module's initialize function would be called on a package republish if the module is brand new?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhhhh, we definitely need a way to initialize the module. But what do you mean by republish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright. It seems I have to rename it as "init_module"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you should be able to do is add an init_module which calls this function here. This should cover the case when the module is loaded the first time during upgrade, whereas this function covers genesis.

}

public(friend) fun on_new_block(epoch: u64) acquires StateStorageUsage {
assert!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add these checks now? If this fails, it crashes the entire network

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the check doesn't fail, the next line will also fail, right?

usage: Usage,
}

public(friend) fun on_reconfig() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can delete friend-only functions now I think. cc @wrwg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me try.

}

// ======================== deprecated ============================
friend aptos_framework::reconfiguration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just remove this then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it breaks the module_upgrade_test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird cc @wrwg. I'm not aware of friend statements being checked in compatibility validation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The friend_a_private feature is merged, but the feature isn't on by default. And it can't be turned on before rolled out, strictly spoken (because of replay).

For now any changes to friend function or module declarations will cause the upgrade to fail. But we can clean this once we upgraded testnet. So keep everything for now, remove later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it changes visibility. It took me a while to figure it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The friend M decls need to be a superset of the old module. With the new feature this check is not longer happening, but you can't use that yet.

friend aptos_framework::genesis;
friend aptos_framework::reconfiguration;

const ESTORAGE_GAS_CONFIG: u64 = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not visible to end users, but can we still add comments here explaining what these error codes are for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me it is quite self-explanatory. But I can add some.

*borrow_global_mut<StorageGasConfig>(@aptos_framework) = config;
}

fun init_module(aptos_framework: &signer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you verify that this works in the context of a package upgrade?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an easy way to do that?
is that something fn init_module() tries to do?

calculate_gas(config.target_usage, usage, &config.write_curve)
}

public(friend) fun on_reconfig() acquires StorageGas, StorageGasConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this fail? If this fails, it crashes the network

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the code triggered by reconfig will crash the network if fails. I think we avoid all the possible overflow issues. But if you can anything risky, I am glad to discuss.

@github-actions
Copy link
Contributor

Forge is running suite land_blocking on 82e3a4493d70d868862a9b7c4434f7e01e12398b

Forge is running suite compat on testnet ==> 82e3a4493d70d868862a9b7c4434f7e01e12398b

✅ Forge suite land_blocking success on 82e3a4493d70d868862a9b7c4434f7e01e12398b

performance benchmark with full nodes : 7933 TPS, 4998 ms latency, 8700 ms p99 latency,no expired txns
Test Ok

✅ Forge suite compat success on testnet ==> 82e3a4493d70d868862a9b7c4434f7e01e12398b

Compatibility test results for testnet ==> 82e3a4493d70d868862a9b7c4434f7e01e12398b (PR)
1. Check liveness of validators at old version: testnet
compatibility::simple-validator-upgrade::liveness-check : 8158 TPS, 4646 ms latency, 7300 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 82e3a4493d70d868862a9b7c4434f7e01e12398b
compatibility::simple-validator-upgrade::single-validator-upgrade : 6044 TPS, 5980 ms latency, 8000 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 82e3a4493d70d868862a9b7c4434f7e01e12398b
compatibility::simple-validator-upgrade::half-validator-upgrade : 5920 TPS, 6279 ms latency, 8500 ms p99 latency,no expired txns
4. upgrading second batch to new version: 82e3a4493d70d868862a9b7c4434f7e01e12398b
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7603 TPS, 4657 ms latency, 7000 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet ==> 82e3a4493d70d868862a9b7c4434f7e01e12398b passed
Test Ok

Forge is running suite compat on testnet ==> dca00b535428b63341fbc4733f2b2c30a1973b00

Forge is running suite land_blocking on dca00b535428b63341fbc4733f2b2c30a1973b00

✅ Forge suite land_blocking success on dca00b535428b63341fbc4733f2b2c30a1973b00

performance benchmark with full nodes : 7599 TPS, 5221 ms latency, 7500 ms p99 latency,no expired txns
Test Ok

✅ Forge suite compat success on testnet ==> dca00b535428b63341fbc4733f2b2c30a1973b00

Compatibility test results for testnet ==> dca00b535428b63341fbc4733f2b2c30a1973b00 (PR)
1. Check liveness of validators at old version: testnet
compatibility::simple-validator-upgrade::liveness-check : 8270 TPS, 4524 ms latency, 8300 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: dca00b535428b63341fbc4733f2b2c30a1973b00
compatibility::simple-validator-upgrade::single-validator-upgrade : 5807 TPS, 6176 ms latency, 8300 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: dca00b535428b63341fbc4733f2b2c30a1973b00
compatibility::simple-validator-upgrade::half-validator-upgrade : 6128 TPS, 6016 ms latency, 7900 ms p99 latency,no expired txns
4. upgrading second batch to new version: dca00b535428b63341fbc4733f2b2c30a1973b00
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7757 TPS, 4660 ms latency, 8800 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet ==> dca00b535428b63341fbc4733f2b2c30a1973b00 passed
Test Ok

Copy link
Contributor

@movekevin movekevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved pending the potential risk of storage_gas::on_reconfig() aborting. I double checked and didn't see any potential aborts beyond overflow/underflow. Would be great to have @vgao1996 and @davidiw double check this as well

Comment on lines +205 to +240
fun calculate_gas(max_usage: u64, current_usage: u64, curve: &GasCurve): u64 {
let capped_current_usage = if (current_usage > max_usage) max_usage else current_usage;
let points = &curve.points;
let num_points = vector::length(points);
let current_usage_bps = capped_current_usage * BASIS_POINT_DENOMINATION / max_usage;

// Check the corner case that current_usage_bps drops before the first point.
let (left, right) = if (num_points == 0) {
(&Point {x: 0, y: 0}, &Point {x: BASIS_POINT_DENOMINATION, y: BASIS_POINT_DENOMINATION})
} else {
let (i, j) = (0, num_points - 1);
while (i < j) {
let mid = j - (j - i) / 2;
if (current_usage_bps < vector::borrow(points, mid).x) {
j = mid - 1;
} else {
i = mid;
};
};
if (current_usage_bps < vector::borrow(points, 0).x) {
(&Point {x: 0, y: 0}, vector::borrow(points, 0))
} else {
// Check the corner case that current_usage_bps drops after the last point.
(
vector::borrow(points, i),
if (i == num_points - 1) {
&Point { x: BASIS_POINT_DENOMINATION, y: BASIS_POINT_DENOMINATION }
} else {
vector::borrow(points, i + 1)
}
)
}
};

curve.min_gas + (curve.max_gas - curve.min_gas) * (left.y + (current_usage_bps - left.x) * (right.y - left.y) / (right.x - left.x)) / BASIS_POINT_DENOMINATION
}
Copy link
Contributor Author

@lightmark lightmark Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidiw @vgao1996 Can our SMART smart contract engineers help review the code to reassure us that it will not abort in any cases?

  1. line 209 would not overflow because, line 186 checks max_usage * BASIS_POINT_DENOMINATION <= u64::MAX
  2. Please review the binary search logic is valid.
  3. Please check 239 will not overflow or underflow( such as current_usage_bps - left.x).

Some background:
in validate_curve(), we check 0 <= point.x, point.y <= BASIS_POINT_DENOMINATION, x is strictly monotonically increasing while y is monotonically non-decreasing;
max_gas * BASIS_POINT_DENOMINATION <= u64::MAX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah when I read this part of the code previously I also checked that the code would not overflow/underflow

@davidiw davidiw dismissed their stale review September 16, 2022 01:08

Unblocking... Haven't reviewed

@lightmark lightmark merged commit 29f746b into main Sep 16, 2022
@lightmark lightmark deleted the storage_gas_curve branch September 16, 2022 01:08
Comment on lines +148 to +149
let item_curve = base_8192_exponential_curve(10, 10000);
let byte_curve = base_8192_exponential_curve(1, 1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lightmark This is wrong.. the base 8192 curve is for max / min = 100x, the base 1M curve is for 1000x..

But none the less, we need to update these together with @vgao1996 's initial gas parameters. @vgao1996 if I'm reading your pending diff correctly, we should set min_price for read to be 1000, which is the value of load_data.per_byte in the current schedule, right? And according to the discussion yesterday, the max will be 100x that, if we are going with the base 8192 curve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set both in the same governance proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And / or we should send another PR to make init_module set things to the "correct" numbers already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants