[Feature Request] New Storage Charging Scheme #10092

Open
msmouse opened this issue Sep 18, 2023 · 8 comments
Assignees
Labels
enhancement New feature or request stale-exempt Prevents issues from being automatically marked and closed as stale

Comments

@msmouse
Contributor

msmouse commented Sep 18, 2023

Charge for IOPS, disk bandwidth and disk space in a fair and straightforward way.

Motivation

The current storage fee structure was designed before the storage fee was introduced; it needs a refresh.

Pitch

  • Storage Fee (for disk space, charged in absolute APT value)
    • per slot fee
      • Reflects the permanent structural cost that allocating a new slot imposes on the global state tree, i.e., the internal nodes on the tree.
      • This should be priced according to our “target tree size” instead of dynamically according to the current height of the tree, because otherwise there’s incentive to reserve space for future usage.
      • refundable, either fully or with degradation according to the life span.
    • per permanent (state) byte fee
      • for state value bytes, reflects the cost of storing state value bytes, cheaper than slot_fee / 4096, for example.
      • Charged / Refunded on state slot allocation or modification.
    • per ephemeral (ledger) byte fee
      • Transaction itself and transaction outputs, including the events, and the write set.
        • Notice that the size of each write op in the write set is the same as the latest size of the entire slot.
        • The space overhead of updating the authentication data structures falls into this as well -- each write op results in some internal nodes being overwritten on the Jellyfish Merkle Tree
      • Reflects the cost of storing (the yet-to-prune window of) the ledger history.
      • These are ephemeral bytes, hence they can be cheaper than the permanent bytes, but they need to be expensive enough that one can’t fill the DB by dumping bytes into the ledger history.
    • Removal of the “free quota”. The tradeoff becomes straightforward: every time a slot gets updated, the whole slot is charged for the ephemeral bytes dumped to the ledger history.
  • Read IO Gas (for disk IOPS and bandwidth resources, priced dynamically according to the gas price)
    • per state slot read gas
      • Reflects the cost of going down the global state tree from the root to the leaf, loading all the nodes (random reads)
    • State slot bytes gas
      • Roughly every 4KB-ish of bytes loaded from the state DB maps to a random IO, so the read can be charged in 4KB (or slightly smaller) steps.
      • To prevent large loads as a way to attack, the state slot size will be subject to a stricter hard limit (16KB, for example).
  • Write IO Gas (dynamically priced, same as read)
    • Mostly IO bandwidth consumption by the “ephemeral bytes” described above. Although both stem from the same operations done by the node, the gas reflects their latency aspect while the fee reflects their disk space aspect.
    • Will be minimal under normal load when the gas price is low.
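The read-side charging described above (a per-slot charge plus a charge per 4KB step, under a hard slot size limit) could be sketched roughly as follows. This is only an illustration in Python: the two gas constants are made-up placeholders, and only the 4KB step and the example 16KB limit come from the pitch.

```python
# Hypothetical sketch of the proposed read IO gas; not the actual Aptos gas schedule.
READ_STEP_BYTES = 4096        # "4KB (or slightly smaller) steps" from the pitch
MAX_SLOT_BYTES = 16 * 1024    # example hard limit from the pitch
PER_SLOT_READ_GAS = 300       # assumed placeholder: cost of walking root -> leaf
PER_STEP_READ_GAS = 100       # assumed placeholder: cost of one ~4KB random IO


def read_io_gas(slot_bytes: int) -> int:
    """Gas for reading one state slot of `slot_bytes` bytes."""
    if slot_bytes < 0:
        raise ValueError("slot size cannot be negative")
    if slot_bytes > MAX_SLOT_BYTES:
        raise ValueError("slot exceeds the hard size limit")
    # Round up to whole steps: 1 byte and 4096 bytes both cost one step.
    steps = (slot_bytes + READ_STEP_BYTES - 1) // READ_STEP_BYTES
    return PER_SLOT_READ_GAS + steps * PER_STEP_READ_GAS
```

With these placeholder numbers, a 1-byte slot and a 4096-byte slot cost the same to read, and the cost only rises once the slot crosses into the next 4KB step.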

Additional context

AIP-17 Storage Fee
AIP-38 Deprecate Storage Gas Curves
AIP-32 Storage Deletion Refund

@davidiw
Contributor

davidiw commented Sep 24, 2023

wen?

@lightmark
Contributor

lightmark commented Sep 26, 2023

Removal of the “free quota”.

how to deal with current data with free quota?

@alnoki
Contributor

alnoki commented Sep 26, 2023

Removal of the “free quota”.

how to deal with current data with free quota?

Does this mean that there will be an implicit optimal storage size?

Or will the incentive then be to cram as much data as possible into a single storage slot to avoid per-item costs?

What about incentivizing conformity to cache sizes (e.g. cache hits like in a B+ tree in filesystems)?

@davidiw
Contributor

davidiw commented Oct 2, 2023

How does the base fee for a slot work? We have both ephemeral charge and the original value fee? Or?

per permanent (state) byte fee
for state value bytes, reflects the cost of storing state value bytes, cheaper than slot_fee / 4096, for example.
Charged / Refunded on state slot allocation or modification.

this kinda sounds like we'll have two fees, one for the persistent and another ephemeral. From there, if we increase state, we get more towards persistent, if we decrease, we get a refund?

@msmouse
Contributor Author

msmouse commented Oct 11, 2023

how to deal with current data with free quota?

Existing slots, when updated, will be charged according to the new scheme.

Does this mean that there will be an implicit optimal storage size?
Or will the incentive then be to cram as much data as possible into a single storage slot to avoid per-item costs?

No. The optimal size will be (rightfully) dependent on how frequently you write to the slot.

What about incentivizing conformity to cache sizes (e.g. cache hits like in a B+ tree in filesystems)?

That's more relevant on the read side, which is why read is proposed to be charged at 4k steps.

How does the base fee for a slot work? We have both ephemeral charge and the original value fee? Or?

  • Base slot fee charged at allocation, reflecting the structural cost on the SMT.
  • Ephemeral bytes charged at write time, reflecting short term disk space occupation by the ledger history (which will be pruned later)

From there, if we increase state, we get more towards persistent, if we decrease, we get a refund?

  • If I read you right, yes. If you increase the slot size, you pay more permanent bytes fee; If you make it smaller, you get some refunds.
  • "Ephemeral bytes" is, again, for the bytes in the ledger history caused by touching the slots.
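To make the charge/refund behaviour above concrete, here is a rough Python sketch of the permanent part of the storage fee. The fee values are invented placeholders (chosen so that the byte fee is below slot_fee / 4096, as the pitch suggests), and it assumes a full refund on deletion; the proposal also allows a refund that degrades with the slot's life span, which is not modeled here.

```python
from typing import Optional

# Hypothetical placeholder prices; the issue does not give concrete numbers.
SLOT_FEE = 50_000      # per-slot fee, charged at allocation
STATE_BYTE_FEE = 10    # per permanent byte; below SLOT_FEE / 4096 (about 12.2)


def state_fee_delta(old_size: Optional[int], new_size: Optional[int]) -> int:
    """Permanent storage fee for one write to a slot.

    `None` means the slot does not exist (before or after the write).
    A positive result is a charge, a negative result is a refund.
    Assumes full refunds, which is only one of the options in the pitch.
    """
    delta = 0
    if old_size is None and new_size is not None:
        delta += SLOT_FEE   # slot allocation
    elif old_size is not None and new_size is None:
        delta -= SLOT_FEE   # slot deletion refund
    # Growing the value costs more permanent bytes; shrinking refunds some.
    delta += STATE_BYTE_FEE * ((new_size or 0) - (old_size or 0))
    return delta
```

The ephemeral (ledger) bytes fee is charged separately on every write and is never refunded, so it is deliberately left out of this function.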

@alnoki
Contributor

alnoki commented Oct 12, 2023

@msmouse thank you for the above

The optimal size will be (rightfully) dependent on how frequently you write to the slot.

That's more relevant on the read side, which is why read is proposed to be charged at 4k steps.

Base slot fee charged at allocation, reflecting the structural cost on the SMT.
Ephemeral bytes charged at write time, reflecting short term disk space occupation by the ledger history (which will be pruned later)

It seems like the optimal space for a given slot is then a function of specific use: e.g. if I read a slot regularly and write infrequently, then I would tend toward 4k size, whereas if I write more often than I read, it changes the optimal size

Correct? What would be optimal size for frequent writing?

@msmouse
Contributor Author

msmouse commented Oct 12, 2023

if I read a slot regularly and write infrequently, then I would tend toward 4k size, whereas if I write more often than I read, it changes the optimal size.

Exactly.

What would be optimal size for frequent writing?

A nuance here is the breakdown of the ephemeral bytes fee for a write operation: even if the slot is only 1 byte large, all the nodes along the route from the leaf node representing the 1 byte value to the SMT root will be written down. As a result, there will be a constant ephemeral bytes charge associated with an individual write operation. One can also think of it as "per slot update fee".

Then the ultimate trade-off for the optimal size of a frequently written item really comes down to this formula:
Each time a slot is written:
update_fee = per_slot_update_fee + per_byte_update_fee * num_bytes

Assuming most of the content of a slot gets updated pretty frequently: when roughly num_bytes < 2 * (per_slot_update_fee / per_byte_update_fee), it's fairly clearly favorable to keep it as a single slot, because every time you touch a slot you automatically pay for per_slot_update_fee / per_byte_update_fee bytes' worth of overhead to begin with.

While if only part of the content is truly frequently updated, it can still be favorable to break it up into two items and enjoy some savings.
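A small Python sketch of the formula and the two cases just described, with placeholder prices (the fee ratio of 2048 is borrowed from the example further down; the real numbers are not decided yet):

```python
# Hypothetical prices; only the shape of the formula comes from the comment above.
PER_BYTE_UPDATE_FEE = 1
PER_SLOT_UPDATE_FEE = 2048 * PER_BYTE_UPDATE_FEE


def update_fee(num_bytes: int) -> int:
    """Ephemeral fee for one write to a slot holding `num_bytes` bytes."""
    return PER_SLOT_UPDATE_FEE + PER_BYTE_UPDATE_FEE * num_bytes


def fee_split_both_touched(bytes_1: int, bytes_2: int) -> int:
    """Content split into two slots, and one update touches both slots."""
    return update_fee(bytes_1) + update_fee(bytes_2)


def fee_split_hot_only(hot_bytes: int) -> int:
    """Content split into a hot and a cold slot; the update touches only the hot one."""
    return update_fee(hot_bytes)


if __name__ == "__main__":
    # 1000 bytes that all change together: splitting pays the per-slot part twice.
    print(update_fee(1000), fee_split_both_touched(500, 500))
    # Only 500 of the 1000 bytes are hot: splitting avoids rewriting the cold half.
    print(update_fee(1000), fee_split_hot_only(500))
```

In the first case the split layout costs one extra per_slot_update_fee per update; in the second it saves the per-byte fee on the cold bytes.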

A more complicated example:

let's say per_slot_update_fee / per_byte_update_fee = 2048;

struct S {
    a: Type256Bytes, // a 256-byte value
    b: Type128Bytes, // a 128-byte value
}

Let's say b is updated only once for every 4 updates of a. It's still favorable to keep them together: if they are stored separately, every update of b pays 2048 bytes' worth of "per slot update fee", whereas keeping b inside the single slot wastes only 4 x 128 = 512 bytes' worth of overhead across the updates of a.
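The arithmetic for this example can be checked with a short Python sketch. It measures fees in "byte equivalents" (per_byte_update_fee = 1, so per_slot_update_fee = 2048) and assumes the update of b happens in the same write as one of the 4 updates of a; both are assumptions made for the illustration.

```python
# Fees in byte equivalents: per_slot_update_fee / per_byte_update_fee = 2048.
PER_BYTE_UPDATE_FEE = 1
PER_SLOT_UPDATE_FEE = 2048

SIZE_A = 256              # Type256Bytes
SIZE_B = 128              # Type128Bytes
A_UPDATES_PER_CYCLE = 4   # b changes once per 4 updates of a


def update_fee(num_bytes: int) -> int:
    return PER_SLOT_UPDATE_FEE + PER_BYTE_UPDATE_FEE * num_bytes


def fee_together() -> int:
    """One slot holding a and b: every update of a rewrites b as well."""
    return A_UPDATES_PER_CYCLE * update_fee(SIZE_A + SIZE_B)


def fee_separate() -> int:
    """Two slots: 4 writes to a's slot plus 1 write to b's slot per cycle."""
    return A_UPDATES_PER_CYCLE * update_fee(SIZE_A) + update_fee(SIZE_B)


if __name__ == "__main__":
    print("together:", fee_together())   # 4 * (2048 + 384)
    print("separate:", fee_separate())   # 4 * (2048 + 256) + (2048 + 128)
```

Under these assumptions the single-slot layout costs 9728 per cycle against 11392 for the split layout. If b were instead updated in a write of its own, the single slot would be written 5 times per cycle and the comparison would need to be redone.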

You get the idea.

(numbers will become concrete down the road)


This issue is stale because it has been open 45 days with no activity. Remove the stale label or comment - otherwise this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Nov 27, 2023
@lbmeiyi lbmeiyi added stale-exempt Prevents issues from being automatically marked and closed as stale and removed Stale labels Nov 27, 2023
8 participants