DAWN-472 ⁃ Bandwidth Rate Limiting & Storage Usage Limits #353
Proposed Design
Bandwidth rate limiting (BRL) will automatically limit computational usage by assuming that each message uses the maximum allowed computational time. If BRL is not part of hard consensus, then it can be implemented using any algorithm an individual producer wants, including billing bandwidth sequentially as transactions come in, using locks, etc. Each transaction contains a number of messages, and each message defines a set of provided permissions, which include the account authorizing the transaction.
All of these calculations should be implemented in the producer plugin, not in the chain plugin. The only consensus state information needed for this is the current staking balance; all other information can be tracked by nodes outside the consensus process. All full nodes need to run this, not just producers: anyone participating in the P2P network needs to help limit abuse.
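The worst-case billing idea above can be sketched as a small quota filter: charge every admitted message as if it used the maximum allowed computation time, then refund the unused portion once actual usage is known. This is a minimal illustration, not EOS.IO code; the class and member names (`RateLimiter`, `admit`, `settle`) and the microsecond quotas are all assumptions for the example.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch of worst-case computation billing: admit a message
// only if charging the worst case still fits the account's quota.
class RateLimiter {
public:
   RateLimiter(uint64_t max_per_msg_us, uint64_t window_quota_us)
      : max_per_msg_us_(max_per_msg_us), window_quota_us_(window_quota_us) {}

   // Returns true if the message is admitted; charges the worst case up front.
   bool admit(const std::string& account) {
      uint64_t& used = used_us_[account];
      if (used + max_per_msg_us_ > window_quota_us_)
         return false;            // quota would be exceeded, reject
      used += max_per_msg_us_;    // bill assuming maximum execution time
      return true;
   }

   // After execution, refund the gap between worst case and actual usage.
   void settle(const std::string& account, uint64_t actual_us) {
      used_us_[account] -= (max_per_msg_us_ - actual_us);
   }

private:
   uint64_t max_per_msg_us_;   // worst-case charge per message (e.g. 1 ms)
   uint64_t window_quota_us_;  // per-account quota for the current window
   std::map<std::string, uint64_t> used_us_;
};
```

With a 1 ms worst case and a 2.5 ms quota, an account can have at most two messages in flight; refunding actual usage afterwards frees room for more.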
Dan and I had a discussion about this issue today. Here is a summary of my understanding of what we concluded (as well as some of my later additional thoughts regarding the database memory usage limit design that we discussed):

Bandwidth and Computation Rate Limiting
Transactions are received from the network and put inside a ring buffer to later be scheduled into cycles and processed by other threads. But before being placed in the ring buffer, each transaction must pass through a filter (call it the "bandwidth/computation filter"). The bandwidth/computation filter maintains its own database (a bandwidth/computation quota database), separate from the databases holding contract state that are accessed during WebAssembly execution. The filter receives transactions from the network and then does a check / quota update for each of the accounts to be billed for the bandwidth and computation usage of that transaction (these are the accounts that authorized the transaction). The check / quota update consists of:
If any of the limits are exceeded, the transaction is rejected right there. Otherwise, it moves on to the next stage: placement in the ring buffer to later be scheduled and processed.

To calculate a new resource consumption value assuming the worst case, we need to determine what the worst-case values should actually be for both resource types in any given block. For ease of implementing the later resource consumption update stage, it would be helpful if these worst-case values were the same constants for all transactions. Ideally, they would remain constant over time as well, but it is still reasonable if they are only constant within any given block.

The worst-case value for computation rate limiting is the maximum time a contract is allowed to execute; for example, that might be 1 ms. The worst-case value for bandwidth rate limiting determines the maximum allowed size of a processed transaction. On one hand, it should not be too small, so that useful transactions are not rejected. On the other hand, a larger worst-case value means users need to keep a larger buffer in their quotas to be able to transact at all, even if the transaction they broadcast ends up (even after being processed) far smaller than the maximum allowed processed transaction size.

During the later stages of the transaction's lifetime, it will ultimately either be rejected for any of a variety of possible reasons or be accepted into a block. In either case, the transaction (which, if not rejected, is now a larger processed transaction that also includes metadata on how long the block producer claimed the contract took to run) needs to return to the bandwidth/computation filter for the resource consumption update stage.
This update stage reduces the resource consumption values (for each of the relevant accounts) to reflect the fact that the transaction did not (at least in most cases) actually use the maximum amount of computational resources (e.g. contract execution took less than 1 ms of CPU time) or bandwidth resources (i.e. the processed transaction size is less than the maximum processed transaction size). In the case of transaction rejection, the values are updated by subtracting the worst-case delta values that were added earlier. This stage exists to be fair to users: they are not billed at worst-case rates if they did not use resources at that level, and not billed at all if the transaction was rejected.

Database Memory Usage Limits
Limiting database memory usage works very differently from the bandwidth and computation rate limiting described above. The current memory usage of a contract would be stored in the regular database of the contract's account, which is locked whenever access to it is needed during WebAssembly execution, rather than in a separate database as bandwidth and computation rate limiting use. EOS.IO database management code could then simply reject transactions that try to use more memory than they are allowed according to their current staked EOS tokens. The justification for this design comes from the following facts. First, the actual net change in a contract's database memory usage is not known until after the contract completes execution. Second, rejecting a transaction for using too much database memory requires at least two pieces of information: the amount of database memory the contract is using after contract execution, and the number of EOS tokens staked in the contract account.
The staked tokens information is stored in the regular database of the contract's account, so both pieces of information are available under the same lock. The negative consequence of that design (which I don't like, but I haven't yet thought through other possible designs to see if they could actually perform better) is that any time a contract needs to increase or decrease memory usage (even by a single byte), it needs to include the contract account as part of its required read/write scope.

At least with some contracts, this isn't a big deal in terms of performance. A transfer transaction for a currency contract would only need to include the currency contract's account in its scope when the transfer actually creates or removes a balance entry.

I do think this is a slight downgrade in terms of the user experience. While the required scopes could be automatically determined by running the user's transaction in a simulated node, this would not be a perfect process because it would be based on old blockchain state. For example, Alice may run her transfer through the simulated node and get one set of required scopes, only for the blockchain state to change before her transaction executes in a way that changes which scopes the transfer actually needs.

Perhaps the currency contract designer will choose to not remove database objects even if an account's balance goes to 0. In this case the required scope determined by the simulated node may be over-conservative (including the currency contract's account even when no memory change will actually occur). But the point is that this design of database memory usage limits requires contract developers to make these sorts of trade-offs.

Also, the previous example only looked at a currency contract. Other contracts may have more difficulty avoiding memory usage changes in most of their transactions. For example, it is highly unlikely that most transactions for an exchange contract will lead to no net change in memory usage (bids/asks need to be created, and zero or more existing bids/asks may need to be removed as orders are filled). In the case of an exchange contract, this design adds no additional overhead because all the transactions require the exchange account in the scope anyway (bids/asks are stored within the database of the exchange account).
But there may be many other contracts we have not thought of that will be burdened with much less concurrency (and thus less transaction throughput) because of this design of database memory usage limits.
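The post-execution memory check described above can be sketched as follows. This is an illustrative sketch only: the `AccountRecord` struct, the `kBytesPerStakedToken` conversion rate, and the function name are assumptions, not EOS.IO constants or API.

```cpp
#include <cstdint>

// Hypothetical per-account record, stored in the contract account's
// regular database (so one lock covers both fields).
struct AccountRecord {
   uint64_t staked_tokens = 0;  // EOS tokens staked by the contract account
   uint64_t used_memory  = 0;   // bytes of database memory currently in use
};

constexpr uint64_t kBytesPerStakedToken = 1024;  // assumed conversion rate

// After the contract finishes executing, apply the net memory delta and
// reject (returning false, leaving state untouched) if the result exceeds
// the allowance implied by the staked tokens.
bool apply_memory_delta(AccountRecord& acct, int64_t delta_bytes) {
   int64_t new_used = static_cast<int64_t>(acct.used_memory) + delta_bytes;
   if (new_used < 0) new_used = 0;  // memory usage cannot go negative
   uint64_t allowed = acct.staked_tokens * kBytesPerStakedToken;
   if (static_cast<uint64_t>(new_used) > allowed)
      return false;                 // transaction must be rejected
   acct.used_memory = static_cast<uint64_t>(new_used);
   return true;
}
```

Note that the delta is only known after execution, which is exactly why the check must run at this point rather than in the pre-execution filter.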
Database Memory Accounting
Currency Contract - on creation of new balances or emptying of balances

Solutions
Make the billing flexible so that developers can pass the expense (and locking requirements) on to their users. In this case the transaction would declare which scope to bill for storage: either the "authorizing account" or the "contract account", as these are the two parties to the execution of the message. With this arrangement, an escrow contract could ask the escrow creator to pay the cost of storage. The creator would know the storage will be freed at the end of the transaction, and therefore using the escrow contract would be a net cost of 0 EOS. Under this arrangement the "voter" or "poster" would cover the cost of creating the storage for their vote, but again they could recover the cost when the post or vote expires. This technique gives contract developers the greatest flexibility, but may push the problem back to user experience by requiring either temporary holding of EOS or locking of the contract scope.

Note about Design History
The original design of EOS required all contracts to keep their state within their own scope, so the billing / scoping / parallelism constraints were not a problem. This added design complexity around storage billing is the result of a breakthrough in enabling more scalable apps via user-local storage rather than contract-side storage. Our goal is to keep the performance benefits of user-local storage without abuse-prevention accounting getting in the way.
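The flexible-billing idea above amounts to each message declaring which of its two parties pays for storage. A minimal sketch, assuming hypothetical names (`StoragePayer`, `Message`, `billed_account`) that are not EOS.IO API:

```cpp
#include <string>

// The two parties to a message's execution that could be billed for storage.
enum class StoragePayer { authorizing_account, contract_account };

// Hypothetical message carrying an explicit storage-billing declaration.
struct Message {
   std::string authorizer;  // account that authorized the transaction
   std::string contract;    // contract account handling the message
   StoragePayer payer = StoragePayer::authorizing_account;
};

// Returns the account whose storage quota is charged for this message.
const std::string& billed_account(const Message& m) {
   return m.payer == StoragePayer::contract_account ? m.contract
                                                    : m.authorizer;
}
```

Under this sketch, an escrow contract would set `payer = authorizing_account` so the escrow creator fronts the storage cost and recovers it when the escrow is freed.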
Some random thoughts on how to implement database memory accounting efficiently:

The Relevant Data
Relevant Actions
Required Invariants
Assumptions
Atomic Algorithm
The biggest problem with our parallel algorithm is the UNDO database, which must accurately revert a transaction if it fails. We cannot use the traditional approach of "backing up data before modifying it" because that would be a sequential process (multiple transactions attempting to do this at the same time would be a mess). Instead, we need atomic update and atomic undo.
The key to making this work is a viable undo infrastructure that will accurately revert the used_memory without requiring locks on this field. To do this I imagine a new kind of database table that does not use the current UNDO infrastructure. Instead it depends upon each thread creating a stack of "atomic do" and "atomic undo" operations. Furthermore, it requires that reading data modified by these atomic operations is only possible "while producing", and that no side effects other than "success | failure" are possible based on the result of reading said memory.

If we maintain this pattern and have three threads accessing the total_used property such that not all 3 can be admitted at once, then we have the following guarantee: none, A, B, C, AB, AC, or BC will be the only valid acceptances. This works because we always increment first, then check, then decrement on failure. At the time we check, the used-space total may be overstated by concurrent reservations that will later unwind; when we fail, we unwind. The worst-case scenario is that all 3 add, then all 3 check, then all 3 unwind. This will happen when storage is near full.

The above is only true when contracts attempt to increase storage. If one of the messages decreases storage, then when it unwinds, total storage use could increase again. This means that after all transactions either succeed or fail, the total used might be greater than the amount allowed. In this case the block producers can freeze the account until it is funded with more capacity. Letting it slide could open an attack where users intentionally "fake a delete".

This process will work for deciding whether to increment or fail on this field, but it does not allow other threads to access the data for other purposes. For example, the currency contract could not reduce the available capacity so that it could claim tokens while this is going on. Other options:
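The increment-first, check, decrement-on-failure pattern can be sketched with `std::atomic`. This is an illustrative sketch (the `ScopeStorage` struct and `try_reserve` name are assumptions), but it shows why the check can only fail spuriously, never over-admit: every thread adds before it checks, so the total it observes is at worst overstated.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-scope storage counter shared by worker threads.
struct ScopeStorage {
   std::atomic<uint64_t> total_used{0};
   uint64_t capacity = 0;
};

// Optimistically reserve `bytes`; if that would exceed capacity,
// atomically undo the reservation and report failure. Concurrent
// reservations near capacity can cause spurious failures (all threads
// add, all check, all unwind), but can never admit more than `capacity`.
bool try_reserve(ScopeStorage& s, uint64_t bytes) {
   uint64_t after = s.total_used.fetch_add(bytes) + bytes;  // atomic do
   if (after > s.capacity) {
      s.total_used.fetch_sub(bytes);                        // atomic undo
      return false;
   }
   return true;
}
```

As the text notes, this only covers storage increases; an unwound decrease can leave `total_used` above capacity, which is why producers need the fallback of freezing an underfunded account.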
The atomic algorithm above can get quirky and may impact overall performance by forcing cache syncs. To avoid atomics altogether, each cycle will need to be generated in phases:
This multi-step process makes producing a block more expensive in the failure case (forcing an undo of a cycle), but should result in a relatively simple execution path in the nominal case, where it can be pipelined. Transactions that can assert 0 change in storage usage are an order of magnitude more efficient because they can be scheduled without the need to sum deltas or the potential for undoing. In fact, transactions that can assert they will only reduce storage usage enable further optimizations by eliminating the possibility of failing all transactions.
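The phased alternative can be sketched as a sum-then-check pass over a cycle's storage deltas: execute the cycle, sum the per-scope deltas, and undo the whole cycle if any scope would exceed capacity. The struct and function names here are illustrative assumptions, not EOS.IO code.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One transaction's net storage change within a scope.
struct StorageDelta { std::string scope; int64_t bytes; };

// Phase 2 of the sketch: given the deltas recorded while executing a
// cycle, sum them per scope and check each scope against its capacity.
// Returning false signals that the whole cycle must be undone.
bool cycle_fits(const std::vector<StorageDelta>& deltas,
                const std::map<std::string, int64_t>& used,
                const std::map<std::string, int64_t>& capacity) {
   std::map<std::string, int64_t> sums;
   for (const auto& d : deltas) sums[d.scope] += d.bytes;   // sum phase
   for (const auto& [scope, delta] : sums) {                // check phase
      int64_t u = used.count(scope) ? used.at(scope) : 0;
      if (u + delta > capacity.at(scope)) return false;     // undo cycle
   }
   return true;
}
```

A transaction asserting a zero (or strictly negative) net delta never contributes to a failing sum, which is why such transactions can skip this machinery entirely.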
Conclusions:
Should we limit the creation of account names too? Without limits, good names will soon be exhausted.
Is there any ATC for this?
New test cases added for ram_limits, weighted_cpu_limits, weighted_net_limits (#2450).
Types of Consensus
Resource Types
Measuring Resources
Who to Bill
Billing here determines which account has its resource usage adjusted.
Challenges
The biggest challenge with all resource usage is performing these calculations in a way that does not force everything into a single thread. The second biggest challenge is building blocks incrementally while doing resource calculations in parallel.
Potential Solutions
Goals
From a platform perspective these goals conflict. If applications cover the costs, then applications must implement their own rate limiting algorithms to prevent abuse, which also means rate limiting becomes part of the hard consensus.
Assumptions
High Level Approach
Principles
Each write-scope implies a single-threaded throughput limit and maintains its own independent accounting of how many computation units an individual user has consumed in that scope.
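Per-write-scope accounting can be sketched as a two-level map: each scope owns an independent table of per-user consumption, so different scopes can be processed by different threads without sharing counters. The class and method names are illustrative assumptions.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-scope accounting of computation units. Because each
// scope's table is independent, scopes can be billed in parallel without
// contending on a global counter.
class ScopeAccounting {
public:
   // Charge `units` to `user` within `scope`; returns the new total
   // consumed by that user in that scope.
   uint64_t charge(const std::string& scope, const std::string& user,
                   uint64_t units) {
      return consumed_[scope][user] += units;
   }

private:
   // scope -> (user -> computation units consumed in that scope)
   std::map<std::string, std::map<std::string, uint64_t>> consumed_;
};
```

The single-threaded throughput limit per scope follows directly: only accesses within one scope need to be serialized, so throughput scales with the number of independent scopes.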
Note: this issue is still under development and will need to be defined in greater detail before work can begin.