Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf investigation: KREAd, ERTP, NFT Purses, Zoe #8862

Open
warner opened this issue Feb 4, 2024 · 5 comments
Open

perf investigation: KREAd, ERTP, NFT Purses, Zoe #8862

warner opened this issue Feb 4, 2024 · 5 comments
Labels
bug Something isn't working chain-incident ERTP package: ERTP performance Performance related issues Zoe Contract Contracts within Zoe Zoe package: Zoe

Comments

@warner
Copy link
Member

warner commented Feb 4, 2024

Summary: ERTP's large-cardinality NFT Amount handling is O(N) expensive, and KREAd triggers it

I spent some time investigating the mainnet slowdown on tuesday 23-jan-2024, specifically the KREAd MintCharacter wallet action in block 13,499,082 . The packages/SwingSet/misc-tools/monitor-slog-block-time.py, when fed the slogfiles of a follower (in this case my follower named agreeable), prints one line per block, with inter-blockTime and some swingset data:

  13499080    6( 8.9)   9.867 ->    0( 69.5)          0  0.002( 1.638)  +   0.004 =   0% ( 9.6) [ 75/  0/ 75] r0 (CFC45F)
  13499081    9( 9.1)   6.866 ->    0( 69.2)          0  0.001( 1.138)  +   0.003 =   0% ( 7.4) [ 97/  2/ 99] r0 (5F329F)
  13499082    6( 8.0)  11.994 ->  674(100.5)  194136238 15.471( 1.256)  +   0.003 =  58% ( 7.4) [ 99/  0/ 99] r0 (E29E61)
  13499083   11( 7.6)  30.321 ->    4( 88.3)  137700412 10.916( 1.681)  +   0.002 =  44% ( 8.8) [ 66/  0/ 66] r1 (E29E61)
-    block  blockTime    lag  -> cranks(avg) computrons  swingset(avg)  +  cosmos = proc% (avg) [sigs2/3/tot] rounds proposer
  13499084   29( 7.7)  21.861 ->   50( 90.8)  193691028 14.165( 2.389)  +   0.004 =  59% (11.8) [ 75/  0/ 75] r1 (23E1B0)
  13499085   21( 8.4)  29.981 ->  292(105.4)   13829768  0.986( 2.439)  +   0.004 =   6% (12.1) [ 75/  1/ 76] r0 (BB5E2C)
  13499086   29( 9.6)   4.392 ->    0(105.4)          0  0.001( 2.439)  +   0.002 =   0% (12.1) [ 78/  0/ 78] r1 (BB5E2C)
  13499087    3( 9.4)   7.504 ->    0(105.1)          0  0.001( 2.438)  +   0.002 =   0% (12.1) [ 77/  1/ 78] r0 (996453)

which indicates that we had elevated block times (11s/29s/21s/29s, instead of the usual 6s), four non-empty blocks in a row, and about 25 validators fell behind (they were unable to vote in time to be counted). It took about ten blocks for them to catch up.

Slogfile Investigation

Feeding the same slogfile into my classify-runs.py tool (which will eventually land in packages/SwingSet/misc-tools/) summarizes this action as:

{"classification": "kread-mint-character", "runids": ["b13499082-r5", "b13499083-r0", "b13499084-r0", "b13499085-r0"], "computrons": 493433524, "elapsed": 37.98}

This action required a total of 493,433,524 computrons, which exceeds our currently configured maximum of 65Mc/block, so it spilled over into multiple blocks. The action started as the 6th run (r5) of the first block, consumed the second and third entirely, and finished in the fourth. The four runs combined required 37.98 seconds of SwingSet computation time (on the follower from which I took this sample). The four blocks, including actions that shared the first and last blocks, looked like:

action runID computrons seconds
push-price b13499082-r1 15,282,850 1.0311628211056814
push-price b13499082-r2 15,266,286 1.2485291080083698
push-price b13499082-r3 15,254,469 1.2618149660993367
kread-mint-character b13499082-r5 148,328,225 11.928104420076124
kread-mint-character-continuation b13499083-r0 137,700,412 10.915490992018022
kread-mint-character-continuation b13499084-r0 193,691,028 14.165184138924815
kread-mint-character-continuation b13499085-r0 13,713,859 0.9705590480007231
vbank-balance-update b13499085-r2 115,909 0.01444947812706232

I used the same tool to scan for all kread-mint-character actions in the chain's history, and noticed that the number of computrons is growing nearly linearly with the number of characters minted:

kread-mint execution computrons by event

classify-runs.py also extracts the slogfile for the four combined runs. From this I found that the first block was ended by a 137Mc mintAndEscrow() delivery to v9-zoe (crankNum c44159622), causing the block to finish with a total of 148,328,225 computrons (i.e. it spend about 11Mc in smaller deliveries first, then the large mintAndEscrow() finished, then the runPolicy scheduler observed the computron limit was exceeded and ended the block, leaving the remaining work for the next one). On my follower, this delivery took 10.95 seconds.

The second block was ended by c44159626, which was another 137Mc mintAndEscrow(), taking 10.84 seconds. The third was ended by c44159676 which was a 153Mc exitSeat() to v9-zoe that took 12.03 seconds.

I fed these three deliveries into my find-slowness.py tool to identify gaps between slog events that exceed one second. This reveals bursts of computation that happen between the vat worker's receipt of the results of one syscall, and its emission of the next syscall. We can use this to deduce where acute CPU effort is being spent. It would not help us identify chronic CPU usage (small delays interspersed over a large number of syscalls), however none of these deliveries had a significant number of syscalls: they had 113/113/116 syscalls respectively.

The mintAndEscrow() deliveries both had gaps of 2.5s, 5.0s, and 3.2s , the first of which occurred immediately after a vatstoreGet() of a VOM state record (v9 o+d391/2) that was unusually large, about 200kB. This virtual/durable object turns out to be a Purse, of the KREAd CHARACTER Brand, whose Amount contained 380 separate records. So this purse contains 380 NFTs, and our code to manipulate these sorts of balances turns out to be expensive, and also O(N) in the number of records/characters contained therein. The act of minting a character needs to interact with this large Purse several times (described below), and the elevated computron and elapsed time consumed are a result of that expense. The surprisingly expensive operations include:

  • AmountMath on the large balance: add, subtract, isGTE
  • marshalling (toCapData, fromCapData) needed by liveslots to fetch/store the VOM state
    • multiplied by the number of deliveries that touched it, since we flush the VOM state cache at end-of-crank
  • stateShape checking on both serialization and unserialization in the VOM state setter/getter
  • typeGuard checking of the Amount as it is passed into various functions

Building a Benchmark

To investigate further, I adapted the KREAd test-minting.js unit test into a benchmark tool. I removed the error-case-testing portions and stripped out everything but the minting of a new character. I also increased the number of BaseCharacters (since each mint operation consumes one), and added a configurable loop to mint multiple ones. I added a bunch of console.time()/timeEnd() instrumentation, and enabled V8's profiling during the final loop, so we could get a flamegraph of most expensive pass.

diff --git a/agoric/contract/test/bootstrap.js b/agoric/contract/test/bootstrap.js
index 6607e265..08db083a 100644
--- a/agoric/contract/test/bootstrap.js
+++ b/agoric/contract/test/bootstrap.js
@@ -76,6 +76,16 @@ export const bootstrapContext = async (conf = undefined) => {
     minUncommonRating: 20,
   };

+  if (conf?.COUNT) {
+    const { COUNT } = conf;
+    const firstChar = defaultCharacters[0];
+    for (let i=0; i<COUNT-1; i++) {
+      const newChar = [firstChar[0]+i+1, {...firstChar[1]}];
+      defaultCharacters.push(newChar);
+    }
+    assert.equal(defaultCharacters.length, COUNT);
+  }
+
   // Start governed contract instance
   const { governorFacets } = await setUpGovernedContract(
     zoe,

To get results that are closer to how XS/xsnap would work, I made some additional changes:

  • at @mhofman's suggestion, I run the benchmark with LOCKDOWN_OPTIONS='{"__hardenTaming__":"unsafe"}' node test/bench-minting.js , because XS has a native harden(), so we aren't unfairly penalizing V8 for lacking one
    • I believe this replaces harden() with a NOP
    • XS would still need to harden() a lot, so this might overcompensate
  • I edited node_modules/@agoric/internal/src/storage-test-utils.js to make makeMockChainStorageRoot pass through options to makeFakeStorageKit:
--- /home/node/trees/agoric-sdk/packages/internal/src/storage-test-utils.js	2024-01-22 18:49:18.302358483 +0000
+++ ../node_modules/@agoric/internal/src/storage-test-utils.js	2024-02-01 23:20:29.845248835 +0000
@@ -206,8 +216,8 @@
 harden(makeFakeStorageKit);
 /** @typedef {ReturnType< typeof makeFakeStorageKit>} FakeStorageKit */

-export const makeMockChainStorageRoot = () => {
-  const { rootNode, data } = makeFakeStorageKit('mockChainStorageRoot');
+export const makeMockChainStorageRoot = (options={}) => {
+  const { rootNode, data } = makeFakeStorageKit('mockChainStorageRoot', options);
   return Far('mockChainStorage', {
     ...bindAllMethods(rootNode),
     /**

so I could have the KREAd test bootstrapjs override the default of sequence: true:

diff --git a/agoric/contract/test/bootstrap.js b/agoric/contract/test/bootstrap.js
index 6607e265..08db083a 100644
--- a/agoric/contract/test/bootstrap.js
+++ b/agoric/contract/test/bootstrap.js
@@ -55,7 +55,7 @@ export const bootstrapContext = async (conf = undefined) => {
   const installation = await E(zoe).install(contractBundle);
   const privateArgs = {
     powers: {
-      storageNode: makeMockChainStorageRoot().makeChildNode('thisElectorate'),
+      storageNode: makeMockChainStorageRoot({ sequence: false }).makeChildNode('thisElectorate'),
       marshaller: makeFakeBoard().getReadonlyMarshaller(),
     },
     clock: timerService.getClock(),

because sequence: true turns the O(1) chain-storage set() calls into O(N) append() ones. The real chain only remembers the latest value for each chain-storage cell, whereas storage-test-util.js was designed to support unit test cases that benefit from seeing all historical items too.

Both the KREAd code and agoric-sdk utilities use non-immediately-awaited Promises, which makes CPU profiling more difficult: if A triggers an expensive B, but does not await it before continuing on to C, the flamegraph may make it look like C is the source of the expense, not A+B. So I inserted a bunch of await calls which aren't strictly necessary for correctness, but helped tremendously for performance analysis. I also inserted stall() calls (which is just a loop that does await null twenty times in a row) in places where I suspected remaining overlap (code which doesn't await at all, where a synchronous function fires off an async call and doesn't give the caller anything to synchronize upon).

Note that liveslots flushes the VOM state-data cache at the end of every crank, to reduce GC-influenced nondeterminism. This requires us to re-load and re-unserialize the state data of any virtual object the first time it is referenced in any crank. In contract, the fakeLiveslots environment never ends a crank, so it never flushes this cache, so it never needs to unserialize state objects. This means the V8 benchmark run will lack some expensive deserialization time which actually exists on the mainnet (liveslots/XS) slog traces.

Finally, the V8 run has an expensive initial call to getRandomBaseIndex, which is an artifact of how the fakeLiveslots environment is built. This call produces a list of all 800-ish BaseCharacter keys, in order, so it can select a pseudo-random one (although, really, the keys are integers, so it ought to be able to do this in O(1) without enumeration). The enumeration performs a series of vatstoreGetNextKey() calls. On the real chain, where swing-store is backed by SQLite, this uses a normal DB index and is fast. In the fake environment, the vatstore is backed by a plain Map (which, being JS, enumerates in insertion order), plus a side Array to track the sorted order. The act of re-sorting that side array (too much) cause a perf hit that does not represent what the real chain would do. So we ignore that initial burst of activity when looking at the data.

Benchmark Results

The resulting flamegraph, for a single mint operation, looks like this:

screenshot annotated9

Eventually, I was able to correlate the long CPU times in the (V8 benchmark) flamegraph with the long gaps in the (real-chain XS) slogfile record, and deduce the sequence of events that took place. As a rough measure,

The real-chain XS operations (with 380 characters in the pooled purse) took about 95x the time needed by the V8 benchmark (with 800 characters). Compensating for the 380-vs-800, that makes the chain about 200x slower than the benchmark. This is a combination of my benchmark machine being faster than my follower, the XS engine being slower than V8 (no JIT), syscall overhead, and work done by XS that was not done by the benchmark (like deserializing the large purse on every crank).

How KREAd Works

First, some background on how the KREAd contract operates (as reverse-engineered by me; I may have it entirely wrong). It manages "Characters" and "Items", which can be owned and traded by users. Characters can also be "equipped" with Items, basically causing the item to be owned-by / associated-with the character, rather than a user. There are a limited number of Characters that can be minted (about 800 I think), of which about 380 have been minted so far. Minting a character costs a fee, which is split into "royalty" and "platform fee", and deposited into two depositFacets. The newly-minted character is automatically equipped with three Items. The equipped Items of each Character are held in that character's "Inventory".

At startup, the KREAd contract asks ZCF to create a zcfMint for two denominations (KREAd CHARACTER Brand and KREAd ITEM Brand). This causes v9-zoe to creates an ERTP Issuer for each, with a paymentLedger (mapping empty Payment objects to their balance), a recoverySet for those payments, and a mintRecoveryPurse to hold the recovery set for the mint-generated Payment objects. We can use the syscalls that manipulate these tables as a way to trace the execution inside v9-zoe from just the slog events.

Internally, each character's inventory is managed by a separate inventorySeat: a regular Zoe/ZCF Seat. This seat is created for each character when the character is minted, held internally by the contract (in a table indexed by the character name), and never exited or deleted.

Equipping an Item causes the item to be allocated to the inventorySeat, just like how fees are allocated to their recipients. ZCF/Zoe know about all the current allocations, but holds the assets in question in a zoe-side escrow purse until the seat is exited. Zoe creates a single shared localPooledPurse for each denomination, and merges all compatible amounts into that same purse, just like a regular bank pools all fungible customer deposits into a single vault and just uses record-keeping to track who owns what.

Users are only allowed to equip items they own, to characters they own. To demonstrate this authority safely, and to work around some limitations of Zoe (no wash trades), KREAd uses an unusual pattern. When initially minting the character, it actually creates two character Amounts, which I'll call the A character and the B character (think of this like the main actor and their nearly-interchangeable stunt double, or the two buffers in a double-buffered graphics rendering pipeline). These Amounts differ in only a single field: A gets { keyID: 1 } while B gets { keyID: 2 }. The user gets back A: it is allocated to the seat that triggered the mint() handler, which will be exited by the contract, and then the user will call getPayouts() and will get a Payment for character A, which they'll deposit into their own Purse. But character B is allocated to the newly-created inventorySeat, along with the three newly-minted Items. It sits there for an indefinite time (until an equip or unequip action occurs).

The protocol for equipping an item to a character requires the user to determine which A/B character Amount the user holds, vs which one is in the inventorySeat. If the user holds A, then the offer they submit looks like:

  • give: character A, Item
  • want: character B

The contract the tells ZCF to transfer the inventorySeat's charB back to the user, and to transfer charA and the item into the inventorySeat. If the user held B, then these are reversed: the user gives character B and the item, and wants character A.

So now that 380 Characters have been minted on the chain, there are 380 "A" amounts, and 380 "B" amounts. For each pair, one of the two is owned by some user purse (held only by vat-wallet), and the other is allocated to a unique inventorySeat (held in the character table), but owned by the localPooledPurse in v9-zoe (held on behalf of the contract, to escrow all assets currently allocated to all open seats).

Note: if Zoe allowed wash trades (which we considered, but decided against because it's more likely to be an error than a legitimate use), the protocol could instead be "give A+Item, want A", and KREAd wouldn't need this A-vs-B duplication. Or, if we had better notions of "use objects" (eg @erights' recent "owned amounts" work), then equip() would be a method on an object only available to the current owner of the character, and perhaps wouldn't need a Zoe offer to execute.

Performance Issue

Anyways, the performance issue arises from these B characters and where they are held. The collection of open inventorySeats hold on to 380 character records, all of which are held by a single zoe-side localPooledPurse.

Zoe's escrow service was designed to be short-lived (as were seats in general: it makes sense to hold a seat open until a trade/offer is fulfilled, but they aren't meant as a long-term storage tool). And it was optimized for a moderate amount of things being escrowed. Zoe merges all escrowed assets of the same Issuer in a single localPooledPurse. For fungible tokens, this is an obvious performance win: adding two IST or BLD amounts is trivial, and the result requires only a single variable to track.

However for non-fungible tokens, the ERTP Amounts are copyBags. These nominally behave like a Map whose keys are character records and whose values are integers (i.e. 2 apples and 3 oranges gets you { apple: 2, orange: 3 }. (If the amounts are unique, it could use a copySet, which is like a copyBag where the values are always 1, but it wouldn't help here). The AmountMath code provides the same operations for these Amounts as it does with plain numbers: you can add two amounts, subtract them, query isGTE, etc. But these operations are a lot more complex than adding/subtracting/comparing numbers: they must do Set math, where "add" is really a union, and "isGTE" is really a form of intersection. So they take significantly longer, and this expense grows with the number of items in the Amount.

So when Zoe is told to escrow a newly-minted A character and allocate it to the user's seat, it must add this value into the existing localPooledPurse for Characters, which already has a B character for every Character that's ever been minted (380 on mainnet). In my benchmark run, on V8, when there are already 800 characters in this purse, it takes 70ms to perform this add.

Much of this time is spent sorting the entries into a stable order, and asserting that they are all of the right type, to meet the requirements of a copyBag (which exists because a normal Map cannot be serialized into durable storage, or used as a key in a table). There are a tremendous number of passStyleOf() calls in this portion of the flamegraph.

In addition, the subsequent updatePurseBalance requires Zoe to change the state.currentBalance of the localPooledPurse virtual object, which requires serializing this many-valued Amount. In the benchmark run, this needs 35ms just to check that the value matches the declared stateShape for the property in question, and then another 10ms to perform the serialization (our userspace marshalling code is not nearly as fast as a native JSON implementation).

All of this work is what consumes 137Mc in the zoe mintAndEscrow() delivery, and ends the first block. In the V8 benchmark run, we can see the mintAndEscrow() function takes 118ms, out of which add takes 70ms and updatePurseBalance takes 45ms (leaving 3ms for other small stuff).

mintAndEscrow

In the mainnet slog record, we cannot directly tell which function is being called when, but from the syscalls we can deduce some dividing points and correlations: The total delivery time for the first mintAndEscrow (the last delivery of the first block) was 10.051s, which is broken down into three big parts:

  • 2.68s: balance deserialize/checking
  • 4.95s add()
  • 3.24s: updatePurseBalance() (checking/serialize)

It must do this all again for the B character, taking about the same amount of time, which ends the second block. At this point, both A and B characters are in the pooled purse.

The third block is ended by the zoe.exitSeat() method spending similar time withdrawing the A character from the pooled purse, so it can be transferred to a Payment for the user. This does a subtract instead of an add, preceded by an isGTE() check that is not necessary in the deposit case.

withdraw

The fourth block contains the remaining work, like depositing the royalty/platform fees, which are fast because IST/BLD/etc is entirely fungible. On chain, this took just under a second (b13499085-r0 took 13.7Mc and 970ms).

(The V8 benchmark flamegraph has an artifact, the long deposit section at the end, which is where the benchmark test is depositing the A character into the test user's purse. This is slow because the test only has a single user, so that Purse already contains 800 characters. On chain, most users have only one character each, so their purses are small, and when the wallet deposits the character in the user's private Purse, the time to add/subtract/etc is insignificant).

Directions To Pursue

This leads to a number of interesting questions, and ways we might improve the situation. First, the concerns:

  • execution time for KREAd operations that touch the Character localPooledPurse are slow, and will get slower as more characters are minted
    • this includes mint (3 expensive things: add+add+subtract), equip/unequip (2 expensive things: add+subtract), and buy/sell operations (probably one add or subtract each)
    • at the current speeds, this causes about about 90s of slow blocks, and about 7 blocks where >15% of validators are too slow to be counted
    • this will probably get linearly worse, up to a maximum when all 800 baseCharacters are minted (so 800/380= 2.1x worse than the current 90s slowdown)
    • it will also get worse for each Character currently offered for sale, since these will live in outstanding seats until the trade is matched or cancelled. expense = k * (numMinted + numForSale)
  • reducing the 65Mc computrons-per-block limit won't help to split this into more blocks; the limiting factor is the indivisible 137Mc/153Mc deliveries. And anyways four blocks is already too many for a single operation.
  • while we see no obvious reason that v63-kread is not upgradable (it uses baggage well), we do not have tests to demonstrate that it can be upgraded safely, so any contract-side fixes that might improve this behavior may be hard to deploy, first requiring engineering work to make the contract fully upgradeable
  • I only investigated the Zoe pooled escrow purse for Characters. There's probably another one for Items, and as far as I know we could wind up with a lot of them too, which might be a second instance of the same problem
  • this time is spent in v9-zoe, not in v63-kread, so our runaway-execution defenses need to avoid pausing or terminating v9-zoe, and our charge-for-execution plans need a way to assign the costs to v63-kread
    • if we enabled per-delivery hard execution limits in xsnap, we might run the risk of killing zoe

Then, potential fixes we might pursue.

  • liveslots: remove the checkStatePropertyValue assertion on vatstore read (in the state getter), under the theory that we will never write non-conformant state data into the vatstore on the setter side. This would save about 20% of the mintAndEscrow() time.
    • requires a zoe upgrade to deploy
  • liveslots+endo: speed up passStyleOf by giving it a real WeakMap, instead of the VOAwareWeakMap that is imposed on all userspace code
    • this requires special coordination/authority-sharing between the liveslots layer and the code it hosts
    • requires a zoe upgrade to deploy
  • ERTP: replace the Purse with an NFT-specialized one that uses a MapStore instead of a copyBag
    • this requires relaxing MAX_DBKEY_LENGTH in collectionManager.js, so the full (serialized) character record could be used as keys of the MapStore (increase MAX_DBKEY_LENGTH in liveslots, to allow complex keys in virtual collections #8864)
    • requires a zoe upgrade to deploy
    • requires some sort of upgrade-Purse-in-place remediation scheme to be useful for the existing KREAd contract
    • preferably the upgrade-in-place code would remove all values from the large Purse balance simultaneously, otherwise the removal process would be accidentally quadratic and could take thousands of seconds
  • KREAd: rewrite the contract to hold a separate Character and Item Purse for each character, and hold the equipped Items in that Purse, instead of in an open Seat
    • requires a KREAd contract upgrade to be useful
    • requires replacing all the inventorySeats as remediation, perhaps one character at a time (as user actions touch them)
  • KREAd, Zoe: change Zoe to allow wash trades, then change the KREAd equip() protocol to return the same character
    • change the dapp frontend to stop using A/B characters, set give.CharacterKey = want.CharacterKey
    • delete B character from inventorySeat as remediation
    • requires Zoe upgrade, KREAd contract upgrade, coordinated dapp frontend upgrade
    • doesn't help the localPooledPurse add/subtract costs: the user's Character is still escrowed/withdrawn
  • Endo: maybe memoize checkBagEntries() or checkNoDuplicateKeys()
    • add() does it once, maybe set() then does it again on the same object
    • requires zoe upgrade to deploy
  • consider disabling internal consistency checks in production
    • requires zoe upgrade to deploy
  • deep features: the equip() protocol cries out for "Use Objects" or "Owned Facets" like what @erights is working on, where the right to sell a Character is semi-independent of the right to talk to it (i.e. tell it to equip an item)
    • if we had that, equip() would be a method on the Character object (or it's use-object facet, or something), rather than its own contract-level invitation
    • equip() would transfer ownership of the Item from the user to the character itself, in an internal Purse
    • so there would be no large (escrow) Purses that hold all characters, or all items
    • this would require new features, changes to ERTP and liveslots, exos, the equip() protocol, and upgrades of zoe and contract vats, as well as new frontend dapp to speak the new protocol

And ways in which this should inform other efforts:

  • can we add a benchmark test to the contract-readiness checklist to discover problem like this before deployment, so governance voters can make an informed choice?

    • perhaps the checklist could call for simulation of a year of contract operation, and observe what kind of load it would cause, and most importantly discover slowdowns/expenses that only arise after significant accumulation of data
  • the KREAd code is doing something unusual (holding Seats open forever) but not malicious, how can we build an execution-fee model that correctly charges for this?

    • especially when the actual expense is happening in v9-zoe, shared by all contracts
    • @erights suggested having Zoe maintain a separate escrow Purse per contract instance (times denoms), instead of sharing the denom's Purse across all instances, to enable zoe to perform space-usage accounting more accurately
    • maybe v9-zoe could notice that it's spending too much time/computrons for one particular instance, and then refuse to do any more work for it (breaking, perhaps temporarily, the contract)
      • would require exposing computrons to userspace, which I've been reluctant to do (information leakage), but maybe vatPowers.getComputronsUsed() would be appropriate

cc @Chris-Hibbert @erights @dckc

@warner warner added bug Something isn't working ERTP package: ERTP Zoe package: Zoe performance Performance related issues Zoe Contract Contracts within Zoe labels Feb 4, 2024
@dckc
Copy link
Member

dckc commented Feb 5, 2024

  • ERTP: replace the Purse with an NFT-specialized one that uses a MapStore ...
    • requires some sort of upgrade-Purse-in-place remediation scheme to be useful for the existing KREAd contract

I think it's plausible that we can make new purses and transfer the assets, without any in-place-upgrade magic.

Purses tend to be closely held. I think the walletFactory is the only importer of any purses:

  • vbank purses are fungible and short-lived, created afresh for each offer.
  • invitation purses and KREAd purses: walletFactory could voluntarily make new ones and transfer the assets.

earlier discussion: #8861 (comment)

warner added a commit to agoric-labs/KREAd that referenced this issue Feb 5, 2024
@warner
Copy link
Member Author

warner commented Feb 5, 2024

https://github.com/agoric-labs/KREAd/tree/warner/8862-benchmarking has my benchmarking code in the last commit. cc @erights

@warner
Copy link
Member Author

warner commented Feb 5, 2024

  • ERTP: replace the Purse with an NFT-specialized one that uses a MapStore ...

    • requires some sort of upgrade-Purse-in-place remediation scheme to be useful for the existing KREAd contract

I think it's plausible that we can make new purses and transfer the assets, without any in-place-upgrade magic.

Purses tend to be closely held. I think the walletFactory is the only importer of any purses:

  • vbank purses are fungible and short-lived, created afresh for each offer.
  • invitation purses and KREAd purses: walletFactory could voluntarily make new ones and transfer the assets.

earlier discussion: #8861 (comment)

Yeah, I think the only Purse with a non-trivial size is Zoe's internal localPooledPurse, which holds all the Characters that are escrowed for open seats. Because this issuer is a zcfMint, this Purse lives in vat-zoe, not vat-kread, and is never exposed outside that vat (although it's probably held in zoe's baggage too, for benefit of upgrade).

If the ERTP change is to implement a new more-efficient Kind for new Purses (noticing the AmountMath is non-fungible and using a MapStore for state.currentBalance), then on each upgrade (and for each zcfMint issuer), Zoe could ask ERTP for a new empty Purse, get one of the latest Kind, and drain the old one into it. O(numIssuers) work on each Zoe upgrade.

A wallet user who acquires a few hundred characters (or Items; I haven't investigated them very much) could have the same problem.. I'm not sure where to stand to have the wallet decide that it should ask the zoe-based Issuer for a replacement Purse.

@dckc
Copy link
Member

dckc commented Feb 5, 2024

I'm not sure where to stand to have the wallet decide that it should ask the zoe-based Issuer for a replacement Purse.

in an upgrade of walletFactory, during the 1st crank, much like...

for (const wallet of walletsByAddress.values()) {
wallet.repairWalletForIncarnation2(upgradeToIncarnation2Key);
}
baggage.init(UPGRADE_TO_INCARNATION_TWO, 'done');

@Chris-Hibbert
Copy link
Contributor

If there's a desire to, we could make the KREAd contract a lot more performant.

  • We expect there to be long-lived seats. We use long-lived seats to hold onto contracts' assets in vaults and auctions, and have done so in several other contracts we've written.

We need to make large NFT purses a lot more efficient. The case of the Zoe escrow purses is particularly concerning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working chain-incident ERTP package: ERTP performance Performance related issues Zoe Contract Contracts within Zoe Zoe package: Zoe
Projects
None yet
Development

No branches or pull requests

3 participants