[marketplace] Availability improvements #535

Merged
emizzle merged 19 commits from feat/sales/availability-improvements into master on Sep 29, 2023

Conversation

@emizzle (Contributor) commented Aug 28, 2023

Fixes #534.

Problem

When Availabilities are created, the number of bytes in the Availability is reserved in the repo, so those bytes on disk cannot be written to by anything else. When a node receives a request for storage and a previously created Availability matches it, an attempt is made to fill a slot in the request (more accurately, the request's slots are added to the SlotQueue and eventually processed). During download, bytes that were reserved for the Availability were released as they were written to disk. To prevent more bytes from being released than were reserved in the Availability, the Availability was marked as used during the download, so that no other requests would match it and therefore no new downloads (and byte releases) would begin. The unfortunate downside is that the number of Availabilities a node has determines its download concurrency capacity. If, for example, a node creates a single Availability that covers all the disk space the operator is willing to use, only one download could occur at a time, and the node could miss out on storage opportunities.

Solution

To alleviate the concurrency issue, each time a slot is processed a Reservation is created, which takes size (i.e. reserved bytes) away from the Availability and stores it in the Reservation object. This can happen as many times as needed, as long as enough bytes remain in the Availability. Concurrent downloads are therefore no longer limited by the number of Availabilities; instead, they are more likely to be limited by the SlotQueue's maxWorkers. A simplified sketch of this reservation step follows.
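A minimal sketch of the idea, using simplified hypothetical types rather than the actual Reservations API:

import std/options

type
  Availability = object
    size: uint64    # bytes still available for new reservations

  Reservation = object
    size: uint64    # bytes set aside for one slot download

proc createReservation(a: var Availability, slotSize: uint64): Option[Reservation] =
  ## Deduct `slotSize` from the availability. This can be repeated for as many
  ## slots as fit, so concurrency is bounded by the remaining bytes (and the
  ## SlotQueue's maxWorkers), not by the number of availabilities.
  if a.size < slotSize:
    return none(Reservation)
  a.size -= slotSize
  some(Reservation(size: slotSize))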

From a database design perspective, an Availability has zero or more Reservations, and looks like this:

                                                      +--------------------------------------+
                                                      |            RESERVATION               |
+--------------------------------------+              |--------------------------------------|
|            AVAILABILITY              |              | ReservationId  | id             | PK |
|--------------------------------------|              |--------------------------------------|
| AvailabilityId | id            | PK  |<-||-------o<-| AvailabilityId | availabilityId | FK |
|--------------------------------------|              |--------------------------------------|
| UInt256        | size          |     |              | UInt256        | size           |    |
|--------------------------------------|              |--------------------------------------|
| UInt256        | duration      |     |              | SlotId         | slotId         |    |
|--------------------------------------|              +--------------------------------------+
| UInt256        | minPrice      |     |
|--------------------------------------|
| UInt256        | maxCollateral |     |
+--------------------------------------+
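In code, the two records above might be declared roughly as follows (field names and UInt256 types are taken from the diagram; the id types are assumptions for illustration only):

import pkg/stint

type
  AvailabilityId = distinct array[32, byte]
  ReservationId  = distinct array[32, byte]
  SlotId         = distinct array[32, byte]

  Availability = object
    id: AvailabilityId              # PK
    size: UInt256
    duration: UInt256
    minPrice: UInt256
    maxCollateral: UInt256

  Reservation = object
    id: ReservationId               # PK
    availabilityId: AvailabilityId  # FK -> owning Availability
    size: UInt256
    slotId: SlotId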

Reservations are persisted in the RepoStore's metadata, along with Availabilities. The metadata store key path for Reservations is meta / sales / reservations / <availabilityId> / <reservationId>, while Availabilities are stored one level up, e.g. meta / sales / reservations / <availabilityId>, allowing all Reservations for an Availability to be queried (this is not currently needed, but may be useful when work to restore Availability size is implemented; more on this later).
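For illustration only (these helpers are hypothetical, not the datastore API used by the codebase), the layout means a prefix query on an Availability's key returns all of its Reservations:

proc availabilityKey(availabilityId: string): string =
  "meta/sales/reservations/" & availabilityId

proc reservationKey(availabilityId, reservationId: string): string =
  availabilityKey(availabilityId) & "/" & reservationId

# Querying the metadata store with availabilityKey(id) as a prefix yields
# every reservationKey(id, ...) stored beneath it.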

Lifecycle

When a reservation is created, its size is deducted from the Availability, and when a reservation is deleted, any remaining size (bytes not written to disk) is returned to the Availability. If the request finishes, is cancelled (expired), or errors, the Reservation is deleted (and any undownloaded bytes are returned to the Availability). In addition, when the Sales module starts, any Reservations that are not actively being used in a filled slot are deleted.
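A sketch of that bookkeeping, again with simplified hypothetical types rather than the real module API:

type
  Availability = object
    size: uint64    # bytes available for new reservations
  Reservation = object
    size: uint64    # reserved bytes not yet written to disk

proc release(r: var Reservation, bytes: uint64) =
  ## Called as bytes are written to disk during download; the reserved
  ## remainder shrinks, and the availability does not grow back.
  r.size -= min(bytes, r.size)

proc deleteReservation(a: var Availability, r: Reservation) =
  ## On completion, cancellation/expiry, or error: whatever was never
  ## downloaded is returned to the availability.
  a.size += r.size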

Persisting a Reservation until after a storage request has completed will allow the originally set Availability size to be reclaimed once the request contract completes. This feature is not yet implemented; however, the work in this PR is a step towards enabling it.

Unknowns

Reservation size is determined by StorageAsk.slotSize. If, during download, an attempt is made to download more bytes than slotSize, the Reservation update will fail and the state machine will move to a SaleErrored state, deleting the Reservation. This will likely prevent the slot from being filled.

Notes

Based on #514

Base automatically changed from feat/integration/improve-codex-client to master September 1, 2023 05:44
@emizzle force-pushed the feat/sales/availability-improvements branch 2 times, most recently from d43aa61 to 1426f7c on September 1, 2023 05:56
- add requestId and slotIndex to Reservation (hopefully these will prove to be useful when we persist Reservations until requests are completed, to add back bytes to Availability)
- add querying of all reservations, with accompanying tests
- change from find to findAvailabilities
- move onCleanUp from SalesContext to SalesAgent as it was getting overwritten for each slot processed
- remove sales agent AFTER deleting reservation, as this was causing some SIGSEGVs
- retrofit testsales and testslotqueue to match updated Reservations module API
Apply to onStore errors as we are seeing undetailed errors in the dist tests logs
Because availability filtering on push was removed, when availability is added and past storage request events are queried, those requests need to be filtered by availability before being added to the queue.
@emizzle force-pushed the feat/sales/availability-improvements branch from 1426f7c to 0c23626 on September 7, 2023 07:18
markspanbroek previously approved these changes Sep 11, 2023

@markspanbroek (Member) left a comment

I like these changes a lot; the code cleans up nicely by moving to the reservations model instead of using availability.used.

codex/sales/reservations.nim (outdated, resolved)
codex/sales/reservations.nim (resolved)
codex/sales/reservations.nim (outdated, resolved)
for a in availabilities:
  if availability =? (await a) and not availability.used:
    ret.add availability
    # NOTICE: there is a swallowed deserialization error
Member:

Should we log it?

emizzle (author):

Surprisingly, I can't seem to log it because there is an odd compilation error, which I suspect has something to do with questionable. If I change this to:

for storable in storables.items:
    if bytes =? (await storable):
      without obj =? T.fromJson(bytes), e:
        error "deserialization error", error = e.msg
        continue
      ret.add obj

I get the compilation error:

/Users/egonat/repos/status-im/nim-codex/tests/codex/sales/testsales.nim(442, 41) template/generic instantiation of `all` from here
/Users/egonat/repos/status-im/nim-codex/codex/sales/reservations.nim(458, 15) template/generic instantiation of `without` from here
/Users/egonat/repos/status-im/nim-codex/vendor/questionable/questionable/withoutresult.nim(36, 3) template/generic instantiation of `without` from here
/Users/egonat/repos/status-im/nim-codex/codex/sales/reservations.nim(459, 15) template/generic instantiation of `error` from here
/Users/egonat/repos/status-im/nim-codex/vendor/nim-chronicles/chronicles.nim(363, 10) template/generic instantiation of `log` from here
/Users/egonat/repos/status-im/nim-codex/vendor/nim-chronicles/chronicles.nim(332, 21) Error: undeclared identifier: 'activeChroniclesStream'

This is extremely odd because there are other uses of without and chronicles error logging in the same module, and there are no compilation errors there.

Any ideas?

emizzle (author):

For future reference: needed to export chronicles.

Due to the generic type parameter, the context at which this proc is instantiated and called (which is different than at compile time) does not contain chronicles in its scope unless exported by the current module.
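For reference, the shape of the fix in codex/sales/reservations.nim (as described above) is simply to re-export chronicles alongside the import:

import pkg/chronicles
export chronicles  # so generic procs instantiated from other modules can
                   # still resolve the chronicles logging stream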

codex/sales/reservations.nim (outdated, resolved)
codex/sales/states/preparing.nim (outdated, resolved)
- ensure compiler support for changes to SomeStorableObject types in Reservations.storables
- do not swallow json deserialization error
@emizzle (author) commented Sep 14, 2023

This should be g2g mark, ready for another look.

@AuHau (Member) left a comment

Pretty great work! I like the testing! Although I have a few comments, please have a look ;-)

codex/node.nim (resolved)
codex/rest/api.nim (outdated, resolved)
codex/sales.nim (outdated):
let agent = newSalesAgent(
  sales.context,
  slot.request.id,
  slot.slotIndex,
  some slot.request)

agent.context.onCleanUp = proc {.async.} = await sales.remove(agent)
agent.onCleanUp = proc {.async.} = await sales.remove(agent)
Member:

This should be set to await sales.cleanUp(agent, done) as well; otherwise, loaded Sales won't do proper cleanup.

emizzle (author):

Good catch. I must've missed this because of the rebased changes (on top of your changes that modified the point at which we complete the processing future).

The loaded slots are not processed in the context of the slot queue; however, I've created a dummy future to pass in to sales.cleanUp.

Comment on lines 24 to 26
onCleanUp*: OnCleanUp

OnCleanUp* = proc: Future[void] {.gcsafe, upraises: [].}
Member:

Why did you duplicate onCleanUp callback here from context? And why didn't you move it here and remove it from context? Now I believe it is duplicated, right? Even though only this new one is being used?

IMHO I would leave it at the context as there are also other callbacks...

emizzle (author):

Took me a while to remember, thank god for well-documented git comments 😄

You're right, this should not be duplicated; it has been updated by removing OnCleanUp from SalesContext.

The reason this was moved is that every time processSlot was called, the context's onCleanUp callback was overwritten, taking the done future in its closure with it. Effectively, done was only called for the last processed slot, and the other workers would remain blocked. This will likely also happen with onFilled, so I've moved that to SalesAgent as well.

We have slot/agent-level callbacks that are specific to the slot, whose closure will be overwritten for each slot that is processed. Then we have sales-level callbacks, whose closure will not be overwritten for each slot processed. I've added some comments regarding this.
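A small, hypothetical illustration (not codex code) of why a single shared callback on the context loses the earlier slots' closures:

type
  Context = ref object
    onCleanUp: proc () {.closure.}   # shared: reassigned for every processed slot

proc processSlot(ctx: Context, slot: int) =
  # Each call replaces the previous closure, so only the most recently
  # processed slot's callback (and whatever `done` future it captured) survives.
  ctx.onCleanUp = proc () = echo "cleaning up slot ", slot

when isMainModule:
  let ctx = Context()
  for slot in 0 ..< 3:
    processSlot(ctx, slot)
  ctx.onCleanUp()   # prints only "cleaning up slot 2"; earlier slots never clean up

Moving the callback onto the agent gives each slot its own closure, which is why onCleanUp now lives on SalesAgent.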

OnSale is a sales-level callback, which is triggered once a slot is filled. It is only used in the tests, which I don't particularly care for, but I don't think we have better options for testing that filled is called at this point.

emizzle (author):

See df7ac21 for all the above changes.

codex/utils/trackedfutures.nim (resolved)
tests/codex/sales/testreservations.nim (outdated, resolved)
- remove duplicated OnCleanUp from SalesContext, as it is better-placed at the agent/slot-level than sales-level, as its closure will get overwritten for each processed slot
- remove onSale call from the filled state, which is now called in sales.filled. Note, onSale is only being used in the tests
- In sales.load, call sales.cleanUp when agent.onCleanUp is triggered, to ensure that reservations are released correctly
AuHau previously approved these changes Sep 28, 2023

@AuHau (Member) left a comment

👍

Comment on lines +181 to +182
else:
  trace "deleted unused reservation"
Member:

This will log this message for every reservation deleted, right? Maybe only one would be sufficient? 😅

@@ -21,7 +21,7 @@ asyncchecksuite "Reservations module":
    reservations: Reservations

  setup:
-    randomize()
+    randomize(1.int64) # create reproducible results
Member:

This is also possible, but it will run with the same inputs every time. If it is not a problem then keep it.

@emizzle merged commit 570a1f7 into master Sep 29, 2023
8 checks passed
@emizzle deleted the feat/sales/availability-improvements branch September 29, 2023 04:33
Successfully merging this pull request may close these issues.

[marketplace] improve Availabilities