Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[marketplace]: add slot queue pausing #752

Merged
merged 25 commits into from
May 26, 2024

Conversation

emizzle
Copy link
Contributor

@emizzle emizzle commented Mar 26, 2024

Adds slot queue pausing based on the design in codex-storage/codex-research#188.

@AuHau -- I took the changes in commit 1425bf5 and added to this PR as I needed to update that routine.

Close #549

Copy link
Member

@AuHau AuHau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 👍

However, I am a bit confused about one part of your PR. In your proposal at codex-storage/codex-research#188, there were no mentions of the concept of "empty availabilities". I remember we discussed this before and I think reaching the definition of what "empty availability" is was bit tough.

I noticed that you have two ways to pause the queue. The first is based on the seen flag as per your original proposal. The second is triggered when "all availabilities are empty". I am unsure why you added the second part. Wouldn't the seen flag mechanism already cover this scenario? It seems to introduce unnecessary complexity and complicates understanding and reasoning about the codebase.

What do you think?

codex/sales/reservations.nim Show resolved Hide resolved
codex/sales/states/errored.nim Show resolved Hide resolved
codex/sales/slotqueue.nim Outdated Show resolved Hide resolved
codex/sales/slotqueue.nim Outdated Show resolved Hide resolved
@emizzle
Copy link
Contributor Author

emizzle commented Mar 27, 2024

That's a great point -- the queue passing should work without the added complexity of pausing when availabilities are "empty". I'll try to remove and write a test to assert that the queue pauses once availabilities are "emptied".

@emizzle
Copy link
Contributor Author

emizzle commented Mar 28, 2024

@AuHau the PR has been updated with your feedback. Thanks for the review! 🙌

Copy link
Member

@AuHau AuHau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty great work! There are some cleanup items, but generally LGTM.

Before merging lets discuss on our call about the reprocessing of error states.

test "queue is paused once availability is insufficient to service slots in queue":
createAvailability() # enough to fill a single slot
await market.requestStorage(request)
# await sleepAsync(10.millis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code? Can be deleted?

await market.requestStorage(request)
await sleepAsync(10.millis)
# await sleepAsync(10.millis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another dead code?

codex/sales/states/errored.nim Show resolved Hide resolved
@emizzle emizzle force-pushed the feat/marketplace/slot-queue-improvements branch 3 times, most recently from 5a91aa0 to ee42b61 Compare May 9, 2024 04:00
@emizzle
Copy link
Contributor Author

emizzle commented May 9, 2024

This PR should be ready to go, however I've some (small) changes to the way the queue pauses, so maybe it might be prudent to review again @AuHau ?

AuHau
AuHau previously approved these changes May 9, 2024
Copy link
Member

@AuHau AuHau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just few comments but generally LGTM! Great work! 👍

item.seen = false # does not maintain the heap invariant

# force heap reshuffling to maintain the heap invariant
doAssert self.queue.update(self.queue[0]), "slot queue failed to reshuffle"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this raises, where is it caught? Or will this crash the whole node? That should not really be necessary, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update() only returns false when the argument is not in the queue. Since we're supplying self.queue[0] as the arguments, this will never happen. And if it does happen, then it's a programmer error and the node should crash, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. In addition, we know that self.queue[0] exists because there is an early return
when if self.queue.empty is true. If the queue is empty when we don't expect
it to be, there is no real way to recover.

codex/sales/slotqueue.nim Outdated Show resolved Hide resolved
markspanbroek
markspanbroek previously approved these changes May 15, 2024
Copy link
Member

@markspanbroek markspanbroek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @emizzle, thanks! I have a few small comments, but other than that good to go!

codex/sales.nim Outdated Show resolved Hide resolved
codex/sales/slotqueue.nim Outdated Show resolved Hide resolved
item.seen = false # does not maintain the heap invariant

# force heap reshuffling to maintain the heap invariant
doAssert self.queue.update(self.queue[0]), "slot queue failed to reshuffle"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update() only returns false when the argument is not in the queue. Since we're supplying self.queue[0] as the arguments, this will never happen. And if it does happen, then it's a programmer error and the node should crash, I think.

codex/sales/slotqueue.nim Outdated Show resolved Hide resolved
codex/sales/states/downloading.nim Outdated Show resolved Hide resolved
codex/sales/states/downloading.nim Outdated Show resolved Hide resolved
codex/sales/states/downloading.nim Outdated Show resolved Hide resolved
@dryajov
Copy link
Contributor

dryajov commented May 16, 2024

Where are we with this? Should it be merged?

@emizzle emizzle dismissed stale reviews from markspanbroek and AuHau via 940dae4 May 21, 2024 05:13
@emizzle emizzle force-pushed the feat/marketplace/slot-queue-improvements branch from 940dae4 to 6946cbf Compare May 21, 2024 06:16
@emizzle
Copy link
Contributor Author

emizzle commented May 21, 2024

Where are we with this? Should it be merged?

Will merge once green. Had to address a few small comments and rebase on master.

emizzle added 14 commits May 24, 2024 14:14
Re-add processed slots to queue if the sale was ignored or errored
- when processing slots in queue, pause queue if item was marked seen
- if availability size is increased, trigger onAvailabilityAdded callback
- in sales, on availability added, clear 'seen' flags, then unpause the queue
- when items pushed to the queue, unpause the queue
The slot queue should also have nothing to do with availabilities
An empty availability is defined as size < DefaultBlockSize as this means even the smallest possible request could not be served. However, this is up for discussion.
onAvailabilityAdded and onAvailabilitiesEmptied are now only called from reservations.update (and eventually reservations.delete once implemented).

- Add empty routine for Availability and Reservation
- Add allEmpty routine for Availability and Reservation, which returns true when all all Availability or Reservation objects in the datastore are empty.
Includes tests for sales states cancelled, errored, ignored to ensure onCleanUp is called with correct parameters
emizzle added 11 commits May 24, 2024 14:28
- indent `self.unpause`
- update comment for `clearSeenFlags`
Queue pausing when all availiabilies are "emptied" is not necessary, given that the node would not be able to service slots once all its availabilities' freeSize are too small for the slots in the queue, and would then be paused anyway.

Add test that asserts the queue is paused once the freeSpace of availabilities drops too low to fill slots in the queue.
The asyncheapqueue update overload would need to check index bounds and ultimately a different solution was found using the mitems iterator.
request.id was different before updating request.ask.slots, and that id was used to set the state in mockmarket.
Previously, when a seen item was processed, it was first popped off the queue, then the queue was paused waiting to process that item once the queue was unpaused. Now, when a seen item is processed, it is popped off the queue, the queue is paused, then the item is re-added to the queue and the queue will wait until unpaused before it will continue popping items off the queue. If the item was not re-added to the queue, it would have been processed immediately once unpaused, however there may have been other items with higher priority pushed to the queue in the meantime. The queue would not be unpaused if those added items were already seen. In particular, this may happen when ignored items due to lack of availability are re-added to a paused queue. Those ignored items will likely have a higher priority than the item that was just seen (due to it having been processed first), causing the queue to the be paused.
@emizzle emizzle force-pushed the feat/marketplace/slot-queue-improvements branch from 6946cbf to 449cedb Compare May 24, 2024 05:05
@emizzle emizzle added this pull request to the merge queue May 26, 2024
Merged via the queue into master with commit e6a387e May 26, 2024
8 checks passed
@emizzle emizzle deleted the feat/marketplace/slot-queue-improvements branch May 26, 2024 01:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce Slot Queue pausing and remove past events fetching
4 participants