
Safe block deletion (with ref count) #631

Open · wants to merge 6 commits into master from safe-block-deletion

Conversation


@tbekas tbekas commented Nov 21, 2023

No description provided.

@tbekas tbekas force-pushed the safe-block-deletion branch 2 times, most recently from 3833072 to 92fea28 Compare November 28, 2023 14:26
@tbekas tbekas force-pushed the safe-block-deletion branch 3 times, most recently from 0f86d9f to eb90288 Compare December 13, 2023 16:11
@tbekas tbekas changed the title WIP, Safe block deletion (with ref count) Safe block deletion (with ref count) Dec 13, 2023
@tbekas tbekas force-pushed the safe-block-deletion branch 2 times, most recently from c96b0f2 to bfc467e Compare December 13, 2023 17:24
@tbekas tbekas requested a review from dryajov December 13, 2023 17:29
@tbekas tbekas requested a review from elcritch December 13, 2023 17:53
@tbekas tbekas marked this pull request as ready for review December 14, 2023 15:32
@tbekas tbekas force-pushed the safe-block-deletion branch 5 times, most recently from 3b2d85e to f8c13ea Compare December 20, 2023 11:15
totalBlocks* {.serialize.}: Natural
quotaMaxBytes* {.serialize.}: NBytes
quotaUsedBytes* {.serialize.}: NBytes
quotaReservedBytes* {.serialize.}: NBytes

Contributor

Great, you just broke a bunch of deserializers.
NBytes serializes to a string like "123'NByte", where previously this JSON field was just a number. Since the field names already contain "Bytes", do we want this? If yes, I'll update the deserializers.
I suppose we could adjust NBytes to JSON-serialize to plain numbers, since we're not building a UI here; NBytes always appends "'NByte" regardless of whether the number is in the KB/MB/GB range, so it's not really human-readable anyway.
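
For illustration, a minimal sketch of serializing NBytes as a plain number (std/json is used here for brevity; the project's own serde hooks may differ):

import std/json

type NBytes = distinct int64

# Emit NBytes as a bare JSON number instead of the "123'NByte" string form.
proc `%`(n: NBytes): JsonNode =
  %n.int64

echo %*{"quotaUsedBytes": NBytes(100)}  # -> {"quotaUsedBytes":100}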

Contributor Author

If I do curl -s 127.0.0.1:8080/api/codex/v1/space | jq I get

{
  "totalBlocks": 0,
  "quotaMaxBytes": 8589934592,
  "quotaUsedBytes": 0,
  "quotaReservedBytes": 0
}

I think the Bytes suffix should stay on the property names.

KeyVal[T] = (?Key, ?!T)
ResIter[T] = Future[KeyVal[T]]


Contributor

I can't find any tests. You could probably run these using SQLiteDatastore.new(Memory), right?
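
Something along these lines, perhaps (a hedged sketch; the asynctest helpers and the exact nim-datastore signatures are assumptions):

import pkg/asynctest
import pkg/datastore
import pkg/questionable/results

test "stores and retrieves a value in memory":
  let
    ds = SQLiteDatastore.new(Memory).tryGet()
    key = Key.init("/a").tryGet()

  # put, then read the same key back
  (await ds.put(key, @[1'u8, 2'u8])).tryGet()
  check (await ds.get(key)).tryGet() == @[1'u8, 2'u8]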

Contributor Author
@tbekas tbekas Jan 11, 2024

It's tested transitively via testrepostore.nim. I think we can live without a dedicated test suite for this file, since the functionality is small and used only in repostore.nim.

Contributor

This looks like something that should live in datastore proper? And I'd actually add some tests for it, as suggested.

proc isFinished(): bool = true

Iter.new(genNext, isFinished)

Contributor

I really like the Iter/AsyncIter types. Given how fundamental they can be to the codebase, I think they really need some tests. Even if the tests are very simple and kinda silly, like testing that an empty iterator is finished.... :b But that just means the tests are easy to write!
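
For example, a minimal sketch of the "silly" case, reusing the Iter.new(genNext, isFinished) constructor shown above (the finished accessor is an assumption about the Iter API):

test "empty iterator is finished":
  proc genNext(): int = 0          # never invoked once finished
  proc isFinished(): bool = true

  let iter = Iter.new(genNext, isFinished)
  check iter.finished()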

Contributor Author

I agree, I will add tests for this file.

pb.finish
pb.buffer

proc autodecode*(T: typedesc[tuple | object], bytes: seq[byte]): ?!T =

Contributor

This file was a good idea, I think. I'd definitely test the autoencode/autodecode. The other ones would be very easy, but I could live without tests for those.
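
A roundtrip sketch of what such a test could look like (autoencode's signature is assumed to mirror the autodecode shown above):

test "autoencode/autodecode roundtrip":
  type Sample = object
    n: uint64
    s: string

  let original = Sample(n: 42, s: "hello")
  # decode(encode(x)) should give back exactly x
  let decoded = Sample.autodecode(autoencode(original)).tryGet()

  check decoded == original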

Contributor Author

I agree, I will add a few simple test cases.

blockTtl*: Duration
started*: bool

QuotaUsage* = object

Contributor

Could maybe make a 'Quota' object, pull the quotaMaxBytes in here too, and put this object + all related procs in a different file? I don't know if there are enough Quota-procs to justify this, but it'd make this file a little smaller maybe.
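
Something like this, hypothetically (field names are assumptions drawn from the serialized properties above):

type
  Quota* = object
    maxBytes*: NBytes       # configured limit
    usedBytes*: NBytes      # bytes currently stored
    reservedBytes*: NBytes  # bytes reserved but not yet stored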

Contributor Author

QuotaUsage is a db entity, while quotaMaxBytes is a configuration property that stays immutable throughout the lifetime of the app. Therefore I think they belong in different places. If we want to present them together, they can be combined in the presentation layer (JSON), as is already done in the RestRepoStore object.

Also, I think that either all db entities should be in separate files or none of them, to keep the codebase consistent. Moving QuotaUsage to a separate file while keeping the other entities here would be inconsistent.

res: DeleteResult

if currMd =? maybeCurrMd:
if currMd.refCount == 0 or currMd.expiry > expiryLimit:

Contributor

currMd.expiry > expiryLimit
expiry is a timestamp, right? Does this mean we delete blocks if their currMd.expiry is further into the future than expiryLimit?

Contributor Author
@tbekas tbekas Jan 11, 2024

You're 100% right, there's a logic error here. It should be currMd.expiry < expiryLimit and the default value of expiryLimit should be SecondsSince1970.low instead of SecondsSince1970.high.

Thank you for catching this! I will add a test case for this.
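
A sketch distilling the corrected guard (shouldDelete is a hypothetical helper, not the actual repostore code):

type SecondsSince1970 = int64

# A block may only be deleted when it is unreferenced or already past the
# expiry limit. With the default of SecondsSince1970.low, expiry alone
# never triggers deletion.
proc shouldDelete(refCount: Natural, expiry: SecondsSince1970,
    expiryLimit = SecondsSince1970.low): bool =
  refCount == 0 or expiry < expiryLimit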

return failure(err)
trace "Leaf metadata stored, block refCount incremented"
else:
trace "Leaf metadata already exists"

Contributor

if (we just stored this block metadata)
-> then we increase the ref count by one
else (it was already stored)
-> we do nothing

Shouldn't that be the other way around?
if (we stored the block)
-> do nothing (it got stored with a ref count of 1)
else (it was already there)
-> increase ref count

Contributor Author
@tbekas tbekas Jan 11, 2024

No, the refCount refers to how many trees are referring to this block.

If you store just a block (without any tree references yet), what should its initial refCount be? In my opinion it's 0, since that's the number of trees referencing it.

Then, when adding new references to this block (new leaf metadata), we increase the refCount by 1.
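
An illustrative sketch of these semantics (type and proc names are assumptions, not the actual API):

type BlockMd = object
  refCount: Natural

# A bare block stored without any tree references starts at 0.
var md = BlockMd(refCount: 0)

# Each new leaf metadata entry (a tree referencing the block) bumps it.
proc addLeafReference(md: var BlockMd) =
  inc md.refCount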

trace "Block Stored"
if err =? (await self.updateQuotaUsage(plusUsed = res.used)).errorOption:
# rollback changes
without delRes =? await self.tryDeleteBlock(blk.cid), err:

Contributor

If I'm reading this right, we store the block first, then check whether this exceeds the quota.
Shouldn't that be the other way around? Ask the quota if it has space for this, and error out immediately if it doesn't.

Contributor Author
@tbekas tbekas Jan 11, 2024

I will think about a way to address it, but it's really not that trivial. First of all, we don't know whether we should attempt to increase the quota until we try to store the block, because it may turn out that the block is already stored and there was no reason to touch the quota in the first place.

It seems easy on the surface, but concurrency errors might occur that eventually lead to permanent inconsistencies in the db, which should be avoided at all cost. Keep in mind that we need to keep 4 different records consistent across 2 different stores:

  • block in repoDs
  • block metadata in metaDs
  • quota usage in metaDs
  • total blocks in metaDs

Contributor

All of this should be written as a batch in one go with the put*(self: Datastore, batch: seq[BatchEntry]) method; this will avoid inconsistencies across these stores, the only exception being the repoDs, which is currently a separate datastore. Another approach would be moving the repoDs into the same table as the rest of the keys, but that might have additional performance consequences - still worth a try.

We can also think about adding a concurrent store on top that will update/rollback a batch of put operations in one transaction.
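
Roughly like this (a sketch against the put(batch: seq[BatchEntry]) method quoted above; the key constants and encoders are assumptions from this discussion, not the actual code):

# All metaDs records are written atomically in one batch, so a failure
# leaves no partial state to roll back.
let batch: seq[BatchEntry] = @[
  (mdKey, encode(blockMd)),
  (QuotaUsedKey, encode(quotaUsage)),
  (TotalBlocksKey, encode(totalBlocks))
]

if err =? (await metaDs.put(batch)).errorOption:
  return failure(err)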

Contributor Author

Following up on an offline discussion regarding this issue. We agreed that all the items we store in metaDs, which are:

  • block metadata
  • quota byte usage
  • total number of blocks

should be saved in a single batch and a single transaction. However, we currently don't have a modify API that accepts batch input, and adding such an API would require some additional development effort in nim-datastore, so we're fine with a weaker consistency guarantee for now: consistency between the block metadata and the block data (a file).

discard (await metaDs.get(QuotaReservedKey)).tryGet
repo.totalUsed == 100'nb
repo.quotaUsedBytes == 100'nb
repo.quotaReservedBytes == 0'nb

test "Should release bytes":
discard createTestBlock(100)

Contributor

why do we make this block, and then throw it away?
that's not very nice to the block. :o

Contributor Author

I haven't added this code, but as far as I can tell the goal is to verify that the quota usage changes after creating a block.


test "Should not update block expiration timestamp when current expiration is farther then new one":
test "Should update block expiration timestamp when new expiration is farther":

Contributor

This one does the opposite, which is correct. The title is a copy-paste error.

Contributor Author

Ah, you're right, thanks for catching it!

@tbekas tbekas force-pushed the safe-block-deletion branch 2 times, most recently from 4ac3464 to 31f431b Compare January 18, 2024 15:28
@tbekas tbekas force-pushed the safe-block-deletion branch 6 times, most recently from d7053f7 to ac896f9 Compare January 31, 2024 16:54
n = uint64.fromBytesBE(data[0..<sizeof(uint64)]).int
cid = ? Cid.init(data[sizeof(uint64)..<sizeof(uint64) + n]).mapFailure
success(cid)
proc encode(t: Cid): seq[byte] = t.data.buffer

Contributor

This file is already quite busy; can we perhaps split these helpers out into their own unit?

Contributor Author

Except for Cid, all encode/decode procs are defined here because the corresponding types are also defined here.

Contributor

Then it's probably good to extract the types into their own types module and share it across both; this is the commonly accepted approach.

Contributor

This file needs to be split into several functional units that group related functionality.

I suggest the following structure:

  • repostore/types.nim - common types shared by all other functional units
  • repostore/coders.nim - encoding/decoding primitives
  • repostore/operations.nim - the update/get/modify operations
  • repostore/store.nim - the actual repostore functionality
  • repostore.nim - top-level module that exports the required public functional units/modules

This structure would make the changes easier to track and compare with previous versions. It's quite hard to follow what's going on right now, and the file is already quite large and busy with unrelated functionality.
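
For instance, the top-level repostore.nim could just re-export the proposed submodules (module names as listed above; purely illustrative):

import ./repostore/types
import ./repostore/coders
import ./repostore/operations
import ./repostore/store

export types, coders, operations, store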

Contributor Author

Ok

Contributor
@dryajov dryajov left a comment

This is quite a lot of work. I think we're at a point now where we can start merging this into master; can we rebase it onto master so I can give it a proper review?

@tbekas tbekas force-pushed the safe-block-deletion branch 4 times, most recently from a951bbf to dd212ef Compare February 16, 2024 17:14
@@ -4,8 +4,8 @@ import pkg/questionable
 import pkg/chronos
 
 type
-  Function*[T, U] = proc(fut: T): U {.raises: [CatchableError], gcsafe, noSideEffect.}
-  IsFinished* = proc(): bool {.raises: [], gcsafe, noSideEffect.}
+  Function*[T, U] = proc(fut: T): U {.raises: [CatchableError], gcsafe, closure.}

Contributor

The closure pragma is unnecessary here; when declaring a proc type, it's implicitly a closure.
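
I.e. the pragma can simply be dropped (same type, minus the redundant closure):

type
  # A proc type already has the closure calling convention by default.
  Function*[T, U] = proc(fut: T): U {.raises: [CatchableError], gcsafe.}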

@@ -0,0 +1,5 @@


type

Contributor

leftover from other work?

Contributor Author

It is 🤦 Removing it

Contributor
@benbierens benbierens left a comment

Overall good improvements.
Commit sha-5e952f0 passes dist-tests. 👍

trace "Stopping repo"
await self.close()

self.started = false

Contributor

Nitpick: we usually have newlines at EOF; maybe we should add a GitHub hook to enforce it...
