-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coalesce reads in Actor Cache flush #1814
Conversation
Exciting! Let me know if/when you'd like me to take a look at this. |
There's still 1 failing test ( I just tossed in one more commit to try and (at least partially) fix that failing test, but it doesn't seem to be doing much. Other than that, I think I may try to add some more comments and do a bit more refactoring. |
0354506
to
a529139
Compare
^ tries to make it a bit easier to track when |
4feb0f1
to
c25eca9
Compare
https://github.com/cloudflare/workerd/compare/a529139f3b54dacedd1b6b77b0b3bebe2babc63d..4feb0f11e12a52065c9f38936ff165d4af9c622e was a rebase to fix conflicts. I accidentally pulled 1 extra commit so the second force push just gets us back to this feature branch on top of |
8d27987
to
d673ea0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a partial review of the first handful of commits for now, sorry. I'll continue reviewing the rest, although have a couple questions that could use a response as well.
79b4a08
to
70a2d46
Compare
Thanks Alex, was hoping to get to some of the comments today but I noticed there was a new segfault. Thought something broke during the rebase, but it was just a bad commit. This took way longer to figure out than it should've 🙃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it looks like you're aware of it, but there are a bunch of TODO(now) comments to work out :)
c90d9f9
to
5ea3d9d
Compare
This allows us to mutate from UNKNOWN to PRESENT or ABSENT but permit no other transition.
This commit does a few notable things: - Counted deletes now use the lastFlush promise instead of their own fulfillers. This does mean that counted delete promises will resolve later in some rare cases like a counted delete and a delete all being requested within the same scheduled flush or a counted delete being issued as part of a transaction. The goal here is to get to a point where *all* operations just have to wait for the flush promise after they are scheduled. - Counted deletes now maintain their own list of participating entries (somewhat like delete all). This means that we no longer have to "inherit" counted deletes when entries are overridden or do iterator tracking when building their flush batches.
This config doesn't need to chage, so let's convey that in the code.
This commit: - Makes `syncStatus` private - Makes removeEntry() the only way we remove from lists - Introduces methods to add to the dirty or clean list, so that transitioning between `EntrySyncStatus` states is easier to follow. - Adds/corrects some comments
The `dirtyList` is a kj::List with a thin wrapper around its methods. When we add or remove from the list, we also update the size (total bytes). This was fine until we started modifying the size of the elements of the list. In this case, we put an Entry in the `dirtyList` when we want to read it from storage, and when the result from storage is ready we overwrite whatever is in `Entry`. When we would later attempt to remove the Entry from the `dirtyList`, it would decrement the size past 0 and cause underflow, which would cause tests to hang.
It turns out we skip flushing in preview sessions entirely. Rather than trying to find a way to only flush reads, we can just do what we've always done prior to coalescing reads.
Promises were being dropped, but we mark them nodiscard. We also explicit check for calls to mockStorage.
This commit attempts to provide a mechanism to cancel the promise that waits for the read batches to complete in syncGets(). This is needed for when the `ActorCache` is destroyed, since we need to destroy all `Entry`s before we destroy the `SharedLru`.
5ea3d9d
to
2bf2d26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished reviewing all test changes, will look at the code changes now.
Sorry for some of the comments being more questions and/or things I could poke at myself once my devspace is usable again, feel free to breeze past any such comments that aren't useful.
@@ -1427,70 +1413,122 @@ KJ_TEST("ActorCache read retry on flush containing only puts") { | |||
KJ_ASSERT(KJ_ASSERT_NONNULL(promise.wait(ws)) == "123"); | |||
} | |||
|
|||
KJ_TEST("ActorCache read hard fail") { | |||
// KJ_TEST("ActorCache read hard fail") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've moved reads to flush along with writes, if the read fails, it looks like we're actually breaking the output gate 😬.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😬
.thenReturn(CAPNP(numDeleted = 2)); | ||
mockTxn->expectCall("delete", ws) | ||
.withParams(CAPNP(keys = ["quux", "baz"])) | ||
.thenReturn(CAPNP(numDeleted = 1234)); // count ignored | ||
.withParams(CAPNP(keys = ["baz"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about it now, but I wonder if this could be merged with the delete of quux
at this point, given baz
is an uncounted delete and quux
is a counted delete whose count has already been previously filled in.
Might be worth adding a TODO for, but there's clearly already enough happening in this PR, and it's possible the code complexity wouldn't be worth it anyway.
@@ -1744,6 +1768,9 @@ KJ_TEST("ActorCache list() with limit") { | |||
// Cached if we try it again though. | |||
KJ_ASSERT(expectCached(test.list("bar", "qux", 4)) == | |||
kvs({{"bar", "456"}, {"baz", "789"}, {"foo", "123"}, {"garply", "54321"}})); | |||
|
|||
// Return our uncached get from earlier. | |||
mockStorage->expectCall("get", ws).withParams(CAPNP(key = "fooa")).thenReturn(CAPNP()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, so now we issue a get for this even though its promise was dropped? And somehow this comes after the list()
that was issued later than it (line 1752 for the list vs line 1739 for the get)? I don't understand this :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Now that we've confirmed `qux`, `xxx`, and `yyy` are in cache, we can let `bar` and `baz` be | ||
// inserted into cache. If we did individual `get()`s, then `bar` and `baz` would actually | ||
// evict `qux` and `xxx` before we got to verify if they were in cache! | ||
mockStorage->expectCall("getMultiple", ws) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it at all worrisome that we now send these get
s to the backend despite neither of their callers actually waiting around on the results? It seems preferable to avoid the storage reads entirely if the caller abandoned the operations before the event loop was yielded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, especially because in some cases we drop it, ex. ActorCache read overwrite
// Make some gets but overwrite them in the cache with puts.
auto promise1 = expectUncached(test.get("foo"));
auto promise2 = expectUncached(test.get("bar"));
(void)expectUncached(test.get("baz"));
test.put("foo", "456");
test.put("bar", "789");
test.put("baz", "123");
In this case, we don't do the get()
for baz
, but in other cases, ex. this comment we still do the get.
It's not yet clear to me what's causing this, but I think it's worth figuring out.
.withParams(CAPNP(keys = ["bar", "baz", "qux"]), "stream"_kj) | ||
.useCallback("stream", [&](MockClient stream) { | ||
// TODO(now): Using `kilobyte` for value results in the error "External constants not allowed." | ||
// What can we do to fix that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, yeah, this is messy because of how CAPNP is a preprocessor macro that's just stringifying its input into the capnproto text format.
I can help play with this once I get my dev environment back if you want (e.g. perhaps a #define KILOBYTE xxx...
would work here), but TBH I don't think it's really a problem if we have to leave these in so long as there's a comment explaining why.
@@ -291,8 +301,21 @@ void ActorCache::touchEntry(Lock& lock, Entry& entry) { | |||
} | |||
|
|||
void ActorCache::removeEntry(Lock& lock, Entry& entry) { | |||
KJ_IASSERT(!entry.isDirty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I kind of appreciated that we had some protections against accidentally removing dirty entries (and ensuring their countedDelete
is carried forward) rather than happily doing so silently here.
No description provided.