Reuse KeyValueStore on rollback #821

Merged: 19 commits merged into apple:release-6.0 on Oct 22, 2018

Conversation


panghy commented Oct 7, 2018

Not sure if this is dangerous, but I have tested this in clusters taking significant writes (> 1M) and it seems to be working. I can always trigger a rollback on some storage servers by finding and killing the CC. With this change, an SS reboot will not reload the KVMS if the knob is turned on: the SS code is changed to not close the IKVS when the error is please_reboot, and handleIOError needs to not wait for the IKVS to close when the error is please_reboot and the flag is on.

The idea is that a rollback is very expensive for the memory store (I had a version that just did flushAndExit(), since you might as well reclaim some memory before spending many minutes re-reading state from disk). I am not sure whether this would work, but it seems like recovery just reads everything and writes a final rollback, which is what reusing an existing memory storage instance would do with this change.
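
As a rough sketch of the worker-side behavior described above (the knob name and the function shape are assumptions for illustration, not the actual code in this PR):

// Sketch only: skip closing the key-value store on please_reboot when the
// (hypothetical) reuse knob is on, so the restarted SS can pick it back up.
ACTOR Future<Void> handleIOErrorsSketch(Future<Void> actor, IKeyValueStore* store) {
	try {
		wait(actor);
		return Void();
	} catch (Error& e) {
		if (e.code() == error_code_please_reboot && SERVER_KNOBS->REUSE_KVS_ON_REBOOT) {
			throw; // leave the store open for reuse instead of waiting for it to close
		}
		store->close(); // otherwise tear the store down as before
		throw;
	}
}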


panghy commented Oct 7, 2018

Seems like some indenting is off; it looks OK in CLion, but I guess it's doing something under the hood.

hgray1 requested a review from etschannen on Oct 8, 2018


alexmiller-apple commented Oct 8, 2018

Doing a git clang-format would probably be the easiest way to clean up the whitespace.
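
For example, something along these lines should limit the reformatting to the lines this branch touches (the exact invocation is a guess and depends on your local setup; release-5.2 is the base branch of this PR):

git clang-format release-5.2

i.e. passing the base branch so only the changed lines are reformatted.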


panghy commented Oct 9, 2018

Hm, @alecgrieser it still looks weird


alecgrieser commented Oct 9, 2018

@panghy Did you mean to ping @alexmiller-apple?


panghy commented Oct 9, 2018

Oh, sorry, yeah :)


panghy commented Oct 9, 2018

OK, fixed; I just needed to use tabs.


alexmiller-apple commented Oct 10, 2018

I don't know this code well enough to be able to sign off on it, but running it through correctness, it appears to be fine. I had to remove the restarting tests, which were breaking, but I'm under the impression that's because of issues with our correctness setup running 5.2 builds, not with this PR.

Correctness runs:
20181010-014734-alexmiller-b2ebc0998481b902 had 2/100k failures, neither of which look related
20181010-110710-alexmiller-4692c291f3ceef8a is ongoing, but has 0 failures out of 35k runs so far


etschannen commented Oct 10, 2018

I believe this approach should work for all storage engines, not just the memory storage engine.

For safety, we need to wait for any outstanding commits to be completed before starting the new storage server. This means the "Future durable" variable in the updateStorage actors needs to be stored in the "StorageServer" data structure. Then, in the catch block of the "update" actor, if the error code is please_reboot we can wait for the durable future to be ready before re-throwing the error.
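
A minimal sketch of that suggestion (the member and error names follow the discussion; the actor body is illustrative, not the exact patch):

struct StorageServer {
	// ... existing members ...
	Future<Void> durable = Void(); // latest commit issued by updateStorage
};

ACTOR Future<Void> update(StorageServer* data /*, ... */) {
	try {
		// ... pull new versions from the tlogs and apply mutations ...
		return Void();
	} catch (Error& err) {
		state Error e = err;
		if (e.code() == error_code_please_reboot) {
			// Wait for any outstanding commit to finish before re-throwing, so the
			// reused key-value store is fully durable when the new SS starts.
			wait(data->durable);
		}
		throw e;
	}
}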


panghy commented Oct 10, 2018

@etschannen sounds like this would be required for both the SSD and memory engines? Am I reading you correctly?


etschannen commented Oct 10, 2018

Yes, waiting for outstanding commits is needed regardless. Using it for all storage engines will give a minor performance improvement for the SSD engine and also help simplify the code.


alexmiller-apple commented Oct 10, 2018

Simulation finally got enough runs in that it appears to agree with Evan.

With a build of 4f1cb97, you should be able to run:

./bin/fdbserver -r simulation -b off -s 44328167 -f tests/slow/SwizzledRollbackTimeLapseIncrement.txt --crash

and reproduce the failure. You should see a <TestFailure Severity="40" Reason="Bad increments" A="141392" B="141546" ...> trace event.

Grinding through runs of SwizzledRollbackTimeLapseIncrement.txt, I'm seeing about 1 in 3000 fail, which unfortunately makes verifying a fix to this PR a bit impractical if you don't have a cluster of machines to throw at it. So, I'm happy to throw new commits into our correctness cluster and let you know what falls out. (And it looks like SwizzledRollbackTimeLapseIncrement is the only test that does the right combination of things to shake out this bug.)


panghy commented Oct 11, 2018

@etschannen please take a look to make sure what I am doing is correct (I am trying to find examples of a Future that's global).
@alexmiller-apple thanks for the help; see if the latest commit helps.


alexmiller-apple commented Oct 11, 2018

Building on 88e842, I still see failures:

Tests causing errors:
Test                                          | Seed                 | Buggify              | Determinism Check    | Error                | Unseed               | Reason              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
slow/SwizzledRollbackTimeLapseIncrement.txt   | 738014155            | True                 | False                | TestFailure          | 98065                | Bad increments      
slow/SwizzledRollbackTimeLapseIncrement.txt   | 216361174            | True                 | False                | TestFailure          | 8282                 | Bad increments      
slow/SwizzledRollbackTimeLapseIncrement.txt   | 439696692            | False                | False                | TestFailure          | 51462                | Bad increments      
slow/SwizzledRollbackTimeLapseIncrement.txt   | 376392070            | True                 | False                | TestFailure          | 48049                | Bad increments      
slow/SwizzledRollbackTimeLapseIncrement.txt   | 873899149            | True                 | False                | TestFailure          | 71983                | Bad increments      
slow/SwizzledRollbackTimeLapseIncrement.txt   | 503858078            | True                 | False                | TestFailure          | 26293                | Bad increments      
@@ -333,6 +333,7 @@ struct StorageServer {
CoalescedKeyRangeMap<bool, int64_t, KeyBytesMetric<int64_t>> byteSampleClears;
AsyncVar<bool> byteSampleClearsTooLarge;
Future<Void> byteSampleRecovery;
Future<Void> durable = Void();

etschannen Oct 12, 2018

durable should be set to Void() in the constructor

etschannen Oct 12, 2018

to clarify, this is just style, not a correctness issue
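
For reference, the style being suggested is roughly (sketch, not the actual change):

struct StorageServer {
	Future<Void> durable;
	// ... other members ...
	StorageServer(/* existing constructor arguments */)
		: durable(Void()) /* , other initializers */ {}
};

i.e. initialize durable in the constructor initializer list rather than with an in-class default.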

panghy Oct 13, 2018

Sure


panghy commented Oct 13, 2018

@alexmiller-apple is there something in the test output that suggests the code is wrong? This code requires rollbacks to trigger; can we confirm that that's the case? I guess I can pick a failing seed, run the simulator, and it should deterministically crash?


alexmiller-apple commented Oct 13, 2018

Yeah, you should be able to plug in the same seed and get the same failure, which was one that suggested data corruption.

There was office discussion about why this still isn't correct, and the proposed next fix appears to have fixed the corruption problem, but instead sometimes causes storage server hangs. So, we'll see if we have a trivial fix to give you, or a description of something that should actually pass correctness.
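
For example, taking the first failing seed from the table above, a rerun against the same build (88e842) would presumably look something like this (assuming -b on corresponds to Buggify=True for that run):

./bin/fdbserver -r simulation -b on -s 738014155 -f tests/slow/SwizzledRollbackTimeLapseIncrement.txt --crash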


panghy commented Oct 13, 2018

Thanks!

Resolved review threads:
fdbserver/IKeyValueStore.h (outdated)
fdbserver/Knobs.cpp (outdated)
fdbserver/storageserver.actor.cpp
fdbserver/worker.actor.cpp
fdbserver/worker.actor.cpp (outdated)

etschannen commented Oct 15, 2018

I got the correctness tests to pass and committed the fix directly to your branch. Now there is just some small cleanup needed before I merge the PR.

Also, if possible I would like the PR to be against 6.0. We probably would not do a patch release on 5.2 with the 6.0 release imminent.

Thanks!


panghy commented Oct 15, 2018


panghy commented Oct 20, 2018

Should be good to look at, @etschannen.

panghy changed the title from "Reuse KeyValueStoreMemory on rollback" to "Reuse KeyValueStore on rollback" on Oct 21, 2018

panghy changed the base branch from release-5.2 to release-6.0 on Oct 21, 2018

hgray1 added this to the 6.0 milestone on Oct 22, 2018


panghy commented Oct 22, 2018

@alexmiller-apple since this is now targeted for 6.0, do you need to do another round of testing for it?


alexmiller-apple commented Oct 22, 2018

Correctness says you're good 👍


panghy commented Oct 22, 2018

Thanks!

etschannen merged commit 890fb3f into apple:release-6.0 on Oct 22, 2018
