New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for read repair before responding #5

Merged
merged 1 commit into from Aug 29, 2011

Conversation

Projects
None yet
2 participants
@kocolosk
Member

kocolosk commented Aug 12, 2011

Our philosophy up to this point has been to respond to the client as soon as possible and execute a repair asynchronously if one was warranted. This caused occasional problems where a client would receive a 409 Conflict response to an update request based on the Reply that we had just sent if the repair was too slow to execute.

This patch changes the behavior so that in most cases we delay responding to the client until the repair has completed. The only case where we still respond quickly and issue an asynchronous repair is if we achieved the read quorum and the only alternative replies that we received were ancestors of the quorum reply.

BugzID: 12003

@bdionne

This comment has been minimized.

Show comment
Hide comment
@bdionne

bdionne Aug 16, 2011

Contributor

I don't have a strong opinion about a server wide config versus per-request, as I don't have a good idea of how much users would want such a thing. I can go either way

Contributor

bdionne commented Aug 16, 2011

I don't have a strong opinion about a server wide config versus per-request, as I don't have a good idea of how much users would want such a thing. I can go either way

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 16, 2011

Member

I'm thinking consistency wins over latency when we've detected a problem. Let's block until the repair returns.

Member

kocolosk commented Aug 16, 2011

I'm thinking consistency wins over latency when we've detected a problem. Let's block until the repair returns.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 17, 2011

Member

To clarify, I'd say we can remove the configurability altogether and just make it blocking. If users ask for asynchronous repairs we can reconsider, but we've seen the case where a subsequent write request wins the race with the repair job and fails with a conflict. It isn't pretty.

Member

kocolosk commented Aug 17, 2011

To clarify, I'd say we can remove the configurability altogether and just make it blocking. If users ask for asynchronous repairs we can reconsider, but we've seen the case where a subsequent write request wins the race with the repair job and fails with a conflict. It isn't pretty.

@bdionne

This comment has been minimized.

Show comment
Hide comment
@bdionne

bdionne Aug 17, 2011

Contributor

removed configurability completely

Contributor

bdionne commented Aug 17, 2011

removed configurability completely

@kocolosk

View changes

Show outdated Hide outdated src/fabric_doc_open.erl
@@ -32,25 +32,28 @@ go(DbName, Id, Options) ->
{ok, Reply} ->
format_reply(Reply, SuppressDeletedDoc);
{error, needs_repair, Reply} ->

This comment has been minimized.

@kocolosk

kocolosk Aug 23, 2011

Member

I guess there isn't really much point to this clause any more, is there? We're blocking on the repair every time, so the fact that we know what the answer will be doesn't help us. It's not harmful, but we could simplify the code a bit by not distinguishing between the two cases.

@kocolosk

kocolosk Aug 23, 2011

Member

I guess there isn't really much point to this clause any more, is there? We're blocking on the repair every time, so the fact that we know what the answer will be doesn't help us. It's not harmful, but we could simplify the code a bit by not distinguishing between the two cases.

This comment has been minimized.

@bdionne

bdionne Aug 23, 2011

Contributor

yes, this commit fixes that.

@bdionne

bdionne Aug 23, 2011

Contributor

yes, this commit fixes that.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 24, 2011

Member

Actually, I take it back. I think there is one case where we should respond before the repair, and that's if

a) we achieved the quorum, and
b) a read executed after the repair will match the currently agreed-upon value (assuming no new writes)

For instance, in a situation where one of the copies is missing just the latest revision.

Member

kocolosk commented Aug 24, 2011

Actually, I take it back. I think there is one case where we should respond before the repair, and that's if

a) we achieved the quorum, and
b) a read executed after the repair will match the currently agreed-upon value (assuming no new writes)

For instance, in a situation where one of the copies is missing just the latest revision.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 24, 2011

Member

I'm pretty happy with the current diff. Can I get a review?

Member

kocolosk commented Aug 24, 2011

I'm pretty happy with the current diff. Can I get a review?

@bdionne

This comment has been minimized.

Show comment
Hide comment
@bdionne

bdionne Aug 25, 2011

Contributor

Looks great to me.

Contributor

bdionne commented Aug 25, 2011

Looks great to me.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 25, 2011

Member

I lied, I'm not happy with the patch yet. It doesn't do what I said it should do. It checks for quorum after folding the ancestor replies into their descendants.

Member

kocolosk commented Aug 25, 2011

I lied, I'm not happy with the patch yet. It doesn't do what I said it should do. It checks for quorum after folding the ancestor replies into their descendants.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 25, 2011

Member

Ok, now I think it does what I said it did.

Member

kocolosk commented Aug 25, 2011

Ok, now I think it does what I said it did.

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 29, 2011

Member

@bdionne, does the last commit make sense to you?

Member

kocolosk commented Aug 29, 2011

@bdionne, does the last commit make sense to you?

@bdionne

This comment has been minimized.

Show comment
Hide comment
@bdionne

bdionne Aug 29, 2011

Contributor

LGTM

Contributor

bdionne commented Aug 29, 2011

LGTM

@kocolosk

This comment has been minimized.

Show comment
Hide comment
@kocolosk

kocolosk Aug 29, 2011

Member

Force-pushed a cleanup here, so the history's gonna look wonky. Still trying to figure out the best workflow for rewriting history on topic branches.

Member

kocolosk commented Aug 29, 2011

Force-pushed a cleanup here, so the history's gonna look wonky. Still trying to figure out the best workflow for rewriting history on topic branches.

Wait for read repair before responding
Our philosophy up to this point has been to respond to the client as
soon as possible and execute a repair asynchronously if one was
warranted. This caused occasional problems where a client would receive
a 409 Conflict response to an update request based on the Reply that we
had just sent if the repair was too slow to execute.

This patch changes the behavior so that in most cases we delay
responding to the client until the repair has completed. The only case
where we still respond quickly and issue an asynchronous repair is if we
achieved the read quorum and the only alternative replies that we
received were ancestors of the quorum reply.

Thanks Bob Dionne for various implementations and review.

BugzID: 12003

@kocolosk kocolosk merged commit 42a8c84 into master Aug 29, 2011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment