Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure replicator _active_tasks entry reports recent pending changes #623

Merged
merged 1 commit into from Jun 28, 2017

Conversation

nickva
Copy link
Contributor

@nickva nickva commented Jun 28, 2017

Previously there was a race between reporting the source update sequence
between the the workers and the changes readers. Each one used separate
incrementing timestamp sequences.

In some cases that lead to pending changes being stuck. For example, if changes
reader reported the highest sequence with timestamp 10000, then later workers
reported it with sequences 5000, 5001, 5002, then all those reports would be
ignored and users would see an always lagging pending changes value reported
with timestamp 1000.

The fix is to thread the last_sequence update through the changes queue to
the changes manager, so only its timestamp sequence will be used. This removes
the race condition.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good. I definitely like that use of lists:partition/2. Though I think for sanity we should look at routing the last_seq to the worker so that we're changing this logic as little as possible.

ok;
maybe_report_last_sequence(Parent, LastSeqs, Ts) ->
{last_seq, Seq} = lists:last(LastSeqs),
Msg = {report_seq_done, {Ts, Seq}, couch_replicator_stats:new()},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work if we don't call {report_Seq, {Ts, Seq}} first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be handled by this case:

    {NewThroughSeq0, NewSeqsInProgress} = case SeqsInProgress of
    [] ->
        {Seq, []};

From https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_scheduler_job.erl#L231

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'll switch the order. I think it would make more sense.



maybe_process_changes(_Parent, [] = _Changes, _From, _Ts) ->
ok;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this will dead lock since you're sometimes not sending a message back to the From client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right!

@nickva nickva force-pushed the fix-replicator-progress-reporting-2 branch from 254ce3d to bc1ca8a Compare June 28, 2017 21:17
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still seems like it'd be easier to find the ReportSeq from the list, strip out last_seq entries, and then in the ignored empty clause just send a report_seq_done message than your two extra maybe functions.



maybe_report_in_progress(_Parent, [] = _Changes, _From, _Ts) ->
nil;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReportSeq could end up being nil here, will that not break a worker? Or are you relying on the empty clause match?

Copy link
Contributor Author

@nickva nickva Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It relies on {changes, ChangesManager, [], nil} -> clause match in the worker

@nickva nickva force-pushed the fix-replicator-progress-reporting-2 branch 2 times, most recently from b1f2307 to 780baef Compare June 28, 2017 23:07
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style issue and need to remove that maybe function but pretty close otherwise.

fun(#doc_info{}) ->
true;
({last_seq, _Seq}) ->
false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is wonky

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, will fix

({last_seq, _Seq}) ->
false
end, ChangesOrLastSeqs),
maybe_report_in_progress(Parent, Changes, From, Ts),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no maybe here any more. You need to just hoist gen_server:cast(Parent, {report_Seq, ReportSeq}) out now.

@nickva nickva force-pushed the fix-replicator-progress-reporting-2 branch 2 times, most recently from 52b84e9 to f3dfa73 Compare June 28, 2017 23:24
…value

Previously there was a race between reporting the source update  sequence
between the the workers and the changes readers. Each one used separate
incrementing timestamp sequences.

In some cases that lead to pending changes being stuck. For example, if changes
reader reported the highest sequence with timestamp 10000, then later workers
reported it with sequences 5000, 5001, 5002, then all those reports would be
ignored and users would see an always lagging pending changes value reported
with timestamp 1000.

The fix is to thread the last_sequence update through the changes queue to
the changes manager, so only its timestamp sequence will be used. This removes
the race condition.
@nickva nickva force-pushed the fix-replicator-progress-reporting-2 branch from f3dfa73 to 719436a Compare June 28, 2017 23:26
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@nickva nickva merged commit 571a2fc into apache:master Jun 28, 2017
@nickva nickva deleted the fix-replicator-progress-reporting-2 branch June 28, 2017 23:33
lag-linaro pushed a commit to lag-linaro/couchdb that referenced this pull request Oct 25, 2018
nickva pushed a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants