Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out empty missing_revs results in mem3_rep #1795

Merged
merged 1 commit into from Dec 6, 2018

Conversation

Projects
None yet
2 participants
@nickva
Copy link
Contributor

nickva commented Dec 6, 2018

This avoids needlessly making fabric:update_docs calls with emtpy doc lists.

@davisp

This comment has been minimized.

Copy link
Member

davisp commented Dec 6, 2018

Seems like it'd be better to move that filter into find_missing_revs in case we ever use it somewhere else in the future.

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 8681f78 to 4f50ffa Dec 6, 2018

@davisp
Copy link
Member

davisp left a comment

Should have thought of this earlier, but can you make that a lists:flatmap/2 call instead of a list comprehension. My worry is that the list comprehension might swallow a badmatch exception if that 3-uple ever changes shape and this would be a bad place to accidentally drop revisions that need replicating since we'd continue on our merry way checkpointing past the failure.

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 4f50ffa to 11a676c Dec 6, 2018

Show resolved Hide resolved src/mem3/src/mem3_rep.erl Outdated
@davisp

davisp approved these changes Dec 6, 2018

Copy link
Member

davisp left a comment

+1

Up to you if you want to switch to function clauses or not.

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 11a676c to bbdc94c Dec 6, 2018

Filter out empty missing_revs results in mem3_rep
This avoids needlessly making cross-cluster fabric:update_docs(Db, [], Opts)
calls.

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from bbdc94c to 7da67f4 Dec 6, 2018

@nickva nickva merged commit 25e9e77 into master Dec 6, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nickva nickva deleted the filter-out-responses-with-empty-revisions branch Dec 6, 2018

@davisp

This comment has been minimized.

Copy link
Member

davisp commented Dec 7, 2018

For posterity, @nickva and I tested how this affects internal replication of a shard with 1M docs. Before the change internal replication took 364s vs 331s after this change. So a noticeable speedup in a fairly worst case scenario. Just thought I'd make a note since I was curious how much this might help.

Also for posterity, here's the script I used to reset and time replication for anyone that cares. (The db in question was a Q=1 with 1M smallish docs).

f(),
rr(mem3),
TimeStamp = fun() ->
    {_,_,Micro} = Now = os:timestamp(),
    {{Year,Month,Date},{Hour,Minute,Second}} = calendar:now_to_datetime(Now),
    Format = "~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0B.~6.10.0BZ",
    iolist_to_binary(io_lib:format(Format, [Year, Month, Date, Hour, Minute, Second, Micro]))
end,
TimeReplication = fun(DbName) ->
    Source = hd(mem3:local_shards(DbName)),
    Target = hd([
        S || S <- mem3:shards(DbName),
        S#shard.name == Source#shard.name andalso S#shard.node /= node()
    ]),
    #shard{
        name = TgtName,
        node = TgtNode
    } = Target,
    couch_util:with_db(Source#shard.name, fun(Db) ->
        UUID = couch_db:get_uuid(Db),
        {LocalDocId, _} = mem3_rpc:load_checkpoint(TgtNode, TgtName, node(), UUID),
        NewEntry = [
            {<<"source_node">>, atom_to_binary(node(), utf8)},
            {<<"source_uuid">>, UUID},
            {<<"source_seq">>, 0},
            {<<"timestamp">>, TimeStamp()}
        ],
        NewBody = mem3_rpc:save_checkpoint(TgtNode, TgtName, LocalDocId, 0, NewEntry, {[]}),
        {ok, _} = couch_db:update_doc(Db, #doc{id = LocalDocId, body = NewBody}, [])
    end),
    Start = os:timestamp(),
    Resp = mem3_rep:go(Source, Target, [{batch_count, all}]),
    Finish = os:timestamp(),
    Diff = timer:now_diff(Finish, Start) / 1000000,
    {Diff, Resp}
end,
TimeReplication(<<"foo">>).

@wohali wohali referenced this pull request Feb 7, 2019

Merged

2.3.1 Release Proposal #1908

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.