Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out empty missing_revs results in mem3_rep #1795

Merged
merged 1 commit into from Dec 6, 2018

Conversation

@nickva
Copy link
Contributor

@nickva nickva commented Dec 6, 2018

This avoids needlessly making fabric:update_docs calls with emtpy doc lists.

@davisp
Copy link
Member

@davisp davisp commented Dec 6, 2018

Seems like it'd be better to move that filter into find_missing_revs in case we ever use it somewhere else in the future.

Loading

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 8681f78 to 4f50ffa Dec 6, 2018
Copy link
Member

@davisp davisp left a comment

Should have thought of this earlier, but can you make that a lists:flatmap/2 call instead of a list comprehension. My worry is that the list comprehension might swallow a badmatch exception if that 3-uple ever changes shape and this would be a bad place to accidentally drop revisions that need replicating since we'd continue on our merry way checkpointing past the failure.

Loading

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 4f50ffa to 11a676c Dec 6, 2018
src/mem3/src/mem3_rep.erl Outdated Show resolved Hide resolved
Loading
davisp
davisp approved these changes Dec 6, 2018
Copy link
Member

@davisp davisp left a comment

+1

Up to you if you want to switch to function clauses or not.

Loading

@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from 11a676c to bbdc94c Dec 6, 2018
This avoids needlessly making cross-cluster fabric:update_docs(Db, [], Opts)
calls.
@nickva nickva force-pushed the filter-out-responses-with-empty-revisions branch from bbdc94c to 7da67f4 Dec 6, 2018
@nickva nickva merged commit 25e9e77 into master Dec 6, 2018
1 check passed
Loading
@nickva nickva deleted the filter-out-responses-with-empty-revisions branch Dec 6, 2018
@davisp
Copy link
Member

@davisp davisp commented Dec 7, 2018

For posterity, @nickva and I tested how this affects internal replication of a shard with 1M docs. Before the change internal replication took 364s vs 331s after this change. So a noticeable speedup in a fairly worst case scenario. Just thought I'd make a note since I was curious how much this might help.

Also for posterity, here's the script I used to reset and time replication for anyone that cares. (The db in question was a Q=1 with 1M smallish docs).

f(),
rr(mem3),
TimeStamp = fun() ->
    {_,_,Micro} = Now = os:timestamp(),
    {{Year,Month,Date},{Hour,Minute,Second}} = calendar:now_to_datetime(Now),
    Format = "~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0B.~6.10.0BZ",
    iolist_to_binary(io_lib:format(Format, [Year, Month, Date, Hour, Minute, Second, Micro]))
end,
TimeReplication = fun(DbName) ->
    Source = hd(mem3:local_shards(DbName)),
    Target = hd([
        S || S <- mem3:shards(DbName),
        S#shard.name == Source#shard.name andalso S#shard.node /= node()
    ]),
    #shard{
        name = TgtName,
        node = TgtNode
    } = Target,
    couch_util:with_db(Source#shard.name, fun(Db) ->
        UUID = couch_db:get_uuid(Db),
        {LocalDocId, _} = mem3_rpc:load_checkpoint(TgtNode, TgtName, node(), UUID),
        NewEntry = [
            {<<"source_node">>, atom_to_binary(node(), utf8)},
            {<<"source_uuid">>, UUID},
            {<<"source_seq">>, 0},
            {<<"timestamp">>, TimeStamp()}
        ],
        NewBody = mem3_rpc:save_checkpoint(TgtNode, TgtName, LocalDocId, 0, NewEntry, {[]}),
        {ok, _} = couch_db:update_doc(Db, #doc{id = LocalDocId, body = NewBody}, [])
    end),
    Start = os:timestamp(),
    Resp = mem3_rep:go(Source, Target, [{batch_count, all}]),
    Finish = os:timestamp(),
    Diff = timer:now_diff(Finish, Start) / 1000000,
    {Diff, Resp}
end,
TimeReplication(<<"foo">>).

Loading

@wohali wohali mentioned this pull request Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants