When partitioned queries were introduced, new database creation options were added for the ?partitioned=true boolean along with the hash function used, and these options are stored in the dbs db doc for the relevant database. The underlying bug here is that there are a number of different code paths that result in the creation of database shards, and not all of them create the shards with the appropriate database options. We fixed a few of these issues in [1], but I've stumbled upon some more scenarios where we encounter these failures. They currently manifest on partitioned databases whose shards are created after initial database creation: those shards are incorrectly created without the partitioned flag.
If individual shards of a partitioned database are incorrectly created as non-partitioned, for the most part things just "work", making this an issue that hides in plain sight. One of the things that does not work is that design documents with partitioned query views have a partitioned boolean metadata value, and there's additional validation logic to prevent partitioned ddocs from being written to unpartitioned database shards. So in the event you get a shard replica incorrectly created as unpartitioned, and you have a ddoc that maps to that shard, the ddoc cannot be written to that shard replica.
In the event two of the three replicas are created correctly as partitioned, only one shard is incorrectly created as unpartitioned, and there's a partitioned=true ddoc on that shard range, the ddoc will fail to write to the unpartitioned shard. That triggers read_repair any time the ddoc is accessed through the quorum system. However, the read_repair logic does not enforce W=N write semantics, so having two out of three shard replicas properly created as partitioned results in a false positive read_repair success, never triggering the failure case where we log the issue. As a result, there will be a false positive "successful" read_repair operation every time the ddoc is accessed.
I've tried to be a bit more meticulous in auditing this issue and trying to fix it once and for all. I think there are only two ways of creating database shards: 1) calling couch_server:open with the option create_if_missing=true, and 2) directly calling couch_server:create. If anyone can think of any code paths I'm skipping, let me know.
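For reference, the two entry points look roughly like the sketch below. This is illustrative only: ShardName and Options stand in for whatever each call site supplies (which is exactly where the bug hides), and the precise shape of the create_if_missing option should be checked against couch_server.erl.

```erlang
%% Sketch of the two shard-creation entry points discussed above.
%% ShardName/Options are placeholders for call-site values; the exact
%% create_if_missing option shape should be verified in couch_server.erl.

%% 1) Open with create_if_missing: couch_server creates the shard file
%%    if it is absent, using only the Options it was handed.
{ok, Db1} = couch_server:open(ShardName, [{create_if_missing, true} | Options]),

%% 2) Create the shard file directly.
{ok, Db2} = couch_server:create(ShardName, Options).
```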
Here are the scenarios where we set create_if_missing:
And here are the scenarios where we call couch_server:create directly:
(! 22220)-> grep -r couch_server:create src/
src//mem3/test/eunit/mem3_seeds_test.erl: {ok, _} = couch_server:create(<<"_users">>, []),
src//mem3/src/mem3_shards.erl: case couch_server:create(Name, [?ADMIN_CTX] ++ Options) of
src//couch/test/eunit/couch_server_tests.erl: Resp = couch_server:create(?tempdb(), [{engine, <<"cowabunga!">>}]),
src//couch/src/couch_httpd_db.erl: case couch_server:create(DbName, [{user_ctx, UserCtx}] ++ Engine) of
src//couch/src/couch_server.erl: couch_server:create(DbName, Options);
src//couch/src/couch_db.erl: couch_server:create(DbName, Options).
src//dreyfus/src/dreyfus_rpc.erl: couch_server:create(DbName, Options);
src//fabric/src/fabric_rpc.erl: rexi:reply(case couch_server:create(DbName, Options) of
Let's audit these to see which are problematic and need to be fixed, starting with the create_if_missing invocations:
src//mem3/src/mem3_shards.erl: [create_if_missing(mem3:name(S), mem3:engine(S)) || S
src//mem3/src/mem3_shards.erl:create_if_missing(Name, Options) ->
These two calls [2][3] are to a function in mem3_shards with the same name as the option we're interested in, which is why they showed up in the grep. The call in [2] to the function defined in [3] ends up calling couch_server:create, so we've got an overlap between the two greps here, and we'll cover the case of [4]:
src//mem3/src/mem3_shards.erl: case couch_server:create(Name, [?ADMIN_CTX] ++ Options) of
here too. You'll notice that when the function in [4] calls couch_server:create it passes in the Options from [2], which is just mem3:engine(S) supplying the engine options. Nothing here checks whether the database should be created as partitioned, and therefore this is one of the problematic shard creation code paths that we'll need to fix.
Next up is the call in mem3_util:ensure_exists from [5][6], which is only used for the creation of sys_dbs, so I think we're ok to leave this as is, as we don't make partitioned system dbs.
This is from mem3_util:get_or_create_db in [7]. This is the improved db creation function we updated in [1], and it properly sets the db properties by way of [8], so we're good to go here. The next three cases are for isolated tests, and after those comes the core mechanism couch_server uses to determine if it should create the database when it's missing [9].
This is another problematic invocation in [10], which is called as part of fabric_util:get_db [11]. You can clearly see in fabric_util:get_db/1 (the arity-one clause) that we're supplying an empty list of options, which will never contain the partition options [12]. We never actually use the arity-one function head, as you can see from the following grep:
Those three invocations come from fabric:get_revs_limit/1 [13], fabric:get_purge_infos_limit/1 [14], and the two-arity fabric:get_security/2 [15]; note that the one-arity fabric:get_security/1 just calls the two-arity version with an empty options list. You can see in the following grep that we never actually supply database creation options to fabric:get_security and only use the options for setting a user_ctx, if at all. As such, all three of these cases are broken, but they all use the same db creation invocation in [10], so we can fix the problem there.
Next up are the direct invocations of couch_server:create. The mem3_seeds_test and couch_server_tests entries are isolated tests, and the mem3_shards call was covered above. That brings us to:
src//couch/src/couch_httpd_db.erl: case couch_server:create(DbName, [{user_ctx, UserCtx}] ++ Engine) of
This interesting case comes from [16]: because it doesn't include db creation options, you can't actually create partitioned dbs against the 5986 endpoint. The internal call in couch_server.erl is the core couch_server logic for creating databases that have had create_if_missing passed through [17].
This is an alias function, couch_db:create/2 --> couch_server:create/2 [18], which opens up more code paths for invoking create without the appropriate options. For the most part, it seems this function is only used in test modules, so the following grep shows the cases that don't involve the word "test":
The first case is for creating an internal users database, and the second case is part of the inline eunit tests of that module, so I don't think either of these scenarios or this code path are problematic at the moment.
This is from dreyfus_rpc:get_or_create_database [19], which is only invoked via {ok, Db} = get_or_create_db(DbName, []) in [20], and you can clearly see that it supplies an empty list for the db creation options, which is not appropriate.
On a related note, it's not in ASF CouchDB, but the Hastings [21] library uses a similar structure to Dreyfus and is susceptible to the same issue in its get_or_create_database code [22]. I've included it here as a reminder that it needs to be fixed too.
And for our final code path we have:
src//fabric/src/fabric_rpc.erl: rexi:reply(case couch_server:create(DbName, Options) of
which is in the fabric_rpc:create_db/2 function [23], used only by way of fabric:create_db and fabric_db_create:go. This is the main code path for creating databases (partitioned or not), and I believe it is working as expected.
How to find problematic shards
I've created a little remsh snippet that can be invoked on any node in the cluster to audit all dbs and shard replicas for issues, returning the problematic shards. I think it would be worthwhile to add this to mem3_util or some such.
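Something along these lines; this is a hedged sketch rather than the exact snippet. It assumes the internal calls fabric:all_dbs/0, mem3:local_shards/1, mem3:name/1, couch_db:open_int/2, couch_db:get_props/1, and couch_db:close/1 behave as I remember them, and it needs a live node's remsh to run:

```erlang
%% Remsh sketch: for every clustered database, open each local shard
%% replica and print the creation props stored in its header, so shards
%% missing the partitioned flag stand out.
%% Internal API names here (e.g. couch_db:get_props/1) are assumptions
%% from memory of the code base; verify them before use.
{ok, DbNames} = fabric:all_dbs(),
lists:foreach(fun(DbName) ->
    lists:foreach(fun(Shard) ->
        ShardName = mem3:name(Shard),
        case couch_db:open_int(ShardName, []) of
            {ok, Db} ->
                Props = couch_db:get_props(Db),
                couch_db:close(Db),
                io:format("~s: ~p~n", [ShardName, Props]);
            Error ->
                %% Surface shards we failed to open as well.
                io:format("~s: ~p~n", [ShardName, Error])
        end
    end, mem3:local_shards(DbName))
end, DbNames).
```

Comparing the printed props across the replicas of each shard range makes mismatched (unpartitioned) replicas easy to spot.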
Problematic shard creation code paths to fix
This leaves us with the following bugs to fix:
couchdb/src/mem3/src/mem3_shards.erl, line 361 in 8ac1978
couchdb/src/fabric/src/fabric_util.erl, line 108 in 8ac1978
couchdb/src/couch/src/couch_httpd_db.erl, line 235 in 8ac1978
couchdb/src/dreyfus/src/dreyfus_rpc.erl, line 108 in 8ac1978
[1] #2690
[2] couchdb/src/mem3/src/mem3_shards.erl, line 361 in 8ac1978
[3] couchdb/src/mem3/src/mem3_shards.erl, lines 411-423 in 8ac1978
[4] couchdb/src/mem3/src/mem3_shards.erl, line 416 in 8ac1978
[5] couchdb/src/mem3/src/mem3_util.erl, line 268 in 8ac1978
[6] couchdb/src/mem3/src/mem3_util.erl, lines 265-274 in 8ac1978
[7] couchdb/src/mem3/src/mem3_util.erl, lines 511-529 in 8ac1978
[8] couchdb/src/mem3/src/mem3_util.erl, line 519 in 8ac1978
[9] couchdb/src/couch/src/couch_server.erl, line 108 in 8ac1978
[10] couchdb/src/fabric/src/fabric_util.erl, line 108 in 8ac1978
[11] couchdb/src/fabric/src/fabric_util.erl, lines 96-132 in 8ac1978
[12] couchdb/src/fabric/src/fabric_util.erl, lines 96-97 in 8ac1978
[13] couchdb/src/fabric/src/fabric.erl, line 152 in 8ac1978
[14] couchdb/src/fabric/src/fabric.erl, lines 173-175 in 8ac1978
[15] couchdb/src/fabric/src/fabric.erl, lines 177-184 in 8ac1978
[16] couchdb/src/couch/src/couch_httpd_db.erl, line 235 in 8ac1978
[17] couchdb/src/couch/src/couch_server.erl, line 115 in 8ac1978
[18] couchdb/src/couch/src/couch_db.erl, lines 150-151 in 8ac1978
[19] couchdb/src/dreyfus/src/dreyfus_rpc.erl, lines 104-111 in 8ac1978
[20] couchdb/src/dreyfus/src/dreyfus_rpc.erl, line 43 in 8ac1978
[21] https://github.com/cloudant-labs/hastings
[22] https://github.com/cloudant-labs/hastings/blob/master/src/hastings_rpc.erl#L103-L111
[23] couchdb/src/fabric/src/fabric_rpc.erl, lines 164-173 in 8ac1978