Replace mem3_cache with mem3_shards
This change addresses the unbounded cache that existed in
mem3_cache. The module has been renamed to mem3_shards and it now
exposes an API for accessing shards in the cache so that hits can be
registered with the cache.

The cache itself is now a proper LRU with a bounded size. The other
behavior change is to let the cache warm up after start instead of
preloading it with the entire contents of the shards database. This
means that shards are inserted as they are read from disk, which
introduces a period of cold cache when the node boots.

BugzId: 13414
davisp committed Apr 23, 2012
1 parent e97f507 commit e6c0913d1a0630ad3f4770cc7a6e254460c8f362
Showing 7 changed files with 348 additions and 170 deletions.
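
The new mem3_shards module itself is not shown in the diff below, so as a rough sketch only (not the actual mem3_shards code), the bounded LRU with explicit hit registration described in the commit message could be built from two ETS tables: one keyed by database name and one ordered by a last-access stamp. The module name, function names and the MaxSize knob here are all assumptions.

```erlang
%% Sketch of a bounded LRU with explicit hit registration -- illustrative
%% only, not the mem3_shards implementation. `shards` maps DbName to the
%% cached shard list plus its last-access stamp; `atimes` is an ordered_set
%% keyed by that stamp, so the least-recently-used entry is ets:first/1.
-module(lru_cache_sketch).
-export([init/0, lookup/1, hit/1, insert/3, delete/1]).

init() ->
    ets:new(shards, [named_table, set, public]),
    ets:new(atimes, [named_table, ordered_set, public]),
    ok.

lookup(DbName) ->
    case ets:lookup(shards, DbName) of
        [{DbName, Shards, _Stamp}] -> {ok, Shards};
        [] -> not_found
    end.

%% Registering a cache hit: move the entry to the most-recently-used end
%% by giving it a fresh, strictly increasing stamp.
hit(DbName) ->
    case ets:lookup(shards, DbName) of
        [{DbName, _Shards, OldStamp}] ->
            NewStamp = erlang:unique_integer([monotonic]),
            ets:delete(atimes, OldStamp),
            ets:insert(atimes, {NewStamp, DbName}),
            ets:update_element(shards, DbName, {3, NewStamp}),
            ok;
        [] ->
            ok
    end.

%% Insert a freshly loaded shard list, then trim back to MaxSize by
%% evicting the oldest entries. MaxSize is an assumed knob.
insert(DbName, Shards, MaxSize) ->
    ok = delete(DbName),  %% drop any stale stamp before re-inserting
    Stamp = erlang:unique_integer([monotonic]),
    ets:insert(shards, {DbName, Shards, Stamp}),
    ets:insert(atimes, {Stamp, DbName}),
    evict_until(MaxSize).

delete(DbName) ->
    case ets:lookup(shards, DbName) of
        [{DbName, _Shards, Stamp}] ->
            ets:delete(atimes, Stamp),
            ets:delete(shards, DbName),
            ok;
        [] ->
            ok
    end.

%% Evict least-recently-used entries until the bound is respected.
evict_until(MaxSize) ->
    case ets:info(shards, size) > MaxSize of
        true ->
            Oldest = ets:first(atimes),
            [{Oldest, Victim}] = ets:lookup(atimes, Oldest),
            ets:delete(atimes, Oldest),
            ets:delete(shards, Victim),
            evict_until(MaxSize);
        false ->
            ok
    end.
```

On a lookup miss the caller would load the shards from disk and call insert/3, which is how the cache warms lazily after boot instead of being preloaded.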
@@ -3,13 +3,13 @@
Mem3 is the node membership application for clustered [CouchDB][1]. It is used in [BigCouch][2] and tracks two very important things for the cluster:

1. member nodes
2. node/partition mappings for each database
2. node/shards mappings for each database

Both the nodes and partitions are tracked in node-local couch databases. Partitions are heavily used, so an ETS cache is also maintained for low-latency lookups. The nodes and partitions are synchronized via continuous CouchDB replication, which serves as 'gossip' in Dynamo parlance. The partitions ETS cache is kept in sync based on membership and database event listeners.
Both the nodes and shards are tracked in node-local couch databases. Shards are heavily used, so an ETS cache is also maintained for low-latency lookups. The nodes and shards are synchronized via continuous CouchDB replication, which serves as 'gossip' in Dynamo parlance. The shards ETS cache is kept in sync based on membership and database event listeners.

A very important point to make here is that BigCouch does not necessarily divide up each database into equal partitions across the nodes of a cluster. For instance, in a 20-node cluster, you may need to create a small database with very few documents. For efficiency reasons, you may create your database with Q=4 and keep the default of N=3. This means you only have 12 partitions total, so 8 nodes will hold none of the data for this database. Given this feature, we even out partition use across the cluster by altering the 'start' node for the database's partitions.
A very important point to make here is that BigCouch does not necessarily divide up each database into equal shards across the nodes of a cluster. For instance, in a 20-node cluster, you may need to create a small database with very few documents. For efficiency reasons, you may create your database with Q=4 and keep the default of N=3. This means you only have 12 shards total, so 8 nodes will hold none of the data for this database. Given this feature, we even out shard use across the cluster by altering the 'start' node for the database's shards.
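
As a rough illustration of the arithmetic in the paragraph above (the numbers come straight from it; the variable names are made up), with Q hash ranges each replicated on N nodes:

```erlang
%% Worked through in an Erlang shell.
1> Q = 4, N = 3, ClusterSize = 20.
20
2> TotalShardFiles = Q * N.
12
3> ClusterSize - TotalShardFiles.
8
```

The "8 nodes hold none" figure assumes each of the 12 shard files lands on a distinct node, which is what the paragraph implies for a 20-node cluster.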

Splitting and merging partitions is an immature feature of the system, and will require attention in the near-term. We believe we can implement both functions and perform them while the database remains online.
Splitting and merging shards is an immature feature of the system, and will require attention in the near-term. We believe we can implement both functions and perform them while the database remains online.

### Getting Started

@@ -3,9 +3,9 @@
{vsn, git},
{mod, {mem3_app, []}},
{registered, [
mem3_cache,
mem3_events,
mem3_nodes,
mem3_shards,
mem3_sync,
mem3_sup
]},
@@ -16,6 +16,7 @@

-export([start/0, stop/0, restart/0, nodes/0, node_info/2, shards/1, shards/2,
choose_shards/2, n/1, dbname/1, ushards/1]).
-export([get_shard/3, local_shards/1, fold_shards/2]).
-export([sync_security/0, sync_security/1]).
-export([compare_nodelists/0, compare_shards/1]).
-export([group_by_proximity/1]).
@@ -87,14 +88,7 @@ shards(DbName) ->
dbname = ShardDbName,
range = [0, 2 bsl 31]}];
_ ->
try ets:lookup(partitions, DbName) of
[] ->
mem3_util:load_shards_from_disk(DbName);
Else ->
Else
catch error:badarg ->
mem3_util:load_shards_from_disk(DbName)
end
mem3_shards:for_db(DbName)
end.

-spec shards(DbName::iodata(), DocId::binary()) -> [#shard{}].
@@ -103,23 +97,7 @@ shards(DbName, DocId) when is_list(DbName) ->
shards(DbName, DocId) when is_list(DocId) ->
shards(DbName, list_to_binary(DocId));
shards(DbName, DocId) ->
HashKey = mem3_util:hash(DocId),
Head = #shard{
name = '_',
node = '_',
dbname = DbName,
range = ['$1','$2'],
ref = '_'
},
Conditions = [{'=<', '$1', HashKey}, {'=<', HashKey, '$2'}],
try ets:select(partitions, [{Head, Conditions, ['$_']}]) of
[] ->
mem3_util:load_shards_from_disk(DbName, DocId);
Shards ->
Shards
catch error:badarg ->
mem3_util:load_shards_from_disk(DbName, DocId)
end.
mem3_shards:for_docid(DbName, DocId).

ushards(DbName) ->
{L,S,D} = group_by_proximity(live_shards(DbName)),
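
The ets:select removed in the hunk above is the clearest statement of how a document maps to its shards: hash the doc id with mem3_util:hash/1 and keep every shard whose [Begin, End] range contains the hash. Presumably mem3_shards:for_docid/2 performs an equivalent lookup against the new cache. A standalone sketch of that range check (the module and function names are made up; the record fields match the match-spec head shown above):

```erlang
-module(docid_lookup_sketch).
-export([shards_for_docid/2]).

%% Record fields copied from the match-spec head in the removed code.
-record(shard, {name, node, dbname, range, ref}).

%% Sketch only: hash the doc id and keep every shard whose [Begin, End]
%% range contains the hash, mirroring the removed ets:select/2. `Shards`
%% would come from the cache (or from disk on a miss) in the real module.
shards_for_docid(Shards, DocId) ->
    HashKey = mem3_util:hash(DocId),
    [S || S = #shard{range = [Begin, End]} <- Shards,
          Begin =< HashKey, HashKey =< End].
```

Called as shards_for_docid(mem3:shards(<<"mydb">>), DocId) with a hypothetical database name, this should give the same answer as the new mem3:shards(DbName, DocId) path, assuming the range bounds are inclusive as the removed conditions suggest.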
@@ -143,6 +121,15 @@ group_by_proximity(Shards) ->
{SameZone, DifferentZone} = lists:partition(Fun, Remote),
{Local, SameZone, DifferentZone}.

get_shard(DbName, Node, Range) ->
mem3_shards:get(DbName, Node, Range).

local_shards(DbName) ->
mem3_shards:local(DbName).

fold_shards(Fun, Acc) ->
mem3_shards:fold(Fun, Acc).

sync_security() ->
mem3_sync_security:go().
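
The new get_shard/3, local_shards/1 and fold_shards/2 entry points above are thin pass-throughs to mem3_shards. A usage sketch under stated assumptions: the shape of the fold callback (a #shard{} record plus an accumulator) is inferred rather than shown in this diff, the include path is assumed, and the database name is made up.

```erlang
%% Sketch of using the new pass-through API.
-module(mem3_api_sketch).
-export([shards_per_node/0, local_copies/1]).

%% The #shard{} record lives in mem3's header; this include path is assumed.
-include_lib("mem3/include/mem3.hrl").

%% Count shard files per node with the new fold API. Assumes the callback
%% is fun(#shard{}, Acc) -> Acc, which this diff does not actually show.
shards_per_node() ->
    mem3:fold_shards(
        fun(#shard{node = Node}, Acc) ->
            dict:update_counter(Node, 1, Acc)
        end, dict:new()).

%% Shards of a database hosted on the local node, e.g.
%% local_copies(<<"mydb">>) for a hypothetical database.
local_copies(DbName) ->
    mem3:local_shards(DbName).
```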

This file was deleted.
