Make couch_multidb_changes shard map aware
The couch_multidb_changes module monitors shards whose names match a particular suffix and notifies users with found, updated and deleted events. This is the module which drives replicator jobs when `*/_replicator` databases are updated.

Previously, couch_multidb_changes reacted only to node-local shard file events and was not aware of the shard map membership of those files. This discrepancy was most evident during shard moves: the target shard file could be created long before it became part of the shard map. The replicator could notice the new target shard file and spawn a replication job on the new node, while keeping the same replication job running on the source node. The two replication jobs would eventually conflict in the PG system (https://www.erlang.org/doc/man/pg.html) and one of them would start crashing with a "duplicate job" error. This could last for days, depending on how long it took to populate the data on the target. Even after recovery, the job could be backed off for up to an extra 8 hours before it could run again.

To avoid issues like that, make couch_multidb_changes aware of shard map membership updates. When a shard file is discovered and it is not in the shard map, mark it with a `wait_shard_map = true` flag. Then, re-use the existing db event monitoring mechanism to notice when the shards db itself is updated, and schedule a delayed membership check for the shards tracked in our ETS table.

Other changes to the module are mostly cosmetic:

* In the ETS table, use a proper `#row{}` record since we now have 5 items in the tuple. This simplifies some of the existing code as well.
* During deletion and creation, actually delete the entries from the ETS table. Previously we didn't, so they would hang around forever until the node was restarted.
* Add comments to a few tricky sections explaining what should be happening there.
* Add more tests, covering both the old and new functionality. Increase coverage from 96% to 98%.
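The `wait_shard_map` gating described above can be illustrated with a minimal Erlang sketch. This is a hypothetical standalone module, not the actual couch_multidb_changes code: the module name `shard_map_gate`, the function `discovered/2`, and the two-field `#row{}` record are illustrative assumptions (the real record carries 5 fields).

```erlang
%% Hypothetical sketch: gate shard discovery notifications on shard
%% map membership. Not the actual couch_multidb_changes implementation.
-module(shard_map_gate).
-export([discovered/2]).

-record(row, {
    db_name :: binary(),           %% node-local shard file name
    wait_shard_map :: boolean()    %% true until the shard joins the shard map
}).

%% When a shard file event arrives, only emit a found notification if
%% the shard is already a member of the shard map. Otherwise park the
%% row with wait_shard_map = true; a later shards-db update event would
%% trigger a delayed membership re-check for all parked rows.
-spec discovered(binary(), boolean()) -> {notify | wait, #row{}}.
discovered(DbName, _InShardMap = true) ->
    {notify, #row{db_name = DbName, wait_shard_map = false}};
discovered(DbName, _InShardMap = false) ->
    {wait, #row{db_name = DbName, wait_shard_map = true}}.
```

In the real module the membership check runs against the shard map rather than taking a boolean, but the shape is the same: rows parked with `wait_shard_map = true` are revisited whenever the shards db itself emits an update event.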