The Tictac AAE process is deficient when compared with the "legacy" riak_kv_index_hashtree approach to AAE, in that it repairs much more slowly.
The repairs are necessarily slower because the cost of running the fetch_clocks comparison is relatively high in Tictac AAE: it requires a full scan of the vnode key store (albeit skipping any blocks that don't include relevant segments). The cost of fetch_clocks is proportional to the number of segments in the results, as defined by tictacaae_maxresults:
%% @doc Max number of leaf IDs per exchange
%% To control the length of time for each exchange, only a subset of the
%% conflicting leaves will be compared on each exchange. If there are issues
%% with query timeouts this may be halved. Large backlogs may be reduced
%% faster by doubling. There are 1M segments in a standard tree overall.
{mapping, "tictacaae_maxresults", "riak_kv.tictacaae_maxresults", [
{datatype, integer},
{default, 256},
hidden
]}.
On very large vnodes, checking 256 segments on a fold can be time consuming, and the AAE process will back off if too much time is consumed. This limits the number of repairs per exchange to approximately 256, and the back-off then throttles the repair speed even further.
The proposed enhancement is to add two new workers per node:
riak_kv_tictacaae_monitor
riak_kv_readrepair_pool
Tictac AAE Monitor
Results from clock comparisons will now be sent to riak_kv_tictacaae_monitor rather than directly for read repair, along with the relevant n-val and partition information pertinent to the exchange. The monitor will prompt the read repairs as now, but will also generate a per-bucket and per-nval list of LastModifiedDates for those keys where a repair was required.
If a single bucket accounts for more than 25% of the repairs, a new fetch clock comparison should be prompted between the vnodes. This time the comparison should be constrained not by segment, but by bucket and by a last modified date range which covers at least 25% of the discovered repairs (i.e. using fetch_clocks_range).
Otherwise the nval list should be checked, and for the nval with the most required repairs a last modified date range should be determined for a fetch_clocks_nval that covers at least 25% of the repairs.
Having discovered a potentially broken part of the store, the riak_kv_tictacaae_monitor should now repeat a fetch_clocks_range/fetch_clocks_nval comparison for the discovered range, and prompt any discrepancies for read repair.
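The bucket-first, then nval selection described above might be sketched as follows. This is purely illustrative: the module, function names, and the shape of the repair tuples are assumptions, not the actual riak_kv API.

```erlang
%% Sketch of the monitor's selection of a prompted comparison.
%% Repairs :: [{Bucket, NVal, LastModDate}] accumulated from exchanges.
-module(tictacaae_monitor_sketch).
-export([select_comparison/2]).

%% Fraction would be 0.25 per the proposal. Returns the prompted
%% comparison to run next: by bucket if any bucket holds > 25% of the
%% repairs, otherwise by the nval with the most repairs.
select_comparison(Repairs, Fraction) ->
    Total = length(Repairs),
    BucketCounts = count_by(fun({B, _N, _LMD}) -> B end, Repairs),
    case [B || {B, C} <- BucketCounts, C > Total * Fraction] of
        [Bucket|_] ->
            %% Constrain by bucket and a last modified date range
            %% covering at least Fraction of that bucket's repairs.
            Dates = lists:sort([D || {B, _N, D} <- Repairs, B =:= Bucket]),
            {fetch_clocks_range, Bucket, date_range(Dates, Fraction)};
        [] ->
            %% Otherwise pick the nval with the most required repairs.
            NValCounts = count_by(fun({_B, N, _LMD}) -> N end, Repairs),
            [{NVal, _}|_] = lists:reverse(lists:keysort(2, NValCounts)),
            Dates = lists:sort([D || {_B, N, D} <- Repairs, N =:= NVal]),
            {fetch_clocks_nval, NVal, date_range(Dates, Fraction)}
    end.

%% Count occurrences of KeyFun(Element), returned as [{Key, Count}].
count_by(KeyFun, L) ->
    maps:to_list(
        lists:foldl(
            fun(E, Acc) ->
                maps:update_with(KeyFun(E), fun(C) -> C + 1 end, 1, Acc)
            end,
            #{}, L)).

%% A date range covering at least Fraction of the sorted dates (here
%% simply the earliest such prefix - a real implementation might pick
%% the densest window instead).
date_range(Dates, Fraction) ->
    Needed = max(1, round(length(Dates) * Fraction)),
    {hd(Dates), lists:nth(Needed, Dates)}.
```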
In general we expect AAE to be repairing either data loss from disk, or data loss due to downtime and lack of handoff (e.g. recovery from backup). For the former case, given each store is either ordered by time (bitcask, leveled journal) or by key (eleveldb, leveled ledger), we would expect a per-bucket or time-range concentration. In the latter case it is likely that a recent modified date range has been impacted.
The aim would be to reduce the default tictacaae_maxresults to 64 (making each regular exchange fast), and then allow a much larger limit on the prompted range comparison - perhaps changing fetch_clocks to fetch only the first 8K keys.
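In riak.conf terms, lowering the regular exchange limit would be a straightforward operator override of the existing setting (illustrative only; the value shown is the reduced default proposed above):

```
## Reduce per-exchange segment comparison to keep exchanges fast
tictacaae_maxresults = 64
```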
Configuring the AAE monitor to do prompted comparisons should be controlled like this:
%% @doc Max number of keys for a prompted comparison
%% Following the discovery of a potential delta via tictac aae, the tictac aae
%% monitor should attempt to compare this many keys in the bucket in any
%% bucket or date range which has been shown by the aae process to have
%% a high density of required repairs
%% Set to 0 to disable prompted comparisons
{mapping, "tictacaae_maxresults_prompted", "riak_kv.tictacaae_maxresults_prompted", [
{datatype, integer},
{default, 8196},
hidden
]}.
Read Repair Pool
Read repair is not constrained in Riak, other than via the soft cap / hard cap dice rolling in riak_kv/src/riak_kv_get_fsm.erl (lines 730 to 777 at 75e142e).
With the availability of the flexible riak_core_worker_pool, it would be better to constrain read_repair instead by having a fixed size pool of workers that will run the GETs to prompt the read repairs. The queue_time and work_time associated with these read repairs can then be monitored as with other pools post Riak 3.0.9.
The worker_pool should also allow a limit on the size of the queue.
For configuration of the pool, the following is proposed:
%% @doc Pool Sizes - sizes for individual node_worker_pools
%% ...
{mapping, "repair_worker_pool_size", "riak_kv.repair_worker_pool_size", [
{datatype, integer},
{default, 4}
]}.
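A queue limit on the pool could be as simple as refusing new repair work when the queue is full. The following is a minimal sketch under assumed names; the record, function, and return values are illustrative and do not reflect the riak_core_worker_pool API.

```erlang
%% Sketch of a bounded read repair queue. Dropped repairs are safe to
%% discard: AAE will rediscover the delta on a later exchange.
-module(readrepair_pool_sketch).
-export([handle_repair_request/2]).

-record(state, {queue = queue:new(), queue_limit = 1000}).

handle_repair_request(Req, State = #state{queue = Q, queue_limit = Limit}) ->
    case queue:len(Q) >= Limit of
        true ->
            %% Queue full - shed load rather than queue unboundedly.
            {reply, overloaded, State};
        false ->
            %% Enqueue the GET that will prompt the read repair; the
            %% fixed pool of workers (default size 4) drains this queue.
            {reply, ok, State#state{queue = queue:in(Req, Q)}}
    end.
```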