Add AAE status subsystem + finalize AAE for Riak 1.3 release #456

Merged
merged 3 commits into from
Dec 20, 2012

jtuple
Contributor

@jtuple jtuple commented Dec 19, 2012

This pull-request adds code to track the status of AAE and provides a console command, for use with riak-admin, that prints out the status.

This pull-request also fixes a bug with AAE, where tree build times were not persisted to disk, and therefore restarting a node would delay tree expiration. Build times are now persisted.

Finally, this pull request removes some extremely spammy log messages that were already flagged with TODOs in the source. Commenting out these messages seems like the best approach for user-experience for Riak 1.3. We can re-consider for Riak 1.4.

The approach used by the status subsystem (recomputing everything on each request for status) is rather inefficient and could easily be optimized, both by caching certain values and by updating other aggregate information in real time as status events occur. However, this is largely premature optimization for a part of the code that is not critical. The current solution is straightforward and easily handles things like changing bucket N-values, changing ring ownership, etc. without any special code (such as flushing cached values in a cached approach). Simply put, let's ship a working solution now and consider optimizing for Riak 1.4.

Add riak_kv_entropy_info module that provides an API to keep track of AAE
status such as tree build times and exchange statistics. The information
is stored in a public ETS table that is owned by the riak_kv_sup.

Update the AAE subsystem to call into the info module where appropriate.

Add aae_status/1 as a console command in riak_kv_console that pretty prints
the AAE information.
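
As a rough illustration of the design described above (events written into a shared table, status recomputed on every request), here is a minimal Python model. It is not the Erlang code in riak_kv_entropy_info: the names `EntropyInfo`, `tree_built`, `exchange_complete`, and `status` are hypothetical, and a plain dict stands in for the public ETS table.

```python
import time


class EntropyInfo:
    """Toy stand-in for a shared status table (a dict instead of ETS)."""

    def __init__(self):
        # index -> {"built": timestamp or None, "exchanges": {sibling: timestamp}}
        self.table = {}

    def _entry(self, index):
        return self.table.setdefault(index, {"built": None, "exchanges": {}})

    def tree_built(self, index, when=None):
        # Record the time at which the hash tree for `index` was built.
        self._entry(index)["built"] = time.time() if when is None else when

    def exchange_complete(self, index, sibling, when=None):
        # Record the most recent exchange between `index` and `sibling`.
        stamp = time.time() if when is None else when
        self._entry(index)["exchanges"][sibling] = stamp

    def status(self, owned_indices):
        """Recompute the report from the raw events on every call.

        Only the indices passed in are reported, so an ownership change
        needs no cache flush: stale entries are simply skipped.
        """
        report = {}
        for index in owned_indices:
            entry = self.table.get(index, {"built": None, "exchanges": {}})
            times = entry["exchanges"].values()
            report[index] = {
                "built": entry["built"],
                "last_exchange": max(times, default=None),
            }
        return report


info = EntropyInfo()
info.tree_built(0, when=100.0)
info.exchange_complete(0, sibling=1, when=150.0)
info.exchange_complete(0, sibling=2, when=120.0)
info.tree_built(7, when=90.0)  # index assumed to be handed off later
current = info.status(owned_indices=[0])
```

Because nothing is cached, the entry for index 7 never has to be invalidated; it just stops appearing once the caller no longer lists it as owned.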
@jtuple
Contributor Author

jtuple commented Dec 19, 2012

This code relies upon a fix to riak_core_format:human_time to provide correct times in the status output. That fix is done in the sibling pull-request: basho/riak_core#261

@jtuple
Contributor Author

jtuple commented Dec 19, 2012

Note: When testing this for review, you'll likely want to adjust the AAE options in app.config. With the default of building 1 tree per hour, it will take quite a while to see AAE exchanges really take off.

@jtuple
Contributor Author

jtuple commented Dec 19, 2012

Let me describe the status output.

The first output section lists information about exchanges that occur for a given index.

================================== Exchanges ==================================
Index                                              Last (ago)    All (ago)   
-------------------------------------------------------------------------------
114179815416476790484662877555959610910619729920   10.1 min      8.0 hr       

The Last column shows when the last exchange for that index occurred. For example,
the last exchange occurred "10.1 min ago". The All column shows how long it's been since
the partition has synced with all its siblings. Ignoring multi-replica failures,
this essentially shows how recently all data on a given index was fully verified
to be correct and valid. This is likely the most important and interesting number
in all of AAE.
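
One way to read the two columns, sketched in Python under stated assumptions: Last is the age of the most recent exchange with any sibling, and All is the age of the stalest of the per-sibling most recent exchanges, defined only once every sibling has been exchanged with at least once. The function name `exchange_ages` and the data layout are hypothetical, and this reading of All is an interpretation of the description above, not a copy of the Riak code.

```python
def exchange_ages(last_exchange, siblings, now):
    """Return (last_ago, all_ago) in seconds; None when not yet known.

    last_exchange: dict mapping sibling -> timestamp of its latest exchange.
    siblings: the distinct siblings the index is expected to exchange with.
    """
    done = [last_exchange[s] for s in siblings if s in last_exchange]
    last_ago = now - max(done) if done else None
    # "All" is bounded by the stalest sibling, and is only defined once
    # every sibling has completed at least one exchange.
    complete = bool(done) and len(done) == len(siblings)
    all_ago = now - min(done) if complete else None
    return last_ago, all_ago


now = 30000
ages = exchange_ages({"a": 29394, "b": 1200}, ["a", "b"], now)
partial = exchange_ages({"a": 29394}, ["a", "b"], now)
```

With these made-up timestamps, `ages` corresponds to 606 seconds (10.1 min) and 28800 seconds (8.0 hr), matching the shape of the sample row above, while `partial` has no All value because sibling "b" has never been exchanged with.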

The second output section shows how long ago a given index's hash trees were built.
By default, Riak expires hash trees once a week to ensure the hash tree and on-disk
data stay in sync, as well as to detect bit rot and silent disk corruption.

================================ Entropy Trees ================================
Index                                              Built (ago)
-------------------------------------------------------------------------------
114179815416476790484662877555959610910619729920   12.3 hr
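
The expiry check, and why persisting build times matters, can be sketched as follows. This is an illustrative Python model, not the Riak implementation: `tree_expired` is a hypothetical helper, and the "without persistence" case assumes the build time was effectively reset to the restart time, which is one plausible reading of the bug fixed in this pull-request.

```python
DAY = 24 * 60 * 60
ONE_WEEK = 7 * DAY  # default expiry stated in the text above


def tree_expired(built_at, now, expire_after=ONE_WEEK):
    """True once the tree is at least `expire_after` seconds old."""
    return now - built_at >= expire_after


built_at = 0 * DAY      # tree built on day 0
restarted_at = 6 * DAY  # node restarted on day 6
now = 8 * DAY

# Build time persisted across the restart: the tree is 8 days old.
with_persistence = tree_expired(built_at, now)
# Build time lost and reset at restart (assumption): only 2 days old.
without_persistence = tree_expired(restarted_at, now)
```

In this scenario the tree is due for expiry only when the original build time survives the restart; otherwise expiration is delayed, as the pull-request description notes.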

The third section shows statistics about the number of keys repaired in any given
exchange: the number of repairs in the most recent exchange, as well as the mean
and max values seen since the node was last restarted.

================================ Keys Repaired ================================
Index                                                Last      Mean      Max   
-------------------------------------------------------------------------------
114179815416476790484662877555959610910619729920      0         8         10
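
The three columns can be maintained with a small running aggregate per index. The sketch below is a hypothetical Python model (the class name `RepairStats` and its fields are not from the Riak source); the counters live only in memory, which matches the "since node restart" behaviour described above.

```python
class RepairStats:
    """Running Last / Mean / Max of keys repaired per exchange."""

    def __init__(self):
        self.last = 0
        self.max = 0
        self.count = 0   # number of exchanges recorded
        self.total = 0   # total keys repaired across those exchanges

    def record(self, repaired):
        # Called once per completed exchange with its repair count.
        self.last = repaired
        self.max = max(self.max, repaired)
        self.count += 1
        self.total += repaired

    @property
    def mean(self):
        return self.total / self.count if self.count else 0


stats = RepairStats()
for repaired in [10, 10, 10, 10, 0]:  # made-up exchange history
    stats.record(repaired)
```

After this made-up history, `stats.last`, `stats.mean`, and `stats.max` have the same shape as the sample row: a most recent exchange with no repairs, a mean of 8, and a max of 10.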

@ghost ghost assigned jrwest Dec 19, 2012
@jrwest
Contributor

jrwest commented Dec 19, 2012

+1

jtuple added a commit that referenced this pull request Dec 20, 2012
Add AAE status subsystem + finalize AAE for Riak 1.3 release
@jtuple jtuple merged commit 9ee034a into master Dec 20, 2012
@seancribbs seancribbs deleted the jdb-aae-info branch December 20, 2012 22:27