Permalink
Browse files

Fix a critical bug in the META cache.

A custom comparator is needed to maintain a local cache of META.
A META entry is essentially made of 3 comma-separated fields:
  [table],[start-key],[start-code]
The comparator needs to essentially do 3 memcmp, to first sort
by table name, then by start-key, and finally by start-code for
breaking ties.

The comparator had a bug such that if the key being looked up contained
a comma, and if the key being looked up was a prefix of a start-key,
and if the byte following the comma was less than '1', then the META
cache would incorrectly sort the key being looked up and find the wrong
region.

Here's an example:
  Key being looked up: "foo,\002"
  Region hosting that key: "table,foo,1234567890"
  META cache lookup key:   "table,foo,\002,:"
Then in this case the code was incorrectly comparing '1' with 0x02,
and because 0x02 < '1' it would incorrectly sort the search key to
be before that region.

The flaw in the logic came from the fact that the code was trying to
compare both the start keys and, if necessary, the start codes in the
same loop.  But we need to make sure that we actually do the equivalent
of 3 memcmp, so that we can check for possible prefix-only matches for
each of the 3 fields (table name, start key, start code).

This bug doesn't happen frequently due to the 3 conditions that are
required to trigger it, but when it does get triggered, its visible
consequence would be that some RPCs are failing without even being
sent because the client retries to do META lookups without finding
the right region to send the RPC to.  In particular note that the RPC
does not get sent to the wire at all.  The RPC would simply quickly
fail with a "NonRecoverableException: Too many attempts".

This closes #27.

Change-Id: I4994cb3535181397e185d5a8d8a2fa5beb16bff9
  • Loading branch information...
1 parent 5b033d9 commit 07065fdfd044c13aa85a64c7bb4270ed52a92193 @tsuna tsuna committed Jun 16, 2012
Showing with 26 additions and 18 deletions.
  1. +22 −18 src/RegionInfo.java
  2. +4 −0 test/TestMETALookup.java
View
@@ -275,28 +275,32 @@ public int compare(final byte[] a, final byte[] b) {
// If either `a' or `b' is followed immediately by another comma, then
// they are the first region (it's the empty start key).
i++; // No need to check against `length', there MUST be more bytes.
- if (a_comma != b_comma) {
- if (a_comma == i) {
- return -1002; // a < b because a is the start key, b is not.
- } else if (b_comma == i) {
- return 1002; // a > b because b is the start key, a is not.
+
+ // Compare keys.
+ final int first_comma = Math.min(a_comma, b_comma);
+ for (/*nothing*/; i < first_comma; i++) {
+ final byte ai = a[i];
+ final byte bi = b[i];
+ if (ai != bi) { // The keys differ.
+ return (ai & 0xFF) - (bi & 0xFF); // "promote" to unsigned.
}
}
+ if (a_comma < b_comma) {
+ return -1002; // `a' has a shorter key. a < b
+ } else if (b_comma < a_comma) {
+ return 1002; // `b' has a shorter key. a > b
+ }
- // Now either both are the start key (in which case we need to keep
- // comparing the "start codes" to pick up the most recent region)
- // or neither is and we simply compare the start keys.
- do {
- if (a[i] != b[i]) {
- if (i == a_comma) {
- return -1003; // `a' has a smaller start key. a < b
- } else if (i == b_comma) {
- return 1003; // `b' has a smaller start key. a > b
- }
- return (a[i] & 0xFF) - (b[i] & 0xFF); // The start keys differ.
+ // Keys have the same length and have compared identical. Compare the
+ // rest, which essentially means: use start code as a tie breaker.
+ for (/*nothing*/; i < length; i++) {
+ final byte ai = a[i];
+ final byte bi = b[i];
+ if (ai != bi) { // The start codes differ.
+ return (ai & 0xFF) - (bi & 0xFF); // "promote" to unsigned.
}
- i++;
- } while (i < length);
+ }
+
return a.length - b.length;
}
View
@@ -58,10 +58,14 @@ public void testRegionCmp() {
assertGreater("table,a,,c,1234567890", "table,a,,b,1234567890");
// If keys are equal, then start code should break the tie.
assertGreater("table,foo,1234567891", "table,foo,1234567890");
+ // Make sure that a start code being a prefix of another is handled.
+ assertGreater("table,foo,1234567890", "table,foo,123456789");
// If both are start keys, then start code should break the tie.
assertGreater("table,,1234567891", "table,,1234567890");
// The value `:' is always greater than any start code.
assertGreater("table,foo,:", "table,foo,9999999999");
+ // Issue 27: searching for key "8,\001" and region key is "8".
+ assertGreater("table,8,\001,:", "table,8,1339667458224");
}
/** Ensures that {@code sa > sb} in the META cache. */

0 comments on commit 07065fd

Please sign in to comment.