Fix a critical bug in the META cache.
A custom comparator is needed to maintain a local cache of META. A META entry is essentially made of 3 comma-separated fields:

    [table],[start-key],[start-code]

The comparator needs to do the equivalent of 3 memcmp: first sort by table name, then by start-key, and finally by start-code to break ties.

The comparator had a bug that was triggered when all 3 of the following conditions were met:
  - the key being looked up contained a comma,
  - the part of the key before that comma was exactly a region's start-key,
  - the byte following the comma was less than '1'.
In that case the META cache would incorrectly sort the key being looked up and find the wrong region.

Here's an example:

    Key being looked up:      "foo,\002"
    Region hosting that key:  "table,foo,1234567890"
    META cache lookup key:    "table,foo,\002,:"

In this case the code was incorrectly comparing '1' with 0x02, and because 0x02 < '1' it would sort the search key before that region.

The flaw in the logic came from the fact that the code was trying to compare both the start keys and, if necessary, the start codes in the same loop. But we need to make sure that we actually do the equivalent of 3 memcmp, so that we can check for possible prefix-only matches for each of the 3 fields (table name, start key, start code).

This bug doesn't happen frequently because all 3 conditions are required to trigger it. When it does get triggered, the visible consequence is that some RPCs fail without even being sent, because the client keeps retrying META lookups without finding the right region to send the RPC to. In particular, note that the RPC never gets sent to the wire at all. It simply fails quickly with a "NonRecoverableException: Too many attempts".

This closes #27.

Change-Id: I4994cb3535181397e185d5a8d8a2fa5beb16bff9
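
To illustrate the "3 memcmp" idea described above, here is a minimal sketch of such a comparator. It is not the code from this commit: the class name RegionNameComparator and its helpers are hypothetical, and it assumes that table names never contain a comma (so the first comma ends the table name) and that the start code follows the last comma (so the start key may itself contain commas).

```java
import java.util.Comparator;

/**
 * Hypothetical comparator for META keys of the form
 * [table],[start-key],[start-code]. Illustrative sketch only.
 */
final class RegionNameComparator implements Comparator<byte[]> {

  /** memcmp-like comparison of a[aStart, aEnd) and b[bStart, bEnd), unsigned bytes. */
  private static int memcmp(final byte[] a, final int aStart, final int aEnd,
                            final byte[] b, final int bStart, final int bEnd) {
    final int n = Math.min(aEnd - aStart, bEnd - bStart);
    for (int i = 0; i < n; i++) {
      final int d = (a[aStart + i] & 0xFF) - (b[bStart + i] & 0xFF);
      if (d != 0) {
        return d;
      }
    }
    // One field is a prefix of the other: the shorter one sorts first.
    return (aEnd - aStart) - (bEnd - bStart);
  }

  /** Index of the first comma (end of the table name), or -1. */
  private static int firstComma(final byte[] key) {
    for (int i = 0; i < key.length; i++) {
      if (key[i] == ',') {
        return i;
      }
    }
    return -1;
  }

  /** Index of the last comma (start of the start code), or -1. */
  private static int lastComma(final byte[] key) {
    for (int i = key.length - 1; i >= 0; i--) {
      if (key[i] == ',') {
        return i;
      }
    }
    return -1;
  }

  @Override
  public int compare(final byte[] a, final byte[] b) {
    final int aFirst = firstComma(a);
    final int bFirst = firstComma(b);
    final int aLast = lastComma(a);
    final int bLast = lastComma(b);
    if (aFirst < 0 || aFirst == aLast || bFirst < 0 || bFirst == bLast) {
      throw new IllegalArgumentException("expected [table],[start-key],[start-code]");
    }
    // 1st memcmp: table names.
    int d = memcmp(a, 0, aFirst, b, 0, bFirst);
    if (d != 0) {
      return d;
    }
    // 2nd memcmp: start keys, which may themselves contain commas.
    d = memcmp(a, aFirst + 1, aLast, b, bFirst + 1, bLast);
    if (d != 0) {
      return d;
    }
    // 3rd memcmp: start codes, only used to break ties.
    return memcmp(a, aLast + 1, a.length, b, bLast + 1, b.length);
  }
}
```

With the example above, the start key "foo" of the region is compared against the whole start key "foo,\002" of the lookup key, so the lookup key sorts after the region. The byte 0x02 is never compared against the '1' of the start code.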
Showing with 26 additions and 18 deletions.