adds admin check for dangling fate references #4686
Added the ability to check for tablets that reference fate operations that are no longer active. This was added to the `accumulo admin checkTablets` command. A unit test was added that validates the algorithm and the extraction of fate ids from tablet metadata. Manual testing was done to validate end-to-end functionality.

For the manual test, the following commands were run in the shell:

```
grant Table.WRITE -t accumulo.metadata -u root
insert 1< srv opid SPLITTING:FATE:USER:dfdb85a6-65a0-47d2-a9e2-4c671b499829
```

and then the following was run:

```
$ accumulo admin checkTablets
*** Looking for offline tablets ***
Scanning zookeeper
Scanning accumulo.root
Scanning accumulo.metadata
1<< is UNASSIGNED #walogs:0
*** Looking for missing files ***
Scanning : accumulo.root (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 2 missing
Scanning : accumulo.metadata (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 0 missing
*** Looking for dangling fate operations ***
FATE:USER:dfdb85a6-65a0-47d2-a9e2-4c671b499829 1<<
Found 1 dangling references to fate operations
```
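To illustrate the "extraction of fate ids from tablet metadata" step, here is a minimal sketch. It assumes, based only on the manual test in this description, that the `opid` value has the form `<operation>:<fate id>` (for example `SPLITTING:FATE:USER:<uuid>`). The class `OpidParseSketch` and the method `extractFateId` are hypothetical names for illustration; they are not Accumulo's actual API, and the real code likely uses dedicated types rather than strings.

```java
import java.util.Optional;

public class OpidParseSketch {

  // Hypothetical helper (not part of Accumulo): strips the leading operation
  // type from an opid value and returns the remaining fate id, if present.
  // Assumes the layout "<operation>:FATE:<type>:<uuid>" seen in the manual test.
  public static Optional<String> extractFateId(String opidValue) {
    if (opidValue == null) {
      return Optional.empty();
    }
    int firstColon = opidValue.indexOf(':');
    if (firstColon < 0 || firstColon == opidValue.length() - 1) {
      // No separator, or nothing after it: no fate id to extract.
      return Optional.empty();
    }
    String rest = opidValue.substring(firstColon + 1);
    // Only accept values that look like a fate id in this assumed format.
    return rest.startsWith("FATE:") ? Optional.of(rest) : Optional.empty();
  }

  public static void main(String[] args) {
    String opid = "SPLITTING:FATE:USER:dfdb85a6-65a0-47d2-a9e2-4c671b499829";
    System.out.println(extractFateId(opid).orElse("no fate id"));
  }
}
```

With the value inserted in the manual test, the extracted id matches the one reported under "Looking for dangling fate operations".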
The command description may need to be updated:
@Parameters(commandDescription = "print tablets that are offline in online tables")
Updated in 61447d8. While looking at the command as a whole, I noticed some minor issues with the existing code and fixed those.
Should an option be added to the command to do something about these dangling fate ops, similar to the following?
@Parameter(names = "--fixFiles", description = "Remove dangling file pointers")
I did not want to do that now because, for something like a merge or split, the tablet could be in a really bad state, so removing the reference does not make things better. I think it is best for now to find these if they exist and try to find the cause. If the cause is a bug in Accumulo, then the bug needs to be fixed and cleanup considered as part of that bug fix.
@@ -93,4 +115,94 @@ public void testCannotQualifySessionId() {
    EasyMock.verify(zc);
  }

  @Test
  public void testDanglingFate() {
Nice test. It includes a case where all fate ids are inactive, a mix of active and inactive, and it tests the race condition. You could potentially add a case where all are active, but that is not really necessary.
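For readers following along, the cases listed above can be pictured with a rough sketch of one way such a check might work. This is an illustration under stated assumptions, not the code in this PR: `DanglingFateSketch`, `findDangling`, and the use of plain strings for tablets and fate ids are all hypothetical. The race assumed here is that a fate operation may finish and remove its reference from a tablet between reading the metadata and looking up the active operations, so the sketch re-reads the tablets before reporting anything.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;

public class DanglingFateSketch {

  // Hypothetical sketch: tabletRefsReader returns tablet -> referenced fate ids,
  // activeFinder returns the subset of the given ids that are still active.
  public static Map<String, Set<String>> findDangling(
      Supplier<Map<String, Set<String>>> tabletRefsReader,
      Function<Set<String>, Set<String>> activeFinder) {

    // First pass: collect every fate id that some tablet references.
    Set<String> referenced = new HashSet<>();
    tabletRefsReader.get().values().forEach(referenced::addAll);

    // Candidates are referenced ids that are not active.
    Set<String> inactive = new HashSet<>(referenced);
    inactive.removeAll(activeFinder.apply(referenced));

    Map<String, Set<String>> dangling = new TreeMap<>();
    if (inactive.isEmpty()) {
      return dangling;
    }

    // Second pass: re-read tablets so that references removed by an operation
    // that completed in the meantime are not reported as dangling.
    tabletRefsReader.get().forEach((tablet, ids) -> {
      Set<String> stillReferenced = new TreeSet<>(ids);
      stillReferenced.retainAll(inactive);
      if (!stillReferenced.isEmpty()) {
        dangling.put(tablet, stillReferenced);
      }
    });
    return dangling;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> refs = Map.of("1<<", Set.of("FATE:USER:a", "FATE:USER:b"));
    // Pretend only FATE:USER:a is still active.
    System.out.println(findDangling(() -> refs, ids -> Set.of("FATE:USER:a")));
  }
}
```

The "all active" case suggested above corresponds to the early return when no inactive ids are found.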
Updated the unit test in 987f699
Looks good. Verified new test passes, verified end-to-end test with no dangling fate ops and 1 dangling fate op:
*** Looking for offline tablets ***
Scanning zookeeper
Scanning accumulo.root
Scanning accumulo.metadata
*** Looking for missing files ***
Scanning : accumulo.root (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 2 missing
Scanning : accumulo.metadata (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 0 missing
*** Looking for dangling fate operations ***
Found 0 dangling references to fate operations
*** Looking for offline tablets ***
Scanning zookeeper
Scanning accumulo.root
Scanning accumulo.metadata
*** Looking for missing files ***
Scanning : accumulo.root (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 1 missing
Scanning : accumulo.metadata (-inf,~ : [] 9223372036854775807 false)
Scan finished, 0 files of 0 missing
*** Looking for dangling fate operations ***
FATE:USER:dfdb85a6-65a0-47d2-a9e2-4c671b499829 1<<
Found 1 dangling references to fate operations