Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding metadata tablet transaction logging #3724

Closed
ivakegg opened this issue Aug 25, 2023 · 3 comments
Closed

Consider adding metadata tablet transaction logging #3724

ivakegg opened this issue Aug 25, 2023 · 3 comments
Assignees
Labels
enhancement This issue describes a new feature, improvement, or optimization.

Comments

@ivakegg
Copy link
Contributor

ivakegg commented Aug 25, 2023

There have been several tickets that have been created to try and help avoid the situation where the metadata gets out of sync with the a tablet's in-memory data file manager. This can have significant consequences when this happens with a metadata tablet. Fixing the root issues are being worked on in those other tickets. This ticket is an attempt to create some fault-tolerance in the code.

One possiblity is for the Tablet to capture the transactions that are submitted against the metadata in a separate log. When the in-memory becomes out of sync with the metadata, we can

  1. output the set of transactions since the last known point of consistency to the error log
  2. replay the log using separate code to determine what the true state should be
  3. optionally try and repair either the metadata or the in-memory data file manager as needed
@ivakegg ivakegg added the enhancement This issue describes a new feature, improvement, or optimization. label Aug 25, 2023
@ivakegg ivakegg self-assigned this Aug 25, 2023
@ivakegg
Copy link
Contributor Author

ivakegg commented Aug 25, 2023

Capturing the discussion from the accumulo channel in the ASF slack:

Ivarator
18 hours ago
After many discussions, I am now thinking along the lines of some fault tolerant coding here. I think the metadata/in-memory checker should keeps a transaction log on the side. Whenever things are determined to be consistent it can essentially clear that log. However when it finds an inconsistency, it can essentially replay that log against what it last knew to be correct to determine what the state of affairs should be. Then the appropriate reparations can be made.

HP
18 hours ago
It would be cool if it could be smart enough to log results of the decision like "Metadata still had a reference to a file we successfully major compacted" which is basically what we've had to manually search for to confirm we haven't lost data.
Still chewing on potential edge cases like when splits / merges occur...

Ivarator
18 hours ago
agreed.... messages to help pinpoint the error.

Keith Turner
18 hours ago
One way to implement this would be to have a per tablet set of files that were compacted away. This set would be cleared on each successful consistency check. Unsuccessful checks could use the set to compute better messages.

Ivarator
18 hours ago
I expect files that are compacted away would certainly be part of this transaction log.

Ivarator
18 hours ago
I will work this tomorrow I think. In the meantime I look forward to any headway on #3721

ctubbsii
😷 18 hours ago
A per-tablet set of files that were compacted away will complicate the no-chop merges feature, because a file might get compacted due to one range entry, but might need to stick around with a different range that has yet to be compacted.

ctubbsii
😷 18 hours ago
That could be mitigated by forcing the compaction code to always compact all ranges for a given file, whenever that file is included in a compaction.

Keith Turner
18 hours ago
For that case, it would probably key the set on file+range like it does when tracking the tablets current files

ctubbsii
😷 18 hours ago
Yeah, key on StoredTabletFile. A lot of those internal structures were changed in 3.0, though. I'm not sure if this idea would target 2.1 or 3.1. If it's targeting 2.1, it may look very different in 3.1 (and because 3.0 is non-LTM, it wouldn't have the feature at all).

Chris Shannon
4 hours ago
If we tracked the set of files based on StoredTabletFile I don't think it would be too much different with the no-chop-merge changes. The internal structure was changed in 3.0 but both 2.1 and 3.0 still only rely on the path as the key for uniqueness (well really it's the file metadata which is just the path.) With my no-chop-merges branch and changes coming we add the concept of a range to go along with the path to determine a unique file but that is transparent and internal to the StoredTabletFile structure (technically it's part of all TabletFiles). So if we just use StoredTabletFile object when tracking in the set then I would think any code written would work for the most part without too many changes.

Chris Shannon
4 hours ago
One of the benefits of treating each combination of path + range as a unique file (vs the original attempt of having one file track a collection of ranges) is most of the code just "works" transparently. Any code that uses a tablet file doesn't really care the paths are the same with different ranges, it just treats them as unique files so things like splits, compactions, etc just work without much modification and i would expect something similar here (edited)

ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 25, 2023
* Created a transaction log for the datafile manager
* Added dumping of the log when metadata differences are detected
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 26, 2023
* Added configuration for the max log size
* Added configuration for the action to take when out of sync
* Created intial handler function
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 26, 2023
* Added propery tests
* Updated documentation
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
* Created an enumeration for the table.operation.log.recovery.action's
* Completed the implementation of the metadata/memory updates
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
* Use enum in descriptions
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
* Use enum in descriptions
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
* Added transaction log tests
* Filled out some hashCode and equality methods
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 27, 2023
@dlmarion
Copy link
Contributor

If the issue found in #3721 is found to be the cause of the "in-memory differ" issue that has been happening in 2.1.x, then I'm not sure that we would need to add #3727 .

ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 28, 2023
* Comments and parameterized log messages
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 28, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 30, 2023
* Reverted to just logging
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 30, 2023
* Added more javadoc
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 30, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 30, 2023
handle non-positive configuration value
ivakegg added a commit to ivakegg/accumulo that referenced this issue Aug 31, 2023
* Disable use of the log if the size is 0
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 1, 2023
* avoid array copies by using a ring buffer
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 5, 2023
* Updated to always log even if no transactions
* Updated to clear the log when everything is consistent
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 6, 2023
* Updated to log expected as well as current
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 18, 2023
review comment updates
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Removed unused methods
* Verified all modifications are synchronized
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
Cleaner output of Optional members
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Comments and parameterized log messages
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Reverted to just logging
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Added more javadoc
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
handle non-positive configuration value
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Disable use of the log if the size is 0
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* avoid array copies by using a ring buffer
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Updated to always log even if no transactions
* Updated to clear the log when everything is consistent
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Updated to log expected as well as current
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
review comment updates
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Removed unused methods
* Verified all modifications are synchronized
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
Cleaner output of Optional members
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
* Added a test to ensure safety between read and write threads
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 19, 2023
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 29, 2023
* add a property to disable/enable the feature
* More testing to ensure proper thread safety
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 29, 2023
* Lazy creation of the transactions to avoid creation when disabled
* enable/disable testing
ivakegg added a commit to ivakegg/accumulo that referenced this issue Sep 29, 2023
* Updated to use synchronization instead of an update count
@ctubbsii ctubbsii linked a pull request Oct 20, 2023 that will close this issue
@ctubbsii
Copy link
Member

Closing for the same reason as the linked PR was closed: #3727 (comment)

@ctubbsii ctubbsii closed this as not planned Won't fix, can't repro, duplicate, stale Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue describes a new feature, improvement, or optimization.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants