When a tablet server dies, all of the tablets it was serving are assigned its active WALs. See #537 for a more detailed description. Therefore, it's possible that a clean tablet (one that had no data in memory when the tserver died) is assigned WALs for recovery. It's also possible that the clean tablet has only a compaction finish event in the WAL. However, the recovery code checks for a lone compaction finish event and flags it as an error. This is not an error. The check made sense before 1.8.0, when WALs were tracked per tablet, because in that case a lone compaction finish event should not happen. Since 1.8.0, this check needs to be reconsidered. Discovered this issue while looking into and discussing #535 with @ctubbsii.
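To make the false positive concrete, here is a minimal Java sketch of a finish-without-start sanity check. It assumes a simplified event model: `LoneFinishCheck`, `EventType`, `LogEvent`, `hasFinishWithoutStart`, and `strictCheck` are hypothetical names, not Accumulo's recovery classes. For a clean tablet whose only event is a compaction finish, the strict check rejects input that has nothing to recover.

```java
import java.util.List;

public class LoneFinishCheck {

  // Hypothetical, simplified WAL event model; not Accumulo's actual recovery API.
  enum EventType { COMPACTION_START, COMPACTION_FINISH, MUTATION }

  record LogEvent(EventType type, long seq) {}

  // True when a COMPACTION_FINISH appears before any COMPACTION_START for this tablet.
  static boolean hasFinishWithoutStart(List<LogEvent> tabletEvents) {
    boolean sawStart = false;
    for (LogEvent e : tabletEvents) {
      if (e.type() == EventType.COMPACTION_START) {
        sawStart = true;
      } else if (e.type() == EventType.COMPACTION_FINISH && !sawStart) {
        return true;
      }
    }
    return false;
  }

  // Per-tablet-WAL (pre-1.8.0) style check: a lone finish is treated as an error.
  static void strictCheck(List<LogEvent> tabletEvents) {
    if (hasFinishWithoutStart(tabletEvents)) {
      throw new IllegalStateException("COMPACTION_FINISH without preceding COMPACTION_START");
    }
  }

  public static void main(String[] args) {
    // A clean tablet whose only event in the dead tserver's WAL is a compaction finish.
    List<LogEvent> cleanTablet = List.of(new LogEvent(EventType.COMPACTION_FINISH, 7));
    try {
      strictCheck(cleanTablet);
      System.out.println("accepted");
    } catch (IllegalStateException ex) {
      // With per-tserver WALs this is a false positive: there is nothing to replay.
      System.out.println("flagged (false positive): " + ex.getMessage());
    }
  }
}
```

Running `main` prints the "flagged" line, which is the behavior this issue argues should not be an error once WALs are shared by all tablets on a tserver.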
There are two changes in this patch. First, removed a sanity check from the
recovery code that resulted in false positives (a lone compaction finish event
flagged as an error). Second, changed the recovery code to use the last
compaction finish event for the recovery seq #.
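The second change can be sketched the same way: take the recovery seq # from the last compaction finish event, without requiring a matching start. This is a hypothetical, simplified model (`RecoverySeqSketch`, `lastFinishSeq`, and `mutationsToReplay` are illustrative names), and it assumes that a finish event with seq S means every mutation with seq < S is already in a file. Accumulo's actual sequence-number bookkeeping may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

public class RecoverySeqSketch {

  // Hypothetical, simplified WAL event model; not Accumulo's actual recovery API.
  enum EventType { COMPACTION_START, COMPACTION_FINISH, MUTATION }

  record LogEvent(EventType type, long seq, String payload) {}

  // Seq of the last COMPACTION_FINISH seen for the tablet, if any.
  // No matching COMPACTION_START is required, so a lone finish is valid input.
  static OptionalLong lastFinishSeq(List<LogEvent> events) {
    OptionalLong last = OptionalLong.empty();
    for (LogEvent e : events) {
      if (e.type() == EventType.COMPACTION_FINISH
          && (last.isEmpty() || e.seq() > last.getAsLong())) {
        last = OptionalLong.of(e.seq());
      }
    }
    return last;
  }

  // Sketch assumption: a finish with seq S means mutations with seq < S are persisted,
  // so only mutations with seq >= S are replayed. With no finish, replay everything.
  static List<String> mutationsToReplay(List<LogEvent> events) {
    long recoverySeq = lastFinishSeq(events).orElse(Long.MIN_VALUE);
    List<String> replay = new ArrayList<>();
    for (LogEvent e : events) {
      if (e.type() == EventType.MUTATION && e.seq() >= recoverySeq) {
        replay.add(e.payload());
      }
    }
    return replay;
  }

  public static void main(String[] args) {
    List<LogEvent> events = List.of(
        new LogEvent(EventType.MUTATION, 3, "m1"),
        new LogEvent(EventType.COMPACTION_FINISH, 5, null),
        new LogEvent(EventType.MUTATION, 6, "m2"));
    System.out.println(mutationsToReplay(events)); // prints [m2]
  }
}
```

Under these assumptions, a clean tablet with only a compaction finish event yields a recovery seq with nothing to replay, rather than an error.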
To test this, Accumulo's data loss test suite was modified to pause ingest.
The problems in this issue are unlikely to show up with non-stop ingest. To
expose them, the ability to randomly pause ingest was added in apache/accumulo-testing#15.