Skip to content

Failed to restart cluster with old "untouched" tablespaces #754

@diegosalvi

Description

@diegosalvi

BUG REPORT

  • Please describe the issue you observed:

I've a long running cluster with many tablespaces. Some of them aren't really used (no transactions, no datachange). They are checkpointed every 15 minutes with the same log position (no new changes).
After a reboot unused unchanged tablespaces couldn't boot such tablespaces anymore, struck in a recovery/fail loop

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: Actual ledgers list:LedgersInfo{activeLedgers=[4072702], firstLedger=374, zkVersion=51} tableSpace q73 (e415f271141e4420a622eeade68b30d1)

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: recovery from latest snapshotSequenceNumber:(4072702,2) tableSpace q73 (e415f271141e4420a622eeade68b30d1), node pulcino, fencing false

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog isRecoveryAvailable
INFO: Actual ledgers list:LedgersInfo{activeLedgers=[4072702], firstLedger=374, zkVersion=51} tableSpace q73 (e415f271141e4420a622eeade68b30d1)

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: Ledger 4072702: CLOSED {0=[rzv-es03-mass.rozzano.diennea.lan:3181]} created by pulcino, LastEntryId 145,768 Length 2,915,380

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: Tablespace q73 (e415f271141e4420a622eeade68b30d1), Recovering from ledger 4072702, first=2 lastAddConfirmed=145768 written by pulcino

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 2, to entry 1,002 (0 %) read time 16 ms

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 1,003, to entry 2,003 (0.687 %) read time 16 ms

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 2,004, to entry 3,004 (1.373 %) read time 15 ms
  
21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 3,005, to entry 4,005 (2.06 %) read time 15 ms

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 4,006, to entry 5,006 (2.747 %) read time 15 ms

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
INFO: q73 (e415f271141e4420a622eeade68b30d1) From entry 5,007, to entry 6,007 (3.434 %) read time 15 ms

21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog recovery
SEVERE: Fatal error during recovery of q73 (e415f271141e4420a622eeade68b30d1)
org.apache.bookkeeper.client.BKException$BKReadException: Error while reading ledger
        at org.apache.bookkeeper.client.BKException.create(BKException.java:62)
        at org.apache.bookkeeper.client.PendingReadOp.submitCallback(PendingReadOp.java:640)
        at org.apache.bookkeeper.client.PendingReadOp$LedgerEntryRequest.fail(PendingReadOp.java:171)
        at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:393)
        at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.logErrorAndReattemptRead(PendingReadOp.java:436)
        at org.apache.bookkeeper.client.PendingReadOp.readEntryComplete(PendingReadOp.java:582)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion$1.readEntryComplete(PerChannelBookieClient.java:1836)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleReadResponse(PerChannelBookieClient.java:1917)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.handleV3Response(PerChannelBookieClient.java:1892)
        at org.apache.bookkeeper.proto.PerChannelBookieClient$3.safeRun(PerChannelBookieClient.java:1447)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:834)


21-07-26-14-29-46       herddb.cluster.BookkeeperCommitLog Jul 26, 2021 2:29:46 PM herddb.cluster.BookkeeperCommitLog close
SEVERE: closed
  • What did you do?
    Fortunately in this case data wasn't needed anymore, i just drop manually tablespaces (just to avoid log pollution and log size increase)

  • What did you expect to see?
    The server boots the unchanged tablespaces

  • What did you see instead?
    The activator still attempted to boot such tablespaces, every time failing ad some point during "recovery" with a missing ledger.

Please take note that there was any need of any recovery: all data was fully checkpointed with no pending action or transactions in the log.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions